(Closed) Logs which contains non-english letter turns out to be gibberish #2281

Closed
roikatz opened this issue Dec 13, 2018 · 16 comments

Comments

@roikatz
commented Dec 13, 2018

Log output in the iOS client for Couchbase Lite turns into gibberish when the data in the database is not in English (for example, Hebrew).
That makes debugging difficult for non-Latin alphabets.

for example:
{QueryEnum#2} --> [{"XXX":2,"AAA":"תוצאת בדיקה","WWW":true,"XXX":"12312","QQQ":0,"type":"FEF","AA":[{"string":"1010","string":"חזה 1


  • Version: 2.1.2
  • Client OS: iOS
  • Server: Couchbase EE 5.5.2
@snej

Member

commented Dec 13, 2018

That shouldn't be happening -- we use UTF-8 everywhere, so log output should be UTF-8 too.

Is the Hebrew text possibly being stored in some other encoding? But that shouldn't be possible since it's presumably going into the database via NSStrings. Or is this text pulled from a server? Is it possible it got stored by a server SDK as non-UTF-8?

@snej snej transferred this issue from couchbase/couchbase-lite-ios Dec 13, 2018

@snej

Member

commented Dec 13, 2018

@roikatz

Author

commented Dec 13, 2018

So it is initially UTF-8, as Couchbase stores everything in UTF-8. Then, from what I've seen in the code, it runs the built-in toDictionary function and just prints the data for reference; that is where the logger prints the Hebrew wrong.

@snej

Member

commented Dec 13, 2018

But the line you gave as an example is logged directly by LiteCore, not by the app. (SQLiteQuery.cc:243)

                alloc_slice json = _iter->asArray()->toJSON();
                logVerbose("--> %.*s", SPLAT(json));
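
For readers unfamiliar with the "%.*s" pattern in the snippet above, here is a minimal sketch of what it does. The `ByteSlice` struct and `formatRow` function are hypothetical stand-ins, not LiteCore types, and the sketch assumes the JSON bytes are already valid UTF-8. The point is that printf-style formatting copies the bytes through untouched, so nothing at this stage should corrupt non-ASCII text.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a slice: a pointer plus a byte count,
// with no guarantee of a trailing NUL.
struct ByteSlice {
    const char* buf;
    size_t size;
};

// Formats "--> <bytes>" with the same "%.*s" pattern as the snippet above.
// The precision argument caps how many bytes are read from buf; the bytes
// themselves are copied unchanged, so valid UTF-8 stays valid UTF-8.
std::string formatRow(ByteSlice json) {
    int needed = std::snprintf(nullptr, 0, "--> %.*s",
                               static_cast<int>(json.size), json.buf);
    if (needed < 0) return {};
    std::vector<char> out(static_cast<size_t>(needed) + 1);  // +1 for the NUL
    std::snprintf(out.data(), out.size(), "--> %.*s",
                  static_cast<int>(json.size), json.buf);
    return std::string(out.data(), static_cast<size_t>(needed));
}
```

Calling `formatRow` with a slice that covers only part of a longer buffer returns just those bytes after the "--> " prefix, whether they are ASCII or multi-byte UTF-8.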
@snej

Member

commented Dec 13, 2018

I just wrote a test case in LiteCore with non-ASCII doc properties, including Hebrew, and the logging works:

...
15:02:34.106441| [Query]: {QueryEnum#5}==> litecore::SQLiteQueryEnumerator 0x60d000006470 @0x60d000006470
15:02:34.106485| [Query]: {QueryEnum#5} Created on {Query#4} with 3 rows (74 bytes) in 0.290ms
15:02:34.106668| [Query]: {QueryEnum#5} --> ["Mötörhead"]
15:02:34.106789| [Query]: {QueryEnum#5} --> ["¯\\_(ツ)_/¯"]
15:02:34.106877| [Query]: {QueryEnum#5} --> ["מאגר מידע"]
15:02:34.106952| [Query]: {QueryEnum#5} END
@jayahariv

Contributor

commented Dec 13, 2018

Hi @snej,

I can reproduce it here.

2018-12-13 15:09:47.442009-0800 xctest[2543:15248] CouchbaseLite Query Info: {QueryEnum#3}==> litecore::SQLiteQueryEnumerator 0x60d00003b310 @0x60d00003b310
2018-12-13 15:09:47.442064-0800 xctest[2543:15248] CouchbaseLite Query Info: {QueryEnum#3} Created on {Query#2} with 1 rows (108 bytes) in 0.238ms
2018-12-13 15:09:47.442252-0800 xctest[2543:15248] CouchbaseLite Query Info: Beginning query enumeration (0x607000038fe8)
2018-12-13 15:09:47.442389-0800 xctest[2543:15248] CouchbaseLite Query Verbose: {QueryEnum#3} --> ["-hhFtgEix1riYF_ZMxuhIAU","אני יכול לאכול זכוכית וזה לא מזיק לי."]
2018-12-13 15:09:47.442515-0800 xctest[2543:15248] CouchbaseLite Query Verbose: {QueryEnum#3} END

Pasin showed me some steps to fix it on the iOS end. If we convert the string to UTF-8, it can be fixed. Below are the master and temporary-fix branches:

master
temp-fix

@roikatz

Author

commented Dec 14, 2018

But the line you gave as an example is logged directly by LiteCore, not by the app. (SQLiteQuery.cc:243)

                alloc_slice json = _iter->asArray()->toJSON();
                logVerbose("--> %.*s", SPLAT(json));

Oh right, there were two places where the docs were written: one under the database change listener and the other when they do a query.

@jayahariv How was that replicated? What is the actual issue?

@roikatz roikatz closed this Dec 14, 2018

@roikatz roikatz reopened this Dec 14, 2018

@jayahariv

Contributor

commented Dec 14, 2018

I can replicate it every time. I enabled verbose logging for everything, then saved a Hebrew (or any non-ASCII) string and fetched the doc. I can see the weird characters in the console.

Test case I wrote

@pasin

Contributor

commented Dec 14, 2018

@jayahariv the temp-fix won't work: it cannot format the string correctly if there is a %@ in the format.

@pasin

Contributor

commented Dec 14, 2018

@snej I think this is a CouchbaseLite iOS issue, so I'm moving the issue back to the CouchbaseLite iOS repo.

@pasin pasin transferred this issue from couchbase/couchbase-lite-core Dec 14, 2018

@pasin pasin added ffc bug labels Dec 14, 2018

@pasin pasin added this to the Iridium milestone Dec 14, 2018

@djpongh djpongh added ready and removed backlog labels Dec 14, 2018

@snej

Member

commented Dec 14, 2018

OK, I get it now — the corruption happens during CBL's logging callback. So I didn't see the problem because I was running LiteCore unit tests directly, and their callback just writes straight to stderr.

@jayahariv's fix is presumably just a proof of concept, since it also uses a fixed-size stack buffer, which would be a serious vulnerability in real code.
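
As an illustration of the fixed-size buffer concern, one common alternative is to ask vsnprintf for the required length first and then allocate exactly that much. This is only a sketch; `formatMessage` is a hypothetical helper, not code from the temp-fix branch or from Couchbase Lite.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats like printf, but sizes the buffer from the actual output length
// instead of assuming the message fits in a fixed-size stack array.
std::string formatMessage(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list argsCopy;
    va_copy(argsCopy, args);  // vsnprintf consumes a va_list; keep a copy for pass two

    // Pass one: a null buffer of size 0 makes vsnprintf report the needed length.
    int needed = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (needed < 0) {
        va_end(argsCopy);
        return {};
    }

    // Pass two: write into a heap buffer that is exactly large enough.
    std::vector<char> buf(static_cast<size_t>(needed) + 1);  // +1 for the NUL
    std::vsnprintf(buf.data(), buf.size(), fmt, argsCopy);
    va_end(argsCopy);
    return std::string(buf.data(), static_cast<size_t>(needed));
}
```

With this shape, a very long log line costs a larger allocation rather than overrunning or silently truncating a stack buffer.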

@snej

Member

commented Dec 14, 2018

I think this problem is due to Foundation's archaic notion of a "default encoding" for strings, which dates back to the '90s when it was cross-platform. Unfortunately it's set to MacRoman, which is the old encoding the Classic Mac OS used to use. This was useful back in 2001 when there were a lot of files using that encoding, but it's useless now.

The problem IIRC is that when -stringWithFormat: processes a %s escape, it uses that default encoding to interpret the C string. So it's going to treat it as MacRoman and transcode that to UTF-8, which turns it to garbage.
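
To make the mechanism concrete, the sketch below counts how many characters the same bytes represent under the two interpretations. It does not reproduce the MacRoman table; it relies only on the fact that MacRoman, like any single-byte encoding, maps every byte to one character. Both function names are hypothetical, and the UTF-8 counter assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Counts characters the way a UTF-8 decoder does: continuation bytes
// (bit pattern 10xxxxxx) belong to the preceding lead byte.
// Assumes the input is well-formed UTF-8.
size_t countAsUtf8(const std::string& bytes) {
    size_t n = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Counts characters the way a single-byte encoding (MacRoman included)
// does: every byte is treated as its own character.
size_t countAsSingleByte(const std::string& bytes) {
    return bytes.size();
}
```

Each Hebrew letter occupies two bytes in UTF-8, so a four-letter word is eight bytes. Read as UTF-8 it is four characters; read as a single-byte encoding it becomes eight unrelated symbols, and re-encoding those symbols as UTF-8 produces the garbage seen in the console.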

@snej

Member

commented Dec 15, 2018

It doesn't look like there's a viable way to get NSString to handle %s using UTF-8. That means we'll need to tell LiteCore that the logging callback wants preformatted strings. Then the callback just needs to log the string directly.

Unfortunately this also means CBL can't call c4log with any %@ escapes, because LiteCore will be doing the formatting and it doesn't recognize them. So the CBLLog functions will have to preformat their strings and call c4slog. This is slightly less efficient for the binary logger, but if Couchbase Lite doesn't write a ton of its own logging messages it shouldn't matter much.
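
A rough sketch of the "preformatted" arrangement described above: the core expands the format string itself and the callback only ever sees finished bytes. `CoreLogger` and `PreformattedLogCallback` are hypothetical names for illustration, not the LiteCore API.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback type: receives a finished UTF-8 message, never a
// format string, so the platform layer has no "%s" left to interpret.
using PreformattedLogCallback = std::function<void(const std::string& utf8Message)>;

// Hypothetical core-side logger: expands the printf-style format itself
// and hands only the finished bytes to the callback.
class CoreLogger {
public:
    explicit CoreLogger(PreformattedLogCallback cb) : _callback(std::move(cb)) {}

    void log(const char* fmt, ...) {
        va_list args, copy;
        va_start(args, fmt);
        va_copy(copy, args);
        int needed = std::vsnprintf(nullptr, 0, fmt, args);  // measure first
        va_end(args);
        if (needed >= 0) {
            std::vector<char> buf(static_cast<size_t>(needed) + 1);
            std::vsnprintf(buf.data(), buf.size(), fmt, copy);
            _callback(std::string(buf.data(), static_cast<size_t>(needed)));
        }
        va_end(copy);
    }

private:
    PreformattedLogCallback _callback;
};
```

On the platform side, the callback could then build its string from the bytes with an explicit UTF-8 encoding (on iOS, for example, via NSString's initWithBytes:length:encoding: with NSUTF8StringEncoding) rather than passing them through a "%s" escape. The trade-off is the one noted above: the platform layer can no longer rely on its own format escapes such as %@.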

@pasin

Contributor

commented Dec 15, 2018

@snej I think you mean calling c4log with an already-formatted string, right? The c4slog() function is special in that it will not call the callback.

@snej

Member

commented Dec 15, 2018

@pasin Yes, exactly.

@pasin

Contributor

commented Dec 20, 2018

I believe this issue was fixed as part of the new logging API implementation (53ef585).

@pasin pasin closed this Dec 20, 2018

@pasin pasin removed the ready label Dec 20, 2018

@pasin pasin reopened this Jan 23, 2019

@pasin pasin closed this Jan 23, 2019

@pasin pasin changed the title from "Logs which contains non-english letter turns out to be gibberish" to "(Closed) Logs which contains non-english letter turns out to be gibberish" Jan 23, 2019

5 participants