Undefined language on a page that looks normal #8

Closed
GoogleCodeExporter opened this issue Mar 15, 2015 · 6 comments

Comments

@GoogleCodeExporter

Apparently, CLD2 has some difficulties(*) with 
http://drugoi.livejournal.com/3971967.html 

We are seeing UND (undefined) on chrome://translate-internals

*: or maybe we are mis-using it...

Original issue reported on code.google.com by kenjibaheux@chromium.org on 5 Mar 2014 at 6:59

@GoogleCodeExporter

Cannot reproduce.
I opened http://drugoi.livejournal.com/3971967.html in Firefox, copied and 
pasted all the text into a UTF-8 file, then ran
 ./compact_lang_det_test_chrome0122_2 should_not_be_unk_chrome_8.utf8
and got 
  ExtLanguage RUSSIAN(80% 1027p), UKRAINIAN(2% 450p), INDONESIAN(0% 637p), 40/45 KB of non-tag letters, Summary: RUSSIAN
  SummaryLanguage RUSSIAN at 0 of 46701 2617us (17 MB/sec), should_not_be_unk_chrome_8.utf8

If you are not getting that result, please rerun in your context, setting 
kCLDFlagEcho as the flag value in the call to ExtDetectLanguageSummary and send 
me stderr (not post or email, which open the possibility of various 
svn/web/mail/browser software changing the exact bytes), or run with flags  
  kCLDFlagHtml | kCLDFlagCr  
and send me stderr, or compare to the attached file of the output that I got.

Is it possible that there is an encoding problem and you are not passing clean 
UTF-8 to CLD2?


Original comment by dsi...@google.com on 5 Mar 2014 at 6:26
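
The encoding question raised above can be checked before any text reaches the detector. The following is a minimal sketch in Python, not part of CLD2 or Chromium: the helper name `first_invalid_utf8_offset` is hypothetical, and it assumes the page text has already been saved to a file as raw bytes. It reports the offset of the first byte sequence that is not valid UTF-8, or `None` if the input is clean.

```python
import sys
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if clean.

    Hypothetical helper for illustration; it only relies on Python's strict
    UTF-8 decoder and does not call CLD2.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte.
        return err.start
    return None


if __name__ == "__main__":
    # Usage: python check_utf8.py file1.utf8 [file2.utf8 ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            offset = first_invalid_utf8_offset(f.read())
        if offset is None:
            print(f"{path}: clean UTF-8")
        else:
            print(f"{path}: invalid UTF-8 at byte offset {offset}")
```

For example, Russian text saved as windows-1251 instead of UTF-8 fails at the very first byte, which is the kind of input that could plausibly produce an undefined-language result.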

Attachments:

@GoogleCodeExporter

Seems like we are still using R84. Would this explain the difference?

Original comment by kenjibaheux@chromium.org on 6 Mar 2014 at 4:19

@GoogleCodeExporter

No, R84 does not explain the difference. Please capture the actual bytes sent to 
CLD2. Thanks, /dick

Original comment by dsi...@google.com on 6 Mar 2014 at 9:54
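
One way to capture the exact bytes so that they survive transfer unchanged is to write them in binary mode and record a checksum alongside. This is a rough sketch in Python under stated assumptions: the helper name `capture_bytes` and the file path are hypothetical, and in Chromium itself the equivalent dump would have to be made in C++ at the point where the buffer is handed to CLD2.

```python
import hashlib


def capture_bytes(data: bytes, path: str) -> str:
    """Write data to path byte-for-byte and return its SHA-256 hex digest.

    Hypothetical helper for illustration. Binary mode avoids newline or
    encoding translation; the digest lets the recipient confirm that the
    file was not altered in transit.
    """
    with open(path, "wb") as f:
        f.write(data)
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    sample = "Пример текста".encode("utf-8")  # stand-in for the real buffer
    digest = capture_bytes(sample, "cld2_input.bin")
    print(f"wrote {len(sample)} bytes, sha256={digest}")
```

The recipient can recompute the digest over the received file and compare it with the reported value before running the detector on it.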

@GoogleCodeExporter

FWIW, I am planning to roll Chromium to the latest CLD2 in the Very Near(TM) 
future.

Original comment by andrewha...@chromium.org on 11 Mar 2014 at 12:44

@GoogleCodeExporter

Re #4: please try the subject URL http://drugoi.livejournal.com/3971967.html 
and send the requested debugging output from #1 if the detected language is 
Unknown. /dick

Original comment by dsi...@google.com on 11 Mar 2014 at 6:33

@GoogleCodeExporter

The current version of Chrome (Version 38.0.2125.104, 64-bit) detects Russian 
and translates correctly. Closing as Fixed.

Original comment by dsi...@google.com on 23 Oct 2014 at 8:18

  • Changed state: Fixed
