wrong language detection #5

Open
FLasH3r opened this issue Jan 3, 2021 · 3 comments

FLasH3r commented Jan 3, 2021

I have the following texts (all English), each shown with the language this package detected for it.
Only the results detected as en are correct.

  • Announcing the GitHub Education Classroom Report 2020 - en
  • Highlights from Game Off 2020 - en
  • How to launch a tech career in 2021 - it
  • Let’s talk about securing open source projects - tl
  • Git clone: a data-driven study on cloning behaviors - tl
  • Get up to speed with partial clone and shallow clone - it
  • GitHub joins amicus brief warning of systemic risk from private sector offensive actors - af
  • Visualizing GitHub’s global community - tl
  • How we built the GitHub globe - en
  • How to make DevOps your competitive advantage - pt

Besides running composer install ... I haven't done anything.

The text here is just an example; it's from the GitHub blog (the titles of the last 10 posts).

If I do new \LanguageDetector\LanguageDetector(null, ['en']); it works, but that is not the goal.

The code looks like this:

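// $titles holds the ten blog post titles listed above.
$languageDetector = new \LanguageDetector\LanguageDetector();

foreach ($titles as $title) {
    // Detect the most likely language for this title.
    $language = $languageDetector->evaluate($title)->getLanguage();

    echo $title . ' - ' . (string) $language . PHP_EOL;
}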
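
For reference, a possible middle ground between the default constructor and the ['en'] call mentioned above is to pass a short list of the languages actually expected in the input. This is only a sketch: the candidate list ['en', 'de', 'fr'] and the vendor/autoload.php path are assumptions for illustration, and it has not been confirmed that this classifies the titles above correctly.

```php
<?php
// Assumes the package was installed with Composer, as described above.
require __DIR__ . '/vendor/autoload.php';

// A few of the sample titles from the list above.
$titles = [
    'How to launch a tech career in 2021',
    'Let’s talk about securing open source projects',
    'How to make DevOps your competitive advantage',
];

// Assumption: only these languages are expected in the input.
// The list is illustrative; it is not taken from the report.
$candidates = ['en', 'de', 'fr'];

// Same constructor form as the ['en'] call above, with a wider subset.
$languageDetector = new \LanguageDetector\LanguageDetector(null, $candidates);

$results = [];
foreach ($titles as $title) {
    // Cast the detected language to a string, as in the original loop.
    $results[$title] = (string) $languageDetector->evaluate($title)->getLanguage();
    echo $title . ' - ' . $results[$title] . PHP_EOL;
}
```

Narrowing the candidates should rule out guesses such as tl or af when those languages never occur in the data, but it does not address the underlying weakness on short input.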

vesper8 commented May 2, 2021

Looks like this suffers from the same thing as the more popular https://github.com/patrickschur/language-detection

It does a good job with long texts but is borderline useless for short sentences, getting it wrong at an alarmingly high rate.

Still looking for a reliable language detector that works well with short sentences. In case anyone finds one, please share.

@FabianoLothor

ward

dmaicher commented Dec 2, 2021

> Still looking for a reliable language detector that works well with short sentences in case anyone finds one please share

@vesper8 https://github.com/fntlnz/cld2-php-ext works well for my use cases, even with rather short texts. It detects all of the above cases as English.
