
use Simplemma instead of pycld3 for language detection #626

Merged
merged 6 commits on Nov 15, 2022


osma
Member

osma commented Sep 28, 2022

This draft PR is similar to PR #615, except that it replaces the pycld3-based language detection with language detection based on Simplemma (by @adbar).

I intend to benchmark this against PR #615 and the current status quo (with pycld3) in the near future.

Fixes #593
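The detection step is used by the language filter transform in `annif/transform/langfilter.py` (the file listed in the coverage report below). The following is a minimal, dependency-free sketch of sentence-level language filtering in general, not Annif's actual code. It assumes the text is split into sentences, sentences above a length threshold are checked with a detector, and sentences judged to be in another language are dropped. `filter_sentences`, `toy_detect`, `min_length` and `min_ratio` are hypothetical names. In the PR, Simplemma plays the role that the toy detector plays here.

```python
import re
from typing import Callable

# Hypothetical detector type: (sentence, lang) -> share of words judged to be in `lang`.
# The PR uses Simplemma for this step; a stand-in is injected here so the sketch
# runs without third-party packages.
Detector = Callable[[str, str], float]


def filter_sentences(text: str, lang: str, detect: Detector,
                     min_length: int = 50, min_ratio: float = 0.5) -> str:
    """Drop sentences that the detector judges not to be in `lang`."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sent in sentences:
        # Assumption: short sentences are kept unchecked, as detection on them is unreliable.
        if len(sent) < min_length or detect(sent, lang) >= min_ratio:
            kept.append(sent)
    return " ".join(kept)


# Toy stand-in detector (hypothetical): it only knows a handful of English words.
ENGLISH_WORDS = {"this", "is", "a", "text", "about", "language", "detection", "and", "the"}


def toy_detect(sentence: str, lang: str) -> float:
    words = re.findall(r"\w+", sentence.lower())
    if lang != "en" or not words:
        return 0.0
    return sum(w in ENGLISH_WORDS for w in words) / len(words)


text = ("This is a text about language detection and the filter. "
        "Tämä lause on kirjoitettu suomeksi eikä englanniksi ollenkaan. "
        "Short one.")
print(filter_sentences(text, "en", toy_detect))
# This is a text about language detection and the filter. Short one.
```

Passing the detector in as a function keeps the filtering logic independent of which library does the detection.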


codecov bot commented Sep 29, 2022

Codecov Report

Base: 99.58% // Head: 99.54% // Decreases project coverage by 0.03% ⚠️

Coverage data is based on head (7af51fb) compared to base (63af34c).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #626      +/-   ##
==========================================
- Coverage   99.58%   99.54%   -0.04%     
==========================================
  Files          87       87              
  Lines        5972     5968       -4     
==========================================
- Hits         5947     5941       -6     
- Misses         25       27       +2     
| Impacted Files | Coverage Δ |
|---|---|
| tests/test_transform_langfilter.py | 100.00% <ø> (ø) |
| annif/transform/langfilter.py | 100.00% <100.00%> (ø) |
| annif/transform/__init__.py | 93.75% <0.00%> (-6.25%) ⬇️ |
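The report gives two slightly different deltas: a 0.03% decrease in the summary line and -0.04% in the Coverage Diff table. A short calculation from the Hits and Lines rows shows that both come from the same data, assuming the percentages are truncated (not rounded) to two decimals. That assumption is inferred from these figures, not from Codecov documentation: ordinary rounding would display the head as 99.55%, not 99.54%. `coverage_pct` and `truncate2` are illustrative helper names.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Line coverage in percent, without rounding."""
    return Decimal(hits) * 100 / Decimal(lines)


def truncate2(value: Decimal) -> Decimal:
    """Cut to two decimal places; ROUND_DOWN truncates toward zero (assumed display rule)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


# Figures from the Coverage Diff table.
base = coverage_pct(5947, 5972)  # master (63af34c)
head = coverage_pct(5941, 5968)  # PR head (7af51fb)

print(truncate2(base), truncate2(head))   # 99.58 99.54
print(truncate2(base) - truncate2(head))  # 0.04: difference of the displayed values
print(truncate2(base - head))             # 0.03: truncated difference of the exact values
```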



osma force-pushed the issue593-simplemma-language-detection branch from 365cd82 to 7af51fb on November 11, 2022 14:36

sonarcloud bot commented Nov 11, 2022

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No coverage information
0.0% Duplication

Member Author

osma commented Nov 11, 2022

Rebased on current master (with black & isort reformatting), fixed up and force-pushed.

osma marked this pull request as ready for review November 11, 2022 14:51
Member

juhoinkinen left a comment


LGTM

osma merged commit 02e60ed into master Nov 15, 2022
osma deleted the issue593-simplemma-language-detection branch November 15, 2022 12:56
osma added this to the 0.60 milestone Nov 25, 2022
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace pycld3 dependency?
2 participants