
Improve lang detection #2506

Merged
merged 4 commits into from May 9, 2023


noamzbr (Collaborator) commented May 8, 2023

Detect languages with a downloaded fastText model instead of langdetect. Advantages:

  1. Zero mistakes on the Twitter, Amazon review, and TED (ted_talks_iwslt) datasets
  2. 1 second for 100K samples, where each sample is a few sentences long
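A minimal sketch of fastText-based language detection, assuming the `fasttext` pip package and a model already loaded with `fasttext.load_model(...)`. `detect_languages`, `to_language`, and the `min_confidence` cut-off are hypothetical names for illustration, not the code merged in this PR. Passing the whole list to one `predict()` call avoids per-sample Python overhead, which fits the batch timing above.

```python
from typing import List, Optional, Sequence

# fastText prefixes every predicted class with this string, e.g. "__label__en".
LABEL_PREFIX = "__label__"


def clean_for_fasttext(text: str) -> str:
    """fastText's predict() rejects strings containing newlines, so collapse all whitespace."""
    return " ".join(text.split())


def to_language(labels: Sequence[str], probs: Sequence[float],
                min_confidence: float = 0.5) -> Optional[str]:
    """Map one top-1 fastText prediction to a bare language code, or None if unsure.

    min_confidence is a hypothetical cut-off, not a value taken from the PR.
    """
    if len(labels) == 0 or probs[0] < min_confidence:
        return None
    label = labels[0]
    return label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label


def detect_languages(model, texts: List[str],
                     min_confidence: float = 0.5) -> List[Optional[str]]:
    """Predict languages for a batch of texts with a single predict() call."""
    cleaned = [clean_for_fasttext(t) for t in texts]
    # With a list input, predict() returns (list of label lists, list of probability arrays).
    all_labels, all_probs = model.predict(cleaned, k=1)
    return [to_language(labels, probs, min_confidence)
            for labels, probs in zip(all_labels, all_probs)]
```

Usage, assuming a local `lid.176.bin`: `detect_languages(fasttext.load_model("lid.176.bin"), ["Hello world", "Bonjour"])`. Returning `None` for low-confidence predictions is a design choice of this sketch, so very short or mixed-language samples are not forced into a language.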

Disadvantages:

  1. Downloads a 130 MB model
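The download cost can be paid once per machine by caching the file. A minimal sketch, assuming the public `lid.176.bin` language-identification model from the fastText website (the PR does not name the file) and a caller-chosen cache directory; `ensure_model` and `load_language_model` are hypothetical helpers, not deepchecks functions.

```python
import os
import pathlib
import urllib.request

# Assumption: the public fastText language-ID model; the PR does not say which file it uses.
MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def ensure_model(cache_dir: pathlib.Path, url: str = MODEL_URL) -> pathlib.Path:
    """Return a local copy of the model, downloading it only if it is not cached yet."""
    path = cache_dir / url.rsplit("/", 1)[-1]  # file name taken from the URL
    if path.exists():
        return path  # already downloaded: no network access
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    urllib.request.urlretrieve(url, tmp)  # download into a temporary file first
    os.replace(tmp, path)  # rename, so an interrupted download never sits under the final name
    return path


def load_language_model(cache_dir: pathlib.Path):
    """Load the cached model with fastText (imported lazily, only when detection is used)."""
    import fasttext
    return fasttext.load_model(str(ensure_model(cache_dir)))
```

Writing to a `.part` file and renaming it afterwards means a crash mid-download leaves only the temporary file, so the next call retries instead of loading a truncated model.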

@noamzbr added the feature label (Feature update or code change to the package) May 8, 2023
@noamzbr requested a review from ItayGabbay as a code owner May 8, 2023 18:26
@noamzbr self-assigned this May 8, 2023
@noamzbr requested a review from a team as a code owner May 8, 2023 18:26
Review thread on requirements/nlp-prop-requirements.txt: resolved
Review thread on deepchecks/nlp/utils/text_properties.py (outdated): resolved
Review thread on deepchecks/nlp/utils/text_properties.py: resolved
@noamzbr enabled auto-merge (squash) May 9, 2023 10:05
@noamzbr disabled auto-merge May 9, 2023 11:55
@noamzbr disabled auto-merge May 9, 2023 12:03
@noamzbr disabled auto-merge May 9, 2023 13:07
@noamzbr merged commit 122263b into main May 9, 2023
16 of 21 checks passed
The delete-merged-branch bot deleted the noam/feature/improve-lang-detection branch May 9, 2023 13:53
Labels: feature (Feature update or code change to the package)
Projects: none
Linked issues: none
2 participants