SoftTFIDF get_raw_score failing with float division by zero #62

Open
rafamonge opened this issue Jun 25, 2020 · 0 comments


I'm getting an exception when calling the get_raw_score function of the SoftTFIDF similarity measure. It only happens with a specific corpus, which I'm unfortunately unable to share, so the code snippet isn't fully reproducible.

import py_stringmatching as sm
print(sm.__version__)
soft_tfidf =sm.SoftTfIdf(corpus, threshold=0.9)
soft_tfidf.get_raw_score(['AWN', 'AL'], ['ONEP'])
0.4.1
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-100-fcbb2f491b64> in <module>
      2 print(sm.__version__)
      3 soft_tfidf =sm.SoftTfIdf(corpus, threshold=0.9)
----> 4 soft_tfidf.get_raw_score(['AWN', 'AL'], ['ONEP'])

C:\ProgramData\Anaconda3\lib\site-packages\py_stringmatching\similarity_measure\soft_tfidf.py in get_raw_score(self, bag1, bag2)
    134             v_y = idf * tf_y.get(element, 0)
    135             v_y_2 += v_y * v_y
--> 136         return result if v_x_2 == 0 else result / (sqrt(v_x_2) * sqrt(v_y_2))
    137 
    138     def get_corpus_list(self):

ZeroDivisionError: float division by zero

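I added a print right before line 136. The root cause is that v_y_2 is equal to zero while v_x_2 is not: the guard on line 136 only checks v_x_2, so the division by sqrt(v_y_2) still runs.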
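
For illustration, here is a minimal sketch of what a guard covering both norms could look like. It assumes the intended behaviour is to return the unnormalized result whenever either squared norm is zero, which is my guess and not something the maintainers have confirmed. `normalized_score` is a hypothetical standalone helper, not part of py_stringmatching; its arguments mirror the local variables `result`, `v_x_2` and `v_y_2` from the traceback.

```python
from math import sqrt


def normalized_score(result, v_x_2, v_y_2):
    """Hypothetical helper mirroring the return statement on line 136.

    Assumption: when either squared norm is zero, return `result`
    unchanged instead of dividing.
    """
    # Check both squared norms; the original expression only checks v_x_2.
    if v_x_2 == 0 or v_y_2 == 0:
        return result
    return result / (sqrt(v_x_2) * sqrt(v_y_2))
```

With this check, a call such as `normalized_score(0.0, 4.0, 0.0)` returns the unnormalized value instead of raising ZeroDivisionError, and inputs with two non-zero norms are divided exactly as before.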
