get_document_vector() and get_postings_list() Stemming ? #47
Comments
Hi @Oulaolay - welcome! To be clear, you'd want a variant of There's actually already an outstanding issue: I'm not sure when we'll get to it... but you're welcome to send a pull request...
Haha, got to it!
Thanks for all these modifications! The errors that I found are in pyclass.py:
Thanks! Best regards!
Hi @Oulaolay, the errors are caused by a recent change in Anserini; Pyserini needs to be updated accordingly, and I have already submitted a PR for this. To make a PR yourself, fork the repository, push your changes to the fork, and then open a PR from the fork.
That's perfect! Have a good day, @chriskamphuis.
Hi @lintool!
I have a new issue:
I created a new index from the "DUC-2001" dataset by means of this function:
I also installed the Luke toolbox to understand how the index works.
When I run this code:
it works for some terms but not for all of them...
I think there are two different indexes: the first one applies stemming (the word "Cherokee" becomes "cheroke") and the second keeps the word without stemming.
So, how can I apply stemming to the postings index?
Best regards
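To illustrate the mismatch described above, here is a minimal, self-contained sketch of why a lookup with the raw term can miss when the index stores stemmed terms. It does not use the Pyserini or Anserini API: `toy_analyze`, `build_index`, and `get_postings` are hypothetical names, and the suffix stripping is only a stand-in for a real stemmer (it merely mimics the "Cherokee" to "cheroke" effect).

```python
from collections import Counter, defaultdict


def toy_analyze(text):
    """Lowercase, tokenize on whitespace, and strip a few suffixes.

    Assumption: this is a toy stand-in for a real analyzer, not Porter stemming.
    """
    tokens = []
    for raw in text.lower().split():
        token = raw.strip('.,;:!?"\'()')
        if not token:
            continue
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        if len(token) > 3 and token.endswith("e"):
            token = token[:-1]
        tokens.append(token)
    return tokens


def build_index(docs):
    """Build a toy inverted index: analyzed term -> [(doc_id, term_frequency)]."""
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(toy_analyze(text)).items():
            postings[term].append((doc_id, tf))
    return dict(postings)


def get_postings(index, term, analyze=True):
    """Look up a term, optionally running it through the indexing analyzer first."""
    if analyze:
        analyzed = toy_analyze(term)
        if not analyzed:
            return None
        term = analyzed[0]
    return index.get(term)


docs = {
    "d1": "The Cherokee nation",
    "d2": "Cherokee history and Cherokee language",
}
index = build_index(docs)

# The raw term is not in the index, because only the stemmed form was stored.
raw_hit = get_postings(index, "Cherokee", analyze=False)
# Analyzing the query term the same way as the documents finds the postings.
stemmed_hit = get_postings(index, "Cherokee", analyze=True)
print(raw_hit, stemmed_hit)
```

The point of the sketch is only that the query term has to go through the same analysis as the indexed text before the postings lookup; how that is exposed in Pyserini is not shown here.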