This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

Term Frequency - Inverse Document Frequency (TF-IDF) #55

Closed
cesarsouza opened this issue Dec 19, 2014 · 2 comments

Comments

@cesarsouza
Member

TF-IDF would be a nice addition for those using Accord.NET in text applications, especially now that more linear optimization algorithms are available.

Kory Becker has created a nice implementation (under a compatible license) that could serve as a starting point. The current code can be found here:

https://github.com/primaryobjects/TFIDF/blob/master/TFIDFExample/TFIDF.cs

However, that implementation seems to need a stemmer. The stemmer could be incorporated into the project by defining a new ITextStemmer interface. Different stemmers could then be created using the Snowball project:

https://github.com/cesarsouza/snowball
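To illustrate the idea of a pluggable stemmer feeding a TF-IDF computation, here is a minimal sketch. It is written in Python for brevity and is not the code from the linked repository or any Accord.NET API. The `Stemmer` type, `naive_stemmer`, and `tfidf` are hypothetical names, and the weighting (tf = count / document length, idf = log(N / df)) is just one common variant, assumed here for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for the proposed ITextStemmer interface:
# any callable that maps a word to its stem.
Stemmer = Callable[[str], str]


def naive_stemmer(word: str) -> str:
    """Toy suffix stripper, only a placeholder for a real Snowball stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tfidf(documents: List[str], stem: Stemmer) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df);
    other weighting variants exist.
    """
    # Lowercase, split on whitespace, and stem every token.
    tokenized = [[stem(w) for w in doc.lower().split()] for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each stemmed term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = ["the cat jumped", "the cats sleep", "dogs bark"]
vectors = tfidf(docs, naive_stemmer)
```

Because the stemmer is passed in as a parameter, "cat" and "cats" collapse into the same term here, and a Snowball-generated stemmer could be swapped in without touching the weighting code.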

It would be cool to add a new generator for C# to Snowball. That shouldn't be too difficult, given that working Java generators are already available.

@cesarsouza
Member Author

Snowball generators have been added in commit 1896fe0.

@cesarsouza
Member Author

Fixed in 3.5.0.
