AlonEirew/tf-idf-java
tf-idf-java

Java API for extracting TF (term frequency), IDF (inverse document frequency), and TFIDF from a large corpus.

Code Example:

    // Build a data set from the corpus path and hand its iterator to TFIDF.
    MapDataSet mapDataSet = new MapDataSet("/src/test/resources/corpus");
    TFIDF tfidf = new TFIDF(mapDataSet.iterator());

    // TFIDF of a two-word term in one document, checked with a tolerance of 0.0001.
    final double ml1 = tfidf.getTFIDF("Machine_Learning.txt", "Machine Learning");
    Assert.assertEquals("TFIDF value for Machine Learning", 0.0266, ml1, 0.0001);

    // TFIDF of a single-word term in the same document.
    final double ml2 = tfidf.getTFIDF("Machine_Learning.txt", "Learning");
    Assert.assertEquals("TFIDF value for Learning", 0.0373, ml2, 0.0001);
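For readers unfamiliar with the quantities involved, here is a minimal sketch of one common textbook definition: TF as the term count divided by the document length, and IDF as the natural log of the number of documents divided by the number of documents containing the term. This README does not state which weighting variant the library uses, so the formulas here are an assumption and will not necessarily reproduce the values in the example above. The class `TfIdfSketch` and its methods are hypothetical and not part of this API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of tf-idf-java.
public class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the document length.
    public static double tf(List<String> doc, String term) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / df), where df is the number of
    // documents containing the term. Returns 0 for terms absent from the corpus.
    public static double idf(List<List<String>> corpus, String term) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    // TF-IDF is the product of the two quantities above.
    public static double tfIdf(List<String> doc, List<List<String>> corpus, String term) {
        return tf(doc, term) * idf(corpus, term);
    }

    public static void main(String[] args) {
        // Toy corpus of pre-tokenized documents (assumed for illustration).
        List<String> d1 = Arrays.asList("machine", "learning", "is", "fun");
        List<String> d2 = Arrays.asList("deep", "learning");
        List<String> d3 = Arrays.asList("java", "api");
        List<List<String>> corpus = Arrays.asList(d1, d2, d3);

        System.out.println("machine:  " + tfIdf(d1, corpus, "machine"));
        System.out.println("learning: " + tfIdf(d1, corpus, "learning"));
    }
}
```

In this toy corpus, "machine" scores higher than "learning" for the first document because "learning" also appears in a second document, which lowers its IDF.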
