python;sklearn;KNN;SVM;Random Forest
analyze.py requires three arguments: the training dataset path, the test/evaluation dataset path, and the method. Example: python analyze.py /document/trainset.csv /document/testset.csv svm
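
The three-argument command line could be handled roughly as in the sketch below. This is an illustration, not the actual contents of analyze.py: the method names `knn` and `rf` are assumptions (only `svm` appears in the example command), the helper names `parse_args` and `build_classifier` are hypothetical, and the classifiers are created with scikit-learn defaults.

```python
import argparse

# Assumed method names; only "svm" is confirmed by the example command.
METHODS = ("knn", "svm", "rf")


def parse_args(argv=None):
    """Parse the three positional arguments: train path, test path, method."""
    parser = argparse.ArgumentParser(prog="analyze.py")
    parser.add_argument("train_path", help="path to the training dataset (CSV)")
    parser.add_argument("test_path", help="path to the test/evaluation dataset (CSV)")
    parser.add_argument("method", choices=METHODS, help="classifier to use")
    return parser.parse_args(argv)


def build_classifier(method):
    """Return an untrained scikit-learn classifier for the given method name."""
    # Imported here so that argument parsing alone does not need scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    classifiers = {
        "knn": KNeighborsClassifier,
        "svm": SVC,
        "rf": RandomForestClassifier,
    }
    return classifiers[method]()


# Same arguments as in the example command.
args = parse_args(["/document/trainset.csv", "/document/testset.csv", "svm"])
print(args.train_path, args.test_path, args.method)
```

Using `choices` makes argparse reject an unknown method name with a usage message before any data is loaded.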
eva.py counts the tokens in each line.
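
A minimal sketch of per-line token counting, assuming tokens are separated by whitespace (the tokenization eva.py actually uses is not documented here). The function name `count_tokens_per_line` is hypothetical.

```python
def count_tokens_per_line(lines):
    """Return the number of whitespace-separated tokens in each line."""
    return [len(line.split()) for line in lines]


sample = ["the cat sat", "", "hello world"]
print(count_tokens_per_line(sample))  # prints [3, 0, 2]
```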
re-eva.py helps reconstruct the dataset for screenshots.
Dataset: Due to the terms of use, the full dataset is not available.
The project uses a most100.csv file, which contains the 100 most frequent tokens.
The train and test data give the frequency of each token.
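
Since the dataset itself cannot be shared, the sketch below shows one way a most-frequent-token list and per-line frequency rows could be derived from raw text. It is an assumption-laden illustration: whitespace tokenization, raw counts as the "frequency", and the function names `top_tokens` and `frequency_row` are all hypothetical, and the real layout of most100.csv and the train/test files may differ.

```python
from collections import Counter


def top_tokens(lines, n=100):
    """Return the n most frequent whitespace-separated tokens across all lines."""
    counts = Counter(tok for line in lines for tok in line.split())
    return [tok for tok, _ in counts.most_common(n)]


def frequency_row(line, vocabulary):
    """Return how often each vocabulary token occurs in one line."""
    counts = Counter(line.split())
    return [counts[tok] for tok in vocabulary]


corpus = ["a b a", "b a c"]
vocab = top_tokens(corpus, n=2)
print(vocab)                          # prints ['a', 'b']
print(frequency_row("a a c", vocab))  # prints [2, 0]
```

Rows built this way have one column per vocabulary token, which is the numeric form scikit-learn classifiers expect as input features.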