This is an implementation of the paper
I have provided a preprocessed CSV file, generated using `preprocess.py` and `dataFrameGen.py`. The vector for each text is generated by `vectorizer.py`, which uses features of the articles (without using NLTK).
The original dataset contained:
- Total lines in articles: 10405
- Total words in articles: 358695
- Total characters in articles: 1889183
- Total number of unique words: 73889
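
Counts like the ones above could be reproduced with a few lines of plain Python. This is only a sketch: the function name `corpus_stats` is hypothetical, and the counting conventions (whitespace tokenization, newlines counted as characters) are assumptions, so the exact numbers may differ from those produced by the repository's own scripts.

```python
def corpus_stats(articles):
    """Return line, word, character and unique-word counts for a list of article strings.

    Assumptions (not taken from the repository): words are split on
    whitespace and every character, including newlines, is counted.
    """
    text_lines = [line for article in articles for line in article.splitlines()]
    words = [word for line in text_lines for word in line.split()]
    return {
        "lines": len(text_lines),
        "words": len(words),
        "characters": sum(len(article) for article in articles),
        "unique_words": len(set(words)),
    }


if __name__ == "__main__":
    # Tiny made-up corpus, only to show the call.
    print(corpus_stats(["first article\nsecond line", "another article"]))
```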
Features extracted from each article:
- line count
- char count
- word count
- average word size
- vowels per word
- consonants per word
- matras per word
- count of words of fixed size
- count of words below size
- count of words above size
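
The features listed above can be computed without NLTK using only string operations. The sketch below is not the code in `vectorizer.py`. It assumes the articles are in Devanagari script (suggested by the "matras" feature), uses illustrative Unicode ranges for vowels, consonants and matras, and introduces a hypothetical `size` threshold of 5 for the three word-size counts.

```python
# Assumption: Devanagari text. These ranges are illustrative and are not
# taken from vectorizer.py.
VOWELS = {chr(c) for c in range(0x0904, 0x0915)}      # independent vowels
CONSONANTS = {chr(c) for c in range(0x0915, 0x093A)}  # KA .. HA
MATRAS = {chr(c) for c in range(0x093E, 0x094D)}      # dependent vowel signs


def extract_features(text, size=5):
    """Build a feature dictionary for one article.

    `size` is a hypothetical threshold for the word-size counts.
    """
    words = text.split()
    n = len(words) or 1  # avoid division by zero on empty input

    def per_word(charset):
        # Average number of characters from `charset` per word.
        return sum(ch in charset for w in words for ch in w) / n

    return {
        "line_count": len(text.splitlines()),
        "char_count": len(text),
        "word_count": len(words),
        "avg_word_size": sum(len(w) for w in words) / n,
        "vowels_per_word": per_word(VOWELS),
        "consonants_per_word": per_word(CONSONANTS),
        "matras_per_word": per_word(MATRAS),
        "words_of_size": sum(len(w) == size for w in words),
        "words_below_size": sum(len(w) < size for w in words),
        "words_above_size": sum(len(w) > size for w in words),
    }
```

Returning a dictionary keeps each feature named, so a list of such dictionaries can be turned into rows of a CSV file or a data frame.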
Sr. No. | Classifier | Accuracy |
---|---|---|
1 | Naive Bayes (Gaussian) | 89 % |
2 | SVC | 70 % |
3 | Decision Tree | 99 % |
4 | K Nearest Neighbour | 98.8 % |
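
To illustrate what an accuracy percentage like those in the table measures, here is a minimal pure-Python sketch of a K Nearest Neighbour classifier scored on held-out samples. It is not the code that produced the table: the function names, the value of `k`, and the toy data are all assumptions made for the example.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


if __name__ == "__main__":
    # Toy feature vectors and labels, made up for illustration only.
    train_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    train_y = ["a", "a", "a", "b", "b", "b"]
    test_X = [(1.1, 1.0), (5.1, 5.0)]
    test_y = ["a", "b"]

    predictions = [knn_predict(train_X, train_y, x) for x in test_X]
    print(f"accuracy: {accuracy(test_y, predictions):.1f} %")
```

Accuracy measured this way depends on how the data is split into training and test sets, so the percentages are only comparable when every classifier is evaluated on the same split.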