Author-Identification

This is an implementation of the paper

Stats

1. Information about the .csv file

I have provided a preprocessed CSV file generated with preprocess.py and dataFrameGen.py. The feature vector for each text is generated by vectorizer.py, which computes hand-picked features of the articles (without using NLTK).
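The idea of turning each article into one CSV row of features can be sketched as follows. This is not the repository's vectorizer.py or dataFrameGen.py. The `vectorize` function uses a reduced, hypothetical feature set, and the column names are assumptions that need not match author_identification_article_splits.csv.

```python
import csv
import io


def vectorize(text):
    # Hypothetical, reduced feature set; the real vectorizer.py computes more features.
    words = text.split()
    return {
        "line_count": len(text.splitlines()),
        "char_count": len(text),
        "word_count": len(words),
    }


def write_feature_csv(articles, out):
    """Write one row per (author, text) pair: the author label followed by its features.

    Column names are hypothetical.
    """
    fieldnames = ["author", "line_count", "char_count", "word_count"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for author, text in articles:
        writer.writerow({"author": author, **vectorize(text)})


if __name__ == "__main__":
    buf = io.StringIO()
    write_feature_csv([("author_a", "one two\nthree"), ("author_b", "four")], buf)
    print(buf.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `open(path, "w", newline="", encoding="utf-8")` so the `csv` module controls line endings and Devanagari text is stored correctly.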

The original dataset contained:

Total lines in articles :: 10405
Total words in articles :: 358695
Total characters in articles :: 1889183
Total number of unique words :: 73889
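Corpus totals like the ones above can be computed with a short pass over the articles. This is a minimal sketch, not the script that produced these numbers; it assumes words are whitespace-separated tokens, characters are counted with `len(text)` (spaces and newlines included), and lines come from `str.splitlines()`. Different tokenization choices would give different totals.

```python
def corpus_stats(articles):
    """Return (lines, words, characters, unique words) over a list of article strings."""
    lines = words = chars = 0
    vocabulary = set()
    for text in articles:
        tokens = text.split()          # assumption: whitespace tokenization
        lines += len(text.splitlines())
        words += len(tokens)
        chars += len(text)             # assumption: whitespace counts as characters
        vocabulary.update(tokens)
    return lines, words, chars, len(vocabulary)


if __name__ == "__main__":
    lines, words, chars, unique = corpus_stats(["a b\nc", "a d"])
    print(f"Total lines in articles :: {lines}")
    print(f"Total words in articles :: {words}")
    print(f"Total characters in articles :: {chars}")
    print(f"Total number of unique words :: {unique}")
```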

2. Features selected

line count
char count
word count
average word size
vowels per word
consonants per word
matras per word
count of words of a fixed size
count of words below that size
count of words above that size
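The feature list above can be sketched for a single article as shown below. Several assumptions are involved and none of them come from the repository: the articles are taken to be in Devanagari script (suggested by the "matras" feature); vowels, consonants and matras are matched with simplified, non-exhaustive Unicode code-point ranges; word size is the number of code points, not grapheme clusters; and the size threshold of 5 and all feature names are hypothetical.

```python
# Simplified Devanagari code-point ranges (an approximation, not exhaustive).
VOWELS = set(range(0x0904, 0x0915)) | {0x0960, 0x0961}            # independent vowels
CONSONANTS = set(range(0x0915, 0x093A)) | set(range(0x0958, 0x0960))
MATRAS = set(range(0x093E, 0x094D)) | {0x0962, 0x0963}            # dependent vowel signs


def count_in(word, codepoints):
    """Number of characters in word whose code point falls in the given set."""
    return sum(1 for ch in word if ord(ch) in codepoints)


def article_features(text, size=5):
    """Compute the listed features for one article; size=5 is a hypothetical threshold."""
    words = text.split()
    n = len(words) or 1  # avoid division by zero on empty text
    lengths = [len(w) for w in words]  # word size in code points
    return {
        "line_count": len(text.splitlines()),
        "char_count": len(text),
        "word_count": len(words),
        "avg_word_size": sum(lengths) / n,
        "vowels_per_word": sum(count_in(w, VOWELS) for w in words) / n,
        "consonants_per_word": sum(count_in(w, CONSONANTS) for w in words) / n,
        "matras_per_word": sum(count_in(w, MATRAS) for w in words) / n,
        "words_of_size": sum(1 for length in lengths if length == size),
        "words_below_size": sum(1 for length in lengths if length < size),
        "words_above_size": sum(1 for length in lengths if length > size),
    }


if __name__ == "__main__":
    # "namaste aam" written with escapes so the source stays ASCII.
    print(article_features("\u0928\u092e\u0938\u094d\u0924\u0947 \u0906\u092e"))
```

The virama (U+094D) in the sample word is deliberately left out of all three sets, so it counts toward word size but not toward vowels, consonants or matras.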

3. Model stats

Sr. No.	Classifier	Accuracy
1	Naive Bayes (Gaussian)	89 %
2	SVC	70 %
3	Decision Tree	99 %
4	K-Nearest Neighbours	98.8 %
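The classifier names in the table correspond to scikit-learn estimators (GaussianNB, SVC, DecisionTreeClassifier, KNeighborsClassifier), which classification.py may use; that is not confirmed here. The sketch below does not reproduce classification.py. It only shows, in plain Python, what a k-nearest-neighbour classifier does with feature vectors and how an accuracy percentage like those in the table is computed. The function names and k=3 are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query (Euclidean distance).

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


if __name__ == "__main__":
    train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((5, 6), "B")]
    test_points = [((0, 0.5), "A"), ((5, 5.5), "B")]
    preds = [knn_predict(train, x) for x, _ in test_points]
    print(preds, accuracy([y for _, y in test_points], preds))
```

Because distance-based classifiers compare raw feature values, features on very different scales (for example character count against matras per word) dominate the distance unless they are standardized first.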

AP-Atul/Author-Identification

Folders and files

Latest commit

History

Repository files navigation

Author-Identification

Stats

1. Information about the .csv file

2. Features selected

3. Model stats

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages