Text-Classification

Apart from manual classification, another approach to text classification is machine-learning-based classification. This is a form of supervised learning, because a set of labeled data serves as training data for the machine. This means that manual classification can't be completely eliminated: we still need some annotated data in order to train our machine.

Here, the 20 Newsgroups dataset is used for text classification. This dataset has 20 classes, but I have demonstrated using only 5 of them.

There are many methods for text classification. Of those, multinomial Naive Bayes and K-Nearest Neighbour are used here.
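As a rough illustration of how these two classifiers arrive at a prediction, here is a minimal from-scratch sketch in plain Python. It is not the code from the notebooks: the toy documents, the class names `autos` and `space`, all function names, and the choice of cosine similarity with `k=3` for the nearest-neighbour vote are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels):
    """Estimate log priors and add-one smoothed log likelihoods from tokenized docs."""
    vocab = {term for doc in docs for term in doc}
    class_sizes = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term -> count
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_sizes.items()}
    log_likelihood = {}
    for c in class_sizes:
        total = sum(term_counts[c].values())
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
    return log_prior, log_likelihood


def predict_nb(model, doc):
    """Pick the class with the highest log posterior; unseen terms are skipped."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][t] for t in doc if t in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_knn(docs, labels, doc, k=3):
    """Majority vote among the k training docs most similar to `doc`."""
    query = Counter(doc)
    ranked = sorted(
        zip(docs, labels),
        key=lambda pair: cosine(Counter(pair[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (hypothetical; stands in for the newsgroup posts).
train_docs = [
    ["engine", "car", "wheel"],
    ["car", "dealer", "engine"],
    ["wheel", "car", "road"],
    ["orbit", "rocket", "launch"],
    ["rocket", "moon", "orbit"],
    ["launch", "moon", "shuttle"],
]
train_labels = ["autos", "autos", "autos", "space", "space", "space"]

nb_model = train_multinomial_nb(train_docs, train_labels)
nb_guess = predict_nb(nb_model, ["car", "engine"])
knn_guess = predict_knn(train_docs, train_labels, ["moon", "rocket"])
```

Naive Bayes needs a training pass to estimate per-class term probabilities, whereas K-Nearest Neighbour stores the training documents and does all of its work at prediction time.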

Feature Selection

Feature selection is the process of selecting a subset of the terms occurring in the training set and then using only this subset as features for text classification. Here, TF-IDF based feature selection and Mutual Information are used. Both feature-selection techniques are applied to both classifiers.
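The two scoring ideas can be sketched in plain Python as follows. This is an illustrative sketch, not the notebooks' code. Ranking terms by their TF-IDF weight summed over all training documents is only one possible reading of "TF-IDF based feature selection" and is an assumption here. The mutual information score uses the standard definition over term presence and class membership. The toy corpus, the class names, and the helper names are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Sum each term's TF-IDF weight over all documents (assumed ranking score)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            scores[term] += tf * math.log(n_docs / doc_freq[term])
    return scores


def mutual_information(docs, labels, term, target):
    """MI (in bits) between presence of `term` and membership in class `target`."""
    n = len(docs)
    # Joint document counts keyed by (term present?, doc in target class?).
    counts = Counter(
        (term in doc, label == target) for doc, label in zip(docs, labels)
    )
    mi = 0.0
    for (has_term, in_class), n_joint in counts.items():
        n_term = sum(v for (t, _), v in counts.items() if t == has_term)
        n_class = sum(v for (_, c), v in counts.items() if c == in_class)
        mi += (n_joint / n) * math.log2(n * n_joint / (n_term * n_class))
    return mi


def top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:k]]


# Toy corpus (hypothetical); "the" appears in every document.
corpus = [
    ["the", "car", "engine"],
    ["the", "car", "wheel"],
    ["the", "rocket", "orbit"],
    ["the", "rocket", "moon"],
]
corpus_labels = ["autos", "autos", "space", "space"]
vocabulary = sorted({term for doc in corpus for term in doc})

tfidf = tfidf_scores(corpus)
mi = {t: mutual_information(corpus, corpus_labels, t, "autos") for t in vocabulary}
selected_by_mi = top_k(mi, 2)
```

In this toy corpus a term such as "the", which occurs in every document, scores zero under both measures, so it is dropped first. Only the selected terms would then be kept as features for the classifiers.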
