Text-Classification

Machine learning for text classification

This notebook introduces a text classification project. The main task is to predict the topic of a text (a question, a statement, etc.).

The labels cover 16 classes, described in data/label.csv: life (生活) | psychology (心理学) | movies (电影) | games (游戏) | romance (恋爱) | music (音乐) | university (大学) | mental state (心理) | emotion (情感) | internet (互联网) | society (社会) | interpersonal relations (人际交往) | education (教育) | cars (汽车) | medicine (医学) | law (法律)
The dataset contains 129,176 training examples and 32,614 test examples; see the data directory.
We use a small subset (e.g. 10,000 examples) for this example.
We use traditional statistical features such as TF-IDF.
Model types: XGBoost / RandomForest
Pipeline of this project: feature extraction | model training | parameter selection | data balancing, etc.
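
The feature extraction and model training steps listed above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the toy texts, the two labels, and all hyperparameter values are assumptions made for the example. Chinese text would also need word segmentation or a character-level analyzer before TF-IDF, which is omitted here. Setting `class_weight="balanced"` is one possible way to address class imbalance; the notebook may balance the data differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the texts and labels under data/.
texts = [
    "I watched a great movie last night",
    "this movie has a boring plot",
    "the actor in that movie was great",
    "I need a lawyer for my contract dispute",
    "is this contract legal under the law",
    "the court ruled on the law yesterday",
]
labels = ["movie", "movie", "movie", "law", "law", "law"]

# Feature extraction (TF-IDF) followed by model training (random forest).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(
        n_estimators=50,          # assumed value, not taken from the notebook
        class_weight="balanced",  # reweights classes inversely to their frequency
        random_state=0,
    )),
])
pipeline.fit(texts, labels)

# Predict the topic of each text; on real data, predict on the held-out test set.
predictions = pipeline.predict(texts)
print(list(zip(texts, predictions)))
```

Wrapping both steps in a single `Pipeline` keeps the vectorizer and classifier together, so the same TF-IDF vocabulary learned on the training data is reused at prediction time.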

For feature selection and parameter selection:
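
One common way to combine feature selection and parameter selection is a cross-validated grid search over a pipeline. The sketch below assumes scikit-learn with chi-squared feature selection (`SelectKBest`) and `GridSearchCV`; the toy data, the grid values, and the choice of chi-squared scoring are illustrative assumptions, not the notebook's actual settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the real project would use a subset of data/.
texts = [
    "I watched a great movie last night",
    "this movie has a boring plot",
    "the actor in that movie was great",
    "the director of the movie won a prize",
    "I need a lawyer for my contract dispute",
    "is this contract legal under the law",
    "the court ruled on the law yesterday",
    "the judge explained the law to the jury",
]
labels = ["movie"] * 4 + ["law"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2)),  # keep the k highest-scoring TF-IDF features
    ("clf", RandomForestClassifier(random_state=0)),
])

# Assumed search grid: step name, double underscore, then parameter name.
param_grid = {
    "select__k": [5, "all"],
    "clf__n_estimators": [10, 50],
}

# cv=2 only because the toy data is tiny; real data would allow more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

Because the selector sits inside the pipeline, feature selection is refit on each training fold, which avoids leaking information from the validation fold into the selected features.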

You can run the steps in the notebook one by one, in order, to see what each step does.
