In this project, we explore the machine learning pipeline and use three methods (Naive Bayes, Logistic Regression, and Neural Networks) to categorize Chinese news articles from the web. Given an article's title and content, each model assigns it to one of the following categories:
- 科技 (Technology)
- 產經 (Business and Economy)
- 娛樂 (Entertainment)
- 運動 (Sports)
- 社會 (Society)
- 政治 (Politics)
The news articles are obtained from CNA. We show that, after proper data preprocessing, each of the three models achieves an accuracy of at least 93%. See the report for more details.
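
To make the task concrete, here is a minimal sketch of the first method, a multinomial Naive Bayes classifier, written with only the Python standard library. It is not the project's actual implementation: the character-bigram features (used here in place of proper Chinese word segmentation), the `NaiveBayes` class, the `char_bigrams` helper, and the six toy titles are all assumptions made up for illustration. The report describes the real preprocessing and models.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Split text into overlapping character bigrams, ignoring whitespace.

    Assumption: bigrams stand in for a real Chinese word segmenter.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)            # documents per category
        self.feature_counts = defaultdict(Counter)   # bigram counts per category
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = char_bigrams(doc)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.doc_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.doc_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for f in char_bigrams(doc):
                if f in self.vocab:  # skip bigrams never seen in training
                    count = self.feature_counts[c][f]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training data: invented titles, one per category (not from the CNA dataset).
train = [
    ("新款手機 搭載人工智慧晶片", "科技"),
    ("央行宣布升息 股市下跌", "產經"),
    ("歌手舉辦演唱會 門票售罄", "娛樂"),
    ("棒球隊奪冠 球迷慶祝", "運動"),
    ("警方破獲詐騙集團", "社會"),
    ("立法院審查預算案", "政治"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("棒球比賽 球迷熱情"))
```

With one training example per category this only demonstrates the mechanics. Reaching the accuracy reported above would require the full dataset and the preprocessing described in the report.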