This repository contains the code and data for a topic classification project using various Machine Learning (ML) models. The goal is to classify news articles into four categories: World, Sports, Business, and Sci/Tech.
- Overview
- Prerequisites
- Project Structure
- Data Preprocessing
- ML Models
- Confusion Matrix
- Usage
- License
In this project, we explore different ML models for topic classification on a dataset of news articles: Random Forest, Naive Bayes, support vector machine (SVM), and convolutional neural network (CNN). We compare their performance using accuracy and F1-score.
To run the code in this repository, you'll need the following Python libraries:
- pandas
- numpy
- scikit-learn
- nltk
- keras
- gradio
You can install these dependencies using pip, for example with `pip install pandas numpy scikit-learn nltk keras gradio`.
The repository is organized as follows:
- data: the dataset, which you can download from "https://www.kaggle.com/datasets/amananandrai/ag-news-classification-dataset?select=test.csv".
- trained_cnn_model.h5: the saved, trained CNN model.
- main.py: Python script that loads the trained CNN model and serves predictions through a Gradio app.
- all_in_one.ipynb: Jupyter notebook with the code for data preprocessing, model training, and evaluation.
- README.md: this readme file, which gives an overview of the project.
In the Jupyter notebook all_in_one.ipynb, we load and preprocess the news article data. The preprocessing steps are tokenization, stop word removal, stemming, and conversion of the text into integer sequences that can be fed to the models.
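The four preprocessing steps can be sketched in plain Python as below. This is only an illustration, not the notebook's code: the stop word list is a tiny hand-written sample, `crude_stem` is a simplified stand-in for a real stemmer such as the Porter stemmer in nltk, and the function names and the `max_len` parameter are assumptions made for this example.

```python
import re

# Tiny illustrative stop word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common suffixes; a simplified stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer id starting at 1 (0 is padding)."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def to_sequence(tokens, vocab, max_len):
    """Convert tokens to ids, skip unknown tokens, then truncate or pad with 0."""
    ids = [vocab[t] for t in tokens if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


docs = ["Stocks rallied on Wall Street", "The team is winning matches"]
tokens = [preprocess(d) for d in docs]
vocab = build_vocab(tokens)
sequences = [to_sequence(t, vocab, max_len=6) for t in tokens]
print(tokens)
print(sequences)
```

Reserving id 0 for padding keeps padded positions distinguishable from real tokens, which is the usual convention when fixed-length sequences are fed to an embedding layer.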
We explore four different ML models for topic classification:
- Random Forest
- Naive Bayes
- SVM
- CNN
Each model is trained on the preprocessed data and evaluated on the test set. For each model we report the accuracy and a classification report with per-class precision, recall, and F1-score.
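To make the reported metrics concrete, here is a minimal sketch that computes accuracy and per-class precision, recall, and F1-score from true and predicted labels. The function name and the toy labels are made up for illustration and are not results from this project. In practice, `accuracy_score` and `classification_report` from `sklearn.metrics` compute the same quantities.

```python
def classification_metrics(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


# Toy labels for illustration only.
labels = ["World", "Sports", "Business", "Sci/Tech"]
y_true = ["World", "Sports", "Business", "Sci/Tech", "Business", "World"]
y_pred = ["World", "Sports", "Sci/Tech", "Sci/Tech", "Business", "Sports"]

accuracy, per_class = classification_metrics(y_true, y_pred, labels)
print(f"accuracy: {accuracy:.3f}")
for label in labels:
    m = per_class[label]
    print(f"{label:10s} P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```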
We visualize the performance of each ML model using confusion matrices. A confusion matrix shows, for each true class, how many articles were assigned to each predicted class, which makes it easy to see which pairs of classes a model tends to confuse.
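As a rough sketch of how such a matrix is built, the example below counts (true class, predicted class) pairs, with rows for true classes and columns for predicted classes. The helper names and toy labels are assumptions for illustration. `sklearn.metrics.confusion_matrix` follows the same row and column convention.

```python
def build_confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def format_matrix(matrix, labels):
    """Render the matrix as aligned text with row and column labels."""
    width = max(len(label) for label in labels) + 2
    header = " " * width + "".join(label.rjust(width) for label in labels)
    rows = [
        label.ljust(width) + "".join(str(count).rjust(width) for count in row)
        for label, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


# Toy labels for illustration only.
labels = ["World", "Sports", "Business", "Sci/Tech"]
y_true = ["World", "Sports", "Business", "Sci/Tech", "Business", "World"]
y_pred = ["World", "Sports", "Sci/Tech", "Sci/Tech", "Business", "Sports"]

matrix = build_confusion_matrix(y_true, y_pred, labels)
print(format_matrix(matrix, labels))
```

Off-diagonal cells are the misclassifications: in this toy data, one World article is predicted as Sports and one Business article as Sci/Tech.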
Run the main.py script in your local environment or on a server, for example with `python main.py`. The script starts the Gradio app, which provides a text box for entering an article and displays the predicted class.
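For orientation, a script of this kind might be structured roughly as follows. This is a sketch under several assumptions, not the contents of main.py: the order of `CLASS_NAMES` is assumed to match the model's output units, `make_classifier` and `encode_texts` are hypothetical names, and `encode_texts` is left as a placeholder because the input encoding must reproduce whatever tokenization and padding the notebook used for training.

```python
# Assumed to match the order of the model's output units.
CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]


def make_classifier(model, encode):
    """Wrap a model and a text encoder into a function Gradio can call."""

    def classify(text):
        batch = encode([text])                   # one-item batch of model inputs
        probabilities = model.predict(batch)[0]  # class scores for that item
        return {name: float(p) for name, p in zip(CLASS_NAMES, probabilities)}

    return classify


def encode_texts(texts):
    """Hypothetical placeholder: must reproduce the notebook's preprocessing."""
    raise NotImplementedError("reuse the tokenizer and padding from all_in_one.ipynb")


def main():
    # Imported here so make_classifier can be tested without these packages.
    import gradio as gr
    from keras.models import load_model

    model = load_model("trained_cnn_model.h5")
    demo = gr.Interface(
        fn=make_classifier(model, encode_texts),
        inputs=gr.Textbox(lines=5, label="News article"),
        outputs=gr.Label(num_top_classes=4),
    )
    demo.launch()


# Call main() to start the app once encode_texts has been filled in.
```

Returning a dictionary of class names to scores lets the `gr.Label` output show the top class together with the confidence for the others.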