This project aims to classify toxic comments using natural language processing and machine learning techniques. It uses a dataset of comments labeled with different categories of toxicity and employs a neural network model to predict the categories of toxicity present in new comments.
- Installation
- Usage
- Model Training
- Model Evaluation
- Model Saving
- Visualization
- Docker
- Contributing
- License
- Contact
- Clone the repository and change into its directory:
  git clone https://github.com/atharvv8/toxic_comment_classification
  cd toxic_comment_classification
- Install the required dependencies from the provided requirements.txt:
  pip install -r requirements.txt
The dependencies include:
- TensorFlow
- Pandas
- NumPy
These libraries are used for model training, data manipulation, and other computations.
The project provides a trained model for toxicity classification in text comments. To use the model, follow these steps:
- Run the script:
  python Comments_Toxicity.py
- Enter the comment you want to classify. The script displays the toxicity categories it predicts for that comment.
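As a rough illustration of how per-category scores can become displayed categories, here is a minimal sketch. It assumes the model outputs one probability per category and that the categories are the six labels of the Jigsaw toxic comment dataset; the label list, the function name `predicted_categories`, and the 0.5 threshold are assumptions for illustration, not taken from Comments_Toxicity.py.

```python
# Assumed label order: the six categories of the Jigsaw toxic comment dataset.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def predicted_categories(probabilities, threshold=0.5, labels=LABELS):
    """Return the labels whose predicted probability exceeds the threshold."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    return [label for label, p in zip(labels, probabilities) if p > threshold]


# Made-up scores for a single comment, for illustration only.
example_scores = [0.91, 0.12, 0.67, 0.03, 0.58, 0.08]
print(predicted_categories(example_scores))  # ['toxic', 'obscene', 'insult']
```

Because a comment can belong to several categories at once, each score is thresholded independently instead of picking only the highest one.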
The training process covers data preprocessing, text vectorization, and neural network training. The script performs the following steps:
- Load the dataset from CSV files.
- Preprocess the data and split it into training and validation sets.
- Train a neural network model using the preprocessed data.
- Use callbacks to monitor and adjust the learning rate during training.
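The last step above can be pictured with a small dependency-free sketch. The README does not say which callback the script uses; Keras offers `ReduceLROnPlateau` for this kind of adjustment, and the class below only imitates that idea in plain Python. The class name `PlateauLearningRate` and all parameter values are hypothetical.

```python
class PlateauLearningRate:
    """Lower the learning rate when validation loss stops improving."""

    def __init__(self, learning_rate=1e-3, factor=0.5, patience=2, min_lr=1e-6):
        self.learning_rate = learning_rate
        self.factor = factor          # multiplier applied on each reduction
        self.patience = patience      # epochs without improvement to tolerate
        self.min_lr = min_lr          # lower bound for the learning rate
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def on_epoch_end(self, val_loss):
        """Record one epoch's validation loss and return the current rate."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.learning_rate = max(self.learning_rate * self.factor, self.min_lr)
                self.stale_epochs = 0
        return self.learning_rate


# Made-up validation losses: two epochs without improvement halve the rate.
schedule = PlateauLearningRate(learning_rate=0.001, factor=0.5, patience=2)
rates = [schedule.on_epoch_end(loss) for loss in [0.30, 0.25, 0.26, 0.27, 0.24]]
print(rates)  # [0.001, 0.001, 0.001, 0.0005, 0.0005]
```

In a Keras training loop the equivalent behavior would normally come from passing a callback to `model.fit` instead of hand-written logic like this.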
The model is evaluated on a separate test dataset, and the script reports the loss and binary accuracy.
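To make the binary accuracy metric concrete, the sketch below computes it by hand for a multi-label problem: every (comment, category) pair counts as one prediction, a score above the threshold is treated as a positive, and the metric is the fraction of pairs predicted correctly. The function name, the 0.5 threshold, and the sample values are assumptions for illustration; the script itself presumably relies on the metric built into TensorFlow.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual label predictions that match the true labels."""
    correct = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for truth, prob in zip(true_row, pred_row):
            predicted = 1 if prob > threshold else 0
            correct += int(predicted == truth)
            total += 1
    return correct / total if total else 0.0


# Two made-up comments with three categories each; 4 of 6 labels are correct.
y_true = [[1, 0, 1], [0, 0, 1]]
y_pred = [[0.9, 0.2, 0.4], [0.1, 0.7, 0.8]]
print(binary_accuracy(y_true, y_pred))
```

Because most comments in a toxicity dataset carry no positive labels at all, a high binary accuracy can coexist with poor detection of the rare categories, so it is worth reading alongside the loss.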
The trained model is saved in the script with TensorFlow's model.save() method. The saved model can then be loaded to make predictions on new data.
The script includes a visualization of the training and validation accuracy over epochs. This helps to assess the model's performance and convergence during training.
You can also run the project inside a Docker container. A Dockerfile is provided in the repository for this purpose.
The Dockerfile uses the following specifications:
- FROM python:3.9
  Sets the base image to Python 3.9.
- WORKDIR /jigsaw-toxic-comment-classification-challenge
  Sets the working directory inside the container.
- COPY requirements.txt .
  Copies the requirements.txt file into the working directory.
- RUN pip install --upgrade pip
  Upgrades pip to the latest version.
- RUN pip install --no-cache-dir -r requirements.txt
  Installs the project dependencies listed in the requirements.txt file.
- COPY . .
  Copies all files and directories from the current directory to the working directory inside the container.
- CMD ["python3","./Comments_Toxicity.py"]
  Sets the default command to execute the Python script that runs the toxicity classification model.
To build the Docker image, use the following command in your terminal:
docker build -t toxic-comment-classification .