SE464 Machine Learning Project

Hate Speech Labeler

Streamlit app for hate speech detection using a fine-tuned BERT-based model. The model is trained on the Jigsaw Toxic Comment Classification Challenge dataset for multi-label classification.

The code and data are available in this notebook
The app is deployed and can be tested here (also available at this link)
The model is available on Hugging Face

Local Installation

Clone the repository:

git clone https://github.com/berkaysahiin/SE464.git

Change into the directory:
```
cd SE464
```
Create and activate a virtual environment:
```
virtualenv venv
# Windows:
.\venv\Scripts\activate
# macOS/Linux: source venv/bin/activate
```

Install the requirements:

pip install -r requirements.txt
# If this fails, run this first: pip install pipreqs && pipreqs

Run the Streamlit app:
```
streamlit run main.py
```

Model

Data preprocessing involves cleaning text data, tokenization, and formatting for multi-label classification.
The model is trained with TrainingArguments and Trainer from the Transformers library.
Metrics such as F1 score, ROC AUC, and accuracy are used to evaluate the model's performance on the test set.
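
The label formatting and evaluation steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the six label names are those of the Jigsaw dataset, but the helper names (`to_multi_hot`, `predict`, `micro_f1`, `subset_accuracy`), the 0.5 threshold, the micro averaging, and the example logits are all assumptions made for this example. ROC AUC is left out because it is usually computed with a library such as scikit-learn's `roc_auc_score`.

```python
import math

# Label names of the Jigsaw Toxic Comment Classification Challenge.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def to_multi_hot(active, labels=LABELS):
    """Encode the set of active label names as a float multi-hot vector."""
    return [1.0 if name in active else 0.0 for name in labels]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits, threshold=0.5):
    """Turn one row of raw logits into independent 0/1 decisions per label.

    The 0.5 threshold is an assumption; it can be tuned on validation data.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all samples and labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Hypothetical example: three comments with made-up logits.
y_true = [
    to_multi_hot({"toxic", "insult"}),
    to_multi_hot(set()),
    to_multi_hot({"obscene"}),
]
logits = [
    [2.0, -3.0, -1.0, -4.0, 1.5, -2.0],
    [-2.0, -3.0, -1.0, -4.0, -2.0, -3.0],
    [0.5, -3.0, 2.0, -4.0, -1.0, -2.0],
]
y_pred = [predict(row) for row in logits]

print(f"micro F1: {micro_f1(y_true, y_pred):.3f}")                # 0.857
print(f"subset accuracy: {subset_accuracy(y_true, y_pred):.3f}")  # 0.667
```

Each label gets its own sigmoid rather than sharing a softmax, because a comment can carry several labels at once. Subset accuracy is strict: the third comment counts as wrong because of a single extra `toxic` prediction, which is why it is usually reported alongside F1.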
