SE464 Machine Learning Project

Hate Speech Labeler

Streamlit app for hate speech detection using a fine-tuned BERT-based model. The model is trained on the Jigsaw Toxic Comment Classification Challenge dataset for multi-label classification.

  • Code and data are available in this notebook

  • The app is deployed and can be tested here (also available at this link)

  • The model is available on Hugging Face
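
Because the model is multi-label, each label gets its own independent probability rather than competing in a softmax. The following is a minimal sketch of how raw model outputs (logits) could be turned into labels; it is not the app's actual code. The label names are those of the Jigsaw dataset, but their order, the `0.5` threshold, the `decode` helper, and the example logits are assumptions made for illustration.

```python
import math

# Label names from the Jigsaw Toxic Comment Classification Challenge.
# The order is an assumption; the fine-tuned model's config defines the real one.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, threshold=0.5):
    """Hypothetical helper: keep every label whose probability reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [(label, round(p, 3)) for label, p in zip(LABELS, probs) if p >= threshold]


# Made-up logits for a single comment, one value per label.
example_logits = [2.0, -3.0, 1.0, -4.0, 0.0, -2.0]
print(decode(example_logits))
```

A comment can receive several labels at once, or none at all if no probability reaches the threshold.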


Local Installation

  • Clone the repository:

    git clone https://github.com/berkaysahiin/SE464.git
  • Change into the directory:

    cd SE464
  • Create and activate a virtual environment:

    virtualenv venv
    .\venv\Scripts\activate
    # The line above is for Windows; on macOS/Linux use: source venv/bin/activate
  • Install the requirements:

    pip install -r requirements.txt
    # If this fails, run this first: pip install pipreqs && pipreqs
    
  • Run the Streamlit app:

    streamlit run main.py
    

Model

  • Data preprocessing involves cleaning text data, tokenization, and formatting for multi-label classification.

  • The model is trained with TrainingArguments and Trainer from the Transformers library.

  • Metrics such as F1 score, ROC AUC, and accuracy are used to evaluate the model's performance on the test set.
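
The metrics listed above can be sketched in plain Python for the multi-label case. This is an illustration only, not the notebook's evaluation code: micro averaging for F1, exact-match (subset) accuracy, the `0.5` threshold, and the toy data are all assumptions, since the averaging and accuracy variants used in training are not stated here.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool the counts over all samples and labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(scores, targets):
    """ROC AUC for one label: chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 4 comments, 3 labels (made-up values).
y_true = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
probs = [[0.9, 0.2, 0.6], [0.4, 0.1, 0.8], [0.7, 0.3, 0.2], [0.1, 0.6, 0.3]]
y_pred = [[int(p >= 0.5) for p in row] for row in probs]

print("micro F1:", micro_f1(y_true, y_pred))
print("subset accuracy:", subset_accuracy(y_true, y_pred))
for j in range(3):
    column_scores = [row[j] for row in probs]
    column_targets = [row[j] for row in y_true]
    print("ROC AUC, label", j, ":", roc_auc(column_scores, column_targets))
```

In practice, scikit-learn's `f1_score`, `roc_auc_score`, and `accuracy_score` cover the same calculations. Note that ROC AUC is computed from the probabilities, while F1 and accuracy need thresholded predictions.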
