Discrimination Detector

Problem

Discrimination is one of the most serious problems on social media. Existing solutions generally do not cover discrimination related to health and mental health. As one report puts it: "Young adults ages 18 to 28 years old who have experienced frequent discrimination have a higher risk of short-term and long-term behavioral and mental health problems, according to a new UCLA study." (https://www.everydayhealth.com/emotional-health/young-people-who-experience-frequent-discrimination-more-likely-to-have-behavioral-and-mental-problems/)

Aim

Discrimination Detector uses natural language processing (NLP) to analyse text from social media. It gives users feedback on the level of discrimination in their text, along with suggestions for how to rewrite it.

Dataset

Tweets collected by scraping Twitter with snscrape (https://github.com/JustAnotherArchivist/snscrape)
210,000 discriminatory tweets
100,000 control-group tweets (random tweets that contain none of the specific discriminatory words)

Analysis

1. Discrimination category detection
2. Discrimination level detection & suggestions
2.1. Analysis of most liked and retweeted tweets
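As an illustration of the "most liked and retweeted tweets" analysis, the sketch below ranks tweets by a simple engagement score. The field names `text`, `likes`, and `retweets`, the helper `top_by_engagement`, and the choice of likes plus retweets as the score are all assumptions made for this example; the project's notebooks may store and rank the data differently.

```python
from typing import Dict, List


def top_by_engagement(tweets: List[Dict], n: int = 3) -> List[Dict]:
    """Return the n tweets with the highest likes + retweets.

    The score (likes + retweets) is an assumption for this sketch.
    """
    return sorted(
        tweets,
        key=lambda t: t["likes"] + t["retweets"],
        reverse=True,
    )[:n]


# Hypothetical sample records, not real data from the project.
sample = [
    {"text": "tweet a", "likes": 10, "retweets": 2},
    {"text": "tweet b", "likes": 3, "retweets": 40},
    {"text": "tweet c", "likes": 0, "retweets": 1},
]

top = top_by_engagement(sample, n=2)
for tweet in top:
    print(tweet["text"], tweet["likes"] + tweet["retweets"])
```

Inspecting the top-ranked tweets by hand is one way to see which kinds of discriminatory language receive the most attention.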

Roadmap

Scrape a diverse dataset of discriminatory language from Twitter, covering gender, race, ethnicity, sexual orientation, mental health, and health-related keywords.
Clean the data by removing unnecessary words, URLs, emoji, and characters.
Extract key features from the text to train ML models, using approaches such as bag of words and TF-IDF.
Run sentiment analysis on the dataset (for both the discriminatory and control groups).
Train ML models on the extracted features to classify a given text by discrimination level and give suggestions based on that level.
Evaluate the performance of the models using metrics such as accuracy, precision, recall, and F1-score.
Use cross-validation to check that the models do not overfit the training data.
Develop a user interface that allows users to input text and receive feedback on the level of discrimination in their writing, along with suggestions.
Prepare and deliver the project presentation.
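The cleaning and feature-extraction steps of the roadmap can be sketched roughly as follows. This is a minimal plain-Python illustration, not the code from the project's notebooks: the regular expressions, the tiny stop-word list, the removal of @mentions, and the helper names `clean_tweet` and `tf_idf` are all assumptions made for the example. It also keeps only ASCII letters, which would be too aggressive for non-English tweets.

```python
import math
import re
from collections import Counter

# Assumed patterns for this sketch; the project may use different rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # drops emoji, digits, punctuation

# Tiny illustrative stop-word list standing in for "unnecessary words".
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions, and non-letters, then tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tf_idf(docs):
    """Compute unsmoothed TF-IDF weights: tf = count / doc length, idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical example tweets, not taken from the dataset.
tweets = [
    "Check THIS out @user https://t.co/abc 😀 the weather is great!!!",
    "The weather is awful",
]
docs = [clean_tweet(t) for t in tweets]
vectors = tf_idf(docs)
print(docs[0])
print(vectors[1])
```

A term that appears in every document, such as "weather" here, gets a weight of zero, which is why TF-IDF tends to highlight the words that distinguish one tweet from the rest. Library implementations typically use a smoothed variant of this formula, so their exact weights will differ from this sketch.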

Future Step 1 - Gather feedback from users and apply it to enhance the machine learning model and user interface over time.

Future Step 2 - Regularly update the model with new data and capabilities to ensure it stays current with the latest trends and language used on social media.

Presentation

https://prezi.com/view/DUydh8GOBVuu7uOqdGRM/

Repository contents (65 commits)

.ipynb_checkpoints
Data
BERTandTF - Classification Model.ipynb
DiscriminationDetector_controlgroup_data_collection.ipynb
DiscriminationDetector_data_cleaning.ipynb
DiscriminationDetector_data_collection.ipynb
DiscriminationDetector_second_data_collection-.ipynb
Logistic Regression and Random Forest with TF IDF.ipynb
README.md
SGD Regressor .ipynb
discriminatory_words_list.ipynb
