Discrimination is one of the most serious problems on social media, yet existing solutions generally do not cover discrimination related to health and mental health. As one report notes: "Young adults ages 18 to 28 years old who have experienced frequent discrimination have a higher risk of short-term and long-term behavioral and mental health problems, according to a new UCLA study." (https://www.everydayhealth.com/emotional-health/young-people-who-experience-frequent-discrimination-more-likely-to-have-behavioral-and-mental-problems/)
Discrimination Detection uses natural language processing (NLP) to analyse text from social media and provide feedback to users on the level of discrimination in their text, along with suggestions for how to rewrite it.
- Twitter web scraping with snscrape (https://github.com/JustAnotherArchivist/snscrape)
- 210,000 discriminatory tweets
- 100,000 control-group tweets (random tweets without specific discriminatory words)
- Discrimination category detection
- Discrimination level detection & suggestions
  - Analysis of most liked and retweeted tweets
- Scrape a diverse dataset of discriminatory language from Twitter covering gender, race, ethnicity, sexual orientation, mental health, and health-related keywords.
- Clean the data by removing unnecessary words, URLs, emojis, and special characters.
- Extract key features from the text to train ML models, using approaches such as bag of words and TF-IDF.
- Run sentiment analysis on the dataset (for both the discriminatory and control groups).
- Train ML models on the extracted features to classify a given text by discrimination level and give suggestions based on that level.
- Evaluate the performance of the model using various metrics such as accuracy, precision, recall, and F1-score.
- Use cross-validation techniques to ensure the model does not overfit the training data.
- Develop a user interface that allows users to input text and receive feedback on the level of discrimination in their writing, along with suggestions.
- Prepare the presentation and present the project.
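
The cleaning, feature-extraction, training, and evaluation steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function names (`clean_tweet`, `build_pipeline`, `evaluate`), the choice of logistic regression, the removal of @mentions, and the English-only character filter are all assumptions made for the example. It relies on scikit-learn for TF-IDF, the classifier, and cross-validation.

```python
import re
from typing import Dict, Sequence

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")  # assumption: @mentions count as noise
# Assumption: English-only text. Keep lowercase letters, digits, whitespace
# and apostrophes; emoji, punctuation and other symbols are dropped.
NON_TEXT_RE = re.compile(r"[^a-z0-9\s']")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, emoji and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def build_pipeline():
    """TF-IDF features followed by a linear classifier (hypothetical choice)."""
    # Imported here so clean_tweet stays usable without scikit-learn installed.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    return make_pipeline(
        # preprocessor replaces the vectorizer's default lowercasing step
        TfidfVectorizer(preprocessor=clean_tweet, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )


def evaluate(texts: Sequence[str], labels: Sequence[str], folds: int = 5) -> Dict[str, float]:
    """Cross-validate the pipeline and return the mean of each metric."""
    from sklearn.model_selection import cross_validate

    scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
    scores = cross_validate(build_pipeline(), texts, labels, cv=folds, scoring=scoring)
    return {name: float(scores[f"test_{name}"].mean()) for name in scoring}
```

Given a list of tweet texts and a matching list of discrimination-level labels, `evaluate(texts, labels)` would return the mean accuracy, precision, recall, and F1-score across folds. Keeping the vectorizer inside the pipeline means TF-IDF is fitted on each training fold only, so no vocabulary statistics leak from the held-out fold into training.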
Future Step 1 - Gather feedback from users and apply it to enhance the machine learning model and user interface over time.
Future Step 2 - Regularly update the model with new data and capabilities to ensure it stays current with the latest trends and language used on social media.