GitHub - Unusuala1l2e3x4/Emotion-Classification-of-Tweets-with-Search-Engine

Project Report: project report/project_report.pdf

Data (~400,000 tweets, six emotion labels, 48 MB)

Download 'merged_training.pkl' from the source cited in the report and put it in the folder 'emotion\datasets\Emotion Dataset for Emotion Recognition Tasks'. Download links:
- updated: https://www.icloud.com/iclouddrive/084E9TMZ_lykn3QhU-kIX1DDQ
- original github repo: https://github.com/dair-ai/emotion_dataset
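
Once the file is in place, it can be sanity-checked before running the notebooks. The sketch below is only an illustration: the helper name `load_tweets` is hypothetical, and the column names `text` and `emotions` are an assumption based on the original dataset repo, not something this README states. Pickle files can execute code on load, so only open a copy downloaded from a source you trust.

```python
from pathlib import Path

import pandas as pd

# Folder name taken from this README; the file must be downloaded first.
DATA_DIR = Path("emotion") / "datasets" / "Emotion Dataset for Emotion Recognition Tasks"


def load_tweets(path=DATA_DIR / "merged_training.pkl"):
    """Load the pickled tweet DataFrame and check the columns we assume it has."""
    df = pd.read_pickle(path)
    # Assumption: the DataFrame has 'text' and 'emotions' columns.
    missing = {"text", "emotions"} - set(df.columns)
    if missing:
        raise ValueError(f"unexpected columns, missing: {sorted(missing)}")
    return df
```

Calling `load_tweets()` with no arguments reads from the folder named above; pass a different path if the file lives elsewhere.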

How to run:

Execute "project_classify.ipynb"
- This will train the classification model and vectorizer, and save them both
- Please ignore the commented-out code used for comparing models and hyperparameter tuning
- This also saves metric results to a folder
- The saved model is:
  - LinearSVC(random_state=RANDOM_STATE, max_iter=10000, C=0.085)
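
For readers who want the gist without opening the notebook, the train-and-save step can be sketched roughly as follows. This is not the notebook's code: the `TfidfVectorizer`, the value of `RANDOM_STATE`, the helper names, and the output file names are all assumptions made for illustration. Only the `LinearSVC` settings come from this README.

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

RANDOM_STATE = 42  # assumed value; the notebook defines its own


def train_and_save(texts, labels, model_path="model.joblib", vec_path="vectorizer.joblib"):
    """Fit a vectorizer and the LinearSVC listed above, then save both to disk."""
    # Assumption: a plain TF-IDF vectorizer; the README does not name the one used.
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)
    model = LinearSVC(random_state=RANDOM_STATE, max_iter=10000, C=0.085)
    model.fit(features, labels)
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vec_path)
    return model, vectorizer


def classify(texts, model_path="model.joblib", vec_path="vectorizer.joblib"):
    """Reload the saved model and vectorizer and predict emotion labels."""
    model = joblib.load(model_path)
    vectorizer = joblib.load(vec_path)
    return model.predict(vectorizer.transform(texts))
```

Saving the vectorizer alongside the model matters because new text must be transformed with the same vocabulary the model was trained on.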
Execute "project_ltr_v3.ipynb"
- This will create the index (if it doesn't exist) and train the learning-to-rank pipeline
- This will also save the documents dataframe for the interface, but not the trained learning-to-rank pipeline
- This also saves metric results to a folder
Execute "project_interface.py"
- This is the user interface. It first loads the model, vectorizer, and documents dataframe, then trains the learning-to-rank model (takes a few seconds) and starts the text-based interface
- Choice 2 is supposed to show a graph in a popup via plt.show()

Disclaimer

BM25 and TF-IDF both perform better than my best trained ML model. I am using the ML model for the purposes of this assignment.

Known issues:

Bug with PyTerrier: the program throws an error and exits when no results are found
- terrier-org/pyterrier#352
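
Until the upstream issue is resolved, one possible workaround is to wrap the search call so that a failed query does not end the session. This is a sketch only, not code from the repo: `safe_search` is a hypothetical helper, and it assumes the failure surfaces as an ordinary Python exception, which this README does not confirm.

```python
def safe_search(search_fn, query, default=None):
    """Call search_fn(query); return `default` instead of crashing on an error."""
    try:
        return search_fn(query)
    except Exception as exc:
        # Broad catch on purpose: the exact exception type is not documented here.
        print(f"No results for {query!r} ({type(exc).__name__}); try another query.")
        return default
```

In the interface this could be called as `safe_search(pipeline.search, user_query)`, where `pipeline` stands for whatever retrieval pipeline object the script builds (a placeholder name), with the caller showing a "no results" message when `default` comes back.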

Resume Entry

Emotion Classification of Tweets with Search Engine
September 2022 – December 2022
SI 650 – Information Retrieval

Created a specialized search engine in a dual-model setup, utilizing a dataset of 400,000 tweets to classify six emotion labels with SVM and retrieve similar tweets.
Integrated a Learning-to-Rank model using PyTerrier to rank tweets by relevance to user queries, employing techniques like BM25, TF-IDF, and Bayesian smoothing.
Engineered features such as POS tagging, sentiment measures, and n-grams to optimize the performance of both classifier and retrieval models.
Attained high-quality retrieval results, indicated by NDCG scores of 0.84 - 0.93 for the top 5-20 ranked items and an F1 score of 0.85 for emotion classification.
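
For context on the NDCG figures, the metric can be computed as in the sketch below. It is a generic illustration rather than the evaluation code used in the project: it uses the linear-gain form of DCG (some implementations use 2^rel - 1 instead), and it builds the ideal ranking from the retrieved list only, which is a simplifying assumption.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of its ideal reordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the relevance label of each returned tweet in ranked order, so a ranking already sorted from most to least relevant scores 1.0.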

