Project Report: project report/project_report.pdf

Data (~400,000 tweets, six emotion labels, 48 MB)

How to run:

  1. Execute "project_classify.ipynb"
    • This will train the classification model and vectorizer, and save them both
    • Ignore the commented-out code; it was used only for comparing models and tuning hyperparameters
    • This also saves metric results to a folder
    • The saved model is:
      • LinearSVC(random_state=RANDOM_STATE, max_iter=10000, C=0.085)
  2. Execute "project_ltr_v3.ipynb"
    • This will create the index (if it doesn't exist) and train the learning-to-rank pipeline
    • This will also save the documents dataframe for the interface, but not the trained learning-to-rank pipeline
    • This also saves metric results to a folder
  3. Execute "project_interface.py"
    • This is the user interface. It loads the model, vectorizer, and documents dataframe, trains the learning-to-rank model (which takes a few seconds), and then runs the text-based interface
    • Choice 2 is supposed to show a graph in a popup via plt.show()
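
As a rough illustration of what step 1 saves and step 3 loads, here is a minimal sketch. It is not the notebook's actual code. The vectorizer type (TfidfVectorizer), the RANDOM_STATE value, the use of pickle, the file names, the helper functions, and the toy texts and labels are all assumptions made for the example. Only the LinearSVC settings come from the list above.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

RANDOM_STATE = 42  # assumption: the real value is defined in the notebook

# Toy stand-in for the tweet data (hypothetical texts and label names).
texts = [
    "i am so happy today",
    "what a wonderful happy surprise",
    "i feel sad and alone",
    "this is such a sad day",
    "i am furious about this",
    "that makes me so angry and furious",
]
labels = ["joy", "joy", "sadness", "sadness", "anger", "anger"]

# Fit the vectorizer (assumed to be TF-IDF) and the classifier.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LinearSVC(random_state=RANDOM_STATE, max_iter=10000, C=0.085)
model.fit(X, labels)


def save_artifacts(model, vectorizer, out_dir):
    """Pickle the model and vectorizer into out_dir (hypothetical file names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(out / "vectorizer.pkl", "wb") as f:
        pickle.dump(vectorizer, f)


def load_artifacts(out_dir):
    """Load the pickled model and vectorizer back, as the interface would."""
    out = Path(out_dir)
    with open(out / "model.pkl", "rb") as f:
        loaded_model = pickle.load(f)
    with open(out / "vectorizer.pkl", "rb") as f:
        loaded_vectorizer = pickle.load(f)
    return loaded_model, loaded_vectorizer
```

The vectorizer has to be saved together with the model, because the classifier only accepts features produced by the same fitted vocabulary.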

Disclaimer

  • BM25 and TF-IDF both perform better than my best trained ML model. I am using the ML model for the purposes of this assignment
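
For readers unfamiliar with the BM25 baseline mentioned above, the following is a minimal sketch of Okapi BM25 scoring. It is not the implementation used in this project. The function name, whitespace tokenization, the smoothed IDF variant, and the defaults k1=1.2 and b=0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (always non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


example_docs = [
    "i feel so happy today",
    "this is scary and sad",
    "happy happy joy",
]
example_scores = bm25_scores("happy", example_docs)
```

In this toy example, the second document scores zero because it does not contain the query term, and the third scores highest because the term appears twice in a shorter document.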

Known issues: