Shubhammalik/tweet_tagging_model

Twitter Online Learning

The goal of this project is to classify the sentiment of Twitter tweets using online learning.

Online learning is the process of retraining the model incrementally as new data arrives in a continuous stream.
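To make the idea concrete, here is a minimal sketch of incremental training in plain Python. It is not the project's implementation: the class `OnlineLogisticRegression`, its `partial_fit` method, the learning rate and the toy tweets are all assumptions made for illustration. Each batch updates the weights once and is then discarded, which is the defining property of online learning.

```python
import math


class OnlineLogisticRegression:
    """Toy bag-of-words logistic regression updated one batch at a time (hypothetical)."""

    def __init__(self, learning_rate=0.5):
        self.weights = {}          # token -> weight, grows as new tokens appear
        self.bias = 0.0
        self.learning_rate = learning_rate

    @staticmethod
    def _tokens(text):
        return text.lower().split()

    def predict_proba(self, text):
        """Probability that the text is positive."""
        score = self.bias + sum(self.weights.get(t, 0.0) for t in self._tokens(text))
        return 1.0 / (1.0 + math.exp(-score))

    def partial_fit(self, texts, labels):
        """Apply one stochastic gradient step per example in the batch."""
        for text, label in zip(texts, labels):
            error = self.predict_proba(text) - label
            for token in self._tokens(text):
                self.weights[token] = self.weights.get(token, 0.0) - self.learning_rate * error
            self.bias -= self.learning_rate * error


# Simulated stream: batches arrive one after another and are never revisited.
stream = [
    (["i love this", "great day"], [1, 1]),
    (["i hate this", "awful day"], [0, 0]),
    (["love it", "hate it"], [1, 0]),
]

model = OnlineLogisticRegression()
for texts, labels in stream:
    model.partial_fit(texts, labels)

print(model.predict_proba("love") > 0.5)   # positive token learned from the stream
print(model.predict_proba("hate") < 0.5)   # negative token learned from the stream
```

A production setup would more likely rely on a library estimator that supports incremental updates, but the control flow (receive a batch, update, move on) stays the same.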

The dataset contains 1.6M unique tweet records.

Coverage and Module Insights

  1. Data processing - extraction of useful data features
  2. ETL - cleaning and filtering of data, including removal of stop words, punctuation, URLs, repeating phrases and encodings
  3. Visualization - data distributions and word clouds
  4. Microservice implementation - each service can be used as a module in an external environment
  5. NLP - tokenization, stemming and lemmatization
  6. Model comparator - compares the stats of multiple models and saves the best-performing model
  7. Model selector - keeps checking for the best-performing model and selects the top model for production
  8. Clock function - garbage collects obsolete models and data files based on business rules
  9. Model run history - covers all previous best runs of every model
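The model comparator and selector described above could be sketched roughly as follows. This is an assumption about how such a module might look, not the repository's code: the function names `select_best_model` and `save_best_model_record`, the JSON record format, the choice of accuracy as the ranking metric, and the scores themselves are all hypothetical placeholders.

```python
import json
import tempfile
from pathlib import Path


def select_best_model(run_stats, metric="accuracy"):
    """Return the (name, stats) pair with the highest value of `metric`."""
    return max(run_stats.items(), key=lambda item: item[1][metric])


def save_best_model_record(run_stats, path, metric="accuracy"):
    """Write a small JSON record describing the winning model and return it."""
    name, stats = select_best_model(run_stats, metric)
    record = {"model": name, "metric": metric, "stats": stats}
    Path(path).write_text(json.dumps(record, indent=2))
    return record


# Placeholder names and scores for illustration only; NOT results from this project.
run_stats = {
    "model_a": {"accuracy": 0.1, "f1": 0.1},
    "model_b": {"accuracy": 0.3, "f1": 0.2},
    "model_c": {"accuracy": 0.2, "f1": 0.3},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    record_path = Path(tmp_dir) / "best_model.json"
    record = save_best_model_record(run_stats, record_path)
    reloaded = json.loads(record_path.read_text())

print(record["model"])
```

Passing `metric="f1"` would rank the same runs by a different statistic, which is one way a selector could apply different rules for production.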

Tech Stack

To be updated.

  1. Data Cleaning, Filtering & Manipulation - regular expressions, pandas DataFrames and NumPy arrays
  2. Data Visualization - Plotly, Seaborn, Matplotlib, word cloud
  3. Data Storage - local
  4. Webapp - TBA
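As an illustration of regular-expression-based cleaning, the sketch below lowercases a tweet, strips URLs and punctuation, drops stop words and collapses immediately repeated words. It uses only the standard library; the function name `clean_tweet`, the tiny `STOP_WORDS` set and the exact order of steps are assumptions rather than the project's actual pipeline.

```python
import re
import string

# A deliberately tiny stop-word list for illustration; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Return a normalized, space-joined token string for one tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)               # remove URLs before punctuation is stripped
    text = text.translate(PUNCTUATION_TABLE)   # remove ASCII punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]

    # Collapse immediate repeats such as "so so" into a single token.
    deduplicated = []
    for token in tokens:
        if not deduplicated or deduplicated[-1] != token:
            deduplicated.append(token)
    return " ".join(deduplicated)


print(clean_tweet("The movie is SO so good!!! http://t.co/abc"))
# prints: movie so good
```

With pandas, the same function could be applied to a text column using `Series.apply`.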

Running Instructions

  1. Download the project, then install the requirements from a terminal in the project folder:

    pip install -r /path/to/requirements.txt

Task at Hand

  1. Implement logging at a modular level
  2. Add exception handling for data transformation and the model selector
  3. Refactor model training and run history into parameterized modules
  4. Implement the clock function to remove obsolete models and data
  5. Create and load an environment variable file
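The clock function in the task list is not yet implemented, so the following is only one possible shape for it. The business rule assumed here (delete files older than a maximum age, but always keep the newest ones) and the names `remove_obsolete_files`, `max_age_seconds` and `keep_latest` are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def remove_obsolete_files(directory, max_age_seconds, keep_latest=1, now=None):
    """Delete files older than `max_age_seconds`, always sparing the newest `keep_latest`."""
    now = time.time() if now is None else now
    files = sorted(
        (p for p in Path(directory).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for path in files[keep_latest:]:
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return removed


# Demo in a temporary directory with three fake model files of different ages.
with tempfile.TemporaryDirectory() as model_dir:
    now = time.time()
    for name, age in (("new_model.pkl", 10), ("mid_model.pkl", 5000), ("old_model.pkl", 90000)):
        path = Path(model_dir) / name
        path.write_text("placeholder")
        os.utime(path, (now - age, now - age))  # backdate the modification time

    removed = remove_obsolete_files(model_dir, max_age_seconds=3600, now=now)
    remaining = sorted(p.name for p in Path(model_dir).iterdir())

print(sorted(removed), remaining)
```

Such a function could be run on a schedule (for example from a cron job or a background thread); the scheduling mechanism is left open here.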

Illustration from Data

Data Stats [figure: data distribution]

Positive Word Cloud [figure]

Negative Word Cloud [figure]

MODEL EVALUATION

Bernoulli NB Model [figures]

Linear Model [figures]

Logistic Regression Model [figures]

Multinomial NB Model [figures]

XGBoost Model [figures]
