Sentiment Analysis on Primate Dataset

This project performs sentiment analysis on a dataset of posts related to primates. A transformer-based model is used to predict a sentiment label for each post. The project workflow is described below:

Workflow Overview

  1. Data Preprocessing:

    • The dataset (primate_dataset.json) is loaded and preprocessed to prepare it for model training.
    • The clean_text function in preprocess_data.py cleans the text data by converting it to lowercase, removing punctuation, tokenizing, removing stopwords, and stemming.
    import pandas as pd
    import nltk
    import string
    from nltk import word_tokenize
    from nltk.corpus import stopwords
    from nltk.stem.porter import PorterStemmer
    
    nltk.download('punkt')
    nltk.download('stopwords')
    
    stop_words = set(stopwords.words('english'))
    stemmer = PorterStemmer()
    
    def clean_text(data):
        # Lowercase the text and strip punctuation
        text = data.lower().translate(str.maketrans('', '', string.punctuation))
        # Tokenize, drop English stopwords, and stem the remaining tokens
        tokens = word_tokenize(text)
        result = ' '.join(stemmer.stem(token) for token in tokens if token not in stop_words)
        return result
  2. Data Splitting:

    • The preprocessed data is split into training and testing sets for model evaluation.
    from sklearn.model_selection import train_test_split
    
    # X holds the cleaned post texts, Y the corresponding sentiment labels
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
  3. Model Training:

    • The sentiment analysis model is trained on the training data; a minimal training-loop sketch is shown after this list.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    
    tokenizer = AutoTokenizer.from_pretrained("sbcBI/sentiment_analysis_model")
    model = AutoModelForSequenceClassification.from_pretrained("sbcBI/sentiment_analysis_model")
    
    # Model training code...
  4. Model Evaluation:

    • The trained model is evaluated on the testing data to assess its performance; see the evaluation sketch after this list.
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
    import numpy as np
    
    # Model evaluation code...
  5. Optional Quantization:

    • Optionally, the trained model can be dynamically quantized (its linear layers converted to int8) to reduce memory usage and speed up CPU inference.
    import torch
    from torch.quantization import quantize_dynamic
    
    # Replace the model's Linear layers with dynamically quantized int8 versions
    quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
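
The training step above is only stubbed. The sketch below is one minimal way to fine-tune the loaded model; it is an illustration rather than the repository's exact train_model.py. It assumes X_train and Y_train come from the split in step 2, with Y_train already encoded as integer class ids; the batch size, learning rate, and epoch count are placeholder values.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained("sbcBI/sentiment_analysis_model")
    model = AutoModelForSequenceClassification.from_pretrained("sbcBI/sentiment_analysis_model").to(device)
    
    # Tokenize the cleaned training texts and wrap them in a DataLoader
    encodings = tokenizer(list(X_train), padding=True, truncation=True, return_tensors="pt")
    train_dataset = TensorDataset(encodings["input_ids"], encodings["attention_mask"],
                                  torch.tensor(list(Y_train)))
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # placeholder hyperparameters
    model.train()
    for epoch in range(3):  # placeholder epoch count
        for input_ids, attention_mask, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(input_ids=input_ids.to(device),
                            attention_mask=attention_mask.to(device),
                            labels=labels.to(device))
            outputs.loss.backward()
            optimizer.step()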
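
The evaluation step is likewise stubbed above. A minimal sketch, assuming the fine-tuned model and tokenizer from the training sketch and integer-encoded labels in Y_test:

    import torch
    import numpy as np
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    
    model.eval()
    with torch.no_grad():
        enc = tokenizer(list(X_test), padding=True, truncation=True, return_tensors="pt").to(device)
        logits = model(**enc).logits
    
    # Pick the highest-scoring class for each post and compare with the true labels
    preds = np.argmax(logits.cpu().numpy(), axis=1)
    print("accuracy :", accuracy_score(Y_test, preds))
    print("precision:", precision_score(Y_test, preds, average="weighted"))
    print("recall   :", recall_score(Y_test, preds, average="weighted"))
    print("f1       :", f1_score(Y_test, preds, average="weighted"))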

Code Structure

  • preprocess_data.py: Contains functions for cleaning and preprocessing the dataset.
  • train_model.py: Script for training the sentiment analysis model.
  • evaluate_model.py: Script for evaluating the trained model on the test dataset.
  • quantize_model.py: Optional script for quantizing the trained model.
  • utils.py: Utility functions used across different scripts.

Setup and Dependencies

  1. Install the required Python packages:

    pip install pandas numpy scikit-learn transformers torch nltk
    
  2. Download NLTK data:

    import nltk
    nltk.download('punkt')
    nltk.download('stopwords')
  3. Optionally, use a GPU for faster model training; the quick check below confirms whether PyTorch can see one.
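
    A quick check that PyTorch can see a GPU (the device variable is just illustrative):

    import torch
    
    print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")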

Usage

  1. Preprocess the dataset using preprocess_data.py.
  2. Train the sentiment analysis model using train_model.py.
  3. Evaluate the trained model using evaluate_model.py.
  4. Optionally, quantize the trained model using quantize_model.py.
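
For example, assuming each script runs without additional command-line arguments (adjust the invocations if your copies take flags or paths):

    python preprocess_data.py
    python train_model.py
    python evaluate_model.py
    python quantize_model.py   # optional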

Model Deployment

The trained and, optionally, quantized model can be deployed for inference in production environments. Make sure the deployment platform provides compatible versions of torch and transformers, and note that a dynamically quantized model targets CPU inference.
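
As a rough illustration (not part of the repository), the snippet below shows how the trained or quantized model could be used for inference. predict_sentiment is a hypothetical helper; incoming text should first go through the same clean_text preprocessing used during training, and the mapping from the returned class ids to label names depends on how the labels were encoded.

    import torch
    
    def predict_sentiment(texts, model, tokenizer):
        # Tokenize the (already cleaned) texts and run a forward pass without gradients
        model.eval()
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits
        # Return the predicted class id for each text
        return logits.argmax(dim=1).tolist()
    
    print(predict_sentiment(["the gorilla exhibit was wonderful"], quantized_model, tokenizer))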

About

This project applies transformer models to sentiment analysis of primate-themed posts, covering data preprocessing, model training, evaluation, and optional quantization, with the aim of offering insight into public perceptions of primates.
