Skip to content

agrawalanuj25/NLP

Repository files navigation

Twitter Sentiment Analysis Overview This project focuses on performing sentiment analysis on a Twitter dataset, classifying tweets as positive, negative, or neutral based on their content. By leveraging machine learning techniques, we aim to extract meaningful insights from social media data, allowing for a deeper understanding of public sentiment.

Project Objectives

  • Develop a machine learning model capable of accurately classifying the sentiment of tweets.
  • Preprocess the text data to improve the quality and relevance of the input data.
  • Implement word embedding techniques for feature extraction.
  • Achieve high accuracy in sentiment classification using a Long Short-Term Memory (LSTM) neural network.

Dataset

The dataset consists of Twitter data containing tweets with their associated sentiment labels (positive, negative, or neutral). This data was used to train and test the machine learning models developed in the project.

Methodology

1. Data Preprocessing

To prepare the Twitter data for machine learning, several text-processing techniques were applied:

  • Tokenization: Splitting the text into individual words or tokens.
  • Stopword Removal: Removing common stopwords (e.g., "and", "the") that do not contribute to the sentiment.
  • Lemmatization: Reducing words to their base or root form to minimize variation in the data.

2. Feature Extraction

To convert the processed text data into numerical form suitable for machine learning algorithms, word embedding techniques were utilized:

  • Word2Vec: A popular word embedding technique that captures the context of words in a document.
  • TF-IDF (Term Frequency-Inverse Document Frequency): A statistical measure that evaluates the importance of a word in a document relative to a collection of documents.

3. Model Development

A deep learning approach was taken to classify the sentiment of the tweets:

  • LSTM Neural Network: A type of recurrent neural network (RNN) that is well-suited for sequential data like text. LSTMs are capable of learning long-term dependencies, making them ideal for understanding context in language.

4. Performance

The final model achieved an accuracy of 92% on the test dataset, demonstrating the effectiveness of the word embedding techniques and the LSTM architecture in capturing sentiment from Twitter data.

Results

  • Successfully classified tweets with an accuracy of 92%, indicating that the model is highly effective in identifying sentiment from text.
  • The combination of text preprocessing, word embeddings, and LSTM neural networks proved to be a robust approach for sentiment analysis.

Technologies Used

  • Python: Programming language for implementation.
  • NLTK: Natural Language Toolkit for text preprocessing.
  • Gensim: For Word2Vec implementation.
  • Scikit-learn: For TF-IDF and evaluation metrics.
  • Keras/TensorFlow: For building and training the LSTM neural network.

Conclusion

This project demonstrates the power of machine learning in text-based sentiment analysis. By applying advanced text processing techniques and deep learning models, we were able to achieve a high level of accuracy in classifying the sentiment of tweets. This approach can be extended to other text-based datasets for similar sentiment analysis tasks.

Future Work

  • Further fine-tuning of the LSTM model to improve accuracy.
  • Exploring other deep learning architectures like Transformer models for better performance.
  • Applying the model to real-time sentiment analysis of Twitter data streams.

How to Run the Project

  • Clone the repository.
  • Install the necessary dependencies listed in requirements.txt.
  • Download the Twitter dataset and place it in the designated directory.
  • Run the Jupyter notebook or Python script to preprocess the data, train the model, and evaluate the results.

About

This repository is dedicated to Twitter Sentiment Analysis, focusing on classifying tweet sentiments using machine learning. It includes text-processing techniques for data enhancement and utilizes Word Embedding methods and an LSTM neural network for accurate sentiment classification

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors