GitHub - agrawalanuj25/NLP: This repository is dedicated to Twitter Sentiment Analysis, focusing on classifying tweet sentiments using machine learning. It includes text-processing techniques for data enhancement and utilizes Word Embedding methods and an LSTM neural network for accurate sentiment classification

Twitter Sentiment Analysis

Overview

This project performs sentiment analysis on a Twitter dataset, classifying each tweet as positive, negative, or neutral based on its content. Applying machine learning to social media text in this way gives a clearer picture of public sentiment.

Project Objectives

Develop a machine learning model capable of accurately classifying the sentiment of tweets.
Preprocess the text data to improve the quality and relevance of the input data.
Implement word embedding techniques for feature extraction.
Achieve high accuracy in sentiment classification using a Long Short-Term Memory (LSTM) neural network.

Dataset

The dataset consists of Twitter data containing tweets with their associated sentiment labels (positive, negative, or neutral). This data was used to train and test the machine learning models developed in the project.

Methodology

1. Data Preprocessing

To prepare the Twitter data for machine learning, several text-processing techniques were applied:

Tokenization: Splitting the text into individual words or tokens.
Stopword Removal: Removing common stopwords (e.g., "and", "the") that do not contribute to the sentiment.
Lemmatization: Reducing words to their base or root form to minimize variation in the data.
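
The three steps above can be sketched as a small pipeline. This is a minimal, dependency-free illustration rather than the project's code: the stopword set and lemma dictionary are hypothetical and deliberately tiny, and the function name `preprocess` is an assumption. In practice NLTK's stopword list and `WordNetLemmatizer` would take their place.

```python
import re

# Hypothetical, deliberately tiny resources used only for illustration.
# A real run would use NLTK's stopword list and WordNetLemmatizer instead.
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "it", "this"}
LEMMAS = {"movies": "movie", "loved": "love", "tweets": "tweet"}

# Lowercase alphabetic words, optionally with one apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(tweet: str) -> list[str]:
    """Tokenize, drop stopwords, then map each token to its lemma."""
    tokens = TOKEN_RE.findall(tweet.lower())           # tokenization
    kept = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [LEMMAS.get(t, t) for t in kept]            # dictionary lemmatization


if __name__ == "__main__":
    print(preprocess("I loved the movies and the soundtrack!"))
```

Tokens that are missing from the lemma dictionary pass through unchanged, so the quality of this step depends entirely on the lemmatizer that is plugged in.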

2. Feature Extraction

To convert the processed text into numerical form suitable for machine learning algorithms, two feature extraction techniques were used:

Word2Vec: A word embedding technique that learns a dense vector for each word from the contexts in which it appears, so that words used in similar contexts get similar vectors.
TF-IDF (Term Frequency-Inverse Document Frequency): A statistical weighting, not an embedding, that scores how important a word is to one document relative to a collection of documents.
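
To make the TF-IDF weighting concrete, here is a minimal sketch using the textbook definition: term frequency is the term's count divided by the document length, and inverse document frequency is log(N / df). The function name `tf_idf` and the sample tweets are assumptions for illustration. Scikit-learn's `TfidfVectorizer` smooths the IDF term and normalizes each row by default, so its numbers would differ from these.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = count / document length, idf = log(N / df)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical tokenized tweets.
    tweets = [["good", "movie"], ["bad", "movie"]]
    for row in tf_idf(tweets):
        print(row)
```

A term that appears in every document, such as "movie" here, gets a weight of zero, which is how TF-IDF plays down words that do not distinguish one tweet from another.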

3. Model Development

A deep learning approach was taken to classify the sentiment of the tweets:

LSTM Neural Network: A type of recurrent neural network (RNN) that is well suited to sequential data such as text. LSTMs use gates to control what is kept in and dropped from an internal memory, which lets them learn long-term dependencies and track context across a sentence.
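
The gating idea behind an LSTM can be shown with a single scalar cell written in plain Python. This is a sketch of the standard cell equations, not the Keras model used in the project: the weights below are hypothetical and set by hand rather than learned, and the names `lstm_step` and `params` are assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p maps each gate name ("f", "i", "o", "g") to (w_x, w_h, bias).
    """
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))     # forget gate: how much old memory to keep
    i = sigmoid(pre("i"))     # input gate: how much new content to write
    o = sigmoid(pre("o"))     # output gate: how much memory to expose
    g = math.tanh(pre("g"))   # candidate content
    c = f * c_prev + i * g    # updated cell state
    h = o * math.tanh(c)      # updated hidden state
    return h, c


# Hypothetical hand-set weights: forget gate near 1, input gate near 0.
params = {
    "f": (0.0, 0.0, 10.0),
    "i": (0.0, 0.0, -10.0),
    "o": (0.0, 0.0, 0.0),
    "g": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.8  # start with a value already stored in the cell state
for x in [0.5, -1.0, 2.0, 0.3]:
    h, c = lstm_step(x, h, c, params)

if __name__ == "__main__":
    print(h, c)
```

With the forget gate held open and the input gate held shut, the cell state stays close to its starting value across all four inputs. A trained network learns when to open and close these gates, which is what allows information from early in a tweet to influence the final prediction.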

4. Performance

The final model achieved an accuracy of 92% on the test dataset, demonstrating the effectiveness of the word embedding techniques and the LSTM architecture in capturing sentiment from Twitter data.
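
For reference, accuracy here means the fraction of test tweets whose predicted label matches the true label. The sketch below shows that calculation; the function name `accuracy` and the five example labels are illustrative assumptions and are not taken from the project's data or results.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Illustrative labels only.
    true_labels = ["positive", "negative", "neutral", "positive", "negative"]
    predicted = ["positive", "negative", "positive", "positive", "negative"]
    print(accuracy(true_labels, predicted))
```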

Results

The model classified tweets in the test dataset with 92% accuracy.
The combination of text preprocessing, word embeddings, and LSTM neural networks proved to be a robust approach for sentiment analysis.

Technologies Used

Python: Programming language for implementation.
NLTK: Natural Language Toolkit for text preprocessing.
Gensim: For Word2Vec implementation.
Scikit-learn: For TF-IDF and evaluation metrics.
Keras/TensorFlow: For building and training the LSTM neural network.

Conclusion

This project shows how text preprocessing, word embeddings, and an LSTM neural network can be combined for tweet sentiment classification, reaching 92% accuracy on the test dataset. The same approach can be extended to other text-based datasets for similar sentiment analysis tasks.

Future Work

Further fine-tuning of the LSTM model to improve accuracy.
Exploring other deep learning architectures like Transformer models for better performance.
Applying the model to real-time sentiment analysis of Twitter data streams.

How to Run the Project

Clone the repository.
Install the necessary dependencies listed in requirements.txt.
Download the Twitter dataset and place it in the designated directory.
Run the Jupyter notebook or Python script to preprocess the data, train the model, and evaluate the results.

Folders and files

History: 4 commits

Article Summarization.ipynb
Neural machine translation.ipynb
README.md
Sentiment analysis - naive bayes.ipynb
Sentiment analysis - regression.ipynb
word embedding generator.ipynb
