
Classifying tweets as positive or negative using Natural Language Processing (NLP) techniques such as tokenization.


DURGESH716/NLP_Twitter_Analysis


"Twitter Sentiment Analysis with Natural Language Processing NLP 🤖"

Problem Statement and Business Case:-

Most companies struggle to analyze the data they collect, whether it comes from customer feedback forms, product reviews, comments or posts on social media, or tweets on Twitter. They need teams of experts to sort good comments from bad ones and then plan how to act on them. Natural Language Processing (NLP), a technology that has been gaining traction in the market, addresses this problem.

  • This model uses Bayes' theorem and conditional probability (a Naive Bayes classifier) to classify tweets as positive 🙂 or negative 😈
  • The model converts words into numbers, uses those numbers to train the NLP model, and then makes predictions
  • Once trained, the model classifies tweets without manual review, which saves cost and time.
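
As a rough illustration of the idea in the bullets above, the sketch below applies Bayes' theorem with the naive independence assumption to a tiny hand-made corpus. The tweets, the labels, and the helper names `train` and `predict` are invented for illustration and are not taken from the repository; add-one (Laplace) smoothing is an assumption as well.

```python
import math
from collections import Counter

# Toy training data (hypothetical, not from the repository's dataset).
TWEETS = [
    ("love this phone", "positive"),
    ("great service love it", "positive"),
    ("hate this phone", "negative"),
    ("terrible service hate it", "negative"),
]

def train(samples):
    """Count words per class and tweets per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Return the class with the highest log posterior (up to a constant)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # log P(word | class) with add-one smoothing
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TWEETS)
print(predict("love the service", *model))
print(predict("hate the phone", *model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why the per-word terms are summed rather than multiplied.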

Pre-Requisites / Technologies Used:-

  • Python Programming Language (Intermediate), Statistics, Probability and the Naive Bayes classifier (Bayes' theorem)
  • Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn and nltk (natural language toolkit)

Step_1: Dataset Cleaning and Data Exploration:-

  • Handling null and missing values, using isna() to check for them
  • Using functions to get familiar with data like info(), describe()
  • Visualizing data using graphs like histogram, count-plot, etc.
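
The exploration steps above might look roughly like the following with pandas. The column names `tweet` and `label`, the label encoding, and the sample rows are assumptions made for illustration; the repository's actual dataset may be laid out differently.

```python
import pandas as pd

# Hypothetical sample; the real dataset's file and column names may differ.
df = pd.DataFrame({
    "tweet": ["love this phone", None, "hate the service", "great day"],
    "label": [0, 0, 1, 0],  # assumed encoding: 0 = positive, 1 = negative
})

# Count missing values per column.
missing = df.isna().sum()
print(missing)

# Drop rows that have no tweet text.
clean = df.dropna(subset=["tweet"]).reset_index(drop=True)

# Get familiar with the data.
clean.info()
print(clean.describe())

# Class balance; seaborn.countplot(x="label", data=clean) draws the same counts.
label_counts = clean["label"].value_counts()
print(label_counts)

# Tweet length in characters, a common column to plot as a histogram.
clean = clean.assign(length=clean["tweet"].str.len())
print(clean["length"].tolist())
```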

Step_2: Data Pre-Processing:-

  • Removing punctuation (!"#$%&'{|}~) and stop-words (who, whom, which, and, is, etc.)
  • Performing tokenization and vectorization: splitting text into small, useful pieces (tokens), then converting those tokens into numeric counts
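
A minimal sketch of these pre-processing steps, assuming scikit-learn's `CountVectorizer` is used for the vectorization. The helper `clean_tweet`, the sample tweets, and the short `STOP_WORDS` set are hypothetical; in a project that uses nltk, the stop-word list would normally come from `nltk.corpus.stopwords` instead.

```python
import string
from sklearn.feature_extraction.text import CountVectorizer

# Small illustrative stop-word set (an assumption, not the nltk list).
STOP_WORDS = {"who", "whom", "which", "and", "is", "the", "a", "this"}

def clean_tweet(text):
    """Strip punctuation, lowercase, split into tokens, drop stop-words."""
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    return [tok for tok in no_punct.lower().split() if tok not in STOP_WORDS]

tweets = ["I love this phone!!!", "Who is happy? Not me, and I hate the service."]

# Tokenization: each tweet becomes a list of tokens.
tokens = [clean_tweet(t) for t in tweets]
print(tokens)

# Vectorization: each tweet becomes a row of token counts.
vectorizer = CountVectorizer(analyzer=clean_tweet)
counts = vectorizer.fit_transform(tweets)
print(sorted(vectorizer.vocabulary_))
print(counts.toarray())
```

Passing the cleaning function as `analyzer` keeps the cleaning and the counting in one object, so the same steps are applied to any text transformed later.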

Step_3: Training and Measuring the Model:-

  • Splitting the dataset into two parts: Training (80% data) and testing (20% data)
  • Training the model using the Multinomial Naive Bayes classifier
  • Achieved an accuracy of 0.94 (94%) on the testing dataset
  • Used a confusion matrix: "Compares true values with predicted values"
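
The training and evaluation steps can be sketched with scikit-learn as follows. The toy corpus, the label encoding, and the `random_state` value are assumptions for illustration; the sketch does not reproduce the repository's dataset or its reported 0.94 accuracy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; assumed encoding: 0 = positive, 1 = negative.
positive = ["love it", "great phone", "happy today", "love this day", "great service"]
negative = ["hate it", "terrible phone", "sad today", "hate this day", "terrible service"]
texts = positive * 4 + negative * 4
labels = [0] * 20 + [1] * 20

# Convert text to word counts (fitted on all texts here for brevity).
X = CountVectorizer().fit_transform(texts)

# 80% training data, 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)

# Train the Multinomial Naive Bayes classifier and predict on the test set.
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy and confusion matrix (rows: true labels, columns: predicted labels).
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", accuracy)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions for each class, and the off-diagonal cells count the tweets that were misclassified.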

Show 💗 by ⭐ My Repository
