Many companies struggle to analyze the data they collect, whether it comes from customer feedback forms, product reviews, comments or posts on social media, or tweets on Twitter. They need teams of experts to classify comments as good or bad and then plan how to act on them. Natural Language Processing (NLP), a technology that has been gaining traction in the market, helps overcome this problem.
- This model uses a Naive Bayes classifier, which is built on Bayes' theorem and conditional probability, to classify tweets as positive 🙂 or negative 😈
- The model converts words into numbers; these numbers are then used to train the NLP model and make predictions
- Once trained, the model classifies tweets without human help or support, which saves cost and time
- Python programming language (intermediate), statistics, probability and Bayes' theorem
- Libraries: NumPy, pandas, Matplotlib, Seaborn, scikit-learn and NLTK (Natural Language Toolkit)
- Handling null and missing values, using isna() to check for them
- Using functions such as info() and describe() to get familiar with the data
- Visualizing the data with plots such as histograms and count plots
- Removing punctuation (!"#$%&'{|}~) and stop words (who, whom, which, and, is, etc.)
- Performing tokenization and vectorization: tokenization splits text into small, useful pieces (tokens), and vectorization converts those tokens into numeric counts
- Splitting the dataset into two parts: training (80% of the data) and testing (20% of the data)
- Training the model with a Multinomial Naive Bayes classifier
- Achieved an accuracy of 0.94 (94%) on the testing dataset
- Used a confusion matrix: "Compares true values with predicted values"
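
The steps above can be sketched end to end in plain Python. This is a minimal from-scratch illustration of what a Multinomial Naive Bayes classifier computes, not the project's actual code, which relies on scikit-learn and NLTK. The toy tweets, the short `STOP_WORDS` set, and the helper names (`clean`, `train`, `predict`, `confusion_matrix`) are all assumptions made for the example. The 80/20 split here simply takes the first 8 of 10 rows; a real pipeline would normally shuffle first.

```python
import math
import string
from collections import Counter, defaultdict

# Hypothetical stop-word subset for illustration; NLTK's English list is longer.
STOP_WORDS = {"who", "whom", "which", "and", "is", "the", "a", "this", "it", "i"}


def clean(text):
    """Lower-case, strip punctuation, split on whitespace, drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    return [w for w in text.lower().translate(table).split() if w not in STOP_WORDS]


def train(docs, labels, alpha=1.0):
    """Count words per class; alpha is the Laplace smoothing constant."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = clean(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "class_counts": Counter(labels),
            "vocab": vocab, "alpha": alpha}


def predict(model, doc):
    """Return the label with the highest log posterior, log P(c) + sum log P(w|c)."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in clean(doc):
            if word in model["vocab"]:  # words never seen in training are skipped
                score += math.log((counts[word] + model["alpha"])
                                  / (total_words + model["alpha"] * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up toy data, not the project's dataset.
tweets = [
    ("I love this phone, it is amazing!", "positive"),
    ("Worst service ever, I hate it.", "negative"),
    ("Great product and fast delivery", "positive"),
    ("Terrible quality, very disappointed", "negative"),
    ("Amazing support, I love the team", "positive"),
    ("I hate the new update, terrible!", "negative"),
    ("Fast delivery and great quality", "positive"),
    ("Very disappointed, worst purchase", "negative"),
    ("I love the great support", "positive"),
    ("Terrible service, I hate this", "negative"),
]

split = int(0.8 * len(tweets))  # 80% training, 20% testing
train_rows, test_rows = tweets[:split], tweets[split:]

model = train([t for t, _ in train_rows], [y for _, y in train_rows])
y_true = [y for _, y in test_rows]
predictions = [predict(model, t) for t, _ in test_rows]

matrix = confusion_matrix(y_true, predictions, ["negative", "positive"])
accuracy = sum(t == p for t, p in zip(y_true, predictions)) / len(y_true)
print(matrix, accuracy)
```

With scikit-learn, the same stages map onto `CountVectorizer` (vectorization), `train_test_split` (the 80/20 split), `MultinomialNB` (the classifier) and `confusion_matrix` (evaluation). The accuracy on this ten-tweet toy set says nothing about the 94% reported above, which comes from the project's real dataset.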