> ## Key Takeaways

> By the end of this module, students will understand the main sentiment analysis techniques, be able to preprocess text data, build and evaluate sentiment analysis models with both traditional machine learning algorithms and deep learning, and apply these skills to real-world NLP tasks.
--------------------------------------------------------------------------------------------------------------------------------

### 1. Introduction to NLP
- Understanding Natural Language Processing (NLP) and its applications.
- Overview of NLP techniques for text analysis and understanding.
- Importance of sentiment analysis in understanding customer opinions and feedback.

### 2. Text Preprocessing
- __Tokenization:__ Breaking down text into smaller units such as words or sentences.
- __Stopword Removal:__ Removing common words (e.g., "the", "is", "and") that do not carry significant meaning.
- __Lemmatization:__ Reducing words to their base or root form (e.g., "running" to "run", "better" to "good").
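
The three steps above can be sketched in plain Python. This is only an illustration: `TOY_STOPWORDS` and `TOY_LEMMAS` are hypothetical hand-written tables, not the lists a library such as NLTK ships, and a real lemmatizer relies on a dictionary and part-of-speech information rather than a fixed lookup.

```python
import re

# Hypothetical, hand-picked stopword list for illustration only;
# real stopword lists are much longer.
TOY_STOPWORDS = {"the", "is", "and", "was", "this"}

# Hypothetical lemma lookup table, covering only the examples in the text.
TOY_LEMMAS = {"running": "run", "better": "good"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the toy stopword list."""
    return [t for t in tokens if t not in TOY_STOPWORDS]


def lemmatize(tokens):
    """Map each token to its base form when the toy table knows it."""
    return [TOY_LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Run tokenization, stopword removal, and lemmatization in order."""
    return lemmatize(remove_stopwords(tokenize(text)))


print(preprocess("The service is better and the staff was running around"))
# ['service', 'good', 'staff', 'run', 'around']
```

For sentiment analysis it is worth checking whether the stopword list contains negations such as "not", since dropping them can flip the meaning of a review.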

### 3. Bag of Words Model
- Introduction to the Bag of Words (BoW) model.
- Creating a vocabulary of unique words from the text corpus.
- Representing text documents as numerical vectors based on word frequency.
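
A minimal Bag of Words sketch, assuming whitespace tokenization and raw word counts as the vector entries. The names `vocabulary` and `to_vector` are chosen for this example only.

```python
from collections import Counter

docs = [
    "wow love this place",
    "crust is not good",
    "not tasty and the texture was just nasty",
]

# Vocabulary: every unique word in the corpus, sorted for a stable column order.
vocabulary = sorted({word for doc in docs for word in doc.split()})


def to_vector(doc):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


vectors = [to_vector(doc) for doc in docs]

print(vocabulary)
for doc, vector in zip(docs, vectors):
    print(vector, "<-", doc)
```

Each document becomes a vector with one position per vocabulary word. Word order is discarded, which is why "not good" and "good not" would produce the same vector.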

### 4. Sentiment Analysis Algorithms

#### Naive Bayes Classifier:

- Understanding the Naive Bayes algorithm for text classification.
- Training a Naive Bayes classifier using labeled data for sentiment analysis.
- Evaluating the classifier performance using accuracy, precision, recall, and F1-score.
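
To make the training step concrete, here is a small multinomial Naive Bayes sketch written from scratch with add-one (Laplace) smoothing. The four training sentences and their labels are made up for illustration, and `train_naive_bayes` and `predict` are hypothetical helper names, not a library API.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect the counts a multinomial Naive Bayes model needs."""
    class_counts = Counter(labels)       # documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocabulary = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(doc, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: share of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this example.
train_docs = [
    "love this place",
    "great food and great service",
    "crust is not good",
    "texture was nasty",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_naive_bayes(train_docs, train_labels)
print(predict("great place", model))
print(predict("nasty crust", model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.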

#### LSTM (Long Short-Term Memory) Model:

- Introduction to Recurrent Neural Networks (RNNs) and LSTM architecture.
- Preprocessing text data for LSTM input.
- Building and training an LSTM model for sentiment analysis.
- Fine-tuning the model and optimizing hyperparameters.
- Evaluating the LSTM model performance and comparing it with traditional methods.
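
Before text reaches an LSTM, each document is usually turned into a fixed-length sequence of integer ids. The sketch below shows one way to do that in plain Python. The reserved ids `PAD` and `UNK`, the padding at the end of the sequence, and the helper names are assumptions made for this example; deep learning libraries provide their own utilities with their own defaults.

```python
PAD, UNK = 0, 1  # reserved ids for padding and unknown words (assumed convention)


def build_word_index(tokenized_docs):
    """Assign each distinct token an integer id, in order of first appearance."""
    word_index = {}
    for tokens in tokenized_docs:
        for token in tokens:
            if token not in word_index:
                word_index[token] = len(word_index) + 2  # ids 0 and 1 are reserved
    return word_index


def encode(tokens, word_index, max_len):
    """Map tokens to ids, truncate to max_len, then pad with PAD at the end."""
    ids = [word_index.get(token, UNK) for token in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


tokenized_docs = [
    ["wow", "love", "this", "place"],
    ["crust", "is", "not", "good"],
]
word_index = build_word_index(tokenized_docs)

print(encode(["crust", "is", "not", "good"], word_index, max_len=6))
# [6, 7, 8, 9, 0, 0]
print(encode(["not", "tasty"], word_index, max_len=3))
# [8, 1, 0]
```

Every encoded sequence has the same length, so the sequences can be stacked into a single array and fed to an embedding layer followed by the LSTM.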

### 5. Model Evaluation Metrics

- __Accuracy:__ The proportion of correctly classified instances.
- __Precision:__ The proportion of true positive predictions among all positive predictions.
- __Recall:__ The proportion of true positive predictions among all actual positive instances.
- __F1-score:__ The harmonic mean of precision and recall, providing a balanced measure of model performance.
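
The four definitions above translate directly into code. This sketch assumes binary labels with `1` as the positive class. `classification_metrics` is a hypothetical helper written for this example, and the label lists are invented.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 2 true positives, 1 false positive, 2 false negatives, 3 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 5/8, precision is 2/3, recall is 1/2, and F1 is 4/7, which shows how the metrics can diverge on the same predictions.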

### 6. Practical Applications and Case Studies
- Sentiment analysis on customer reviews for products or services.
- Analyzing social media sentiments for brand perception.
- Understanding public opinions on political or social issues through text analysis.

In [6]:
# Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize
import string
from collections import Counter

import warnings
warnings.simplefilter('ignore', category=Warning, lineno=0, append=False)

## Preprocess Text Data

In [7]:
text_data = [
    "Wow... love this place", 
    "Crust is not good", 
    "Not tasty and the texture was just nasty"
]

In [21]:
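# Lowercase every review so that "Not" and "not" map to the same token
text_lower = [text.lower() for text in text_data]
# Fetch the Punkt tokenizer models that word_tokenize relies on
nltk.download('punkt')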

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Kennedy\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.


True

In [22]:
token_data = [nltk.word_tokenize(text) for text in text_lower]
token_data

[['wow', '...', 'love', 'this', 'place'],
 ['crust', 'is', 'not', 'good'],
 ['not', 'tasty', 'and', 'the', 'texture', 'was', 'just', 'nasty']]

In [28]:
# nltk.download('stopwords')  # uncomment on first run to fetch the stopword corpus
english_stopwatch = set(stopwords.words('english'))
english_stopwatch

{'a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 'her',
 'here',
 'hers',
 'herself',
 'him',
 'himself',
 'his',
 'how',
 'i',
 'if',
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it's",
 'its',
 'itself',
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'only',
 'or',
 'other',
 'our',
 'ours',
 'ourselves',
 'out',
 'over',
 'own',
 'r

In [33]:
'Crust is not good'.translate(str.maketrans('', '', string.punctuation))

'Crust is not good'
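
The string in the cell above contains no punctuation, so the output is identical to the input. The sketch below applies the same `str.translate` call to every review in `text_data`, where the first review does contain punctuation. The names `remove_punct` and `clean_data` are introduced here for illustration.

```python
import string

text_data = [
    "Wow... love this place",
    "Crust is not good",
    "Not tasty and the texture was just nasty",
]

# Translation table that deletes every ASCII punctuation character.
remove_punct = str.maketrans('', '', string.punctuation)

clean_data = [text.translate(remove_punct) for text in text_data]
print(clean_data)
# ['Wow love this place', 'Crust is not good', 'Not tasty and the texture was just nasty']
```

Building the translation table once and reusing it avoids recreating it for every string.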