<a href="https://colab.research.google.com/github/Natural-Language-Processing-YU/Exercises/blob/main/M3_exercise_naive_bayes_sentiment_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Naïve Bayes and Sentiment Classification

## Objective:
Students will understand the Naïve Bayes algorithm, its application in Natural Language Processing (NLP), particularly for sentiment classification, and will be able to implement a basic Naïve Bayes classifier in Python.


## Naïve Bayes Algorithm
Naïve Bayes is a probabilistic classifier based on Bayes' theorem, which calculates the probability of a class given the presence of features. It is called "naïve" because it assumes that the features are conditionally independent of each other given the class label, which is rarely true in practice but simplifies the calculations significantly.

### Bayes' Theorem
Bayes' theorem is stated as:

$$ P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)} $$

Where:
- $ P(C|X) $ is the posterior probability of class $ C $ given the feature vector $ X $.
- $ P(X|C) $ is the likelihood of feature vector $ X $ given class $ C $.
- $ P(C) $ is the prior probability of class $ C $.
- $ P(X) $ is the probability of the feature vector $ X $ (which can be considered a normalization factor).
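
As a small numeric illustration of the formula, the sketch below computes the posterior for two classes from a prior and a likelihood. The function name `posterior` and the numbers are assumptions chosen for illustration; they are not taken from the lesson's data.

```python
def posterior(priors, likelihoods):
    """Return P(C|X) for every class C via Bayes' theorem.

    priors:      dict mapping class -> P(C)
    likelihoods: dict mapping class -> P(X|C) for one fixed observation X
    """
    # Numerator of Bayes' theorem for each class: P(X|C) * P(C)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # P(X) is the sum of the numerators over all classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Illustrative values (assumed)
priors = {"Positive": 0.5, "Negative": 0.5}
likelihoods = {"Positive": 0.2, "Negative": 0.05}

result = posterior(priors, likelihoods)
print(result)
```

Because $ P(X) $ is the same for every class, dividing by it rescales the scores so they sum to 1 but does not change which class scores highest.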

## Naïve Bayes Classifier
In the context of text classification, documents are represented as feature vectors. The Naïve Bayes classifier calculates the probability of each class given the document's features and assigns the document to the class with the highest probability.

### Steps in Naïve Bayes Classification:

1. **Training Phase:**
    - **Calculate Prior Probabilities:** The prior probability of each class $ P(C) $ is estimated as the fraction of training documents that belong to that class.
    - **Calculate Likelihood:** The likelihood $ P(w|C) $ is estimated for each word $ w $ in the vocabulary given each class, typically as the relative frequency of the word among all words in the documents of that class. Under the independence assumption, $ P(X|C) $ is the product of these per-word likelihoods.

2. **Classification Phase:**
    - For a new document, the classifier computes the posterior probability for each class using the features present in the document.
    - The document is assigned to the class with the highest posterior probability.
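
The two phases above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the helper names (`tokenize`, `train`, `classify`), the simple regular-expression tokenizer, the use of add-one smoothing, and the switch to log probabilities (to avoid underflow when multiplying many small numbers) are all assumptions of this sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a simplifying assumption)
    return re.findall(r"[a-z']+", text.lower())


def train(docs, labels):
    """Training phase: estimate log priors and smoothed log likelihoods."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}

    log_prior = {c: math.log(n / len(labels)) for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing (an assumption of this sketch) avoids log(0)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(doc, log_prior, log_likelihood, vocab):
    """Classification phase: return the class with the highest log score."""
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are skipped
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokenize(doc) if w in vocab
        )
    return max(scores, key=scores.get)


# Four short labeled reviews as toy training data
docs = ["The movie was great!", "I didn't like the movie.",
        "The acting was fantastic!", "It was a terrible movie."]
labels = ["Positive", "Negative", "Positive", "Negative"]

model = train(docs, labels)
print(classify("The acting was great", *model))
```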

## Example: Sentiment Analysis
Consider a simple example of classifying movie reviews as positive or negative. Suppose we have the following training data:

| Sentence                   | Label    |
|----------------------------|----------|
| "The movie was great!"     | Positive |
| "I didn't like the movie." | Negative |
| "The acting was fantastic!"| Positive |
| "It was a terrible movie." | Negative |

1. **Calculate Priors:**
$$ P(\text{Positive}) = \frac{2}{4} = 0.5 $$
$$ P(\text{Negative}) = \frac{2}{4} = 0.5 $$

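2. **Calculate Likelihoods:**
For each word in the vocabulary, estimate its likelihood given the class as the word's count in that class divided by the total number of words in that class. After lowercasing and stripping punctuation, the Positive sentences contain 8 words in total and the Negative sentences contain 10. For example:
$$ P(\text{great}|\text{Positive}) = \frac{1}{8} = 0.125 $$
$$ P(\text{terrible}|\text{Negative}) = \frac{1}{10} = 0.1 $$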

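3. **Classification:**
For a new sentence, e.g., "The acting was great", the posterior is proportional to the prior times the product of the word likelihoods (the denominator $ P(X) $ is the same for every class and can be dropped). With word likelihoods estimated as relative frequencies among the 8 words of the Positive sentences:
$$ P(\text{Positive}|\text{the acting was great}) \propto P(\text{Positive}) \cdot P(\text{the}|\text{Positive}) \cdot P(\text{acting}|\text{Positive}) \cdot P(\text{was}|\text{Positive}) \cdot P(\text{great}|\text{Positive}) $$
$$ = 0.5 \cdot \frac{2}{8} \cdot \frac{1}{8} \cdot \frac{2}{8} \cdot \frac{1}{8} = \frac{1}{2048} \approx 0.00049 $$

Similarly, calculate the score for the Negative class and compare. Here "acting" and "great" never occur in a Negative sentence, so their unsmoothed likelihoods are 0 and the Negative score is 0; the sentence is classified as Positive.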
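
The hand calculation can be checked with a short script. The sketch below assumes that words are lowercased, that punctuation is stripped, and that each likelihood is the unsmoothed relative frequency of the word among all words of the class. It uses exact fractions so the results are not blurred by rounding. The helper names (`tokenize`, `likelihood`, `score`) are illustrative.

```python
import re
from collections import Counter
from fractions import Fraction

training = [
    ("The movie was great!", "Positive"),
    ("I didn't like the movie.", "Negative"),
    ("The acting was fantastic!", "Positive"),
    ("It was a terrible movie.", "Negative"),
]


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (assumed tokenization)
    return re.findall(r"[a-z']+", text.lower())


# Priors: fraction of training sentences carrying each label
label_counts = Counter(label for _, label in training)
priors = {c: Fraction(n, len(training)) for c, n in label_counts.items()}

# Per-class word counts
word_counts = {c: Counter() for c in label_counts}
for sentence, label in training:
    word_counts[label].update(tokenize(sentence))


def likelihood(word, c):
    # Unsmoothed relative frequency of `word` among all words of class `c`
    return Fraction(word_counts[c][word], sum(word_counts[c].values()))


def score(sentence, c):
    # Unnormalized posterior: P(c) times the product of per-word likelihoods
    result = priors[c]
    for word in tokenize(sentence):
        result *= likelihood(word, c)
    return result


for c in priors:
    print(c, score("The acting was great", c))
```

Under these assumptions the Positive score is 1/2048 and the Negative score is exactly 0, because "acting" and "great" do not appear in any Negative sentence. A single unseen word zeroing out a whole class is the usual motivation for smoothing the counts.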

### Advantages:
- **Simplicity:** Easy to implement and understand.
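- **Efficiency:** Training requires only a single pass over the data to collect counts, and prediction is a product (or a sum of logarithms) over the words in the document, so it works well with large datasets.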
- **Scalability:** Scales linearly with the number of features and data points.

### Limitations:
- **Conditional Independence Assumption:** The assumption that features are independent given the class label is rarely true in practice.
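- **Data Scarcity:** Performance can be poor if the training data is limited. In particular, a word that never occurs with a class in training gets an unsmoothed likelihood of zero, which zeroes out the whole product for that class; smoothing (for example, add-one smoothing) is commonly used to counter this.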
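
To make the independence limitation concrete, the sketch below uses hypothetical likelihood values (assumed for illustration, not estimated from the lesson's data). Suppose "great" and "fantastic" almost always appear together in reviews, so that seeing the second word adds little new information. Naïve Bayes still treats it as separate evidence, and the posterior becomes more extreme.

```python
def positive_posterior(words, likelihoods, prior_pos=0.5):
    """P(Positive | words) under the naive independence assumption."""
    pos = prior_pos
    neg = 1.0 - prior_pos
    for w in words:
        p_pos, p_neg = likelihoods[w]
        pos *= p_pos  # each word multiplies in as independent evidence
        neg *= p_neg
    return pos / (pos + neg)


# Hypothetical (P(w|Positive), P(w|Negative)) pairs, chosen for illustration
likelihoods = {"great": (0.2, 0.1), "fantastic": (0.2, 0.1)}

once = positive_posterior(["great"], likelihoods)
twice = positive_posterior(["great", "fantastic"], likelihoods)
print(once, twice)
```

With these numbers the posterior rises from about 0.67 to 0.8 when the second, correlated word is added. The predicted class is often still correct, but the probabilities tend to be overconfident when features are strongly correlated.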

## Implementation in Python
Here is a basic example of implementing Naïve Bayes for sentiment analysis using Python and the scikit-learn library (imported as `sklearn`):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Sample data
X_train = ["The movie was great!", "I didn't like the movie.", "The acting was fantastic!", "It was a terrible movie."]
y_train = ["Positive", "Negative", "Positive", "Negative"]

X_test = ["The movie was awesome!", "I hated the movie."]

# Vectorize the data: learn the vocabulary from the training set only,
# then map both sets to word-count vectors. Words not seen in training
# (such as "awesome" and "hated") are ignored by transform().
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Train Naïve Bayes classifier
clf = MultinomialNB()
clf.fit(X_train_counts, y_train)

# Predict
y_pred = clf.predict(X_test_counts)

# Output results
for doc, category in zip(X_test, y_pred):
    print(f'{doc} => {category}')
```

By following this lesson plan and detailed explanation, students will gain a solid understanding of Naïve Bayes and its application in sentiment classification.