# 1. Introduction
## 1.1 Definition
**Sentiment Analysis** is a Natural Language Processing **(NLP)** technique used to determine the emotional tone or attitude expressed in a piece of text.

It involves analyzing the text to classify opinions or emotions as *positive, negative, or neutral.* The process can go beyond basic classification by detecting more nuanced emotions, such as *joy, anger, sadness, or fear.*
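To make the positive/negative/neutral idea concrete, here is a minimal rule-based sketch. It is not the machine-learning approach this notebook builds. It assumes a tiny, hypothetical word lexicon (`POSITIVE_WORDS`, `NEGATIVE_WORDS`) and labels a text by counting matches. Real lexicon-based tools use far larger, curated word lists.

```python
# Hypothetical toy lexicon for illustration only; real systems use large,
# curated lexicons or a trained classifier.
POSITIVE_WORDS = {"love", "fantastic", "great", "good", "happy"}
NEGATIVE_WORDS = {"worst", "hate", "bad", "terrible", "awful"}


def lexicon_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    # Lowercase and strip surrounding punctuation from each whitespace-separated word
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("I love this product!"))           # positive
print(lexicon_sentiment("This is the worst experience."))  # negative
print(lexicon_sentiment("I'm not sure if I like it."))     # neutral
```

A rule like this is easy to read but brittle: it only knows the words in its lists and ignores context such as negation, which is one reason trained models are often preferred.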

# 2. Import libraries
Importing required libraries

```python
# Natural Language Toolkit for text processing
import nltk

# To remove common stopwords from the text
from nltk.corpus import stopwords

# For tokenizing the text into individual words
from nltk.tokenize import word_tokenize

# For converting text data to numerical features
from sklearn.feature_extraction.text import TfidfVectorizer

# To split the dataset into training and testing sets
from sklearn.model_selection import train_test_split

# The Naive Bayes algorithm for classification
from sklearn.naive_bayes import MultinomialNB

# For evaluating the performance of the model
from sklearn.metrics import classification_report
```

# 3. Downloading NLTK data
Downloading the NLTK data required for tokenization and stopword removal

```python
nltk.download('punkt')       # Tokenizer models used by word_tokenize
nltk.download('punkt_tab')   # Tokenizer data that word_tokenize requires in newer NLTK releases
nltk.download('stopwords')   # Stopword lists, including English
```

# 4. Data Collection
Creating a small dataset of text samples with associated sentiment labels


```python
texts = [
    "I love this product!",  # Positive sentiment
    "This is the worst experience.",  # Negative sentiment
    "I'm not sure if I like it.",  # Neutral sentiment
    "Absolutely fantastic!"  # Positive sentiment
]
# Corresponding sentiment labels for the texts
labels = ['positive', 'negative', 'neutral', 'positive']
```
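Before training, it can help to check how the labels are distributed. A minimal sketch using Python's standard `collections.Counter` on the same `labels` list: with only one negative and one neutral example, any train/test split of these four samples leaves those classes present on only one side of the split.

```python
from collections import Counter

# Same labels as the dataset above
labels = ['positive', 'negative', 'neutral', 'positive']

label_counts = Counter(labels)
print(label_counts)  # Counter({'positive': 2, 'negative': 1, 'neutral': 1})

# Share of each class in the dataset
for label, count in label_counts.items():
    print(f"{label}: {count / len(labels):.0%}")
```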

# 5. Text Preprocessing
The text is preprocessed to clean and standardize it. This involves:
* converting to lowercase,
* tokenizing, and
* removing stopwords.

```python
# Build the stopword set once: set lookups are fast, and the NLTK word list
# is no longer re-read for every token
STOP_WORDS = set(stopwords.words('english'))

def preprocess_text(text):
    # Convert to lowercase, then tokenize (split into words and punctuation)
    tokens = word_tokenize(text.lower())
    # Remove stopwords (common words like "the", "is"); note that NLTK's list also contains negations such as "not"
    tokens = [word for word in tokens if word not in STOP_WORDS]
    # Join the tokens back into a single string
    return ' '.join(tokens)
```

```python
# Apply text preprocessing to each text sample in the dataset
texts = [preprocess_text(text) for text in texts]
texts
```

```
['love product !',
 'worst experience .',
 "'m sure like .",
 'absolutely fantastic !']
```
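The output shows that "not" was removed from the third sentence, because NLTK's English stopword list includes negations, which can reverse the sentiment of a phrase. One possible variation, sketched here under stated assumptions, is to exclude negation words from the stopword set. The `NEGATIONS` set and the `remove_stopwords` helper are hypothetical, and the small `demo_stop_words` set stands in for `set(stopwords.words('english'))` so the example runs without NLTK.

```python
# Hypothetical set of negation words to keep; not an NLTK constant.
NEGATIONS = {"no", "nor", "not"}


def remove_stopwords(tokens, stop_words, keep=NEGATIONS):
    """Drop stopwords from tokens, except any word listed in `keep`."""
    effective = set(stop_words) - set(keep)
    return [tok for tok in tokens if tok not in effective]


# Stand-in stopword set for illustration; in the notebook you would pass
# set(stopwords.words('english')) instead.
demo_stop_words = {"i", "if", "it", "not", "is", "this"}
tokens = ["i", "'m", "not", "sure", "if", "i", "like", "it", "."]
print(remove_stopwords(tokens, demo_stop_words))
# ["'m", 'not', 'sure', 'like', '.']
```

Whether keeping negations improves accuracy depends on the data and model; with a dataset this small it is mainly a way to avoid discarding information that matters for sentiment.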

# 6. Feature Extraction
The preprocessed text is converted into numerical features using **TF-IDF** (*Term Frequency-Inverse Document Frequency*), which weighs each word by how often it occurs in a sample (term frequency) and down-weights words that occur in many samples of the dataset (inverse document frequency).
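To show what the weighting does, here is a plain-Python sketch that assumes scikit-learn's documented defaults for `TfidfVectorizer`: raw term counts, smoothed IDF $\text{idf}(t) = \ln\frac{1+n}{1+\text{df}(t)} + 1$ (with $n$ documents and $\text{df}(t)$ the number of documents containing $t$), L2-normalized rows, and the default token pattern that ignores punctuation and one-character tokens. The `tfidf` function is a hypothetical illustration, not the vectorizer's actual code.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn's TfidfVectorizer:
# words of two or more word characters (punctuation is dropped).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tfidf(docs):
    """Return (vocab, rows): sorted vocabulary and one L2-normalized TF-IDF row per document."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        rows.append([v / norm if norm else 0.0 for v in vec])
    return vocab, rows


docs = ['love product !', 'worst experience .', "'m sure like .", 'absolutely fantastic !']
vocab, rows = tfidf(docs)
print(vocab)
# ['absolutely', 'experience', 'fantastic', 'like', 'love', 'product', 'sure', 'worst']
print([round(v, 3) for v in rows[0]])
# [0.0, 0.0, 0.0, 0.0, 0.707, 0.707, 0.0, 0.0]

# A word shared by every document gets a lower weight than a distinctive one
vocab2, rows2 = tfidf(["good movie", "bad movie"])
print(vocab2)  # ['bad', 'good', 'movie']
```

In the notebook's four preprocessed samples no word appears in more than one sample, so every IDF value is the same and each row reduces to two equal weights of about 0.707. The second example shows the IDF effect: in "good movie", "good" receives a higher weight than "movie", which appears in both documents.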