# Sentiment Analysis
- Sentiment Analysis, also known as opinion mining, is a technique used in natural language processing (NLP) to determine the sentiment expressed in a piece of text. 
- The sentiment can be positive, negative, or neutral. This technique is widely used in various applications such as customer feedback analysis, social media monitoring, market research, and more.

### Key Concepts in Sentiment Analysis:

- **Polarity:** Indicates whether the sentiment is positive, negative, or neutral.
- **Subjectivity:** Measures how subjective or objective a piece of text is.
- **Feature Extraction:** Converts text into numerical representations, often using techniques like Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), or word embeddings.
- **Classification Model:** A machine learning model trained to predict the sentiment of a given text based on extracted features.

# Implementation of Sentiment Analysis using TextBlob:
TextBlob is a simple library for processing textual data. It provides an easy-to-use API for common NLP tasks including sentiment analysis.

In [2]:
#Step 1: Install TextBlob
#First, you need to install the TextBlob library:

!pip install textblob



Collecting textblob
  Downloading textblob-0.18.0.post0-py3-none-any.whl.metadata (4.5 kB)
Downloading textblob-0.18.0.post0-py3-none-any.whl (626 kB)
   ---------------------------------------- 0.0/626.3 kB ? eta -:--:--
    --------------------------------------- 10.2/626.3 kB ? eta -:--:--
   - ------------------------------------- 30.7/626.3 kB 435.7 kB/s eta 0:00:02
   ----- --------------------------------- 92.2/626.3 kB 880.9 kB/s eta 0:00:01
   ---------------------------- ----------- 450.6/626.3 kB 3.1 MB/s eta 0:00:01
   ---------------------------------------- 626.3/626.3 kB 3.6 MB/s eta 0:00:00
Installing collected packages: textblob
Successfully installed textblob-0.18.0.post0


In [3]:
!python -m textblob.download_corpora

Finished.


[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\brown.zip.
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\movie_reviews.zip.


In [4]:
#Step 2: Perform Sentiment Analysis
#Here's how to use TextBlob for sentiment analysis:

from textblob import TextBlob

# Sample text
text = "I love this product! It's absolutely amazing."

# Create a TextBlob object
blob = TextBlob(text)

# Get the sentiment
sentiment = blob.sentiment

print(f"Polarity: {sentiment.polarity}")
print(f"Subjectivity: {sentiment.subjectivity}")


Polarity: 0.6125
Subjectivity: 0.75


In [5]:
#Step 1: Install Required Libraries
#Make sure to install the necessary libraries if you haven't already:
!pip install nltk scikit-learn



In [6]:
#Step 2: Import Libraries and Load Data
#We'll use the movie_reviews dataset from the nltk library, which contains positive and negative movie reviews.
import nltk
from nltk.corpus import movie_reviews
import random

# Download the movie_reviews dataset if not already downloaded
nltk.download('movie_reviews')

# Load the movie reviews dataset
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# Shuffle the documents to mix positive and negative reviews
random.shuffle(documents)


[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


In [7]:
#Step 3: Preprocess the Text Data
#Convert the text into a format suitable for machine learning.

In [8]:
from nltk.corpus import stopwords
from nltk import FreqDist
from nltk.tokenize import word_tokenize

# Download stopwords if not already downloaded
nltk.download('stopwords')
nltk.download('punkt')

# Get the list of all words in the dataset
all_words = FreqDist(word.lower() for word in movie_reviews.words())

# Get the most common 2000 words as features
word_features = list(all_words)[:2000]

# Function to extract features from a document
def document_features(document):
    document_words = set(document)
    features = {}
    for word in word_features:
        features[f'contains({word})'] = (word in document_words)
    return features

# Create feature sets for all documents
featuresets = [(document_features(d), c) for (d, c) in documents]


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [9]:
#Step 4: Split the Data into Training and Testing Sets
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
train_set, test_set = train_test_split(featuresets, test_size=0.2, random_state=42)

# Separate the features and labels
X_train, y_train = zip(*train_set)
X_test, y_test = zip(*test_set)


In [10]:
#Step 5: Train a Classifier
#We will use a Naive Bayes classifier, which is suitable for text classification tasks.
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy as nltk_accuracy

# Train the Naive Bayes classifier
classifier = NaiveBayesClassifier.train(train_set)

# Evaluate the classifier
print(f'Accuracy: {nltk_accuracy(classifier, test_set) * 100:.2f}%')

# Show the most informative features
classifier.show_most_informative_features(10)


Accuracy: 80.25%
Most Informative Features
   contains(outstanding) = True              pos : neg    =     20.9 : 1.0
        contains(seagal) = True              neg : pos    =      7.0 : 1.0
         contains(damon) = True              pos : neg    =      6.4 : 1.0
        contains(wasted) = True              neg : pos    =      6.4 : 1.0
          contains(lame) = True              neg : pos    =      5.7 : 1.0
         contains(awful) = True              neg : pos    =      5.1 : 1.0
        contains(poorly) = True              neg : pos    =      5.0 : 1.0
    contains(ridiculous) = True              neg : pos    =      5.0 : 1.0
   contains(wonderfully) = True              pos : neg    =      4.8 : 1.0
        contains(allows) = True              pos : neg    =      4.7 : 1.0
