NLTK also requires additional data before it can be used effectively. This data includes pre-trained models, corpora, and other resources that NLTK uses to perform various NLP tasks. To download it, call nltk.download() from a Python session or script, as shown in Step 1 below.

## End-to-end Sentiment Analysis Example in Python
To perform sentiment analysis with NLTK in Python, we first preprocess the text using tokenization, stop word removal, and stemming or lemmatization. We then pass the preprocessed text to the VADER sentiment analyzer, which scores it, and we label each review as positive or negative.


### Step 1 - Import libraries and load dataset

First, we’ll import the necessary libraries for text analysis and sentiment analysis, such as pandas for data handling, nltk for natural language processing, and SentimentIntensityAnalyzer for sentiment analysis.

We’ll then download all NLTK data packages (corpora, pre-trained models, and lexicons) using nltk.download('all').

Once the environment is set up, we will load a dataset of Amazon reviews using pd.read_csv(). This will create a DataFrame object in Python that we can use to analyze the data. We'll display the contents of the DataFrame using df.

In [None]:
# import libraries
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# download all NLTK data packages (a large download; it only needs to run once)
nltk.download('all')

In [None]:
# Load the amazon review dataset
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/amazon.csv')
df

Unnamed: 0,reviewText,Positive
0,This is a one of the best apps acording to a b...,1
1,This is a pretty good version of the game for ...,1
2,this is a really cool game. there are a bunch ...,1
3,"This is a silly game and can be frustrating, b...",1
4,This is a terrific game on any pad. Hrs of fun...,1
...,...,...
19995,this app is fricken stupid.it froze on the kin...,0
19996,Please add me!!!!! I need neighbors! Ginger101...,1
19997,love it! this game. is awesome. wish it had m...,1
19998,I love love love this app on my side of fashio...,1


### Step 2 - Preprocess text
Let’s create a function preprocess_text in which we first tokenize the documents using the word_tokenize function from NLTK, then remove stop words using the stopwords module from NLTK, and finally lemmatize the filtered_tokens using WordNetLemmatizer from NLTK. The cells below first walk through each step on a single sample sentence.

In [None]:
text = "Please add me!!!!! I couldn't meet neighbors! Ginger101."

In [None]:
text.lower()

"please add me!!!!! i couldn't meet neighbors! ginger101."

In [None]:
word_tokenize(text.lower())

In [None]:
stopwords.words('english')

In [None]:
filtered_tokens = [token for token in word_tokenize(text.lower()) if token not in stopwords.words('english')]
filtered_tokens

['please',
 'add',
 '!',
 '!',
 '!',
 '!',
 '!',
 'could',
 "n't",
 'meet',
 'neighbors',
 '!',
 'ginger101',
 '.']

In [None]:
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]
lemmatized_tokens

['please',
 'add',
 '!',
 '!',
 '!',
 '!',
 '!',
 'could',
 "n't",
 'meet',
 'neighbor',
 '!',
 'ginger101',
 '.']

In [None]:
# create preprocess_text function

# Build the stop word set and the lemmatizer once, rather than once per token
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Tokenize the lowercased text
    tokens = word_tokenize(text.lower())

    # Remove stop words
    filtered_tokens = [token for token in tokens if token not in stop_words]

    # Lemmatize the tokens
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]

    # Join the tokens back into a string
    processed_text = ' '.join(lemmatized_tokens)
    return processed_text
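
One thing to keep in mind with the stop word removal step: NLTK's English stop word list contains negation words such as not and no, and VADER relies on negations to flip the polarity of the words that follow. Removing them before scoring can therefore change the result. A minimal sketch of one way to keep them is below; the helper remove_stop_words, the NEGATIONS set, and the small SAMPLE_STOP_WORDS list are all hypothetical stand-ins, chosen so the example runs without NLTK data.

```python
# Hypothetical helper: remove stop words but keep negation words.
# SAMPLE_STOP_WORDS is a tiny stand-in for stopwords.words('english'),
# used here so the sketch runs without downloading NLTK data.
SAMPLE_STOP_WORDS = {'this', 'is', 'a', 'the', 'not', 'no', 'nor'}
NEGATIONS = {'not', 'no', 'nor'}  # assumed set of words worth keeping


def remove_stop_words(tokens, stop_words, keep=frozenset()):
    """Drop tokens found in stop_words unless they are also in keep."""
    effective_stop_words = set(stop_words) - set(keep)
    return [token for token in tokens if token not in effective_stop_words]


tokens = ['this', 'game', 'is', 'not', 'good']

print(remove_stop_words(tokens, SAMPLE_STOP_WORDS))
# ['game', 'good']  (the negation is lost)

print(remove_stop_words(tokens, SAMPLE_STOP_WORDS, keep=NEGATIONS))
# ['game', 'not', 'good']
```

In preprocess_text, the same idea would amount to building the stop word set as set(stopwords.words('english')) - NEGATIONS. Whether that changes the results on this dataset is not something this tutorial measures.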

In [None]:
df['reviewText'].apply(lambda reviewText : preprocess_text(text=reviewText))

0        one best apps acording bunch people agree bomb...
1        pretty good version game free . lot different ...
2        really cool game . bunch level find golden egg...
3        silly game frustrating , lot fun definitely re...
4        terrific game pad . hr fun . grandkids love . ...
                               ...                        
19995    app fricken stupid.it froze kindle wont allow ...
19996    please add ! ! ! ! ! need neighbor ! ginger101...
19997    love ! game . awesome . wish free stuff house ...
19998    love love love app side fashion story fight wo...
19999    game rip . list thing make better & bull ; fir...
Name: reviewText, Length: 20000, dtype: object

In [None]:
df['reviewText'].apply(preprocess_text)

0        one best apps acording bunch people agree bomb...
1        pretty good version game free . lot different ...
2        really cool game . bunch level find golden egg...
3        silly game frustrating , lot fun definitely re...
4        terrific game pad . hr fun . grandkids love . ...
                               ...                        
19995    app fricken stupid.it froze kindle wont allow ...
19996    please add ! ! ! ! ! need neighbor ! ginger101...
19997    love ! game . awesome . wish free stuff house ...
19998    love love love app side fashion story fight wo...
19999    game rip . list thing make better & bull ; fir...
Name: reviewText, Length: 20000, dtype: object

In [None]:
type(df['reviewText'].apply(lambda reviewText : preprocess_text(text=reviewText)) )

pandas.core.series.Series

In [None]:
type(df['reviewText'].apply(preprocess_text))

pandas.core.series.Series

In [None]:
# apply the function to the reviewText column of df

df['reviewText'] = df['reviewText'].apply(preprocess_text)
df

Unnamed: 0,reviewText,Positive
0,one best apps acording bunch people agree bomb...,1
1,pretty good version game free . lot different ...,1
2,really cool game . bunch level find golden egg...,1
3,"silly game frustrating , lot fun definitely re...",1
4,terrific game pad . hr fun . grandkids love . ...,1
...,...,...
19995,app fricken stupid.it froze kindle wont allow ...,0
19996,please add ! ! ! ! ! need neighbor ! ginger101...,1
19997,love ! game . awesome . wish free stuff house ...,1
19998,love love love app side fashion story fight wo...,1

### Step 3 - Analyze sentiment with VADER

We now create a SentimentIntensityAnalyzer and call its polarity_scores() method, which returns the negative (neg), neutral (neu), and positive (pos) proportions of the text along with a normalized compound score between -1 and 1. We first try it on the sample sentence, then define a get_sentiment function that labels a review 1 (positive) when its pos score is greater than 0 and 0 otherwise, and apply it to every review.


In [None]:
analyzer = SentimentIntensityAnalyzer()
analyzer.polarity_scores(text)

{'neg': 0.0, 'neu': 0.634, 'pos': 0.366, 'compound': 0.5374}

In [None]:
# initialize NLTK sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

# create get_sentiment function
def get_sentiment(text):
    scores = analyzer.polarity_scores(text)
    # label the text 1 (positive) if any positive sentiment is detected, else 0
    sentiment = 1 if scores['pos'] > 0 else 0
    return sentiment

# apply get_sentiment function
df['sentiment'] = df['reviewText'].apply(get_sentiment)
df

Unnamed: 0,reviewText,Positive,sentiment
0,one best apps acording bunch people agree bomb...,1,1
1,pretty good version game free . lot different ...,1,1
2,really cool game . bunch level find golden egg...,1,1
3,"silly game frustrating , lot fun definitely re...",1,1
4,terrific game pad . hr fun . grandkids love . ...,1,1
...,...,...,...
19995,app fricken stupid.it froze kindle wont allow ...,0,0
19996,please add ! ! ! ! ! need neighbor ! ginger101...,1,1
19997,love ! game . awesome . wish free stuff house ...,1,1
19998,love love love app side fashion story fight wo...,1,1
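
The rule above labels a review as positive whenever VADER finds any positive-valence token (scores['pos'] > 0), which may help explain why most reviews end up labeled 1. A commonly used alternative is to threshold the compound score instead. The following is a minimal sketch, not the method used in this tutorial: the function name get_sentiment_compound and the 0.05 cutoff (the value conventionally suggested for VADER) are assumptions, and the second score dictionary is made up for illustration.

```python
# Hypothetical alternative to get_sentiment: threshold the compound score.
# It takes the dictionary returned by analyzer.polarity_scores(text),
# so it can be tried without loading NLTK.

def get_sentiment_compound(scores, threshold=0.05):
    """Return 1 (positive) if the compound score reaches the threshold, else 0."""
    return 1 if scores['compound'] >= threshold else 0


# Sample scores copied from the polarity_scores output shown earlier
sample_scores = {'neg': 0.0, 'neu': 0.634, 'pos': 0.366, 'compound': 0.5374}
print(get_sentiment_compound(sample_scores))

# A made-up score dictionary: some positive tokens, but negative overall
mixed_scores = {'neg': 0.4, 'neu': 0.5, 'pos': 0.1, 'compound': -0.6}
print(mixed_scores['pos'] > 0)               # True: the pos > 0 rule would label this 1
print(get_sentiment_compound(mixed_scores))  # the compound rule labels it 0
```

With the DataFrame from this tutorial, it could be applied as df['reviewText'].apply(lambda t: get_sentiment_compound(analyzer.polarity_scores(t))). Whether it improves on the results reported here would need to be checked on the data.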


In [None]:
from sklearn.metrics import confusion_matrix

# rows are the true labels (Positive), columns are the predicted labels (sentiment)
print(confusion_matrix(df['Positive'], df['sentiment']))

[[ 1131  3636]
 [  576 14657]]


In [None]:
from sklearn.metrics import classification_report

print(classification_report(df['Positive'], df['sentiment']))

              precision    recall  f1-score   support

           0       0.66      0.24      0.35      4767
           1       0.80      0.96      0.87     15233

    accuracy                           0.79     20000
   macro avg       0.73      0.60      0.61     20000
weighted avg       0.77      0.79      0.75     20000
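
The figures in the classification report follow directly from the confusion matrix, whose rows are the true labels and whose columns are the predicted labels. The sketch below redoes that arithmetic in plain Python using the four counts printed above. The variable names are illustrative, and the majority-class baseline at the end is an added point of comparison rather than something computed in the original notebook.

```python
# Counts copied from the confusion matrix printed above
# (rows = true label, columns = predicted label)
tn, fp = 1131, 3636    # true negatives, false positives
fn, tp = 576, 14657    # false negatives, true positives
total = tn + fp + fn + tp

accuracy = (tp + tn) / total

# Class 1 (positive reviews)
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

# Class 0 (negative reviews)
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)
f1_neg = 2 * precision_neg * recall_neg / (precision_neg + recall_neg)

# Accuracy of always predicting the majority class (positive)
baseline = (tp + fn) / total

print(f"accuracy: {accuracy:.2f}")
print(f"class 1:  precision={precision_pos:.2f} recall={recall_pos:.2f} f1={f1_pos:.2f}")
print(f"class 0:  precision={precision_neg:.2f} recall={recall_neg:.2f} f1={f1_neg:.2f}")
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

Recall for class 0 is low because 3,636 of the 4,767 negative reviews were labeled positive. Always predicting 1 would already be right for about 76% of reviews, so the 0.79 accuracy is best read against that baseline.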

