# Sentiment Analysis for Customer Reviews Challenge

## Challenge:
Develop a robust Sentiment Analysis classifier for XYZ customer reviews, automating the categorization into positive, negative, or neutral sentiments. Utilize Natural Language Processing (NLP) techniques, exploring different sentiment analysis methods.

## Problem Statement:
XYZ organization, a global online retail giant, accumulates a vast number of customer reviews daily. Extracting sentiments from these reviews offers insights into customer satisfaction, product quality, and market trends. The challenge is to create an effective sentiment analysis model that accurately classifies XYZ customer reviews.

### Important Instructions:

1. Make sure this ipynb file that you have cloned is in the __Project__ folder on the Desktop. The Dataset is also available in the same folder.
2. Ensure that all the cells in the notebook can be executed without any errors.
3. Once the Challenge has been completed, save the SentimentAnalysis.ipynb notebook in the __*Project*__ Folder on the desktop. If the file is not present in that folder, autoevalution will fail.
4. Print the evaluation metrics of the model. 
5. Before you submit the challenge for evaluation, please make sure you have assigned the Accuracy score of the model that was created for evaluation.
6. Assign the Accuracy score obtained for the model created in this challenge to the specified variable in the predefined function *submit_accuracy_score*. The solution is to be written between the comments `# code starts here` and `# code ends here`
7. Please do not make any changes to the variable names and the function name *submit_accuracy_score* as this will be used for automated evaluation of the challenge. Any modification in these names will result in unexpected behaviour.

Imports

In [9]:
import pandas as pd
import numpy as np
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
import transformers
print(transformers.__version__)
from transformers import pipeline
nltk.download('stopwords')
pd.set_option('display.max_colwidth',None)

4.32.1


2023-11-22 09:08:08.634780: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-11-22 09:08:10.390984: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-22 09:08:10.391048: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-22 09:08:10.391093: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-22 09:08:10.500020: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: A

Read Review data

In [2]:
def read_data():
    df=pd.read_csv('Reviews.csv')
    sample_text=df['Text'].tolist()
    return df,sample_text


#Data Cleaning

In [3]:

 
# Download NLTK resources (uncomment the following lines if you haven't downloaded them yet)
# nltk.download('stopwords')
# nltk.download('punkt')
 
def clean_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove special characters, numbers, and punctuation
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    # Tokenize the text
    tokens = word_tokenize(text)
    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # Perform stemming
    stemmer = PorterStemmer()
    tokens = [stemmer.stem(word) for word in tokens]
    # Rejoin the tokens into a single string
    cleaned_text = ' '.join(tokens)
    return cleaned_text
 


Function to set newtral sentiment

In [4]:
def set_neutral(sentiment_list):
    sentiment_list_2=[]
    for dct in sentiment_list:
        if dct['score']<0.95:
            dct['label']='NEUTRAL'
        sentiment_list_2.append(dct)
    return sentiment_list_2

Extract the labels/sentiments

In [5]:
def extract_label(sentiment_list_2):
    label=[]
    for dct in sentiment_list_2:
        label.append(dct['label'])
    return label

In [8]:
def main():
    df,sample_text=read_data()
    sample_text=sample_text[:100]

    filtered_list=[]
    for review in sample_text:
        filtered_review=clean_text(review)
        filtered_list.append(filtered_review)
        
    #defining the pipeline
    sentiment_pipeline = pipeline("sentiment-analysis")

    #generating the sentiments 
    sentiment_list=sentiment_pipeline(filtered_list)

    #setting neutral label wrt threshold
    sentiment_list_2=set_neutral(sentiment_list)

    #extracting the labels
    label=extract_label(sentiment_list_2)

    #reducing to 100 samples
    df2=df.head(100)

    df2['Sentiment']=np.array(label)

    df2[['Text','Sentiment']]
    
    return df2
    

In [9]:
df2=main()

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2['Sentiment']=np.array(label)


In [18]:
import requests
# Define the URL of the Flask app
url = 'http://127.0.0.1:5000/analyze_sentiment'
query = {'text': 'i am bad'}

# Making request to flask API
response = requests.post(url, json=query)

# flask API response for given input data
results = response.json()
print(results)

{'sentiment': 'NEG'}
