Develop a Python program that performs sentiment analysis on Amazon reviews and includes the following steps:
1. Implement a sentiment analysis model using spaCy
2. Preprocess the text data
3. Create a function for sentiment analysis
4. Test the model on sample reviews
5. Write a brief report in a PDF file


In [53]:
# import relevant packages
import spacy
import pandas as pd
from spacytextblob.spacytextblob import SpacyTextBlob

In [58]:
# load the pipeline from spaCy (small)
nlp = spacy.load('en_core_web_sm')
# add spacytextblob to be used for sentiment and polarity
nlp.add_pipe('spacytextblob')

<spacytextblob.spacytextblob.SpacyTextBlob at 0x20f8fee1490>

In [59]:
# read the CSV file into a DataFrame called 'df'
df = pd.read_csv('amazon_product_reviews.csv')
# select the 'reviews.text' column and drop any rows containing NaNs
reviews_data = df['reviews.text']
reviews_data = reviews_data.dropna()


Create a function that takes in the review text and outputs a sentiment prediction.
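A sentiment prediction can be reported as a raw polarity score or as a label derived from it. As a minimal sketch, assuming the score is a float in [-1.0, 1.0], a score could be mapped to a coarse label as follows. The helper name `label_from_polarity` and the 0.1 threshold are hypothetical choices for illustration and are not part of the original notebook.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    The threshold is a hypothetical cut-off chosen for illustration only.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# try the helper on a few made-up scores
for score in (0.35, -0.6, 0.05):
    print(f"{score:+.2f} -> {label_from_polarity(score)}")
```

A suitable threshold depends on the data, so it is worth checking against a handful of reviews whose sentiment is already known.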

In [60]:
# preprocessing: convert text to lower case, remove special characters, remove stop words, perform lemmatisation
# note: negations are not handled (e.g. transforming 'not good' to 'not_good'); 'not' is dropped as a stop word
def sentiment_analysis(text):
    # tokenise the text to produce a doc object
    doc = nlp(text)
    # keep the lower-cased lemma of each token that is not a stop word
    # and that contains only letters or numbers
    filtered_tokens = [
        token.lemma_.lower()
        for token in doc
        if not token.is_stop and token.lemma_.lower().isalnum()
    ]
    # join the list together to become a string
    new_text = ' '.join(filtered_tokens)
    # convert the string to doc object so polarity and sentiment can be applied
    new_text_doc = nlp(new_text)
    # polarity is a float between -1.0 (negative) and 1.0 (positive)
    # blob.sentiment[0] holds the same value, so averaging the two changes nothing
    polarity = new_text_doc._.blob.polarity
    # print sentiment prediction
    print(f"Sentiment prediction: {polarity}")

# ask user for index of review data to input into sentiment function
review_index = int(input("Enter index for the review data you would like a sentiment prediction: "))
text = reviews_data[review_index]
# print the review text and sentiment prediction
print(f"""Review text:
      {text}
      """)
sentiment_analysis(text)

Review text:
      This my second order and they seem to work as good as name brand and ship to my door.
      
Sentiment prediction: 0.35
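
The cell comments mention transforming 'not good' to 'not_good'. A minimal sketch of that transformation, using only the standard library, might look like the following. The function name `join_negations` and the short negation list are assumptions made for illustration.

```python
import re

# Hypothetical, deliberately short list of negation words for illustration.
NEGATIONS = ("not", "no", "never")
NEGATION_PATTERN = re.compile(
    r"\b(" + "|".join(NEGATIONS) + r")\s+(\w+)", re.IGNORECASE
)


def join_negations(text):
    """Join a negation word to the word that follows it with an underscore."""
    return NEGATION_PATTERN.sub(r"\1_\2", text)


print(join_negations("This is not good and I would never buy it again"))
```

If this were applied before the filtering step, note that a joined token such as 'not_good' contains an underscore and would fail the `isalnum()` check, so that check would need to be relaxed as well.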


Create a function that compares two reviews and outputs a similarity measure.

In [62]:
def compare_two_reviews(review_one, review_two):
    # convert the two reviews into NLP doc objects
    doc1 = nlp(review_one)
    doc2 = nlp(review_two)
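    # calculate the similarity and print
    # note: 'en_core_web_sm' ships without word vectors, so spaCy warns that this score may not be a useful similarity judgement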
    print(f"Similarity: {doc1.similarity(doc2)}")

# ask user to input the index of the reviews they would like to compare
review_one_index = int(input("Enter index for review one to be compared: "))
review_two_index = int(input("Enter index for the second review: "))
# call the comparison function
compare_two_reviews(reviews_data[review_one_index], reviews_data[review_two_index])

Similarity: 0.3892909317696686
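
By default, spaCy's `Doc.similarity` is described as a cosine similarity over the vectors of the two documents. The sketch below shows that calculation on made-up three-dimensional vectors; the `cosine_similarity` helper and the vectors are illustrative stand-ins, not spaCy's implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # a zero vector has no direction, so report no similarity
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for document vectors.
review_a = [1.0, 2.0, 0.0]
review_b = [2.0, 4.0, 0.0]  # same direction as review_a
review_c = [0.0, 0.0, 3.0]  # orthogonal to review_a

print(cosine_similarity(review_a, review_b))
print(cosine_similarity(review_a, review_c))
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, which is how a value such as the one above can be read.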


