aouataf-djillani/Amazon-review-sentiment-analysis

Amazon review data analysis using SVM

In this repository we provide sentiment analysis using a supervised machine learning method. In a previous project, we applied VADER (Valence Aware Dictionary and sEntiment Reasoner), a sentiment intensity analyser implemented in NLTK, to analyse our unlabeled Amazon reviews dataset and obtained a performance score of 71%. Please refer to my article on VADER. Our main goal here is to achieve better performance in predicting positive and negative reviews.

Dataset

For the VADER classifier, we used a labeled dataset consisting of 10,000 reviews of Amazon products.

   label  review
0  pos    Stuning even for the non-gamer: This sound tra...
1  pos    The best soundtrack ever to anything.: I'm rea...
2  pos    Amazing!: This soundtrack is my favorite music...
3  pos    Excellent Soundtrack: I truly like this soundt...
4  pos    Remember, Pull Your Jaw Off The Floor After He...

Steps

  1. Exploring: our exploratory analysis showed that the data is balanced between positive and negative reviews.
  2. Cleaning and prepping: handling empty records and splitting the data into training and test sets.
  3. Feature extraction: using TF-IDF (term frequency-inverse document frequency) to measure the relevance of words in the reviews.
  4. Training and testing: training and testing the SVM model using scikit-learn.
  5. Visualizing the performance results: using matplotlib and seaborn to show the classification report and the confusion matrix, which compare our classification results with a gold standard (manual labels).
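
As a rough illustration of steps 2 to 4, the sketch below splits a handful of made-up reviews, extracts TF-IDF features, and trains a linear SVM with scikit-learn. The toy reviews, the choice of `LinearSVC`, the use of `make_pipeline`, and the split parameters are all assumptions made for the example; the repository's actual code may differ. Only the names `my_model`, `X_test`, and `y_test` are taken from the snippets in the Results section.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in data; the real project uses 10,000 Amazon reviews.
reviews = [
    "Amazing soundtrack, my favorite music",
    "Excellent product, works great",
    "Great quality, I love it",
    "Best purchase ever, highly recommend",
    "Terrible quality, broke after a day",
    "Awful product, waste of money",
    "Worst purchase ever, do not buy",
    "Bad sound, very disappointed",
]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

# Step 2: hold out part of the data for testing (split size chosen arbitrarily here).
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, stratify=labels, random_state=42
)

# Steps 3 and 4: TF-IDF features feed a linear SVM; the pipeline fits both on the training set only.
my_model = make_pipeline(TfidfVectorizer(), LinearSVC())
my_model.fit(X_train, y_train)

predictions = my_model.predict(X_test)
print(list(zip(y_test, predictions)))
```

Wrapping the vectorizer and the classifier in one pipeline means `my_model.predict` accepts raw review text, and the TF-IDF vocabulary is learned from the training split alone.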

Results

Our model achieved an accuracy of 87%. The model's difficulty in identifying some negative reviews could be due to sarcastic comments. This could be the subject of further analysis.

Classification Report

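# Visualizing the classification report
import pandas as pd
import seaborn as sns
from sklearn.metrics import classification_report

predictions = my_model.predict(X_test)
report = classification_report(y_test, predictions, output_dict=True)

# One row per class plus the accuracy and average rows, rounded to two decimals
df_report = pd.DataFrame(report).transpose().round(2)

# Shade the cells with a light green gradient (rendered when run in a notebook)
cm = sns.light_palette("green", as_cmap=True)
df_report.style.background_gradient(cmap=cm)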
              precision  recall  f1-score  support
neg                0.86    0.89      0.87     1649
pos                0.89    0.85      0.87     1651
accuracy           0.87    0.87      0.87     0.87
macro avg          0.87    0.87      0.87     3300
weighted avg       0.87    0.87      0.87     3300
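
To make the relationship between the columns concrete, the F1 score is the harmonic mean of precision and recall, and the macro average is the unweighted mean over classes. The short check below recomputes both from the rounded precision and recall values in the report; because it starts from rounded inputs, it only approximates what scikit-learn computes from the raw predictions. The helper name `f1_from` is made up for this example.

```python
# Precision and recall per class, copied from the classification report above.
precision_recall = {"neg": (0.86, 0.89), "pos": (0.89, 0.85)}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = {label: f1_from(p, r) for label, (p, r) in precision_recall.items()}

# Macro average: plain mean of the per-class scores, ignoring class sizes.
macro_f1 = sum(f1.values()) / len(f1)

for label, value in f1.items():
    print(f"{label}: F1 = {value:.2f}")   # both classes print 0.87
print(f"macro avg F1 = {macro_f1:.2f}")   # prints 0.87
```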

Confusion Matrix

# Visualizing the confusion matrix
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

predictions = my_model.predict(X_test)
cm = confusion_matrix(y_test, predictions)

# Draw the counts as an annotated heatmap
ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt='g', ax=ax, cmap='Greens')

# Labels, title and ticks
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
ax.xaxis.set_ticklabels(['neg', 'pos'])
ax.yaxis.set_ticklabels(['neg', 'pos'])

[Figure: confusion matrix heatmap, true labels vs. predicted labels]

Requirements and Setup

Virtual Environment Setup

python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt

Requirements Installation

pip install -r requirements.txt