ChrisBarton107/Amazon_Vine_Analysis

Amazon_Vine_Analysis

Overview

This analysis used PySpark to run the extract, transform, and load (ETL) process on one dataset (US Music) from Amazon. The ETL process ran in a Google Colaboratory notebook, which connected to an Amazon Web Services Relational Database Service (AWS RDS) instance and loaded the data into a PostgreSQL database.
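The ETL flow described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function names, the `vine_table` table name, the tab-separated input format, and the selected column names are all assumptions. The caller supplies the source path, RDS endpoint, and credentials.

```python
def build_jdbc_url(host, port, database):
    """Build a PostgreSQL JDBC URL for an RDS endpoint (hypothetical helper)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def run_etl(spark, source_path, jdbc_url, user, password, table="vine_table"):
    """Extract a review file, keep the Vine-related columns, and load them.

    `spark` is an existing SparkSession. The column names below are assumed,
    not taken from the repository.
    """
    # Extract: read the review file (assumed tab-separated with a header row).
    reviews = spark.read.csv(source_path, sep="\t", header=True, inferSchema=True)

    # Transform: keep only the columns needed for the Vine comparison.
    vine = reviews.select(
        "review_id", "star_rating", "helpful_votes",
        "total_votes", "vine", "verified_purchase",
    )

    # Load: append the rows to a PostgreSQL table over JDBC.
    vine.write.jdbc(
        url=jdbc_url,
        table=table,
        mode="append",
        properties={
            "user": user,
            "password": password,
            "driver": "org.postgresql.Driver",
        },
    )
    return vine
```

To run it, a `SparkSession` would first need to be created in the notebook with the PostgreSQL JDBC driver on its classpath. The URL would come from something like `build_jdbc_url(host, 5432, database)`, using the values of your own RDS instance.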

Data: Amazon - US Music

Results

Vine vs. Non-Vine Reviews

  • Vine reviews: 7
  • Non-Vine reviews: 105,979

Five Star Reviews

  • Five Star Vine reviews: 0
  • Five Star Non-Vine reviews: 67,580

Five Star Review Percentages

  • Percentage of Five Star Vine reviews: 0%
  • Percentage of Five Star Non-Vine reviews: Approximately 64%
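The percentages above follow directly from the reported counts. A small sketch of the arithmetic, where `five_star_percentage` is a hypothetical helper and not a function from the repository:

```python
def five_star_percentage(five_star_count, total_count):
    """Return the share of five-star reviews as a percentage."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * five_star_count / total_count


# Counts as reported in the results above.
vine_pct = five_star_percentage(0, 7)
non_vine_pct = five_star_percentage(67_580, 105_979)

print(f"Vine five-star share: {vine_pct:.1f}%")
print(f"Non-Vine five-star share: {non_vine_pct:.1f}%")
```

The Non-Vine share comes out just under 64%, which matches the rounded figure in the list.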

Summary

Based on the data collected, Amazon Vine reviews show no sign of positivity bias in this dataset. Approximately 64% of Non-Vine reviews (67,580 of 105,979) were five-star reviews, while 0% of Vine reviews (0 of 7) were. Because the five-star rate for Vine reviews is 0%, the dataset gives no evidence of positivity bias among Vine reviewers, though with only 7 Vine reviews the comparison rests on a very small sample. Applying Natural Language Processing (NLP) to the review text could unlock additional insights about the data.

About

Analyzing Amazon reviews with PySpark
