
This is a Big Data project using AWS, pyspark-sql, pyspark and Google Colaboratory to determine if there is any bias in the reviews of vine and non-vine reviewers on Amazon.


GR8505/Big_Data


Big Data Analysis: Amazon Vine vs Non-Vine Customers

Video Games Category



Executive Overview


Who are vine customers? Amazon Vine invites the most trusted reviewers on Amazon to post opinions about new and pre-release items, helping their fellow customers make informed purchase decisions. Customers become Vine Voices based on their reviewer rank. The program was created to give customers more honest and unbiased feedback from some of Amazon's most trusted reviewers.


Key Findings

  1. In the video games category, there is a total of just 4,290 vine customers compared to 1,781,596 non-vine customers.
  2. Non-vine customers recorded the highest number of Total Votes as well as Helpful Votes.
  3. Both vine and non-vine reviews received similar average ratings of 4.07 and 4.06, respectively.
  4. However, the number of Helpful Votes per customer is slightly higher for vine customers (2.35) compared to non-vine customers (2.26).
  5. Furthermore, non-vine customers gave the lion's share of five-star ratings (1,025,249) and had the highest ratio of five-star ratings per customer (0.58), compared to vine reviewers (1,607 and 0.37).
  6. Nevertheless, vine customers were less likely to give ratings at the lower end of the spectrum. The ratio of vine customers who gave one-star ratings was 0.01, compared to 0.11 for non-vine customers.
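The per-customer five-star ratios in finding 5 can be reproduced from the counts reported in findings 1 and 5. The sketch below assumes each ratio is simply the number of five-star ratings divided by the number of customers in that group, rounded to two decimals; `per_customer_ratio` is a hypothetical helper, not code from the original notebook.

```python
# Counts taken from the Key Findings (Video Games category).
VINE_TOTAL = 4_290
NON_VINE_TOTAL = 1_781_596
VINE_FIVE_STAR = 1_607
NON_VINE_FIVE_STAR = 1_025_249


def per_customer_ratio(count: int, total: int, ndigits: int = 2) -> float:
    """Share of `total` represented by `count`, rounded for reporting.

    Assumption: the README's "ratio per customer" is count / total.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(count / total, ndigits)


vine_ratio = per_customer_ratio(VINE_FIVE_STAR, VINE_TOTAL)
non_vine_ratio = per_customer_ratio(NON_VINE_FIVE_STAR, NON_VINE_TOTAL)
print(vine_ratio, non_vine_ratio)  # 0.37 0.58
```

Both values match the reported 0.37 and 0.58, which supports reading the ratios as simple shares of each group's size.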

See the following link for the full text file and the Appendix with tables:

Resources


  • pyspark
  • pyspark-sql
  • Amazon Web Services (AWS)
  • Google Colaboratory

Data


The data was obtained from a file stored in an AWS S3 bucket.

ETL and EDA Process


Preprocessing and exploratory data analysis (EDA) were performed in Google Colaboratory. The following link walks through all the steps used to obtain these results.

