The analysis used PySpark to perform the extract, transform, and load (ETL) process on a selected Amazon review dataset (US Music). The ETL process ran in a Google Colaboratory notebook, which connected to an Amazon Web Services Relational Database Service (AWS RDS) instance and loaded the data into a PostgreSQL database.
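The ETL flow described above could be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the function names, the tab-separated input format, the cleaning steps, and the `append` write mode are all assumptions. The Spark functions expect an existing `SparkSession` and a PostgreSQL JDBC driver on the classpath.

```python
def build_jdbc_url(host, port, database):
    """Build a PostgreSQL JDBC URL for the RDS instance (hypothetical helper)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def extract(spark, path):
    """Read the raw review file into a Spark DataFrame.

    Assumes a tab-separated file with a header row.
    """
    return spark.read.csv(path, sep="\t", header=True, inferSchema=True)


def transform(df):
    """Drop rows with missing values and exact duplicates.

    These cleaning steps are an assumption for illustration only.
    """
    return df.dropna().dropDuplicates()


def load(df, jdbc_url, table, user, password):
    """Write the DataFrame to a PostgreSQL table over JDBC."""
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }
    df.write.jdbc(url=jdbc_url, table=table, mode="append", properties=properties)
```

In a notebook, these would be called in order: `extract`, then `transform`, then `load` with the URL from `build_jdbc_url`. Keeping the connection string in its own helper makes it easy to swap the RDS endpoint without touching the Spark logic.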
Data: Amazon - US Music
- Vine reviews: 7
- Non-Vine reviews: 105,979
- Five Star Vine reviews: 0
- Five Star Non-Vine reviews: 67,580
- Percentage of Five Star Vine reviews: 0%
- Percentage of Five Star Non-Vine reviews: Approximately 64%
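The percentages above follow directly from the listed counts. A minimal sketch of the arithmetic, using a hypothetical helper `five_star_percentage` (not taken from the original notebook):

```python
def five_star_percentage(five_star_count, total_count):
    """Return the share of five-star reviews as a percentage.

    Returns 0.0 when there are no reviews, to avoid dividing by zero.
    """
    if total_count == 0:
        return 0.0
    return 100.0 * five_star_count / total_count


# Counts taken from the summary list above.
vine_pct = five_star_percentage(0, 7)
non_vine_pct = five_star_percentage(67_580, 105_979)

print(f"Five-star Vine reviews: {vine_pct:.1f}%")
print(f"Five-star non-Vine reviews: {non_vine_pct:.1f}%")
```

The non-Vine figure comes out just under 64%, matching the rounded value in the list.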
Based on the data collected, Amazon Vine reviews show no sign of influencing ratings in this particular dataset. Approximately 64% of non-Vine reviews were five-star reviews, while 0% of Vine reviews were, revealing no positivity bias among Amazon Vine reviewers in this dataset. Because none of the 7 Vine reviews received five stars, no additional data is needed to support the claim of no positivity bias here. Applying natural language processing (NLP) to the review text could unlock additional insights about the data.