- In this project I picked a dataset of Amazon video game reviews and used PySpark to perform the ETL process: extracting the data, transforming it, and connecting to the database that was created for me through AWS. My goal is to determine whether Vine members in this dataset show a bias toward favorable reviews.
- There were a total of 4,291 Vine reviews and 40,471 non-Vine reviews in the dataset.
- There were a total of 15,711 5-star reviews in the dataset.
- 15,663 of the 5-star reviews were non-Vine.
- 38.2% of the Vine reviews were 5-star.
- 38.9% of the non-Vine reviews were 5-star.
- Based on this analysis, there does not appear to be a positivity bias among Vine reviewers: the share of 5-star reviews is nearly the same in both groups (38.2% for Vine versus 38.9% for non-Vine). In conclusion, the Vine program does not show any bias in this dataset.
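
The comparison above comes down to one calculation per group: the number of 5-star reviews divided by the total number of reviews in that group. The sketch below shows that calculation in plain Python rather than PySpark, so it is not the code used in this project. The field names `vine` and `star_rating` and the helper `five_star_share` are assumptions chosen for illustration, and the sample rows are made up, not taken from the real dataset.

```python
from collections import Counter


def five_star_share(reviews):
    """Return {vine_flag: percent of that group's reviews rated 5 stars}."""
    totals = Counter()      # reviews per group
    five_star = Counter()   # 5-star reviews per group
    for review in reviews:
        flag = review["vine"]
        totals[flag] += 1
        if review["star_rating"] == 5:
            five_star[flag] += 1
    return {flag: 100.0 * five_star[flag] / totals[flag] for flag in totals}


# Made-up sample rows for illustration only (hypothetical field names).
sample_reviews = [
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 3},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 4},
    {"vine": "N", "star_rating": 2},
    {"vine": "N", "star_rating": 1},
]

shares = five_star_share(sample_reviews)
for flag, pct in sorted(shares.items()):
    print(f"vine={flag}: {pct:.1f}% five-star")
```

Dividing by each group's own total, rather than by the total number of 5-star reviews, is what makes the two percentages comparable even though the groups differ greatly in size.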