This project analyzes Amazon reviews written by members of the paid Amazon Vine program. Amazon Vine is a service that allows manufacturers and publishers to receive reviews for their products. Companies pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.
In the cloud ETL process, an AWS RDS database is created, with its tables defined in pgAdmin. The dataset is extracted into a DataFrame using PySpark in a Google Colab notebook and transformed into four separate DataFrames that match the table schema in pgAdmin. The data is then uploaded into the corresponding tables, and queries are run in pgAdmin to confirm that the upload succeeded.
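The transform, load, and verify steps can be illustrated with a small standard-library sketch. It is not the project's PySpark code: plain dictionaries stand in for the PySpark DataFrame, an in-memory SQLite database stands in for the AWS RDS PostgreSQL instance, and the table and column names (`review_id_table`, `products_table`, `customers_table`, `vine_table`) are assumptions chosen for illustration. The sample rows are made up.

```python
import sqlite3

# Hypothetical raw rows standing in for the extracted review dataset.
raw_rows = [
    {"review_id": "R1", "customer_id": 10, "product_id": "P1", "product_title": "Widget",
     "star_rating": 5, "vine": "N"},
    {"review_id": "R2", "customer_id": 10, "product_id": "P2", "product_title": "Gadget",
     "star_rating": 4, "vine": "Y"},
    {"review_id": "R3", "customer_id": 11, "product_id": "P1", "product_title": "Widget",
     "star_rating": 5, "vine": "N"},
]

# Assumed table layouts: one column subset per target table.
schemas = {
    "review_id_table": ["review_id", "customer_id", "product_id"],
    "products_table": ["product_id", "product_title"],
    "customers_table": ["customer_id"],
    "vine_table": ["review_id", "star_rating", "vine"],
}


def project(rows, columns):
    """Keep only the given columns and drop duplicate records, preserving order."""
    seen = set()
    records = []
    for row in rows:
        record = tuple(row[c] for c in columns)
        if record not in seen:
            seen.add(record)
            records.append(record)
    return records


# SQLite in memory is a stand-in for the RDS database.
conn = sqlite3.connect(":memory:")
loaded_counts = {}
for table, columns in schemas.items():
    records = project(raw_rows, columns)
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", records)
    # Verification query: confirm how many rows landed in each table.
    loaded_counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(loaded_counts)
```

Comparing each table's row count with the number of records produced by the transform is the same kind of check as the confirmation queries run in pgAdmin.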
Vine reviews make up a negligible percentage of all reviews, so it is safe to say that there is no bias toward 5-star reviews: having paid Vine reviews makes almost no difference in the percentage of 5-star reviews.
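The comparison behind this conclusion can be sketched as follows. This is a minimal illustration, assuming each review carries a `vine` flag (`"Y"` or `"N"`) and a `star_rating` field. The function name `five_star_summary` and the sample reviews are hypothetical and do not reproduce the project's results.

```python
def five_star_summary(reviews):
    """Return {vine_flag: (total_reviews, five_star_reviews, percent_five_star)}."""
    summary = {}
    for flag in ("Y", "N"):
        group = [r for r in reviews if r["vine"] == flag]
        total = len(group)
        five_star = sum(1 for r in group if r["star_rating"] == 5)
        # Guard against an empty group to avoid dividing by zero.
        percent = round(100 * five_star / total, 2) if total else 0.0
        summary[flag] = (total, five_star, percent)
    return summary


# Made-up sample data, for illustration only.
sample_reviews = [
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 3},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 2},
]

summary = five_star_summary(sample_reviews)
print(summary)
```

Reporting the totals alongside the percentages matters here: when one group is very small, its 5-star percentage is a weak basis for comparison.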
Additional analysis
- Reviews can be analyzed across all star ratings, not only 5-star reviews, to determine whether the results are consistent.
- The same analysis can be repeated on different product categories to support this conclusion.
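One way to approach both suggestions is to compute the full star-rating distribution per group. The sketch below is illustrative only: the helper `rating_distribution`, the `group_key` parameter, and the sample data are hypothetical. Grouping by a field such as `product_category` instead of `vine` would assume that field exists in the data.

```python
from collections import Counter, defaultdict


def rating_distribution(reviews, group_key="vine"):
    """Return {group: {star: percent of that group's reviews}} for stars 1 to 5."""
    counts = defaultdict(Counter)
    for review in reviews:
        counts[review[group_key]][review["star_rating"]] += 1

    distribution = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        distribution[group] = {
            star: round(100 * counter[star] / total, 1) for star in range(1, 6)
        }
    return distribution


# Made-up sample data, for illustration only.
sample = [
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 4},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 1},
    {"vine": "N", "star_rating": 3},
]

distribution = rating_distribution(sample)
print(distribution)
```

If the Vine and non-Vine distributions look similar at every star level, and across several categories, that would lend more support to the no-bias conclusion than the 5-star percentage alone.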