- Overview of the analysis: Explain the purpose of this analysis.
I was working with Jennifer on the Sellby project and was asked to take on a larger project: analyzing Amazon reviews written by members of the paid Amazon Vine program. The program is a service that allows manufacturers and publishers to receive reviews for their products. I had access to many datasets and chose to work with the one covering personal care appliances.
I used PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into the database, which I managed through pgAdmin. I also analyzed whether the Vine members in my dataset showed any bias toward favorable reviews.
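The transform and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for the project: the helper names `jdbc_settings` and `load_vine_table`, the table name `vine_table`, and the column names (taken from the public Amazon reviews schema) are all assumptions, and the host and credentials are placeholders.

```python
def jdbc_settings(host, database, user, password, port=5432):
    """Build the JDBC URL and connection properties for a PostgreSQL RDS instance.

    Hypothetical helper: host, database, and credentials are placeholders.
    """
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }
    return url, properties


def load_vine_table(reviews_df, url, properties):
    """Transform: keep the Vine-related columns. Load: append them to PostgreSQL.

    reviews_df is expected to be a PySpark DataFrame. The column names and the
    "vine_table" target are assumptions and may differ in the real project.
    """
    vine_df = reviews_df.select(
        "review_id", "star_rating", "helpful_votes", "total_votes", "vine"
    )
    vine_df.write.jdbc(
        url=url, table="vine_table", mode="append", properties=properties
    )
    return vine_df
```

Keeping the connection settings in one small function means the same URL and properties can be reused for each table that is written.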
- Results: Using bulleted lists and images of DataFrames as support, address the following questions:
How many Vine reviews and non-Vine reviews were there?
There were 85981 Vine reviews and 2901 non-Vine reviews.
How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?
There were 48897 Vine reviews that were 5 stars, and 2901 reviews that were non-Vine.
What percentage of Vine reviews were 5 stars? What percentage of non-Vine reviews were 5 stars?
Percentage of 5-star Vine reviews: 56.87%. Percentage of 5-star non-Vine reviews: 3.37%.
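As a quick check, the percentages can be recomputed from the counts reported in this section. The helper name `five_star_percentage` is hypothetical, and the sketch only reproduces the arithmetic; it does not re-run the analysis on the dataset.

```python
def five_star_percentage(five_star_count, total_count):
    """Return the share of 5-star reviews as a percentage, rounded to 2 places."""
    if total_count == 0:
        raise ValueError("total_count must be greater than zero")
    return round(100 * five_star_count / total_count, 2)


# Counts as reported in the Results section.
vine_pct = five_star_percentage(48897, 85981)      # 56.87
# The reported 3.37% corresponds to 2901 over the same 85981 denominator.
non_vine_pct = five_star_percentage(2901, 85981)   # 3.37
```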
- Summary: In your summary, state if there is any positivity bias for reviews in the Vine program. Use the results of your analysis to support your statement. Then, provide one additional analysis that you could do with the dataset to support your statement.
According to my analysis, there is a positivity bias in the Vine program. Users who are paid for their reviews are more likely to give a product 5 stars than those who are not paid: 56.87% of Vine reviews were 5 stars, compared with 3.37% of non-Vine reviews.
Deliverable 1 charts from pgAdmin
Review_id_table
Customer_table
Vine_Table
Products_table