# Lab - EDA Bivariate Analysis: Diving into Amazon UK Product Insights Part II
__Objective:__ Delve into the dynamics of product pricing on Amazon UK to uncover insights that can inform business strategies and decision-making.

__Dataset:__ This lab uses the Amazon UK product dataset, which provides information on product categories, brands, prices, ratings, and more. You'll need to download it before you can start working with it.

In [8]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import skew, kurtosis

%matplotlib inline

dataset = 'amz_uk_price_prediction_dataset.csv'
df = pd.read_csv(dataset)

df.head()

index,uid,asin,title,stars,reviews,price,isBestSeller,boughtInLastMonth,category
0,1,B09B96TG33,"Echo Dot (5th generation, 2022 release) | Big ...",4.7,15308,21.99,False,0,Hi-Fi Speakers
1,2,B01HTH3C8S,"Anker Soundcore mini, Super-Portable Bluetooth...",4.7,98099,23.99,True,0,Hi-Fi Speakers
2,3,B09B8YWXDF,"Echo Dot (5th generation, 2022 release) | Big ...",4.7,15308,21.99,False,0,Hi-Fi Speakers
3,4,B09B8T5VGV,"Echo Dot with clock (5th generation, 2022 rele...",4.7,7205,31.99,False,0,Hi-Fi Speakers
4,5,B09WX6QD65,Introducing Echo Pop | Full sound compact Wi-F...,4.6,1881,17.99,False,0,Hi-Fi Speakers


In [26]:
df.shape

(2443651, 9)

## Part 1: Analyzing Best-Seller Trends Across Product Categories

__Objective:__ Understand the relationship between product categories and their best-seller status.

1. Crosstab Analysis:
   * Create a crosstab between the product category and the _isBestSeller_ status.
   * Are there categories where being a best-seller is more prevalent?
   * Hint: one option is to calculate the proportion of best-sellers for each category and then sort the categories based on this proportion in descending order.
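The hint above can be sketched on a small toy frame before running it on the full dataset. This is only an illustration: the `toy` data is made up, and `normalize="index"` is an alternative way to get row-wise proportions directly from `pd.crosstab` instead of dividing by hand.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (not taken from the lab's CSV).
toy = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B", "B"],
    "isBestSeller": [True, False, False, False, True, False],
})

# normalize="index" divides each row by its row total, so every row sums to 1.
toy_props = pd.crosstab(toy["category"], toy["isBestSeller"], normalize="index")

# The True column is the share of best-sellers within each category.
toy_ranked = toy_props[True].sort_values(ascending=False)
print(toy_ranked)
```

Here category `B` (1 of 2 products) ranks above category `A` (1 of 4), which is the kind of ordering the exercise asks for.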

In [34]:
# 1. Creating a crosstab between 'category' and 'isBestSeller'
# This crosstab shows the counts of best-sellers and non-best-sellers across different product categories.
crosstab_category_bestseller = pd.crosstab(df['category'], df['isBestSeller'])

# Display the crosstab
display(crosstab_category_bestseller)

category,isBestSeller=False,isBestSeller=True
3D Printers,247,1
3D Printing & Scanning,4065,2
Abrasive & Finishing Products,245,5
Action Cameras,1696,1
Adapters,251,3
...,...,...
Wind Instruments,243,7
Window Treatments,234,5
Women,17559,213
Women's Sports & Outdoor Shoes,1939,20


In [36]:
# 2. Calculate the proportion of best-sellers per category
# Add the two count columns explicitly so that re-running this cell does not
# fold 'Total' or 'BestSeller_Proportion' back into the total.
crosstab_category_bestseller['Total'] = crosstab_category_bestseller[False] + crosstab_category_bestseller[True]
crosstab_category_bestseller['BestSeller_Proportion'] = crosstab_category_bestseller[True] / crosstab_category_bestseller['Total']

# Display the updated crosstab with the proportion of best-sellers
crosstab_category_bestseller

category,isBestSeller=False,isBestSeller=True,Total,BestSeller_Proportion
3D Printers,247,1,248,0.004032
3D Printing & Scanning,4065,2,4067,0.000492
Abrasive & Finishing Products,245,5,250,0.020000
Action Cameras,1696,1,1697,0.000589
Adapters,251,3,254,0.011811
...,...,...,...,...
Wind Instruments,243,7,250,0.028000
Window Treatments,234,5,239,0.020921
Women,17559,213,17772,0.011985
Women's Sports & Outdoor Shoes,1939,20,1959,0.010209


In [38]:
# 3. Sort categories by the proportion of best-sellers in descending order
crosstab_sorted = crosstab_category_bestseller.sort_values(by='BestSeller_Proportion', ascending=False)

# Display the sorted table
crosstab_sorted[['BestSeller_Proportion']]

category,BestSeller_Proportion
Grocery,0.058135
Smart Home Security & Lighting,0.057692
Health & Personal Care,0.057686
Mobile Phone Accessories,0.042471
Power & Hand Tools,0.035339
...,...
Projectors,0.000000
Printer Accessories,0.000000
Power Supplies,0.000000
Basketball Footwear,0.000000
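
A ranking by proportion treats a category with a couple of hundred products the same as one with many thousands, so it can help to keep the category size next to the proportion. The sketch below is one possible way to do that, run on made-up data; the `min_items` threshold is a hypothetical value chosen only for illustration, not something the lab prescribes.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the lab's dataset.
toy = pd.DataFrame({
    "category": ["Tiny"] * 2 + ["Large"] * 10,
    "isBestSeller": [True, False] + [True] * 2 + [False] * 8,
})

# The mean of a boolean column is the share of True values,
# and "size" gives the number of products per category.
summary = toy.groupby("category")["isBestSeller"].agg(
    n_products="size",
    bestseller_share="mean",
)

min_items = 5  # hypothetical threshold, for illustration only

# Keep only categories with enough products, then rank by share.
reliable = (
    summary[summary["n_products"] >= min_items]
    .sort_values("bestseller_share", ascending=False)
)
print(reliable)
```

On the real data the same pattern would use `df` in place of `toy`, with a threshold picked after looking at the distribution of category sizes.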


__Categories where being a Best-Seller is more prevalent:__

- __Grocery:__ This category has the highest proportion of best-sellers, at 5.81% of its products.

- __Smart Home Security & Lighting:__ This category follows closely behind, with 5.77% of products being best-sellers.

- __Health & Personal Care:__ Practically tied with Smart Home Security & Lighting, also at about 5.77% (0.057686 versus 0.057692). Health-related products are often repeat purchases, which may explain why this category has a high proportion of best-sellers.

- __Mobile Phone Accessories and Power & Hand Tools:__ These categories come next, at roughly 4.25% and 3.53%, respectively.

Even in the leading categories fewer than 6% of products are best-sellers, and several categories at the bottom of the table (for example Projectors and Printer Accessories) have none at all.
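
As a quick check on the percentages quoted in this summary, the proportions from the sorted table can be converted and rounded in code. This is a small sketch that re-enters the five top values by hand; in the notebook itself the same conversion could be applied to `crosstab_sorted['BestSeller_Proportion']`.

```python
import pandas as pd

# Proportions copied from the top of the sorted table above.
top = pd.Series({
    "Grocery": 0.058135,
    "Smart Home Security & Lighting": 0.057692,
    "Health & Personal Care": 0.057686,
    "Mobile Phone Accessories": 0.042471,
    "Power & Hand Tools": 0.035339,
})

# Convert to percentages and round to two decimal places.
top_pct = (top * 100).round(2)

# Format for display, e.g. "5.81%".
print(top_pct.map("{:.2f}%".format))
```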

## Part 2: Exploring Product Prices and Ratings Across Categories and Brands

## Part 3: Investigating the Interplay Between Product Prices and Ratings