# Adidas Products Retail Analysis

This project explores the different categories of Adidas retail products. We analyze a dataset of Adidas retail products to help answer the following questions.

- Q1: Which colors are most popular among different age groups and genders?
- Q2: What is the most popular category in the United States?
- Q3: Which product is the most popular in each category?
- Q4: What is the correlation between product reviews and ratings?

## A) Data Exploration & Cleaning

In [53]:
#Importing the appropriate libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [54]:
#Loading the data set in pandas data frame
df = pd.read_csv('adidas.csv')
df.head()

Unnamed: 0,url,name,sku,selling_price,original_price,currency,availability,color,category,source,source_website,breadcrumbs,description,brand,images,country,language,average_rating,reviews_count,crawled_at
0,https://www.adidas.com/us/beach-shorts/FJ5089....,Beach Shorts,FJ5089,40,,USD,InStock,Black,Clothing,adidas United States,https://www.adidas.com,Women/Clothing,Splashing in the surf. Making memories with yo...,adidas,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.5,35,2021-10-23 17:50:17.331255
1,https://www.adidas.com/us/five-ten-kestrel-lac...,Five Ten Kestrel Lace Mountain Bike Shoes,BC0770,150,,USD,InStock,Grey,Shoes,adidas United States,https://www.adidas.com,Women/Shoes,Lace up and get after it. The Five Ten Kestrel...,adidas,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.8,4,2021-10-23 17:50:17.423830
2,https://www.adidas.com/us/mexico-away-jersey/G...,Mexico Away Jersey,GC7946,70,,USD,InStock,White,Clothing,adidas United States,https://www.adidas.com,Kids/Clothing,"Clean and crisp, this adidas Mexico Away Jerse...",adidas,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.9,42,2021-10-23 17:50:17.530834
3,https://www.adidas.com/us/five-ten-hiangle-pro...,Five Ten Hiangle Pro Competition Climbing Shoes,FV4744,160,,USD,InStock,Black,Shoes,adidas United States,https://www.adidas.com,Five Ten/Shoes,The Hiangle Pro takes on the classic shape of ...,adidas,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,3.7,7,2021-10-23 17:50:17.615054
4,https://www.adidas.com/us/mesh-broken-stripe-p...,Mesh Broken-Stripe Polo Shirt,GM0239,65,,USD,InStock,Blue,Clothing,adidas United States,https://www.adidas.com,Men/Clothing,Step up to the tee relaxed. This adidas golf p...,adidas,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.7,11,2021-10-23 17:50:17.702680


In [55]:
#Exploring the currency options.
print("Number of currencies in the dataset:",df.currency.nunique(),df.currency[0])

#Exploring the countries in the data set
print("Number of countries in the dataset:",df.country.nunique(),df.country[1])

Number of currencies in the dataset: 1 USD
Number of countries in the dataset: 1 USA
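The same check can be generalized to every column at once. The following is a minimal sketch on a small hypothetical frame (the `sample` data and its values are illustrative, not taken from `adidas.csv`); it lists the columns that hold a single distinct value and therefore add nothing to a comparison between products.

```python
import pandas as pd

# Toy frame standing in for the Adidas data (values are illustrative only).
sample = pd.DataFrame({
    "name": ["Beach Shorts", "Mexico Away Jersey", "Runner Tee"],
    "currency": ["USD", "USD", "USD"],
    "country": ["USA", "USA", "USA"],
    "selling_price": [40, 70, 25],
})

# nunique() counts the distinct non-null values in each column.
unique_counts = sample.nunique()

# Columns with exactly one distinct value are constant across all rows.
constant_columns = unique_counts[unique_counts == 1].index.tolist()
print(constant_columns)
```

On the real dataset, the same two lines could be run on `df` to spot constant columns before deciding what to drop.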


Exploring the data shows that every item in the dataset is listed by Adidas USA and priced in US dollars.

I can also see that some columns in the dataset are not needed:

- URL
- Source Website
- Brand (always Adidas)

In the next section, I will drop the unwanted columns and replace the NaN values in the original price with the item's selling price.

In [56]:
df.drop(['url','source_website','brand'],axis=1,inplace=True)

In [57]:
df.head()

Unnamed: 0,name,sku,selling_price,original_price,currency,availability,color,category,source,breadcrumbs,description,images,country,language,average_rating,reviews_count,crawled_at
0,Beach Shorts,FJ5089,40,,USD,InStock,Black,Clothing,adidas United States,Women/Clothing,Splashing in the surf. Making memories with yo...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.5,35,2021-10-23 17:50:17.331255
1,Five Ten Kestrel Lace Mountain Bike Shoes,BC0770,150,,USD,InStock,Grey,Shoes,adidas United States,Women/Shoes,Lace up and get after it. The Five Ten Kestrel...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.8,4,2021-10-23 17:50:17.423830
2,Mexico Away Jersey,GC7946,70,,USD,InStock,White,Clothing,adidas United States,Kids/Clothing,"Clean and crisp, this adidas Mexico Away Jerse...","https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.9,42,2021-10-23 17:50:17.530834
3,Five Ten Hiangle Pro Competition Climbing Shoes,FV4744,160,,USD,InStock,Black,Shoes,adidas United States,Five Ten/Shoes,The Hiangle Pro takes on the classic shape of ...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,3.7,7,2021-10-23 17:50:17.615054
4,Mesh Broken-Stripe Polo Shirt,GM0239,65,,USD,InStock,Blue,Clothing,adidas United States,Men/Clothing,Step up to the tee relaxed. This adidas golf p...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.7,11,2021-10-23 17:50:17.702680


In [58]:
#Correcting the original price data type
df['original_price'] = df['original_price'].replace({r'\$':''},regex=True).astype(float)

#Replacing NaN values with selling price values
df['original_price'] = df['original_price'].fillna(df.selling_price)
df['original_price'] = df['original_price'].astype(int)
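To make the effect of this cell easier to see, here is a minimal sketch of the same idea on three hypothetical rows. It assumes that listed original prices arrive as strings with a leading `$` (which is what the `replace` call above suggests) and that missing prices are NaN; the `sample` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: original price is a "$" string, or NaN when none was listed.
sample = pd.DataFrame({
    "selling_price": [40, 150, 56],
    "original_price": [np.nan, np.nan, "$80"],
})

# Strip the dollar sign (literal match, not a regex), then convert to numbers.
# NaN entries stay NaN through both steps.
cleaned = pd.to_numeric(sample["original_price"].str.replace("$", "", regex=False))

# Where no original price exists, fall back to the selling price of the same row.
sample["original_price"] = cleaned.fillna(sample["selling_price"]).astype(int)
print(sample)
```

Rows without an original price end up with equal selling and original prices, while the discounted row keeps its own original price.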

Since the category is already a column in our dataset, there is no need to repeat it in the breadcrumbs column. We will split the breadcrumbs column and keep only its first part, which gives a clear view of the gender and/or age group of each product.

In [59]:
# Splitting the breadcrumbs column and removing the category
df['breadcrumbs'] = df['breadcrumbs'].str.split('/').str[0]
df.head()

Unnamed: 0,name,sku,selling_price,original_price,currency,availability,color,category,source,breadcrumbs,description,images,country,language,average_rating,reviews_count,crawled_at
0,Beach Shorts,FJ5089,40,40,USD,InStock,Black,Clothing,adidas United States,Women,Splashing in the surf. Making memories with yo...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.5,35,2021-10-23 17:50:17.331255
1,Five Ten Kestrel Lace Mountain Bike Shoes,BC0770,150,150,USD,InStock,Grey,Shoes,adidas United States,Women,Lace up and get after it. The Five Ten Kestrel...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.8,4,2021-10-23 17:50:17.423830
2,Mexico Away Jersey,GC7946,70,70,USD,InStock,White,Clothing,adidas United States,Kids,"Clean and crisp, this adidas Mexico Away Jerse...","https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.9,42,2021-10-23 17:50:17.530834
3,Five Ten Hiangle Pro Competition Climbing Shoes,FV4744,160,160,USD,InStock,Black,Shoes,adidas United States,Five Ten,The Hiangle Pro takes on the classic shape of ...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,3.7,7,2021-10-23 17:50:17.615054
4,Mesh Broken-Stripe Polo Shirt,GM0239,65,65,USD,InStock,Blue,Clothing,adidas United States,Men,Step up to the tee relaxed. This adidas golf p...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.7,11,2021-10-23 17:50:17.702680
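As an alternative sketch, the split can keep both parts instead of overwriting the column, which makes it possible to check that the second part really matches `category` before discarding it. The column names `section` and `breadcrumb_category` are hypothetical and chosen only for this example; the breadcrumb values are the ones shown in the preview above.

```python
import pandas as pd

sample = pd.DataFrame({
    "breadcrumbs": [
        "Women/Clothing",
        "Women/Shoes",
        "Kids/Clothing",
        "Five Ten/Shoes",
        "Men/Clothing",
    ],
})

# n=1 splits on the first "/" only; expand=True returns one column per part.
parts = sample["breadcrumbs"].str.split("/", n=1, expand=True)
sample["section"] = parts[0]              # gender, age group, or product line
sample["breadcrumb_category"] = parts[1]  # should duplicate the category column
print(sample)
```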


### Data Wrangling

Review counts vary widely, which affects how much we can trust a product's rating: a product rated 4.9 from a single review should not be treated the same as a product rated 4.9 from 100+ reviews.

I have decided to use the Bayesian average of the product rating as the product score. The following section performs the calculation:

In [100]:
"""
I am calculating the bayesian average for the product reviews
* formula ==> S = wR + (1-w)c
- R --> Avg rating for the product
- c --> average of user ratings for all products
- w --> Weight assigned to R calculated as v/(v+m)
- v --> review count
- m --> average(review_count)
"""
c = round(df['average_rating'].mean(),2)
m = round(df['reviews_count'].mean(),2)
w = df['reviews_count']/(df['reviews_count']+m)

df['product_score'] = w*df['average_rating'] + (1-w)*c
df.head()

Unnamed: 0,name,sku,selling_price,original_price,currency,availability,color,category,source,breadcrumbs,description,images,country,language,average_rating,reviews_count,crawled_at,product_score
0,Beach Shorts,FJ5089,40,40,USD,InStock,Black,Clothing,adidas United States,Women,Splashing in the surf. Making memories with yo...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.5,35,2021-10-23 17:50:17.331255,4.601652
1,Five Ten Kestrel Lace Mountain Bike Shoes,BC0770,150,150,USD,InStock,Grey,Shoes,adidas United States,Women,Lace up and get after it. The Five Ten Kestrel...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.8,4,2021-10-23 17:50:17.423830,4.611767
2,Mexico Away Jersey,GC7946,70,70,USD,InStock,White,Clothing,adidas United States,Kids,"Clean and crisp, this adidas Mexico Away Jerse...","https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.9,42,2021-10-23 17:50:17.530834,4.636016
3,Five Ten Hiangle Pro Competition Climbing Shoes,FV4744,160,160,USD,InStock,Black,Shoes,adidas United States,Five Ten,The Hiangle Pro takes on the classic shape of ...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,3.7,7,2021-10-23 17:50:17.615054,4.595295
4,Mesh Broken-Stripe Polo Shirt,GM0239,65,65,USD,InStock,Blue,Clothing,adidas United States,Men,Step up to the tee relaxed. This adidas golf p...,"https://assets.adidas.com/images/w_600,f_auto,...",USA,en,4.7,11,2021-10-23 17:50:17.702680,4.612265
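To illustrate why the Bayesian average helps, the sketch below applies the same formula, S = wR + (1-w)c with w = v/(v+m), to two hypothetical products that share a 4.9 rating but differ in review count. The values of `c` and `m` are assumed for illustration and are not the ones computed from the dataset; the helper name `bayesian_average` is also hypothetical.

```python
def bayesian_average(rating, review_count, prior_mean, prior_weight):
    """Shrink a rating toward the global mean: S = w*R + (1-w)*c, w = v/(v+m)."""
    w = review_count / (review_count + prior_weight)
    return w * rating + (1 - w) * prior_mean

# Assumed values for illustration only (not computed from the dataset).
c = 4.6  # average rating across all products
m = 50   # average review count across all products

few_reviews = bayesian_average(4.9, 1, c, m)     # 4.9 from a single review
many_reviews = bayesian_average(4.9, 100, c, m)  # 4.9 from 100 reviews
print(round(few_reviews, 2), round(many_reviews, 2))
```

With these assumed inputs, the single-review product scores about 4.61, barely above the global mean, while the 100-review product scores 4.80. The fewer reviews a product has, the more its score is pulled toward `c`.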


### Extracting and Visualizing Insights

I have decided to use Tableau to extract insights and build interactive visuals and a dashboard, since it makes the data easier to manipulate and visualize.

I will export the cleaned dataset to a CSV file, import it into Tableau, create my visuals and dashboard, and then embed them back here in the Jupyter notebook.

In [61]:
#Exporting the data to csv file
df.to_csv('Adidas Retail Cleaned.csv')

Let's start by exploring the average selling price for each of the 3 categories and each section of Adidas US retail.
From the following visual, we can see that:
- Originals shoes have the highest average selling price.
- In general, Adidas shoes have the highest average selling price.
- Sportswear in the clothing category follows closely, with a relatively high average selling price.

In [109]:
%%html
<div class='tableauPlaceholder' id='viz1668082187325' style='position: relative'><noscript><a href='#'><img alt='Average Selling Price Per Category and Section ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Av&#47;AverageSellingPricePerCategorySection&#47;Sheet7&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AverageSellingPricePerCategorySection&#47;Sheet7' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Av&#47;AverageSellingPricePerCategorySection&#47;Sheet7&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1668082187325');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>
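For readers without access to Tableau, a comparable table can be approximated in pandas. This is only a sketch on hypothetical rows: the prices are made up, and it assumes the section is held in the `breadcrumbs` column after the split performed earlier.

```python
import pandas as pd

# Hypothetical rows; prices are illustrative, not taken from the dataset.
sample = pd.DataFrame({
    "category": ["Shoes", "Shoes", "Shoes", "Clothing", "Clothing"],
    "breadcrumbs": ["Originals", "Originals", "Women", "Women", "Men"],
    "selling_price": [180, 160, 150, 40, 65],
})

# Average selling price for each (category, section) pair, highest first.
avg_price = (
    sample.groupby(["category", "breadcrumbs"])["selling_price"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_price)
```

Running the same `groupby` on the cleaned `df` would give the numbers behind the visual, provided the Tableau sheet uses the same two dimensions.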

Now, I want to explore color popularity across the different sections, genders, and age groups (adults vs. kids).
1. Across all Adidas sections, black and white are the most popular colors, regardless of section, age group, or gender.

2. By gender, the most popular colors for men are white, black, and blue, while for women they are white, black, and pink.

3. By age group, black and white lead for both adults and kids, and pink is also worth considering for kids.

In [101]:
%%html
<div class='tableauPlaceholder' id='viz1668083521185' style='position: relative'><noscript><a href='#'><img alt='Color Popularity ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Co&#47;ColorPopularityforAdidasProducts&#47;Sheet1&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='ColorPopularityforAdidasProducts&#47;Sheet1' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Co&#47;ColorPopularityforAdidasProducts&#47;Sheet1&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1668083521185');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>
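A rough pandas counterpart to this view is a cross-tabulation of section against color. The sketch below assumes that popularity is measured by the number of listed products per color, which the notebook does not state explicitly; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; one row per listed product.
sample = pd.DataFrame({
    "breadcrumbs": ["Women", "Women", "Women", "Men", "Men", "Men", "Kids"],
    "color": ["Black", "White", "Black", "Blue", "Blue", "Black", "White"],
})

# Rows are sections, columns are colors, cells are product counts.
color_counts = pd.crosstab(sample["breadcrumbs"], sample["color"])

# The most frequent color in each section.
top_color = color_counts.idxmax(axis=1)
print(color_counts)
print(top_color)
```

Note that `idxmax` returns only one color per section, so ties are hidden; inspecting `color_counts` directly is safer when two colors are close.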

# Adidas US Dashboard Insights

I have built an interactive dashboard to explore the following:
1. Correlation between Selling Price and Average Rating.
2. Most Popular Category based on the product score.
3. Most popular product per category in Adidas US retail.

In [112]:
%%html
<div class='tableauPlaceholder' id='viz1668082528719' style='position: relative'><noscript><a href='#'><img alt='Dashboard 1 ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ad&#47;AdidasUSRetailDashboard&#47;Dashboard1&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AdidasUSRetailDashboard&#47;Dashboard1' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ad&#47;AdidasUSRetailDashboard&#47;Dashboard1&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1668082528719');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='1127px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

## 1. Correlation between Selling Price and Average Rating

There is no strong correlation between selling price and average rating, but there is a slightly positive one: as the selling price increases, the average product rating tends to increase as well. One plausible explanation is that higher-priced products tend to be of higher quality.
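The strength of such a relationship can be quantified with a correlation coefficient. The sketch below shows the mechanics on the five preview rows from the top of this notebook; five rows are far too few to confirm or contradict the trend seen in the dashboard, so the printed numbers should not be read as a result for the full dataset.

```python
import pandas as pd

# The five rows shown in the df.head() preview (price and rating only).
sample = pd.DataFrame({
    "selling_price": [40, 150, 70, 160, 65],
    "average_rating": [4.5, 4.8, 4.9, 3.7, 4.7],
})

# Pearson measures linear association between the raw values.
pearson = sample["selling_price"].corr(sample["average_rating"])

# Spearman is Pearson applied to ranks, so it captures monotonic association
# and is less sensitive to a single extreme price or rating.
spearman = sample["selling_price"].rank().corr(sample["average_rating"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

Applying the same two lines to the full `df` would give a single number to back up the visual impression from the scatter plot.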

## 2. Most Popular Category

It is pretty clear that **Adidas Shoes** is the most loved category in the Adidas US store, followed by the clothing category.
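One way such a ranking could be reproduced in pandas is to average the Bayesian product score within each category. This is a sketch on hypothetical scores; the `score` column stands in for the notebook's product score column, and whether the dashboard uses the mean or another aggregate is an assumption.

```python
import pandas as pd

# Hypothetical product scores; values are illustrative only.
sample = pd.DataFrame({
    "category": ["Shoes", "Shoes", "Clothing", "Clothing", "Accessories"],
    "score": [4.70, 4.60, 4.62, 4.60, 4.55],
})

# Mean score and number of products per category, best category first.
category_rank = (
    sample.groupby("category")["score"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(category_rank)
```

Keeping the `count` column next to the mean helps to see whether a category's lead rests on many products or only a few.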

## 3. Most Popular Product
Across all categories, the most popular products are the **ZX 1K and 2K Boost Shoes**. Looking at each category separately:
- Accessories: **Santiago Lunch Bag** and **Team Speed Soccer OTC Socks**
- Clothing: **Adidas Sportswear Future Icons Logo Graphic Tee** and **Runner Tee**
- Shoes: **ZX 1K and 2K Boost Shoes**
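The per-category winners can also be looked up in pandas by taking the row with the highest score in each category. The sketch below uses the five preview rows and their scores from the Data Wrangling output, so its result reflects only those five products and not the full ranking listed above; the `score` column name is a stand-in chosen for this example.

```python
import pandas as pd

# The five preview rows with the scores shown in the Data Wrangling output.
sample = pd.DataFrame({
    "name": [
        "Beach Shorts",
        "Five Ten Kestrel Lace Mountain Bike Shoes",
        "Mexico Away Jersey",
        "Five Ten Hiangle Pro Competition Climbing Shoes",
        "Mesh Broken-Stripe Polo Shirt",
    ],
    "category": ["Clothing", "Shoes", "Clothing", "Shoes", "Clothing"],
    "score": [4.601652, 4.611767, 4.636016, 4.595295, 4.612265],
})

# idxmax() returns the row label of the highest score within each category.
best_rows = sample.groupby("category")["score"].idxmax()
top_products = sample.loc[best_rows, ["category", "name", "score"]].set_index("category")
print(top_products)
```

To list the top two products per category, as in the bullets above, sorting by score and calling `groupby("category").head(2)` would be one option.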

# Conclusion

To summarize the insights we have extracted from the Adidas US Retail dataset:
- Adidas needs to maintain the quality of its products as much as possible, since better product scores should help increase sales.
- Adidas should continue investing in designing new black and white products.
- Adidas should produce more shoes, as they are the most loved category for Adidas retail in the US.