# Analysis of the US StockX market using the example of the competing “Nike x Off-White” and “Adidas x Yeezy” collaborations

# Part 2

 ### Content
 <ul>
1. Introduction<br>
2. Data Description and objectives<br>
3. Data Preparation<br>
4. Data Analysis<br>
</ul>

## 1. Introduction:

StockX is an online marketplace and clothing reseller, primarily of sneakers. The Detroit-based company was founded by Dan Gilbert, Josh Luber, Greg Schwartz, and Chris Kaufman in 2015–2016. StockX has more than 800 employees in Downtown Detroit. StockX currently has international offices in London, UK, in Eindhoven, the Netherlands, and has authentication facilities in Detroit's Corktown neighborhood, Moonachie, NJ, and Tempe, AZ. Scott Cutler and Schwartz serve as chief executive officer and chief operating officer, respectively, and Deena Bahri became the company's first chief marketing officer in 2019.

Nike and Off-White: "The Ten" was the sneaker collaboration between Nike and Off-White, designed by Virgil Abloh in 2017. It initially involved the deconstruction of ten iconic Nike silhouettes by Abloh. The ten shoes were individually broken down by Abloh and then rebuilt with a different design and rearranged components. The collaboration sold out and the shoes became highly sought after. Further releases followed throughout 2018 and 2019, including endorsements from athletes and celebrities, but they are not part of the original “ten”.

Adidas Yeezy is a fashion collaboration between the German sportswear brand Adidas and American designer/rapper Kanye West. The collaboration has become notable for its high-end limited edition colorways and general releases offered by the Yeezy Boost sneaker lineup. The collaboration has also produced shirts, jackets, track pants, socks, women's shoes and their newly released slides.

Sources:

https://en.wikipedia.org/wiki/StockX
https://en.wikipedia.org/wiki/Nike_and_Off-White:_%27The_Ten%27
https://en.wikipedia.org/wiki/Adidas_Yeezy

## 2. Data description and objectives:


We will analyse data from the StockX database (99,956 observations, with dates ranging from 2015 to 2019).

Our data set has the following columns:

<ul>
Order Date - Date when the order was placed<br>
Brand - Brand name<br>
Sneaker Name - Sneaker model name<br>
Sale Price - Price at which the sneaker was sold on StockX<br>
Retail Price - Retail price set by the brand<br>
Release Date - Date when the sneaker was released<br>
Shoe Size - Sneaker size<br>
Buyer Region - Region (US state) of the StockX buyer<br>
</ul>

Our objectives:

<ol>
Analyze the relation between buyer region and brand<br>
Analyze the relation between shoe size and price<br>
Analyze StockX users' level of interest in both brands (Nike and Adidas)<br>
Analyze the most popular sneaker models and the changes in their resale prices over the years<br>
Analyze which brand produces more sneaker models and how this affects their prices (both retail and resale)<br>
</ol>


## 3. Data Preparation

In [1]:
import pandas as pd
import numpy as np
import scipy as sp
import plotly as pl
import plotly.express as px

In [2]:
# Load the StockX data
dataframe = pd.read_csv("StockX-Data 2019.csv")
dataframe.head(10)

Unnamed: 0,Order Date,Brand,Sneaker Name,Sale Price,Retail Price,Release Date,Shoe Size,Buyer Region
0,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-Low-V2-Beluga,"$1,097",$220,9/24/16,11.0,California
1,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Copper,$685,$220,11/23/16,11.0,California
2,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Green,$690,$220,11/23/16,11.0,California
3,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red,"$1,075",$220,11/23/16,11.5,Kentucky
4,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red-2017,$828,$220,2/11/17,11.0,Rhode Island
5,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red-2017,$798,$220,2/11/17,8.5,Michigan
6,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-White,$784,$220,12/17/16,11.0,California
7,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Cream-White,$460,$220,4/29/17,10.0,New York
8,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Cream-White,$465,$220,4/29/17,11.0,Kansas
9,9/1/17,Yeezy,Adidas-Yeezy-Boost-350-V2-Cream-White,$465,$220,4/29/17,11.0,Florida


In [3]:
# Total number of sales on StockX by brand
# (count(level=...) was removed in pandas 2.0; groupby(level=...) is the equivalent)
dataframe.set_index(["Brand", "Sneaker Name"]).groupby(level="Brand").count()

Brand,Order Date,Sale Price,Retail Price,Release Date,Shoe Size,Buyer Region
Yeezy,72162,72162,72162,72162,72162,72162
Off-White,27794,27794,27794,27794,27794,27794
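The two counts above can also be expressed as shares of all sales. The following is only a small sketch that re-enters the counts by hand; the names `brand_counts` and `brand_share` are introduced here for illustration and are not part of the notebook.

```python
import pandas as pd

# Row counts per brand, copied from the output above.
brand_counts = pd.Series({"Yeezy": 72162, "Off-White": 27794}, name="Count")

# Share of all sales attributed to each brand.
brand_share = brand_counts / brand_counts.sum()
print(brand_share.round(3))
```

On the real data frame, `dataframe["Brand"].value_counts(normalize=True)` should give the same shares directly.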


In [4]:
# Let's check our dataframe for duplicates.
dataframe.duplicated()

# Every row displayed below is False, i.e. not a duplicate of an earlier row

0        False
1        False
2        False
3        False
4        False
         ...  
99951    False
99952    False
99953    False
99954    False
99955    False
Length: 99956, dtype: bool
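The truncated output only displays the first and last rows, so a single summary number is easier to read. A minimal sketch on a toy frame (the rows and the names `toy`, `n_duplicates`, `has_duplicates` and `deduplicated` are illustrative assumptions, not taken from the data set):

```python
import pandas as pd

# Toy frame standing in for the StockX data; the last row repeats the first.
toy = pd.DataFrame({
    "Sneaker Name": ["A", "B", "A"],
    "Sale Price": ["$685", "$690", "$685"],
})

# Number of rows that repeat an earlier row.
n_duplicates = int(toy.duplicated().sum())
# True if at least one such row exists.
has_duplicates = bool(toy.duplicated().any())
# Copy of the frame with the repeated rows removed.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, has_duplicates, len(deduplicated))
```

Applied to the notebook's `dataframe`, `dataframe.duplicated().sum()` equal to 0 would confirm that all rows are unique.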

In [5]:
# Let's check our dataframe for null values
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99956 entries, 0 to 99955
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Order Date    99956 non-null  object 
 1   Brand         99956 non-null  object 
 2   Sneaker Name  99956 non-null  object 
 3   Sale Price    99956 non-null  object 
 4   Retail Price  99956 non-null  object 
 5   Release Date  99956 non-null  object 
 6   Shoe Size     99956 non-null  float64
 7   Buyer Region  99956 non-null  object 
dtypes: float64(1), object(7)
memory usage: 6.1+ MB
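The summary above shows that the price and date columns are stored as `object` (text), so they cannot be averaged or sorted numerically as they are. A possible conversion is sketched below on three rows copied from the `head()` output; the helper name `clean_types` is hypothetical, and the date format `%m/%d/%y` is an assumption based on those sample rows.

```python
import pandas as pd

# Three rows in the same layout as the head() output shown earlier.
sample = pd.DataFrame({
    "Order Date": ["9/1/17", "9/1/17", "9/1/17"],
    "Sale Price": ["$1,097", "$685", "$690"],
    "Retail Price": ["$220", "$220", "$220"],
    "Release Date": ["9/24/16", "11/23/16", "11/23/16"],
})


def clean_types(df):
    """Return a copy with prices as floats and dates as datetimes."""
    out = df.copy()
    for col in ["Sale Price", "Retail Price"]:
        # Strip the dollar sign and the thousands separator before converting.
        out[col] = out[col].str.replace(r"[$,]", "", regex=True).astype(float)
    for col in ["Order Date", "Release Date"]:
        # Assumed format: month/day/two-digit year.
        out[col] = pd.to_datetime(out[col], format="%m/%d/%y")
    return out


cleaned = clean_types(sample)
print(cleaned.dtypes)
```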


In [6]:
# Let's find the mean shoe size, grouped by Brand and Sale Price.
# "Shoe Size" is selected explicitly because it is the only numeric column.
dataframe.groupby(["Brand", "Sale Price"])[["Shoe Size"]].mean()

Brand,Sale Price,Shoe Size
Yeezy,"$1,000",9.826531
Yeezy,"$1,001",9.250000
Yeezy,"$1,002",10.000000
Yeezy,"$1,005",10.875000
Yeezy,"$1,009",10.500000
...,...,...
Off-White,$995,9.552632
Off-White,$996,8.750000
Off-White,$997,9.142857
Off-White,$998,10.000000


## 4. Data analysis (visualisation)

### 4.1 Q1: Analyze relation between shoe size and price

In [7]:
# Let's create a new data frame for this objective

sizedf = dataframe[["Sale Price", "Shoe Size"]].copy()
sizedf



Unnamed: 0,Sale Price,Shoe Size
0,"$1,097",11.0
1,$685,11.0
2,$690,11.0
3,"$1,075",11.5
4,$828,11.0
...,...,...
99951,$565,8.0
99952,$598,8.5
99953,$605,5.5
99954,$650,11.0


In [23]:
fig = px.scatter(sizedf, x = "Shoe Size", y = "Sale Price", title = "Shoe Size and Price Correlation")
fig.show()

This figure suggests some relationship between shoe size and sale price. Prices rise after size 7.5 and reach their maximum at size 8, then decrease up to size 11.5. Overall, sizes between 8 and 11 are worth more than other sizes.
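The relationship read off the figure could also be checked numerically. Note that `Sale Price` is stored as text such as "$1,097", so it has to be converted to numbers first. The sketch below uses six rows copied from the `head()` output shown earlier, so its results say nothing about the full data set; the names `toy_sizes`, `median_by_size` and `size_price_corr` are illustrative.

```python
import pandas as pd

# Six rows taken from the head() output, in the same layout as sizedf.
toy_sizes = pd.DataFrame({
    "Sale Price": ["$1,097", "$685", "$690", "$828", "$798", "$460"],
    "Shoe Size": [11.0, 11.0, 11.0, 11.0, 8.5, 10.0],
})

# Convert the price text to numbers before any numeric analysis.
toy_sizes["Sale Price"] = (
    toy_sizes["Sale Price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Median sale price for each shoe size.
median_by_size = toy_sizes.groupby("Shoe Size")["Sale Price"].median()

# Pearson correlation between size and price (a value between -1 and 1).
size_price_corr = toy_sizes["Shoe Size"].corr(toy_sizes["Sale Price"])

print(median_by_size)
print(size_price_corr)
```

A median per size is less sensitive to a few very expensive sales than a mean, which is why it is used here.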

### 4.2 Q2: Analyze relation between buyer region and brand

In [24]:
# Let's create a new data frame for this objective

branreg = dataframe[['Brand', 'Buyer Region', 'Sneaker Name']].groupby(['Brand', 'Buyer Region']).count()
branreg.reset_index(inplace=True)

# We need state abbreviations in order to create a scatter_geo figure

us_state_dict = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}
# Translate each region name to its two-letter abbreviation
# (map() leaves NaN for any name missing from us_state_dict)
branreg['State Abbrev'] = branreg['Buyer Region'].map(us_state_dict)
# Rename the count column so that it reads well on the map
branreg = branreg.rename(columns={"Sneaker Name": "Count"})
branreg

Unnamed: 0,Brand,Buyer Region,Count,State Abbrev
0,Yeezy,Alabama,375,AL
1,Yeezy,Alaska,41,AK
2,Yeezy,Arizona,1005,AZ
3,Yeezy,Arkansas,141,AR
4,Yeezy,California,13113,CA
...,...,...,...,...
97,Off-White,Virginia,605,VA
98,Off-White,Washington,501,WA
99,Off-White,West Virginia,25,WV
100,Off-White,Wisconsin,210,WI
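A region name that is missing from `us_state_dict` could not be placed on the map. A minimal sketch of how such names could be detected, assuming a shortened dictionary and a made-up region name ("Atlantis") purely for illustration:

```python
import pandas as pd

# Shortened, illustrative mapping; the notebook uses the full us_state_dict.
state_abbrev = {"California": "CA", "New York": "NY", "Oregon": "OR"}

# "Atlantis" is a made-up name that is deliberately absent from the mapping.
regions = pd.DataFrame({"Buyer Region": ["California", "New York", "Atlantis"]})

# map() returns NaN for names that are not keys of the dictionary.
regions["State Abbrev"] = regions["Buyer Region"].map(state_abbrev)

# Region names that could not be translated.
unmapped = regions.loc[regions["State Abbrev"].isna(), "Buyer Region"].tolist()
print(unmapped)
```

An empty `unmapped` list would mean that every buyer region has an abbreviation.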


In [11]:
fig = px.scatter_geo(branreg, locations="State Abbrev", color="Brand",
                     size="Count", projection="albers usa",
                     locationmode="USA-states", hover_data=["Count"])
fig.show()

This figure shows us that two regions have the most orders: New York and California. We can also see that Yeezy (Adidas) is the most popular sneaker brand in the USA among StockX orders.
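The ranking of regions and brands can also be computed from the counts instead of being read off the map. The sketch below uses only the nine rows visible in the truncated `branreg` output, so its result is not the ranking for the full data set; the names `excerpt`, `orders_by_region`, `top_regions` and `orders_by_brand` are illustrative.

```python
import pandas as pd

# The nine rows visible in the truncated branreg output above.
excerpt = pd.DataFrame({
    "Brand": ["Yeezy"] * 5 + ["Off-White"] * 4,
    "Buyer Region": ["Alabama", "Alaska", "Arizona", "Arkansas", "California",
                     "Virginia", "Washington", "West Virginia", "Wisconsin"],
    "Count": [375, 41, 1005, 141, 13113, 605, 501, 25, 210],
})

# Total orders per region across both brands, largest first.
orders_by_region = (
    excerpt.groupby("Buyer Region")["Count"].sum().sort_values(ascending=False)
)
top_regions = orders_by_region.head(2)

# Total orders per brand.
orders_by_brand = excerpt.groupby("Brand")["Count"].sum()

print(top_regions)
print(orders_by_brand)
```

Running the same two `groupby` calls on the full `branreg` frame would give the ranking behind the figure.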