<style>
:root {
  --r-main-font: "Segoe UI", system-ui, -apple-system, Roboto, Arial, sans-serif !important;
  --r-heading-font: "Segoe UI", system-ui, -apple-system, Roboto, Arial, sans-serif !important;
  --r-code-font: Consolas, "Fira Code", monospace !important;

  /* base text size */
  --r-main-font-size: 26px;
}

.reveal h1 { font-size: 44px !important; line-height: 1.15; }
.reveal h2 { font-size: 30px !important; line-height: 1.20; }

</style>

# From retail to resale: how much are sneakers worth?
## Yeezy vs Off-White: an analysis of the StockX dataset (2017–2019)

**Chiara Corvino**  
IBML – Foundations of Data Science and Laboratory

<small>Data source: StockX Data Contest 2019</small>

<img src="logo.png" alt="University logo" style="position:absolute;right:18px;bottom:18px;height:110px">

<div style="position:relative; min-height:75vh; padding-bottom:40px; box-sizing:border-box;">

  <div style="display:flex; align-items:center; justify-content:space-between; gap:24px;">
    <div style="flex:1; max-width:58%;">
      <h2 style="margin:0 0 12px 0;">What is StockX?</h2>
      <ul style="margin:0; padding-left:1.2em;">
        <li>Sneaker and streetwear marketplace with stock-exchange-style pricing (live supply and demand).</li>
        <li>Items are authenticated before delivery.</li>
        <li>Price and sales history: shows the “real value” after launch.</li>
        <li>Ask and Bid: when a seller's asking price and a buyer's bid meet, the order closes.</li>
      </ul>
    </div>
    <div style="flex:0 0 auto;">
      <img src="stockx1.jpg" alt="StockX screenshot"
           style="width:300px; max-width:30vw; max-height:55vh; height:auto; object-fit:contain; border-radius:8px;">
    </div>
  </div>

  <div style="position:absolute; left:0; bottom:0;">
    Official dataset: <em>StockX Data Contest 2019</em> (Kaggle:
    <a href="https://www.kaggle.com/datasets/hudsonstuck/stockx-data-contest" target="_blank" rel="noopener noreferrer">
      https://www.kaggle.com/datasets/hudsonstuck/stockx-data-contest
    </a>)
  </div>
</div>

<!DOCTYPE html>
<html lang="it">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
</head>
<body style="margin:0;">


  <div style="display:flex; justify-content:flex-start; align-items:flex-start; gap:12px;">
    <img src="stockx2.jpg" alt="Screenshot 2"
         style="width:35%; max-height:500px; object-fit:contain; border-radius:6px;">
    <img src="stockx3.jpg" alt="Sales history"
         style="width:35%; max-height:500px; object-fit:contain; border-radius:6px; margin-left:auto;">
  </div>

  
  <div style="
    position:fixed;
    left:50%;
    top:50%;
    transform:translate(-50%,-50%);
    font-size:28px;
    font-weight:400;   
    line-height:1.25;
    text-align:center;
    white-space:normal; 
    pointer-events:none;
    z-index:9999;
    color:#000;
  ">
    Let's dig deeper...
  </div>

</body>
</html>

# Guiding questions

1) Which models account for most of the transactions?
2) How does demand (number of orders) evolve month by month?
3) Does the best-selling size affect the average price (per model)?

4) In which US states does Yeezy or Off-White prevail?
5) What is the relationship between retail price and sale price on the secondary market?

In [None]:
import matplotlib as mpl, matplotlib.pyplot as plt
import plotly.io as pio

FONT = "Arial"                       

plt.style.use('seaborn-v0_8-whitegrid')
mpl.rcParams['font.family'] = FONT   # Matplotlib

pio.templates.default = "seaborn"    # Plotly
pio.templates["seaborn"].layout.font.family = FONT
pio.templates["seaborn"].layout.title.font.family = FONT
pio.templates["seaborn"].layout.legend.font.family = FONT
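
Each analysis cell in this notebook re-reads the CSV and repeats the same cleaning steps (trimming `Brand`, parsing `Order Date`, stripping `$` and `,` from prices). A minimal sketch of a shared loader is shown here. The names `clean_stockx` and `load_stockx` are hypothetical and not part of the original notebook; the column names and date format are the ones used in the cells. The demo runs on two made-up rows rather than the real file.

```python
import pandas as pd


def clean_stockx(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: trimmed brand, parsed dates, numeric prices."""
    out = df.copy()
    out["Brand"] = out["Brand"].astype(str).str.strip()
    out["Order Date"] = pd.to_datetime(
        out["Order Date"], format="%m/%d/%y", errors="coerce"
    )
    for col in ["Sale Price", "Retail Price"]:
        # Remove currency symbols and thousands separators before converting
        out[col] = pd.to_numeric(
            out[col].astype(str).str.replace(r"[\$,]", "", regex=True),
            errors="coerce",
        )
    return out


def load_stockx(path: str = "StockX-Data-Contest-2019-3.csv") -> pd.DataFrame:
    """Read the CSV once and apply the shared cleaning (hypothetical helper)."""
    return clean_stockx(pd.read_csv(path))


# Made-up stand-in rows, only to exercise the cleaning logic
raw = pd.DataFrame({
    "Order Date": ["9/1/17", "2/13/19"],
    "Brand": [" Yeezy", "Off-White"],
    "Sale Price": ["$1,097", "$565"],
    "Retail Price": ["$220", "$160"],
})
clean = clean_stockx(raw)
print(clean.dtypes)
```

With such a helper, each cell could start from `df = load_stockx()` and keep only its own filtering logic.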

<h2 style="font-size:38px; line-height:1.2; margin:0 0 8px;">
Which models account for most of the transactions?
</h2>

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

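# Load the dataset and strip stray whitespace from the brand column
df = pd.read_csv('StockX-Data-Contest-2019-3.csv')
df['Brand'] = df['Brand'].astype(str).str.strip()
# Keep Yeezy rows plus any model whose name mentions Off-White
mask = (df['Brand'] == 'Yeezy') | df['Sneaker Name'].astype(str).str.contains('Off-White', case=False, na=False)
# Ten most-sold models, sorted ascending by transaction count
top10 = df.loc[mask, 'Sneaker Name'].value_counts().head(10).sort_values()

# Short labels: last three hyphen-separated tokens of each model name
labels = ['-'.join(n.split('-')[-3:]) for n in top10.index]
brands = ['Off-White' if 'Off-White' in n else 'Yeezy' for n in top10.index]
plot_df = pd.DataFrame({'Label': labels, 'Count': top10.values, 'Brand': brands})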

sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(10, 5))

sns.barplot(
    data=plot_df, y='Label', x='Count',
    order=labels[::-1], orient='h',                  
    hue='Brand', palette={'Yeezy':'steelblue','Off-White':'red'},
    dodge=False, legend=False, errorbar=None, ax=ax, zorder=3
)

# Grid, axes and title
ax.grid(True, linestyle='-', alpha=0.5, zorder=0)
ax.set_xlabel('Number of transactions')
ax.set_ylabel('Sneaker model')
ax.set_title('Top 10 best-selling models')
ax.set_xticks(range(0, int(top10.max()) + 1000, 1000))
ax.set_xlim(0, int(top10.max()) + 1500)
plt.setp(ax.get_yticklabels(), fontsize=9)

# Value labels on the bars
for bar in ax.patches:
    w = int(bar.get_width())
    ax.text(w + 50, bar.get_y() + bar.get_height()/2, f'{w:,}', va='center', fontsize=9, zorder=4)

# Legend (empty plots used only as colored handles)
ax.plot([], [], color='steelblue', label='Yeezy')
ax.plot([], [], color='red', label='Off-White')
ax.legend(title="Brand", loc='lower right', frameon=True, framealpha=1)

plt.show()
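
To put a number on "most of the transactions", one option is the share of all transactions covered by the top N models. The sketch below is illustrative only: `top_share` is a hypothetical helper and the model names are made up, so the printed figure says nothing about the real dataset.

```python
import pandas as pd


def top_share(names: pd.Series, n: int) -> float:
    """Fraction of all rows that belong to the n most frequent values."""
    counts = names.value_counts()
    return counts.head(n).sum() / counts.sum()


# Made-up transactions: model A sold 6 times, B 3 times, C once
models = pd.Series(["A"] * 6 + ["B"] * 3 + ["C"] * 1)
share_top2 = top_share(models, 2)
print(f"Top 2 models cover {share_top2:.0%} of transactions")
```

On the real data the same call would be `top_share(df.loc[mask, 'Sneaker Name'], 10)`, assuming `df` and `mask` are defined as in the cell above.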

<h2 style="font-size:38px; line-height:1.2; margin:0 0 8px;">
Monthly evolution of demand (number of orders)
</h2>

In [None]:
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("StockX-Data-Contest-2019-3.csv")
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y", errors="coerce")
df["Brand"] = df["Brand"].astype(str).str.strip()
df = df.dropna(subset=["Order Date","Brand"])
df = df[df["Brand"].isin(["Yeezy","Off-White"])]

# Monthly order counts per brand
g = (df.assign(Mese=df["Order Date"].dt.to_period("M").dt.to_timestamp())
       .groupby(["Mese","Brand"]).size().reset_index(name="Sales"))

# Plot
sns.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(13.5, 4.8), dpi=130)
sns.lineplot(data=g, x="Mese", y="Sales", hue="Brand",
             hue_order=["Yeezy","Off-White"], palette=["#3498db","#e74c3c"],
             marker="o", linewidth=2, ax=ax)

ax.set_xlabel("Month"); ax.set_ylabel("Number of orders")

# One x tick every 2 months
ticks = g["Mese"].drop_duplicates().sort_values()[::2]
ax.set_xticks(ticks)
ax.set_xticklabels([t.strftime("%Y-%m") for t in ticks])

ax.legend(title="Brand", loc="upper left", frameon=True, framealpha=1)
plt.tight_layout()
plt.show()
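
One caveat with `groupby(...).size()` is that it only emits month and brand pairs that actually occur, so a month with no orders for a brand is simply absent and a line plot would bridge the gap instead of dropping to zero. A minimal sketch of filling those gaps follows, on made-up order dates; whether the real dataset has such gaps is not checked here.

```python
import pandas as pd

# Made-up orders: no orders at all in March 2018
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2018-01-05", "2018-01-20", "2018-04-02", "2018-02-11"]
    ),
    "Brand": ["Yeezy", "Yeezy", "Yeezy", "Off-White"],
})

# Count orders per (month, brand); unstack fills missing pairs with 0
month = orders["Order Date"].dt.to_period("M")
counts = orders.groupby([month, "Brand"]).size().unstack(fill_value=0)

# Reindex on the full monthly range so fully empty months appear as 0 too
full_range = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
counts = counts.reindex(full_range, fill_value=0)
print(counts)
```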

<h2 style="font-size:34px; line-height:1.2; margin:0 0 8px;">
Does the best-selling size affect the average price?
</h2>

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("StockX-Data-Contest-2019-3.csv")
df["Shoe Size"]  = pd.to_numeric(df["Shoe Size"], errors="coerce")
df["Sale Price"] = pd.to_numeric(df["Sale Price"].astype(str).str.replace(r"[\$,]", "", regex=True), errors="coerce")
df["Brand"]      = df["Brand"].astype(str).str.strip()
df["Sneaker Name"] = df["Sneaker Name"].astype(str)
df = df.dropna(subset=["Shoe Size","Sale Price","Sneaker Name","Brand"])
df = df[df["Brand"].isin(["Yeezy","Off-White"])]
df = df[df["Shoe Size"].between(3,16)]

# Modal (most-sold) size and mean sale price per model
agg = (df.groupby(["Sneaker Name","Brand"])
         .agg(typ_size=("Shoe Size", lambda s: s.mode().iloc[0]),
              mean_price=("Sale Price","mean"))
         .reset_index())

# Horizontal jitter so overlapping points stay visible
np.random.seed(0)
agg["x_jitter"] = agg["typ_size"] + np.random.uniform(-0.12, 0.12, size=len(agg))

# Scatter
sns.set_theme(style="whitegrid")
ax = sns.scatterplot(data=agg, x="x_jitter", y="mean_price", hue="Brand",
                     palette={"Yeezy":"#3498db","Off-White":"#e74c3c"}, s=60)

ax.set_title("Best-selling size vs average price")
ax.set_xlabel("Size")
ax.set_ylabel("Average price ($)")

# Ticks at real shoe sizes (every 0.5)
lo = np.floor(agg["typ_size"].min()*2)/2
hi = np.ceil(agg["typ_size"].max()*2)/2
ticks = np.arange(lo, hi+0.001, 0.5)
ax.set_xticks(ticks)
ax.set_xticklabels([f"{t:.1f}" for t in ticks])

ax.legend(title="Brand", loc="upper right", frameon=True, framealpha=1)
plt.tight_layout()
plt.show()
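
The scatter plot can be complemented with a rank correlation between modal size and mean price. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, which avoids any dependency beyond pandas. The values are made up and `agg_demo` is a hypothetical stand-in for the `agg` frame built above.

```python
import pandas as pd

# Made-up per-model aggregates (stand-in for `agg`)
agg_demo = pd.DataFrame({
    "typ_size": [9.0, 9.5, 10.0, 10.5, 11.0],
    "mean_price": [420.0, 310.0, 650.0, 380.0, 500.0],
})

# Spearman's rho equals the Pearson correlation computed on the ranks
rho = agg_demo["typ_size"].rank().corr(agg_demo["mean_price"].rank())
print(f"Spearman rho = {rho:.2f}")
```

A value close to 0 would support the reading that size does not explain price; values near +1 or -1 would point to a monotonic relationship.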

<h2 style="font-size:34px; line-height:1.10; margin:0 0 4px;">
In which US states does Yeezy or Off-White prevail?
</h2>

In [None]:
import pandas as pd
import plotly.graph_objects as go
from IPython.display import HTML

df = pd.read_csv("StockX-Data-Contest-2019-3.csv")
df['Brand'] = df['Brand'].astype(str).str.strip()
df['Sneaker Name'] = df['Sneaker Name'].astype(str).str.strip()
df['Buyer Region'] = df['Buyer Region'].astype(str).str.strip()

state_to_abbr = {'ALABAMA':'AL','ALASKA':'AK','ARIZONA':'AZ','ARKANSAS':'AR','CALIFORNIA':'CA','COLORADO':'CO',
'CONNECTICUT':'CT','DELAWARE':'DE','FLORIDA':'FL','GEORGIA':'GA','HAWAII':'HI','IDAHO':'ID','ILLINOIS':'IL',
'INDIANA':'IN','IOWA':'IA','KANSAS':'KS','KENTUCKY':'KY','LOUISIANA':'LA','MAINE':'ME','MARYLAND':'MD',
'MASSACHUSETTS':'MA','MICHIGAN':'MI','MINNESOTA':'MN','MISSISSIPPI':'MS','MISSOURI':'MO','MONTANA':'MT',
'NEBRASKA':'NE','NEVADA':'NV','NEW HAMPSHIRE':'NH','NEW JERSEY':'NJ','NEW MEXICO':'NM','NEW YORK':'NY',
'NORTH CAROLINA':'NC','NORTH DAKOTA':'ND','OHIO':'OH','OKLAHOMA':'OK','OREGON':'OR','PENNSYLVANIA':'PA',
'RHODE ISLAND':'RI','SOUTH CAROLINA':'SC','SOUTH DAKOTA':'SD','TENNESSEE':'TN','TEXAS':'TX','UTAH':'UT',
'VERMONT':'VT','VIRGINIA':'VA','WASHINGTON':'WA','WEST VIRGINIA':'WV','WISCONSIN':'WI','WYOMING':'WY',
'DISTRICT OF COLUMBIA':'DC','WASHINGTON DC':'DC','WASHINGTON D.C.':'DC'}
abbr = df['Buyer Region'].str.upper().replace(state_to_abbr)
df['State'] = abbr.where(abbr.isin(set(state_to_abbr.values())))
df = df.dropna(subset=['State'])

# Off-White only
b = df[df['Brand'].str.lower()=='off-white']

# Best-selling Off-White model overall
target_model = b['Sneaker Name'].value_counts().idxmax()

# Winner (best-selling model) per state
winners = (b.groupby(['State','Sneaker Name']).size()
           .unstack(fill_value=0).idxmax(axis=1).rename('Winner').reset_index())

# States where the target model wins
target_states = winners.loc[winners['Winner']==target_model, 'State'].tolist()
other_states = [s for s in winners['State'].unique().tolist() if s not in target_states]

# Map
fig = go.Figure()
if other_states:
    fig.add_choropleth(locations=other_states, locationmode='USA-states', z=[0]*len(other_states),
                       colorscale=[[0,'#e6e6e6'],[1,'#e6e6e6']], showscale=False,
                       marker_line_color='white', marker_line_width=0.5,
                       hoverinfo='skip', name='', showlegend=False)
if target_states:
    fig.add_choropleth(locations=target_states, locationmode='USA-states', z=[1]*len(target_states),
                       colorscale=[[0,'#e74c3c'],[1,'#e74c3c']],  # red
                       showscale=False,
                       marker_line_color='white', marker_line_width=0.5,
                       hoverinfo='skip', name=target_model, showlegend=True)

fig.update_geos(scope='usa', projection_type='albers usa')
fig.update_layout(
    title='Prevailing Off-White model by state',
    height=560, width=900, legend_title_text='Winning model',
    legend=dict(itemclick=False, itemdoubleclick=False)
)
HTML(fig.to_html(include_plotlyjs='inline', full_html=False))

In [None]:
import pandas as pd
import plotly.graph_objects as go
from IPython.display import HTML

df = pd.read_csv("StockX-Data-Contest-2019-3.csv")
df['Brand'] = df['Brand'].astype(str).str.strip()
df['Sneaker Name'] = df['Sneaker Name'].astype(str).str.strip()
df['Buyer Region'] = df['Buyer Region'].astype(str).str.strip()

state_to_abbr = {'ALABAMA':'AL','ALASKA':'AK','ARIZONA':'AZ','ARKANSAS':'AR','CALIFORNIA':'CA',
'COLORADO':'CO','CONNECTICUT':'CT','DELAWARE':'DE','FLORIDA':'FL','GEORGIA':'GA','HAWAII':'HI','IDAHO':'ID',
'ILLINOIS':'IL','INDIANA':'IN','IOWA':'IA','KANSAS':'KS','KENTUCKY':'KY','LOUISIANA':'LA','MAINE':'ME',
'MARYLAND':'MD','MASSACHUSETTS':'MA','MICHIGAN':'MI','MINNESOTA':'MN','MISSISSIPPI':'MS','MISSOURI':'MO',
'MONTANA':'MT','NEBRASKA':'NE','NEVADA':'NV','NEW HAMPSHIRE':'NH','NEW JERSEY':'NJ','NEW MEXICO':'NM',
'NEW YORK':'NY','NORTH CAROLINA':'NC','NORTH DAKOTA':'ND','OHIO':'OH','OKLAHOMA':'OK','OREGON':'OR',
'PENNSYLVANIA':'PA','RHODE ISLAND':'RI','SOUTH CAROLINA':'SC','SOUTH DAKOTA':'SD','TENNESSEE':'TN',
'TEXAS':'TX','UTAH':'UT','VERMONT':'VT','VIRGINIA':'VA','WASHINGTON':'WA','WEST VIRGINIA':'WV',
'WISCONSIN':'WI','WYOMING':'WY','DISTRICT OF COLUMBIA':'DC','WASHINGTON DC':'DC','WASHINGTON D.C.':'DC'}
abbr = df['Buyer Region'].str.upper().replace(state_to_abbr)
df['State'] = abbr.where(abbr.isin(set(state_to_abbr.values())))
df = df.dropna(subset=['State'])

# Yeezy only
b = df[df['Brand'].str.lower()=='yeezy']

# Best-selling Yeezy model overall
target_model = b['Sneaker Name'].value_counts().idxmax()

# Winner (best-selling model) per state
winners = (b.groupby(['State','Sneaker Name']).size()
           .unstack(fill_value=0).idxmax(axis=1).rename('Winner').reset_index())

# States where the target model wins
target_states = winners.loc[winners['Winner']==target_model, 'State'].tolist()
other_states = [s for s in winners['State'].unique().tolist() if s not in target_states]

# Map
fig = go.Figure()
if other_states:
    fig.add_choropleth(locations=other_states, locationmode='USA-states', z=[0]*len(other_states),
                       colorscale=[[0,'#e6e6e6'],[1,'#e6e6e6']], showscale=False,
                       marker_line_color='white', marker_line_width=0.5,
                       hoverinfo='skip', name='', showlegend=False)
if target_states:
    fig.add_choropleth(locations=target_states, locationmode='USA-states', z=[1]*len(target_states),
                       colorscale=[[0,'#3498db'],[1,'#3498db']], showscale=False,
                       marker_line_color='white', marker_line_width=0.5,
                       hoverinfo='skip', name=target_model, showlegend=True)

fig.update_geos(scope='usa', projection_type='albers usa')
fig.update_layout(title='Prevailing Yeezy model by state',
                  height=560, width=900, legend_title_text='Winning model',
                  legend=dict(itemclick=False, itemdoubleclick=False))
HTML(fig.to_html(include_plotlyjs='inline', full_html=False))
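
The two maps are built separately for each brand: each one highlights the states where that brand's overall best-selling model is also the local best seller. To compare the brands directly per state, one possible approach is to relate each brand's share of in-state orders to its national share, so that a brand with more transactions overall does not win everywhere by default. This is a sketch on made-up rows, not the method used in the notebook; `lift` and `over_indexed` are hypothetical names.

```python
import pandas as pd

# Made-up orders by state and brand
sales = pd.DataFrame({
    "State": ["CA", "CA", "CA", "CA", "NY", "NY", "NY", "NY"],
    "Brand": ["Yeezy", "Yeezy", "Yeezy", "Off-White",
              "Yeezy", "Yeezy", "Off-White", "Off-White"],
})

# Orders per state (rows) and brand (columns)
counts = sales.groupby(["State", "Brand"]).size().unstack(fill_value=0)

# Brand share within each state, and brand share over all orders
state_share = counts.div(counts.sum(axis=1), axis=0)
national_share = counts.sum(axis=0) / counts.values.sum()

# Lift > 1 means the brand is over-represented in that state
lift = state_share / national_share
over_indexed = lift.idxmax(axis=1)
print(over_indexed)
```

The resulting Series maps each state to the brand that is most over-represented there and could feed a single two-color choropleth.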

<h2 style="font-size:34px; line-height:1.10; margin:0 0 4px;">
How does the resale price vary relative to the retail price on the secondary market?
</h2>

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid")

df = pd.read_csv("StockX-Data-Contest-2019-3.csv")

for c in ['Sale Price','Retail Price']:
    df[c] = pd.to_numeric(df[c].astype(str).str.replace(r'[\$,]', '', regex=True), errors='coerce')

df['Brand'] = (df['Brand'].astype(str)
               .str.strip()
               .str.replace(r'(?i)off[\s-]?white','Off-White',regex=True)
               .str.replace(r'\s+',' ',regex=True))

# .copy() so the column assignment below acts on an independent frame
df2 = df[df['Brand'].isin(['Yeezy','Off-White'])].dropna(subset=['Retail Price','Sale Price']).copy()

# Cast the retail price to string (treated as categories)
df2['Retail_str'] = df2['Retail Price'].astype(int).astype(str)

# Retail prices in ascending order (as strings)
order = ['130','150','160','170','190','200','220','250']

# Boxplot
plt.figure(figsize=(10,6), dpi=130)
ax = sns.boxplot(
    data=df2,
    x='Retail_str',      
    y='Sale Price',      
    hue='Brand',         
    order=order,         
    palette={'Yeezy':'tab:blue','Off-White':'tab:red'}
)

plt.yscale('log')
plt.ylim(100, 10000)

# Grey vertical lines separating the retail-price groups
for i in range(len(order) - 1):
    ax.axvline((i+1) - 0.5, color="lightgrey", linestyle="--", linewidth=1)

plt.xlabel('Retail price (USD)')
plt.ylabel('Sale price (USD)')
plt.legend(title='Brand', loc='upper right', frameon=True, framealpha=1)
plt.tight_layout()
plt.show()
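
The gap between resale and retail can also be summarized as a relative premium, (sale price / retail price) - 1, aggregated per brand. The sketch below uses made-up prices and a hypothetical `Premium` column; it illustrates the calculation only and does not reproduce figures from the dataset.

```python
import pandas as pd

# Made-up transactions
prices = pd.DataFrame({
    "Brand": ["Yeezy", "Yeezy", "Off-White", "Off-White"],
    "Retail Price": [220, 220, 160, 190],
    "Sale Price": [330, 242, 480, 760],
})

# Relative premium over retail: 0.5 means the shoe resold at 50% above retail
prices["Premium"] = prices["Sale Price"] / prices["Retail Price"] - 1

# Median premium per brand (the median is less sensitive to extreme resale prices)
median_premium = prices.groupby("Brand")["Premium"].median()
print(median_premium)
```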

## Conclusions

1. **Models:** transactions are concentrated mostly on Yeezy (especially the 350 V2), with only a few Off-White models consistently present.

2. **Trend over time:** demand peaks in the months of the drops, with the maximum at the end of 2018; Off-White is strongest in spring 2018.

3. **Sizes:** demand is concentrated on sizes 9–11; there is no clear link between a larger modal size and the average price, so size does not explain price.

4. **US geography:** the best-selling Off-White model is also the top Off-White seller in 43 states, while the best-selling Yeezy model leads its brand in 17; the two maps are computed separately per brand, so these counts do not compare the brands directly.

5. **Retail vs resale:** resale prices are almost always above retail; Off-White shows larger gaps from retail than Yeezy in every price band.