# Global Sports Footwear Sales (2018-2026)

---------------------------------------------------------------------------------------------------------------------------

### Project Description:

    This project focuses on the comprehensive analysis of a global sports footwear sales dataset spanning nine years (2018–2026). It encompasses transactional data from major brands such as Nike, ASICS, Adidas, Puma, Reebok, and New Balance. The project explores sales performance across various footwear categories (Running, Basketball, Lifestyle, etc.), regional markets (USA, UK, India, UAE, etc.), and customer demographics to understand market trends and consumer purchasing patterns.

### Project Goal:

    The primary goal of this project is to leverage data analytics to evaluate global sales performance, identify key growth drivers, and provide actionable insights for optimizing inventory, pricing, and marketing strategies for a global footwear retailer or manufacturer.

### Objectives:

    ~ Trend Analysis: To track revenue and sales volume fluctuations from 2018 to 2026.
    ~ Brand Performance: To identify which brands are leaders in revenue generation and customer satisfaction.
    ~ Product Segmentation: To determine which footwear categories and colors are most popular among different genders and income levels.
    ~ Channel Evaluation: To compare the effectiveness of Online vs. Retail Store sales channels across different regions.
    ~ Pricing Strategy: To assess the impact of discounts on units sold and overall revenue.

### Business Questions:

    1. What is the total revenue generated by each brand to date?
    2. How does each brand's revenue growth compare over the years?
    3. Which is the highest-selling category for each brand?
    4. What percentage of revenue does each sales channel contribute?
    5. Which countries favor each brand the most?
    6. Which product categories sell the most and require better inventory management?
    7. Which gender shows the most traction for each brand?
    8. What is the average customer rating, as an indicator of product quality, for each brand?
    9. Which shoe sizes should be kept well stocked to meet peak market demand?

### Tools:
    Python (NumPy + pandas + Matplotlib)

---------------------------------------------------------------------------------------------------------------------------

## Data Collection & Loading
    Objective: To load the dataset into the notebook using pandas.
    
    Observation:
         ~ Successfully loaded the entire dataset.

In [None]:
# Data Collection & Loading----------------------------------------------------
import pandas as pd 

df = pd.read_csv('global_sports_footwear_sales_2018_2026.csv')
print("File Successfully Read!")
print(df.columns)
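
As a minimal sketch, the load step can also parse `order_date` while reading and confirm that the columns used later in the notebook are present. The inline CSV below is made-up sample data standing in for the real file, and `REQUIRED_COLUMNS` and `sample_df` are hypothetical names that do not appear in the original notebook.

```python
import io
import pandas as pd

# Made-up rows standing in for global_sports_footwear_sales_2018_2026.csv
sample_csv = io.StringIO(
    "order_date,brand,category,revenue_usd,units_sold\n"
    "2018-03-14,Nike,Running,240.0,2\n"
    "2019-07-02,Puma,Lifestyle,95.5,1\n"
)

# Hypothetical subset of the columns the later cells rely on
REQUIRED_COLUMNS = {"order_date", "brand", "category", "revenue_usd", "units_sold"}

# parse_dates converts order_date to datetime64 during the read
sample_df = pd.read_csv(sample_csv, parse_dates=["order_date"])

# Fail early if an expected column is absent
missing = REQUIRED_COLUMNS - set(sample_df.columns)
if missing:
    raise ValueError(f"Missing expected columns: {sorted(missing)}")

print(sample_df.dtypes)
```

Checking the columns up front surfaces a renamed or missing field at load time instead of as a `KeyError` in a later cell.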

---------------------------------------------------------------------------------------------------------------------------

## Data Understanding
    Objective: To become familiar with the data before any analysis.
   
    Observation :
       ~ The data covers major global footwear brands from 2018 to 2026.
       ~ The data contains 30,000 rows × 19 columns.
       ~ No null values present in the data.

In [None]:
# Data Understanding-----------------------------------
print(df.head(10))
print(f"Shape of the data is {df.shape}")
df.info()  # info() prints its own summary and returns None, so it is not wrapped in print()
print(df.describe())
print(df.dtypes)

---------------------------------------------------------------------------------------------------------------------------

## Data Cleaning
    Objective: To handle missing values, duplicates, and outliers as required.

    Observation:
         ~ No null values present in the dataset.
         ~ No duplicate rows present in the dataset.
         ~ Successfully converted the order_date column to datetime and the discount_percent column to float.

In [None]:
# Data Cleaning----------------------------------------------------
print("Null values present in each column:\n",df.isnull().sum())
print('Duplicate values are:', df.duplicated().sum())
df['order_date'] = pd.to_datetime(df['order_date'])
df['discount_percent'] = df['discount_percent'].astype(float)
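
The objective above mentions outliers, but the cell only checks nulls and duplicates. One common way to flag outliers is the 1.5 × IQR rule, sketched below on made-up revenue values. The choice of the IQR rule, the `sample_df` frame, and the threshold of 1.5 are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Made-up revenue values; the last one is deliberately extreme
sample_df = pd.DataFrame({"revenue_usd": [100.0, 110.0, 120.0, 130.0, 140.0, 5000.0]})

# Interquartile range of the column
q1 = sample_df["revenue_usd"].quantile(0.25)
q3 = sample_df["revenue_usd"].quantile(0.75)
iqr = q3 - q1

# Values outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are flagged
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sample_df[
    (sample_df["revenue_usd"] < lower) | (sample_df["revenue_usd"] > upper)
]
print(f"Rows flagged as outliers: {len(outliers)}")
```

Flagged rows are worth inspecting before removal, since a large order can be a legitimate sale and not a data error.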


---------------------------------------------------------------------------------------------------------------------------

## Data Transformation & Aggregation
    Objective: To compute the aggregates that feed the data visualizations.

    Observation:
         ~ Computed total revenue for each brand.
         ~ Computed revenue by category, year, sales channel, and country.
         ~ Each snippet produced valid output.

In [None]:
#Revenue by each brand
res = df.groupby('brand')['revenue_usd'].sum()
print(f"Revenue each brand: \n{res}")
#Highest selling category
res = df.groupby('category')['revenue_usd'].sum()
print(f"Category wise Revenue: \n{res}")
#Revenue by each brand per year
df['order_year'] = df['order_date'].dt.year
res = df.groupby(['brand','order_year'])['revenue_usd'].sum()
print(f"Revenue each Brand per Year: \n{res}")
#Revenue by each brand per sales channel
res = df.groupby(['brand','sales_channel'])['revenue_usd'].sum()
print(f"Revenue per sales channel for each brand: \n{res}")
#Revenue in different country by each brand
res = df.groupby(['brand','country'])['revenue_usd'].sum()
print(f"Revenue generation from each country: \n{res}")
#Revenue for each category in each brand
res = (
    df.groupby(['brand', 'category'], as_index=False)['revenue_usd']
      .sum()
      .sort_values(['brand', 'revenue_usd'], ascending=[True, False])
      .groupby('brand')
      .head(2)
)
print(f"Top 2 Highest selling category for each brand:\n{res}")
#Units sold by gender for each brand
res = df.groupby(['brand','gender'])['units_sold'].sum()
print(f"Total Units Sold per Gender: \n{res}")
#Mean customer rating for each brand
res = df.groupby('brand')['customer_rating'].mean().round(2)
print(f"Average Customer Rating per Brand: \n{res}")
#Units sold per shoe size for each brand, highest first within each brand
res = df.groupby(['brand','size'], as_index=False)['units_sold'].sum().sort_values(by=['brand','units_sold'], ascending=[True, False])
print(f"Shoe sizes ranked by units sold for each brand: \n{res}")
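
The revenue growth comparison in the business questions can be expressed as a year-over-year percentage change per brand. The sketch below uses made-up yearly totals. `sample_df` and `yoy_growth` are hypothetical names, and the column names mirror those used in the cells above.

```python
import pandas as pd

# Made-up yearly revenue for two brands
sample_df = pd.DataFrame({
    "brand": ["Nike", "Nike", "Nike", "Puma", "Puma", "Puma"],
    "order_year": [2018, 2019, 2020, 2018, 2019, 2020],
    "revenue_usd": [100.0, 150.0, 120.0, 200.0, 220.0, 330.0],
})

# Total revenue per brand and year, ordered so years are consecutive within a brand
yearly = (
    sample_df.groupby(["brand", "order_year"])["revenue_usd"]
    .sum()
    .sort_index()
)

# Percentage change within each brand; the first year of a brand has no prior year, so it is NaN
yoy_growth = yearly.groupby(level="brand").pct_change().mul(100).round(1)
print(yoy_growth)
```

Grouping by brand before calling `pct_change` keeps the first year of one brand from being compared against the last year of the previous brand.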

---------------------------------------------------------------------------------------------------------------------------

## Data Visualization
    Objective: To visualize the insights for a better understanding of each brand's sales data.

    Observation:
        ~ Generated graphs and charts comparing insights across brands.
        ~ Used line, box, pie, and bar charts.
        ~ The charts give a clear view of the values behind each insight.

In [None]:
import matplotlib.pyplot as plt 

In [None]:
# Revenue per Brand
res = df.groupby('brand')['revenue_usd'].sum()
plt.figure(figsize=(14,5))
plt.bar(res.index,res.values, width=0.6, color = ['pink','skyblue','lightgreen','yellow','grey','blue'])
plt.xlabel('Brand')
plt.ylabel('Total Revenue (2018-2026)')
plt.grid(True, linestyle = ":")
plt.title('Total Revenue (2018-2026) per Brand')
plt.show()

In [None]:
#Top 2 selling category in each brand
res = (
    df.groupby(['brand', 'category'], as_index=False)['revenue_usd']
      .sum()
      .sort_values(['brand', 'revenue_usd'], ascending=[True, False])
      .groupby('brand')
      .head(2)
)
pivot_df = res.pivot(index='brand', columns='category', values='revenue_usd')
pivot_df.plot(kind='bar', figsize=(14, 5), width = 0.5, align = 'center',color = ['pink','skyblue','lightgreen','yellow','grey','blue'])
plt.xlabel("Brand")
plt.ylabel("Revenue (USD)")
plt.grid(True, linestyle = "--")
plt.title("Top 2 selling categories in each brand")
plt.show()

In [None]:
#Revenue share of each sales channel per brand
brands = ['ASICS', 'Reebok', 'Puma', 'Nike', 'Adidas', 'New Balance']

fig, ax = plt.subplots(2, 3, figsize=(14, 6))
ax = ax.flatten()

for i, brand in enumerate(brands):
    brand_df = df[df['brand'] == brand]
    res = brand_df.groupby('sales_channel')['revenue_usd'].sum()

    ax[i].pie(
        res.values,
        labels=res.index,
        autopct='%1.1f%%',
        startangle=60,
        colors=['skyblue', 'pink']
    )
    ax[i].set_title(brand)
plt.suptitle('Revenue per Sales Channel')
plt.tight_layout()
plt.show()
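
The percentages shown on the pie charts can also be computed as a table, which is easier to compare across brands. The sketch below uses made-up revenue figures. `sample_df` and `channel_share` are hypothetical names, and the channel labels follow the Online vs. Retail Store split named in the objectives.

```python
import pandas as pd

# Made-up revenue per order for two brands and two channels
sample_df = pd.DataFrame({
    "brand": ["Nike", "Nike", "Nike", "Puma", "Puma"],
    "sales_channel": ["Online", "Retail Store", "Online", "Online", "Retail Store"],
    "revenue_usd": [300.0, 100.0, 100.0, 50.0, 150.0],
})

# Revenue per brand and channel
channel_revenue = sample_df.groupby(["brand", "sales_channel"])["revenue_usd"].sum()

# Each brand's total, aligned back to every (brand, channel) row
brand_total = channel_revenue.groupby(level="brand").transform("sum")

# Share of each channel within its brand, in percent
channel_share = (channel_revenue / brand_total * 100).round(1)
print(channel_share)
```

Using `transform("sum")` keeps the totals on the same index as the per-channel revenue, so the division lines up row by row without a merge.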

In [None]:
#Revenue by each brand per year
res = df.groupby(['brand','order_year'], as_index = False)['revenue_usd'].sum()

pivot_df = res.pivot(index='order_year', columns='brand', values='revenue_usd')
pivot_df.plot(kind='line', figsize=(14, 5),color = ['pink','skyblue','lightgreen','red','grey','blue'], marker = "o")
plt.xlabel("Year")
plt.ylabel("Revenue (USD)")
plt.grid(True, linestyle = "--")
plt.legend(loc = 'upper right')
plt.title("Revenue by each brand per year")
plt.show()

In [None]:
#Revenue in different country by each brand
res = df.groupby(['brand','country'], as_index = False)['revenue_usd'].sum()

pivot_df = res.pivot(index='country', columns='brand', values='revenue_usd')
pivot_df.plot(kind='bar', figsize=(14, 5),color = ['pink','skyblue','lightgreen','yellow','grey','blue'])
plt.xlabel("Country")
plt.ylabel("Revenue (USD)")
plt.grid(True, linestyle = "--")
plt.legend(loc = 'upper right')
plt.title("Revenue by each brand in different countries")
plt.show()

In [None]:
#Revenue for each category in each brand
res = (
    df.groupby(['brand', 'category'], as_index=False)['revenue_usd']
      .sum()
      .sort_values(['brand', 'revenue_usd'], ascending=[True, False])
)
pivot_df = res.pivot(index='brand', columns='category', values='revenue_usd')
pivot_df.plot(kind='bar', figsize=(14, 5), width = 0.5, align = 'center',color = ['pink','skyblue','lightgreen','yellow','grey','blue'])
plt.xlabel("Brand")
plt.ylabel("Revenue (USD)")
plt.grid(True, linestyle = "--")
plt.legend(loc = 'upper right')
plt.title("Revenue for each category in each brand")
plt.show()

In [None]:
#Quantity sold by different gender each brand
brands = ['ASICS', 'Reebok', 'Puma', 'Nike', 'Adidas', 'New Balance']

fig, ax = plt.subplots(2, 3, figsize=(14, 6))
ax = ax.flatten()

for i, brand in enumerate(brands):
    brand_df = df[df['brand'] == brand]
    res = brand_df.groupby('gender')['units_sold'].sum()

    ax[i].pie(
        res.values,
        labels=res.index,
        autopct='%1.1f%%',
        startangle=60,
        colors=['skyblue', 'pink', 'royalblue']
    )
    ax[i].set_title(brand)
plt.suptitle('Gender-wise sales')
plt.tight_layout()
plt.show()

In [None]:
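#Customer rating distribution for each brand
# figsize is passed to boxplot directly: with `by`, pandas draws on a figure of its own,
# so a separate plt.figure() call would only leave an empty figure behind.
df.boxplot(column='customer_rating', by='brand', vert=False, notch=True, patch_artist=True,
           boxprops=dict(facecolor='royalblue'), figsize=(10, 5))
plt.suptitle('')  # drop the automatic "Boxplot grouped by brand" super-title
plt.title('Customer Rating Distribution by Brand')
plt.grid(False)
# With vert=False the ratings run along the x-axis and the brands along the y-axis
plt.xlabel('Customer Rating')
plt.ylabel('Brand')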
plt.show()

In [None]:
#Highest selling shoe size each brand
res = df.groupby(['brand','size'], as_index= False)['units_sold'].sum().sort_values(by = ['brand','units_sold'], ascending= False)
pivot_df = res.pivot(index='brand', columns='size', values='units_sold')
pivot_df.plot(kind='bar', figsize=(14, 5),color = ['pink','skyblue','lightgreen','yellow','grey','blue'])
plt.xlabel("Brand")
plt.ylabel("Units Sold")
plt.grid(True, linestyle = "--")
plt.legend(loc = 'upper right')
plt.title("Units sold by shoe size for each brand")
plt.show()
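
The Pricing Strategy objective (impact of discounts on units sold and revenue) is not covered by the cells above. One possible starting point, sketched here on made-up data, is to bucket `discount_percent` into bands and compare average units sold and total revenue per band. The band edges, `sample_df`, and `discount_band` are assumptions for illustration only.

```python
import pandas as pd

# Made-up orders with increasing discounts
sample_df = pd.DataFrame({
    "discount_percent": [0.0, 5.0, 12.0, 18.0, 25.0, 30.0],
    "units_sold": [1, 2, 2, 3, 4, 5],
    "revenue_usd": [120.0, 228.0, 211.2, 295.2, 360.0, 420.0],
})

# Hypothetical band edges; adjust to the discount range actually present in the data
bins = [0, 10, 20, 30]
labels = ["0-10%", "10-20%", "20-30%"]
sample_df["discount_band"] = pd.cut(
    sample_df["discount_percent"], bins=bins, labels=labels, include_lowest=True
)

# Average units per order and total revenue within each discount band
summary = sample_df.groupby("discount_band", observed=True).agg(
    avg_units=("units_sold", "mean"),
    total_revenue=("revenue_usd", "sum"),
)
print(summary)
```

A per-band summary like this only shows association. Whether discounts actually drive volume would need further checks, for example comparing within the same brand and category.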


---------------------------------------------------------------------------------------------------------------------------