# Business Problem Overview


## Zara: Maximising Gross Margin through Data-Driven Retail Strategy
Zara is among the world’s largest and most recognized fashion brands, with a global presence spanning many countries. The brand is renowned for its fast fashion approach, on-trend designs, and highly efficient supply chain. As Zara expands, its focus is shifting from simply boosting sales to enhancing profitability.

The company’s leadership has pinpointed **Gross Margin** as a crucial business metric. Gross Margin represents the gap between Zara’s revenue and the costs incurred to produce its products. By increasing Gross Margin, Zara can maintain affordable prices for customers while ensuring strong profits.

Zara sees several untapped opportunities to improve this margin. Areas to examine include:
- How products are priced
- Which customer groups are targeted
- Which stores perform best
- How discounts are managed
- How much each product category contributes to profit

**Key Metric:** Gross Margin
- $\text{Gross Margin} = \text{Revenue} - \text{Production Cost}$

Where:
- $\text{Revenue} = \text{Unit Price} \times \text{Quantity} \times (1 - \text{Discount})$
- $\text{Production Cost} = \text{Unit Production Cost} \times \text{Quantity}$
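
As a quick illustration of the formulas above, the sketch below computes Gross Margin for a few made-up transaction rows. The column names (`unit_price`, `quantity`, `discount`, `unit_production_cost`) and all values are assumptions for illustration, not the actual dataset schema, and `discount` is assumed to be a fraction between 0 and 1.

```python
import pandas as pd


def add_gross_margin(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with revenue, production_cost and gross_margin columns.

    Assumes hypothetical columns: unit_price, quantity, discount (0 to 1),
    and unit_production_cost.
    """
    out = df.copy()
    # Revenue = Unit Price x Quantity x (1 - Discount)
    out["revenue"] = out["unit_price"] * out["quantity"] * (1 - out["discount"])
    # Production Cost = Unit Production Cost x Quantity
    out["production_cost"] = out["unit_production_cost"] * out["quantity"]
    # Gross Margin = Revenue - Production Cost
    out["gross_margin"] = out["revenue"] - out["production_cost"]
    return out


# Made-up rows, for illustration only
sample = pd.DataFrame({
    "unit_price": [40.0, 20.0, 60.0],
    "quantity": [2, 1, 3],
    "discount": [0.0, 0.25, 0.5],
    "unit_production_cost": [15.0, 10.0, 35.0],
})

result = add_gross_margin(sample)
print(result[["revenue", "production_cost", "gross_margin"]])
```

In this toy data the third row has a negative margin: a 50% discount brings revenue down to 90 against a production cost of 105, which shows how a discount can push a sale below cost.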

### Objective
A data analyst at Zara works with the company’s global retail dataset to uncover insights that support maximizing Gross Margin across various markets, product lines, and customer segments. This involves:

- Breaking down Gross Margin as a business metric by examining it across different dimensions, such as geography, product categories, time periods, customer profiles, and store performance.
- Identifying which combinations of product types, customer groups, and locations are the most and least profitable.
- Investigating how operational factors, such as discount strategies, employee roles, or store sizes, affect margin performance.
- Analyzing trends over time, across regions, and among store types to generate actionable insights for pricing, product focus, and resource allocation.
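
The breakdowns described above could be sketched with a small grouping helper. This is only one possible approach. The function name `margin_by`, the column names (`country`, `category`, `revenue`, `gross_margin`), and the sample rows are hypothetical and do not come from the real dataset.

```python
import pandas as pd


def margin_by(df: pd.DataFrame, dimensions: list) -> pd.DataFrame:
    """Total revenue and gross margin per group, most profitable first.

    Assumes df already has 'revenue' and 'gross_margin' columns
    (hypothetical names) plus the grouping columns in `dimensions`.
    """
    summary = df.groupby(dimensions, as_index=False).agg(
        revenue=("revenue", "sum"),
        gross_margin=("gross_margin", "sum"),
    )
    # Margin as a share of revenue makes groups of different sizes comparable
    summary["margin_share"] = summary["gross_margin"] / summary["revenue"]
    return summary.sort_values("gross_margin", ascending=False).reset_index(drop=True)


# Made-up rows, for illustration only
sample = pd.DataFrame({
    "country": ["Spain", "Spain", "Spain", "France"],
    "category": ["Outerwear", "Outerwear", "Accessories", "Outerwear"],
    "revenue": [100.0, 50.0, 30.0, 80.0],
    "gross_margin": [40.0, 20.0, 5.0, -10.0],
})

by_market_and_category = margin_by(sample, ["country", "category"])
print(by_market_and_category)
```

Passing a different list of columns (for example a store identifier or a month column, if the data has one) reuses the same helper for other dimensions. The first and last rows of the result give the most and least profitable combinations.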

# Dataset Overview

# Analysis & Visualisation

## 1. Importing & Cleaning Data

The following code downloads the datasets from Google Drive and stores them in a directory called `Datasets`.

In [29]:
import os

import gdown
import pandas as pd

# (Google Drive file ID, dataset name) pairs
files = [('1gU90GdFLZOO5jPePOAVbNakNX5DCKgtQ', 'transactions'),
         ('1QoHCOAkfdKciP94CxUfw4xf9oZyXBqD6', 'stores'),
         ('1B-XBx4cHbYMCoY--P3s3Lmy9QVzouCyC', 'products'),
         ('1lZwCUHlwgX97-xcbQV8FEeQ2JLtJQa1s', 'employees'),
         ('1NJ0O1NJ20VeMzZYHiFBCrOIFnXKaggTc', 'discounts'),
         ('13-juheNtpYsXAjm_D0W7EDB86pAl1Pew', 'customers')]

# Create the target directory if it does not already exist
os.makedirs("Datasets", exist_ok=True)

for file_id, name in files:
    download_url = f"https://drive.google.com/uc?id={file_id}"
    download_file = f"Datasets/{name}.csv"
    gdown.download(download_url, download_file, quiet=False)


Downloading...
From (original): https://drive.google.com/uc?id=1gU90GdFLZOO5jPePOAVbNakNX5DCKgtQ
From (redirected): https://drive.google.com/uc?id=1gU90GdFLZOO5jPePOAVbNakNX5DCKgtQ&confirm=t&uuid=fc5b70b9-4bc2-4ed8-8a0c-65114faaa679
To: d:\NextLeap\Graduation Project - May 2025\Datasets\transactions.csv
100%|██████████| 805M/805M [03:36<00:00, 3.71MB/s] 
Downloading...
From: https://drive.google.com/uc?id=1QoHCOAkfdKciP94CxUfw4xf9oZyXBqD6
To: d:\NextLeap\Graduation Project - May 2025\Datasets\stores.csv
100%|██████████| 2.19k/2.19k [00:00<00:00, 5.27MB/s]
Downloading...
From: https://drive.google.com/uc?id=1B-XBx4cHbYMCoY--P3s3Lmy9QVzouCyC
To: d:\NextLeap\Graduation Project - May 2025\Datasets\products.csv
100%|██████████| 4.99M/4.99M [00:03<00:00, 1.41MB/s]
Downloading...
From: https://drive.google.com/uc?id=1lZwCUHlwgX97-xcbQV8FEeQ2JLtJQa1s
To: d:\NextLeap\Graduation Project - May 2025\Datasets\employees.csv
100%|██████████| 15.2k/15.2k [00:00<00:00, 480kB/s]
Downloading...
From: htt

In [None]:
# Reading the CSVs into dataframes
transactions = pd.read_csv("Datasets/transactions.csv")
stores = pd.read_csv("Datasets/stores.csv")
products = pd.read_csv("Datasets/products.csv")
employees = pd.read_csv("Datasets/employees.csv")
discounts = pd.read_csv("Datasets/discounts.csv")
customers = pd.read_csv("Datasets/customers.csv")
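
Before cleaning, it can help to get a quick overview of each table. The sketch below is one possible approach: `quality_report` is a hypothetical helper, and the demo frame with its column names is made up so the example runs without the downloaded files.

```python
import pandas as pd


def quality_report(frames: dict) -> pd.DataFrame:
    """Summarise size, missing cells and fully duplicated rows per table.

    `frames` maps a table name to its DataFrame.
    """
    rows = []
    for name, df in frames.items():
        rows.append({
            "table": name,
            "rows": len(df),
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
            "duplicate_rows": int(df.duplicated().sum()),
        })
    return pd.DataFrame(rows).set_index("table")


# Made-up frame, for illustration only; column names are assumptions
demo_stores = pd.DataFrame({
    "store_id": [1, 2, 2],
    "city": ["Madrid", "Paris", "Paris"],
    "size_sqm": [None, 500.0, 500.0],
})

report = quality_report({"stores": demo_stores})
print(report)
```

In the notebook, the same helper could be called on the dataframes read above, for example `quality_report({"transactions": transactions, "stores": stores})`, to decide which tables need deduplication or missing-value handling.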

# Key Findings & Recommendations