# Business Problem Overview


## Zara: Maximising Gross Margin through Data-Driven Retail Strategy
Zara is among the world’s largest and most recognized fashion brands, with a global presence spanning many countries. The brand is renowned for its fast fashion approach, on-trend designs, and highly efficient supply chain. As Zara expands, its focus is shifting from simply boosting sales to enhancing profitability.

The company’s leadership has pinpointed **Gross Margin** as a crucial business metric. Gross Margin represents the gap between Zara’s revenue and the costs incurred to produce its products. By increasing Gross Margin, Zara can maintain affordable prices for customers while ensuring strong profits.

Zara sees several untapped opportunities to improve this margin. These include:
- How products are priced
- Which customer groups are targeted
- Which stores perform best
- How discounts are managed
- How much profit each product category contributes

**Key Metric:** Gross Margin
- $\text{Gross Margin} = \text{Revenue} - \text{Production Cost}$

Where:
- $\text{Revenue} = \text{Unit Price} \times \text{Quantity} \times (1 - \text{Discount})$
- $\text{Production Cost} = \text{Unit Production Cost} \times \text{Quantity}$
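
As a quick worked example of the formulas above, the sketch below computes Gross Margin for a single transaction line. The function name `gross_margin`, its parameter names, and the sample figures are illustrative assumptions, not fields taken from the Zara dataset. `discount` is assumed to be a fraction between 0 and 1.

```python
def gross_margin(unit_price, quantity, discount, unit_production_cost):
    """Return (revenue, production_cost, gross_margin) for one transaction line.

    Assumes `discount` is a fraction between 0 and 1 (e.g. 0.25 for 25% off).
    """
    revenue = unit_price * quantity * (1 - discount)
    production_cost = unit_production_cost * quantity
    return revenue, production_cost, revenue - production_cost


# Hypothetical figures: 3 units priced at 40.00 with 25% off,
# each costing 15.00 to produce
revenue, cost, margin = gross_margin(40.0, 3, 0.25, 15.0)
print(revenue, cost, margin)  # 90.0 45.0 45.0
```

In this example the discount reduces revenue from 120.00 to 90.00 while the production cost stays at 45.00, so the whole discount comes out of the margin.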

### Objective
A data analyst at Zara works with the company’s global retail dataset to uncover insights that support maximizing Gross Margin across various markets, product lines, and customer segments. This involves:

- Breaking down Gross Margin as a business metric by examining it across different dimensions, such as geography, product categories, time periods, customer profiles, and store performance.
- Identifying which combinations of product types, customer groups, and locations are the most and least profitable.
- Investigating how operational factors, such as discount strategies, employee roles, or store sizes, affect margin performance.
- Analyzing trends over time, across regions, and among store types to generate actionable insights for pricing, product focus, and resource allocation.
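
One way such a breakdown might look in pandas is sketched below. The DataFrame `sales`, its column names (`Country`, `Category`, `UnitPrice`, `Quantity`, `Discount`, `UnitProductionCost`), and every value in it are hypothetical placeholders. The real analysis would first join the Transactions, Products, and Stores tables to obtain comparable columns.

```python
import pandas as pd

# Hypothetical, hand-made rows used only to illustrate the breakdown
sales = pd.DataFrame({
    "Country": ["Spain", "Spain", "France", "France"],
    "Category": ["Feminine", "Masculine", "Feminine", "Feminine"],
    "UnitPrice": [40.0, 20.0, 60.0, 60.0],
    "Quantity": [3, 2, 1, 2],
    "Discount": [0.25, 0.0, 0.5, 0.0],
    "UnitProductionCost": [15.0, 8.0, 25.0, 25.0],
})

# Apply the Gross Margin definition row by row
sales["Revenue"] = sales["UnitPrice"] * sales["Quantity"] * (1 - sales["Discount"])
sales["ProductionCost"] = sales["UnitProductionCost"] * sales["Quantity"]
sales["GrossMargin"] = sales["Revenue"] - sales["ProductionCost"]

# Total Gross Margin per (Country, Category), most profitable first
margin_by_segment = (
    sales.groupby(["Country", "Category"], as_index=False)["GrossMargin"]
    .sum()
    .sort_values("GrossMargin", ascending=False)
    .reset_index(drop=True)
)
print(margin_by_segment)
```

Swapping the grouping keys for other dimensions (for example a store identifier or a transaction month) would give the corresponding breakdowns along those axes.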

# Dataset Overview

- **Dataset Name** : Zara Retail Dataset
- **Number of Tables** : 6
- **List of Tables**:
    - Customers
    - Discounts
    - Employees
    - Products
    - Stores
    - Transactions

## Table Overviews

### 1. Customers

- **Table Name** : Customers
- **Number of Rows** : 1643306
- **Number of Columns** : 9
- **Description** : This table gives details of Zara's customers including contact information, location and job title.

### 2. Discounts

- **Table Name** : Discounts
- **Number of Rows** : 181
- **Number of Columns** : 6
- **Description** : This table lists Zara's discount campaigns, including their start and end dates, discount rates, and the product categories they apply to.

## Column Definitions

### 1. Customers

- **CustomerID**
    - *Description*: Unique number assigned to each customer.
    - *Example*: `1` refers to the customer Tyler Garcia.
- **Name**
    - *Description*: Full name of the customer. May include titles like "Mr." or job-related suffixes.
    - *Example*: `Tyler Garcia`
- **Email**
    - *Description*: Anonymized email address using fake domains (like fake_gmail.com).
    - *Example*: `tyler.garcia@fake_gmail.com`
- **Telephone**
    - *Description*: Customer’s phone number. Formats may vary, and may include country codes or extensions.
    - *Example*: `922.970.2265x47563`
- **City**
    - *Description*: City where the customer is located.
    - *Example*: `New York`
- **Country**
    - *Description*: Country where the customer resides.
    - *Example*: `United States`
- **Gender**
    - *Description*: Customer's gender. Values can be F (Female), M (Male), or D (Diverse).
    - *Example*: `M`
- **DateOfBirth**
    - *Description*: Customer’s date of birth in YYYY-MM-DD format.
    - *Example*: `1968-12-18`
- **JobTitle**
    - *Description*: Customer’s occupation. May be blank or contain multiple job roles.
    - *Example*: `Restaurant manager`
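
The column definitions above suggest a few basic preparation steps, sketched below on a tiny hand-made sample. The first row reuses the documented example values, the second row is made up, and the names `customers_sample`, `gender_labels`, and `GenderLabel` are illustrative. The sketch assumes blank `JobTitle` entries are read as missing values, which is how `pd.read_csv` treats empty fields by default.

```python
import pandas as pd

# Illustrative rows only: row 1 mirrors the documented examples,
# row 2 is invented to show a missing JobTitle
customers_sample = pd.DataFrame({
    "CustomerID": [1, 2],
    "Gender": ["M", "D"],
    "DateOfBirth": ["1968-12-18", "1990-07-04"],
    "JobTitle": ["Restaurant manager", None],
})

# Parse the YYYY-MM-DD strings into datetimes
customers_sample["DateOfBirth"] = pd.to_datetime(
    customers_sample["DateOfBirth"], format="%Y-%m-%d"
)

# Expand the single-letter gender codes into readable labels
gender_labels = {"F": "Female", "M": "Male", "D": "Diverse"}
customers_sample["GenderLabel"] = customers_sample["Gender"].map(gender_labels)

# Count customers with no recorded job title
missing_job_titles = int(customers_sample["JobTitle"].isna().sum())
print(missing_job_titles)
```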

### 2. Discounts

# Analysis & Visualisation

## 1. Importing & Cleaning Data

The following code downloads the datasets from Google Drive and stores them in a directory called `Datasets`.

In [2]:
import pandas as pd
import gdown

In [None]:
import os

# File IDs paired with their file names, stored as a list of tuples
files = [('1gU90GdFLZOO5jPePOAVbNakNX5DCKgtQ', 'transactions'),
         ('1QoHCOAkfdKciP94CxUfw4xf9oZyXBqD6', 'stores'),
         ('1B-XBx4cHbYMCoY--P3s3Lmy9QVzouCyC', 'products'),
         ('1lZwCUHlwgX97-xcbQV8FEeQ2JLtJQa1s', 'employees'),
         ('1NJ0O1NJ20VeMzZYHiFBCrOIFnXKaggTc', 'discounts'),
         ('13-juheNtpYsXAjm_D0W7EDB86pAl1Pew', 'customers')]

# Create the Datasets directory in the project folder if it does not already exist
os.makedirs("Datasets", exist_ok=True)

for file_id, name in files:
    download_url = f"https://drive.google.com/uc?id={file_id}"
    download_file = f"Datasets/{name}.csv"
    gdown.download(download_url, download_file, quiet=False)

In [3]:
# Reading the CSVs into dataframes
transactions = pd.read_csv("Datasets/transactions.csv")
stores = pd.read_csv("Datasets/stores.csv")
products = pd.read_csv("Datasets/products.csv")
employees = pd.read_csv("Datasets/employees.csv")
discounts = pd.read_csv("Datasets/discounts.csv")
customers = pd.read_csv("Datasets/customers.csv")


In [5]:
discounts

|  | Start | End | Discont | Description | Category | Sub Category |
|---|---|---|---|---|---|---|
| 0 | 2020-01-01 | 2020-01-10 | 0.40 | 40% discount during our New Year Winter Sale | Feminine | Coats and Blazers |
| 1 | 2020-01-01 | 2020-01-10 | 0.40 | 40% discount during our New Year Winter Sale | Feminine | Sweaters and Knitwear |
| 2 | 2020-01-01 | 2020-01-10 | 0.40 | 40% discount during our New Year Winter Sale | Masculine | Coats and Blazers |
| 3 | 2020-01-01 | 2020-01-10 | 0.40 | 40% discount during our New Year Winter Sale | Masculine | Sweaters and Sweatshirts |
| 4 | 2020-01-01 | 2020-01-10 | 0.40 | 40% discount during our New Year Winter Sale | Children | Coats |
| ... | ... | ... | ... | ... | ... | ... |
| 176 | 2025-03-15 | 2025-03-31 | 0.35 | 35% discount during our Early Spring Collectio... | Feminine | Dresses and Jumpsuits |
| 177 | 2025-03-15 | 2025-03-31 | 0.35 | 35% discount during our Early Spring Collectio... | Feminine | Shirts and Blouses |
| 178 | 2025-03-15 | 2025-03-31 | 0.35 | 35% discount during our Early Spring Collectio... | Masculine | T-shirts and Polos |
| 179 | 2025-03-15 | 2025-03-31 | 0.35 | 35% discount during our Early Spring Collectio... | Masculine | Shirts |
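
Given the columns shown in this output, a discount lookup for a particular date and product group could be sketched as follows. The two sample rows are copied from the output above. The helper `discount_on` is hypothetical, and it assumes that the `End` date is inclusive, that `Discont` holds a fraction between 0 and 1, and that a date outside every campaign means no discount.

```python
import pandas as pd

# Two rows copied from the displayed output (Description omitted)
discounts_sample = pd.DataFrame({
    "Start": ["2020-01-01", "2025-03-15"],
    "End": ["2020-01-10", "2025-03-31"],
    "Discont": [0.40, 0.35],
    "Category": ["Feminine", "Feminine"],
    "Sub Category": ["Coats and Blazers", "Dresses and Jumpsuits"],
})

# Convert the date strings so they can be compared with timestamps
discounts_sample["Start"] = pd.to_datetime(discounts_sample["Start"])
discounts_sample["End"] = pd.to_datetime(discounts_sample["End"])


def discount_on(date, category, sub_category, table=discounts_sample):
    """Return the discount rate active on `date` for a product group, or 0.0.

    Hypothetical helper: treats End as inclusive and, if several campaigns
    overlap, returns the largest rate.
    """
    day = pd.Timestamp(date)
    active = table[
        (table["Start"] <= day)
        & (day <= table["End"])
        & (table["Category"] == category)
        & (table["Sub Category"] == sub_category)
    ]
    return float(active["Discont"].max()) if not active.empty else 0.0


print(discount_on("2020-01-05", "Feminine", "Coats and Blazers"))  # 0.4
print(discount_on("2020-02-01", "Feminine", "Coats and Blazers"))  # 0.0
```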


# Key Findings & Recommendations