# Fashion chain stores

Imagine you are a Business Intelligence Analyst at the headquarters of an international fashion goods chain store. Today your boss asked you for two things:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers**, whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (i.e., the 0.95 quantile). The second group is **Preferred Customers**, whose **aggregated expenses** are **between the 75th and 95th percentiles**.

**Second, identify which country has the most VIP Customers, and which country has the most VIP and Preferred Customers combined.**
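
The percentile definitions above can also be read as spend thresholds. The following is a minimal sketch on hypothetical toy data (`toy_orders` and its values are made up for illustration), assuming that "between the 75th and 95th percentile" means strictly above the 75th and up to and including the 95th. It uses `Series.quantile`, which is a different technique from the rank-based approach used in the notebook cells.

```python
import pandas as pd

# Hypothetical toy data: ten customers, two order lines each.
toy_orders = pd.DataFrame({
    "customerid": [i for i in range(1, 11) for _ in range(2)],
    "amount_spent": [5.0 * i for i in range(1, 11) for _ in range(2)],
})

# Aggregate spend per customer (10, 20, ..., 100).
totals = toy_orders.groupby("customerid")["amount_spent"].sum()

# Spend thresholds at the 75th and 95th percentiles.
q75, q95 = totals.quantile([0.75, 0.95])

# Assumption: VIP is strictly above q95; preferred is in (q75, q95].
vip = totals[totals > q95]
preferred = totals[(totals > q75) & (totals <= q95)]

print(vip.index.tolist())
print(preferred.index.tolist())
```

With real data, ties and the interpolation method used by `quantile` can shift customers across a boundary, so the two approaches may not agree exactly.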

## Identifying VIP & Preferred Customers

In [1]:
import numpy as np
import pandas as pd

In [2]:
orders = pd.read_csv("orders.csv")

In [3]:
orders.head(5)

Unnamed: 0,index,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


In [4]:
orders.shape

(397924, 14)

In [5]:
orders.columns = orders.columns.str.lower()

In [6]:
orders.head()

Unnamed: 0,index,invoiceno,stockcode,year,month,day,hour,description,quantity,invoicedate,unitprice,customerid,country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


In [7]:
orders.customerid.nunique()

4339

In [9]:
# Total spend per customer, highest spenders first
customer_importance = (
    orders.groupby("customerid")["amount_spent"]
    .sum()
    .to_frame(name="total_amount_spent")
    .sort_values(by="total_amount_spent", ascending=False)
)

In [10]:
customer_importance["percentile_rank"] = customer_importance.total_amount_spent.rank(pct=True)

In [11]:
# Convert the 0-1 fraction to a 0-100 percentile rank
customer_importance["percentile_rank"] *= 100

In [12]:
customer_importance.head()

customerid,total_amount_spent,percentile_rank
14646,280206.02,100.0
18102,259657.3,99.976953
17450,194550.79,99.953906
16446,168472.5,99.93086
14911,143825.06,99.907813
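
To make the `rank(pct=True)` step concrete, here is a small sketch on a hypothetical four-customer series (the IDs and amounts are invented). Each value's rank is divided by the number of values, so the top spender always lands at exactly 100 after scaling. This is consistent with the output above: with 4,339 customers, the second-highest spender gets 4338 / 4339 × 100 ≈ 99.977.

```python
import pandas as pd

# Hypothetical total spend for four customers, indexed by customer ID.
spend = pd.Series([50.0, 10.0, 40.0, 20.0], index=[101, 102, 103, 104])

# rank(pct=True) returns rank / count; multiply by 100 for a percentile rank.
pct = spend.rank(pct=True) * 100

print(pct)
```

By default, tied values share the average of their ranks, so customers with identical totals receive the same percentile rank.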


In [13]:
customer_importance["status"] = pd.cut(customer_importance.percentile_rank, [0, 75, 95, 100],
                                labels=["standard customer", "preferred customer", "VIP customer"])

In [14]:
customer_importance.head()

customerid,total_amount_spent,percentile_rank,status
14646,280206.02,100.0,VIP customer
18102,259657.3,99.976953,VIP customer
17450,194550.79,99.953906,VIP customer
16446,168472.5,99.93086,VIP customer
14911,143825.06,99.907813,VIP customer
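
The boundary behavior of `pd.cut` matters for customers sitting exactly on the 75 or 95 mark. The sketch below uses hypothetical rank values to show that, with the default `right=True`, each bin includes its right edge: a rank of exactly 95 is labeled preferred, and only ranks above 95 are labeled VIP.

```python
import pandas as pd

# Hypothetical percentile ranks, including values exactly on the bin edges.
ranks = pd.Series([10.0, 75.0, 75.1, 95.0, 95.1, 100.0])

# Bins are (0, 75], (75, 95], (95, 100] because right=True is the default.
status = pd.cut(
    ranks,
    [0, 75, 95, 100],
    labels=["standard customer", "preferred customer", "VIP customer"],
)

print(status.astype(str).tolist())
```

A rank of exactly 0 would fall outside the first bin and become `NaN`, but a percentile rank produced by `rank(pct=True)` is always greater than 0, so that case does not arise here.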


In [17]:
customer_importance = customer_importance.reset_index()

In [18]:
customer_importance.head()

Unnamed: 0,customerid,total_amount_spent,percentile_rank,status
0,14646,280206.02,100.0,VIP customer
1,18102,259657.3,99.976953,VIP customer
2,17450,194550.79,99.953906,VIP customer
3,16446,168472.5,99.93086,VIP customer
4,14911,143825.06,99.907813,VIP customer


## Linking customers to country

In [19]:
orders_completed = pd.merge(orders, customer_importance, on="customerid")

In [20]:
orders_completed.head()

Unnamed: 0,index,invoiceno,stockcode,year,month,day,hour,description,quantity,invoicedate,unitprice,customerid,country,amount_spent,total_amount_spent,percentile_rank,status
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3,5391.21,94.307444,preferred customer
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21,94.307444,preferred customer
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0,5391.21,94.307444,preferred customer
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21,94.307444,preferred customer
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21,94.307444,preferred customer
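
Because each customer appears once in the status table but many times in the orders, the merge is many-to-one and repeats the customer's totals and status on every order line, as the output above shows. A minimal sketch on hypothetical frames (`toy_orders` and `toy_status` are invented names) illustrates this, and uses the `validate` argument of `pd.merge` as an optional safeguard that raises an error if the right-hand table unexpectedly contains duplicate customer IDs.

```python
import pandas as pd

# Hypothetical order lines: customer 1 has two lines, customer 2 has one.
toy_orders = pd.DataFrame({
    "customerid": [1, 1, 2],
    "country": ["France", "France", "Germany"],
    "amount_spent": [10.0, 5.0, 7.0],
})

# Hypothetical per-customer status table: one row per customer.
toy_status = pd.DataFrame({
    "customerid": [1, 2],
    "status": ["VIP customer", "standard customer"],
})

# validate="many_to_one" checks that customerid is unique on the right side.
merged = pd.merge(toy_orders, toy_status, on="customerid", validate="many_to_one")

print(merged)
```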


In [24]:
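# Note: orders_completed has one row per order line, so these figures are
# order-line counts per status in each country, not distinct-customer counts.
customers_per_country = orders_completed.groupby("country").status.value_counts().unstack()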

In [26]:
customers_per_country.sort_values(by="VIP customer", ascending=False).head(5)

country,VIP customer,preferred customer,standard customer
United Kingdom,84185.0,137450.0,132710.0
EIRE,7077.0,161.0,
France,3290.0,3011.0,2041.0
Germany,3127.0,4222.0,1693.0
Netherlands,2080.0,,283.0


In [28]:
customers_per_country.columns = customers_per_country.columns.str.replace(" ", "_")

In [36]:
customers_per_country = customers_per_country.fillna(0)

In [37]:
customers_per_country["top_customer"] = customers_per_country.VIP_customer + customers_per_country.preferred_customer

In [39]:
customers_per_country.sort_values(by="top_customer", ascending=False).head()

country,VIP_customer,preferred_customer,standard_customer,top_customer
United Kingdom,84185.0,137450.0,132710.0,221635.0
Germany,3127.0,4222.0,1693.0,7349.0
EIRE,7077.0,161.0,0.0,7238.0
France,3290.0,3011.0,2041.0,6301.0
Netherlands,2080.0,0.0,283.0,2080.0
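
The figures in these tables are much larger than the 4,339 unique customers found earlier, which suggests they count order lines rather than customers. If the question is about the number of customers per country, one possible adjustment is to keep a single row per customer and country before counting. The sketch below shows the difference on hypothetical toy data (`toy` and its values are invented); it is an illustration of the idea, not a recomputation of the results above.

```python
import pandas as pd

# Hypothetical merged data: one row per order line.
toy = pd.DataFrame({
    "customerid": [1, 1, 1, 2, 3, 3],
    "country": ["France", "France", "France", "France", "Germany", "Germany"],
    "status": ["VIP customer"] * 3 + ["preferred customer"] + ["VIP customer"] * 2,
})

# Counting rows directly: each order line is counted once.
row_counts = toy.groupby("country")["status"].value_counts().unstack(fill_value=0)

# Counting customers: keep one row per (customer, country) pair first.
unique_customers = toy.drop_duplicates(subset=["customerid", "country"])
customer_counts = (
    unique_customers.groupby("country")["status"].value_counts().unstack(fill_value=0)
)

print(row_counts)
print(customer_counts)
```

In the toy data, France has three VIP order lines but only one VIP customer, so the two tables rank countries on different quantities.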
