# Challenge 3

In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.

You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.

**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**

## Q1: How to identify VIP & Preferred Customers?

We start by importing all the required libraries:

In [1]:
# import required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Next, import `Orders` from Ironhack's database into a dataframe variable called `orders`. Print the head of `orders` to overview the data:

In [3]:
ls

Orders.csv          challenge-1.ipynb   challenge-3.ipynb
Pokemon.csv         challenge-2.ipynb   pokemonEsteban.csv


In [40]:
# your code here

orders= pd.read_csv('Orders.csv', sep=",")
orders.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


---

"Identify VIP and Preferred Customers" is the non-technical goal of your boss. You need to translate that goal into technical languages that data analysts use:

## How to label customers whose aggregated `amount_spent` is in a given quantile range?


We break down the main problem into several sub problems:

#### Sub Problem 1: How to aggregate the  `amount_spent` for unique customers?

#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?

#### Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?

*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*

Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps.

In [3]:
# your code here

In [41]:
gastos_clientes = orders.groupby('CustomerID')['amount_spent'].sum().reset_index()

gastos_clientes.head()

Unnamed: 0,CustomerID,amount_spent
0,12346,77183.6
1,12347,4310.0
2,12348,1797.24
3,12349,1757.55
4,12350,334.4


In [42]:
# Creamos dos variables con percentiles el 75% y el 95% sobre los amount_spent

percentil75 = orders['amount_spent'].quantile(0.75)
percentil95 = orders['amount_spent'].quantile(0.95)

# Utilizamos la función filter en combinación con groupby para filtrar los clientes basándonos en el rango de cuantiles especificado

clienteselec = orders.groupby('CustomerID').filter(lambda x: percentil75 <= x['amount_spent'].sum() <= percentil95)
clienteselec

# La función groupby agrupa los datos por el 'CustomerID',
# mientras que la función filter nos permite aplicar una condición para seleccionar las filas que cumplan dicha condición. 
# En este caso, utilizamos una función lambda para verificar si la suma de los gastos (amount_spent) de cada cliente está dentro del rango de cuantiles establecido.

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
2022,3189,536608,22863,2010,12,4,9,soap dish brocante,6,2010-12-02 09:37:00,2.95,12855,United Kingdom,17.70
2023,3190,536608,22962,2010,12,4,9,jam jar with pink lid,12,2010-12-02 09:37:00,0.85,12855,United Kingdom,10.20
2024,3191,536608,22963,2010,12,4,9,jam jar with green lid,12,2010-12-02 09:37:00,0.85,12855,United Kingdom,10.20
3666,4933,536823,22508,2010,12,4,17,doorstop retrospot heart,4,2010-12-02 17:22:00,3.75,13011,United Kingdom,15.00
3667,4934,536823,82494L,2010,12,4,17,wooden frame antique white,6,2010-12-02 17:22:00,2.95,13011,United Kingdom,17.70
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
368921,499682,578684,23322,2011,11,5,8,large white heart of wicker,6,2011-11-25 08:59:00,2.95,16856,United Kingdom,17.70
384650,521361,580361,35001G,2011,12,5,16,hand open shape gold,12,2011-12-02 16:25:00,1.25,14865,United Kingdom,15.00
390447,530032,580772,22839,2011,12,2,11,3 tier cake tin green and cream,1,2011-12-06 11:00:00,14.95,15992,United Kingdom,14.95
390448,530033,580772,21136,2011,12,2,11,painted metal pears assorted,8,2011-12-06 11:00:00,1.69,15992,United Kingdom,13.52


In [82]:
# Label the selected customers as 'Preferred'

orders.loc[orders['amount_spent'] > percentil95, 'CustomerType'] = 'VIP'

orders.loc[(orders['amount_spent'] < percentil75) & (orders['amount_spent'] < percentil95), 'CustomerType'] = 'Preferred'


# Check the updated 'customer_spending' DataFrame
orders

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,CustomerType
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.30,Preferred
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.00,VIP
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397919,541904,581587,22613,2011,12,5,12,pack of 20 spaceboy napkins,12,2011-12-09 12:50:00,0.85,12680,France,10.20,Preferred
397920,541905,581587,22899,2011,12,5,12,children's apron dolly girl,6,2011-12-09 12:50:00,2.10,12680,France,12.60,Preferred
397921,541906,581587,23254,2011,12,5,12,childrens cutlery dolly girl,4,2011-12-09 12:50:00,4.15,12680,France,16.60,Preferred
397922,541907,581587,23255,2011,12,5,12,childrens cutlery circus parade,4,2011-12-09 12:50:00,4.15,12680,France,16.60,Preferred


In [84]:
sorted_orden = orders.sort_values(by='CustomerType', ascending=True)
sorted_orden


Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,CustomerType
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.30,Preferred
249313,349011,567475,22417,2011,9,2,13,pack of 60 spaceboy cake cases,3,2011-09-20 13:11:00,0.55,16360,United Kingdom,1.65,Preferred
249312,349010,567475,21977,2011,9,2,13,pack of 60 pink paisley cake cases,1,2011-09-20 13:11:00,0.55,16360,United Kingdom,0.55,Preferred
249311,349009,567475,23166,2011,9,2,13,medium ceramic top storage jar,2,2011-09-20 13:11:00,1.25,16360,United Kingdom,2.50,Preferred
249310,349008,567475,84692,2011,9,2,13,box of 24 cocktail parasols,4,2011-09-20 13:11:00,0.42,16360,United Kingdom,1.68,Preferred
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
133460,193911,553546,22699,2011,5,2,15,roses regency teacup and saucer,24,2011-05-17 15:42:00,2.55,12415,Australia,61.20,VIP
310313,424123,573195,85014B,2011,10,5,11,red retrospot umbrella,48,2011-10-28 11:40:00,4.95,12432,Norway,237.60,VIP
310312,424122,573195,85014A,2011,10,5,11,black/blue polkadot umbrella,48,2011-10-28 11:40:00,4.95,12432,Norway,237.60,VIP
310324,424134,573195,22980,2011,10,5,11,pantry scrubbing brush,12,2011-10-28 11:40:00,1.65,12432,Norway,19.80,VIP


Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:

## Q2: How to identify which country has the most VIP Customers?

In [89]:
# your code here

clientes_vip = orders[orders['CustomerType'] == 'VIP']
clientes_vip


Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,CustomerType
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.00,VIP
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
6,6,536365,21730,2010,12,3,8,glass star frosted t-light holder,6,2010-12-01 08:26:00,4.25,17850,United Kingdom,25.50,VIP
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397908,541893,581586,20685,2011,12,5,12,doormat red retrospot,10,2011-12-09 12:49:00,7.08,13113,United Kingdom,70.80,VIP
397909,541894,581587,22631,2011,12,5,12,circus parade lunch box,12,2011-12-09 12:50:00,1.95,12680,France,23.40,VIP
397910,541895,581587,22556,2011,12,5,12,plasters in tin circus parade,12,2011-12-09 12:50:00,1.65,12680,France,19.80,VIP
397911,541896,581587,22555,2011,12,5,12,plasters in tin strongman,12,2011-12-09 12:50:00,1.65,12680,France,19.80,VIP


In [92]:
conteo_vip_por_pais = clientes_vip.groupby('Country')['CustomerID'].count().reset_index()

conteo_vip_por_pais


Unnamed: 0,Country,CustomerID
0,Australia,859
1,Austria,148
2,Bahrain,12
3,Belgium,714
4,Brazil,16
5,Canada,50
6,Channel Islands,332
7,Cyprus,216
8,Czech Republic,20
9,Denmark,232


## Q3: How to identify which country has the most VIP+Preferred Customers combined?

In [94]:
# your code here
clientes_seleccionados = orders[(orders['CustomerType'] == 'VIP') | (orders['CustomerType'] == 'Preferred')]

conteo_clientes = clientes_seleccionados.groupby('Country')['CustomerID'].count().reset_index()

pais_con_mas_clientes = conteo_clientes.loc[conteo_clientes['CustomerID'].idxmax(), 'Country']

pais_con_mas_clientes

'United Kingdom'