# Challenge 3

In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.

You are serving as a Business Intelligence Analyst at the headquarters of an international fashion goods chain store. Today your boss asked you to do two things for her:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers**, whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (a.k.a. the 0.95 quantile). The second group is **Preferred Customers**, whose **aggregated expenses** are **between the 75th and 95th percentiles**.

**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**

## Q1: How to identify VIP & Preferred Customers?

We start by importing all the required libraries:

In [24]:
# import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Next, import `Orders` from Ironhack's database into a dataframe variable called `orders`. Print the head of `orders` to overview the data:

In [25]:
# your code here
orders= pd.read_csv('Orders.csv')

orders.head()

,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


---

"Identify VIP and Preferred Customers" is your boss's non-technical goal. You need to translate that goal into the technical language that data analysts use:

## How to label customers whose aggregated `amount_spent` is in a given quantile range?


We break down the main problem into several sub problems:

#### Sub Problem 1: How to aggregate the  `amount_spent` for unique customers?

#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?

#### Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?

*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*

Now, in the workspace below, tackle each of the sub problems using the iterative problem-solving workflow. Insert cells as necessary to write your code and explain your steps.
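As a rough orientation before working on the real data, the three sub problems above can be sketched end to end on a tiny made-up table. The names `toy_orders`, `totals`, `q75`, `q95` and `labels` are hypothetical and exist only for this illustration. This is one possible breakdown, not the required solution.

```python
import pandas as pd

# Hypothetical toy data standing in for `orders` (values are made up)
toy_orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 4, 5],
    "amount_spent": [10.0, 15.0, 5.0, 40.0, 60.0, 20.0, 8.0],
})

# Sub Problem 1: aggregate amount_spent so there is one total per customer
totals = toy_orders.groupby("CustomerID")["amount_spent"].sum()

# Sub Problem 2: compute the quantile thresholds on the per-customer totals
q75 = totals.quantile(0.75)
q95 = totals.quantile(0.95)

# Sub Problem 3: label customers; the VIP assignment runs last so it
# overrides "Preferred" for customers at or above the 0.95 quantile
labels = pd.Series("-", index=totals.index, name="segment")
labels[totals >= q75] = "Preferred"
labels[totals >= q95] = "VIP"

print(pd.concat([totals, labels], axis=1))
```

The key point of the sketch is the order of operations: thresholds are computed on customer totals, not on individual order lines.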

In [26]:
# your code here: group by CustomerID

customer = orders.groupby("CustomerID").sum(numeric_only=True)  # aggregate the numeric columns per CustomerID
cust_orden = customer.sort_values('amount_spent', ascending=False)  # sort customers by total amount spent
cust_orden.head()

CustomerID,Unnamed: 0,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,amount_spent
14646,544561120,1163267611,4182810,14191,6552,24488,197491,5176.09,280206.02
18102,138022684,243297801,866723,3746,1261,5587,64124,1940.92,259657.3
17450,92683919,188845149,677704,2292,842,4140,69993,1143.32,194550.79
16446,929130,1688629,6033,22,11,27,80997,4.98,168472.5
14911,1737367680,3196374868,11416155,46220,18930,68148,80515,26185.72,143825.06


In [27]:
per95 = np.percentile(cust_orden.amount_spent, 95)  # 95th and 75th percentiles of total spend per customer
per75 = np.percentile(cust_orden.amount_spent, 75)
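Note that `np.percentile` takes its argument on a 0 to 100 scale, while the pandas `Series.quantile` method takes it on a 0 to 1 scale. With their default (linear) interpolation the two should agree, as this small check on made-up numbers suggests. The `spend` Series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical per-customer totals (made-up values)
spend = pd.Series([10.0, 20.0, 30.0, 40.0, 100.0])

# Same threshold expressed as a percentile (0-100) and as a quantile (0-1)
p75_np = np.percentile(spend, 75)
p75_pd = spend.quantile(0.75)
p95_np = np.percentile(spend, 95)
p95_pd = spend.quantile(0.95)

print(p75_np, p75_pd)
print(p95_np, p95_pd)
```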

In [28]:
vip = cust_orden[cust_orden.amount_spent >= per95]  # df[condition] -> customers who spend at or above the 95th percentile are VIP
vip.head()

CustomerID,Unnamed: 0,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,amount_spent
14646,544561120,1163267611,4182810,14191,6552,24488,197491,5176.09,280206.02
18102,138022684,243297801,866723,3746,1261,5587,64124,1940.92,259657.3
17450,92683919,188845149,677704,2292,842,4140,69993,1143.32,194550.79
16446,929130,1688629,6033,22,11,27,80997,4.98,168472.5
14911,1737367680,3196374868,11416155,46220,18930,68148,80515,26185.72,143825.06


In [29]:
vip.shape  # how many VIP customers there are

(217, 9)

In [30]:
preferred = cust_orden[(cust_orden.amount_spent >= per75) & (cust_orden.amount_spent < per95)]
# df[condition] -> customers who spend at or above the 75th percentile and below the 95th are Preferred
preferred.head()

CustomerID,Unnamed: 0,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,amount_spent
13050,88073344,223872574,810364,2992,1278,4940,3748,1204.52,5836.86
12720,90950303,197785584,711849,2760,1106,4153,4672,956.36,5781.73
15218,41221248,92641005,333826,1009,302,1823,3329,513.44,5756.89
17686,72758466,159766169,575146,1824,905,3433,2478,1103.64,5739.46
13178,59276272,147365548,532883,1858,872,3321,3570,542.34,5725.47


In [31]:
def cliente(x, es75, es95):  # Return whether a customer is VIP or Preferred
    if es75 <= x < es95:
        return 'Preferred'  # x lies between the 75th and 95th percentiles
    elif x >= es95:
        return 'VIP'  # x is at or above the 95th percentile
    else:
        return '-'
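The `cliente` function above is applied one value at a time. As an alternative sketch, `np.select` can assign the same labels to a whole Series at once. The totals and thresholds below are made up, and `segment` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical per-customer totals and thresholds (made-up values)
totals = pd.Series([50.0, 400.0, 1200.0, 9000.0], index=[101, 102, 103, 104])
es75, es95 = 1000.0, 5000.0

# Conditions are checked in order, so the VIP test must come first
conditions = [totals >= es95, totals >= es75]
choices = ["VIP", "Preferred"]

segment = pd.Series(np.select(conditions, choices, default="-"), index=totals.index)
print(segment)
```

Both approaches should give the same labels. The vectorized form mainly pays off on large tables.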

In [32]:
# Apply the function defined above to the amount_spent column and store its return value in a new column
cust_orden['VIP_OR_PREFERRED'] = cust_orden.amount_spent.apply(lambda x: cliente(x, per75, per95))
cust_orden.head()

CustomerID,Unnamed: 0,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,amount_spent,VIP_OR_PREFERRED
14646,544561120,1163267611,4182810,14191,6552,24488,197491,5176.09,280206.02,VIP
18102,138022684,243297801,866723,3746,1261,5587,64124,1940.92,259657.3,VIP
17450,92683919,188845149,677704,2292,842,4140,69993,1143.32,194550.79,VIP
16446,929130,1688629,6033,22,11,27,80997,4.98,168472.5,VIP
14911,1737367680,3196374868,11416155,46220,18930,68148,80515,26185.72,143825.06,VIP


In [35]:
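# Caution: this compares each order line's amount_spent with thresholds computed from
# customer totals, so it labels order lines, not customers. The per-customer label is
# recomputed after grouping in Q2 below.
orders['VIP_OR_PREFERRED'] = orders['amount_spent'].apply(lambda x: cliente(x, per75, per95))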

orders.Country.head()

0    United Kingdom
1    United Kingdom
2    United Kingdom
3    United Kingdom
4    United Kingdom
Name: Country, dtype: object
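If the goal is to carry each customer's label onto every one of their order rows, one possible approach is to look the label up by `CustomerID` with `Series.map`. The sketch below uses made-up data. `toy_orders` and `customer_label` are hypothetical stand-ins for `orders` and for the `VIP_OR_PREFERRED` column of `cust_orden`.

```python
import pandas as pd

# Hypothetical order lines (made-up values)
toy_orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "Country": ["France", "France", "Spain", "Japan"],
    "amount_spent": [10.0, 90.0, 5.0, 30.0],
})

# Hypothetical customer-level labels, indexed by CustomerID
customer_label = pd.Series({1: "VIP", 2: "-", 3: "Preferred"})

# Each order row receives the label of the customer who placed it
toy_orders["VIP_OR_PREFERRED"] = toy_orders["CustomerID"].map(customer_label)
print(toy_orders)
```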

Now we leave it to you to solve Q2 and Q3, building on your solution for Q1:

## Q2: How to identify which country has the most VIP Customers?

In [39]:
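# your code here
# Aggregate per (Country, CustomerID). Note that sum() also concatenates the text columns;
# only amount_spent is used in the following steps.
Country = orders.groupby(['Country', 'CustomerID']).sum()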
Country.head()

Country,CustomerID,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,amount_spent,VIP_OR_PREFERRED
Australia,12386,228213,5381968,2256722915229262295321906224952255522557214222...,20102,98,32,96,20 dolly pegs retrospotassorted bottle top ma...,354,2010-12-08 09:53:002010-12-08 09:53:002010-12-...,23.91,401.9,----------
Australia,12388,23878866,55733658,84970L71459224292226247590B47590A2266922148229...,201100,592,381,1230,single heart zinc t-light holderhanging jam ja...,1462,2011-01-17 11:12:002011-01-17 11:12:002011-01-...,277.77,2780.66,----------------------------------------------...
Australia,12393,12849375,35452768,215812261984997B207272072622383212492237822175...,128704,315,213,732,skulls design cotton tote bagset of 6 soldie...,816,2011-01-11 09:47:002011-01-11 09:47:002011-01-...,145.9,1582.6,----------------------------------------------...
Australia,12415,164201777,398543981,2207822079220802207722505225162251722518225192...,1439876,4254,2169,8061,ribbon reel lace design ribbon reel hearts des...,77670,2011-01-06 11:12:002011-01-06 11:12:002011-01-...,2097.08,124914.53,----------------------------------------------...
Australia,12422,3428310,11563488,207282071321937219362193220717225032071285099C...,42231,85,47,189,lunch bag cars bluejumbo bag owlsstrawberry ...,195,2011-01-19 09:13:002011-01-19 09:13:002011-01-...,51.12,386.2,---------------------


In [40]:
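# Relabel each (Country, CustomerID) row from its aggregated amount_spent, overwriting the concatenated labels
Country['VIP_OR_PREFERRED'] = Country.amount_spent.apply(lambda x: cliente(x, per75, per95))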
Country.head()

Country,CustomerID,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,amount_spent,VIP_OR_PREFERRED
Australia,12386,228213,5381968,2256722915229262295321906224952255522557214222...,20102,98,32,96,20 dolly pegs retrospotassorted bottle top ma...,354,2010-12-08 09:53:002010-12-08 09:53:002010-12-...,23.91,401.9,-
Australia,12388,23878866,55733658,84970L71459224292226247590B47590A2266922148229...,201100,592,381,1230,single heart zinc t-light holderhanging jam ja...,1462,2011-01-17 11:12:002011-01-17 11:12:002011-01-...,277.77,2780.66,Preferred
Australia,12393,12849375,35452768,215812261984997B207272072622383212492237822175...,128704,315,213,732,skulls design cotton tote bagset of 6 soldie...,816,2011-01-11 09:47:002011-01-11 09:47:002011-01-...,145.9,1582.6,-
Australia,12415,164201777,398543981,2207822079220802207722505225162251722518225192...,1439876,4254,2169,8061,ribbon reel lace design ribbon reel hearts des...,77670,2011-01-06 11:12:002011-01-06 11:12:002011-01-...,2097.08,124914.53,VIP
Australia,12422,3428310,11563488,207282071321937219362193220717225032071285099C...,42231,85,47,189,lunch bag cars bluejumbo bag owlsstrawberry ...,195,2011-01-19 09:13:002011-01-19 09:13:002011-01-...,51.12,386.2,-


In [41]:
countryindex = Country.reset_index()
countryindex.head()

,Country,CustomerID,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,amount_spent,VIP_OR_PREFERRED
0,Australia,12386,228213,5381968,2256722915229262295321906224952255522557214222...,20102,98,32,96,20 dolly pegs retrospotassorted bottle top ma...,354,2010-12-08 09:53:002010-12-08 09:53:002010-12-...,23.91,401.9,-
1,Australia,12388,23878866,55733658,84970L71459224292226247590B47590A2266922148229...,201100,592,381,1230,single heart zinc t-light holderhanging jam ja...,1462,2011-01-17 11:12:002011-01-17 11:12:002011-01-...,277.77,2780.66,Preferred
2,Australia,12393,12849375,35452768,215812261984997B207272072622383212492237822175...,128704,315,213,732,skulls design cotton tote bagset of 6 soldie...,816,2011-01-11 09:47:002011-01-11 09:47:002011-01-...,145.9,1582.6,-
3,Australia,12415,164201777,398543981,2207822079220802207722505225162251722518225192...,1439876,4254,2169,8061,ribbon reel lace design ribbon reel hearts des...,77670,2011-01-06 11:12:002011-01-06 11:12:002011-01-...,2097.08,124914.53,VIP
4,Australia,12422,3428310,11563488,207282071321937219362193220717225032071285099C...,42231,85,47,189,lunch bag cars bluejumbo bag owlsstrawberry ...,195,2011-01-19 09:13:002011-01-19 09:13:002011-01-...,51.12,386.2,-


In [42]:
countryindex2 = countryindex[countryindex.VIP_OR_PREFERRED == 'VIP']  # keep only the VIP rows

In [44]:
vips = countryindex2.groupby('Country').CustomerID.count().sort_values(ascending=False)  # VIP customers per country, largest first
vips

Country
United Kingdom     177
Germany             10
France               9
Switzerland          3
Spain                2
Portugal             2
Japan                2
EIRE                 2
Finland              1
Channel Islands      1
Netherlands          1
Norway               1
Singapore            1
Denmark              1
Sweden               1
Cyprus               1
Australia            1
Name: CustomerID, dtype: int64
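To read the answer off programmatically rather than from the printed list, one option is `Series.idxmax`, which returns the index label of the largest value. The sketch below uses a made-up labelled table. `labelled`, `vip_counts` and `top_country` are hypothetical names, and `nunique` is used so that a customer is counted once per country even if they appear on several rows.

```python
import pandas as pd

# Hypothetical labelled customers (made-up values)
labelled = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "Germany", "France", "Germany"],
    "CustomerID": [1, 2, 3, 4, 5],
    "VIP_OR_PREFERRED": ["VIP", "VIP", "VIP", "Preferred", "-"],
})

# Count distinct VIP customers per country, largest first
vip_counts = (
    labelled[labelled["VIP_OR_PREFERRED"] == "VIP"]
    .groupby("Country")["CustomerID"]
    .nunique()
    .sort_values(ascending=False)
)

# Index label (country name) of the largest count
top_country = vip_counts.idxmax()
print(top_country)
```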

In [47]:
paisesindexpreferred = countryindex[countryindex.VIP_OR_PREFERRED == 'Preferred']  # keep only the Preferred rows
most = paisesindexpreferred.groupby('Country').CustomerID.count().sort_values(ascending=False)  # Preferred customers per country, largest first
most

Country
United Kingdom     755
Germany             29
France              20
Belgium             11
Switzerland          6
Norway               6
Spain                5
Portugal             5
Italy                5
Finland              4
Australia            3
Channel Islands      3
Israel               2
Japan                2
Denmark              2
Cyprus               2
Greece               1
Austria              1
EIRE                 1
Lebanon              1
Malta                1
Poland               1
Sweden               1
Canada               1
Iceland              1
Name: CustomerID, dtype: int64

## Q3: How to identify which country has the most VIP+Preferred Customers combined?

In [5]:
# your code here
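One possible way to approach Q3 is to add the two per-country counts computed in Q2. The sketch below is not the full solution: it copies only a few rows from the `vips` and `most` outputs above, and `combined` is a hypothetical name. `Series.add` with `fill_value=0` keeps countries that appear in only one of the two Series.

```python
import pandas as pd

# A few rows copied from the Q2 outputs above (not the full tables)
vips = pd.Series({"United Kingdom": 177, "Germany": 10, "France": 9})
most = pd.Series({"United Kingdom": 755, "Germany": 29, "France": 20, "Belgium": 11})

# Add the counts country by country; a country missing from one Series counts as 0
combined = vips.add(most, fill_value=0).astype(int).sort_values(ascending=False)

print(combined)
print(combined.idxmax())
```

An alternative with the same intent would be to filter `countryindex` with `isin(['VIP', 'Preferred'])` and count customers per country in one step.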