# Subsetting and Descriptive Stats

## Before you start:
   - Remember that you just need to do one of the challenges.
   - Keep in mind that you need to use some of the functions you learned in the previous lessons.
   - All datasets are provided in IronHack's database.
   - Explain your code and outputs as much as you can.
   - Try your best to answer the questions and complete the tasks and most importantly: enjoy the process!
   
#### Import all the necessary libraries here:

In [1]:
# import libraries here
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
%matplotlib inline

# [ONLY ONE MANDATORY] Challenge 1
#### In this challenge we will use the `Temp_States`  dataset. 

#### First import it into a dataframe called `temp`.

In [2]:
# your code here
temp = pd.read_csv('../Temp_states.csv')

#### Print `temp`.

In [3]:
# your code here
temp.head(10)

Unnamed: 0,City,State,Temperature
0,NYC,New York,19.444444
1,Albany,New York,9.444444
2,Buffalo,New York,3.333333
3,Hartford,Connecticut,17.222222
4,Bridgeport,Connecticut,14.444444
5,Treton,New Jersey,22.222222
6,Newark,New Jersey,20.0


#### Explore the data types of the *temp* dataframe. What types of data do we have? Comment your result.

In [4]:
# your code here
temp.dtypes

City            object
State           object
Temperature    float64
dtype: object

In [5]:
"""
City and State are objects, while temperatures are floats. Temperatures look to be in Celsius rather than Fahrenheit. 
"""

'\nCity and State are objects, while temperatures are floats. Temperatures look to be in Celsius rather than Fahrenheit. \n'

#### Select the rows where state is New York.

In [6]:
# your code here
#https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/#loc-selection
ny_temp = temp.loc[temp['State'] == 'New York', 'City':'Temperature']
ny_temp

Unnamed: 0,City,State,Temperature
0,NYC,New York,19.444444
1,Albany,New York,9.444444
2,Buffalo,New York,3.333333


#### What is the average temperature of cities in New York?

In [7]:
# your code here
ny_temp.mean(numeric_only=True)

Temperature    10.740741
dtype: float64

#### Which states and cities have a temperature above 15 degrees Celsius?

In [8]:
# your code here
pleasant_temp = temp[(temp['Temperature']>15)]
pleasant_temp

Unnamed: 0,City,State,Temperature
0,NYC,New York,19.444444
3,Hartford,Connecticut,17.222222
5,Treton,New Jersey,22.222222
6,Newark,New Jersey,20.0


#### Now, return only the cities that have a temperature above 15 degrees Celsius.

In [9]:
# your code here
pleasant_cities = pleasant_temp['City']
pleasant_cities

0         NYC
3    Hartford
5      Treton
6      Newark
Name: City, dtype: object

#### Which cities have a temperature above 15 degrees Celsius and below 20 degrees Celsius?

**Hint**: First, write the condition. Then, select the rows.

In [10]:
# your code here
pleasant_restricted = temp[(temp['Temperature']>15) & (temp['Temperature']<20)]
pleasant_cities_restricted = pleasant_restricted['City']
pleasant_cities_restricted

0         NYC
3    Hartford
Name: City, dtype: object

#### Find the mean and standard deviation of the temperature of each state.

In [11]:
# your code here
state_mean_temp = temp.groupby('State')['Temperature'].mean()
state_mean_std = temp.groupby('State')['Temperature'].std()
print(state_mean_temp)
print(state_mean_std)

State
Connecticut    15.833333
New Jersey     21.111111
New York       10.740741
Name: Temperature, dtype: float64
State
Connecticut    1.964186
New Jersey     1.571348
New York       8.133404
Name: Temperature, dtype: float64
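
Both statistics can also be computed in a single pass. The following is a minimal sketch, assuming the seven rows printed earlier are the whole dataset; the name `state_stats` is introduced here for illustration only.

```python
import pandas as pd

# Rows copied from the table printed earlier in this notebook.
temp = pd.DataFrame({
    "City": ["NYC", "Albany", "Buffalo", "Hartford", "Bridgeport", "Treton", "Newark"],
    "State": ["New York", "New York", "New York", "Connecticut",
              "Connecticut", "New Jersey", "New Jersey"],
    "Temperature": [19.444444, 9.444444, 3.333333, 17.222222,
                    14.444444, 22.222222, 20.0],
})

# One groupby, two aggregates: the result has one column per statistic.
state_stats = temp.groupby("State")["Temperature"].agg(["mean", "std"])
print(state_stats)
```

Passing a list to `agg` keeps the mean and standard deviation side by side, which is easier to read than two separate printed Series.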


# [ONLY ONE MANDATORY]  Challenge 2

#### Load the `employees` dataset into a dataframe. Call the dataframe `employees`.

In [12]:
# your code here
employees = pd.read_csv('../Employee.csv')

#### Explore the data types of the `employees` dataframe. Comment your results.

In [13]:
# your code here
employees.head(10)

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30
3,Sonia,HR,Bachelor,F,analyst,4,35
4,Samuel,Sales,Master,M,associate,3,55
5,Eva,Sales,Bachelor,F,associate,2,55
6,Carlos,IT,Master,M,VP,8,70
7,Pedro,IT,Phd,M,associate,7,60
8,Ana,HR,Master,F,VP,8,70


In [14]:
employees.describe()

Unnamed: 0,Years,Salary
count,9.0,9.0
mean,4.111111,48.888889
std,2.803767,16.541194
min,1.0,30.0
25%,2.0,35.0
50%,3.0,55.0
75%,7.0,60.0
max,8.0,70.0


In [15]:
"""
Small dataframe (9 rows for 9 employees). Could mean it's a small company. 
It also looks to be a relatively young company, as the maximum number of years worked there is 8, which includes
both VPs. Salary: mean = 48.89; std = 16.54; min = 30; max = 70. The salary distribution appears left-skewed.
"""

"\nSmall dataframe (9 rows for 9 employees). Could mean it's a small company. \nIt also looks to be a relatively young company, as the maximum number of years worked there is 8, which includes\nboth VPs. Salary: mean = 48.89; std = 16.54; min = 30; max = 70. The salary distribution appears left-skewed.\n"

#### What's the average salary in this company?

In [16]:
# your code here
employees['Salary'].mean()

48.888888888888886

#### What's the highest salary?

In [17]:
# your code here
employees['Salary'].max()

70

#### What's the lowest salary?

In [18]:
# your code here
employees['Salary'].min()

30

#### Who are the employees with the lowest salary?

In [19]:
# your code here
print(employees[employees.Salary == employees.Salary.min()])

    Name Department Education Gender    Title  Years  Salary
1  Maria         IT    Master      F  analyst      2      30
2  David         HR    Master      M  analyst      2      30


#### Find all the information about an employee called David.

In [20]:
# your code here
print(employees[employees.Name == 'David'])

    Name Department Education Gender    Title  Years  Salary
2  David         HR    Master      M  analyst      2      30


#### Could you return only David's salary?

In [21]:
# your code here
david_salary = employees.loc[employees['Name'] == 'David', 'Salary']
david_salary

2    30
Name: Salary, dtype: int64

#### Print all the rows where job title is associate.

In [22]:
# your code here
associates = employees[(employees['Title']== 'associate')]
associates

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
4,Samuel,Sales,Master,M,associate,3,55
5,Eva,Sales,Bachelor,F,associate,2,55
7,Pedro,IT,Phd,M,associate,7,60


#### Print the first 3 rows of your dataframe.
**Tip**: There are 2 ways to do it. Do it both ways.

In [23]:
# Method 1
# your code here
employees.head(3)

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30


In [24]:
# Method 2
# your code here
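
One possible second method is positional slicing with `iloc`. This is only a sketch: the small `employees` frame below is a stand-in built from the first rows shown above, and `first_three` is a name introduced for illustration.

```python
import pandas as pd

# Stand-in for the first rows of `employees` shown above.
employees = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia"],
    "Salary": [35, 30, 30, 35],
})

# Positional slice: rows at positions 0, 1 and 2.
first_three = employees.iloc[:3]
print(first_three)
```

Plain slicing with `employees[:3]` should give the same rows; `iloc` just makes the positional intent explicit.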


#### Find the employees whose title is associate and whose salary is above 55.

In [25]:
# your code here
top_associates = associates.loc[associates['Salary'] > 55]
top_associates

Unnamed: 0,Name,Department,Education,Gender,Title,Years,Salary
7,Pedro,IT,Phd,M,associate,7,60


#### Group the employees by number of years of employment. What are the average salaries in each group?

In [26]:
# your code here
years_employed = employees.groupby('Years')['Salary'].mean()
years_employed

Years
1    35.000000
2    38.333333
3    55.000000
4    35.000000
7    60.000000
8    70.000000
Name: Salary, dtype: float64

####  What is the average salary per title?

In [27]:
# your code here
position_salary = employees.groupby('Title')['Salary'].mean()
position_salary

Title
VP           70.000000
analyst      32.500000
associate    56.666667
Name: Salary, dtype: float64

####  Find the salary quartiles.


In [28]:
# your code here
print(employees['Salary'].quantile(0.25))
print(employees['Salary'].quantile(0.50))
print(employees['Salary'].quantile(0.75))

35.0
55.0
60.0


#### Is the mean salary different per gender?

In [29]:
# your code here
avg_salary_gender = employees.groupby('Gender')['Salary'].mean()
avg_salary_gender


Gender
F    47.5
M    50.0
Name: Salary, dtype: float64

#### Find the minimum, mean and maximum of all numeric columns for each company department.



In [30]:
# your code here
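# Select several columns with a list (double brackets); a bare tuple is rejected by recent pandas versions.
departments_min = employees.groupby('Department')[['Years','Salary']].min()
departments_max = employees.groupby('Department')[['Years','Salary']].max()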

print(departments_min)
print(departments_max)

            Years  Salary
Department               
HR              2      30
IT              1      30
Sales           2      55
            Years  Salary
Department               
HR              8      70
IT              8      70
Sales           3      55
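
The question also asks for the mean, which the two tables above do not show. A minimal sketch of getting all three statistics in one call follows; the frame is rebuilt from the nine rows printed earlier, and `department_stats` is a name introduced here.

```python
import pandas as pd

# The numeric columns of `employees`, copied from the table printed earlier.
employees = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Sales", "Sales", "IT", "IT", "HR"],
    "Years": [1, 2, 2, 4, 3, 2, 8, 7, 8],
    "Salary": [35, 30, 30, 35, 55, 55, 70, 60, 70],
})

# agg with a list of function names returns a two-level column index:
# (column, statistic).
department_stats = (
    employees.groupby("Department")[["Years", "Salary"]].agg(["min", "mean", "max"])
)
print(department_stats)
```

A single value can then be read with a tuple key, for example `department_stats.loc["IT", ("Salary", "mean")]`.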


#### Bonus Question:  for each department, compute the difference between the maximum and the minimum salary.
**Hint**: try using `agg` or `apply` combined with `lambda` functions.

In [31]:
# your code here
#https://stackoverflow.com/questions/40183800/pandas-difference-between-largest-and-smallest-value-within-group
department_range = employees.groupby('Department')['Salary'].apply(lambda x: x.max() - x.min())
department_range

Department
HR       40
IT       40
Sales     0
Name: Salary, dtype: int64

# [ONLY ONE MANDATORY] Challenge 3
#### Open the `Orders` dataset. Name your dataset `orders`.

In [32]:
# your code here
orders = pd.read_csv('../Orders.csv')

#### Explore your dataset by looking at the data types and summary statistics. Comment your results.

In [33]:
# your code here
orders.describe()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,CustomerID,amount_spent
count,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0
mean,278465.221859,560617.126645,2010.934259,7.612537,3.614555,12.728247,13.021823,3.116174,15294.315171,22.394749
std,152771.368303,13106.167695,0.247829,3.416527,1.928274,2.273535,180.42021,22.096788,1713.169877,309.055588
min,0.0,536365.0,2010.0,1.0,1.0,6.0,1.0,0.0,12346.0,0.0
25%,148333.75,549234.0,2011.0,5.0,2.0,11.0,2.0,1.25,13969.0,4.68
50%,284907.5,561893.0,2011.0,8.0,3.0,13.0,6.0,1.95,15159.0,11.8
75%,410079.25,572090.0,2011.0,11.0,5.0,14.0,12.0,3.75,16795.0,19.8
max,541908.0,581587.0,2011.0,12.0,7.0,20.0,80995.0,8142.75,18287.0,168469.6


In [34]:
orders.shape

(397924, 14)

In [35]:
orders.head(5)

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


In [36]:
orders.dtypes

Unnamed: 0        int64
InvoiceNo         int64
StockCode        object
year              int64
month             int64
day               int64
hour              int64
Description      object
Quantity          int64
InvoiceDate      object
UnitPrice       float64
CustomerID        int64
Country          object
amount_spent    float64
dtype: object

In [37]:
#Practicing shortening my outputs for readability. Say when! 
null_cols = orders.isnull().sum()
print(null_cols[:5])

duplicate_invoice = orders[orders.duplicated(['InvoiceNo'])]
print(duplicate_invoice[:5])

Unnamed: 0    0
InvoiceNo     0
StockCode     0
year          0
month         0
dtype: int64
   Unnamed: 0  InvoiceNo StockCode  year  month  day  hour  \
1           1     536365     71053  2010     12    3     8   
2           2     536365    84406B  2010     12    3     8   
3           3     536365    84029G  2010     12    3     8   
4           4     536365    84029E  2010     12    3     8   
5           5     536365     22752  2010     12    3     8   

                           Description  Quantity          InvoiceDate  \
1                  white metal lantern         6  2010-12-01 08:26:00   
2       cream cupid hearts coat hanger         8  2010-12-01 08:26:00   
3  knitted union flag hot water bottle         6  2010-12-01 08:26:00   
4       red woolly hottie white heart.         6  2010-12-01 08:26:00   
5         set 7 babushka nesting boxes         2  2010-12-01 08:26:00   

   UnitPrice  CustomerID         Country  amount_spent  
1       3.39       17850  United Kingd

In [38]:
"""
397924 rows, 14 columns. Avg quantity per row is 13. Avg unit price is about 3.12. Avg amount spent per row is 22.39. 
Each row is one line item of an order, so invoice numbers repeat across rows.
"""

'\n397924 rows, 14 columns. Avg quantity per row is 13. Avg unit price is about 3.12. Avg amount spent per row is 22.39. \nEach row is one line item of an order, so invoice numbers repeat across rows.\n'

####  What is the average purchase price?

In [39]:
# your code here. Amount spent grouped by Invoice no.
avg_purchase_price = orders.groupby('InvoiceNo')['amount_spent'].mean()
print(avg_purchase_price)

total_avg_purchase_price = orders['amount_spent'].mean()
print(total_avg_purchase_price)

InvoiceNo
536365    19.874286
536366    11.100000
536367    23.227500
536368    17.512500
536369    17.850000
            ...    
581583    62.300000
581584    70.320000
581585    15.669048
581586    84.800000
581587    16.630000
Name: amount_spent, Length: 18536, dtype: float64
22.39474850474768


#### What are the highest and lowest purchase prices? 

In [40]:
# your code here
min_purchase_price = orders.groupby('InvoiceNo')['amount_spent'].min()
print(min_purchase_price)

max_purchase_price = orders.groupby('InvoiceNo')['amount_spent'].max()
print(max_purchase_price)

total_min_purchase_price = orders['amount_spent'].min()
print(total_min_purchase_price)

total_max_purchase_price = orders['amount_spent'].max()
print(total_max_purchase_price)

InvoiceNo
536365    15.30
536366    11.10
536367     9.90
536368    14.85
536369    17.85
          ...  
581583    58.00
581584    51.84
581585     4.56
581586    23.60
581587    10.20
Name: amount_spent, Length: 18536, dtype: float64
InvoiceNo
536365     25.50
536366     11.10
536367     54.08
536368     25.50
536369     17.85
           ...  
581583     66.60
581584     88.80
581585     30.00
581586    214.80
581587     23.40
Name: amount_spent, Length: 18536, dtype: float64
0.0
168469.6


#### Select all the customers from Spain.
**Hint**: Remember that you are not asked to find orders from Spain but customers. A customer might have more than one order associated. 

In [115]:
# your code here
cus_spain = orders.loc[orders['Country'] == 'Spain', 'InvoiceNo':'amount_spent']
print(cus_spain)
customers_grouped = cus_spain.groupby('CustomerID') 
customers_grouped.first().shape

        InvoiceNo StockCode  year  month  day  hour  \
4250       536944     22383  2010     12    5    12   
4251       536944     22384  2010     12    5    12   
4252       536944     20727  2010     12    5    12   
4253       536944     20725  2010     12    5    12   
4254       536944     20728  2010     12    5    12   
...           ...       ...   ...    ...  ...   ...   
394733     581193     23291  2011     12    3    17   
394734     581193    85232D  2011     12    3    17   
394735     581193     22721  2011     12    3    17   
394736     581193     23241  2011     12    3    17   
394737     581193     23247  2011     12    3    17   

                          Description  Quantity          InvoiceDate  \
4250          lunch bag suki  design         70  2010-12-03 12:20:00   
4251          lunch bag pink polkadot       100  2010-12-03 12:20:00   
4252          lunch bag  black skull.        60  2010-12-03 12:20:00   
4253          lunch bag red retrospot        70  20

(30, 12)

#### How many customers do we have in Spain?

In [None]:
# your code here
#30 
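
The count can also be computed directly rather than read off the shape. The sketch below uses a few hypothetical rows (only the column names match the real `orders` data), and `n_spanish_customers` is a name introduced for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the real dataset.
orders = pd.DataFrame({
    "CustomerID": [12557, 12557, 12484, 17850, 12484],
    "Country": ["Spain", "Spain", "Spain", "United Kingdom", "Spain"],
})

# Keep the Spanish rows, then count distinct customer IDs.
n_spanish_customers = orders.loc[orders["Country"] == "Spain", "CustomerID"].nunique()
print(n_spanish_customers)
```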

#### Select all the customers who have bought more than 50 items.
**Hint**: Remember that you are not asked to find orders with more than 50 items but customers who bought more than 50 items. A customer with two orders of 30 items each should appear in the selection.

In [105]:
# your code here
grouped = orders.groupby('CustomerID')
high_qty = (grouped.Quantity.sum() > 50)  # strictly more than 50 items
high_qty

CustomerID
12346     True
12347     True
12348     True
12349     True
12350     True
         ...  
18280    False
18281     True
18282     True
18283     True
18287     True
Name: Quantity, Length: 4339, dtype: bool
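
The boolean Series above marks which customers qualify; to keep only those customers, the per-customer totals can be filtered with the same condition. A small sketch on hypothetical data (`total_qty` and `big_buyers` are names introduced here):

```python
import pandas as pd

# Hypothetical orders: customer 1 has two orders of 30 items each.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "Quantity": [30, 30, 50, 10, 5],
})

# Total items per customer, then keep totals strictly above 50.
total_qty = orders.groupby("CustomerID")["Quantity"].sum()
big_buyers = total_qty[total_qty > 50]
print(big_buyers)
```

Customer 1 is selected because the two orders add up to 60, while customer 2 with exactly 50 items is not.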

#### Select orders from Spain that include more than 50 items.

In [106]:
# your code here
high_qty.loc[orders['Country'] == 'Spain']


CustomerID
13974     True
13975     True
13976     True
13978     True
13979     True
13980     True
13982     True
13983     True
13984     True
13985     True
13986     True
13988     True
13989     True
13990     True
13991     True
13992     True
13993     True
13994     True
13995     True
13999     True
14000     True
14001     True
14002     True
14004     True
14005     True
14006     True
14009     True
14012     True
14013     True
17368     True
17370     True
17371     True
17372     True
17373     True
17374     True
17375     True
17376     True
17377     True
17379     True
17381     True
17382    False
17383     True
17384     True
17385     True
17386     True
17387     True
Name: Quantity, dtype: bool
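
Note that `high_qty` is indexed by `CustomerID`, while `orders['Country'] == 'Spain'` is indexed by row, so the two masks do not describe the same things. One way to keep everything at row level is sketched below, assuming "more than 50 items" refers to the `Quantity` of a single row; the data is hypothetical and the names `is_spain`, `is_large` and `spain_large` are introduced here.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the real dataset.
orders = pd.DataFrame({
    "InvoiceNo": [536944, 536944, 540015, 581193],
    "Country": ["Spain", "Spain", "United Kingdom", "Spain"],
    "Quantity": [70, 12, 100, 60],
})

# Both conditions are built on `orders`, so they share its row index.
is_spain = orders["Country"] == "Spain"
is_large = orders["Quantity"] > 50
spain_large = orders[is_spain & is_large]
print(spain_large)
```

If an "order" is instead taken to mean a whole invoice, the quantities would first need to be summed per `InvoiceNo` before applying the threshold.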

#### Select all free orders.

In [110]:
# your code here
free = orders.loc[orders['amount_spent'] == 0]
free

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
6914,9302,537197,22841,2010,12,7,14,round cake tin vintage green,1,2010-12-05 14:02:00,0.0,12647,Germany,0.0
22539,33576,539263,22580,2010,12,4,14,advent calendar gingham sack,4,2010-12-16 14:36:00,0.0,16560,United Kingdom,0.0
25379,40089,539722,22423,2010,12,2,13,regency cakestand 3 tier,10,2010-12-21 13:45:00,0.0,14911,EIRE,0.0
29080,47068,540372,22090,2011,1,4,16,paper bunting retrospot,24,2011-01-06 16:41:00,0.0,13081,United Kingdom,0.0
29082,47070,540372,22553,2011,1,4,16,plasters in tin skulls,24,2011-01-06 16:41:00,0.0,13081,United Kingdom,0.0
34494,56674,541109,22168,2011,1,4,15,organiser wood antique white,1,2011-01-13 15:10:00,0.0,15107,United Kingdom,0.0
53788,86789,543599,84535B,2011,2,4,13,fairy cakes notebook a6 size,16,2011-02-10 13:08:00,0.0,17560,United Kingdom,0.0
85671,130188,547417,22062,2011,3,3,10,ceramic bowl with love heart design,36,2011-03-23 10:25:00,0.0,13239,United Kingdom,0.0
92875,139453,548318,22055,2011,3,3,12,mini cake stand hanging strawbery,5,2011-03-30 12:45:00,0.0,13113,United Kingdom,0.0
97430,145208,548871,22162,2011,4,1,14,heart garland rustic padded,2,2011-04-04 14:42:00,0.0,14410,United Kingdom,0.0


#### Select all orders whose description starts with `lunch bag`.
**Hint**: use string functions.

In [113]:
# your code here
lunch_bag = orders.loc[orders['Description'].str.match('lunch bag')]
lunch_bag

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
93,93,536378,20725,2010,12,3,9,lunch bag red retrospot,10,2010-12-01 09:37:00,1.65,14688,United Kingdom,16.50
172,174,536385,22662,2010,12,3,9,lunch bag dolly girl design,10,2010-12-01 09:56:00,1.65,17420,United Kingdom,16.50
354,363,536401,22662,2010,12,3,11,lunch bag dolly girl design,1,2010-12-01 11:21:00,1.65,15862,United Kingdom,1.65
359,368,536401,20725,2010,12,3,11,lunch bag red retrospot,1,2010-12-01 11:21:00,1.65,15862,United Kingdom,1.65
360,369,536401,22382,2010,12,3,11,lunch bag spaceboy design,2,2010-12-01 11:21:00,1.65,15862,United Kingdom,3.30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397465,540436,581486,23207,2011,12,5,9,lunch bag alphabet design,10,2011-12-09 09:38:00,1.65,17001,United Kingdom,16.50
397713,541695,581538,20727,2011,12,5,11,lunch bag black skull.,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397714,541696,581538,20725,2011,12,5,11,lunch bag red retrospot,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397877,541862,581581,23681,2011,12,5,12,lunch bag red vintage doily,10,2011-12-09 12:20:00,1.65,17581,United Kingdom,16.50


#### Select all `lunch bag` orders made in 2011.

In [114]:
# your code here
lunch_bag.loc[lunch_bag['year'] == 2011]

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
26340,42678,540015,20725,2011,1,2,11,lunch bag red retrospot,10,2011-01-04 11:40:00,1.65,13319,United Kingdom,16.50
26341,42679,540015,20726,2011,1,2,11,lunch bag woodland,10,2011-01-04 11:40:00,1.65,13319,United Kingdom,16.50
26512,42851,540023,22382,2011,1,2,12,lunch bag spaceboy design,2,2011-01-04 12:58:00,1.65,15039,United Kingdom,3.30
26513,42852,540023,20726,2011,1,2,12,lunch bag woodland,1,2011-01-04 12:58:00,1.65,15039,United Kingdom,1.65
26860,43616,540098,22384,2011,1,2,15,lunch bag pink polkadot,1,2011-01-04 15:50:00,1.65,16241,United Kingdom,1.65
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397465,540436,581486,23207,2011,12,5,9,lunch bag alphabet design,10,2011-12-09 09:38:00,1.65,17001,United Kingdom,16.50
397713,541695,581538,20727,2011,12,5,11,lunch bag black skull.,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397714,541696,581538,20725,2011,12,5,11,lunch bag red retrospot,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
397877,541862,581581,23681,2011,12,5,12,lunch bag red vintage doily,10,2011-12-09 12:20:00,1.65,17581,United Kingdom,16.50


#### Show the frequency distribution of the amount spent in Spain.

In [127]:
# your code here
spain = orders.loc[orders['Country']=='Spain']
spain['amount_spent'].value_counts()

15.00     186
17.70     122
19.80      99
17.40      86
10.20      76
         ... 
29.85       1
7.56        1
280.00      1
360.00      1
4.74        1
Name: amount_spent, Length: 316, dtype: int64

#### Select all orders made in the month of August.

In [131]:
# your code here
aug = orders.loc[orders['month']==8]
aug

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
199475,285421,561904,22075,2011,8,1,8,6 ribbons elegant christmas,96,2011-08-01 08:30:00,1.45,17941,United Kingdom,139.20
199476,285422,561904,85049E,2011,8,1,8,scandinavian reds ribbons,156,2011-08-01 08:30:00,1.06,17941,United Kingdom,165.36
199477,285423,561905,21385,2011,8,1,9,ivory hanging decoration heart,24,2011-08-01 09:31:00,0.85,14947,United Kingdom,20.40
199478,285424,561905,84970L,2011,8,1,9,single heart zinc t-light holder,12,2011-08-01 09:31:00,0.95,14947,United Kingdom,11.40
199479,285425,561905,84970S,2011,8,1,9,hanging heart zinc t-light holder,12,2011-08-01 09:31:00,0.85,14947,United Kingdom,10.20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
226483,320688,565067,22644,2011,8,3,17,ceramic cherry cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90
226484,320689,565067,22645,2011,8,3,17,ceramic heart fairy cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90
226485,320690,565067,22637,2011,8,3,17,piggy bank retrospot,2,2011-08-31 17:16:00,2.55,15856,United Kingdom,5.10
226486,320691,565067,22646,2011,8,3,17,ceramic strawberry cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90


#### Find the number of orders made by each country in the month of August.
**Hint**: Use value_counts().

In [None]:
# your code here
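
A possible approach, following the hint, is to filter on `month` and call `value_counts()` on the `Country` column. The rows below are hypothetical and `aug_by_country` is a name introduced for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the real dataset.
orders = pd.DataFrame({
    "month": [8, 8, 8, 7, 8],
    "Country": ["United Kingdom", "Spain", "United Kingdom", "Spain", "France"],
})

# Keep August rows, then count how often each country appears.
aug = orders.loc[orders["month"] == 8]
aug_by_country = aug["Country"].value_counts()
print(aug_by_country)
```

On the real data this counts rows, which are line items; counting distinct `InvoiceNo` values per country would be an alternative reading of "orders".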


#### What's the  average amount of money spent by country?

In [51]:
# your code here
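
One way to answer this is a `groupby` on `Country` followed by the mean of `amount_spent`. A sketch on hypothetical rows (`avg_by_country` is a name introduced here):

```python
import pandas as pd

# Hypothetical rows; only the column names come from the real dataset.
orders = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "Spain"],
    "amount_spent": [10.0, 20.0, 30.0],
})

# Mean amount spent per row, grouped by country.
avg_by_country = orders.groupby("Country")["amount_spent"].mean()
print(avg_by_country)
```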

#### What's the most expensive item?

In [52]:
# your code here
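
Assuming "most expensive" refers to the highest `UnitPrice`, the row can be located with `idxmax()`. The three rows below reuse descriptions and prices seen earlier in this notebook, and `most_expensive` is a name introduced for illustration.

```python
import pandas as pd

# A few description/price pairs taken from rows printed earlier.
orders = pd.DataFrame({
    "Description": ["white hanging heart t-light holder",
                    "white metal lantern",
                    "lunch bag red retrospot"],
    "UnitPrice": [2.55, 3.39, 1.65],
})

# idxmax returns the index label of the first row with the largest price.
most_expensive = orders.loc[orders["UnitPrice"].idxmax()]
print(most_expensive)
```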

#### What is the average amount spent per year?

In [53]:
# your code here
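
The same `groupby` pattern used for titles and departments should work here with the `year` column. A sketch on hypothetical rows (`avg_by_year` is a name introduced here):

```python
import pandas as pd

# Hypothetical rows; only the column names come from the real dataset.
orders = pd.DataFrame({
    "year": [2010, 2010, 2011],
    "amount_spent": [10.0, 20.0, 30.0],
})

# Mean amount spent per row, grouped by year.
avg_by_year = orders.groupby("year")["amount_spent"].mean()
print(avg_by_year)
```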