## Exercises on pandas operations


This notebook contains some exercises to practice the concepts learned in the notebook [02 - pandas operations](02%20-%20pandas%20operations.ipynb) as well as [01 - pandas basics](01%20-%20pandas%20basics.ipynb).

#### Instructions & suggestions

- Run the first cell below these instructions to load the data
- For more complex questions, build your solution iteratively, starting from the simpler pieces
- There are more questions than fit in the session, so feel free to take some of them home

In [None]:
%matplotlib inline

import pandas as pd
import numpy as np

clients_file = '../data/fake_shop/fake_clients.csv'
transactions_file = '../data/fake_shop/fake_transactions.csv'

clients = pd.read_csv(clients_file, parse_dates=['date_of_birth'])
transactions = pd.read_csv(transactions_file, parse_dates=['date'])

### 1. List the number of customers per city

In [None]:
# In order of frequency high-to-low
clients['city'].value_counts()

In [None]:
# In alphabetical order A-Z
clients.groupby('city').size()
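
To see how the two approaches differ only in ordering, here is a minimal sketch on a made-up frame. The name `toy_clients` and the city values are invented for illustration and are not taken from `fake_clients.csv`.

```python
import pandas as pd

# Hypothetical toy data, not the real clients file
toy_clients = pd.DataFrame(
    {'city': ['Rome', 'Berlin', 'Rome', 'Athens', 'Rome', 'Berlin']}
)

# value_counts() sorts by count, highest first
by_frequency = toy_clients['city'].value_counts()

# groupby() sorts by the group key, so cities come out A-Z
by_name = toy_clients.groupby('city').size()

print(by_frequency.index.tolist())  # ['Rome', 'Berlin', 'Athens']
print(by_name.index.tolist())       # ['Athens', 'Berlin', 'Rome']
```

Both results hold the same counts, so either can be reordered into the other with `sort_index()` or `sort_values(ascending=False)`.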

### 2. Which product brings in the highest revenue?

In [None]:
# You can append .head(1) to show just the top product
transactions.groupby('product')['total'].sum().sort_values(ascending=False)
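
If only the single best product is needed, sorting the whole series is not required. The sketch below is one possible alternative, using `idxmax()` on a made-up `toy_transactions` frame whose product names and totals are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the real transactions file
toy_transactions = pd.DataFrame({
    'product': ['apple', 'pear', 'apple', 'kiwi'],
    'total': [3.0, 10.0, 4.0, 1.5],
})

# Revenue per product: apple 7.0, kiwi 1.5, pear 10.0
revenue = toy_transactions.groupby('product')['total'].sum()

top_product = revenue.idxmax()  # label of the largest value
top_revenue = revenue.max()     # the largest value itself

print(top_product, top_revenue)  # pear 10.0
```

`idxmax()` returns only the first label if several products tie, so the sorted listing is still the better choice when ties matter.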

### 3. List the revenue (sum of the total) per day

In [None]:
transactions.groupby('date')['total'].sum()

### 4. List the number of transactions per day

In [None]:
transactions.groupby('date')['transaction_id'].nunique()

### 5. Plot the number of clients per city (in alphabetical order)

Plot with `kind='bar'` to produce a bar chart.

In [None]:
# groupby() sorts by the group key, so the cities are already in alphabetical order
clients.groupby('city').size().plot(kind='bar')

In [None]:
# value_counts() returns in order of frequency
# so we need to sort using the index/label, which is the city name
clients['city'].value_counts().sort_index().plot(kind='bar')

### 6. List the revenue (sum of the total) per city

In [None]:
join_df = pd.merge(left=clients, right=transactions, on='client_id')
join_df.groupby('city')['total'].sum()
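
`pd.merge` performs an inner join by default, so a client without any transaction disappears from the joined frame, and a city whose clients never bought anything disappears from the result. The sketch below illustrates this on made-up data. `toy_clients`, `toy_transactions` and their values are hypothetical, and whether the real dataset contains such clients is not something this notebook states.

```python
import pandas as pd

# Hypothetical toy data: client 3 (Athens) has no transactions
toy_clients = pd.DataFrame({
    'client_id': [1, 2, 3],
    'city': ['Rome', 'Berlin', 'Athens'],
})
toy_transactions = pd.DataFrame({
    'client_id': [1, 1, 2],
    'total': [5.0, 7.0, 2.0],
})

# Default how='inner': only clients with at least one transaction survive
inner = pd.merge(left=toy_clients, right=toy_transactions, on='client_id')
revenue_inner = inner.groupby('city')['total'].sum()

# how='left': every client is kept, missing totals become NaN
left = pd.merge(left=toy_clients, right=toy_transactions, on='client_id', how='left')
revenue_left = left.groupby('city')['total'].sum()

print(revenue_inner.to_dict())  # {'Berlin': 2.0, 'Rome': 12.0}
print(revenue_left.to_dict())   # {'Athens': 0.0, 'Berlin': 2.0, 'Rome': 12.0}
```

With the left join, Athens shows up with a revenue of 0.0 because `sum()` skips the NaN totals.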

### 7. List the number of transactions per client

Show the client name and the number.

In [None]:
join_df = pd.merge(left=clients, right=transactions, on='client_id')
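# group by client_id as well, so that two clients sharing a name are counted separately
join_df.groupby(['client_id', 'name'])['transaction_id'].nunique()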

### 8. Plot the revenue (sum of the total) per city

- Plot using `kind='bar'` to produce a bar chart.
- Add the title "Revenue per city" to the plot
- Add the label "Revenue" to the Y-axis

In [None]:
join_df = pd.merge(left=clients, right=transactions, on='client_id')
ax = join_df.groupby('city')['total'].sum().plot(kind='bar', title='Revenue per city')
ax.set_ylabel('Revenue');

### 9. Which product has sold the highest number of units?

In [None]:
transactions.groupby('product')['quantity'].sum().sort_values(ascending=False)

### 10. Plot the total revenue per product

Plot using `kind='bar'` to produce a bar chart.

In [None]:
transactions.groupby('product')['total'].sum().plot(kind='bar');

### 11. For each city, list the revenue (sum of total) per product 

In [None]:
join_df = pd.merge(left=clients, right=transactions, on='client_id')
join_df.groupby(['city', 'product'])['total'].sum()

### 12. For each different day, what's the average transaction total?

In [None]:
# aggregate per transaction, summing the total and showing the date
total = transactions.groupby('transaction_id').agg({'total': 'sum', 'date': 'first'})
# aggregate per date, computing the average of the total
total.groupby('date')['total'].mean()
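
The two-step aggregation matters when one transaction spans several rows, which is what grouping by `transaction_id` suggests. A direct `groupby('date')['total'].mean()` would then average the individual rows rather than whole transactions. The sketch below shows the difference on a made-up `toy` frame whose ids, dates and totals are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: transaction 1 has two rows on the same day
toy = pd.DataFrame({
    'transaction_id': [1, 1, 2, 3],
    'date': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02']),
    'total': [10.0, 20.0, 6.0, 8.0],
})

# Step 1: one row per transaction (totals 30, 6 and 8)
per_transaction = toy.groupby('transaction_id').agg({'total': 'sum', 'date': 'first'})

# Step 2: average of the transaction totals per day
avg_per_day = per_transaction.groupby('date')['total'].mean()

# Naive version: averages the rows, not the transactions
avg_per_row = toy.groupby('date')['total'].mean()

print(avg_per_day.tolist())  # [18.0, 8.0]
print(avg_per_row.tolist())  # [12.0, 8.0]
```

On the first day the transaction average is (30 + 6) / 2 = 18, while the row average is (10 + 20 + 6) / 3 = 12.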

### 13. For each different day, show the minimum and maximum revenue on a single transaction

In [None]:
# aggregate per transaction, summing the total and showing the date
total = transactions.groupby('transaction_id').agg({'total': 'sum', 'date': 'first'})
# aggregate per date, computing the minimum and maximum of the total
total.groupby('date')['total'].agg(minimum='min', maximum='max')