# 2019: Week 13

May 08, 2019

This week, we are looking at a common challenge in customer analytics: the summary table. For new and intermediate analysts, the temptation to connect to massive data sets and use every data point is strong. However, software performance may take a hit, and complexity may rise, when you work out complex calculations or render vast numbers of data points. After all, 'Big Data', which means a lot of things to a lot of different people, encompasses this challenge among many others.

In Financial Services, the regular flow of transactions is a wonderful data source for analysis but is also a challenge. To create simpler snapshots of behaviour (often around balances), balances are averaged over a time period. But what level do you need a customer's balance aggregated to? This is the challenge we will explore, but for our favourite soap company, Chin & Beard Suds Co.

Using data from three companies that buy our products, we are looking at the balance of credit that they hold with us. As any business owner will know, 'cashflow is king', and therefore we only provide supplies to those who hold a positive balance with us at the start of the month. But are these companies 'gaming' our system? We want to know:

* What is the average weekly, monthly and quarterly balance?
* What is the average weekly, monthly and quarterly transaction value?
* How many days does the customer have a negative balance?
* How many days does the customer exceed their credit limit? (The credit limit is a positive number in the input but needs to be made negative, as it is how much we allow the customer to owe us.)
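
The sign flip on the credit limit is easy to get wrong, so here is a minimal sketch of the two day counts. The balances and the limit of 10,000 are made-up illustration values, not rows from the input file, and the variable names are our own.

```python
import pandas as pd

# Hypothetical daily balances for one customer (illustration only)
balances = pd.Series([2500.0, -4000.0, -10000.0, -12500.0])
max_credit = 10000  # stored as a positive number in the input

# The lowest balance we allow is the negated credit limit
credit_floor = -max_credit

below_zero = balances < 0               # customer owes us money
beyond_limit = balances < credit_floor  # customer owes more than we allow

print(int(below_zero.sum()))    # days with a negative balance
print(int(beyond_limit.sum()))  # days beyond the credit limit
```

A balance of exactly -10,000 sits on the limit and is not counted as exceeding it here; that strict comparison is an assumption of the sketch.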

# Requirements

<img src="https://1.bp.blogspot.com/-nSaKCQrGi4w/XNL8otdr-ZI/AAAAAAAAAOU/i9Pt7fSR9PsEVooSH9aSHrQiaa8oEZ7JACLcBGAs/s320/Input%2BCustomer%2BLookup.JPG" width="700" height="200">

Input of Customer Details

<img src="https://3.bp.blogspot.com/-x3-23t6xYYs/XNL8ouA64oI/AAAAAAAAAOQ/jsvAlfskma8UAgOpLqb5l3q2gwqY5zdawCLcBGAs/s320/Input%2BTransactions.JPG" width="700" height="300">

Input of Transactions

* Input data
* For each customer and time period, calculate the average (mean) balance to two decimal places and the average (mean) transaction value to no decimal places
* Each row will be a customer per time period
* Date recorded will be the beginning of that time period (i.e. the first day of the week for the weekly table)
* Bring in the customer name
* Determine the number of days a customer's balance is below zero
* Determine the number of days a customer's balance is below their credit limit (i.e. they have gone beyond our allowance)
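
One way to get the first day of each period is to convert dates to pandas periods and take their start time. This is only a sketch under the assumption that weeks run Sunday to Saturday; the sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical sample dates (illustration only)
dates = pd.Series(pd.to_datetime(
    ["2019-01-01", "2019-01-05", "2019-01-06", "2019-02-14", "2019-04-02"]
))

# 'W-SAT' weeks end on Saturday, so they start on Sunday
week_start = dates.dt.to_period("W-SAT").dt.start_time
month_start = dates.dt.to_period("M").dt.start_time
quarter_start = dates.dt.to_period("Q").dt.start_time

print(pd.DataFrame({
    "Date": dates,
    "week_start": week_start,
    "month_start": month_start,
    "quarter_start": quarter_start,
}))
```

Note that this returns the calendar start of the period, so the week containing 1 January 2019 starts on 30 December 2018. Taking the earliest date actually present in each group would give 1 January 2019 instead; which of the two is wanted for a partial first week is a judgment call.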


# Output

Weekly

<img src="https://3.bp.blogspot.com/-0OTlGnvSPaY/XNMEFvJEs7I/AAAAAAAAAOs/imijzJkF8XghIQJ0cptwyYHDptm5lOrowCLcBGAs/s320/Weekly%2BOutput.JPG">

Monthly

<img src="https://4.bp.blogspot.com/-F6I-673Ls84/XNMEFlEUV8I/AAAAAAAAAOo/iC-OL0JMfG0uYJgeXFLVRivAAA8GU34fACEwYBhgL/s320/Monthly%2BOutput.JPG">

Quarterly

<img src="https://2.bp.blogspot.com/-RxyzzT97Q44/XNMEFhn4YGI/AAAAAAAAAOw/Wrl5k1eT6j4XvR30wMWbeI0Av2ZS0FZMQCEwYBhgL/s320/Quarterly%2BOutput.JPG">

Three output files: weekly, monthly and quarterly.

Expected size of each output (row counts exclude the header):
* Weekly: 81 rows, 8 columns
* Monthly: 18 rows, 8 columns
* Quarterly: 6 rows, 8 columns
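
These row counts can be sanity-checked with a little arithmetic. The sketch below assumes each of the three customers has one row per day from 1 January to 30 June 2019 and that weeks start on Sunday; the date range is our inference (it is consistent with the 543 transaction rows), not something stated in the challenge.

```python
import pandas as pd

# Assumption: one row per customer per day over the first half of 2019
dates = pd.date_range("2019-01-01", "2019-06-30", freq="D")
n_customers = 3

# Count the distinct periods the date range touches
weeks = dates.to_period("W-SAT").nunique()  # Sunday-to-Saturday weeks
months = dates.to_period("M").nunique()
quarters = dates.to_period("Q").nunique()

expected_rows = {
    "weekly": n_customers * weeks,
    "monthly": n_customers * months,
    "quarterly": n_customers * quarters,
}
print(expected_rows)
```

Under those assumptions the range touches 27 weeks, 6 months and 2 quarters, which multiplies out to the 81, 18 and 6 rows listed above.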

In [22]:
import pandas as pd
from datetime import timedelta

In [23]:
input_path = 'input.xlsx'  # renamed from `input` to avoid shadowing the built-in
df1 = pd.read_excel(input_path, sheet_name='Transactions')
df2 = pd.read_excel(input_path, sheet_name='Customer Look-up')

# Reformat the data types
df1['Account'] = df1['Account'].astype(int)
df2['Max Credit'] = df2['Max Credit'].astype(int)
print(df1.head(5))
print(df1.info())
print("=================================")
print(df2.head(5))
print(df2.info())

   Account       Date  Transaction   Balance
0  1237421 2019-01-01        578.0  100000.0
1  1237421 2019-01-02        198.0   99422.0
2  1237421 2019-01-03       1806.0   99224.0
3  1237421 2019-01-04        144.0   97418.0
4  1237421 2019-01-05       2240.0   97274.0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 543 entries, 0 to 542
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Account      543 non-null    int32         
 1   Date         543 non-null    datetime64[ns]
 2   Transaction  543 non-null    float64       
 3   Balance      543 non-null    float64       
dtypes: datetime64[ns](1), float64(2), int32(1)
memory usage: 15.0 KB
None
      Account                 Name  Max Credit
0   1237421.0  Bubbly McBubbleface       10000
1   4271819.0          Bubblicious       30000
2  12371202.0          Bubbleburst        5000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
D

In [24]:
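# Create week, month and quarter columns, treating Sunday as the first day of the week
# Shifting each date forward one day makes the ISO week (Monday start) run Sunday to Saturday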
df1['Sunday'] = df1['Date'] + pd.offsets.Day(1)
df1['week'] = df1['Sunday'].dt.isocalendar().week
df1['month'] = df1['Date'].dt.month
df1['quarter'] = df1['Date'].dt.quarter
print(df1.head(5))
print(df1.info())

   Account       Date  Transaction   Balance     Sunday  week  month  quarter
0  1237421 2019-01-01        578.0  100000.0 2019-01-02     1      1        1
1  1237421 2019-01-02        198.0   99422.0 2019-01-03     1      1        1
2  1237421 2019-01-03       1806.0   99224.0 2019-01-04     1      1        1
3  1237421 2019-01-04        144.0   97418.0 2019-01-05     1      1        1
4  1237421 2019-01-05       2240.0   97274.0 2019-01-06     1      1        1
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 543 entries, 0 to 542
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Account      543 non-null    int32         
 1   Date         543 non-null    datetime64[ns]
 2   Transaction  543 non-null    float64       
 3   Balance      543 non-null    float64       
 4   Sunday       543 non-null    datetime64[ns]
 5   week         543 non-null    UInt32        
 6   month        543 non-null    in
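
The one-day shift above is what turns ISO weeks (Monday start) into Sunday-start weeks. Here is a small sketch of the effect on four hand-picked dates around the first Sunday of 2019; the variable names are our own.

```python
import pandas as pd

# Friday to Monday around Sunday 6 January 2019
dates = pd.Series(pd.to_datetime(
    ["2019-01-04", "2019-01-05", "2019-01-06", "2019-01-07"]
))

# Plain ISO weeks start on Monday, so Sunday 6 January still belongs to week 1
iso_week = dates.dt.isocalendar().week.astype(int)

# Shifting by one day first moves every Sunday into the following ISO week
sunday_week = (dates + pd.offsets.Day(1)).dt.isocalendar().week.astype(int)

print(pd.DataFrame({
    "Date": dates,
    "day": dates.dt.day_name(),
    "iso_week": iso_week,
    "sunday_week": sunday_week,
}))
```

Week numbers restart every year, so grouping on the week number alone is only safe when the data stays within a single year, as it appears to here.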

In [25]:
# Join to bring in the account name and credit limit
df3 = df1.merge(df2, on='Account')
# Flag days with a negative balance, and days where the balance is below the negated credit limit
df3['Negative_balance_?'] = (df3['Balance'] < 0).astype(int)
df3['Exceed_limit_?'] = (df3['Balance'] + df3['Max Credit'] < 0).astype(int)
print(df3.head(5))
print(df3.info())

   Account       Date  Transaction   Balance     Sunday  week  month  quarter  \
0  1237421 2019-01-01        578.0  100000.0 2019-01-02     1      1        1   
1  1237421 2019-01-02        198.0   99422.0 2019-01-03     1      1        1   
2  1237421 2019-01-03       1806.0   99224.0 2019-01-04     1      1        1   
3  1237421 2019-01-04        144.0   97418.0 2019-01-05     1      1        1   
4  1237421 2019-01-05       2240.0   97274.0 2019-01-06     1      1        1   

                  Name  Max Credit  Negative_balance_?  Exceed_limit_?  
0  Bubbly McBubbleface       10000                   0               0  
1  Bubbly McBubbleface       10000                   0               0  
2  Bubbly McBubbleface       10000                   0               0  
3  Bubbly McBubbleface       10000                   0               0  
4  Bubbly McBubbleface       10000                   0               0  
<class 'pandas.core.frame.DataFrame'>
Int64Index: 543 entries, 0 to 542
Dat
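
The look-up sheet reads `Account` as a float while the transactions hold integers, and an inner merge silently drops any rows that fail to match. A more defensive join might look like the sketch below. The frames are tiny stand-ins for `df1` and `df2`, and the Bubblicious balance is a made-up value.

```python
import pandas as pd

# Stand-ins for the two sheets (illustration only)
transactions = pd.DataFrame({
    "Account": [1237421, 1237421, 4271819],
    "Balance": [100000.0, 99422.0, 50000.0],
})
customers = pd.DataFrame({
    "Account": [1237421.0, 4271819.0],  # read as float, as in the look-up sheet
    "Name": ["Bubbly McBubbleface", "Bubblicious"],
    "Max Credit": [10000, 30000],
})

# Align the key dtypes before joining
transactions["Account"] = transactions["Account"].astype("int64")
customers["Account"] = customers["Account"].astype("int64")

# validate raises if an account appears twice in the look-up;
# indicator adds a _merge column showing which rows found a match
merged = transactions.merge(
    customers, on="Account", how="left",
    validate="many_to_one", indicator=True,
)
unmatched = merged[merged["_merge"] != "both"]
print(len(merged), len(unmatched))
```

A left join keeps every transaction, so a missing customer shows up in `unmatched` rather than disappearing from the averages.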

In [26]:
# Create output_week
output_week = df3.groupby(['Account', 'Name', 'week']).agg({'Date':'min', 'Negative_balance_?':'sum', 'Exceed_limit_?':'sum', 'Balance':'mean', 'Transaction':'mean'})
output_week.reset_index(inplace=True)
output_week.rename(columns={'Balance':'Weekly Avg Balance', 'Transaction': 'Weekly Avg Transaction', 'Negative_balance_?':'Days Below Zero Balance', 'Exceed_limit_?':'Days Beyond Max Credit'}, inplace=True)
print(output_week.head(5))
print(output_week.info())

   Account                 Name  week       Date  Days Below Zero Balance  \
0  1237421  Bubbly McBubbleface     1 2019-01-01                        0   
1  1237421  Bubbly McBubbleface     2 2019-01-06                        0   
2  1237421  Bubbly McBubbleface     3 2019-01-13                        0   
3  1237421  Bubbly McBubbleface     4 2019-01-20                        0   
4  1237421  Bubbly McBubbleface     5 2019-01-27                        0   

   Days Beyond Max Credit  Weekly Avg Balance  Weekly Avg Transaction  
0                       0        98667.600000              993.200000  
1                       0        90139.714286             1271.000000  
2                       0        82050.285714             1318.714286  
3                       0        73950.428571             1159.714286  
4                       0        68574.285714             1242.428571  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 81 entries, 0 to 80
Data columns (total 8 columns):
 # 
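
The requirements ask for the average balance to two decimal places and the average transaction to no decimal places, which the output above does not yet apply. A minimal sketch of that rounding step, using the first two weekly averages printed above as sample values:

```python
import pandas as pd

# First two weekly averages from the output above
weekly = pd.DataFrame({
    "Weekly Avg Balance": [98667.6, 90139.714286],
    "Weekly Avg Transaction": [993.2, 1271.0],
})

# DataFrame.round accepts a per-column number of decimal places
rounded = weekly.round({"Weekly Avg Balance": 2, "Weekly Avg Transaction": 0})

# Cast to int so the transaction average prints without a trailing .0
rounded["Weekly Avg Transaction"] = rounded["Weekly Avg Transaction"].astype(int)

print(rounded)
```

The integer cast assumes there are no missing averages; with missing values it would raise, and the column would need to stay as a float or use a nullable integer type.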

In [27]:
# Create output_month
output_month = df3.groupby(['Account', 'Name', 'month']).agg({'Date':'min', 'Negative_balance_?':'sum', 'Exceed_limit_?':'sum', 'Balance':'mean', 'Transaction':'mean'})
output_month.reset_index(inplace=True)
output_month.rename(columns={'Balance':'Monthly Avg Balance', 'Transaction': 'Monthly Avg Transaction', 'Negative_balance_?':'Days Below Zero Balance', 'Exceed_limit_?':'Days Beyond Max Credit'}, inplace=True)
print(output_month.head(5))
print(output_month.info())

   Account                 Name  month       Date  Days Below Zero Balance  \
0  1237421  Bubbly McBubbleface      1 2019-01-01                        0   
1  1237421  Bubbly McBubbleface      2 2019-02-01                        0   
2  1237421  Bubbly McBubbleface      3 2019-03-01                        0   
3  1237421  Bubbly McBubbleface      4 2019-04-01                        0   
4  1237421  Bubbly McBubbleface      5 2019-05-01                       10   

   Days Beyond Max Credit  Monthly Avg Balance  Monthly Avg Transaction  
0                       0         82167.548387              1208.838710  
1                       0         57972.785714              1208.392857  
2                       0         41392.064516              1218.806452  
3                       0         20931.333333              1387.400000  
4                       4          7229.451613              1357.483871  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (tot

In [28]:
# Create output_quarter
output_quarter = df3.groupby(['Account', 'Name', 'quarter']).agg({'Date':'min', 'Negative_balance_?':'sum', 'Exceed_limit_?':'sum', 'Balance':'mean', 'Transaction':'mean'})
output_quarter.reset_index(inplace=True)
output_quarter.rename(columns={'Balance':'Quarterly Avg Balance', 'Transaction': 'Quarterly Avg Transaction', 'Negative_balance_?':'Days Below Zero Balance', 'Exceed_limit_?':'Days Beyond Max Credit'}, inplace=True)
print(output_quarter.head(5))
print(output_quarter.info())

    Account                 Name  quarter       Date  Days Below Zero Balance  \
0   1237421  Bubbly McBubbleface        1 2019-01-01                        0   
1   1237421  Bubbly McBubbleface        2 2019-04-01                       10   
2   4271819          Bubblicious        1 2019-01-01                        0   
3   4271819          Bubblicious        2 2019-04-01                        7   
4  12371202          Bubbleburst        1 2019-01-01                       13   

   Days Beyond Max Credit  Quarterly Avg Balance  Quarterly Avg Transaction  
0                       0           60595.400000                1212.133333  
1                       4           19285.901099                1353.395604  
2                       0           61268.844444                1155.988889  
3                       0           22532.791209                1189.549451  
4                       0            5763.422222                 247.588889  
<class 'pandas.core.frame.DataFrame'>
RangeIn
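
The weekly, monthly and quarterly cells differ only in the grouping column and the column labels, so they could be folded into one helper. The sketch below is one possible refactor, not the notebook's own code: `summarise` is a hypothetical name, and the four-row frame is made-up data shaped like `df3`.

```python
import pandas as pd

def summarise(df, period_col, label):
    """Aggregate per customer and period; `label` prefixes the average columns."""
    return df.groupby(["Account", "Name", period_col], as_index=False).agg(**{
        "Date": ("Date", "min"),
        "Days Below Zero Balance": ("Negative_balance_?", "sum"),
        "Days Beyond Max Credit": ("Exceed_limit_?", "sum"),
        f"{label} Avg Balance": ("Balance", "mean"),
        f"{label} Avg Transaction": ("Transaction", "mean"),
    })

# Made-up rows that straddle a month and quarter boundary (illustration only)
df3 = pd.DataFrame({
    "Account": [1, 1, 1, 1],
    "Name": ["Demo Co"] * 4,
    "Date": pd.to_datetime(["2019-03-30", "2019-03-31", "2019-04-01", "2019-04-02"]),
    "Transaction": [100.0, 200.0, 300.0, 500.0],
    "Balance": [50.0, -150.0, -450.0, -950.0],
    "Max Credit": [400] * 4,
})
df3["month"] = df3["Date"].dt.month
df3["quarter"] = df3["Date"].dt.quarter
df3["Negative_balance_?"] = (df3["Balance"] < 0).astype(int)
df3["Exceed_limit_?"] = (df3["Balance"] + df3["Max Credit"] < 0).astype(int)

outputs = {
    label: summarise(df3, col, label)
    for col, label in [("month", "Monthly"), ("quarter", "Quarterly")]
}
print(outputs["Monthly"])
```

Each result has the same eight columns as the outputs above, and adding the weekly table is a matter of passing `("week", "Weekly")` once a `week` column exists.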