Preppin Data 2023: Week 1 - The Data Source Bank

Requirements:

1. Input the data.
2. Split the Transaction Code to extract the letters at the start of the transaction code. These letters identify the bank that processes the transaction.
3. Rename the new field containing the bank code to 'Bank'.
4. Rename the values in the Online or In-Person field: Online for values of 1 and In-Person for values of 2.
5. Change the date to be the day of the week.
6. Different levels of detail are required in the outputs. You will need to sum up the values of the transactions in three ways:
    1. Total Values of Transactions by each bank
    2. Total Values by Bank, Day of the Week and Type of Transaction (Online or In-Person)
    3. Total Values by Bank and Customer Code
7. Output each data file.

In [79]:
import pandas as pd
import numpy as np

In [19]:
# 1. Input the data

bank_df = pd.read_csv('Preppin Data Inputs/PD 2023 Wk 1 Input.csv')

In [21]:
bank_df

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
0,DTB-716-679-576,1448,100001,2,20/03/2023 00:00:00
1,DS-795-814-303,7839,100001,2,15/11/2023 00:00:00
2,DSB-807-592-406,5520,100005,1,14/07/2023 00:00:00
3,DS-367-545-264,7957,100007,2,18/08/2023 00:00:00
4,DSB-474-374-857,5375,100000,2,26/08/2023 00:00:00
...,...,...,...,...,...
360,DTB-116-439-102,6708,100001,1,29/01/2023 00:00:00
361,DS-849-981-514,8500,100000,2,29/10/2023 00:00:00
362,DS-726-686-279,9455,100006,2,10/08/2023 00:00:00
363,DS-551-937-380,475,100002,1,11/10/2023 00:00:00


In [13]:
bank_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Transaction Code     365 non-null    object
 1   Value                365 non-null    int64 
 2   Customer Code        365 non-null    int64 
 3   Online or In-Person  365 non-null    int64 
 4   Transaction Date     365 non-null    object
dtypes: int64(3), object(2)
memory usage: 14.4+ KB


In [59]:
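# 2. Split the Transaction Code to extract the letters at the start of the transaction code.
# 3. Rename the new field containing the bank code to 'Bank'.

# Split on the first hyphen only and keep the leading piece (the bank code)
Bank = bank_df['Transaction Code'].str.split('-', n=1, expand=True)[0]

# Insert the bank code as the first column, named 'Bank'
bank_df.insert(loc=0, column='Bank', value=Bank)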

In [77]:
bank_df

Unnamed: 0,Bank,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
0,DTB,DTB-716-679-576,1448,100001,2,20/03/2023 00:00:00
1,DS,DS-795-814-303,7839,100001,2,15/11/2023 00:00:00
2,DSB,DSB-807-592-406,5520,100005,1,14/07/2023 00:00:00
3,DS,DS-367-545-264,7957,100007,2,18/08/2023 00:00:00
4,DSB,DSB-474-374-857,5375,100000,2,26/08/2023 00:00:00
...,...,...,...,...,...,...
360,DTB,DTB-116-439-102,6708,100001,1,29/01/2023 00:00:00
361,DS,DS-849-981-514,8500,100000,2,29/10/2023 00:00:00
362,DS,DS-726-686-279,9455,100006,2,10/08/2023 00:00:00
363,DS,DS-551-937-380,475,100002,1,11/10/2023 00:00:00
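
As a hedged aside, the bank code can also be pulled out without `expand=True`. The sketch below uses three transaction codes copied from the table above and compares two options: splitting on the hyphen and taking the first piece, or capturing the leading letters with a regular expression. The variable names (`codes`, `bank_split`, `bank_regex`) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Three transaction codes taken from the rows shown above.
codes = pd.Series(["DTB-716-679-576", "DS-795-814-303", "DSB-807-592-406"])

# Option A: split on every hyphen and keep the first piece.
bank_split = codes.str.split("-").str[0]

# Option B: capture only the leading run of letters.
# This does not depend on the separator being a hyphen.
bank_regex = codes.str.extract(r"^([A-Za-z]+)", expand=False)

print(bank_split.tolist())
print(bank_regex.tolist())
```

Both options return a Series, so either could be passed to `bank_df.insert(...)` in the same way as `Bank`.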


In [85]:
# 4. Rename the values in the Online or In-Person field: Online for values of 1 and In-Person for values of 2.

# Any value other than 1 is labelled 'In-Person'
bank_df['Online or In-Person'] = np.where(bank_df['Online or In-Person'] == 1, 'Online', 'In-Person')

In [87]:
bank_df

Unnamed: 0,Bank,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
0,DTB,DTB-716-679-576,1448,100001,In-Person,20/03/2023 00:00:00
1,DS,DS-795-814-303,7839,100001,In-Person,15/11/2023 00:00:00
2,DSB,DSB-807-592-406,5520,100005,Online,14/07/2023 00:00:00
3,DS,DS-367-545-264,7957,100007,In-Person,18/08/2023 00:00:00
4,DSB,DSB-474-374-857,5375,100000,In-Person,26/08/2023 00:00:00
...,...,...,...,...,...,...
360,DTB,DTB-116-439-102,6708,100001,Online,29/01/2023 00:00:00
361,DS,DS-849-981-514,8500,100000,In-Person,29/10/2023 00:00:00
362,DS,DS-726-686-279,9455,100006,In-Person,10/08/2023 00:00:00
363,DS,DS-551-937-380,475,100002,Online,11/10/2023 00:00:00
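
One thing to be aware of with `np.where` here: every value that is not 1 becomes 'In-Person', including anything unexpected. A possible alternative, sketched below on a small hand-made Series, is an explicit dictionary passed to `Series.map`, which leaves unknown codes as NaN so they can be spotted. The value 3 is deliberately invalid and does not come from the input file; `channel_labels` and the other names are assumptions for illustration.

```python
import pandas as pd

# Hand-made sample; the trailing 3 is a deliberately invalid code.
channel_codes = pd.Series([2, 2, 1, 2, 1, 3])

# Explicit lookup: only 1 and 2 are recognised.
channel_labels = {1: "Online", 2: "In-Person"}
channel = channel_codes.map(channel_labels)

# Codes missing from the dictionary come back as NaN.
unmapped = channel_codes[channel.isna()]

print(channel.tolist())
print(unmapped.tolist())
```

For this input, which only contains 1 and 2, both approaches should give the same labels.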


In [106]:
# 5. Change the date to be the day of the week.

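# Dates are day-first (e.g. 20/03/2023), so parse with dayfirst=True
bank_df['Transaction Date'] = pd.to_datetime(bank_df['Transaction Date'], dayfirst=True)

# dayofweek returns integers: Monday=0 through Sunday=6
bank_df['Transaction Date'] = bank_df['Transaction Date'].dt.dayofweek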

In [108]:
bank_df

Unnamed: 0,Bank,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
0,DTB,DTB-716-679-576,1448,100001,In-Person,0
1,DS,DS-795-814-303,7839,100001,In-Person,2
2,DSB,DSB-807-592-406,5520,100005,Online,4
3,DS,DS-367-545-264,7957,100007,In-Person,4
4,DSB,DSB-474-374-857,5375,100000,In-Person,5
...,...,...,...,...,...,...
360,DTB,DTB-116-439-102,6708,100001,Online,6
361,DS,DS-849-981-514,8500,100000,In-Person,6
362,DS,DS-726-686-279,9455,100006,In-Person,3
363,DS,DS-551-937-380,475,100002,Online,2


In [110]:
# Map the weekday integers (Monday=0) to day names
day_dict = {0:'Monday', 1:'Tuesday', 2:'Wednesday', 3:'Thursday', 4:'Friday', 5:'Saturday', 6:'Sunday'}

bank_df['Transaction Date'] = bank_df['Transaction Date'].map(day_dict)

In [112]:
bank_df

Unnamed: 0,Bank,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
0,DTB,DTB-716-679-576,1448,100001,In-Person,Monday
1,DS,DS-795-814-303,7839,100001,In-Person,Wednesday
2,DSB,DSB-807-592-406,5520,100005,Online,Friday
3,DS,DS-367-545-264,7957,100007,In-Person,Friday
4,DSB,DSB-474-374-857,5375,100000,In-Person,Saturday
...,...,...,...,...,...,...
360,DTB,DTB-116-439-102,6708,100001,Online,Sunday
361,DS,DS-849-981-514,8500,100000,In-Person,Sunday
362,DS,DS-726-686-279,9455,100006,In-Person,Thursday
363,DS,DS-551-937-380,475,100002,Online,Wednesday
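
The two-step conversion above (integer weekday, then dictionary lookup) can probably be collapsed into one call with `Series.dt.day_name()`. The sketch below uses three date strings copied from the input table and an explicit `format` string, which is an assumption based on the values shown (day/month/year followed by a time).

```python
import pandas as pd

# Date strings copied from rows 0, 1 and 4 of the input table.
raw_dates = pd.Series([
    "20/03/2023 00:00:00",
    "15/11/2023 00:00:00",
    "26/08/2023 00:00:00",
])

# An explicit format avoids any day/month ambiguity.
parsed = pd.to_datetime(raw_dates, format="%d/%m/%Y %H:%M:%S")

# day_name() returns the English weekday name directly.
weekday = parsed.dt.day_name()

print(weekday.tolist())  # ['Monday', 'Wednesday', 'Saturday']
```

These match the day names produced by the dictionary mapping for the same rows.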


In [128]:
# 6. Different levels of detail are required in the outputs. You will need to sum up the values of the transactions in three ways:
    # 1. Total Values of Transactions by each bank

value_by_bank = bank_df.groupby('Bank')['Value'].sum()

In [130]:
value_by_bank

Bank
DS     653940
DSB    530489
DTB    618238
Name: Value, dtype: int64
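
A quick sanity check for any grouped sum is that the group totals add back up to the overall total. For the output above, 653940 + 530489 + 618238 = 1802667, which should equal `bank_df['Value'].sum()`. The sketch below shows the same check on the first five rows of the table; `sample` and `by_bank` are illustrative names.

```python
import pandas as pd

# Bank and Value for rows 0 to 4 of the table shown earlier.
sample = pd.DataFrame({
    "Bank": ["DTB", "DS", "DSB", "DS", "DSB"],
    "Value": [1448, 7839, 5520, 7957, 5375],
})

# Group by the column name so the result's index is labelled 'Bank'.
by_bank = sample.groupby("Bank")["Value"].sum()

# The per-bank totals must add up to the grand total.
assert by_bank.sum() == sample["Value"].sum()

print(by_bank)
```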

In [140]:
# 6. Different levels of detail are required in the outputs. You will need to sum up the values of the transactions in three ways:
    # 2. Total Values by Bank, Day of the Week and Type of Transaction (Online or In-Person)

value_by_bank_day_transactiontype = bank_df.groupby(['Bank', 'Transaction Date', 'Online or In-Person'])['Value'].sum()

In [146]:
value_by_bank_day_transactiontype

Bank  Transaction Date  Online or In-Person
DS    Friday            In-Person              58599
                        Online                 58731
      Monday            In-Person              42806
                        Online                 33563
      Saturday          In-Person              34867
                        Online                 71357
      Sunday            In-Person              51301
                        Online                 21761
      Thursday          In-Person              75582
                        Online                 13337
      Tuesday           In-Person              32607
                        Online                 36639
      Wednesday         In-Person              63686
                        Online                 59104
DSB   Friday            In-Person               9402
                        Online                 45647
      Monday            In-Person              43546
                        Online                 31692
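
Because the day names are plain strings, the grouped output above lists them alphabetically (Friday, Monday, Saturday, ...). If calendar order is preferred, one option is to convert the column to an ordered categorical before grouping. This is a minimal sketch on hand-made rows with illustrative values, not the notebook's actual data; `day_order` is an assumed helper list.

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hand-made rows in the same shape as bank_df (values are illustrative).
sample = pd.DataFrame({
    "Bank": ["DS", "DS", "DS", "DSB"],
    "Transaction Date": ["Wednesday", "Friday", "Monday", "Friday"],
    "Online or In-Person": ["In-Person", "In-Person", "Online", "Online"],
    "Value": [7839, 7957, 100, 5520],
})

# An ordered categorical makes groupby sort days by calendar position.
sample["Transaction Date"] = pd.Categorical(
    sample["Transaction Date"], categories=day_order, ordered=True
)

# observed=True keeps only the day values that actually occur.
totals = sample.groupby(
    ["Bank", "Transaction Date", "Online or In-Person"], observed=True
)["Value"].sum()

print(totals)
```

With this change the DS rows come out as Monday, Wednesday, Friday instead of Friday, Monday, Wednesday.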
  

In [152]:
# 6. Different levels of detail are required in the outputs. You will need to sum up the values of the transactions in three ways:
    # 3. Total Values by Bank and Customer Code

value_by_bank_customercode = bank_df.groupby(['Bank','Customer Code'])['Value'].sum()

In [158]:
value_by_bank_customercode

Bank  Customer Code
DS    100000           57909
      100001           53063
      100002           69803
      100003           25482
      100004           63315
      100005           39668
      100006           77636
      100007           76190
      100008           56400
      100009           56581
      100010           77893
DSB   100000           27585
      100001           67856
      100002           27936
      100003           58154
      100004           39003
      100005           56396
      100006           32333
      100007           29702
      100008           47121
      100009           51749
      100010           92654
DTB   100000           77252
      100001           60675
      100002           48616
      100003           84574
      100004           44435
      100005           37795
      100006           41909
      100007           29308
      100008           69352
      100009           52926
      100010           71396
Name: Value, dtype: int
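
Requirement 7 asks for each result to be written out as a data file. A minimal sketch of that step is below, assuming CSV output. The file names and the temporary output directory are hypothetical choices, and the sample frame reuses the first five transformed rows shown earlier rather than the full data. Using `as_index=False` keeps the grouping keys as ordinary columns, so each file has a flat header.

```python
import tempfile
from pathlib import Path

import pandas as pd

# First five transformed rows from the tables above.
sample = pd.DataFrame({
    "Bank": ["DTB", "DS", "DSB", "DS", "DSB"],
    "Value": [1448, 7839, 5520, 7957, 5375],
    "Customer Code": [100001, 100001, 100005, 100007, 100000],
    "Online or In-Person": ["In-Person", "In-Person", "Online",
                            "In-Person", "In-Person"],
    "Transaction Date": ["Monday", "Wednesday", "Friday",
                         "Friday", "Saturday"],
})

# One entry per required level of detail (file names are hypothetical).
group_keys = {
    "output_1_by_bank.csv": ["Bank"],
    "output_2_by_bank_day_type.csv":
        ["Bank", "Transaction Date", "Online or In-Person"],
    "output_3_by_bank_customer.csv": ["Bank", "Customer Code"],
}

out_dir = Path(tempfile.mkdtemp())  # swap for a real output folder
for file_name, keys in group_keys.items():
    table = sample.groupby(keys, as_index=False)["Value"].sum()
    table.to_csv(out_dir / file_name, index=False)
    print(file_name, table.shape)
```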