2023: Week 3 - Targets for DSB

1. For the transactions file:
    A. Filter the transactions to just look at DSB. These will be transactions that contain DSB in the Transaction Code field
    B. Rename the values in the Online or In-Person field: Online for the 1 values and In-Person for the 2 values
    C. Change the date to be the quarter
    D. Sum the transaction values for each quarter and for each Type of Transaction (Online or In-Person)
2. For the targets file:
    A. Pivot the quarterly targets so we have a row for each Type of Transaction and each Quarter
    B. Rename the fields
    C. Remove the 'Q' from the quarter field and make the data type numeric
3. Join the two datasets together
   - You may need more than one join clause!
4. Remove unnecessary fields
5. Calculate the Variance to Target for each row
6. Output the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
transactions = pd.read_csv('Preppin Data Inputs/Transactions wk1.csv')

In [5]:
targets = pd.read_csv('Preppin Data Inputs/Targets.csv')

In [7]:
transactions

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
0,DTB-716-679-576,1448,100001,2,20/03/2023 00:00:00
1,DS-795-814-303,7839,100001,2,15/11/2023 00:00:00
2,DSB-807-592-406,5520,100005,1,14/07/2023 00:00:00
3,DS-367-545-264,7957,100007,2,18/08/2023 00:00:00
4,DSB-474-374-857,5375,100000,2,26/08/2023 00:00:00
...,...,...,...,...,...
360,DTB-116-439-102,6708,100001,1,29/01/2023 00:00:00
361,DS-849-981-514,8500,100000,2,29/10/2023 00:00:00
362,DS-726-686-279,9455,100006,2,10/08/2023 00:00:00
363,DS-551-937-380,475,100002,1,11/10/2023 00:00:00


In [9]:
targets

Unnamed: 0,Online or In-Person,Q1,Q2,Q3,Q4
0,Online,72500,70000,60000,60000
1,In-Person,75000,70000,70000,60000


In [11]:
# 1A. Filter the transactions to just look at DSB

transactions = transactions[transactions['Transaction Code'].str.contains('DSB')]

In [13]:
transactions

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
2,DSB-807-592-406,5520,100005,1,14/07/2023 00:00:00
4,DSB-474-374-857,5375,100000,2,26/08/2023 00:00:00
5,DSB-448-546-348,4525,100009,1,27/05/2023 00:00:00
11,DSB-422-218-322,118,100010,1,12/05/2023 00:00:00
12,DSB-669-227-170,830,100001,1,15/04/2023 00:00:00
...,...,...,...,...,...
350,DSB-618-298-395,9280,100008,1,10/03/2023 00:00:00
351,DSB-637-369-281,6060,100002,2,09/08/2023 00:00:00
353,DSB-322-596-206,900,100001,1,04/02/2023 00:00:00
354,DSB-384-247-358,6446,100003,1,28/06/2023 00:00:00


In [15]:
# 1B. Rename the values in the Online or In-Person field: Online for the 1 values and In-Person for the 2 values

transactions['Online or In-Person'] = np.where(transactions['Online or In-Person'] == 1, 'Online', 'In-Person')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  transactions['Online or In-Person'] = np.where(transactions['Online or In-Person'] == 1, 'Online', 'In-Person')
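
The warning above appears because `transactions` was produced by the boolean filter in step 1A, so pandas cannot tell whether the assignment is meant for the original frame or for a temporary copy. The result is still correct here. A minimal sketch of one way to avoid the warning, using a small made-up frame (the `sample` and `dsb` names are illustrative, not part of the challenge files), is to take an explicit copy right after filtering:

```python
import numpy as np
import pandas as pd

# Made-up rows in the same shape as the transactions file (an assumption).
sample = pd.DataFrame({
    'Transaction Code': ['DTB-716-679-576', 'DSB-807-592-406', 'DSB-474-374-857'],
    'Value': [1448, 5520, 5375],
    'Online or In-Person': [2, 1, 2],
})

# .copy() makes the filtered result an independent DataFrame.
dsb = sample[sample['Transaction Code'].str.contains('DSB')].copy()

# Assigning to a column of the explicit copy is unambiguous,
# and the original frame is left untouched.
dsb['Online or In-Person'] = np.where(dsb['Online or In-Person'] == 1, 'Online', 'In-Person')
```

The same `.copy()` call also covers the later assignments to `Transaction Date` and `Quarter`, since they operate on the same filtered frame.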


In [17]:
transactions

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
2,DSB-807-592-406,5520,100005,Online,14/07/2023 00:00:00
4,DSB-474-374-857,5375,100000,In-Person,26/08/2023 00:00:00
5,DSB-448-546-348,4525,100009,Online,27/05/2023 00:00:00
11,DSB-422-218-322,118,100010,Online,12/05/2023 00:00:00
12,DSB-669-227-170,830,100001,Online,15/04/2023 00:00:00
...,...,...,...,...,...
350,DSB-618-298-395,9280,100008,Online,10/03/2023 00:00:00
351,DSB-637-369-281,6060,100002,In-Person,09/08/2023 00:00:00
353,DSB-322-596-206,900,100001,Online,04/02/2023 00:00:00
354,DSB-384-247-358,6446,100003,Online,28/06/2023 00:00:00
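
One thing to note about `np.where` in step 1B is that every value other than 1 becomes In-Person, including any missing or unexpected code. A hedged alternative, sketched on a made-up Series that deliberately contains an invalid code 3, is to map each code explicitly so that anything outside the mapping shows up as missing:

```python
import pandas as pd

# Made-up codes; 3 is deliberately invalid to show what happens to it.
codes = pd.Series([1, 2, 1, 3], name='Online or In-Person')

# Only the codes listed in the dictionary are translated;
# everything else becomes NaN instead of silently turning into 'In-Person'.
labels = codes.map({1: 'Online', 2: 'In-Person'})

# Codes that had no label, for inspection before carrying on.
unmapped = codes[labels.isna()]
```

For the challenge data, which only contains 1 and 2, both approaches give the same labels.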


In [19]:
# 1C. Change the date to be the quarter

transactions['Transaction Date'] = pd.to_datetime(transactions['Transaction Date'], format= '%d/%m/%Y %H:%M:%S')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  transactions['Transaction Date'] = pd.to_datetime(transactions['Transaction Date'], format= '%d/%m/%Y %H:%M:%S')


In [21]:
transactions['Quarter'] = transactions['Transaction Date'].dt.quarter

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  transactions['Quarter'] = transactions['Transaction Date'].dt.quarter


In [23]:
transactions

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date,Quarter
2,DSB-807-592-406,5520,100005,Online,2023-07-14,3
4,DSB-474-374-857,5375,100000,In-Person,2023-08-26,3
5,DSB-448-546-348,4525,100009,Online,2023-05-27,2
11,DSB-422-218-322,118,100010,Online,2023-05-12,2
12,DSB-669-227-170,830,100001,Online,2023-04-15,2
...,...,...,...,...,...,...
350,DSB-618-298-395,9280,100008,Online,2023-03-10,1
351,DSB-637-369-281,6060,100002,In-Person,2023-08-09,3
353,DSB-322-596-206,900,100001,Online,2023-02-04,1
354,DSB-384-247-358,6446,100003,Online,2023-06-28,2
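
The explicit `format` in step 1C matters because the dates are day-first: a string such as 04/02/2023 could otherwise be read as 2 April rather than 4 February, which would put it in the wrong quarter. A small sketch on three date strings taken from the table above:

```python
import pandas as pd

# Three date strings as they appear in the transactions file.
dates = pd.Series(['04/02/2023 00:00:00', '14/07/2023 00:00:00', '15/11/2023 00:00:00'])

# Day comes first, then month, then year.
parsed = pd.to_datetime(dates, format='%d/%m/%Y %H:%M:%S')

# .dt.quarter returns 1-4 based on the calendar month.
quarters = parsed.dt.quarter
```

Here `quarters` holds 1, 3 and 4, matching February, July and November.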


In [25]:
# 1D. Sum the transaction values for each quarter and for each Type of Transaction (Online or In-Person)

transactions = transactions.groupby(['Online or In-Person', 'Quarter'])['Value'].sum().reset_index()

In [27]:
transactions

Unnamed: 0,Online or In-Person,Quarter,Value
0,In-Person,1,77576
1,In-Person,2,70634
2,In-Person,3,74189
3,In-Person,4,43223
4,Online,1,74562
5,Online,2,69325
6,Online,3,59072
7,Online,4,61908


In [29]:
# 2A. Pivot the quarterly targets so we have a row for each Type of Transaction and each Quarter

targets = pd.melt(targets, id_vars='Online or In-Person', value_vars=['Q1','Q2','Q3','Q4'])

In [31]:
targets

Unnamed: 0,Online or In-Person,variable,value
0,Online,Q1,72500
1,In-Person,Q1,75000
2,Online,Q2,70000
3,In-Person,Q2,70000
4,Online,Q3,60000
5,In-Person,Q3,70000
6,Online,Q4,60000
7,In-Person,Q4,60000


In [33]:
# 2B. Rename the fields

targets.columns = ['Online or In-Person', 'Quarter', 'Targets']

In [35]:
targets

Unnamed: 0,Online or In-Person,Quarter,Targets
0,Online,Q1,72500
1,In-Person,Q1,75000
2,Online,Q2,70000
3,In-Person,Q2,70000
4,Online,Q3,60000
5,In-Person,Q3,70000
6,Online,Q4,60000
7,In-Person,Q4,60000


In [37]:
# 2C. Remove the 'Q' from the quarter field and make the data type numeric

targets['Quarter'] = targets['Quarter'].str.replace('Q','')

targets['Quarter'] = targets['Quarter'].astype(int)

In [39]:
targets

Unnamed: 0,Online or In-Person,Quarter,Targets
0,Online,1,72500
1,In-Person,1,75000
2,Online,2,70000
3,In-Person,2,70000
4,Online,3,60000
5,In-Person,3,70000
6,Online,4,60000
7,In-Person,4,60000
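
Steps 2A to 2C can also be sketched as a shorter chain, since `melt` accepts `var_name` and `value_name` and so names the new columns as it pivots. The frame below is rebuilt by hand from the targets table shown earlier, and the `targets_wide` and `targets_long` names are illustrative:

```python
import pandas as pd

# Same values as the targets table shown earlier.
targets_wide = pd.DataFrame({
    'Online or In-Person': ['Online', 'In-Person'],
    'Q1': [72500, 75000],
    'Q2': [70000, 70000],
    'Q3': [60000, 70000],
    'Q4': [60000, 60000],
})

# 2A + 2B: pivot the quarter columns to rows and name the new fields in one call.
targets_long = targets_wide.melt(
    id_vars='Online or In-Person',
    value_vars=['Q1', 'Q2', 'Q3', 'Q4'],
    var_name='Quarter',
    value_name='Targets',
)

# 2C: drop the literal 'Q' and convert the remaining digit to an integer.
targets_long['Quarter'] = (
    targets_long['Quarter'].str.replace('Q', '', regex=False).astype(int)
)
```

Passing `regex=False` treats 'Q' as a plain string, which keeps the behaviour the same across pandas versions.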


In [41]:
# 3. Join the two datasets together (two join clauses: Online or In-Person and Quarter)

df = transactions.merge(targets, on=['Online or In-Person', 'Quarter'])

In [43]:
df

Unnamed: 0,Online or In-Person,Quarter,Value,Targets
0,In-Person,1,77576,75000
1,In-Person,2,70634,70000
2,In-Person,3,74189,70000
3,In-Person,4,43223,60000
4,Online,1,74562,72500
5,Online,2,69325,70000
6,Online,3,59072,60000
7,Online,4,61908,60000
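
Because the join uses two keys, it can be worth checking that every row found a partner and that no key pair is duplicated. A minimal sketch, using two made-up two-row frames with values from the tables above (the `actuals`, `goals`, `checked` and `unmatched` names are illustrative), with the `validate` and `indicator` options of `merge`:

```python
import pandas as pd

actuals = pd.DataFrame({
    'Online or In-Person': ['In-Person', 'Online'],
    'Quarter': [1, 1],
    'Value': [77576, 74562],
})
goals = pd.DataFrame({
    'Online or In-Person': ['Online', 'In-Person'],
    'Quarter': [1, 1],
    'Targets': [72500, 75000],
})

# how='outer' keeps rows from both sides, indicator=True adds a '_merge'
# column saying where each row came from, and validate='one_to_one'
# raises an error if a key pair appears more than once on either side.
checked = actuals.merge(
    goals,
    on=['Online or In-Person', 'Quarter'],
    how='outer',
    validate='one_to_one',
    indicator=True,
)

# Rows that exist in only one of the two inputs.
unmatched = checked[checked['_merge'] != 'both']
```

An empty `unmatched` frame means the default inner join used in step 3 loses no rows. Since both key columns share the same names in the two tables, the merge also leaves no duplicate fields to remove in step 4.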


In [45]:
# 5. Calculate the Variance to Target for each row

df['Variance to Targets'] = df['Value'] - df['Targets']

In [47]:
df

Unnamed: 0,Online or In-Person,Quarter,Value,Targets,Variance to Targets
0,In-Person,1,77576,75000,2576
1,In-Person,2,70634,70000,634
2,In-Person,3,74189,70000,4189
3,In-Person,4,43223,60000,-16777
4,Online,1,74562,72500,2062
5,Online,2,69325,70000,-675
6,Online,3,59072,60000,-928
7,Online,4,61908,60000,1908


In [49]:
# 6. Output the Data

df.to_csv('pd2023wk3_output.csv', index=False)