# Preppin Data 2023 W03

source: https://preppindata.blogspot.com/2023/01/2023-week-3-targets-for-dsb.html

### Load data

In [97]:
import pandas as pd

In [98]:
txn = pd.read_csv('PD 2023 Wk 1 Input.csv')

In [99]:
tgt = pd.read_csv('Targets.csv')

### Filter transactions to DSB

In [100]:
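# Take an explicit copy so later column assignments modify this frame rather than a slice of txn
df = txn[txn['Transaction Code'].str.contains('DSB')].copy()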

In [101]:
df.head()

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
2,DSB-807-592-406,5520,100005,1,14/07/2023 00:00:00
4,DSB-474-374-857,5375,100000,2,26/08/2023 00:00:00
5,DSB-448-546-348,4525,100009,1,27/05/2023 00:00:00
11,DSB-422-218-322,118,100010,1,12/05/2023 00:00:00
12,DSB-669-227-170,830,100001,1,15/04/2023 00:00:00


### Replace online/in-person codes with labels

In [102]:
df['Online or In-Person'] = df['Online or In-Person'].replace({1:'Online',2:'In-Person'})


In [103]:
df.head()

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date
2,DSB-807-592-406,5520,100005,Online,14/07/2023 00:00:00
4,DSB-474-374-857,5375,100000,In-Person,26/08/2023 00:00:00
5,DSB-448-546-348,4525,100009,Online,27/05/2023 00:00:00
11,DSB-422-218-322,118,100010,Online,12/05/2023 00:00:00
12,DSB-669-227-170,830,100001,Online,15/04/2023 00:00:00
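
One thing to watch with `replace` is that a code missing from the dictionary passes through unchanged. The sketch below shows an alternative using `Series.map`, which turns unmapped codes into `NaN` so they can be listed. The `sample` frame and the stray code `3` are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical sample: code 3 is deliberately absent from the mapping
sample = pd.DataFrame({'Online or In-Person': [1, 2, 1, 3]})
channel_names = {1: 'Online', 2: 'In-Person'}

# map() returns NaN for any code that is not a key in the dictionary
mapped = sample['Online or In-Person'].map(channel_names)

# Collect the original codes that had no label
unmapped = sample.loc[mapped.isna(), 'Online or In-Person'].unique()
print(unmapped)
```

If `unmapped` is empty, the mapping covers every code in the data.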


### Convert transaction date to quarter

In [104]:
df['Transaction Date'] = pd.to_datetime(df['Transaction Date'], dayfirst=True)


In [105]:
df['Transaction Date'] = df['Transaction Date'].dt.quarter


In [106]:
df = df.rename(columns={'Transaction Date':'Quarter'})

In [107]:
df.head()

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Quarter
2,DSB-807-592-406,5520,100005,Online,3
4,DSB-474-374-857,5375,100000,In-Person,3
5,DSB-448-546-348,4525,100009,Online,2
11,DSB-422-218-322,118,100010,Online,2
12,DSB-669-227-170,830,100001,Online,2
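
The parse, quarter extraction, and rename can also be written as one chain. This is a minimal sketch on three of the dates shown above; the `sample` frame is illustrative, and the explicit `format` string is an assumption based on how the dates appear in the output.

```python
import pandas as pd

# Three dates copied from the transactions shown above
sample = pd.DataFrame({
    'Transaction Date': [
        '14/07/2023 00:00:00',
        '26/08/2023 00:00:00',
        '27/05/2023 00:00:00',
    ],
})

# An explicit format removes any day/month ambiguity during parsing
quarters = pd.to_datetime(
    sample['Transaction Date'], format='%d/%m/%Y %H:%M:%S'
).dt.quarter

# Add the Quarter column and drop the original date in one step
sample = sample.assign(Quarter=quarters).drop(columns='Transaction Date')
print(sample)
```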


### Sum values by quarter

In [108]:
df = df.drop(['Transaction Code','Customer Code'], axis=1)

In [109]:
df = df.groupby(['Online or In-Person','Quarter']).sum()

In [110]:
df.head()

Online or In-Person,Quarter,Value
In-Person,1,77576
In-Person,2,70634
In-Person,3,74189
In-Person,4,43223
Online,1,74562
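
The grouped result carries `Online or In-Person` and `Quarter` as index levels. A possible variation, sketched here on the five rows shown earlier, selects `Value` explicitly and passes `as_index=False` so the keys stay as ordinary columns. The `sample` frame is illustrative only.

```python
import pandas as pd

# The five DSB rows shown earlier, reduced to the relevant columns
sample = pd.DataFrame({
    'Online or In-Person': ['Online', 'In-Person', 'Online', 'Online', 'Online'],
    'Quarter': [3, 3, 2, 2, 2],
    'Value': [5520, 5375, 4525, 118, 830],
})

# as_index=False keeps the grouping keys as columns instead of a MultiIndex
totals = sample.groupby(
    ['Online or In-Person', 'Quarter'], as_index=False
)['Value'].sum()
print(totals)
```

Selecting `['Value']` before `sum()` also makes the earlier `drop` of the code columns optional.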


### Unpivot targets

In [111]:
tgt.head()

Unnamed: 0,Online or In-Person,Q1,Q2,Q3,Q4
0,Online,72500,70000,60000,60000
1,In-Person,75000,70000,70000,60000


In [112]:
df_tgt = pd.melt(tgt, id_vars=['Online or In-Person'], var_name='Quarter', value_name='Quarterly Targets')

In [113]:
df_tgt.head()

Unnamed: 0,Online or In-Person,Quarter,Quarterly Targets
0,Online,Q1,72500
1,In-Person,Q1,75000
2,Online,Q2,70000
3,In-Person,Q2,70000
4,Online,Q3,60000


### Remove letter Q from quarter

In [114]:
df_tgt['Quarter'] = df_tgt['Quarter'].str.replace('Q','')

In [115]:
df_tgt['Quarter'] = df_tgt['Quarter'].astype(int)

In [116]:
df_tgt.head()

Unnamed: 0,Online or In-Person,Quarter,Quarterly Targets
0,Online,1,72500
1,In-Person,1,75000
2,Online,2,70000
3,In-Person,2,70000
4,Online,3,60000
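
For reference, the unpivot and the quarter clean-up can be combined. The sketch below rebuilds the targets table from the values shown above (the name `long_tgt` is illustrative) and passes `regex=False` so that `'Q'` is treated as a literal string.

```python
import pandas as pd

# Targets table as displayed above
tgt = pd.DataFrame({
    'Online or In-Person': ['Online', 'In-Person'],
    'Q1': [72500, 75000],
    'Q2': [70000, 70000],
    'Q3': [60000, 70000],
    'Q4': [60000, 60000],
})

# Wide to long: one row per channel and quarter
long_tgt = tgt.melt(
    id_vars='Online or In-Person',
    var_name='Quarter',
    value_name='Quarterly Targets',
)

# Strip the literal 'Q' and cast to int so the column matches the transaction quarters
long_tgt['Quarter'] = (
    long_tgt['Quarter'].str.replace('Q', '', regex=False).astype(int)
)
print(long_tgt)
```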


### Join datasets

In [117]:
df_dsb = pd.merge(df, df_tgt, on=['Online or In-Person','Quarter'], how='left')

In [118]:
df_dsb.head()

Unnamed: 0,Online or In-Person,Quarter,Value,Quarterly Targets
0,In-Person,1,77576,75000
1,In-Person,2,70634,70000
2,In-Person,3,74189,70000
3,In-Person,4,43223,60000
4,Online,1,74562,72500
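
A left join silently produces `NaN` targets when a key has no match, and duplicates rows when a key appears twice on the right. One way to guard against both, sketched on two rows from the tables above, is to use the `validate` and `indicator` parameters of `pd.merge`. The frames `actuals` and `targets` are illustrative stand-ins for `df` and `df_tgt`.

```python
import pandas as pd

# Two rows from each table shown above
actuals = pd.DataFrame({
    'Online or In-Person': ['In-Person', 'Online'],
    'Quarter': [1, 1],
    'Value': [77576, 74562],
})
targets = pd.DataFrame({
    'Online or In-Person': ['Online', 'In-Person'],
    'Quarter': [1, 1],
    'Quarterly Targets': [72500, 75000],
})

# validate raises MergeError if either side has duplicate keys;
# indicator adds a _merge column recording where each row was found
checked = pd.merge(
    actuals,
    targets,
    on=['Online or In-Person', 'Quarter'],
    how='left',
    validate='one_to_one',
    indicator=True,
)

# Rows without a matching target would show up here
unmatched = checked[checked['_merge'] != 'both']
print(unmatched)
```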


### Calculate variance

In [119]:
df_dsb['Variance'] = df_dsb['Value'] - df_dsb['Quarterly Targets']

In [121]:
df_dsb

Unnamed: 0,Online or In-Person,Quarter,Value,Quarterly Targets,Variance
0,In-Person,1,77576,75000,2576
1,In-Person,2,70634,70000,634
2,In-Person,3,74189,70000,4189
3,In-Person,4,43223,60000,-16777
4,Online,1,74562,72500,2062
5,Online,2,69325,70000,-675
6,Online,3,59072,60000,-928
7,Online,4,61908,60000,1908


### Export

In [123]:
df_dsb.to_csv('2023W03_output.csv', index=False)