### Purpose of this Project

+ Freedom recently ran a marketing campaign promoting its value proposition
+ The total cost of this campaign was $5 million
+ Five months of data are provided, with the campaign occurring in the third month
+ It is now our turn to present to Marketing, Sales & Operations whether this campaign was successful

### This notebook will contain

+ Data exploration and a quantitative assessment of the campaign's performance
  - Outlining which metrics were chosen and why
  - Recommendations for company strategy to improve future campaign performance

In [1]:
# importing libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


# loading the first dataset - client data
client_data = pd.read_csv('data/client_data.csv')

# displaying ten random samples from the dataset
client_data.sample(n = 10, random_state = 42)

Unnamed: 0,client_id,client_geographical_region,client_residence_status,client_age
6065,686872246977897,West,Own,35
28070,470518677565299,South,Rent,56
6936,883739117234218,South,Own,36
23158,822713875250308,South,Own,52
25506,442404955135328,South,Rent,53
28636,631431542327134,West,Own,56
12027,150678144870429,South,Rent,41
10827,521971892376941,South,Own,40
12010,769453548675001,West,Own,41
10936,627255228194699,West,Own,40


In [2]:
# looking at and understanding the data types and columns in the client dataset
client_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46347 entries, 0 to 46346
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   client_id                   46347 non-null  int64 
 1   client_geographical_region  46347 non-null  object
 2   client_residence_status     46347 non-null  object
 3   client_age                  46347 non-null  int64 
dtypes: int64(2), object(2)
memory usage: 1.4+ MB


In [3]:
# loading in the second dataset - deposit data
deposit_data = pd.read_csv('data/deposit_data.csv')

# displaying ten random samples from the dataset
deposit_data.sample(n = 10, random_state = 42)

Unnamed: 0,client_id,deposit_type,deposit_amount,deposit_cadence,deposit_date
229803,903285839400111,Scheduled Deposit,458.0,Monthly,2019-08-16
92181,19473282206850,Scheduled Deposit,334.0,Monthly,2019-10-08
284306,240342552445691,Actual Deposit,500.0,Biweekly,2019-10-14
117592,503016878166122,Scheduled Deposit,30.0,Biweekly,2019-07-03
453913,774929593989542,Scheduled Deposit,476.0,Monthly,2019-09-30
427363,745708866884363,Scheduled Deposit,121.0,Biweekly,2019-09-18
90436,377197113377463,Scheduled Deposit,484.0,Monthly,2019-10-18
391277,87877453117377,Actual Deposit,271.0,Biweekly,2019-09-06
417793,405758738732483,Scheduled Deposit,170.0,Biweekly,2019-09-16
454206,584109001324997,Scheduled Deposit,344.0,Monthly,2019-09-30


In [4]:
# looking at and understanding the data types and columns in the deposit dataset
deposit_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 480394 entries, 0 to 480393
Data columns (total 5 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   client_id        480394 non-null  int64  
 1   deposit_type     480394 non-null  object 
 2   deposit_amount   480394 non-null  float64
 3   deposit_cadence  480394 non-null  object 
 4   deposit_date     480394 non-null  object 
dtypes: float64(1), int64(1), object(3)
memory usage: 18.3+ MB
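
`deposit_date` is loaded as `object`, i.e. plain strings, so date arithmetic and month grouping are not available on it directly. A minimal sketch of converting it to a datetime dtype follows. The three rows are stand-ins copied from the sample above, and the `format` string assumes every date in the full file is ISO `YYYY-MM-DD`, which the samples suggest but which has not been verified here.

```python
import pandas as pd

# stand-in rows copied from the sample above; the real data comes from deposit_data.csv
deposits = pd.DataFrame({
    "client_id": [903285839400111, 19473282206850, 240342552445691],
    "deposit_type": ["Scheduled Deposit", "Scheduled Deposit", "Actual Deposit"],
    "deposit_amount": [458.0, 334.0, 500.0],
    "deposit_date": ["2019-08-16", "2019-10-08", "2019-10-14"],
})

# assumption: all dates follow YYYY-MM-DD; a value that does not match raises an error
deposits["deposit_date"] = pd.to_datetime(deposits["deposit_date"], format="%Y-%m-%d")

print(deposits.dtypes)
print(deposits["deposit_date"].dt.month.tolist())
```

If the column is converted this way, `gregorian_date` in the calendar data would need the same conversion before the two are joined, since the merge keys must share a dtype.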


In [5]:
# loading the third dataset - calendar data
calendar_data = pd.read_csv('data/calendar_data.csv')

# displaying ten random samples from the dataset
calendar_data.sample(n = 10, random_state = 42)

Unnamed: 0,gregorian_date,month_name
84,2019-08-24,Month 3
86,2019-08-26,Month 3
97,2019-09-06,Month 4
115,2019-09-24,Month 4
29,2019-06-30,Month 1
114,2019-09-23,Month 4
78,2019-08-18,Month 3
81,2019-08-21,Month 3
18,2019-06-19,Month 1
15,2019-06-16,Month 1


In [6]:
# looking at and understanding the data types and columns in the calendar dataset
calendar_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 153 entries, 0 to 152
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   gregorian_date  153 non-null    object
 1   month_name      153 non-null    object
dtypes: object(2)
memory usage: 2.5+ KB


In [7]:
# merging the client and deposit data on client_id (inner join by default)
df = client_data.merge(deposit_data, on = 'client_id')
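
Because the merge above is an inner join, any client without deposits (or deposit without a matching client) is dropped silently. The sketch below shows one way to check for that, using `validate` and `indicator` from `DataFrame.merge`. The two small frames and their ids are hypothetical toy data, not values from the real files.

```python
import pandas as pd

# hypothetical toy frames standing in for client_data and deposit_data
clients = pd.DataFrame({
    "client_id": [1, 2, 3],
    "client_geographical_region": ["West", "South", "South"],
})
deposits = pd.DataFrame({
    "client_id": [1, 1, 2, 4],
    "deposit_amount": [458.0, 334.0, 500.0, 30.0],
})

# validate="one_to_many" raises MergeError if client_id repeats on the client side;
# indicator=True adds a _merge column recording which frame each row came from
merged = clients.merge(
    deposits, on="client_id", how="outer", validate="one_to_many", indicator=True
)

# rows that an inner join would have dropped
unmatched = merged[merged["_merge"] != "both"]

print(merged["_merge"].value_counts())
print(unmatched)
```

An empty `unmatched` frame on the real data would mean the inner join loses nothing.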

In [10]:
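# merging the calendar data on the deposit date
# note: this cell was re-run on an already-merged df, which is why the output below
# shows the calendar columns twice with _x/_y suffixes; re-run the client/deposit
# merge above first to get a single gregorian_date and month_name
df = df.merge(calendar_data, left_on = 'deposit_date', right_on = 'gregorian_date', copy = False)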

df.sample(n = 10, random_state = 42)

Unnamed: 0,client_id,client_geographical_region,client_residence_status,client_age,deposit_type,deposit_amount,deposit_cadence,deposit_date,gregorian_date_x,month_name_x,gregorian_date_y,month_name_y
229803,118418867104868,West,Own,56,Scheduled Deposit,240.0,Monthly,2019-09-15,2019-09-15,Month 4,2019-09-15,Month 4
92181,191569209118275,West,Own,40,Actual Deposit,2046.0,Monthly,2019-08-22,2019-08-22,Month 3,2019-08-22,Month 3
284306,45082134288198,West,Rent,46,Actual Deposit,150.0,Biweekly,2019-07-26,2019-07-26,Month 2,2019-07-26,Month 2
117592,180858736543465,West,Own,74,Actual Deposit,598.0,Monthly,2019-07-07,2019-07-07,Month 2,2019-07-07,Month 2
453913,411360469164409,West,Rent,40,Actual Deposit,298.0,Biweekly,2019-10-13,2019-10-13,Month 5,2019-10-13,Month 5
427363,521434988306466,Northeast,Own,78,Scheduled Deposit,304.0,Monthly,2019-10-30,2019-10-30,Month 5,2019-10-30,Month 5
90436,377733278118703,West,Own,43,Actual Deposit,166.0,Biweekly,2019-08-07,2019-08-07,Month 3,2019-08-07,Month 3
391277,330910093231962,Midwest,Rent,60,Scheduled Deposit,274.0,Monthly,2019-06-19,2019-06-19,Month 1,2019-06-19,Month 1
417793,55253177121961,Northeast,Own,63,Actual Deposit,258.0,Monthly,2019-06-20,2019-06-20,Month 1,2019-06-20,Month 1
454206,440652444240597,Northeast,Own,42,Scheduled Deposit,432.0,Monthly,2019-10-13,2019-10-13,Month 5,2019-10-13,Month 5
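
The `_x` and `_y` columns in the output above carry the same calendar information twice. A minimal sketch of collapsing them back to one set of columns follows. The two rows are copied from that output and trimmed to the relevant columns, and `df_dup` and `df_clean` are illustrative names rather than variables from the notebook.

```python
import pandas as pd

# two rows copied from the output above, trimmed to the relevant columns
df_dup = pd.DataFrame({
    "deposit_date": ["2019-09-15", "2019-08-22"],
    "gregorian_date_x": ["2019-09-15", "2019-08-22"],
    "month_name_x": ["Month 4", "Month 3"],
    "gregorian_date_y": ["2019-09-15", "2019-08-22"],
    "month_name_y": ["Month 4", "Month 3"],
})

# confirm the two copies really are identical before dropping one of them
assert (df_dup["gregorian_date_x"] == df_dup["gregorian_date_y"]).all()
assert (df_dup["month_name_x"] == df_dup["month_name_y"]).all()

# drop the _y copies and strip the _x suffix from the remaining columns
df_clean = (
    df_dup
    .drop(columns=["gregorian_date_y", "month_name_y"])
    .rename(columns={"gregorian_date_x": "gregorian_date", "month_name_x": "month_name"})
)

print(df_clean.columns.tolist())
```

Since `gregorian_date` is also identical to `deposit_date` after the join, it could be dropped as well, leaving `month_name` as the only column added by the calendar merge.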
