# Marketing Campaign Results

## Background

### About the Business

Company A's customers carry substantial debt and, because of unexpected hardship, are no longer able to make the minimum monthly payments on their debts. Customers enroll with Company A and, rather than paying creditors directly, make affordable deposits into a dedicated account with Company A, which uses these funds to negotiate with creditors to settle the customer's outstanding debt. Company A then collects fees from the client for the debt that was settled.

Company A earns fees for each account for which it successfully negotiates a settlement agreement; the number of settlement agreements Company A can negotiate is proportional to the monthly deposited amount.

### Project Details

Company A ran a recent marketing campaign to promote the value proposition of how the debt relief program helps people achieve financial freedom; the cost of this campaign was $5 million. The goal of this analysis is to **show marketing, sales and operations the success of the campaign**. Specifically, the analysis includes: 

- A quantitative assessment of whether the marketing campaign was successful.
- Recommended adjustments to the campaign strategy to improve performance.

### Defining Success
What success looks like....

### Data Overview

There are three datasets provided for the analysis; each is already cleaned and prepared for analysis.  

**client_data.csv: Fictional clients**

| Name | Description |
|---|---|
|client_id|Randomly generated unique surrogate identifier for a client|
|client_geographical_region|Client geographical location in relation to U.S. Census definitions|
|client_residence_status|Client residence status in relation to whether they rent or own|
|client_age|Client age in relation to date of birth|


**deposit_data.csv: Client deposit behavior**

| Name | Description |
|---|---|
|client_id|Randomly generated unique surrogate identifier for a client|
|deposit_type|Delineates whether a client deposit is the scheduled record or actual record|
|deposit_amount|Client deposit amount to the dedicated bank account with Freedom|
|deposit_cadence|Timing and pattern of client deposit activity|
|deposit_date|Deposit date for deposit type|


**calendar_data.csv: Calendar reference table**

| Name | Description |
|---|---|
|gregorian_date|This date aligns with the Gregorian calendar|
|month_name|These are the designated months in the case study|

Notes: 

- Months 1 and 2 are pre-campaign
- Month 3 is the campaign month
- Months 4 and 5 are post-campaign

Assumptions: 

- There is no seasonality in the results
- The campaign spend was distributed evenly across Month 3 (i.e., spend on the first day is the same as spend on the last day)
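
The month notes above can be turned into a lookup so that every deposit can later be labeled by campaign phase. This is a minimal sketch on a toy stand-in for `calendar_data.csv`: the column name `campaign_phase` and the dictionary `PHASE_BY_MONTH` are hypothetical names introduced here, and the example dates are illustrative only.

```python
import pandas as pd

# Phase labels follow the notes above; PHASE_BY_MONTH and the
# `campaign_phase` column are hypothetical names for this sketch.
PHASE_BY_MONTH = {
    'Month 1': 'pre-campaign',
    'Month 2': 'pre-campaign',
    'Month 3': 'campaign',
    'Month 4': 'post-campaign',
    'Month 5': 'post-campaign',
}

# Toy stand-in for calendar_data.csv (dates are illustrative, not real rows)
calendar_example = pd.DataFrame({
    'gregorian_date': ['2019-06-05', '2019-07-03', '2019-08-07',
                       '2019-09-04', '2019-10-02'],
    'month_name': ['Month 1', 'Month 2', 'Month 3', 'Month 4', 'Month 5'],
})

# Map each month to its phase; an unmapped month would show up as NaN
calendar_example['campaign_phase'] = calendar_example['month_name'].map(PHASE_BY_MONTH)
print(calendar_example)
```

Using `map` with an explicit dictionary means a month name that is missing from the lookup surfaces as `NaN` rather than being silently assigned to a phase.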

## Analysis

In [181]:
# import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

### Data Wrangling

For this first section, I'm taking a quick look at the provided data. A few things to note from my review:

- There are more 

In [182]:
# read in data
clients = pd.read_csv('data/client_data.csv')
deposits = pd.read_csv('data/deposit_data.csv')
calendar = pd.read_csv('data/calendar_data.csv')

# return basic details on dataframes
df_group = [clients, deposits, calendar]
for frame in df_group:
    print(frame.head())
    # report the row count of the frame being printed, not always `clients`
    print('Total rows: ', len(frame))
    print('')

         client_id client_geographical_region client_residence_status  client_age
0  538839486596724                  Northeast                    Rent          91
1  321708286091707                       West                     Own          83
2  848531901757235                    Midwest                     Own          84
3  854405182328779                  Northeast                     Own          83
4  769102176031316                       West                     Own          85
Total rows:  46347

         client_id       deposit_type  deposit_amount deposit_cadence deposit_date
0  446495122764671     Actual Deposit           303.0         Monthly   2019-10-23
1  446495122764671     Actual Deposit           303.0         Monthly   2019-09-23
2  446495122764671  Scheduled Deposit           303.0         Monthly   2019-09-23
3  446495122764671  Scheduled Deposit           303.0         Monthly   2019-10-23
4  446495122764671  Scheduled Deposit           303.0         Monthly   2

In [183]:
# verify types of deposits
deposits.deposit_type.unique()

array(['Actual Deposit', 'Scheduled Deposit'], dtype=object)
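
Beyond listing the two deposit types, it can help to see how many rows fall under each one, since scheduled records without a matching actual record are what later indicate a missed deposit. A minimal sketch on toy data (the frame `toy_deposits` is a hypothetical stand-in, not the real file):

```python
import pandas as pd

# Toy stand-in for deposit_data.csv with only the columns needed here
toy_deposits = pd.DataFrame({
    'client_id': [1, 1, 1, 2, 2],
    'deposit_type': ['Scheduled Deposit', 'Actual Deposit', 'Scheduled Deposit',
                     'Scheduled Deposit', 'Actual Deposit'],
})

# Count rows per deposit type
type_counts = toy_deposits['deposit_type'].value_counts()
print(type_counts)
```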

In [185]:
# get dummies for deposit types
df = pd.get_dummies(data=deposits, prefix='', prefix_sep='', columns=['deposit_type'])
print(df.head())
print('Total rows: ',len(df))

         client_id  deposit_amount deposit_cadence deposit_date  Actual Deposit  Scheduled Deposit
0  446495122764671           303.0         Monthly   2019-10-23            True              False
1  446495122764671           303.0         Monthly   2019-09-23            True              False
2  446495122764671           303.0         Monthly   2019-09-23           False               True
3  446495122764671           303.0         Monthly   2019-10-23           False               True
4  446495122764671           303.0         Monthly   2019-06-23           False               True
Total rows:  480394

In [168]:
df = df.drop_duplicates()
print(df.head())
print('Total rows: ',len(df))

         client_id  deposit_amount deposit_cadence deposit_date  Actual Deposit  Scheduled Deposit
0  446495122764671           303.0         Monthly   2019-10-23            True              False
1  446495122764671           303.0         Monthly   2019-09-23            True              False
2  446495122764671           303.0         Monthly   2019-09-23           False               True
3  446495122764671           303.0         Monthly   2019-10-23           False               True
4  446495122764671           303.0         Monthly   2019-06-23           False               True
Total rows:  472019
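
The row counts above imply that `drop_duplicates` removed 480,394 - 472,019 = 8,375 fully identical rows. One way to confirm such a number before dropping is to count the duplicates directly. A minimal sketch on toy data (the frame `toy` is hypothetical):

```python
import pandas as pd

# Toy frame: rows 0 and 1 are identical across every column
toy = pd.DataFrame({
    'client_id': [1, 1, 1, 2],
    'deposit_amount': [303.0, 303.0, 303.0, 120.0],
    'deposit_date': ['2019-09-23', '2019-09-23', '2019-10-23', '2019-09-23'],
})

# duplicated() marks every repeat after the first occurrence
n_duplicates = int(toy.duplicated().sum())
deduped = toy.drop_duplicates()

print('Duplicates found: ', n_duplicates)
print('Rows after dropping: ', len(deduped))
```

The count from `duplicated().sum()` should equal the difference between the row totals before and after `drop_duplicates()`.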

In [163]:
# group scheduled/actual deposits for matching
df = df.groupby(['client_id', 'deposit_amount', 'deposit_cadence', 'deposit_date'], as_index=False).agg({'Actual Deposit': 'max', 'Scheduled Deposit': 'max'})
print(df.head())
print('Total Rows: ',len(df))

      client_id  deposit_amount deposit_cadence deposit_date  Actual Deposit  Scheduled Deposit
0  146046305811           247.0        Biweekly   2019-06-05           False               True
1  146046305811           247.0        Biweekly   2019-06-19            True               True
2  146046305811           247.0        Biweekly   2019-07-03            True               True
3  146046305811           247.0        Biweekly   2019-07-17            True               True
4  146046305811           247.0        Biweekly   2019-07-31            True               True
Total Rows:  256143
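
To illustrate what the grouping step does, here is a minimal sketch on toy data: taking the `max` of the two boolean columns collapses a scheduled record and an actual record that share the same client, amount, cadence, and date into a single row. The `missed` column is a hypothetical addition for illustration and is not part of the analysis above.

```python
import pandas as pd

# Toy rows: 2019-06-05 has only a scheduled record,
# 2019-06-19 has both an actual and a scheduled record
toy = pd.DataFrame({
    'client_id': [1, 1, 1],
    'deposit_amount': [247.0, 247.0, 247.0],
    'deposit_cadence': ['Biweekly'] * 3,
    'deposit_date': ['2019-06-05', '2019-06-19', '2019-06-19'],
    'Actual Deposit': [False, True, False],
    'Scheduled Deposit': [True, False, True],
})

keys = ['client_id', 'deposit_amount', 'deposit_cadence', 'deposit_date']
paired = toy.groupby(keys, as_index=False).agg(
    {'Actual Deposit': 'max', 'Scheduled Deposit': 'max'}
)

# Hypothetical flag: scheduled but no matching actual deposit
paired['missed'] = (
    paired['Scheduled Deposit'].astype(bool) & ~paired['Actual Deposit'].astype(bool)
)
print(paired)
```

Because the match is on exact amount and date, an actual deposit made on a different day or for a different amount than scheduled would not pair with its scheduled record under this approach.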

In [138]:
# merge dataframes
df = pd.merge(df, calendar, left_on='deposit_date', right_on='gregorian_date', how='inner')
df = df.drop('gregorian_date', axis=1)
df = pd.merge(df, clients, on='client_id', how='inner')
print(df.head())
print('Total rows: ',len(df))

      client_id  deposit_amount deposit_cadence deposit_date  Actual Deposit  Scheduled Deposit month_name client_geographical_region client_residence_status  client_age
0  146046305811           247.0        Biweekly   2019-06-05           False               True    Month 1                    Midwest                    Rent          42
1  146046305811           247.0        Biweekly   2019-06-19            True               True    Month 1                    Midwest                    Rent          42
2  146046305811           247.0        Biweekly   2019-07-03            True               True    Month 2                    Midwest                    Rent          42
3  146046305811           247.0        Biweekly   2019-07-17            True               True    Month 2                    Midwest                    Rent          42
4  146046305811           247.0        Biweekly   2019-07-31            True               True    Month 2                    Midwest                 
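
An inner merge silently drops any deposit whose date is missing from the calendar table (or whose client is missing from the client table). One way to check for this is a left merge with `indicator=True`, sketched here on toy data; the frames `toy_deposits` and `toy_calendar` are hypothetical stand-ins.

```python
import pandas as pd

# Toy deposits: the last row has a date that the calendar does not cover
toy_deposits = pd.DataFrame({
    'client_id': [1, 1, 2],
    'deposit_date': ['2019-06-05', '2019-06-19', '2019-01-01'],
})
toy_calendar = pd.DataFrame({
    'gregorian_date': ['2019-06-05', '2019-06-19'],
    'month_name': ['Month 1', 'Month 1'],
})

# validate='many_to_one' raises if a calendar date appears more than once;
# indicator=True adds a `_merge` column showing where each row matched
checked = pd.merge(
    toy_deposits, toy_calendar,
    left_on='deposit_date', right_on='gregorian_date',
    how='left', validate='many_to_one', indicator=True,
)
unmatched = checked[checked['_merge'] == 'left_only']
print('Deposits with no calendar match: ', len(unmatched))
```

If `unmatched` is empty on the real data, the inner merge used above loses no rows at the calendar step.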