# Data Challenge
The data for this exercise consists of about 120,000 data points, split 2:1 between training and test files. The data simulate an experiment in which an advertising promotion was tested to see whether it would lead more customers to purchase a specific product priced at $10.
 
Since it costs the company \$0.15 to send out each promotion, it is best to limit the promotion to the customers who are most receptive to it. Each data point includes one column indicating whether or not an individual was sent a promotion for the product, and one column indicating whether or not that individual eventually purchased the product. Each individual also has seven additional features, provided abstractly as V1-V7.
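The cost argument can be made concrete with a break-even calculation. A promotion pays for itself only if it raises a customer's purchase probability by at least cost / revenue = 0.15 / 10 = 0.015. The sketch below uses only the two dollar figures from the text. The helper names `break_even_uplift` and `expected_net_gain` are hypothetical.

```python
# Break-even analysis for a single promotion, using the costs stated in the text.
PROMO_COST = 0.15  # dollars per promotion sent
REVENUE = 10.0     # dollars per purchase


def break_even_uplift(cost: float = PROMO_COST, revenue: float = REVENUE) -> float:
    """Minimum increase in purchase probability a promotion must cause to pay for itself."""
    return cost / revenue


def expected_net_gain(uplift: float, cost: float = PROMO_COST, revenue: float = REVENUE) -> float:
    """Expected incremental profit from sending one promotion to a customer
    whose purchase probability rises by `uplift` when promoted."""
    return revenue * uplift - cost


print(round(break_even_uplift(), 4))        # 0.015
print(round(expected_net_gain(0.03), 4))    # 0.15  (worth sending)
print(round(expected_net_gain(0.01), 4))    # -0.05 (loses money)
```

Customers whose estimated uplift falls below 0.015 cost more to promote than they return in extra purchases.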
## Instructions
A randomized experiment was conducted, and the results are in 'Training.csv':

- Treatment: indicates if the customer was part of treatment or control
- Purchase: indicates if the customer purchased the product
- ID: customer ID
- V1 to V7: features of the customer
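The experiment was randomized, so a common sanity check before comparing outcomes is to confirm that the treatment/control split matches the intended assignment ratio. This is known as a sample ratio mismatch check. The sketch below assumes the intended split was 50/50, which the text does not state. `sample_ratio_z` is a hypothetical helper name.

```python
import math


def sample_ratio_z(n_treat: int, n_ctrl: int, p_treat: float = 0.5):
    """Two-sided z-test that the share of treated customers equals p_treat.

    Uses the normal approximation to the binomial. p_treat=0.5 is an
    ASSUMPTION: the intended assignment ratio is not stated in the text.
    Returns (z, p_value).
    """
    n = n_treat + n_ctrl
    expected = n * p_treat
    sd = math.sqrt(n * p_treat * (1 - p_treat))
    z = (n_treat - expected) / sd
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only (not taken from the data)
z, p = sample_ratio_z(n_treat=5_100, n_ctrl=4_900)
print(f"z = {z:.2f}, p = {p:.3f}")
```

In practice, pass the treatment and control row counts from the training file. A very small p-value would suggest that assignment did not behave as intended.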

## In addition

- Cost of sending a promotion: \$0.15
- Revenue from purchase of product: \$10 (there is only one product)

## Questions
1. Analyze the results of the experiment and identify the effect of the Treatment on product purchase and Net Incremental Revenue.
2. Build a model that selects the best customers to target in order to maximize the Incremental Response Rate (IRR) and Net Incremental Revenue (NIR).
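Question 1 can be approached with a standard two-proportion z-test, which checks whether the promotion changed the purchase rate. The difference in rates it returns is the same quantity as the IRR defined under Metrics. This is a sketch, not the prescribed analysis. The one-sided alternative (the promotion *increases* purchases) and the helper name `two_proportion_ztest` are assumptions.

```python
import math


def two_proportion_ztest(purch_treat, cust_treat, purch_ctrl, cust_ctrl):
    """One-sided pooled z-test of H1: treated purchase rate > control purchase rate.

    Returns (rate difference, z statistic, p-value).
    """
    p_treat = purch_treat / cust_treat
    p_ctrl = purch_ctrl / cust_ctrl
    # Pooled purchase rate under H0 (the promotion has no effect)
    p_pool = (purch_treat + purch_ctrl) / (cust_treat + cust_ctrl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / cust_treat + 1 / cust_ctrl))
    z = (p_treat - p_ctrl) / se
    # Upper-tail p-value: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_treat - p_ctrl, z, p_value


# Illustrative counts only (not taken from the data)
diff, z, p = two_proportion_ztest(100, 1_000, 50, 1_000)
print(f"rate difference = {diff:.3f}, z = {z:.2f}, p = {p:.2e}")
```

A significant rate difference answers only half of question 1. The promotion's effect on revenue still has to be checked with NIR, because a real uplift can still be too small to cover the \$0.15 cost.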


## Metrics
IRR measures how much the promotion increased the purchase rate, compared with not receiving the promotion. Mathematically, it is the ratio of the number of purchasers in the promotion group (treatment) to the total number of customers in the promotion group, minus the ratio of the number of purchasers in the non-promotional group (control) to the total number of customers in the non-promotional group.

$$IRR = \frac{purch_{treat}}{cust_{treat}} - \frac{purch_{ctrl}}{cust_{ctrl}}$$
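The formula translates directly into code. The counts in the example are illustrative, not from the data, and `incremental_response_rate` is a hypothetical name. IRR compares rates rather than counts, so groups of different sizes can be compared directly.

```python
def incremental_response_rate(purch_treat, cust_treat, purch_ctrl, cust_ctrl):
    """IRR: purchase rate in the promotion group minus purchase rate in the control group."""
    return purch_treat / cust_treat - purch_ctrl / cust_ctrl


# Illustrative counts (not from the data): 20 of 1,000 treated customers bought,
# versus 10 of 1,000 control customers
print(round(incremental_response_rate(20, 1_000, 10, 1_000), 4))  # 0.01
```

An IRR of 0.01 means the promotion raised the purchase rate by one percentage point.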

NIR depicts how much is made (or lost) by sending out the promotion. Mathematically, this is 10 times the total number of purchasers that received the promotion minus 0.15 times the number of promotions sent out, minus 10 times the number of purchasers who were not given the promotion.

$$NIR = (10 \times purch_{treat} - 0.15\times cust_{treat})- 10 \times purch_{ctrl}$$
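A direct translation of the NIR formula follows. The revenue and cost defaults come from the text; the function name and the counts are illustrative.

```python
def net_incremental_revenue(purch_treat, cust_treat, purch_ctrl, revenue=10.0, cost=0.15):
    """NIR = (revenue * purch_treat - cost * cust_treat) - revenue * purch_ctrl"""
    return (revenue * purch_treat - cost * cust_treat) - revenue * purch_ctrl


# Illustrative counts (not from the data): 1,000 promotions sent,
# 20 treated purchasers, 10 control purchasers
print(round(net_incremental_revenue(20, 1_000, 10), 2))  # -50.0
```

Here the promotion produces 10 extra purchases (\$100) but costs \$150 to send, so NIR is negative even though purchases went up. The formula subtracts raw purchase counts rather than rates. It is therefore a like-for-like comparison only when the treatment and control groups are about the same size.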

------

## How To Test Your Strategy?

When you feel like you have an optimization strategy, complete the `promotion_strategy` function to pass to the `test_results` function.

- A `test_results` function is provided to determine the performance of your model. The function requires that a promotion strategy is provided (see `promotion_strategy` function). 
- The `promotion_strategy` indicates whether or not an individual should receive a promotion following a trained classification model.
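The text leaves the choice of model open. One possible approach is a two-model ("T-learner") uplift classifier, offered here as an assumption rather than the intended solution. It fits one purchase model on promoted customers and one on control customers, then estimates each customer's uplift as the difference between the two predicted purchase probabilities. The promotion is sent when that uplift exceeds the break-even rate 0.15 / 10 = 0.015. The class name `TwoModelUplift`, its methods, and the use of scikit-learn's `LogisticRegression` are all hypothetical choices. The only requirement taken from the text is that `predict` returns 0/1 labels, which `promotion_strategy` maps to 'No'/'Yes'.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


class TwoModelUplift:
    """Hypothetical uplift classifier (T-learner).

    Fits one purchase model on treated customers and one on control customers.
    predict() returns 1 (send) when the estimated uplift in purchase
    probability exceeds `threshold`, else 0.
    """

    def __init__(self, threshold=0.015, make_model=lambda: LogisticRegression(max_iter=1000)):
        self.threshold = threshold    # default: break-even uplift 0.15 / 10
        self.make_model = make_model  # factory for a fresh binary classifier

    def fit(self, X, treated, purchased):
        X = np.asarray(X, dtype=float)
        treated = np.asarray(treated, dtype=bool)
        purchased = np.asarray(purchased)
        # Separate purchase models for the promotion and control groups
        self.model_treat_ = self.make_model().fit(X[treated], purchased[treated])
        self.model_ctrl_ = self.make_model().fit(X[~treated], purchased[~treated])
        return self

    def uplift(self, X):
        X = np.asarray(X, dtype=float)
        p_treat = self.model_treat_.predict_proba(X)[:, 1]
        p_ctrl = self.model_ctrl_.predict_proba(X)[:, 1]
        return p_treat - p_ctrl

    def predict(self, X):
        # 0/1 labels, matching what promotion_strategy expects from model.predict
        return (self.uplift(X) > self.threshold).astype(int)
```

A possible call assumes training and test DataFrames named `train` and `test` (hypothetical names) with the 'Promotion' and 'purchase' columns used by `get_eval_metrics`. First fit the model with `model = TwoModelUplift().fit(train[features], train['Promotion'] == 'Yes', train['purchase'])`, then evaluate it with `test_results(promotion_strategy, test, features, model)`. The break-even threshold is only a starting point. Tuning it on held-out data may trade IRR against NIR differently.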

-----

## Good luck!

```python
import numpy as np


def test_results(promotion_strategy, df, features, model=None):
    """
    Test results of the promotion strategy on the test set

    Arguments:
        promotion_strategy: function that returns 'Yes'/'No' values
            for whether or not to send the promotion to the customer
        df: pandas dataframe with the test data, including the
            'Promotion' and 'purchase' columns
        features: list of feature column names passed to the strategy
        model: trained ML model used in the promotion strategy
    Returns:
        irr, nir: float
    """
    promos = np.asarray(promotion_strategy(df[features], model))

    # Score only the customers the strategy chose to target
    score_df = df.loc[promos == 'Yes']
    irr, nir = get_eval_metrics(score_df)
    return irr, nir


def get_eval_metrics(data, print_results=True):
    """
    Given a dataframe, compute the IRR and the NIR

    Arguments:
        data: pandas dataframe with 'Promotion' ('Yes'/'No') and
            'purchase' (0/1) columns
        print_results: bool, whether to print the metrics

    Returns:
        irr, nir: float
    """
    treat = data['Promotion'] == 'Yes'
    ctrl = data['Promotion'] == 'No'

    n_treat       = treat.sum()
    n_control     = ctrl.sum()
    n_treat_purch = data.loc[treat, 'purchase'].sum()
    n_ctrl_purch  = data.loc[ctrl, 'purchase'].sum()

    irr = n_treat_purch / n_treat - n_ctrl_purch / n_control
    nir = (10 * n_treat_purch - 0.15 * n_treat) - 10 * n_ctrl_purch

    if print_results:
        print('The score to beat is: $436 for the NIR and 0.0190 for the IRR.\nThis is your result:\n')
        print(f'IRR: {irr:.4f}')
        print(f'NIR: {nir:.2f}')
        # Group sizes and purchase counts behind the metrics
        print(f'Treated: {n_treat}, control: {n_control}, '
              f'treated purchasers: {n_treat_purch}, control purchasers: {n_ctrl_purch}')
    return irr, nir


def promotion_strategy(df, model):
    """
    Indicates whether or not an individual should receive a promotion
    following a trained classification model

    Arguments:
        df: pandas dataframe (test data features)
        model: trained classifier whose predict() returns 0/1 labels
    Returns:
        promotion: array with 'Yes' and 'No' values for whether
            or not to send the promotion to the customer
    """
    # Use the model to decide whether or not the promotion should be sent
    y_hat = model.predict(df)

    # Relabel predictions [0, 1] to ['No', 'Yes']
    y_hat = np.where(y_hat == 1, 'Yes', 'No')

    # Print the name of the model used for predicting
    print(model.__class__.__name__)

    return y_hat
```