![Python_logo](https://www.python.org/static/community_logos/python-logo-master-v3-TM.png)


 # **Cortex Game: Round2--Conditional Amount**

Please note that you need to run this notebook 'Round2--Conditional Amount' first, before running the notebook 'Round2--Probability of Giving'.   

> Before playing the game, you need to connect to SASPy first.
>
>> If it is your first time, please follow the 4 steps mentioned below!

***
## **Connect to SASPy**

**0- Connect to your Google Drive folder**

In [None]:
# Change the following path to point to your own Drive folder
my_folder = "/content/drive/MyDrive/Carrera IFI/7o Semestre/Reto3"

import os
from google.colab import drive
drive.mount('/content/drive')

# Work from the Drive folder so that the files created below are saved there
os.chdir(my_folder)
!pwd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Carrera IFI/7o Semestre/Reto3


**1- Make sure that your Python version is 3.3 or higher and that your Java version is 1.8.0_162 or higher**

In [None]:
!echo "Python is at" $(which python)
!python --version

Python is at /usr/local/bin/python
Python 3.8.15


In [None]:
!echo "Java is at" $(which java)
!/usr/bin/java -version

Java is at /usr/bin/java
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu218.04, mixed mode, sharing)


**2- Install SASPy**

In [None]:
!pip install saspy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


**3- Create the configuration file "sascfg_personal.py"**

Check that your Home Region is correct; you can look it up at [ODA-SAS](https://welcome.oda.sas.com/home). Keep only the `iomhost` line that matches your region uncommented.

In [None]:
%%writefile sascfg_personal.py
SAS_config_names=['oda']
oda = {'java' : '/usr/bin/java',
#US Home Region 1
#'iomhost' : ['odaws01-usw2.oda.sas.com','odaws02-usw2.oda.sas.com','odaws03-usw2.oda.sas.com','odaws04-usw2.oda.sas.com'],
#US Home Region 2
'iomhost' : ['odaws01-usw2-2.oda.sas.com','odaws02-usw2-2.oda.sas.com'],
#European Home Region 1
#'iomhost' : ['odaws01-euw1.oda.sas.com','odaws02-euw1.oda.sas.com'],
#Asia Pacific Home Region 1
#'iomhost' : ['odaws01-apse1.oda.sas.com','odaws02-apse1.oda.sas.com'],
#Asia Pacific Home Region 2
#'iomhost' : ['odaws01-apse1-2.oda.sas.com','odaws02-apse1-2.oda.sas.com'],
'iomport' : 8591,
'authkey' : 'oda',
'encoding' : 'utf-8'
}

Overwriting sascfg_personal.py


**4- Create your .authinfo**

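If there is no .authinfo file yet, you can create one by uncommenting the two lines below and replacing USR and PSW with your own user name and password.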

In [None]:
#%%writefile .authinfo
#oda user USR password PSW

Copy this file to your home directory

In [None]:
!cp .authinfo ~/.authinfo

**5- Establish the connection (repeat this step each time you use SASPy)**

In [None]:
import saspy
sas_session = saspy.SASsession(cfgfile=os.path.join(
    my_folder,"sascfg_personal.py"))
sas_session

Using SAS Config named: oda
SAS Connection established. Subprocess id is 1111



Access Method         = IOM
SAS Config name       = oda
SAS Config file       = /content/drive/MyDrive/Carrera IFI/7o Semestre/Reto3/sascfg_personal.py
WORK Path             = /saswork/SAS_work5DA900015B72_odaws02-usw2-2.oda.sas.com/SAS_work62E700015B72_odaws02-usw2-2.oda.sas.com/
SAS Version           = 9.04.01M6P11072018
SASPy Version         = 4.4.1
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = utf-8
Python Encoding value = utf-8
SAS process Pid value = 88946


***
## Connect to Cortex Data Sets

Load Cortex datasets from SAS Studio

In [None]:
ps = sas_session.submit("""
    libname cortex '~/my_shared_file_links/u39842936/Cortex Data Sets';
    """)
print(ps["LOG"])


5                                                          The SAS System                      Friday, December  2, 2022 09:31:00 AM

24         ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;
24       ! ods graphics on / outputfmt=png;
25         
26         
27             libname cortex '~/my_shared_file_links/u39842936/Cortex Data Sets';
28         
29         
30         
31         ods html5 (id=saspy_internal) close;ods listing;
32         

6                                                          The SAS System                      Friday, December  2, 2022 09:31:00 AM

33         


For a local Jupyter installation, use the following instead:

In [None]:
#%%SAS sas_session
#libname cortex '~/my_shared_file_links/u39842936/Cortex Data Sets';

### Transform cloud SAS dataset to Python dataframe (pandas)

> For reference: 

> 1- [Pandas library](https://pandas.pydata.org/docs/user_guide/index.html)

> 2- [sklearn.model_selection for data partition](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

In [None]:
import pandas as pd

# Transform the cloud SAS datasets into pandas DataFrames (this might take some time).

data1 = sas_session.sasdata2dataframe(
table='hist',
libref='cortex'
)

data2 = sas_session.sasdata2dataframe(
table='target_rd2',
libref='cortex'
)

## Merge the Data

In [None]:
#Step1 Merge the Data
data_merge = pd.merge(data1, data2, on=["ID"],how="right")
data_merge = data_merge.loc[(data_merge['GaveThisYear'] ==1)]
data_merge.sample(2)

Unnamed: 0,ID,LastName,FirstName,Woman,Age,Salary,Education,City,SeniorList,NbActivities,...,Frequency,Seniority,TotalGift,MinGift,MaxGift,GaveLastYear,AmtLastYear,Contact,GaveThisYear,AmtThisYear
294350,2294351.0,DAVIE,RITA,1.0,55.0,44600.0,High School,Downtown,8.0,3.0,...,2.0,4.0,30.0,10.0,20.0,1.0,75.0,0.0,1.0,100.0
54602,2054603.0,GREINER,PAMELA,1.0,23.0,48700.0,High School,City,1.0,0.0,...,1.0,1.0,30.0,30.0,30.0,0.0,0.0,0.0,1.0,40.0


In [None]:
# data_merge is a filtered slice, so assign the renamed result instead of using inplace=True;
# this avoids pandas' SettingWithCopyWarning.
data_merge = data_merge.rename(columns={"City": "Location"})


In [None]:
def dummies(df, column):
  df = pd.concat([df, pd.get_dummies(df[column])], axis = 1)
  return df.drop(column, axis=1)

data_mergeX = dummies(data_merge, "Education")
data_mergeX = dummies(data_mergeX, "Location")

In [None]:
data_merge.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 149457 entries, 3 to 999998
Data columns (total 22 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   ID            149457 non-null  float64
 1   LastName      149454 non-null  object 
 2   FirstName     149457 non-null  object 
 3   Woman         149457 non-null  float64
 4   Age           149457 non-null  float64
 5   Salary        149457 non-null  float64
 6   Education     149457 non-null  object 
 7   Location      149457 non-null  object 
 8   SeniorList    149457 non-null  float64
 9   NbActivities  149457 non-null  float64
 10  Referrals     149457 non-null  float64
 11  Recency       68433 non-null   float64
 12  Frequency     68433 non-null   float64
 13  Seniority     68433 non-null   float64
 14  TotalGift     68433 non-null   float64
 15  MinGift       68433 non-null   float64
 16  MaxGift       68433 non-null   float64
 17  GaveLastYear  149457 non-null  float64
 18  AmtL

## Treating Missing Values

>Please be aware that deleting all missing values can induce a selection bias. 
Some missing values are very informative. For example, when MinGift is missing, it means that the donor never gave in the 10 years leading up to (but excluding) last year. Instead of deleting these rows, replacing the missing value with 0 is more appropriate!

> A good understanding of the business case and the data can help you come up with more appropriate strategies to deal with missing values.
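
As a minimal sketch of this idea, the toy example below (the three rows are made up for illustration) keeps the information carried by a missing value in an indicator column before replacing the missing value itself with 0.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): NaN in the history columns means "never gave before".
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "MinGift": [10.0, np.nan, 25.0],
    "Frequency": [2.0, np.nan, 1.0],
})

# Record the information carried by the missing value before overwriting it.
toy["HistoricDonor"] = toy["Frequency"].notna().astype(int)

# A member with no donation history gave 0 times, with a minimum gift of 0.
toy[["MinGift", "Frequency"]] = toy[["MinGift", "Frequency"]].fillna(0)

print(toy)
```

No rows are dropped, so members without a donation history stay in the sample.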

In [None]:
data_merge['Salary'].describe()

count    149457.000000
mean      63898.012137
std       60194.539355
min           0.000000
25%       20300.000000
50%       43000.000000
75%       87000.000000
max      250000.000000
Name: Salary, dtype: float64

In [None]:
import numpy as np
data_mergeX["SalaryLog10"] = np.log10(data_merge['Salary']+1)


In [None]:
data_mergeX['SalaryLog10'].describe()

count    149457.000000
mean          4.575241
std           0.529042
min           0.000000
25%           4.307517
50%           4.633479
75%           4.939524
max           5.397942
Name: SalaryLog10, dtype: float64
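
The `+1` inside the logarithm matters because some salaries are 0 and `log10(0)` is not finite. The sketch below, on made-up salary values, shows the transform and how it can be undone.

```python
import numpy as np

# Made-up salaries, including 0 and the maximum seen in the describe() output above.
salary = np.array([0.0, 9.0, 99.0, 250000.0])

# Adding 1 keeps a salary of 0 finite: log10(0 + 1) = 0.
salary_log10 = np.log10(salary + 1)

# The transform can be inverted with 10**x - 1.
salary_back = 10 ** salary_log10 - 1

print(salary_log10)
print(salary_back)
```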

In [None]:
# In this case, we are replacing missing MinGift values with 0.
# You can apply whatever strategy you think is reasonable to the other variables.

data_mergeX[['MinGift']] = data_merge[['MinGift']].fillna(value=0)

# Flag members with a donation history before filling in the missing values.
data_mergeX[['HistoricDonor']] = data_merge[['Frequency']].notna().astype(int)
data_mergeX[['Frequency']] = data_merge[['Frequency']].fillna(value=0)
# Fill Recency from the Recency column (not Frequency); 20 is the placeholder used for members with no past donation.
data_mergeX[['Recency']] = data_merge[['Recency']].fillna(value=20)

data_mergeX[['TotalGift']] = data_merge[['TotalGift']].fillna(value=0)


data_mergeX.sample(3)

Unnamed: 0,ID,LastName,FirstName,Woman,Age,Salary,SeniorList,NbActivities,Referrals,Recency,...,AmtThisYear,Elementary,High School,University / College,City,Downtown,Rural,Suburban,SalaryLog10,HistoricDonor
601003,2601004.0,MILLER,RAYMOND,0.0,33.0,229300.0,5.0,2.0,0.0,1.0,...,20.0,0,0,1,0,1,0,0,5.360406,1
22573,2022574.0,MAGDALENO,JOHN,0.0,19.0,69100.0,4.0,0.0,0.0,20.0,...,80.0,0,0,1,0,1,0,0,4.839484,0
480304,2480305.0,HILDMAN,MYRA,1.0,25.0,193400.0,7.0,1.0,1.0,20.0,...,25.0,0,0,1,0,0,1,0,5.286459,0


## Data Partition

In [None]:
# The code below is an illustration on how to sample data on train and validation samples.
# You could use another library or a built-in function to perform sampling.

from sklearn.model_selection import train_test_split
train, validation = train_test_split(data_mergeX, test_size=0.5,random_state=5678) # you can change the percentage
train.sample(5)

Unnamed: 0,ID,LastName,FirstName,Woman,Age,Salary,SeniorList,NbActivities,Referrals,Recency,...,AmtThisYear,Elementary,High School,University / College,City,Downtown,Rural,Suburban,SalaryLog10,HistoricDonor
184243,2184244.0,MARTIN,KELLY,1.0,23.0,2700.0,8.0,3.0,0.0,20.0,...,15.0,0,1,0,0,0,0,1,3.431525,0
305259,2305260.0,PROPST,CATHERINE,1.0,58.0,37300.0,0.0,0.0,0.0,20.0,...,250.0,0,1,0,1,0,0,0,4.57172,0
646901,2646902.0,LLOYD,STEVE,0.0,72.0,217500.0,2.0,0.0,0.0,1.0,...,10.0,0,0,1,0,0,1,0,5.337461,1
36761,2036762.0,DEMPSEY,WILLIAM,0.0,56.0,3800.0,1.0,0.0,0.0,20.0,...,25.0,0,0,1,1,0,0,0,3.579898,0
807171,2807172.0,KING,KATHY,1.0,32.0,145600.0,6.0,5.0,1.0,3.0,...,150.0,0,0,1,1,0,0,0,5.163164,1
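
As one possible alternative to `train_test_split`, the split can also be done with pandas alone. This is only a sketch on a made-up ten-row frame; `train_toy` and `valid_toy` are illustrative names, not variables used elsewhere in this notebook.

```python
import pandas as pd

# Made-up data standing in for data_mergeX.
toy = pd.DataFrame({"ID": range(10), "AmtThisYear": range(0, 100, 10)})

# Draw half of the rows at random for training ...
train_toy = toy.sample(frac=0.5, random_state=5678)
# ... and keep the remaining rows for validation.
valid_toy = toy.drop(train_toy.index)

print(len(train_toy), len(valid_toy))
```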


In [None]:
# v6

#X_train = train[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'Elementary', 'High School',	'University / College', 'HistoricDonor']] 
#Y_train = train['AmtThisYear']
#X_valid = validation[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'Elementary', 'High School',	'University / College', 'HistoricDonor']] 
#Y_valid = validation['AmtThisYear']

In [None]:
# v7

X_train = train[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban']] 
Y_train = train['AmtThisYear']
X_valid = validation[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban']] 
Y_valid = validation['AmtThisYear']

In [None]:
# v8

#X_train = train[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban', 'Elementary', 'High School',	'University / College']] 
#Y_train = train['AmtThisYear']
#X_valid = validation[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban', 'Elementary', 'High School',	'University / College']] 
#Y_valid = validation['AmtThisYear']

In [None]:
# v9

#X_train = train[['Age', 'Salary', 'Recency', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban']] 
#Y_train = train['AmtThisYear']
#X_valid = validation[['Age', 'Salary', 'Recency', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban']] 
#Y_valid = validation['AmtThisYear']


## Prebuilt Models

***
### Linear Regression Model


> The [scikit-learn library](https://scikit-learn.org/stable/index.html) offers more advanced models.

In [None]:
from sklearn import linear_model

#comment: it's numpy array
#X_train = train[['Age', 'Salary','Contact','MinGift', 'GaveLastYear','AmtLastYear','Woman', 'NbActivities' ]] 
#Y_train = train['AmtThisYear']
#X_valid = validation[['Age', 'Salary','Contact','MinGift', 'GaveLastYear','AmtLastYear','Woman', 'NbActivities']] 
#Y_valid = validation['AmtThisYear']

regr = linear_model.LinearRegression()

regr.fit(X_train,Y_train)

regr_predict=regr.predict(X_valid)

print(regr_predict)

[ 51.58735105  16.44312608 122.3109886  ...  28.89700093  70.48925099
  60.48070206]


In [None]:
#you can change the criteria

import numpy as np
from sklearn import metrics
#MAE
print(metrics.mean_absolute_error(Y_valid,regr_predict))
#MSE
print(metrics.mean_squared_error(Y_valid,regr_predict))
#RMSE
print(np.sqrt(metrics.mean_squared_error(Y_valid,regr_predict)))

63.963528511663135
54354.21647448695
233.1399075115347
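
To make the three criteria concrete, the sketch below computes them by hand with NumPy on four made-up values instead of calling `sklearn.metrics`.

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0, 40.0])  # made-up actual amounts
y_pred = np.array([12.0, 18.0, 33.0, 36.0])  # made-up predictions

errors = y_true - y_pred        # [-2, 2, -3, 4]
mae = np.mean(np.abs(errors))   # (2 + 2 + 3 + 4) / 4
mse = np.mean(errors ** 2)      # (4 + 4 + 9 + 16) / 4
rmse = np.sqrt(mse)

print(mae, mse, rmse)
```

RMSE is never smaller than MAE, and because errors are squared before averaging, a few very large errors widen the gap between the two. That may be part of why the RMSE reported above is so much larger than the MAE.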


### Regression Tree Model (Py)

In [None]:
from sklearn.tree import DecisionTreeRegressor

DT_model = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_train,Y_train)
DT_predict = DT_model.predict(X_valid)  # Predictions on the validation data
print(DT_predict)

[ 51.71708524  27.27249619 139.59760274 ...  51.66566604  46.50993087
  67.92655699]


In [None]:
#you can change the criteria
#MAE
print(metrics.mean_absolute_error(Y_valid,DT_predict))
#MSE
print(metrics.mean_squared_error(Y_valid,DT_predict))
#RMSE
print(np.sqrt(metrics.mean_squared_error(Y_valid,DT_predict)))

63.280853108485026
54160.71724638968
232.7245523067768


### XGBoost Model

In [None]:
from xgboost import XGBRegressor

# Hyperparameters are passed as numbers rather than strings.
xgb_model = XGBRegressor(
    objective='reg:squarederror', min_child_weight=105, n_estimators=125,
    gamma=2.155050244116633, learning_rate=0.04, max_delta_step=150, max_depth=3,
).fit(X_train, Y_train)

XGB_predict = xgb_model.predict(X_valid)
# Validation MSE: about 53820


In [None]:
#you can change the criteria
#MAE
print(metrics.mean_absolute_error(Y_valid,XGB_predict))
#MSE
print(metrics.mean_squared_error(Y_valid,XGB_predict))
#RMSE
print(np.sqrt(metrics.mean_squared_error(Y_valid,XGB_predict)))

62.86649777038073
53820.56243096468
231.99259132775055


### **Other models may also be helpful for this game**

Reference: https://scikit-learn.org/stable/supervised_learning.html
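
As a hedged illustration of trying another model, the sketch below fits a `RandomForestRegressor` on synthetic data. The data, the hyperparameter values and the `rf_*` names are assumptions made for this example only; in the game you would fit on `X_train`, `Y_train` and evaluate on `X_valid`, `Y_valid` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends mostly on the first feature.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = 50 + 10 * X_toy[:, 0] + rng.normal(scale=1.0, size=200)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.5, random_state=5678)

# Illustrative settings, not tuned values.
rf_model = RandomForestRegressor(n_estimators=50, max_depth=5, random_state=0)
rf_model.fit(X_tr, y_tr)

rf_predict = rf_model.predict(X_va)
rf_mse = mean_squared_error(y_va, rf_predict)
print(rf_mse)
```

The same MAE, MSE and RMSE checks used for the other models apply unchanged, which keeps the comparison fair.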


### Tuning

In [None]:
!pip install bayesian-optimization

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bayesian-optimization
  Downloading bayesian_optimization-1.4.1-py3-none-any.whl (18 kB)
Collecting colorama
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: colorama, bayesian-optimization
Successfully installed bayesian-optimization-1.4.1 colorama-0.4.6


In [None]:
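def hyperparam_loss(param_x, param_y):
    """Template for an objective function to be tuned."""
    # 1. Define machine learning model using param_x, param_y as hyperparameters
    # 2. Train the model
    # 3. Calculate loss on cross-validation set
    # 4. Return the negative loss, because BayesianOptimization maximizes its target
    raise NotImplementedError("Fill in steps 1-4 for your own model")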

In [None]:
import numpy as np
from bayes_opt import BayesianOptimization
from sklearn.model_selection import cross_val_score

In [None]:
pbounds = {
    'learning_rate': (0.01, 0.2),
    'max_depth': (3,5),
    'max_delta_step': (0, 500),  
    'gamma': (0, 5)}

def xgboost_hyper_param(learning_rate,
                        max_depth,
                        max_delta_step,
                        gamma):

    max_depth = int(max_depth)

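    # Pass every tuned parameter to the model, including max_delta_step.
    clf = XGBRegressor(
        objective='reg:squarederror',
        max_depth=max_depth,
        learning_rate=learning_rate,
        max_delta_step=max_delta_step,
        gamma=gamma)
    # The score is the negative MSE, so maximizing it minimizes the MSE.
    return np.mean(cross_val_score(clf, X_train, Y_train, cv=3, scoring='neg_mean_squared_error'))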

optimizer = BayesianOptimization(
    f=xgboost_hyper_param,
    pbounds=pbounds,
    random_state=1,
)

In [None]:
optimizer.maximize(
    init_points=10,
    n_iter=10,
)

|   iter    |  target   |   gamma   | learni... | max_de... | max_depth |
-------------------------------------------------------------------------
| 21        | -4.956e+0 | 4.944     | 0.1522    | 140.2     | 4.579     |
| 22        | -4.837e+0 | 0.5161    | 0.0951    | 454.3     | 3.587     |
| 23        | -4.827e+0 | 1.439     | 0.03471   | 9.683     | 4.358     |
| 24        | -4.824e+0 | 1.058     | 0.06045   | 245.8     | 3.107     |
| 25        | -4.829e+0 | 2.871     | 0.03788   | 294.7     | 4.4       |
| 26        | -4.833e+0 | 0.5117    | 0.08867   | 347.2     | 3.828     |
| 27        | -4.92e+04 | 0.2498    | 0.1118    | 331.9     | 4.03      |

In [None]:
print(optimizer.max["params"])



{'gamma': 2.155050244116633, 'learning_rate': 0.03566342641904599, 'max_delta_step': 90.92416094047529, 'max_depth': 3.062140798125786}
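
The optimizer searches a continuous space, so integer hyperparameters such as `max_depth` come back as floats. The sketch below shows one way to turn the printed result into model arguments; `model_kwargs` is an illustrative name, and the truncation mirrors the `int(max_depth)` used in the objective function.

```python
# Best parameters as printed by optimizer.max["params"] above.
best_params = {
    "gamma": 2.155050244116633,
    "learning_rate": 0.03566342641904599,
    "max_delta_step": 90.92416094047529,
    "max_depth": 3.062140798125786,
}

# Copy the dictionary and truncate max_depth the same way the objective does.
model_kwargs = dict(best_params)
model_kwargs["max_depth"] = int(model_kwargs["max_depth"])

print(model_kwargs)
```

These keyword arguments could then be unpacked into the model, for example `XGBRegressor(objective='reg:squarederror', **model_kwargs)`.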


## Scoring New Data

### Prepare data for scoring

In [None]:
data3 = sas_session.sasdata2dataframe(
table='score',
libref='cortex'
)
data4 = sas_session.sasdata2dataframe(
table='score_rd2_contact',
libref='cortex'
)
data5 = sas_session.sasdata2dataframe(
table='score_rd2_nocontact',
libref='cortex'
)

 ### Score new data based on your champion model
 
 Pick your champion model from previous steps and use it to predict next year donations. 
 
 In this case, the XGBoost model had the lowest validation MSE (about 53,821, compared with 54,161 for the regression tree and 54,354 for the linear regression), so it is used as the champion model below.
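
Picking the champion can also be done programmatically. The sketch below uses the validation MSE values printed in the model sections above; the dictionary and its labels are illustrative.

```python
# Validation MSE values as printed in the model sections above.
validation_mse = {
    "Linear Regression": 54354.21647448695,
    "Regression Tree": 54160.71724638968,
    "XGBoost": 53820.56243096468,
}

# The champion is the model with the lowest validation MSE.
champion = min(validation_mse, key=validation_mse.get)
print(champion, validation_mse[champion])
```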

### Predict 'amount given' for members who were contacted

In [None]:
scoring_data_contact = pd.merge(data3, data4, on=["ID"],how="right")

# Apply the same missing-value strategy to the score dataset as was used for training:
# replace missing MinGift, Frequency and TotalGift values with 0 and flag historic donors.
scoring_data_contact.rename(columns={"City": "Location"}, inplace = True)

scoring_data_contact[['MinGift']] = scoring_data_contact[['MinGift']].fillna(value=0) 
scoring_data_contact[['HistoricDonor']] = scoring_data_contact[['Frequency']].notna().astype(int)
scoring_data_contact[['Frequency']] = scoring_data_contact[['Frequency']].fillna(value=0)  
scoring_data_contact[['TotalGift']] = scoring_data_contact[['TotalGift']].fillna(value=0) 

scoring_data_contact = dummies(scoring_data_contact, "Education")
scoring_data_contact = dummies(scoring_data_contact, "Location")

X = scoring_data_contact[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban']] 

regr_predict_contact=xgb_model.predict(X)

scoring_data_contact['Prediction'] = regr_predict_contact

scoring_data_contact= scoring_data_contact[['ID','Prediction']]
scoring_data_contact = scoring_data_contact.rename({'Prediction': 'AmtContact'}, axis=1) 
scoring_data_contact.head()

Unnamed: 0,ID,AmtContact
0,2000001.0,56.780594
1,2000002.0,50.64743
2,2000003.0,82.759285
3,2000004.0,33.338512
4,2000005.0,102.746193
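
A model expects the scoring data to have the same feature columns, in the same order, as the training data. If a category never appears in a scoring sample, `get_dummies` will not create its column. The sketch below shows one hedged way to guard against that with `reindex`; the shortened `feature_cols` list and the two toy rows are assumptions made for illustration.

```python
import pandas as pd

# Shortened feature list for illustration; use the training feature list in practice.
feature_cols = ["Age", "City", "Downtown", "Rural", "Suburban"]

# Hypothetical scoring sample in which nobody lives in a Rural or Suburban location.
score_toy = pd.DataFrame({"Age": [30, 45], "Location": ["City", "Downtown"]})
score_toy = pd.concat(
    [score_toy, pd.get_dummies(score_toy["Location"])], axis=1
).drop("Location", axis=1)

# reindex adds the missing dummy columns (filled with 0) and fixes the column order.
X_toy = score_toy.reindex(columns=feature_cols, fill_value=0)
print(X_toy)
```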


### Predict 'amount given' for members who were not contacted

In [None]:
scoring_data_nocontact = pd.merge(data3, data5, on=["ID"],how="right")

# Apply the same missing-value strategy to the score dataset as was used for training:
# replace missing MinGift, Frequency and TotalGift values with 0 and flag historic donors.
scoring_data_nocontact.rename(columns={"City": "Location"}, inplace = True)

scoring_data_nocontact[['MinGift']] = scoring_data_nocontact[['MinGift']].fillna(value=0) 
scoring_data_nocontact[['HistoricDonor']] = scoring_data_nocontact[['Frequency']].notna().astype(int)
scoring_data_nocontact[['Frequency']] = scoring_data_nocontact[['Frequency']].fillna(value=0)  
scoring_data_nocontact[['TotalGift']] = scoring_data_nocontact[['TotalGift']].fillna(value=0) 

scoring_data_nocontact = dummies(scoring_data_nocontact, "Education")
scoring_data_nocontact = dummies(scoring_data_nocontact, "Location")

X = scoring_data_nocontact[['Age', 'Salary', 'Contact', 'Referrals', 'TotalGift', 'GaveLastYear', 'AmtLastYear', 'Woman', 'NbActivities', 'City', 'Downtown', 'Rural', 'Suburban']] 

regr_predict_nocontact=xgb_model.predict(X)

scoring_data_nocontact['Prediction'] = regr_predict_nocontact

scoring_data_nocontact= scoring_data_nocontact[['ID','Prediction']]
scoring_data_nocontact = scoring_data_nocontact.rename({'Prediction': 'AmtNoContact'}, axis=1) 
scoring_data_nocontact.head()

Unnamed: 0,ID,AmtNoContact
0,2000001.0,57.71006
1,2000002.0,52.63324
2,2000003.0,85.504852
3,2000004.0,36.08411
4,2000005.0,108.254433


In [None]:
result_Amt = pd.merge(scoring_data_contact, scoring_data_nocontact, on=["ID"],how="right")
result_Amt.sort_values(by=['ID'], inplace=True)
result_Amt.head(3)

Unnamed: 0,ID,AmtContact,AmtNoContact
0,2000001.0,56.780594,57.71006
1,2000002.0,50.64743,52.63324
2,2000003.0,82.759285,85.504852


## Exporting Results to a CSV File

In [None]:
result_Amt.to_csv('Round2_Output_amt.csv', index=False)
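
Before moving on, it can be worth checking the exported file. The sketch below writes a made-up two-row result to an in-memory buffer and reads it back; with the real file you would call `pd.read_csv('Round2_Output_amt.csv')` instead. The specific checks are suggestions, not requirements stated by the game.

```python
import io
import pandas as pd

# Made-up result with the same three columns as result_Amt.
result_toy = pd.DataFrame({
    "ID": [2000001.0, 2000002.0],
    "AmtContact": [56.78, 50.65],
    "AmtNoContact": [57.71, 52.63],
})

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
result_toy.to_csv(buffer, index=False)
buffer.seek(0)
check = pd.read_csv(buffer)

# Expected columns, one row per member, and no missing predictions.
assert list(check.columns) == ["ID", "AmtContact", "AmtNoContact"]
assert check["ID"].is_unique
assert not check.isna().any().any()
print(check)
```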

In [None]:
# Reminder: You are now done with step 1 of Round 2 on predicting the conditional amount.
# Next, to complete Round2, you need to perform step 2 to predict the probability of giving, calculate the uplift and prepare your decision.

In [None]:
!head Round2_Output_amt.csv

ID,AmtContact,AmtNoContact
2000001.0,66.63118007939372,66.63118007939372
2000002.0,72.3890134529148,72.3890134529148
2000003.0,97.00667302192565,97.00667302192565
2000004.0,40.5796395518753,40.5796395518753
2000005.0,97.00667302192565,97.00667302192565
2000006.0,32.56816359262229,32.56816359262229
2000007.0,27.272496187086933,27.272496187086933
2000008.0,51.717085235920855,51.717085235920855
2000009.0,43.66751990898749,43.66751990898749
