![Python_logo](https://www.python.org/static/community_logos/python-logo-master-v3-TM.png)


# **Cortex Game: Round 1: Amount**

> Before playing the game, you need to connect to SASPy first.
>
>> If this is your first time, please follow steps 0 to 4 below before establishing the connection in step 5.

***
## **Connect to SASPy**

**0- Connect to your Google Drive folder**

In [None]:
# Change the following path to set your Drive folder
my_folder = "/content/drive/MyDrive/COLAB-SAS"

from google.colab import drive
drive.mount('/content/drive')

# Work from that folder so the files created below are stored in it
import os
os.chdir(my_folder)
!pwd

Mounted at /content/drive
/content/drive/MyDrive/COLAB-SAS


**1- Make sure that your Python version is 3.3 or higher and your Java version is 1.8.0_162 or higher**

In [None]:
!echo "Python is at" $(which python)
!python --version

Python is at /usr/local/bin/python
Python 3.8.15


In [None]:
!echo "Java is at" $(which java)
!/usr/bin/java -version

Java is at /usr/bin/java
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu218.04, mixed mode, sharing)
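Comparing these version strings as plain text does not work, because Java changed its numbering scheme in Java 9 (`1.8.0_162` versus `11.0.17`). A minimal stdlib sketch that normalises both schemes before checking the thresholds above; the helper names are hypothetical, and the parser assumes you pass the bare version string rather than the full `java -version` banner:

```python
import re
import sys

def java_version_tuple(version: str) -> tuple:
    """Normalise a Java version string to a comparable (major, minor, patch) tuple."""
    # Legacy scheme (Java 8 and earlier): "1.8.0_162" -> (8, 0, 162)
    legacy = re.fullmatch(r"1\.(\d+)\.(\d+)_(\d+)", version)
    if legacy:
        return tuple(int(part) for part in legacy.groups())
    # Java 9+ scheme: "11.0.17" -> (11, 0, 17); missing parts count as 0
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))

def meets_requirements(java_version: str, python_version=sys.version_info) -> bool:
    """True if Python >= 3.3 and Java >= 1.8.0_162 (the thresholds stated above)."""
    python_ok = tuple(python_version[:2]) >= (3, 3)
    java_ok = java_version_tuple(java_version) >= java_version_tuple("1.8.0_162")
    return python_ok and java_ok

# The Java version string reported by the cell above
print(meets_requirements("11.0.17"))
```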


**2- Install SASPy**

In [None]:
!pip install saspy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting saspy
  Downloading saspy-4.4.0.tar.gz (9.9 MB)
     |████████████████████████████████| 9.9 MB 10.9 MB/s
Building wheels for collected packages: saspy
  Building wheel for saspy (setup.py) ... done
  Created wheel for saspy: filename=saspy-4.4.0-py3-none-any.whl size=9937189 sha256=d84572f839c8c0b3cf523192754f6da48ea6f5b4e8d17aa42dcb88771a427b74
  Stored in directory: /root/.cache/pip/wheels/78/ce/27/57cfb223c6e6232856fe149e532b99faeaf94b8d47bc273ccb
Successfully built saspy
Installing collected packages: saspy
Successfully installed saspy-4.4.0


**3- Create the configuration file "sascfg_personal.py"**

Please check that your Home Region is correct (you can look it up at [ODA-SAS](https://welcome.oda.sas.com/home)) and keep only the `iomhost` line for that region uncommented.

In [None]:
%%writefile sascfg_personal.py
SAS_config_names=['oda']
oda = {'java' : '/usr/bin/java',
#US Home Region 1
'iomhost' : ['odaws01-usw2.oda.sas.com','odaws02-usw2.oda.sas.com','odaws03-usw2.oda.sas.com','odaws04-usw2.oda.sas.com'],
#US Home Region 2
#'iomhost' : ['odaws01-usw2-2.oda.sas.com','odaws02-usw2-2.oda.sas.com'],
#European Home Region 1
#'iomhost' : ['odaws01-euw1.oda.sas.com','odaws02-euw1.oda.sas.com'],
#Asia Pacific Home Region 1
#'iomhost' : ['odaws01-apse1.oda.sas.com','odaws02-apse1.oda.sas.com'],
#Asia Pacific Home Region 2
#'iomhost' : ['odaws01-apse1-2.oda.sas.com','odaws02-apse1-2.oda.sas.com'],
'iomport' : 8591,
'authkey' : 'oda',
'encoding' : 'utf-8'
}

Overwriting sascfg_personal.py
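Commenting and uncommenting `iomhost` lines by hand is easy to get wrong. As an alternative, a small stdlib sketch that generates the same configuration text for one chosen region. The hosts, port, authkey and encoding are copied from the cell above; the short region keys (`"US1"`, `"EU1"`, ...) and the function name are hypothetical:

```python
# Host lists copied from the commented alternatives in the configuration cell above.
# The region keys are hypothetical labels used only in this sketch.
IOM_HOSTS = {
    "US1": ["odaws01-usw2.oda.sas.com", "odaws02-usw2.oda.sas.com",
            "odaws03-usw2.oda.sas.com", "odaws04-usw2.oda.sas.com"],
    "US2": ["odaws01-usw2-2.oda.sas.com", "odaws02-usw2-2.oda.sas.com"],
    "EU1": ["odaws01-euw1.oda.sas.com", "odaws02-euw1.oda.sas.com"],
    "AP1": ["odaws01-apse1.oda.sas.com", "odaws02-apse1.oda.sas.com"],
    "AP2": ["odaws01-apse1-2.oda.sas.com", "odaws02-apse1-2.oda.sas.com"],
}

def build_sascfg(region: str) -> str:
    """Return the text of sascfg_personal.py for one Home Region."""
    if region not in IOM_HOSTS:
        raise ValueError(f"Unknown region {region!r}; choose one of {sorted(IOM_HOSTS)}")
    oda = {
        "java": "/usr/bin/java",
        "iomhost": IOM_HOSTS[region],
        "iomport": 8591,
        "authkey": "oda",
        "encoding": "utf-8",
    }
    # repr() of a dict of str/int/list values is valid Python source
    return f"SAS_config_names = ['oda']\noda = {oda!r}\n"

config_text = build_sascfg("EU1")
print(config_text)
# To use it, write config_text to sascfg_personal.py in my_folder.
```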


**4- Create your .authinfo**

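If you do not have a .authinfo file yet, create one by removing the `#` from both lines of the cell below and replacing the user ID and password with your own SAS ODA credentials.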

In [None]:
#%%writefile .authinfo
#oda user a01731813@tec.mx password AAAA

Writing .authinfo


Copy this file to your home directory (`~/.authinfo`), where SASPy looks for it:

In [None]:
!cp .authinfo ~/.authinfo
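Instead of writing the file and then copying it, a sketch that writes `.authinfo` directly into the home directory in the same one-line format as the cell above (`oda` must match the `authkey` in `sascfg_personal.py`). Restricting the file to owner read/write is a common precaution for credential files and is assumed here rather than required by SASPy; the function name is hypothetical:

```python
import os
import stat
from pathlib import Path

def write_authinfo(user: str, password: str, home=None) -> Path:
    """Write a .authinfo entry for the 'oda' authkey referenced in sascfg_personal.py."""
    home = Path.home() if home is None else Path(home)
    path = home / ".authinfo"
    # Same format as the commented cell above: <authkey> user <id> password <pw>
    path.write_text(f"oda user {user} password {password}\n")
    # Owner read/write only, because the password is stored in plain text
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path

# Usage (replace with your own credentials):
# write_authinfo("your_user_id", "your_password")
```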

**5- Establish the connection (you need to do this step each time you use SASPy)**

In [None]:
import saspy
sas_session = saspy.SASsession(cfgfile="/content/drive/MyDrive/COLAB-SAS/sascfg_personal.py")
sas_session

Using SAS Config named: oda
SAS Connection established. Subprocess id is 717



Access Method         = IOM
SAS Config name       = oda
SAS Config file       = /content/drive/MyDrive/COLAB-SAS/sascfg_personal.py
WORK Path             = /saswork/SAS_work89BC0001F827_odaws01-usw2.oda.sas.com/SAS_work00240001F827_odaws01-usw2.oda.sas.com/
SAS Version           = 9.04.01M6P11072018
SASPy Version         = 4.4.0
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = utf-8
Python Encoding value = utf-8
SAS process Pid value = 129063


***
## Connect to Cortex Data Sets

Assign the `cortex` libref to the Cortex data sets shared in SAS Studio:

In [None]:
ps = sas_session.submit("""
    libname cortex '~/my_shared_file_links/u39842936/Cortex Data Sets';
    """)
print(ps["LOG"])


5                                                          The SAS System                    Thursday, December  1, 2022 09:50:00 PM

24         ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;
24       ! ods graphics on / outputfmt=png;
25         
26         
27             libname cortex '~/my_shared_file_links/u39842936/Cortex Data Sets';
28         
29         
30         
31         ods html5 (id=saspy_internal) close;ods listing;
32         

33         


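If you run the notebook in a local Jupyter environment, you can assign the libref with SASPy's `%%SAS` cell magic instead: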

In [None]:
#%%SAS sas_session
#libname cortex '~/my_shared_file_links/u39842936/Cortex Data Sets';

## Convert the cloud SAS data sets to pandas DataFrames


> **For reference**:

> 1. [Pandas library](https://pandas.pydata.org/docs/user_guide/index.html)

> 2. [sklearn.model_selection for data partition](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)


In [None]:
import pandas as pd

# Donor table ('hist')
data1 = sas_session.sasdata2dataframe(
    table='hist',
    libref='cortex'
)

# Round 1 target table ('target_rd1')
data2 = sas_session.sasdata2dataframe(
    table='target_rd1',
    libref='cortex'
)

## Merge the Data

In [None]:
data_merge = pd.merge(data1, data2, on=["ID"], how="right")
# Only the last expression of a cell is displayed; use data_merge.sample(5) to inspect random rows instead
data_merge

| | ID | LastName | FirstName | Woman | Age | Salary | Education | City | SeniorList | NbActivities | ... | Recency | Frequency | Seniority | TotalGift | MinGift | MaxGift | GaveLastYear | AmtLastYear | GaveThisYear | AmtThisYear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000001.0 | ROMMES | RODNEY | 0.0 | 25.0 | 107200.0 | University / College | City | 2.0 | 0.0 | ... | 1.0 | 2.0 | 2.0 | 1010.0 | 10.0 | 1000.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2000002.0 | RAMIREZ | SHARON | 1.0 | 38.0 | 15800.0 | High School | Rural | 4.0 | 1.0 | ... | | | | | | | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2000003.0 | TSOSIE | KAREN | 1.0 | 37.0 | 57400.0 | University / College | Rural | 5.0 | 0.0 | ... | | | | | | | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2000004.0 | LEE | MARY | 1.0 | 78.0 | 23700.0 | High School | Rural | 3.0 | 0.0 | ... | | | | | | | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2000005.0 | HUMPHRES | ANGIE | 1.0 | 34.0 | 71900.0 | University / College | Rural | 8.0 | 0.0 | ... | | | | | | | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 999995 | 2999996.0 | SCHUBERT | FRANCES | 1.0 | 29.0 | 15100.0 | High School | Suburban | 8.0 | 3.0 | ... | 5.0 | 1.0 | 5.0 | 20.0 | 20.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 999996 | 2999997.0 | LUGGE | MARY | 1.0 | 22.0 | 7000.0 | High School | Suburban | 10.0 | 0.0 | ... | | | | | | | 0.0 | 0.0 | 0.0 | 0.0 |
| 999997 | 2999998.0 | ROY | REGINALD | 0.0 | 17.0 | 1000.0 | High School | City | 10.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 20.0 | 20.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 999998 | 2999999.0 | LIBERTI | PAMELA | 1.0 | 32.0 | 43900.0 | University / College | Rural | 0.0 | 0.0 | ... | | | | | | | 0.0 | 0.0 | 0.0 | 0.0 |
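A right join keeps every row of the right-hand table (`target_rd1`) and fills the left-hand (`hist`) columns with NaN for IDs that have no match. A small pandas sketch on made-up toy tables; `validate` and `indicator` are standard `pd.merge` parameters that make such mismatches visible:

```python
import pandas as pd

# Toy stand-ins for the 'hist' and 'target_rd1' tables (made-up values)
hist = pd.DataFrame({"ID": [1, 2, 3], "Age": [25.0, 38.0, 37.0]})
target = pd.DataFrame({"ID": [2, 3, 4], "AmtThisYear": [0.0, 20.0, 50.0]})

merged = pd.merge(
    hist, target, on="ID", how="right",
    validate="one_to_one",  # raise MergeError if an ID is duplicated on either side
    indicator=True,         # add a '_merge' column: 'both' or 'right_only'
)
print(merged)
# Rows flagged 'right_only' have NaN in every column that comes from 'hist'
print(merged["_merge"].value_counts())
```

On the real data, a non-zero `right_only` count would mean that some target IDs are missing from `hist`.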


In [None]:
data_merge.describe()

| | ID | Woman | Age | Salary | SeniorList | NbActivities | Referrals | Recency | Frequency | Seniority | TotalGift | MinGift | MaxGift | GaveLastYear | AmtLastYear | GaveThisYear | AmtThisYear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 336334.0 | 336334.0 | 336334.0 | 336334.0 | 336334.0 | 336334.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 |
| mean | 2500000.0 | 0.516936 | 46.36617 | 65531.654 | 4.582967 | 0.561413 | 0.560556 | 3.039636 | 1.666882 | 4.474148 | 104.454055 | 43.130251 | 85.897932 | 0.122101 | 7.673015 | 0.122232 | 7.74108 |
| std | 288675.3 | 0.499713 | 18.945324 | 61051.122343 | 3.325844 | 0.996803 | 1.054027 | 2.120722 | 1.05209 | 2.588119 | 303.760477 | 150.249476 | 283.581066 | 0.327403 | 82.224854 | 0.327554 | 83.110552 |
| min | 2000001.0 | 0.0 | 16.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 10.0 | 10.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2250001.0 | 0.0 | 30.0 | 20700.0 | 2.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 | 20.0 | 15.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 2500000.0 | 1.0 | 46.0 | 44000.0 | 4.0 | 0.0 | 0.0 | 2.0 | 1.0 | 4.0 | 40.0 | 20.0 | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 2750000.0 | 1.0 | 61.0 | 91200.0 | 7.0 | 1.0 | 1.0 | 4.0 | 2.0 | 6.0 | 90.0 | 30.0 | 75.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| max | 3000000.0 | 1.0 | 90.0 | 250000.0 | 10.0 | 10.0 | 16.0 | 10.0 | 10.0 | 10.0 | 15150.0 | 10000.0 | 10000.0 | 1.0 | 10000.0 | 1.0 | 10000.0 |


In [None]:
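# Keep only the numeric columns; .copy() makes an independent DataFrame so that
# later column assignments do not raise SettingWithCopyWarning
data_mergefloats = data_merge.loc[:, ~data_merge.columns.isin(['LastName', 'FirstName', 'Education', 'City'])].copy()
#data_mergefloats = data_mergefloats.head(100000)  # optional: subsample for faster experiments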

In [None]:
data_mergefloats.isna().sum()

ID                   0
Woman                0
Age                  0
Salary               0
SeniorList           0
NbActivities         0
Referrals            0
Recency         663666
Frequency       663666
Seniority       663666
TotalGift       663666
MinGift         663666
MaxGift         663666
GaveLastYear         0
AmtLastYear          0
GaveThisYear         0
AmtThisYear          0
dtype: int64
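The six gift-history columns all have exactly 663,666 missing values, which suggests they are missing for the same donors. A quick pandas check of that assumption (the helper name is hypothetical; on the real data you would call `history_missing_together(data_mergefloats)`):

```python
import numpy as np
import pandas as pd

HISTORY_COLS = ["Recency", "Frequency", "Seniority", "TotalGift", "MinGift", "MaxGift"]

def history_missing_together(df: pd.DataFrame, cols=HISTORY_COLS) -> bool:
    """True if, in every row, the history columns are either all missing or all present."""
    missing = df[cols].isna()
    all_or_nothing = missing.all(axis=1) | ~missing.any(axis=1)
    return bool(all_or_nothing.all())

# Toy frame: row 1 has no gift history at all
toy = pd.DataFrame({c: [1.0, np.nan, 2.0] for c in HISTORY_COLS})
print(history_missing_together(toy))

# Making only MinGift missing in row 2 breaks the all-or-nothing pattern
partial = toy.copy()
partial.loc[2, "MinGift"] = np.nan
print(history_missing_together(partial))
```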

In [None]:
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
imputer = IterativeImputer(imputation_order='ascending',max_iter=15,random_state=13,n_nearest_features=10)

In [None]:
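# Impute one column at a time. With a single column, IterativeImputer has no other
# features to learn from and returns its initial imputation (the column mean).
for col in data_mergefloats:
    data_mergefloats.loc[:, col] = imputer.fit_transform(data_mergefloats[[col]]).ravel()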

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[selected_item_labels] = value

In [None]:
data_mergefloats = data_mergefloats.sort_index()
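Because the imputation loop passes one column at a time (`data_mergefloats[[col]]`), `IterativeImputer` never sees the other features, and scikit-learn then returns its initial imputation, i.e. the column mean. This matches the outputs: the imputed Recency value 3.039636 in the training sample further below equals the Recency mean reported by `describe()`. A toy sketch comparing this with plain mean imputation, and showing the joint call that would let each feature be modelled from the others:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before importing IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer

df = pd.DataFrame({
    "Recency": [1.0, np.nan, 3.0, np.nan],
    "Frequency": [2.0, np.nan, 1.0, 4.0],
})

imputer = IterativeImputer(max_iter=15, random_state=13)

# Column by column, as in the loop above: only one feature is visible at a time
one_at_a_time = pd.DataFrame(
    {c: imputer.fit_transform(df[[c]]).ravel() for c in df.columns}
)
# Plain mean imputation for comparison
mean_filled = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns
)
print(np.allclose(one_at_a_time, mean_filled))

# All columns at once: each feature with missing values is regressed on the others
joint = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(joint)
```

On the full 1,000,000-row data, the joint version is much slower, so you may want to test it on a subsample first.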

In [None]:
from sklearn.preprocessing import LabelEncoder
encoder_label = LabelEncoder()
data_mergefloats['City'] = encoder_label.fit_transform(data_merge['City'])
data_mergefloats['Education'] = encoder_label.fit_transform(data_merge['Education'])
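The cell above refits the same `encoder_label` for each column, and the scoring section refits it again on the scoring data. Because `LabelEncoder` stores its classes in sorted order, refitting gives the same codes only if both data sets contain exactly the same categories. A sketch that keeps one fitted encoder per column and reuses it on new data (the toy values use category names seen in the outputs; the real columns may have more categories):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

train_city = pd.Series(["Rural", "City", "Suburban", "City"])
score_city = pd.Series(["Suburban", "Rural", "City"])

# One encoder per column, fitted on the training data and kept for scoring
encoders = {"City": LabelEncoder().fit(train_city)}

print(list(encoders["City"].classes_))         # classes are stored in sorted order
print(encoders["City"].transform(score_city))  # same mapping applied to the new data
```

`transform` raises a `ValueError` if the new data contains a category that was not seen during `fit`, which is safer than silently assigning different codes.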

In [None]:
#data_mergefloats.to_csv('drecency.csv', index=False)

In [None]:
abs(data_mergefloats.corr()['AmtThisYear'])

ID              0.002595
Woman           0.014477
Age             0.013522
Salary          0.022055
SeniorList      0.001249
NbActivities    0.048435
Referrals       0.045224
Recency         0.018032
Frequency       0.031570
Seniority       0.005093
TotalGift       0.027554
MinGift         0.016050
MaxGift         0.024772
GaveLastYear    0.032035
AmtLastYear     0.017809
GaveThisYear    0.249599
AmtThisYear     1.000000
City            0.008012
Education       0.006558
Name: AmtThisYear, dtype: float64
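Sorting the absolute correlations makes this list easier to scan. Note that GaveThisYear, the strongest correlate, does not appear in the scoring data, which is why the models below exclude it. A sketch with a hypothetical helper, shown on made-up toy data (on the real data: `rank_abs_correlations(data_mergefloats, "AmtThisYear")`):

```python
import pandas as pd

def rank_abs_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Absolute Pearson correlation of each numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)

toy = pd.DataFrame({
    "AmtThisYear": [0.0, 10.0, 20.0, 30.0],
    "a": [0.0, 1.0, 2.0, 3.0],     # perfectly correlated with the target
    "b": [1.0, 0.0, 1.0, 0.0],     # weakly (negatively) correlated
    "City": ["x", "y", "x", "y"],  # non-numeric, ignored
})
print(rank_abs_correlations(toy, "AmtThisYear"))
```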

## Treat Missing Values

> Please be aware that deleting all rows with missing values can induce a selection bias.
Some missing values are very informative. For example, when MinGift is missing, it means that the donor did not give in the past 10 years (up to, but excluding, last year). Instead of deleting this information, replacing it with 0 is more appropriate!

> A good understanding of the business case and the data can help you come up with more appropriate strategies for dealing with missing values.
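One way to apply this idea to all gift-history variables, sketched under stated assumptions: counts and amounts become 0 for donors without history, a 0/1 flag keeps the "no history" information explicit (`NoGiftHistory` is a hypothetical column name), and Recency and Seniority are left for you to decide, because 0 would wrongly suggest a very recent or very new donor if those columns count years:

```python
import numpy as np
import pandas as pd

# Assumption: a missing gift history means "no gift in the past 10 years"
ZERO_FILL_COLS = ["Frequency", "TotalGift", "MinGift", "MaxGift"]

def fill_gift_history(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with zero-filled gift amounts/counts and a no-history flag."""
    out = df.copy()
    # Record which donors had no history before filling (hypothetical column name)
    out["NoGiftHistory"] = out["MinGift"].isna().astype(int)
    out[ZERO_FILL_COLS] = out[ZERO_FILL_COLS].fillna(0)
    return out

toy = pd.DataFrame({
    "Recency": [1.0, np.nan], "Seniority": [2.0, np.nan], "Frequency": [2.0, np.nan],
    "TotalGift": [1010.0, np.nan], "MinGift": [10.0, np.nan], "MaxGift": [1000.0, np.nan],
})
print(fill_gift_history(toy))
```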


In [None]:
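# In this case, we replace missing MinGift values with 0.
# You can do the same for the other variables where you think it is reasonable.
# Note: data_mergefloats (used for modelling below) was created and imputed before this
# cell, so this fill only changes data_merge, not the training data.

data_merge['MinGift'] = data_merge['MinGift'].fillna(0)

data_merge.sample(3)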

| | ID | LastName | FirstName | Woman | Age | Salary | Education | City | SeniorList | NbActivities | ... | Recency | Frequency | Seniority | TotalGift | MinGift | MaxGift | GaveLastYear | AmtLastYear | GaveThisYear | AmtThisYear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 568517 | 2568518.0 | PICANCO | SUZANNE | 1.0 | 19.0 | 1000.0 | University / College | Rural | 7.0 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 20.0 | 20.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 824732 | 2824733.0 | BORK | RAY | 0.0 | 65.0 | 184900.0 | University / College | Rural | 5.0 | 0.0 | ... | | | | | 0.0 | | 0.0 | 0.0 | 0.0 | 0.0 |
| 125930 | 2125931.0 | KASPER | BRENT | 0.0 | 69.0 | 71400.0 | High School | Suburban | 10.0 | 0.0 | ... | | | | | 0.0 | | 0.0 | 0.0 | 0.0 | 0.0 |


## Data Partition

In [None]:
# The code below illustrates how to split the data into training and validation samples.
# You could use another library or a built-in function to perform the split.

from sklearn.model_selection import train_test_split
train, validation = train_test_split(data_mergefloats, test_size=0.2, random_state=12) 

#train.head()
train.sample(2)

| | ID | Woman | Age | Salary | SeniorList | NbActivities | Referrals | Recency | Frequency | Seniority | TotalGift | MinGift | MaxGift | GaveLastYear | AmtLastYear | GaveThisYear | AmtThisYear | City | Education |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53504 | 2053505.0 | 0.0 | 44.0 | 47700.0 | 5.0 | 1.0 | 1.0 | 3.039636 | 1.666882 | 4.474148 | 104.454055 | 43.130251 | 85.897932 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 2 |
| 165286 | 2165287.0 | 1.0 | 54.0 | 33300.0 | 8.0 | 0.0 | 0.0 | 3.039636 | 1.666882 | 4.474148 | 104.454055 | 43.130251 | 85.897932 | 0.0 | 0.0 | 0.0 | 0.0 | 2 | 2 |


## Prebuilt Models
***

### **Linear Regression Model**


> The [scikit-learn library](https://scikit-learn.org/stable/index.html) offers more advanced models.


In [None]:
#data_merge.loc[:, ~data_merge.columns.isin(['LastName', 'FirstName', 'Education', 'City'])]

In [None]:
from sklearn import linear_model

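# X_train and X_valid are pandas DataFrames; the commented lines are alternative feature subsets
#X_train = train[['Age', 'Salary','MinGift', 'AmtLastYear','Woman', 'NbActivities' ]]
#X_train = train[['Referrals','Frequency', 'AmtLastYear','GaveLastYear', 'NbActivities']]
# Use every column except the target, the ID and GaveThisYear (not available in the scoring data)
X_train = train.loc[:, ~train.columns.isin(['AmtThisYear','ID','GaveThisYear'])]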
Y_train = train['AmtThisYear']
#X_valid = validation[['Age', 'Salary','MinGift', 'AmtLastYear','Woman', 'NbActivities']] 
#X_valid = validation[['Referrals','Frequency', 'AmtLastYear','GaveLastYear', 'NbActivities']] 
X_valid = validation.loc[:, ~validation.columns.isin(['AmtThisYear','ID','GaveThisYear'])] 
Y_valid = validation['AmtThisYear']

regr = linear_model.LinearRegression()
regr.fit(X_train,Y_train)
regr_predict=regr.predict(X_valid)

In [None]:
# You can change the evaluation criteria

import numpy as np
from sklearn import metrics
#MAE
print(metrics.mean_absolute_error(Y_valid,regr_predict))
#MSE
print(metrics.mean_squared_error(Y_valid,regr_predict))
#RMSE
print(np.sqrt(metrics.mean_squared_error(Y_valid,regr_predict)))

13.046254836039001
6819.727839381812
82.58164347711792
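The three criteria are MAE = mean(|y - ŷ|), MSE = mean((y - ŷ)²) and RMSE = √MSE. A stdlib sketch that computes them from these definitions on a three-value example; the results should agree with `sklearn.metrics`. RMSE (82.6) being much larger than MAE (13.0) above is consistent with a few very large donations dominating the squared errors (AmtThisYear reaches 10,000 while its mean is about 7.7):

```python
import math

def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) computed from their definitions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, math.sqrt(mse)

# Absolute errors are 2, 4 and 0, so MAE = 6/3 = 2 and MSE = (4 + 16 + 0)/3 = 20/3
mae, mse, rmse = regression_errors([0, 10, 0], [2, 6, 0])
print(mae, mse, rmse)
```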


### **Regression Tree Model**

In [None]:
from sklearn.tree import DecisionTreeRegressor

# Note: this model uses a smaller feature subset than the linear regression above
X_train = train[['Age', 'Salary', 'MinGift', 'AmtLastYear', 'Woman', 'NbActivities']]
Y_train = train['AmtThisYear']
X_valid = validation[['Age', 'Salary', 'MinGift', 'AmtLastYear', 'Woman', 'NbActivities']]
Y_valid = validation['AmtThisYear']

DT_model = DecisionTreeRegressor(max_depth=5).fit(X_train, Y_train)

DT_predict = DT_model.predict(X_valid)  # predictions on the validation data


In [None]:
# You can change the evaluation criteria
#MAE
print(metrics.mean_absolute_error(Y_valid,DT_predict))
#MSE
print(metrics.mean_squared_error(Y_valid,DT_predict))
#RMSE
print(np.sqrt(metrics.mean_squared_error(Y_valid,DT_predict)))

13.269056773933272
7611.489839474201
87.24385273172088


### **Other models may also be helpful for this game**

Reference: https://scikit-learn.org/stable/supervised_learning.html

***


## Scoring New Data

### Prepare data for scoring

In [None]:
data3 = sas_session.sasdata2dataframe(
    table='score_rd1',
    libref='cortex'
)
data4 = sas_session.sasdata2dataframe(
    table='score',
    libref='cortex'
)

### Score new data based on your champion model

> Pick your champion model from the previous steps and use it to predict next year's donations.

> In this case, the linear regression model performed better than the regression tree based on the MSE criterion.

In [None]:
scoring_data = pd.merge(data3, data4, on=["ID"], how="right")

# Apply the same missing-value strategy to the score data set as to the training data.
# The training features were imputed with IterativeImputer, so the same imputation is
# applied in the cells below and the MinGift fill is left commented out.

#scoring_data[['MinGift']] = scoring_data[['MinGift']].fillna(value=0)

scoring_data.head()

| | ID | GaveLastYear | AmtLastYear | LastName | FirstName | Woman | Age | Salary | Education | City | SeniorList | NbActivities | Referrals | Recency | Frequency | Seniority | TotalGift | MinGift | MaxGift |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000001.0 | 0.0 | 0.0 | ROMMES | RODNEY | 0.0 | 25.0 | 107200.0 | University / College | City | 2.0 | 0.0 | 0.0 | 1.0 | 2.0 | 2.0 | 1010.0 | 10.0 | 1000.0 |
| 1 | 2000002.0 | 0.0 | 0.0 | RAMIREZ | SHARON | 1.0 | 38.0 | 15800.0 | High School | Rural | 4.0 | 1.0 | 1.0 | | | | | | |
| 2 | 2000003.0 | 0.0 | 0.0 | TSOSIE | KAREN | 1.0 | 37.0 | 57400.0 | University / College | Rural | 5.0 | 0.0 | 0.0 | | | | | | |
| 3 | 2000004.0 | 0.0 | 0.0 | LEE | MARY | 1.0 | 78.0 | 23700.0 | High School | Rural | 3.0 | 0.0 | 0.0 | | | | | | |
| 4 | 2000005.0 | 0.0 | 0.0 | HUMPHRES | ANGIE | 1.0 | 34.0 | 71900.0 | University / College | Rural | 8.0 | 0.0 | 0.0 | | | | | | |


In [None]:
# Keep only the numeric columns; .copy() avoids SettingWithCopyWarning on later assignments
scoring_datafloats = scoring_data.loc[:, ~scoring_data.columns.isin(['LastName', 'FirstName', 'Education', 'City'])].copy()

In [None]:
imputer = IterativeImputer(imputation_order='ascending',max_iter=15,random_state=13,n_nearest_features=10)
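This cell fits a fresh imputer on the scoring data, so missing values are filled with scoring-data statistics rather than the ones the model was trained with. Since the per-column imputation reduces to mean filling, a sketch that swaps in `SimpleImputer(strategy="mean")` (a substitution, not the notebook's imputer), fits it once on training data and reuses it for scoring; the toy values are made up:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

train_part = pd.DataFrame({"MinGift": [10.0, np.nan, 30.0]})
score_part = pd.DataFrame({"MinGift": [np.nan, 100.0]})

# Learn the fill value (the mean) from the training data only ...
imputer_train = SimpleImputer(strategy="mean").fit(train_part)
# ... and reuse it on the scoring data instead of refitting there
score_filled = pd.DataFrame(
    imputer_train.transform(score_part),
    columns=score_part.columns,
    index=score_part.index,
)
print(score_filled)
```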

In [None]:
# Same per-column imputation as for the training data
for col in scoring_datafloats:
    scoring_datafloats.loc[:, col] = imputer.fit_transform(scoring_datafloats[[col]]).ravel()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[selected_item_labels] = value

In [None]:
#merge

In [None]:
scoring_datafloats['City'] = encoder_label.fit_transform(scoring_data['City'])
scoring_datafloats['Education'] = encoder_label.fit_transform(scoring_data['Education'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  scoring_datafloats['City'] = encoder_label.fit_transform(scoring_data['City'])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  scoring_datafloats['Education'] = encoder_label.fit_transform(scoring_data['Education'])


In [None]:
# In this case, based on the MSE (Mean Squared Error) criterion,
# the linear regression model performed better than the regression tree.

# Select the features by name, in the order used to fit regr; passing the columns
# in a different order would apply the coefficients to the wrong variables
X = scoring_datafloats[regr.feature_names_in_]
regr_predict_end = regr.predict(X)

scoring_datafloats['Prediction'] = regr_predict_end
scoring_datafloats.sort_values(by=['Prediction'], inplace=True,ascending=False)
scoring_datafloats.head()

Feature names must be in the same order as they were in fit.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  scoring_datafloats['Prediction'] = regr_predict_end
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return func(*args, **kwargs)


| | ID | GaveLastYear | AmtLastYear | Woman | Age | Salary | SeniorList | NbActivities | Referrals | Recency | Frequency | Seniority | TotalGift | MinGift | MaxGift | City | Education | Prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 286775 | 2286776.0 | 0.0 | 0.0 | 1.0 | 23.0 | 249000.0 | 6.0 | 5.0 | 3.0 | 0.0 | 1.0 | 0.0 | 7000.0 | 7000.0 | 7000.0 | 3 | 2 | 809373.465207 |
| 773117 | 2773118.0 | 0.0 | 0.0 | 1.0 | 38.0 | 242100.0 | 5.0 | 0.0 | 0.0 | 3.0 | 1.0 | 3.0 | 10000.0 | 10000.0 | 10000.0 | 3 | 2 | 800939.550648 |
| 893679 | 2893680.0 | 0.0 | 0.0 | 1.0 | 29.0 | 239900.0 | 4.0 | 0.0 | 0.0 | 2.0 | 1.0 | 2.0 | 10000.0 | 10000.0 | 10000.0 | 3 | 2 | 794064.298508 |
| 576904 | 2576905.0 | 0.0 | 0.0 | 1.0 | 63.0 | 239700.0 | 8.0 | 2.0 | 4.0 | 2.0 | 1.0 | 2.0 | 10000.0 | 10000.0 | 10000.0 | 3 | 2 | 793431.213264 |
| 448342 | 2448343.0 | 0.0 | 0.0 | 0.0 | 29.0 | 249300.0 | 3.0 | 0.0 | 0.0 | 2.0 | 1.0 | 2.0 | 1500.0 | 1500.0 | 1500.0 | 2 | 2 | 786163.241552 |


In [None]:
scoring_data.shape

(1000000, 20)

## Exporting Results to a CSV File

In [None]:
Result= scoring_datafloats[['ID','Prediction']]
Result_1 = Result[['ID']].astype(int)
#Result.to_csv('Round1_Output.csv', index=False)

In [None]:
# Define your cutoff and choose a number of rows to submit to the leaderboard

NB = 80000
submission = Result_1.head(NB)
submission.to_csv('Round1 AllVarEduCity.csv', index=False,header=False)
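`DataFrame.nlargest` selects the top rows by prediction without sorting the whole frame in place. A sketch with a hypothetical helper and made-up predictions; on the real data, `top_donor_ids(scoring_datafloats, NB)` would give the same list of IDs as the cells above:

```python
import io
import pandas as pd

def top_donor_ids(scored: pd.DataFrame, n: int) -> pd.Series:
    """IDs (as integers) of the n rows with the highest Prediction."""
    return scored.nlargest(n, "Prediction")["ID"].astype(int)

scored = pd.DataFrame({
    "ID": [2000001.0, 2000002.0, 2000003.0],
    "Prediction": [5.0, 40.0, 12.5],
})
ids = top_donor_ids(scored, 2)

# One ID per line, without header or index, like the submission file above
buffer = io.StringIO()
ids.to_csv(buffer, index=False, header=False)
print(buffer.getvalue())
```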

In [None]:
submission

| | ID |
|---|---|
| 286775 | 2286776 |
| 773117 | 2773118 |
| 893679 | 2893680 |
| 576904 | 2576905 |
| 448342 | 2448343 |
| ... | ... |
| 436534 | 2436535 |
| 939394 | 2939395 |
| 999205 | 2999206 |
| 407906 | 2407907 |


In [None]:
# Reminder: Please note that you need only one column (the list of donors' IDs) to submit to the leaderboard.


In [None]:
!head Round1\ Output.csv

ID,Prediction
2420891.0,203.53100264852912
2631674.0,157.07423550309795
2334250.0,154.56589904654052
2954314.0,149.93662884564222
2416111.0,149.59033329198328
2100799.0,145.1098616263633
2094131.0,144.0934294771176
2132411.0,143.607439950489
2265980.0,143.60338124946452
