Version 1.1.0

# Mean encodings

In this programming assignment you will work with the `1C` dataset from the final competition. You are asked to encode `item_id` in four different ways:

    1) Via KFold scheme;  
    2) Via Leave-one-out scheme;
    3) Via smoothing scheme;
    4) Via expanding mean scheme.

**You will need to submit** the correlation coefficient between each resulting encoding and the target variable, to 4 decimal places.

### General tips

* Fill NANs in the encoding with `0.3343`.
* Some encoding schemes depend on sorting order, so in order to avoid confusion, please use the following code snippet to construct the data frame. This snippet also implements mean encoding without regularization.

In [2]:
import pandas as pd
import numpy as np
from itertools import product
from grader import Grader

# Read data

In [3]:
sales = pd.read_csv('../readonly/final_project_data/sales_train.csv.gz')

In [4]:
sales.head()

Unnamed: 0,date,date_block_num,shop_id,item_id,item_price,item_cnt_day
0,02.01.2013,0,59,22154,999.0,1.0
1,03.01.2013,0,25,2552,899.0,1.0
2,05.01.2013,0,25,2552,899.0,-1.0
3,06.01.2013,0,25,2554,1709.05,1.0
4,15.01.2013,0,25,2555,1099.0,1.0


In [5]:
sales['date'].unique()

array(['02.01.2013', '03.01.2013', '05.01.2013', ..., '28.10.2015',
       '25.10.2015', '13.10.2015'], dtype=object)

# Aggregate data

Since the competition task is to make a monthly prediction, we need to aggregate the data to the monthly level before doing any encodings. The following code cell serves just that purpose.

In [6]:
index_cols = ['shop_id', 'item_id', 'date_block_num']
    
# For every month we create a grid from all shops/items combinations from that month
grid = [] 
for block_num in sales['date_block_num'].unique():
    cur_shops = sales[sales['date_block_num']==block_num]['shop_id'].unique()
    cur_items = sales[sales['date_block_num']==block_num]['item_id'].unique()
    grid.append(np.array(list(product(*[cur_shops, cur_items, [block_num]])),dtype='int32'))

#turn the grid into pandas dataframe
grid = pd.DataFrame(np.vstack(grid), columns = index_cols,dtype=np.int32)

#get aggregated values for (shop_id, item_id, month)
#named aggregation yields a flat 'target' column directly (the nested-dict form of agg was removed in pandas 1.0)
gb = sales.groupby(index_cols,as_index=False).agg(target=('item_cnt_day','sum'))

#join aggregated data to the grid
all_data = pd.merge(grid,gb,how='left',on=index_cols).fillna(0)
#sort the data
all_data.sort_values(['date_block_num','shop_id','item_id'],inplace=True)



In [7]:
grid.head()

Unnamed: 0,shop_id,item_id,date_block_num
0,59,22154,0
1,59,2552,0
2,59,2554,0
3,59,2555,0
4,59,2564,0


In [44]:
gb.columns.values

array(['shop_id', 'item_id', 'date_block_num', 'target'], dtype=object)

In [45]:
gb.head()

Unnamed: 0,shop_id,item_id,date_block_num,target
0,0,30,1,31.0
1,0,31,1,11.0
2,0,32,0,6.0
3,0,32,1,10.0
4,0,33,0,3.0


In [46]:
all_data.head()

Unnamed: 0,shop_id,item_id,date_block_num,target
139255,0,19,0,0.0
141495,0,27,0,0.0
144968,0,28,0,0.0
142661,0,29,0,0.0
138947,0,32,0,6.0
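
The grid-and-merge steps above can be checked on a tiny frame. This is only a sketch: `toy_sales` and its values are made up for illustration, and the aggregation uses a plain `sum` plus `rename` rather than the exact call from the assignment.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical daily sales: two months, a few shops and items.
toy_sales = pd.DataFrame({
    'date_block_num': [0, 0, 0, 1, 1],
    'shop_id':        [1, 1, 2, 1, 3],
    'item_id':        [10, 11, 10, 10, 12],
    'item_cnt_day':   [2.0, 1.0, 3.0, 1.0, 4.0],
})
index_cols = ['shop_id', 'item_id', 'date_block_num']

# One grid block per month: every shop seen that month x every item seen that month.
blocks = []
for block_num in toy_sales['date_block_num'].unique():
    month = toy_sales[toy_sales['date_block_num'] == block_num]
    shops = month['shop_id'].unique()
    items = month['item_id'].unique()
    blocks.append(np.array(list(product(shops, items, [block_num])), dtype='int32'))
toy_grid = pd.DataFrame(np.vstack(blocks), columns=index_cols)

# Monthly totals per (shop, item, month), then a left join onto the grid.
toy_gb = (toy_sales.groupby(index_cols, as_index=False)['item_cnt_day'].sum()
          .rename(columns={'item_cnt_day': 'target'}))
toy_all = pd.merge(toy_grid, toy_gb, how='left', on=index_cols).fillna(0)
print(toy_all)
```

Pairs that exist in the grid but had no sales that month, such as shop 2 with item 11 in month 0, end up with `target` equal to 0 after `fillna(0)`.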


In [None]:
#Simplified code for line 4

In [84]:
index_cols = ['shop_id', 'item_id', 'date_block_num']

In [85]:
for block_num in sales['date_block_num'].unique():
    cur_shops = sales[sales['date_block_num']== block_num]
    
cur_shops

Unnamed: 0,date,date_block_num,shop_id,item_id,item_price,item_cnt_day
2882335,23.10.2015,33,45,13315,649.0,1.0
2882336,05.10.2015,33,45,13880,229.0,1.0
2882337,02.10.2015,33,45,13881,659.0,1.0
2882338,12.10.2015,33,45,13881,659.0,1.0
2882339,04.10.2015,33,45,13923,169.0,1.0
2882340,31.10.2015,33,45,14227,99.0,1.0
2882341,12.10.2015,33,45,14931,799.0,1.0
2882342,05.10.2015,33,45,14101,449.0,1.0
2882343,20.10.2015,33,45,14957,299.0,1.0
2882344,05.10.2015,33,45,14102,749.0,1.0


In [86]:
for block_num in sales['date_block_num'].unique():
    cur_shops = sales[sales['date_block_num']== block_num]['shop_id'].unique()
    
cur_shops

array([45, 46, 44, 41, 39, 42, 34, 31, 35, 38, 37, 36,  6, 56, 57, 55, 58,
       59, 48, 47, 49, 53, 52, 50, 16, 15, 12, 14, 20, 19, 22, 18,  5,  4,
        2,  3,  9,  7, 10, 28, 26, 25, 24, 21])

In [87]:
for block_num in sales['date_block_num'].unique():
    cur_items = sales[sales['date_block_num']==block_num]
    
    
cur_items

Unnamed: 0,date,date_block_num,shop_id,item_id,item_price,item_cnt_day
2882335,23.10.2015,33,45,13315,649.0,1.0
2882336,05.10.2015,33,45,13880,229.0,1.0
2882337,02.10.2015,33,45,13881,659.0,1.0
2882338,12.10.2015,33,45,13881,659.0,1.0
2882339,04.10.2015,33,45,13923,169.0,1.0
2882340,31.10.2015,33,45,14227,99.0,1.0
2882341,12.10.2015,33,45,14931,799.0,1.0
2882342,05.10.2015,33,45,14101,449.0,1.0
2882343,20.10.2015,33,45,14957,299.0,1.0
2882344,05.10.2015,33,45,14102,749.0,1.0


In [88]:
for block_num in sales['date_block_num'].unique():
    cur_items = sales[sales['date_block_num']==block_num]['item_id'].unique()
    
    
cur_items

array([13315, 13880, 13881, ...,  7640,  7632,  7440])

In [103]:
grid = []
for block_num in sales['date_block_num'].unique():
    grid.append(np.array(list(product(*[cur_shops, cur_items, [block_num]])),dtype='int32'))



In [106]:
index_cols = ['shop_id', 'item_id', 'date_block_num']
    
# For every month we create a grid from all shops/items combinations from that month
grid = [] 
for block_num in sales['date_block_num'].unique():
    cur_shops = sales[sales['date_block_num']==block_num]['shop_id'].unique()
    cur_items = sales[sales['date_block_num']==block_num]['item_id'].unique()
    grid.append(np.array(list(product(*[cur_shops, cur_items, [block_num]])),dtype='int32'))




In [107]:
#turn the grid into pandas dataframe
grid = pd.DataFrame(np.vstack(grid), columns = index_cols,dtype=np.int32)
grid.head()

Unnamed: 0,shop_id,item_id,date_block_num
0,59,22154,0
1,59,2552,0
2,59,2554,0
3,59,2555,0
4,59,2564,0


In [108]:
index_cols

['shop_id', 'item_id', 'date_block_num']

In [109]:
gd = sales.aggregate({"item_cnt_day":['sum']})
gd

Unnamed: 0,item_cnt_day
sum,3648206.0


In [110]:
#get aggregated values for (shop_id, item_id, month)
gb = sales.groupby(index_cols,as_index=False).aggregate({"item_cnt_day":['sum']})

In [111]:
gb.head()

Unnamed: 0_level_0,shop_id,item_id,date_block_num,item_cnt_day
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,sum
0,0,30,1,31.0
1,0,31,1,11.0
2,0,32,0,6.0
3,0,32,1,10.0
4,0,33,0,3.0


In [112]:

#fix column names
gb.columns = [col[0] if col[-1]=='' else col[-1] for col in gb.columns.values]

In [113]:
gb.head()

Unnamed: 0,shop_id,item_id,date_block_num,sum
0,0,30,1,31.0
1,0,31,1,11.0
2,0,32,0,6.0
3,0,32,1,10.0
4,0,33,0,3.0


In [114]:
gb = gb.rename(columns={'sum': 'target'})

In [115]:
gb.head()

Unnamed: 0,shop_id,item_id,date_block_num,target
0,0,30,1,31.0
1,0,31,1,11.0
2,0,32,0,6.0
3,0,32,1,10.0
4,0,33,0,3.0


In [116]:
#join aggregated data to the grid
all_data = pd.merge(grid,gb,how='left',on=index_cols).fillna(0)
#sort the data
all_data.sort_values(['date_block_num','shop_id','item_id'],inplace=True)

In [118]:
#Compare the outputs of cells 46 and 117

In [117]:
all_data.head()

Unnamed: 0,shop_id,item_id,date_block_num,target
139255,0,19,0,0.0
141495,0,27,0,0.0
144968,0,28,0,0.0
142661,0,29,0,0.0
138947,0,32,0,6.0


# Mean encodings without regularization

Now that the technical work is done, we are ready to actually *mean encode* the desired `item_id` variable. 

Here are two ways to implement mean encoding features *without* any regularization. You can use this code as a starting point to implement regularized techniques. 

#### Method 1

In [18]:
# Calculate a mapping: {item_id: target_mean}
item_id_target_mean = all_data.groupby('item_id').target.mean()

# In our non-regularized case we just *map* the computed means to the `item_id`'s
all_data['item_target_enc'] = all_data['item_id'].map(item_id_target_mean)

# Fill NaNs
all_data['item_target_enc'].fillna(0.3343, inplace=True) 

# Print correlation
encoded_feature = all_data['item_target_enc'].values
print(np.corrcoef(all_data['target'].values, encoded_feature)[0][1])

0.483038698862


In [9]:
# Calculate a mapping: {item_id: target_mean}
item_id_target_mean = all_data.groupby('item_id').target.mean()
item_id_target_mean.head()

item_id
0    0.020000
1    0.023810
2    0.019802
3    0.019802
4    0.020000
Name: target, dtype: float64

In [11]:
all_data['item_id'].head()

139255    19
141495    27
144968    28
142661    29
138947    32
Name: item_id, dtype: int32

In [10]:
# In our non-regularized case we just *map* the computed means to the `item_id`'s
all_data['item_target_enc'] = all_data['item_id'].map(item_id_target_mean)

all_data['item_target_enc'].head()

139255    0.022222
141495    0.056834
144968    0.141176
142661    0.037383
138947    1.319042
Name: item_target_enc, dtype: float64

#### Method 2

In [19]:
'''
   Unlike `.target.mean()`, `transform` returns a result
   with the same index as `all_data`.
   This single line of code is equivalent to the first two lines of Method 1.
'''
all_data['item_target_enc'] = all_data.groupby('item_id')['target'].transform('mean')

# Fill NaNs
all_data['item_target_enc'].fillna(0.3343, inplace=True) 

# Print correlation
encoded_feature = all_data['item_target_enc'].values
print(np.corrcoef(all_data['target'].values, encoded_feature)[0][1])

0.483038698862


In [12]:
all_data['item_target_enc'] = all_data.groupby('item_id')['target'].transform('mean')
all_data['item_target_enc'].head()

139255    0.022222
141495    0.056834
144968    0.141176
142661    0.037383
138947    1.319042
Name: item_target_enc, dtype: float64

In [None]:
#Example to understand transform() function 

In [5]:
import pandas as pd
df = pd.read_csv(
   "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
   )
pd.options.display.max_rows = 10
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [6]:
#Mean value for each day
a = df.groupby('day')['total_bill'].mean()
a.head()

day
Fri     17.151579
Sat     20.441379
Sun     21.410000
Thur    17.682742
Name: total_bill, dtype: float64

In [12]:
#mean() returns one value per group, indexed by day, so assigning it to a column
#aligns on the row index and yields NaN; transform() is needed to get one value per row
df['day_average'] = df.groupby('day')['total_bill'].mean()
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,day_average
0,16.99,1.01,Female,No,Sun,Dinner,2,
1,10.34,1.66,Male,No,Sun,Dinner,3,
2,21.01,3.5,Male,No,Sun,Dinner,3,
3,23.68,3.31,Male,No,Sun,Dinner,2,
4,24.59,3.61,Female,No,Sun,Dinner,4,


In [14]:
#To make a new value for each row, we use transform().
df['day_average'] = df.groupby('day')['total_bill'].transform(lambda x : x.mean())
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,day_average
0,16.99,1.01,Female,No,Sun,Dinner,2,21.41
1,10.34,1.66,Male,No,Sun,Dinner,3,21.41
2,21.01,3.5,Male,No,Sun,Dinner,3,21.41
3,23.68,3.31,Male,No,Sun,Dinner,2,21.41
4,24.59,3.61,Female,No,Sun,Dinner,4,21.41


See the printed value? It is the correlation coefficient between the target variable and your new encoded feature. You need to **compute the correlation coefficient** for each encoding that you implement and **submit those values to Coursera**.
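
As a reminder of what `np.corrcoef` returns, here is a minimal sketch on made-up arrays (`target` and `encoded` are hypothetical and not taken from the dataset):

```python
import numpy as np

# Hypothetical target and encoded feature, for illustration only.
target = np.array([0.0, 1.0, 2.0, 3.0])
encoded = np.array([0.1, 0.9, 2.2, 2.8])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the Pearson correlation.
corr_matrix = np.corrcoef(target, encoded)
corr = corr_matrix[0][1]

# Round to 4 decimal places, as the assignment asks.
print(round(corr, 4))
```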

In [20]:
grader = Grader()

# 1. KFold scheme

Explained starting at 41 sec of [Regularization video](https://www.coursera.org/learn/competitive-data-science/lecture/LGYQ2/regularization).

**Now it's your turn to write the code!** 

You may use 'Regularization' video as a reference for all further tasks.

First, implement KFold scheme with five folds. Use KFold(5) from sklearn.model_selection. 

1. Split your data in 5 folds with `sklearn.model_selection.KFold` with `shuffle=False` argument.
2. Iterate through folds: use all but the current fold to calculate the mean target for each level of `item_id`, and fill the current fold with it.

    *  See **Method 1** from the example implementation. In particular, learn what the `map` and `pd.Series.map` functions do; they are handy in many situations.

In [21]:
# YOUR CODE GOES HERE
from sklearn.model_selection import KFold 

folder = KFold(n_splits=5, shuffle=False)
column = "item_id"
encoded_column = column + "_mean_target"
train_new = pd.DataFrame(index=all_data.index, columns=all_data.columns)
train_new[encoded_column] = np.nan
for training_index, validation_index in folder.split(all_data):
    x_train = all_data.iloc[training_index].copy()
    x_validation = all_data.iloc[validation_index].copy()
    means = x_validation[column].map(x_train.groupby(column).target.mean())
    x_validation[encoded_column] = means
    # train_new is a dataframe copy we made of the training data
    train_new.iloc[validation_index] = x_validation
train_new.fillna(0.3343, inplace=True)

encoded_feature = train_new.item_id_mean_target.values

# You will need to compute correlation like that
corr = np.corrcoef(all_data['target'].values, encoded_feature)[0][1]
print(corr)
grader.submit_tag('KFold_scheme', corr)

0.41645907128
Current answer for task KFold_scheme is: 0.41645907128
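
The same scheme can be traced by hand on a toy frame. This is a sketch under assumptions: the data, the column name `item_kfold_enc`, and the choice of three folds are all made up so that the numbers are easy to follow.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy data: six rows, two item levels.
toy = pd.DataFrame({
    'item_id': [1, 1, 2, 2, 1, 2],
    'target':  [1.0, 3.0, 0.0, 2.0, 5.0, 4.0],
})
toy['item_kfold_enc'] = np.nan
enc_col = toy.columns.get_loc('item_kfold_enc')

kf = KFold(n_splits=3, shuffle=False)
for train_idx, val_idx in kf.split(toy):
    # Means are computed on the other folds only ...
    fold_means = toy.iloc[train_idx].groupby('item_id')['target'].mean()
    # ... and mapped onto the rows of the held-out fold.
    toy.iloc[val_idx, enc_col] = toy['item_id'].iloc[val_idx].map(fold_means).values

# Levels absent from the training folds would stay NaN; fill them with the global mean.
toy['item_kfold_enc'] = toy['item_kfold_enc'].fillna(0.3343)
print(toy)
```

Each row's encoding never sees its own target: rows 0 and 1 (item 1) are encoded with the mean of the single item-1 row outside their fold, which is 5.0.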


##  map function
 - `map` expects a function object and any number of iterables (lists, tuples, dictionaries, etc.).
 - It applies the function to each element of the iterable(s). In Python 3 it returns a lazy iterator, so wrap it in `list()` to see the results.

## Basic syntax

map(function_object, iterable1, iterable2,...)

In [7]:
#Example:
def multiply2(x):
  return x * 2
    
    
m = list(map(multiply2, [1, 2, 3, 4]))  # Output [2, 4, 6, 8]
m

[2, 4, 6, 8]

In [10]:
# Iterating over a list of dictionaries using map and lambda
dict_a = [{'name': 'python', 'points': 10}, {'name': 'java', 'points': 8}]

# map() is lazy in Python 3, so each call is wrapped in list()
list(map(lambda x : x['points']*10,  dict_a)) # [100, 80]
list(map(lambda x : x['name'] == "python", dict_a)) # [True, False]

n = list(map(lambda x : x['name'], dict_a)) # ['python', 'java']
n


['python', 'java']

In [11]:
## Multiple iterables to the map function
list_a = [1, 2, 3]
list_b = [10, 20, 30]
  
y = list(map(lambda x, y: x + y, list_a, list_b)) # Output: [11, 22, 33]
y

[11, 22, 33]
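
The examples above use the built-in `map`; the KFold code relies on `pd.Series.map`, which also accepts a Series or dict as a lookup table. A small sketch with made-up ids and means (`lookup` and `item_ids` are hypothetical):

```python
import pandas as pd

# Hypothetical lookup table {item_id: mean target}, as a Series indexed by item_id.
lookup = pd.Series({10: 0.5, 11: 2.0})
item_ids = pd.Series([10, 11, 10, 99])

# Series.map looks each value up in the index of `lookup`;
# ids missing from the lookup (99 here) become NaN.
encoded = item_ids.map(lookup)
print(encoded.tolist())

# A plain function works as well, applied element-wise like the built-in map.
doubled = item_ids.map(lambda x: x * 2)
print(doubled.tolist())
```

The NaN produced for the unseen id is exactly the case that the `fillna(0.3343)` step in the encodings takes care of.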

In [12]:
# YOUR CODE GOES HERE
from sklearn.model_selection import KFold 

folder = KFold(n_splits=5, shuffle=False)
column = "item_id"
encoded_column = column + "_mean_target"

In [22]:
index=all_data.index
index

Int64Index([  139255,   141495,   144968,   142661,   138947,   138948,
              138949,   139247,   142672,   142065,
            ...
            10771530, 10768090, 10769958, 10769959, 10769704, 10768834,
            10769024, 10769690, 10771216, 10770511],
           dtype='int64', length=10913850)

In [23]:
columns=all_data.columns
columns

Index(['shop_id', 'item_id', 'date_block_num', 'target', 'item_target_enc'], dtype='object')

In [25]:
train_new = pd.DataFrame(index=all_data.index, columns=all_data.columns)
train_new.head()

Unnamed: 0,shop_id,item_id,date_block_num,target,item_target_enc
139255,,,,,
141495,,,,,
144968,,,,,
142661,,,,,
138947,,,,,


In [26]:
train_new[encoded_column] = np.nan
train_new.head()

Unnamed: 0,shop_id,item_id,date_block_num,target,item_target_enc,item_id_mean_target
139255,,,,,,
141495,,,,,,
144968,,,,,,
142661,,,,,,
138947,,,,,,


In [28]:
for training_index, validation_index in folder.split(all_data):
    x_train = all_data.iloc[training_index].copy()

x_train.head()

Unnamed: 0,shop_id,item_id,date_block_num,target,item_target_enc
139255,0,19,0,0.0,0.022222
141495,0,27,0,0.0,0.056834
144968,0,28,0,0.0,0.141176
142661,0,29,0,0.0,0.037383
138947,0,32,0,6.0,1.319042


In [29]:
for training_index, validation_index in folder.split(all_data):    
    x_validation = all_data.iloc[validation_index].copy()

x_validation.head()

Unnamed: 0,shop_id,item_id,date_block_num,target,item_target_enc
8513296,59,7031,24,0.0,0.026042
8513295,59,7044,24,0.0,0.028653
8510778,59,7047,24,0.0,2.061325
8508845,59,7049,24,0.0,0.430769
8512688,59,7050,24,0.0,0.15873


In [32]:
column

'item_id'

In [34]:
m = x_validation[column]
m.head()

8513296    7031
8513295    7044
8510778    7047
8508845    7049
8512688    7050
Name: item_id, dtype: int32

In [46]:
n = x_train.groupby(column)['target'].mean()
n.head()

item_id
0    0.020000
1    0.023810
2    0.019802
3    0.019802
4    0.020000
Name: target, dtype: float64

In [30]:
## Mapping x_validation[column] through n from above to get means 
for training_index, validation_index in folder.split(all_data):     
    means = x_validation[column].map(x_train.groupby(column).target.mean())
    
means.head()


8513296    0.027304
8513295    0.031646
8510778    2.826790
8508845    0.501295
8512688    0.158983
Name: item_id, dtype: float64

In [1]:
for training_index, validation_index in folder.split(all_data):     
    x_validation[encoded_column] = means
    # train_new is a dataframe copy we made of the training data
    train_new.iloc[validation_index] = x_validation
train_new.fillna(0.3343, inplace=True)

encoded_feature = train_new.item_id_mean_target.values

NameError: name 'folder' is not defined

In [43]:
#Example:
def multiply2(x):
  return x * 1
m = x_validation[column].map(multiply2)  # Series.map takes the function only; its second parameter is na_action
m.head()

8513296    7031
8513295    7044
8510778    7047
8508845    7049
8512688    7050
Name: item_id, dtype: int64

# 2. Leave-one-out scheme

Now, implement leave-one-out scheme. Note that if you just simply set the number of folds to the number of samples and run the code from the **KFold scheme**, you will probably wait for a very long time. 

To implement a faster version, note that to calculate the mean target value using all objects except one *given object*, you can:

1. Calculate sum of the target values using all the objects.
2. Then subtract the target of the *given object* and divide the resulting value by `n_objects - 1`. 

Note that you do not need to perform `1.` for every object. And `2.` can be implemented without any `for` loop.

It is most convenient to use the `.transform` function, as in **Method 2**.
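
The two steps can be sketched on a toy frame where the arithmetic is easy to verify by hand. The data and the column name `item_loo_enc` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: item 1 has three rows, item 2 has two, item 3 has one.
toy = pd.DataFrame({
    'item_id': [1, 1, 1, 2, 2, 3],
    'target':  [1.0, 2.0, 6.0, 4.0, 0.0, 7.0],
})

grouped = toy.groupby('item_id')['target']
group_sum = grouped.transform('sum')      # step 1, computed once per level
group_cnt = grouped.transform('count')

# Step 2, vectorised: remove the row's own target, divide by n_objects - 1.
toy['item_loo_enc'] = (group_sum - toy['target']) / (group_cnt - 1)

# A level with a single row gives 0 / 0 = NaN, which is filled with the global mean.
toy['item_loo_enc'] = toy['item_loo_enc'].fillna(0.3343)
print(toy)
```

For the first row, the item-1 sum is 9, so the encoding is (9 - 1) / 2 = 4.0; no `for` loop over rows is needed.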

In [12]:
# YOUR CODE GOES HERE

# 1. Sum and count of the target over all objects that share the same item_id,
#    broadcast back to every row with transform
sums = all_data.groupby('item_id')['target'].transform('sum')
counts = all_data.groupby('item_id')['target'].transform('count')

# 2. Subtract the target of the given object and divide by n_objects - 1
loo_means = (sums - all_data['target']) / (counts - 1)

# Items with a single row give 0/0 = NaN; fill them as in the general tips
encoded_feature = loo_means.fillna(0.3343).values

corr = np.corrcoef(all_data['target'].values, encoded_feature)[0][1]
print(corr)
grader.submit_tag('Leave-one-out_scheme', corr)


# 3. Smoothing

Explained starting at 4:03 of [Regularization video](https://www.coursera.org/learn/competitive-data-science/lecture/LGYQ2/regularization).

Next, implement smoothing scheme with $\alpha = 100$. Use the formula from the first slide in the video and $0.3343$ as `globalmean`. Note that `nrows` is the number of objects that belong to a certain category (not the number of rows in the dataset).

In [13]:
target_mean = all_data.groupby('item_id')['target'].transform('mean')
target_mean 

139255      0.022222
141495      0.056834
144968      0.141176
142661      0.037383
138947      1.319042
138948      0.527112
138949      0.146108
139247      0.944681
142672      0.070943
142065      0.085828
139208      0.070596
142670      0.032847
139207      0.086773
138950      0.110971
143764      0.058450
141505      0.076040
139199      0.069005
138952      0.116646
139176      0.044444
138951      0.148802
139177      0.067236
139178      0.119798
139179      0.073126
143769      0.112575
142671      0.052989
144539      0.098361
139180      0.157485
138953      0.071168
144265      0.043796
141744      0.056250
              ...   
10772600    0.830357
10770510    0.140000
10769953    0.502326
10769955    1.362817
10768833    0.163556
10769961    0.370044
10770625    0.159066
10769956    0.699195
10771598    1.937198
10767854    2.173392
10768086    3.324716
10768087    0.751576
10768088    1.317150
10767847    2.267442
10769954    1.003861
10767848    6.594595
10767849    0

In [11]:
# YOUR CODE GOES HERE
alpha = 100
globalmean = 0.3343

#target_mean = all_data.groupby('item_id')['target'].transform('mean')

train_new = all_data.copy()
nrows = train_new.groupby('item_id').size()
means = train_new.groupby('item_id').target.agg('mean')

score = (np.multiply(means,nrows)  + globalmean*alpha) / (nrows+alpha)
train_new['smooth'] = train_new['item_id']
train_new['smooth'] = train_new['smooth'].map(score)
encoded_feature = train_new['smooth'].values




corr = np.corrcoef(all_data['target'].values, encoded_feature)[0][1]
print(corr)
grader.submit_tag('Smoothing_scheme', corr)

0.48181987971
Current answer for task Smoothing_scheme is: 0.48181987971
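
To see what the smoothing does, here is a sketch on made-up data, using the same expression as the cell above, (mean * nrows + globalmean * alpha) / (nrows + alpha). The toy frame and the column name `item_smooth_enc` are hypothetical.

```python
import pandas as pd

alpha = 100
globalmean = 0.3343

# Hypothetical toy data: item 1 is frequent, item 2 appears only once.
toy = pd.DataFrame({
    'item_id': [1] * 300 + [2],
    'target':  [1.0] * 300 + [10.0],
})

# Per-level mean and number of rows (nrows is per category, not per dataset).
stats = toy.groupby('item_id')['target'].agg(['mean', 'count'])
smoothed = (stats['mean'] * stats['count'] + globalmean * alpha) / (stats['count'] + alpha)

toy['item_smooth_enc'] = toy['item_id'].map(smoothed)
print(smoothed)
```

The frequent item stays fairly close to its own mean of 1.0, while the item seen once is pulled strongly from 10.0 towards `globalmean`.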


# 4. Expanding mean scheme

Explained starting at 5:50 of [Regularization video](https://www.coursera.org/learn/competitive-data-science/lecture/LGYQ2/regularization).

Finally, implement the *expanding mean* scheme. It is basically already implemented for you in the video, but you can challenge yourself and write it on your own. You will need the [`cumsum`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.cumsum.html) and [`cumcount`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.cumcount.html) functions from pandas.

In [14]:
# YOUR CODE GOES HERE
cumsum = all_data.groupby('item_id').target.cumsum() - all_data['target']
cumcnt = all_data.groupby('item_id').cumcount()
train_new["mean_target"] = cumsum /cumcnt
train_new['mean_target'].fillna(0.3343, inplace=True)
encoded_feature = train_new['mean_target'].values
corr = np.corrcoef(all_data['target'].values, encoded_feature)[0][1]
print(corr)
grader.submit_tag('Expanding_mean_scheme', corr)

0.502524521108
Current answer for task Expanding_mean_scheme is: 0.502524521108
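
A toy run makes the role of `cumsum` and `cumcount` visible. This is a sketch on made-up data; `item_exp_enc` is a hypothetical column name, and the rows are assumed to be already sorted in the order the encoding should follow.

```python
import pandas as pd

# Hypothetical toy data, already in the order the encoding should follow.
toy = pd.DataFrame({
    'item_id': [1, 2, 1, 1, 2],
    'target':  [2.0, 5.0, 4.0, 0.0, 1.0],
})

grouped = toy.groupby('item_id')['target']
prev_sum = grouped.cumsum() - toy['target']   # sum of earlier targets for the same item
prev_cnt = grouped.cumcount()                 # number of earlier rows for the same item

# The first row of each item has no history: 0 / 0 = NaN, filled with the global mean.
toy['item_exp_enc'] = (prev_sum / prev_cnt).fillna(0.3343)
print(toy)
```

Each row is encoded only from rows that come before it, which is why the result depends on the sort order fixed when `all_data` was built.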


## Authorization & Submission
To submit assignment parts to the Coursera platform, please enter your e-mail and token into the variables below. You can generate a token on this programming assignment's page. Note: the token expires 30 minutes after generation.

In [21]:
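STUDENT_EMAIL = 'your-email@example.com'  # placeholder: use your own Coursera e-mail
STUDENT_TOKEN = 'YOUR_TOKEN'  # placeholder: generate a fresh token on the assignment page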
grader.status()

You want to submit these numbers:
Task KFold_scheme: 0.41645907128
Task Leave-one-out_scheme: ----------
Task Smoothing_scheme: 0.48181987971
Task Expanding_mean_scheme: 0.502524521108


In [22]:
grader.submit(STUDENT_EMAIL, STUDENT_TOKEN)

Submitted to Coursera platform. See results on assignment page!
