# IDAO: expected time of orders in airports

Airports are special points for taxi service. Every day a lot of people use a taxi to get to the city centre from the airport.

One of important task is to predict how long a driver need to wait an order. It helps to understand what to do. Maybe the driver have to wait near doors, or can drink a tea, or even should drive to city center without an order.

We request you to solve a simple version of this prediction task.

**Task:** predict time of $k$ orders in airport (time since now when you get an order if you are $k$-th in queue), $k$ is one of 5 values (different for every airports).

**Data**
- train: number of order for every minutes for 6 months
- test: every test sample has datetime info + numer of order for every minutes for last 2 weeks

**Submission:** for every airport you should prepare a model which will be evaluated in submission system (code + model files). You can make different models for different airports.

**Evaluation:** for every airport for every $k$ sMAPE will be calculated and averaged. General leaderboard will be calculated via Borda count. 

## Baseline

In [2]:
!pip install catboost

Collecting catboost
[?25l  Downloading https://files.pythonhosted.org/packages/3e/62/b442e8d747e8a34ac8a981f7a4ff717c1f887aedb42c3f670660bda41af5/catboost-0.13.1-cp36-none-manylinux1_x86_64.whl (60.1MB)
[K    100% |████████████████████████████████| 60.1MB 868kB/s eta 0:00:01
Collecting enum34 (from catboost)
  Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
[31mfastai 1.0.50.post1 requires nvidia-ml-py3, which is not installed.[0m
[31mthinc 6.12.1 has requirement msgpack<0.6.0,>=0.5.6, but you'll have msgpack 0.6.0 which is incompatible.[0m
Installing collected packages: enum34, catboost
Successfully installed catboost-0.13.1 enum34-1.1.6
[33mYou are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m


In [1]:
%pylab inline

import catboost
import pandas as pd
import pickle
import tqdm

Populating the interactive namespace from numpy and matplotlib


Let's prepare a model for set2.

# Load train dataset

In [2]:
set_name = 'set3'
path_train_set = '../../data/train/{}.csv'.format(set_name)

data = pd.read_csv(path_train_set)
data.datetime = data.datetime.apply(
    lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
data = data.sort_values('datetime')
data.head()

Unnamed: 0,datetime,num_orders
0,2018-02-01 00:00:00,0
1,2018-02-01 00:01:00,0
2,2018-02-01 00:02:00,0
3,2018-02-01 00:03:00,0
4,2018-02-01 00:04:00,0


Predict position for set2.

In [3]:
target_positions = {
    'set1': [10, 30, 45, 60, 75],
    'set2': [5, 10, 15, 20, 25],
    'set3': [5, 7, 9, 11, 13]
}[set_name]

Some useful constant.

In [4]:
HOUR_IN_MINUTES = 60
DAY_IN_MINUTES = 24 * HOUR_IN_MINUTES
WEEK_IN_MINUTES = 7 * DAY_IN_MINUTES

MAX_TIME = DAY_IN_MINUTES

## Generate train samples with targets

We have only history of orders (count of orders in every minutes) but we need to predict time of k orders since current minutes. So we should calculate target for train set. Also we will make a lot of samples from all set (we can only use two weeks of history while prediction so we can use only two weeks in every train sample).

In [5]:
samples = {
    'datetime': [],
    'history': []}

for position in target_positions:
    samples['target_{}'.format(position)] = []
    
num_orders = data.num_orders.values

To calculate target (minutes before k orders) we are going to use cumulative sum of orders. 

In [6]:
# start after 2 weeks because of history
# finish earlier because of target calculation
for i in range(2 * WEEK_IN_MINUTES,
               len(num_orders) - 2 * DAY_IN_MINUTES):
    
    samples['datetime'].append(data.datetime[i])
    samples['history'].append(num_orders[i-2*WEEK_IN_MINUTES:i])
    
    # cumsum not for all array because of time economy
    cumsum_num_orders = num_orders[i+1:i+1+2*DAY_IN_MINUTES].cumsum()
    for position in target_positions:
        orders_by_positions = np.where(cumsum_num_orders >= position)[0]
        if len(orders_by_positions):
            time = orders_by_positions[0] + 1
        else:
            # if no orders in last days
            time = MAX_TIME
        samples['target_{}'.format(position)].append(time)

Convert to pandas.dataframe. Now we have targets to train and predict.

In [7]:
df = pd.DataFrame.from_dict(samples)
df.head()

Unnamed: 0,datetime,history,target_11,target_13,target_5,target_7,target_9
0,2018-02-15 00:00:00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",820,1167,308,418,421
1,2018-02-15 00:01:00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",819,1166,307,417,420
2,2018-02-15 00:02:00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",818,1165,306,416,419
3,2018-02-15 00:03:00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",817,1164,305,415,418
4,2018-02-15 00:04:00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",816,1163,304,414,417


# Train model

Let's generate simple features.

By time:

In [8]:
df['weekday'] = df.datetime.apply(lambda x: x.weekday())
df['hour'] = df.datetime.apply(lambda x: x.hour)
df['minute'] = df.datetime.apply(lambda x: x.minute)
df['month'] = df.datetime.apply(lambda x: x.month)
df['is_night'] = df.datetime.apply(lambda x: 1.0 if (x.hour < 6) else 0.0)
df['is_weekend'] = df.datetime.apply(lambda x: 1.0 if (x.weekday() >=5)
                              or ((x.day == 23) and (x.month == 2))
                              or ((x.day == 8) and (x.month == 3))
                              or ((x.day == 9) and (x.month == 3))
                              or ((x.day == 30) and (x.month == 4))
                              or ((x.day == 1) and (x.month == 5))
                              or ((x.day == 2) and (x.month == 5))
                              or ((x.day == 9) and (x.month == 5))
                              or ((x.day == 11) and (x.month == 6))
                              or ((x.day == 12) and (x.month == 6))
                             else 0.0)

Aggregators by order history with different shift and window size:

In [9]:
SHIFTS = [
    HOUR_IN_MINUTES // 4,
    HOUR_IN_MINUTES // 2,
    HOUR_IN_MINUTES,
    DAY_IN_MINUTES,
    DAY_IN_MINUTES * 2,
    WEEK_IN_MINUTES,
    WEEK_IN_MINUTES * 2
]
WINDOWS = [
    HOUR_IN_MINUTES // 4,
    HOUR_IN_MINUTES // 2,
    HOUR_IN_MINUTES,
    DAY_IN_MINUTES,
    DAY_IN_MINUTES * 2,
    WEEK_IN_MINUTES,
    WEEK_IN_MINUTES * 2
]

In [10]:
for shift in SHIFTS:
    for window in WINDOWS:
        if window > shift:
            continue
        df['num_orders_{}_{}'.format(shift, window)] = \
            df.history.apply(lambda x: x[-shift : -shift + window].sum())

Train/validation split for time. Let's use last 4 weeks for validation.

In [11]:
df.datetime.min(), df.datetime.max()

(Timestamp('2018-02-15 00:00:00'), Timestamp('2018-07-29 23:59:00'))

In [12]:
#df_train = df.loc[df.datetime <= df.datetime.max() - datetime.timedelta(days=28)]
#df_test = df.loc[df.datetime > df.datetime.max() - datetime.timedelta(days=28)]
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.2, shuffle=True)

In [13]:
target_cols = ['target_{}'.format(position) for position in target_positions]

y_train = df_train[target_cols]
y_test = df_test[target_cols]

X = df.drop(['datetime', 'history'] + target_cols, axis=1)
y = df[target_cols]

df_train = df_train.drop(['datetime', 'history'] + target_cols, axis=1)#[USE_COLUMNS]
df_test = df_test.drop(['datetime', 'history'] + target_cols, axis=1)#[USE_COLUMNS]

In [14]:
def sMAPE(y_true, y_predict, shift=0):
    return 2 * np.mean(
        np.abs(y_true - y_predict) /
        (np.abs(y_true) + np.abs(y_predict) + shift))

Also we will save models for prediction stage.

In [15]:
model_to_save = {
    'models': {}
}

What is good or bad model? We can compare our model with constant solution. For instance median (optimal solution for MAE).

# Catboost

## target_5

In [242]:
%%time
#v_depth=6: 0.44839473608296043
for v_depth in range(3, 12, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = v_depth,
        l2_leaf_reg = 5,
        rsm = 0.5,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
   
    model.fit(
        X=df_train,
        y=y_train['target_5'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_5']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_depth=' + str(v_depth) + ': ' + str(sMAPE(y_test['target_5'], y_predict)))

v_depth=3: 0.460680634682589
v_depth=4: 0.45152674603044374
v_depth=5: 0.44848203182099233
v_depth=6: 0.44608542589364997
v_depth=7: 0.4464060362255355
v_depth=8: 0.44864243020437133
v_depth=9: 0.4515032507420347
v_depth=10: 0.45388237178178364
v_depth=11: 0.45866318546147805
CPU times: user 19min 6s, sys: 45.3 s, total: 19min 51s
Wall time: 6min


In [244]:
%%time
for v_l2_leaf_reg in range(1, 10, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 6,
        l2_leaf_reg = v_l2_leaf_reg,
        rsm=0.5,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_5'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_5']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_l2_leaf_reg=' + str(v_l2_leaf_reg) + ': ' + str(sMAPE(y_test['target_5'], y_predict)))

v_l2_leaf_reg=1: 0.4460038774385989
v_l2_leaf_reg=2: 0.4462198790501305
v_l2_leaf_reg=3: 0.44567730233015757
v_l2_leaf_reg=4: 0.44597659927381766
v_l2_leaf_reg=5: 0.44608542589364997
v_l2_leaf_reg=6: 0.4461749446857824
v_l2_leaf_reg=7: 0.44693376006923674
v_l2_leaf_reg=8: 0.4460697445015234
v_l2_leaf_reg=9: 0.4454646089289492
CPU times: user 13min 31s, sys: 36.3 s, total: 14min 7s
Wall time: 4min 10s


In [245]:
for v_rsm in range(1, 11, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 6,
        l2_leaf_reg = 9,
        rsm = v_rsm/10.0,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_5'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_5']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_rsm=' + str(v_rsm) + ': ' + str(sMAPE(y_test['target_5'], y_predict)))

v_rsm=1: 0.4643108073421865
v_rsm=2: 0.4477828745986129
v_rsm=3: 0.447269401413902
v_rsm=4: 0.44745587650918056
v_rsm=5: 0.4454646089289492
v_rsm=6: 0.4465230783071279
v_rsm=7: 0.44763838290761915
v_rsm=8: 0.44766407408555076
v_rsm=9: 0.45004264912109027
v_rsm=10: 0.447057957580867


In [21]:
model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=0.5, 
    #1.0 - 0.38967307384144295
        depth = 6,
        l2_leaf_reg = 9,
        rsm = 0.5,
        loss_function='MAE', 
        random_seed=27)
    
model.fit(
        X=df_train,
        y=y_train['target_5'],
        #use_best_model=True,
        #eval_set=(df_test, y_test['target_5']),        
        verbose=False
    )
y_predict = model.predict(df_test)
print(sMAPE(y_test['target_5'], y_predict))

0.4329377193137914


## target_7

In [26]:
%%time
#v_depth=3: 0.3985473081070998
for v_depth in range(3, 12, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = v_depth,
        l2_leaf_reg = 8,
        rsm = 1.0,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
   
    model.fit(
        X=df_train,
        y=y_train['target_7'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_7']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_depth=' + str(v_depth) + ': ' + str(sMAPE(y_test['target_7'], y_predict)))

v_depth=3: 0.3977525434478633
v_depth=4: 0.3998949063870556
v_depth=5: 0.40255409146052007
v_depth=6: 0.4079623301515423
v_depth=7: 0.41516080750833306
v_depth=8: 0.41319905156512005
v_depth=9: 0.412136556876834
v_depth=10: 0.4122810149164714
v_depth=11: 0.4180472964692575
CPU times: user 24min 17s, sys: 1min 37s, total: 25min 54s
Wall time: 10min 34s


In [24]:
%%time
for v_l2_leaf_reg in range(1, 10, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = v_l2_leaf_reg,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_7'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_7']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_l2_leaf_reg=' + str(v_l2_leaf_reg) + ': ' + str(sMAPE(y_test['target_7'], y_predict)))

v_l2_leaf_reg=1: 0.39890650675843764
v_l2_leaf_reg=2: 0.3987426953192268
v_l2_leaf_reg=3: 0.3985473081070998
v_l2_leaf_reg=4: 0.3985917884848496
v_l2_leaf_reg=5: 0.3986607973078868
v_l2_leaf_reg=6: 0.39871254746670237
v_l2_leaf_reg=7: 0.397759857388504
v_l2_leaf_reg=8: 0.3977525434478633
v_l2_leaf_reg=9: 0.39861354702785556
CPU times: user 8min 33s, sys: 25.4 s, total: 8min 58s
Wall time: 3min 10s


In [25]:
for v_rsm in range(1, 11, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = 8,
        rsm = v_rsm/10.0,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_7'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_7']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_rsm=' + str(v_rsm) + ': ' + str(sMAPE(y_test['target_7'], y_predict)))

v_rsm=1: 0.4106787297794669
v_rsm=2: 0.4071539846190439
v_rsm=3: 0.4065234328562939
v_rsm=4: 0.406486769532713
v_rsm=5: 0.40550430517385094
v_rsm=6: 0.4041185205413228
v_rsm=7: 0.40348827376025637
v_rsm=8: 0.4034582926466291
v_rsm=9: 0.3999357855201925
v_rsm=10: 0.3977525434478633


In [23]:
model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=0.8, 
        depth = 3,
        l2_leaf_reg = 8,
        rsm = 1.0,
        loss_function='MAE', 
        random_seed=27)
    
model.fit(
        X=df_train,
        y=y_train['target_7'],
        #use_best_model=True,
        #eval_set=(df_test, y_test['target_7']),        
        verbose=False
    )
y_predict = model.predict(df_test)
print(sMAPE(y_test['target_7'], y_predict))

0.3967910164855593


## target_9

In [31]:
%%time
for v_depth in range(3, 12, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = v_depth,
        l2_leaf_reg = 8,
        rsm = 0.7,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
   
    model.fit(
        X=df_train,
        y=y_train['target_9'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_9']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_depth=' + str(v_depth) + ': ' + str(sMAPE(y_test['target_9'], y_predict)))

v_depth=3: 0.3740277740344902
v_depth=4: 0.3776172165072162
v_depth=5: 0.38671347232139125
v_depth=6: 0.39089043005669366


KeyboardInterrupt: 

In [32]:
%%time
for v_l2_leaf_reg in range(1, 10, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = v_l2_leaf_reg,
        rsm = 0.7,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_9'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_9']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_l2_leaf_reg=' + str(v_l2_leaf_reg) + ': ' + str(sMAPE(y_test['target_9'], y_predict)))

v_l2_leaf_reg=1: 0.3746704554728539
v_l2_leaf_reg=2: 0.3743424732172209
v_l2_leaf_reg=3: 0.3744214294505267
v_l2_leaf_reg=4: 0.374160110226109
v_l2_leaf_reg=5: 0.3741861488731835
v_l2_leaf_reg=6: 0.3743308231447629
v_l2_leaf_reg=7: 0.3742388555811873
v_l2_leaf_reg=8: 0.3740277740344902
v_l2_leaf_reg=9: 0.3741698191856553
CPU times: user 9min 45s, sys: 1min 3s, total: 10min 48s
Wall time: 7min 15s


In [30]:
for v_rsm in range(1, 11, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = 8,
        rsm = v_rsm/10.0,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_9'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_9']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_rsm=' + str(v_rsm) + ': ' + str(sMAPE(y_test['target_9'], y_predict)))

v_rsm=1: 0.37777806095076777
v_rsm=2: 0.3756241560175422
v_rsm=3: 0.3752517546456537
v_rsm=4: 0.3760580228481101
v_rsm=5: 0.37481731608459984
v_rsm=6: 0.3740295120049373
v_rsm=7: 0.3740277740344902
v_rsm=8: 0.37625404298214293
v_rsm=9: 0.37546540100599307
v_rsm=10: 0.3750650430847485


In [None]:
model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = 8,
        rsm = 0.7,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
model.fit(
        X=df_train,
        y=y_train['target_9'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_9']),        
        verbose=True
    )
y_predict = model.predict(df_test)
print(sMAPE(y_test['target_9'], y_predict))

## target_11

In [36]:
%%time
for v_depth in range(3, 12, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = v_depth,
        l2_leaf_reg = 2,
        rsm = 0.2,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
   
    model.fit(
        X=df_train,
        y=y_train['target_11'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_11']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_depth=' + str(v_depth) + ': ' + str(sMAPE(y_test['target_11'], y_predict)))

v_depth=3: 0.33844257766561336
v_depth=4: 0.33841699420351823
v_depth=5: 0.34063530935819164
v_depth=6: 0.34311500340426654
v_depth=7: 0.3425497825485602
v_depth=8: 0.34821809124369224
v_depth=9: 0.35180648146781146
v_depth=10: 0.3559961459335256
v_depth=11: 0.3584751999899789
CPU times: user 18min 20s, sys: 2min 22s, total: 20min 43s
Wall time: 16min 20s


In [34]:
%%time
for v_l2_leaf_reg in range(1, 10, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = v_l2_leaf_reg,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_11'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_11']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_l2_leaf_reg=' + str(v_l2_leaf_reg) + ': ' + str(sMAPE(y_test['target_11'], y_predict)))

v_l2_leaf_reg=1: 0.34703676760283864
v_l2_leaf_reg=2: 0.3467869566101697
v_l2_leaf_reg=3: 0.34712168175748354
v_l2_leaf_reg=4: 0.34866733216765383
v_l2_leaf_reg=5: 0.3487148767160265
v_l2_leaf_reg=6: 0.34858000102220027
v_l2_leaf_reg=7: 0.348326423095323
v_l2_leaf_reg=8: 0.3484539346139448
v_l2_leaf_reg=9: 0.34788443715026895
CPU times: user 11min 27s, sys: 1min 9s, total: 12min 37s
Wall time: 8min 9s


In [35]:
for v_rsm in range(1, 11, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = 2,
        rsm = v_rsm/10.0,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_11'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_11']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_rsm=' + str(v_rsm) + ': ' + str(sMAPE(y_test['target_11'], y_predict)))

v_rsm=1: 0.3400482060837549
v_rsm=2: 0.33844257766561336
v_rsm=3: 0.33975051666393824
v_rsm=4: 0.3418340612299502
v_rsm=5: 0.34213076980451446
v_rsm=6: 0.3438322036887518
v_rsm=7: 0.34466030341885145
v_rsm=8: 0.34752696020248036
v_rsm=9: 0.34870215001920307
v_rsm=10: 0.3467869566101697


In [None]:
model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 3,
        l2_leaf_reg = 2,
        rsm = 0.2,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
model.fit(
        X=df_train,
        y=y_train['target_11'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_11']),        
        verbose=True
    )
y_predict = model.predict(df_test)
print(sMAPE(y_test['target_11'], y_predict))

## target_13

In [37]:
%%time
for v_depth in range(3, 12, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = v_depth,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
   
    model.fit(
        X=df_train,
        y=y_train['target_13'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_13']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_depth=' + str(v_depth) + ': ' + str(sMAPE(y_test['target_13'], y_predict)))

v_depth=3: 0.3016743884837405
v_depth=4: 0.30142479283204876
v_depth=5: 0.3037740174206978
v_depth=6: 0.3060944390289753
v_depth=7: 0.3041974425831808
v_depth=8: 0.3047062362608426
v_depth=9: 0.3068807208061818
v_depth=10: 0.31160602280355293
v_depth=11: 0.31359399695355006
CPU times: user 30min 41s, sys: 2min 39s, total: 33min 20s
Wall time: 23min 55s


In [38]:
%%time
for v_l2_leaf_reg in range(1, 10, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 4,
        l2_leaf_reg = v_l2_leaf_reg,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_13'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_13']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_l2_leaf_reg=' + str(v_l2_leaf_reg) + ': ' + str(sMAPE(y_test['target_13'], y_predict)))

v_l2_leaf_reg=1: 0.3013483917685065
v_l2_leaf_reg=2: 0.30166558866064713
v_l2_leaf_reg=3: 0.30142479283204876
v_l2_leaf_reg=4: 0.30129643057493743
v_l2_leaf_reg=5: 0.30123388596985573
v_l2_leaf_reg=6: 0.3017310848972864
v_l2_leaf_reg=7: 0.30124657046788067
v_l2_leaf_reg=8: 0.30135031475830953
v_l2_leaf_reg=9: 0.3013984299117125
CPU times: user 15min 5s, sys: 1min 23s, total: 16min 29s
Wall time: 10min 56s


In [39]:
for v_rsm in range(1, 11, 1):
    model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 4,
        l2_leaf_reg = 5,
        rsm = v_rsm/10.0,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        X=df_train,
        y=y_train['target_13'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_13']),        
        verbose=False
    )
    y_predict = model.predict(df_test.values)
    print('v_rsm=' + str(v_rsm) + ': ' + str(sMAPE(y_test['target_13'], y_predict)))

v_rsm=1: 0.3035854288242296
v_rsm=2: 0.30209031568038675
v_rsm=3: 0.3028896969161858
v_rsm=4: 0.3031527001287216
v_rsm=5: 0.3048939097005321
v_rsm=6: 0.3031102990198186
v_rsm=7: 0.30299063903121254
v_rsm=8: 0.30311034042544377
v_rsm=9: 0.3026785327666148
v_rsm=10: 0.30123388596985573


In [None]:
model = catboost.CatBoostRegressor(
        iterations=2000, learning_rate=1.0, 
        depth = 4,
        l2_leaf_reg = 5,
        rsm = 1.0,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
model.fit(
        X=df_train,
        y=y_train['target_13'],
        use_best_model=True,
        eval_set=(df_test, y_test['target_13']),        
        verbose=True
    )
y_predict = model.predict(df_test)
print(sMAPE(y_test['target_13'], y_predict))

# Final model

In [26]:
for position in target_positions:
    model = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
        depth = 6,
        l2_leaf_reg = 9,
        rsm = 0.5,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        #X=X,
        #y=y['target_{}'.format(position)],
        X=df_train,
        y=y_train['target_{}'.format(position)],
        #use_best_model=True,
        #eval_set=(df_test, y_test['target_{}'.format(position)]),
        verbose=False)
    y_predict = model.predict(df_test)
    
    print('target_{}'.format(position))
    print('stupid:\t{}'.format(sMAPE(
        y_test['target_{}'.format(position)],
        y_train['target_{}'.format(position)].median())))
    print('model:\t{}'.format(sMAPE(
        y_test['target_{}'.format(position)],
        y_predict)))
    print()
    
    model_to_save['models'][position] = model

target_5
stupid:	0.5320582745268704
model:	0.38967307384144295

target_7
stupid:	0.47935138202803074
model:	0.34509572377927045

target_9
stupid:	0.44543648320323775
model:	0.31929153986009673

target_11
stupid:	0.4215099665002939
model:	0.29513082986460953

target_13
stupid:	0.399151538421787
model:	0.28320997788397195



In [16]:
for position in target_positions:
    model = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
        depth = 6,
        l2_leaf_reg = 9,
        rsm = 0.5,
        loss_function='MAE', 
        early_stopping_rounds = 90, random_seed=27)
    
    model.fit(
        #X=X,
        #y=y['target_{}'.format(position)],
        X=df_train,
        y=y_train['target_{}'.format(position)],
        use_best_model=True,
        eval_set=(df_test, y_test['target_{}'.format(position)]),
        verbose=False)
    y_predict = model.predict(df_test)
    
    print('target_{}'.format(position))
    print('stupid:\t{}'.format(sMAPE(
        y_test['target_{}'.format(position)],
        y_train['target_{}'.format(position)].median())))
    print('model:\t{}'.format(sMAPE(
        y_test['target_{}'.format(position)],
        y_predict)))
    print()
    
    model_to_save['models'][position] = model

target_5
stupid:	0.5320582745268704
model:	0.38967307384144295

target_7
stupid:	0.47935138202803074
model:	0.34509572377927045

target_9
stupid:	0.44543648320323775
model:	0.31929153986009673

target_11
stupid:	0.4215099665002939
model:	0.29513082986460953

target_13
stupid:	0.399151538421787
model:	0.28320997788397195



In [18]:
for position in target_positions:
    if str(position) == '5':
        print('5')
        model = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 6,
            l2_leaf_reg = 9,
            rsm = 0.5,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)
    else:
        print('not 5')

    if str(position) == '7':
        print('7')
        model = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 3,
            l2_leaf_reg = 8,
            rsm = 1.0,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)
    else:
        print('not 7')
        
    if str(position) == '9':
        print('9')
        model = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 3,
            l2_leaf_reg = 8,
            rsm = 0.7,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)
    else:
        print('not 9')
        
    if str(position) == '11':
        print('11')
        model = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 3,
            l2_leaf_reg = 2,
            rsm = 0.2,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)
    else:
        print('not 11')
        
    if str(position) == '13':
        print('13')
        model = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 4,
            l2_leaf_reg = 5,
            rsm = 1.0,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)
    else:
        print('not 13')
        
        
    model.fit(
            #X=X,
            #y=y['target_{}'.format(position)],
            X=df_train,
            y=y_train['target_{}'.format(position)],
            use_best_model=True,
            eval_set=(df_test, y_test['target_{}'.format(position)]),
            verbose=False)
    y_predict = model.predict(df_test)
    
    print('target_{}'.format(position))
    print('stupid:\t{}'.format(sMAPE(
        y_test['target_{}'.format(position)],
        y_train['target_{}'.format(position)].median())))
    print('model:\t{}'.format(sMAPE(
        y_test['target_{}'.format(position)],
        y_predict)))
    print()
    
    model_to_save['models'][position] = model

5
not 7
not 9
not 11
not 13
target_5
stupid:	0.5320582745268704
model:	0.38967307384144295

not 5
7
not 9
not 11
not 13
target_7
stupid:	0.47935138202803074
model:	0.38903923084121766

not 5
not 7
9
not 11
not 13
target_9
stupid:	0.44543648320323775
model:	0.36282038987974335

not 5
not 7
not 9
11
not 13
target_11
stupid:	0.4215099665002939
model:	0.3430122318574173

not 5
not 7
not 9
not 11
13


KeyboardInterrupt: 

In [38]:
model1 = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 6,
            l2_leaf_reg = 9,
            rsm = 0.5,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)

        
model1.fit(
            #X=X,
            #y=y['target_{}'.format(position)],
            X=df_train,
            y=y_train['target_5'],
            use_best_model=True,
            eval_set=(df_test, y_test['target_5']),
            verbose=False)
y_predict = model1.predict(df_test)
    
print('target_5')
print('stupid:\t{}'.format(sMAPE(
        y_test['target_5'],
        y_train['target_5'].median())))
print('model:\t{}'.format(sMAPE(
        y_test['target_5'],
        y_predict)))
    
model_to_save['models'][5] = model1

target_5
stupid:	0.6113491702802483
model:	0.4454646089289492


In [39]:
model2 = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 3,
            l2_leaf_reg = 8,
            rsm = 1.0,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)

        
model2.fit(
            #X=X,
            #y=y['target_{}'.format(position)],
            X=df_train,
            y=y_train['target_7'],
            use_best_model=True,
            eval_set=(df_test, y_test['target_7']),
            verbose=False)
y_predict = model2.predict(df_test)
    
print('target_7')
print('stupid:\t{}'.format(sMAPE(
        y_test['target_7'],
        y_train['target_7'].median())))
print('model:\t{}'.format(sMAPE(
        y_test['target_7'],
        y_predict)))
    
model_to_save['models'][7] = model2

target_7
stupid:	0.5761401162272479
model:	0.3977525434478633


In [40]:
model3 = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 3,
            l2_leaf_reg = 8,
            rsm = 0.7,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)

        
model3.fit(
            #X=X,
            #y=y['target_{}'.format(position)],
            X=df_train,
            y=y_train['target_9'],
            use_best_model=True,
            eval_set=(df_test, y_test['target_9']),
            verbose=False)
y_predict = model3.predict(df_test)
    
print('target_9')
print('stupid:\t{}'.format(sMAPE(
        y_test['target_9'],
        y_train['target_9'].median())))
print('model:\t{}'.format(sMAPE(
        y_test['target_9'],
        y_predict)))
    
model_to_save['models'][9] = model3

target_9
stupid:	0.5630557259108779
model:	0.3740277740344902


In [41]:
model4 = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 3,
            l2_leaf_reg = 2,
            rsm = 0.2,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)

        
model4.fit(
            #X=X,
            #y=y['target_{}'.format(position)],
            X=df_train,
            y=y_train['target_11'],
            use_best_model=True,
            eval_set=(df_test, y_test['target_11']),
            verbose=False)
y_predict = model4.predict(df_test)
    
print('target_11')
print('stupid:\t{}'.format(sMAPE(
        y_test['target_11'],
        y_train['target_11'].median())))
print('model:\t{}'.format(sMAPE(
        y_test['target_11'],
        y_predict)))
    
model_to_save['models'][11] = model4

target_11
stupid:	0.5587023576311817
model:	0.33844257766561336


In [45]:
model5 = catboost.CatBoostRegressor(iterations=2000, learning_rate=1.0, 
            depth = 4,
            l2_leaf_reg = 5,
            rsm = 1.0,
            loss_function='MAE', 
            early_stopping_rounds = 90, random_seed=27)

        
model5.fit(
            #X=X,
            #y=y['target_{}'.format(position)],
            X=df_train,
            y=y_train['target_13'],
            use_best_model=True,
            eval_set=(df_test, y_test['target_13']),
            verbose=True)
y_predict = model5.predict(df_test)
    
print('target_13')
print('stupid:\t{}'.format(sMAPE(
        y_test['target_13'],
        y_train['target_13'].median())))
print('model:\t{}'.format(sMAPE(
        y_test['target_13'],
        y_predict)))
    
model_to_save['models'][13] = model5

0:	learn: 717.5794077	test: 406.6442119	best: 406.6442119 (0)	total: 19.2ms	remaining: 38.4s
1:	learn: 717.0794662	test: 406.1442627	best: 406.1442627 (1)	total: 43.3ms	remaining: 43.3s
2:	learn: 716.5795330	test: 405.6448451	best: 405.6448451 (2)	total: 64.1ms	remaining: 42.7s
3:	learn: 716.0796872	test: 405.1452639	best: 405.1452639 (3)	total: 86.5ms	remaining: 43.2s
4:	learn: 715.5797747	test: 404.6453421	best: 404.6453421 (4)	total: 109ms	remaining: 43.4s
5:	learn: 715.0799452	test: 404.1505532	best: 404.1505532 (5)	total: 134ms	remaining: 44.7s
6:	learn: 714.5800567	test: 403.6506372	best: 403.6506372 (6)	total: 161ms	remaining: 46s
7:	learn: 714.0801618	test: 403.1507981	best: 403.1507981 (7)	total: 183ms	remaining: 45.6s
8:	learn: 713.5802845	test: 402.6510670	best: 402.6510670 (8)	total: 206ms	remaining: 45.5s
9:	learn: 713.0803482	test: 402.1513582	best: 402.1513582 (9)	total: 227ms	remaining: 45.1s
10:	learn: 712.5804184	test: 401.6608826	best: 401.6608826 (10)	total: 248ms	r

89:	learn: 673.0989285	test: 362.1920687	best: 362.1920687 (89)	total: 2.09s	remaining: 44.4s
90:	learn: 672.5993996	test: 361.6935693	best: 361.6935693 (90)	total: 2.12s	remaining: 44.4s
91:	learn: 672.0999424	test: 361.1951759	best: 361.1951759 (91)	total: 2.14s	remaining: 44.3s
92:	learn: 671.6005034	test: 360.6968237	best: 360.6968237 (92)	total: 2.16s	remaining: 44.3s
93:	learn: 671.1010820	test: 360.1984850	best: 360.1984850 (93)	total: 2.18s	remaining: 44.3s
94:	learn: 670.6016770	test: 359.7002220	best: 359.7002220 (94)	total: 2.2s	remaining: 44.2s
95:	learn: 670.1022904	test: 359.2019380	best: 359.2019380 (95)	total: 2.23s	remaining: 44.2s
96:	learn: 669.6028723	test: 358.7036920	best: 358.7036920 (96)	total: 2.25s	remaining: 44.2s
97:	learn: 669.1035141	test: 358.2054627	best: 358.2054627 (97)	total: 2.27s	remaining: 44.1s
98:	learn: 668.6041785	test: 357.7073223	best: 357.7073223 (98)	total: 2.29s	remaining: 44s
99:	learn: 668.1048300	test: 357.2091598	best: 357.2091598 (99)

178:	learn: 628.7167721	test: 318.7947389	best: 318.7947389 (178)	total: 4.16s	remaining: 42.3s
179:	learn: 628.2193276	test: 318.3142014	best: 318.3142014 (179)	total: 4.19s	remaining: 42.4s
180:	learn: 627.7219022	test: 317.9478726	best: 317.9478726 (180)	total: 4.21s	remaining: 42.3s
181:	learn: 627.2245340	test: 317.4675629	best: 317.4675629 (181)	total: 4.23s	remaining: 42.3s
182:	learn: 626.7272018	test: 316.9873465	best: 316.9873465 (182)	total: 4.26s	remaining: 42.3s
183:	learn: 626.2299275	test: 316.5072528	best: 316.5072528 (183)	total: 4.29s	remaining: 42.3s
184:	learn: 625.7327610	test: 316.0190792	best: 316.0190792 (184)	total: 4.31s	remaining: 42.3s
185:	learn: 625.2355382	test: 315.6532395	best: 315.6532395 (185)	total: 4.33s	remaining: 42.3s
186:	learn: 624.7382910	test: 315.2286415	best: 315.2286415 (186)	total: 4.36s	remaining: 42.2s
187:	learn: 624.2411395	test: 314.7377547	best: 314.7377547 (187)	total: 4.38s	remaining: 42.2s
188:	learn: 623.7438570	test: 314.246957

265:	learn: 585.6220861	test: 280.7875084	best: 280.7875084 (265)	total: 6.26s	remaining: 40.8s
266:	learn: 585.1295802	test: 280.4177533	best: 280.4177533 (266)	total: 6.28s	remaining: 40.8s
267:	learn: 584.6372012	test: 279.9832691	best: 279.9832691 (267)	total: 6.31s	remaining: 40.8s
268:	learn: 584.1450431	test: 279.6571545	best: 279.6571545 (268)	total: 6.33s	remaining: 40.7s
269:	learn: 583.6529571	test: 279.1815210	best: 279.1815210 (269)	total: 6.35s	remaining: 40.7s
270:	learn: 583.1606537	test: 278.7030406	best: 278.7030406 (270)	total: 6.37s	remaining: 40.7s
271:	learn: 582.6686630	test: 278.3355529	best: 278.3355529 (271)	total: 6.4s	remaining: 40.6s
272:	learn: 582.1767788	test: 277.9677146	best: 277.9677146 (272)	total: 6.42s	remaining: 40.6s
273:	learn: 581.6849171	test: 277.4899292	best: 277.4899292 (273)	total: 6.45s	remaining: 40.6s
274:	learn: 581.1932977	test: 277.0611813	best: 277.0611813 (274)	total: 6.47s	remaining: 40.6s
275:	learn: 580.7017917	test: 276.6953228

356:	learn: 541.3929999	test: 243.2052749	best: 243.2052749 (356)	total: 8.58s	remaining: 39.5s
357:	learn: 540.9161593	test: 242.7644548	best: 242.7644548 (357)	total: 8.6s	remaining: 39.5s
358:	learn: 540.4395220	test: 242.3243187	best: 242.3243187 (358)	total: 8.63s	remaining: 39.4s
359:	learn: 539.9633374	test: 241.8865251	best: 241.8865251 (359)	total: 8.65s	remaining: 39.4s
360:	learn: 539.4872817	test: 241.4476054	best: 241.4476054 (360)	total: 8.67s	remaining: 39.4s
361:	learn: 539.0115505	test: 241.0093306	best: 241.0093306 (361)	total: 8.7s	remaining: 39.4s
362:	learn: 538.5360961	test: 240.5716400	best: 240.5716400 (362)	total: 8.72s	remaining: 39.3s
363:	learn: 538.0609541	test: 240.1341688	best: 240.1341688 (363)	total: 8.74s	remaining: 39.3s
364:	learn: 537.5860187	test: 239.6974585	best: 239.6974585 (364)	total: 8.77s	remaining: 39.3s
365:	learn: 537.1114006	test: 239.2602108	best: 239.2602108 (365)	total: 8.79s	remaining: 39.3s
366:	learn: 536.6369939	test: 238.8246281	

442:	learn: 501.4005727	test: 207.8029698	best: 207.8029698 (442)	total: 10.7s	remaining: 37.7s
443:	learn: 500.9489815	test: 207.4196651	best: 207.4196651 (443)	total: 10.7s	remaining: 37.7s
444:	learn: 500.4978192	test: 207.0376375	best: 207.0376375 (444)	total: 10.8s	remaining: 37.6s
445:	learn: 500.0466516	test: 206.6565065	best: 206.6565065 (445)	total: 10.8s	remaining: 37.6s
446:	learn: 499.5959933	test: 206.2871857	best: 206.2871857 (446)	total: 10.8s	remaining: 37.6s
447:	learn: 499.1457173	test: 205.9071554	best: 205.9071554 (447)	total: 10.9s	remaining: 37.6s
448:	learn: 498.6958804	test: 205.5282034	best: 205.5282034 (448)	total: 10.9s	remaining: 37.6s
449:	learn: 498.2460364	test: 205.1503184	best: 205.1503184 (449)	total: 10.9s	remaining: 37.5s
450:	learn: 497.7968245	test: 204.7729624	best: 204.7729624 (450)	total: 10.9s	remaining: 37.5s
451:	learn: 497.3478656	test: 204.3989590	best: 204.3989590 (451)	total: 10.9s	remaining: 37.5s
452:	learn: 496.8992905	test: 204.030384

535:	learn: 460.8420063	test: 176.4359049	best: 176.4359049 (535)	total: 13.1s	remaining: 35.6s
536:	learn: 460.4237698	test: 176.1469497	best: 176.1469497 (536)	total: 13.1s	remaining: 35.6s
537:	learn: 460.0059463	test: 175.8590252	best: 175.8590252 (537)	total: 13.1s	remaining: 35.6s
538:	learn: 459.5880234	test: 175.6421441	best: 175.6421441 (538)	total: 13.1s	remaining: 35.6s
539:	learn: 459.1702924	test: 175.3534877	best: 175.3534877 (539)	total: 13.1s	remaining: 35.6s
540:	learn: 458.7528801	test: 175.1040757	best: 175.1040757 (540)	total: 13.2s	remaining: 35.5s
541:	learn: 458.3365628	test: 174.8199196	best: 174.8199196 (541)	total: 13.2s	remaining: 35.5s
542:	learn: 457.9199671	test: 174.5728294	best: 174.5728294 (542)	total: 13.2s	remaining: 35.5s
543:	learn: 457.5044485	test: 174.2904141	best: 174.2904141 (543)	total: 13.2s	remaining: 35.5s
544:	learn: 457.0884644	test: 174.0143953	best: 174.0143953 (544)	total: 13.3s	remaining: 35.4s
545:	learn: 456.6727617	test: 173.748416

622:	learn: 425.9356895	test: 155.4173376	best: 155.4173376 (622)	total: 15.2s	remaining: 33.6s
623:	learn: 425.5524775	test: 155.2022815	best: 155.2022815 (623)	total: 15.2s	remaining: 33.6s
624:	learn: 425.1701970	test: 155.0054822	best: 155.0054822 (624)	total: 15.3s	remaining: 33.6s
625:	learn: 424.7876609	test: 154.7966901	best: 154.7966901 (625)	total: 15.3s	remaining: 33.6s
626:	learn: 424.4056334	test: 154.5867861	best: 154.5867861 (626)	total: 15.3s	remaining: 33.5s
627:	learn: 424.0239899	test: 154.3776220	best: 154.3776220 (627)	total: 15.3s	remaining: 33.5s
628:	learn: 423.6431051	test: 154.1799149	best: 154.1799149 (628)	total: 15.4s	remaining: 33.5s
629:	learn: 423.2628560	test: 153.9808424	best: 153.9808424 (629)	total: 15.4s	remaining: 33.5s
630:	learn: 422.8830042	test: 153.7820884	best: 153.7820884 (630)	total: 15.4s	remaining: 33.4s
631:	learn: 422.5032118	test: 153.5703248	best: 153.5703248 (631)	total: 15.4s	remaining: 33.4s
632:	learn: 422.1241994	test: 153.376321

710:	learn: 393.9459258	test: 140.2989058	best: 140.2989058 (710)	total: 17.4s	remaining: 31.5s
711:	learn: 393.6028150	test: 140.1488881	best: 140.1488881 (711)	total: 17.4s	remaining: 31.5s
712:	learn: 393.2599467	test: 140.0191625	best: 140.0191625 (712)	total: 17.4s	remaining: 31.5s
713:	learn: 392.9175479	test: 139.8936269	best: 139.8936269 (713)	total: 17.5s	remaining: 31.4s
714:	learn: 392.5754251	test: 139.7581788	best: 139.7581788 (714)	total: 17.5s	remaining: 31.4s
715:	learn: 392.2340496	test: 139.6112157	best: 139.6112157 (715)	total: 17.5s	remaining: 31.4s
716:	learn: 391.8931732	test: 139.5068207	best: 139.5068207 (716)	total: 17.5s	remaining: 31.4s
717:	learn: 391.5525890	test: 139.3695308	best: 139.3695308 (717)	total: 17.6s	remaining: 31.3s
718:	learn: 391.2124163	test: 139.2441392	best: 139.2441392 (718)	total: 17.6s	remaining: 31.3s
719:	learn: 390.8727325	test: 139.1097745	best: 139.1097745 (719)	total: 17.6s	remaining: 31.3s
720:	learn: 390.5334343	test: 138.989721

800:	learn: 364.8194969	test: 129.8521647	best: 129.8521647 (800)	total: 19.5s	remaining: 29.2s
801:	learn: 364.5168347	test: 129.7386288	best: 129.7386288 (801)	total: 19.5s	remaining: 29.2s
802:	learn: 364.2142506	test: 129.6394781	best: 129.6394781 (802)	total: 19.5s	remaining: 29.1s
803:	learn: 363.9120297	test: 129.5407885	best: 129.5407885 (803)	total: 19.6s	remaining: 29.1s
804:	learn: 363.6102607	test: 129.4425833	best: 129.4425833 (804)	total: 19.6s	remaining: 29.1s
805:	learn: 363.3101953	test: 129.3472986	best: 129.3472986 (805)	total: 19.6s	remaining: 29.1s
806:	learn: 363.0085719	test: 129.2597496	best: 129.2597496 (806)	total: 19.6s	remaining: 29s
807:	learn: 362.7076356	test: 129.1726679	best: 129.1726679 (807)	total: 19.7s	remaining: 29s
808:	learn: 362.4088433	test: 129.0788577	best: 129.0788577 (808)	total: 19.7s	remaining: 29s
809:	learn: 362.1086910	test: 128.9922072	best: 128.9922072 (809)	total: 19.7s	remaining: 28.9s
810:	learn: 361.8100961	test: 128.9068720	best

890:	learn: 339.3758754	test: 122.9496334	best: 122.9496334 (890)	total: 21.6s	remaining: 26.8s
891:	learn: 339.1132968	test: 122.8922938	best: 122.8922938 (891)	total: 21.6s	remaining: 26.8s
892:	learn: 338.8491287	test: 122.8429400	best: 122.8429400 (892)	total: 21.6s	remaining: 26.8s
893:	learn: 338.5874175	test: 122.7873171	best: 122.7873171 (893)	total: 21.6s	remaining: 26.8s
894:	learn: 338.3262160	test: 122.7321753	best: 122.7321753 (894)	total: 21.7s	remaining: 26.7s
895:	learn: 338.0654044	test: 122.6774734	best: 122.6774734 (895)	total: 21.7s	remaining: 26.7s
896:	learn: 337.8049651	test: 122.6231330	best: 122.6231330 (896)	total: 21.7s	remaining: 26.7s
897:	learn: 337.5450390	test: 122.5681416	best: 122.5681416 (897)	total: 21.7s	remaining: 26.7s
898:	learn: 337.2854434	test: 122.5145149	best: 122.5145149 (898)	total: 21.8s	remaining: 26.6s
899:	learn: 337.0263251	test: 122.4612501	best: 122.4612501 (899)	total: 21.8s	remaining: 26.6s
900:	learn: 336.7675817	test: 122.408345

980:	learn: 317.3873950	test: 119.2795503	best: 119.2795503 (980)	total: 23.7s	remaining: 24.6s
981:	learn: 317.1624356	test: 119.2624985	best: 119.2624985 (981)	total: 23.7s	remaining: 24.6s
982:	learn: 316.9358394	test: 119.2312663	best: 119.2312663 (982)	total: 23.7s	remaining: 24.6s
983:	learn: 316.7117144	test: 119.2146725	best: 119.2146725 (983)	total: 23.8s	remaining: 24.5s
984:	learn: 316.4876212	test: 119.1861927	best: 119.1861927 (984)	total: 23.8s	remaining: 24.5s
985:	learn: 316.2639998	test: 119.1579760	best: 119.1579760 (985)	total: 23.8s	remaining: 24.5s
986:	learn: 316.0389346	test: 119.1369052	best: 119.1369052 (986)	total: 23.8s	remaining: 24.5s
987:	learn: 315.8144773	test: 119.1070449	best: 119.1070449 (987)	total: 23.9s	remaining: 24.4s
988:	learn: 315.5905010	test: 119.0774692	best: 119.0774692 (988)	total: 23.9s	remaining: 24.4s
989:	learn: 315.3689087	test: 119.0574528	best: 119.0574528 (989)	total: 23.9s	remaining: 24.4s
990:	learn: 315.1472900	test: 119.030399

1072:	learn: 298.3344909	test: 117.8789463	best: 117.8789463 (1072)	total: 25.8s	remaining: 22.3s
1073:	learn: 298.1458791	test: 117.8765113	best: 117.8765113 (1073)	total: 25.8s	remaining: 22.2s
1074:	learn: 297.9560462	test: 117.8673012	best: 117.8673012 (1074)	total: 25.8s	remaining: 22.2s
1075:	learn: 297.7682460	test: 117.8612115	best: 117.8612115 (1075)	total: 25.8s	remaining: 22.2s
1076:	learn: 297.5809435	test: 117.8517300	best: 117.8517300 (1076)	total: 25.8s	remaining: 22.1s
1077:	learn: 297.3943240	test: 117.8456224	best: 117.8456224 (1077)	total: 25.9s	remaining: 22.1s
1078:	learn: 297.2073224	test: 117.8437455	best: 117.8437455 (1078)	total: 25.9s	remaining: 22.1s
1079:	learn: 297.0192431	test: 117.8354893	best: 117.8354893 (1079)	total: 25.9s	remaining: 22.1s
1080:	learn: 296.8338156	test: 117.8365395	best: 117.8354893 (1079)	total: 25.9s	remaining: 22s
1081:	learn: 296.6485479	test: 117.8228415	best: 117.8228415 (1081)	total: 26s	remaining: 22s
1082:	learn: 296.4613869	t

1162:	learn: 282.6501935	test: 117.7104787	best: 117.6904238 (1143)	total: 27.8s	remaining: 20s
1163:	learn: 282.4885180	test: 117.7084377	best: 117.6904238 (1143)	total: 27.8s	remaining: 20s
1164:	learn: 282.3272873	test: 117.7066303	best: 117.6904238 (1143)	total: 27.8s	remaining: 20s
1165:	learn: 282.1689477	test: 117.7117290	best: 117.6904238 (1143)	total: 27.9s	remaining: 19.9s
1166:	learn: 282.0091719	test: 117.7175444	best: 117.6904238 (1143)	total: 27.9s	remaining: 19.9s
1167:	learn: 281.8488486	test: 117.7161316	best: 117.6904238 (1143)	total: 27.9s	remaining: 19.9s
1168:	learn: 281.6915776	test: 117.7183838	best: 117.6904238 (1143)	total: 27.9s	remaining: 19.9s
1169:	learn: 281.5317149	test: 117.7169316	best: 117.6904238 (1143)	total: 28s	remaining: 19.8s
1170:	learn: 281.3727407	test: 117.7120942	best: 117.6904238 (1143)	total: 28s	remaining: 19.8s
1171:	learn: 281.2141973	test: 117.7123420	best: 117.6904238 (1143)	total: 28s	remaining: 19.8s
1172:	learn: 281.0579893	test: 1

Our model is better than constant solution. Saving model.

In [27]:
pickle.dump(model_to_save, open('models.pkl', 'wb'))