### Problem Statement
This is a private hackathon whose primary purpose is for the members of the DSN Ai+ Club Ibadan to apply what they have learnt. If you are part of Ai+ Club Ibadan contact the club leader for the secret code.

You may have seen recent news articles stating that air quality has improved due to COVID-19. This is true for some locations, but as always the truth is a little more complicated. In parts of many African cities, air quality seems to be getting worse as more people stay at home. For this challenge we’ll be digging deeper into the data, finding ways to track air quality and how it is changing, even in places without ground-based sensors. This information will be especially useful in the face of the current crisis, since poor air quality makes a respiratory disease like COVID-19 more dangerous.

We’ve collected weather data and daily observations from the Sentinel 5P satellite tracking various pollutants in the atmosphere. Your goal is to use this information to predict PM2.5 particulate matter concentration (a common measure of air quality that normally requires ground-based sensors to measure) every day for each city. The data covers the last three months, spanning hundreds of cities across the globe.

#### Evaluation Metric
The error metric for this competition is the Root Mean Squared Error

Submissions should follow the sample submission format, with ‘Place_ID X Date’ in one column and predictions for ‘target’ in the other.

Place_ID X Date        target
0OS9LVX X 2020-01-02     2
0OS9LVX X 2020-01-03     91
0OS9LVX X 2020-01-04     34

#### Hackathon link
https://zindi.africa/hackathons/urban-air-pollution-hackathon

In [1]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
import lightgbm as lgb
import catboost as cat
from sklearn.metrics import mean_absolute_error,mean_squared_error
from sklearn.model_selection import KFold,train_test_split,StratifiedKFold
from sklearn.preprocessing import PolynomialFeatures
%matplotlib inline

In [2]:
def freq_enc(df,cols):
    for col in cols:
        df[col] = df[col].map(df[col].value_counts().to_dict())/len(df)
    return df

def rmse(y_true,y_hat):
    return np.sqrt(mean_squared_error(y_true,y_hat))

In [3]:
train = pd.read_csv('Train.csv',parse_dates=['Date'])
test = pd.read_csv('Test.csv',parse_dates=['Date'])
submission = pd.read_csv('SampleSubmission.csv')

In [4]:
print(train.shape)
train.head()

(30557, 82)


Unnamed: 0,Place_ID X Date,Date,Place_ID,target,target_min,target_max,target_variance,target_count,precipitable_water_entire_atmosphere,relative_humidity_2m_above_ground,...,L3_SO2_sensor_zenith_angle,L3_SO2_solar_azimuth_angle,L3_SO2_solar_zenith_angle,L3_CH4_CH4_column_volume_mixing_ratio_dry_air,L3_CH4_aerosol_height,L3_CH4_aerosol_optical_depth,L3_CH4_sensor_azimuth_angle,L3_CH4_sensor_zenith_angle,L3_CH4_solar_azimuth_angle,L3_CH4_solar_zenith_angle
0,010Q650 X 2020-01-02,2020-01-02,010Q650,38.0,23.0,53.0,769.5,92,11.0,60.200001,...,38.593017,-61.752587,22.363665,1793.793579,3227.855469,0.010579,74.481049,37.501499,-62.142639,22.545118
1,010Q650 X 2020-01-03,2020-01-03,010Q650,39.0,25.0,63.0,1319.85,91,14.6,48.799999,...,59.624912,-67.693509,28.614804,1789.960449,3384.226562,0.015104,75.630043,55.657486,-53.868134,19.293652
2,010Q650 X 2020-01-04,2020-01-04,010Q650,24.0,8.0,56.0,1181.96,96,16.4,33.400002,...,49.839714,-78.342701,34.296977,,,,,,,
3,010Q650 X 2020-01-05,2020-01-05,010Q650,49.0,10.0,55.0,1113.67,96,6.911948,21.300001,...,29.181258,-73.896588,30.545446,,,,,,,
4,010Q650 X 2020-01-06,2020-01-06,010Q650,21.0,9.0,52.0,1164.82,95,13.900001,44.700001,...,0.797294,-68.61248,26.899694,,,,,,,


In [5]:
train.drop(['target_min','target_max','target_variance','target_count'],axis=1,inplace=True)

#### Creating date time features

In [6]:
train['Place_ID_freq'] = train['Place_ID'].map(train['Place_ID'].value_counts().to_dict())/len(train)
test['Place_ID_freq'] = test['Place_ID'].map(test['Place_ID'].value_counts().to_dict())/len(train)

In [7]:
def create_date_featues(df,col):

    df['Year'] = df[col].dt.year

    df['Month'] = df[col].dt.month

    df['Day'] = df[col].dt.day

    df['Dayofweek'] = df[col].dt.dayofweek

    df['DayOfyear'] = df[col].dt.dayofyear

    df['Week'] = df[col].dt.week

    df['Quarter'] = df[col].dt.quarter 

    df['Is_month_start'] = df[col].dt.is_month_start

    df['Is_month_end'] = df[col].dt.is_month_end

    df['Is_weekend'] = np.where(df['Dayofweek'].isin([5,6]),1,0)

    return df

#### Dropping columns with >= 70% missing values

In [8]:
train.drop(train.isna().sum()[train.isna().sum()/len(train) > 0.7].index,axis=1,inplace=True)
test.drop(test.isna().sum()[test.isna().sum()/len(test) > 0.7].index,axis=1,inplace=True)

In [9]:
train.columns[:10]

Index(['Place_ID X Date', 'Date', 'Place_ID', 'target',
       'precipitable_water_entire_atmosphere',
       'relative_humidity_2m_above_ground',
       'specific_humidity_2m_above_ground', 'temperature_2m_above_ground',
       'u_component_of_wind_10m_above_ground',
       'v_component_of_wind_10m_above_ground'],
      dtype='object')

#### Creating some Lag features over each day

In [10]:
ntrain = train.shape[0]
df = pd.concat([train, test]).reset_index(drop=True)

In [11]:
features = [c for c in train.columns if c not in ['Place_ID X Date', 'Date', 'Place_ID', 'target','Place_ID_freq']]

In [12]:
for i in range(1, 8):
    df[f'prev_target_{i}'] = df.sort_values(by='Date')['target'].fillna(method='ffill').shift(i).sort_index()
    df[f'next_target_{i}'] = df.sort_values(by='Date')['target'].fillna(method='bfill').shift(-i).sort_index()

In [13]:
for i in range(1, 8):
    df[f'prev_magic_feat_expanding_{i}'] = df.sort_values(by='Date')['target'].shift(i).expanding().mean().fillna(method='ffill').sort_index()
    df[f'next_magic_feat_expanding_{i}'] = df.sort_values(by='Date')['target'].shift(-i).expanding().mean().fillna(method='bfill').sort_index()


In [14]:
df = create_date_featues(df,'Date')

In [15]:
train = df[:ntrain]
test = df[ntrain:]

#### Generating new features for all original features based on place_ID

In [16]:
for col in features:
    train[f"Place_ID_{col}_mean"] = train['Place_ID'].map(train.groupby('Place_ID')[col].mean())
    train[f"Place_ID_{col}_var"] = train['Place_ID'].map(train.groupby('Place_ID')[col].var())
    
    test[f"Place_ID_{col}_mean"] = test['Place_ID'].map(test.groupby('Place_ID')[col].mean())
    test[f"Place_ID_{col}_var"] = test['Place_ID'].map(test.groupby('Place_ID')[col].var())

In [17]:
to_drop = ['Place_ID X Date', 'Date', 'Place_ID', 'target']
y = train['target']
X = train.drop(to_drop,axis=1)
test.drop(to_drop,axis=1,inplace=True)

In [18]:
X.shape,test.shape,y.shape

((30557, 240), (16136, 240), (30557,))

#### Modelling part
I used catboost with 10 CV folds, i also optimize MAE instead of RMSE in order to cater for outliers.<br>
Since MAE is more robust to outliers when compared with RMSE. 

In [21]:
model = cat.CatBoostRegressor(n_estimators=5000,max_depth=6,eval_metric='MAE',reg_lambda=45)
#MAE,6

In [22]:
kf = KFold(n_splits=10,random_state=1900,shuffle=False)

In [23]:
scores = []
pred_test = np.zeros(len(test))
for fold,(train_index,test_index) in enumerate(kf.split(X,y)):
    X_train,X_test = X.iloc[train_index],X.iloc[test_index]
    y_train,y_test = y.iloc[train_index],y.iloc[test_index]
    model.fit(X_train,y_train,eval_set=[(X_train,y_train),(X_test,y_test)],early_stopping_rounds=300,verbose=50,use_best_model=True)
    scores.append(rmse(y_test,model.predict(X_test)))
    pred_test += model.predict(test)

0:	learn: 34.8187085	test: 34.8187085	test1: 30.9392893	best: 30.9392893 (0)	total: 224ms	remaining: 18m 38s
50:	learn: 23.0431923	test: 23.0431923	test1: 22.3838789	best: 22.3838789 (50)	total: 3.9s	remaining: 6m 18s
100:	learn: 19.7354064	test: 19.7354064	test1: 20.5737422	best: 20.5522196 (96)	total: 7.48s	remaining: 6m 2s
150:	learn: 18.2654730	test: 18.2654730	test1: 20.0231401	best: 20.0231401 (150)	total: 10.9s	remaining: 5m 51s
200:	learn: 17.3921496	test: 17.3921496	test1: 19.8060511	best: 19.8054328 (199)	total: 14.4s	remaining: 5m 44s
250:	learn: 16.8335239	test: 16.8335239	test1: 19.6423192	best: 19.6423192 (250)	total: 17.9s	remaining: 5m 38s
300:	learn: 16.4394079	test: 16.4394079	test1: 19.4588282	best: 19.4588282 (300)	total: 21.4s	remaining: 5m 33s
350:	learn: 16.1429105	test: 16.1429105	test1: 19.3569341	best: 19.3442898 (347)	total: 24.8s	remaining: 5m 28s
400:	learn: 15.8915167	test: 15.8915167	test1: 19.2770317	best: 19.2770317 (400)	total: 28.2s	remaining: 5m 23s


750:	learn: 14.2082865	test: 14.2082865	test1: 21.2861327	best: 21.2861327 (750)	total: 52.2s	remaining: 4m 55s
800:	learn: 14.0829911	test: 14.0829911	test1: 21.2158648	best: 21.2109771 (797)	total: 55.6s	remaining: 4m 51s
850:	learn: 13.9458329	test: 13.9458329	test1: 21.1649925	best: 21.1649925 (850)	total: 59s	remaining: 4m 47s
900:	learn: 13.8138019	test: 13.8138019	test1: 21.1259594	best: 21.1194826 (893)	total: 1m 2s	remaining: 4m 44s
950:	learn: 13.6807829	test: 13.6807829	test1: 21.0608267	best: 21.0608267 (950)	total: 1m 5s	remaining: 4m 40s
1000:	learn: 13.5649070	test: 13.5649070	test1: 21.0190285	best: 21.0137665 (997)	total: 1m 9s	remaining: 4m 36s
1050:	learn: 13.4624701	test: 13.4624701	test1: 20.9714485	best: 20.9700694 (1046)	total: 1m 12s	remaining: 4m 33s
1100:	learn: 13.3636229	test: 13.3636229	test1: 20.9455860	best: 20.9455860 (1100)	total: 1m 16s	remaining: 4m 29s
1150:	learn: 13.2723598	test: 13.2723598	test1: 20.9113374	best: 20.9107994 (1149)	total: 1m 19s	re

4350:	learn: 10.3998465	test: 10.3998465	test1: 20.0696036	best: 20.0696036 (4350)	total: 4m 58s	remaining: 44.5s
4400:	learn: 10.3716885	test: 10.3716885	test1: 20.0635501	best: 20.0635501 (4400)	total: 5m 1s	remaining: 41s
4450:	learn: 10.3386446	test: 10.3386446	test1: 20.0528122	best: 20.0524913 (4447)	total: 5m 4s	remaining: 37.6s
4500:	learn: 10.3119920	test: 10.3119920	test1: 20.0521483	best: 20.0520155 (4494)	total: 5m 8s	remaining: 34.2s
4550:	learn: 10.2793594	test: 10.2793594	test1: 20.0485527	best: 20.0466824 (4523)	total: 5m 11s	remaining: 30.8s
4600:	learn: 10.2457805	test: 10.2457805	test1: 20.0456987	best: 20.0414732 (4578)	total: 5m 15s	remaining: 27.4s
4650:	learn: 10.2136580	test: 10.2136580	test1: 20.0464935	best: 20.0414732 (4578)	total: 5m 18s	remaining: 23.9s
4700:	learn: 10.1805255	test: 10.1805255	test1: 20.0402102	best: 20.0381390 (4679)	total: 5m 22s	remaining: 20.5s
4750:	learn: 10.1522268	test: 10.1522268	test1: 20.0265788	best: 20.0265788 (4750)	total: 5m 

2900:	learn: 11.7150246	test: 11.7150246	test1: 20.2952501	best: 20.2887679 (2876)	total: 3m 24s	remaining: 2m 28s
2950:	learn: 11.6785391	test: 11.6785391	test1: 20.2834149	best: 20.2834149 (2950)	total: 3m 28s	remaining: 2m 24s
3000:	learn: 11.6445463	test: 11.6445463	test1: 20.2790810	best: 20.2773033 (2996)	total: 3m 31s	remaining: 2m 21s
3050:	learn: 11.6078543	test: 11.6078543	test1: 20.2781410	best: 20.2760117 (3042)	total: 3m 35s	remaining: 2m 17s
3100:	learn: 11.5707129	test: 11.5707129	test1: 20.2671067	best: 20.2643512 (3078)	total: 3m 38s	remaining: 2m 13s
3150:	learn: 11.5450938	test: 11.5450938	test1: 20.2662744	best: 20.2638850 (3146)	total: 3m 42s	remaining: 2m 10s
3200:	learn: 11.5146307	test: 11.5146307	test1: 20.2636307	best: 20.2593155 (3164)	total: 3m 45s	remaining: 2m 6s
3250:	learn: 11.4785495	test: 11.4785495	test1: 20.2562795	best: 20.2546247 (3235)	total: 3m 49s	remaining: 2m 3s
3300:	learn: 11.4427746	test: 11.4427746	test1: 20.2382382	best: 20.2372460 (3297)

1450:	learn: 13.4365010	test: 13.4365010	test1: 17.7449593	best: 17.7409277 (1448)	total: 1m 53s	remaining: 4m 37s
1500:	learn: 13.3705655	test: 13.3705655	test1: 17.7500224	best: 17.7409277 (1448)	total: 1m 56s	remaining: 4m 32s
1550:	learn: 13.2949782	test: 13.2949782	test1: 17.7407459	best: 17.7313832 (1522)	total: 2m	remaining: 4m 27s
1600:	learn: 13.2254540	test: 13.2254540	test1: 17.7348549	best: 17.7313832 (1522)	total: 2m 3s	remaining: 4m 22s
1650:	learn: 13.1554014	test: 13.1554014	test1: 17.7182787	best: 17.7158902 (1644)	total: 2m 7s	remaining: 4m 18s
1700:	learn: 13.0836609	test: 13.0836609	test1: 17.7044363	best: 17.7044363 (1700)	total: 2m 10s	remaining: 4m 13s
1750:	learn: 13.0045534	test: 13.0045534	test1: 17.6668492	best: 17.6668492 (1750)	total: 2m 14s	remaining: 4m 8s
1800:	learn: 12.9282247	test: 12.9282247	test1: 17.6447344	best: 17.6431819 (1788)	total: 2m 17s	remaining: 4m 4s
1850:	learn: 12.8656983	test: 12.8656983	test1: 17.6280424	best: 17.6279613 (1849)	total

0:	learn: 34.6006459	test: 34.6006459	test1: 31.9165141	best: 31.9165141 (0)	total: 110ms	remaining: 9m 11s
50:	learn: 22.9085202	test: 22.9085202	test1: 22.8895935	best: 22.8895935 (50)	total: 3.81s	remaining: 6m 9s
100:	learn: 19.7464032	test: 19.7464032	test1: 20.7230157	best: 20.7230157 (100)	total: 7.41s	remaining: 5m 59s
150:	learn: 18.3780501	test: 18.3780501	test1: 19.9335182	best: 19.9335182 (150)	total: 10.9s	remaining: 5m 49s
200:	learn: 17.5636007	test: 17.5636007	test1: 19.2904286	best: 19.2904286 (200)	total: 14.5s	remaining: 5m 45s
250:	learn: 16.9778519	test: 16.9778519	test1: 18.9050764	best: 18.9050764 (250)	total: 18s	remaining: 5m 40s
300:	learn: 16.5555060	test: 16.5555060	test1: 18.6858772	best: 18.6858772 (300)	total: 21.5s	remaining: 5m 35s
350:	learn: 16.2402946	test: 16.2402946	test1: 18.4718840	best: 18.4718840 (350)	total: 25s	remaining: 5m 31s
400:	learn: 16.0159617	test: 16.0159617	test1: 18.3043742	best: 18.3043742 (400)	total: 28.5s	remaining: 5m 27s
450

3650:	learn: 11.5682148	test: 11.5682148	test1: 16.4964732	best: 16.4941678 (3631)	total: 4m 16s	remaining: 1m 34s
3700:	learn: 11.5394182	test: 11.5394182	test1: 16.4963585	best: 16.4917388 (3684)	total: 4m 19s	remaining: 1m 31s
3750:	learn: 11.5087075	test: 11.5087075	test1: 16.4909971	best: 16.4909971 (3750)	total: 4m 23s	remaining: 1m 27s
3800:	learn: 11.4779047	test: 11.4779047	test1: 16.4884929	best: 16.4833881 (3771)	total: 4m 26s	remaining: 1m 24s
3850:	learn: 11.4456998	test: 11.4456998	test1: 16.4943527	best: 16.4833881 (3771)	total: 4m 30s	remaining: 1m 20s
3900:	learn: 11.4163532	test: 11.4163532	test1: 16.4916349	best: 16.4833881 (3771)	total: 4m 33s	remaining: 1m 17s
3950:	learn: 11.3823280	test: 11.3823280	test1: 16.4946069	best: 16.4833881 (3771)	total: 4m 37s	remaining: 1m 13s
4000:	learn: 11.3504573	test: 11.3504573	test1: 16.4837340	best: 16.4828016 (3996)	total: 4m 40s	remaining: 1m 10s
4050:	learn: 11.3234599	test: 11.3234599	test1: 16.4854129	best: 16.4828016 (399

2200:	learn: 12.6742039	test: 12.6742039	test1: 18.6145404	best: 18.6144994 (2199)	total: 2m 45s	remaining: 3m 30s
2250:	learn: 12.6202987	test: 12.6202987	test1: 18.6031011	best: 18.5999323 (2233)	total: 2m 48s	remaining: 3m 26s
2300:	learn: 12.5881169	test: 12.5881169	test1: 18.5942155	best: 18.5935734 (2294)	total: 2m 52s	remaining: 3m 22s
2350:	learn: 12.5587466	test: 12.5587466	test1: 18.5825196	best: 18.5822783 (2345)	total: 2m 56s	remaining: 3m 18s
2400:	learn: 12.5241733	test: 12.5241733	test1: 18.5739694	best: 18.5724423 (2397)	total: 2m 59s	remaining: 3m 14s
2450:	learn: 12.4870042	test: 12.4870042	test1: 18.5593903	best: 18.5592960 (2448)	total: 3m 3s	remaining: 3m 10s
2500:	learn: 12.4467651	test: 12.4467651	test1: 18.5367703	best: 18.5364853 (2499)	total: 3m 7s	remaining: 3m 7s
2550:	learn: 12.4040200	test: 12.4040200	test1: 18.5255001	best: 18.5252662 (2548)	total: 3m 11s	remaining: 3m 3s
2600:	learn: 12.3647253	test: 12.3647253	test1: 18.5097888	best: 18.5058384 (2596)	t

750:	learn: 14.5465261	test: 14.5465261	test1: 21.6807412	best: 21.6807412 (750)	total: 55s	remaining: 5m 11s
800:	learn: 14.4195204	test: 14.4195204	test1: 21.5853983	best: 21.5853983 (800)	total: 58.7s	remaining: 5m 7s
850:	learn: 14.2799888	test: 14.2799888	test1: 21.4851949	best: 21.4803840 (847)	total: 1m 2s	remaining: 5m 3s
900:	learn: 14.1580995	test: 14.1580995	test1: 21.3846716	best: 21.3846716 (900)	total: 1m 5s	remaining: 4m 59s
950:	learn: 14.0272859	test: 14.0272859	test1: 21.2869865	best: 21.2782266 (947)	total: 1m 9s	remaining: 4m 56s
1000:	learn: 13.9331551	test: 13.9331551	test1: 21.2230892	best: 21.2226694 (998)	total: 1m 13s	remaining: 4m 53s
1050:	learn: 13.8363167	test: 13.8363167	test1: 21.1619932	best: 21.1589437 (1046)	total: 1m 17s	remaining: 4m 49s
1100:	learn: 13.7719218	test: 13.7719218	test1: 21.1072919	best: 21.1072919 (1100)	total: 1m 20s	remaining: 4m 45s
1150:	learn: 13.6934923	test: 13.6934923	test1: 21.0636140	best: 21.0636140 (1150)	total: 1m 24s	rem

4350:	learn: 11.0871649	test: 11.0871649	test1: 20.0937869	best: 20.0928346 (4339)	total: 5m 14s	remaining: 46.8s
4400:	learn: 11.0584341	test: 11.0584341	test1: 20.0994822	best: 20.0927824 (4362)	total: 5m 17s	remaining: 43.2s
4450:	learn: 11.0250463	test: 11.0250463	test1: 20.1085118	best: 20.0927824 (4362)	total: 5m 21s	remaining: 39.6s
4500:	learn: 10.9952746	test: 10.9952746	test1: 20.0998894	best: 20.0927824 (4362)	total: 5m 24s	remaining: 36s
4550:	learn: 10.9694199	test: 10.9694199	test1: 20.0888470	best: 20.0887193 (4549)	total: 5m 28s	remaining: 32.4s
4600:	learn: 10.9434199	test: 10.9434199	test1: 20.0789249	best: 20.0781962 (4590)	total: 5m 32s	remaining: 28.8s
4650:	learn: 10.9201592	test: 10.9201592	test1: 20.0690297	best: 20.0671434 (4638)	total: 5m 35s	remaining: 25.2s
4700:	learn: 10.8907759	test: 10.8907759	test1: 20.0675846	best: 20.0636900 (4685)	total: 5m 39s	remaining: 21.6s
4750:	learn: 10.8643047	test: 10.8643047	test1: 20.0630656	best: 20.0624764 (4732)	total: 

2900:	learn: 11.7216828	test: 11.7216828	test1: 24.0584124	best: 24.0584124 (2900)	total: 3m 31s	remaining: 2m 33s
2950:	learn: 11.6874324	test: 11.6874324	test1: 24.0526066	best: 24.0497419 (2938)	total: 3m 35s	remaining: 2m 29s
3000:	learn: 11.6413569	test: 11.6413569	test1: 24.0418439	best: 24.0416595 (2998)	total: 3m 38s	remaining: 2m 25s
3050:	learn: 11.6004838	test: 11.6004838	test1: 24.0370430	best: 24.0370430 (3050)	total: 3m 42s	remaining: 2m 21s
3100:	learn: 11.5626312	test: 11.5626312	test1: 24.0232769	best: 24.0232769 (3100)	total: 3m 45s	remaining: 2m 18s
3150:	learn: 11.5246409	test: 11.5246409	test1: 24.0096230	best: 24.0096230 (3150)	total: 3m 48s	remaining: 2m 14s
3200:	learn: 11.4928239	test: 11.4928239	test1: 23.9961522	best: 23.9961522 (3200)	total: 3m 52s	remaining: 2m 10s
3250:	learn: 11.4541207	test: 11.4541207	test1: 23.9958170	best: 23.9916633 (3235)	total: 3m 55s	remaining: 2m 6s
3300:	learn: 11.4228682	test: 11.4228682	test1: 23.9754996	best: 23.9749377 (3298

1450:	learn: 13.1960847	test: 13.1960847	test1: 20.2837859	best: 20.2837859 (1450)	total: 1m 41s	remaining: 4m 8s
1500:	learn: 13.1305583	test: 13.1305583	test1: 20.2503650	best: 20.2503650 (1500)	total: 1m 45s	remaining: 4m 5s
1550:	learn: 13.0573637	test: 13.0573637	test1: 20.2268073	best: 20.2268073 (1550)	total: 1m 48s	remaining: 4m 1s
1600:	learn: 12.9977101	test: 12.9977101	test1: 20.2124176	best: 20.2120305 (1599)	total: 1m 52s	remaining: 3m 58s
1650:	learn: 12.9408154	test: 12.9408154	test1: 20.1766211	best: 20.1762700 (1645)	total: 1m 55s	remaining: 3m 54s
1700:	learn: 12.8830781	test: 12.8830781	test1: 20.1635394	best: 20.1628081 (1699)	total: 1m 59s	remaining: 3m 50s
1750:	learn: 12.8190046	test: 12.8190046	test1: 20.1505361	best: 20.1505361 (1750)	total: 2m 2s	remaining: 3m 47s
1800:	learn: 12.7547092	test: 12.7547092	test1: 20.1303764	best: 20.1303764 (1800)	total: 2m 6s	remaining: 3m 44s
1850:	learn: 12.7051801	test: 12.7051801	test1: 20.1148511	best: 20.1146119 (1849)	to

0:	learn: 34.6441341	test: 34.6441341	test1: 32.4473777	best: 32.4473777 (0)	total: 99.6ms	remaining: 8m 17s
50:	learn: 22.8611317	test: 22.8611317	test1: 23.7429123	best: 23.7429123 (50)	total: 3.9s	remaining: 6m 18s
100:	learn: 19.5689613	test: 19.5689613	test1: 21.5586248	best: 21.5586248 (100)	total: 7.51s	remaining: 6m 4s
150:	learn: 18.1092005	test: 18.1092005	test1: 20.8093893	best: 20.8093893 (150)	total: 11s	remaining: 5m 54s
200:	learn: 17.2557638	test: 17.2557638	test1: 20.3942316	best: 20.3901731 (198)	total: 14.5s	remaining: 5m 46s
250:	learn: 16.6306318	test: 16.6306318	test1: 20.0510280	best: 20.0510280 (250)	total: 18s	remaining: 5m 40s
300:	learn: 16.2134139	test: 16.2134139	test1: 19.8287487	best: 19.8287487 (300)	total: 21.5s	remaining: 5m 35s
350:	learn: 15.8728142	test: 15.8728142	test1: 19.6378193	best: 19.6378193 (350)	total: 24.9s	remaining: 5m 30s
400:	learn: 15.6607864	test: 15.6607864	test1: 19.5149669	best: 19.5149669 (400)	total: 28.4s	remaining: 5m 25s
450

3650:	learn: 11.3757019	test: 11.3757019	test1: 17.8977737	best: 17.8975853 (3649)	total: 4m 13s	remaining: 1m 33s
3700:	learn: 11.3519650	test: 11.3519650	test1: 17.8940651	best: 17.8915895 (3669)	total: 4m 16s	remaining: 1m 30s
3750:	learn: 11.3248920	test: 11.3248920	test1: 17.8905583	best: 17.8902052 (3747)	total: 4m 20s	remaining: 1m 26s
3800:	learn: 11.3021034	test: 11.3021034	test1: 17.8873758	best: 17.8868868 (3757)	total: 4m 23s	remaining: 1m 23s
3850:	learn: 11.2808639	test: 11.2808639	test1: 17.8812802	best: 17.8812802 (3850)	total: 4m 27s	remaining: 1m 19s
3900:	learn: 11.2586743	test: 11.2586743	test1: 17.8749924	best: 17.8749131 (3898)	total: 4m 31s	remaining: 1m 16s
3950:	learn: 11.2407061	test: 11.2407061	test1: 17.8687469	best: 17.8676000 (3946)	total: 4m 34s	remaining: 1m 12s
4000:	learn: 11.2179727	test: 11.2179727	test1: 17.8614401	best: 17.8612996 (3996)	total: 4m 37s	remaining: 1m 9s
4050:	learn: 11.1940403	test: 11.1940403	test1: 17.8555944	best: 17.8555944 (4050

In [24]:
np.mean(scores)

28.58684209951778

In [25]:
final = pred_test/10

In [26]:
submission['target'] = final

In [27]:
submission.describe()

Unnamed: 0,target
count,16136.0
mean,57.751247
std,32.862284
min,1.919829
25%,33.598168
50%,49.553901
75%,74.519014
max,249.638638


In [28]:
submission.to_csv('cat2.csv',index=False)