Genpact Machine Learning Hackathon - 2018 December 15-16 Hosted by Analytics Vidhaya

Problem Statement

Your client is a meal delivery company which operates in multiple cities. They have various fulfillment centers in these cities for dispatching meal orders to their customers. The client wants you to help these centers with demand forecasting for upcoming weeks so that these centers will plan the stock of raw materials accordingly.
The replenishment of majority of raw materials is done on weekly basis and since the raw material is perishable, the procurement planning is of utmost importance. Secondly, staffing of the centers is also one area wherein accurate demand forecasts are really helpful. Given the following information, the task is to predict the demand for the next 10 weeks (Weeks: 146-155) for the center-meal combinations in the test set:

 

    Historical data of demand for a product-center combination (Weeks: 1 to 145)
    Product(Meal) features such as category, sub-category, current price and discount
    Information for fulfillment center like center area, city information etc.
    
Evaluation Metric
The evaluation metric for this competition is 100*RMSLE where RMSLE is Root of Mean Squared Logarithmic Error across all entries in the test set. 


In [1]:
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

print(tf.__version__)

OUTDIR = './Genpact'
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time

  from ._conv import register_converters as _register_converters


1.8.0


In [2]:
train_df  = pd.read_csv("train.csv", sep=",")
test_df   = pd.read_csv("test.csv", sep=",")
meal_df   = pd.read_csv("meal_info.csv",sep=",")  
center_df = pd.read_csv("fulfilment_center_info.csv",sep=",")

In [3]:
train = pd.merge(pd.merge(train_df,meal_df,on='meal_id'),center_df,on='center_id')
test  = pd.merge(pd.merge(test_df,meal_df,on='meal_id'),center_df,on='center_id')

In [4]:
train.head()

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders,category,cuisine,city_code,region_code,center_type,op_area
0,1379560,1,55,1885,136.83,152.29,0,0,177,Beverages,Thai,647,56,TYPE_C,2.0
1,1018704,2,55,1885,135.83,152.29,0,0,323,Beverages,Thai,647,56,TYPE_C,2.0
2,1196273,3,55,1885,132.92,133.92,0,0,96,Beverages,Thai,647,56,TYPE_C,2.0
3,1116527,4,55,1885,135.86,134.86,0,0,163,Beverages,Thai,647,56,TYPE_C,2.0
4,1343872,5,55,1885,146.5,147.5,0,0,215,Beverages,Thai,647,56,TYPE_C,2.0


In [5]:
test.head()

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,category,cuisine,city_code,region_code,center_type,op_area
0,1028232,146,55,1885,158.11,159.11,0,0,Beverages,Thai,647,56,TYPE_C,2.0
1,1262649,147,55,1885,159.11,159.11,0,0,Beverages,Thai,647,56,TYPE_C,2.0
2,1453211,149,55,1885,157.14,158.14,0,0,Beverages,Thai,647,56,TYPE_C,2.0
3,1262599,150,55,1885,159.14,157.14,0,0,Beverages,Thai,647,56,TYPE_C,2.0
4,1495848,151,55,1885,160.11,159.11,0,0,Beverages,Thai,647,56,TYPE_C,2.0


In [6]:
train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 456548 entries, 0 to 456547
Data columns (total 15 columns):
id                       456548 non-null int64
week                     456548 non-null int64
center_id                456548 non-null int64
meal_id                  456548 non-null int64
checkout_price           456548 non-null float64
base_price               456548 non-null float64
emailer_for_promotion    456548 non-null int64
homepage_featured        456548 non-null int64
num_orders               456548 non-null int64
category                 456548 non-null object
cuisine                  456548 non-null object
city_code                456548 non-null int64
region_code              456548 non-null int64
center_type              456548 non-null object
op_area                  456548 non-null float64
dtypes: float64(3), int64(9), object(3)
memory usage: 55.7+ MB


In [7]:
train.shape

(456548, 15)

In [8]:
test.shape

(32573, 14)

In [9]:
# [TRAIN DataSet] Missing values - checking

train.isnull().sum().sort_values(ascending=False)

op_area                  0
center_type              0
region_code              0
city_code                0
cuisine                  0
category                 0
num_orders               0
homepage_featured        0
emailer_for_promotion    0
base_price               0
checkout_price           0
meal_id                  0
center_id                0
week                     0
id                       0
dtype: int64

In [10]:
test.isnull().sum().sort_values(ascending=False)

op_area                  0
center_type              0
region_code              0
city_code                0
cuisine                  0
category                 0
homepage_featured        0
emailer_for_promotion    0
base_price               0
checkout_price           0
meal_id                  0
center_id                0
week                     0
id                       0
dtype: int64

In [11]:
# [TRAIN DATA SET] seperate out the Categorical and Numerical features

numerical_feature   = train.dtypes[train.dtypes!= 'object'].index
categorical_feature = train.dtypes[train.dtypes== 'object'].index

print ("There are {} numeric and {} categorical columns in train data"
       .format(numerical_feature.shape[0],categorical_feature.shape[0]))

There are 12 numeric and 3 categorical columns in train data


In [12]:
#[TRAIN DataSet] Display of the numeric features
numerical_feature.tolist()

['id',
 'week',
 'center_id',
 'meal_id',
 'checkout_price',
 'base_price',
 'emailer_for_promotion',
 'homepage_featured',
 'num_orders',
 'city_code',
 'region_code',
 'op_area']

In [13]:
#[TRAIN DataSet] Display of the categorical features
categorical_feature.tolist()

['category', 'cuisine', 'center_type']

In [14]:
train['category'].unique()

array(['Beverages', 'Rice Bowl', 'Starters', 'Pasta', 'Sandwich',
       'Biryani', 'Extras', 'Pizza', 'Seafood', 'Other Snacks', 'Desert',
       'Salad', 'Fish', 'Soup'], dtype=object)

In [15]:
train['cuisine'].unique()

array(['Thai', 'Indian', 'Italian', 'Continental'], dtype=object)

In [16]:
train['center_type'].unique()

array(['TYPE_C', 'TYPE_B', 'TYPE_A'], dtype=object)

In [17]:
train['emailer_for_promotion'].unique()

array([0, 1])

In [18]:
train['homepage_featured'].unique()

array([0, 1])

In [19]:
from sklearn.preprocessing import MinMaxScaler

In [20]:
scaler = MinMaxScaler()

In [21]:
numeric=['checkout_price','base_price']

In [22]:
train[numeric]=scaler.fit_transform(train[numeric])

In [23]:
train.head()

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders,category,cuisine,city_code,region_code,center_type,op_area
0,1379560,1,55,1885,0.155056,0.119543,0,0,177,Beverages,Thai,647,56,TYPE_C,2.0
1,1018704,2,55,1885,0.153898,0.119543,0,0,323,Beverages,Thai,647,56,TYPE_C,2.0
2,1196273,3,55,1885,0.150527,0.09689,0,0,96,Beverages,Thai,647,56,TYPE_C,2.0
3,1116527,4,55,1885,0.153933,0.098049,0,0,163,Beverages,Thai,647,56,TYPE_C,2.0
4,1343872,5,55,1885,0.166257,0.113636,0,0,215,Beverages,Thai,647,56,TYPE_C,2.0


In [24]:
test[numeric]=scaler.fit_transform(test[numeric])

In [25]:
test.head()

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,category,cuisine,city_code,region_code,center_type,op_area
0,1028232,146,55,1885,0.086266,0.068274,0,0,Beverages,Thai,647,56,TYPE_C,2.0
1,1262649,147,55,1885,0.087222,0.068274,0,0,Beverages,Thai,647,56,TYPE_C,2.0
2,1453211,149,55,1885,0.085338,0.067326,0,0,Beverages,Thai,647,56,TYPE_C,2.0
3,1262599,150,55,1885,0.087251,0.066349,0,0,Beverages,Thai,647,56,TYPE_C,2.0
4,1495848,151,55,1885,0.088178,0.068274,0,0,Beverages,Thai,647,56,TYPE_C,2.0


In [26]:
# Split the data into features and target label
target   = train['num_orders']
features = train.drop('num_orders', axis = 1)

In [27]:
# Import train_test_split
from sklearn.model_selection import train_test_split

# Split the 'features' and 'target' data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, 
                                                    target, 
                                                    test_size = 0.20, 
                                                    random_state = 1234)

# Show the results of the split
print("Training set has {} samples.".format(X_train.shape[0]))
print("Testing set has {} samples.".format(X_test.shape[0]))

Training set has 365238 samples.
Testing set has 91310 samples.


In [28]:
week       = tf.feature_column.numeric_column('week')
center_id  = tf.feature_column.numeric_column('center_id')
meal_id    = tf.feature_column.numeric_column('meal_id')
emailer_for_promotion  = tf.feature_column.numeric_column('emailer_for_promotion')
homepage_featured      = tf.feature_column.numeric_column('homepage_featured')
city_code              = tf.feature_column.numeric_column('city_code')
region_code            = tf.feature_column.numeric_column('region_code')
op_area        = tf.feature_column.numeric_column('op_area')
checkout_price = tf.feature_column.numeric_column('checkout_price')
base_price     = tf.feature_column.numeric_column('base_price')

In [29]:
category = tf.feature_column.categorical_column_with_vocabulary_list("category", ["Beverages", "Rice Bow", "Starters", "Pasta", "Sandwich",
       "Biryani", "Extras", "Pizza", "Seafood", "Other Snacks", "Desert",
       "Salad", "Fish", "Soup"])

cuisine = tf.feature_column.categorical_column_with_vocabulary_list("cuisine",["Thai", "Indian", "Italian", "Continental"])
center_type = tf.feature_column.categorical_column_with_vocabulary_list("center_type",["TYPE_C", "TYPE_B", "TYPE_A"])

In [30]:
#train["week"].unique()

In [31]:
#train["center_id"].unique()

In [32]:
#train["meal_id"].unique()

In [33]:
#train["city_code"].unique()

In [34]:
#train["region_code"].unique()

In [35]:
#train["op_area"].unique()

In [36]:
bucketized_week = tf.feature_column.bucketized_column(week, boundaries=[25,50,100,150])
bucketized_center_id = tf.feature_column.bucketized_column(center_id, boundaries=[50,100,150,200])
bucketized_meal_id = tf.feature_column.bucketized_column(meal_id, boundaries=[1000,1500,2000,2500])
bucketized_city_code = tf.feature_column.bucketized_column(city_code, boundaries=[500,600,700])
bucketized_region_code = tf.feature_column.bucketized_column(region_code, boundaries=[40,60,80])
bucketized_op_area = tf.feature_column.bucketized_column(op_area, boundaries=[2.0,5.0,7.0])

In [37]:
bucketized_week = tf.feature_column.embedding_column(categorical_column=bucketized_week,dimension=4)
bucketized_center_id = tf.feature_column.embedding_column(categorical_column=bucketized_center_id,dimension=4)
bucketized_meal_id = tf.feature_column.embedding_column(categorical_column=bucketized_meal_id,dimension=4)
bucketized_city_code = tf.feature_column.embedding_column(categorical_column=bucketized_city_code,dimension=4)
bucketized_region_code = tf.feature_column.embedding_column(categorical_column=bucketized_region_code,dimension=4)
bucketized_op_area = tf.feature_column.embedding_column(categorical_column=bucketized_op_area,dimension=4)

In [38]:
category = tf.feature_column.embedding_column(categorical_column=category,dimension=7)
cuisine = tf.feature_column.embedding_column(categorical_column=cuisine,dimension=4)
center_type = tf.feature_column.embedding_column(categorical_column=center_type,dimension=3)

In [39]:
#cross_term1 =tf.feature_column.crossed_column(['center_id', 'meal_id'], 16)
#cross_term2 =tf.feature_column.crossed_column(['city_code', 'region_code'], 16)

#bucketized_cross_term1 = tf.feature_column.embedding_column(categorical_column=cross_term1,dimension=10)
#bucketized_cross_term2 = tf.feature_column.embedding_column(categorical_column=cross_term2,dimension=16)

In [40]:
#feat_columns=[bucketized_week,bucketized_center_id,bucketized_meal_id,emailer_for_promotion,homepage_featured,
#              bucketized_city_code,bucketized_region_code,
#              bucketized_op_area,checkout_price,base_price,category,cuisine,center_type]

In [41]:
feat_columns=[bucketized_week,bucketized_center_id,bucketized_meal_id,emailer_for_promotion,
              checkout_price,base_price,category,cuisine]

In [42]:
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train ,batch_size=10,
                                                 num_epochs=1000,shuffle=True)

In [43]:
myopt = tf.train.AdamOptimizer(learning_rate=0.005)

In [44]:
model = tf.estimator.DNNRegressor(hidden_units=[11,5,5,11,5,5,11],
                                  feature_columns=feat_columns,
                                  optimizer = myopt)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4f9f00bdd0>, '_evaluation_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/tmp/tmpp9MKb6', '_global_id_in_cluster': 0, '_save_summary_steps': 100}


In [45]:
model.train(input_fn=input_func,steps=25000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmpp9MKb6/model.ckpt.
INFO:tensorflow:loss = 1534908.5, step = 1
INFO:tensorflow:global_step/sec: 170
INFO:tensorflow:loss = 1234082.0, step = 101 (0.594 sec)
INFO:tensorflow:global_step/sec: 331.068
INFO:tensorflow:loss = 108892.63, step = 201 (0.300 sec)
INFO:tensorflow:global_step/sec: 278.295
INFO:tensorflow:loss = 510828.16, step = 301 (0.360 sec)
INFO:tensorflow:global_step/sec: 280.481
INFO:tensorflow:loss = 904886.5, step = 401 (0.356 sec)
INFO:tensorflow:global_step/sec: 275.654
INFO:tensorflow:loss = 194773.77, step = 501 (0.363 sec)
INFO:tensorflow:global_step/sec: 276.322
INFO:tensorflow:loss = 128546.734, step = 601 (0.362 sec)
INFO:tensorflow:global_step/sec: 279.998
INFO:tensorflow:loss =

INFO:tensorflow:global_step/sec: 248.953
INFO:tensorflow:loss = 377035.3, step = 8101 (0.401 sec)
INFO:tensorflow:global_step/sec: 256.413
INFO:tensorflow:loss = 1561949.8, step = 8201 (0.390 sec)
INFO:tensorflow:global_step/sec: 259.534
INFO:tensorflow:loss = 74628.05, step = 8301 (0.385 sec)
INFO:tensorflow:global_step/sec: 254.848
INFO:tensorflow:loss = 512001.12, step = 8401 (0.393 sec)
INFO:tensorflow:global_step/sec: 259.133
INFO:tensorflow:loss = 155311.48, step = 8501 (0.386 sec)
INFO:tensorflow:global_step/sec: 256.977
INFO:tensorflow:loss = 907196.25, step = 8601 (0.390 sec)
INFO:tensorflow:global_step/sec: 261.35
INFO:tensorflow:loss = 284549.53, step = 8701 (0.382 sec)
INFO:tensorflow:global_step/sec: 254.908
INFO:tensorflow:loss = 23253.559, step = 8801 (0.392 sec)
INFO:tensorflow:global_step/sec: 256.276
INFO:tensorflow:loss = 98398.11, step = 8901 (0.390 sec)
INFO:tensorflow:global_step/sec: 262.975
INFO:tensorflow:loss = 81964.34, step = 9001 (0.381 sec)
INFO:tensorflow

INFO:tensorflow:global_step/sec: 279.337
INFO:tensorflow:loss = 296677.03, step = 16401 (0.358 sec)
INFO:tensorflow:global_step/sec: 275.296
INFO:tensorflow:loss = 288487.7, step = 16501 (0.363 sec)
INFO:tensorflow:global_step/sec: 281.194
INFO:tensorflow:loss = 426878.88, step = 16601 (0.356 sec)
INFO:tensorflow:global_step/sec: 280.561
INFO:tensorflow:loss = 929432.7, step = 16701 (0.357 sec)
INFO:tensorflow:global_step/sec: 219.609
INFO:tensorflow:loss = 272594.06, step = 16801 (0.456 sec)
INFO:tensorflow:global_step/sec: 283.099
INFO:tensorflow:loss = 1114107.9, step = 16901 (0.353 sec)
INFO:tensorflow:global_step/sec: 267.521
INFO:tensorflow:loss = 388047.2, step = 17001 (0.374 sec)
INFO:tensorflow:global_step/sec: 274.215
INFO:tensorflow:loss = 456115.84, step = 17101 (0.365 sec)
INFO:tensorflow:global_step/sec: 274.519
INFO:tensorflow:loss = 533900.06, step = 17201 (0.364 sec)
INFO:tensorflow:global_step/sec: 271.639
INFO:tensorflow:loss = 301664.44, step = 17301 (0.369 sec)
INF

INFO:tensorflow:loss = 740989.6, step = 24601 (0.364 sec)
INFO:tensorflow:global_step/sec: 276.988
INFO:tensorflow:loss = 174901.94, step = 24701 (0.361 sec)
INFO:tensorflow:global_step/sec: 276.238
INFO:tensorflow:loss = 269152.97, step = 24801 (0.362 sec)
INFO:tensorflow:global_step/sec: 283.132
INFO:tensorflow:loss = 110107.516, step = 24901 (0.353 sec)
INFO:tensorflow:Saving checkpoints for 25000 into /tmp/tmpp9MKb6/model.ckpt.
INFO:tensorflow:Loss for final step: 610751.1.


<tensorflow.python.estimator.canned.dnn.DNNRegressor at 0x7f4f9f00bf50>

In [46]:
eval_input_func = tf.estimator.inputs.pandas_input_fn(
      x=X_test,
      batch_size=10,
      num_epochs=1,
      shuffle=False)

In [47]:
eval_pred_gen = model.predict(eval_input_func)

In [48]:
predictions = list(eval_pred_gen)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpp9MKb6/model.ckpt-25000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [49]:
predictions

[{'predictions': array([49.836323], dtype=float32)},
 {'predictions': array([162.80365], dtype=float32)},
 {'predictions': array([671.6454], dtype=float32)},
 {'predictions': array([524.038], dtype=float32)},
 {'predictions': array([680.6118], dtype=float32)},
 {'predictions': array([398.63663], dtype=float32)},
 {'predictions': array([494.35794], dtype=float32)},
 {'predictions': array([507.588], dtype=float32)},
 {'predictions': array([49.836323], dtype=float32)},
 {'predictions': array([148.18524], dtype=float32)},
 {'predictions': array([289.22577], dtype=float32)},
 {'predictions': array([142.38147], dtype=float32)},
 {'predictions': array([49.836323], dtype=float32)},
 {'predictions': array([49.836323], dtype=float32)},
 {'predictions': array([716.6561], dtype=float32)},
 {'predictions': array([49.836323], dtype=float32)},
 {'predictions': array([238.95386], dtype=float32)},
 {'predictions': array([49.836323], dtype=float32)},
 {'predictions': array([60.833008], dtype=float32)},


In [50]:
final_preds = []


In [51]:
final_preds[:10]

[]

In [52]:
for pred in predictions:
    final_preds.append(pred['predictions'][0])

In [53]:
final_preds[:10]

[49.836323,
 162.80365,
 671.6454,
 524.038,
 680.6118,
 398.63663,
 494.35794,
 507.588,
 49.836323,
 148.18524]

In [54]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_squared_log_error

In [55]:
mean_squared_error(y_test,final_preds)**0.5

281.1014200280942

In [56]:
mean_squared_log_error(y_test, final_preds) 

0.6567679997342365

In [57]:
RMSLE = 100*mean_squared_log_error(y_test, final_preds) 
print(RMSLE)

65.67679997342366


In [58]:
test.shape

(32573, 14)

In [59]:
test_input_func = tf.estimator.inputs.pandas_input_fn(
      x=test,
      batch_size=10,
      num_epochs=1,
      shuffle=False)

In [60]:
test_pred_gen = model.predict(test_input_func)

In [61]:
test_predictions = list(test_pred_gen)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpp9MKb6/model.ckpt-25000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [62]:
test_predictions

[{'predictions': array([547.0345], dtype=float32)},
 {'predictions': array([546.5804], dtype=float32)},
 {'predictions': array([547.20123], dtype=float32)},
 {'predictions': array([558.2401], dtype=float32)},
 {'predictions': array([558.26636], dtype=float32)},
 {'predictions': array([559.10144], dtype=float32)},
 {'predictions': array([558.5325], dtype=float32)},
 {'predictions': array([557.98193], dtype=float32)},
 {'predictions': array([559.36755], dtype=float32)},
 {'predictions': array([546.12646], dtype=float32)},
 {'predictions': array([547.0808], dtype=float32)},
 {'predictions': array([547.02936], dtype=float32)},
 {'predictions': array([547.0345], dtype=float32)},
 {'predictions': array([559.652], dtype=float32)},
 {'predictions': array([559.3597], dtype=float32)},
 {'predictions': array([558.4982], dtype=float32)},
 {'predictions': array([559.10144], dtype=float32)},
 {'predictions': array([559.62573], dtype=float32)},
 {'predictions': array([559.06714], dtype=float32)},
 {'

In [63]:
test_final_preds = []
for pred in test_predictions:
    test_final_preds.append(pred['predictions'][0])

In [64]:
test_final_preds[:10]


[547.0345,
 546.5804,
 547.20123,
 558.2401,
 558.26636,
 559.10144,
 558.5325,
 557.98193,
 559.36755,
 546.12646]

In [65]:
test["num_orders"]=test_final_preds

In [66]:
test.head()

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,category,cuisine,city_code,region_code,center_type,op_area,num_orders
0,1028232,146,55,1885,0.086266,0.068274,0,0,Beverages,Thai,647,56,TYPE_C,2.0,547.034485
1,1262649,147,55,1885,0.087222,0.068274,0,0,Beverages,Thai,647,56,TYPE_C,2.0,546.580383
2,1453211,149,55,1885,0.085338,0.067326,0,0,Beverages,Thai,647,56,TYPE_C,2.0,547.201233
3,1262599,150,55,1885,0.087251,0.066349,0,0,Beverages,Thai,647,56,TYPE_C,2.0,558.240112
4,1495848,151,55,1885,0.088178,0.068274,0,0,Beverages,Thai,647,56,TYPE_C,2.0,558.266357


In [67]:
test['num_orders'].max()

2326.17431640625

In [68]:
test['num_orders'].min()

49.83632278442383

In [69]:
# [SUBMISSION] file preparation
test[["id","num_orders"]].to_csv('g10.csv',index=False)

In [70]:
%bash
gsutil cp 'g10.csv' gs://av-genpact-skl-dec2018/

Copying file://g10.csv [Content-Type=text/csv]...
/ [0 files][    0.0 B/830.6 KiB]                                                / [1 files][830.6 KiB/830.6 KiB]                                                -
Operation completed over 1 objects/830.6 KiB.                                    
