# Problem Statement

The world is going digital, and our reliance on technology has grown rapidly over the last few years. Technology saves us time and effort: today almost everything is accessible from a smartphone, from groceries to cooked food and from medicines to doctors. In this hackathon, we provide you with data that is both a by-product and a proof of this growing relationship.

When was the last time you ordered food online? And how long did it take to reach you?

In this hackathon, we are providing you with data from thousands of restaurants in India on the time they take to deliver online orders. As data scientists, your goal is to predict the online order delivery time based on the given factors.

Analytics India Magazine and IMS Proschool bring to you the ‘Predicting Food Delivery Time Hackathon’.

Size of training set: 11,094 records

Size of test set: 2,774 records

FEATURES:

Restaurant: A unique ID that represents a restaurant.

Location: The location of the restaurant.

Cuisines: The cuisines offered by the restaurant.

Average_Cost: The average cost for one person/order.

Minimum_Order: The minimum order amount.

Rating: Customer rating for the restaurant.

Votes: The total number of customer votes for the restaurant.

Reviews: The number of customer reviews for the restaurant.

Delivery_Time: The order delivery time of the restaurant. (Target Classes)

# Import Libraries

In [1]:
import pandas as pd
import numpy as np
import pandas_profiling
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier

from sklearn.model_selection import cross_val_score

from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt
import statsmodels.api as sm

# Import Data Set in Python Environment

In [2]:
train_data = pd.read_csv("/home/aniruddha/Projects/Food Delivery Time/train_data.csv")
test_data = pd.read_csv("/home/aniruddha/Projects/Food Delivery Time/test_data.csv")

# Identify variable types of both train and test data

In [3]:
train_data.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews,Delivery_Time
0,ID_6321,"FTI College, Law College Road, Pune","Fast Food, Rolls, Burger, Salad, Wraps",₹200,₹50,3.5,12,4,30 minutes
1,ID_2882,"Sector 3, Marathalli","Ice Cream, Desserts",₹100,₹50,3.5,11,4,30 minutes
2,ID_1595,Mumbai Central,"Italian, Street Food, Fast Food",₹150,₹50,3.6,99,30,65 minutes
3,ID_5929,"Sector 1, Noida","Mughlai, North Indian, Chinese",₹250,₹99,3.7,176,95,30 minutes
4,ID_6123,"Rmz Centennial, I Gate, Whitefield","Cafe, Beverages",₹200,₹99,3.2,521,235,65 minutes


In [4]:
test_data.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews
0,ID_2842,"Mico Layout, Stage 2, BTM Layout,Bangalore","North Indian, Chinese, Assamese",₹350,₹50,4.2,361,225
1,ID_730,"Mico Layout, Stage 2, BTM Layout,Bangalore","Biryani, Kebab",₹100,₹50,NEW,-,-
2,ID_4620,"Sector 1, Noida",Fast Food,₹100,₹50,3.6,36,16
3,ID_5470,"Babarpur, New Delhi, Delhi","Mithai, North Indian, Chinese, Fast Food, Sout...",₹200,₹50,3.6,66,33
4,ID_3249,"Sector 1, Noida","Chinese, Fast Food",₹150,₹50,2.9,38,14


In [5]:
train_data.tail()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews,Delivery_Time
11089,ID_8067,"BTM Layout 1, Electronic City","Tibetan, Chinese, Continental, Momos",₹250,₹50,4.2,326,189,30 minutes
11090,ID_4620,"Sector 14, Noida",Fast Food,₹100,₹50,3.6,36,16,30 minutes
11091,ID_3392,Majestic,"South Indian, Chinese, North Indian",₹100,₹50,3.5,45,18,30 minutes
11092,ID_4115,"Sector 3, Marathalli",North Indian,₹100,₹50,3.1,24,9,30 minutes
11093,ID_4417,"Sector 63A,Gurgaon",North Indian,₹100,₹50,NEW,-,-,30 minutes


In [6]:
test_data.tail()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews
2769,ID_6342,Delhi University-GTB Nagar,"Fast Food, Chinese",₹100,₹50,NEW,-,-
2770,ID_8495,"Mico Layout, Stage 2, BTM Layout,Bangalore","Continental, North Indian, Desserts, Beverages",₹250,₹50,3.1,5,1
2771,ID_7122,"Noorkhan Bazaar, Malakpet, Hyderabad","Andhra, South Indian",₹150,₹50,3.0,16,1
2772,ID_2475,"D-Block, Sector 63, Noida",Bakery,₹100,₹99,3.0,7,2
2773,ID_1595,"Dockyard Road, Mumbai CST Area","Italian, Street Food, Fast Food",₹150,₹50,3.6,99,30


In [7]:
print("Shape of train data:",train_data.shape)
print("Shape of test data:",test_data.shape)

Shape of train data: (11094, 9)
Shape of test data: (2774, 8)


In [8]:
train_data.nunique()

Restaurant       7480
Location           35
Cuisines         2179
Average_Cost       26
Minimum_Order      18
Rating             33
Votes            1103
Reviews           761
Delivery_Time       7
dtype: int64

In [9]:
test_data.nunique()

Restaurant       2401
Location           35
Cuisines          881
Average_Cost       19
Minimum_Order       9
Rating             30
Votes             580
Reviews           392
dtype: int64

In [10]:
train_data.isnull().sum()

Restaurant       0
Location         0
Cuisines         0
Average_Cost     0
Minimum_Order    0
Rating           0
Votes            0
Reviews          0
Delivery_Time    0
dtype: int64

In [11]:
test_data.isnull().sum()

Restaurant       0
Location         0
Cuisines         0
Average_Cost     0
Minimum_Order    0
Rating           0
Votes            0
Reviews          0
dtype: int64

In [12]:
train_data.dtypes

Restaurant       object
Location         object
Cuisines         object
Average_Cost     object
Minimum_Order    object
Rating           object
Votes            object
Reviews          object
Delivery_Time    object
dtype: object

In [13]:
test_data.dtypes

Restaurant       object
Location         object
Cuisines         object
Average_Cost     object
Minimum_Order    object
Rating           object
Votes            object
Reviews          object
dtype: object
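
Every column is read as `object`, which suggests that the numeric-looking columns contain non-numeric tokens (the previews above show `₹` prefixes, `NEW`, and `-`). A minimal sketch for listing such tokens before cleaning; the helper name `non_numeric_values` and the small sample frame are illustrative assumptions, not part of the original notebook:

```python
import pandas as pd

def non_numeric_values(series: pd.Series) -> pd.Series:
    """Count the distinct values in `series` that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors="coerce")  # unparseable values become NaN
    return series[parsed.isna()].value_counts()

# Illustrative sample shaped like the hackathon columns (not the real data)
sample = pd.DataFrame({
    "Rating": ["3.5", "NEW", "-", "4.2", "NEW"],
    "Votes": ["12", "-", "-", "361", "-"],
})

for column in sample.columns:
    print(column, non_numeric_values(sample[column]).to_dict())
```

Running the same helper on `train_data` and `test_data` would show which placeholders each column needs to handle.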

## Feature Engineering on train data

In [14]:
# Count the cuisines each restaurant offers: split the comma-separated string and store the list length as an integer
train_data['Cuisines'] = train_data['Cuisines'].str.split(",")
train_data['Cuisines'] = train_data['Cuisines'].apply(len)
train_data.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews,Delivery_Time
0,ID_6321,"FTI College, Law College Road, Pune",5,₹200,₹50,3.5,12,4,30 minutes
1,ID_2882,"Sector 3, Marathalli",2,₹100,₹50,3.5,11,4,30 minutes
2,ID_1595,Mumbai Central,3,₹150,₹50,3.6,99,30,65 minutes
3,ID_5929,"Sector 1, Noida",3,₹250,₹99,3.7,176,95,30 minutes
4,ID_6123,"Rmz Centennial, I Gate, Whitefield",2,₹200,₹99,3.2,521,235,65 minutes
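
The cuisine count can also be computed without building intermediate lists. The sketch below counts commas and adds one. The column name `Cuisine_Count` is a hypothetical choice that keeps the original text, and the sample rows are illustrative:

```python
import pandas as pd

sample = pd.DataFrame({
    "Cuisines": [
        "Fast Food, Rolls, Burger, Salad, Wraps",
        "Ice Cream, Desserts",
        "Fast Food",
    ]
})

# n comma separators mean n + 1 cuisines
sample["Cuisine_Count"] = sample["Cuisines"].str.count(",") + 1

# Same result as the split-and-len approach used in the notebook
split_count = sample["Cuisines"].str.split(",").apply(len)
print(sample["Cuisine_Count"].tolist())
```

Storing the count in a separate column leaves the raw cuisine names available for later features, at the cost of one extra column.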


In [15]:
train_data.dtypes

Restaurant       object
Location         object
Cuisines          int64
Average_Cost     object
Minimum_Order    object
Rating           object
Votes            object
Reviews          object
Delivery_Time    object
dtype: object

In [16]:
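# Strip the rupee sign (and any other non-digit character) from 'Average_Cost' and 'Minimum_Order'.
# regex=True is passed explicitly because pandas 2.0+ treats the pattern as a literal string by default.
train_data['Average_Cost'] = train_data['Average_Cost'].str.replace(r'\D', '', regex=True)
train_data['Minimum_Order'] = train_data['Minimum_Order'].str.replace(r'\D', '', regex=True)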

In [17]:
train_data.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews,Delivery_Time
0,ID_6321,"FTI College, Law College Road, Pune",5,200,50,3.5,12,4,30 minutes
1,ID_2882,"Sector 3, Marathalli",2,100,50,3.5,11,4,30 minutes
2,ID_1595,Mumbai Central,3,150,50,3.6,99,30,65 minutes
3,ID_5929,"Sector 1, Noida",3,250,99,3.7,176,95,30 minutes
4,ID_6123,"Rmz Centennial, I Gate, Whitefield",2,200,99,3.2,521,235,65 minutes


In [18]:
train_data['Average_Cost'].value_counts()

200     3241
100     2557
150     2462
250      881
300      537
350      283
400      282
50       265
600      154
500      101
450       63
550       60
650       55
800       44
750       38
900       15
700       15
850       12
1000      12
1200       8
950        4
2050       1
1400       1
1150       1
1100       1
           1
Name: Average_Cost, dtype: int64

### value_counts() shows one empty Average_Cost value (no digits remained after cleaning), at row index 6297.

In [19]:
print(train_data.loc[6290:6300,'Average_Cost'])

6290    200
6291    100
6292    200
6293    200
6294    200
6295    200
6296    100
6297       
6298    150
6299    200
6300    200
Name: Average_Cost, dtype: object


In [20]:
print(train_data.iloc[[6297]])

     Restaurant         Location  Cuisines Average_Cost Minimum_Order Rating  \
6297    ID_6472  Pune University         1                         50    NEW   

     Votes Reviews Delivery_Time  
6297     -       -    30 minutes  


## Drop row 6297 because it is missing three values (Average_Cost, Votes, Reviews)

In [21]:
train_data = train_data.drop(index=6297)

In [22]:
train_data.shape

(11093, 9)
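
An empty string like the one found above can also be surfaced programmatically instead of being spotted in `value_counts()`. A minimal sketch, assuming the blank comes from an entry that contained no digits; the sample values (including `"for"`) are illustrative, not taken from the data set:

```python
import pandas as pd

# Illustrative values; "for" stands in for any entry without digits (an assumption)
raw = pd.Series(["₹200", "₹1,000", "for", "₹150"])

digits = raw.str.replace(r"\D", "", regex=True)  # strip every non-digit character
cost = pd.to_numeric(digits, errors="coerce")    # the empty string cannot be parsed, so it becomes NaN

print(cost.isna().sum())                         # number of rows that lost their value
clean = cost.dropna().astype(int)                # keep only rows with a usable cost
```

With this approach the missing entry shows up in `isna()` checks, so it can be dropped or imputed with the usual pandas tools.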

In [23]:
train_data['Average_Cost'].value_counts()

200     3241
100     2557
150     2462
250      881
300      537
350      283
400      282
50       265
600      154
500      101
450       63
550       60
650       55
800       44
750       38
700       15
900       15
850       12
1000      12
1200       8
950        4
1100       1
1150       1
2050       1
1400       1
Name: Average_Cost, dtype: int64

In [24]:
train_data['Minimum_Order'].value_counts()

50     10117
99       779
0        158
200        8
199        8
59         3
350        3
299        3
300        2
450        2
79         2
90         2
400        1
250        1
240        1
89         1
500        1
150        1
Name: Minimum_Order, dtype: int64

In [25]:
# convert 'Average_Cost' and 'Minimum_Order' to "int"
train_data['Average_Cost'] = train_data['Average_Cost'].astype(int)
train_data['Minimum_Order'] = train_data['Minimum_Order'].astype(int)

In [26]:
# Round 'Minimum_Order' to the nearest multiple of 50
def rou(n):
    """Return the multiple of 50 closest to n; exact ties (e.g. 25, 75) round down."""
    # Largest multiple of 50 not exceeding n
    a = (n // 50) * 50

    # Next multiple of 50 above a
    b = a + 50

    # Pick whichever of the two is closer to n
    return b if n - a > b - n else a

# Apply the rounding to every value in the column
train_data['Minimum_Order'] = train_data['Minimum_Order'].apply(rou)

In [27]:
train_data['Minimum_Order'].value_counts()

50     10120
100      784
0        158
200       16
300        5
350        3
450        2
250        2
150        1
500        1
400        1
Name: Minimum_Order, dtype: int64
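
The element-wise `apply` can be replaced by integer arithmetic on the whole column. The sketch below is one possible vectorised equivalent for non-negative integers; `round_to_50` is a hypothetical helper name, and `rou` is repeated here only so the comparison runs on its own:

```python
import pandas as pd

def rou(n):
    """Reference version from the notebook: nearest multiple of 50, ties round down."""
    a = (n // 50) * 50
    b = a + 50
    return b if n - a > b - n else a

def round_to_50(series: pd.Series) -> pd.Series:
    """Vectorised equivalent of rou() for non-negative integers (hypothetical helper)."""
    # Adding 24 before the floor division moves remainders 26..49 up to the
    # next multiple, while remainders 0..25 stay on the lower multiple.
    return (series + 24) // 50 * 50

values = pd.Series(range(0, 1001))
matches = round_to_50(values).tolist() == [rou(v) for v in values]
print(matches)
```

The comparison covers every integer from 0 to 1000, including the tie cases such as 25 and 75.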

In [28]:
train_data['Rating'].value_counts()

-                     1191
3.7                    869
3.6                    846
3.5                    818
3.8                    800
NEW                    757
3.9                    749
3.4                    718
3.3                    675
4.0                    614
3.2                    511
4.1                    459
3.1                    411
3.0                    302
4.2                    272
4.3                    247
2.9                    199
2.8                    157
4.4                    142
4.5                     78
2.7                     76
2.6                     42
4.6                     41
4.7                     36
2.5                     27
4.8                     13
2.4                     13
Opening Soon            12
4.9                      8
2.3                      6
Temporarily Closed       2
2.1                      1
2.2                      1
Name: Rating, dtype: int64

In [29]:
# In 'Rating', map the non-numeric placeholders '-', 'NEW', 'Opening Soon' and 'Temporarily Closed' to 0
train_data['Rating'] = train_data['Rating'].str.replace('-', '0').replace('NEW','0').replace('Opening Soon','0').replace('Temporarily Closed','0')

In [30]:
train_data['Rating'].value_counts()

0      1962
3.7     869
3.6     846
3.5     818
3.8     800
3.9     749
3.4     718
3.3     675
4.0     614
3.2     511
4.1     459
3.1     411
3.0     302
4.2     272
4.3     247
2.9     199
2.8     157
4.4     142
4.5      78
2.7      76
2.6      42
4.6      41
4.7      36
2.5      27
4.8      13
2.4      13
4.9       8
2.3       6
2.2       1
2.1       1
Name: Rating, dtype: int64

In [31]:
# Converting 'Rating' to float
train_data['Rating'] = train_data['Rating'].astype(float)
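
Mapping every placeholder to 0 makes an unrated restaurant look like one with the worst possible rating. One possible refinement, shown here only as a sketch, keeps the 0 fill but adds an indicator column. The name `Has_Rating` and the sample values are assumptions for illustration:

```python
import pandas as pd

ratings = pd.Series(["3.5", "NEW", "-", "Opening Soon", "4.2"])  # illustrative values

numeric = pd.to_numeric(ratings, errors="coerce")  # placeholders become NaN
features = pd.DataFrame({
    "Rating": numeric.fillna(0.0),                 # same 0 fill as the notebook
    "Has_Rating": numeric.notna().astype(int),     # hypothetical flag: 1 if a real rating exists
})
print(features)
```

Whether the extra flag helps would need to be checked with cross-validation; tree-based models can often separate the 0 values on their own.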

In [32]:
train_data['Votes'].value_counts()

-       2073
4        248
6        200
7        182
9        181
        ... 
1254       1
5116       1
754        1
935        1
945        1
Name: Votes, Length: 1103, dtype: int64

In [33]:
#  In Votes feature, converting '-' value to 0(zero)
train_data['Votes'] = train_data['Votes'].str.replace('-', '0')

In [34]:
# Converting Votes to 'int'
train_data['Votes'] = train_data['Votes'].astype(int)

In [35]:
# Round 'Votes' to the nearest multiple of 50, reusing rou() defined for 'Minimum_Order' above
train_data['Votes'] = train_data['Votes'].apply(rou)

In [36]:
train_data['Votes'].value_counts()

0       4840
50      2101
100     1013
150      553
200      391
        ... 
7150       1
3700       1
5100       1
2750       1
4150       1
Name: Votes, Length: 100, dtype: int64

In [37]:
train_data['Reviews'].value_counts()

-       2311
2        420
3        387
1        381
4        356
        ... 
544        1
1617       1
938        1
435        1
459        1
Name: Reviews, Length: 761, dtype: int64

In [38]:
# In Reviews feature, converting '-' to 0(zero)
train_data['Reviews'] = train_data['Reviews'].str.replace('-', '0')

In [39]:
# Converting Reviews to 'int'
train_data['Reviews'] = train_data['Reviews'].astype(int)

In [40]:
# Round 'Reviews' to the nearest multiple of 50, reusing rou() defined for 'Minimum_Order' above
train_data['Reviews'] = train_data['Reviews'].apply(rou)


In [41]:
train_data['Reviews'].value_counts()

0       6637
50      1962
100      743
150      397
200      247
        ... 
4950       1
5550       1
2700       1
2900       1
3750       1
Name: Reviews, Length: 63, dtype: int64

In [42]:
train_data['Delivery_Time'].value_counts()

30 minutes     7405
45 minutes     2665
65 minutes      923
120 minutes      62
20 minutes       20
80 minutes       14
10 minutes        4
Name: Delivery_Time, dtype: int64
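
The target classes are heavily imbalanced, so it helps to know what accuracy a trivial model would reach. The sketch below computes the majority-class baseline from the counts printed above; the variable names are illustrative:

```python
# Class counts copied from the value_counts() output above
counts = {
    "30 minutes": 7405, "45 minutes": 2665, "65 minutes": 923,
    "120 minutes": 62, "20 minutes": 20, "80 minutes": 14, "10 minutes": 4,
}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total

print(f"{majority_class}: {baseline_accuracy:.3f}")  # 30 minutes: 0.668
```

A classifier that always predicts "30 minutes" would be right about two thirds of the time, so any accuracy figure for this data set is best read against that floor.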

In [43]:
train_data.dtypes

Restaurant        object
Location          object
Cuisines           int64
Average_Cost       int64
Minimum_Order      int64
Rating           float64
Votes              int64
Reviews            int64
Delivery_Time     object
dtype: object

In [44]:
train_data.columns

Index(['Restaurant', 'Location', 'Cuisines', 'Average_Cost', 'Minimum_Order',
       'Rating', 'Votes', 'Reviews', 'Delivery_Time'],
      dtype='object')

In [45]:
# Dropping the 'Restaurant' and 'Location' features from the training data set
train_data = train_data.drop(['Restaurant','Location'],axis=1)

In [46]:
train_data.columns

Index(['Cuisines', 'Average_Cost', 'Minimum_Order', 'Rating', 'Votes',
       'Reviews', 'Delivery_Time'],
      dtype='object')

## Feature Engineering on test data

In [47]:
# Count the cuisines each restaurant offers: split the comma-separated string and store the list length as an integer
test_data['Cuisines'] = test_data['Cuisines'].str.split(",")
test_data['Cuisines'] = test_data['Cuisines'].apply(len)
test_data.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews
0,ID_2842,"Mico Layout, Stage 2, BTM Layout,Bangalore",3,₹350,₹50,4.2,361,225
1,ID_730,"Mico Layout, Stage 2, BTM Layout,Bangalore",2,₹100,₹50,NEW,-,-
2,ID_4620,"Sector 1, Noida",1,₹100,₹50,3.6,36,16
3,ID_5470,"Babarpur, New Delhi, Delhi",5,₹200,₹50,3.6,66,33
4,ID_3249,"Sector 1, Noida",2,₹150,₹50,2.9,38,14


In [48]:
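# Strip the rupee sign (and any other non-digit character) from 'Average_Cost' and 'Minimum_Order'.
# regex=True is passed explicitly because pandas 2.0+ treats the pattern as a literal string by default.
test_data['Average_Cost'] = test_data['Average_Cost'].str.replace(r'\D', '', regex=True)
test_data['Minimum_Order'] = test_data['Minimum_Order'].str.replace(r'\D', '', regex=True)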

In [49]:
test_data.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews
0,ID_2842,"Mico Layout, Stage 2, BTM Layout,Bangalore",3,350,50,4.2,361,225
1,ID_730,"Mico Layout, Stage 2, BTM Layout,Bangalore",2,100,50,NEW,-,-
2,ID_4620,"Sector 1, Noida",1,100,50,3.6,36,16
3,ID_5470,"Babarpur, New Delhi, Delhi",5,200,50,3.6,66,33
4,ID_3249,"Sector 1, Noida",2,150,50,2.9,38,14


In [50]:
test_data['Average_Cost'].value_counts()

200     820
100     664
150     589
250     223
300     173
50       72
350      71
400      64
600      30
500      15
550      13
450       9
650       8
800       6
850       4
1000      4
700       4
750       3
1200      2
Name: Average_Cost, dtype: int64

In [51]:
test_data['Minimum_Order'].value_counts()

50     2556
99      177
0        30
199       5
200       2
399       1
500       1
89        1
149       1
Name: Minimum_Order, dtype: int64

In [52]:
# Converting 'Average_Cost' and 'Minimum_Order' to int
test_data['Average_Cost'] = test_data['Average_Cost'].astype(int)
test_data['Minimum_Order'] = test_data['Minimum_Order'].astype(int)

In [53]:
# Round 'Minimum_Order' to the nearest multiple of 50, reusing rou() defined in the train-data section
test_data['Minimum_Order'] = test_data['Minimum_Order'].apply(rou)

In [54]:
test_data['Minimum_Order'].value_counts()

50     2556
100     178
0        30
200       7
500       1
400       1
150       1
Name: Minimum_Order, dtype: int64

In [55]:
test_data['Rating'].value_counts()

-               305
3.6             223
3.9             216
3.7             212
NEW             200
3.5             197
3.4             185
3.8             183
3.3             153
4.0             141
3.2             129
3.1             120
4.1             115
4.2              70
3.0              65
2.9              57
4.3              52
2.8              41
4.4              29
2.7              22
4.5              18
2.6               9
4.6               7
4.7               6
2.5               6
2.4               5
4.8               3
Opening Soon      2
2.3               2
2.1               1
Name: Rating, dtype: int64

In [56]:
# In Rating feature, converting '-', 'NEW', 'Opening Soon'  values to 0(zero)
test_data['Rating'] = test_data['Rating'].str.replace('-', '0').replace('NEW','0').replace('Opening Soon','0')

In [57]:
test_data['Rating'].value_counts()

0      507
3.6    223
3.9    216
3.7    212
3.5    197
3.4    185
3.8    183
3.3    153
4.0    141
3.2    129
3.1    120
4.1    115
4.2     70
3.0     65
2.9     57
4.3     52
2.8     41
4.4     29
2.7     22
4.5     18
2.6      9
4.6      7
2.5      6
4.7      6
2.4      5
4.8      3
2.3      2
2.1      1
Name: Rating, dtype: int64

In [58]:
# Convert 'Rating' to float
test_data['Rating'] = test_data['Rating'].astype(float)

In [59]:
test_data['Votes'].value_counts()

-      542
7       60
9       57
6       55
5       51
      ... 
882      1
785      1
929      1
511      1
527      1
Name: Votes, Length: 580, dtype: int64

In [60]:
# In Votes feature, converting '-' value to 0(zero)
test_data['Votes'] = test_data['Votes'].str.replace('-', '0')

In [61]:
test_data['Votes'].value_counts()

0       542
7        60
9        57
6        55
5        51
       ... 
1409      1
574       1
297       1
3577      1
153       1
Name: Votes, Length: 580, dtype: int64

In [62]:
# Convert Votes to int
test_data['Votes'] = test_data['Votes'].astype(int)

In [63]:
# Round 'Votes' to the nearest multiple of 50, reusing rou() defined in the train-data section
test_data['Votes'] = test_data['Votes'].apply(rou)

In [64]:
test_data['Votes'].value_counts()

0       1250
50       501
100      257
150      145
200      104
        ... 
5200       1
1800       1
1750       1
7800       1
3750       1
Name: Votes, Length: 64, dtype: int64

In [65]:
test_data['Reviews'].value_counts()

-       593
2       131
1       102
3        79
4        72
       ... 
246       1
1005      1
3503      1
170       1
135       1
Name: Reviews, Length: 392, dtype: int64

In [66]:
# In Reviews feature, converting '-' value to 0(zero)
test_data['Reviews'] = test_data['Reviews'].str.replace('-', '0')

In [67]:
# Convert Reviews to int
test_data['Reviews'] = test_data['Reviews'].astype(int)

In [68]:
# Round 'Reviews' to the nearest multiple of 50, reusing rou() defined in the train-data section
test_data['Reviews'] = test_data['Reviews'].apply(rou)

In [69]:
test_data['Reviews'].value_counts()

0       1681
50       489
100      188
150      108
200       62
250       44
300       36
350       29
400       17
450       16
550       12
600       11
650       11
700        6
1000       5
900        5
500        4
750        4
1650       3
1600       3
1450       3
1350       3
800        3
850        3
1200       2
1400       2
1550       2
1700       2
1900       2
2150       2
2300       2
1050       1
2400       1
3500       1
2900       1
2650       1
2550       1
2450       1
2050       1
1100       1
1500       1
1300       1
1250       1
3700       1
3850       1
Name: Reviews, dtype: int64

In [70]:
# Drop 'Restaurant' and 'Location' from test data set
test_data = test_data.drop(['Restaurant','Location'],axis=1)
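
The train and test sets go through the same cleaning steps, so the steps could live in one function that is applied to both frames. The following is a sketch under stated assumptions rather than the notebook's method: `clean` and `round_to_50` are hypothetical helpers, placeholders are detected with `pd.to_numeric(..., errors="coerce")` instead of explicit string replacement, and the sample frame is illustrative:

```python
import pandas as pd

def round_to_50(series: pd.Series) -> pd.Series:
    """Nearest multiple of 50 for non-negative integers; ties round down."""
    return (series + 24) // 50 * 50

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's feature engineering steps to a raw frame."""
    out = df.copy()
    out["Cuisines"] = out["Cuisines"].str.split(",").apply(len)

    # Strip currency symbols; entries without digits become NaN
    for col in ["Average_Cost", "Minimum_Order"]:
        digits = out[col].str.replace(r"\D", "", regex=True)
        out[col] = pd.to_numeric(digits, errors="coerce")

    # Caution: dropping rows is fine for training data, but a test set
    # must keep one row per prediction
    out = out.dropna(subset=["Average_Cost", "Minimum_Order"]).copy()
    out["Average_Cost"] = out["Average_Cost"].astype(int)
    out["Minimum_Order"] = round_to_50(out["Minimum_Order"].astype(int))

    # '-', 'NEW' and similar placeholders are treated as 0, as in the notebook
    out["Rating"] = pd.to_numeric(out["Rating"], errors="coerce").fillna(0.0)
    for col in ["Votes", "Reviews"]:
        counts = pd.to_numeric(out[col], errors="coerce").fillna(0).astype(int)
        out[col] = round_to_50(counts)

    return out.drop(columns=["Restaurant", "Location"])

# Illustrative rows shaped like the raw data
sample = pd.DataFrame({
    "Restaurant": ["ID_1", "ID_2"],
    "Location": ["Sector 1, Noida", "Majestic"],
    "Cuisines": ["Fast Food, Rolls, Burger", "Bakery"],
    "Average_Cost": ["₹200", "₹100"],
    "Minimum_Order": ["₹99", "₹50"],
    "Rating": ["3.5", "NEW"],
    "Votes": ["36", "-"],
    "Reviews": ["16", "-"],
})
cleaned = clean(sample)
print(cleaned)
```

Applying one function to both frames keeps the two pipelines from drifting apart when a step is changed.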

## Final Train and Test data set

In [71]:
train_data.head()

Unnamed: 0,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews,Delivery_Time
0,5,200,50,3.5,0,0,30 minutes
1,2,100,50,3.5,0,0,30 minutes
2,3,150,50,3.6,100,50,65 minutes
3,3,250,100,3.7,200,100,30 minutes
4,2,200,100,3.2,500,250,65 minutes


In [72]:
test_data.head()

Unnamed: 0,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews
0,3,350,50,4.2,350,200
1,2,100,50,0.0,0,0
2,1,100,50,3.6,50,0
3,5,200,50,3.6,50,50
4,2,150,50,2.9,50,0


## Splitting data into train and test

In [73]:
x_train = train_data.drop(['Delivery_Time'],axis = 1)
y_train = train_data['Delivery_Time']
x_test = test_data

## Model Fitting 

In [74]:
lr = LogisticRegression()
dt = DecisionTreeClassifier()
rf = RandomForestClassifier(n_estimators=100)
knn = KNeighborsClassifier(n_neighbors=7)
nb = GaussianNB()
xgb = XGBClassifier()
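
The target here is a string label such as "30 minutes". scikit-learn classifiers accept string labels directly, but recent xgboost releases expect integer class labels starting at 0, so the target may need encoding before it is passed to `XGBClassifier`. A minimal sketch using scikit-learn's `LabelEncoder`; the sample labels are illustrative:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

y_sample = pd.Series(["30 minutes", "45 minutes", "65 minutes", "30 minutes"])  # illustrative labels

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_sample)       # integer codes 0..k-1
y_decoded = encoder.inverse_transform(y_encoded)  # back to the original strings

# Classes are sorted as strings, so the codes do not follow delivery duration
print(list(encoder.classes_), list(y_encoded))
```

Predictions made on encoded labels would need `inverse_transform` before being written to a submission file.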

In [75]:
lr_score = cross_val_score(lr, x_train, y_train, cv=4, scoring='accuracy')
dt_score = cross_val_score(dt, x_train, y_train, cv=4, scoring='accuracy')
rf_score = cross_val_score(rf, x_train, y_train, cv=4, scoring='accuracy')
knn_score = cross_val_score(knn, x_train, y_train, cv=4, scoring='accuracy')
nb_score = cross_val_score(nb, x_train, y_train, cv=4, scoring='accuracy')
xgb_score = cross_val_score(xgb, x_train, y_train, cv=4, scoring='accuracy')



In [76]:
print('Logistic Regression: ',lr_score)
print('Decision Tree: ',dt_score)
print('Random Forest: ',rf_score)
print('K Nearest Neighbors: ',knn_score)
print('Naive Bayes: ',nb_score)
print('XGBoost: ',xgb_score)

Logistic Regression:  [0.69596542 0.69069935 0.6955267  0.68964273]
Decision Tree:  [0.73919308 0.74729632 0.73845599 0.73944424]
Random Forest:  [0.74819885 0.75486662 0.752886   0.74630097]
K Nearest Neighbors:  [0.69380403 0.69899063 0.71176046 0.70119091]
Naive Bayes:  [0.27809798 0.24008652 0.12842713 0.15698304]
XGBoost:  [0.72334294 0.72386446 0.72222222 0.71706965]


In [77]:
# Mean accuracy across the 4 folds
l_m = lr_score.mean()
d_m = dt_score.mean()
r_m = rf_score.mean()
k_m = knn_score.mean()
n_m = nb_score.mean()
x_m = xgb_score.mean()

# Standard deviation of accuracy across the 4 folds
l_s = lr_score.std()
d_s = dt_score.std()
r_s = rf_score.std()
k_s = knn_score.std()
n_s = nb_score.std()
x_s = xgb_score.std()

In [78]:
score = pd.DataFrame({'Model':['Logistic Regression','Decision Tree','Random Forest','K Nearest Neighbors','Naive Bayes','XGBoost'],
        'Accuracy':[l_m,d_m,r_m,k_m,n_m,x_m],
        'Stdev':[l_s, d_s, r_s, k_s, n_s, x_s]})
average_scores = score.sort_values(by='Accuracy', ascending=False)
average_scores

Unnamed: 0,Model,Accuracy,Stdev
2,Random Forest,0.750563,0.003452
1,Decision Tree,0.741097,0.003597
5,XGBoost,0.721625,0.002696
3,K Nearest Neighbors,0.701437,0.006536
0,Logistic Regression,0.692959,0.002817
4,Naive Bayes,0.200899,0.060573
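
The per-model variables for scores, means, and standard deviations can be collapsed into one loop over a dictionary of models. The sketch below uses synthetic data from `make_classification` as a stand-in for `x_train` and `y_train`, and only a subset of the models, so it runs on its own; these choices are assumptions for illustration:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for x_train / y_train (not the hackathon data)
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

rows = []
for name, model in models.items():
    # For classifiers, an integer cv uses stratified folds
    fold_scores = cross_val_score(model, X, y, cv=4, scoring="accuracy")
    rows.append({"Model": name,
                 "Accuracy": fold_scores.mean(),
                 "Stdev": fold_scores.std()})

summary = pd.DataFrame(rows).sort_values(by="Accuracy", ascending=False)
print(summary)
```

Adding another model then takes one dictionary entry instead of four new variables.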


In [82]:
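# Refit the best cross-validated model (Random Forest) on the full training set,
# then predict delivery times for the test set
rf.fit(x_train,y_train)
y_test = rf.predict(x_test)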

In [83]:
output = pd.DataFrame({'Delivery_Time': y_test})
output.to_csv('/home/aniruddha/Projects/Food Delivery Time/my_submission.csv', index=False)
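
Before uploading, it can be worth checking that the submission has one prediction per test row and contains only labels seen in training. A minimal sketch with stand-in values; `n_test_rows`, `y_pred`, and `known_labels` are hypothetical names, and an in-memory buffer replaces the real file path:

```python
import io
import pandas as pd

# Stand-ins for the notebook's objects (assumptions so the sketch runs on its own)
n_test_rows = 5
y_pred = ["30 minutes", "45 minutes", "30 minutes", "65 minutes", "30 minutes"]
known_labels = {"10 minutes", "20 minutes", "30 minutes", "45 minutes",
                "65 minutes", "80 minutes", "120 minutes"}

output = pd.DataFrame({"Delivery_Time": y_pred})

# One prediction per test row, and no label outside the training classes
row_count_ok = len(output) == n_test_rows
labels_ok = set(output["Delivery_Time"]) <= known_labels

# Write to an in-memory buffer and read it back to confirm the CSV layout
buffer = io.StringIO()
output.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(row_count_ok, labels_ok, list(round_trip.columns))
```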