### Predicting Restaurant Food Delivery Hackathon by MachineHack

This is one of many approaches to solving the "Predicting Restaurant Food Delivery Hackathon", the latest hackathon by MachineHack.

This tutorial is for all Data Science enthusiasts who have just begun the journey. Use this tutorial to learn and submit your predictions at MachineHack.

Check out the details here: https://www.machinehack.com/course/predicting-food-delivery-time-hackathon-by-ims-proschool/


#### Data Description

The entire world is transforming digitally and our relationship with technology has grown exponentially over the last few years. We have grown closer to technology, and it has made our life a lot easier by saving time and effort. Today everything is accessible with smartphones — from groceries to cooked food and from medicines to doctors. In this hackathon, we provide you with data that is a by-product as well as a thriving proof of this growing relationship. 

When was the last time you ordered food online? And how long did it take to reach you?

In this hackathon, we are providing you with data from thousands of restaurants in India regarding the time they take to deliver food for online orders. As data scientists, your goal is to predict the online order delivery time based on the given factors.

Analytics India Magazine and IMS Proschool bring to you ‘Predicting Food Delivery Time Hackathon’.

Size of training set: *11,094 records*

Size of test set: *2,774 records*

**FEATURES:**

* Restaurant: A unique ID that represents a restaurant.
* Location: The location of the restaurant.
* Cuisines: The cuisines offered by the restaurant.
* Average_Cost: The average cost for one person/order.
* Minimum_Order: The minimum order amount.
* Rating: Customer rating for the restaurant.
* Votes: The total number of customer votes for the restaurant.
* Reviews: The number of customer reviews for the restaurant.
* Delivery_Time: The order delivery time of the restaurant. (Target Classes) 


##### Contents :

1. Reading Data

In [1]:
## Importing the required packages

import numpy as np
import pandas as pd

In [2]:
train = pd.read_excel('./Participants Data/Data_Train.xlsx')
test = pd.read_excel('./Participants Data/Data_Test.xlsx')

In [3]:
train.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews,Delivery_Time
0,ID_6321,"FTI College, Law College Road, Pune","Fast Food, Rolls, Burger, Salad, Wraps",₹200,₹50,3.5,12,4,30 minutes
1,ID_2882,"Sector 3, Marathalli","Ice Cream, Desserts",₹100,₹50,3.5,11,4,30 minutes
2,ID_1595,Mumbai Central,"Italian, Street Food, Fast Food",₹150,₹50,3.6,99,30,65 minutes
3,ID_5929,"Sector 1, Noida","Mughlai, North Indian, Chinese",₹250,₹99,3.7,176,95,30 minutes
4,ID_6123,"Rmz Centennial, I Gate, Whitefield","Cafe, Beverages",₹200,₹99,3.2,521,235,65 minutes


In [4]:
train.shape

(11094, 9)

In [5]:
test.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews
0,ID_2842,"Mico Layout, Stage 2, BTM Layout,Bangalore","North Indian, Chinese, Assamese",₹350,₹50,4.2,361,225
1,ID_730,"Mico Layout, Stage 2, BTM Layout,Bangalore","Biryani, Kebab",₹100,₹50,NEW,-,-
2,ID_4620,"Sector 1, Noida",Fast Food,₹100,₹50,3.6,36,16
3,ID_5470,"Babarpur, New Delhi, Delhi","Mithai, North Indian, Chinese, Fast Food, Sout...",₹200,₹50,3.6,66,33
4,ID_3249,"Sector 1, Noida","Chinese, Fast Food",₹150,₹50,2.9,38,14


In [6]:
test.shape

(2774, 8)

In [7]:
# Checking for the null values

train.isnull().sum()

Restaurant       0
Location         0
Cuisines         0
Average_Cost     0
Minimum_Order    0
Rating           0
Votes            0
Reviews          0
Delivery_Time    0
dtype: int64

In [8]:
# Checking for the null values

test.isnull().sum()

Restaurant       0
Location         0
Cuisines         0
Average_Cost     0
Minimum_Order    0
Rating           0
Votes            0
Reviews          0
dtype: int64
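
`isnull()` reports no missing values, yet `test.head()` above shows placeholders such as `NEW` and `-` in `Rating`, `Votes` and `Reviews`. A minimal sketch of how such placeholders could be counted follows. The `sample` frame and the helper name `count_non_numeric` are assumptions made for illustration, not part of the hackathon data or of pandas.

```python
import pandas as pd

# Hypothetical sample mimicking the Rating/Votes/Reviews values shown in test.head()
sample = pd.DataFrame({
    "Rating": ["4.2", "NEW", "3.6"],
    "Votes": ["361", "-", "36"],
    "Reviews": ["225", "-", "16"],
})

def count_non_numeric(frame, columns):
    """Return, per column, how many non-null entries cannot be parsed as numbers."""
    counts = {}
    for col in columns:
        parsed = pd.to_numeric(frame[col], errors="coerce")
        # Entries that were present originally but became NaN are placeholders
        counts[col] = int((parsed.isna() & frame[col].notna()).sum())
    return counts

placeholder_counts = count_non_numeric(sample, ["Rating", "Votes", "Reviews"])
print(placeholder_counts)
```

Running the same helper on `train` and `test` should show how many rows need special handling before the columns can be treated as numeric.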

In [9]:
print(train.columns)
print('-'*70)
print(test.columns)

Index(['Restaurant', 'Location', 'Cuisines', 'Average_Cost', 'Minimum_Order',
       'Rating', 'Votes', 'Reviews', 'Delivery_Time'],
      dtype='object')
----------------------------------------------------------------------
Index(['Restaurant', 'Location', 'Cuisines', 'Average_Cost', 'Minimum_Order',
       'Rating', 'Votes', 'Reviews'],
      dtype='object')


In [10]:
train.Delivery_Time.unique()

# 10 min
# 20 min
# 30 min
# 45 min
# 65 min
# 80 min
# 120 min

array(['30 minutes', '65 minutes', '45 minutes', '10 minutes',
       '20 minutes', '120 minutes', '80 minutes'], dtype=object)
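
The target is a set of seven class labels stored as strings. If a numeric form is needed (for ordering the classes, for example), the leading digits can be extracted. This is only a sketch on a hypothetical sample in the same label format; it is not necessarily how the final model should encode the target.

```python
import pandas as pd

# Hypothetical sample using the label format shown above
delivery = pd.Series(["30 minutes", "65 minutes", "120 minutes", "10 minutes"])

# Take the leading digits of each label and convert them to integers
delivery_minutes = delivery.str.extract(r"(\d+)", expand=False).astype(int)
print(delivery_minutes.tolist())
```

On the real data, `delivery` would be replaced with `train['Delivery_Time']`.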

In [11]:
train.Location.unique()

array(['FTI College, Law College Road, Pune', 'Sector 3, Marathalli',
       'Mumbai Central', 'Sector 1, Noida',
       'Rmz Centennial, I Gate, Whitefield', 'Delhi University-GTB Nagar',
       'Yerawada, Pune, Maharashtra',
       'Delhi Administration Flats, Timarpur', 'Moulali, Kolkata',
       'Dockyard Road, Mumbai CST Area', 'Pune University',
       'Gora Bazar, Rajbari, North Dumdum, Kolkata',
       'D-Block, Sector 63, Noida', 'Sector 14, Noida',
       'Mico Layout, Stage 2, BTM Layout,Bangalore',
       'Laxman Vihar Industrial Area, Sector 3A, Gurgoan',
       'Tiretti, Kolkata', 'Sandhurst Road, Mumbai CST Area',
       'MG Road, Pune', 'Hyderabad Public School, Begumpet', 'Majestic',
       'Chandni Chowk, Kolkata', 'Delhi High Court, India Gate',
       'Chatta Bazaar, Malakpet, Hyderabad', 'Sector 63A,Gurgaon',
       'Delhi Cantt.', 'Tejas Nagar Colony, Wadala West, Mumbai',
       'Babarpur, New Delhi, Delhi', 'Nathan Road, Mangaldas Road, Pune',
       'Panjetan C

In [12]:
test.Location.unique()

array(['Mico Layout, Stage 2, BTM Layout,Bangalore', 'Sector 1, Noida',
       'Babarpur, New Delhi, Delhi', 'Yerawada, Pune, Maharashtra',
       'Raja Bazar, Kolkata', 'Sector 3, Marathalli',
       'D-Block, Sector 63, Noida', 'Delhi University-GTB Nagar',
       'Delhi High Court, India Gate', 'BTM Layout 1, Electronic City',
       'Sector 14, Noida',
       'Laxman Vihar Industrial Area, Sector 3A, Gurgoan',
       'FTI College, Law College Road, Pune',
       'Sandhurst Road, Mumbai CST Area',
       'Nathan Road, Mangaldas Road, Pune', 'Majestic',
       'Dockyard Road, Mumbai CST Area', 'Delhi Cantt.', 'Mumbai Central',
       'Tiretti, Kolkata', 'Sector 63A,Gurgaon',
       'Rmz Centennial, I Gate, Whitefield', 'MG Road, Pune',
       'Tejas Nagar Colony, Wadala West, Mumbai', 'Moulali, Kolkata',
       'Delhi Administration Flats, Timarpur', 'Chandni Chowk, Kolkata',
       'Musi Nagar, Malakpet, Hyderabad', 'Pune University',
       'Panjetan Colony, Malakpet, Hyderabad',
 

In [252]:
train.Minimum_Order.unique()

array(['₹50', '₹99', '₹0', '₹200', '₹450', '₹350', '₹79', '₹400', '₹199',
       '₹500', '₹250', '₹150', '₹90', '₹299', '₹300', '₹240', '₹89',
       '₹59'], dtype=object)
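
`Minimum_Order` (and `Average_Cost`) are stored as strings with a rupee sign, so they need to be converted before modelling. A possible approach is sketched below. The helper name `rupees_to_number` is hypothetical, and the `₹1,200` and `n/a` entries are invented to show how separators and unparseable values would be handled; they are not taken from the data.

```python
import pandas as pd

def rupees_to_number(series):
    """Strip the rupee sign and thousands separators, then parse as numbers."""
    cleaned = (
        series.astype(str)
        .str.replace("₹", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    # Anything that still is not a number becomes NaN instead of raising
    return pd.to_numeric(cleaned, errors="coerce")

sample = pd.Series(["₹50", "₹99", "₹0", "₹1,200", "n/a"])
minimum_order = rupees_to_number(sample)
print(minimum_order.tolist())
```

Using `errors="coerce"` keeps the conversion from failing on unexpected values; the resulting NaN entries can then be inspected or imputed separately.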

In [13]:
# Combining the training and test sets to analyse the data and find patterns

data_temp = [train[['Restaurant', 'Location', 'Cuisines', 'Average_Cost', 'Minimum_Order',
       'Rating', 'Votes', 'Reviews']], test]

data_temp = pd.concat(data_temp)

data_temp.head()

Unnamed: 0,Restaurant,Location,Cuisines,Average_Cost,Minimum_Order,Rating,Votes,Reviews
0,ID_6321,"FTI College, Law College Road, Pune","Fast Food, Rolls, Burger, Salad, Wraps",₹200,₹50,3.5,12,4
1,ID_2882,"Sector 3, Marathalli","Ice Cream, Desserts",₹100,₹50,3.5,11,4
2,ID_1595,Mumbai Central,"Italian, Street Food, Fast Food",₹150,₹50,3.6,99,30
3,ID_5929,"Sector 1, Noida","Mughlai, North Indian, Chinese",₹250,₹99,3.7,176,95
4,ID_6123,"Rmz Centennial, I Gate, Whitefield","Cafe, Beverages",₹200,₹99,3.2,521,235


In [14]:
data_temp.shape

(13868, 8)

In [15]:
# Finding the number of unique restaurants
data_temp.Restaurant.nunique()

8661
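
With 8661 unique IDs across 13868 rows, some restaurants appear more than once. It can be useful to know how many test restaurants also occur in the training set. The sketch below uses hypothetical IDs; on the real data the two Series would be `train['Restaurant']` and `test['Restaurant']`.

```python
import pandas as pd

# Hypothetical IDs standing in for train['Restaurant'] and test['Restaurant']
train_ids = pd.Series(["ID_1", "ID_2", "ID_2", "ID_3"])
test_ids = pd.Series(["ID_2", "ID_4"])

# Restaurants present in both sets, and test restaurants never seen in training
shared = set(train_ids) & set(test_ids)
unseen_in_train = set(test_ids) - set(train_ids)

# Fraction of test rows whose restaurant also appears in the training set
coverage = test_ids.isin(train_ids).mean()
print(shared, unseen_in_train, coverage)
```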

In [16]:
# Analysing cuisines 

cuisines = list(data_temp['Cuisines'])

maxim = 1
for i in cuisines :
    if len(i.split(',')) > maxim:
        maxim = len(i.split(','))

print("\n\nMaximum cuisines in a Cell : ", maxim)    

all_cuisines = []

for i in cuisines :
    if len(i.split(',')) == 1:
        #print(i.split(',')[0])
        all_cuisines.append(i.split(',')[0].strip().upper())
    else :
        for it in range(len(i.split(','))):
            #print(i.split(',')[it])
            all_cuisines.append(i.split(',')[it].strip().upper())

print("\n\nNumber of Unique Cuisines : ", len(pd.Series(all_cuisines).unique()))
print("\n\nUnique Cuisines:\n", pd.Series(all_cuisines).unique())

all_cuisines = list(pd.Series(all_cuisines).unique())



Maximum cuisines in a Cell :  8


Number of Unique Cuisines :  101


Unique Cuisines:
 ['FAST FOOD' 'ROLLS' 'BURGER' 'SALAD' 'WRAPS' 'ICE CREAM' 'DESSERTS'
 'ITALIAN' 'STREET FOOD' 'MUGHLAI' 'NORTH INDIAN' 'CHINESE' 'CAFE'
 'BEVERAGES' 'SOUTH INDIAN' 'THAI' 'ASIAN' 'MITHAI' 'MOMOS' 'INDONESIAN'
 'BIRYANI' 'KERALA' 'BIHARI' 'MEXICAN' 'JAPANESE' 'BAKERY' 'BURMESE'
 'BUBBLE TEA' 'TEA' 'PIZZA' 'LUCKNOWI' 'MANGALOREAN' 'EUROPEAN'
 'CONTINENTAL' 'SANDWICH' 'HEALTHY FOOD' 'BENGALI' 'AMERICAN' 'MISHTI'
 'HYDERABADI' 'ANDHRA' 'ASSAMESE' 'MAHARASHTRIAN' 'GERMAN' 'ARABIAN'
 'FINGER FOOD' 'KEBAB' 'CHETTINAD' 'SEAFOOD' 'JUICES' 'PARSI' 'SUSHI'
 'ODIA' 'TAMIL' 'CANTONESE' 'NORTH EASTERN' 'TIBETAN' 'LEBANESE' 'SPANISH'
 'BAR FOOD' 'KONKAN' 'PAAN' 'STEAK' 'MEDITERRANEAN' 'BOHRI' 'AFGHAN'
 'GOAN' 'GUJARATI' 'BBQ' 'RAW MEATS' 'MALAYSIAN' 'VIETNAMESE' 'SRI LANKAN'
 'RAJASTHANI' 'POKÉ' 'ROAST CHICKEN' 'COFFEE' 'BRAZILIAN' 'BELGIAN' 'NAGA'
 'KOREAN' 'MODERN INDIAN' 'AWADHI' 'KASHMIRI' 'FRENCH' 'PORTUGUES
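
The same cuisine analysis can also be written with pandas string methods instead of explicit loops. This is an alternative sketch on a small hypothetical sample, not the code that produced the output above.

```python
import pandas as pd

# Hypothetical sample standing in for data_temp['Cuisines']
sample = pd.Series(["Fast Food, Rolls, Burger", "Ice Cream, Desserts", "fast food"])

# Split each cell on commas; the result holds one list of cuisine names per row
split_cuisines = sample.str.split(",")
max_per_cell = int(split_cuisines.str.len().max())

# One row per cuisine, normalised the same way as the loop version (strip + upper)
flat = split_cuisines.explode().str.strip().str.upper()
unique_cuisines = flat.unique().tolist()

print(max_per_cell)
print(unique_cuisines)
```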

In [17]:
# Analysing Location

all_cities = list(data_temp['Location'])

for i in range(len(all_cities)):
    if type(all_cities[i]) == float:
        all_cities[i] = 'NOT AVAILABLE'
    all_cities[i] = all_cities[i].strip().upper()
        
print("\n\nNumber of Unique Locations (Including NOT AVAILABLE): ", len(pd.Series(all_cities).unique()))
print("\n\nUnique Locations:\n", pd.Series(all_cities).unique())
 
all_cities = list(pd.Series(all_cities).unique())



Number of Unique Locations (Including NOT AVAILABLE):  35


Unique Locations:
 ['FTI COLLEGE, LAW COLLEGE ROAD, PUNE' 'SECTOR 3, MARATHALLI'
 'MUMBAI CENTRAL' 'SECTOR 1, NOIDA' 'RMZ CENTENNIAL, I GATE, WHITEFIELD'
 'DELHI UNIVERSITY-GTB NAGAR' 'YERAWADA, PUNE, MAHARASHTRA'
 'DELHI ADMINISTRATION FLATS, TIMARPUR' 'MOULALI, KOLKATA'
 'DOCKYARD ROAD, MUMBAI CST AREA' 'PUNE UNIVERSITY'
 'GORA BAZAR, RAJBARI, NORTH DUMDUM, KOLKATA' 'D-BLOCK, SECTOR 63, NOIDA'
 'SECTOR 14, NOIDA' 'MICO LAYOUT, STAGE 2, BTM LAYOUT,BANGALORE'
 'LAXMAN VIHAR INDUSTRIAL AREA, SECTOR 3A, GURGOAN' 'TIRETTI, KOLKATA'
 'SANDHURST ROAD, MUMBAI CST AREA' 'MG ROAD, PUNE'
 'HYDERABAD PUBLIC SCHOOL, BEGUMPET' 'MAJESTIC' 'CHANDNI CHOWK, KOLKATA'
 'DELHI HIGH COURT, INDIA GATE' 'CHATTA BAZAAR, MALAKPET, HYDERABAD'
 'SECTOR 63A,GURGAON' 'DELHI CANTT.'
 'TEJAS NAGAR COLONY, WADALA WEST, MUMBAI' 'BABARPUR, NEW DELHI, DELHI'
 'NATHAN ROAD, MANGALDAS ROAD, PUNE'
 'PANJETAN COLONY, MALAKPET, HYDERABAD' 'RAJA BAZAR, KOLKATA'
 '

In [239]:
train.dtypes

Restaurant       object
Location         object
Cuisines         object
Average_Cost     object
Minimum_Order    object
Rating           object
Votes            object
Reviews          object
Delivery_Time    object
dtype: object

In [41]:
a = list(data_temp['Location'])
a = a[0].split(',')
a[-1]

['FTI College, Law College Road, Pune',
 'Sector 3, Marathalli',
 'Mumbai Central',
 'Sector 1, Noida',
 'Rmz Centennial, I Gate, Whitefield',
 'Rmz Centennial, I Gate, Whitefield',
 'Mumbai Central',
 'Delhi University-GTB Nagar',
 'Delhi University-GTB Nagar',
 'Sector 1, Noida',
 'Mumbai Central',
 'Yerawada, Pune, Maharashtra',
 'Sector 1, Noida',
 'Delhi University-GTB Nagar',
 'Delhi Administration Flats, Timarpur',
 'Moulali, Kolkata',
 'Sector 1, Noida',
 'Dockyard Road, Mumbai CST Area',
 'Pune University',
 'Gora Bazar, Rajbari, North Dumdum, Kolkata',
 'D-Block, Sector 63, Noida',
 'Sector 14, Noida',
 'D-Block, Sector 63, Noida',
 'D-Block, Sector 63, Noida',
 'Sector 3, Marathalli',
 'Mico Layout, Stage 2, BTM Layout,Bangalore',
 'D-Block, Sector 63, Noida',
 'Mumbai Central',
 'Dockyard Road, Mumbai CST Area',
 'Sector 1, Noida',
 'Gora Bazar, Rajbari, North Dumdum, Kolkata',
 'D-Block, Sector 63, Noida',
 'FTI College, Law College Road, Pune',
 'D-Block, Sector 63, Noida

In [31]:
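locations = list(data_temp['Location'])

# Print the last comma-separated part of every location (usually the city or area)
for loc in locations:
    parts = loc.split(',')
    print(parts[-1])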

 Pune
 Marathalli
Mumbai Central
 Noida
 Whitefield
 Whitefield
Mumbai Central
Delhi University-GTB Nagar
Delhi University-GTB Nagar
 Noida
Mumbai Central
 Maharashtra
 Noida
Delhi University-GTB Nagar
 Timarpur
 Kolkata
 Noida
 Mumbai CST Area
Pune University
 Kolkata
 Noida
 Noida
 Noida
 Noida
 Marathalli
Bangalore
 Noida
Mumbai Central
 Mumbai CST Area
 Noida
 Kolkata
 Noida
 Pune
 Noida
 Gurgoan
 Noida
Pune University
 Kolkata
Delhi University-GTB Nagar
 Kolkata
 Mumbai CST Area
Mumbai Central
 Pune
 Begumpet
 Kolkata
Majestic
 Noida
Bangalore
Delhi University-GTB Nagar
Delhi University-GTB Nagar
 Pune
 Pune
 Mumbai CST Area
 Pune
 Noida
 Kolkata
 Pune
Delhi University-GTB Nagar
Bangalore
 Timarpur
 Pune
 Noida
 Maharashtra
 Marathalli
 Noida
 Noida
 Mumbai CST Area
Bangalore
 India Gate
Mumbai Central
 Whitefield
 Noida
 Noida
 Kolkata
Majestic
 Kolkata
 Timarpur
 Whitefield
Delhi University-GTB Nagar
 Noida
 Mumbai CST Area
 Noida
 Noida
 Noida
 Pune
 Noida
 Pune
 Noida
 Whitefi

 India Gate
 Pune
 Noida
Majestic
 Pune
Delhi University-GTB Nagar
 Noida
Bangalore
 Gurgoan
 Pune
Gurgaon
 Mumbai CST Area
 Whitefield
 Noida
Bangalore
 Kolkata
 Gurgoan
 Pune
Bangalore
Bangalore
 Noida
Delhi University-GTB Nagar
 Mumbai CST Area
Mumbai Central
Pune University
Majestic
Gurgaon
 Maharashtra
 Noida
Mumbai Central
 Gurgoan
 Pune
 Mumbai CST Area
 Noida
 Noida
 Marathalli
 Noida
Gurgaon
Bangalore
 Maharashtra
 Kolkata
 Hyderabad
 Gurgoan
 Timarpur
 Noida
Majestic
Bangalore
Mumbai Central
 Delhi
 Delhi
 Whitefield
Delhi University-GTB Nagar
 Pune
 Whitefield
 Hyderabad
 Timarpur
 Noida
 Pune
 Gurgoan
 Delhi
Mumbai Central
 Hyderabad
Pune University
 Marathalli
 India Gate
 Whitefield
 Mumbai CST Area
 Pune
 Noida
 Timarpur
 Pune
Gurgaon
Bangalore
 Kolkata
Delhi University-GTB Nagar
 Noida
 Timarpur
 Noida
 Pune
 Mumbai
Majestic
 Noida
 Mumbai CST Area
Majestic
 Noida
Delhi University-GTB Nagar
 Whitefield
 Maharashtra
 Noida
Pune University
Bangalore
 Timarpur
 Mumbai CST 

Bangalore
Bangalore
Delhi Cantt.
 Whitefield
Delhi University-GTB Nagar
Bangalore
Majestic
Majestic
 Gurgoan
 Whitefield
 Pune
 Whitefield
Pune University
 Pune
 Pune
 Noida
 Noida
 Mumbai CST Area
 Kolkata
 Mumbai CST Area
 Marathalli
 Maharashtra
 Whitefield
 Noida
 Pune
 Gurgoan
 Pune
 Noida
Majestic
 Maharashtra
 Noida
Delhi University-GTB Nagar
 Kolkata
 Pune
 Mumbai
 Marathalli
 Maharashtra
 Pune
 Maharashtra
Delhi Cantt.
 Maharashtra
Gurgaon
Gurgaon
 Gurgoan
Bangalore
 Pune
 Timarpur
 Noida
 India Gate
Delhi Cantt.
 Pune
 Kolkata
Delhi University-GTB Nagar
 Gurgoan
 Marathalli
 Maharashtra
 Hyderabad
 Mumbai CST Area
 Kolkata
 Kolkata
 Pune
Majestic
Bangalore
 Pune
 Noida
 Delhi
 Pune
 Whitefield
Bangalore
 Noida
 Pune
 Whitefield
 Noida
 Whitefield
 Noida
 Pune
 Timarpur
 Noida
Delhi University-GTB Nagar
 Mumbai CST Area
 Pune
 Noida
 Mumbai CST Area
 Noida
 Marathalli
 Delhi
 Noida
 Noida
 Marathalli
 Mumbai
 Kolkata
 Pune
 Marathalli
 Noida
Delhi Cantt.
 Pune
 Noida
 Whitefie

 Noida
 Timarpur
 Noida
 Maharashtra
 Timarpur
Delhi Cantt.
Pune University
Mumbai Central
Delhi University-GTB Nagar
 Mumbai CST Area
Bangalore
 Electronic City
 Timarpur
 Maharashtra
 Gurgoan
 Whitefield
 Pune
 Noida
 Pune
 Noida
Delhi University-GTB Nagar
 Maharashtra
 Mumbai CST Area
 Kolkata
Bangalore
 Timarpur
 Gurgoan
 Pune
 Whitefield
 Pune
 Gurgoan
Bangalore
 Pune
 Maharashtra
Mumbai Central
 Pune
Pune University
 Mumbai CST Area
 Noida
 Kolkata
Bangalore
 Maharashtra
Pune University
Bangalore
 Noida
 Noida
 India Gate
Mumbai Central
Delhi University-GTB Nagar
 Delhi
Majestic
 Hyderabad
 Noida
 Hyderabad
 Gurgoan
 India Gate
Delhi University-GTB Nagar
Delhi Cantt.
 Marathalli
 Noida
 Noida
 Pune
 Timarpur
 Noida
 Pune
 Electronic City
 Noida
Gurgaon
 Hyderabad
 Pune
 Kolkata
Gurgaon
 Hyderabad
 Mumbai CST Area
 Noida
 India Gate
 Delhi
 Noida
 Timarpur
 India Gate
 Timarpur
 Gurgoan
 Kolkata
 Gurgoan
 Marathalli
 Pune
Delhi University-GTB Nagar
Bangalore
Delhi Cantt.
 Delhi
 T

 Noida
Delhi University-GTB Nagar
 Pune
 Mumbai CST Area
 Timarpur
 Pune
 Noida
 Marathalli
 Pune
 Noida
Bangalore
 Timarpur
 Maharashtra
 Whitefield
 Pune
 Gurgoan
Bangalore
Delhi University-GTB Nagar
Delhi University-GTB Nagar
 Noida
 Hyderabad
Delhi Cantt.
 Noida
 Noida
 Kolkata
 Whitefield
Bangalore
Bangalore
 Noida
 Pune
 Gurgoan
 Noida
 Whitefield
 Pune
Mumbai Central
Delhi Cantt.
Pune University
Bangalore
Pune University
 Kolkata
 Gurgoan
 Noida
 Hyderabad
 Hyderabad
 Timarpur
 Delhi
 Maharashtra
 Maharashtra
Mumbai Central
Delhi University-GTB Nagar
Delhi Cantt.
 Whitefield
 Noida
 Kolkata
 Pune
 Kolkata
 Pune
 Hyderabad
 Pune
Mumbai Central
 Noida
 Hyderabad
Delhi Cantt.
Delhi University-GTB Nagar
Bangalore
 Mumbai
Majestic
 Pune
 Timarpur
 Noida
Bangalore
 Noida
Bangalore
 Noida
Delhi University-GTB Nagar
Mumbai Central
Bangalore
 Gurgoan
 Begumpet
 Noida
 Noida
Delhi University-GTB Nagar
 Noida
Delhi University-GTB Nagar
 Noida
 Delhi
 Pune
 Gurgoan
 Pune
 Begumpet
 Whitefie

In [69]:
samples = ['SECTOR 3, MARATHALLI','SECTOR 3, MARATHALLI, HYDERABAD']

for loc in samples:
    parts = loc.split(',')
    print(parts[-1])
    #print(parts)
    #if parts[-1].strip()=='MARATHALLI':
        #parts.append('BANGALORE')
    #print(parts)

 MARATHALLI
 HYDERABAD


In [143]:
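samples = ['SECTOR 3, MARATHALLI','SECTOR 3, MARATHALLI, HYDERABAD']
bang = ', BANGALORE'

# Note: this appends the suffix to every comma-separated part, not once per location
for loc in samples:
    parts = loc.split(',')
    my_new_list = [x + bang for x in parts]
    print(my_new_list)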

['SECTOR 3, BANGALORE', ' MARATHALLI, BANGALORE']
['SECTOR 3, BANGALORE', ' MARATHALLI, BANGALORE', ' HYDERABAD, BANGALORE']
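
The experiments above seem to aim at attaching a city to locations that end in a locality name. One way to do that is a small lookup applied to the last comma-separated part. The mapping below is a hypothetical starting point and an assumption of this sketch; it would need to be extended and verified against the full list of locations.

```python
# Hypothetical lookup from a trailing locality (or spelling variant) to a city
LOCALITY_TO_CITY = {
    "MARATHALLI": "BANGALORE",
    "WHITEFIELD": "BANGALORE",
    "GURGOAN": "GURGAON",  # spelling variant seen in the data
}

def last_part_as_city(location):
    """Return the last comma-separated part, trimmed, upper-cased and mapped through the lookup."""
    last = location.split(",")[-1].strip().upper()
    return LOCALITY_TO_CITY.get(last, last)

examples = [
    "Sector 3, Marathalli",
    "Sector 63A,Gurgaon",
    "Laxman Vihar Industrial Area, Sector 3A, Gurgoan",
    "MG Road, Pune",
]
cities = [last_part_as_city(loc) for loc in examples]
print(cities)
```

Stripping whitespace before the comparison matters here, because splitting on `,` leaves a leading space on most parts.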


In [150]:
my_list = ["foo, bar", "fob", "faz", "funk, bar"]


def list_append(p):
    # Return the extended list so the caller can reassign it
    return p + ["101"]

for item in my_list:
    parts = item.split(",")
    if parts[-1] == " bar":
        parts = list_append(parts)
    print(parts)

['foo', ' bar', '101']
['fob']
['faz']
['funk', ' bar', '101']


In [187]:
friends=["Rajendra V","Mini, Veeru","Veeru, Veeru"]
for i in range(len(friends)):
    print(friends[i])
    if friends[-1] == ' Veeru':
        friends.append('Pappu')
        print(friends)

#for i in friends:
#    a = i.split(",")
    
#a = a[-1]
#if a == ' Veeru':
#print(a)
#friends.append("Pappu")
#print(friends)


Rajendra V
Mini, Veeru
Veeru, Veeru


In [168]:
words = ['aba', 'xyz', 'xgx', 'dssd', 'sdjh']

for i in range(len(words)):
    print(words[i])

aba
xyz
xgx
dssd
sdjh


In [231]:
# Number of cuisines listed in each training row
for cuisine_cell in train['Cuisines']:
    parts = cuisine_cell.split(',')
    print(len(parts))
    #print(parts)

5
2
3
3
2
3
2
3
2
6
3
4
4
1
1
2
1
4
2
1
4
2
2
2
3
2
2
3
1
2
1
5
4
2
1
8
1
2
2
2
2
2
3
1
1
2
2
1
2
1
2
2
2
3
2
1
3
1
2
3
2
3
6
1
2
1
1
3
2
3
3
2
2
3
4
2
2
2
1
1
4
3
2
2
2
2
3
1
4
1
2
5
2
3
3
1
1
1
2
1
1
3
2
1
2
1
5
2
2
1
1
1
2
2
3
1
2
5
2
1
2
1
1
3
4
1
2
1
2
2
2
1
2
4
1
8
2
1
3
2
2
2
1
1
5
1
1
3
2
2
3
2
4
2
3
1
3
2
4
2
3
1
2
2
3
1
1
2
2
3
2
2
1
2
1
2
2
2
3
1
4
2
2
2
1
2
1
3
1
1
2
4
1
3
2
2
3
8
2
2
3
1
2
3
2
3
2
2
3
7
1
1
4
5
4
4
2
1
1
3
2
2
6
3
3
2
5
2
1
3
1
1
3
2
2
2
2
1
1
2
3
2
2
3
2
3
1
2
3
2
2
7
2
2
3
1
2
2
1
1
2
2
3
3
5
5
1
6
1
3
2
2
4
3
3
1
1
2
2
3
3
2
3
1
1
4
1
1
3
1
4
2
1
1
3
3
2
4
2
1
1
4
2
5
1
1
2
2
2
1
3
2
8
3
2
3
2
2
4
1
2
2
2
3
1
2
2
2
1
3
2
1
1
5
2
4
5
1
4
3
5
2
4
1
4
1
1
3
3
3
2
1
2
1
3
2
2
3
1
2
1
2
1
2
1
2
2
1
1
1
1
1
2
2
3
3
3
4
1
2
1
1
4
2
2
6
3
2
2
1
2
3
2
2
6
2
4
5
3
2
1
3
2
1
3
2
2
3
1
2
4
2
2
2
4
1
1
1
3
2
1
3
2
1
2
1
2
4
2
4
2
2
3
1
1
2
3
1
1
4
3
2
2
3
2
3
1
2
3
2
3
1
3
2
3
4
4
3
5
3
1
3
1
5
2
2
3
2
1
3
1
4
3
4
1
3
4
2
4
3
2
2
3
2
3
3
3
3
3
1
2
2
2
2
1
3
2
2
3
8


1
5
2
4
1
1
2
2
2
2
2
2
4
1
2
2
2
1
3
4
2
1
3
1
3
2
2
2
1
3
2
1
2
2
1
2
1
3
2
2
1
3
2
1
1
3
1
2
1
2
2
1
3
2
4
4
1
1
2
2
4
4
2
1
2
2
4
2
3
3
3
4
2
2
2
3
2
4
2
3
3
2
1
4
3
5
1
1
4
2
3
3
3
4
3
2
2
3
3
1
1
2
5
3
5
2
2
4
2
1
1
1
1
3
2
2
3
2
2
3
1
2
2
3
1
2
3
2
2
5
2
1
1
5
1
1
2
1
3
1
2
2
3
2
3
4
2
2
2
2
3
2
2
3
1
2
3
1
2
3
5
7
2
3
2
2
4
2
3
1
2
1
2
3
4
3
2
3
2
1
2
2
3
3
3
1
3
2
2
5
1
1
3
1
1
2
1
1
3
1
2
1
2
3
3
2
4
3
1
2
1
2
3
2
3
2
5
4
3
1
3
1
2
1
3
4
1
2
3
1
3
2
2
7
2
1
2
2
1
1
3
1
3
1
3
2
3
3
1
3
1
1
3
1
1
1
5
6
2
2
4
2
1
1
7
2
6
4
6
2
1
2
2
1
4
3
1
1
1
2
2
3
3
3
2
2
2
2
1
2
1
2
2
2
1
3
3
2
2
5
2
1
4
2
3
3
1
3
2
3
3
3
2
2
2
2
2
2
4
4
2
2
1
2
1
2
2
1
2
3
3
2
2
2
1
1
2
1
2
1
2
1
1
2
2
2
2
1
1
3
3
2
2
3
1
2
3
3
2
2
1
1
2
1
2
1
1
1
2
1
4
3
1
1
1
1
2
2
1
2
1
4
2
2
3
1
1
2
5
2
1
2
7
1
1
3
2
3
1
4
1
2
1
3
3
2
2
4
2
1
1
2
3
1
3
1
2
2
4
1
1
2
4
2
5
2
3
2
2
3
1
1
4
2
1
2
8
2
3
3
2
1
2
1
2
1
2
3
4
1
4
4
3
2
2
1
3
3
1
1
8
5
3
7
1
2
1
3
2
2
1
1
3
3
2
1
2
1
1
2
1
2
1
3
1
4
2
3
2
2
2
4
2
1
3
1
2
2
4
4


3
2
1
1
2
3
2
1
1
2
2
2
4
2
4
2
2
3
3
2
2
2
3
1
2
2
2
2
4
1
2
2
4
3
1
4
2
3
4
2
1
2
3
3
4
1
3
1
1


In [234]:
for i in train['Location']:
    a = i.split(',')
    print(len(a))
    #print(a)

3
2
1
2
3
3
1
1
1
2
1
3
2
1
2
2
2
2
1
4
3
2
3
3
2
4
3
1
2
2
4
3
3
3
3
2
1
2
1
2
2
1
2
2
2
1
3
4
1
1
3
3
2
2
2
2
2
1
4
2
2
2
3
2
2
2
2
4
2
1
3
3
2
2
1
2
2
3
1
2
2
3
3
2
2
2
2
3
3
3
2
2
4
3
1
3
2
3
1
1
2
1
4
1
3
2
3
3
2
2
2
4
2
3
2
3
3
3
2
4
3
3
1
3
4
3
3
3
4
2
3
3
4
2
1
2
1
3
1
3
3
3
2
3
3
3
1
2
1
4
3
1
2
3
3
2
3
1
2
2
2
1
2
2
2
3
3
2
3
1
3
3
3
3
3
2
1
2
3
1
1
3
3
3
3
2
1
1
1
3
2
4
3
2
2
1
1
3
2
1
2
3
4
4
3
2
3
3
3
2
1
2
1
4
4
2
1
3
3
2
1
2
3
2
1
3
2
3
1
2
3
2
4
2
2
3
2
2
1
2
3
3
2
4
3
3
1
2
3
2
1
3
2
1
2
2
3
2
2
4
2
1
2
3
3
2
3
2
2
3
2
2
4
3
3
2
2
2
2
2
2
2
2
1
1
3
2
2
3
1
3
1
3
4
3
1
1
2
1
3
1
2
3
2
3
2
2
2
2
3
2
2
4
1
2
2
2
4
4
2
2
3
3
1
2
1
1
3
3
4
3
3
4
4
2
3
2
1
3
3
3
3
3
2
4
3
1
2
2
3
2
2
2
2
3
2
1
4
1
2
2
2
2
2
3
2
3
2
2
2
2
2
4
4
4
2
4
3
2
4
3
3
2
3
2
3
3
1
2
4
2
1
1
3
1
2
2
2
1
2
1
3
3
3
4
3
2
3
2
2
2
2
2
2
3
3
2
2
2
4
2
3
3
3
3
2
3
2
4
3
2
4
3
3
1
2
3
1
2
3
1
1
2
3
1
3
2
1
2
1
2
1
1
1
2
2
3
1
3
3
1
2
4
2
1
1
2
1
2
3
1
2
2
3
3
2
2
2
3
2
3
3
2
2
2
3
3
2
1
3
1
2
3
3
1
2
2
3
3
2


2
2
3
1
2
2
2
1
3
3
2
3
1
2
4
2
2
3
4
3
2
2
3
2
4
2
1
3
3
4
4
3
3
3
2
2
3
3
4
2
2
3
3
3
1
3
3
3
2
3
3
1
2
3
2
3
3
3
4
2
2
3
2
3
4
3
3
2
2
1
2
2
2
3
3
2
1
1
2
2
4
2
3
3
3
3
2
3
3
3
1
2
3
4
3
2
2
3
1
2
2
2
2
3
1
1
1
3
1
1
3
3
2
2
3
3
1
3
3
1
3
2
4
1
4
2
3
2
2
1
2
2
4
4
2
4
2
3
2
3
2
3
2
3
2
2
2
1
2
2
2
3
3
3
2
3
3
2
2
1
3
2
3
3
3
4
3
2
1
2
2
2
3
3
1
2
4
1
1
2
3
1
3
4
2
4
3
1
2
2
4
2
4
2
3
2
2
3
4
4
2
2
2
2
3
3
1
2
3
3
2
3
1
1
3
2
3
3
4
3
4
2
3
3
1
2
2
4
3
2
2
4
2
3
2
2
3
4
2
2
3
4
1
2
2
2
1
1
2
4
3
2
2
3
2
2
2
1
1
4
2
2
4
3
3
3
2
1
1
3
2
1
2
3
3
2
3
4
4
2
1
2
2
3
2
3
3
3
2
3
2
3
3
1
3
2
2
1
2
1
1
3
4
2
3
3
2
3
3
2
1
2
1
3
2
3
2
3
2
2
3
2
2
2
4
2
3
2
1
3
2
4
3
2
3
3
2
2
2
2
2
2
3
3
1
3
2
2
3
2
2
3
4
1
2
3
3
4
2
2
2
3
1
2
1
3
2
2
2
2
3
2
3
3
2
3
1
4
1
2
1
4
2
3
2
4
2
2
3
2
2
2
3
3
2
1
3
3
1
1
2
2
3
2
3
1
3
2
3
3
3
3
1
4
4
1
1
3
3
2
3
1
2
3
3
2
1
3
3
1
3
1
3
2
2
1
2
1
1
3
1
3
4
4
4
2
2
1
2
3
3
3
3
3
4
2
3
2
3
2
4
2
3
2
3
1
2
3
2
3
1
3
2
2
4
2
3
2
3
3
4
3
3
2
3
2
2
1
3
1
2
2
2
2
1
3
3
4
2
4


1
4
2
3
3
1
2
2
3
3
2
3
1
2
3
2
3
1
2
2
1
2
3
3
2
1
1
2
2
3
3
3
2
1
1
1
3
1
2
1
1
1
3
3
2
3
2
4
2
2
3
1
2
3
4
2
1
1
4
2
3
1
3
2
2
4
1
1
3
4
3
3
4
2
3
3
3
3
2
1
2
1
2
4
1
1
1
2
2
2
2
2
1
4
2
4
2
1
1
1
1
1
2
1
2
4
2
2
1
3
3
3
3
3
2
4
2
2
3
3
2
3
1
2
1
2
3
4
2
2
3
2
3
3
2
2
2
3
1
2
2
3
2
3
1
1
2
2
1
2
2
3
2
2
2
2
4
3
2
1
3
3
3
3
1
3
2
4
2
2
1
3
3
3
1
3
2
3
3
2
1
3
2
2
2
3
1
3
3
2
1
2
3
3
2
4
3
3
3
1
3
3
2
3
3
2
2
3
1
2
3
2
3
3
3
2
3
3
3
3
3
3
1
4
3
1
3
3
3
1
3
1
3
2
1
3
2
2
3
2
3
1
3
3
3
2
2
1
4
2
1
3
2
2
2
3
3
3
3
2
3
2
2
3
2
3
2
3
3
1
3
1
2
3
4
3
2
1
3
2
3
2
1
1
1
3
2
2
3
3
1
2
4
2
2
2
4
4
1
1
2
4
4
1
2
1
4
2
2
3
3
2
3
1
2
3
3
3
3
3
2
2
1
3
2
1
3
4
2
4
1
3
2
3
1
1
3
4
2
3
4
3
2
3
2
2
1
2
4
1
3
2
2
2
1
3
2
4
4
1
1
2
2
3
2
2
3
2
1
1
1
1
2
4
2
2
3
3
3
3
2
2
2
1
3
2
2
4
2
3
3
3
3
3
4
3
3
1
2
1
2
2
2
4
3
1
4
2
2
2
1
1
3
1
3
2
3
3
2
1
1
2
2
2
3
2
3
3
2
2
2
3
3
2
2
3
2
2
2
3
2
2
2
2
3
2
3
2
3
1
4
1
3
2
2
2
2
1
3
2
3
2
1
3
2
2
2
2
2
4
2
4
3
3
2
3
1
3
3
2
3
3
4
2
2
3
1
2
2
3
2
2
2
2
1
1
4
4
2
3


In [240]:
train.Minimum_Order

0        ₹50
1        ₹50
2        ₹50
3        ₹99
4        ₹99
        ... 
11089    ₹50
11090    ₹50
11091    ₹50
11092    ₹50
11093    ₹50
Name: Minimum_Order, Length: 11094, dtype: object

In [None]:
# Data Cleaning

###############################################################################################################################################


# Cleaning Training Set
#______________________


#Cleaning CUISINES 

cuisines = list(train['Cuisines'])
   
# Since the maximum number of cuisines in a cell is 8, we will split the cuisines into 8 columns
   
C1 = []
C2 = []
C3 = []
C4 = []
C5 = []
C6 = []
C7 = []
C8 = []


for i in cuisines:
        try :
            C1.append(i.split(',')[0].strip().upper())
        except :
            C1.append('NONE')
        try :
            C2.append(i.split(',')[1].strip().upper())
        except :
            C2.append('NONE')
        try :
            C3.append(i.split(',')[2].strip().upper())
        except :
            C3.append('NONE')
        try :
            C4.append(i.split(',')[3].strip().upper())
        except :
            C4.append('NONE')
        try :
            C5.append(i.split(',')[4].strip().upper())
        except :
            C5.append('NONE')
        try :
            C6.append(i.split(',')[5].strip().upper())
        except :
            C6.append('NONE')
        try :
            C7.append(i.split(',')[6].strip().upper())
        except :
            C7.append('NONE')
        try :
            C8.append(i.split(',')[7].strip().upper())
        except :
            C8.append('NONE')

# appending NONE to Unique cuisines list
all_cuisines.append('NONE')

# Cleaning Location

location = list(train['Location'])
   
# The location strings seen above split into at most 4 comma-separated parts, so we use 4 columns
   
L1 = []
L2 = []
L3 = []
L4 = []


# Iterate over the locations (not the cuisines) to fill the location columns
for i in location:
        try :
            L1.append(i.split(',')[0].strip().upper())
        except :
            L1.append('NONE')
        try :
            L2.append(i.split(',')[1].strip().upper())
        except :
            L2.append('NONE')
        try :
            L3.append(i.split(',')[2].strip().upper())
        except :
            L3.append('NONE')
        try :
            L4.append(i.split(',')[3].strip().upper())
        except :
            L4.append('NONE')


# appending NONE to Unique locations list (all_cities, built in the Location analysis above)
if 'NONE' not in all_cities:
    all_cities.append('NONE')


# * Restaurant	* Location	 * Cuisines 	Average_Cost	Minimum_Order	 * Rating	 * Votes	* Reviews

    

# Cleaning Average Cost


# Cleaning Minimum order
    
    
#Cleaning Rating

rates = list(train['Rating'])

for i in range(len(rates)) :
    try:
        rates[i] = float(rates[i])
    except :
        rates[i] = np.nan


# Votes
       
votes = list(train['Votes'])

for i in range(len(votes)) :
    try:
        votes[i] = int(votes[i])
    except :
        votes[i] = np.nan 
    

# Reviews
       
reviews = list(train['Reviews'])

for i in range(len(reviews)) :
    try:
        reviews[i] = int(reviews[i])
    except :
        reviews[i] = np.nan

    
    

new_data_train = {}

new_data_train['RESTAURANT_ID'] = train["Restaurant"]
new_data_train['LOCATION1'] = L1
new_data_train['LOCATION2'] =L2
new_data_train['LOCATION3'] = L3
new_data_train['LOCATION4'] = L4
new_data_train['CUISINE1'] = C1
new_data_train['CUISINE2'] = C2
new_data_train['CUISINE3'] = C3
new_data_train['CUISINE4'] = C4
new_data_train['CUISINE5'] = C5
new_data_train['CUISINE6'] = C6
new_data_train['CUISINE7'] = C7
new_data_train['CUISINE8'] = C8
new_data_train['RATING'] = rates
new_data_train['VOTES'] = votes
new_data_train['REVIEWS'] = reviews

new_data_train = pd.DataFrame(new_data_train)
#______________________



#______________________
# Cleaning Test Set
#______________________




#Cleaning CUISINES 

cuisines = list(test['Cuisines'])
   
# Since the maximum number of cuisines in a cell is 8, we will split the cuisines into 8 columns
   
C1 = []
C2 = []
C3 = []
C4 = []
C5 = []
C6 = []
C7 = []
C8 = []


for i in cuisines:
        try :
            C1.append(i.split(',')[0].strip().upper())
        except :
            C1.append('NONE')
        try :
            C2.append(i.split(',')[1].strip().upper())
        except :
            C2.append('NONE')
        try :
            C3.append(i.split(',')[2].strip().upper())
        except :
            C3.append('NONE')
        try :
            C4.append(i.split(',')[3].strip().upper())
        except :
            C4.append('NONE')
        try :
            C5.append(i.split(',')[4].strip().upper())
        except :
            C5.append('NONE')
        try :
            C6.append(i.split(',')[5].strip().upper())
        except :
            C6.append('NONE')
        try :
            C7.append(i.split(',')[6].strip().upper())
        except :
            C7.append('NONE')
        try :
            C8.append(i.split(',')[7].strip().upper())
        except :
            C8.append('NONE')

# appending NONE to Unique cuisines list (only if it is not already there)
if 'NONE' not in all_cuisines:
    all_cuisines.append('NONE')

# Cleaning Location

location = list(test['Location'])
   
# The location strings seen above split into at most 4 comma-separated parts, so we use 4 columns
   
L1 = []
L2 = []
L3 = []
L4 = []


# Iterate over the locations (not the cuisines) to fill the location columns
for i in location:
        try :
            L1.append(i.split(',')[0].strip().upper())
        except :
            L1.append('NONE')
        try :
            L2.append(i.split(',')[1].strip().upper())
        except :
            L2.append('NONE')
        try :
            L3.append(i.split(',')[2].strip().upper())
        except :
            L3.append('NONE')
        try :
            L4.append(i.split(',')[3].strip().upper())
        except :
            L4.append('NONE')


# appending NONE to Unique locations list (all_cities, built in the Location analysis above)
if 'NONE' not in all_cities:
    all_cities.append('NONE')


# * Restaurant	* Location	 * Cuisines 	Average_Cost	Minimum_Order	 * Rating	 * Votes	* Reviews

    

# Cleaning Average Cost


# Cleaning Minimum order
    
    
#Cleaning Rating

rates = list(test['Rating'])

for i in range(len(rates)) :
    try:
        rates[i] = float(rates[i])
    except :
        rates[i] = np.nan


# Votes
       
votes = list(test['Votes'])

for i in range(len(votes)) :
    try:
        votes[i] = int(votes[i])
    except :
        votes[i] = np.nan 
    

# Reviews
       
reviews = list(test['Reviews'])

for i in range(len(reviews)) :
    try:
        reviews[i] = int(reviews[i])
    except :
        reviews[i] = np.nan

    
    

new_data_test = {}

new_data_test['RESTAURANT_ID'] = test["Restaurant"]
new_data_test['LOCATION1'] = L1
new_data_test['LOCATION2'] =L2
new_data_test['LOCATION3'] = L3
new_data_test['LOCATION4'] = L4
new_data_test['CUISINE1'] = C1
new_data_test['CUISINE2'] = C2
new_data_test['CUISINE3'] = C3
new_data_test['CUISINE4'] = C4
new_data_test['CUISINE5'] = C5
new_data_test['CUISINE6'] = C6
new_data_test['CUISINE7'] = C7
new_data_test['CUISINE8'] = C8
new_data_test['RATING'] = rates
new_data_test['VOTES'] = votes
new_data_test['REVIEWS'] = reviews


new_data_test = pd.DataFrame(new_data_test)

print("\n\nnew_data_train: \n", new_data_train.head())
print("\n\nnew_data_test: \n", new_data_test.head())
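
The cleaning cell repeats the same split-and-pad logic for every column and for both data sets. A more compact alternative could wrap that logic in one helper, as sketched below. The function name `split_to_columns` and its parameters are hypothetical, and unlike the cell above it drops empty parts before padding.

```python
import pandas as pd

def split_to_columns(series, prefix, n_cols, fill="NONE"):
    """Split comma-separated strings into n_cols upper-cased columns, padding with fill."""
    rows = []
    for value in series:
        # Missing cells produce no parts at all
        parts = [p.strip().upper() for p in str(value).split(",")] if pd.notna(value) else []
        # Drop empty pieces and keep at most n_cols of them
        parts = [p for p in parts if p][:n_cols]
        rows.append(parts + [fill] * (n_cols - len(parts)))
    columns = [f"{prefix}{i + 1}" for i in range(n_cols)]
    return pd.DataFrame(rows, columns=columns, index=series.index)

# Hypothetical sample standing in for train['Cuisines']
sample = pd.Series(["Fast Food, Rolls", "Cafe", None])
cuisine_cols = split_to_columns(sample, "CUISINE", 3)
print(cuisine_cols)
```

The same call with `prefix="LOCATION"` and `n_cols=4` would cover the location columns, and applying one function to both `train` and `test` avoids copy-paste slips between the two halves of the cell.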

In [241]:
df = pd.DataFrame( {'names': ['AA', 'BB', 'CC', 'DD', 'EE'], 'marks': [10, 20, 40, 30, 20]})

In [242]:
df

Unnamed: 0,names,marks
0,AA,10
1,BB,20
2,CC,40
3,DD,30
4,EE,20


In [250]:
df['marks'].value_counts()

20    2
30    1
10    1
40    1
Name: marks, dtype: int64
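
The same `value_counts()` call can be used to check how balanced the target classes are. The labels below are a hypothetical sample in the `Delivery_Time` format; on the real data they would be replaced with `train['Delivery_Time']`.

```python
import pandas as pd

# Hypothetical labels in the Delivery_Time format
labels = pd.Series(["30 minutes", "30 minutes", "45 minutes", "65 minutes"])

counts = labels.value_counts()                # absolute frequency per class
shares = labels.value_counts(normalize=True)  # relative frequency per class

print(counts)
print(shares)
```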