## Problem Description:



The first thing any visitor to India will take in, probably while staring out of the window in awe as their aeroplane descends, is the sheer size of the country. It is densely populated and patchworked with distinct neighbourhoods, each with its own culinary identity. It would take several lifetimes to get to know all of the street stands, holes in the wall, neighbourhood favourites, and high-end destinations in any one of its cities.
And for Indians, dining out is and always will be a joyous occasion. Everyone has their own favourite restaurants in the city, from the street food stall across the road to the 5-star restaurants in the heart of the city. Some are favourites because of the memories attached to them, and some because the place has a fantastic ambience. Many other factors also contribute to how well liked a restaurant is, which in turn determines its popularity among the masses.

From a business perspective, greater popularity may mean more visits, which increases the restaurant's annual turnover. For any restaurant to survive and do well, its annual turnover has to be substantial.

This problem takes a shot at predicting the annual turnover of a set of restaurants across India based on a set of variables given in the data set. This includes the data related to the restaurant such as location, opening date, cuisine type, themes etc. This also includes data pooled from different sources such as social media popularity index, Zomato ratings, etc. Lastly, it also adds a different flavour to the problem by looking at the Customer survey data as well as ratings provided by mystery visitor data (audit done by a third party). 


## Data Dictionary:

Registration Number: 	This is a restaurant ID

Annual Turnover:	Annual revenue of the restaurant in INR

Cuisine:	Type of cuisine served in the restaurant

City:	City at which the restaurant is located

Restaurant Location:	This variable determines whether the restaurant is located near a business hub or a party hub

Opening Day of Restaurant:	Date of the opening of the restaurant

Facebook Popularity Quotient:	Out of 100, this is the popularity of the restaurant on Facebook

Endoresed By:	This variable tells us what kind of celebrity endorses the restaurant

Instagram Popularity Quotient:	Out of 100, this is the popularity of the restaurant on Instagram

Fire Audit:	This variable tells us whether the restaurant passed its fire audit. 1 means appropriate fire safety is present, 0 means it is not

Liquor License Obtained:	This variable tells us whether the restaurant has a liquor license. 1 means a liquor license is present, 0 means otherwise

Situated in a Multi Complex:	This variable tells us whether the restaurant is situated in a multi complex. 1 means the restaurant is present in the multi complex, 0 means otherwise

Dedicated Parking:	This variable tells us whether the restaurant has a dedicated parking space. 1 means dedicated parking space is present, 0 means otherwise

Open Sitting Available:	This variable tells us whether the restaurant has open seating. 1 means open seating is present, 0 means otherwise

Resturant Tier:	This variable tells us what tier the restaurant belongs to.

Restaurant Type:	This variable tells us the type of restaurant.

Restaurant Theme:	This variable tells us the theme of the restaurant by which it is designed.

Restaurant Zomato Rating:	This variable tells us the Zomato rating of the restaurant on a scale of 1 to 5, 5 being the highest.

Restaurant City Tier:	This variable tells us the tier of the city in which the restaurant is located

Order Wait Time:	This variable rates the waiting time of the restaurant on a scale of 1 to 10, 10 being the highest

Staff Responsivness:	This variable rates the staff responsiveness of the restaurant on a scale of 1 to 8, 8 being the highest

Value for Money:	This variable rates the value for money of the restaurant on a scale of 1 to 7, 7 being the highest

Hygiene Rating:	This is the hygiene rating of the restaurant on a scale of 1 to 10, 10 being the highest

Food Rating:	This is the food rating of the restaurant on a scale of 1 to 10, 10 being the highest

Overall Restaurant Rating:	This is the overall restaurant rating on a scale of 1 to 10, 10 being the highest

Live Music Rating:	This variable indicates the satisfaction with the live music on a scale of 1 to 10, 10 being the highest. NA means the restaurant does not offer live music

Comedy Gigs Rating:	This variable indicates the satisfaction with the comedy shows on a scale of 1 to 6, 6 being the highest. NA means the restaurant does not offer any comedy gigs

Value Deals Rating:	This variable indicates the satisfaction with the value deals on a scale of 1 to 7, 7 being the highest. NA means the restaurant does not offer any value deals

Live Sports Rating:	This variable indicates the satisfaction with the live screening of sports on a scale of 1 to 6, 6 being the highest. NA means the restaurant does not have live screening

Ambience:	This variable rates the ambience of the restaurant on a scale of 0 to 10, 10 being the highest

Lively:	This variable rates the lively atmosphere of the restaurant on a scale of 1 to 10, 10 being the highest

Service:	This variable rates the satisfaction with the service of the restaurant. A rating of 10 means highly satisfied with the service and 0 means otherwise

Comfortablility:	This variable rates the comfort level of the restaurant on a scale of 0 to 10, 10 being the highest

Privacy:	This variable rates the privacy level of the restaurant on a scale of 0 to 10, 10 being the highest


# Import Libraries

In [1]:
import numpy as np   
from sklearn.linear_model import LinearRegression
import pandas as pd    
import matplotlib.pyplot as plt 
%matplotlib inline 
import seaborn as sns
from sklearn.model_selection import train_test_split # Sklearn package's randomized data splitting function

# Loading the Training Data and using the Training Data to build the Model

### Import the necessary packages - pandas, numpy, seaborn, matplotlib.pyplot

In [2]:
df_train = pd.read_csv('Train_dataset.csv')
df_train.head()

Unnamed: 0,Registration Number,Annual Turnover,Cuisine,City,Restaurant Location,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,...,Overall Restaurant Rating,Live Music Rating,Comedy Gigs Rating,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy
0,60001,42000000,"indian,irish",Bangalore,Near Business Hub,14-02-2009,84.3,Not Specific,95.8,1,...,10.0,4.0,,,,8.0,8,6,6,6
1,60002,50000000,"indian,irish",Indore,Near Party Hub,29-09-2008,85.4,Tier A Celebrity,85.0,1,...,9.0,,4.0,,,5.0,7,7,3,8
2,60003,32500000,"tibetan,italian",Chennai,Near Business Hub,30-07-2011,85.0,Tier A Celebrity,68.2,1,...,8.0,3.0,,,,7.0,10,5,2,8
3,60004,110000000,"turkish,nigerian",Gurgaon,Near Party Hub,30-11-2008,85.6,Tier A Celebrity,83.6,0,...,9.0,6.0,,,,7.0,7,4,3,5
4,60005,20000000,"irish,belgian",Manesar,Near Party Hub,22-02-2010,,Tier A Celebrity,76.8,1,...,6.0,,2.0,,,,6,2,4,6


In [3]:
# let's create a copy of the data to avoid any changes to original data
df1 = df_train.copy()

In [4]:
## Checking for the null values

df1.isnull().sum()

Registration Number                 0
Annual Turnover                     0
Cuisine                             0
City                                0
Restaurant Location                 0
Opening Day of Restaurant           0
Facebook Popularity Quotient       99
Endorsed By                         0
Instagram Popularity Quotient      56
Fire Audit                          0
Liquor License Obtained             0
Situated in a Multi Complex         0
Dedicated Parking                   0
Open Sitting Available              0
Resturant Tier                     49
Restaurant Type                     0
Restaurant Theme                    0
Restaurant Zomato Rating            0
Restaurant City Tier                0
Order Wait Time                     0
Staff Responsivness                 0
Value for Money                     0
Hygiene Rating                      0
Food Rating                         0
Overall Restaurant Rating         212
Live Music Rating                 765
Comedy Gigs 
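The data dictionary states that NA in the entertainment ratings (live music, comedy gigs, value deals, live sports) means the restaurant does not offer that feature, so those nulls carry information rather than being ordinary gaps. The following is a minimal sketch, on a toy frame with made-up values, of recording that as a 0/1 flag before any imputation. The `offers_` column names are hypothetical and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy data (made-up values) mimicking two of the rating columns
toy = pd.DataFrame({
    "Live Music Rating": [4.0, np.nan, 3.0],
    "Comedy Gigs Rating": [np.nan, 4.0, np.nan],
})

# Share of missing values per column, as a percentage
missing_pct = toy.isnull().mean() * 100

# Hypothetical flag columns: 1 if a rating exists (feature offered), 0 if it is NA
for col in ["Live Music Rating", "Comedy Gigs Rating"]:
    toy["offers_" + col] = toy[col].notnull().astype(int)

print(missing_pct.round(1))
print(toy)
```

With flags like these, a later mean imputation of the rating itself no longer erases the distinction between "not offered" and "offered but unrated".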

In [5]:
# checking column datatypes and number of non-null values
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3493 entries, 0 to 3492
Data columns (total 34 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Registration Number            3493 non-null   int64  
 1   Annual Turnover                3493 non-null   int64  
 2   Cuisine                        3493 non-null   object 
 3   City                           3493 non-null   object 
 4   Restaurant Location            3493 non-null   object 
 5   Opening Day of Restaurant      3493 non-null   object 
 6   Facebook Popularity Quotient   3394 non-null   float64
 7   Endorsed By                    3493 non-null   object 
 8   Instagram Popularity Quotient  3437 non-null   float64
 9   Fire Audit                     3493 non-null   int64  
 10  Liquor License Obtained        3493 non-null   int64  
 11  Situated in a Multi Complex    3493 non-null   int64  
 12  Dedicated Parking              3493 non-null   i

In [6]:
# checking for duplicate values
df1.duplicated().sum()

0

## Data Preprocessing

### Processing columns

In [7]:
# viewing the column values
df1["Cuisine"].head(10)

0           indian,irish
1           indian,irish
2        tibetan,italian
3       turkish,nigerian
4          irish,belgian
5          tibetan,greek
6           indian,irish
7           indian,irish
8    chinese,salvadorian
9       algerian,belgian
Name: Cuisine, dtype: object

In [8]:
df1_cuisine = df1["Cuisine"].str.split(",", expand=True)
df1_cuisine.head(15)

Unnamed: 0,0,1
0,indian,irish
1,indian,irish
2,tibetan,italian
3,turkish,nigerian
4,irish,belgian
5,tibetan,greek
6,indian,irish
7,indian,irish
8,chinese,salvadorian
9,algerian,belgian


In [9]:
# we will create two new columns for the first and second cuisine

df1["df_col1"] = df1_cuisine[0]
df1["df_col2"] = df1_cuisine[1]


# Checking the new dataframe
df1.head()

Unnamed: 0,Registration Number,Annual Turnover,Cuisine,City,Restaurant Location,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,...,Comedy Gigs Rating,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy,df_col1,df_col2
0,60001,42000000,"indian,irish",Bangalore,Near Business Hub,14-02-2009,84.3,Not Specific,95.8,1,...,,,,8.0,8,6,6,6,indian,irish
1,60002,50000000,"indian,irish",Indore,Near Party Hub,29-09-2008,85.4,Tier A Celebrity,85.0,1,...,4.0,,,5.0,7,7,3,8,indian,irish
2,60003,32500000,"tibetan,italian",Chennai,Near Business Hub,30-07-2011,85.0,Tier A Celebrity,68.2,1,...,,,,7.0,10,5,2,8,tibetan,italian
3,60004,110000000,"turkish,nigerian",Gurgaon,Near Party Hub,30-11-2008,85.6,Tier A Celebrity,83.6,0,...,,,,7.0,7,4,3,5,turkish,nigerian
4,60005,20000000,"irish,belgian",Manesar,Near Party Hub,22-02-2010,,Tier A Celebrity,76.8,1,...,2.0,,,,6,2,4,6,irish,belgian


In [10]:
df2 = pd.concat([df1["df_col1"], df1["df_col2"] ], axis=1)

df2.head()

Unnamed: 0,df_col1,df_col2
0,indian,irish
1,indian,irish
2,tibetan,italian
3,turkish,nigerian
4,irish,belgian


In [11]:
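# Series.append was removed in pandas 2.0; pd.concat is the supported replacement.
# Column assignment aligns on the index, so only the first len(df1) stacked values
# (the first cuisine of each restaurant) end up in cuisine_new.
df1["cuisine_new"] = pd.concat([df1["df_col1"], df1["df_col2"]], ignore_index=True)
df1.head()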

Unnamed: 0,Registration Number,Annual Turnover,Cuisine,City,Restaurant Location,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,...,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy,df_col1,df_col2,cuisine_new
0,60001,42000000,"indian,irish",Bangalore,Near Business Hub,14-02-2009,84.3,Not Specific,95.8,1,...,,,8.0,8,6,6,6,indian,irish,indian
1,60002,50000000,"indian,irish",Indore,Near Party Hub,29-09-2008,85.4,Tier A Celebrity,85.0,1,...,,,5.0,7,7,3,8,indian,irish,indian
2,60003,32500000,"tibetan,italian",Chennai,Near Business Hub,30-07-2011,85.0,Tier A Celebrity,68.2,1,...,,,7.0,10,5,2,8,tibetan,italian,tibetan
3,60004,110000000,"turkish,nigerian",Gurgaon,Near Party Hub,30-11-2008,85.6,Tier A Celebrity,83.6,0,...,,,7.0,7,4,3,5,turkish,nigerian,turkish
4,60005,20000000,"irish,belgian",Manesar,Near Party Hub,22-02-2010,,Tier A Celebrity,76.8,1,...,,,,6,2,4,6,irish,belgian,irish
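In the output above, `cuisine_new` is identical to the first cuisine of each row. This follows from how pandas assigns a Series to a column: it aligns on index labels, so a stacked series that is twice as long as the frame contributes only its first half. A small sketch on toy data (the values are illustrative) makes the effect visible:

```python
import pandas as pd

first = pd.Series(["indian", "tibetan"], name="df_col1")
second = pd.Series(["irish", "italian"], name="df_col2")

# Stacking the two columns gives 4 values with index labels 0..3
stacked = pd.concat([first, second], ignore_index=True)

toy = pd.DataFrame({"df_col1": first, "df_col2": second})

# Assignment aligns on the index: only labels 0 and 1 exist in toy,
# so the second half of the stacked series is silently dropped
toy["cuisine_new"] = stacked

print(stacked.tolist())
print(toy["cuisine_new"].tolist())
```

If both cuisines are meant to be used as features, keeping `df_col1` and `df_col2` as separate columns would be one way to avoid losing the second value.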


In [12]:
df1.drop(columns=["Cuisine", "df_col1", "df_col2"], inplace=True)
df1.head()

Unnamed: 0,Registration Number,Annual Turnover,City,Restaurant Location,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,Liquor License Obtained,...,Live Music Rating,Comedy Gigs Rating,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy,cuisine_new
0,60001,42000000,Bangalore,Near Business Hub,14-02-2009,84.3,Not Specific,95.8,1,1,...,4.0,,,,8.0,8,6,6,6,indian
1,60002,50000000,Indore,Near Party Hub,29-09-2008,85.4,Tier A Celebrity,85.0,1,1,...,,4.0,,,5.0,7,7,3,8,indian
2,60003,32500000,Chennai,Near Business Hub,30-07-2011,85.0,Tier A Celebrity,68.2,1,1,...,3.0,,,,7.0,10,5,2,8,tibetan
3,60004,110000000,Gurgaon,Near Party Hub,30-11-2008,85.6,Tier A Celebrity,83.6,0,1,...,6.0,,,,7.0,7,4,3,5,turkish
4,60005,20000000,Manesar,Near Party Hub,22-02-2010,,Tier A Celebrity,76.8,1,1,...,,2.0,,,,6,2,4,6,irish


In [13]:
df1_near = df1["Restaurant Location"].str.split(" ", expand=True)
df1_near.head(15)

Unnamed: 0,0,1,2
0,Near,Business,Hub
1,Near,Party,Hub
2,Near,Business,Hub
3,Near,Party,Hub
4,Near,Party,Hub
5,Near,Party,Hub
6,Near,Party,Hub
7,Near,Party,Hub
8,Near,Party,Hub
9,Near,Party,Hub


In [14]:
df1["Restaurant_Location"] = df1_near[1]
df1.head()

Unnamed: 0,Registration Number,Annual Turnover,City,Restaurant Location,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,Liquor License Obtained,...,Comedy Gigs Rating,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy,cuisine_new,Restaurant_Location
0,60001,42000000,Bangalore,Near Business Hub,14-02-2009,84.3,Not Specific,95.8,1,1,...,,,,8.0,8,6,6,6,indian,Business
1,60002,50000000,Indore,Near Party Hub,29-09-2008,85.4,Tier A Celebrity,85.0,1,1,...,4.0,,,5.0,7,7,3,8,indian,Party
2,60003,32500000,Chennai,Near Business Hub,30-07-2011,85.0,Tier A Celebrity,68.2,1,1,...,,,,7.0,10,5,2,8,tibetan,Business
3,60004,110000000,Gurgaon,Near Party Hub,30-11-2008,85.6,Tier A Celebrity,83.6,0,1,...,,,,7.0,7,4,3,5,turkish,Party
4,60005,20000000,Manesar,Near Party Hub,22-02-2010,,Tier A Celebrity,76.8,1,1,...,2.0,,,,6,2,4,6,irish,Party


In [15]:
df1.drop(columns=["Restaurant Location"], inplace=True)
df1.head()

Unnamed: 0,Registration Number,Annual Turnover,City,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,Liquor License Obtained,Situated in a Multi Complex,...,Comedy Gigs Rating,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy,cuisine_new,Restaurant_Location
0,60001,42000000,Bangalore,14-02-2009,84.3,Not Specific,95.8,1,1,1,...,,,,8.0,8,6,6,6,indian,Business
1,60002,50000000,Indore,29-09-2008,85.4,Tier A Celebrity,85.0,1,1,1,...,4.0,,,5.0,7,7,3,8,indian,Party
2,60003,32500000,Chennai,30-07-2011,85.0,Tier A Celebrity,68.2,1,1,1,...,,,,7.0,10,5,2,8,tibetan,Business
3,60004,110000000,Gurgaon,30-11-2008,85.6,Tier A Celebrity,83.6,0,1,1,...,,,,7.0,7,4,3,5,turkish,Party
4,60005,20000000,Manesar,22-02-2010,,Tier A Celebrity,76.8,1,1,1,...,2.0,,,,6,2,4,6,irish,Party


In [16]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3493 entries, 0 to 3492
Data columns (total 34 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Registration Number            3493 non-null   int64  
 1   Annual Turnover                3493 non-null   int64  
 2   City                           3493 non-null   object 
 3   Opening Day of Restaurant      3493 non-null   object 
 4   Facebook Popularity Quotient   3394 non-null   float64
 5   Endorsed By                    3493 non-null   object 
 6   Instagram Popularity Quotient  3437 non-null   float64
 7   Fire Audit                     3493 non-null   int64  
 8   Liquor License Obtained        3493 non-null   int64  
 9   Situated in a Multi Complex    3493 non-null   int64  
 10  Dedicated Parking              3493 non-null   int64  
 11  Open Sitting Available         3493 non-null   int64  
 12  Resturant Tier                 3444 non-null   f

## Converting the opening date to datetime and adding the year of opening

In [17]:
# the dates are stored day-first (DD-MM-YYYY), so state the format explicitly
df1['Opened'] = pd.to_datetime(df1['Opening Day of Restaurant'], format='%d-%m-%Y')
df1['Opened year'] = df1['Opened'].dt.year  # adding in a feature that's just the year
print(min(df1['Opened']), max(df1['Opened']))
df1['Opened'].head()

1996-10-25 00:00:00 2016-05-22 00:00:00


0   2009-02-14
1   2008-09-29
2   2011-07-30
3   2008-11-30
4   2010-02-22
Name: Opened, dtype: datetime64[ns]
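The raw dates are written day-first (for example `14-02-2009`), and a value such as `10-11-2008` is ambiguous if the parser has to guess whether the day or the month comes first. A minimal sketch, using a few dates in the same layout, of parsing with an explicit format so that no guessing is involved:

```python
import pandas as pd

# Dates in the same DD-MM-YYYY layout as 'Opening Day of Restaurant'
dates = pd.Series(["14-02-2009", "10-11-2008", "05-09-2011"])

# An explicit format removes any ambiguity between day and month
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

print(parsed.dt.month.tolist())
print(parsed.dt.year.tolist())
```

Here `10-11-2008` is read as 10 November and `05-09-2011` as 5 September, which matches the day-first convention of the data.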

In [18]:
# investigating the restaurants with the earliest opening date
df1[df1['Opened'] == min(df1['Opened'])]

Unnamed: 0,Registration Number,Annual Turnover,City,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,Liquor License Obtained,Situated in a Multi Complex,...,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy,cuisine_new,Restaurant_Location,Opened,Opened year
59,60060,12000000,Bangalore,25-10-1996,72.0,Tier A Celebrity,49.0,1,1,1,...,3.0,6.0,6,3,3,6,tibetan,Party,1996-10-25,1996


In [19]:
# one-hot encode the restaurant type; dtype=int keeps the dummies as 0/1 on newer pandas
df1 = pd.get_dummies(df1, columns=['Restaurant Type'], dtype=int)
df1.head()


Unnamed: 0,Registration Number,Annual Turnover,City,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,Liquor License Obtained,Situated in a Multi Complex,...,Comfortablility,Privacy,cuisine_new,Restaurant_Location,Opened,Opened year,Restaurant Type_Bar,Restaurant Type_Buffet/Family Restaurant,Restaurant Type_Caffee,Restaurant Type_Gastro Bar
0,60001,42000000,Bangalore,14-02-2009,84.3,Not Specific,95.8,1,1,1,...,6,6,indian,Business,2009-02-14,2009,1,0,0,0
1,60002,50000000,Indore,29-09-2008,85.4,Tier A Celebrity,85.0,1,1,1,...,3,8,indian,Party,2008-09-29,2008,1,0,0,0
2,60003,32500000,Chennai,30-07-2011,85.0,Tier A Celebrity,68.2,1,1,1,...,2,8,tibetan,Business,2011-07-30,2011,1,0,0,0
3,60004,110000000,Gurgaon,30-11-2008,85.6,Tier A Celebrity,83.6,0,1,1,...,3,5,turkish,Party,2008-11-30,2008,1,0,0,0
4,60005,20000000,Manesar,22-02-2010,,Tier A Celebrity,76.8,1,1,1,...,4,6,irish,Party,2010-02-22,2010,1,0,0,0


In [20]:
df1= pd.get_dummies(df1, columns=['Restaurant Theme'])
df1.head()

Unnamed: 0,Registration Number,Annual Turnover,City,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,Liquor License Obtained,Situated in a Multi Complex,...,Restaurant Theme_Parsi,Restaurant Theme_Petit,Restaurant Theme_Picante,Restaurant Theme_Piquant,Restaurant Theme_Resca,Restaurant Theme_Sage,Restaurant Theme_Savory,Restaurant Theme_Spoon,Restaurant Theme_TheGem,Restaurant Theme_Umami
0,60001,42000000,Bangalore,14-02-2009,84.3,Not Specific,95.8,1,1,1,...,0,0,0,0,0,0,0,0,0,0
1,60002,50000000,Indore,29-09-2008,85.4,Tier A Celebrity,85.0,1,1,1,...,0,0,0,0,0,0,0,0,0,0
2,60003,32500000,Chennai,30-07-2011,85.0,Tier A Celebrity,68.2,1,1,1,...,0,0,0,0,0,0,0,0,0,0
3,60004,110000000,Gurgaon,30-11-2008,85.6,Tier A Celebrity,83.6,0,1,1,...,0,0,0,0,0,0,0,0,0,0
4,60005,20000000,Manesar,22-02-2010,,Tier A Celebrity,76.8,1,1,1,...,0,0,0,0,0,0,0,0,0,0
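The two cells above one-hot encode `Restaurant Type` and `Restaurant Theme`, keeping one dummy column per level. When such dummies feed a linear regression with an intercept, the full set for a variable is perfectly collinear with the intercept. One common option, shown here as a sketch on toy data rather than as the notebook's method, is `drop_first=True`:

```python
import pandas as pd

toy = pd.DataFrame({"Restaurant Type": ["Bar", "Caffee", "Bar", "Gastro Bar"]})

# drop_first=True drops the first level ("Bar"), which becomes the baseline
encoded = pd.get_dummies(
    toy, columns=["Restaurant Type"], drop_first=True, dtype=int
)

print(encoded.columns.tolist())
print(encoded)
```

A row with zeros in every remaining dummy column then represents the baseline level.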


In [21]:
## Importing Simple Imputer to treat the null values

from sklearn.impute import SimpleImputer

In [22]:
## Defining the Simple Imputer to use 'mean' as the imputation strategy

SI = SimpleImputer(strategy='mean')

In [23]:
## Fitting the Simple Imputer function to get the mean of the variable

SI.fit(df_train[['Instagram Popularity Quotient','Comedy Gigs Rating','Facebook Popularity Quotient','Live Sports Rating','Value Deals Rating','Live Music Rating','Overall Restaurant Rating','Resturant Tier','Ambience']])

SimpleImputer()

In [24]:
## Transforming the variable --> Imputing the variable with its mean

IPQ = SI.transform(df_train[['Instagram Popularity Quotient','Comedy Gigs Rating','Facebook Popularity Quotient','Live Sports Rating','Value Deals Rating','Live Music Rating','Overall Restaurant Rating','Resturant Tier','Ambience']])

IPQ

array([[95.8       ,  2.93267327, 84.3       , ..., 10.        ,
         2.        ,  8.        ],
       [85.        ,  4.        , 85.4       , ...,  9.        ,
         2.        ,  5.        ],
       [68.2       ,  2.93267327, 85.        , ...,  8.        ,
         2.        ,  7.        ],
       ...,
       [86.8       ,  2.93267327, 80.83      , ...,  8.        ,
         2.        ,  6.        ],
       [86.        ,  2.        , 79.4       , ...,  7.        ,
         2.        ,  7.        ],
       [67.        ,  2.93267327, 72.        , ...,  8.479427  ,
         2.        ,  6.        ]])
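The imputer learns one mean per column in `fit` and substitutes it for missing values in `transform`; the repeated value `2.93267327` in the array above is such a column mean. A minimal sketch on made-up numbers, showing that the mean learned from the training rows can be reused on unseen rows:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frames (made-up values) standing in for the train and test data
train = pd.DataFrame({"Ambience": [8.0, np.nan, 6.0]})
test = pd.DataFrame({"Ambience": [np.nan, 5.0]})

imputer = SimpleImputer(strategy="mean")
imputer.fit(train)                      # learns the mean of the non-missing training values
test_filled = imputer.transform(test)   # fills test gaps with the training mean

print(imputer.statistics_)
print(test_filled.ravel())
```

Reusing the fitted imputer keeps the test data preprocessed in exactly the same way as the data the model was trained on.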

In [25]:
## Declaring the Linear Regression function

LR = LinearRegression()

In [26]:
## Fitting the Linear Regression function

model = LR.fit(IPQ,df_train['Annual Turnover'])

In [27]:
## Checking the score of the function on the training data

model.score(IPQ,df_train['Annual Turnover'])

0.0766764296219089
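For `LinearRegression`, `score` returns the coefficient of determination R², so the value above means the nine imputed features explain roughly 7.7% of the variance in annual turnover on the training data. The sketch below, on synthetic data that is not related to the restaurant data set, shows how R² is computed by hand and how `train_test_split` (imported at the top but not used so far) could give a held-out estimate:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=1)
reg = LinearRegression().fit(X_tr, y_tr)

# R^2 = 1 - SS_res / SS_tot, computed by hand on the validation split
resid = y_val - reg.predict(X_val)
r2_manual = 1 - (resid ** 2).sum() / ((y_val - y_val.mean()) ** 2).sum()

print(r2_manual, reg.score(X_val, y_val))
```

Scoring on rows the model has not seen gives a more honest picture than the training score alone.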

# Loading the Test Data and using the Test Data to Predict the target Variable

In [29]:
df_test = pd.read_csv('Test_dataset.csv')
df_test.head()

Unnamed: 0,Registration Number,Cuisine,City,Restaurant Location,Opening Day of Restaurant,Facebook Popularity Quotient,Endoresed By,Instagram Popularity Quotient,Fire Audit,Liquor License Obtained,...,Overall Restaurant Rating,Live Music Rating,Comedy Gigs Rating,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy
0,20001,"tibetan,italian",Bangalore,Near Business Hub,13-07-2010,78.0,Tier A Celebrity,69.0,1,1,...,6.0,,2.0,,,5,2,2,6,0
1,20002,"tibetan,italian",Hyderabad,Near Party Hub,05-09-2011,89.17,Not Specific,96.0,1,1,...,9.0,3.0,,,6.0,7,8,5,1,8
2,20003,"algerian,belgian",Hyderabad,Near Party Hub,12-04-2011,84.0,Not Specific,86.0,1,1,...,6.0,3.0,,,,9,7,5,1,5
3,20004,"tibetan,greek",-1,Near Party Hub,16-01-2005,79.38,Not Specific,74.4,0,1,...,9.0,6.0,,,,6,7,6,4,7
4,20005,"cuban,british",Pune,Near Party Hub,10-11-2008,84.67,Not Specific,86.46,1,1,...,,,,,,4,4,7,7,3


In [30]:
## Checking for the null values

df_test.isnull().sum()

Registration Number                0
Cuisine                            0
City                               0
Restaurant Location                0
Opening Day of Restaurant          0
Facebook Popularity Quotient       0
Endoresed By                       0
Instagram Popularity Quotient      0
Fire Audit                         0
Liquor License Obtained            0
Situated in a Multi Complex        0
Dedicated Parking                  0
Open Sitting Available             0
Resturant Tier                     0
Restaurant Type                    0
Restaurant Theme                   0
Restaurant Zomato Rating           0
Restaurant City Tier               0
Order Wait Time                    0
Staff Responsivness                0
Value for Money                    0
Hygiene Rating                     0
Food Rating                        0
Overall Restaurant Rating         34
Live Music Rating                102
Comedy Gigs Rating               370
Value Deals Rating               385
L

# Saving the output in a data frame and then exporting it to a '.csv' file with the appropriate 'Registration Number'.

In [31]:
solution_df = pd.DataFrame(df_test['Registration Number'])
solution_df

Unnamed: 0,Registration Number
0,20001
1,20002
2,20003
3,20004
4,20005
...,...
495,20496
496,20497
497,20498
498,20499


In [32]:
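# `prediction` was never defined above: impute the test features with the fitted imputer, then predict
feature_cols = ['Instagram Popularity Quotient','Comedy Gigs Rating','Facebook Popularity Quotient','Live Sports Rating','Value Deals Rating','Live Music Rating','Overall Restaurant Rating','Resturant Tier','Ambience']
prediction = model.predict(SI.transform(df_test[feature_cols]))
solution_df['Annual Turnover'] = prediction
solution_df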

In [None]:
## Setting the directory to export the file as a '.csv'

import os
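# Python does not expand shell variables such as $HOME inside strings; expanduser resolves '~'
os.chdir(os.path.expanduser('~/Desktop/texas/Classification/Hackathon'))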

In [None]:
## Exporting the data frame to a '.csv' file and setting index=False as we do not want the index

solution_df.to_csv('Submission.csv',index=False)

# END