# Predicting Restaurant Annual Turnover in India

The first thing any visitor to India will take in, probably while staring out the window in awe as their aeroplane descends, is the sheer size of the country. Its cities are densely populated and patchworked with distinct neighbourhoods, each with its own culinary identity. It would take several lifetimes to get to know all of the street stands, holes in the wall, neighbourhood favourites, and high-end destinations in even one of them.

## The Joy of Dining Out in India

For Indians, dining out is and always will be a joyous occasion. Everyone has their own favourite restaurants, from the street food stall across the road to the 5-star restaurants in the heart of the city. Some are favourites because of the memories attached to them, others because of their fantastic ambience. Many other factors also contribute to a restaurant's popularity.

## The Business Perspective

From a business perspective, the popularity of a restaurant is crucial. Higher popularity often means more visits to the restaurant, which leads to increased annual turnover. For a restaurant to thrive and continue operating, it must maintain a substantial annual turnover.

## The Problem: Predicting Annual Turnover

The goal is to predict the annual turnover of a set of restaurants across India from a range of factors. The dataset groups its variables into three broad categories:

- **Restaurant-specific data**: Location, opening date, cuisine type, themes, etc.
- **External data**: Social media popularity index, Zomato ratings, and other external metrics.
- **Customer insights**: Survey data from customers and ratings from mystery visitors (third-party audits).

By analyzing these variables, we can gain insights into the factors that influence a restaurant's success and predict its annual turnover.


In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter("ignore")  # suppress library warnings in the notebook output

# Modelling and evaluation utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.linear_model import LinearRegression, Lasso, Ridge
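
The estimators imported above point towards a linear baseline. As a rough sketch of how they might be compared, the cell below fits each one on synthetic data. The features, coefficients, and the `evaluate` helper are all made up for illustration and are not taken from the restaurant dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): three numeric features and a
# turnover-like target that depends linearly on the first two.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = 1e5 * X[:, 0] + 5e4 * X[:, 1] + rng.normal(0, 1e5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

def evaluate(model):
    """Fit on the training split and return hold-out metrics (hypothetical helper)."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }

results = {
    "linear": evaluate(LinearRegression()),
    "ridge": evaluate(Ridge(alpha=1.0)),
    "lasso": evaluate(Lasso(alpha=1.0)),
}
for name, metrics in results.items():
    print(name, metrics)
```

Keeping the split fixed with `random_state` means the three models are scored on the same hold-out rows, so their metrics are directly comparable.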

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
path='/content/drive/MyDrive/Python Course'

In [4]:
# Load the training data
rest_df = pd.read_csv(f"{path}/Res_Train_dataset.csv")

In [5]:
rest_df.head()

Unnamed: 0,Registration Number,Annual Turnover,Cuisine,City,Restaurant Location,Opening Day of Restaurant,Facebook Popularity Quotient,Endorsed By,Instagram Popularity Quotient,Fire Audit,...,Overall Restaurant Rating,Live Music Rating,Comedy Gigs Rating,Value Deals Rating,Live Sports Rating,Ambience,Lively,Service,Comfortablility,Privacy
0,60001,42000000,"indian,irish",Bangalore,Near Business Hub,14/02/09,84.3,Not Specific,95.8,1,...,10.0,4.0,,,,8.0,8,6,6,6
1,60002,50000000,"indian,irish",Indore,Near Party Hub,29/09/08,85.4,Tier A Celebrity,85.0,1,...,9.0,,4.0,,,5.0,7,7,3,8
2,60003,32500000,"tibetan,italian",Chennai,Near Business Hub,30/07/11,85.0,Tier A Celebrity,68.2,1,...,8.0,3.0,,,,7.0,10,5,2,8
3,60004,110000000,"turkish,nigerian",Gurgaon,Near Party Hub,30/11/08,85.6,Tier A Celebrity,83.6,0,...,9.0,6.0,,,,7.0,7,4,3,5
4,60005,20000000,"irish,belgian",Manesar,Near Party Hub,22/02/10,,Tier A Celebrity,76.8,1,...,6.0,,2.0,,,,6,2,4,6
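
The preview shows two columns stored as text that would likely need reshaping before modelling: `Opening Day of Restaurant` and the comma-separated `Cuisine`. A minimal sketch of one way to handle them follows, run on a few rows retyped from the preview. The day/month/two-digit-year format is an assumption inferred from values such as `30/07/11`, and the new column names are hypothetical.

```python
import pandas as pd

# First three rows of the preview above, retyped by hand for illustration.
sample = pd.DataFrame({
    "Cuisine": ["indian,irish", "indian,irish", "tibetan,italian"],
    "Opening Day of Restaurant": ["14/02/09", "29/09/08", "30/07/11"],
})

# Assumption: dates are day/month/two-digit-year, as the preview suggests.
sample["Opening Date"] = pd.to_datetime(
    sample["Opening Day of Restaurant"], format="%d/%m/%y"
)
sample["Opening Year"] = sample["Opening Date"].dt.year

# Split the comma-separated cuisine pair into two columns (hypothetical names).
sample[["Cuisine 1", "Cuisine 2"]] = sample["Cuisine"].str.split(",", expand=True)

print(sample[["Opening Date", "Opening Year", "Cuisine 1", "Cuisine 2"]])
```

Passing an explicit `format` avoids pandas guessing between day-first and month-first dates, which matters for ambiguous values such as `14/02/09` versus `02/03/09`.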


In [11]:
rest_df.shape

(3493, 34)

In [10]:
rest_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3493 entries, 0 to 3492
Data columns (total 34 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Registration Number            3493 non-null   int64  
 1   Annual Turnover                3493 non-null   int64  
 2   Cuisine                        3493 non-null   object 
 3   City                           3493 non-null   object 
 4   Restaurant Location            3493 non-null   object 
 5   Opening Day of Restaurant      3493 non-null   object 
 6   Facebook Popularity Quotient   3394 non-null   float64
 7   Endorsed By                    3493 non-null   object 
 8   Instagram Popularity Quotient  3437 non-null   float64
 9   Fire Audit                     3493 non-null   int64  
 10  Liquor License Obtained        3493 non-null   int64  
 11  Situated in a Multi Complex    3493 non-null   int64  
 12  Dedicated Parking              3493 non-null   i

In [7]:
rest_df.isnull().sum()

Unnamed: 0,0
Registration Number,0
Annual Turnover,0
Cuisine,0
City,0
Restaurant Location,0
Opening Day of Restaurant,0
Facebook Popularity Quotient,99
Endorsed By,0
Instagram Popularity Quotient,56
Fire Audit,0


In [8]:
rest_df.isna().sum()  # isna() is an alias of isnull(), so this repeats the counts above

Unnamed: 0,0
Registration Number,0
Annual Turnover,0
Cuisine,0
City,0
Restaurant Location,0
Opening Day of Restaurant,0
Facebook Popularity Quotient,99
Endorsed By,0
Instagram Popularity Quotient,56
Fire Audit,0
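
To put the missing-value counts in context, they can be expressed as a share of the 3493 rows reported by `rest_df.shape`. The sketch below does that and then shows median imputation on a small hand-typed series. Median imputation is only one common option and is not necessarily the approach this notebook takes later; the `fb` series is a stand-in built from the first five previewed values.

```python
import numpy as np
import pandas as pd

# Share of missing values, using the counts reported above (3493 rows in total).
n_rows = 3493
missing_counts = pd.Series({
    "Facebook Popularity Quotient": 99,
    "Instagram Popularity Quotient": 56,
})
missing_pct = (missing_counts / n_rows * 100).round(2)
print(missing_pct)

# Median imputation on a hand-typed stand-in for the first five Facebook values.
fb = pd.Series([84.3, 85.4, 85.0, 85.6, np.nan], name="Facebook Popularity Quotient")
fb_filled = fb.fillna(fb.median())  # Series.median() skips NaN by default
print(fb_filled)
```

Both columns are missing in fewer than 3% of rows, so a simple fill like this changes the distribution very little; a heavier gap would call for more care.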
