# Project: Restaurant Revenue Prediction

# Business Problem

Motto: "To Predict Annual Restaurant Sales Based on Objective Measurements"

TFI is the company behind some of the world's most well-known brands: Burger King, Sbarro, Popeyes, Usta Donerci, and Arby’s. They employ over 20,000 people in Europe and Asia and make significant daily investments in developing new restaurant sites. 

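Deciding when and where to open new restaurants is crucial for the business. Today these decisions rest largely on the subjective judgment and experience of development teams, and such subjective data is difficult to extrapolate accurately across geographies and cultures.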

New restaurant sites take large investments of time and capital to get up and running. When the wrong location for a restaurant brand is chosen, the site closes within 18 months and operating losses are incurred. 

Finding a mathematical model to increase the effectiveness of investments in new restaurant sites would allow TFI to invest more in other important business areas, like sustainability, innovation, and training for new employees. Using demographic, real estate, and commercial data, this competition challenges participants to predict the annual restaurant sales of 100,000 regional locations.

# <img src="food.JPG" width="800" height = "500">

The dataset for this project can be found on [Kaggle](https://www.kaggle.com/c/restaurant-revenue-prediction/data). This is a Kaggle competition sponsored by TFI (Tab Food Investments).



TFI has provided a dataset with 137 restaurants in the training set and 100,000 restaurants in the test set. The data columns include the open date, location, city type, and three categories of obfuscated data: demographic data, real estate data, and commercial data. The revenue column holds the (transformed) revenue of the restaurant in a given year and is the target of the predictive analysis.

# Exploratory Data Analysis of Restaurant Revenue Prediction

In [40]:
# Import the necessary libraries for Data Manipulation and visual representation
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

In [11]:
# Loading the training data 
df_train = pd.read_csv("Restaurant_Train.csv")

In [8]:
df_train.head()

Unnamed: 0,Id,Open Date,City,City Group,Type,P1,P2,P3,P4,P5,...,P29,P30,P31,P32,P33,P34,P35,P36,P37,revenue
0,0,07/17/1999,İstanbul,Big Cities,IL,4,5.0,4.0,4.0,2,...,3.0,5,3,4,5,5,4,3,4,5653753.0
1,1,02/14/2008,Ankara,Big Cities,FC,4,5.0,4.0,4.0,1,...,3.0,0,0,0,0,0,0,0,0,6923131.0
2,2,03/09/2013,Diyarbakır,Other,IL,2,4.0,2.0,5.0,2,...,3.0,0,0,0,0,0,0,0,0,2055379.0
3,3,02/02/2012,Tokat,Other,IL,6,4.5,6.0,6.0,4,...,7.5,25,12,10,6,18,12,12,6,2675511.0
4,4,05/09/2009,Gaziantep,Other,IL,3,4.0,3.0,4.0,2,...,3.0,5,1,3,2,3,4,3,3,4316715.0


In [23]:
# Remove the target column (revenue, the last column) from the training set

df_train = df_train.iloc[:, :-1]
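
Dropping the last column by position works only while `revenue` stays last, and it discards the target that a model would later need. A minimal sketch of a name-based split follows. The small frame `df` and the names `X` and `y` are illustrative assumptions, with values copied from the first rows shown above; they are not part of the original notebook.

```python
import pandas as pd

# Tiny stand-in frame (assumption): a few columns and rows taken from the head() output above.
df = pd.DataFrame({
    "Id": [0, 1, 2],
    "City Group": ["Big Cities", "Big Cities", "Other"],
    "P1": [4, 4, 2],
    "revenue": [5653753.0, 6923131.0, 2055379.0],
})

# Keep the target in its own Series instead of discarding it.
y = df["revenue"]

# Drop the target by name so the result does not depend on column order.
X = df.drop(columns=["revenue"])

print(X.columns.tolist())
print(y.shape)
```

Splitting by name keeps the features and the target aligned on the same index, which makes it easier to pass both to a model later.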

In [36]:
# Columns in the training data
df_train.columns

Index(['Id', 'Open Date', 'City', 'City Group', 'Type', 'P1', 'P2', 'P3', 'P4',
       'P5', 'P6', 'P7', 'P8', 'P9', 'P10', 'P11', 'P12', 'P13', 'P14', 'P15',
       'P16', 'P17', 'P18', 'P19', 'P20', 'P21', 'P22', 'P23', 'P24', 'P25',
       'P26', 'P27', 'P28', 'P29', 'P30', 'P31', 'P32', 'P33', 'P34', 'P35',
       'P36', 'P37'],
      dtype='object')

# The Data Set Feature/Attribute Information:

 1. Id         : Restaurant id
 2. Open Date  : Opening date of the restaurant (Date)
 3. City       : City that the restaurant is in (Nominal)
 4. City Group : Type of the city: Big Cities or Other (Ordinal)
 5. Type       : Type of the restaurant. FC: Food Court, IL: Inline, DT: Drive Thru, MB: Mobile (Nominal)
 6. P1-P37     : Obfuscated data in three categories: demographic, real estate, commercial (Discrete)
 7. revenue    : Revenue of the restaurant in a given year (Target Variable) (Continuous)
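
The nominal columns in this list (`City Group`, `Type`) are stored as text, so most models need them encoded as numbers first. The sketch below uses one-hot encoding via `pd.get_dummies`. That choice is an assumption, since the notebook does not say which encoding it uses, and the frame `df` is a small stand-in built from the rows shown earlier.

```python
import pandas as pd

# Stand-in frame (assumption): three rows from the head() output above.
df = pd.DataFrame({
    "City Group": ["Big Cities", "Big Cities", "Other"],
    "Type": ["IL", "FC", "IL"],
    "P1": [4, 4, 2],
})

# One indicator column is created per category; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["City Group", "Type"])

print(encoded.columns.tolist())
```

`City` has many distinct values for only 137 training rows, so one-hot encoding it would create many sparse columns. Whether to encode, group, or drop it is a separate modelling decision.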

In [24]:
# The training dataset has 137 rows and 42 columns (after removing the target)
df_train.shape

(137, 42)

# Restaurant Categories Exploration

Review a statistical description of the dataset to check the relevance of each feature and to select a few sample data points that we can track throughout the project.

In [25]:
# Display a description of the dataset

display(df_train.describe())

Unnamed: 0,Id,P1,P2,P3,P4,P5,P6,P7,P8,P9,...,P28,P29,P30,P31,P32,P33,P34,P35,P36,P37
count,137.0,137.0,137.0,137.0,137.0,137.0,137.0,137.0,137.0,137.0,...,137.0,137.0,137.0,137.0,137.0,137.0,137.0,137.0,137.0,137.0
mean,68.0,4.014599,4.408759,4.317518,4.372263,2.007299,3.357664,5.423358,5.153285,5.445255,...,3.222628,3.135036,2.729927,1.941606,2.525547,1.138686,2.489051,2.029197,2.211679,1.116788
std,39.692569,2.910391,1.5149,1.032337,1.016462,1.20962,2.134235,2.296809,1.858567,1.834793,...,2.308806,1.680887,5.536647,3.512093,5.230117,1.69854,5.165093,3.436272,4.168211,1.790768
min,0.0,1.0,1.0,0.0,3.0,1.0,1.0,1.0,1.0,4.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,34.0,2.0,4.0,4.0,4.0,1.0,2.0,5.0,4.0,4.0,...,2.0,2.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,68.0,3.0,5.0,4.0,4.0,2.0,3.0,5.0,5.0,5.0,...,2.5,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,102.0,4.0,5.0,5.0,5.0,2.0,4.0,5.0,5.0,5.0,...,4.0,3.0,4.0,3.0,3.0,2.0,3.0,4.0,3.0,2.0
max,136.0,12.0,7.5,7.5,7.5,8.0,10.0,10.0,10.0,10.0,...,12.5,7.5,25.0,15.0,25.0,6.0,24.0,15.0,20.0,8.0
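
In the summary above, P30 through P37 have a minimum, 25% quantile, and median of 0, so at least half of the rows in each of those columns are zero. A quick way to quantify this is the share of zeros per column, sketched below. The frame `sample` is a stand-in built from the first five rows shown earlier, and whether a zero means "missing" or a real measurement is not stated in the data description.

```python
import pandas as pd

# Stand-in frame (assumption): P29, P30 and P37 for the first five training rows.
sample = pd.DataFrame({
    "P29": [3.0, 3.0, 3.0, 7.5, 3.0],
    "P30": [5, 0, 0, 25, 5],
    "P37": [4, 0, 0, 6, 3],
})

# Comparing to 0 gives a boolean frame; its column means are the share of zeros.
zero_share = (sample == 0).mean()

print(zero_share)
```

On the full training set, the same expression applied to `df_train` restricted to the P columns would show which features are dominated by zeros.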


In [38]:
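# Parse the opening date and derive each restaurant's age in calendar years
now = datetime.datetime.now()
df_train['Open Date'] = pd.to_datetime(df_train['Open Date'],
                                       format='%m/%d/%Y')
# The age depends on the year in which the notebook is run
df_train['years_old'] = now.year - df_train['Open Date'].dt.year
df_train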

Unnamed: 0,Id,Open Date,City,City Group,Type,P1,P2,P3,P4,P5,...,P29,P30,P31,P32,P33,P34,P35,P36,P37,years_old
0,0,1999-07-17,İstanbul,Big Cities,IL,4,5.0,4.0,4.0,2,...,3.0,5,3,4,5,5,4,3,4,20
1,1,2008-02-14,Ankara,Big Cities,FC,4,5.0,4.0,4.0,1,...,3.0,0,0,0,0,0,0,0,0,11
2,2,2013-03-09,Diyarbakır,Other,IL,2,4.0,2.0,5.0,2,...,3.0,0,0,0,0,0,0,0,0,6
3,3,2012-02-02,Tokat,Other,IL,6,4.5,6.0,6.0,4,...,7.5,25,12,10,6,18,12,12,6,7
4,4,2009-05-09,Gaziantep,Other,IL,3,4.0,3.0,4.0,2,...,3.0,5,1,3,2,3,4,3,3,10
5,5,2010-02-12,Ankara,Big Cities,FC,6,6.0,4.5,7.5,8,...,5.0,0,0,0,0,0,0,0,0,9
6,6,2010-10-11,İstanbul,Big Cities,IL,2,3.0,4.0,4.0,1,...,3.0,4,5,2,2,3,5,4,4,9
7,7,2011-06-21,İstanbul,Big Cities,IL,4,5.0,4.0,5.0,2,...,2.0,0,0,0,0,0,0,0,0,8
8,8,2010-08-28,Afyonkarahisar,Other,IL,1,1.0,4.0,4.0,1,...,3.0,4,5,5,3,4,5,4,5,9
9,9,2011-11-16,Edirne,Other,IL,6,4.5,6.0,7.5,6,...,2.5,0,0,0,0,0,0,0,0,8
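
Because `years_old` is computed from the current date, rerunning the notebook in a later year changes the feature. A minimal sketch with a fixed reference date keeps it reproducible. The name `reference` and the year 2019 are assumptions; 2019 is chosen only because it is consistent with the ages in the output above (a 1999 opening gives 20).

```python
import pandas as pd

# Opening dates of the first three restaurants, in the raw MM/DD/YYYY format.
opened = pd.to_datetime(
    pd.Series(["07/17/1999", "02/14/2008", "03/09/2013"]),
    format="%m/%d/%Y",
)

# Fixed reference date (assumption) instead of datetime.now().
reference = pd.Timestamp("2019-01-01")

# Difference in calendar years, matching the notebook's definition of age.
years_old = reference.year - opened.dt.year

print(years_old.tolist())
```

The same reference date should be applied to the test set so that both sets measure age on the same scale.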
