# Come Visit Again
## Team Members:
* Abhishek Tandon
* Chandrakant Sahu

# Project Overview

### Description
TAs have collected data from Chwiggy & Yomato (restaurant sales service & review platforms).

### Evaluation
TAs are challenging us to predict the expected number of visitors to a given restaurant on a given date, so that restaurants can be better equipped and prepared.


### Data Description
In this competition, we are provided a time-series forecasting problem centered around restaurant visitors.<br>
The data comes from two separate sites:<br>
 -> Yomato(yom) - which uses high performance grouping (hpg) to collect and store restaurant data.<br>
 -> Chwiggy(chw) - which uses all index restaurant (air) to collect and store the restaurant data.<p>
We use the reservations, visits, and other information from these sites to forecast future restaurant
visitor totals on a given date. The training data covers the dates from 2016 until early April 2017 (the first week).
The test set covers the second and third weeks of April 2017. Both the training and test sets
omit days on which the restaurants were closed.

<h2>File Description:</h2>
<p>This is a relational dataset from two systems. Each file is prefixed with its source (either <code>chw_</code> or <code>yom_</code>) to indicate its origin. Each restaurant has a unique <code>chw_store_id</code> and <code>yom_store_id</code>. Note that not all restaurants are covered by both systems, and that the data includes restaurants beyond those for which you must forecast.</p>
<h2><strong>train.csv</strong></h2>
<p>This file contains historical visit data for the Chwiggy restaurants.</p>
<ol>
<li>chw_store_id - store id with air prefix.</li>
<li>visit_date - the date</li>
<li>visitors - the number of visitors to the restaurant on the date</li>
</ol>
<h2><strong>sample_submission.csv</strong></h2>
<p>This file shows a submission in the correct format, including the days for which you must forecast.</p>
<ol>
<li>id - the id is formed by concatenating the chw_store_id and visit_date with an underscore.</li>
<li>visitors - the number of visitors forecast for the store and date combination.</li>
</ol>
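The id format described above can be sketched as follows. This is a minimal illustration, not the competition's reference code: the two store ids are borrowed from the reservation sample, while the dates, the `YYYY-MM-DD` date format, and the zero forecasts are assumptions made for the example.

```python
import pandas as pd

# Hypothetical rows: dates and forecasts are made up for illustration.
sample = pd.DataFrame({
    "chw_store_id": ["air_877f79706adbfb06", "air_db4b38ebe7a7ceff"],
    "visit_date": ["2017-04-10", "2017-04-10"],  # assumed YYYY-MM-DD format
    "visitors": [0, 0],                          # placeholder forecasts
})

# id = chw_store_id + "_" + visit_date
sample["id"] = sample["chw_store_id"] + "_" + sample["visit_date"]
submission = sample[["id", "visitors"]]

# Store ids contain an underscore themselves, so split on the LAST one
# to recover the store id and the date from an id.
parts = submission["id"].str.rsplit("_", n=1, expand=True)
```

Splitting from the right matters here because the store id already contains an underscore after the `air` prefix.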

In [27]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

In [28]:
pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 50)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
sns.set(color_codes=True)

# Chwiggy CSV Analysis

__chw_reserve__

Column Name | Column Description
-------------|:------------------
chw_store_id | restaurant id in the Chwiggy system
visit_datetime| time of the reservation
reserve_datetime | time the reservation was made
reserve_visitors | number of visitors for that reservation

In [29]:
chw_reserve = pd.read_csv('Data/come-visit-again-iiitb/chw_reserve.csv')

In [31]:
chw_reserve.head()

Unnamed: 0,chw_store_id,visit_datetime,reserve_datetime,reserve_visitors
0,air_877f79706adbfb06,1/1/2016 19:00,1/1/2016 16:00,1
1,air_db4b38ebe7a7ceff,1/1/2016 19:00,1/1/2016 19:00,3
2,air_db4b38ebe7a7ceff,1/1/2016 19:00,1/1/2016 19:00,6
3,air_877f79706adbfb06,1/1/2016 20:00,1/1/2016 16:00,2
4,air_db80363d35f10926,1/1/2016 20:00,1/1/2016 1:00,5


In [32]:
from dateutil.parser import parse
print(parse(chw_reserve['visit_datetime'][4]))

2016-01-01 20:00:00


In [33]:
# Parse the datetime strings so that the datetime format matches across every table/relation
chw_reserve['visit_datetime'] = [parse(i) for i in chw_reserve['visit_datetime']]

In [34]:
chw_reserve.head()

Unnamed: 0,chw_store_id,visit_datetime,reserve_datetime,reserve_visitors
0,air_877f79706adbfb06,2016-01-01 19:00:00,1/1/2016 16:00,1
1,air_db4b38ebe7a7ceff,2016-01-01 19:00:00,1/1/2016 19:00,3
2,air_db4b38ebe7a7ceff,2016-01-01 19:00:00,1/1/2016 19:00,6
3,air_877f79706adbfb06,2016-01-01 20:00:00,1/1/2016 16:00,2
4,air_db80363d35f10926,2016-01-01 20:00:00,1/1/2016 1:00,5


In [35]:
chw_reserve['reserve_datetime'] = [ parse(i) for i in chw_reserve['reserve_datetime']]
chw_reserve.head()

Unnamed: 0,chw_store_id,visit_datetime,reserve_datetime,reserve_visitors
0,air_877f79706adbfb06,2016-01-01 19:00:00,2016-01-01 16:00:00,1
1,air_db4b38ebe7a7ceff,2016-01-01 19:00:00,2016-01-01 19:00:00,3
2,air_db4b38ebe7a7ceff,2016-01-01 19:00:00,2016-01-01 19:00:00,6
3,air_877f79706adbfb06,2016-01-01 20:00:00,2016-01-01 16:00:00,2
4,air_db80363d35f10926,2016-01-01 20:00:00,2016-01-01 01:00:00,5
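With both columns parsed, the gap between when a reservation was made and when the visit takes place can be computed directly. The sketch below uses two rows copied from the output above (rows 0 and 4); the column name `lead_hours` is a hypothetical name chosen for this example, not something the notebook defines.

```python
import pandas as pd

# Rows 0 and 4 of the sample output, already as timestamps.
demo = pd.DataFrame({
    "visit_datetime": pd.to_datetime(["2016-01-01 19:00:00", "2016-01-01 20:00:00"]),
    "reserve_datetime": pd.to_datetime(["2016-01-01 16:00:00", "2016-01-01 01:00:00"]),
})

# Subtracting two datetime columns gives a timedelta column;
# convert it to hours for easier reading.
demo["lead_hours"] = (
    demo["visit_datetime"] - demo["reserve_datetime"]
).dt.total_seconds() / 3600
```

Whether such a lead-time column helps the forecast is an open question; it is shown only to illustrate what the parsed types make possible.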


In [36]:
chw_reserve.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 92378 entries, 0 to 92377
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   chw_store_id      92378 non-null  object        
 1   visit_datetime    92378 non-null  datetime64[ns]
 2   reserve_datetime  92378 non-null  datetime64[ns]
 3   reserve_visitors  92378 non-null  int64         
dtypes: datetime64[ns](2), int64(1), object(1)
memory usage: 2.8+ MB


* The table has 92378 rows and 4 columns.
* `visit_datetime` and `reserve_datetime` were read from the CSV as object (string) columns; after the parsing above, both are in __datetime format__ (`datetime64[ns]`), as the output confirms.
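As an alternative to parsing each string with `dateutil`, the conversion can be done in one vectorized call per column with `pd.to_datetime`. This is a sketch under one assumption: that the raw strings are month/day/year, which the sample rows (all `1/1/2016`) cannot confirm on their own.

```python
import pandas as pd

# Raw strings in the same shape as the CSV sample.
raw = pd.DataFrame({
    "visit_datetime": ["1/1/2016 19:00", "1/1/2016 20:00"],
    "reserve_datetime": ["1/1/2016 16:00", "1/1/2016 1:00"],
})

# Assumed format: month/day/year hour:minute. An explicit format avoids
# per-row guessing and fails loudly on strings that do not match.
for col in ["visit_datetime", "reserve_datetime"]:
    raw[col] = pd.to_datetime(raw[col], format="%m/%d/%Y %H:%M")
```

On a table of this size the explicit format is usually noticeably faster than a Python-level loop, and it makes the day/month order a visible decision rather than an implicit one.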

In [37]:
chw_reserve.describe()

Unnamed: 0,reserve_visitors
count,92378.0
mean,4.48
std,4.92
min,1.0
25%,2.0
50%,3.0
75%,5.0
max,100.0


* Every row has at least 1 visitor, which is expected: a reservation for 0 visitors would not make sense.
* The maximum visitor count is 100, far above the mean of 4.48 and the 75th percentile of 5, so the distribution has a long right tail.
* The data spans 2016-01-01 19:00:00 to 2017-05-31 21:00:00.
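`describe()` only summarizes the numeric column here, so the date range and the size of the upper tail have to be checked separately. A minimal sketch on made-up data follows; the threshold of 50 visitors is an arbitrary assumption for illustration, and the row with 100 visitors is invented.

```python
import pandas as pd

# Toy frame: the 100-visitor row is made up to mimic the observed maximum.
demo = pd.DataFrame({
    "visit_datetime": pd.to_datetime(
        ["2016-01-01 19:00", "2016-01-01 19:00", "2016-01-01 20:00"]
    ),
    "reserve_visitors": [1, 3, 100],
})

# Date range covered by the reservations.
first_visit = demo["visit_datetime"].min()
last_visit = demo["visit_datetime"].max()

# Share of unusually large reservations (hypothetical threshold of 50).
large = demo[demo["reserve_visitors"] >= 50]
share_large = len(large) / len(demo)
```

Run against `chw_reserve`, the same calls would show whether the very large bookings are a handful of rows or a meaningful share of the data.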

In [39]:
#let's check if there is any null value in chw_reserve
chw_reserve.isnull().values.any()

False

There are __no null values__ in __chw_reserve__.

In [44]:
print("Number of unique restaurants in Chwiggy - ", chw_reserve['chw_store_id'].nunique())

Number of unique restaurants in Chwiggy -  314


In [45]:
# Derive calendar features from the parsed visit timestamp (weekday: Monday=0)
chw_reserve["visit_year"] = chw_reserve['visit_datetime'].dt.year
chw_reserve["visit_month"] = chw_reserve['visit_datetime'].dt.month
chw_reserve["visit_weekday"] = chw_reserve['visit_datetime'].dt.weekday
chw_reserve["visit_date"] = chw_reserve['visit_datetime'].dt.date
chw_reserve.head()

Unnamed: 0,chw_store_id,visit_datetime,reserve_datetime,reserve_visitors,visit_year,visit_month,visit_weekday,visit_date
0,air_877f79706adbfb06,2016-01-01 19:00:00,2016-01-01 16:00:00,1,2016,1,4,2016-01-01
1,air_db4b38ebe7a7ceff,2016-01-01 19:00:00,2016-01-01 19:00:00,3,2016,1,4,2016-01-01
2,air_db4b38ebe7a7ceff,2016-01-01 19:00:00,2016-01-01 19:00:00,6,2016,1,4,2016-01-01
3,air_877f79706adbfb06,2016-01-01 20:00:00,2016-01-01 16:00:00,2,2016,1,4,2016-01-01
4,air_db80363d35f10926,2016-01-01 20:00:00,2016-01-01 01:00:00,5,2016,1,4,2016-01-01
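Since `train.csv` records visitors per store per day, one plausible next step is to roll the reservations up to the same granularity using the new `visit_date` column. The sketch below rebuilds the five sample rows shown above; the output column name `reserved_visitors_total` is a hypothetical choice, and this is not necessarily how the rest of the notebook proceeds.

```python
import pandas as pd

# The five sample rows from the output above.
demo = pd.DataFrame({
    "chw_store_id": [
        "air_877f79706adbfb06", "air_db4b38ebe7a7ceff", "air_db4b38ebe7a7ceff",
        "air_877f79706adbfb06", "air_db80363d35f10926",
    ],
    "visit_datetime": pd.to_datetime([
        "2016-01-01 19:00", "2016-01-01 19:00", "2016-01-01 19:00",
        "2016-01-01 20:00", "2016-01-01 20:00",
    ]),
    "reserve_visitors": [1, 3, 6, 2, 5],
})
demo["visit_date"] = demo["visit_datetime"].dt.date

# One row per store and day, with the total number of reserved visitors.
daily = (
    demo.groupby(["chw_store_id", "visit_date"], as_index=False)["reserve_visitors"]
    .sum()
    .rename(columns={"reserve_visitors": "reserved_visitors_total"})
)
```

The result shares the `chw_store_id` and `visit_date` keys with `train.csv`, so it could be joined to the visit data on those two columns.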
