
In [44]:
!pip install --quiet plotly dash

# Restaurant Recommender

## Recommendation Filtering Techniques: Summary and Recommendations
1. **Collaborative Filtering**:
   - Collaborative filtering works by analyzing user-item interactions to identify patterns and similarities among users or items.
   - There are two main approaches:
     - **User-based**: Finds users who are similar to the target user based on their interactions with items. Recommendations are then made based on items liked by similar users.
     - **Item-based**: Identifies similar items based on their interactions with users. Recommendations are made by suggesting items similar to those previously liked by the user.
   - Collaborative filtering doesn't require detailed item information but relies on user-item interaction data.
   - **Advantages**: Doesn't require detailed item information; can capture complex user preferences.
   - **Disadvantages**: Cold-start problem for new users/items; sparsity of data can lead to inaccuracies.
   - **When to choose**: Recommended when you have sufficient user-item interaction data and need personalized recommendations.

2. **Content-Based Filtering**:
   - Content-based filtering recommends items based on their attributes or features, matching them with user preferences.
   - It analyzes the characteristics of items (e.g., text descriptions, metadata) and matches them with user profiles or preferences.
   - Recommendations are made by selecting items that have attributes similar to those liked or interacted with by the user in the past.
   - Content-based filtering doesn't require interaction data from other users, but it relies on detailed item information and on some signal of the target user's own tastes.
   - **Advantages**: Handles new items that have no interaction history (no item cold-start); can recommend niche items.
   - **Disadvantages**: Limited to recommending items similar to past interactions; may not capture serendipitous recommendations.
   - **When to choose**: Suitable when you have detailed item information and want transparent, personalized recommendations.


3. **Demographic Filtering**:
   - Demographic filtering recommends items based on user demographic information such as age, gender, location, or other user characteristics.
   - It doesn't rely on explicit user-item interactions but rather on general preferences within demographic groups.
   - Recommendations are made by selecting items popular among users with similar demographic characteristics.
   - Demographic filtering provides generalized recommendations for broad user segments but may lack personalization.
   - **Advantages**: Can provide generalized recommendations for broad user segments.
   - **Disadvantages**: Oversimplified view of user preferences within demographic groups; limited personalization.
   - **When to choose**: Appropriate when user demographics significantly influence preferences or when targeting specific user segments.

4. **Knowledge-Based Filtering**:
   - Knowledge-based filtering recommends items based on explicit knowledge about user preferences or requirements.
   - It often involves asking users to provide explicit feedback or preferences during the registration or onboarding process.
   - Recommendations are made based on user-provided information such as favorite genres, brands, or attributes.
   - Knowledge-based filtering allows users to express preferences but requires user input and may have limited scalability.
   - **Advantages**: Allows users to express preferences; provides personalized recommendations.
   - **Disadvantages**: Requires user input, limited scalability depending on user knowledge depth.
   - **When to choose**: Suitable when users can provide explicit preferences and personalization is crucial.

5. **Popularity-Based Filtering**:
   - Popularity-based filtering recommends popular items based on overall ratings, reviews, or trends.
   - It doesn't require any user-specific data but provides recommendations based on items liked or purchased by many users.
   - Recommendations are made by suggesting items that are currently popular or highly rated.
   - Popularity-based filtering is simple to implement but ignores individual preferences and tends to reinforce popularity bias: already-popular items keep getting recommended.
   - **Advantages**: Simple to implement; effective for recommending popular items; works for brand-new users.
   - **Disadvantages**: Doesn't consider individual preferences; reinforces popularity bias, so lesser-known items are rarely surfaced.
   - **When to choose**: Recommended for users open to trying popular choices or when seeking trending items without relying on individual preferences.
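
To make the item-based variant of collaborative filtering more concrete, here is a minimal sketch in plain Python. The users and ratings are made up for illustration (the dataset used in this notebook has no per-user ratings), and the helper names `item_vector`, `cosine`, and `most_similar_items` are hypothetical.

```python
from math import sqrt

# Hypothetical user -> {restaurant: rating} interactions (illustrative only).
ratings = {
    "asha":  {"Waves": 5, "Sarfira": 1, "Popular Cakery": 4},
    "ben":   {"Waves": 4, "Sarfira": 2, "Popular Cakery": 5},
    "chloe": {"Waves": 1, "Sarfira": 5},
}

def item_vector(item):
    """Ratings given to `item`, keyed by user; users who did not rate it are absent."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_items(item, top=2):
    """Rank the other items by how similarly users rated them."""
    items = {i for r in ratings.values() for i in r}
    scored = [(other, cosine(item_vector(item), item_vector(other)))
              for other in items if other != item]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

print(most_similar_items("Waves"))
```

In this toy data, users who liked "Waves" also liked "Popular Cakery", so it comes out as the closest item. A user-based variant would compare rows (users) instead of columns (items).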


In [45]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [46]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from dash import Dash, dcc, html
from dash.dependencies import Input, Output

from sklearn.metrics.pairwise import linear_kernel

plt.style.use('ggplot')
pd.set_option('display.max_columns', None)

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [47]:
df = pd.read_csv('/content/drive/MyDrive/DataSets/Dataset .csv')

1. **Restaurant ID**: A unique identifier for each restaurant in the dataset.
2. **Restaurant Name**: The name or title of the restaurant.
3. **Country Code**: The country code where the restaurant is located.
4. **City**: The city where the restaurant is located.
5. **Address**: The street address or location of the restaurant.
6. **Locality**: A specific locality or neighborhood within the city where the restaurant is situated.
7. **Locality Verbose**: A more detailed description of the locality, providing additional information or context.
8. **Longitude**: The geographical longitude coordinate of the restaurant's location.
9. **Latitude**: The geographical latitude coordinate of the restaurant's location.
10. **Cuisines**: The type of cuisines offered by the restaurant (e.g., Italian, Indian, Chinese).
11. **Average Cost for two**: The average cost for two people to dine at the restaurant, typically in the local currency.
12. **Currency**: The currency used to denote the average cost for two and other monetary values.
13. **Has Table booking**: Indicates whether the restaurant accepts table bookings (Yes/No or True/False).
14. **Has Online delivery**: Indicates whether the restaurant offers online delivery services (Yes/No or True/False).
15. **Is delivering now**: Indicates whether the restaurant is currently delivering orders (Yes/No or True/False).
16. **Switch to order menu**: An option or link to switch to the restaurant's order menu, if available.
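17. **Price range**: A categorical indicator of the price range for dining at the restaurant, stored as an integer (the rows shown in this notebook have values from 1 to 4; higher presumably means more expensive).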
18. **Aggregate rating**: The overall rating or score of the restaurant, typically based on user reviews or feedback.
19. **Rating color**: The color code representing the rating range (e.g., red for low ratings, green for high ratings).
20. **Rating text**: A textual representation of the rating (e.g., Excellent, Very Good, Average).
21. **Votes**: The number of votes or reviews contributing to the aggregate rating of the restaurant.


In [48]:
df.sample(2)

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,Average Cost for two,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
5710,303497,Shri Shyam Ji Shudh Shakahari Bhojnalaya,1,New Delhi,"Khaira Mod, Najafgarh, New Delhi",Najafgarh,"Najafgarh, New Delhi",76.974892,28.611254,North Indian,150,Indian Rupees(Rs.),No,No,No,No,1,0.0,White,Not rated,0
2569,948,Waves,1,New Delhi,"A-4, Sarvodaya Enclave, Adchini, New Delhi",Adchini,"Adchini, New Delhi",77.198808,28.538666,"North Indian, Chinese",1500,Indian Rupees(Rs.),Yes,Yes,No,No,3,3.5,Yellow,Good,141


In [49]:
df.shape

(9551, 21)

In [50]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9551 entries, 0 to 9550
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Restaurant ID         9551 non-null   int64  
 1   Restaurant Name       9551 non-null   object 
 2   Country Code          9551 non-null   int64  
 3   City                  9551 non-null   object 
 4   Address               9551 non-null   object 
 5   Locality              9551 non-null   object 
 6   Locality Verbose      9551 non-null   object 
 7   Longitude             9551 non-null   float64
 8   Latitude              9551 non-null   float64
 9   Cuisines              9542 non-null   object 
 10  Average Cost for two  9551 non-null   int64  
 11  Currency              9551 non-null   object 
 12  Has Table booking     9551 non-null   object 
 13  Has Online delivery   9551 non-null   object 
 14  Is delivering now     9551 non-null   object 
 15  Switch to order menu 

1 - India  
216 - USA  
215 - UK  
30 - Brazil  
214 - UAE  
189 - South Africa  
148 - New Zealand  
208 - Turkey  
14 - Australia  
162 - Philippines  
94 - Indonesia  
184 - Singapore  
166 - Qatar  
191 - Sri Lanka  
37 - Canada


In [51]:
country_codes = {
    1: 'INDIA',
    216: 'USA',
    215: 'UK',
    30: 'BRAZIL',
    214: 'UAE',
    189: 'SOUTH AFRICA',
    148: 'NEW ZEALAND',
    208: 'TURKEY',
    14: 'AUSTRALIA',
    162: 'PHILIPPINES',
    94: 'INDONESIA',
    184: 'SINGAPORE',
    166: 'QATAR',
    191: 'SRI LANKA',
    37: 'CANADA'
}


In [52]:
df['Country'] = df['Country Code'].map(country_codes)

In [53]:
df.sample(3)

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,Average Cost for two,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes,Country
8452,309098,Popular Cakery,1,Noida,"BK-1, Near Sector 18, Metro Station, Sector 18...",Sector 18,"Sector 18, Noida",77.324998,28.569788,"Bakery, Desserts, Fast Food",500,Indian Rupees(Rs.),No,Yes,No,No,2,3.9,Yellow,Good,250,INDIA
9369,7600471,The Elephant House,215,Edinburgh,"21 George IV Bridge, Old Town, Edinburgh EH1 1EN",Old Town,"Old Town, Edinburgh",-3.191781,55.947556,Cafe,30,Pounds(£),No,No,No,No,3,3.9,Yellow,Good,81,UK
8564,18433854,Bunz & Dogz,1,Noida,"Inside Unitech Infospace, Sector 33, Noida",Sector 33,"Sector 33, Noida",77.354155,28.587913,American,300,Indian Rupees(Rs.),No,No,No,No,1,0.0,White,Not rated,1,INDIA


### 1) Demographic Filtering

Demographic filtering offers generalized recommendations to every user, based on restaurant popularity and/or cuisine. The system recommends the same restaurants to users with similar demographic features (here, the user's country). Since each user is different, this approach is considered too simple to be truly personalized. The basic idea is that restaurants that are more popular and more highly rated have a higher probability of being liked by the average diner.

The formula for calculating the weighted rating (wr) for a restaurant:

weighted rating (wr) = (v / (v + m)) * R + (m / (v + m)) * C

Where:
- v is the number of votes for the restaurant (the `Votes` column).
- R is the average rating of the restaurant (the `Aggregate rating` column).
- C is the mean rating across all restaurants under consideration.
- m is the minimum number of votes required for a restaurant to be considered.

You can use this formula to calculate the weighted rating for each restaurant in your dataset, sort the restaurants by weighted rating, and recommend the top-rated ones. C is computed from the data; m is the tunable parameter, and a larger m pulls restaurants with few votes more strongly toward the mean C.
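
As a worked example of the formula, the sketch below scores two hypothetical restaurants. The values of `C`, `m`, and the vote counts are illustrative assumptions, not figures from the dataset.

```python
def weighted_rating(v, R, m, C):
    """Blend a restaurant's own rating R with the overall mean C.

    v: number of votes for the restaurant, m: minimum-votes threshold.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

# Illustrative numbers, not taken from the dataset.
C = 3.0   # assumed mean rating across all restaurants
m = 100   # assumed minimum-votes threshold

few_votes = weighted_rating(v=10, R=4.8, m=m, C=C)     # high rating, little evidence
many_votes = weighted_rating(v=1000, R=4.5, m=m, C=C)  # slightly lower rating, lots of evidence

print(round(few_votes, 3), round(many_votes, 3))
```

The restaurant rated 4.8 by only 10 voters ends up near the mean (about 3.16), while the one rated 4.5 by 1000 voters keeps most of its rating (about 4.36). This is the behaviour the weighting is meant to produce.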


**Using a percentile as the minimum-votes threshold means we keep only the restaurants whose vote count is at or above that percentile. With a threshold of 0.7, for example, a restaurant must have at least as many votes as 70% of the restaurants in the dataset.**


The choice of the threshold, such as 70%, 90%, or any other value, depends on the specific context and objectives of the analysis. It's often determined based on domain knowledge, the distribution of the data, and the desired balance between inclusivity and selectivity. For instance, a higher threshold like 90% may focus on only the top performers, while a lower threshold like 70% may capture a broader range of candidates or entities while still prioritizing higher-performing ones.
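
The effect of the threshold can be sketched on a handful of made-up vote counts. The `quantile` helper below is a plain-Python stand-in written for this example; it uses linear interpolation between neighbouring ranks, which is also the default rule of pandas' `Series.quantile`.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Made-up vote counts for ten restaurants.
votes = [0, 1, 3, 8, 15, 40, 90, 141, 250, 591]

for q in (0.5, 0.7, 0.9):
    m = quantile(votes, q)
    kept = [v for v in votes if v >= m]
    print(f"q={q}: m={m:.1f}, restaurants kept={len(kept)}")
```

Raising the threshold from 0.5 to 0.9 shrinks the candidate pool from five restaurants to one in this toy example, which is the inclusivity versus selectivity trade-off described above.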



**Quantiles, such as quartiles or percentiles, are widely used in statistics for purposes beyond filtering data. Some other important applications:**

1. **Descriptive Statistics**: Quantiles are essential for understanding the distribution of data. They provide insights into the spread and central tendency of a dataset, allowing statisticians to summarize data effectively. For example, the median is the 50th percentile, which divides the data into two equal parts.

2. **Outlier Detection**: Quantiles are useful for identifying outliers in a dataset. By comparing data points to specific quantiles, statisticians can detect values that fall significantly above or below the expected range, indicating potential anomalies or errors in the data.

3. **Forecasting and Risk Management**: In finance and risk management, quantiles are used to estimate the probability of extreme events or losses. For instance, Value at Risk (VaR) is the loss level that should not be exceeded over a given time horizon at a specified confidence level, which makes it a quantile of the loss distribution.

4. **Performance Evaluation**: Quantiles can be employed to evaluate performance in various fields, such as sports, education, or business. For instance, in sports analytics, percentiles are used to rank athletes based on their performance metrics relative to their peers.

5. **Sampling and Survey Analysis**: When conducting surveys or sampling from a population, quantiles help ensure that the sample is representative of the population. Stratified sampling methods often use quantiles to divide the population into homogeneous groups, allowing for more accurate estimates and analysis.

Overall, quantiles play a crucial role in statistical analysis by providing valuable insights into the distribution, variability, and extremities of data, which are essential for making informed decisions in diverse fields.


In [54]:
df.sample()

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,Average Cost for two,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes,Country
8381,18254314,Sarfira,1,Noida,"B Block Market, Sector 15, Noida",Sector 15,"Sector 15, Noida",0.0,0.0,Fast Food,200,Indian Rupees(Rs.),No,No,No,No,1,0.0,White,Not rated,1,INDIA


In [55]:
def processTopPerformers(query=None,threshold=0.7,top=10):
  # Rank restaurants by weighted rating, optionally within a single country
  query = query or {}
  filtered_data = df.copy()

  if query.get('Country'):
    filtered_data = filtered_data[filtered_data['Country'] == query['Country']]

  # C: mean rating of the candidate pool; m: vote count at the chosen quantile
  C = filtered_data['Aggregate rating'].mean()
  m = filtered_data['Votes'].quantile(threshold)

  # Keep restaurants with at least m votes; .copy() avoids SettingWithCopyWarning below
  filtered_data = filtered_data[filtered_data['Votes'] >= m].copy()

  # Vectorized weighted rating: (v / (v + m)) * R + (m / (v + m)) * C
  v = filtered_data['Votes']
  R = filtered_data['Aggregate rating']
  filtered_data['Score'] = (v / (v + m)) * R + (m / (v + m)) * C

  if top:
    return filtered_data.nlargest(n=top,columns=['Score'])

  return filtered_data


In [56]:
# Setting Application
app = Dash(__name__)

# FrontEnd Server
app.layout = html.Div([
    html.Link(
        rel='stylesheet',
        href='https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css'
    ),
    html.Div([
        html.Div([
            html.H1('Top Restaurants', className='text-center text-uppercase text-primary'),
            html.Hr(),
            html.Div([
                html.P('Note:',className='text-uppercase text-warning fw-bold'),
                html.Small('''Threshold : the vote-count percentile (in tenths, so 7 means the 70th percentile) a restaurant must reach to be ranked.
                              A higher value keeps fewer restaurants; a lower value keeps more.''',className='d-block'),
                html.Small('Top Performers : how many of the highest-scoring restaurants to show.',className='d-block'),
                html.Br()
            ],className='bg-light-subtle text-info-emphasis text-start'),
            html.Hr(),
            # Nav bar
            html.Div([
                html.Div([
                    html.Label('Select threshold', className='d-block fw-bold text-info'),
                    html.Small('(This helps to find top voted restaurants.)', className='text-secondary-emphasis'),
                    dcc.Slider(min=5, max=9, step=1, value=7, id='threshold-slider'),
                ],className='col-3'),

                html.Div([
                    html.Label('Select Top Performers', className='d-block fw-bold text-info'),
                    html.Small('(This gives top "n" restaurants.)', className='text-secondary-emphasis'),
                    dcc.Slider(min=1, max=7, step=1, value=5, id='count-slider'),
                ],className='col-3'),

                html.Div([
                  html.Label('Country', className='d-block fw-bold text-info'),
                  html.Small('(Select your country.)', className='text-secondary-emphasis'),
                  dcc.Dropdown(options=list(country_codes.values()), id='country-dropdown'),
                ],className='col-3')

            ], className='d-flex justify-content-between d-flex align-items-center', style={'width':'100%','height': '300px'}),

            # Observation Section
            html.Div([
                # Graph Section
                html.Div([
                    dcc.Graph(id='graph')
                ], style={'width': '100%'}),

                # # Suggestion Sections
                # html.Div([
                #     html.H6('Top 5 Restaurants Suggestions', className='text-center text-uppercase'),
                # ], style={'width': '100%'})

            ], className='d-flex flex-column align-items-stretch',style={'width':'100%'})


        ], className='row d-flex flex-column justify-content-between d-flex align-items-center')
    ], className='container')
], style={'width': '100%', 'height': '100vh', 'margin': '0px', 'padding': '0px', 'boxSizing': 'border-box'})

# Backend Server
@app.callback(
    Output('graph', 'figure'),
    [Input('threshold-slider', 'value'),
     Input('count-slider', 'value'),
     Input('country-dropdown', 'value')]
)
def interact(threshold_value, top_value, selected_country):
    # The slider holds 5..9; dividing by 10 gives the vote quantile (0.5..0.9)
    threshold = threshold_value / 10.0

    if selected_country is None:
        top_performers = processTopPerformers(query={}, threshold=threshold, top=top_value)
        labels = top_performers['Restaurant Name'] + ' (' + top_performers['Country'] + ')'
        title = 'Top Performing Restaurants Worldwide'
    else:
        top_performers = processTopPerformers(query={'Country': selected_country.strip().upper()}, threshold=threshold, top=top_value)
        labels = top_performers['Restaurant Name'] + ' (' + top_performers['City'] + ')'
        title = f'Top Performing Restaurants in {selected_country.title()}'

    # Both branches build the same chart; only the labels and the title differ
    fig = px.bar(top_performers, x=labels, y='Score', title=title)
    fig.update_traces(text=top_performers['Score'].round(2), textposition='auto')
    return fig

# App initialization
if __name__ == '__main__':
    app.run(debug=True,port=1252)



### 2) Content Based Filtering

In this recommender system, the content of the restaurants (country code, cuisines, locality, etc.) is used to find its similarity with other restaurants. Then, the restaurants that are most likely to be similar are recommended.

**Plan of Action**

- We have a 'Cuisines' column which we are going to use for building features.
- Mostly, people choose restaurants depending on the location and available cuisines.
- Find similar restaurants using cosine similarity.
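
As a reminder of what cosine similarity measures, here is a minimal plain-Python sketch on three hypothetical one-hot cuisine vectors. The vectors and the helper name `cosine_between` are assumptions made for this example.

```python
from math import sqrt

def cosine_between(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical one-hot cuisine vectors over [North Indian, Chinese, Cafe, Bakery].
restaurant_a = [1, 1, 0, 0]
restaurant_b = [1, 0, 0, 0]
restaurant_c = [0, 0, 1, 1]

print(round(cosine_between(restaurant_a, restaurant_b), 3))  # one shared cuisine
print(cosine_between(restaurant_a, restaurant_c))            # no shared cuisine
```

Restaurants that share cuisines point in similar directions and score close to 1, while restaurants with nothing in common score 0.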

**Resources**

- https://www.geeksforgeeks.org/cosine-similarity/
- https://medium.com/@shachiakyaagba_41915integrating-folium-with-dash-5338604e7c56
- https://hackernoon.com/introduction-to-recommender-system-part-1-collaborative-filtering-singular-value-decomposition-44c9659c5e75
- https://hackernoon.com/search?query=Recommendation%20System&tab=tags

In [57]:
df.isna().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
Country                 0
dtype: int64

In [58]:
df.dropna(inplace=True)
df = df.reset_index(drop=True)

In [59]:
# Gathering the distinct cuisines in a list so we can use them as column names

Cuisines = list()
for value in  df['Cuisines'].str.split(','):
  for item in value:
    if item.strip() not in Cuisines:
      Cuisines.append(item.strip())

In [60]:
# Create an all-zero DataFrame with one row per restaurant and one column per cuisine

CuisinesDataFrame = pd.DataFrame(np.zeros((df.shape[0],len(Cuisines)),dtype=int),columns=Cuisines)

In [61]:
CuisinesDataFrame.shape,df.shape

((9542, 145), (9542, 22))

In [62]:
# This is what our cuisines dataframe looks like.

CuisinesDataFrame.head()

Unnamed: 0,French,Japanese,Desserts,Seafood,Asian,Filipino,Indian,Sushi,Korean,Chinese,European,Mexican,American,Ice Cream,Cafe,Italian,Pizza,Bakery,Mediterranean,Fast Food,Brazilian,Arabian,Bar Food,Grill,International,Peruvian,Latin American,Burger,Juices,Healthy Food,Beverages,Lebanese,Sandwich,Steak,BBQ,Gourmet Fast Food,Mineira,North Eastern,Coffee and Tea,Vegetarian,Tapas,Breakfast,Diner,Southern,Southwestern,Spanish,Argentine,Caribbean,German,Vietnamese,Thai,Modern Australian,Teriyaki,Cajun,Canadian,Tex-Mex,Middle Eastern,Greek,Bubble Tea,Tea,Australian,Fusion,Cuban,Hawaiian,Salad,Irish,New American,Soul Food,Turkish,Pub Food,Persian,Continental,Singaporean,Malay,Cantonese,Dim Sum,Western,Finger Food,British,Deli,Indonesian,North Indian,Mughlai,Biryani,South Indian,Pakistani,Afghani,Hyderabadi,Rajasthani,Street Food,Goan,African,Portuguese,Gujarati,Armenian,Mithai,Maharashtrian,Modern Indian,Charcoal Grill,Malaysian,Burmese,Chettinad,Parsi,Tibetan,Raw Meats,Kerala,Belgian,Kashmiri,South American,Bengali,Iranian,Lucknowi,Awadhi,Nepalese,Drinks Only,Oriya,Bihari,Assamese,Andhra,Mangalorean,Malwani,Cuisine Varies,Moroccan,Naga,Sri Lankan,Peranakan,Sunda,Ramen,Kiwi,Asian Fusion,Taiwanese,Fish and Chips,Contemporary,Scottish,Curry,Patisserie,South African,Durban,Kebab,Turkish Pizza,Izgara,World Cuisine,D�_ner,Restaurant Cafe,B�_rek
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [63]:
CuisinesDataFrame.shape

(9542, 145)

In [64]:
# This one is important:
# iterate over the restaurants, split each 'Cuisines' string on commas,
# and set the matching cuisine column in CuisinesDataFrame to 1 (every other cell stays 0).
# .loc writes in place; the chained form df[col].iloc[i] = 1 may assign to a temporary copy.

for index,cuisines_list in enumerate(df['Cuisines'].str.split(',')):
  for cuisines in cuisines_list:
    CuisinesDataFrame.loc[index, cuisines.strip()] = 1
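
The encoding performed by the cells above can be sketched in plain Python on three made-up cuisine strings. The helper name `multi_hot` is hypothetical; it only illustrates the idea and is not a replacement for the DataFrame used in the rest of the notebook.

```python
def multi_hot(cuisine_strings):
    """Turn comma-separated cuisine strings into one 0/1 row per restaurant."""
    split_rows = [[c.strip() for c in s.split(',')] for s in cuisine_strings]

    # Column order follows first appearance, as in the notebook's gathering loop.
    columns = []
    for row in split_rows:
        for cuisine in row:
            if cuisine not in columns:
                columns.append(cuisine)

    matrix = [[1 if col in row else 0 for col in columns] for row in split_rows]
    return columns, matrix

columns, matrix = multi_hot(["North Indian, Chinese", "Cafe", "Chinese, Cafe"])
print(columns)
for row in matrix:
    print(row)
```

Each restaurant becomes a row of zeros with a 1 under every cuisine it serves, so a restaurant with several cuisines has several ones in its row.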

In [65]:
# These are the features we are going to use to find similarity among restaurants.
selected_columns = ['Country Code',
                    'Longitude',
                    'Latitude',
                    'Has Table booking',
                    'Has Online delivery',
                    'Is delivering now',
                    'Switch to order menu',
                    'Price range',
                    'Aggregate rating','Votes']

feature_df = df[selected_columns]

In [66]:
feature_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9542 entries, 0 to 9541
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Country Code          9542 non-null   int64  
 1   Longitude             9542 non-null   float64
 2   Latitude              9542 non-null   float64
 3   Has Table booking     9542 non-null   object 
 4   Has Online delivery   9542 non-null   object 
 5   Is delivering now     9542 non-null   object 
 6   Switch to order menu  9542 non-null   object 
 7   Price range           9542 non-null   int64  
 8   Aggregate rating      9542 non-null   float64
 9   Votes                 9542 non-null   int64  
dtypes: float64(3), int64(3), object(4)
memory usage: 745.6+ KB


In [67]:
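# Handling categorical features: one-hot encode the Yes/No columns.
# drop_first=True keeps a single *_Yes indicator per column; a column with only one
# distinct value yields no indicator (see 'Switch to order menu' in the output below).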

feature_df = pd.get_dummies(data=feature_df,
                            drop_first=True,
                            columns=['Has Table booking',
                                     'Has Online delivery',
                                     'Is delivering now',
                                     'Switch to order menu'],
                            dtype=int)
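
The effect of `drop_first=True` can be approximated in a few lines of plain Python. The helper `dummies_drop_first` is a hypothetical stand-in written for this example, assuming categories are taken in sorted order; it is not how pandas is implemented.

```python
def dummies_drop_first(values, prefix):
    """One 0/1 column per category except the first one (in sorted order)."""
    categories = sorted(set(values))
    kept = categories[1:]  # the first category becomes the implicit baseline
    return {f"{prefix}_{cat}": [1 if v == cat else 0 for v in values] for cat in kept}

# A Yes/No column keeps a single "_Yes" indicator.
print(dummies_drop_first(["Yes", "No", "No"], "Has Table booking"))

# A column with a single distinct value keeps nothing at all.
print(dummies_drop_first(["No", "No", "No"], "Switch to order menu"))
```

This explains why a constant column contributes no feature after encoding: its only category is the baseline that gets dropped.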

In [68]:
# Concatenating the CuisinesDataFrame we created with the selected-features dataframe

content_feature_df = pd.concat((feature_df,CuisinesDataFrame),axis=1)

In [69]:
content_feature_df.head(3)

Unnamed: 0,Country Code,Longitude,Latitude,Price range,Aggregate rating,Votes,Has Table booking_Yes,Has Online delivery_Yes,Is delivering now_Yes,French,Japanese,Desserts,Seafood,Asian,Filipino,Indian,Sushi,Korean,Chinese,European,Mexican,American,Ice Cream,Cafe,Italian,Pizza,Bakery,Mediterranean,Fast Food,Brazilian,Arabian,Bar Food,Grill,International,Peruvian,Latin American,Burger,Juices,Healthy Food,Beverages,Lebanese,Sandwich,Steak,BBQ,Gourmet Fast Food,Mineira,North Eastern,Coffee and Tea,Vegetarian,Tapas,Breakfast,Diner,Southern,Southwestern,Spanish,Argentine,Caribbean,German,Vietnamese,Thai,Modern Australian,Teriyaki,Cajun,Canadian,Tex-Mex,Middle Eastern,Greek,Bubble Tea,Tea,Australian,Fusion,Cuban,Hawaiian,Salad,Irish,New American,Soul Food,Turkish,Pub Food,Persian,Continental,Singaporean,Malay,Cantonese,Dim Sum,Western,Finger Food,British,Deli,Indonesian,North Indian,Mughlai,Biryani,South Indian,Pakistani,Afghani,Hyderabadi,Rajasthani,Street Food,Goan,African,Portuguese,Gujarati,Armenian,Mithai,Maharashtrian,Modern Indian,Charcoal Grill,Malaysian,Burmese,Chettinad,Parsi,Tibetan,Raw Meats,Kerala,Belgian,Kashmiri,South American,Bengali,Iranian,Lucknowi,Awadhi,Nepalese,Drinks Only,Oriya,Bihari,Assamese,Andhra,Mangalorean,Malwani,Cuisine Varies,Moroccan,Naga,Sri Lankan,Peranakan,Sunda,Ramen,Kiwi,Asian Fusion,Taiwanese,Fish and Chips,Contemporary,Scottish,Curry,Patisserie,South African,Durban,Kebab,Turkish Pizza,Izgara,World Cuisine,D�_ner,Restaurant Cafe,B�_rek
0,162,121.027535,14.565443,3,4.8,314,1,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,162,121.014101,14.553708,3,4.5,591,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,162,121.056831,14.581404,4,4.4,270,1,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [70]:
print(f'No of columns in content_feature_df : {len(content_feature_df.columns)}')

No of columns in content_feature_df : 154


In [71]:
content_feature_df.isna().sum().sum()

0

In [72]:
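# Calculating cosine similarity. linear_kernel is only a dot product, which equals
# cosine similarity for L2-normalized rows, so call cosine_similarity directly.
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(content_feature_df)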
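
`linear_kernel` computes a plain dot product, which coincides with cosine similarity only when every row has unit length. The plain-Python sketch below uses made-up feature rows (a hypothetical vote count plus two cuisine flags) to illustrate why this matters when one column is much larger than the others, and how min-max scaling the columns first might help. All names and numbers are assumptions for illustration.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))

def min_max_scale(rows):
    """Rescale every column to the range [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    low = [min(c) for c in cols]
    high = [max(c) for c in cols]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, low, high)] for row in rows]

# Hypothetical feature rows: [votes, serves_chinese, serves_cafe]
target     = [100, 1, 0]
same_food  = [90, 1, 0]     # same cuisine, similar number of votes
many_votes = [5000, 0, 1]   # different cuisine, far more votes

# The raw dot product rewards the sheer size of the votes column.
print(dot(target, same_food), dot(target, many_votes))

# Cosine similarity on scaled columns ranks the same-cuisine restaurant first.
t, s, p = min_max_scale([target, same_food, many_votes])
print(round(cosine(t, s), 3), round(cosine(t, p), 3))
```

With the raw dot product the unrelated restaurant with 5000 votes looks far more "similar" than the one serving the same cuisine; after scaling, the order is reversed. Whether scaling is appropriate for the real feature set is a design choice to validate on the data.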

In [73]:
# This dataframe is for indexing purposes: it lets us look up the restaurant name given by the user

info_df = df[['City','Country','Country Code','Restaurant Name','Cuisines','Longitude','Latitude']]

In [74]:
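def getSimilarResturants(name=None,top=5):

  # find the restaurant in the indexing data (a boolean mask is safe for names containing quotes)
  index_df = info_df[info_df['Restaurant Name'] == name]
  if index_df.empty:
    return info_df.iloc[0:0]

  # grab the index so we can look it up in the similarity matrix
  # (names are not unique, so the first match is used)
  index = index_df.index[0]

  # pair every restaurant's position with its similarity to the selected one
  sim_scores = list(enumerate(cosine_sim[index]))

  # sort the scores, highest first
  sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

  # grab the sorted positions, skipping the selected restaurant itself
  indexes = [i[0] for i in sim_scores if i[0] != index]

  # keep only restaurants from the same country, then take the top matches
  similar = info_df.iloc[indexes]
  return similar[similar['Country'] == index_df['Country'].iloc[0]].iloc[:top,:]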
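
Sorting every similarity score just to keep a handful is more work than strictly needed. As an alternative sketch, `heapq.nlargest` from the standard library can pick the top matches directly while skipping the query restaurant itself. The helper name `top_similar` and the scores are made up for illustration, and the country filter used by the function above is left out.

```python
import heapq

def top_similar(similarity_row, query_index, top=5):
    """Return (index, score) pairs for the `top` highest scores, skipping the query itself."""
    candidates = ((i, s) for i, s in enumerate(similarity_row) if i != query_index)
    return heapq.nlargest(top, candidates, key=lambda pair: pair[1])

# Made-up similarity scores of restaurant 0 against five restaurants (itself included).
row = [1.0, 0.2, 0.9, 0.4, 0.7]
print(top_similar(row, query_index=0, top=3))
```

Excluding the query index matters because a restaurant is always maximally similar to itself under cosine similarity, so it would otherwise take up one of the suggestion slots.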

In [75]:
# Setting Application
app = Dash(__name__)

# FrontEnd Server
app.layout = html.Div([
    html.Link(
        rel='stylesheet',
        href='https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css'
    ),
    html.Div([
        html.H1('Restaurant Recommendations',className='text-center text-primary'),
        html.Br(),
        html.Div([

            html.Div([
              html.Label('Select Country', className='d-block text-start fw-bold text-primary'),
              html.Small('(Select Country.)', className='text-secondary-emphasis text-start d-block'),
              html.Br(),
              dcc.Dropdown(df['Country'].unique(),id='country-dropdown')
            ],className='col-4'),

            html.Div([
              html.Label('Select Restaurant Name', className='d-block text-start fw-bold text-primary'),
              html.Small('(Find Similar Restaurant.)', className='text-secondary-emphasis text-start d-block'),
              html.Br(),
              dcc.Dropdown(list(),id='restaurant-dropdown',disabled=True)
            ],className='col-5')

        ],className="row d-flex justify-content-between align-items-center"),

        html.Div([],className='w-100',id='suggestion-section')
      ],className="container w-100")
    ],style={'width': '100%', 'margin': '0px', 'padding': '0px', 'boxSizing': 'border-box'})

# Backend Server
@app.callback(
    Output('restaurant-dropdown', 'disabled'),
    Output('restaurant-dropdown', 'options'),
    Output('suggestion-section','children'),
    Input('country-dropdown', 'value'),
    Input('restaurant-dropdown', 'value')
)
def interact(selected_country,resturant_name):
  if (selected_country is not None) and (resturant_name is None):
    filtered_data = df.query(f'`Country` == "{selected_country}"')
    return False,list(filtered_data['Restaurant Name'].unique()),''


  if (selected_country is not None) and (resturant_name is not None):
    Cuisines = list()

    # Restaurants of the selected country, used to keep the dropdown options populated
    filtered_data = df.query(f'`Country` == "{selected_country}"')

    # Get the top similar restaurants
    resturants_data = getSimilarResturants(resturant_name.strip(),top=5)

    # Collect the cuisines served by the suggested restaurants
    for values in resturants_data['Cuisines'].str.split(','):
      Cuisines.extend([item.strip() for item in values])

    Cuisines = list(set(Cuisines))

    component = html.Div([
        html.Br(),
        html.Br(),
        html.H2('Top 5 Suggestions',className='text-center text-primary'),
        html.Br(),
        html.Div([html.P(f"{resturants_data['Restaurant Name'].iloc[index]}\n({resturants_data['City'].iloc[index]})",
                         style={'padding':'0px 5px'},
                         className='w-25 text-capitalize fw-bold text-info align-items-center') for index in range(len(resturants_data))],
                                                                         className='d-flex align-items-center flex-wrap',
                                                                         style={'margin':'20px 0px','listStyle':'none'}),
        html.H2('Top Similar Cuisines',className='text-center text-primary'),
        html.Div([html.P(Cuisine,
                         style={'padding':'0px 5px'},
                         className='w-25 text-capitalize fw-bold text-info align-items-center') for Cuisine in Cuisines],
                                                                         className='d-flex align-items-center flex-wrap',
                                                                         style={'margin':'20px 0px','listStyle':'none'})
    ])

    return False,list(filtered_data['Restaurant Name'].unique()),component

  return True,list(),''


# App initialization
if __name__ == '__main__':
    app.run(debug=True,port=1226)

