# A recommendation system for food stalls aimed at students (based on student count and location of schools/colleges in Mumbai, India)

## Brief Introduction

## Part 1: Description of problem

### Mumbai, India is one of the most densely populated cities in the world, with more than 18 million residents.

### High real estate costs make it difficult to start a business here, so an entrepreneur targeting the student market (13 to 20-year-olds) needs to know the best places to set up shop.

### A large share of Mumbai's population falls within this student demographic (the city has more than 50 schools), and eating snacks out is more popular and convenient than ever. This project therefore looks for the best places in Mumbai to set up a food stall or restaurant.

### Target audience: 
### Entrepreneurs and small-scale businessmen/women interested in the food/ snacks industry, aiming at the student demographic


## Part 2: Data that is needed

### 1. **We need a list of the schools in Mumbai with the largest student populations.** Their latitude and longitude will be obtained with geopy's Nominatim geocoder.

This data can be found on Wikipedia, as well as the school websites.

For instance: https://en.wikipedia.org/wiki/List_of_educational_institutions_in_Mumbai

### 2. **Then we can use the Foursquare API to find the number of eateries within a 1 km radius of each school.** The API will provide us with Postal Code, Neighborhood, Venue, Venue Summary and Venue Category.
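The request in step 2 goes to Foursquare's v2 `venues/explore` endpoint, which the notebook below calls with the `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters. A minimal sketch of assembling that URL with the standard library's `urllib.parse.urlencode`, which also percent-encodes each value, follows; the helper name `explore_url` is hypothetical and the credentials are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def explore_url(client_id, client_secret, lat, lon, radius=1000, limit=100, version='20180604'):
    """Hypothetical helper: build a Foursquare v2 explore URL around (lat, lon).

    `radius` is in metres, so the default of 1000 matches the 1 km search radius.
    """
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; coordinates of Lilavatibai Podar School from the notebook
url = explore_url('MY_ID', 'MY_SECRET', 19.0810735, 72.8371727)
print(url)

# Round-trip check: the encoded query string decodes back to the original values
print(parse_qs(urlparse(url).query)['ll'])
```

Passing the same dictionary as `params=` to `requests.get` performs equivalent encoding if you prefer to let `requests` build the query string.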

Foursquare is a local search-and-discovery mobile app that provides search results for its users (Wikipedia). It has more than 60 million users.

### 3. Process the retrieved data and create a structured DataFrame of all the venues, grouped by school.

### 4. Select the relevant venues (food-related only).

### **The schools with the highest ratio of `(no. of students)/(no. of eateries)` would be the best places to start a food stall or restaurant** (high demand relative to supply).
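The ranking criterion is a simple demand/supply ratio. A minimal pandas sketch, assuming a table with one row per school that holds its student count and the number of food venues found around it; the school names and every number below are invented for illustration and are not real enrolment or Foursquare figures.

```python
import pandas as pd

# Hypothetical inputs: student and eatery counts are made up for illustration
schools = pd.DataFrame({
    'institute': ['School A', 'School B', 'School C'],
    'students': [3000, 1500, 800],
    'eateries': [12, 3, 0],
})

# Demand per unit of supply: more students per eatery means less competition
schools['students_per_eatery'] = schools['students'] / schools['eateries']

# Highest ratio first; a school with no nearby eatery gets ratio inf and ranks first
ranking = schools.sort_values('students_per_eatery', ascending=False)
print(ranking[['institute', 'students_per_eatery']])
```

pandas returns `inf` rather than raising on division by zero, so schools with no competition naturally sort to the top; whether that is desirable (it may also signal a poor location) is a judgment call.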

We can also cluster the areas with the highest student populations.
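The notebook does not implement the clustering step. One possible approach, offered as an assumption rather than the author's method, is k-means on the institutes' (latitude, longitude) pairs. The sketch below is plain Lloyd's algorithm in NumPy with a deterministic farthest-point start, and the coordinates are copied from the geocoding output further down. Treating degrees of latitude and longitude as a flat Euclidean plane is a rough approximation that is usually acceptable within one city; weighting points by student count would be a natural extension but is not shown.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with deterministic farthest-point initialisation."""
    points = np.asarray(points, dtype=float)
    # Start from the first point, then repeatedly add the point farthest from all chosen centroids
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it in place if the cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# (latitude, longitude) pairs taken from the geocoding output in this notebook
institutes = {
    'Lilavatibai Podar': (19.0810735, 72.8371727),
    'Narsee Monjee': (19.1037065, 72.837347688538),
    'Mithibai': (19.1028853, 72.8374936781393),
    'Jai Hind': (18.93455995, 72.8251531862371),
    "St. Xavier's": (18.943156, 72.831870310951),
    'Wilson': (18.9567432, 72.810628561733),
}
labels, centres = kmeans(list(institutes.values()), k=2)
for name, label in zip(institutes, labels):
    print(label, name)
```

With these six points the two clusters separate the Santacruz/Vile Parle institutes from the South Mumbai colleges.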

### Thank you for your time; I would greatly appreciate any feedback (sidjain1412@gmail.com).

In [1]:
import requests  # library to handle requests
import pandas as pd  # library for data analysis
import numpy as np  # library to handle data in a vectorized manner
import random  # library for random number generation

# module to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# libraries for displaying images
from IPython.display import Image
from IPython.core.display import HTML

# transforming JSON into a pandas DataFrame (pandas >= 1.0 exposes json_normalize at the top level)
from pandas import json_normalize
import folium  # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [2]:
import os

# Foursquare credentials. Reading them from environment variables keeps the
# secrets out of the notebook; the variable names below are placeholders.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')  # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Credentials set:', bool(CLIENT_ID and CLIENT_SECRET))


In [3]:
# Podar School Mumbai
geolocator = Nominatim(user_agent='myapplication')
location = geolocator.geocode("Lilavatibai podar santacruz Mumbai India").raw
lat = location['lat']
lon = location['lon']
print("Latitude: ", lat)
print("Longitude: ", lon)

Latitude:  19.0810735
Longitude:  72.8371727


In [5]:
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    lat,
    lon,
    radius,
    100)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180604&ll=19.0810735,72.8371727&radius=1000&limit=100'

In [6]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c880f7a4c1f676352e432f2'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4ce017cbdb125481d7a13ace-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/icecream_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1c9941735',
         'name': 'Ice Cream Shop',
         'pluralName': 'Ice Cream Shops',
         'primary': True,
         'shortName': 'Ice Cream'}],
       'id': '4ce017cbdb125481d7a13ace',
       'location': {'cc': 'IN',
        'country': 'India',
        'distance': 184,
        'formattedAddress': ['Mahārāshtra', 'India'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 19.081607436400976,
          'lng': 72.83551217986849}],
        'lat': 19.081607436400976,
        'lng': 72.8355121798
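The explore response nests each venue under `response`, then `groups[0]`, then `items[i]['venue']`, as the truncated output above shows. A minimal pure-Python sketch of pulling out the same four fields that the next cells extract with `json_normalize`; the helper name `extract_venues` is hypothetical, and `sample` is a hand-trimmed copy of the first item (its name is taken from the venue table further down).

```python
def extract_venues(results):
    """Return (name, primary category, lat, lng) tuples from a Foursquare explore response."""
    rows = []
    for item in results['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Same rule as get_category_type: first category's name, or None if there is none
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], category,
                     venue['location']['lat'], venue['location']['lng']))
    return rows


# Trimmed copy of the first item in the response shown above
sample = {'response': {'groups': [{'items': [{'venue': {
    'name': 'Gokul Icecreams',
    'categories': [{'name': 'Ice Cream Shop', 'primary': True}],
    'location': {'lat': 19.081607436400976, 'lng': 72.83551217986849},
}}]}]}}

print(extract_venues(sample))
```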

In [7]:
# function that extracts the category of the venue
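def get_category_type(row):
    """Return the name of the venue's first (primary) category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:  # column is still prefixed before the columns are cleaned
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']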

In [8]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues)  # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories',
                    'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(
    get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print(nearby_venues.shape)
nearby_venues

(58, 4)


|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | Gokul Icecreams | Ice Cream Shop | 19.081607 | 72.835512 |
| 1 | Sandwizzaa | Sandwich Place | 19.0807 | 72.840414 |
| 2 | Seasons | Women's Store | 19.080887 | 72.83831 |
| 3 | Society Stores | Convenience Store | 19.084338 | 72.836298 |
| 4 | Being Human Store | Clothing Store | 19.079782 | 72.834403 |
| 5 | Ram & Shyam | Food Truck | 19.07822 | 72.836411 |
| 6 | Three Wise Men | Pub | 19.084432 | 72.835128 |
| 7 | Atrang Fashions | Jewelry Store | 19.082495 | 72.838007 |
| 8 | Nice Fast Food Corner | Fast Food Restaurant | 19.077202 | 72.837742 |
| 9 | Joss | Japanese Restaurant | 19.085574 | 72.834691 |


In [9]:
nearby_venues.categories.unique()

array(['Ice Cream Shop', 'Sandwich Place', "Women's Store",
       'Convenience Store', 'Clothing Store', 'Food Truck', 'Pub',
       'Jewelry Store', 'Fast Food Restaurant', 'Japanese Restaurant',
       "Men's Store", 'Gym', 'Coffee Shop', 'Gym / Fitness Center',
       'Lounge', 'Furniture / Home Store', 'Steakhouse', 'Boutique',
       'Snack Place', 'French Restaurant', 'Thai Restaurant', 'Juice Bar',
       'Indian Restaurant', 'Chinese Restaurant', 'Pizza Place',
       'Music Venue', 'Train Station', 'Café', 'Bakery', 'Shopping Mall',
       'Market', 'Dance Studio', 'Moving Target',
       'Middle Eastern Restaurant', 'Platform', 'Movie Theater',
       'Yoga Studio', 'Hotel', 'Metro Station',
       'Vegetarian / Vegan Restaurant'], dtype=object)

### Listing the schools/ colleges we will study

In [10]:
insts = ['Lilavatibai podar santacruz', 'Narsee Monjee College', 'University of Mumbai', 'Jai Hind College',
         'Mithibai College', 'Ramnarain Ruia College', 'Sophia College', "St. Andrew's College",
         'St. Xaviers College', 'Wilson College', 'IIT Bombay', 'Arya Vidya Mandir', 'BD Somani', 'Cambridge School',
         'Don Bosco High School', 'Hiranandani Foundation School Powai', 'Oberoi International', 'Vibgyor High School']
print(len(insts))

18


Append "Mumbai" to each name to help Nominatim find the institutes' coordinates.

In [11]:
insts = [x+" Mumbai" for x in insts]
insts

['Lilavatibai podar santacruz Mumbai',
 'Narsee Monjee College Mumbai',
 'University of Mumbai Mumbai',
 'Jai Hind College Mumbai',
 'Mithibai College Mumbai',
 'Ramnarain Ruia College Mumbai',
 'Sophia College Mumbai',
 "St. Andrew's College Mumbai",
 'St. Xaviers College Mumbai',
 'Wilson College Mumbai',
 'IIT Bombay Mumbai',
 'Arya Vidya Mandir Mumbai',
 'BD Somani Mumbai',
 'Cambridge School Mumbai',
 'Don Bosco High School Mumbai',
 'Hiranandani Foundation School Powai Mumbai',
 'Oberoi International Mumbai',
 'Vibgyor High School Mumbai']
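Appending the city unconditionally produces `'University of Mumbai Mumbai'`, which is the one name Nominatim fails to find in the run below. Whether the duplicated word caused that miss is not established, but a small guard avoids the duplication. A sketch; the helper name `with_city` is hypothetical.

```python
def with_city(name, city='Mumbai'):
    """Append the city to a geocoding query unless the name already mentions it (case-insensitive)."""
    return name if city.lower() in name.lower() else '{} {}'.format(name, city)


names = ['Jai Hind College', 'University of Mumbai', 'IIT Bombay']
print([with_city(n) for n in names])
```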

### Function to get latitude and longitude of each institute

In [48]:
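def coords(institute):
    """Return {'institute', 'latitude', 'longitude'} for an institute, or None if it cannot be geocoded."""
    geolocator = Nominatim(user_agent='myapplication')
    try:
        location = geolocator.geocode(institute)
    except Exception as e:  # network or geocoding-service errors
        print("Geocoding failed for %s: %s" % (institute, e))
        return None
    if location is None:  # Nominatim found no match
        print("Institute %s not found" % institute)
        return None
    return {'institute': institute,
            'latitude': location.raw['lat'],
            'longitude': location.raw['lon']}

In [49]:
l = []
for i in insts:
    details = coords(i)
    if details is not None:
        l.append(details)  # reuse the result instead of geocoding the same name twice
print(l)
print(len(l))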

Institute University of Mumbai Mumbai not found
[{'institute': 'Lilavatibai podar santacruz Mumbai', 'longitude': '72.8371727', 'latitude': '19.0810735'}, {'institute': 'Narsee Monjee College Mumbai', 'longitude': '72.837347688538', 'latitude': '19.1037065'}, {'institute': 'Jai Hind College Mumbai', 'longitude': '72.8251531862371', 'latitude': '18.93455995'}, {'institute': 'Mithibai College Mumbai', 'longitude': '72.8374936781393', 'latitude': '19.1028853'}, {'institute': 'Ramnarain Ruia College Mumbai', 'longitude': '72.8500989494695', 'latitude': '19.02381515'}, {'institute': 'Sophia College Mumbai', 'longitude': '72.8070136', 'latitude': '18.970042'}, {'institute': "St. Andrew's College Mumbai", 'longitude': '72.8287305', 'latitude': '19.0566226'}, {'institute': 'St. Xaviers College Mumbai', 'longitude': '72.831870310951', 'latitude': '18.943156'}, {'institute': 'Wilson College Mumbai', 'longitude': '72.810628561733', 'latitude': '18.9567432'}, {'institute': 'IIT Bombay Mumbai', 'lo
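The loop above sends one geocoding request per institute with no pause between them, while Nominatim's public usage policy allows at most one request per second. A minimal sketch of a throttled loop, assuming any `geocode` callable that returns an object with a `.raw` dict or `None` (as geopy's `Nominatim.geocode` does); the function name `geocode_all` is hypothetical, and the demo uses a dictionary lookup as a stand-in geocoder so nothing is sent over the network.

```python
import time


def geocode_all(names, geocode, min_delay=1.0, sleep=time.sleep):
    """Geocode each name once, waiting `min_delay` seconds between consecutive requests.

    `geocode` is any callable returning an object with a `.raw` dict, or None when
    nothing is found. Returns (found, missing).
    """
    found, missing = [], []
    for i, name in enumerate(names):
        if i > 0:
            sleep(min_delay)  # stay within one request per second
        location = geocode(name)
        if location is None:
            missing.append(name)
            continue
        found.append({'institute': name,
                      'latitude': location.raw['lat'],
                      'longitude': location.raw['lon']})
    return found, missing


# Offline demo: a stand-in "geocoder" backed by a dict, so no requests are sent
class _FakeLocation:
    def __init__(self, lat, lon):
        self.raw = {'lat': lat, 'lon': lon}


fake_db = {'Jai Hind College Mumbai': _FakeLocation('18.93455995', '72.8251531862371')}
found, missing = geocode_all(['Jai Hind College Mumbai', 'University of Mumbai Mumbai'],
                             fake_db.get, sleep=lambda seconds: None)
print(found, missing)
```

With geopy you would pass `Nominatim(user_agent='myapplication').geocode` as `geocode`; geopy also ships `geopy.extra.rate_limiter.RateLimiter` (with a `min_delay_seconds` argument) as a ready-made alternative.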

Latitude and Longitude of Mumbai, India

In [14]:
mum_lat = 19.0760
mum_lon = 72.8777

#### Plotting all the institutes that we are considering

In [61]:
inst_map = folium.Map(location=[mum_lat, mum_lon], zoom_start=11)

for d in l:
    folium.CircleMarker(
        [float(d['latitude']), float(d['longitude'])],
        radius=5,
        popup=d['institute'],
        fill=True,
        color='#0012EE',
        fill_color='red',
        fill_opacity=0.5
    ).add_to(inst_map)

inst_map

Food-related categories

In [46]:
food_cats = ['Ice Cream Shop', 'Sandwich Place', 'Food Truck', 'Fast Food Restaurant', 'Indian Restaurant', 'Steakhouse',
            'Chinese Restaurant', 'Coffee Shop', 'Bakery', 'Café', 'Middle Eastern Restaurant']
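A hard-coded list is fragile: one misspelling silently drops a category, and several food categories seen in the unique-category output above (for example 'Snack Place', 'Juice Bar' and 'Pizza Place') are not in it. As an alternative, and not what the notebook itself does, a keyword check could flag food venues. The keyword tuple and the name `is_food_category` are illustrative assumptions that would need tuning against real data.

```python
# Illustrative keywords; tune them after inspecting nearby_venues.categories.unique()
FOOD_KEYWORDS = ('restaurant', 'café', 'cafe', 'coffee', 'bakery', 'ice cream',
                 'snack', 'sandwich', 'pizza', 'juice', 'food', 'steakhouse')


def is_food_category(category):
    """Return True if a Foursquare category name looks food-related (case-insensitive substring match)."""
    if not category:
        return False
    name = category.lower()
    return any(keyword in name for keyword in FOOD_KEYWORDS)


categories = ['Ice Cream Shop', "Women's Store", 'Snack Place', 'Pub', 'Café',
              'Middle Eastern Restaurant', 'Gym / Fitness Center']
print([c for c in categories if is_food_category(c)])
```

On a DataFrame like `nearby_venues`, the same function could be used as a boolean mask via `nearby_venues['categories'].apply(is_food_category)`.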

Test run: nearby venues for Lilavatibai Podar School

In [51]:
print(l[0])
inst1 = l[0]['institute']
lat = l[0]['latitude']
lon = l[0]['longitude']

{'institute': 'Lilavatibai podar santacruz Mumbai', 'longitude': '72.8371727', 'latitude': '19.0810735'}


In [52]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    lat,
    lon,
    radius,
    100)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180604&ll=19.0810735,72.8371727&radius=500&limit=100'

In [53]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c8817e14c1f67634b24a735'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4ce017cbdb125481d7a13ace-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/icecream_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1c9941735',
         'name': 'Ice Cream Shop',
         'pluralName': 'Ice Cream Shops',
         'primary': True,
         'shortName': 'Ice Cream'}],
       'id': '4ce017cbdb125481d7a13ace',
       'location': {'cc': 'IN',
        'country': 'India',
        'distance': 184,
        'formattedAddress': ['Mahārāshtra', 'India'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 19.081607436400976,
          'lng': 72.83551217986849}],
        'lat': 19.081607436400976,
        'lng': 72.8355121798

#### Filtering food-related venues only

In [54]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues)  # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories',
                    'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(
    get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues = nearby_venues[nearby_venues['categories'].isin(food_cats)]
print(nearby_venues.shape)
nearby_venues

(12, 4)


|    | name | categories | lat | lng |
|----|------|------------|-----|-----|
| 0  | Gokul Icecreams | Ice Cream Shop | 19.081607 | 72.835512 |
| 2  | Sandwizzaa | Sandwich Place | 19.0807 | 72.840414 |
| 6  | Ram & Shyam | Food Truck | 19.07822 | 72.836411 |
| 9  | Nice Fast Food Corner | Fast Food Restaurant | 19.077202 | 72.837742 |
| 11 | Yoko Sizzlers | Steakhouse | 19.077763 | 72.837744 |
| 12 | Yoko's | Steakhouse | 19.081523 | 72.837911 |
| 13 | Shabari Restaurant | Indian Restaurant | 19.082411 | 72.840759 |
| 14 | Fatkong Restaurant | Chinese Restaurant | 19.078744 | 72.837806 |
| 15 | Coffee By Di Bella | Coffee Shop | 19.078608 | 72.837778 |
| 18 | Dynasty | Chinese Restaurant | 19.07836 | 72.837656 |
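Each explore item also carries a `distance` field (184 for Gokul Icecreams in the response above), which appears to be in metres. To compute a venue's distance from the school directly from coordinates, for example when merging results from several queries, the haversine great-circle formula is accurate enough at this scale. A sketch, assuming a spherical Earth with a mean radius of 6,371 km; the function name `haversine_m` is hypothetical. For the school and Gokul Icecreams it gives roughly 184 m, in line with the API's value.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Lilavatibai Podar School -> Gokul Icecreams (coordinates from the outputs above)
d = haversine_m(19.0810735, 72.8371727, 19.081607436400976, 72.83551217986849)
print(round(d))
```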


In [55]:
# one {'name', 'lat', 'lng'} dict per food venue
venues_i1 = nearby_venues[['name', 'lat', 'lng']].to_dict('records')

#### Plotting the food venues near Lilavatibai Podar School

In [59]:
inst_map = folium.Map(location=[float(lat), float(lon)], zoom_start=15)

for d in venues_i1:
    folium.CircleMarker(
        [float(d['lat']), float(d['lng'])],
        radius=5,
        popup=d['name'],
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.5
    ).add_to(inst_map)

folium.CircleMarker(
    [float(lat), float(lon)],
    radius=7.5,
    popup=inst1,
    fill=True,
    color='red',
    fill_color='red'
).add_to(inst_map)
inst_map
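The cells above run the pipeline for one school only. Extending it to every institute in `l` mostly means repeating the explore request per school and counting the food venues in each result. A minimal sketch, assuming a hypothetical `fetch_categories(lat, lon)` callable that returns the category names around a point (in the notebook it would wrap the `requests.get(url).json()` and `get_category_type` steps); it runs offline on canned data, so the counts it prints are not real.

```python
def count_food_venues(institutes, fetch_categories, food_categories):
    """Map each institute name to the number of nearby venues with a food-related category.

    `institutes` is a list of dicts shaped like the notebook's `l`
    ({'institute', 'latitude', 'longitude'}, coordinates as strings);
    `fetch_categories` is a hypothetical callable (lat, lon) -> list of category names.
    """
    counts = {}
    for inst in institutes:
        categories = fetch_categories(float(inst['latitude']), float(inst['longitude']))
        counts[inst['institute']] = sum(1 for c in categories if c in food_categories)
    return counts


# Offline stand-in for the Foursquare call: canned category lists keyed by coordinates
canned = {
    (19.0810735, 72.8371727): ['Ice Cream Shop', 'Pub', 'Food Truck', 'Café'],
    (18.93455995, 72.8251531862371): ['Bakery', 'Train Station'],
}
institutes = [
    {'institute': 'Lilavatibai podar santacruz Mumbai', 'latitude': '19.0810735', 'longitude': '72.8371727'},
    {'institute': 'Jai Hind College Mumbai', 'latitude': '18.93455995', 'longitude': '72.8251531862371'},
]
food = {'Ice Cream Shop', 'Food Truck', 'Café', 'Bakery'}

counts = count_food_venues(institutes, lambda lat, lon: canned[(lat, lon)], food)
print(counts)
```

Combined with student counts, which this notebook does not yet collect, these per-school counts supply the denominator of the `(no. of students)/(no. of eateries)` ratio from Part 2.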