# <b> Capstone Project Data Section </b>

## Importing Libraries

In [7]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; older versions use pandas.io.json)


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

## <b> Background problem and Data Acquisition</b>
<b>Q: Which is the best city to start a new Restaurant?</b>
 <i> India is already a heavily populated country, so there is always a need for new restaurants. In this capstone project I will try to recommend a city for building a restaurant based on the number of restaurants relative to the city population. For this I will require:</i>
<ul>
  <li>Population of India with respect to its different cities - to be obtained by scraping data from websites</li>
  <li>City latitude and longitude (location) - to be obtained by scraping data from websites</li>
  <li>Number of Restaurants for different cities - to be obtained from Foursquare</li>
</ul>
<i> I have selected the site <a href="http://worldpopulationreview.com/countries/india-population/cities/">http://worldpopulationreview.com/countries/india-population/cities/</a>, which provides up-to-date population data as well as the location coordinates required for my project. </i>

<b>Below I have scraped and cleaned the data and assigned a dataframe df for the data I will be using</b>

In [8]:
from bs4 import BeautifulSoup
import requests
source= requests.get('http://worldpopulationreview.com/countries/india-population/cities/').text
soup=BeautifulSoup(source,'lxml')
table=soup.findAll('table')[1]
df=pd.read_html(str(table))[0]
Location_link=table.findAll('a',href=True)
Lat=[]
Lon=[]
for link in Location_link:
    k=link.get('href')
    # the query string of each link ends in "<latitude>,<longitude>"
    coords=k.split('?')[1].split('=')[1].split(',')
    Lat.append(float(coords[0]))
    Lon.append(float(coords[1]))

df['Latitude']=Lat
df['Longitude']=Lon
df.rename(columns={'Name':'City', '2019 Population':'Population_2019'},inplace=True)
df.drop(['Location'], inplace=True, axis=1)  # drop the original Location column now that the coordinates are extracted
df.head()

Unnamed: 0,City,Population_2019,Latitude,Longitude
0,Mumbai,12691836,19.07283,72.88261
1,Delhi,10927986,28.65195,77.23149
2,Bengaluru,5104047,12.97194,77.59369
3,Kolkata,4631392,22.56263,88.36304
4,Chennai,4328063,13.08784,80.27847
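
The chained `split` calls above assume that every link carries exactly one query parameter whose value has the form `latitude,longitude`. A slightly more defensive variant could be sketched with the standard library's `urllib.parse`. The helper name `parse_lat_lon` and the example URL are illustrative assumptions and are not taken from the scraped page.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_lat_lon(href):
    """Return (lat, lon) from the first query parameter of href, or None.

    Hypothetical helper: assumes the value looks like '19.07283,72.88261'.
    """
    pairs = parse_qsl(urlsplit(href).query)
    if not pairs:
        return None
    _, value = pairs[0]
    parts = value.split(',')
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        # The value was not a pair of numbers
        return None

# Made-up link with the same shape as the ones handled above
print(parse_lat_lon('https://maps.example.com/?q=19.07283,72.88261'))
```

Returning `None` for a malformed link makes it possible to skip or log a bad row instead of stopping the whole scrape.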


<b>Next I will be using Foursquare data.</b> <i>I need information from Foursquare on the restaurants located in each city. By sorting the cities by the ratio of population to the number of restaurants, I can then recommend a city for building a new restaurant. </i>
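
As a small illustration of the ranking idea, the sketch below computes a population-to-restaurant ratio and sorts by it. The city names and numbers are toy values chosen for the example, not Foursquare results, and the column names are assumptions.

```python
import pandas as pd

# Toy numbers for illustration only; they are not Foursquare results
toy = pd.DataFrame({
    'City': ['A', 'B', 'C'],
    'Population': [1_000_000, 500_000, 200_000],
    'Restaurants': [50, 10, 20],
})

# People per restaurant: a higher value means fewer restaurants per resident
toy['Ratio'] = toy['Population'] / toy['Restaurants']
ranked = toy.sort_values('Ratio', ascending=False).reset_index(drop=True)
print(ranked)
```

Here city B comes first because it has the most residents per restaurant, even though it is not the largest city.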

In [9]:
import os

# Read the Foursquare credentials from environment variables rather than hard-coding them in the notebook.
# The variable names below are placeholders; use whatever names you export.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare secret

VERSION = '20180605' # Foursquare API version
LIMIT='400'


In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
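
The function above indexes `categories[0]` directly, so it would raise an error if a venue came back without a category. One possible way to guard against that is to separate the parsing from the HTTP call, which also lets the parsing be checked offline. In the sketch below, `extract_venues` and the sample payload are hypothetical; only the nested keys mirror those accessed above.

```python
def extract_venues(city, lat, lng, payload):
    """Flatten one explore-style response into rows (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder label when no category is attached
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((city, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-written payload in the shape the function above expects
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe X', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'No Category', 'location': {'lat': 1.1, 'lng': 2.1},
               'categories': []}},
]}]}}

rows = extract_venues('Mumbai', 19.07283, 72.88261, sample)
print(rows)
```

An empty or unexpected response yields an empty list rather than an exception, so one failed city does not abort the loop over all cities.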

In [11]:
India_venues = getNearbyVenues(names=df['City'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  );

Mumbai
Delhi
Bengaluru
Kolkata
Chennai
Ahmedabad
Hyderabad
Pune
Surat
Kanpur
Jaipur
Navi Mumbai
Lucknow
Nagpur
Indore
Patna
Bhopal
Ludhiana
Tirunelveli
Agra
Vadodara
Gorakhpur
Nashik
Pimpri
Kalyan
Thane
Meerut
Nowrangapur
Faridabad
Ghaziabad
Dombivli
Rajkot
Varanasi
Amritsar
Allahabad
Visakhapatnam
Teni
Jabalpur
Haora
Aurangabad
Shivaji Nagar
Solapur
Srinagar
Chandigarh
Coimbatore
Jodhpur
Madurai
Guwahati
Gwalior
Vijayawada
Mysore
Ranchi
Hubli
Jalandhar
Thiruvananthapuram
Salem
Tiruchirappalli
Kota
Bhubaneshwar
Aligarh
Bareilly
Moradabad
Bhiwandi
Raipur
Gorakhpur
Bhilai
Jamshedpur
Borivli
Cochin
Amravati
Sangli
Cuttack
Bikaner
Warangal
Bhavnagar
Nanded
Raurkela
Guntur
Dehra Dun
Bhayandar
Durgapur
Ajmer
Ulhasnagar
Kolhapur
Shiliguri
Bilimora
Karol Bagh
Asansol
Jamnagar
Saharanpur
Gulbarga
Bhatpara
Jammu
Kurnool
Ujjain
Ramgundam
Shyamnagar
Nangi
Kozhikode
Malegaon
Davangere
Jalgaon
Akola
Belgaum
Gaya
Udaipur
Korba
Bokaro
Mangalore
Jhansi
Thoothukudi
Nellore
Tiruppur
Kollam
Panihati
Ahmad

In [12]:
print(India_venues.shape)

print('There are {} unique categories.'.format(len(India_venues['Venue Category'].unique())))
India_venues.head()

(1241, 7)
There are 201 unique categories.


Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mumbai,19.07283,72.88261,Candere,19.074104,72.882732,Jewelry Store
1,Mumbai,19.07283,72.88261,Focus-Suites,19.0733,72.87843,Market
2,Mumbai,19.07283,72.88261,Workout Gym,19.069415,72.880235,Gym
3,Mumbai,19.07283,72.88261,ONLY,19.070139,72.879041,Clothing Store
4,Delhi,28.65195,77.23149,Haveli Dharampura,28.653247,77.232309,Hotel


### One hot Encoding

In [13]:
# one hot encoding
In_onehot = pd.get_dummies(India_venues[['Venue Category']], prefix="", prefix_sep="")

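# add city column back to dataframe
# (the output below shows 'City' in its alphabetical position, which indicates a venue category
# named 'City' already existed; this assignment overwrites that dummy column in place)
In_onehot['City'] = India_venues['City'] 

# move the last column to the front (here 'Zoo', because 'City' was overwritten rather than appended)
fixed_columns = [In_onehot.columns[-1]] + list(In_onehot.columns[:-1])
In_onehot = In_onehot[fixed_columns]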

In_onehot.head()

Unnamed: 0,Zoo,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Andhra Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Basketball Court,Beach,Bed & Breakfast,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Café,Cajun / Creole Restaurant,Campground,Cantonese Restaurant,Castle,Chaat Place,Chettinad Restaurant,Chinese Restaurant,City,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cricket Ground,Cupcake Shop,Currency Exchange,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Electronics Store,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Field,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,General Travel,Gift Shop,Gluten-free Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Home Service,Hookah Bar,Hospital,Hot Spring,Hotel,Hotel Bar,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karnataka Restaurant,Lake,Light Rail Station,Lighting Store,Liquor Store,Lounge,Mac & Cheese Joint,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Music Store,Music Venue,Neighborhood,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Optical Shop,Outdoors & Recreation,Outlet Store,Palace,Paper / Office Supplies Store,Park,Parsi Restaurant,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Public Art,Rajasthani Restaurant,Recording Studio,Rental Car Location,Resort,Rest Area,Restaurant,River,Salad Place,Sandwich Place,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Sports Club,Stadium,Surf Spot,Tailor Shop,Tea Room,Temple,Tennis Court,Tennis Stadium,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track Stadium,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Watch Shop,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
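
In the header above, `Zoo` is the first column and `City` sits between `Chinese Restaurant` and `Clothing Store`. That pattern is what one would expect if a venue category literally named "City" collided with the city-name column. A minimal sketch of one way to avoid the clash is to insert the city names under a distinct label; the label `City Name` and the toy rows below are assumptions made for the example.

```python
import pandas as pd

# Toy venues; the second row uses a category that is literally called 'City'
venues = pd.DataFrame({
    'City': ['Mumbai', 'Mumbai', 'Delhi'],
    'Venue Category': ['Gym', 'City', 'Hotel'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# Insert the city names as the first column under a label that cannot clash with a category
onehot.insert(0, 'City Name', venues['City'])
print(list(onehot.columns))
```

With `DataFrame.insert` the new column is placed at position 0 directly, so no separate column-reordering step is needed.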


### Extracting list of restaurants

In [14]:
# .copy() gives an independent DataFrame, so adding a column does not raise SettingWithCopyWarning
Rest_data=In_onehot.filter(like='Restaurant', axis=1).copy()
Rest_data['City']=In_onehot['City']
Rest_data.head()


Unnamed: 0,Afghan Restaurant,Andhra Restaurant,Asian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Chettinad Restaurant,Chinese Restaurant,Eastern European Restaurant,Fast Food Restaurant,French Restaurant,Gluten-free Restaurant,Hyderabadi Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Multicuisine Indian Restaurant,North Indian Restaurant,Northeast Indian Restaurant,Parsi Restaurant,Rajasthani Restaurant,Restaurant,Seafood Restaurant,South Indian Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,City
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mumbai
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Delhi


In [15]:
# Count restaurant venues per city: sum each restaurant dummy column, then add the categories together
# (groupby(...).size() would count every venue row, restaurant or not)
Rest_grouped=Rest_data.groupby('City').sum().sum(axis=1).reset_index(name='Counts')
df_sort=df.sort_values(by=['City']).reset_index(drop=True)

df_new=pd.merge(left=df_sort,right=Rest_grouped,on='City')
# People per restaurant; a city with no restaurant venues gets an infinite ratio
df_new['Ratio']=df_new['Population_2019']/df_new['Counts']

df_new=df_new.sort_values(by=['Ratio'],ascending=False).reset_index(drop=True)
df_new.head()

<i><b>df_new</b> is the complete data that I will be using for my project</i>