# Capstone Project: Finding a suitable location for a burger point in Puerto Rico (PR)

#### Business Problem:

New entrepreneurs who want to start a restaurant business often struggle to find a location with enough potential customers to serve at a profit.

Problem statement: find the best location in Puerto Rico, USA, to start a burger point.

This project analyses the neighbourhoods of Puerto Rico, ZIP code by ZIP code, and recommends the best location for a new burger point.


#### Data

Data for the analysis is taken from the URL below:

https://zipcodedownload.com/lookup/Puerto_Rico

The page lists the ZIP code, city name and state name, with links to the latitude and longitude of each ZIP code. I pulled all of these details and saved them in the CSV file 'PRZipcodes.csv'.


#### Importing libraries

In [3]:
import numpy as np 
import pandas as pd
import requests
from bs4 import BeautifulSoup

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium if it is not already available
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Libraries imported.


#### Reading the data from the CSV file
The ZIP codes are stored as numbers, without leading zeros, so they are padded back to five digits.

In [4]:
df=pd.read_csv('PRZipcodes.csv')
df['ZIP']= df['ZIP'].astype(str)
df['ZIP'] = df['ZIP'].apply(lambda x: x.zfill(5))
df

Unnamed: 0,ZIP,City Name,County Name,State Name,LAT,LNG
0,601,Adjuntas,Adjuntas,Puerto Rico,18.16595,-66.72363
1,602,Aguada,Aguada,Puerto Rico,18.361945,-67.175597
2,603,Aguadilla,Aguadilla,Puerto Rico,18.455183,-67.119887
3,604,Aguadilla,Aguadilla,Puerto Rico,18.50529,-67.1359
4,605,Aguadilla,Aguadilla,Puerto Rico,18.43615,-67.15134
5,606,Maricao,Maricao,Puerto Rico,18.158345,-66.932911
6,610,Anasco,Anasco,Puerto Rico,18.295366,-67.125135
7,611,Angeles,Utuado,Puerto Rico,18.28772,-66.79758
8,612,Arecibo,Arecibo,Puerto Rico,18.402253,-66.711397
9,613,Arecibo,Arecibo,Puerto Rico,18.47274,-66.71928
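
The zero-padding step can also be sketched on a small in-memory sample. The snippet below is only an illustration: the two rows are a stand-in for 'PRZipcodes.csv' (an assumption, since the file itself is not shown), and it reads the ZIP column as text so that any zeros already present in the file are kept, then pads the rest with the vectorised `str.zfill`.

```python
import io
import pandas as pd

# Stand-in for 'PRZipcodes.csv' (assumption: illustrative rows, one saved
# without leading zeros and one saved with them)
sample_csv = io.StringIO(
    "ZIP,City Name,LAT,LNG\n"
    "601,Adjuntas,18.16595,-66.72363\n"
    "00925,San Juan,18.4,-66.06\n"
)

# dtype=str keeps whatever text is in the file, including leading zeros
sample = pd.read_csv(sample_csv, dtype={'ZIP': str})

# str.zfill pads the codes that were saved without zeros to five characters
sample['ZIP'] = sample['ZIP'].str.zfill(5)
print(sample['ZIP'].tolist())
```

Reading the column as text avoids the round trip through integers, but the padding is still needed whenever the file was written without the zeros.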


#### Referencing latitude and longitude for creating map

In [5]:
address = '00717'

geolocator = Nominatim(user_agent="capstoneProject")
location = geolocator.geocode(address, timeout=60, exactly_one=True)
latitude = location.latitude
longitude = location.longitude
print('The decimal coordinates of 00717 are {}, {}.'.format(latitude, longitude))

The decimal coordinates of 00717 are 18.00545375, -66.6212251.


#### List of columns in the dataset

In [6]:
df.columns

Index(['ZIP', 'City Name', 'County Name', 'State Name', 'LAT', 'LNG'], dtype='object')

#### Showing map with surrounding areas

In [7]:

# create map of PR using latitude and longitude values
map_00717 = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, Zip in zip(df['LAT'], df['LNG'], df['ZIP']):
    label = '{}'.format(Zip)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_00717)  
    
map_00717

#### Foursquare API call for neighbourhood analysis

In [8]:

CLIENT_ID = 'WLIUWGSEAK1C0RJODW42EGRJUTYXK45RBOGNRXNN3B4IVMCK'
CLIENT_SECRET = 'KIE2NWPDYGTWEFL3CJCYLHABMPNZCGNLFK3B4XCTGOBXMV0X'
VERSION = '20190301'
limit = 500 # limit of number of venues returned by Foursquare API
radius = 5000 # define radius


#### Function to find the venues around each Puerto Rico ZIP code

In [9]:

# function to repeat the exploring process to all the neighborhoods in Puerto Rico
def getNearbyVenues(names, latitudes, longitudes, radius=1000, categoryIds=''):
    columns = ['Localidad', 
               'Localidad Latitude', 
               'Localidad Longitude', 
               'Venue', 
               'Venue Latitude', 
               'Venue Longitude', 
               'Venue Category']
    venues_list = []

    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, limit)

        # restrict the search to the given category, if any
        if categoryIds != '':
            url = url + '&categoryId={}'.format(categoryIds)

        try:
            # make the GET request
            response = requests.get(url).json()
            results = response['response']['venues']
        except Exception as exc:
            # report the failing request and continue with the next ZIP code
            print(url)
            print(exc)
            continue

        # return only relevant information for each nearby venue
        for v in results:
            # skip venues that have no category
            if not v.get('categories'):
                continue

            venues_list.append((
                name, 
                lat, 
                lng, 
                v['name'], 
                v['location']['lat'], 
                v['location']['lng'],
                v['categories'][0]['name']
            ))

    # an empty result still yields a DataFrame with the expected columns
    nearby_venues = pd.DataFrame(venues_list, columns=columns)
    return nearby_venues
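
The request URL that `getNearbyVenues` assembles by string formatting can also be built from a dictionary of query parameters. The following is a minimal sketch, not the notebook's code: `build_search_url` is a hypothetical helper, the credentials are placeholders, and `urlencode` from the standard library takes care of escaping (the comma in `ll` becomes `%2C`). No request is sent.

```python
from urllib.parse import urlencode

def build_search_url(lat, lng, radius=1000, limit=500, category_id=''):
    """Hypothetical helper: build a Foursquare v2 venue-search URL."""
    params = {
        'client_id': 'YOUR_CLIENT_ID',          # placeholder, not a real credential
        'client_secret': 'YOUR_CLIENT_SECRET',  # placeholder, not a real credential
        'v': '20190301',
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # the category filter is optional, as in getNearbyVenues
    if category_id:
        params['categoryId'] = category_id
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# coordinates of the first ZIP code in the table, with the category id
# that the notebook passes for the burger search
url = build_search_url(18.16595, -66.72363,
                       category_id='4bf58dd8d48988d16c941735')
print(url)
```

Keeping the parameters in a dictionary makes it harder to misplace a `&` or forget to escape a value, and `requests.get` accepts the same dictionary through its `params` argument.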

#### List of burger points

In [13]:
zipcode_venues_burger = getNearbyVenues(names=df['ZIP'], latitudes=df['LAT'], longitudes=df['LNG'], radius=1000,categoryIds='4bf58dd8d48988d16c941735')
zipcode_venues_burger.head()


Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,601,18.16595,-66.72363,Burger King,18.169486,-66.726676,Fast Food Restaurant
1,605,18.43615,-67.15134,Joe Spud's,18.439296,-67.147833,Burger Joint
2,605,18.43615,-67.15134,Wendy's Aguadilla #2,18.443918,-67.14647,Burger Joint
3,613,18.47274,-66.71928,Burger King arecibo,18.469993,-66.722496,Burger Joint
4,623,18.083361,-67.153897,Wiliche,18.086051,-67.145678,Burger Joint


In [14]:
zipcode_venues_burger.shape

(213, 7)

#### Function to add markers for given venues to map

In [15]:

def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Localidad'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [16]:
map_venues_zipcode = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(zipcode_venues_burger, 'red', map_venues_zipcode)
map_venues_zipcode


#### List of High schools

In [17]:
zipcode_venues_highschools = getNearbyVenues(names=df['ZIP'], latitudes=df['LAT'], longitudes=df['LNG'], radius=1000, categoryIds='4bf58dd8d48988d13d941735')
zipcode_venues_highschools.head()

Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,601,18.16595,-66.72363,Escuela José Emilio Lugo,18.162628,-66.721026,High School
1,604,18.50529,-67.1359,Ramey School,18.498053,-67.139784,High School
2,614,18.4569,-66.73589,Abelardo Martinez Otero,18.465688,-66.739285,High School
3,636,18.16213,-67.07793,Laura Mercado High School,18.165284,-67.078158,High School
4,637,18.076713,-66.947389,Missionary Christian Academy,18.071133,-66.942162,High School


In [18]:
zipcode_venues_highschools.shape

(149, 7)

#### Showing Schools in the map

In [20]:
map_zipcode_schools = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(zipcode_venues_highschools, 'blue', map_zipcode_schools)
map_zipcode_schools

#### List of Universities

In [21]:
zipcode_venues_uni = getNearbyVenues(names=df['ZIP'], latitudes=df['LAT'], longitudes=df['LNG'], radius=1000, categoryIds='4bf58dd8d48988d1ae941735')
zipcode_venues_uni.head()

Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,601,18.16595,-66.72363,Esc Jose E Lugo,18.168703,-66.722083,University
1,604,18.50529,-67.1359,UPR Aguadilla,18.499344,-67.135543,University
2,604,18.50529,-67.1359,Caribbean Aviation Technical Institute,18.501749,-67.14099,University
3,613,18.47274,-66.71928,Resepcion,18.473582,-66.718891,University
4,614,18.4569,-66.73589,UNE. Universidad del Este.,18.459937,-66.743102,University


In [22]:
zipcode_venues_uni.shape



(187, 7)

#### Map showing Universities

In [23]:
map_zipcode_uni = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(zipcode_venues_uni, 'green', map_zipcode_uni)
map_zipcode_uni


#### List Of Offices

In [24]:
zipcode_venues_office = getNearbyVenues(names=df['ZIP'], latitudes=df['LAT'], longitudes=df['LNG'], radius=1000, categoryIds='4d4b7105d754a06375d81259')
zipcode_venues_office.head()

Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,601,18.16595,-66.72363,Comite PPD Adjuntas,18.165085,-66.723804,Government Building
1,601,18.16595,-66.72363,Tribunal de Primera Instancia de Adjuntas,18.166603,-66.72494,Courthouse
2,601,18.16595,-66.72363,COMICIÓN ESTATAL DE ELECCIONES,18.164359,-66.723519,Voting Booth
3,601,18.16595,-66.72363,Funeraria Del Carmen,18.162893,-66.725784,Funeral Home
4,601,18.16595,-66.72363,Laboratorio Clinico Adjuntas,18.163314,-66.723996,Office


In [25]:
zipcode_venues_office.shape

(3812, 7)

#### Map showing offices

In [26]:
map_zipcode_office = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(zipcode_venues_office, 'blue', map_zipcode_office)
map_zipcode_office

#### Adding columns for burger points, high schools, universities and offices to the data frame, with the count of each per ZIP code

In [27]:
def addColumn(startDf, columnTitle, dataDf):
    # count the venues found for each ZIP code
    grouped = dataDf.groupby('Localidad').count()
    
    for n in startDf['Localidad']:
        try:
            startDf.loc[startDf['Localidad'] == n,columnTitle] = grouped.loc[n, 'Venue']
        except KeyError:
            # no venue was found for this ZIP code
            startDf.loc[startDf['Localidad'] == n,columnTitle] = 0

In [28]:
df_data = df.copy()
df_data.rename(columns={'ZIP':'Localidad'}, inplace=True)
addColumn(df_data, 'Burger', zipcode_venues_burger)
addColumn(df_data, 'High Schools', zipcode_venues_highschools)
addColumn(df_data, 'Universities', zipcode_venues_uni)
addColumn(df_data, 'Offices', zipcode_venues_office)
df_data

Unnamed: 0,Localidad,City Name,County Name,State Name,LAT,LNG,Burger,High Schools,Universities,Offices
0,601,Adjuntas,Adjuntas,Puerto Rico,18.16595,-66.72363,1.0,1.0,1.0,30.0
1,602,Aguada,Aguada,Puerto Rico,18.361945,-67.175597,0.0,0.0,0.0,0.0
2,603,Aguadilla,Aguadilla,Puerto Rico,18.455183,-67.119887,0.0,0.0,0.0,1.0
3,604,Aguadilla,Aguadilla,Puerto Rico,18.50529,-67.1359,0.0,1.0,2.0,23.0
4,605,Aguadilla,Aguadilla,Puerto Rico,18.43615,-67.15134,2.0,0.0,0.0,48.0
5,606,Maricao,Maricao,Puerto Rico,18.158345,-66.932911,0.0,0.0,0.0,0.0
6,610,Anasco,Anasco,Puerto Rico,18.295366,-67.125135,0.0,0.0,0.0,5.0
7,611,Angeles,Utuado,Puerto Rico,18.28772,-66.79758,0.0,0.0,0.0,1.0
8,612,Arecibo,Arecibo,Puerto Rico,18.402253,-66.711397,0.0,0.0,0.0,0.0
9,613,Arecibo,Arecibo,Puerto Rico,18.47274,-66.71928,1.0,0.0,1.0,46.0
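
The per-ZIP counts can also be produced without a row-by-row loop. The sketch below runs on toy stand-ins (an assumption: three ZIP codes and three venue rows shaped like the tables above) and aligns a `groupby` count with the ZIP table through `Series.map`, filling ZIP codes that have no venues with 0.

```python
import pandas as pd

# Toy stand-ins (assumption): three ZIP codes and a few venue rows
zips = pd.DataFrame({'Localidad': ['00601', '00602', '00605']})
venues = pd.DataFrame({
    'Localidad': ['00601', '00605', '00605'],
    'Venue': ['Burger King', "Joe Spud's", "Wendy's Aguadilla #2"],
})

# number of venues per ZIP code, indexed by 'Localidad'
counts = venues.groupby('Localidad')['Venue'].count()

# look up each ZIP code in the counts; ZIP codes without venues get 0
zips['Burger'] = zips['Localidad'].map(counts).fillna(0)
print(zips)
```

On a table of this size the loop and the vectorised form give the same counts; the vectorised form mainly saves one filtered assignment per ZIP code.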


#### To score each location, we assign a weight to each type of venue

In [29]:
# negative weight: to open a burger point we want to avoid competition as much as possible
weight_burger = -1

# high school students are good customers, so a positive weight
weight_schools = 1

# university students are good customers, so a positive weight
weight_uni = 2

# employees are even better customers, so a larger weight
weight_offices = 3

#### Applying the weights to compute a score for each location

In [30]:
df_weighted = df_data[['Localidad']].copy()


In [31]:
df_weighted['Score'] = df_data['Burger'] * weight_burger + df_data['High Schools'] * weight_schools + df_data['Universities'] * weight_uni + df_data['Offices'] * weight_offices
df_weighted = df_weighted.sort_values(by=['Score'], ascending=False)
df_weighted

Unnamed: 0,Localidad,Score
128,925,187.0
134,931,176.0
133,930,175.0
139,937,168.0
148,955,168.0
120,917,167.0
137,935,166.0
140,939,166.0
141,940,166.0
163,975,166.0


#### From the results above, ZIP code 00925 (San Juan) is the best area to open a burger point in Puerto Rico because it has the highest score of all

#### Create a map showing the results, i.e. burger points, schools, universities and offices

In [32]:
map_zipcode_result = folium.Map(location=[latitude, longitude], zoom_start=15)

zipcode_win = df[df['ZIP'] == '00925']

for lat, lng, local in zip(zipcode_win['LAT'], zipcode_win['LNG'], zipcode_win['ZIP']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_zipcode_result) 

addToMap(zipcode_venues_burger[zipcode_venues_burger['Localidad'] == '00925'], 'red', map_zipcode_result)
addToMap(zipcode_venues_highschools[zipcode_venues_highschools['Localidad'] == '00925'], 'green', map_zipcode_result)
addToMap(zipcode_venues_uni[zipcode_venues_uni['Localidad'] == '00925'], 'blue', map_zipcode_result)
addToMap(zipcode_venues_office[zipcode_venues_office['Localidad'] == '00925'], 'fuchsia', map_zipcode_result)

map_zipcode_result

#### Details of ZIP code 00925

In [33]:

import zipcodes
zipcodes.matching('00925')

[{'zip_code': '00925',
  'zip_code_type': 'STANDARD',
  'city': 'SAN JUAN',
  'state': 'PR',
  'lat': 18.4,
  'long': -66.06,
  'world_region': 'NA',
  'country': 'US',
  'active': True}]

#### We will examine how many burger points currently operate at this location and how many schools, universities and offices are nearby

#### There are currently 3 burger points

In [34]:
zipcode_venues_burger[zipcode_venues_burger['Localidad']=='00925'].shape

(3, 7)

#### There are 18 universities nearby

In [35]:
zipcode_venues_uni[zipcode_venues_uni['Localidad']=='00925'].shape

(18, 7)

#### There are 50 offices nearby

In [37]:
zipcode_venues_office[zipcode_venues_office['Localidad']=='00925'].shape

(50, 7)
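
These counts can be cross-checked against the top score of 187. The sketch below is an inference, not a notebook output: it takes the 3 rows of In [34] as burger points and reads the other two cells by the DataFrames they query (`zipcode_venues_uni` with 18 rows, `zipcode_venues_office` with 50 rows), then solves the scoring formula for the high school count, which the notebook does not print for 00925.

```python
# Weights from the notebook
weight_burger, weight_schools, weight_uni, weight_offices = -1, 1, 2, 3

# Counts reported above for ZIP code 00925
burgers, universities, offices = 3, 18, 50
score = 187  # top score listed for 00925

# Contribution of the three known counts to the score
known = (burgers * weight_burger
         + universities * weight_uni
         + offices * weight_offices)

# Whatever remains must come from high schools (inferred, not printed above)
implied_high_schools = (score - known) / weight_schools
print(known, implied_high_schools)
```

The three known counts contribute 183 points, so the remaining 4 points imply 4 high schools near 00925 under these assumptions.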

## Conclusion:
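A new burger point can be started in San Juan (ZIP code 00925). This area has the highest score of all, with many offices and universities nearby and only 3 existing burger points, so there are potential customers to serve at a profit.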
