# How to Choose the Best Location for your Medical Practice

<h2>Table of contents</h2>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#Introduction">Introduction</a></li>
        <li><a href="#Objective">Objective</a></li>
        <li><a href="#Background and significance">Background and significance</a></li> 
         <ol>
                <li><a href="#Healthcare System in Mexico">Healthcare System in Mexico</a></li>
                <li><a href="#How patients choose their practitioner">How patients choose their practitioner</a></li>
            </ol>
        <li><a href="#Design research and Methods">Design research and Methods</a></li> 
            <ol>
                <li><a href="#pre_processing">Pre-processing</a></li>
                <li><a href="#modeling">Modeling</a></li>
                <li><a href="#insights">Insights</a></li>
            </ol>
    </ul>
</div>
<br>
<hr>

## Introduction

When it comes to purchasing a home or investment property, it’s all about **location, location, location**. The same rule applies when you’re looking to buy or rent a space for your medical practice.
According to a July 2014 report by The Associated Press-NORC Center for Public Affairs Research, 50 percent of patients consider the location of a medical practice when choosing a doctor. Another report similarly found that 70 percent of healthcare consumers deem location either critical or very important when selecting a provider or healthcare system.

## Objective 

The aim of this project is to **find an optimal location for a medical practice**. For most healthcare professionals (HCPs), finding a suitable place to open a practice, or to relocate an existing one, is a headache. The scope of this project is to help HCPs identify which neighborhoods in **Queretaro city, México** are best suited for their business. The location of the facility has a significant impact on the outcome of the practice, and its surroundings (adjacent and nearby shops and offices) play a very important role in building a positive first impression of the clinic.
We will use data science techniques to detect the most promising neighborhoods based on the criteria selected in the background section. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.

## Background and significance

### Healthcare System in Mexico 

Mexico has achieved universal health coverage and its public healthcare is acceptable for most Mexican residents. Despite this, the private healthcare sector has grown considerably and is driven by increasing disposable income, the growth of medical tourism, and ease of access to higher quality private healthcare services.
Mexico’s public healthcare operates through the Instituto Mexicano del Seguro Social (IMSS) and Seguro Popular systems. These cover patients for most medical services and prescription drugs. Those employed in Mexico are automatically enrolled in the IMSS system and their contribution to the scheme is deducted from their salary. Those who are not formally employed may voluntarily enrol in the IMSS system, in which case they will have to pay an annual contribution fee. People who cannot afford the IMSS system must enrol with the Seguro Popular system. Fees for the Seguro Popular system are charged on a sliding scale depending on a resident’s income. While public healthcare in Mexico is relatively good, the quality of services varies between hospitals.
Most Mexicans with above-middle incomes opt for private health care, which they finance through private health insurance. Although private hospitals are more expensive, they are better equipped, provide greater access to specialised procedures and generally provide higher quality care.

### How patients choose their practitioner 

How long and how far will adults travel for primary care? According to Washington State Health Services, adults are willing to spend 28.4 minutes and travel a distance of 32 kilometers (1). This information will be taken into account when setting the parameters.
Since there are many HCPs in Queretaro, we will try to detect locations that satisfy the following 5 points:

1. Demographics 

We need to define our demographic data, such as population age, net income and education.
We also have to consider whether the population is growing or declining. Age is a demographic trait that can have several financial impacts on the doctor's office: it is usually easier to break into newer communities than mature communities, where you would have to take patients away from practitioners who have been in the area for many years.
Another aspect to take into account is the economic level of the population. An area with a high proportion of low-income residents will likely have more patients going to social security than to doctors in the private sector.

2. Accessibility

The location chosen for the medical practice must be accessible and convenient for patients. For example, a good rule of thumb is to choose a location within 20 minutes of the residential area you hope to serve.

When comparing locations, consider the availability and amount of parking. Free parking is always preferable. And aim for a location with a spacious entryway where elderly, injured or disabled patients can be dropped off and picked up without difficulty.

3. Competition

We need to determine how many providers are in the area, how big their practices are, and what their specialties are.
Finding a space that’s well-known as a site for medical practitioners can work to our advantage since people are accustomed to traveling there. “It’s a lot easier to tap into an existing behavior than to create a behavior all by itself.”

4. Visibility
A location in a remote part of town might seem cost-effective, but having low visibility will mean spending more money on marketing to get patients in the door. “Think about marketing costs as part of the rent equation.”
A medical office that’s located on a major road or thoroughfare, or in a busy shopping center, can give you maximum visibility. 

5. Nearby Hospitals, Pharmacies and other business

Speaking of proximity to other businesses, medical practices benefit from operating close to places such as:
- Pharmacies & drug stores
- Hospitals
- Urgent care centers
- Fitness centers

Beyond the obvious convenience of being located close to a hospital, the clinic will also benefit from the patient perception that it is located in a recognized healthcare area.
Where are the popular businesses, such as supermarkets and banks? More popular businesses attract more potential clients. Also, upscale businesses attract upscale clients (think Starbucks).


## Design research and Methods

Based on the definition of our problem, the factors that will influence our decision are:

- Demographics: age and income.
- Accessibility: radius of 32 km from the city center.
- Competition: number of nearby clinics.
- Visibility: proximity to main avenues and to the city center.
- Nearby hospitals, pharmacies and other businesses: hospitals, pharmacies, restaurants and coffee shops will be taken into account.
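The accessibility criterion can be checked with a great-circle distance. The following is a minimal sketch, assuming that the haversine formula on a spherical Earth (mean radius 6371 km) is accurate enough at city scale. The names `haversine_km` and `within_radius` are illustrative and are not part of the original notebook; the coordinates are the city-center and postcode values that appear in the notebook's own outputs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, point, radius_km=32.0):
    """True when `point` lies within `radius_km` of `center` (both are (lat, lon))."""
    return haversine_km(center[0], center[1], point[0], point[1]) <= radius_km


city_center = (20.5954708, -100.3970593)  # Queretaro city center
candidate = (20.589145, -100.410023)      # postcode 76010
print(within_radius(city_center, candidate))
```

A straight-line radius is only a proxy for travel distance; road distance or travel time would need a routing service, which is outside the scope of this sketch.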


The following data sources will be needed to extract or generate the required information:

- Centers of candidate areas will be generated algorithmically, and approximate addresses of the centers of those areas will be obtained using https://github.com/marioalbertodev/colonias-queretaro/blob/master/colonias.json
- Demographics will be obtained from INEGI (National Institute of Statistics and Geography) at https://www.inegi.org.mx/app/indicadores/?t=0200&ag=22#D02000070 and at https://www.inegi.org.mx/programas/enigh/nc/2018/default.html
- The number of practices, hospitals, pharmacies, restaurants and coffee shops, and their type and location in every neighborhood, will be obtained using the Foursquare API.



### Import libraries
Let's first import the required libraries.
Also run <b>%matplotlib inline</b> since we will be plotting in this section.

In [1]:
#import libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [188]:
%pip install geopandas


In [189]:
%pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [281]:
%pip install cartoframes


Successfully installed appdirs-1.4.4 cachetools-4.1.1 carto-1.11.1 cartoframes-1.0.4 fastavro-0.22.7 google-api-core-1.22.0 google-auth-1.20.0 google-cloud-bigquery-1.22.0 google-cloud-bigquery-storage-0.7.0 google-cloud-core-1.4.1 google-cloud-storage-1.23.0 google-resumable-media-0.5.1 googleapis-common-protos-1.52.0 grpcio-1.31.0 protobuf-3.12.4 pyasn1-0.4.8 pyasn1-modules-0.2.8 pyrestcli-0.6.11 rsa-4.6 semantic-version-2.8.5 unidecode-1.1.1
Note: you may need to restart the kernel to use updated packages.


### Get and clean data

In [220]:
#Read the neighbourhood list from URL. Columns: CP = Postal Code, Localidad = Neighbourhood, Tipo = industrial or residential area, Municipio = Borough
#source of Postal Codes --> https://www.correosdemexico.gob.mx/SSLServicios/ConsultaCP/Descarga.aspx
adresses = pd.read_csv (r'https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/Codigos_Postales_Queretaro.csv')
adresses.head()

Unnamed: 0,Postcode,Neighbourhood,Zoning,Borough
0,76000,Centro,Colonia,Queretaro
1,76005,Rincón de San Andrés,Fraccionamiento,Queretaro
2,76005,Mariano Escobedo,Unidad habitacional,Queretaro
3,76005,Vicente Guerrero,Unidad habitacional,Queretaro
4,76005,Rinconada Morelos,Unidad habitacional,Queretaro


In [221]:
#Review object shape
adresses.shape

(845, 4)

In [94]:
#import geopy libraries
import time
from geopy.geocoders import Nominatim
from geopy.util import get_version
get_version()

'2.0.0'

In [222]:
#Convert to dataframe
df = pd.DataFrame(data = adresses)

In [223]:
#Get postal codes
df['Postcode'].unique()

array([76000, 76005, 76010, 76017, 76020, 76024, 76025, 76026, 76027,
       76028, 76030, 76036, 76037, 76040, 76046, 76047, 76048, 76049,
       76050, 76057, 76058, 76059, 76060, 76063, 76067, 76069, 76070,
       76074, 76078, 76079, 76080, 76085, 76086, 76087, 76090, 76093,
       76099, 76100, 76110, 76113, 76114, 76115, 76116, 76117, 76118,
       76120, 76121, 76125, 76127, 76130, 76134, 76135, 76136, 76137,
       76138, 76139, 76140, 76144, 76146, 76147, 76148, 76149, 76150,
       76154, 76155, 76156, 76157, 76158, 76159, 76160, 76164, 76165,
       76166, 76168, 76169, 76170, 76175, 76176, 76177, 76178, 76179,
       76180, 76185, 76190, 76197, 76199, 76210, 76211, 76212, 76213,
       76214, 76215, 76216, 76217, 76218, 76219, 76220, 76221, 76223,
       76224, 76225, 76226, 76227, 76228, 76229, 76230, 76233, 76234,
       76235, 76237, 76238])

In [224]:
#sort the dataframe and drop its index
df = df.sort_values(by = ['Postcode'], ascending = True)
df.reset_index(inplace= True, drop = True)
df.head()

Unnamed: 0,Postcode,Neighbourhood,Zoning,Borough
0,76000,Centro,Colonia,Queretaro
1,76005,Rincón de San Andrés,Fraccionamiento,Queretaro
2,76005,Mariano Escobedo,Unidad habitacional,Queretaro
3,76005,Vicente Guerrero,Unidad habitacional,Queretaro
4,76005,Rinconada Morelos,Unidad habitacional,Queretaro


**Rows** with the same postal code are merged into one row, with the neighbourhoods separated by commas.

In [225]:
#Group by Postcode (aggregating only Borough and Neighbourhood also drops Zoning)
df_group = df.groupby('Postcode').agg({'Borough': 'first',
                                       'Neighbourhood': ', '.join}).reset_index()

#Keep only the rows whose Borough contains "Queretaro"
df_group1 = df_group[df_group['Borough'].str.contains("Queretaro")].copy()

#Reset Index
df_group1 = df_group1.reset_index(drop=True)
df_group1

Unnamed: 0,Postcode,Borough,Neighbourhood
0,76000,Queretaro,Centro
1,76005,Queretaro,"Rincón de San Andrés, Mariano Escobedo, Vicent..."
2,76010,Queretaro,"Las Campanas, Niños Héroes"
3,76017,Queretaro,Centro Universitario (U.A.Q.)
4,76020,Queretaro,"San Javier, Pathé, La Cruz, Jardines de Queret..."
5,76024,Queretaro,"Conjunto Seminario, Karina, La Peñita, Calesa ..."
6,76025,Queretaro,La Pastora
7,76026,Queretaro,"La Cruz, Goyeneche"
8,76027,Queretaro,"San José Inn, Arboledas del Río"
9,76028,Queretaro,Universidad


In [226]:
#Review object shape
df_group1.shape

(111, 3)
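To make the merge step easier to follow, here is a small self-contained sketch of the same `groupby` / `agg` pattern on a toy frame. The three rows are illustrative only (the zoning value for the last row is made up), and the variable names `toy` and `merged` are not part of the notebook.

```python
import pandas as pd

# Toy data: two neighbourhoods share postcode 76005, one belongs to 76010
toy = pd.DataFrame({
    'Postcode': [76005, 76005, 76010],
    'Neighbourhood': ['Mariano Escobedo', 'Vicente Guerrero', 'Las Campanas'],
    'Zoning': ['Unidad habitacional', 'Unidad habitacional', 'Colonia'],
    'Borough': ['Queretaro', 'Queretaro', 'Queretaro'],
})

# One row per postcode: keep the first borough, join the neighbourhood names.
# Columns that are not listed in the dict (Zoning) are left out of the result.
merged = (toy.groupby('Postcode')
             .agg({'Borough': 'first', 'Neighbourhood': ', '.join})
             .reset_index())
print(merged)
```

Taking the `first` borough is safe only when every neighbourhood in a postcode belongs to the same borough, which is worth checking on the real data.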

### Install additional libraries

In [24]:
%pip install pgeocode

Collecting pgeocode
  Downloading pgeocode-0.2.1-py2.py3-none-any.whl (7.6 kB)
Installing collected packages: pgeocode
Successfully installed pgeocode-0.2.1
Note: you may need to restart the kernel to use updated packages.


### Get Coordinates from PostalCode

In [227]:
geolocator = Nominatim(scheme='http', user_agent="biotecnologo.rendon@gmail.com")

for row_index, item in df_group1.iterrows():
    # build a single query string such as "76000, Queretaro, Queretaro, Mexico"
    query = '{}, Queretaro, Queretaro, Mexico'.format(item['Postcode'])

    location = geolocator.geocode(query)
    time.sleep(5)  # pause between requests to avoid overloading the Nominatim service
    if location is not None:
        # rows that cannot be geocoded keep missing Latitude/Longitude values
        df_group1.loc[row_index, 'Latitude'] = location.latitude
        df_group1.loc[row_index, 'Longitude'] = location.longitude

df_group1.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,"Santiago de Queretaro Centro, Centro",20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andres, Mariano Escobedo, Rincon...",20.595471,-100.397059
2,76006,Queretaro,Ciudad de Queretaro,20.612119,-100.385082
3,76007,Queretaro,Secretaria de Hacienda y Crédito Publico,20.612119,-100.385082
4,76008,Queretaro,Centro Sct Queretaro,20.612119,-100.385082
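The geocoding loop above mixes the network call with the DataFrame update, which makes it hard to test. A minimal sketch of one way to separate the two is shown below: the lookup function is passed in as an argument, so a stub can stand in for the real service. The names `geocode_postcodes`, `FakeLocation` and `fake_geocode` are hypothetical, and postcode 76999 is a made-up value used to show the "not found" case.

```python
import time
from collections import namedtuple


def geocode_postcodes(postcodes, geocode, suffix='Queretaro, Queretaro, Mexico',
                      delay_seconds=1.0):
    """Return {postcode: (lat, lon) or None} using the supplied geocode callable.

    `geocode` is any function that takes a query string and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found.
    """
    coords = {}
    for postcode in postcodes:
        query = '{}, {}'.format(postcode, suffix)
        location = geocode(query)
        coords[postcode] = None if location is None else (location.latitude,
                                                          location.longitude)
        time.sleep(delay_seconds)  # pause between consecutive requests
    return coords


# Stub that mimics a geocoder, so the sketch runs without network access
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(query):
    known = {'76000, Queretaro, Queretaro, Mexico': FakeLocation(20.595471, -100.397059)}
    return known.get(query)


result = geocode_postcodes([76000, 76999], fake_geocode, delay_seconds=0)
print(result)
```

With geopy, `geolocator.geocode` could be passed as the `geocode` argument. Postcodes that come back as `None` can then be listed explicitly and completed by hand.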


### Check that all coordinates are complete

In [228]:
df_group1

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,Centro,20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andrés, Mariano Escobedo, Vicent...",20.595471,-100.397059
2,76010,Queretaro,"Las Campanas, Niños Héroes",20.589145,-100.410023
3,76017,Queretaro,Centro Universitario (U.A.Q.),20.612119,-100.385082
4,76020,Queretaro,"San Javier, Pathé, La Cruz, Jardines de Queret...",20.595158,-100.380164
5,76024,Queretaro,"Conjunto Seminario, Karina, La Peñita, Calesa ...",20.60101,-100.369215
6,76025,Queretaro,La Pastora,20.592994,-100.379686
7,76026,Queretaro,"La Cruz, Goyeneche",20.597203,-100.386813
8,76027,Queretaro,"San José Inn, Arboledas del Río",20.598873,-100.386271
9,76028,Queretaro,Universidad,20.59881,-100.387291


In [229]:
#Fill in the missing values with coordinates looked up on www.googlemaps.com
#The postcode of the Cumbres Del Cimatario neighbourhood was incorrect; it was replaced with 76973

df_group1.at[12,'Latitude']=20.579183291477012
df_group1.at[12,'Longitude']=-100.40147663475348
df_group1.at[29,'Latitude']=20.576460196050522
df_group1.at[29,'Longitude']=-100.38223438507171
df_group1.at[74,'Latitude']=20.614491425378638
df_group1.at[74,'Longitude']=-100.39066676510589
df_group1.at[109,'Latitude']=20.67806196562475
df_group1.at[109,'Longitude']=-100.38697958504592

df_group1

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,Centro,20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andrés, Mariano Escobedo, Vicent...",20.595471,-100.397059
2,76010,Queretaro,"Las Campanas, Niños Héroes",20.589145,-100.410023
3,76017,Queretaro,Centro Universitario (U.A.Q.),20.612119,-100.385082
4,76020,Queretaro,"San Javier, Pathé, La Cruz, Jardines de Queret...",20.595158,-100.380164
5,76024,Queretaro,"Conjunto Seminario, Karina, La Peñita, Calesa ...",20.60101,-100.369215
6,76025,Queretaro,La Pastora,20.592994,-100.379686
7,76026,Queretaro,"La Cruz, Goyeneche",20.597203,-100.386813
8,76027,Queretaro,"San José Inn, Arboledas del Río",20.598873,-100.386271
9,76028,Queretaro,Universidad,20.59881,-100.387291


In [254]:
#First we need to make sure the dataframe is free from duplicated values in the Latitude and Longitude columns
dups_lat = df_group1.pivot_table(index=['Latitude','Longitude'], aggfunc='size')
print(dups_lat)

Latitude   Longitude  
20.548625  -100.403380     1
20.556338  -100.376800     1
20.561558  -100.398420     1
20.563962  -100.414175     1
20.564054  -100.376876     1
20.566198  -100.369693     1
20.567192  -100.412853     1
20.571274  -100.380372     1
20.575479  -100.373968     1
20.576460  -100.382234     1
20.578726  -100.412367     1
20.579183  -100.401477     1
20.580200  -100.394995     1
20.580318  -100.409792     1
20.581376  -100.377801     1
20.581914  -100.409816     1
20.582016  -100.413860     1
20.583607  -100.381214     1
20.583838  -100.370059     1
20.587390  -100.367297     1
20.587625  -100.385412     1
20.587717  -100.356161     1
20.588841  -100.384972     1
20.589145  -100.410023     1
20.591591  -100.371716     1
20.592994  -100.379686     1
20.595158  -100.380164     1
20.595471  -100.397059    11
20.596408  -100.412606     1
20.597203  -100.386813     1
20.598810  -100.387291     1
20.598873  -100.386271     1
20.600800  -100.390144     1
20.600912  -100.3944

In [270]:
#50 coordinates are duplicated (52 in total - 2 that are right); let's group the coordinates to determine which postcodes share them
df_group1.groupby(['Latitude', 'Longitude'])['Postcode'].value_counts().to_frame('count')

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count
Latitude,Longitude,Postcode,Unnamed: 3_level_1
20.548625,-100.40338,76086,1
20.556338,-100.3768,76087,1
20.561558,-100.39842,76080,1
20.563962,-100.414175,76185,1
20.564054,-100.376876,76074,1
20.566198,-100.369693,76090,1
20.567192,-100.412853,76180,1
20.571274,-100.380372,76048,1
20.575479,-100.373968,76099,1
20.57646,-100.382234,76079,1
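As an alternative to reading the grouped counts by eye, the postcodes that share a coordinate pair can be selected directly. This is a minimal sketch using `DataFrame.duplicated` on a three-row toy frame whose values are taken from the tables above; `toy`, `mask` and `shared` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    'Postcode': [76000, 76005, 76010],
    'Latitude': [20.595471, 20.595471, 20.589145],
    'Longitude': [-100.397059, -100.397059, -100.410023],
})

# keep=False marks every member of a duplicated group, not only the repeats
mask = toy.duplicated(subset=['Latitude', 'Longitude'], keep=False)
shared = toy[mask]
print(shared['Postcode'].tolist())
```

The rows selected this way are the ones whose coordinates need to be reviewed; in this notebook that review is done manually.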


In [280]:
df_group1    

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,Centro,20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andrés, Mariano Escobedo, Vicent...",20.595471,-100.397059
2,76010,Queretaro,"Las Campanas, Niños Héroes",20.589145,-100.410023
3,76017,Queretaro,Centro Universitario (U.A.Q.),20.612119,-100.385082
4,76020,Queretaro,"San Javier, Pathé, La Cruz, Jardines de Queret...",20.595158,-100.380164
5,76024,Queretaro,"Conjunto Seminario, Karina, La Peñita, Calesa ...",20.60101,-100.369215
6,76025,Queretaro,La Pastora,20.592994,-100.379686
7,76026,Queretaro,"La Cruz, Goyeneche",20.597203,-100.386813
8,76027,Queretaro,"San José Inn, Arboledas del Río",20.598873,-100.386271
9,76028,Queretaro,Universidad,20.59881,-100.387291


In [235]:
#save to a local csv file
df_group1.to_csv(r'/Users/javierrendon/Desktop/Queretaro_Coordinates.csv', index=False, header=True)
#Update the duplicated coordinates in the CSV manually (cumbersome but necessary), then upload the file to GitHub

In [230]:
df_group1.dtypes

Postcode           int64
Borough           object
Neighbourhood     object
Latitude         float64
Longitude        float64
dtype: object

In [334]:
df_group2 = pd.read_csv (r'https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/Queretaro_Coordinates.csv')
df_group2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,Centro,20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andrés, Mariano Escobedo, Vicent...",20.595006,-100.39877
2,76010,Queretaro,"Las Campanas, Niños Héroes",20.589145,-100.410023
3,76017,Queretaro,Centro Universitario (U.A.Q.),20.592282,-100.409679
4,76020,Queretaro,"San Javier, Pathé, La Cruz, Jardines de Queret...",20.596568,-100.379079


### Get the latitude and longitude of Queretaro City

In [232]:
address = 'Queretaro, QRO'

geolocator = Nominatim(user_agent="biotecnologo.rendon@gmail.com")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Queretaro are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Queretaro are 20.5954708, -100.3970593.


### Create a map of Queretaro with its neighbourhoods

In [335]:
Queretaro_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df_group2['Latitude'], df_group2['Longitude'], df_group2['Borough'], df_group2['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Queretaro_map)  
    
Queretaro_map

In [None]:
Queretaro_map.save('Queretaro.html')

### Postcode boundaries are not shown on the map, so let's create the GeoJSON map

In [None]:
#Postcode boundaries are not shown on the map, so we need to install kml2geojson to transform the publicly available KML data to GeoJSON
%pip install kml2geojson

In [None]:
#Convert kml postcode boundaries to geojson
import kml2geojson
kml2geojson.main.convert(r'/Users/javierrendon/Desktop/cp_qro/CP_22_Qro_v6.kml', r'/Users/javierrendon/Desktop/', separate_folders=False, style_type=None, style_filename='style.json')

In [309]:
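# Download and store a geojson file of Queretaro (raw URL, so that the file itself is fetched rather than the GitHub HTML page)
!wget --quiet https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/CP_Qro.geojson -O CP_Qro.geojson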
qro_geo = r'CP_Qro.geojson'
print("geojson ready!")

geojson ready!


In [337]:
# load GeoJSON
with open(qro_geo) as jsonFile:
    data = json.load(jsonFile)
tmp = data

# remove ZIP codes not in our dataset
geozips = []
for i in range(len(tmp['features'])):
    if tmp['features'][i]['properties']['d_cp'] in list(df_group2['Postcode'].unique()):
        geozips.append(tmp['features'][i])

# creating new JSON object
new_json = dict.fromkeys(['type','features'])
new_json['type'] = 'FeatureCollection'
new_json['features'] = geozips

# save JSON object as updated-file
open("qro_borough.json", "w").write(
    json.dumps(new_json, sort_keys=True, indent=4, separators=(',', ': '))
)

2488368
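The filtering step can also be written as a small reusable function. This is a minimal sketch, assuming only that each feature stores its postcode under the `d_cp` property as in the cell above. Whether that property is stored as a string or an integer is not visible here, so the sketch compares both sides as strings. The name `filter_features` and the sample features are illustrative.

```python
import json


def filter_features(geojson, postcodes, key='d_cp'):
    """Keep only the features whose postcode property is in `postcodes`."""
    # Compare as strings so that '76000' and 76000 are treated as the same code
    wanted = {str(p) for p in postcodes}
    kept = [f for f in geojson['features']
            if str(f['properties'].get(key)) in wanted]
    return {'type': 'FeatureCollection', 'features': kept}


# Made-up sample with one wanted and one unwanted postcode
sample = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'d_cp': '76000'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'d_cp': '99999'}, 'geometry': None},
    ],
}
filtered = filter_features(sample, [76000, 76005])
print(json.dumps(filtered, indent=2))
```

Building the set once keeps the membership test cheap, instead of rebuilding a list of unique postcodes for every feature.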

In [338]:
# reading of the updated GeoJSON file
qro_boroughs = r'qro_borough.json'    
    
# Create a map of Queretaro with postcodes boundaries
map_queretaro = folium.Map(location=[latitude, longitude], zoom_start=12)

#upload postal codes boundaries files
folium.GeoJson(qro_boroughs, name='json').add_to(map_queretaro)

for lat, lng, Postcode, Neighbourhood in zip(df_group2['Latitude'], df_group2['Longitude'],df_group2['Postcode'], df_group2['Neighbourhood'] ):
    label = '{}, {}'.format(Neighbourhood, Postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_queretaro) 

map_queretaro

In [None]:
%pip install ipywidgets
!jupyter labextension install @jupyter-widgets/jupyterlab-manager

### Let's use the Foursquare API to explore Queretaro

In [316]:
CLIENT_ID = 'JR0XGIS01BV2ZKK2SGWB5TSMH52TAFNTVSZWLZ4PPR0PB31H' # your Foursquare ID
CLIENT_SECRET = 'F4UTYNYA3AFFWLY3HPTE5IWESFOTKSD1X2IGEKDIPBIU1IS4' # your Foursquare Secret
VERSION = '20180604'

print('Successfully Logged-In')

Successfully Logged-In
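Hard-coding API credentials in a notebook exposes them to anyone who can read the file. One common alternative is to read them from environment variables. The sketch below illustrates that idea under stated assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical, not something Foursquare or this notebook defines.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The variable names are hypothetical; use whatever fits your setup.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Example with an explicit mapping so the sketch runs without real credentials
client_id, client_secret = load_foursquare_credentials({
    'FOURSQUARE_CLIENT_ID': 'my-id',
    'FOURSQUARE_CLIENT_SECRET': 'my-secret',
})
print(client_id)
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read from the real environment and fail early with a clear message when a variable is not set.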


#### Where are the nearest hospitals in each neighbourhood, and where are most of them located?

In [318]:
#What is the name of the first neighbourhood in our dataframe?
df_group2.loc[0, 'Neighbourhood']

'Centro'

#### Now let's get the latitude and longitude of this neighborhood

In [322]:

neighbourhood_latitude = df_group2.loc[0, 'Latitude'] # neighbourhood latitude value
neighbourhood_longitude = df_group2.loc[0, 'Longitude'] # neighbourhood longitude value

neighbourhood_name = df_group2.loc[0, 'Neighbourhood'] # neighbourhood name

print('Latitude and longitude values of {} Neighbourhood are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Centro Neighbourhood are 20.5954708, -100.3970593.


#### Get the top 50 hospitals within a radius of 1000 meters of the Centro neighbourhood

In [330]:
LIMIT= 50
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4bf58dd8d48988d196941735,4bf58dd8d48988d104941735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
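Long format strings are easy to get wrong when parameters are added or reordered. A minimal sketch of building the same request from a dictionary of parameters follows; it uses only the standard library, and `build_explore_url` plus the placeholder credentials `'ID'` and `'SECRET'` are illustrative names, not part of the notebook.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, limit, category_ids):
    """Build the explore URL from named parameters instead of positional slots."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'categoryId': ','.join(category_ids),  # comma-separated list of categories
    }
    # urlencode percent-encodes the commas; they decode back to the same values
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params))


url = build_explore_url('ID', 'SECRET', '20180604', 20.5954708, -100.3970593,
                        1000, 50,
                        ['4bf58dd8d48988d196941735', '4bf58dd8d48988d104941735'])
print(url)
```

The same dictionary could be handed to `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.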

In [331]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f2d88edfa01662f62b30a47'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Santiago de Querétaro',
  'headerFullLocation': 'Santiago de Querétaro',
  'headerLocationGranularity': 'city',
  'query': 'hospital',
  'totalResults': 9,
  'suggestedBounds': {'ne': {'lat': 20.60447080900001,
    'lng': -100.38746275434151},
   'sw': {'lat': 20.586470790999993, 'lng': -100.40665584565848}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '509bd9f6e4b09d90f57896c4',
       'name': 'Sanatorio Margarita',
       'location': {'address': 'Guerrero Norte 3',
        'crossStreet': 'Frente al Jardín Guerrero',
        'lat': 20.59235738795313,
        'lng': -

In [332]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the response
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Sanatorio Margarita,Hospital,20.592357,-100.395213
1,Sanatorio Santiago de Querétaro,Hospital,20.592119,-100.399937
2,Hospital Santa Rosa de Viterbo,Hospital,20.588506,-100.397214
3,Hospital del Sagrado Corazon,Hospital,20.599134,-100.393373
4,Hospital Luis Martin,Hospital,20.587689,-100.394677


In [333]:
print('{} Hospitals were returned by Foursquare.'.format(nearby_venues.shape[0]))

9 Hospitals were returned by Foursquare.


In [None]:
def getNearbyVenues(names, latitudes, longitudes):
    radius=500
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4bf58dd8d48988d196941735,4bf58dd8d48988d104941735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
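The function above indexes straight into the response, so a venue without categories or a response without groups would raise an exception halfway through the loop. A minimal sketch of a more defensive parser is shown below, assuming the response has the same nested shape as the output printed earlier. `extract_venues` is a hypothetical helper and the second sample venue is made up.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response.

    Venues without categories get None as their category, and a response
    without groups yields an empty list instead of raising an exception.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Sanatorio Margarita',
               'location': {'lat': 20.592357, 'lng': -100.395213},
               'categories': [{'name': 'Hospital'}]}},
    {'venue': {'name': 'Uncategorised venue',
               'location': {'lat': 20.59, 'lng': -100.39},
               'categories': []}},
]}]}}
rows = extract_venues(sample)
print(rows)
```

Each tuple could be prefixed with the neighbourhood name and coordinates before being collected into the final DataFrame.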