# How to Choose the Best Location for your Medical Practice

<h2>Table of contents</h2>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#Introduction">Introduction</a></li>
        <li><a href="#Objective">Objective</a></li>
        <li><a href="#Background and significance">Background and significance</a></li> 
         <ol>
                <li><a href="#Healthcare System in Mexico">Healthcare System in Mexico</a></li>
                <li><a href="#How patients choose their practitioner">How patients choose their practitioner</a></li>
            </ol>
        <li><a href="#Design research and Methods">Design research and Methods</a></li> 
            <ol>
                <li><a href="#pre_processing">Pre-processing</a></li>
                <li><a href="#modeling">Modeling</a></li>
                <li><a href="#insights">Insights</a></li>
            </ol>
    </ul>
</div>
<br>
<hr>

## Introduction

When it comes to purchasing a home or investment property, it’s all about **location, location, location**. The same rule applies when you’re looking to buy or rent a space for your medical practice.
According to a July 2014 report by The Associated Press-NORC Center for Public Affairs Research, 50 percent of patients consider the location of a medical practice when choosing a doctor. Another report similarly found that 70 percent of healthcare consumers deem location either critical or very important when selecting a provider or healthcare system.

## Objective 

The aim of this project is to **find an optimal location for a medical practice**. For most healthcare professionals (HCPs), finding a suitable place to open a practice, or to relocate an existing one, is a headache. The scope of this project is to help HCPs identify which neighborhoods in **Queretaro city, México** are best suited for their business. The location of the facility has a significant impact on the practice's outcome, and its surroundings (adjacent and nearby shops and offices) play a very important role in building a positive early impression of the clinic.
We will use data science techniques to detect the most promising neighborhoods based on the criteria selected in the background section. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Background and significance

### Healthcare System in Mexico 

Mexico has achieved universal health coverage and its public healthcare is acceptable for most Mexican residents. Despite this, the private healthcare sector has grown considerably and is driven by increasing disposable income, the growth of medical tourism, and ease of access to higher quality private healthcare services.
Mexico’s public healthcare operates through the Instituto Mexicano del Seguro Social (IMSS) and Seguro Popular systems. These cover patients for most medical services and prescription drugs. Those employed in Mexico are automatically enrolled in the IMSS system and their contribution to the scheme is deducted from their salary. Those who are not formally employed may voluntarily enrol in the IMSS system, in which case they will have to pay an annual contribution fee. People who cannot afford the IMSS system must enrol with the Seguro Popular system. Fees for the Seguro Popular system are charged on a sliding scale depending on a resident’s income. While public healthcare in Mexico is relatively good, the quality of services varies between hospitals.
Most Mexicans with above-middle incomes opt for private healthcare, which they finance through private health insurance. Although private hospitals are more expensive, they are better equipped, provide greater access to specialised procedures and generally provide higher quality care.

### How patients choose their practitioner 

How Long and How Far Do Adults Travel and Will Adults Travel for Primary Care? According to Washington State Health Services, adults are willing to spend 28.4 minutes and travel a distance of 32 kilometers (1). This information will be taken into account when setting the parameters.
Since there are many HCPs in Queretaro, we will try to detect locations that satisfy the following five points:

1. Demographics 

We need to define our demographic data such as population age, net income, education. 
We also have to consider whether the population is growing or declining. Age is a demographic trait that can have several financial impacts on the doctor's office: it is usually easier to break into newer communities than into mature communities, where you would have to take patients away from practitioners who have been in the area for many years.
Another aspect to take into account is the economic level of the population. An area with a high rate of low-income residents will likely have more patients going to social security than to doctors in the private sector.

2. Accessibility

The location chosen for the medical practice must be accessible and convenient for patients. For example, a good rule of thumb is to choose a location within 20 minutes of the residential area you hope to serve.

When comparing locations, consider the availability and amount of parking. Free parking is always preferable. And aim for a location with a spacious entryway where elderly, injured or disabled patients can be dropped off and picked up without difficulty.

3. Competition

We need to determine how many providers are in the area, how big their practices are, and what their specialties are.
Finding a space that’s well known as a site for medical practitioners can work to our advantage, since people are accustomed to traveling there. “It’s a lot easier to tap into an existing behavior than to create a behavior all by itself.”

4. Visibility

A location in a remote part of town might seem cost-effective, but having low visibility will mean spending more money on marketing to get patients in the door. “Think about marketing costs as part of the rent equation.”
A medical office that’s located on a major road or thoroughfare, or in a busy shopping center, can give you maximum visibility.

5. Nearby Hospitals, Pharmacies and other businesses

Speaking of proximity to other businesses, medical practices benefit from operating close to places such as:
- Pharmacies and drug stores
- Hospitals
- Urgent care centers
- Fitness centers

Beyond the obvious convenience of being close to a hospital, the clinic will also benefit from the patient perception that it is located in a recognized healthcare area.
It is also worth considering where popular businesses, such as supermarkets and banks, are located: more popular businesses attract more potential clients. Likewise, upscale businesses attract upscale clients (think Starbucks).


## Design research and Methods

Based on the definition of our problem, the factors that will influence our decision are:

- Demographics: age and income.
- Accessibility: radius of 32 km from the city center.
- Competition: number of nearby clinics.
- Visibility: proximity to main avenues and to the city center.
- Nearby hospitals, pharmacies and other businesses: hospitals, pharmacies, restaurants and coffee shops will be taken into account.
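
The accessibility criterion above (a 32 km radius from the city center) could be checked with a great-circle distance. The following is a minimal sketch and not the notebook's own method: `haversine_km`, `within_radius` and the candidate list are hypothetical names introduced for illustration, the city-center coordinates are the ones the geocoder reports later in this notebook, and straight-line distance is assumed to be an acceptable stand-in for travel distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(candidates, center, radius_km=32.0):
    """Keep the (name, lat, lon) candidates no farther than radius_km from center."""
    clat, clon = center
    return [c for c in candidates if haversine_km(c[1], c[2], clat, clon) <= radius_km]


# City-center coordinates as reported by the geocoder later in this notebook.
QUERETARO_CENTER = (20.5954708, -100.3970593)

# Illustrative candidates: the first uses coordinates from this notebook's data,
# the second is a made-up distant point (assumption, for demonstration only).
candidates = [
    ("Las Campanas, Niños Héroes", 20.589145, -100.410023),
    ("Hypothetical distant point", 21.5, -100.4),
]
nearby = within_radius(candidates, QUERETARO_CENTER)
print([name for name, _, _ in nearby])
```

Road distance or travel time would be a closer match to the 28.4-minute figure cited earlier; the straight-line radius is only a cheap first filter.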


The following data sources will be needed to extract or generate the required information:
- Centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using https://github.com/marioalbertodev/colonias-queretaro/blob/master/colonias.json
- Demographics will be obtained from INEGI (National Institute of Statistics and Geography): https://www.inegi.org.mx/app/indicadores/?t=0200&ag=22#D02000070 and https://www.inegi.org.mx/programas/enigh/nc/2018/default.html
- The number of practices, hospitals, pharmacies, restaurants and coffee shops in every neighborhood, along with their type and location, will be obtained using the Foursquare API.



### Import libraries
Let's first import the required libraries.
Also run <b>%matplotlib inline</b> since we will be plotting in this section.

In [1]:
# Import libraries

import numpy as np  # library to handle data in a vectorized manner

import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files

!conda install -c conda-forge geopy --yes  # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle requests
from pandas import json_normalize  # transform JSON data into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes  # installs folium; comment this line out if it is already installed
import folium  # map rendering library

print('Libraries imported.')

Libraries imported.


In [188]:
%pip install geopandas


In [189]:
%pip install geocoder

Note: you may need to restart the kernel to use updated packages.


### Get and clean data

In [2]:
# Read the neighbourhood list from the URL. CP = postal code, Localidad = neighbourhood, Tipo = industrial or residential area, Municipio = borough
# https://www.correosdemexico.gob.mx/SSLServicios/ConsultaCP/Descarga.aspx
adresses = pd.read_json(r'https://raw.githubusercontent.com/marioalbertodev/colonias-queretaro/master/colonias.json')
adresses.head()

Unnamed: 0,cp,localidad,tipo,municipio
0,76118,10 de Abril,Fraccionamiento,Querétaro
1,76118,15 de Mayo,Fraccionamiento,Querétaro
2,76069,2 de Abril,Colonia,Querétaro
3,76118,5 de Febrero,Colonia,Querétaro
4,76069,8 de Diciembre,Colonia,Querétaro


In [27]:
# Export the data to a CSV file to review data quality (to_csv returns None when given a path, so nothing is assigned)
adresses.to_csv(r'/Users/javierrendon/Desktop/Queretaro_CP.csv', index=False, header=True)

In [193]:
# Read the CSV file after it was translated, updated and stored on GitHub
adresses1 = pd.read_csv('https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/Queretaro_CP.csv')
adresses1.head()

Unnamed: 0,CP,Neighbourhood,Zoning,Borough
0,76118,10 de Abril,Residential,Queretaro
1,76118,15 de Mayo,Residential,Queretaro
2,76069,2 de Abril,Residential,Queretaro
3,76118,5 de Febrero,Residential,Queretaro
4,76069,8 de Diciembre,Residential,Queretaro


In [194]:
#Review object shape
adresses1.shape

(875, 4)

In [94]:
#import geopy libraries
import time
from geopy.geocoders import Nominatim
from geopy.util import get_version
get_version()

'2.0.0'

In [196]:
#Rename CP to Postcode
adresses1.rename(columns={'CP':'Postcode'},inplace=True)
adresses1.head()

Unnamed: 0,Postcode,Neighbourhood,Zoning,Borough
0,76118,10 de Abril,Residential,Queretaro
1,76118,15 de Mayo,Residential,Queretaro
2,76069,2 de Abril,Residential,Queretaro
3,76118,5 de Febrero,Residential,Queretaro
4,76069,8 de Diciembre,Residential,Queretaro


In [197]:
#Convert to dataframe
df = pd.DataFrame(data = adresses1)

In [198]:
#Get postal codes
df['Postcode'].unique()

array([76118, 76069, 76226, 76026, 76040, 76160, 76140, 76148, 76134,
       76027, 76048, 76085, 76137, 76079, 76159, 76144, 76230, 76024,
       76125, 76089, 76120, 76237, 76087, 76020, 76113, 76225, 76147,
       76116, 76070, 76177, 76050, 76030, 76224, 76008, 76221, 76059,
       76017, 76006, 76093, 76168, 76190, 76170, 76117, 76090, 76158,
       76121, 76156, 76150, 76220, 76110, 76180, 76127, 76209, 76095,
       76114, 76146, 76178, 76115, 76228, 76187, 76046, 76047, 76229,
       76227, 76039, 76154, 76208, 76179, 76236, 76138, 76197, 76112,
       76086, 76122, 76036, 76100, 76165, 76164, 76057, 76130, 76139,
       76029, 76215, 76210, 76136, 76074, 76217, 76214, 76067, 76212,
       76219, 76025, 76239, 76166, 76010, 76186, 76049, 76218, 76060,
       76080, 76176, 76175, 76185, 76037, 76235, 76188, 76149, 76206,
       76038, 76211, 76009, 76213, 76063, 76233, 76223, 76099, 76169,
       76199, 76216, 76131, 76005, 76135, 76058, 76238, 76232, 76157,
       76000, 76234,

In [199]:
#sort the dataframe and drop its index
df = df.sort_values(by = ['Postcode'], ascending = True)
df.reset_index(inplace= True, drop = True)
df.head()

Unnamed: 0,Postcode,Neighbourhood,Zoning,Borough
0,76000,Santiago de Queretaro Centro,Residential,Queretaro
1,76000,Centro,Residential,Queretaro
2,76005,Rincón de San Andres,Residential,Queretaro
3,76005,Mariano Escobedo,Residential,Queretaro
4,76005,Rinconada Morelos,Residential,Queretaro


**Rows** with the same postal code are merged into one row, with the neighbourhoods separated by commas.

In [101]:
# Group by Postcode (dropping Zoning) and join the neighbourhood names with commas
df_group = df.groupby('Postcode').agg({'Borough': 'first',
                                       'Neighbourhood': ', '.join}).reset_index()

# Keep only the rows whose Borough contains "Queretaro"
df_group1 = df_group[df_group['Borough'].str.contains("Queretaro")]

# Reset the index
df_group1 = df_group1.reset_index(drop=True)
df_group1

Unnamed: 0,Postcode,Borough,Neighbourhood
0,76000,Queretaro,"Santiago de Queretaro Centro, Centro"
1,76005,Queretaro,"Rincón de San Andres, Mariano Escobedo, Rincon..."
2,76006,Queretaro,Ciudad de Queretaro
3,76007,Queretaro,Secretaria de Hacienda y Crédito Publico
4,76008,Queretaro,Centro Sct Queretaro
5,76009,Queretaro,Palacio de Gobierno Del Estado de Queretaro
6,76010,Queretaro,"Las Campanas, Niños Héroes"
7,76017,Queretaro,"Circuito Universitario, Centro Universitario (..."
8,76020,Queretaro,"Pathe, Diligencias, Calesa, El Cortijo, Bosque..."
9,76024,Queretaro,"El Cortijo II, Calesa 2a Sección, Peñitas, Con..."


In [102]:
#Review object shape
df_group1.shape

(132, 3)

### Install additional libraries

In [24]:
%pip install pgeocode

Collecting pgeocode
  Downloading pgeocode-0.2.1-py2.py3-none-any.whl (7.6 kB)
Installing collected packages: pgeocode
Successfully installed pgeocode-0.2.1
Note: you may need to restart the kernel to use updated packages.


### Get Coordinates from PostalCode

In [201]:
# Group by Postcode (dropping Zoning) and join the neighbourhood names with commas
df_group = df.groupby('Postcode').agg({'Borough': 'first',
                                       'Neighbourhood': ', '.join}).reset_index()

# Keep only the rows whose Borough contains "Queretaro"
df_group2 = df_group[df_group['Borough'].str.contains("Queretaro")]

# Reset the index
df_group2 = df_group2.reset_index(drop=True)
df_group2

Unnamed: 0,Postcode,Borough,Neighbourhood
0,76000,Queretaro,"Santiago de Queretaro Centro, Centro"
1,76005,Queretaro,"Rincón de San Andres, Mariano Escobedo, Rincon..."
2,76006,Queretaro,Ciudad de Queretaro
3,76007,Queretaro,Secretaria de Hacienda y Crédito Publico
4,76008,Queretaro,Centro Sct Queretaro
5,76009,Queretaro,Palacio de Gobierno Del Estado de Queretaro
6,76010,Queretaro,"Las Campanas, Niños Héroes"
7,76017,Queretaro,"Circuito Universitario, Centro Universitario (..."
8,76020,Queretaro,"Pathe, Diligencias, Calesa, El Cortijo, Bosque..."
9,76024,Queretaro,"El Cortijo II, Calesa 2a Sección, Peñitas, Con..."


In [202]:
#Review object shape
df_group2.shape

(132, 3)

In [203]:
geolocator = Nominatim(scheme='http', user_agent="biotecnologo.rendon@gmail.com")

for row_index, item in df_group2.iterrows():
    # Build a plain-text query from the postcode, e.g. "76000, Queretaro, Queretaro, Mexico"
    query = '{}, Queretaro, Queretaro, Mexico'.format(item['Postcode'])

    location = geolocator.geocode(query)
    time.sleep(5)  # pause between requests so the Nominatim service is not overloaded

    # Rows that cannot be geocoded keep missing (NaN) coordinates
    if location is not None:
        df_group2.loc[row_index, 'Latitude'] = location.latitude
        df_group2.loc[row_index, 'Longitude'] = location.longitude
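
The postcode lookup can also be written as a small function that accepts any geocoding callable, which makes it possible to try the logic offline before calling a live service. This is a sketch under assumptions: `geocode_postcodes`, `FakeLocation` and `fake_geocode` are hypothetical names, postcode 76999 is made up, and the stand-in geocoder only mimics the `.latitude` / `.longitude` attributes that geopy locations expose.

```python
import time


def geocode_postcodes(postcodes, geocode,
                      suffix=", Queretaro, Queretaro, Mexico", delay_seconds=1.0):
    """Return {postcode: (lat, lon) or None} using any geocode(query) callable.

    `geocode` should return an object with .latitude and .longitude,
    or None when nothing is found.
    """
    results = {}
    for i, postcode in enumerate(postcodes):
        if i and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode("{}{}".format(postcode, suffix))
        if location is None:
            results[postcode] = None
        else:
            results[postcode] = (location.latitude, location.longitude)
    return results


class FakeLocation:
    """Stand-in for a geocoder result (illustration only)."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    """Offline stand-in: knows one postcode taken from this notebook's output."""
    known = {"76000": FakeLocation(20.595471, -100.397059)}
    return known.get(query.split(",")[0].strip())


# 76999 is a made-up postcode used to show the "not found" case.
coords = geocode_postcodes([76000, 76999], fake_geocode, delay_seconds=0)
print(coords)
```

With geopy, the `geocode` argument could be `Nominatim(user_agent=...).geocode`, optionally wrapped in `geopy.extra.rate_limiter.RateLimiter` with `min_delay_seconds=1` so that the pause is handled by the library.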

### Check that all coordinates are complete

In [205]:
df_group2

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,"Santiago de Queretaro Centro, Centro",20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andres, Mariano Escobedo, Rincon...",20.595471,-100.397059
2,76006,Queretaro,Ciudad de Queretaro,20.612119,-100.385082
3,76007,Queretaro,Secretaria de Hacienda y Crédito Publico,20.612119,-100.385082
4,76008,Queretaro,Centro Sct Queretaro,20.612119,-100.385082
5,76009,Queretaro,Palacio de Gobierno Del Estado de Queretaro,20.612119,-100.385082
6,76010,Queretaro,"Las Campanas, Niños Héroes",20.589145,-100.410023
7,76017,Queretaro,"Circuito Universitario, Centro Universitario (...",20.612119,-100.385082
8,76020,Queretaro,"Pathe, Diligencias, Calesa, El Cortijo, Bosque...",20.595158,-100.380164
9,76024,Queretaro,"El Cortijo II, Calesa 2a Sección, Peñitas, Con...",20.60101,-100.369215
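
Scanning the table by eye is error-prone, and several postcodes above share identical coordinates (for example 76006 to 76009), which may mean the geocoder returned a coarser match for them. The sketch below shows one possible way to list both kinds of rows with pandas. `coordinate_report` is a hypothetical helper, and the sample frame reuses a few values from the table plus one made-up postcode (76999) with missing coordinates.

```python
import numpy as np
import pandas as pd


def coordinate_report(df, lat_col="Latitude", lon_col="Longitude"):
    """Return (missing, shared): rows lacking coordinates, and rows whose
    coordinate pair also appears on another row."""
    has_coords = df[[lat_col, lon_col]].notna().all(axis=1)
    missing = df[~has_coords]
    located = df[has_coords]
    shared = located[located.duplicated(subset=[lat_col, lon_col], keep=False)]
    return missing, shared


# Sample data: values copied from the table above, plus a hypothetical
# postcode 76999 with no coordinates.
sample = pd.DataFrame({
    "Postcode": [76000, 76006, 76007, 76010, 76999],
    "Latitude": [20.595471, 20.612119, 20.612119, 20.589145, np.nan],
    "Longitude": [-100.397059, -100.385082, -100.385082, -100.410023, np.nan],
})

missing, shared = coordinate_report(sample)
print(missing["Postcode"].tolist())  # postcodes that still need coordinates
print(shared["Postcode"].tolist())   # postcodes sharing a coordinate pair
```

Calling `coordinate_report(df_group2)` would give the row indexes that need manual correction, assuming the `Latitude` and `Longitude` columns exist as in the table.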


In [206]:
# Fill in missing values with coordinates looked up on www.googlemaps.com
# The postcode of the neighbourhood Cumbres Del Cimatario was incorrect; it is replaced with 76973

df_group2.at[14,'Latitude']=20.5978089
df_group2.at[14,'Longitude']=-100.389773
df_group2.at[17,'Latitude']=20.5791832
df_group2.at[17,'Longitude']=-100.401476
df_group2.at[18,'Latitude']=20.5819629
df_group2.at[18,'Longitude']=-100.401958 
df_group2.at[36,'Latitude']=20.5764601
df_group2.at[36,'Longitude']=-100.382234
df_group2.at[41,'Latitude']=20.6262225
df_group2.at[41,'Longitude']=-100.439922
df_group2.at[44,'Postcode']=76973
df_group2.at[44,'Latitude']=20.6262225
df_group2.at[44,'Longitude']=-100.439922
df_group2.at[86,'Latitude']=20.6144914
df_group2.at[86,'Longitude']=-100.390666
df_group2.at[129,'Latitude']=20.6780619
df_group2.at[129,'Longitude']=-100.386979

df_group2

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,"Santiago de Queretaro Centro, Centro",20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andres, Mariano Escobedo, Rincon...",20.595471,-100.397059
2,76006,Queretaro,Ciudad de Queretaro,20.612119,-100.385082
3,76007,Queretaro,Secretaria de Hacienda y Crédito Publico,20.612119,-100.385082
4,76008,Queretaro,Centro Sct Queretaro,20.612119,-100.385082
5,76009,Queretaro,Palacio de Gobierno Del Estado de Queretaro,20.612119,-100.385082
6,76010,Queretaro,"Las Campanas, Niños Héroes",20.589145,-100.410023
7,76017,Queretaro,"Circuito Universitario, Centro Universitario (...",20.612119,-100.385082
8,76020,Queretaro,"Pathe, Diligencias, Calesa, El Cortijo, Bosque...",20.595158,-100.380164
9,76024,Queretaro,"El Cortijo II, Calesa 2a Sección, Peñitas, Con...",20.60101,-100.369215


In [208]:
#save to csv in local file
df_group2.to_csv(r'/Users/javierrendon/Desktop/Queretaro_Coordinates.csv', index = None, header=True)
#UPDATE CSV FILE AND STORE IN GITHUB

In [209]:
df_group2.dtypes

Postcode           int64
Borough           object
Neighbourhood     object
Latitude         float64
Longitude        float64
dtype: object

### Get the latitude and longitude of Queretaro City

In [211]:
address = 'Queretaro, QRO'

geolocator = Nominatim(user_agent="biotecnologo.rendon@gmail.com")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Queretaro are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Queretaro are 20.5954708, -100.3970593.


### Create a map of Queretaro with its neighbourhoods

In [212]:
Queretaro_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df_group2['Latitude'], df_group2['Longitude'], df_group2['Borough'], df_group2['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Queretaro_map)  
    
Queretaro_map

In [None]:
Queretaro_map.save('Queretaro.html')

In [None]:
# Postcode boundaries are not shown on the map, so we install kml2geojson to transform publicly available KML data to GeoJSON
%pip install kml2geojson

In [None]:
#Convert kml postcode boundaries to geojson
import kml2geojson
kml2geojson.main.convert(r'/Users/javierrendon/Desktop/cp_qro/CP_22_Qro_v6.kml', r'/Users/javierrendon/Desktop/', separate_folders=False, style_type=None, style_filename='style.json')

In [None]:
# Download and store the raw GeoJSON file of Queretaro postcode boundaries
!wget --quiet https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/CP_Qro.geojson -O CP_Qro.geojson
qro_geo = r'CP_Qro.geojson'
print("geojson ready!")
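
A download can silently save an HTML page instead of GeoJSON (for example, when the URL points at a web page and not at the raw file), so it may help to check the file before loading it into a map. The following is a minimal sketch: `looks_like_feature_collection` is a hypothetical helper, and it only checks the top-level shape, not the geometry.

```python
import json
import os
import tempfile


def looks_like_feature_collection(path):
    """Return True if `path` holds JSON shaped like a GeoJSON FeatureCollection."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):  # missing file, or content that is not JSON
        return False
    return (
        isinstance(data, dict)
        and data.get("type") == "FeatureCollection"
        and isinstance(data.get("features"), list)
    )


# Demonstration with two temporary files: one valid, one containing HTML.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.geojson")
    bad = os.path.join(tmp, "bad.geojson")
    with open(good, "w", encoding="utf-8") as f:
        json.dump({"type": "FeatureCollection", "features": []}, f)
    with open(bad, "w", encoding="utf-8") as f:
        f.write("<!DOCTYPE html><html></html>")
    good_ok = looks_like_feature_collection(good)
    bad_ok = looks_like_feature_collection(bad)

print(good_ok, bad_ok)
```

In this notebook the check would be `looks_like_feature_collection('CP_Qro.geojson')` right after the download.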

In [None]:
# load GeoJSON
with open(r'CP_Qro.geojson') as jsonFile:
    data = json.load(jsonFile)

# Keep only the postcodes in our dataset (compared as strings, in case d_cp is stored as text)
postcodes = set(df_group1['Postcode'].astype(str))
geozips = [feature for feature in data['features']
           if str(feature['properties']['d_cp']) in postcodes]

# create a new GeoJSON object
new_json = {'type': 'FeatureCollection', 'features': geozips}

# save the GeoJSON object as an updated file
with open("Updated_qro_cp.json", "w") as outFile:
    json.dump(new_json, outFile, sort_keys=True, indent=4, separators=(',', ': '))
print("geojson ready!")

In [214]:
    
# Create a map of Queretaro with postcodes boundaries
map_queretaro = folium.Map(location=[latitude, longitude], zoom_start=12)

# load the postcode boundaries file
folium.GeoJson(qro_geo, name='geojson').add_to(map_queretaro)

for lat, lng, borough, neighbourhood in zip(df_group2['Latitude'], df_group2['Longitude'], df_group2['Borough'], df_group2['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_queretaro) 

map_queretaro