# How to Choose the Best Location for your Medical Practice

<h2>Table of contents</h2>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#Introduction">Introduction</a></li>
        <li><a href="#Objective">Objective</a></li>  
        <li><a href="#Background and significance">Background and significance</a></li> 
         <ol>
                <li><a href="#Healthcare System in Mexico">Healthcare System in Mexico</a></li>
                <li><a href="#How patients choose their practitioner">How patients choose their practitioner</a></li>
            </ol>
        <li><a href="#Design research and Methods">Design research and Methods</a></li> 
            <ol>
                <li><a href="#pre_processing">Pre-processing</a></li>
                <li><a href="#modeling">Modeling</a></li>
                <li><a href="#insights">Insights</a></li>
            </ol>
    </ul>
</div>
<br>
<hr>

## Introduction

When it comes to purchasing a home or investment property, it’s all about **location, location, location**. The same rule applies when you’re looking to buy or rent a space for your medical practice.
According to a July 2014 report by The Associated Press-NORC Center for Public Affairs Research, 50 percent of patients consider the location of a medical practice when choosing a doctor. Another report found that 70 percent of healthcare consumers deem location either critical or very important when selecting a provider or healthcare system.

## Objective 

The aim of this project is to **find an optimal location for a medical practice**. For most healthcare professionals (HCPs), finding a suitable place to open a practice, or even to relocate an existing one, is a headache. The scope of this project is to help HCPs identify which neighborhoods in **Queretaro city, México** are best suited for their business. The location of the facility has a significant impact on the practice's outcome, and its surroundings (adjacent and nearby shops and offices) play a very important role in building a positive first impression of the clinic.
We will use data science techniques to detect the most promising neighborhoods based on the criteria selected in the Background section. The advantages of each area will then be clearly described so that stakeholders can choose the best possible final location.

## Background and significance

### Healthcare System in Mexico 

Mexico has achieved universal health coverage and its public healthcare is acceptable for most Mexican residents. Despite this, the private healthcare sector has grown considerably and is driven by increasing disposable income, the growth of medical tourism, and ease of access to higher quality private healthcare services.
Mexico’s public healthcare operates through the Instituto Mexicano de Seguro Social (IMSS)  and Seguro Popular systems. These cover  patients for most medical services and prescription drugs. Those employed in Mexico are automatically enrolled in the IMSS system and their contribution to the scheme is deducted from their salary. Those who are not formally employed may voluntarily enrol in the IMSS system, in which case they will have to pay an annual contribution fee. People who cannot afford the IMSS system must enrol with the Seguro Popular system. Fees for the Seguro Popular system are charged on a sliding scale depending on a resident’s income. While public healthcare in Mexico is relatively good, the quality of services varies between hospitals. 
Most Mexicans with incomes above the middle range opt for private healthcare, which they finance through private health insurance. Although private hospitals are more expensive, they are better equipped, provide greater access to specialised procedures and generally provide higher-quality care.

### How patients choose their practitioner 

*How Long and How Far Do Adults Travel and Will Adults Travel for Primary Care?* According to Washington State Health Services, adults are willing to spend 28.4 minutes and travel a distance of 32 kilometers (1). We will take these figures into account when setting our parameters.
Since there are many HCPs in Queretaro, we will look for locations that satisfy the following five criteria:

1. Demographics 

We need to define the demographic variables of interest, such as population age, net income and education level.
We also have to consider whether the population is growing or declining. Age is a demographic trait that can have several financial impacts on a doctor's office, and it is usually easier to break into newer communities than into mature ones, where you would have to take patients away from practitioners who have been in the area for many years.
Another aspect to take into account is the economic level of the population. An area with a high proportion of low-income residents will likely have more patients using the public social security system than visiting private-sector doctors.

2. Accessibility

The location chosen for the medical practice must be accessible and convenient for patients. A good rule of thumb is to choose a location within 20 minutes of the residential area you hope to serve.

When comparing locations, consider the availability and amount of parking; free parking is always preferable. Also aim for a location with a spacious entryway where elderly, injured or disabled patients can be dropped off and picked up without difficulty.

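The 20-minute rule can be approximated from coordinates alone by computing the great-circle (haversine) distance between a candidate site and a residential area and converting it to a travel time at an assumed average speed. This is a minimal sketch: the 30 km/h urban speed is an assumption, and straight-line distance underestimates real road distance, so a routing service would give better estimates. The sample coordinates are the city centre and one geocoded postcode from this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def estimated_minutes(distance_km, avg_speed_kmh=30.0):
    """Travel time in minutes at an ASSUMED average speed (30 km/h is a guess for city traffic)."""
    return distance_km / avg_speed_kmh * 60.0


def within_travel_time(site, area, max_minutes=20.0, avg_speed_kmh=30.0):
    """True if `site` is estimated to be within `max_minutes` of `area`; both are (lat, lon) tuples."""
    distance = haversine_km(site[0], site[1], area[0], area[1])
    return estimated_minutes(distance, avg_speed_kmh) <= max_minutes


# Usage: Queretaro city centre and one geocoded postcode area
centre = (20.5954708, -100.3970593)
candidate = (20.612119, -100.3850819)
distance = haversine_km(*centre, *candidate)
print(round(distance, 2), "km,", round(estimated_minutes(distance), 1), "min")
print(within_travel_time(candidate, centre))
```

For reference, the study figures cited above (32 km in 28.4 minutes) correspond to an average of about 68 km/h, which could be passed as `avg_speed_kmh` instead.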
3. Competition

We need to determine how many providers are in the area, how big their practices are, and what their specialties are.
Finding a space that's well known as a site for medical practitioners can work to our advantage, since people are accustomed to traveling there. "It's a lot easier to tap into an existing behavior than to create a behavior all by itself."

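Once venues have been collected (the project plans to use the Foursquare API for this), counting competitors per neighbourhood is a simple filter-group-count. In this sketch the column names `Neighbourhood` and `Venue Category`, the competitor category labels and the sample rows are all hypothetical; adapt them to whatever the venue query actually returns.

```python
import pandas as pd

# Hypothetical venue table; in practice it would come from the venue search step
venues = pd.DataFrame({
    'Neighbourhood': ['Centro', 'Centro', 'Centro',
                      'Mariano Escobedo', 'Mariano Escobedo', 'Rinconada Morelos'],
    'Venue Category': ["Doctor's Office", 'Pharmacy', 'Medical Center',
                       'Coffee Shop', "Doctor's Office", 'Pharmacy'],
})

# Categories treated as direct competitors (an assumption; tune to the practice's specialty)
COMPETITOR_CATEGORIES = {"Doctor's Office", 'Medical Center'}


def competitors_per_neighbourhood(venues, categories=COMPETITOR_CATEGORIES):
    """Number of competing practices in each neighbourhood, with 0 where none were found."""
    is_competitor = venues['Venue Category'].isin(categories)
    counts = venues[is_competitor].groupby('Neighbourhood').size()
    # reindex so neighbourhoods without competitors still appear, with a count of 0
    all_hoods = venues['Neighbourhood'].unique()
    return counts.reindex(all_hoods, fill_value=0).sort_values()


print(competitors_per_neighbourhood(venues))
```

Whether a high count is good or bad is a judgment call: as noted above, an area already known for medical practices can attract patients, so this count is better treated as one input to the decision than as a penalty on its own.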
4. Visibility

A location in a remote part of town might seem cost-effective, but having low visibility will mean spending more money on marketing to get patients in the door. "Think about marketing costs as part of the rent equation."
A medical office that’s located on a major road or thoroughfare, or in a busy shopping center, can give you maximum visibility. 

5. Nearby Hospitals, Pharmacies and Other Businesses

Medical practices benefit from operating close to places such as:

- Pharmacies & drug stores
- Hospitals
- Urgent care centers
- Fitness centers

Beyond the obvious convenience of being close to a hospital, the clinic will also benefit from patients perceiving it as part of a recognized healthcare area.
We should also consider where popular businesses, such as supermarkets and banks, are located: the more popular the surrounding businesses, the more potential clients they attract. Likewise, upscale businesses attract upscale clients (think Starbucks).

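One common way to describe "what is nearby" is to one-hot encode the venue categories and average them per neighbourhood, giving the share of each category among the venues found there. The sketch below also computes the fraction of venues that fall into the supporting categories listed above. The category labels (for example `Gym` standing in for fitness centers) and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical venues returned for two neighbourhoods
venues = pd.DataFrame({
    'Neighbourhood': ['Centro', 'Centro', 'Centro', 'Centro',
                      'Mariano Escobedo', 'Mariano Escobedo', 'Mariano Escobedo'],
    'Venue Category': ['Pharmacy', 'Hospital', 'Coffee Shop', 'Bank',
                       'Gym', 'Pharmacy', 'Taco Place'],
})

# Categories the text lists as beneficial neighbours (labels are assumptions)
SUPPORTING = ['Pharmacy', 'Hospital', 'Urgent Care Center', 'Gym',
              'Supermarket', 'Bank', 'Coffee Shop']


def category_frequencies(venues):
    """Share of each venue category among the venues found in each neighbourhood."""
    onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
    onehot['Neighbourhood'] = venues['Neighbourhood'].values
    return onehot.groupby('Neighbourhood').mean()


def supporting_share(venues, categories=SUPPORTING):
    """Fraction of nearby venues that belong to the supporting categories."""
    is_supporting = venues['Venue Category'].isin(categories)
    return is_supporting.groupby(venues['Neighbourhood']).mean()


print(category_frequencies(venues))
print(supporting_share(venues))
```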

## Design research and Methods

Based on the definition of our problem, the factors that will influence our decision are:

- Demographics: age and income.
- Accessibility: a radius of 32 km from the city center.
- Competition: number of nearby clinics.
- Visibility: proximity to main avenues and to the city center.
- Nearby hospitals, pharmacies and other businesses: hospitals, pharmacies, restaurants and coffee shops will be taken into account.

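A simple way to combine these factors is to rescale each one to the range [0, 1] and take a weighted sum. This is only a sketch: the per-neighbourhood feature table, its column names and the weights are all hypothetical placeholders for stakeholders to set, and distance and competition are inverted here on the assumption that lower values are better (the Competition discussion above shows that this is not always the case).

```python
import pandas as pd


def min_max(s):
    """Rescale a Series to [0, 1]; a constant column maps to all zeros."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span else s * 0.0


def score_neighbourhoods(features, weights, invert=()):
    """Weighted sum of min-max scaled features; columns in `invert` treat lower values as better."""
    total = pd.Series(0.0, index=features.index)
    for col, weight in weights.items():
        scaled = min_max(features[col].astype(float))
        if col in invert:
            scaled = 1.0 - scaled
        total += weight * scaled
    return total.sort_values(ascending=False)


# Hypothetical per-neighbourhood features (not results of this project)
features = pd.DataFrame({
    'median_income': [18000, 12000, 25000],
    'km_to_centre': [0.5, 4.0, 9.0],
    'competitors': [12, 3, 5],
    'supporting_venues': [20, 6, 10],
}, index=['Centro', 'Mariano Escobedo', 'Rinconada Morelos'])

# Placeholder weights that sum to 1
weights = {'median_income': 0.3, 'km_to_centre': 0.2,
           'competitors': 0.2, 'supporting_venues': 0.3}

print(score_neighbourhoods(features, weights, invert={'km_to_centre', 'competitors'}))
```

Min-max scaling keeps every factor on the same scale so that the weights, rather than the raw units (pesos, kilometres, counts), decide how much each criterion matters.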

The following data sources will be needed to extract or generate the required information:

- Centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using https://github.com/marioalbertodev/colonias-queretaro/blob/master/colonias.json
- Demographics will be obtained from INEGI (National Institute of Statistics and Geography): https://www.inegi.org.mx/app/indicadores/?t=0200&ag=22#D02000070 and https://www.inegi.org.mx/programas/enigh/nc/2018/default.html
- The number of practices, hospitals, pharmacies, restaurants and coffee shops, and their type and location in every neighborhood, will be obtained using the Foursquare API.

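The Foursquare step could start by building one request per neighbourhood centre. The sketch below assumes the legacy v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v` (version date), `ll`, `radius`, `limit` and `query` parameters; Foursquare has since introduced a newer Places API, so check the current documentation before relying on these names. The credentials and version date are placeholders.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint (assumption; verify against current Foursquare docs)
FOURSQUARE_EXPLORE = 'https://api.foursquare.com/v2/venues/explore'


def explore_url(lat, lng, client_id, client_secret, radius_m=500, limit=100,
                query=None, version='20180605'):
    """Build a venues/explore request URL around one neighbourhood centre."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                   # API version date, YYYYMMDD (placeholder)
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'radius': radius_m,             # search radius in metres
        'limit': limit,                 # maximum number of venues returned
    }
    if query:
        params['query'] = query         # e.g. 'pharmacy' or 'hospital'
    return FOURSQUARE_EXPLORE + '?' + urlencode(params)


url = explore_url(20.5954708, -100.3970593, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                  query='hospital')
print(url)
# The response could then be fetched with requests.get(url).json()
```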


### Import libraries
Let's first import the required libraries.
Also run <b>%matplotlib inline</b>, since we will be plotting in this section.

In [1]:
#import libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # install geopy (comment this line out if it is already installed)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # install folium (comment this line out if it is already installed)
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


### Get and Clean Data

In [2]:
#Read the neighbourhood list from the URL: cp = postal code, localidad = neighbourhood, tipo = zoning type (e.g. Colonia, Fraccionamiento), municipio = borough
adresses = pd.read_json(r'https://raw.githubusercontent.com/marioalbertodev/colonias-queretaro/master/colonias.json')
adresses.head()

|   | cp | localidad | tipo | municipio |
|---|---|---|---|---|
| 0 | 76118 | 10 de Abril | Fraccionamiento | Querétaro |
| 1 | 76118 | 15 de Mayo | Fraccionamiento | Querétaro |
| 2 | 76069 | 2 de Abril | Colonia | Querétaro |
| 3 | 76118 | 5 de Febrero | Colonia | Querétaro |
| 4 | 76069 | 8 de Diciembre | Colonia | Querétaro |


In [27]:
#Export file to csv to review data quality (to_csv writes the file and returns None)
adresses.to_csv(r'/Users/javierrendon/Desktop/Queretaro_CP.csv', index=False, header=True)

In [3]:
#Read the translated and updated CSV file stored on GitHub
adresses1 = pd.read_csv('https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/Queretaro_CP.csv')
adresses1.head()

|   | CP | Neighbourhood | Zoning | Borough |
|---|---|---|---|---|
| 0 | 76118 | 10 de Abril | Residential | Queretaro |
| 1 | 76118 | 15 de Mayo | Residential | Queretaro |
| 2 | 76069 | 2 de Abril | Residential | Queretaro |
| 3 | 76118 | 5 de Febrero | Residential | Queretaro |
| 4 | 76069 | 8 de Diciembre | Residential | Queretaro |


In [4]:
#Review object shape
adresses1.shape

(875, 4)

In [5]:
#import geopy libraries
import time
from geopy.geocoders import Nominatim
from geopy.util import get_version
get_version()

'2.0.0'

In [6]:
#Rename CP to Postcode
adresses1.rename(columns={'CP':'Postcode'},inplace=True)
adresses1.head()

|   | Postcode | Neighbourhood | Zoning | Borough |
|---|---|---|---|---|
| 0 | 76118 | 10 de Abril | Residential | Queretaro |
| 1 | 76118 | 15 de Mayo | Residential | Queretaro |
| 2 | 76069 | 2 de Abril | Residential | Queretaro |
| 3 | 76118 | 5 de Febrero | Residential | Queretaro |
| 4 | 76069 | 8 de Diciembre | Residential | Queretaro |


In [7]:
#Work on a copy (adresses1 is already a DataFrame)
df = adresses1.copy()

In [8]:
#Get postal codes
df['Postcode'].unique()

array([76118, 76069, 76226, 76026, 76040, 76160, 76140, 76148, 76134,
       76027, 76048, 76085, 76137, 76079, 76159, 76144, 76230, 76024,
       76125, 76089, 76120, 76237, 76087, 76020, 76113, 76225, 76147,
       76116, 76070, 76177, 76050, 76030, 76224, 76008, 76221, 76059,
       76017, 76006, 76093, 76168, 76190, 76170, 76117, 76090, 76158,
       76121, 76156, 76150, 76220, 76110, 76180, 76127, 76209, 76095,
       76114, 76146, 76178, 76115, 76228, 76187, 76046, 76047, 76229,
       76227, 76039, 76154, 76208, 76179, 76236, 76138, 76197, 76112,
       76086, 76122, 76036, 76100, 76165, 76164, 76057, 76130, 76139,
       76029, 76215, 76210, 76136, 76074, 76217, 76214, 76067, 76212,
       76219, 76025, 76239, 76166, 76010, 76186, 76049, 76218, 76060,
       76080, 76176, 76175, 76185, 76037, 76235, 76188, 76149, 76206,
       76038, 76211, 76009, 76213, 76063, 76233, 76223, 76099, 76169,
       76199, 76216, 76131, 76005, 76135, 76058, 76238, 76232, 76157,
       76000, 76234,

In [9]:
#sort the dataframe and drop its index
df = df.sort_values(by = ['Postcode'], ascending = True)
df.reset_index(inplace= True, drop = True)
df.head()

|   | Postcode | Neighbourhood | Zoning | Borough |
|---|---|---|---|---|
| 0 | 76000 | Santiago de Queretaro Centro | Residential | Queretaro |
| 1 | 76000 | Centro | Residential | Queretaro |
| 2 | 76005 | Rincón de San Andres | Residential | Queretaro |
| 3 | 76005 | Mariano Escobedo | Residential | Queretaro |
| 4 | 76005 | Rinconada Morelos | Residential | Queretaro |


In [24]:
%pip install pgeocode

Collecting pgeocode
  Downloading pgeocode-0.2.1-py2.py3-none-any.whl (7.6 kB)
Installing collected packages: pgeocode
Successfully installed pgeocode-0.2.1
Note: you may need to restart the kernel to use updated packages.


### Get Coordinates from Postal Code

In [19]:
#Get Coordinates
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# one geocoder instance, throttled to Nominatim's limit of one request per second
geolocator = Nominatim(user_agent="biotecnologo.rendon@gmail.com")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

for row_index, item in df.iterrows():
    print("row:", row_index)
    # free-text query: postal code followed by city, state and country
    query = '{}, Queretaro, Queretaro, Mexico'.format(item['Postcode'])
    location = geocode(query)
    if location is not None:
        df.loc[row_index, 'Latitude'] = location.latitude
        df.loc[row_index, 'Longitude'] = location.longitude
df.head()


|   | Postcode | Neighbourhood | Zoning | Borough | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 76000 | Santiago de Queretaro Centro | Residential | Queretaro | 20.595471 | -100.397059 |
| 1 | 76000 | Centro | Residential | Queretaro | 20.595471 | -100.397059 |
| 2 | 76005 | Rincón de San Andres | Residential | Queretaro | 20.595471 | -100.397059 |
| 3 | 76005 | Mariano Escobedo | Residential | Queretaro | 20.595471 | -100.397059 |
| 4 | 76005 | Rinconada Morelos | Residential | Queretaro | 20.595471 | -100.397059 |

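Because the geocoding query depends only on the postcode, every neighbourhood that shares a postcode necessarily receives the same coordinates, so the 875 per-row requests could be reduced to one request per unique postcode and the results merged back. In this sketch the geocoder is passed in as a function; `fake_geocode` is a hypothetical offline stand-in so the example runs without network access. In the notebook you would pass a small wrapper around geopy's `Nominatim.geocode` that returns `(location.latitude, location.longitude)` or `None`.

```python
import pandas as pd


def geocode_by_postcode(df, geocode_fn):
    """Geocode each unique Postcode once and merge Latitude/Longitude back onto every row.

    geocode_fn(postcode) must return a (lat, lon) tuple, or None when nothing is found.
    """
    unique = pd.DataFrame({'Postcode': df['Postcode'].drop_duplicates()})
    coords = [geocode_fn(p) for p in unique['Postcode']]  # one call per postcode
    unique['Latitude'] = [c[0] if c else None for c in coords]
    unique['Longitude'] = [c[1] if c else None for c in coords]
    return df.merge(unique, on='Postcode', how='left')


# Hypothetical offline stand-in for the real geocoder
lookup = {76000: (20.595471, -100.397059), 76006: (20.612119, -100.385082)}
calls = []


def fake_geocode(postcode):
    calls.append(postcode)
    return lookup.get(postcode)


sample = pd.DataFrame({
    'Postcode': [76000, 76000, 76005, 76006],
    'Neighbourhood': ['Santiago de Queretaro Centro', 'Centro',
                      'Mariano Escobedo', 'Ciudad de Queretaro'],
})
result = geocode_by_postcode(sample, fake_geocode)
print(result)
print('geocoder calls:', len(calls))
```

Postcodes the geocoder cannot resolve end up as NaN, which is worth checking before plotting.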

In [20]:
#review whether all coordinates are complete
df

|   | Postcode | Neighbourhood | Zoning | Borough | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 76000 | Santiago de Queretaro Centro | Residential | Queretaro | 20.595471 | -100.397059 |
| 1 | 76000 | Centro | Residential | Queretaro | 20.595471 | -100.397059 |
| 2 | 76005 | Rincón de San Andres | Residential | Queretaro | 20.595471 | -100.397059 |
| 3 | 76005 | Mariano Escobedo | Residential | Queretaro | 20.595471 | -100.397059 |
| 4 | 76005 | Rinconada Morelos | Residential | Queretaro | 20.595471 | -100.397059 |
| 5 | 76005 | Vicente Guerrero | Residential | Queretaro | 20.595471 | -100.397059 |
| 6 | 76006 | Ciudad de Queretaro | Airport | Queretaro | 20.612119 | -100.385082 |
| 7 | 76007 | Secretaria de Hacienda y Crédito Publico | Gran usuario | Queretaro | 20.612119 | -100.385082 |
| 8 | 76008 | Centro Sct Queretaro | Residential | Queretaro | 20.612119 | -100.385082 |
| 9 | 76009 | Palacio de Gobierno Del Estado de Queretaro | Gran usuario | Queretaro | 20.612119 | -100.385082 |

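Before mapping, it helps to count how many rows lack coordinates (folium cannot place a marker at a NaN location) and how many collapsed onto the same point. In the output above, six of the first ten neighbourhoods share the city-centre coordinates `20.595471, -100.397059`, which suggests the postcode lookup often returned the city itself rather than the postcode area. A minimal check, assuming the `Latitude`/`Longitude` columns created above; the tolerance is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def coordinate_report(df, centre, tol=1e-6):
    """Summarise missing and suspicious coordinates in a neighbourhood table."""
    lat, lon = df['Latitude'], df['Longitude']
    missing = int((lat.isna() | lon.isna()).sum())
    # rows geocoded to (almost exactly) the city centre; rtol=0 so only `tol` applies
    at_centre = int((np.isclose(lat, centre[0], rtol=0.0, atol=tol)
                     & np.isclose(lon, centre[1], rtol=0.0, atol=tol)).sum())
    distinct_points = int(df.dropna(subset=['Latitude', 'Longitude'])
                            .drop_duplicates(subset=['Latitude', 'Longitude'])
                            .shape[0])
    return {'rows': len(df), 'missing': missing,
            'at_city_centre': at_centre, 'distinct_points': distinct_points}


# Coordinates of the first ten rows shown above
sample = pd.DataFrame({
    'Latitude': [20.595471] * 6 + [20.612119] * 4,
    'Longitude': [-100.397059] * 6 + [-100.385082] * 4,
})
print(coordinate_report(sample, centre=(20.595471, -100.397059)))
```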

In [22]:
#save to csv in local file
df.to_csv(r'/Users/javierrendon/Desktop/Queretaro_Coordinates.csv', index=False, header=True)

In [23]:
#Reorder columns
geo_data=df[['Postcode','Borough','Neighbourhood','Zoning','Latitude','Longitude']]
geo_data.head()

|   | Postcode | Borough | Neighbourhood | Zoning | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 76000 | Queretaro | Santiago de Queretaro Centro | Residential | 20.595471 | -100.397059 |
| 1 | 76000 | Queretaro | Centro | Residential | 20.595471 | -100.397059 |
| 2 | 76005 | Queretaro | Rincón de San Andres | Residential | 20.595471 | -100.397059 |
| 3 | 76005 | Queretaro | Mariano Escobedo | Residential | 20.595471 | -100.397059 |
| 4 | 76005 | Queretaro | Rinconada Morelos | Residential | 20.595471 | -100.397059 |


In [43]:
# the file is not valid UTF-8 (decoding fails on byte 0x96); latin-1 decodes any byte,
# so check accented names and switch to e.g. 'cp1252' if they look wrong
coordinates = pd.read_csv('https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/Queretaro_Coordinates.csv', encoding='latin-1')
coordinates.head()

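Reading the coordinates CSV as UTF-8 fails with a `UnicodeDecodeError` on byte `0x96`, so the file was saved in some other encoding. When the exact encoding is unknown, one option is to try a short list of candidates in order. The candidate list here is an assumption; `latin-1` is placed last because it decodes every byte sequence and therefore never fails, even when it produces the wrong characters.

```python
import io

import pandas as pd


def read_csv_bytes(raw, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Decode raw CSV bytes with the first encoding that works; return (DataFrame, encoding)."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return pd.read_csv(io.StringIO(text)), enc
    raise ValueError('none of the candidate encodings could decode the data')


# Usage with bytes obtained elsewhere, e.g. raw = requests.get(url).content
raw = 'Postcode,Neighbourhood\n76005,Rincón de San Andres\n'.encode('cp1252')
frame, used = read_csv_bytes(raw)
print(used, frame.loc[0, 'Neighbourhood'])
```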
In [41]:
#Convert to dataframe
df1 = pd.DataFrame(data = coordinates)

In [42]:
df1

|   | Postcode | Neighbourhood | Zoning | Borough | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 76000 | Santiago de Queretaro Centro | Residential | Queretaro | 20.595471 | -100.3970593 |
| 1 | 76000 | Centro | Residential | Queretaro | 20.595471 | -100.3970593 |
| 2 | 76005 | Rincon de San Andres | Residential | Queretaro | 20.595471 | -100.3970593 |
| 3 | 76005 | Mariano Escobedo | Residential | Queretaro | 20.595471 | -100.3970593 |
| 4 | 76005 | Rinconada Morelos | Residential | Queretaro | 20.595471 | -100.3970593 |
| 5 | 76005 | Vicente Guerrero | Residential | Queretaro | 20.595471 | -100.3970593 |
| 6 | 76006 | Ciudad de Queretaro | Airport | Queretaro | 20.612119 | -100.3850819 |
| 7 | 76007 | Secretaria de Hacienda y Credito Publico | Institutional | Queretaro | 20.612119 | -100.3850819 |
| 8 | 76008 | Centro Sct Queretaro | Residential | Queretaro | 20.612119 | -100.3850819 |
| 9 | 76009 | Palacio de Gobierno Del Estado de Queretaro | Institutional | Queretaro | 20.612119 | -100.3850819 |


### Get latitudes and longitudes of Queretaro City

In [25]:
address = 'Queretaro, QRO'

geolocator = Nominatim(user_agent="biotecnologo.rendon@gmail.com")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Queretaro are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Queretaro are 20.5954708, -100.3970593.


### Create a map of Queretaro with its neighbourhoods

In [29]:
# drop rows without coordinates; folium cannot place markers at NaN locations
map_data = df.dropna(subset=['Latitude', 'Longitude'])

Queretaro_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(map_data['Latitude'], map_data['Longitude'], map_data['Borough'], map_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(Queretaro_map)

Queretaro_map

In [33]:
#Read household income data (appears to be the INEGI ENIGH income table listed in the data sources)
income = pd.read_csv(r'/Users/javierrendon/Desktop/income.csv')
income.head()

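|   | folioviv | foliohog | numren | clave | mes_1 | mes_2 | mes_3 | mes_4 | mes_5 | mes_6 | ing_1 | ing_2 | ing_3 | ing_4 | ing_5 | ing_6 | ing_tri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100013601 | 1 | 1 | P032 | 9 | 8 | 7 | 6 | 5 | 4 | 3100 | 3100 | 3100 | 3100 | 3100 | 3100 | 9147.54 |
| 1 | 100013601 | 1 | 1 | P001 | 9 | 8 | 7 | 6 | 5 | 4 | 10000 | 10000 | 10000 | 10000 | 10000 | 10000 | 29508.19 |
| 2 | 100013601 | 1 | 3 | P001 | 9 | 8 | 7 | 6 | 5 | 4 | 8000 | 8000 | 8000 | 8000 | 8000 | 8000 | 23606.55 |
| 3 | 100013601 | 1 | 2 | P044 | 9 | 8 | 7 | 6 | 5 | 4 | 0 | 1100 | 0 | 1100 | 0 | 1100 | 1622.95 |
| 4 | 100013602 | 1 | 2 | P063 | 9 | 8 | 7 | 6 | 5 | 4 | 0 | 800 | 0 | 0 | 0 | 0 | 393.44 |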

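The income table has one row per person (`numren`) and income source (`clave`) within each household (`folioviv` + `foliohog`). Assuming `ing_tri` is the quarterly income for that source (its name and its values, roughly three times the monthly `ing_1` to `ing_6` amounts, suggest so; check the ENIGH data dictionary), household income can be estimated by summing `ing_tri` per household. This sketch uses the five rows shown above; dividing by 3 for a monthly figure is a rough simplification.

```python
import pandas as pd

# The five rows shown above (subset of columns)
income = pd.DataFrame({
    'folioviv': [100013601, 100013601, 100013601, 100013601, 100013602],
    'foliohog': [1, 1, 1, 1, 1],
    'numren': [1, 1, 3, 2, 2],
    'clave': ['P032', 'P001', 'P001', 'P044', 'P063'],
    'ing_tri': [9147.54, 29508.19, 23606.55, 1622.95, 393.44],
})


def household_income(income):
    """Sum quarterly income (ing_tri) over all persons and sources in each household."""
    hh = (income.groupby(['folioviv', 'foliohog'], as_index=False)['ing_tri']
                .sum()
                .rename(columns={'ing_tri': 'quarterly_income'}))
    hh['monthly_income'] = hh['quarterly_income'] / 3  # rough monthly average
    return hh


print(household_income(income))
```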