# How to Choose the Best Location for your Medical Practice

<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#Introduction">Introduction</a></li>
        <li><a href="#Objetive">Objetive</a></li>  
        <li><a href="#Background and significance">Background and significance</a></li> 
         <ol>
                <li><a href="#Healthcare System in Mexico">Healthcare System in Mexico</a></li>
                <li><a href="#How patients choose their practitioner">How patients choose their practitioner</a></li>
        </ol>
        <li><a href="#Design research and Methods">Design research and Methods</a></li> 
            <ol>
                <li><a href="#pre_processing">Pre-processing</a></li>
                <li><a href="#modeling">Modeling</a></li>
                <li><a href="#insights">Insights</a></li>
            </ol>
    </ul>
</div>
<br>
<hr>

## Introduction

When it comes to purchasing a home or investment property, it’s all about **location, location, location**. The same rule applies when you’re looking to buy or rent a space for your medical practice.
According to a July 2014 report by The Associated Press-NORC Center for Public Affairs Research, 50 percent of patients consider the location of medical practice when choosing a doctor. Another report similarly found that 70 percent of healthcare consumers deem location either critical or very important when selecting a provider or healthcare system.

## Objective 

The aim of this project is **find an optimal location for a Medical practice**. For most of Healthcare Professionals (HPC) is it always a headache find a suitable place to open practice or even change the actual facility location.   Scope of this project is help HCPs which Neighborhoods in **Queretaro city, México** are best suited for their business. Location of the facility will have a significant impact on the practice outcome, (adjacent and nearby shops and offices) play a very important role in building a positive early impression of the clinic.
We will use our data science techniques to detect the most promising neighborhoods based on criteria selected in background section. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Background and significance

### Healthcare System in Mexico 

Mexico has achieved universal health coverage and its public healthcare is acceptable for most Mexican residents. Despite this, the private healthcare sector has grown considerably and is driven by increasing disposable income, the growth of medical tourism, and ease of access to higher quality private healthcare services.
Mexico’s public healthcare operates through the Instituto Mexicano de Seguro Social (IMSS)  and Seguro Popular systems. These cover  patients for most medical services and prescription drugs. Those employed in Mexico are automatically enrolled in the IMSS system and their contribution to the scheme is deducted from their salary. Those who are not formally employed may voluntarily enrol in the IMSS system, in which case they will have to pay an annual contribution fee. People who cannot afford the IMSS system must enrol with the Seguro Popular system. Fees for the Seguro Popular system are charged on a sliding scale depending on a resident’s income. While public healthcare in Mexico is relatively good, the quality of services varies between hospitals. 
Most mexicans above mid income opt for private health care, which they finance through private health insurance. Although private hospitals are more expensive, they are better equipped, provide greater access to specialised procedures and generally provide higher quality care.

### How patients choose their practitioner 

How Long and How Far Do Adults Travel and Will Adults Travel for Primary Care?  Accordingly to Washington State Health Services adults are willing spend 28.4 minutes and travel a distance of 32 kilometers (1). This info will be taken in account to set the parameters.
Since there are lots of HCPs  in Queretaro we will try to detect locations that fulfill the next 5 points:

1. Demographics 

We need to define our demographic data such as population age, net income, education. 
We also have to consider whether or not the population is growing or declining, age is a demographic trait that can have several financial impacts on the doctor's office.– it is usually easier to break into newer communities than mature communities where you would have to take patients away from practitioners who have been in the area many years.
Another aspect to take into account is the economic level of the population. An area with a high rate of low-income residents will likely have more patients going to social security than doctors in the private sector.

2. Accessibility

The location choose for the medical practice must be accessible and convenient for patients. For example, a good rule of thumb is to choose a location within 20 minutes of the residential area you hope to serve.

When comparing locations, consider the availability and amount of parking. Free parking is always preferable. And aim for a location with a spacious entryway where elderly, injured or disabled patients can be dropped off and picked up without difficulty.

3. Competition

We need to determine how many providers are in the area, how big their practices are, and what their specialties are.
Finding a space that’s well-known as site for medical practitioners can work to our advantage since people are accustomed to traveling there. “It’s a lot easier to tap into an existing behavior than to create a behavior all by itself.”

4. Visibility

A location in a remote part of town might seem cost-effective, but having low visibility will mean spending more money on marketing to get patients in the door. “Think about marketing costs as part of the rent equation.”
A medical office that’s located on a major road or thoroughfare, or in a busy shopping center, can give you maximum visibility. 


5. Nearby Hospitals, Pharmacies and other business

Speaking of proximity to other businesses, medical practices benefit from operating close to places such as:
-Pharmacies & drug stores
-Hospitals
-Urgent care centers
-Fitness centers
Beyond the obvious convenience of locating close to the hospital, the clinic also will benefit from the patient perception that is located in a recognized healthcare area.
Let's think where are popular businesses, such as supermarkets and banks? The more popular businesses attract more potential clients. Also, upscale businesses attract upscale clients – think Starbucks.


## Design research and Methods

Based on definition of our problem, factors that will influence our decision are:

Demographics: Age and income.
Accessibility: Radius of 32 km from city center.
Competition: Number of proximity clinics.
Visibility: Proximity to principal avenues and from city center
Nearby Hospitals, Pharmacies and other business: Hospitals, pharmacies, restaurants, coffee shops, will be taken in account.


Following data sources will be needed to extract/generate the required information:
-Centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using https://github.com/marioalbertodev/colonias-queretaro/blob/master/colonias.json
-Demographics will be obtained using INEGI (National Institute of Statistics and Geography) https://www.inegi.org.mx/app/indicadores/?t=0200&ag=22#D02000070. y en https://www.inegi.org.mx/programas/enigh/nc/2018/default.html. 
-Number of Practices, Hospitals, pharmacies, restaurants, coffee shops, and their type and location in every neighborhood will be obtained using Foursquare API.


## Import libraries
Lets first import the required libraries.
Also run <b> %matplotlib inline </b> since we will be plotting in this section.

In [7]:
#import libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/javierrendon/opt/anaconda3

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.8.4                |   py37hc8dfbb8_2         3.0 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.0 MB

The following packages will be UPDATED:

  conda                                4.8.3-py37hc8dfbb8_1 --> 4.8.4-py37hc8dfbb8_2



Downloading and Extracting Packages
conda-4.8.4          | 3.0 MB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages alre

In [113]:
# Install GDAL and Geopandas
!apt install gdal-bin python-gdal python3-gdal --quiet
!pip install git+git://github.com/geopandas/geopandas.git --quiet
!pip install descartes --quiet
!pip install geopy
!pip install plotly_express
!pip install ipython-autotime

Unable to locate an executable at "/Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/bin/apt" (-1)
Unable to locate an executable at "/Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/bin/apt" (-1)
^C
Traceback (most recent call last):
  File "/Users/javierrendon/opt/anaconda3/bin/pip", line 11, in <module>
    sys.exit(main())
  File "/Users/javierrendon/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/main.py", line 73, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
  File "/Users/javierrendon/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/__init__.py", line 96, in create_command
    module = importlib.import_module(module_path)
  File "/Users/javierrendon/opt/anaconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
Collecting plotly_express
  Downloading pl

##  1. Download and Explore Dataset

#### Let's get the postal codes and Coordinates of all the Hospitals and Medical Centers from Queretaro, Mexico.

In [13]:
df = pd.read_csv(r'/Users/javierrendon/Desktop/Queretaro_MedicalCenters.csv')
df.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,20.595471,-100.397059,Laboratorio Corregidora,20.595621,-100.392677,Medical Center
1,Centro,20.595471,-100.397059,Farmacias del Ahorro,20.595793,-100.392745,Pharmacy
2,Centro,20.595471,-100.397059,Farmacias del Ahorro,20.592041,-100.391829,Pharmacy
3,Centro,20.595471,-100.397059,salud digna,20.586743,-100.395478,Medical Center
4,Centro,20.595471,-100.397059,Farmacia Dérmica,20.597282,-100.394133,Pharmacy


In [15]:
df.shape

(1624, 7)

In [19]:
#Convert to dataframe
MedicalVenues = pd.DataFrame(data = df)

In [23]:
MedicalVenues.dtypes

Neighbourhood               object
Neighbourhood Latitude     float64
Neighbourhood Longitude    float64
Venue                       object
Venue Latitude             float64
Venue Longitude            float64
Venue Category              object
dtype: object

## 1.1 Data Cleaning

### Preprocessing
Categories diferent from Hospital, Medical Center, Doctors Office, Dentist's Office, Emergency Room, Medical Lab, Mental Health Office, Rehab Center, Physical Therapist, Chiropractor', Nutritionist, Eye Doctor, must be drop from the data frame.

In [27]:
# Get names of indexes for which column Venue Category has Pharmacy value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Pharmacy' ].index
# Delete these row indexes from dataFrame
MedicalVenues.drop(indexNames , inplace=True)

In [70]:
MedicalVenues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,20.595471,-100.397059,Laboratorio Corregidora,20.595621,-100.392677,Medical Center
3,Centro,20.595471,-100.397059,salud digna,20.586743,-100.395478,Medical Center
6,Centro,20.595471,-100.397059,Sanatorio Margarita,20.592357,-100.395213,Hospital
7,Centro,20.595471,-100.397059,Secretaría De Salud,20.594394,-100.390262,Medical Center
8,Centro,20.595471,-100.397059,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital


In [35]:
#List unique values in the MedicalVenues['Venue Category'] column to delete 
MedicalVenues['Venue Category'].unique()

array(['Medical Center', 'Hospital', "Doctor's Office", 'Optical Shop',
       "Dentist's Office", 'Veterinarian', 'Emergency Room',
       'Medical Lab', 'Supplement Shop', 'Mental Health Office',
       'Rehab Center', 'Outdoor Supply Store', 'Spa', 'Drugstore',
       'Physical Therapist', 'Baby Store', 'Chiropractor', 'Park',
       'Nutritionist', 'Market', 'Eye Doctor'], dtype=object)

In [67]:
# Get names of indexes for which column Venue Category has Pharmacy value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Optical Shop' ].index
# Delete these row indexes from dataFrame
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Veterinarian value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Veterinarian' ].index
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Supplement Shop value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Supplement Shop' ].index
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Outdoor Supply Store value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Outdoor Supply Store' ].index
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Spa value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Spa' ].index
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Drugstore value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Drugstore' ].index
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Baby Store value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Baby Store' ].index
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Park value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Park' ].index
MedicalVenues.drop(indexNames , inplace=True)
# Get names of indexes for which column Venue Category has Market value 
indexNames = MedicalVenues[ MedicalVenues['Venue Category'] == 'Market' ].index
MedicalVenues.drop(indexNames , inplace=True)

In [71]:
MedicalVenues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,20.595471,-100.397059,Laboratorio Corregidora,20.595621,-100.392677,Medical Center
3,Centro,20.595471,-100.397059,salud digna,20.586743,-100.395478,Medical Center
6,Centro,20.595471,-100.397059,Sanatorio Margarita,20.592357,-100.395213,Hospital
7,Centro,20.595471,-100.397059,Secretaría De Salud,20.594394,-100.390262,Medical Center
8,Centro,20.595471,-100.397059,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital


In [69]:
#List unique values in the MedicalVenues['Venue Category']
MedicalVenues['Venue Category'].unique()

array(['Medical Center', 'Hospital', "Doctor's Office",
       "Dentist's Office", 'Emergency Room', 'Medical Lab',
       'Mental Health Office', 'Rehab Center', 'Physical Therapist',
       'Chiropractor', 'Nutritionist', 'Eye Doctor'], dtype=object)

#### We need to list all the unique Medical Centers and hospitals in the venue category in order to get rid of duplicates

In [73]:
#List unique values in the MedicalVenues['Venue Category']
MedicalVenues['Venue'].unique()

array(['Laboratorio Corregidora', 'salud digna', 'Sanatorio Margarita',
       'Secretaría De Salud', 'Sanatorio Santiago de Querétaro',
       'Fisioterapia', 'Hospital Santa Rosa de Viterbo',
       'Hospital del Sagrado Corazon', 'Sanatorio Mariano',
       'Policlínica Médica De Querétaro',
       'Centro de Salud Dr. Pedro Escobedo', 'Ópticas Devlyn',
       'Hospital Luis Martin', 'Sanatorio Alcocer Pozo',
       'Medicina Transfusional de Querétaro (Banco de Sangre)',
       'Clinica Medica Familiar ISSSTE Queretaro',
       'FisioINTEGRATE (Fisioterapia Y Rehabilitacion)',
       'ISSSTE Hospital General', 'SESEQ Adquisiciones',
       'Medicina transfusional de Querétaro',
       'Clinica Naturista Dra. Eliud', 'Medica 4', 'Clínica Dental Terán',
       'Odontologia Restauradora ( Dra. Eva Campos)', 'Dentista',
       'Sorriso Clínica Dental', 'DENTAL CENTER odotonlogia integral',
       'Dental Center RAV', 'ISSSTE', 'ISSSTE Hospital General Queretaro',
       'Isste', 'Coleg

In [74]:
# Delete multiple columns from the dataframe, Neighbourhood data is duplicated or overlaping.
MedicalVenues_ungrouped = MedicalVenues.drop(["Neighbourhood", "Neighbourhood Latitude", "Neighbourhood Longitude"], axis=1)

In [78]:
MedicalVenues_ungrouped.head() 

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center
3,salud digna,20.586743,-100.395478,Medical Center
6,Sanatorio Margarita,20.592357,-100.395213,Hospital
7,Secretaría De Salud,20.594394,-100.390262,Medical Center
8,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital


In [89]:
#First we need to be sure dataframe is free from duplicated values in Venue Latitude and Longitude Columns
dups_med = MedicalVenues_ungrouped.pivot_table(index=['Venue','Venue Latitude','Venue Longitude'], aggfunc='size')
print(dups_med)

Venue                                                                           Venue Latitude  Venue Longitude
A&K Medical                                                                     20.583556       -100.403562        5
ADMINISTRACIÓN MEDICA INDUSTRIAL                                                20.568166       -100.416976        3
ALFADENT                                                                        20.569128       -100.411788        4
Alfadent                                                                        20.615876       -100.389652        2
Aquaskin                                                                        20.632473       -100.354853        1
Area de Hospital Y Cirugias Del IMSS                                            20.582263       -100.404901        2
Atención Médica Especializada en Urgencias                                      20.577062       -100.376045        7
Atención Psicológica Integral                                        

In [91]:
Duplicated = MedicalVenues_ungrouped.groupby(['Venue Latitude', 'Venue Longitude'])['Venue'].value_counts().to_frame('count')
print(Duplicated)

                                                                                   count
Venue Latitude Venue Longitude Venue                                                    
20.548983      -100.383385     Centro Medico Cubano                                    1
20.557020      -100.388495     Dentista                                                1
20.557198      -100.381974     Centro de Salud Lázaro Cárdenas                         3
20.557365      -100.378091     Clínica Dental San Gabriel                              1
20.557791      -100.420296     Clínica De Fisioterapia UAQ                             1
20.558037      -100.384304     Clinica ISSSTE La Azteca                                1
20.558325      -100.381522     CONSULTORIO DENTAL KENIA MARIA SANCHEZ CARRASCO         1
20.558429      -100.397324     Clinicos EUREKA                                         1
20.558579      -100.417296     Laboratorios Chopo Constituyentes                       1
20.559641      -100.3

In [100]:
#Drop duplicated data
MedicalVenues_sorted = MedicalVenues_ungrouped.drop_duplicates()

In [106]:
MedicalVenues_sorted.reset_index(drop=True)

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center
1,salud digna,20.586743,-100.395478,Medical Center
2,Sanatorio Margarita,20.592357,-100.395213,Hospital
3,Secretaría De Salud,20.594394,-100.390262,Medical Center
4,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital
5,Fisioterapia,20.591962,-100.400661,Medical Center
6,Hospital Santa Rosa de Viterbo,20.588506,-100.397214,Hospital
7,Hospital del Sagrado Corazon,20.599134,-100.393373,Hospital
8,Sanatorio Mariano,20.590467,-100.39383,Medical Center
9,Policlínica Médica De Querétaro,20.591215,-100.392235,Medical Center


In [109]:
#save to csv in local file
MedicalVenues_sorted.to_csv(r'/Users/javierrendon/Desktop/MedicalCenters.csv', index = None, header=True)
#UPDATE CSV TO DETERMINE THE HEALTCARE SYSTEM (CUMBERSOME BUT NECESARY FOR THE ANALYSIS), UPLOAD FILE TO GITHUB

In [110]:
#DOWNLOAD FILE TO GITHUB
MedicalCenters = pd.read_csv (r'https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/MedicalCenters.csv')
MedicalCenters.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private
1,salud digna,20.586743,-100.395478,Medical Center,Private
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private
3,Secretaría De Salud,20.594394,-100.390262,Goverment Center,Public
4,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private


In [111]:
MedicalCenters.shape

(288, 5)

In [112]:
MedicalCenters.dtypes

Venue               object
Venue Latitude     float64
Venue Longitude    float64
Venue Category      object
Health System       object
dtype: object

### Get Addresses and PostalCodes from Coordinates

In [121]:
MedicalCenters['geom'] =  MedicalCenters['Venue Latitude'].map(str)  + ',' + MedicalCenters['Venue Longitude'].map(str)
MedicalCenters.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System,geom
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private,"20.595621028703498,-100.392676878503"
1,salud digna,20.586743,-100.395478,Medical Center,Private,"20.5867425537626,-100.39547810202299"
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private,"20.592357387953104,-100.39521346802601"
3,Secretaría De Salud,20.594394,-100.390262,Goverment Center,Public,"20.5943935755244,-100.39026151674801"
4,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private,"20.5921186289838,-100.399937383702"


time: 19.5 ms


In [122]:
%load_ext autotime

import geopandas as gpd
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

import matplotlib.pyplot as plt
import plotly_express as px
import tqdm
from tqdm import tqdm
from tqdm._tqdm_notebook import tqdm_notebook
from tqdm import tqdm
import geocoder

locator = Nominatim(user_agent="myGeocoder", timeout=10)
rgeocode = RateLimiter(locator.reverse, min_delay_seconds=0.001)
rgeocode

tqdm.pandas()
MedicalCenters['Postcode'] = MedicalCenters['geom'].progress_apply(rgeocode)

MedicalCenters.head()

  from pandas import Panel
  0%|          | 0/288 [00:00<?, ?it/s]

The autotime extension is already loaded. To reload it, use:
  %reload_ext autotime


100%|██████████| 288/288 [02:27<00:00,  1.95it/s]


Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System,geom,Postcode
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private,"20.595621028703498,-100.392676878503","(75, Avenida Corregidora, Centro, Delegación C..."
1,salud digna,20.586743,-100.395478,Medical Center,Private,"20.5867425537626,-100.39547810202299","(Multimundo, 15, Avenida Ignacio Zaragoza, Cen..."
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private,"20.592357387953104,-100.39521346802601","(9, Calle Vicente Guerrero, Centro, Delegación..."
3,Secretaría De Salud,20.594394,-100.390262,Goverment Center,Public,"20.5943935755244,-100.39026151674801","(51, Calle 16 de Septiembre, Barrio de la Cruz..."
4,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private,"20.5921186289838,-100.399937383702","(101, Calle Miguel Hidalgo, Centro, Delegación..."


time: 2min 27s


In [128]:
#review full address of first data in dataframe
MedicalCenters["Postcode"][0]

Location(75, Avenida Corregidora, Centro, Delegación Centro Histórico, Santiago de Querétaro, Municipio de Querétaro, Querétaro, 76000, México, (20.5955795, -100.39274225, 0.0))

time: 5.7 ms


In [131]:
#change name of column
MedicalCenters.rename(columns={"Postcode": "Address"}, inplace = True) 

time: 3.02 ms


In [132]:
MedicalCenters.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System,geom,Address
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private,"20.595621028703498,-100.392676878503","(75, Avenida Corregidora, Centro, Delegación C..."
1,salud digna,20.586743,-100.395478,Medical Center,Private,"20.5867425537626,-100.39547810202299","(Multimundo, 15, Avenida Ignacio Zaragoza, Cen..."
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private,"20.592357387953104,-100.39521346802601","(9, Calle Vicente Guerrero, Centro, Delegación..."
3,Secretaría De Salud,20.594394,-100.390262,Goverment Center,Public,"20.5943935755244,-100.39026151674801","(51, Calle 16 de Septiembre, Barrio de la Cruz..."
4,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private,"20.5921186289838,-100.399937383702","(101, Calle Miguel Hidalgo, Centro, Delegación..."


time: 35.1 ms


In [207]:
#change Address data from object to string
MedicalCenters['Address'] = MedicalCenters['Address'].astype('str') 

time: 2.38 ms


In [209]:
# get zipcode from full address by applying regex and append Postcode column
import re 
MedicalCenters['Postcode'] = MedicalCenters['Address'].str.extract(r"\b(\d{5})\b")

time: 3.97 ms


In [211]:
MedicalCenters.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System,geom,Address,Postcode
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private,"20.595621028703498,-100.392676878503","75, Avenida Corregidora, Centro, Delegación Ce...",76000
1,salud digna,20.586743,-100.395478,Medical Center,Private,"20.5867425537626,-100.39547810202299","Multimundo, 15, Avenida Ignacio Zaragoza, Cent...",76000
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private,"20.592357387953104,-100.39521346802601","9, Calle Vicente Guerrero, Centro, Delegación ...",76000
3,Secretaría De Salud,20.594394,-100.390262,Goverment Center,Public,"20.5943935755244,-100.39026151674801","51, Calle 16 de Septiembre, Barrio de la Cruz,...",76020
4,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private,"20.5921186289838,-100.399937383702","101, Calle Miguel Hidalgo, Centro, Delegación ...",76000


time: 19.8 ms


Some values may be empty or wrong in postal code column we need to fill this data manually at index 85, 178, 179, 180, 229, 267-276

In [216]:
#complete missing values from www.googlemaps.com

MedicalCenters.at[85,'Postcode']=76040
MedicalCenters.at[178,'Postcode']=76125
MedicalCenters.at[179,'Postcode']=76125
MedicalCenters.at[180,'Postcode']=76130
MedicalCenters.at[229,'Postcode']=76130
MedicalCenters.at[257,'Postcode']=76030
MedicalCenters.at[267,'Postcode']=76220
MedicalCenters.at[268,'Postcode']=76220
MedicalCenters.at[269,'Postcode']=76220
MedicalCenters.at[270,'Postcode']=76220
MedicalCenters.at[271,'Postcode']=76220
MedicalCenters.at[272,'Postcode']=76220
MedicalCenters.at[273,'Postcode']=76220
MedicalCenters.at[274,'Postcode']=76220
MedicalCenters.at[275,'Postcode']=76220
MedicalCenters.at[276,'Postcode']=76227

time: 4.43 ms


In [218]:
MedicalCenters.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System,geom,Address,Postcode
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private,"20.595621028703498,-100.392676878503","75, Avenida Corregidora, Centro, Delegación Ce...",76000
1,salud digna,20.586743,-100.395478,Medical Center,Private,"20.5867425537626,-100.39547810202299","Multimundo, 15, Avenida Ignacio Zaragoza, Cent...",76000
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private,"20.592357387953104,-100.39521346802601","9, Calle Vicente Guerrero, Centro, Delegación ...",76000
3,Secretaría De Salud,20.594394,-100.390262,Goverment Center,Public,"20.5943935755244,-100.39026151674801","51, Calle 16 de Septiembre, Barrio de la Cruz,...",76020
4,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private,"20.5921186289838,-100.399937383702","101, Calle Miguel Hidalgo, Centro, Delegación ...",76000


time: 40.4 ms


In [220]:
MedicalCenters.dtypes

Venue               object
Venue Latitude     float64
Venue Longitude    float64
Venue Category      object
Health System       object
geom                object
Address             object
Postcode            object
dtype: object

time: 5.46 ms


In [226]:
MedicalCenters.shape

(288, 8)

time: 4.45 ms


In [222]:
#change Postcode data from object to integer
MedicalCenters['Postcode'] = MedicalCenters['Postcode'].astype(str).astype(int)

time: 2.53 ms


#### We need to add Neigbourhoods accordingly to Postal code from previous work

In [219]:
#Upload Neighbourhood csv 
Neighbourhoods = pd.read_csv (r'https://raw.githubusercontent.com/Alexrendon/Capstone-Project-Notebook/master/Queretaro_Coordinates.csv')
Neighbourhoods.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,76000,Queretaro,Centro,20.595471,-100.397059
1,76005,Queretaro,"Rincón de San Andrés, Mariano Escobedo, Vicent...",20.595006,-100.39877
2,76010,Queretaro,"Las Campanas, Niños Héroes",20.585667,-100.407091
3,76017,Queretaro,Centro Universitario (U.A.Q.),20.592282,-100.409679
4,76020,Queretaro,"San Javier, Pathé, La Cruz, Jardines de Queret...",20.596568,-100.379079


time: 955 ms


In [221]:
Neighbourhoods.dtypes

Postcode           int64
Borough           object
Neighbourhood     object
Latitude         float64
Longitude        float64
dtype: object

time: 6.59 ms


In [229]:
#Get postal codes
MedicalCenters['Postcode'].unique()

array([76000, 76020, 76164, 76010, 76168, 76005, 76030, 76070, 76154,
       76176, 76178, 76024, 76058, 76025, 76050, 76060, 76040, 76057,
       76046, 76160, 76096, 76080, 76190, 76099, 76047, 76090, 76049,
       76069, 76048, 76074, 76087, 76230, 76085, 76100, 76113, 76114,
       76115, 76116, 76120, 76125, 76130, 76127, 76134, 76930, 76140,
       76147, 76128, 76149, 76157, 76150, 76158, 76180, 76170, 76185,
       76901, 76220, 76227])

time: 5.38 ms


In [230]:
#Get postal codes
Neighbourhoods['Postcode'].unique()

array([76000, 76005, 76010, 76017, 76020, 76024, 76025, 76026, 76027,
       76028, 76030, 76036, 76037, 76040, 76046, 76047, 76048, 76049,
       76050, 76057, 76058, 76059, 76060, 76063, 76067, 76069, 76070,
       76074, 76078, 76079, 76080, 76085, 76086, 76087, 76090, 76093,
       76099, 76100, 76110, 76113, 76114, 76115, 76116, 76117, 76118,
       76120, 76121, 76125, 76127, 76130, 76134, 76135, 76136, 76137,
       76138, 76139, 76140, 76144, 76146, 76147, 76148, 76149, 76150,
       76154, 76155, 76156, 76157, 76158, 76159, 76160, 76164, 76165,
       76166, 76168, 76169, 76170, 76175, 76176, 76177, 76178, 76179,
       76180, 76185, 76190, 76197, 76199, 76210, 76211, 76212, 76213,
       76214, 76215, 76216, 76217, 76218, 76219, 76220, 76221, 76223,
       76224, 76225, 76226, 76227, 76228, 76229, 76230, 76233, 76234,
       76235, 76237, 76238])

time: 5.1 ms


In [232]:
Medical_Neighbourhoods = pd.merge(left=MedicalCenters, right=Neighbourhoods, left_on='Postcode', right_on='Postcode')
Medical_Neighbourhoods.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System,geom,Address,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private,"20.595621028703498,-100.392676878503","75, Avenida Corregidora, Centro, Delegación Ce...",76000,Queretaro,Centro,20.595471,-100.397059
1,salud digna,20.586743,-100.395478,Medical Center,Private,"20.5867425537626,-100.39547810202299","Multimundo, 15, Avenida Ignacio Zaragoza, Cent...",76000,Queretaro,Centro,20.595471,-100.397059
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private,"20.592357387953104,-100.39521346802601","9, Calle Vicente Guerrero, Centro, Delegación ...",76000,Queretaro,Centro,20.595471,-100.397059
3,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private,"20.5921186289838,-100.399937383702","101, Calle Miguel Hidalgo, Centro, Delegación ...",76000,Queretaro,Centro,20.595471,-100.397059
4,Fisioterapia,20.591962,-100.400661,Medical Center,Private,"20.591962,-100.400661","51, Calle Nicolás Campa, Centro, Delegación Ce...",76000,Queretaro,Centro,20.595471,-100.397059


time: 36 ms


In [233]:
Medical_Neighbourhoods.shape

(275, 12)

time: 5.14 ms


In [234]:
Medical_Neighbourhoods.dtypes

Venue               object
Venue Latitude     float64
Venue Longitude    float64
Venue Category      object
Health System       object
geom                object
Address             object
Postcode             int64
Borough             object
Neighbourhood       object
Latitude           float64
Longitude          float64
dtype: object

time: 7.21 ms


In [237]:
# Delete multiple columns from the dataframe, Neighbourhood data is duplicated or overlaping.
Medical_Neighbourhoods1 = Medical_Neigbourhoods.drop(["Latitude", "Longitude"], axis=1)
Medical_Neighbourhoods1.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Health System,geom,Address,Postcode,Borough,Neighbourhood
0,Laboratorio Corregidora,20.595621,-100.392677,Medical Center,Private,"20.595621028703498,-100.392676878503","75, Avenida Corregidora, Centro, Delegación Ce...",76000,Queretaro,Centro
1,salud digna,20.586743,-100.395478,Medical Center,Private,"20.5867425537626,-100.39547810202299","Multimundo, 15, Avenida Ignacio Zaragoza, Cent...",76000,Queretaro,Centro
2,Sanatorio Margarita,20.592357,-100.395213,Hospital,Private,"20.592357387953104,-100.39521346802601","9, Calle Vicente Guerrero, Centro, Delegación ...",76000,Queretaro,Centro
3,Sanatorio Santiago de Querétaro,20.592119,-100.399937,Hospital,Private,"20.5921186289838,-100.399937383702","101, Calle Miguel Hidalgo, Centro, Delegación ...",76000,Queretaro,Centro
4,Fisioterapia,20.591962,-100.400661,Medical Center,Private,"20.591962,-100.400661","51, Calle Nicolás Campa, Centro, Delegación Ce...",76000,Queretaro,Centro


time: 20.4 ms
