# Capstone Project - The Battle of Neighborhoods

### By Fidel Navarro

## Introduction

#### This project's purpose is to help people find and choose healthcare centers based on location, availability, user experience, and needs in a large, busy city such as Mexico City (CDMX).

#### Large cities such as CDMX, with a population of approximately 9 million (one of the most populous cities in the world), can suffer from traffic congestion and difficult access to public facilities. Furthermore, rising demand for health services can easily exceed the capacity of the facilities that provide them. This project seeks to address these problems by creating a program that shows the nearest health centers with adequate services and capacity.

#### This project works with a dataset of hospitals and health centers provided by the government of Mexico City.

#### The file can be obtained from:

https://datos.cdmx.gob.mx/explore/dataset/hospitales-y-centros-de-salud/export/

In [1]:
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # top-level location since pandas 1.0; the pandas.io.json path is deprecated
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

df = pd.read_csv("hospitales-y-centros-de-salud.csv")
df.head()

Unnamed: 0,Nombre,Titular,Latitud,Longitud,Coordenadas,Geopoint
0,Hospital General La Villa,Director: Dr. Enrique Garduño Salvador Direcci...,19.480774,-99.103371,"-99.103371,19.480774,0.000000","19.480774,-99.103371"
1,Hospital General Milpa Alta,Dirección: Dr.Benjamín Ortega Romero Dirección...,19.200199,-99.011253,"-99.011253,19.200199,0.000000","19.200199,-99.011253"
2,Hospital General Ticomán,Director: Dr. Carlos Vazquez Noriega Dirección...,19.514547,-99.138245,"-99.138245,19.514547,0.000000","19.514547,-99.138245"
3,Hospital General Dr. Rubén Leñero,Director: Dr. José Alfredo Jiménez Douglas Dir...,19.450987,-99.169189,"-99.169189,19.450987,0.000000","19.450987,-99.169189"
4,Hospital General Iztapalapa C.E.E.,Director: Dr. Benjamín Méndez Pinto Dirección:...,19.343451,-99.027863,"-99.027863,19.343451,0.000000","19.343451,-99.027863"


In [2]:
df.shape

(27, 6)

#### The dataset has 27 registered hospitals and healthcare centers available in Mexico City.

#### It contains the following columns: 

##### 1. Nombre (Name)
##### 2. Titular (Person in charge)
##### 3. Latitud (Latitude)
##### 4. Longitud (Longitude)
##### 5. Coordenadas (Coordinates)
##### 6. Geopoint
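
#### Judging only from the five rows shown above, Geopoint appears to hold "latitude,longitude" while Coordenadas appears to hold "longitude,latitude,altitude". The sketch below parses both under that assumption; the helper names `parse_geopoint` and `parse_coordenadas` are hypothetical and not part of the original notebook.

```python
def parse_geopoint(geopoint):
    """Split a 'lat,lng' string into a (latitude, longitude) tuple of floats."""
    lat_text, lng_text = geopoint.split(",")
    return float(lat_text), float(lng_text)


def parse_coordenadas(coordenadas):
    """Split a 'lng,lat,alt' string and return (latitude, longitude)."""
    lng_text, lat_text, _altitude = coordenadas.split(",")
    return float(lat_text), float(lng_text)


# Values taken from the first row of the dataset preview.
print(parse_geopoint("19.480774,-99.103371"))
print(parse_coordenadas("-99.103371,19.480774,0.000000"))
```

#### Both calls should yield the same pair as the Latitud and Longitud columns, which is a quick way to confirm the column order before relying on either field.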

#### Visualization of the 27 hospitals using folium.

In [3]:
# Center the map on Mexico City.
CDMX = folium.Map(location=[19.42847, -99.12766], zoom_start=11)

# Add one marker per hospital in the dataset.
for lat, lng in zip(df.Latitud, df.Longitud):
    folium.Marker([lat, lng]).add_to(CDMX)

CDMX

#### Define Foursquare credentials.

In [4]:
CLIENT_ID = "BV3XE0TTHY0NP3QORDSPTLSYFZ05U5BJJMGGP5YENTY50DA2" 
CLIENT_SECRET = "5041RQQJFMWBWV3SLJILREFT2225HTVC3EQBAZZ2ZLQXYNRJ"
VERSION = '20180604'
LIMIT = 30
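
#### Hard-coded keys are easy to leak when a notebook is shared. As a hedged alternative, the sketch below reads the credentials from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for illustration.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a clear message instead of sending an empty key to the API.
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


# Usage, once both variables are set in the shell that launches the notebook:
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```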

#### Define one function that converts an address to geographic coordinates and another that converts coordinates back to an address.

In [5]:
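def geo(address):
    """Return (latitude, longitude) for an address using Nominatim."""
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be resolved.
    if location is None:
        raise ValueError("Address not found: {}".format(address))
    return (location.latitude, location.longitude)

def geo_add(lat, lng):
    """Return the address Nominatim reports for a latitude/longitude pair."""
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.reverse("{},{}".format(lat, lng))
    return location.address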

#### Transform an address in CDMX using the previously defined function.

In [6]:
address = "Avenida Frontera, Batán Barrio Viejo, Álvaro Obregón, Ciudad de México, 01080, México"
latitude, longitude = geo(address)
print(latitude, ",", longitude)

19.3417075 , -99.197798


#### Search for hospitals near the desired address.

In [7]:
search_query = 'Hospital'
radius = 500
LIMIT = 100

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=BV3XE0TTHY0NP3QORDSPTLSYFZ05U5BJJMGGP5YENTY50DA2&client_secret=5041RQQJFMWBWV3SLJILREFT2225HTVC3EQBAZZ2ZLQXYNRJ&ll=19.3417075,-99.197798&v=20180604&query=Hospital&radius=500&limit=100'
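
#### The same query string can also be assembled with the standard library, which takes care of escaping search terms that contain spaces or accents. This is a minimal sketch: `build_search_url` is a hypothetical helper, and `YOUR_ID` / `YOUR_SECRET` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, latitude, longitude,
                     version, query, radius, limit):
    """Build a venues/search URL with properly escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(latitude, longitude),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in ll readable instead of encoding it as %2C.
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params, safe=",")


url = build_search_url("YOUR_ID", "YOUR_SECRET", 19.3417075, -99.197798,
                       "20180604", "Hospital", 500, 100)
print(url)
```

#### Keeping the parameters in a dictionary also makes it easier to change the radius or the search term without editing a long format string.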

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f04ec000ff35d2a393975e3'},
 'response': {'venues': [{'id': '4d6585b03384a0933e51a83c',
    'name': 'Hospital San Angel Inn Sur',
    'location': {'address': 'Av. México No. 2, Col. Tizapán San Angel',
     'crossStreet': 'Guerrero',
     'lat': 19.340471795284504,
     'lng': -99.19989652515093,
     'labeledLatLngs': [{'label': 'display',
       'lat': 19.340471795284504,
       'lng': -99.19989652515093}],
     'distance': 259,
     'postalCode': '01080',
     'cc': 'MX',
     'city': 'Ciudad de México',
     'state': 'Distrito Federal',
     'country': 'México',
     'formattedAddress': ['Av. México No. 2, Col. Tizapán San Angel (Guerrero)',
      '01080 Ciudad de México, Distrito Federal',
      'México']},
    'categories': [{'id': '4bf58dd8d48988d196941735',
      'name': 'Hospital',
      'pluralName': 'Hospitals',
      'shortName': 'Hospital',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/medical_',
       'suff

#### Get the relevant part of the JSON response and transform it into a pandas dataframe.

In [9]:
hosp = results["response"]["venues"]
dataframe = pd.json_normalize(hosp)  # flattens nested keys into dotted column names
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress
0,4d6585b03384a0933e51a83c,Hospital San Angel Inn Sur,"[{'id': '4bf58dd8d48988d196941735', 'name': 'H...",v-1594158050,False,"Av. México No. 2, Col. Tizapán San Angel",Guerrero,19.340472,-99.199897,"[{'label': 'display', 'lat': 19.34047179528450...",259,1080.0,MX,Ciudad de México,Distrito Federal,México,"[Av. México No. 2, Col. Tizapán San Angel (Gue..."
1,5022edc3e4b07c69f51c736e,Hospital Vicentino y Casa Hogar,"[{'id': '4bf58dd8d48988d196941735', 'name': 'H...",v-1594158050,False,,,19.339441,-99.199837,"[{'label': 'display', 'lat': 19.33944091517003...",330,,MX,,México,México,"[México, México]"
2,53dd0439498e6dfaddc4365c,"HOSPITAL ANGELES PEDREGAL, DR MIRANDA","[{'id': '4bf58dd8d48988d177941735', 'name': 'D...",v-1594158050,False,,,19.341569,-99.198837,"[{'label': 'display', 'lat': 19.34156853799819...",110,,MX,,,México,[México]
3,4eb57c76775b544c275148c1,Hospital Medicos Especialistas,"[{'id': '4bf58dd8d48988d177941735', 'name': 'D...",v-1594158050,False,Morelos,Veracruz,19.341809,-99.201051,"[{'label': 'display', 'lat': 19.341809, 'lng':...",341,,MX,Ciudad de México,Distrito Federal,México,"[Morelos (Veracruz), Ciudad de México, Distrit..."
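
#### The dotted column names in the table (for example `location.lat`) come from flattening the nested JSON. The function below is a simplified pure-Python sketch of that idea, not pandas' actual implementation; `flatten` is a hypothetical helper, and the sample record reuses a few fields from the response shown earlier.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = "{}{}{}".format(parent_key, sep, key) if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are stored unchanged.
            flat[full_key] = value
    return flat


venue = {
    "id": "4d6585b03384a0933e51a83c",
    "name": "Hospital San Angel Inn Sur",
    "location": {"lat": 19.340471795284504, "lng": -99.19989652515093, "distance": 259},
}
print(flatten(venue))
```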


#### Define information of interest.

In [10]:
filtered_columns = ["id", "name", "location.lat", "location.lng", "location.distance"]
# Chain the steps on a new DataFrame instead of modifying a slice in place.
hosp_filtered = (
    dataframe.loc[:, filtered_columns]
    .rename(columns={"location.lat": "lat", "location.lng": "lng", "location.distance": "distance"})
    .sort_values(by="distance")
    .reset_index(drop=True)
)
hosp_filtered

Unnamed: 0,id,name,lat,lng,distance
0,53dd0439498e6dfaddc4365c,"HOSPITAL ANGELES PEDREGAL, DR MIRANDA",19.341569,-99.198837,110
1,4d6585b03384a0933e51a83c,Hospital San Angel Inn Sur,19.340472,-99.199897,259
2,5022edc3e4b07c69f51c736e,Hospital Vicentino y Casa Hogar,19.339441,-99.199837,330
3,4eb57c76775b544c275148c1,Hospital Medicos Especialistas,19.341809,-99.201051,341
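
#### The `distance` column is reported by the API in meters. As a sanity check, the great-circle distance can be recomputed from the coordinates with the haversine formula. This is a sketch that assumes a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (assumption of this sketch)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


origin = (19.3417075, -99.197798)   # geocoded address used for the search
nearest = (19.341569, -99.198837)   # first row of the sorted table
print(round(haversine_m(*origin, *nearest)))
```

#### The result should land close to the 110 m reported for the nearest hospital; small differences are expected because the displayed coordinates are rounded.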


#### Nearby hospitals.

In [11]:
hosp_filtered.name

0    HOSPITAL ANGELES PEDREGAL, DR MIRANDA
1               Hospital San Angel Inn Sur
2          Hospital Vicentino y Casa Hogar
3           Hospital Medicos Especialistas
Name: name, dtype: object

In [13]:
my_map = folium.Map(location=[latitude, longitude], zoom_start=18)

folium.CircleMarker([latitude, longitude], color="red", fill = True, fill_color = "red", fill_opacity=0.6).add_to(my_map)

for lat, lng in zip(hosp_filtered.lat, hosp_filtered.lng):
    folium.Marker([lat, lng]).add_to(my_map)

my_map

#### Get the hospitals' overall rating

In [14]:
rating = []
tips = []
for hosp_id in hosp_filtered["id"]:
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(hosp_id, CLIENT_ID, CLIENT_SECRET, VERSION)

    result = requests.get(url).json()
    venue = result['response']['venue']
    # Not every venue has a rating; store NaN when the key is missing.
    rating.append(venue.get('rating', np.nan))
    tips.append(venue['tips']['count'])

#### None of the hospitals has been rated.

In [15]:
rating

[nan, nan, nan, nan]

#### Let's look at the number of tips.

In [16]:
tips

[0, 38, 2, 0]

#### From the tips we know that the closest and farthest hospitals have not received any tips, while the second and third closest hospitals have 38 and 2 tips, respectively.

#### Since our purpose is to determine the best hospital or health center, we can start by analyzing the second hospital, which is the most popular by number of tips.
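
#### The popularity criterion can be written down explicitly. The sketch below picks the venue with the most tips and, as an added assumption not stated in the notebook, breaks ties in favor of the closer venue. `recommend` is a hypothetical helper; the values are copied from the tables and tip counts above.

```python
hospitals = [
    {"name": "HOSPITAL ANGELES PEDREGAL, DR MIRANDA", "distance": 110, "numb_tips": 0},
    {"name": "Hospital San Angel Inn Sur", "distance": 259, "numb_tips": 38},
    {"name": "Hospital Vicentino y Casa Hogar", "distance": 330, "numb_tips": 2},
    {"name": "Hospital Medicos Especialistas", "distance": 341, "numb_tips": 0},
]


def recommend(venues):
    """Return the venue with the most tips; ties go to the closer venue."""
    if not venues:
        return None  # nothing was found within the search radius
    # Negating the distance makes max() prefer smaller distances on ties.
    return max(venues, key=lambda v: (v["numb_tips"], -v["distance"]))


print(recommend(hospitals)["name"])
```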

#### Let's add tips and rating columns to the dataframe.

In [17]:
hosp_filtered["numb_tips"] = tips
hosp_filtered["rating"] = rating
hosp_filtered

Unnamed: 0,id,name,lat,lng,distance,numb_tips,rating
0,53dd0439498e6dfaddc4365c,"HOSPITAL ANGELES PEDREGAL, DR MIRANDA",19.341569,-99.198837,110,0,
1,4d6585b03384a0933e51a83c,Hospital San Angel Inn Sur,19.340472,-99.199897,259,38,
2,5022edc3e4b07c69f51c736e,Hospital Vicentino y Casa Hogar,19.339441,-99.199837,330,2,
3,4eb57c76775b544c275148c1,Hospital Medicos Especialistas,19.341809,-99.201051,341,0,


#### Now let's retrieve the tips, with their agree counts, for the hospital that has the most tips.

In [202]:
limit = 15
# ID of the venue with the most tips.
venue_id = hosp_filtered["id"][hosp_filtered["numb_tips"].idxmax()]
url = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, limit)

results = requests.get(url).json()
tips = results['response']['tips']['items']

# Show the full tip text instead of truncating it (None replaces the deprecated -1).
pd.set_option('display.max_colwidth', None)

tips_df = pd.json_normalize(tips)

filtered_columns = ['text', 'agreeCount', 'disagreeCount', 'id', 'user.firstName', 'user.lastName', 'user.id']
tips_filtered = tips_df.loc[:, filtered_columns]

tips_filtered

Unnamed: 0,text,agreeCount,disagreeCount,id,user.firstName,user.lastName,user.id
0,El hospital esta en excelentes condiciones. Lo único es que ofrecen paquetes para quirofano a un precio y resulta al final que sale mas caro una prueba de sangre que la misma operación. ¡Cuidado!,0,0,546199b4498e2f85cc5a0c98,Paola,R,55141140
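
#### When a venue returns several tips, one way to choose a single representative tip is to take the one with the highest `agreeCount`. The sketch below does this on plain dictionaries shaped like the API items; `most_agreed_tip` is a hypothetical helper and the sample tips are invented for illustration only.

```python
def most_agreed_tip(tips):
    """Return the tip with the highest agreeCount, or None for an empty list."""
    if not tips:
        return None
    # Treat a missing agreeCount as zero so incomplete items do not raise.
    return max(tips, key=lambda tip: tip.get("agreeCount", 0))


# Hypothetical sample data, not real API output.
sample_tips = [
    {"text": "Tip A", "agreeCount": 1},
    {"text": "Tip B", "agreeCount": 4},
    {"text": "Tip C", "agreeCount": 0},
]
print(most_agreed_tip(sample_tips)["text"])
```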


#### Since there is not much information about the hospitals in this area, we can only recommend a hospital based on its popularity (number of tips).

In [142]:
# Row of the hospital with the most tips.
best = hosp_filtered.loc[hosp_filtered["numb_tips"].idxmax()]
print("The recommended hospital is:", best["name"])
print("Address:", geo_add(best["lat"], best["lng"]))

The recommended hospital is: Hospital San Angel Inn Sur
Address: Calle México, Tizapán, Álvaro Obregón, Ciudad de México, 01090, México


#### Now let's repeat the analysis for a different address, this time automating the process.

#### We define a new function that wraps the previous steps from this notebook.

In [22]:
def hospital(address):
    """Recommend the most-tipped hospital within 500 m of an address.

    Returns a folium map, the text of the first tip, a line with the
    hospital name, and a line with the hospital address.
    """
    latitude, longitude = geo(address)
    search_query = 'Hospital'
    radius = 500
    LIMIT = 100

    # Search Foursquare for hospitals around the geocoded address.
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()

    # Keep the columns of interest, sorted by distance.
    filtered_columns = ["id", "name", "location.lat", "location.lng", "location.distance"]
    hosp_filtered = (
        pd.json_normalize(results["response"]["venues"])
        .loc[:, filtered_columns]
        .rename(columns={"location.lat": "lat", "location.lng": "lng", "location.distance": "distance"})
        .sort_values(by="distance")
        .reset_index(drop=True)
    )

    # Collect the rating (NaN when missing) and the tip count of each venue.
    rating = []
    tips = []
    for hosp_id in hosp_filtered["id"]:
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(hosp_id, CLIENT_ID, CLIENT_SECRET, VERSION)
        venue = requests.get(url).json()['response']['venue']
        rating.append(venue.get('rating', np.nan))
        tips.append(venue['tips']['count'])

    hosp_filtered["numb_tips"] = tips
    hosp_filtered["rating"] = rating

    # The recommendation is the venue with the most tips.
    best = hosp_filtered.loc[hosp_filtered["numb_tips"].idxmax()]

    # Fetch that venue's tips and keep the text of the first one, if any.
    limit = 15
    url = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&v={}&limit={}'.format(best["id"], CLIENT_ID, CLIENT_SECRET, VERSION, limit)
    tips = requests.get(url).json()['response']['tips']['items']
    txt = tips[0]['text'] if tips else ""

    hosp = "The recommended hospital is: " + best["name"]
    add = "Address: " + geo_add(best["lat"], best["lng"])

    # Map with the query location (red circle) and the recommended hospital.
    my_map = folium.Map(location=[latitude, longitude], zoom_start=17)
    folium.CircleMarker([latitude, longitude], color="red", fill=True, fill_color="red", fill_opacity=0.6).add_to(my_map)
    folium.Marker([best["lat"], best["lng"]]).add_to(my_map)

    return my_map, txt, hosp, add

#### We set the address we are interested in.

In [23]:
address = 'Calle Norte 1, Isidro Fabela, Pedregal de Tepepan, Tlalpan, Ciudad de México, 14030, México'

#### We call the function, which returns:
#### 1. A map with our current location (in red) and the location of the recommended hospital.
#### 2. A string with a tip about the hospital.
#### 3. The hospital name.
#### 4. The hospital address.

In [24]:
my_map, txt, hosp, add = hospital(address)

In [26]:
print(hosp)
print(add)
print("Tip:", txt)
my_map

The recommended hospital is: Hospital Sedna
Address: Hospital Sedna, Lateral Periférico, Bosques de Tetlameya, Coyoacán, Ciudad de México, 14388, México
Tip: En general el hospital es buenísimo, la atención, las enfermeras todo me encantó ahora que nació mi bebé, incluso la cesárea no me dolió nada! Mis respetos al dr. Ramírez Castro
