## Introduction


Houston is the fourth-largest city in the United States and one of the most diverse cities in the world.
When a person immigrates to the United States and settles in Houston for the first time, they must go through a process of adaptation and often have to rebuild their life from scratch.
During this process they usually lack resources that long-time residents take for granted, such as a car or a credit history.
Neighborhood data therefore becomes especially valuable: it shows which amenities residents can reach without significant transportation costs, and so which neighborhoods can raise their quality of life.

## Problem

Because Houston has many neighborhoods, the analysis focuses on the number of amenities in each of the city's super neighborhoods. Working at this level gives a newcomer a broader view of the amenities available across a larger area and helps identify which super neighborhoods offer the most amenities and, therefore, the lowest transportation costs.
The amenities considered are those that can improve people's quality of life, such as parks, restaurants, gyms, schools, supermarkets, banks, gas stations, and clothing stores.

## Data

The main data source is a list of all Houston super neighborhoods with their latitudes and longitudes. These data were compiled beforehand from several sources, mainly the City of Houston web page that lists all the super neighborhoods. Venue data for each super neighborhood are then retrieved from the Foursquare API.

```python
#!pip install folium
```

```python
import pandas as pd
import numpy as np
import requests
```

```python
import types

import folium
import ibm_boto3
import pandas as pd
from botocore.client import Config
from geopy.geocoders import Nominatim


def __iter__(self):
    return 0


# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage.
# Supply your own API key; never share a notebook that contains real credentials.
cos_client = ibm_boto3.client(
    service_name='s3',
    ibm_api_key_id='YOUR_IBM_API_KEY',
    ibm_auth_endpoint='https://iam.cloud.ibm.com/oidc/token',
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com',
)

body = cos_client.get_object(
    Bucket='courseracapstone-donotdelete-pr-px5cdgehlvg2lk',
    Key='HTX_SUPERNEIGHBORHOODS.csv',
)['Body']
# Add the missing __iter__ method so pandas accepts body as a file-like object
if not hasattr(body, '__iter__'):
    body.__iter__ = types.MethodType(__iter__, body)

df = pd.read_csv(body)
df.head(90)
```


| | NEIGHBORHOOD | LATITUDE | LONGITUDE |
|---|---|---|---|
| 0 | FOURTH WARD | 29.759772 | -95.384385 |
| 1 | SECOND WARD | 29.751404 | -95.335659 |
| 2 | DOWNTOWN | 29.759464 | -95.370603 |
| 3 | CLINTON PARK TRI-COMMUNITY | 29.747192 | -95.269902 |
| 4 | GREATER UPTOWN | 29.745115 | -95.465307 |
| ... | ... | ... | ... |
| 83 | BRIAR FOREST | 29.747880 | -95.571911 |
| 84 | NEARTOWN - MONTROSE | 29.742768 | -95.399474 |
| 85 | MEMORIAL | 29.772691 | -95.575681 |
| 86 | SPRING BRANCH WEST | 29.790836 | -95.546277 |
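The loading cell above is tied to IBM Cloud Object Storage. Outside that environment, a minimal sketch is to read the same file from disk, assuming a local copy named `HTX_SUPERNEIGHBORHOODS.csv` with the three columns shown above. The helper `load_super_neighborhoods` and its column check are hypothetical additions, not part of the original notebook.

```python
import io

import pandas as pd

# Columns the rest of the notebook relies on
REQUIRED_COLUMNS = {'NEIGHBORHOOD', 'LATITUDE', 'LONGITUDE'}


def load_super_neighborhoods(source='HTX_SUPERNEIGHBORHOODS.csv'):
    """Read the super-neighborhood CSV (a path or file-like object) and validate its columns.

    Hypothetical helper: assumes a local copy of the file used in the notebook.
    """
    frame = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError('Missing columns: {}'.format(sorted(missing)))
    return frame


# Usage with an in-memory sample built from two rows of the table above
sample_csv = io.StringIO(
    'NEIGHBORHOOD,LATITUDE,LONGITUDE\n'
    'FOURTH WARD,29.759772,-95.384385\n'
    'DOWNTOWN,29.759464,-95.370603\n'
)
sample = load_super_neighborhoods(sample_csv)
```

Because `pd.read_csv` accepts either a path or a file-like object, the same helper works for a local file and for an in-memory buffer.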


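```python
def get_coordinates_by_address(address, api_name):
    """Geocode an address with Nominatim and return (latitude, longitude)."""
    geolocator = Nominatim(user_agent=api_name)
    location = geolocator.geocode(address)
    if location is None:
        raise ValueError('Could not geocode address: {}'.format(address))
    return location.latitude, location.longitude


def generate_map(df, df_latitude, df_longitude, df_field,
                 map_color, map_fill_color, map_opacity, center=None):
    """Plot one circle marker per row of df, with df[df_field] as the popup label."""
    # Center on the given point, or on the mean of the plotted points if none is given
    if center is None:
        center = [df[df_latitude].mean(), df[df_longitude].mean()]
    map_data = folium.Map(location=center, zoom_start=11)
    for lat, lng, label in zip(df[df_latitude], df[df_longitude], df[df_field]):
        popup = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng], radius=5, popup=popup, color=map_color,
            fill=True, fill_color=map_fill_color, fill_opacity=map_opacity,
        ).add_to(map_data)
    return map_data
```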

## Get Houston coordinates 

```python
latitude, longitude = get_coordinates_by_address('Houston, Texas', 'address_explorer')
print('The geographical coordinates of the Houston area are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of the Houston area are 29.7589382, -95.3676974.


## Show Houston super neighborhoods

```python
map_houston = generate_map(df, 'LATITUDE', 'LONGITUDE', 'NEIGHBORHOOD', 'blue', '#3186cc', 0.7)
map_houston
```

```python
# @hidden_cell
# Define Foursquare credentials (replace the placeholders with your own; never share real keys)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # your Foursquare secret
VERSION = '20180605'  # Foursquare API version
```

```python
def get_neighborhood_data(index, df):
    """Return (latitude, longitude, name) for the row at the given index."""
    neighborhood_latitude = df.loc[index, 'LATITUDE']
    neighborhood_longitude = df.loc[index, 'LONGITUDE']
    neighborhood_name = df.loc[index, 'NEIGHBORHOOD']
    return neighborhood_latitude, neighborhood_longitude, neighborhood_name


def get_foursquare_api_url(LIMIT, radius, CLIENT_ID, CLIENT_SECRET, VERSION,
                           neighborhood_latitude, neighborhood_longitude):
    """Build a Foursquare v2 'explore' request URL for venues around a point."""
    url = ('https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}'
           '&v={}&ll={},{}&radius={}&limit={}').format(
        CLIENT_ID, CLIENT_SECRET, VERSION,
        neighborhood_latitude, neighborhood_longitude, radius, LIMIT)
    return url


def get_foursquare_api_results(url):
    """Send the GET request and return the decoded JSON response."""
    results = requests.get(url).json()
    return results
```
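`get_foursquare_api_url` assembles the query string by hand. A sketch of the same request built with the standard library's `urllib.parse.urlencode`, which percent-encodes each value, follows. The name `build_explore_url` is hypothetical, and its default `radius` and `limit` are assumptions chosen to mirror the 2,000 m radius and 100-venue limit used when the venues are fetched. Foursquare has since introduced a newer Places API, so the v2 `explore` endpoint may no longer accept new credentials.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Foursquare v2 "explore" endpoint used in the notebook
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=2000, limit=100):
    """Hypothetical helper: same query as get_foursquare_api_url, encoded with urlencode."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params))


# Usage with placeholder credentials and the Fourth Ward coordinates
example_url = build_explore_url('MY_CLIENT_ID', 'MY_CLIENT_SECRET', '20180605',
                                29.759772, -95.384385)
# Decode the query string again to inspect what will be sent
query = parse_qs(urlsplit(example_url).query)
```

`urlencode` turns the comma in `ll` into `%2C`; `parse_qs` decodes it back, which makes the parameters easy to check before any request is sent.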



```python
import json
# json_normalize is available as pd.json_normalize in pandas >= 1.0


def get_category_type(row):
    """Return the name of a venue row's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
```
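`get_category_type` looks for either a `categories` or a `venue.categories` field, and `venue.categories` is the column name `pd.json_normalize` produces when it flattens the nested `venue` objects of an `explore` response. A minimal sketch of that pairing follows. The two items are hypothetical and only mimic the shape of `response['groups'][0]['items']` used in the notebook; the first reuses the Eleanor Tinsley Park coordinates from the venue table.

```python
import pandas as pd


def get_category_type(row):
    """Return the name of a venue row's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical items shaped like response['groups'][0]['items']
items = [
    {'venue': {'name': 'Eleanor Tinsley Park',
               'location': {'lat': 29.761440, 'lng': -95.379271},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Venue Without Category',
               'location': {'lat': 29.76, 'lng': -95.38},
               'categories': []}},
]

# Nested dicts become dotted columns such as 'venue.location.lat';
# the categories list is kept as-is in 'venue.categories'
venues = pd.json_normalize(items)
venues['category'] = venues.apply(get_category_type, axis=1)
```

The empty `categories` list is the case the helper guards against; without the length check, `categories_list[0]` would raise an `IndexError`.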

```python
def getNearbyVenues(names, latitudes, longitudes, radius=2000, limit=100):
    """Query Foursquare around each point and return one row per venue found."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # Create the API request URL
        url = get_foursquare_api_url(limit, radius, CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng)

        # Make the GET request and keep the list of recommended items
        results = requests.get(url, timeout=30).json()['response']['groups'][0]['items']

        # Keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None,
        ) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
```

## Display all Houston Venues by Neighborhood

```python
# A 2,000 m search radius with at most 100 venues per point produced the results below
houston_venues = getNearbyVenues(names=df['NEIGHBORHOOD'],
                                 latitudes=df['LATITUDE'],
                                 longitudes=df['LONGITUDE'],
                                 radius=2000)
houston_venues
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | FOURTH WARD | 29.759772 | -95.384385 | Eleanor Tinsley Park | 29.761440 | -95.379271 | Park |
| 1 | FOURTH WARD | 29.759772 | -95.384385 | Lucio's BYOB | 29.758326 | -95.385591 | Bar |
| 2 | FOURTH WARD | 29.759772 | -95.384385 | Buffalo Bayou Park | 29.762068 | -95.391626 | Park |
| 3 | FOURTH WARD | 29.759772 | -95.384385 | Paper Street Crossfit | 29.757435 | -95.385846 | Gym |
| 4 | FOURTH WARD | 29.759772 | -95.384385 | Buffalo Bayou Walk | 29.762177 | -95.375844 | Trail |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5098 | SPRING BRANCH WEST | 29.790836 | -95.546277 | Hotel Sorella CITYCENTRE | 29.780198 | -95.561398 | Hotel |
| 5099 | ADDICKS PARK TEN | 29.813039 | -95.644582 | Bill Archer Dog Park | 29.817540 | -95.647513 | Dog Run |
| 5100 | ADDICKS PARK TEN | 29.813039 | -95.644582 | Bear Creek Community Center | 29.816189 | -95.640549 | Government Building |
| 5101 | ADDICKS PARK TEN | 29.813039 | -95.644582 | Haunted Road | 29.814805 | -95.645856 | Outdoors & Recreation |
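The venues above were requested within a fixed radius of each super neighborhood's coordinates. To check how far a venue actually is from that point, a minimal sketch using the haversine great-circle formula follows; the notebook itself does not compute distances, and the helper names `haversine_m` and `within_radius` as well as the 6,371 km mean Earth radius are assumptions of this sketch.

```python
import math

# Mean Earth radius in metres (a common approximation)
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, point, radius_m):
    """True if point lies within radius_m metres of center; both are (lat, lng) tuples."""
    return haversine_m(*center, *point) <= radius_m


# Fourth Ward and Eleanor Tinsley Park, from the first row of the venue table
fourth_ward = (29.759772, -95.384385)
tinsley_park = (29.761440, -95.379271)
distance = haversine_m(*fourth_ward, *tinsley_park)
```

For this pair the distance comes out at roughly half a kilometre, well inside the search radius. The same function could be applied row by row to the venue table to measure how far each amenity is from its neighborhood's reference point.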


## Filter data by Venue Category 

```python
wellness_venues = houston_venues[houston_venues['Venue Category'].astype(str).str.contains(
    'Restaurant|Gym|Park|School|Trail|River|Hospital|Supermarket|Shop')]
```
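`str.contains` treats the pattern as a regular expression and reports a match wherever one of the alternatives appears as a substring, the same test as `re.search`. That is broad: a category such as `Parking` would also pass because it contains `Park`. The sketch below compares the notebook's pattern with a hypothetical whole-word variant that uses `\b` boundaries; the category names are sample strings, and the helper `is_wellness_category` is illustrative only.

```python
import re

# Pattern used in the notebook: matches any category containing one of these substrings
SUBSTRING_PATTERN = 'Restaurant|Gym|Park|School|Trail|River|Hospital|Supermarket|Shop'

# Hypothetical stricter variant: the same alternatives, but only as whole words
WORD_PATTERN = r'\b(?:Restaurant|Gym|Park|School|Trail|River|Hospital|Supermarket|Shop)\b'


def is_wellness_category(category, pattern):
    """Mimic Series.str.contains(pattern): True if the pattern occurs anywhere in category."""
    return re.search(pattern, category) is not None


for name in ['Dog Park', 'Parking', 'Coffee Shop', 'Shopping Mall', 'Hotel']:
    print(name,
          is_wellness_category(name, SUBSTRING_PATTERN),
          is_wellness_category(name, WORD_PATTERN))
```

The stricter pattern has its own trade-off: it also drops categories like `Shopping Mall`, which may or may not be wanted. Whichever pattern is chosen, it can be passed unchanged to `str.contains`.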


```python
venues_by_neighborhood = wellness_venues[['Neighborhood', 'Venue']]
```

## List all the super neighborhoods with more than 20 venues 
Super neighborhoods with more than 20 of the filtered venues are classified as the best equipped to live in, because their residents can reach the most amenities with the shortest travel times.

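```python
# Count the filtered venues in each super neighborhood
data = venues_by_neighborhood.groupby(['Neighborhood'], as_index=False).count()
# Keep neighborhoods with more than 20 venues, most venues first
data = data[data['Venue'] > 20].sort_values(by=['Venue'], ascending=False)
# Match the column name used in df so the two tables can be merged later
data = data.rename(columns={'Neighborhood': 'NEIGHBORHOOD'})
data.reset_index()
```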

| | index | NEIGHBORHOOD | Venue |
|---|---|---|---|
| 0 | 37 | GULFTON | 59 |
| 1 | 55 | MID WEST | 57 |
| 2 | 35 | GREENWAY / UPPER KIRBY AREA | 55 |
| 3 | 29 | GREATER HEIGHTS | 55 |
| 4 | 76 | SPRING BRANCH WEST | 52 |
| 5 | 53 | MEMORIAL | 49 |
| 6 | 60 | NEARTOWN - MONTROSE | 48 |
| 7 | 56 | MIDTOWN | 48 |
| 8 | 1 | AFTON OAKS / RIVER OAKS AREA | 48 |
| 9 | 83 | WESTCHASE | 47 |
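In the table above, GREENWAY / UPPER KIRBY AREA and GREATER HEIGHTS tie at 55 venues, and `sort_values` with its default sorting algorithm does not guarantee the relative order of equal values. A sketch of the same count-and-filter step with an alphabetical tie-break follows, so that repeated runs list tied neighborhoods in the same order. The function name `rank_neighborhoods` and the toy input are hypothetical.

```python
import pandas as pd


def rank_neighborhoods(venues, threshold=20):
    """Count venues per neighborhood and keep those with more than `threshold`.

    Sorted by venue count (descending), with ties broken alphabetically by name.
    """
    counts = venues.groupby('Neighborhood')['Venue'].count().reset_index(name='Venue')
    counts = counts[counts['Venue'] > threshold]
    return counts.sort_values(['Venue', 'Neighborhood'],
                              ascending=[False, True]).reset_index(drop=True)


# Toy input with hypothetical neighborhood and venue names
toy = pd.DataFrame({
    'Neighborhood': ['B', 'B', 'A', 'A', 'C'],
    'Venue': ['v1', 'v2', 'v3', 'v4', 'v5'],
})
ranked = rank_neighborhoods(toy, threshold=1)
```

Sorting on the name as a secondary key makes the output deterministic without changing which neighborhoods pass the threshold.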


## Now we can see the best neighborhoods on the map

```python
final_data = pd.merge(data, df, on='NEIGHBORHOOD')
all_best = generate_map(final_data, 'LATITUDE', 'LONGITUDE', 'NEIGHBORHOOD', 'red', '#3186cc', 0.7)
all_best
```
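The merge above works because `Neighborhood` was renamed to `NEIGHBORHOOD` first. An alternative sketch joins the differently named key columns directly with `left_on`/`right_on` and passes `validate='one_to_one'`, so pandas raises an error if a neighborhood name appears twice on either side. `attach_coordinates` is a hypothetical helper; the sample rows are taken from the tables above.

```python
import pandas as pd


def attach_coordinates(counts, neighborhoods):
    """Join venue counts (key 'Neighborhood') to coordinates (key 'NEIGHBORHOOD').

    validate='one_to_one' makes pandas raise MergeError if a name is duplicated on either side.
    """
    merged = pd.merge(counts, neighborhoods,
                      left_on='Neighborhood', right_on='NEIGHBORHOOD',
                      how='inner', validate='one_to_one')
    # Both key columns are kept when their names differ; drop the redundant one
    return merged.drop(columns='Neighborhood')


counts = pd.DataFrame({'Neighborhood': ['GULFTON', 'MID WEST'], 'Venue': [59, 57]})
coords = pd.DataFrame({
    'NEIGHBORHOOD': ['MID WEST', 'GULFTON', 'DOWNTOWN'],
    'LATITUDE': [29.737498, 29.716575, 29.759464],
    'LONGITUDE': [-95.513982, -95.48077, -95.370603],
})
merged = attach_coordinates(counts, coords)
```

An inner join keeps the order of the left-hand keys, so the ranking from the venue counts survives the merge. Neighborhoods without enough venues (here DOWNTOWN) simply drop out.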

### Let's now look at the top ten neighborhoods

```python
top10 = final_data.head(10)
top10
```

| | NEIGHBORHOOD | Venue | LATITUDE | LONGITUDE |
|---|---|---|---|---|
| 0 | GULFTON | 59 | 29.716575 | -95.48077 |
| 1 | MID WEST | 57 | 29.737498 | -95.513982 |
| 2 | GREENWAY / UPPER KIRBY AREA | 55 | 29.732596 | -95.433009 |
| 3 | GREATER HEIGHTS | 55 | 29.796099 | -95.399876 |
| 4 | SPRING BRANCH WEST | 52 | 29.790836 | -95.546277 |
| 5 | MEMORIAL | 49 | 29.772691 | -95.575681 |
| 6 | NEARTOWN - MONTROSE | 48 | 29.742768 | -95.399474 |
| 7 | MIDTOWN | 48 | 29.740742 | -95.375769 |
| 8 | AFTON OAKS / RIVER OAKS AREA | 48 | 29.74803 | -95.435028 |
| 9 | WESTCHASE | 47 | 29.727917 | -95.571579 |


```python
best_neighborhoods = generate_map(top10, 'LATITUDE', 'LONGITUDE', 'NEIGHBORHOOD', 'green', '#3186cc', 0.7)
best_neighborhoods
```

Based on the number of nearby amenities, we conclude that the best place for a newcomer to live is inside Houston's inner loop (the area enclosed by Interstate 610). The neighborhoods west of the inner loop but still inside the outer loop are a good second option, and for someone who wants to live outside the outer loop, the neighborhoods to the west are the best choice.

A more exhaustive study could also take cost of living into account: by estimating the average monthly budget a person needs to live in each of these ten neighborhoods, one could rank the best neighborhoods for a given monthly salary.
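The budget analysis proposed above could be sketched as follows. The sketch assumes a per-neighborhood estimate of monthly living costs, which the notebook does not collect, and a hypothetical affordability rule that living costs should take at most 30% of monthly salary. Only neighborhoods whose estimated cost fits that budget are kept, ranked by venue count. `affordable_ranking` is an illustrative name, not part of the original analysis.

```python
def affordable_ranking(venue_counts, monthly_cost, monthly_salary, max_share=0.3):
    """Rank neighborhoods by venue count, keeping only those within budget.

    venue_counts: {neighborhood: number of venues}, e.g. from the top-ten table
    monthly_cost: {neighborhood: estimated monthly cost}; hypothetical data source
    max_share: assumed maximum share of salary spent on living costs
    """
    budget = monthly_salary * max_share
    affordable = [
        (name, count)
        for name, count in venue_counts.items()
        if name in monthly_cost and monthly_cost[name] <= budget
    ]
    # Most venues first; ties broken alphabetically
    return sorted(affordable, key=lambda item: (-item[1], item[0]))
```

Neighborhoods without a cost estimate are skipped rather than assumed affordable. Raising the salary or `max_share` widens the set of candidates without changing their relative order.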