# CAPSTONE PROJECT: THE BATTLE OF THE NEIGHBORHOODS

## Week 1

### Diego González C.

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('\nLibraries imported.')


Libraries imported.


### 1. A description of the problem and a discussion of the background.

Houston, Texas is one of the most important cities in the US. Its vast number of food, sports and entertainment venues, along with its medical and educational services, makes it a very attractive city to live in. In this project I analyze which neighborhoods are the best places to look for an apartment or house, taking into consideration the following characteristics:
1. House or apartment rental prices not exceeding $1,800 USD (this threshold may vary depending on the project's findings)
2. Attractive venues such as restaurants, bars, cafes and museums nearby
3. Public transportation service nearby
4. Hospitals or health care services nearby
5. Education services nearby (priority on universities)

### Interested Audience

I will attempt to construct a methodology that can be replicated in any city in the world. Anyone who is considering moving to a new city wants to find the best possible location for the best price, so this tool will hopefully be useful to a large number of people.

### 2. A description of the data and how it will be used to solve the problem.

Most of the data will be obtained from Foursquare location data. Since Foursquare provides data on venues, public transportation, hospitals and education centers, it will be our most important source. The information gathered from Foursquare will be:

* Attractive venues (Restaurants, bars, museums, etc.)
* Public transportation venues
* Hospitals 
* Education centers

In addition, information about apartment rental prices in Houston will be needed.

### How data will be used to solve the problem

* Houston neighborhoods will be clustered into groups, each characterized by the top 10 attractive venues of its neighborhoods, using Foursquare and our geopy tools.
* In the same way, the locations of public transportation stations, hospitals and schools will be obtained from Foursquare and geopy. These venues will be clustered separately.
* Apartment and house rental prices and locations will be gathered from open data sources: government agencies, real estate agencies, etc.
* All of the above information will be depicted on a map using Folium, giving a visual representation that supports better decision making.
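
The clustering step could look roughly like the sketch below. It is only an illustration under assumptions: the `venues` table, its neighborhood names and its categories are made-up sample data, and the number of clusters (2) and the top-3 cutoff are arbitrary choices for a small example, not results from this project.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample data: one row per (neighborhood, venue category) pair.
venues = pd.DataFrame({
    "neighborhood": ["Midtown", "Midtown", "Midtown", "Montrose", "Montrose",
                     "Heights", "Heights", "Downtown", "Downtown", "Downtown"],
    "category": ["Bar", "Bar", "Coffee Shop", "Coffee Shop", "Museum",
                 "Coffee Shop", "Park", "Theater", "Bar", "Theater"],
})

# One-hot encode the categories, then average per neighborhood
# to obtain the relative frequency of each category.
onehot = pd.get_dummies(venues["category"], dtype=float)
onehot.insert(0, "neighborhood", venues["neighborhood"])
freq = onehot.groupby("neighborhood").mean()

# Cluster the neighborhoods on their category-frequency profiles.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq)
clusters = pd.Series(kmeans.labels_, index=freq.index, name="cluster")

# Most frequent categories per neighborhood (top 3 here; the plan above uses the top 10).
top_categories = {
    name: row.nlargest(3).index.tolist() for name, row in freq.iterrows()
}

print(clusters)
print(top_categories)
```

With real Foursquare results, `venues` would be built from the venue name, category and neighborhood of each returned item, and the cutoff would be raised to 10.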


### Our decision for the best neighborhood to live in will consider:

* Is the average rental price in the neighborhood below our previously stated budget ($1,800 USD)?
* Are there attractive venues nearby?
* Are there any patterns for the distribution of popular venues and public services across Houston?
* How do the housing prices relate to the presence of schools, hospitals and venues nearby?
* Final recommendations from the information gathered.
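
As a rough illustration of the first question, the budget check could be sketched as follows. The `listings` table, its column names and its rent values are hypothetical placeholders; real figures would come from whichever open data source is eventually used.

```python
import pandas as pd

BUDGET_USD = 1800  # threshold stated in the problem description

# Hypothetical rental listings (made-up values, for illustration only).
listings = pd.DataFrame({
    "neighborhood": ["Midtown", "Midtown", "Montrose", "Montrose", "Heights"],
    "rent_usd": [1650, 1750, 1900, 2100, 1500],
})

# Average rent per neighborhood, keeping only those within budget.
avg_rent = listings.groupby("neighborhood")["rent_usd"].mean()
affordable = avg_rent[avg_rent <= BUDGET_USD].sort_values()

print(affordable)
```

Sorting the result puts the cheapest neighborhoods first, which makes it easy to compare them against the venue clusters later.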

## INITIAL DATA FROM FOURSQUARE

In [54]:

address = 'Houston, TX'

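geolocator = Nominatim(user_agent="houston_explorer") # recent geopy versions require a user_agent; this name is an arbitrary placeholder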
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))



The geographical coordinates of Houston, TX are 29.7589382, -95.3676974.


In [62]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: XXXXXXXXXXXXXXXXXXXXXX
CLIENT_SECRET: XXXXXXXXXXXXXXXXXXXXXX


In [56]:

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 5000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=XXXXXXXXXXXXXXXXXXXXXX&client_secret=XXXXXXXXXXXXXXXXXXXXXX&v=20180605&ll=29.7589382,-95.3676974&radius=5000&limit=100'

In [57]:
results = requests.get(url).json()
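
A slightly more defensive variant of this request is sketched below. It passes the query as a `params` dictionary so the credentials never need to be formatted into a displayed URL string, sets a timeout, and raises on HTTP errors. The helper names `build_explore_params` and `fetch_explore` are hypothetical and not part of the original notebook; the endpoint and parameter names are the ones used in the URL above.

```python
import requests

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(client_id, client_secret, version, lat, lng,
                         radius=5000, limit=100):
    """Assemble the query parameters used by the explore request above."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }


def fetch_explore(params, timeout=10):
    """Send the request and return the decoded JSON, raising on HTTP errors."""
    response = requests.get(EXPLORE_URL, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()


# Usage with the variables defined earlier in the notebook:
# params = build_explore_params(CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude)
# results = fetch_explore(params)
```

Keeping the parameters in a dictionary also makes it easy to inspect them without printing the secret alongside the rest of the URL.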

In [58]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the flattened column name
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [59]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat','venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print(nearby_venues.head())


                                   name             categories        lat  \
0                         Alley Theatre                Theater  29.761671   
1  Hobby Center for the Performing Arts  Performing Arts Venue  29.761526   
2                Wortham Theater Center                Theater  29.763353   
3                          Conservatory            Beer Garden  29.760427   
4        Flying Saucer Draught Emporium               Beer Bar  29.759116   

         lng  
0 -95.365313  
1 -95.369376  
2 -95.365663  
3 -95.361570  
4 -95.363216  


In [60]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [61]:

# create map of Houston venues using latitude and longitude values
map_hou = folium.Map(location=[latitude, longitude], zoom_start=14)

# add markers to map
for lat, lng, label in zip(nearby_venues['lat'], nearby_venues['lng'], nearby_venues['name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='lightblue',
        fill_opacity=0.7,
    ).add_to(map_hou)  
    
map_hou