# Capstone Final Project - Battle of the Neighborhoods

### Applied Data Science Capstone IBM/Coursera Course

## Table of Contents

 * [Introduction](#introduction)
 * [Data](#data)
 * [Methods](#methods)
 * [Analysis](#analysis)
 * [Results and Discussion](#results)
 * [Conclusion](#conclusion)

# Introduction <a name="introduction"></a>
The idea of this study is to settle, once and for all, which city is best: San Francisco or New York City. I will frame the question as which city gives "more bang for your buck", and answer it by comparing the median rent of neighborhoods in each city (the buck) with the venues located in those neighborhoods (the bang). The results should help residents of either city who are considering a move to the other. They should also help current residents of San Francisco and New York City find other neighborhoods within their own city, whether they want to pay less in rent or live near certain venues. Examples include a family that wants to move to an area with more parks and playgrounds, a coffee enthusiast who wants to live near as many coffee shops as possible, or an average person looking to pay less rent in a neighborhood comparable to their current one.

# Data <a name="data"></a>


# Methods <a name="methods"></a>
## Cleaning Datasets

#### Import packages to clean neighborhood and transportation data.

In [6]:
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('Libraries imported.')

Libraries imported.


#### Defining file paths so each csv can be read into a Pandas dataframe

In [7]:
#sf rent/location
file_path = 'https://raw.githubusercontent.com/d-alvear/Coursera_Capstone/master/data/medianrent_1B_SF.csv'
#sf muni stops
file_path1 = 'https://raw.githubusercontent.com/d-alvear/Coursera_Capstone/master/data/Muni_Stops.csv'

#nyc rent/location
file_path2 = 'https://raw.githubusercontent.com/d-alvear/Coursera_Capstone/master/data/medianrent_1B_NYC.csv'
#nyc subway stations
file_path3 = 'https://raw.githubusercontent.com/d-alvear/Coursera_Capstone/master/data/NYC_SUBWAY.csv'

#### The NYC and SF rent datasets only need their columns renamed

In [8]:
df_sfRent = pd.read_csv(file_path)
df_sfRent.head()
#dataframe looks good, just rename some of the columns
df_sfRent.rename(columns={'MedianRent':'Rent', 'Lat':'Latitude', 'Long':'Longitude'}, inplace = True)

In [9]:
df_nycRent = pd.read_csv(file_path2)
df_nycRent.head()
#dataframe looks good, just rename some of the columns
df_nycRent.rename(columns={'MedianRent':'Rent', 'Lat':'Latitude', 'Long':'Longitude'}, inplace = True)

#### Now I will clean the transportation data for each city

In [10]:
df_sfMuni = pd.read_csv(file_path1)

df_sfMuni.drop(['OBJECTID','SIGNUPID','TRAPEZESTOPABBR',
                'RUCUSSTOPABBR','STOPID','ACCESSIBILITYMASK',
                'ATSTREET','ONSTREET','POSITION','ORIENTATION',
                'SERVICEPLANNINGSTOPTYPE','SHELTER','INSERT_TIMESTAMP', 
                'SDE_ID','SUPERVISOR_DISTRICT','point'], axis=1, inplace=True)

df_sfMuni.rename(columns={'STOPNAME':'Stop Name', 'LATITUDE':'Latitude', 'LONGITUDE':'Longitude'}, inplace = True)
df_sfMuni.head()

| | Stop Name | Latitude | Longitude |
|---|---|---|---|
| 0 | Beale St & Howard St W-NS/SB | 37.78993 | -122.394461 |
| 1 | La Playa St&Cabrillo St SW-FS/BZ | 37.773214 | -122.51006 |
| 2 | Fulton St&La Playa SE-FS/BZ | 37.77134 | -122.50939 |
| 3 | Fulton St&46TH Ave SE-FS/BZ | 37.77148 | -122.50631 |
| 4 | Fulton St&43RD Ave SE-FS/BZ | 37.771612 | -122.503296 |


In [11]:
df_nycSubway = pd.read_csv(file_path3)
df_nycSubway.drop(['URL', 'OBJECTID', 'NOTES'], axis = 1, inplace = True)

station_location = df_nycSubway['the_geom'].str.split(' ', expand=True)
station_location.head()

| | 0 | 1 |
|---|---|---|
| 0 | -73.99106999861966 | 40.73005400028978 |
| 1 | -74.00019299927328 | 40.71880300107709 |
| 2 | -73.98384899986625 | 40.76172799961419 |
| 3 | -73.97499915116808 | 40.68086213682956 |
| 4 | -73.89488591154061 | 40.66471445143568 |


In [12]:
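# the_geom lists longitude first; str.split returns strings, so convert to float
df_nycSubway['Latitude'] = station_location[1].astype(float)
df_nycSubway['Longitude'] = station_location[0].astype(float)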

df_nycSubway.rename(columns = {'NAME': 'Station Name', 'LINE': 'Line'}, inplace = True)
df_nycSubway.drop(['the_geom'], axis = 1, inplace = True)
df_nycSubway.head()

| | Station Name | Line | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Astor Pl | 4-6-6 Express | 40.73005400028978 | -73.99106999861966 |
| 1 | Canal St | 4-6-6 Express | 40.71880300107709 | -74.00019299927328 |
| 2 | 50th St | 1-2 | 40.76172799961419 | -73.98384899986625 |
| 3 | Bergen St | 2-3-4 | 40.68086213682956 | -73.97499915116808 |
| 4 | Pennsylvania Ave | 3-4 | 40.66471445143568 | -73.89488591154061 |
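The split on a single space works because each `the_geom` value here is a plain `longitude latitude` pair. As a hedged alternative, the sketch below parses one value into numeric coordinates and also tolerates a WKT-style `POINT (lon lat)` wrapper. The function name `parse_point` is hypothetical, and the WKT form is an assumption about how such a column might appear in other exports, not something shown in this notebook.

```python
import re

# Accepts "lon lat" or "POINT (lon lat)"; the longitude (x) comes first in both forms.
_POINT_RE = re.compile(
    r"^\s*(?:POINT\s*\()?\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)?\s*$",
    re.IGNORECASE,
)

def parse_point(text):
    """Return (latitude, longitude) as floats, or None if the text is not a point."""
    match = _POINT_RE.match(str(text))
    if match is None:
        return None
    lon, lat = (float(group) for group in match.groups())
    return lat, lon

# First station from the split output above (Astor Pl)
print(parse_point("-73.99106999861966 40.73005400028978"))
print(parse_point("POINT (-73.99106999861966 40.73005400028978)"))
```

On the dataframe, this could be applied with `df_nycSubway['the_geom'].map(parse_point)`, which yields numeric values rather than the strings produced by `str.split`.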


In [None]:
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

In [13]:
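# index=False keeps the row index from being written as an extra unnamed column
df_nycRent.to_csv('nyc_rent.csv', index=False)
df_sfRent.to_csv('sf_rent.csv', index=False)
df_nycSubway.to_csv('nyc_stations.csv', index=False)
df_sfMuni.to_csv('sf_munistop.csv', index=False)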

In [14]:
!pip install folium

import folium
from folium import plugins



#### Now I will create a map of San Francisco with neighborhood markers

In [21]:
latitude = 37.7749 
longitude = -122.4194
# creating map of SF using latitude and longitude values
map_sf = folium.Map(location=[latitude, longitude], zoom_start=12.25)

# add markers to map
for lat, lng, neighborhood, rent in zip(df_sfRent['Latitude'], df_sfRent['Longitude'], df_sfRent['Neighborhood'], df_sfRent['Rent']):
    label = 'Neighborhood: {}, Median Rent: ${}'.format(neighborhood, rent)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sf)

# instantiate a marker cluster object for the Muni stops in the dataframe
stops = plugins.MarkerCluster().add_to(map_sf)

# loop through the dataframe and add each stop to the marker cluster
for lat, lng in zip(df_sfMuni['Latitude'], df_sfMuni['Longitude']):
    folium.Marker(
        location=[lat, lng],
        icon=None
    ).add_to(stops)


map_sf
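The map shows transit stops clustered around the neighborhood markers, but only visually. One possible way to turn that overlay into a number is to count the stops within a fixed distance of a neighborhood's coordinates. The sketch below is an illustration under stated assumptions, not the method used in this notebook: the helper names `haversine_km` and `count_stops_within` and the 3 km radius are hypothetical. The two stops come from the `df_sfMuni.head()` output, and the center is the point used for `map_sf`.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def count_stops_within(center, stops, radius_km=0.5):
    """Count the (lat, lon) stops that lie within radius_km of center."""
    lat0, lon0 = center
    return sum(
        1 for lat, lon in stops
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    )

# Two Muni stops from the head() output, measured from the map_sf center point
stops = [(37.78993, -122.394461), (37.773214, -122.51006)]
print(count_stops_within((37.7749, -122.4194), stops, radius_km=3.0))
```

With the dataframes above, `stops` could be built with `zip(df_sfMuni['Latitude'], df_sfMuni['Longitude'])`, and the function called once per row of `df_sfRent`.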

# Analysis <a name="analysis"></a>

# Results and Discussion <a name="results"></a>

# Conclusion <a name="conclusion"></a>