# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

This project will try to find best possible locations for opening a new restaurant **of any type** in **Los Angeles**. There are already many restaurants in Los Angeles, so the analysis will focus less dense areas of restaurants with medium or high population density. The neighborhoods surrounding our candidate locations should help provide insights into which type of restaurants might be successful, both from price point and cuisine perspective.

The analysis will exclude unincorporated areas, as they tend to be of very low population density and do not make a good candidate for this type of analysis. Analysis will include all neighborhoods in L.A. County, as well as cities that are independant of L.A. county of which there are many (Beverley Hills, West Hollywood, etc.).

Data science techniques wil be utilized to drive this analysis and recommendations.

## Data <a name="data"></a>

Based on the problem statement, there will be a need for a list of all Los Angeles neighborhoods, with a way to filter out unincoprated areas, with latitude and longitude for each. This data set is then joined with data from FourSquare for Los Angeles restaurants.

Los Angeles neightborhood data is available here:

https://usc.data.socrata.com/api/views/9utn-waje/rows.csv?accessType=DOWNLOAD

However the data set has bad data which make the data import challenging. The data set has been cleaned manually and placed here:

http://glacier2.verio.com/data/la_neighborhoods.csv

### Neighborhood Data

In [3]:
import pandas as pd

url = "http://glacier2.verio.com/data/la_neighborhoods.csv"

la_neighborhood_df = pd.read_csv(url)
print("Completed CSV Read")

Completed CSV Read


Let's take a look at the data set.

In [4]:
la_neighborhood_df.head()

Unnamed: 0,set,slug,the_geom,kind,external_i,name,display_na,sqmi,type,name_1,slug_1,latitude,longitude,location
0,L.A. County Neighborhoods (Current),acton,MULTIPOLYGON (((-118.20261747920541 34.5389897...,L.A. County Neighborhood (Current),acton,Acton,Acton L.A. County Neighborhood (Current),39.339109,unincorporated-area,,,-118.16981,34.497355,POINT(34.497355239240846 -118.16981019229348)
1,L.A. County Neighborhoods (Current),adams-normandie,MULTIPOLYGON (((-118.30900800000012 34.0374109...,L.A. County Neighborhood (Current),adams-normandie,Adams-Normandie,Adams-Normandie L.A. County Neighborhood (Curr...,0.80535,segment-of-a-city,,,-118.300208,34.031461,POINT(34.031461499124156 -118.30020800000011)
2,L.A. County Neighborhoods (Current),agoura-hills,MULTIPOLYGON (((-118.76192500000009 34.1682029...,L.A. County Neighborhood (Current),agoura-hills,Agoura Hills,Agoura Hills L.A. County Neighborhood (Current),8.14676,standalone-city,,,-118.759884,34.146736,POINT(34.146736499122795 -118.75988450000015)
3,L.A. County Neighborhoods (Current),agua-dulce,MULTIPOLYGON (((-118.2546773959221 34.55830403...,L.A. County Neighborhood (Current),agua-dulce,Agua Dulce,Agua Dulce L.A. County Neighborhood (Current),31.462632,unincorporated-area,,,-118.317104,34.504927,POINT(34.504926999796837 -118.3171036690717)
4,L.A. County Neighborhoods (Current),alhambra,MULTIPOLYGON (((-118.12174700000014 34.1050399...,L.A. County Neighborhood (Current),alhambra,Alhambra,Alhambra L.A. County Neighborhood (Current),7.623814,standalone-city,,,-118.136512,34.085539,POINT(34.085538999123571 -118.13651200000021)


The key fields that will be used are are name, type, and latitude and longitude.

In [5]:
la_neighborhood_df.shape

(269, 14)

269 distinct neightborhoods. Now let's remove the unincoporated areas.

In [6]:
# Ignore unincorporated areas
options = ['segment-of-a-city', 'standalone-city'] 
    
# selecting rows based on condition 
rslt_df = la_neighborhood_df.loc[la_neighborhood_df['type'].isin(options)] 

In [7]:
rslt_df.shape

(199, 14)

70 unicorporated areas have been removed. Let's subset and correct the data set.

In [8]:
la_neigh_df = rslt_df[["name", "type", "latitude", "longitude"]]

# It appears latitude and longitude are reversed, so perform a rename

la_neigh_df.rename(columns={"latitude": "lng", "longitude": "lat"}, inplace=True)

la_neigh_df.columns

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Index(['name', 'type', 'lng', 'lat'], dtype='object')

In [9]:
# Finally check some of the data
la_neigh_df.head()

Unnamed: 0,name,type,lng,lat
1,Adams-Normandie,segment-of-a-city,-118.300208,34.031461
2,Agoura Hills,standalone-city,-118.759884,34.146736
4,Alhambra,standalone-city,-118.136512,34.085539
6,Artesia,standalone-city,-118.080101,33.866896
9,Arcadia,standalone-city,-118.030419,34.13323


### Restaurant Data

Pull restaurant data down from Foursquare.

In [4]:
#!pip install geopy

Collecting geopy
  Using cached https://files.pythonhosted.org/packages/0c/67/915668d0e286caa21a1da82a85ffe3d20528ec7212777b43ccd027d94023/geopy-2.1.0-py3-none-any.whl
Collecting geographiclib<2,>=1.49 (from geopy)
  Using cached https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.1.0


In [5]:
import geopy

C:\Users\Daniel\Anaconda3
base


In [10]:
# Get Los Angeles latitude and longitude
from geopy.geocoders import Nominatim 

# Get the coordinates of Los Angeles
address = 'Los Angeles'

geolocator = Nominatim(user_agent="la_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Los Angeles latitude and longitude: {}, {}.'.format(latitude, longitude))

Los Angeles latitude and longitude: 34.0536909, -118.242766.


Use folium and map Los Angeles as a starting point, along with our neighborhood data frame.

In [1]:
!pip install folium

Collecting folium
  Using cached https://files.pythonhosted.org/packages/c3/83/e8cb37afc2f016a1cf4caab8d22caf7fe4156c4c15230d8abc9c83547e0c/folium-0.12.1-py2.py3-none-any.whl
Collecting branca>=0.3.0 (from folium)
  Using cached https://files.pythonhosted.org/packages/61/1f/570b0615c452265d57e4114e633231d6cd9b9d275256778a675681e4f711/branca-0.4.2-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


In [13]:
import folium

la_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# Markers
for lat, lng, name, type_loc in zip(la_neigh_df['lat'], la_neigh_df['lng'], la_neigh_df['name'], la_neigh_df['type']): 
    #print(name)
    label = '{},{}'.format(name, type_loc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=20,
        popup=label,
        color='gray',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(la_map)  

In [14]:
la_map

In [16]:
# FourSquare set up
CLIENT_ID = 'O0HJUMSBLMA4FOPTXPXA0IO0P4DEWM2XMKD0IJ5Q0ZJRPUV3' # your Foursquare 
CLIENT_SECRET = 'O0L2WTWXSGY10DARU2CBPI450RB00LARJRIZ5FZ2KF01GQYW' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
radius = 500


url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

Your credentails:
CLIENT_ID: O0HJUMSBLMA4FOPTXPXA0IO0P4DEWM2XMKD0IJ5Q0ZJRPUV3
CLIENT_SECRET:O0L2WTWXSGY10DARU2CBPI450RB00LARJRIZ5FZ2KF01GQYW


'https://api.foursquare.com/v2/venues/search?client_id=O0HJUMSBLMA4FOPTXPXA0IO0P4DEWM2XMKD0IJ5Q0ZJRPUV3&client_secret=O0L2WTWXSGY10DARU2CBPI450RB00LARJRIZ5FZ2KF01GQYW&ll=34.0536909,-118.242766&v=20180604&radius=500&limit=100'

In [15]:
# Function to retrieve venue categories
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [18]:
import requests
from pandas.io.json import json_normalize

# Results from the GET
results = requests.get(url).json()
results

# Pull out the venues
venues = results['response']['venues']

# dataframe
venues_dataframe = json_normalize(venues)
venues_dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,...,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d126941735', 'name': 'G...",,,,,,,False,52ebf057498ee83c88d9c1f9,,...,"[Los Angeles, CA, United States]","[{'label': 'display', 'lat': 34.05378746696942...",34.053787,-118.242369,,,CA,Los Angeles Mayor's Office of Economic Develop...,v-1624285480,
1,"[{'id': '4bf58dd8d48988d126941735', 'name': 'G...",,,,,,,False,4d65fb1dc2ccb60ca24c6bac,200 N Spring St,...,"[200 N Spring St, Los Angeles, CA 90012, Unite...","[{'label': 'display', 'lat': 34.05415339478075...",34.054153,-118.243117,,90012.0,CA,Los Angeles Civic Center,v-1624285480,
2,"[{'id': '4bf58dd8d48988d129941735', 'name': 'C...",,,,,,,False,4b38d6b3f964a520fb5025e3,200 N Main St,...,"[200 N Main St (Temple Street), Los Angeles, C...","[{'label': 'display', 'lat': 34.05301072481605...",34.053011,-118.241863,,90012.0,CA,James K. Hahn City Hall East Building,v-1624285480,
3,"[{'id': '4bf58dd8d48988d129941735', 'name': 'C...",,,,,,,False,4b5113edf964a520314127e3,200 N Spring St,...,"[200 N Spring St (at Temple Ave), Los Angeles,...","[{'label': 'display', 'lat': 34.05348417688625...",34.053484,-118.242478,Civic Center,90012.0,CA,Los Angeles City Hall,v-1624285480,75727220.0
4,"[{'id': '4bf58dd8d48988d129941735', 'name': 'C...",,,,,,,False,4d9ccfabc593a1cd8dff5119,Los Angeles City Hall,...,"[Los Angeles City Hall, Los Angeles, CA 90012,...","[{'label': 'display', 'lat': 34.05393149240509...",34.053931,-118.243169,,90012.0,CA,Office of Mayor Eric Garcetti,v-1624285480,


In [19]:
# Filter the categories and clean up the venue names
filtered_columns = ['name', 'categories'] + [col for col in venues_dataframe.columns if col.startswith('location.')] + ['id']
venues_df_filtered = venues_dataframe.loc[:, filtered_columns]
venues_df_filtered['categories'] = venues_df_filtered.apply(get_category_type, axis=1)
venues_df_filtered.columns = [column.split('.')[-1] for column in venues_df_filtered.columns]
venues_df_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Los Angeles Mayor's Office of Economic Develop...,Government Building,,US,Los Angeles,United States,,38,"[Los Angeles, CA, United States]","[{'label': 'display', 'lat': 34.05378746696942...",34.053787,-118.242369,,,CA,52ebf057498ee83c88d9c1f9
1,Los Angeles Civic Center,Government Building,200 N Spring St,US,Los Angeles,United States,,60,"[200 N Spring St, Los Angeles, CA 90012, Unite...","[{'label': 'display', 'lat': 34.05415339478075...",34.054153,-118.243117,,90012.0,CA,4d65fb1dc2ccb60ca24c6bac
2,James K. Hahn City Hall East Building,City Hall,200 N Main St,US,Los Angeles,United States,Temple Street,112,"[200 N Main St (Temple Street), Los Angeles, C...","[{'label': 'display', 'lat': 34.05301072481605...",34.053011,-118.241863,,90012.0,CA,4b38d6b3f964a520fb5025e3
3,Los Angeles City Hall,City Hall,200 N Spring St,US,Los Angeles,United States,at Temple Ave,35,"[200 N Spring St (at Temple Ave), Los Angeles,...","[{'label': 'display', 'lat': 34.05348417688625...",34.053484,-118.242478,Civic Center,90012.0,CA,4b5113edf964a520314127e3
4,Office of Mayor Eric Garcetti,City Hall,Los Angeles City Hall,US,Los Angeles,United States,,45,"[Los Angeles City Hall, Los Angeles, CA 90012,...","[{'label': 'display', 'lat': 34.05393149240509...",34.053931,-118.243169,,90012.0,CA,4d9ccfabc593a1cd8dff5119


In [20]:
# Check the size
venues_df_filtered.shape

(100, 16)

In [21]:
# Nearby venues function
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        #print(lat)
        #print(lng)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lng, 
            lat, 
            radius, 
            LIMIT)
            
        #print(url)
        
        # make the GET request
        #results = requests.get(url).json()["response"]['groups'][0]['items']
        results = requests.get(url).json()['response'].get('groups',[{}])[0].get('items', [])
        
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [22]:
names=la_neighborhood_df['name']

In [23]:
# The neighborhood that have venues in La
la_venues = getNearbyVenues(names=names,
                                   latitudes=rslt_df['latitude'],
                                   longitudes=rslt_df['longitude']
                                  )

Acton
Adams-Normandie
Agoura Hills
Agua Dulce
Alhambra
Alondra Park
Artesia
Altadena
Angeles Crest
Arcadia
Arleta
Arlington Heights
Athens
Atwater Village
Avalon
Avocado Heights
Azusa
Vermont-Slauson
Baldwin Hills/Crenshaw
Baldwin Park
Bel-Air
Bellflower
Bell Gardens
Green Valley
Bell
Beverly Crest
Beverly Grove
Burbank
Koreatown
Beverly Hills
Beverlywood
Boyle Heights
Bradbury
Brentwood
Broadway-Manchester
Calabasas
Canoga Park
Carson
Carthay
Castaic Canyons
Chatsworth
Castaic
Central-Alameda
Century City
Cerritos
Charter Oak
Chatsworth Reservoir
Chesterfield Square
Cheviot Hills
Chinatown
Citrus
Claremont
Northridge
Commerce
Compton
Cypress Park
La Mirada
Covina
Cudahy
Culver City
Del Aire
Del Rey
Desert View Highlands
Diamond Bar
Downey
Downtown
Duarte
Eagle Rock
East Compton
East Hollywood
East La Mirada
Elizabeth Lake
East Los Angeles
East Pasadena
East San Gabriel
Echo Park
El Monte
El Segundo
El Sereno
Elysian Park
Elysian Valley
Vermont Square
Encino
Exposition Park
Fairfax
Flo

In [24]:
la_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Acton,-118.300208,34.031461,7-Eleven,34.033027,-118.299960,Convenience Store
1,Acton,-118.300208,34.031461,Shell,34.033095,-118.300025,Gas Station
2,Acton,-118.300208,34.031461,El Cusuco Restaurant,34.032527,-118.298860,Food
3,Acton,-118.300208,34.031461,Little Xian,34.032292,-118.299465,Sushi Restaurant
4,Acton,-118.300208,34.031461,Sushi Delight,34.032501,-118.299454,Sushi Restaurant
5,Acton,-118.300208,34.031461,Tacos La Estrella,34.032230,-118.300757,Taco Place
6,Acton,-118.300208,34.031461,El Rincon Hondureño,34.032527,-118.298860,Latin American Restaurant
7,Acton,-118.300208,34.031461,Studio 26,34.032464,-118.301678,Recording Studio
8,Acton,-118.300208,34.031461,Orange Door Sushi,34.032485,-118.299368,Sushi Restaurant
9,Acton,-118.300208,34.031461,confidence books.com,34.030469,-118.297048,Bookstore


In [25]:
la_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Acton,13,13,13,13,13,13
Adams-Normandie,30,30,30,30,30,30
Agoura Hills,14,14,14,14,14,14
Agua Dulce,26,26,26,26,26,26
Alhambra,26,26,26,26,26,26
Alondra Park,5,5,5,5,5,5
Altadena,22,22,22,22,22,22
Angeles Crest,4,4,4,4,4,4
Arcadia,10,10,10,10,10,10
Arleta,8,8,8,8,8,8


In [26]:
print('Number of unique categories is {}.'.format(len(la_venues['Venue Category'].unique())))

Number of unique categories is 290.


Finally, filter out all venues that are not restaurants.

In [27]:
options = ['Fast Food Restaurant' ,'Breakfast Spot',
 'Café', 'Restaurant', 'Indian Restaurant', 'BBQ Joint' ,'Burger Joint',
 'American Restaurant', 'Pizza Place', 'Brewery' 'Thai Restaurant',
 'Deli / Bodega', 'Mexican Restaurant','Sushi Restaurant' , 'Taco Place',
 'Hawaiian Restaurant',
 'Taiwanese Restaurant', 'Vegetarian / Vegan Restaurant',
 'Vietnamese Restaurant' ,'Japanese Restaurant' ,
 'Korean Restaurant', 
 'Donburi Restaurant', 'Seafood Restaurant',  'Dumpling Restaurant',
 'Mediterranean Restaurant', 'Southern / Soul Food Restaurant' 'Diner',
 'Udon Restaurant', 'Empanada Restaurant', 'Ramen Restaurant' 'Cuban Restaurant',
 'Korean BBQ Restaurant', 'Hotel Bar', 'Brazilian Restaurant', 'Ethiopian Restaurant',
 'French Restaurant', 'Cajun / Creole Restaurant' ,'Filipino Restaurant',
 'Dim Sum Restaurant',  'Greek Restaurant' ,'Noodle House' , 'New American Restaurant',
 'Middle Eastern Restaurant', 'Falafel Restaurant',
 'Persian Restaurant', 'Caribbean Restaurant', 'Andhra Restaurant' ,
 'Russian Restaurant'
  ] 

# selecting rows based on condition 
la_rest_df = la_venues.loc[la_venues['Venue Category'].isin(options)] 
la_rest_df

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
3,Acton,-118.300208,34.031461,Little Xian,34.032292,-118.299465,Sushi Restaurant
4,Acton,-118.300208,34.031461,Sushi Delight,34.032501,-118.299454,Sushi Restaurant
5,Acton,-118.300208,34.031461,Tacos La Estrella,34.032230,-118.300757,Taco Place
8,Acton,-118.300208,34.031461,Orange Door Sushi,34.032485,-118.299368,Sushi Restaurant
13,Adams-Normandie,-118.759884,34.146736,El Pollo Loco,34.144732,-118.761088,Fast Food Restaurant
14,Adams-Normandie,-118.759884,34.146736,Sushi Raku,34.148230,-118.760163,Sushi Restaurant
16,Adams-Normandie,-118.759884,34.146736,Urbane Cafe,34.146573,-118.758956,Café
17,Adams-Normandie,-118.759884,34.146736,Jinky's Kanan Cafe,34.146280,-118.756833,Breakfast Spot
18,Adams-Normandie,-118.759884,34.146736,Boar Dough Tasting Room,34.144237,-118.756564,Restaurant
19,Adams-Normandie,-118.759884,34.146736,Lal Mirch,34.147822,-118.760536,Indian Restaurant


We now have two datasets, one, the neightborhoods, the othe, restaurants. We will combine and use theswe datasets to perfrom analysis.

## Methodology <a name="methodology"></a>

## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>