# The Battle of Neighborhoods

## Introduction

The University of Southern California, through its Neighborhood Data for Social Change (NDSC) platform, provides resources aimed at civic actors who want to learn about their neighborhood. The platform covers several areas of study, such as demography, education, environment, and health, among others. Its demographic dataset describes the population by age distribution, households, and race and ethnicity, among other variables. This project uses that data to understand the population distribution in Los Angeles by race and ethnicity: where the different groups are located, and how the neighborhoods are clustered for the largest groups of people in Los Angeles.
To do that, the idea is to find the two largest groups in Los Angeles using the NDSC classification (White, Black, Hispanic, Asian...), cluster them with the help of the Foursquare API, and compare them to understand the similarities and differences between the groups.

This analysis matters because people who want to invest or live in one of those neighborhoods can understand the influence of race and ethnicity in their neighborhoods and make decisions based on it.


## Data Sources


1. <a href="#item1">Understanding the NDSC dataset Race & Ethnicity</a>

2. <a href="#item2">Understanding the Foursquare API</a>



## Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available in pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Understanding the NDSC dataset: Race & Ethnicity

The Race & Ethnicity dataset contains seven years of records (2010 to 2016), 8 race and ethnicity classifications, and 262 neighborhoods, which together provide 2344 location points, each with its latitude and longitude coordinates.

Here is the link to the dataset: https://usc.data.socrata.com/api/views/jxw5-xxv5/rows.csv

For this analysis, we will use only the records from 2016, so let's prepare the data.

In [2]:
!wget -O urlRaceEthnicity.csv https://usc.data.socrata.com/api/views/jxw5-xxv5/rows.csv
dataframe= pd.read_csv('urlRaceEthnicity.csv')
dataframe.head()

--2019-06-29 18:40:15--  https://usc.data.socrata.com/api/views/jxw5-xxv5/rows.csv
Resolving usc.data.socrata.com (usc.data.socrata.com)... 52.206.140.199, 52.206.140.205, 52.206.68.26
Connecting to usc.data.socrata.com (usc.data.socrata.com)|52.206.140.199|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘urlRaceEthnicity.csv’

    [                  <=>                  ] 33,463,156  8.60MB/s   in 3.7s   

2019-06-29 18:40:24 (8.58 MB/s) - ‘urlRaceEthnicity.csv’ saved [33463156]



Unnamed: 0,Policy Area,Dataset,Year,Variable,Count,Percent,Tract,Tract Number,Neighborhood,GEOID,Location,Row ID,Date
0,Demography,Race & Ethnicity,2010,American Indian/Native Population,0.0,0.0,"Census Tract 1011.22, Los Angeles County, Cali...",101122,Tujunga,1400000US06037101122,"(34.267357, -118.29024)",American_Indian/Native_Population_2010_1400000...,01/01/2010
1,Demography,Race & Ethnicity,2010,American Indian/Native Population,31.0,0.830431,"Census Tract 1014, Los Angeles County, California",101400,Tujunga,1400000US06037101400,"(34.244255, -118.296428)",American_Indian/Native_Population_2010_1400000...,01/01/2010
2,Demography,Race & Ethnicity,2010,American Indian/Native Population,17.0,0.923411,"Census Tract 1021.03, Los Angeles County, Cali...",102103,Shadow Hills,1400000US06037102103,"(34.224155, -118.354339)",American_Indian/Native_Population_2010_1400000...,01/01/2010
3,Demography,Race & Ethnicity,2010,American Indian/Native Population,0.0,0.0,"Census Tract 1021.04, Los Angeles County, Cali...",102104,Shadow Hills,1400000US06037102104,"(34.216189, -118.3456235)",American_Indian/Native_Population_2010_1400000...,01/01/2010
4,Demography,Race & Ethnicity,2010,American Indian/Native Population,0.0,0.0,"Census Tract 1021.05, Los Angeles County, Cali...",102105,Sun Valley,1400000US06037102105,"(34.210852, -118.3480495)",American_Indian/Native_Population_2010_1400000...,01/01/2010


Filter the records for the year 2016

In [3]:
dataframe2016 = dataframe[dataframe['Year']==2016]
dataframe2016.head()

Unnamed: 0,Policy Area,Dataset,Year,Variable,Count,Percent,Tract,Tract Number,Neighborhood,GEOID,Location,Row ID,Date
138,Demography,Race & Ethnicity,2016,Black Population,66.0,2.08993,"Census Tract 2715, Los Angeles County, California",271500,Mar Vista,1400000US06037271500,"(34.01663, -118.4375635)",Black_Population_2016_1400000US06037271500,01/01/2016
198,Demography,Race & Ethnicity,2016,Black Population,19.0,0.585697,"Census Tract 3112, Los Angeles County, California",311200,Burbank,1400000US06037311200,"(34.1714255, -118.3527755)",Black_Population_2016_1400000US06037311200,01/01/2016
317,Demography,Race & Ethnicity,2016,Black Population,136.0,3.448276,"Census Tract 3113, Los Angeles County, California",311300,Burbank,1400000US06037311300,"(34.173525, -118.342414)",Black_Population_2016_1400000US06037311300,01/01/2016
377,Demography,Race & Ethnicity,2016,Black Population,51.0,2.222222,"Census Tract 3114, Los Angeles County, California",311400,Burbank,1400000US06037311400,"(34.162038, -118.34958)",Black_Population_2016_1400000US06037311400,01/01/2016
455,Demography,Race & Ethnicity,2016,Black Population,161.0,2.931537,"Census Tract 3115, Los Angeles County, California",311500,Burbank,1400000US06037311500,"(34.164754, -118.33837)",Black_Population_2016_1400000US06037311500,01/01/2016


Split location into latitude and longitude

In [4]:
# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
dataframe2016 = dataframe2016.copy()
# Location looks like "(34.01663, -118.4375635)": strip the parentheses, then split on the comma
coords = dataframe2016['Location'].str.strip('()').str.split(',', n=1, expand=True)
dataframe2016['latitude'] = pd.to_numeric(coords[0])
dataframe2016['longitude'] = pd.to_numeric(coords[1])
dataframe2016.head()

Unnamed: 0,Policy Area,Dataset,Year,Variable,Count,Percent,Tract,Tract Number,Neighborhood,GEOID,Location,Row ID,Date,latitude,longitude
138,Demography,Race & Ethnicity,2016,Black Population,66.0,2.08993,"Census Tract 2715, Los Angeles County, California",271500,Mar Vista,1400000US06037271500,"(34.01663, -118.4375635)",Black_Population_2016_1400000US06037271500,01/01/2016,34.01663,-118.437563
198,Demography,Race & Ethnicity,2016,Black Population,19.0,0.585697,"Census Tract 3112, Los Angeles County, California",311200,Burbank,1400000US06037311200,"(34.1714255, -118.3527755)",Black_Population_2016_1400000US06037311200,01/01/2016,34.171425,-118.352775
317,Demography,Race & Ethnicity,2016,Black Population,136.0,3.448276,"Census Tract 3113, Los Angeles County, California",311300,Burbank,1400000US06037311300,"(34.173525, -118.342414)",Black_Population_2016_1400000US06037311300,01/01/2016,34.173525,-118.342414
377,Demography,Race & Ethnicity,2016,Black Population,51.0,2.222222,"Census Tract 3114, Los Angeles County, California",311400,Burbank,1400000US06037311400,"(34.162038, -118.34958)",Black_Population_2016_1400000US06037311400,01/01/2016,34.162038,-118.34958
455,Demography,Race & Ethnicity,2016,Black Population,161.0,2.931537,"Census Tract 3115, Los Angeles County, California",311500,Burbank,1400000US06037311500,"(34.164754, -118.33837)",Black_Population_2016_1400000US06037311500,01/01/2016,34.164754,-118.33837


Drop columns we do not need

In [5]:
dataframe2016 = dataframe2016.drop(columns=['Policy Area','Dataset','Tract','GEOID','Row ID','Date','Percent','Year','Tract Number','Location'])
dataframe2016.head()

Unnamed: 0,Variable,Count,Neighborhood,latitude,longitude
138,Black Population,66.0,Mar Vista,34.01663,-118.437563
198,Black Population,19.0,Burbank,34.171425,-118.352775
317,Black Population,136.0,Burbank,34.173525,-118.342414
377,Black Population,51.0,Burbank,34.162038,-118.34958
455,Black Population,161.0,Burbank,34.164754,-118.33837


Let's see the distribution of the population by race and ethnicity

In [6]:
#Creating Dataframe
df2016 = pd.DataFrame(data=dataframe2016, columns=['Variable','Count'])
df2016 = df2016.groupby(['Variable']).sum()
df2016= df2016.reset_index()
df2016.sort_values(by=['Count'], ascending=False)

Unnamed: 0,Variable,Count
3,Hispanic Population,4861648.0
7,White Population,2687787.0
1,Asian Population,1413105.0
2,Black Population,801182.0
6,Population of Two or More Races,220878.0
5,Other Race Population,29351.0
4,Native Hawaiian/Other Pacific Islander Population,24439.0
0,American Indian/Native Population,18765.0


As we can see, the Hispanic population and the White population are the two largest population groups, so we are going to work with these two groups.
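
As a minimal sketch of that selection step, the two largest groups can also be picked programmatically instead of being read off the table. The small `counts` frame below simply re-enters the 2016 totals printed above, and the name `top_two` is an illustrative assumption, not part of the original notebook.

```python
import pandas as pd

# 2016 totals copied from the table above (Variable, Count)
counts = pd.DataFrame({
    'Variable': [
        'Hispanic Population',
        'White Population',
        'Asian Population',
        'Black Population',
        'Population of Two or More Races',
        'Other Race Population',
        'Native Hawaiian/Other Pacific Islander Population',
        'American Indian/Native Population',
    ],
    'Count': [4861648.0, 2687787.0, 1413105.0, 801182.0,
              220878.0, 29351.0, 24439.0, 18765.0],
})

# nlargest keeps the n rows with the highest values in the given column
top_two = counts.nlargest(2, 'Count')['Variable'].tolist()
print(top_two)
```

Inside the notebook, the same call would presumably be applied to `df2016` directly, for example `df2016.nlargest(2, 'Count')`.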

## 2. Understanding the Foursquare API

The Foursquare API lets developers search for a specific type of venue around a given location, or look up a specific venue with details such as its full address, working hours, and menu, among others. It also makes it possible to explore popular spots and trending venues in a given location.

The search request has the following form: https://api.foursquare.com/v2/venues/search?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&ll=LATITUDE,LONGITUDE&v=VERSION&query=QUERY&radius=RADIUS&limit=LIMIT

where:

<div style="margin-top: 20px">

<font size = 3>

1. <a>CLIENT_ID = your Foursquare ID</a>
2. <a>CLIENT_SECRET =  your Foursquare Secret</a>
3. <a>LATITUDE = latitude of the given place</a>
4. <a>LONGITUDE = longitude of the given place</a>
5. <a>VERSION = date of the database version, in 'yyyymmdd' format</a>
6. <a>QUERY = the value you want to search</a>
7. <a>RADIUS = search radius in meters around the given place</a>
8. <a>LIMIT = number of records in the answer</a>

</font>
</div>
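
The parameters listed above can be assembled into a request URL without manual string formatting. The following is a minimal sketch using only the standard library; the helper name `build_search_url` and the example values (`'coffee'`, `'20190629'`, a limit of 10) are placeholders chosen for illustration, not values taken from the notebook.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, lat, lng,
                     version, query, radius, limit):
    """Assemble a venues/search URL from the parameters described above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude as one value
        'v': version,                    # 'yyyymmdd'
        'query': query,
        'radius': radius,                # meters
        'limit': limit,
    }
    # urlencode percent-encodes reserved characters, so the comma in
    # 'll' appears as %2C in the resulting query string
    return BASE_URL + '?' + urlencode(params)

# Placeholder credentials and illustrative values only
example_url = build_search_url('CLIENT_ID', 'CLIENT_SECRET',
                               34.01663, -118.437563,
                               '20190629', 'coffee', 500, 10)
print(example_url)
```

Letting `urlencode` build the query string avoids mistakes with separators and escapes any special characters in the search term.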


Important information: With your free Foursquare account you can access:

1. <a>105M places</a>
2. <a>2 Photos & 2 Tips per Venue</a>
3. <a>2 Queries per Second (QPS)</a>
4. <a>1 App per Account</a>
5. <a>Insight into API Usage</a>
6. <a>API Call Quota</a>
7. <a>99,500 Regular Calls + 500 Premium Calls</a>
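
Because the free tier listed above allows about 2 queries per second, a loop over many location points may need some pacing. The sketch below shows one way to do that, assuming a hypothetical helper named `throttle`; it is not part of the notebook or of any Foursquare client library. The `clock` and `sleep` arguments exist only so the behavior can be checked without actually waiting.

```python
import time

def throttle(iterable, max_per_second=2,
             clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than max_per_second."""
    min_interval = 1.0 / max_per_second
    last = None
    for item in iterable:
        if last is not None:
            # Wait out whatever remains of the minimum interval
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item

# Demonstration with a frozen clock and a recording "sleep",
# so the example runs instantly
recorded_sleeps = []
points = ['Mar Vista', 'Burbank', 'Burbank']
paced = list(throttle(points, max_per_second=2,
                      clock=lambda: 0.0, sleep=recorded_sleeps.append))
print(paced, recorded_sleeps)
```

In a real loop, the request for each location would go inside `for name in throttle(names): ...`, leaving the default `clock` and `sleep` in place.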


Let's see how we can request venues for the first location point from the Race & Ethnicity dataset.

Define the credentials and parameters

In [8]:
# The code was removed by Watson Studio for sharing.
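
Since the original cell is hidden, the following is only a guess at its shape, not the author's code. It defines the names that the later cells rely on (`CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, `LIMIT`, `radius`, `latitude`, `longitude`). The environment variable names and every value below are placeholders or assumptions.

```python
import os

# Hypothetical environment variable names; the fallbacks are dummy strings,
# so replace them with your own Foursquare credentials
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')

VERSION = '20190629'  # placeholder API version date in 'yyyymmdd' format
LIMIT = 100           # placeholder: maximum number of venues per request
radius = 500          # meters; same value as the default in getNearbyVenues

# First 2016 location point shown in the tables above (Mar Vista)
latitude, longitude = 34.01663, -118.437563
```

Reading the credentials from the environment keeps them out of the notebook when it is shared.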

### Let's see the venues around the first five location points given by the NDSC dataset

Define the corresponding URL

In [10]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)

Creating a function to look for nearby venues

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
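
To make the parsing step in `getNearbyVenues` easier to follow, here is a small offline sketch of the same flattening logic. The `mock_payload` dictionary is hand-written to mimic the nesting the function indexes into (`response` → `groups[0]` → `items` → `venue`); it is not real API output. The helper name `extract_venues` and the uncategorized second venue are assumptions added for illustration. Unlike the function above, the sketch falls back to `None` when a venue has no category instead of raising an `IndexError`.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style payload into a list of row tuples."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Guard against venues that carry no category at all
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written stand-in for a response; the second venue is hypothetical
mock_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Ocean View Farms',
                   'location': {'lat': 34.013712, 'lng': -118.440426},
                   'categories': [{'name': 'Garden'}]}},
        {'venue': {'name': 'Uncategorized Place',
                   'location': {'lat': 34.0, 'lng': -118.4},
                   'categories': []}},
    ]}]}
}

rows = extract_venues('Mar Vista', 34.01663, -118.437563, mock_payload)
for row in rows:
    print(row)
```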

Looking for nearby venues for the first 5 location points from the population dataset

In [24]:
LAX_venues = getNearbyVenues(names=dataframe2016['Neighborhood'].head(),
                                latitudes = dataframe2016['latitude'].head(),
                                longitudes = dataframe2016['longitude'].head()
                                  )

Mar Vista
Burbank
Burbank
Burbank
Burbank


In [27]:
print(LAX_venues.shape)
LAX_venues.head()

(58, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mar Vista,34.01663,-118.437563,Ocean View Farms,34.013712,-118.440426,Garden
1,Mar Vista,34.01663,-118.437563,north venice little league,34.0146,-118.440083,Baseball Field
2,Mar Vista,34.01663,-118.437563,Mountain View Outlook,34.017705,-118.441549,Scenic Lookout
3,Mar Vista,34.01663,-118.437563,VENICE GARDENS,34.015355,-118.442468,Garden
4,Burbank,34.171425,-118.352775,Emerald Knights Comics and Games,34.172974,-118.354616,Toy / Game Store
