<h1>Introduction/Business Problem</h1>

Predicting single family home foreclosure risk is an inexact science which in the past twenty years has largely depended on statistical analysis. To date, statistical analysis is heavily reliant on industry data, such as buyer lending information or static locations (near freeways, schools, etc). But from looking at social media data like what is found on FourSquare, there exists an opportunity to consider a change to the existing banking model that would make predicting foreclosure risk more dynamic and therefore accurate. If true, lending banks could save significant amounts of money from the thousands of automated home mortgage decisions they make, beyond what they are making with real estate solution products available through companies like CoreLogic or SiteXData. 

The potential model to a solution for this problem comes from what this capstone course has already accomplished, that is, analyzing Foursquare neighborhood venue data for the city of Toronto. In this  case, however, I will be applying the code to the city of Los Angeles. However, it is not just a copy. Multiple adjusted steps are needed for this to work, well beyond what has been completed.

<h1>Data</h1> 

The steps to solve the above business problem are the following:
<OL>
    <LI>The lat/lon gecoding information for the city of Los Angeles will need to be integrated with LA zipcode/neighborhood data that is scraped from the web.</LI>
    <LI>2018 foreclosure CVS file data from the city of Los Angeles will also need to be combined into the Los Angeles table in step 1.</LI>
    <LI>Foursquare will then be used to explore Los Angeles, coming up with venues for each neighborhood.</LI>
    <LI>Venues will then be organized by categories.</LI>
    <LI>Each venue category will include foreclosure frequency within a 1 mile radius. For example, the result will include the number of 2018 foreclosures for the category "Fast Food Restaurants."</LI>
    <LI>The data can then be analyzed for consistent foreclosure trends in each neighborhood related to venues.</LI>
    <LI>From step number 6, charts can be made to show how specific venues either increase, have no impact, or reduce foreclosure risk.</LI>
</OL>
<p>
An example of potentially conclusive venue data would be the following:</p>

<table>
    <tr><td>RESULT</td><td>EFFECT</td></tr>
    <tr><td>increased frequencies of Los Angeles parks</td><td>consistent lower than average foreclosure risk</td></tr>
    <tr><td>increased frequency of Los Angeles cafe's</td><td>normal range foreclosure risk</td></tr>
    <tr><td>increased frequency of Los Angeles fast food restaurants</td><td>consistent above average foreclosure risk</td></tr>
</table>

The next step will be to apply the above code and analysis to the cities of Boston, Seattle, Chicago and Houston, using FourSquare data, to see if the results for 2018 data are repeatable there. If the results are consistent, this study can lead to a deeper mathematical analyis as the next step in another project.

<h1>Methodology</h1>

<h2>Step 1: Loading 2018 LA foreclosure data, including geocodes, from a CSV file 100 records at at a time into a new DF</h2>

<h3>https://docs.google.com/spreadsheets/d/1J6PCrub6X1MuFwt6ZSxBx0Jv-McbnOuCQIZzKJdeK7I/edit#gid=393329626</h3>

In [4]:
#DEFINITIONS
import pandas as pd
#  https://docs.google.com/spreadsheets/d/16OMQGiVATyPgUCCxRfxw5vDKjmSQ5XIQwPQqgbop828/edit#gid=1171515980  (last version)
# entire 3100 rows  fc_data = pd.read_csv('https://docs.google.com/spreadsheets/d/16OMQGiVATyPgUCCxRfxw5vDKjmSQ5XIQwPQqgbop828/export?format=csv&gid=1171515980',
fc_data = pd.read_csv('https://docs.google.com/spreadsheets/d/1J6PCrub6X1MuFwt6ZSxBx0Jv-McbnOuCQIZzKJdeK7I/export?format=csv&gid=393329626',
                   # Set first column as rownames in data frame
                   header='infer', index_col=None, engine='python', error_bad_lines=False
                  )
fc_data.head(20)

# https://kanoki.org/2018/12/25/read-google-spreadsheet-data-into-pandas-dataframe/

Unnamed: 0,APN,RegisteredDate,PropertyType,Address,City,State,Zip,Latitude,Longitude
0,6003016019,10/31/2018,Multi-Family,1217 W 60TH PL,LOS ANGELES,CA,90044,33.985025,-118.29654
1,2151015040,11/1/2018,Single Family,20519 W HATTERAS ST,LOS ANGELES,CA,91367,34.175498,-118.579552
2,5442030015,11/5/2018,Single Family,2801 N MARSH ST,LOS ANGELES,CA,90039,34.104301,-118.248531
3,5015020012,1/17/2018,Multi-Family,2071 W 52ND ST,LOS ANGELES,CA,90062,33.995044,-118.316025
4,7416002028,1/5/2018,Single Family,649 N FRIGATE AVE,LOS ANGELES,CA,90744,33.777863,-118.278247
5,2162005011,11/10/2018,Single Family,5257 N NEWCASTLE AVE,ENCINO,CA,91316,34.166353,-118.524347
6,6006027004,11/6/2018,Single Family,500 E 59TH PL,LOS ANGELES,CA,90003,33.986083,-118.266182
7,5460010052,11/21/2018,Single Family,2409 N YORKSHIRE DR,LOS ANGELES,CA,90065,34.110697,-118.230535
8,2524021020,1/4/2018,Single Family,13279 W VAUGHN ST,LOS ANGELES,CA,91340,34.281704,-118.422398
9,2425016027,2/2/2018,Single Family,3548 N MULTIVIEW DR,LOS ANGELES,CA,90068,34.130287,-118.360437


<h2>Step 2: Exploring and Clustering Foreclosed Addresses (2018) in Los Angeles</h2>

<h3>First, get lat/lon of Los Angeles</h3>

In [7]:
# Use geopy library to get the latitude and longitude values of Los Angeles
from geopy.geocoders import Nominatim 
#Nominatim is a tool to search OSM (open street map) data by name and address and to generate synthetic addresses of OSM points (reverse geocoding).
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium # folium is mapping tool
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

In [8]:
address = 'Los Angeles, California'
geolocator = Nominatim(user_agent="Los Angeles")
location = geolocator.geocode(address)
latitude_LA = location.latitude
longitude_LA = location.longitude
print('The geograpical coordinate of Los Angeles are {}, {}.'.format(latitude_LA, longitude_LA))

The geograpical coordinate of Los Angeles are 34.0536909, -118.2427666.


<h3>Show Map of Los Angeles</h3>

In [9]:
map_LA = folium.Map(location=[latitude_LA, longitude_LA], zoom_start=10)

# add markers to map
for lat, lng, address, zipcode in zip(fc_data['Latitude'], fc_data['Longitude'], fc_data['Address'], fc_data['Zip']):
    label = '{}, {}'.format(zipcode, address)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_LA)  
    
map_LA

<h3>Define Foursquare Credentials and Version</h3>

In [10]:
CLIENT_ID = "RBX2RESNE1HWSSGX020MOSI4CTL4SIP4QQPZFGJ2LF4EBOMX"  #Foursquare ID
CLIENT_SECRET = "YBZIPEL5GKJX2T13SPD5TDM4ZFXRV51SO2WDNKEGUTSMIWIU"
# was CLIENT_SECRET = "H2LKEFPFFGH4L3WLNDFLPVF0O11S55UMOAIUEX0S4MHH3GAV"  #Foursquare Secret

VERSION = "20180605"  #API Version

print("Your Credentials:")
print("CLIENT ID:" + CLIENT_ID)
print("CLIENT SECRET" + CLIENT_SECRET)

Your Credentials:
CLIENT ID:RBX2RESNE1HWSSGX020MOSI4CTL4SIP4QQPZFGJ2LF4EBOMX
CLIENT SECRETYBZIPEL5GKJX2T13SPD5TDM4ZFXRV51SO2WDNKEGUTSMIWIU


In [11]:
# defining radius and limit of venues to get
radius=500
LIMIT=100

<h3>Explore the first zip in the Dataframe.</h3>

In [12]:
# Get the first address zip
fc_data.loc[0, 'Zip']

90044

<h3>Get the address latitude and longitude values.</h3>

In [13]:

address_latitude = fc_data.loc[0, 'Latitude'] # address latitude value
address_longitude = fc_data.loc[0, 'Longitude'] # address longitude value
address_zip = fc_data.loc[0, 'Zip'] # address Zip

print('Latitude and longitude values of {} are {}, {}.'.format(address_zip, 
                                                               address_latitude, 
                                                               address_longitude))

Latitude and longitude values of 90044 are 33.985025, -118.29654.


<h3>The Top 100 venues that are in zip 90044 within a radius of 500 meters.</h3>

In [14]:
#First, let's create the GET request URL.
# radius = 500
# LIMIT = 100

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    address_latitude,
    address_longitude, 
    radius, 
    LIMIT)
    
url  #display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=RBX2RESNE1HWSSGX020MOSI4CTL4SIP4QQPZFGJ2LF4EBOMX&client_secret=YBZIPEL5GKJX2T13SPD5TDM4ZFXRV51SO2WDNKEGUTSMIWIU&v=20180605&ll=33.985025,-118.29654&radius=500&limit=100'

In [15]:
#Send the GET Request and examine the results
import requests

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cf346694c1f6753b8283024'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4fcaab5ee4b0d8a39149da92-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fishandchips_',
          'suffix': '.png'},
         'id': '4edd64a0c7ddd24ca188df1a',
         'name': 'Fish & Chips Shop',
         'pluralName': 'Fish & Chips Shops',
         'primary': True,
         'shortName': 'Fish & Chips'}],
       'id': '4fcaab5ee4b0d8a39149da92',
       'location': {'address': '5950 S Normandie Ave',
        'cc': 'US',
        'city': 'Los Angeles',
        'country': 'United States',
        'crossStreet': 'Normandie & 59th',
        'distance': 369,
        'formattedAddress': ['5950 S Normandie Ave (Normandie & 59th)',
         'Los Angeles, CA 9004

<h3>Get the category of the venue</h3>

In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

<h3>Clean the json and structure it into a pandas dataframe.</h3>

In [17]:
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,east west fish market,Fish & Chips Shop,33.9866,-118.300067
1,Don Lenchos Restaurant,Restaurant,33.983687,-118.300546
2,Happy Laundry 24hrs,Laundromat,33.986883,-118.300362
3,Amapulapa,Latin American Restaurant,33.987215,-118.300351
4,Normandie / Gage,Bus Stop,33.98214,-118.30017


In [18]:
# And how many venues were returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

5 venues were returned by Foursquare.


<h3>Explore Neighborhoods in Los Angeles 100 rows at a time. Repeat same process for all 2018 foreclosured addresses in Los Angeles.</h3>

In [19]:
def getNearbyVenues(address, city, zipcode, latitude, longitude, radius=500):
    
    venues_list=[]
    for addr, cty, zcode, lat, lng in zip(address, city, zipcode, latitude, longitude):
        print(addr)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            addr,
            zcode,
            lat, 
            lng,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['FC Address', 
                  'FC Zip',
                  'FC Latitude',
                  'FC Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

<h3>Create code to run the above function on each foreclosed address and create a new dataframe called LA_venues.</h3>

In [20]:
LA_venues = getNearbyVenues(address=fc_data['Address'],
                                   city=fc_data['City'],
                                   zipcode=fc_data['Zip'],
                                   latitude=fc_data['Latitude'],
                                   longitude=fc_data['Longitude']
                                  )
# LA_venues.head(25)

1217 W 60TH PL
20519 W HATTERAS ST
2801 N MARSH ST
2071 W 52ND ST
649 N FRIGATE AVE
5257 N NEWCASTLE AVE
500 E 59TH PL
2409 N YORKSHIRE DR
13279 W VAUGHN ST
3548 N MULTIVIEW DR
4758 N SUNNYSLOPE AVE
901 N STRADA VECCHIA ROAD
9847 W PORTOLA DR
5869 W DAUPHIN AVE
6951 N ETIWANDA AVE
15468 W MORRISON ST
14235 W VANOWEN ST
213 N TIGERTAIL ROAD
1021 W LOWEN ST
2038 S MALCOLM AVE
15935 W COMMUNITY ST
6901 W CAHUENGA PARK TR
1027 S CENTRE ST
23222 W AETNA ST
13640 W GARBER ST
5211 N DARRO ROAD
20321 W VIA SANSOVINO
1916 N HANCOCK ST
929 S MUIRFIELD RD
10601 W WILSHIRE BLVD 1-72
7217 S DENKER AVE
12383 W COVELLO ST
7014 N COLBATH AVE
3749 N GLENALBYN DR
1334 E 57TH ST
11622 W KESWICK ST
9470 S HOBART BLVD
2568 S MILITARY AVE
635 S BYNNER DR
1240 N RAVENNA AVE
1871 W 25TH ST
2651 N ABERDEEN AVE
7129 W HILLROSE ST
19501 W ROMAR ST
1445  YOSEMITE DR
14325 W TORI CT
4868 N ADELE CT
1916 W WHITMORE AVE
11846 S ORCHARD AVE
727 N ALTA VISTA BLVD
11221 N AMBOY AVE
17169 W STARE ST
6242 W TEMPLE HILL D

<h3>Check the size of the resulting dataframe</h3>

In [21]:
print(LA_venues.shape)
LA_venues.head()

(952, 8)


Unnamed: 0,FC Address,FC Zip,FC Latitude,FC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1217 W 60TH PL,90044,33.985025,-118.29654,east west fish market,33.9866,-118.300067,Fish & Chips Shop
1,1217 W 60TH PL,90044,33.985025,-118.29654,Don Lenchos Restaurant,33.983687,-118.300546,Restaurant
2,1217 W 60TH PL,90044,33.985025,-118.29654,Happy Laundry 24hrs,33.986883,-118.300362,Laundromat
3,1217 W 60TH PL,90044,33.985025,-118.29654,Amapulapa,33.987215,-118.300351,Latin American Restaurant
4,1217 W 60TH PL,90044,33.985025,-118.29654,Normandie / Gage,33.98214,-118.30017,Bus Stop


<h3>Check how many venues were returned for each LA Foreclosed (FC) Address</h3>

In [22]:
LA_venues.groupby('FC Address').count()

Unnamed: 0_level_0,FC Zip,FC Latitude,FC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
FC Address,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1000 E 74TH ST,4,4,4,4,4,4,4
10000 N OWENSMOUTH AVE,13,13,13,13,13,13,13
1021 W LOWEN ST,4,4,4,4,4,4,4
1027 S CENTRE ST,23,23,23,23,23,23,23
10650 W CHIQUITA ST,12,12,12,12,12,12,12
1077 S RIMPAU BLVD,9,9,9,9,9,9,9
11163 N TELFAIR AVE,3,3,3,3,3,3,3
11221 N AMBOY AVE,6,6,6,6,6,6,6
11407 N SANTINI LANE,3,3,3,3,3,3,3
11622 W KESWICK ST,13,13,13,13,13,13,13


<h3>Find out how many unique categories can be curated from all the returned LA venues</h3>

In [23]:
print('There are {} unique categories.'.format(len(LA_venues['Venue Category'].unique())))

There are 217 unique categories.


<h3>Analyze Each LA Foreclosed Address</h3>

In [24]:
# one hot encoding
LA_onehot = pd.get_dummies(LA_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
LA_onehot['FC Address'] = LA_venues['FC Address'] 

# move neighborhood column to the first column
fixed_columns = [LA_onehot.columns[-1]] + list(LA_onehot.columns[:-1])
LA_onehot = LA_onehot[fixed_columns]

LA_onehot.head()

Unnamed: 0,FC Address,ATM,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<h3>What is the new dataframe size?</h3>

In [25]:
LA_onehot.shape

(952, 218)

<h3>Group rows by LA FC Address by taking the mean of the frequency of occurrence of each category.</h3>

In [26]:
LA_grouped = LA_onehot.groupby('FC Address').mean().reset_index()
LA_grouped

Unnamed: 0,FC Address,ATM,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,1000 E 74TH ST,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
1,10000 N OWENSMOUTH AVE,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.076923,0.000000,0.000000,0.000000
2,1021 W LOWEN ST,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
3,1027 S CENTRE ST,0.000000,0.000000,0.000000,0.000000,0.043478,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.043478,0.000000,0.000000
4,10650 W CHIQUITA ST,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
5,1077 S RIMPAU BLVD,0.000000,0.111111,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
6,11163 N TELFAIR AVE,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
7,11221 N AMBOY AVE,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
8,11407 N SANTINI LANE,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000
9,11622 W KESWICK ST,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000,0.076923,...,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000


<h3>Confirm the new size</h3>

In [27]:
LA_grouped.shape

(95, 218)

<h3>Print each LA FC address along with the top 5 most common LA venues</h3>

In [28]:
num_top_venues = 5

for address in LA_grouped['FC Address']:
    print("----"+address+"----")
    temp = LA_grouped[LA_grouped['FC Address'] == address].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1000 E 74TH ST----
                        venue  freq
0  Construction & Landscaping  0.25
1          Mexican Restaurant  0.25
2          Seafood Restaurant  0.25
3                        Bank  0.25
4                         ATM  0.00


----10000 N OWENSMOUTH AVE----
                           venue  freq
0            Japanese Restaurant  0.08
1  Vegetarian / Vegan Restaurant  0.08
2                     Smoke Shop  0.08
3                    Coffee Shop  0.08
4             Persian Restaurant  0.08


----1021 W LOWEN ST----
                  venue  freq
0        Baseball Field  0.25
1   Fried Chicken Joint  0.25
2  Gym / Fitness Center  0.25
3            Hobby Shop  0.25
4                   ATM  0.00


----1027 S CENTRE ST----
              venue  freq
0   Thai Restaurant  0.09
1  Sushi Restaurant  0.04
2               Bar  0.04
3        Sports Bar  0.04
4        Steakhouse  0.04


----10650 W CHIQUITA ST----
                venue  freq
0  Italian Restaurant  0.17
1                Pa

<h3>Put above into a pandas dataframe</h3>

In [35]:
# First write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [36]:
num_top_venues = 10
import numpy as np # library to handle data in a vectorized manner

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['FC Address']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
fcaddress_venues_sorted = pd.DataFrame(columns=columns)
fcaddress_venues_sorted['FC Address'] = LA_grouped['FC Address']

for ind in np.arange(LA_grouped.shape[0]):
    fcaddress_venues_sorted.iloc[ind, 1:] = return_most_common_venues(LA_grouped.iloc[ind, :], num_top_venues)

fcaddress_venues_sorted.head()

Unnamed: 0,FC Address,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1000 E 74TH ST,Bank,Seafood Restaurant,Mexican Restaurant,Construction & Landscaping,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop
1,10000 N OWENSMOUTH AVE,Hot Dog Joint,Japanese Restaurant,Salon / Barbershop,Coffee Shop,Train Station,Fast Food Restaurant,Persian Restaurant,Vegetarian / Vegan Restaurant,Smoke Shop,Music Venue
2,1021 W LOWEN ST,Gym / Fitness Center,Hobby Shop,Baseball Field,Fried Chicken Joint,Dive Shop,Fast Food Restaurant,Frame Store,Football Stadium,Food Truck,Food Stand
3,1027 S CENTRE ST,Thai Restaurant,Hotel,Bar,Pizza Place,Rental Car Location,Clothing Store,Rock Club,Sandwich Place,Café,Seafood Restaurant
4,10650 W CHIQUITA ST,Food Truck,Italian Restaurant,Park,Discount Store,Dance Studio,Breakfast Spot,Gym,Bar,Ramen Restaurant,Donut Shop


<h3>Cluster FCAddresses--Run k-means to cluster the FCAdresses into 5 clusters.</h3>

In [37]:
# set number of clusters
kclusters = 5

LA_grouped_clustering = LA_grouped.drop('FC Address', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(LA_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 4, 4, 4, 4, 4, 4, 4, 2, 1, 4, 4, 4, 4, 4, 1, 4, 4, 4, 1, 4, 3, 4,
       4, 1, 1, 4, 4, 1, 4, 4, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 2, 4, 4, 4,
       2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 0, 4, 4, 4,
       4, 4, 4, 1, 4, 4, 4, 2, 4, 1, 4, 4, 2, 4, 4, 1, 4, 4, 4, 4, 4, 4, 4,
       4, 4, 2], dtype=int32)

In [39]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
# add clustering labels
fcaddress_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
LA_merged = fc_data

# merge LA_grouped with LA_data to add latitude/longitude for each address
LA_merged = LA_merged.join(fcaddress_venues_sorted.set_index('FC Address'), on='Address')
LA_merged.head(20) # check the last columns!

Unnamed: 0,APN,RegisteredDate,PropertyType,Address,City,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,6003016019,10/31/2018,Multi-Family,1217 W 60TH PL,LOS ANGELES,CA,90044,33.985025,-118.29654,4.0,Bus Stop,Restaurant,Latin American Restaurant,Laundromat,Fish & Chips Shop,Yoga Studio,Farmers Market,Football Stadium,Food Truck,Food Stand
1,2151015040,11/1/2018,Single Family,20519 W HATTERAS ST,LOS ANGELES,CA,91367,34.175498,-118.579552,4.0,Pool,Business Service,Construction & Landscaping,Home Service,Farmers Market,Football Stadium,Food Truck,Food Stand,Food Court,Food
2,5442030015,11/5/2018,Single Family,2801 N MARSH ST,LOS ANGELES,CA,90039,34.104301,-118.248531,4.0,Furniture / Home Store,Food Truck,Park,Brewery,Dance Studio,Café,Sandwich Place,Coffee Shop,Food Stand,Food Court
3,5015020012,1/17/2018,Multi-Family,2071 W 52ND ST,LOS ANGELES,CA,90062,33.995044,-118.316025,4.0,Park,Wine Bar,Playground,Yoga Studio,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop
4,7416002028,1/5/2018,Single Family,649 N FRIGATE AVE,LOS ANGELES,CA,90744,33.777863,-118.278247,4.0,Donut Shop,Burger Joint,Grocery Store,Market,Food Truck,Mexican Restaurant,Food,Boat or Ferry,Seafood Restaurant,Frame Store
5,2162005011,11/10/2018,Single Family,5257 N NEWCASTLE AVE,ENCINO,CA,91316,34.166353,-118.524347,4.0,Video Store,Pizza Place,Shoe Store,Chinese Restaurant,Sushi Restaurant,Eastern European Restaurant,Supermarket,Bakery,Convenience Store,Gas Station
6,6006027004,11/6/2018,Single Family,500 E 59TH PL,LOS ANGELES,CA,90003,33.986083,-118.266182,4.0,Taco Place,Clothing Store,Frame Store,Shopping Mall,Food Truck,Food Stand,Food Court,Food,Flower Shop,Flea Market
7,5460010052,11/21/2018,Single Family,2409 N YORKSHIRE DR,LOS ANGELES,CA,90065,34.110697,-118.230535,4.0,Business Service,Café,Salad Place,Grocery Store,Yoga Studio,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food
8,2524021020,1/4/2018,Single Family,13279 W VAUGHN ST,LOS ANGELES,CA,91340,34.281704,-118.422398,1.0,Taco Place,Convenience Store,Athletics & Sports,Mexican Restaurant,Yoga Studio,Fast Food Restaurant,Football Stadium,Food Truck,Food Stand,Food Court
9,2425016027,2/2/2018,Single Family,3548 N MULTIVIEW DR,LOS ANGELES,CA,90068,34.130287,-118.360437,4.0,Yoga Studio,Marijuana Dispensary,Italian Restaurant,Dive Shop,Mexican Restaurant,Mediterranean Restaurant,Scenic Lookout,Football Stadium,Food Truck,Food Stand


<h3>Visualize the resulting clusters</h3>

In [40]:
# create map
map_clusters = folium.Map(location=[latitude_LA, longitude_LA], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(LA_merged['Latitude'], LA_merged['Longitude'], LA_merged['Address'], LA_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color=rainbow[cluster-1],
        fill=True,
        # fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

folium.map.LayerControl('topright', collapsed=False).add_to(map_clusters)
map_clusters

<h3>Examine each cluster and determine the discriminating venue categories that distinguish each cluster.</h3>

<h3>Cluster 1</h3>

In [41]:
LA_merged.loc[LA_merged['Cluster Labels'] == 0, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
46,4868 N ADELE CT,CA,91364,34.159318,-118.572731,0.0,Smoke Shop,Yoga Studio,Farmers Market,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop,Flea Market


<h3>Cluster 2</h3>

In [97]:
LA_merged.loc[LA_merged['Cluster Labels'] == 1, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,13279 W VAUGHN ST,CA,91340,34.281704,-118.422398,1.0,Taco Place,Convenience Store,Athletics & Sports,Mexican Restaurant,Yoga Studio,Fast Food Restaurant,Football Stadium,Food Truck,Food Stand,Food Court
15,15468 W MORRISON ST,CA,91403,34.160296,-118.471627,1.0,Moving Target,General Entertainment,BBQ Joint,Fast Food Restaurant,French Restaurant,Frame Store,Football Stadium,Food Truck,Food Stand,Food Court
16,14235 W VANOWEN ST,CA,91405,34.193929,-118.443275,1.0,Dive Bar,Convenience Store,Furniture / Home Store,Breakfast Spot,Mexican Restaurant,Bank,BBQ Joint,Gym Pool,Dutch Restaurant,Flower Shop
27,1916 N HANCOCK ST,CA,90031,34.066688,-118.20893,1.0,Mexican Restaurant,Burger Joint,Park,Food Truck,Optical Shop,Lake,Football Stadium,Food Stand,Food Court,Food
32,7014 N COLBATH AVE,CA,91405,34.198098,-118.436712,1.0,Mexican Restaurant,Convenience Store,Video Store,Yoga Studio,Farmers Market,Football Stadium,Food Truck,Food Stand,Food Court,Food
33,3749 N GLENALBYN DR,CA,90065,34.093187,-118.216053,1.0,Liquor Store,Mexican Restaurant,Smoke Shop,Lawyer,Farmers Market,Football Stadium,Food Truck,Food Stand,Food Court,Food
35,11622 W KESWICK ST,CA,91605,34.210274,-118.385325,1.0,Rental Car Location,Liquor Store,Mexican Restaurant,Burger Joint,Gym / Fitness Center,Convenience Store,Auto Workshop,Sushi Restaurant,Donut Shop,Coffee Shop
39,1240 N RAVENNA AVE,CA,90744,33.787835,-118.268817,1.0,Dessert Shop,Bank,Mexican Restaurant,Seafood Restaurant,Burger Joint,Liquor Store,Pizza Place,Dog Run,Financial or Legal Service,Football Stadium
58,4235 W 59TH ST,CA,90043,33.98709,-118.348847,1.0,Seafood Restaurant,Southern / Soul Food Restaurant,Cosmetics Shop,Convenience Store,Video Store,Mexican Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Yoga Studio,Food Stand
71,1000 E 74TH ST,CA,90001,33.973003,-118.257629,1.0,Bank,Seafood Restaurant,Mexican Restaurant,Construction & Landscaping,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop


<h3>Cluster 3</h3>

In [98]:
LA_merged.loc[LA_merged['Cluster Labels'] == 2, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,9847 W PORTOLA DR,CA,90210,34.110322,-118.433459,2.0,Trail,Yoga Studio,Fried Chicken Joint,Frame Store,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop
21,6901 W CAHUENGA PARK TR,CA,90068,34.123224,-118.345339,2.0,Rock Club,Piano Bar,Marijuana Dispensary,Trail,Yoga Studio,Fabric Shop,Food Truck,Food Stand,Food Court,Food
47,1916 W WHITMORE AVE,CA,90039,34.095644,-118.251188,2.0,Trail,Music Venue,Liquor Store,Food Truck,Pilates Studio,Basketball Court,Food Stand,Food Court,Food,Flower Shop
70,11407 N SANTINI LANE,CA,91326,34.278566,-118.587486,2.0,Trail,Pool,Yoga Studio,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop,Flea Market
79,20628 W COMO LANE,CA,91326,34.277867,-118.582721,2.0,Pool,Trail,Scenic Lookout,Food Truck,Food Stand,Food Court,Food,Flower Shop,Flea Market,Fish & Chips Shop
87,7241 W WOODROW WILSON DR,CA,90068,34.124095,-118.35024,2.0,Trail,Rock Club,Yoga Studio,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop,Flea Market


<h3>Cluster 4</h3>

In [99]:
LA_merged.loc[LA_merged['Cluster Labels'] == 3, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,13375 N PHILLIPPI AVE,CA,91342,34.315109,-118.449756,3.0,Home Service,Yoga Studio,Fabric Shop,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop,Flea Market


<h3>Cluster 5</h3>

In [100]:
LA_merged.loc[LA_merged['Cluster Labels'] == 4, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1217 W 60TH PL,CA,90044,33.985025,-118.296540,4.0,Bus Stop,Restaurant,Latin American Restaurant,Laundromat,Fish & Chips Shop,Yoga Studio,Farmers Market,Football Stadium,Food Truck,Food Stand
1,20519 W HATTERAS ST,CA,91367,34.175498,-118.579552,4.0,Pool,Business Service,Construction & Landscaping,Home Service,Farmers Market,Football Stadium,Food Truck,Food Stand,Food Court,Food
2,2801 N MARSH ST,CA,90039,34.104301,-118.248531,4.0,Furniture / Home Store,Food Truck,Park,Brewery,Dance Studio,Café,Sandwich Place,Coffee Shop,Food Stand,Food Court
3,2071 W 52ND ST,CA,90062,33.995044,-118.316025,4.0,Park,Wine Bar,Playground,Yoga Studio,Football Stadium,Food Truck,Food Stand,Food Court,Food,Flower Shop
4,649 N FRIGATE AVE,CA,90744,33.777863,-118.278247,4.0,Donut Shop,Burger Joint,Grocery Store,Market,Food Truck,Mexican Restaurant,Food,Boat or Ferry,Seafood Restaurant,Frame Store
5,5257 N NEWCASTLE AVE,CA,91316,34.166353,-118.524347,4.0,Video Store,Pizza Place,Shoe Store,Chinese Restaurant,Sushi Restaurant,Eastern European Restaurant,Supermarket,Bakery,Convenience Store,Gas Station
6,500 E 59TH PL,CA,90003,33.986083,-118.266182,4.0,Taco Place,Clothing Store,Frame Store,Shopping Mall,Food Truck,Food Stand,Food Court,Food,Flower Shop,Flea Market
7,2409 N YORKSHIRE DR,CA,90065,34.110697,-118.230535,4.0,Business Service,Café,Salad Place,Grocery Store,Yoga Studio,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food
9,3548 N MULTIVIEW DR,CA,90068,34.130287,-118.360437,4.0,Yoga Studio,Marijuana Dispensary,Italian Restaurant,Dive Shop,Mexican Restaurant,Mediterranean Restaurant,Scenic Lookout,Football Stadium,Food Truck,Food Stand
10,4758 N SUNNYSLOPE AVE,CA,91423,34.157384,-118.426876,4.0,Cosmetics Shop,ATM,Smoke Shop,Clothing Store,Coffee Shop,Fabric Shop,Football Stadium,Bakery,Pet Store,Gym / Fitness Center
