<h1>Introduction/Business Problem</h1>

Predicting foreclosure risk for single-family homes is an inexact science that, over the past twenty years, has depended largely on statistical analysis. That analysis relies heavily on industry data, such as borrower lending information, and on static location attributes (proximity to freeways, schools, and so on). Social media data, such as the venue data available from Foursquare, offers an opportunity to change the existing banking model so that foreclosure risk prediction becomes more dynamic and therefore potentially more accurate. If that holds, lending banks could save significant amounts of money across the thousands of automated home mortgage decisions they make, beyond what they already achieve with real estate solution products from companies like CoreLogic or SiteXData. Other firms, such as investment banks and financial services companies that deal in mortgage-backed securities, already rely on statistical predictive software to aid their decisions. Lehman Brothers is an example of a financial services firm that, before 2008, failed to make sound decisions about its mortgage-backed securities.

The proposed approach builds on what this capstone course has already covered: analyzing Foursquare neighborhood venue data for the city of Toronto. In this study I apply a similar code-based analysis to the city of Los Angeles. It is not simply a copy, however: the steps have been adjusted to go beyond the original Toronto Foursquare project by working with individual addresses instead of neighborhoods.

<h1>Data</h1> 

The steps involved in analyzing this business problem were the following:
<OL>
    <LI>From its website, LA County provides 2018 lat/lon geocoding information for the city of Los Angeles, which also includes address, zip code and neighborhood data.</LI>
    <LI>Included in this data is a CSV file of 2018 foreclosures for the city of Los Angeles.</LI>
    <LI>In addition, I used Foursquare to explore Los Angeles and retrieve the venues near each foreclosed address.</LI>
    <LI>Foursquare venues were organized by category type, such as restaurants, parks, cafes, and even intersections.</LI>
    <LI>As a result, each Los Angeles foreclosure was associated with the venues within a 500 meter radius, up to a maximum of 100 venues. For example, the result showed whether parks, baseball fields and ATMs existed near each foreclosure address, along with many other venues. The venues come from crowdsourced data collected by Foursquare, not from me.</LI>
    <LI>I then inspected the resulting foreclosure data by eye to see whether it was worth analyzing with machine learning. From my first observations I found consistent venue trends in the vicinity of each foreclosed address.</LI>
    <LI>Based on these initial findings, I implemented k-means clustering in Watson Studio to show how specific venues increase, have no impact on, or reduce foreclosure risk.</LI>
    <LI>Clustering provides a new and interesting approach to the problem of foreclosure risk. Clusters allow the analyst to find predictors based on venues that are unique to that cluster. It is perhaps unlikely that a single venue predictor exists for all potential addresses. It could be possible, however, to predict foreclosure based on certain venues within clustered addresses that all share the same venues. In summary, clustering creates the possibility that foreclosure analysis will in the future become totally dependent on machine learning, requiring constant access to real data.</LI>
</OL>
<p>
My initial prediction was that venue data conclusive enough to predict, or at least signal, the foreclosure risk of properties in specific areas did indeed exist. This data suggested to me the following possible results:</p>

<table>
    <tr><td><b>PREDICTED RESULT</b></td><td><b>PREDICTED EFFECT</b></td></tr>
    <tr><td>Low frequency of LA parks or common retail stores near LA foreclosed properties</td><td>Signals lower than average foreclosure risk</td></tr>
    <tr><td>Moderate frequency of LA cafes, sit-down restaurants and hotels near foreclosed properties</td><td>Signals normal foreclosure risk</td></tr>
    <tr><td>High frequency of LA fast food venues of all types, including food trucks, burger joints and taco shops</td><td>Signals high foreclosure risk</td></tr>
</table>

Once I have completed the LA foreclosure/Foursquare analysis and have evidence that it supports the predictions above, I will draw up a business plan. The plan will first involve looking at much more foreclosure data from Los Angeles, then repeating the same analysis for other cities, such as Boston, Seattle, Chicago and Houston. I will continue to use Foursquare's crowdsourced data to see whether the 2018 results are repeatable in all locations. If the results are consistent, this study can feed into a deeper mathematical analysis that can then be offered as a service to financial institutions.

<h1>Methodology</h1>

<h2>Step 1: Loading 2018 LA foreclosure data, including geocodes, from a CSV file 100 records at a time into a new DataFrame</h2>

<h3>https://docs.google.com/spreadsheets/d/1J6PCrub6X1MuFwt6ZSxBx0Jv-McbnOuCQIZzKJdeK7I/edit#gid=393329626</h3>

In [1]:
#DEFINITIONS
import pandas as pd
#  https://docs.google.com/spreadsheets/d/16OMQGiVATyPgUCCxRfxw5vDKjmSQ5XIQwPQqgbop828/edit#gid=1171515980  (last version)
# entire 3100 rows  fc_data = pd.read_csv('https://docs.google.com/spreadsheets/d/16OMQGiVATyPgUCCxRfxw5vDKjmSQ5XIQwPQqgbop828/export?format=csv&gid=1171515980',
fc_data = pd.read_csv('https://docs.google.com/spreadsheets/d/1J6PCrub6X1MuFwt6ZSxBx0Jv-McbnOuCQIZzKJdeK7I/export?format=csv&gid=393329626',
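                   # Infer column names from the first row and skip malformed lines instead of raising.
                   # Note: error_bad_lines was removed in pandas 2.0; newer versions use on_bad_lines='skip'.
                   header='infer', engine='python', error_bad_lines=False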
                  )
fc_data.head(20)

# https://kanoki.org/2018/12/25/read-google-spreadsheet-data-into-pandas-dataframe/

Unnamed: 0,APN,RegisteredDate,PropertyType,Address,City,State,Zip,Latitude,Longitude
0,6003016019,10/31/2018,Multi-Family,1217 W 60TH PL,LOS ANGELES,CA,90044,33.985025,-118.29654
1,2151015040,11/1/2018,Single Family,20519 W HATTERAS ST,LOS ANGELES,CA,91367,34.175498,-118.579552
2,5442030015,11/5/2018,Single Family,2801 N MARSH ST,LOS ANGELES,CA,90039,34.104301,-118.248531
3,5015020012,1/17/2018,Multi-Family,2071 W 52ND ST,LOS ANGELES,CA,90062,33.995044,-118.316025
4,7416002028,1/5/2018,Single Family,649 N FRIGATE AVE,LOS ANGELES,CA,90744,33.777863,-118.278247
5,2162005011,11/10/2018,Single Family,5257 N NEWCASTLE AVE,ENCINO,CA,91316,34.166353,-118.524347
6,6006027004,11/6/2018,Single Family,500 E 59TH PL,LOS ANGELES,CA,90003,33.986083,-118.266182
7,5460010052,11/21/2018,Single Family,2409 N YORKSHIRE DR,LOS ANGELES,CA,90065,34.110697,-118.230535
8,2524021020,1/4/2018,Single Family,13279 W VAUGHN ST,LOS ANGELES,CA,91340,34.281704,-118.422398
9,2425016027,2/2/2018,Single Family,3548 N MULTIVIEW DR,LOS ANGELES,CA,90068,34.130287,-118.360437
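
The `read_csv` call above relies on the CSV export address of a Google Sheets tab. As a minimal sketch, that address can be assembled from the spreadsheet id and the tab `gid` that appear in the cell above. The helper name `sheet_csv_url` is a hypothetical convenience introduced here, not part of pandas.

```python
def sheet_csv_url(sheet_id, gid):
    """Build the CSV export URL for one tab of a Google Sheets document."""
    return (
        'https://docs.google.com/spreadsheets/d/{}/export?format=csv&gid={}'
        .format(sheet_id, gid)
    )

# Ids copied from the URL used in the cell above (the 100-row sheet)
url_100_rows = sheet_csv_url('1J6PCrub6X1MuFwt6ZSxBx0Jv-McbnOuCQIZzKJdeK7I', '393329626')
print(url_100_rows)
```

Keeping the id and `gid` separate makes it easier to switch between the 100-row sheet and the full sheet without editing a long URL by hand.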


<h2>Step 2: Exploring and Clustering Foreclosed Addresses (2018) in Los Angeles</h2>

<h3>First, get lat/lon of Los Angeles</h3>

In [2]:
# Use geopy library to get the latitude and longitude values of Los Angeles
from geopy.geocoders import Nominatim 
#Nominatim is a tool to search OSM (open street map) data by name and address and to generate synthetic addresses of OSM points (reverse geocoding).
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium # folium is mapping tool
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

In [3]:
address = 'Los Angeles, California'
geolocator = Nominatim(user_agent="Los Angeles")
location = geolocator.geocode(address)
latitude_LA = location.latitude
longitude_LA = location.longitude
print('The geographical coordinates of Los Angeles are {}, {}.'.format(latitude_LA, longitude_LA))

The geographical coordinates of Los Angeles are 34.0536909, -118.2427666.


<h3>Show Map of Los Angeles including 100 foreclosure addresses from 2018</h3>

In [4]:
map_LA = folium.Map(location=[latitude_LA, longitude_LA], zoom_start=10)

# add markers to map
for lat, lng, address, zipcode in zip(fc_data['Latitude'], fc_data['Longitude'], fc_data['Address'], fc_data['Zip']):
    label = '{}, {}'.format(zipcode, address)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_LA)  
    
map_LA

<h3>Define Foursquare Credentials and Version</h3>

In [5]:
CLIENT_ID = "RBX2RESNE1HWSSGX020MOSI4CTL4SIP4QQPZFGJ2LF4EBOMX"  #Foursquare ID
CLIENT_SECRET = "YBZIPEL5GKJX2T13SPD5TDM4ZFXRV51SO2WDNKEGUTSMIWIU"
# was CLIENT_SECRET = "H2LKEFPFFGH4L3WLNDFLPVF0O11S55UMOAIUEX0S4MHH3GAV"  #Foursquare Secret

VERSION = "20180605"  #API Version

print("Your Credentials:")
print("CLIENT ID: " + CLIENT_ID)
print("CLIENT SECRET: " + CLIENT_SECRET)

Your Credentials:
CLIENT ID: RBX2RESNE1HWSSGX020MOSI4CTL4SIP4QQPZFGJ2LF4EBOMX
CLIENT SECRET: YBZIPEL5GKJX2T13SPD5TDM4ZFXRV51SO2WDNKEGUTSMIWIU
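
Hard-coding and printing API credentials in a shared notebook exposes them to every reader. One possible alternative, sketched below, reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and both helper functions, are assumptions made for this sketch; they are not defined by Foursquare or by this notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing))

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Example with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT ID: ' + mask(client_id))
```

In a real run the function would be called without arguments so that it reads `os.environ`.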


In [6]:
#  radius is in meters and limit is the number of venues to get
radius=500
LIMIT=100

<h3>Explore the first zip in the Dataframe.</h3>

In [7]:
# Get the first address zip
fc_data.loc[0, 'Zip']

90044

<h3>Get the address latitude and longitude values.</h3>

In [8]:

address_latitude = fc_data.loc[0, 'Latitude'] # address latitude value
address_longitude = fc_data.loc[0, 'Longitude'] # address longitude value
address_zip = fc_data.loc[0, 'Zip'] # address Zip

print('Latitude and longitude values of {} are {}, {}.'.format(address_zip, 
                                                               address_latitude, 
                                                               address_longitude))

Latitude and longitude values of 90044 are 33.985025, -118.29654.


<h3>Up to 100 Los Angeles venues within a 500 meter radius of the first address (zip 90044).</h3>

In [9]:
#First, let's create the GET request URL.
# radius = 500
# LIMIT = 100

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    address_latitude,
    address_longitude, 
    radius, 
    LIMIT)
    
url  #display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=RBX2RESNE1HWSSGX020MOSI4CTL4SIP4QQPZFGJ2LF4EBOMX&client_secret=YBZIPEL5GKJX2T13SPD5TDM4ZFXRV51SO2WDNKEGUTSMIWIU&v=20180605&ll=33.985025,-118.29654&radius=500&limit=100'
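
The request URL above is built by string formatting. A minimal alternative sketch uses `urllib.parse.urlencode` from the standard library, which escapes parameter values and avoids the stray `&` after the `?`. The function name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore request URL from its individual parameters."""
    params = [
        ('client_id', client_id),
        ('client_secret', client_secret),
        ('v', version),
        ('ll', '{},{}'.format(lat, lng)),
        ('radius', radius),
        ('limit', limit),
    ]
    # safe=',' keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

# Placeholder credentials; the coordinates are those of the first foreclosure address
demo_url = build_explore_url('ID', 'SECRET', '20180605', 33.985025, -118.29654)
print(demo_url)
```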

In [12]:
#Send the GET Request and examine the results
import requests

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cf4932e1ed21914bf066cd0'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4fcaab5ee4b0d8a39149da92-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fishandchips_',
          'suffix': '.png'},
         'id': '4edd64a0c7ddd24ca188df1a',
         'name': 'Fish & Chips Shop',
         'pluralName': 'Fish & Chips Shops',
         'primary': True,
         'shortName': 'Fish & Chips'}],
       'id': '4fcaab5ee4b0d8a39149da92',
       'location': {'address': '5950 S Normandie Ave',
        'cc': 'US',
        'city': 'Los Angeles',
        'country': 'United States',
        'crossStreet': 'Normandie & 59th',
        'distance': 369,
        'formattedAddress': ['5950 S Normandie Ave (Normandie & 59th)',
         'Los Angeles, CA 9004

<h3>Get the category of each Los Angeles venue</h3>

In [10]:
# function that extracts the category of the venue
def get_category_type(row):
    # raw venue dicts use 'categories'; rows from json_normalize use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

<h3>Clean the json and structure it into a pandas dataframe.</h3>

In [13]:
import json # library to handle JSON files
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,east west fish market,Fish & Chips Shop,33.9866,-118.300067
1,Don Lenchos Restaurant,Restaurant,33.983687,-118.300546
2,Happy Laundry 24hrs,Laundromat,33.986883,-118.300362
3,Amapulapa,Latin American Restaurant,33.987215,-118.300351
4,Normandie / Gage,Bus Stop,33.98214,-118.30017


In [14]:
# And how many venues were returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.
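
The 500 meter radius can be sanity-checked against the coordinates in the table above. The sketch below uses the haversine formula (a standard great-circle approximation, not something this notebook or Foursquare necessarily uses) to estimate the distance from the first foreclosure address to the first venue returned for it. The function name `haversine_m` and the Earth radius value are assumptions of this sketch.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# Foreclosure address (row 0 of fc_data) and the first venue returned for it
fc_lat, fc_lng = 33.985025, -118.29654
venue_lat, venue_lng = 33.9866, -118.300067

distance = haversine_m(fc_lat, fc_lng, venue_lat, venue_lng)
print('{:.0f} m from the address; inside 500 m radius: {}'.format(distance, distance <= 500))
```

The estimate should land close to the `'distance': 369` field that the raw response reports for the same venue, which suggests that field is measured from the queried point.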


<h3>Explore Los Angeles addresses 100 rows at a time. Repeat the same process for all 2018 foreclosed addresses in Los Angeles.</h3>

In [15]:
def getNearbyVenues(address, city, zipcode, latitude, longitude, radius=500):
    
    venues_list=[]
    for addr, cty, zcode, lat, lng in zip(address, city, zipcode, latitude, longitude):
        print(addr)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            addr,
            zcode,
            lat, 
            lng,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['FC Address', 
                  'FC Zip',
                  'FC Latitude',
                  'FC Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
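
`getNearbyVenues` indexes straight into the response, so an error response or a venue without categories would raise an exception partway through the loop. The following is a minimal sketch of a more tolerant parsing step, assuming the response shape shown in the raw JSON earlier. The helper name `extract_venue_rows` and the sample payload are made up for illustration.

```python
def extract_venue_rows(payload, addr, zcode, lat, lng):
    """Flatten one explore response into row tuples, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Venues without a category get None instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((addr, zcode, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-made payload in the same shape as the real response (values are illustrative)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'location': {'lat': 33.98, 'lng': -118.30},
               'categories': [{'name': 'Bus Stop'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 33.99, 'lng': -118.31},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample, '1217 W 60TH PL', 90044, 33.985025, -118.29654)
# A response without a 'response' key yields no rows instead of a KeyError
empty = extract_venue_rows({'meta': {'code': 400}}, '1217 W 60TH PL', 90044, 33.985025, -118.29654)
print(len(rows), len(empty))
```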

<h3>Create code to run the above function on each foreclosed address and create a new dataframe called LA_venues.</h3>

In [16]:
LA_venues = getNearbyVenues(address=fc_data['Address'],
                                   city=fc_data['City'],
                                   zipcode=fc_data['Zip'],
                                   latitude=fc_data['Latitude'],
                                   longitude=fc_data['Longitude']
                                  )
# LA_venues.head(25)

1217 W 60TH PL
20519 W HATTERAS ST
2801 N MARSH ST
2071 W 52ND ST
649 N FRIGATE AVE
5257 N NEWCASTLE AVE
500 E 59TH PL
2409 N YORKSHIRE DR
13279 W VAUGHN ST
3548 N MULTIVIEW DR
4758 N SUNNYSLOPE AVE
901 N STRADA VECCHIA ROAD
9847 W PORTOLA DR
5869 W DAUPHIN AVE
6951 N ETIWANDA AVE
15468 W MORRISON ST
14235 W VANOWEN ST
213 N TIGERTAIL ROAD
1021 W LOWEN ST
2038 S MALCOLM AVE
15935 W COMMUNITY ST
6901 W CAHUENGA PARK TR
1027 S CENTRE ST
23222 W AETNA ST
13640 W GARBER ST
5211 N DARRO ROAD
20321 W VIA SANSOVINO
1916 N HANCOCK ST
929 S MUIRFIELD RD
10601 W WILSHIRE BLVD 1-72
7217 S DENKER AVE
12383 W COVELLO ST
7014 N COLBATH AVE
3749 N GLENALBYN DR
1334 E 57TH ST
11622 W KESWICK ST
9470 S HOBART BLVD
2568 S MILITARY AVE
635 S BYNNER DR
1240 N RAVENNA AVE
1871 W 25TH ST
2651 N ABERDEEN AVE
7129 W HILLROSE ST
19501 W ROMAR ST
1445  YOSEMITE DR
14325 W TORI CT
4868 N ADELE CT
1916 W WHITMORE AVE
11846 S ORCHARD AVE
727 N ALTA VISTA BLVD
11221 N AMBOY AVE
17169 W STARE ST
6242 W TEMPLE HILL D

<h3>Check the size of the resulting dataframe</h3>

In [17]:
print(LA_venues.shape)
LA_venues.head()

(969, 8)


Unnamed: 0,FC Address,FC Zip,FC Latitude,FC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1217 W 60TH PL,90044,33.985025,-118.29654,east west fish market,33.9866,-118.300067,Fish & Chips Shop
1,1217 W 60TH PL,90044,33.985025,-118.29654,Don Lenchos Restaurant,33.983687,-118.300546,Restaurant
2,1217 W 60TH PL,90044,33.985025,-118.29654,Happy Laundry 24hrs,33.986883,-118.300362,Laundromat
3,1217 W 60TH PL,90044,33.985025,-118.29654,Amapulapa,33.987215,-118.300351,Latin American Restaurant
4,1217 W 60TH PL,90044,33.985025,-118.29654,Normandie / Gage,33.98214,-118.30017,Bus Stop


<h3>Check how many venues were returned for each LA Foreclosed (FC) Address</h3>

In [18]:
LA_venues.groupby('FC Address').count()

Unnamed: 0_level_0,FC Zip,FC Latitude,FC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
FC Address,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1000 E 74TH ST,4,4,4,4,4,4,4
10000 N OWENSMOUTH AVE,12,12,12,12,12,12,12
1021 W LOWEN ST,4,4,4,4,4,4,4
1027 S CENTRE ST,24,24,24,24,24,24,24
10650 W CHIQUITA ST,12,12,12,12,12,12,12
1077 S RIMPAU BLVD,6,6,6,6,6,6,6
11163 N TELFAIR AVE,2,2,2,2,2,2,2
11221 N AMBOY AVE,6,6,6,6,6,6,6
11407 N SANTINI LANE,3,3,3,3,3,3,3
11622 W KESWICK ST,14,14,14,14,14,14,14


<h3>Find out how many unique categories can be curated from all the returned LA venues</h3>

In [19]:
print('There are {} unique categories.'.format(len(LA_venues['Venue Category'].unique())))

There are 215 unique categories.


<h3>Analyze Each LA Foreclosed Address</h3>

In [20]:
# one hot encoding
LA_onehot = pd.get_dummies(LA_venues[['Venue Category']], prefix="", prefix_sep="")

# add the foreclosure address column back to the dataframe
LA_onehot['FC Address'] = LA_venues['FC Address'] 

# move the address column to the first position
fixed_columns = [LA_onehot.columns[-1]] + list(LA_onehot.columns[:-1])
LA_onehot = LA_onehot[fixed_columns]

LA_onehot.head()

Unnamed: 0,FC Address,ATM,Acupuncturist,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Yoga Studio,Zoo Exhibit
0,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1217 W 60TH PL,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<h3>What is the new dataframe size?</h3>

In [21]:
LA_onehot.shape

(969, 216)

<h3>Group rows by LA FC address, taking the mean frequency of occurrence of each category.</h3>

In [22]:
LA_grouped = LA_onehot.groupby('FC Address').mean().reset_index()
LA_grouped

Unnamed: 0,FC Address,ATM,Acupuncturist,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Yoga Studio,Zoo Exhibit
0,1000 E 74TH ST,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
1,10000 N OWENSMOUTH AVE,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
2,1021 W LOWEN ST,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
3,1027 S CENTRE ST,0.000000,0.00000,0.000000,0.000000,0.000000,0.041667,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.041667,0.000000,0.0
4,10650 W CHIQUITA ST,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
5,1077 S RIMPAU BLVD,0.000000,0.00000,0.166667,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
6,11163 N TELFAIR AVE,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
7,11221 N AMBOY AVE,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
8,11407 N SANTINI LANE,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0
9,11622 W KESWICK ST,0.000000,0.00000,0.000000,0.071429,0.000000,0.000000,0.000000,0.071429,0.000000,...,0.000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,0.000000,0.000000,0.0


<h3>Confirm the new size</h3>

In [23]:
LA_grouped.shape

(93, 216)
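
Taking the mean of one-hot rows per address amounts to computing, for each address, the share of its venues that fall in each category. The plain-Python sketch below shows the same arithmetic on made-up data. The addresses, the categories and the function name `category_frequencies` are hypothetical.

```python
from collections import Counter, defaultdict

def category_frequencies(pairs):
    """Map each address to {category: share of that address's venues}."""
    counts = defaultdict(Counter)
    for address, category in pairs:
        counts[address][category] += 1
    return {
        address: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for address, counter in counts.items()
    }

# Made-up (address, venue category) pairs for illustration
pairs = [
    ('ADDRESS A', 'Park'),
    ('ADDRESS A', 'Park'),
    ('ADDRESS A', 'Café'),
    ('ADDRESS A', 'Bank'),
    ('ADDRESS B', 'Taco Place'),
]

freqs = category_frequencies(pairs)
print(freqs['ADDRESS A'])
```

Unlike the pandas result, this sketch simply omits categories an address does not have rather than storing them as 0.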

<h3>Print each LA FC address along with the top 5 most common LA venues</h3>

In [24]:
num_top_venues = 5

for address in LA_grouped['FC Address']:
    print("----"+address+"----")
    temp = LA_grouped[LA_grouped['FC Address'] == address].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1000 E 74TH ST----
                        venue  freq
0          Seafood Restaurant  0.25
1  Construction & Landscaping  0.25
2          Mexican Restaurant  0.25
3                        Bank  0.25
4                   Pet Store  0.00


----10000 N OWENSMOUTH AVE----
                  venue  freq
0        Sandwich Place  0.08
1  Fast Food Restaurant  0.08
2            Smoke Shop  0.08
3           Coffee Shop  0.08
4    Persian Restaurant  0.08


----1021 W LOWEN ST----
                  venue  freq
0        Baseball Field  0.25
1   Fried Chicken Joint  0.25
2  Gym / Fitness Center  0.25
3            Hobby Shop  0.25
4                   ATM  0.00


----1027 S CENTRE ST----
             venue  freq
0  Thai Restaurant  0.08
1        Rock Club  0.04
2            Hotel  0.04
3             Café  0.04
4       Sports Bar  0.04


----10650 W CHIQUITA ST----
                venue  freq
0  Italian Restaurant  0.17
1                Park  0.17
2                 Bar  0.08
3        Dance Studio  

<h3>Put above into a pandas dataframe</h3>

In [25]:
# First write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
num_top_venues = 10
import numpy as np # library to handle data in a vectorized manner

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['FC Address']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
fcaddress_venues_sorted = pd.DataFrame(columns=columns)
fcaddress_venues_sorted['FC Address'] = LA_grouped['FC Address']

for ind in np.arange(LA_grouped.shape[0]):
    fcaddress_venues_sorted.iloc[ind, 1:] = return_most_common_venues(LA_grouped.iloc[ind, :], num_top_venues)

fcaddress_venues_sorted.head()

Unnamed: 0,FC Address,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1000 E 74TH ST,Bank,Mexican Restaurant,Construction & Landscaping,Seafood Restaurant,Farmers Market,Football Stadium,Food Truck,Food Stand,Food Court,Food
1,10000 N OWENSMOUTH AVE,Vegetarian / Vegan Restaurant,Pet Store,Salon / Barbershop,Coffee Shop,Fast Food Restaurant,Train Station,Persian Restaurant,Sandwich Place,Bus Station,Japanese Restaurant
2,1021 W LOWEN ST,Gym / Fitness Center,Hobby Shop,Baseball Field,Fried Chicken Joint,Farm,Football Stadium,Food Truck,Food Stand,Food Court,Food
3,1027 S CENTRE ST,Thai Restaurant,Brewery,Sandwich Place,Taco Place,Sushi Restaurant,Steakhouse,Sports Bar,Mexican Restaurant,Seafood Restaurant,Café
4,10650 W CHIQUITA ST,Italian Restaurant,Park,Gym,Food Truck,Breakfast Spot,General Entertainment,Bar,Discount Store,Ramen Restaurant,Dance Studio
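
Addresses with fewer than ten distinct categories are padded in the table above with categories whose frequency is zero. For example, 1000 E 74TH ST has four categories at 0.25 each in the earlier printout, yet it still lists ten "most common" venues. One possible way to avoid reading meaning into that padding is to keep only categories that actually occur. The helper `top_venues` below is a sketch of that idea, not the method used in this notebook.

```python
def top_venues(freqs, n=10):
    """Return up to n category names with non-zero frequency, most frequent first.

    Ties are broken alphabetically so the result is reproducible.
    """
    present = [(cat, f) for cat, f in freqs.items() if f > 0]
    present.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in present[:n]]

# Frequencies for 1000 E 74TH ST as printed earlier, plus two categories
# the address does not have (frequency 0)
freqs_1000_e_74th = {
    'Seafood Restaurant': 0.25,
    'Construction & Landscaping': 0.25,
    'Mexican Restaurant': 0.25,
    'Bank': 0.25,
    'Pet Store': 0.0,
    'Farmers Market': 0.0,
}
print(top_venues(freqs_1000_e_74th))
```

With this filter the address would report four venues instead of ten, so downstream tables would need to tolerate rows of different lengths.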


<h3>Cluster FC Addresses: run k-means to cluster the FC addresses into 5 clusters.</h3>

In [27]:
# set number of clusters
kclusters = 5

LA_grouped_clustering = LA_grouped.drop('FC Address', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(LA_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

# my addition, not necessary, kmeans.labels_ = kmeans.labels_.astype('int32') 

array([0, 0, 0, 0, 0, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 4,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0,
       0, 0, 0, 0, 0, 3, 0, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1], dtype=int32)
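
Before interpreting the clusters it helps to check how many addresses fall into each one. The sketch below tallies the label array printed above using only the standard library; the labels are copied from that output.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above
labels = [0, 0, 0, 0, 0, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 4,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0,
          0, 0, 0, 0, 0, 3, 0, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          1]

sizes = Counter(labels)
for cluster, size in sorted(sizes.items()):
    print('cluster {}: {} addresses ({:.0%})'.format(cluster, size, size / len(labels)))
```

Label 0 holds the large majority of addresses, while the other four labels cover only two to four addresses each, so any venue pattern read from the small clusters rests on very few observations.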

In [28]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each address.
# add clustering labels
fcaddress_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
LA_merged = fc_data

# join the sorted venues onto fc_data so each address keeps its latitude/longitude;
# addresses with no venues get NaN, which is why 'Cluster Labels' is stored as float
LA_merged = LA_merged.join(fcaddress_venues_sorted.set_index('FC Address'), on='Address')
LA_merged.head(20) # check the last columns!

Unnamed: 0,APN,RegisteredDate,PropertyType,Address,City,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,6003016019,10/31/2018,Multi-Family,1217 W 60TH PL,LOS ANGELES,CA,90044,33.985025,-118.29654,0.0,Bus Stop,Restaurant,Food,Latin American Restaurant,Laundromat,Fish & Chips Shop,Zoo Exhibit,Farmers Market,Football Stadium,Food Truck
1,2151015040,11/1/2018,Single Family,20519 W HATTERAS ST,LOS ANGELES,CA,91367,34.175498,-118.579552,0.0,Pool,Business Service,Farmers Market,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop
2,5442030015,11/5/2018,Single Family,2801 N MARSH ST,LOS ANGELES,CA,90039,34.104301,-118.248531,0.0,Furniture / Home Store,Food Truck,Brewery,Dance Studio,Café,Sandwich Place,Park,Zoo Exhibit,Farmers Market,Food Stand
3,5015020012,1/17/2018,Multi-Family,2071 W 52ND ST,LOS ANGELES,CA,90062,33.995044,-118.316025,0.0,Playground,Zoo Exhibit,Farm,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop
4,7416002028,1/5/2018,Single Family,649 N FRIGATE AVE,LOS ANGELES,CA,90744,33.777863,-118.278247,0.0,Donut Shop,Burger Joint,Market,Food Truck,Mobile Phone Shop,Mexican Restaurant,Food,Boat or Ferry,Grocery Store,Seafood Restaurant
5,2162005011,11/10/2018,Single Family,5257 N NEWCASTLE AVE,ENCINO,CA,91316,34.166353,-118.524347,0.0,Video Store,Italian Restaurant,Bakery,Discount Store,Pet Store,Convenience Store,Eastern European Restaurant,Chinese Restaurant,Shoe Store,Middle Eastern Restaurant
6,6006027004,11/6/2018,Single Family,500 E 59TH PL,LOS ANGELES,CA,90003,33.986083,-118.266182,4.0,Taco Place,Food,Shopping Mall,Zoo Exhibit,Fried Chicken Joint,Football Stadium,Food Truck,Food Stand,Food Court,Fish & Chips Shop
7,5460010052,11/21/2018,Single Family,2409 N YORKSHIRE DR,LOS ANGELES,CA,90065,34.110697,-118.230535,0.0,Scenic Lookout,Salad Place,Music Venue,Café,Grocery Store,Bar,Zoo Exhibit,Dry Cleaner,Dumpling Restaurant,Discount Store
8,2524021020,1/4/2018,Single Family,13279 W VAUGHN ST,LOS ANGELES,CA,91340,34.281704,-118.422398,0.0,Convenience Store,Taco Place,Mexican Restaurant,Athletics & Sports,Other Repair Shop,Zoo Exhibit,Fast Food Restaurant,Football Stadium,Food Truck,Food Stand
9,2425016027,2/2/2018,Single Family,3548 N MULTIVIEW DR,LOS ANGELES,CA,90068,34.130287,-118.360437,0.0,Italian Restaurant,Marijuana Dispensary,Food Truck,Café,Mexican Restaurant,Mediterranean Restaurant,Scenic Lookout,Yoga Studio,Drugstore,Financial or Legal Service


<h3>Visualize the resulting clusters</h3>

In [39]:
# create map
import math
map_clusters = folium.Map(location=[latitude_LA, longitude_LA], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = [] 
for lat, lon, poi, cluster in zip(LA_merged['Latitude'], LA_merged['Longitude'], LA_merged['Address'], LA_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
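    # addresses with no venues have no cluster (NaN after the join); they are drawn as cluster 0
    if math.isnan(cluster): cluster = 0
    # the join turned the labels into floats, so convert back to int for indexing
    if isinstance(cluster, float): cluster = int(cluster)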
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[(cluster-1)], 
        fill=True,
        fill_color=rainbow[(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
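
The map code treats addresses without a cluster label (NaN) as cluster 0, which makes them indistinguishable from genuine cluster 0 members. A possible alternative, sketched here, gives unclustered addresses their own neutral color. The helper `marker_color`, the gray fallback and the five-color palette are assumptions for illustration. Unlike the cell above, this sketch indexes the palette by the label itself rather than by `cluster-1`.

```python
import math

def marker_color(cluster, palette, fallback='#808080'):
    """Pick a palette color for a cluster label; unclustered (NaN/None) gets the fallback."""
    if cluster is None or (isinstance(cluster, float) and math.isnan(cluster)):
        return fallback
    return palette[int(cluster) % len(palette)]

# Hypothetical five-color palette standing in for the `rainbow` list
palette = ['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']

print(marker_color(0.0, palette))
print(marker_color(float('nan'), palette))
```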

<h3>Examine each cluster and determine the discriminating venue categories that distinguish each cluster.</h3>

<h3>Cluster 1</h3>

In [40]:
LA_merged.loc[LA_merged['Cluster Labels'] == 0, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1217 W 60TH PL,CA,90044,33.985025,-118.296540,0.0,Bus Stop,Restaurant,Food,Latin American Restaurant,Laundromat,Fish & Chips Shop,Zoo Exhibit,Farmers Market,Football Stadium,Food Truck
1,20519 W HATTERAS ST,CA,91367,34.175498,-118.579552,0.0,Pool,Business Service,Farmers Market,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop
2,2801 N MARSH ST,CA,90039,34.104301,-118.248531,0.0,Furniture / Home Store,Food Truck,Brewery,Dance Studio,Café,Sandwich Place,Park,Zoo Exhibit,Farmers Market,Food Stand
3,2071 W 52ND ST,CA,90062,33.995044,-118.316025,0.0,Playground,Zoo Exhibit,Farm,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop
4,649 N FRIGATE AVE,CA,90744,33.777863,-118.278247,0.0,Donut Shop,Burger Joint,Market,Food Truck,Mobile Phone Shop,Mexican Restaurant,Food,Boat or Ferry,Grocery Store,Seafood Restaurant
5,5257 N NEWCASTLE AVE,CA,91316,34.166353,-118.524347,0.0,Video Store,Italian Restaurant,Bakery,Discount Store,Pet Store,Convenience Store,Eastern European Restaurant,Chinese Restaurant,Shoe Store,Middle Eastern Restaurant
7,2409 N YORKSHIRE DR,CA,90065,34.110697,-118.230535,0.0,Scenic Lookout,Salad Place,Music Venue,Café,Grocery Store,Bar,Zoo Exhibit,Dry Cleaner,Dumpling Restaurant,Discount Store
8,13279 W VAUGHN ST,CA,91340,34.281704,-118.422398,0.0,Convenience Store,Taco Place,Mexican Restaurant,Athletics & Sports,Other Repair Shop,Zoo Exhibit,Fast Food Restaurant,Football Stadium,Food Truck,Food Stand
9,3548 N MULTIVIEW DR,CA,90068,34.130287,-118.360437,0.0,Italian Restaurant,Marijuana Dispensary,Food Truck,Café,Mexican Restaurant,Mediterranean Restaurant,Scenic Lookout,Yoga Studio,Drugstore,Financial or Legal Service
10,4758 N SUNNYSLOPE AVE,CA,91423,34.157384,-118.426876,0.0,Clothing Store,ATM,Pet Store,Smoke Shop,Chinese Restaurant,Coffee Shop,Bakery,Gym / Fitness Center,Fabric Shop,Pilates Studio


<h3>Cluster 2</h3>

In [35]:
LA_merged.loc[LA_merged['Cluster Labels'] == 1, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,9847 W PORTOLA DR,CA,90210,34.110322,-118.433459,1.0,Trail,Farm,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop,Financial or Legal Service
47,1916 W WHITMORE AVE,CA,90039,34.095644,-118.251188,1.0,Trail,Liquor Store,Music Venue,Food Truck,Pilates Studio,Home Service,Basketball Court,Farm,Food Stand,Food Court
70,11407 N SANTINI LANE,CA,91326,34.278566,-118.587486,1.0,Trail,Pool,Dessert Shop,Farm,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop
79,20628 W COMO LANE,CA,91326,34.277867,-118.582721,1.0,Scenic Lookout,Trail,Pool,Diner,Farm,Football Stadium,Food Truck,Food Stand,Food Court,Food


<h3>Cluster 3</h3>

In [36]:
LA_merged.loc[LA_merged['Cluster Labels'] == 2, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,7129 W HILLROSE ST,CA,91042,34.262797,-118.286691,2.0,Construction & Landscaping,Zoo Exhibit,Farm,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop
61,13375 N PHILLIPPI AVE,CA,91342,34.315109,-118.449756,2.0,Construction & Landscaping,Home Service,Zoo Exhibit,Farm,Football Stadium,Food Truck,Food Stand,Food Court,Food,Fish & Chips Shop


<h3>Cluster 4</h3>

In [37]:
LA_merged.loc[LA_merged['Cluster Labels'] == 3, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,6901 W CAHUENGA PARK TR,CA,90068,34.123224,-118.345339,3.0,Art Gallery,Piano Bar,Rock Club,Marijuana Dispensary,Zoo Exhibit,Farmers Market,Football Stadium,Food Truck,Food Stand,Food Court
87,7241 W WOODROW WILSON DR,CA,90068,34.124095,-118.35024,3.0,Art Gallery,Rock Club,Zoo Exhibit,Farm,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food


<h3>Cluster 5</h3>

In [38]:
LA_merged.loc[LA_merged['Cluster Labels'] == 4, LA_merged.columns[[3] + list(range(5, LA_merged.shape[1]))]]

Unnamed: 0,Address,State,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,500 E 59TH PL,CA,90003,33.986083,-118.266182,4.0,Taco Place,Food,Shopping Mall,Zoo Exhibit,Fried Chicken Joint,Football Stadium,Food Truck,Food Stand,Food Court,Fish & Chips Shop
24,13640 W GARBER ST,CA,91331,34.248274,-118.430194,4.0,Taco Place,Locksmith,Zoo Exhibit,Farmers Market,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Food
53,11163 N TELFAIR AVE,CA,91331,34.272738,-118.435245,4.0,Food,Shopping Mall,Zoo Exhibit,Farmers Market,French Restaurant,Football Stadium,Food Truck,Food Stand,Food Court,Fish & Chips Shop


<h1>Results</h1>

1.) Few LA parks or common retail stores appear near foreclosed properties, as the clustered results above show.
<OL>
<LI>Cluster 1: Of the first five most common venues near 84 foreclosed properties, only 5 are parks and only 7 are retail stores or banks.</LI>
<LI>Cluster 2: Of the first five most common venues near 4 foreclosed properties, none are parks or retail stores.</LI>
<LI>Cluster 3: Of the first five most common venues near 2 foreclosed properties, none are parks or retail stores.</LI>
<LI>Cluster 4: Of the first five most common venues near 2 foreclosed properties, none are parks or retail stores.</LI>
<LI>Cluster 5: Of the first five most common venues near 3 foreclosed properties, none are parks and one is a retail store.</LI>
</OL>
2.) LA cafes, sit-down restaurants and grocery stores are moderately present near foreclosed properties.
<OL>
<LI>Cluster 1: Of the first five most common venues near 84 foreclosed properties, at least one venue per property is a restaurant, cafe or grocery store.</LI>
<LI>Cluster 2: Of the first five most common venues near 4 foreclosed properties, two are farms and two are restaurants.</LI>
<LI>Cluster 3: Of the first five most common venues near 2 foreclosed properties, two are farms.</LI>
<LI>Cluster 4: Of the first five most common venues near 2 foreclosed properties, one is a restaurant.</LI>
<LI>Cluster 5: Of the first five most common venues near 3 foreclosed properties, none are parks and one is a retail store.</LI>
</OL>
3.) Los Angeles fast food venues of all types, including burger joints, taco joints and food trucks, dominate the locations of foreclosed properties.
<OL>
<LI>Cluster 1: Of the first five most common venues near 84 foreclosed properties, at least one is a burger joint, food truck, taco stand or other fast food venue.</LI>
<LI>Cluster 2: Of the first seven most common venues near 4 foreclosed properties, the same holds.</LI>
<LI>Cluster 3: Of the first six most common venues near 2 foreclosed properties, at least two are food trucks.</LI>
<LI>Cluster 4: Of the first eight most common venues near 2 foreclosed properties, at least three are food trucks and one is a marijuana dispensary.</LI>
<LI>Cluster 5: Of the first nine most common venues near 3 foreclosed properties, thirteen are fast food venues.</LI>
</OL>
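
Tallies like the ones listed above can also be produced programmatically rather than by reading the tables. The sketch below is one possible approach and not necessarily how the counts in this section were obtained: `GROUPS`, `tally_groups` and the assignment of Foursquare categories to groups are hypothetical choices, and the sample rows are copied from the first five venue columns of the Cluster 3 and Cluster 5 tables (labels 2.0 and 4.0). It counts every occurrence of a category, so its totals can differ from a hand count that counts distinct venue types.

```python
import pandas as pd

# Hypothetical grouping of Foursquare category names (an assumption).
GROUPS = {
    'park_or_retail': {'Park', 'Playground', 'Shopping Mall',
                       'Clothing Store', 'Discount Store'},
    'fast_food': {'Food Truck', 'Food Stand', 'Taco Place', 'Burger Joint',
                  'Fast Food Restaurant', 'Fried Chicken Joint', 'Donut Shop'},
}


def tally_groups(df, top_n=5, groups=GROUPS):
    """Count, per cluster, how many top-N venue slots fall in each group."""
    # Relies on the venue columns being ordered 1st, 2nd, 3rd, ...
    venue_cols = [c for c in df.columns if c.endswith('Most Common Venue')][:top_n]
    long = df.melt(id_vars=['Cluster Labels'], value_vars=venue_cols,
                   value_name='Venue')
    out = pd.DataFrame({'properties': df.groupby('Cluster Labels').size()})
    for name, members in groups.items():
        hits = long[long['Venue'].isin(members)]
        out[name] = hits.groupby('Cluster Labels').size()
    # Clusters with no matching venues come back as NaN; report them as 0.
    return out.fillna(0).astype(int)


ordinals = ['1st', '2nd', '3rd', '4th', '5th']
rows = [
    (2.0, ['Construction & Landscaping', 'Zoo Exhibit', 'Farm',
           'French Restaurant', 'Football Stadium']),
    (2.0, ['Construction & Landscaping', 'Home Service', 'Zoo Exhibit',
           'Farm', 'Football Stadium']),
    (4.0, ['Taco Place', 'Food', 'Shopping Mall', 'Zoo Exhibit',
           'Fried Chicken Joint']),
    (4.0, ['Taco Place', 'Locksmith', 'Zoo Exhibit', 'Farmers Market',
           'French Restaurant']),
    (4.0, ['Food', 'Shopping Mall', 'Zoo Exhibit', 'Farmers Market',
           'French Restaurant']),
]
sample = pd.DataFrame(
    [[label] + venues for label, venues in rows],
    columns=['Cluster Labels'] + [f'{o} Most Common Venue' for o in ordinals],
)

result = tally_groups(sample)
print(result)
```

Applied to the full `LA_merged` frame with `top_n` set to the number of venue columns of interest, the same function would give one row per cluster, which makes it easier to compare clusters of very different sizes.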

<h1>Discussion</h1>

Based on my observations, clustering foreclosed properties leads to surprising results. For example, in one very small cluster of foreclosed properties the nearest venues are an art gallery, a piano bar and a rock club. Such a cluster could represent a kind of microclimate for foreclosures in an elite neighborhood. This suggests that stereotyping foreclosed properties will not work; real data needs to be studied to identify such microclimates of potential foreclosure risk. The results show that fast food venues are associated with foreclosures, but only in certain clusters, so that association should not be applied across the board. I predict machine learning will play a huge role in this area in the future, if it does not already.

<h1>Conclusion</h1>

While my results are intriguing and quite possibly accurate in indicating property foreclosure risk, I clearly need to evaluate many more data sets to show convincingly that Foursquare-type data can support real-time analysis of foreclosure trends throughout the world. I have 3000 more properties to include in the above analysis, along with downloads from other large cities in other states.