# The Battle of the Neighborhoods, but make it Miami
## IBM/Coursera Applied Data Science Capstone Project
#### Submitted by Carolina Belmonte

### Introduction

Before we dive in to what you're really here for, I wanted to share my thoughts on my journey through this course. I came in with basically zero knowledge of any of these concepts, and much worse...zero programming (read: Python) experience. I am very happy to report that this class has made me come a long way, but I am walking away (especially after all of the struggles it took get this capstone done!!) knowing where to focus myself - I look forward to deep diving into Python now that this course is coming to a close for me. I feel that to fully take advantage of the concepts & libraries presented in this course, I really need to sharpen my Python.
Thank you instructors & fellow students for an amazing journey!

##### Description of the problem and discussion of the background

<i>Miami, officially the City of Miami, is the seat of Miami-Dade County, and the cultural, economic and financial center of South Florida in the United States.</i> <br><br>
If you think I'm making up the bit above, I invite you to navigate to Google and enter "Miami, Florida" in the search bar :) As a Miami native, I've had the privilege of always being able to boast that I live where people vacation. 
<br>However, no place is perfect - Miami real estate is almost prohibitively expensive for your average citizen. According to <a href="https://www.zillow.com/miami-fl/home-values/">Zillow</a>: <i>The median home value in Miami is <u>366 519 USD</u>. Miami home values have gone up 0.6\% over the past year and Zillow predicts they will rise 3\% within the next year. The median list price per square foot in Miami is <u>421 USD</u>, which is higher than the Miami-Fort Lauderdale-West Palm Beach Metro average of <u>230 USD</u>. The median price of homes currently listed in Miami is <u>490 000 USD</u> while the median price of homes that sold is <u>337,800 USD</u>.</i> 

<br>This connundrum has left me wondering - for many years, in fact - how can we look for reasonably priced real estate in an area of Miami that still has a lot to offer?

##### Description of the data and how it will be used to solve the problem

Luckily, we live in a world (world-wide-web) full of open data sets. I knew I wanted to segment the city of Miami by its Zip Codes, so I first went on the hunt for a data set that contained all of the zip codes in Miami. I was able to find and download one from <a href="https://public.opendatasoft.com/explore/?sort=modified">ODS</a>. Next, I needed to locate data on property values in the City of Miami. This data was actually quite easy to locate and extract, as <a href="https://gis-mdc.opendata.arcgis.com/">Miami-Dade County (where the city sits) has an open data hub as well</a>. The final component would be data on nearby venues for each zip code. Based on the labs done with Foursquare, we will use the Foursquare API as the source of our venue data based on our zip codes. Putting all of these pieces together, we'll be able to explore Miami's zip codes to figure out where we can buy real estate sensibly but still have access to the amazing things Miami has to offer.

### Methodology

#### Let's start by downloading all the necessary libraries.

In [31]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# library to handle requests
import requests

In [2]:
# I've decided to import most of the visualization modules taught in this course in case they come in handy
%matplotlib inline 

import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib as mpl
import matplotlib.pyplot as plt

In [3]:
# Geo-visualization modules
!conda install -c conda-forge folium=0.5.0 --yes
import folium
from folium import plugins
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster

# Address to latitude, longitude values
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



In [4]:
# SciKit Learn libraries Required
from sklearn import preprocessing
from sklearn.cluster import KMeans

Data was downloaded from the websites linked in the Introduction. I downloaded my data sets as CSV files and saved them as assets in my IBM Watson Studion project for access within my notebook. Let's start by getting the zip code data loaded and cleaned. Below you will notice that we are leveraging the benefits of the Watson Studio environment as assets will be added to notebook cells and converted directly into a pandas dataframe.

In [5]:
import types
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_9ca09bc0f7bd41479471f07e1e7b78a1 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='vro_ahuDuNFghIl5yxoq1i3ZBqoeWt1lkLUQWm354Pq1',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_9ca09bc0f7bd41479471f07e1e7b78a1.get_object(Bucket='pythonbasicsfordatascienceproject-donotdelete-pr-0fgaed45xfn755',Key='us-zip-code-latitude-and-longitude.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# If you are reading an Excel file into a pandas DataFrame, replace `read_csv` by `read_excel` in the next statement.
df_data_0 = pd.read_csv(body)
df_data_0.head()


Unnamed: 0,Zip;City;State;Latitude;Longitude;Timezone;Daylight savings time flag;geopoint
33233;Miami;FL;25.558428;-80.458168;-5;1;25.558428,-80.458168
33119;Miami Beach;FL;25.784526;-80.131967;-5;1;25.784526,-80.131967
33296;Miami;FL;25.558428;-80.458168;-5;1;25.558428,-80.458168
33168;Miami;FL;25.892185;-80.21032;-5;1;25.892185,-80.21032
33148;Miami;FL;25.558428;-80.458168;-5;1;25.558428,-80.458168


That's a messy and un-usable dataframe - let's clean it up!

In [6]:
# first let's rescue the data from the clutches of the index column
miami_zips_df = df_data_0.reset_index()
miami_zips_df.head()

Unnamed: 0,index,Zip;City;State;Latitude;Longitude;Timezone;Daylight savings time flag;geopoint
0,33233;Miami;FL;25.558428;-80.458168;-5;1;25.55...,-80.458168
1,33119;Miami Beach;FL;25.784526;-80.131967;-5;1...,-80.131967
2,33296;Miami;FL;25.558428;-80.458168;-5;1;25.55...,-80.458168
3,33168;Miami;FL;25.892185;-80.21032;-5;1;25.892185,-80.21032
4,33148;Miami;FL;25.558428;-80.458168;-5;1;25.55...,-80.458168


In [7]:
# now we can split up the string to properly populate the data frame
miami_zips = miami_zips_df['index'].str.split(";")
data = miami_zips.to_list()
names = ["ZipCode", "City", "State", "Latitude", "Longitude", "Timezone", "DSTflag", "Geopoint"]
miami_df = pd.DataFrame(data, columns=names)

miami_df.head()

Unnamed: 0,ZipCode,City,State,Latitude,Longitude,Timezone,DSTflag,Geopoint
0,33233,Miami,FL,25.558428,-80.458168,-5,1,25.558428
1,33119,Miami Beach,FL,25.784526,-80.131967,-5,1,25.784526
2,33296,Miami,FL,25.558428,-80.458168,-5,1,25.558428
3,33168,Miami,FL,25.892185,-80.21032,-5,1,25.892185
4,33148,Miami,FL,25.558428,-80.458168,-5,1,25.558428


In [8]:
# we are only interested in the city of Miami, so let's drop any zip codes associated to other cities within Miami-Dade County
miami_final = miami_df[miami_df.City != 'Miami Beach']
miami_final = miami_final[miami_final.City != 'North Miami Beach']

miami_final['Latitude'] = miami_final['Latitude'].astype(float)
miami_final['Longitude'] = miami_final['Longitude'].astype(float)

miami_final.head()

Unnamed: 0,ZipCode,City,State,Latitude,Longitude,Timezone,DSTflag,Geopoint
0,33233,Miami,FL,25.558428,-80.458168,-5,1,25.558428
2,33296,Miami,FL,25.558428,-80.458168,-5,1,25.558428
3,33168,Miami,FL,25.892185,-80.21032,-5,1,25.892185
4,33148,Miami,FL,25.558428,-80.458168,-5,1,25.558428
5,33110,Miami,FL,25.846874,-80.20827,-5,1,25.846874


In [9]:
# I re-indexed my df so it would look pretty
miami_final.index = np.arange(1, len(miami_final) + 1)
miami_final

Unnamed: 0,ZipCode,City,State,Latitude,Longitude,Timezone,DSTflag,Geopoint
1,33233,Miami,FL,25.558428,-80.458168,-5,1,25.558428
2,33296,Miami,FL,25.558428,-80.458168,-5,1,25.558428
3,33168,Miami,FL,25.892185,-80.21032,-5,1,25.892185
4,33148,Miami,FL,25.558428,-80.458168,-5,1,25.558428
5,33110,Miami,FL,25.846874,-80.20827,-5,1,25.846874
6,33197,Miami,FL,25.558428,-80.458168,-5,1,25.558428
7,33183,Miami,FL,25.699968,-80.40811,-5,1,25.699968
8,33102,Miami,FL,25.558428,-80.458168,-5,1,25.558428
9,33165,Miami,FL,25.734828,-80.3583,-5,1,25.734828
10,33131,Miami,FL,25.767368,-80.1893,-5,1,25.767368


Now that we've gotten our zip codes, and corresponding lat/long values, sorted we can take a look at property data. Just like with the zip code data, I have hosted property data in my environment so I can insert the code into the cell below directly and covert my csv to a data frame.

In [10]:
body = client_9ca09bc0f7bd41479471f07e1e7b78a1.get_object(Bucket='pythonbasicsfordatascienceproject-donotdelete-pr-0fgaed45xfn755',Key='Property_Boundary_View.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# If you are reading an Excel file into a pandas DataFrame, replace `read_csv` by `read_excel` in the next statement.
df_data_0 = pd.read_csv(body)
df_data_0.head()

  interactivity=interactivity, compiler=compiler, result=result)


Unnamed: 0,OBJECTID,PID,FOLIO,TTRRSS,X_COORD,Y_COORD,TRUE_SITE_ADDR,TRUE_SITE_UNIT,TRUE_SITE_CITY,TRUE_SITE_ZIP_CODE,TRUE_MAILING_ADDR1,TRUE_MAILING_ADDR2,TRUE_MAILING_ADDR3,TRUE_MAILING_CITY,TRUE_MAILING_STATE,TRUE_MAILING_ZIP_CODE,TRUE_MAILING_COUNTRY,TRUE_OWNER1,TRUE_OWNER2,TRUE_OWNER3,CONDO_FLAG,PARENT_FOLIO,DOR_CODE_CUR,DOR_DESC,SUBDIVISION,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,FLOOR_COUNT,UNIT_COUNT,BUILDING_ACTUAL_AREA,BUILDING_HEATED_AREA,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR,DOS_1,PRICE_1,LEGAL,Shape__Area,Shape__Length
0,2001,113685,131120100000.0,534112,920990.6,552140.7,138 NE 82 TER,,Miami,33138-3709,5794 SW 40 ST #210,,,MIAMI,FL,33155,USA,ROYAL PALM GARDENS LLC,,,N,,802.0,MULTIFAMILY 2-9 UNITS : 2 LIVING UNITS,13112008.0,3.0,1.0,0.0,1.0,2.0,1326.0,1326.0,7750.0,1926.0,2020.0,124688.0,20111028.0,41000.0,ROYAL PALM GARDENS PB 7-71 LOT 12 BLK 6 LOT SI...,815.398438,134.828166
1,2002,40679,131120100000.0,534112,920942.5,552138.7,128 NE 82 TER,,Miami,33138-3709,5794 SW 40 ST #210,,,MIAMI,FL,33155,USA,ROYAL PALM GARDENS LLC,,,N,,1081.0,VACANT LAND - COMMERCIAL : VACANT LAND,13112008.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7750.0,0.0,2020.0,56732.0,20120615.0,27000.0,ROYAL PALM GARDENS PB 7-71 LOT 13 BLK 6 LOT SI...,865.296875,136.77697
2,2003,475223,131120100000.0,534112,920971.9,551997.5,127 NE 82 ST,,Miami,33138-3707,5794 SW 40 ST #210,,,MIAMI,FL,33155,USA,ROYAL PALM GARDENS LLC,,,N,,101.0,RESIDENTIAL - SINGLE FAMILY : 1 UNIT,13112008.0,2.0,2.0,0.0,1.0,1.0,1126.0,876.0,13500.0,1936.0,2020.0,137233.0,20120127.0,50000.0,12 53 41 ROYAL PALM GARDENS PB 7-71 LOTS 14 & ...,1475.435547,155.686181
3,2004,53896,131120100000.0,534112,921045.0,552001.3,145 NE 82 ST,,Miami,33138-3707,145 NE 82 ST,,,MIAMI,FL,33138-3707,,FANETTE ELIACIN,,,N,,101.0,RESIDENTIAL - SINGLE FAMILY : 1 UNIT,13112008.0,3.0,1.0,0.0,1.0,1.0,1520.0,1520.0,6750.0,1926.0,2020.0,63150.0,19920401.0,0.0,12 53 41 ROYAL PALM GARDENS PB 7-71 LOT 16 LES...,766.699219,124.370745
4,2005,406751,131120100000.0,534112,921093.3,552004.5,151 NE 82 ST,,Miami,33138-3776,665 NE 195 ST #320,,,MIAMI,FL,33179,USA,JEAN CLAUDE REMY,MOREL FAUSTIN,,N,,803.0,MULTIFAMILY 2-9 UNITS : MULTIFAMILY 3 OR MORE ...,13112008.0,12.0,6.0,0.0,1.0,6.0,4961.0,4961.0,6750.0,1972.0,2020.0,384978.0,20130419.0,100.0,12 53 41 ROYAL PALM GARDENS PB 7-71 LOT 17 LES...,715.695312,122.156526


In [11]:
# there are a lot of columns with data that we won't be using, as I am primarily interested in property value, size, and number of beds/baths. 
# let's go ahead and drop most of the columns from this df
columns_drop = ['PID', 
                'FOLIO', 
                'TTRRSS', 
                'TRUE_SITE_UNIT', 
                'TRUE_MAILING_ADDR1', 
                'TRUE_MAILING_ADDR2', 
                'TRUE_MAILING_ADDR3', 
                'TRUE_MAILING_CITY', 
                'TRUE_MAILING_STATE', 
                'TRUE_MAILING_ZIP_CODE',
                'TRUE_MAILING_COUNTRY', 
                'TRUE_OWNER1', 
                'TRUE_OWNER2', 
                'TRUE_OWNER3', 
                'CONDO_FLAG', 
                'PARENT_FOLIO', 
                'DOR_CODE_CUR', 
                'SUBDIVISION', 
                'FLOOR_COUNT', 
                'UNIT_COUNT', 
                'BUILDING_ACTUAL_AREA', 
                'BUILDING_HEATED_AREA', 
                'DOS_1', 
                'PRICE_1', 
                'LEGAL', 
                'Shape__Area', 
                'Shape__Length']

miami_props = df_data_0.drop(columns=columns_drop)

miami_props.head()

Unnamed: 0,OBJECTID,X_COORD,Y_COORD,TRUE_SITE_ADDR,TRUE_SITE_CITY,TRUE_SITE_ZIP_CODE,DOR_DESC,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR
0,2001,920990.6,552140.7,138 NE 82 TER,Miami,33138-3709,MULTIFAMILY 2-9 UNITS : 2 LIVING UNITS,3.0,1.0,0.0,7750.0,1926.0,2020.0,124688.0
1,2002,920942.5,552138.7,128 NE 82 TER,Miami,33138-3709,VACANT LAND - COMMERCIAL : VACANT LAND,0.0,0.0,0.0,7750.0,0.0,2020.0,56732.0
2,2003,920971.9,551997.5,127 NE 82 ST,Miami,33138-3707,RESIDENTIAL - SINGLE FAMILY : 1 UNIT,2.0,2.0,0.0,13500.0,1936.0,2020.0,137233.0
3,2004,921045.0,552001.3,145 NE 82 ST,Miami,33138-3707,RESIDENTIAL - SINGLE FAMILY : 1 UNIT,3.0,1.0,0.0,6750.0,1926.0,2020.0,63150.0
4,2005,921093.3,552004.5,151 NE 82 ST,Miami,33138-3776,MULTIFAMILY 2-9 UNITS : MULTIFAMILY 3 OR MORE ...,12.0,6.0,0.0,6750.0,1972.0,2020.0,384978.0


In [12]:
# In order for my zip codes in this property value df to match my original df, I need to strip away the extra four digits after the hyphen
#let's take care of this issue below
# new data frame with split value columns 
zipcode_split = miami_props['TRUE_SITE_ZIP_CODE'].str.split("-", n = 1, expand = True) 
  
# making separate Zip Code column from extracted 5 digit zip code
miami_props['ZipCode']= zipcode_split[0] 

# Dropping old zip code column
miami_props.drop(columns =['TRUE_SITE_ZIP_CODE'], inplace = True) 

miami_props.head() 

Unnamed: 0,OBJECTID,X_COORD,Y_COORD,TRUE_SITE_ADDR,TRUE_SITE_CITY,DOR_DESC,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR,ZipCode
0,2001,920990.6,552140.7,138 NE 82 TER,Miami,MULTIFAMILY 2-9 UNITS : 2 LIVING UNITS,3.0,1.0,0.0,7750.0,1926.0,2020.0,124688.0,33138
1,2002,920942.5,552138.7,128 NE 82 TER,Miami,VACANT LAND - COMMERCIAL : VACANT LAND,0.0,0.0,0.0,7750.0,0.0,2020.0,56732.0,33138
2,2003,920971.9,551997.5,127 NE 82 ST,Miami,RESIDENTIAL - SINGLE FAMILY : 1 UNIT,2.0,2.0,0.0,13500.0,1936.0,2020.0,137233.0,33138
3,2004,921045.0,552001.3,145 NE 82 ST,Miami,RESIDENTIAL - SINGLE FAMILY : 1 UNIT,3.0,1.0,0.0,6750.0,1926.0,2020.0,63150.0,33138
4,2005,921093.3,552004.5,151 NE 82 ST,Miami,MULTIFAMILY 2-9 UNITS : MULTIFAMILY 3 OR MORE ...,12.0,6.0,0.0,6750.0,1972.0,2020.0,384978.0,33138


In [13]:
# I want to make sure all my data types make sense
miami_props.dtypes

OBJECTID                 int64
X_COORD                float64
Y_COORD                float64
TRUE_SITE_ADDR          object
TRUE_SITE_CITY          object
DOR_DESC                object
BEDROOM_COUNT          float64
BATHROOM_COUNT         float64
HALF_BATHROOM_COUNT    float64
LOT_SIZE               float64
YEAR_BUILT             float64
ASSESSMENT_YEAR_CUR    float64
ASSESSED_VAL_CUR       float64
ZipCode                 object
dtype: object

In [14]:
# Let's look closely at the df and make sure everything is making sense
miami_props.describe()

Unnamed: 0,OBJECTID,X_COORD,Y_COORD,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR
count,577292.0,577292.0,577292.0,571589.0,571589.0,571589.0,571589.0,571589.0,571589.0,571589.0
mean,288646.5,877711.845646,514535.741808,3.232636,2.182406,0.103461,210230.8,1728.381592,2020.0,491200.8
std,166649.990137,34052.983055,49966.986417,20.421266,10.689394,1.270989,9754276.0,695.900207,0.0,4413117.0
min,1.0,697973.7,293562.2,0.0,0.0,0.0,0.0,0.0,2020.0,0.0
25%,144323.75,853208.975,481953.95,2.0,1.0,0.0,5000.0,1950.0,2020.0,112630.0
50%,288646.5,877099.0,516777.15,3.0,2.0,0.0,7500.0,1965.0,2020.0,186913.0
75%,432969.25,904565.6,555324.85,4.0,2.0,0.0,10400.0,1989.0,2020.0,292576.0
max,577292.0,945703.4,597579.5,12590.0,2066.0,564.0,3131234000.0,9999.0,2020.0,428750000.0


In [15]:
# Remember our problem is focused on residential real estate, so we should not concern ourselves with vacant land, commercial properties, etc.
# Let’s delete all rows for which column ‘DOR_DESC’ has value that doesn't contain 'Residential'
word = 'RESIDENTIAL'
miami_resprops = miami_props[miami_props['DOR_DESC'].str.contains(word, na=False)]

miami_resprops.tail()

Unnamed: 0,OBJECTID,X_COORD,Y_COORD,TRUE_SITE_ADDR,TRUE_SITE_CITY,DOR_DESC,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR,ZipCode
577286,577287,869794.7,446500.9,22219 SW 99 CT,Cutler Bay,RESIDENTIAL - SINGLE FAMILY : CLUSTER HOME,3.0,2.0,0.0,3383.0,1991.0,2020.0,87389.0,33190
577287,577288,869791.7,446562.5,22211 SW 99 CT,Cutler Bay,RESIDENTIAL - SINGLE FAMILY : CLUSTER HOME,3.0,2.0,0.0,3385.0,1991.0,2020.0,119829.0,33190
577288,577289,869793.2,446625.3,22203 SW 99 CT,Cutler Bay,RESIDENTIAL - SINGLE FAMILY : CLUSTER HOME,3.0,2.0,0.0,6786.0,1991.0,2020.0,158776.0,33190
577290,577291,869022.3,446397.7,10051 SW 223 ST,Cutler Bay,RESIDENTIAL - SINGLE FAMILY : CLUSTER HOME,3.0,2.0,0.0,3349.0,1990.0,2020.0,102939.0,33190
577291,577292,869029.6,446449.6,10059 SW 223 ST,Cutler Bay,RESIDENTIAL - SINGLE FAMILY : CLUSTER HOME,3.0,2.0,0.0,5613.0,1990.0,2020.0,187977.0,33190


In [16]:
# now we group by zip codes to see property values per zip code

miami_resValByZip = miami_resprops.groupby(['ZipCode']).mean().reset_index()

miami_resValByZip.head()

Unnamed: 0,ZipCode,OBJECTID,X_COORD,Y_COORD,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR
0,33010,123141.114621,894776.442391,544649.571697,2.869797,1.526574,0.024973,7673.960241,1916.535966,2020.0,172986.43159
1,33012,116389.582378,886800.725099,557846.343084,3.146033,1.80564,0.054294,7580.413665,1950.972151,2020.0,188134.515446
2,33013,114356.047,895567.325691,556135.312196,2.962974,1.558864,0.043827,7663.585571,1937.442799,2020.0,176322.429198
3,33014,270633.015964,883514.875542,569591.769443,2.92003,1.976355,0.099398,6192.408661,1963.320934,2020.0,214939.747289
4,33015,215992.943526,879492.142725,585711.218606,3.099324,2.025377,0.068851,6098.800471,1982.882891,2020.0,199362.781591


In [17]:
miami_resValByZip.dtypes

ZipCode                 object
OBJECTID               float64
X_COORD                float64
Y_COORD                float64
BEDROOM_COUNT          float64
BATHROOM_COUNT         float64
HALF_BATHROOM_COUNT    float64
LOT_SIZE               float64
YEAR_BUILT             float64
ASSESSMENT_YEAR_CUR    float64
ASSESSED_VAL_CUR       float64
dtype: object

In [18]:
miami_resValByZip.describe()

Unnamed: 0,OBJECTID,X_COORD,Y_COORD,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR
count,78.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0
mean,252652.619759,888722.491345,520506.321807,4.669117,3.759802,0.77871,14394.341919,1879.992544,2020.0,1028645.0
std,159538.36947,33309.600705,46728.705869,13.605275,14.477517,5.792759,22588.811274,153.901375,0.0,4822036.0
min,11546.819942,823984.961081,404760.538953,1.96063,1.031496,0.0,4315.893584,1072.909091,2020.0,92011.52
25%,118077.465439,866212.642523,497589.197299,2.803877,1.658668,0.044802,6685.693195,1867.914851,2020.0,168806.9
50%,231491.329149,889310.72442,523459.410916,3.14939,2.066119,0.09785,8651.455454,1930.873019,2020.0,231682.0
75%,367725.675075,917044.190362,554276.064778,3.351057,2.342319,0.170897,11898.98158,1963.24943,2020.0,446245.3
max,561602.691959,942793.207955,593900.028437,123.181818,129.818182,51.272727,180068.945677,2004.138158,2020.0,42563110.0


In [19]:
# clearly, as we can see above, there are still irrelevant entries
# it is highly unlikely that any residential properties for an average person have more than 10 (and that's generous!) bedrooms
# let's drop rows where BEDROOM_COUNT is greater than 10
miami_resValByZip = miami_resValByZip[miami_resValByZip['BEDROOM_COUNT'] < 10]
miami_resValByZip

Unnamed: 0,ZipCode,OBJECTID,X_COORD,Y_COORD,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSMENT_YEAR_CUR,ASSESSED_VAL_CUR
0,33010,123141.114621,894776.442391,544649.571697,2.869797,1.526574,0.024973,7673.960241,1916.535966,2020.0,172986.4
1,33012,116389.582378,886800.725099,557846.343084,3.146033,1.80564,0.054294,7580.413665,1950.972151,2020.0,188134.5
2,33013,114356.047,895567.325691,556135.312196,2.962974,1.558864,0.043827,7663.585571,1937.442799,2020.0,176322.4
3,33014,270633.015964,883514.875542,569591.769443,2.92003,1.976355,0.099398,6192.408661,1963.320934,2020.0,214939.7
4,33015,215992.943526,879492.142725,585711.218606,3.099324,2.025377,0.068851,6098.800471,1982.882891,2020.0,199362.8
5,33016,325500.600337,875064.695302,570623.02185,3.378463,2.364972,0.112262,6657.026529,1968.526861,2020.0,266347.5
6,33018,227362.528864,870052.76872,571381.432423,3.206935,2.112412,0.25166,6353.553506,1933.947252,2020.0,230815.2
7,33030,317273.959626,824538.973396,416869.616888,2.89053,1.733629,0.060233,180068.945677,1813.157722,2020.0,141039.9
8,33031,471629.61755,824244.291459,433590.170343,3.135617,2.091037,0.136086,95031.644078,1852.089629,2020.0,253524.3
9,33032,473817.464045,855946.594886,435385.656288,3.304794,2.083648,0.259317,12159.703618,1928.490724,2020.0,163432.7


In [20]:
# dropping a few more columns that no longer seem relevant
columns_drop = ['OBJECTID', 
                'X_COORD', 
                'Y_COORD', 
                'ASSESSMENT_YEAR_CUR']

miami_resValByZip = miami_resValByZip.drop(columns=columns_drop)

miami_resValByZip.head()

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR
0,33010,2.869797,1.526574,0.024973,7673.960241,1916.535966,172986.43159
1,33012,3.146033,1.80564,0.054294,7580.413665,1950.972151,188134.515446
2,33013,2.962974,1.558864,0.043827,7663.585571,1937.442799,176322.429198
3,33014,2.92003,1.976355,0.099398,6192.408661,1963.320934,214939.747289
4,33015,3.099324,2.025377,0.068851,6098.800471,1982.882891,199362.781591


In [21]:
# let's cast our key numerical features to int-type, to be safe

miami_resValByZip['ZipCode'] = miami_resValByZip['ZipCode'].astype(int)
miami_resValByZip['YEAR_BUILT'] = miami_resValByZip['YEAR_BUILT'].astype(int)
miami_resValByZip['ASSESSED_VAL_CUR'] = miami_resValByZip['ASSESSED_VAL_CUR'].astype(int)

In [22]:
miami_resValByZip

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR
0,33010,2.869797,1.526574,0.024973,7673.960241,1916,172986
1,33012,3.146033,1.80564,0.054294,7580.413665,1950,188134
2,33013,2.962974,1.558864,0.043827,7663.585571,1937,176322
3,33014,2.92003,1.976355,0.099398,6192.408661,1963,214939
4,33015,3.099324,2.025377,0.068851,6098.800471,1982,199362
5,33016,3.378463,2.364972,0.112262,6657.026529,1968,266347
6,33018,3.206935,2.112412,0.25166,6353.553506,1933,230815
7,33030,2.89053,1.733629,0.060233,180068.945677,1813,141039
8,33031,3.135617,2.091037,0.136086,95031.644078,1852,253524
9,33032,3.304794,2.083648,0.259317,12159.703618,1928,163432


In [23]:
# let's prepare to merge our original df with the property values df
# the ZipCode column must be the same dtype for the merge to work, so let's fix that below
miami_final['ZipCode'] = miami_final['ZipCode'].astype(int)

In [24]:
# finally, let's merge these dataframes
miami_merged = miami_resValByZip.merge(miami_final.set_index('ZipCode'), on='ZipCode')
miami_merged

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,Timezone,DSTflag,Geopoint
0,33122,4.592308,4.776923,0.776923,7061.707692,1984,1045073,Miami,FL,25.799962,-80.31775,-5,1,25.799962
1,33125,2.48466,1.41443,0.024209,14272.148852,1863,171685,Miami,FL,25.782176,-80.23607,-5,1,25.782176
2,33126,2.702824,1.574775,0.014442,6496.428084,1933,190741,Miami,FL,25.777977,-80.29718,-5,1,25.777977
3,33127,2.284481,1.231615,0.022245,6272.913886,1774,135492,Miami,FL,25.813808,-80.2058,-5,1,25.813808
4,33128,1.96063,1.031496,0.007874,5977.007874,1353,270639,Miami,FL,25.777143,-80.20225,-5,1,25.777143
5,33129,2.928938,2.088645,0.104762,9216.364183,1876,493747,Miami,FL,25.757227,-80.20656,-5,1,25.757227
6,33130,2.123552,1.297297,0.034749,6216.880309,1577,459774,Miami,FL,25.768277,-80.20339,-5,1,25.768277
7,33132,6.375,5.916667,0.0,13218.75,1453,3315576,Miami,FL,25.784326,-80.18753,-5,1,25.784326
8,33133,2.881622,2.140721,0.171351,9039.566766,1825,692483,Miami,FL,25.730678,-80.2441,-5,1,25.730678
9,33134,2.963431,1.980088,0.08015,8182.408534,1916,428261,Miami,FL,25.753927,-80.27034,-5,1,25.753927


In [25]:
# dropping more irrelevant columns
miami_merged = miami_merged.drop(columns=['Timezone', 'DSTflag', 'Geopoint'])
miami_merged

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude
0,33122,4.592308,4.776923,0.776923,7061.707692,1984,1045073,Miami,FL,25.799962,-80.31775
1,33125,2.48466,1.41443,0.024209,14272.148852,1863,171685,Miami,FL,25.782176,-80.23607
2,33126,2.702824,1.574775,0.014442,6496.428084,1933,190741,Miami,FL,25.777977,-80.29718
3,33127,2.284481,1.231615,0.022245,6272.913886,1774,135492,Miami,FL,25.813808,-80.2058
4,33128,1.96063,1.031496,0.007874,5977.007874,1353,270639,Miami,FL,25.777143,-80.20225
5,33129,2.928938,2.088645,0.104762,9216.364183,1876,493747,Miami,FL,25.757227,-80.20656
6,33130,2.123552,1.297297,0.034749,6216.880309,1577,459774,Miami,FL,25.768277,-80.20339
7,33132,6.375,5.916667,0.0,13218.75,1453,3315576,Miami,FL,25.784326,-80.18753
8,33133,2.881622,2.140721,0.171351,9039.566766,1825,692483,Miami,FL,25.730678,-80.2441
9,33134,2.963431,1.980088,0.08015,8182.408534,1916,428261,Miami,FL,25.753927,-80.27034


Let's explore and cluster Miami zip codes. We will make use of geopy's geolocator to get Miami, Fl's latitude and longitude coordinates. We will also leverage the Folium library to create geo-visualizations.

In [26]:
# Let's get the geographical (lat, long) coordinates of Miami, FL
address = 'Miami, FL'

geolocator = Nominatim(user_agent="miami_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Miami are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Miami are 25.7742658, -80.1936589.


In [27]:
#Let's define all the variables needed to create the URL for the API call to Foursquare

CLIENT_ID = '2W2BM5Q5KFXA1KYRKK2F3H3I413J5RD2LHVDA2ZWFNI24Z5T' # my Foursquare ID
CLIENT_SECRET = 'QTFPM40TQ5UY0HMKTDJJ22F21DUBHQT4ONLT0Q51G44AU1O5' # my Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

In [28]:
# create map of Miami using latitude and longitude values
map_miamiZips = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for zipcode, lat, lng in zip(miami_merged['ZipCode'], miami_merged['Latitude'], miami_merged['Longitude']):
    label = '{}'.format(zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_miamiZips)  
    
map_miamiZips

I'm going to borrow the function from the Foursquare lab that fetches the top 100 venues within a radius of 500 meters that are in each zip code in the Miami dataframe.

In [29]:
def getNearbyVenues(zipcodes, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for zipcode, lat, lng in zip(zipcodes, latitudes, longitudes):
        print(zipcodes)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            zipcode, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['ZipCode', 
                  'Zip Code Latitude', 
                  'Zip Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Let's examine how many venues were returned for each neighborhood.

In [32]:
venues_by_zip = getNearbyVenues(zipcodes=miami_merged['ZipCode'],
                                   latitudes=miami_merged['Latitude'],
                                   longitudes=miami_merged['Longitude']
                                  )

venues_by_zip.head()

0     33122
1     33125
2     33126
3     33127
4     33128
5     33129
6     33130
7     33132
8     33133
9     33134
10    33135
11    33136
12    33137
13    33138
14    33142
15    33143
16    33144
17    33145
18    33146
19    33147
20    33150
21    33154
22    33155
23    33156
24    33157
25    33158
26    33161
27    33162
28    33165
29    33166
30    33167
31    33168
32    33169
33    33170
34    33172
35    33173
36    33174
37    33175
38    33176
39    33177
40    33178
41    33179
42    33180
43    33181
44    33182
45    33183
46    33184
47    33185
48    33186
49    33187
50    33189
51    33190
52    33193
53    33194
54    33196
Name: ZipCode, dtype: int64
0     33122
1     33125
2     33126
3     33127
4     33128
5     33129
6     33130
7     33132
8     33133
9     33134
10    33135
11    33136
12    33137
13    33138
14    33142
15    33143
16    33144
17    33145
18    33146
19    33147
20    33150
21    33154
22    33155
23    33156
24    33157
25    33158


Unnamed: 0,ZipCode,Zip Code Latitude,Zip Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,33122,25.799962,-80.31775,Martinez Meat Market,25.802926,-80.316784,Food & Drink Shop
1,33122,25.799962,-80.31775,Brazilian Taste,25.796711,-80.318181,Brazilian Restaurant
2,33122,25.799962,-80.31775,Subway,25.796243,-80.319861,Sandwich Place
3,33122,25.799962,-80.31775,CVS pharmacy,25.797198,-80.320867,Pharmacy
4,33122,25.799962,-80.31775,Wells Fargo,25.797679,-80.313575,Bank


In [33]:
venues_by_zip.groupby('ZipCode').count()

Unnamed: 0_level_0,Zip Code Latitude,Zip Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
ZipCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
33122,12,12,12,12,12,12
33125,12,12,12,12,12,12
33126,4,4,4,4,4,4
33127,3,3,3,3,3,3
33128,9,9,9,9,9,9
33129,9,9,9,9,9,9
33130,27,27,27,27,27,27
33132,48,48,48,48,48,48
33133,64,64,64,64,64,64
33134,6,6,6,6,6,6


And now let's see how many unique categories exist.

In [34]:
print('There are {} uniques categories.'.format(len(venues_by_zip['Venue Category'].unique())))

There are 181 uniques categories.


We'll need to perform one-hot encoding to analyze and cluster our selected zip codes.

In [35]:
# one hot encoding
miamizips_onehot = pd.get_dummies(venues_by_zip[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
miamizips_onehot['ZipCode'] = venues_by_zip['ZipCode'] 

# move neighborhood column to the first column
fixed_columns = [miamizips_onehot.columns[-1]] + list(miamizips_onehot.columns[:-1])
miamizips_onehot = miamizips_onehot[fixed_columns]

miamizips_onehot.head()

Unnamed: 0,ZipCode,ATM,American Restaurant,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Bakery,Bank,Bar,Baseball Field,Basketball Stadium,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Burger Joint,Business Service,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Department Store,Dessert Shop,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Empanada Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Fishing Store,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hotel,Hotel Pool,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lake,Latin American Restaurant,Laundromat,Lawyer,Leather Goods Store,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,New American Restaurant,Nightclub,Office,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Post Office,Pub,Record Shop,Recreation Center,Rental Car Location,Rental Service,Resort,Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Spiritual Center,Sporting Goods Shop,Sports Bar,Storage Facility,Supplement Shop,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park Ride / Attraction,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,33122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,33122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,33122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,33122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,33122,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next, let's group rows by zip code and by taking the mean of the frequency of occurrence of each category as was shown in the lab.

In [36]:
miamizips_grouped = miamizips_onehot.groupby('ZipCode').mean().reset_index()
miamizips_grouped

Unnamed: 0,ZipCode,ATM,American Restaurant,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Bakery,Bank,Bar,Baseball Field,Basketball Stadium,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Burger Joint,Business Service,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Department Store,Dessert Shop,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Empanada Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Fishing Store,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hotel,Hotel Pool,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lake,Latin American Restaurant,Laundromat,Lawyer,Leather Goods Store,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,New American Restaurant,Nightclub,Office,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Post Office,Pub,Record Shop,Recreation Center,Rental Car Location,Rental Service,Resort,Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Spiritual Center,Sporting Goods Shop,Sports Bar,Storage Facility,Supplement Shop,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park Ride / Attraction,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,33122,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.416667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,33125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,33126,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,33127,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,33128,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,33129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,33130,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.074074,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.074074,0.0,0.148148,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0
7,33132,0.0,0.041667,0.020833,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.020833,0.041667,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.020833,0.041667,0.020833,0.0,0.0,0.020833,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.041667,0.0,0.0,0.0,0.020833,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.041667,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,33133,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.03125,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046875,0.015625,0.046875,0.015625,0.0,0.046875,0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.046875,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.015625,0.0,0.0,0.015625,0.03125,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.03125,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625
9,33134,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [37]:
miamizips_grouped['ZipCode'] = miamizips_grouped['ZipCode'].astype(str)

Let's print each postal code along with the top 5 most common venues.

In [38]:
num_top_venues = 5

for code in miamizips_grouped['ZipCode']:
    print("----"+code+"----")
    temp = miamizips_grouped[miamizips_grouped['ZipCode'] == code].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----33122----
                  venue  freq
0     Electronics Store  0.42
1                  Bank  0.08
2  Brazilian Restaurant  0.08
3         Jewelry Store  0.08
4   Japanese Restaurant  0.08


----33125----
                       venue  freq
0  Latin American Restaurant  0.17
1                      Hotel  0.08
2        Rental Car Location  0.08
3             Sandwich Place  0.08
4                       Park  0.08


----33126----
                        venue  freq
0                        Café  0.25
1  Construction & Landscaping  0.25
2         American Restaurant  0.25
3              Shipping Store  0.25
4                Optical Shop  0.00


----33127----
         venue  freq
0   Restaurant  0.33
1  Gas Station  0.33
2  Pizza Place  0.33
3          ATM  0.00
4    Pet Store  0.00


----33128----
                 venue  freq
0   Seafood Restaurant  0.33
1          Fish Market  0.11
2         Soccer Field  0.11
3          Pizza Place  0.11
4  American Restaurant  0.11


----33129----


And of course we need to put this into a pandas df!<br>
Let's start by sorting the venues into descending order, so we can set up our df to show the 10 most common venue types in each postal code.

In [39]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['ZipCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
zips_venues_sorted = pd.DataFrame(columns=columns)
zips_venues_sorted['ZipCode'] = miamizips_grouped['ZipCode']

for ind in np.arange(miamizips_grouped.shape[0]):
    zips_venues_sorted.iloc[ind, 1:] = return_most_common_venues(miamizips_grouped.iloc[ind, :], num_top_venues)

zips_venues_sorted.head()

Unnamed: 0,ZipCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33122,Electronics Store,Pharmacy,Jewelry Store,Japanese Restaurant,Brazilian Restaurant,Sandwich Place,Food & Drink Shop,Bank,Food,Frozen Yogurt Shop
1,33125,Latin American Restaurant,Arts & Crafts Store,Rental Car Location,Pharmacy,Sandwich Place,Park,Grocery Store,Baseball Field,Bakery,Moving Target
2,33126,American Restaurant,Café,Construction & Landscaping,Shipping Store,Yoga Studio,Food Court,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
3,33127,Gas Station,Restaurant,Pizza Place,Food,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand
4,33128,Seafood Restaurant,Spanish Restaurant,American Restaurant,Soccer Field,Pizza Place,Restaurant,Fish Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant


Now we'll run some K Means Clustering on this data and segment the zip codes into 15 clusters. I am using this unsupervised method because I want to let the data guide my exploration (rather than me guiding my data). I arrived at Kclusters = 15 after several rounds of trial and error. 

In [41]:
# set number of clusters
kclusters = 15

#create sub df for clustering
miami_zips_clustering = miamizips_grouped.drop('ZipCode', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(miami_zips_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:15]

array([ 1,  1, 12, 11,  1,  1,  1,  1,  1, 12,  6,  1,  1,  1,  6],
      dtype=int32)

Let's create a new dataframe that includes the clusters as well as the top 10 venues for each postal code.

In [42]:
# add clustering labels
zips_venues_sorted.insert(0, 'ClusterLabels', kmeans.labels_)

In [43]:
zips_venues_sorted

Unnamed: 0,ClusterLabels,ZipCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,33122,Electronics Store,Pharmacy,Jewelry Store,Japanese Restaurant,Brazilian Restaurant,Sandwich Place,Food & Drink Shop,Bank,Food,Frozen Yogurt Shop
1,1,33125,Latin American Restaurant,Arts & Crafts Store,Rental Car Location,Pharmacy,Sandwich Place,Park,Grocery Store,Baseball Field,Bakery,Moving Target
2,12,33126,American Restaurant,Café,Construction & Landscaping,Shipping Store,Yoga Studio,Food Court,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
3,11,33127,Gas Station,Restaurant,Pizza Place,Food,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand
4,1,33128,Seafood Restaurant,Spanish Restaurant,American Restaurant,Soccer Field,Pizza Place,Restaurant,Fish Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
5,1,33129,Lawyer,Latin American Restaurant,Salon / Barbershop,Moving Target,Shopping Plaza,Construction & Landscaping,Gas Station,Italian Restaurant,Food Court,Frozen Yogurt Shop
6,1,33130,Pizza Place,Pharmacy,Fast Food Restaurant,Grocery Store,Gym,Bar,Latin American Restaurant,Bakery,Park,Restaurant
7,1,33132,Theater,Light Rail Station,Lounge,Sporting Goods Shop,Science Museum,Basketball Stadium,Coffee Shop,Bar,Restaurant,Café
8,1,33133,Hotel,Ice Cream Shop,New American Restaurant,Italian Restaurant,Coffee Shop,Lingerie Store,French Restaurant,Farmers Market,Thai Restaurant,Dog Run
9,12,33134,Golf Course,American Restaurant,Café,Pool,Gym,Tennis Court,Food Court,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint


In [44]:
# casting our column of interest to the same type so we can merge without issue
miami_merged['ZipCode'] = miami_merged['ZipCode'].astype(str)

In [45]:
# merge dfs to add latitude/longitude for each postal code
miami_clusters = miami_merged.merge(zips_venues_sorted.set_index('ZipCode'), on='ZipCode')

miami_clusters.head()

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33122,4.592308,4.776923,0.776923,7061.707692,1984,1045073,Miami,FL,25.799962,-80.31775,1,Electronics Store,Pharmacy,Jewelry Store,Japanese Restaurant,Brazilian Restaurant,Sandwich Place,Food & Drink Shop,Bank,Food,Frozen Yogurt Shop
1,33125,2.48466,1.41443,0.024209,14272.148852,1863,171685,Miami,FL,25.782176,-80.23607,1,Latin American Restaurant,Arts & Crafts Store,Rental Car Location,Pharmacy,Sandwich Place,Park,Grocery Store,Baseball Field,Bakery,Moving Target
2,33126,2.702824,1.574775,0.014442,6496.428084,1933,190741,Miami,FL,25.777977,-80.29718,12,American Restaurant,Café,Construction & Landscaping,Shipping Store,Yoga Studio,Food Court,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
3,33127,2.284481,1.231615,0.022245,6272.913886,1774,135492,Miami,FL,25.813808,-80.2058,11,Gas Station,Restaurant,Pizza Place,Food,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand
4,33128,1.96063,1.031496,0.007874,5977.007874,1353,270639,Miami,FL,25.777143,-80.20225,1,Seafood Restaurant,Spanish Restaurant,American Restaurant,Soccer Field,Pizza Place,Restaurant,Fish Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant


In [46]:
miami_clusters.describe()

Unnamed: 0,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,Latitude,Longitude,ClusterLabels
count,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50.0
mean,3.077567,2.067351,0.111021,9575.03136,1886.66,384558.5,25.77527,-80.284188,3.72
std,0.661267,0.794475,0.124701,5446.161581,133.100945,485892.0,0.094495,0.091717,3.902066
min,1.96063,1.031496,0.0,4315.893584,1353.0,92011.0,25.56071,-80.4765,0.0
25%,2.724827,1.640677,0.035336,6437.359519,1867.75,186285.2,25.725049,-80.35979,1.0
50%,3.142078,2.048313,0.088501,8530.981186,1932.0,243991.0,25.77271,-80.271445,1.0
75%,3.341364,2.205032,0.145417,10029.77971,1963.75,414568.0,25.834453,-80.205695,6.0
max,6.375,5.916667,0.776923,39504.517248,1989.0,3315576.0,25.962069,-80.1276,14.0


### Results

Now we will map our clusters to get some visual insight into the results of our KMeans Clustering efforts.

In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(miami_clusters['Latitude'], miami_clusters['Longitude'], miami_clusters['ZipCode'], miami_clusters['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Let's deep dive into each cluster so we can see what's going on. Cluster 1 is abnormally large...

<b>Cluster 0</b>

In [48]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 0, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,33174,3.181629,2.059329,0.087692,6417.669997,1966,206923,Miami,FL,25.763044,-80.35919,0,Fast Food Restaurant,Yoga Studio,Food & Drink Shop,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 1</b>

In [49]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 1, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33122,4.592308,4.776923,0.776923,7061.707692,1984,1045073,Miami,FL,25.799962,-80.31775,1,Electronics Store,Pharmacy,Jewelry Store,Japanese Restaurant,Brazilian Restaurant,Sandwich Place,Food & Drink Shop,Bank,Food,Frozen Yogurt Shop
1,33125,2.48466,1.41443,0.024209,14272.148852,1863,171685,Miami,FL,25.782176,-80.23607,1,Latin American Restaurant,Arts & Crafts Store,Rental Car Location,Pharmacy,Sandwich Place,Park,Grocery Store,Baseball Field,Bakery,Moving Target
4,33128,1.96063,1.031496,0.007874,5977.007874,1353,270639,Miami,FL,25.777143,-80.20225,1,Seafood Restaurant,Spanish Restaurant,American Restaurant,Soccer Field,Pizza Place,Restaurant,Fish Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
5,33129,2.928938,2.088645,0.104762,9216.364183,1876,493747,Miami,FL,25.757227,-80.20656,1,Lawyer,Latin American Restaurant,Salon / Barbershop,Moving Target,Shopping Plaza,Construction & Landscaping,Gas Station,Italian Restaurant,Food Court,Frozen Yogurt Shop
6,33130,2.123552,1.297297,0.034749,6216.880309,1577,459774,Miami,FL,25.768277,-80.20339,1,Pizza Place,Pharmacy,Fast Food Restaurant,Grocery Store,Gym,Bar,Latin American Restaurant,Bakery,Park,Restaurant
7,33132,6.375,5.916667,0.0,13218.75,1453,3315576,Miami,FL,25.784326,-80.18753,1,Theater,Light Rail Station,Lounge,Sporting Goods Shop,Science Museum,Basketball Stadium,Coffee Shop,Bar,Restaurant,Café
8,33133,2.881622,2.140721,0.171351,9039.566766,1825,692483,Miami,FL,25.730678,-80.2441,1,Hotel,Ice Cream Shop,New American Restaurant,Italian Restaurant,Coffee Shop,Lingerie Store,French Restaurant,Farmers Market,Thai Restaurant,Dog Run
11,33136,2.722034,1.716949,0.120339,5478.51039,1586,184800,Miami,FL,25.786326,-80.2029,1,Athletics & Sports,History Museum,Southern / Soul Food Restaurant,Wings Joint,Dry Cleaner,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
12,33137,2.764954,2.037298,0.111893,9573.017994,1777,714559,Miami,FL,25.817325,-80.19046,1,Café,Pizza Place,Coffee Shop,Boutique,Yoga Studio,Seafood Restaurant,Scenic Lookout,Cosmetics Shop,Japanese Restaurant,Doctor's Office
13,33138,2.751637,1.977087,0.094313,10036.945929,1865,373489,Miami,FL,25.853184,-80.18622,1,Pizza Place,New American Restaurant,Restaurant,Fast Food Restaurant,Department Store,Gym,Liquor Store,Ice Cream Shop,Discount Store,Italian Restaurant


<b>Cluster 2</b>

In [50]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 2, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,33176,3.441064,2.191287,0.105216,15047.942158,1954,307995,Miami,FL,25.653431,-80.35999,2,Optical Shop,Yoga Studio,Gastropub,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 3</b>

In [51]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 3, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,33182,3.352266,2.258749,0.169535,8341.860677,1980,254870,Miami,FL,25.781127,-80.40467,3,Intersection,Latin American Restaurant,Food,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 4</b>

In [52]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 4, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,33177,3.277309,1.945347,0.101081,10601.547849,1964,167558,Miami,FL,25.595983,-80.40234,4,Rental Car Location,Arepa Restaurant,Park,Food,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 5</b>

In [53]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 5, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
43,33183,3.153376,2.115756,0.059968,9455.160609,1974,217801,Miami,FL,25.699968,-80.40811,5,Lake,Construction & Landscaping,Food,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 6</b>

In [54]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 6, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,33135,2.652551,1.619898,0.034694,6354.777929,1851,219882,Miami,FL,25.766577,-80.23576,6,Latin American Restaurant,Market,Fast Food Restaurant,Pharmacy,Road,Seafood Restaurant,Fried Chicken Joint,Furniture / Home Store,Music Venue,Grocery Store
14,33142,2.333815,1.253757,0.012428,6048.401675,1781,92011,Miami,FL,25.812625,-80.2369,6,Fast Food Restaurant,Nightclub,Storage Facility,Fried Chicken Joint,Sandwich Place,Home Service,Yoga Studio,Food & Drink Shop,Frozen Yogurt Shop,French Restaurant
16,33144,2.782299,1.623742,0.05495,7908.685684,1938,215809,Miami,FL,25.76226,-80.30839,6,Fast Food Restaurant,Latin American Restaurant,Spanish Restaurant,Pharmacy,Discount Store,Sandwich Place,Hobby Shop,Grocery Store,Mobile Phone Shop,Fried Chicken Joint
19,33147,2.483416,1.26425,0.020556,7330.187165,1825,95309,Miami,FL,25.850124,-80.23773,6,Latin American Restaurant,Fast Food Restaurant,Fried Chicken Joint,Bank,Food Court,Gas Station,Garden,Furniture / Home Store,Frozen Yogurt Shop,French Restaurant
21,33154,3.131408,2.639364,0.186211,10242.960625,1894,1284910,Miami,FL,25.881391,-80.1276,6,Grocery Store,Bank,Japanese Restaurant,Asian Restaurant,Park,Yoga Studio,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
30,33168,2.550804,1.386817,0.022508,8783.316024,1931,120855,Miami,FL,25.892185,-80.21032,6,Fast Food Restaurant,Gas Station,Road,Fried Chicken Joint,Chinese Restaurant,Bank,Southern / Soul Food Restaurant,Rental Service,Discount Store,Electronics Store
35,33175,3.222594,1.997016,0.096302,9079.68191,1948,232548,Miami,FL,25.733204,-80.41197,6,Latin American Restaurant,Cuban Restaurant,Pizza Place,Breakfast Spot,Mexican Restaurant,Pharmacy,Burger Joint,Fishing Store,Fish Market,Gas Station
45,33185,3.564484,2.429345,0.105038,5884.531941,1987,257648,Miami,FL,25.723173,-80.43995,6,Latin American Restaurant,Spanish Restaurant,Italian Restaurant,Locksmith,Grocery Store,Playground,Recreation Center,Art Gallery,Food Court,Gas Station


<b>Cluster 7</b>

In [55]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 7, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,33167,2.733208,1.489984,0.013434,10588.79688,1919,114402,Miami,FL,25.885739,-80.23264,7,Southern / Soul Food Restaurant,Health & Beauty Service,Yoga Studio,Food & Drink Shop,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 8</b>

In [56]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 8, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,33190,3.374917,2.140635,0.318001,5776.990841,1987,184546,Miami,FL,25.56071,-80.3494,8,Pizza Place,Baseball Field,Scenic Lookout,Yoga Studio,Food & Drink Shop,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck


<b>Cluster 9</b>

In [57]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 9, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39,33179,3.219129,2.182579,0.194705,7526.08139,1970,256751,Miami,FL,25.95872,-80.17941,9,Martial Arts Dojo,Pool,Food,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 10</b>

In [58]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 10, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,33150,2.345986,1.319149,0.018617,16032.011523,1827,124468,Miami,FL,25.851974,-80.20566,10,Fishing Store,Rental Service,Furniture / Home Store,Intersection,Food,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 11</b>

In [59]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 11, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,33127,2.284481,1.231615,0.022245,6272.913886,1774,135492,Miami,FL,25.813808,-80.2058,11,Gas Station,Restaurant,Pizza Place,Food,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand


<b>Cluster 12</b>

In [60]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 12, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,33126,2.702824,1.574775,0.014442,6496.428084,1933,190741,Miami,FL,25.777977,-80.29718,12,American Restaurant,Café,Construction & Landscaping,Shipping Store,Yoga Studio,Food Court,Garden,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
9,33134,2.963431,1.980088,0.08015,8182.408534,1916,428261,Miami,FL,25.753927,-80.27034,12,Golf Course,American Restaurant,Café,Pool,Gym,Tennis Court,Food Court,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint


<b>Cluster 13</b>

In [61]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 13, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,33155,2.930316,1.726623,0.037097,8697.142349,1948,250911,Miami,FL,25.739011,-80.30685,13,Gym,Other Great Outdoors,Playground,Flower Shop,Yoga Studio,Food Court,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant


<b>Cluster 14</b>

In [62]:
miami_clusters.loc[miami_clusters['ClusterLabels'] == 14, miami_clusters.columns[[0] + list(range(1, miami_clusters.shape[1]))]]

Unnamed: 0,ZipCode,BEDROOM_COUNT,BATHROOM_COUNT,HALF_BATHROOM_COUNT,LOT_SIZE,YEAR_BUILT,ASSESSED_VAL_CUR,City,State,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,33166,2.938213,1.8134,0.051613,9685.909154,1929,296306,Miami,FL,25.824725,-80.30476,14,Grocery Store,Gym Pool,Food Truck,Trail,Yoga Studio,Food & Drink Shop,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant


### Discussion

After closely looking at each cluster, it seems there is much room for improvement. Zip codes were densely clustered between Clusters 1 and 6. Cluster 1 is all over the place and doesn't provide much insight. However, Cluster 6 is our gold mine. We can see that the zip codes in Cluster 6 boast very reasonable property prices, fair bed/bath distribution and access to a plethora of awesome nearby venues. Based on this finding, I would recommend that any average buyer looking to purchase real estate in Miami, Fl further explore the zip codes in Cluster 6. 
<br>
In terms of where to steer these efforts in the future, my thoughts are as follows:
Given the complexity of a city like Miami, very different approaches can be tried in clustering and classification studies. However, not every method can yield the same high quality results for this metropolis. I used the KMeans Clustering because of its speed and efficacy. However, given the results it is clear that for more in-depth guidance, the data set should be expanded and the details of the neighborhood and/or street should be further looked into. I hope that as this information is refined, it would be subsequently turned into an easy to access web app that can be accessed via mobile or even a chat-based interface!

### Conclusion

In conclusion, we set out to recommend Miami viable zip codes for average home buyers who want to pay reasonable prices but still have access to a good variety of Miami's venues. We did so by gathering zip code data, property data and Foursquare location data on the city of Miami. We used this information to cluster Miami zip codes and then further explore those clusters. As time goes on, I hope to refine this project so that I may share it with my friends and network of real estate agents in Miami - we have all been curious about this for as long as I can remember and currently only have anecdotal data to serve us.