## Data Science Capstone Project : The Battle of Neighborhoods (Week 2)
### By : Damien


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>
    
1. <a href="#item1">Description of Problem</a>
    
2. <a href="#item2">Description of Data</a>
    
3. <a href="#item3">Methodology : Download, Explore and Clean the Dataset</a>

4. <a href="#item4">Methodology : Explore Neighborhoods</a>

5. <a href="#item5">Methodology: Analyze Each Zip Code</a>

6. <a href="#item6">Methodology: Cluster Neighborhoods(Zip Codes)</a>

7. <a href="#item7">Results: Examine Clusters</a> 
    
8. <a href="#item8">Discussion: Observations and recommendations</a> 
    
9. <a href="#item9">Conclusion</a> 
</font>
</div>

## 1. Description of Problem: 

Where should I stay when I travel to Fort Lauderdale, Florida, USA on vacation so that I am 
close to the best shopping malls and close to restuarants and coffee shops ? 

I like to travel to different cities to experience the local culture, food, athmosphere and of course the best shopping. Bringing all of this data toegther visually is difficult and I would like to be able to see it altogether in a way that is personalised to me. My friends and family will also use it.


## 2. Description of Data

Zipcode data from https://www.zip-codes.com/city/fl-fort-lauderdale.asp 
for the city of Fort Lauderdale, Florida, USA will be used and combined with Lat-Long data from 
https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/

Only zip codes with population greater than zero will be used to find populated areas for our data set.

https://www.10best.com/destinations/florida/fort-lauderdale/shopping/shopping-centers-districts/

Will provide the top 10 shopping malls and Lat-Long data from https://www.latlong.net/ for each shopping mall will be acquired and collated. An overlay of top 10 best shopping malls data on the foursquare clustered venue data for the locations will be the final output. New analysis may be possible once initial findings are discovered. The data sets are all on different websites in different formats. A lot of data scraping and wrangling is required to create a clean data set.

A new clustered data set will be created and an overlay of shopping mall data will highlight best zip code(s) to stay in when I travel to Fort Lauderdale, Florida, USA.



## 3. Methodology : Download, Explore and Clean the Dataset

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


<a id='item1'></a>

#### Download and Explore Dataset

In [3]:
# Load USA ZIP data
df_all_zip = pd.read_csv("us-zip-code-latitude-and-longitude_cpy.txt", sep = ';')
print('All USA Zip Data downloaded!', end='\n\n')
df_all_zip = df_all_zip[['Zip', 'Latitude', 'Longitude']]   # Drop all columns except Zip, Lat, Long
df_all_zip.head()  # check sample

All USA Zip Data downloaded!



Unnamed: 0,Zip,Latitude,Longitude
0,71937,34.398483,-94.39398
1,72044,35.624351,-92.16056
2,56171,43.660847,-94.74357
3,49430,43.010337,-85.89754
4,52585,41.194129,-91.98027


In [4]:
# Load Fort Lauderdale Data

df_zip = pd.read_csv("List_of_zip_codes.txt", sep = '\t')
print('Fort Lauderdale Zip Data downloaded!', end='\n\n')

df_zip[['ZIP Code', 'Population']].head()  # check sample data, 0 Population rows will need to be removed

df_zip = df_zip.rename(columns={"ZIP Code": "Zip"})   #Let's clean the data and simplify

df_zip = df_zip[['Zip', 'Population']]  # drop other columns

df_zip.head()  # check data

Fort Lauderdale Zip Data downloaded!



Unnamed: 0,Zip,Population
0,ZIP Code 33301,14586
1,ZIP Code 33302,0
2,ZIP Code 33303,0
3,ZIP Code 33304,17724
4,ZIP Code 33305,11927


In [5]:
# Drop rows with population value 0 and NaN

df_zip = df_zip.set_index("Population")
df_zip = df_zip.drop("0", axis=0) # Delete all rows with value 0
df_zip.apply(lambda x: pd.Series(x.dropna().values))  # drop rows with NaN values
df_zip.drop(df_zip.tail(1).index,inplace=True) # drop last n rows ( n=1 here ) to clean last line

df_zip = df_zip.reset_index(drop=False)   #  reset index
#df_zip.tail()  # check bottom of data set

# Drop "ZIP Code " text from Zip column
df_zip_test = pd.concat([df_zip['Population'], df_zip['Zip'].str.split(' ', expand=True)], axis=1)

df_zip_test = df_zip_test.rename(columns={2: "Zip"})  # rename column

df_zip_test = df_zip_test[['Zip', 'Population']]  # Just keep 2 columns

df_zip_test.head()

Unnamed: 0,Zip,Population
0,33301,14586
1,33304,17724
2,33305,11927
3,33306,3397
4,33308,28217


In [6]:
# MERGE DAT SETS with all USA ZIP data to get Lat Long values

df_zip_test["Zip"] = df_zip_test['Zip'].astype('int')   # change data type to int to merge on zip code

df_result = pd.merge(df_zip_test,
                 df_all_zip[['Zip', 'Latitude', 'Longitude']],
                 on='Zip')
df_result.head()


Unnamed: 0,Zip,Population,Latitude,Longitude
0,33301,14586,26.121114,-80.13187
1,33304,17724,26.137693,-80.12646
2,33305,11927,26.153728,-80.12606
3,33306,3397,26.165212,-80.11379
4,33308,28217,26.191111,-80.10846


In [7]:
# add a column of Borough with values Fort Lauderdale as first column

df_result['Borough'] = "Fort Lauderdale"   #  add column with default value

df_result = df_result.rename(columns={"Zip": "Neighborhood"}) #and change column name from Zip to Neighborhood

cols = df_result.columns.tolist()  # reorder columns
cols = cols[-1:] + cols[:-1]   # Rearrange cols : moved the last element to the first position:

df_result = df_result[cols]  #    #Then reorder the dataframe like this: OR    df = df.ix[:, cols]

df_result = df_result.drop(['Population'], axis = 1)   # drop population column as no longer needed

neighborhoods = df_result
neighborhoods.head()


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Fort Lauderdale,33301,26.121114,-80.13187
1,Fort Lauderdale,33304,26.137693,-80.12646
2,Fort Lauderdale,33305,26.153728,-80.12606
3,Fort Lauderdale,33306,26.165212,-80.11379
4,Fort Lauderdale,33308,26.191111,-80.10846


#### Use geopy library to get the latitude and longitude values of Fort Lauderdale

## 4. Methodology : Explore Neighborhoods

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>fl_explorer</em>, as shown below.

In [8]:
address = 'Fort Lauderdale, FL'

geolocator = Nominatim(user_agent="fl_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Fort Lauderdale, FL are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Fort Lauderdale, FL are 26.1223084, -80.1433786.


#### Create a map of Fort Lauderdale, FL with neighborhoods(ZIP Codes) superimposed on top.

In [9]:
# create map of Fort Lauderdale, Florida using latitude and longitude values

map_FL = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=15,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_FL) 
    
map_FL

In [10]:

FTL_data = neighborhoods[neighborhoods['Borough'] == 'Fort Lauderdale'].reset_index(drop=True)
FTL_data.head()


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Fort Lauderdale,33301,26.121114,-80.13187
1,Fort Lauderdale,33304,26.137693,-80.12646
2,Fort Lauderdale,33305,26.153728,-80.12606
3,Fort Lauderdale,33306,26.165212,-80.11379
4,Fort Lauderdale,33308,26.191111,-80.10846


## 5. Methodology: Analyze Each Zip Code

### Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [11]:
CLIENT_ID = '1PSNDXVMG0S4ZUJT5PG0IYJCDGGH1YT2EBYGN43HR43NTYE5' # your Foursquare ID
CLIENT_SECRET = 'QKBF5WN5MV4VDQM0RVNEWIFKFBPKHVJWMI32X0UDSAALAGYY' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 1PSNDXVMG0S4ZUJT5PG0IYJCDGGH1YT2EBYGN43HR43NTYE5
CLIENT_SECRET:QKBF5WN5MV4VDQM0RVNEWIFKFBPKHVJWMI32X0UDSAALAGYY


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [12]:
FTL_data.loc[0, 'Neighborhood']   #  this returns a zip code for the neighborhood

33301

Get the neighborhood's latitude and longitude values.

In [13]:
neighborhood_latitude = FTL_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = FTL_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = FTL_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of 33301 are 26.121114000000002, -80.13186999999999.


#### Now, let's get the top 100 venues that are in this Zip code within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [14]:
radius = 500
LIMIT = 100

latitude = neighborhood_latitude
longitude = neighborhood_longitude

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url


'https://api.foursquare.com/v2/venues/explore?client_id=1PSNDXVMG0S4ZUJT5PG0IYJCDGGH1YT2EBYGN43HR43NTYE5&client_secret=QKBF5WN5MV4VDQM0RVNEWIFKFBPKHVJWMI32X0UDSAALAGYY&ll=26.121114000000002,-80.13186999999999&v=20180605&radius=500&limit=100'

Send the GET request and examine the resutls

In [15]:
results = requests.get(url).json()

Use the *items* key. create  **get_category_type** function  to use with Foursquare data.

In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [17]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,B Square Burgers,Burger Joint,26.119312,-80.132368
1,Hoffman's Chocolates,Chocolate Shop,26.119168,-80.133253
2,Louie Bossi's Ristorante Bar Pizzeria,Lounge,26.119232,-80.132543
3,Vinos Wine Bar on Las Olas,Wine Bar,26.119008,-80.133235
4,Luigi's Coal Oven Pizza,Pizza Place,26.119643,-80.128819


And how many venues were returned by Foursquare?

In [18]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

55 venues were returned by Foursquare.


<a id='item2'></a>

#### Let's create a function to repeat the same process to all the neighborhoods ( zip codes ) in Fort Lauderdale

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Let's write the code to run the above function on each neighborhood(zip) and create a new dataframe called *FLT_venues*.

In [20]:
# type your answer here

FTL_venues = getNearbyVenues(names=FTL_data['Neighborhood'],
                                   latitudes=FTL_data['Latitude'],
                                   longitudes=FTL_data['Longitude']
                                  )

FTL_venues

33301
33304
33305
33306
33308
33309
33311
33312
33313
33314
33315
33316
33317
33319
33321
33322
33323
33324
33325
33326
33327
33328
33330
33331
33332
33334
33351


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,33301,26.121114,-80.13187,B Square Burgers,26.119312,-80.132368,Burger Joint
1,33301,26.121114,-80.13187,Hoffman's Chocolates,26.119168,-80.133253,Chocolate Shop
2,33301,26.121114,-80.13187,Louie Bossi's Ristorante Bar Pizzeria,26.119232,-80.132543,Lounge
3,33301,26.121114,-80.13187,Vinos Wine Bar on Las Olas,26.119008,-80.133235,Wine Bar
4,33301,26.121114,-80.13187,Luigi's Coal Oven Pizza,26.119643,-80.128819,Pizza Place
5,33301,26.121114,-80.13187,Caffe Europa,26.119039,-80.133523,Italian Restaurant
6,33301,26.121114,-80.13187,Macabi Cigars And Liqor Bar,26.119365,-80.130595,Liquor Store
7,33301,26.121114,-80.13187,Rocco's Tacos and Tequila Bar,26.119431,-80.129044,Mexican Restaurant
8,33301,26.121114,-80.13187,Asia Bay,26.119479,-80.131619,Asian Restaurant
9,33301,26.121114,-80.13187,Gran Forno Bakery,26.119482,-80.130219,Bakery


#### Let's check the size of the resulting dataframe

In [21]:
print(FTL_venues.shape)
FTL_venues.head()

(303, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,33301,26.121114,-80.13187,B Square Burgers,26.119312,-80.132368,Burger Joint
1,33301,26.121114,-80.13187,Hoffman's Chocolates,26.119168,-80.133253,Chocolate Shop
2,33301,26.121114,-80.13187,Louie Bossi's Ristorante Bar Pizzeria,26.119232,-80.132543,Lounge
3,33301,26.121114,-80.13187,Vinos Wine Bar on Las Olas,26.119008,-80.133235,Wine Bar
4,33301,26.121114,-80.13187,Luigi's Coal Oven Pizza,26.119643,-80.128819,Pizza Place


Let's check how many venues were returned for each neighborhood

In [22]:
FTL_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
33301,55,55,55,55,55,55
33304,28,28,28,28,28,28
33305,1,1,1,1,1,1
33306,26,26,26,26,26,26
33308,20,20,20,20,20,20
33309,7,7,7,7,7,7
33311,1,1,1,1,1,1
33312,4,4,4,4,4,4
33313,4,4,4,4,4,4
33314,6,6,6,6,6,6


#### Let's find out how many unique categories can be curated from all the returned venues

In [23]:
print('There are {} uniques categories.'.format(len(FTL_venues['Venue Category'].unique())))

There are 123 uniques categories.


<a id='item3'></a>

## 5. Methodology : Analyze Each Zip code

In [24]:
# one hot encoding
FTL_onehot = pd.get_dummies(FTL_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
FTL_onehot['Neighborhood'] = FTL_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [FTL_onehot.columns[-1]] + list(FTL_onehot.columns[:-1])
FTL_onehot = FTL_onehot[fixed_columns]

FTL_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Big Box Store,Boat or Ferry,Boutique,Breakfast Spot,Bridal Shop,Burger Joint,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Service,Event Space,Farmers Market,Fast Food Restaurant,Flower Shop,Food,French Restaurant,Garden,Gastropub,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,Home Service,Hotel,Hotel Pool,Ice Cream Shop,Intersection,Italian Restaurant,Juice Bar,Kids Store,Latin American Restaurant,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Multiplex,Nail Salon,Nightclub,Office,Other Repair Shop,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Pool,Print Shop,Pub,Public Art,Record Shop,Rental Car Location,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoothie Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Sports Bar,Steakhouse,Supplement Shop,Sushi Restaurant,Taco Place,Thai Restaurant,Thrift / Vintage Store,Toll Plaza,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,33301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,33301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,33301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,33301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,33301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [25]:
FTL_onehot.shape

(303, 124)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [27]:
FTL_grouped = FTL_onehot.groupby('Neighborhood').mean().reset_index()
FTL_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Big Box Store,Boat or Ferry,Boutique,Breakfast Spot,Bridal Shop,Burger Joint,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Service,Event Space,Farmers Market,Fast Food Restaurant,Flower Shop,Food,French Restaurant,Garden,Gastropub,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,Home Service,Hotel,Hotel Pool,Ice Cream Shop,Intersection,Italian Restaurant,Juice Bar,Kids Store,Latin American Restaurant,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Multiplex,Nail Salon,Nightclub,Office,Other Repair Shop,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Pool,Print Shop,Pub,Public Art,Record Shop,Rental Car Location,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoothie Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Sports Bar,Steakhouse,Supplement Shop,Sushi Restaurant,Taco Place,Thai Restaurant,Thrift / Vintage Store,Toll Plaza,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,33301,0.018182,0.036364,0.0,0.018182,0.054545,0.0,0.0,0.0,0.0,0.018182,0.018182,0.072727,0.0,0.0,0.0,0.018182,0.0,0.018182,0.018182,0.018182,0.018182,0.0,0.018182,0.036364,0.018182,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.018182,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.036364,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.109091,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.018182,0.054545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0
1,33304,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.035714,0.035714,0.035714,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
2,33305,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,33306,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.038462,0.0,0.0,0.0,0.115385,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,33308,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
5,33309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,33311,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,33312,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,33313,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,33314,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [28]:
FTL_grouped.shape

(26, 124)

#### Let's print each neighborhood along with the top 5 most common venues

In [29]:
num_top_venues = 5
i=0
for hood in FTL_grouped['Neighborhood']:
    
    Zip = FTL_grouped['Neighborhood'][i] 
    print("---- ZIP Code :", Zip, "----")  
    temp = FTL_grouped[FTL_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
    i = i + 1

---- ZIP Code : 33301 ----
                venue  freq
0  Italian Restaurant  0.11
1                 Bar  0.07
2    Asian Restaurant  0.05
3  Mexican Restaurant  0.05
4          Restaurant  0.04


---- ZIP Code : 33304 ----
                  venue  freq
0             Wine Shop  0.07
1  Fast Food Restaurant  0.07
2          Intersection  0.04
3                 Hotel  0.04
4            Public Art  0.04


---- ZIP Code : 33305 ----
               venue  freq
0            Dog Run   1.0
1  Accessories Store   0.0
2          Multiplex   0.0
3                Pub   0.0
4         Print Shop   0.0


---- ZIP Code : 33306 ----
                venue  freq
0          Restaurant  0.12
1         Pizza Place  0.12
2  Italian Restaurant  0.08
3      Breakfast Spot  0.08
4           Rock Club  0.04


---- ZIP Code : 33308 ----
                venue  freq
0  Italian Restaurant  0.10
1  Seafood Restaurant  0.10
2   German Restaurant  0.10
3         Yoga Studio  0.05
4                 Bar  0.05


---- ZIP 

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = FTL_grouped['Neighborhood']

for ind in np.arange(FTL_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(FTL_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33301,Italian Restaurant,Bar,Asian Restaurant,Mexican Restaurant,Pizza Place,American Restaurant,Clothing Store,Restaurant,Ice Cream Shop,French Restaurant
1,33304,Wine Shop,Fast Food Restaurant,Intersection,Donut Shop,Clothing Store,Rental Car Location,Public Art,Coffee Shop,Grocery Store,Convenience Store
2,33305,Dog Run,Yoga Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop,Event Service,Event Space
3,33306,Pizza Place,Restaurant,Italian Restaurant,Breakfast Spot,Coffee Shop,Big Box Store,Diner,Pub,Clothing Store,Rock Club
4,33308,German Restaurant,Seafood Restaurant,Italian Restaurant,Yoga Studio,Pharmacy,Pub,Nail Salon,Record Shop,Mexican Restaurant,Lounge


<a id='item4'></a>

## 6. Methodology: Cluster Neighborhoods(Zip Codes) : 

Run *k*-means to cluster the neighborhood into 5 clusters.

In [32]:
# set number of clusters
kclusters = 7

FTL_grouped_clustering = FTL_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(FTL_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 4, 1, 1, 1, 2, 1, 1, 1], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [33]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

FTL_merged = FTL_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
FTL_merged = FTL_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

FTL_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Fort Lauderdale,33301,26.121114,-80.13187,1.0,Italian Restaurant,Bar,Asian Restaurant,Mexican Restaurant,Pizza Place,American Restaurant,Clothing Store,Restaurant,Ice Cream Shop,French Restaurant
1,Fort Lauderdale,33304,26.137693,-80.12646,1.0,Wine Shop,Fast Food Restaurant,Intersection,Donut Shop,Clothing Store,Rental Car Location,Public Art,Coffee Shop,Grocery Store,Convenience Store
2,Fort Lauderdale,33305,26.153728,-80.12606,4.0,Dog Run,Yoga Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop,Event Service,Event Space
3,Fort Lauderdale,33306,26.165212,-80.11379,1.0,Pizza Place,Restaurant,Italian Restaurant,Breakfast Spot,Coffee Shop,Big Box Store,Diner,Pub,Clothing Store,Rock Club
4,Fort Lauderdale,33308,26.191111,-80.10846,1.0,German Restaurant,Seafood Restaurant,Italian Restaurant,Yoga Studio,Pharmacy,Pub,Nail Salon,Record Shop,Mexican Restaurant,Lounge


#### Check all data before starting to cluster

In [34]:
# Ensure all data types are correct

#print(FTL_merged.dtypes)

# Drop NaN rows
FTL_merged = FTL_merged.dropna()
FTL_merged = FTL_merged.reset_index(drop=True)

#Change datatype to int for plotting circles.
FTL_merged['Cluster Labels'] = FTL_merged['Cluster Labels'].astype('int')


In [35]:
#Load in Top 10 Best Shopping area data

mall_List_data = { 'Mall_Name':['Swap Shop', 'Dania Antique Row', 'Riverwalk', 
                                 'Pompano Citi Centre', 'Downtown Hollywood', 'Coral Ridge Mall',
                                'Broward Mall', 'Sawgrass Mills', 'Las Olas Boulevard', 'The Galleria'], 
                  'Latitude':[26.136870, 26.051330, 26.120050, 
                              26.233470, 26.011290, 26.170970,
                             26.120740, 26.146210, 26.120060, 26.136560], 
                  'Longitude':[-80.192128, -80.144070, -80.147950,
                               -80.102870, -80.144210, -80.119120,
                              -80.255400, -80.324690, -80.115510, -80.113580] }

# Create DataFrame 
mall_List = pd.DataFrame(mall_List_data)
mall_List

Unnamed: 0,Mall_Name,Latitude,Longitude
0,Swap Shop,26.13687,-80.192128
1,Dania Antique Row,26.05133,-80.14407
2,Riverwalk,26.12005,-80.14795
3,Pompano Citi Centre,26.23347,-80.10287
4,Downtown Hollywood,26.01129,-80.14421
5,Coral Ridge Mall,26.17097,-80.11912
6,Broward Mall,26.12074,-80.2554
7,Sawgrass Mills,26.14621,-80.32469
8,Las Olas Boulevard,26.12006,-80.11551
9,The Galleria,26.13656,-80.11358


In [36]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(FTL_merged['Latitude'], FTL_merged['Longitude'], FTL_merged['Neighborhood'], FTL_merged['Cluster Labels']):
    label = folium.Popup('Zip Code ' + str(poi) + ', Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

### Display Shopping areas with smaller BLACK DOTS

for lt, ln, mall in zip(mall_List['Latitude'], mall_List['Longitude'], mall_List['Mall_Name']):
    mall_label = folium.Popup(str(mall), parse_html=True)
    folium.CircleMarker(
        [lt, ln],
        radius=5,
        popup= mall_label,
        color="Black", 
        fill=True,
        fill_color="Black", 
        fill_opacity=0.7).add_to(map_clusters)

       
map_clusters

<a id='item5'></a>

## 7. Results: Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. 

#### Cluster 0

In [37]:
FTL_merged.loc[FTL_merged['Cluster Labels'] == 0, FTL_merged.columns[[1] + list(range(5, FTL_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,33317,Home Service,Yoga Studio,French Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Service
13,33319,Home Service,Restaurant,Yoga Studio,French Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop


#### Cluster 1 : Includes the recommended Zip Code 33301

In [38]:
FTL_merged.loc[FTL_merged['Cluster Labels'] == 1, FTL_merged.columns[[1] + list(range(5, FTL_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33301,Italian Restaurant,Bar,Asian Restaurant,Mexican Restaurant,Pizza Place,American Restaurant,Clothing Store,Restaurant,Ice Cream Shop,French Restaurant
1,33304,Wine Shop,Fast Food Restaurant,Intersection,Donut Shop,Clothing Store,Rental Car Location,Public Art,Coffee Shop,Grocery Store,Convenience Store
3,33306,Pizza Place,Restaurant,Italian Restaurant,Breakfast Spot,Coffee Shop,Big Box Store,Diner,Pub,Clothing Store,Rock Club
4,33308,German Restaurant,Seafood Restaurant,Italian Restaurant,Yoga Studio,Pharmacy,Pub,Nail Salon,Record Shop,Mexican Restaurant,Lounge
5,33309,Thrift / Vintage Store,Pizza Place,Mexican Restaurant,Grocery Store,Mobile Phone Shop,Donut Shop,Office,Event Service,Fast Food Restaurant,Farmers Market
7,33312,Park,Boat or Ferry,Garden,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Service,Event Space
8,33313,Convenience Store,Multiplex,Food,Department Store,Yoga Studio,French Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run
9,33314,Bakery,Construction & Landscaping,Coffee Shop,Toll Plaza,Automotive Shop,Farmers Market,Yoga Studio,Food,Flower Shop,Fast Food Restaurant
10,33315,Hotel,German Restaurant,Fast Food Restaurant,Sports Bar,Liquor Store,Mediterranean Restaurant,Donut Shop,Nightclub,Other Repair Shop,Discount Store
11,33316,Seafood Restaurant,Italian Restaurant,Burger Joint,Mexican Restaurant,Coffee Shop,Sushi Restaurant,Diner,Deli / Bodega,Clothing Store,Paper / Office Supplies Store


#### Cluster 2

In [39]:
FTL_merged.loc[FTL_merged['Cluster Labels'] == 2, FTL_merged.columns[[1] + list(range(5, FTL_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,33311,Motorcycle Shop,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Service,Event Space


#### Cluster 3

In [40]:
FTL_merged.loc[FTL_merged['Cluster Labels'] == 3, FTL_merged.columns[[1] + list(range(5, FTL_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,33324,Golf Course,Yoga Studio,French Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Service


#### Cluster 4

In [41]:
FTL_merged.loc[FTL_merged['Cluster Labels'] == 4, FTL_merged.columns[[1] + list(range(5, FTL_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,33305,Dog Run,Yoga Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop,Event Service,Event Space


#### Cluster 5

In [42]:
FTL_merged.loc[FTL_merged['Cluster Labels'] == 5, FTL_merged.columns[[1] + list(range(5, FTL_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,33331,Event Service,Yoga Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Space


#### Cluster 6

In [43]:
FTL_merged.loc[FTL_merged['Cluster Labels'] == 6, FTL_merged.columns[[1] + list(range(5, FTL_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,33321,Print Shop,Yoga Studio,French Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Event Service


### Methodology: Calc the sum of the distances between each shopping mall lat,long and the zip code to determine the zip code with total smallest sum of distances to give the 'best' area to stay in to be close to shops and restaurants.

Distance Formula

This uses the ‘haversine’ formula to calculate the great-circle distance between two points – that is, the shortest distance over the earth’s surface – giving an ‘as-the-crow-flies’ distance between the points (ignoring any hills they fly over, of course!).
Haversine
formula: 	a = sin²(Δφ/2) + cos φ1 ⋅ cos φ2 ⋅ sin²(Δλ/2)


φ/λ for lati­tude/longi­tude in radians 

https://www.movable-type.co.uk/scripts/latlong.html


In [45]:
from math import sin, cos, sqrt, atan2, radians

In [83]:
# create function to calculate the distance between two lat-long coords

def crow_flies_distance(lat1, lon1, lat2, lon2) :
    # approximate radius of earth in km
    R = 6373.0

    lat1 = radians(lat1)
    lon1 = radians(lon1)
    lat2 = radians(lat2)
    lon2 = radians(lon2)

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = round(R * c, 2)
    #print("Result:", distance, "Km")
    
    return( distance )


In [84]:
test_neigh1 = neighborhoods

j=0
for Mall_Name in mall_List['Mall_Name'] :
    test_neigh1[mall_List['Mall_Name'][j]] = 0.0 # mall_List['Mall_Name'][j]
    j = j + 1
    
#test_neigh1.head()

In [85]:
## Loop to populate new dataframe with distances while calling function crow_flies_distance

j=0
for Mall_Name in mall_List['Mall_Name'] :
    i=0
    for Neighborhood in test_neigh1['Neighborhood'] :
        zip_target_lat = test_neigh1['Latitude'][i] 
        zip_target_long = test_neigh1['Longitude'][i] 

        mall_target_lat = mall_List['Latitude'][j]
        mall_target_long = mall_List['Longitude'][j]

        distance = crow_flies_distance(zip_target_lat, zip_target_long, mall_target_lat, mall_target_long)
        test_neigh1[Mall_Name][i] = distance
        i = i + 1
    j = j+1

test_neigh1.head()   # check data

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Swap Shop,Dania Antique Row,Riverwalk,Pompano Citi Centre,Downtown Hollywood,Coral Ridge Mall,Broward Mall,Sawgrass Mills,Las Olas Boulevard,The Galleria,Sum_of_distances
0,Fort Lauderdale,33301,26.121114,-80.13187,6.27,7.86,1.61,12.83,12.28,5.69,12.34,19.46,1.64,2.51,82.49
1,Fort Lauderdale,33304,26.137693,-80.12646,6.56,9.77,2.91,10.91,14.17,3.77,13.01,19.82,2.25,1.29,84.46
2,Fort Lauderdale,33305,26.153728,-80.12606,6.86,11.53,4.34,9.17,15.95,2.04,13.43,19.85,3.89,2.28,89.34
3,Fort Lauderdale,33306,26.165212,-80.11379,8.43,13.02,6.07,7.67,17.39,0.83,14.98,21.16,5.03,3.19,97.77
4,Fort Lauderdale,33308,26.191111,-80.10846,10.3,15.95,8.83,4.74,20.32,2.48,16.63,22.16,7.93,6.09,115.43


### Find the Zip code closest to all top hotels by distance calculation and compare with Visual Map

In [92]:
# Create new column = sum of distances for each zip code

test_neigh2 = test_neigh1

# Find the minium distance in that column and display the Zip code

test_neigh3 = test_neigh2[['Swap Shop', 'Dania Antique Row', 'Riverwalk', 'Pompano Citi Centre', 'Downtown Hollywood', 'Coral Ridge Mall', 'Broward Mall', 'Sawgrass Mills', 'Las Olas Boulevard', 'The Galleria']]
    
test_neigh2['Sum_of_distances'] = test_neigh3.sum(axis=1)

min_index = test_neigh2['Sum_of_distances'].idxmin()

#print("index of minimum value is : ", min_index, end="\n\n")

print("Zip code with total shortest distance to all shopping areas is ZIP : ", test_neigh2['Neighborhood'][min_index], end="\n\n")

print("Additional information can now be sourced at https://www.zipdatamaps.com/{}".format(test_neigh2['Neighborhood'][min_index]), end="\n\n")


Zip code with total shortest distance to all shopping areas is ZIP :  33301

Additional information can now be sourced at https://www.zipdatamaps.com/33301



### **Zip code with total shortest distance to all shopping areas is ZIP :  33301  and agrees with clustering on the map !**

## 8. Discussion: Observations and recommendations

The area closest to the coast had a higher density of zipcodes and shopping areas giving the 'best' location as zip code 33301 with 3 shopping areas of the Top 10 within close proximity.

An area further away from the coast at Zip code = 33323 is beside Sawgrass Mills. One of the top shopping areas and staying in this area may be a lower cost area to avail of the same amenities and shopping, however, this would need to be investigated.

Adding the Top 10 rank to each label of shopping area would provide more info to users/audience.

An overlay of hotels could also be added to make it easier to find suitable locations on a new map and deepen the analysis and personalisation for the user. 

Using distance calculation also gave the same result and is a quick and easy way to estimate a location/zip code area to consider as a place to stay to be located close to shopping and restuarants. Further analysis of other cities would validate this approach. 

Access to additional data sources such as https://www.zipdatamaps.com/33301 could be used to download additional useful information about this area. Further coding required to scrape this data and align with findings.


## 9. Conclusion

Foursquare data was used to answer the question of "Where should I stay in Fort Lauderdale, Florida when I travel there so
I am close to the best shopping with lots of restuarants and amenities ?". Zip code data was used with Lat-Long data to gather venue data from Foursquare
to assess the areas with most amenities and use k-means clustering to show the areas most suitable. Top 10 Shopping area data was taken 
from bestTop10.com and lat-long data from latlong.net to overlay the best shopping experience with the best restuarants and venues. 

Visually, the Zip code area 33301 was the best to stay in close proximity to 3 of the top 10 shopping areas.

Further analysis was carried out to calculate the shortest total sum of distances between zip code and lat-long centres to be able to predict the best zip code to stay in to be located as close as possible to all shopping areas. The result returned agreed with the clustering exercise. 

The project has shown how foursquare data and other data sets can be combined to answer a meaningful question for a traveller visitng a new city.

