# Capstone Project - The Battle of Neighborhoods (Zipcodes)
### Neighborhoods = Zipcodes in my analysis


## Background

A client is looking to open a new restaurant in Dallas, TX, but she is not sure about the best location for her new venue. She is new to the restaurant business and wants to start small. She is thinking of opening a pizza restaurant, a cafe, or a wings restaurant, and does not want to open in a zipcode where any of these categories is among the five most common venue types.

Although it is not the fastest growing city in Texas, Dallas, with a population of 1.245 million (2018) and numerous businesses based in and around the city, is prime ground for restaurants.
The Dallas-Fort Worth-Arlington core-based statistical area, as defined by the U.S. Census Bureau, had 13,763 restaurants in the fall of 2015.
Dallas has 4 times more restaurants per person than New York City.

Dallas is not as expensive as New York or San Francisco, but mortgage or rent can be a substantial chunk of your cost. It is highly preferable to find either a location that is on the outskirts of the city or a type of restaurant that does not have a big footprint.

More importantly, the client has children of elementary and middle school age and has an apartment in Irving, Texas. This means the children will attend a school in the Irving Independent School District.
She therefore prefers the restaurant to be within 5 to 10 kilometers of her home.

The question is: which Dallas zipcode is the best place to open a restaurant, so that competition is limited and the location is within 5 to 10 kilometers of the client's home?

## Data section

### Data requirements:

1. We need all the zip codes that fall inside the City of Dallas, with their latitudes and longitudes.
2. Source: the data is available at https://www.dallasopendata.com/Geography-Boundaries/FY-2017-City-of-Dallas-City-Limits/ad4m-4kje and can be downloaded in any one of the formats below:
* Geospatial formats: KML, KMZ, GeoJSON, Shapefile
* Non-geospatial file types: CSV, JSON, XML, etc.
* The data can also be accessed via the Socrata Open Data API at https://dev.socrata.com/
3. We need to access the Foursquare API to plot all the restaurants per Dallas zip code and group the zip codes by most common restaurant type.
4. We know the client's children will be in the Irving ISD. We will place circles around Irving to determine the distance to and from a possible restaurant location.

In [1]:
import pandas as pd
#pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)

import itertools
import numpy as np
import json # library to handle JSON files
import requests # library to handle requests
from sklearn.cluster import KMeans

## Load Texas ZipCodes Data From CSV File and Investigate data

In [2]:
df = pd.read_csv('TexasZipCodes.csv')
df.head()

Unnamed: 0,Zipcodes,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopointlat,geopointlon
0,75475,Randolph,TX,33.485315,-96.25525,-6,1,33.485315,-96.25525
1,75757,Bullard,TX,32.136787,-95.3671,-6,1,32.136787,-95.3671
2,78650,McDade,TX,30.283941,-97.23563,-6,1,30.283941,-97.23563
3,75010,Carrollton,TX,33.030556,-96.89328,-6,1,33.030556,-96.89328
4,76054,Hurst,TX,32.858398,-97.17681,-6,1,32.858398,-97.17681


In [3]:
df.columns

Index(['Zipcodes', 'City', 'State', 'Latitude', 'Longitude', 'Timezone',
       'Daylight savings time flag', 'geopointlat', 'geopointlon'],
      dtype='object')

In [4]:
df.shape

(2742, 9)

### Let's select some features that we want to use

In [5]:
dflight = df[['Zipcodes', 'City', 'State', 'Latitude', 'Longitude']]
dflight.head()

Unnamed: 0,Zipcodes,City,State,Latitude,Longitude
0,75475,Randolph,TX,33.485315,-96.25525
1,75757,Bullard,TX,32.136787,-95.3671
2,78650,McDade,TX,30.283941,-97.23563
3,75010,Carrollton,TX,33.030556,-96.89328
4,76054,Hurst,TX,32.858398,-97.17681


In [6]:
dflight.shape

(2742, 5)

In [7]:
!pip install geocoder


In [8]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [9]:
!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


In [10]:
!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library


In [11]:
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available in pandas 1.0 and later)
import os

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

## Filter on zipcodes that are in Dallas

In [12]:
# Filter on zipcodes whose city name contains "Dallas"
dallas_data=dflight[dflight['City'].str.contains("Dallas")]
dallas_data=dallas_data.reset_index(drop=True)
dallas_data.head()

Unnamed: 0,Zipcodes,City,State,Latitude,Longitude
0,75294,Dallas,TX,32.767268,-96.777626
1,75255,Dallas,TX,32.669783,-96.614921
2,75374,Dallas,TX,32.767268,-96.777626
3,75252,Dallas,TX,32.998132,-96.79088
4,75275,Dallas,TX,32.767268,-96.777626


In [13]:
dallas_data.shape

(122, 5)
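
Several rows in the preview above share the coordinates 32.767268, -96.777626. Zipcodes that map to the same point would send an identical location query to any venue API. A minimal sketch of a duplicate check follows; the `sample` frame is a stand-in for `dallas_data` built from the five previewed rows, and the names `shared` and `unique_points` are illustrative only.

```python
import pandas as pd

# Stand-in for dallas_data: the five rows shown in the preview above.
sample = pd.DataFrame({
    'Zipcodes': [75294, 75255, 75374, 75252, 75275],
    'Latitude': [32.767268, 32.669783, 32.767268, 32.998132, 32.767268],
    'Longitude': [-96.777626, -96.614921, -96.777626, -96.79088, -96.777626],
})

# Count how many zipcodes share each (Latitude, Longitude) pair.
shared = (sample.groupby(['Latitude', 'Longitude'])['Zipcodes']
                .count()
                .rename('n_zipcodes')
                .reset_index())

# Keep one row per distinct coordinate pair.
unique_points = sample.drop_duplicates(subset=['Latitude', 'Longitude'])

print(shared)
print(len(unique_points))
```

Running the same check on the full `dallas_data` frame would show how many of the 122 zipcodes actually contribute a distinct location.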

## Find Dallas geographical coordinates

In [14]:
address = 'Dallas, Texas'

geolocator = Nominatim(user_agent="To_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dallas, Texas, USA are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dallas, Texas, USA are 32.7762719, -96.7968559.


## Map of Dallas, TX, USA with zipcodes superimposed on top

In [15]:
map_dallas = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, zipcodes, city in zip( dallas_data['Latitude'], dallas_data['Longitude'], dallas_data['Zipcodes'], dallas_data['City']):
    label = '{}, {}'.format(zipcodes, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dallas)  
    
map_dallas

## Define Foursquare Credentials and Version

In [16]:
CLIENT_ID = '355RLWAUP40MLINWD15PZXIETQH01XAJUNHRNPSDPI2WCACM' # your Foursquare ID
CLIENT_SECRET = 'PAZB1BAI5BMTESTMA3DSNVCQ04ZOWJC4DAHWZAOVS5QNTDSR' # your Foursquare Secret
VERSION = '20200101' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 355RLWAUP40MLINWD15PZXIETQH01XAJUNHRNPSDPI2WCACM
CLIENT_SECRET:PAZB1BAI5BMTESTMA3DSNVCQ04ZOWJC4DAHWZAOVS5QNTDSR


### Exploring the zipcodes

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=300, limit=100):
    """Collect Foursquare food venues within `radius` meters of each zipcode point."""
    categoryId = '4d4b7105d754a06374d81259'  # Foursquare top-level "Food" category
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'categoryId': categoryId,
            'radius': radius,
            'limit': limit,
        }

        # make the GET request
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.extend(
            (name,
             lat,
             lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name']) for v in results)

    # naming the columns up front also works when no venues were returned at all
    columns = ['Zipcode',
               'Zipcode Latitude',
               'Zipcode Longitude',
               'Venue',
               'Venue Latitude',
               'Venue Longitude',
               'Venue Category']

    return pd.DataFrame(venues_list, columns=columns)
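
The function above reads only a few nested fields from each response: `response`, `groups[0]`, `items`, and, per item, the venue's name, location and first category. The sketch below shows that flattening step in isolation. The payload is hand-written to mimic those fields (it is not a real API response), and `flatten_items` is a hypothetical helper that also tolerates an empty result instead of raising an `IndexError`.

```python
# Hand-written payload shaped like the fields the function reads.
# It is an illustration only, not a real API response.
payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Tutta's",
                   "location": {"lat": 32.781305, "lng": -96.807423},
                   "categories": [{"name": "Pizza Place"}]}},
        {"venue": {"name": "Record Grill",
                   "location": {"lat": 32.779976, "lng": -96.806781},
                   "categories": [{"name": "American Restaurant"}]}},
    ]}]}
}


def flatten_items(zipcode, lat, lng, payload):
    """Turn one payload into (zipcode, lat, lng, venue, vlat, vlng, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0]["items"] if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # Fall back to None when a venue carries no category.
        categories = venue.get("categories") or [{"name": None}]
        rows.append((zipcode, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


rows = flatten_items(75202, 32.77988, -96.80502, payload)
empty = flatten_items(75294, 32.767268, -96.777626, {"response": {"groups": []}})
print(rows)
print(empty)
```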

In [18]:
dallas_venues = getNearbyVenues(names=dallas_data['Zipcodes'], 
                               latitudes=dallas_data['Latitude'],
                               longitudes=dallas_data['Longitude']
                              )




75294
75255
75374
75252
75275
75202
75392
75355
75389
75356
75270
75220
75234
75368
75215
75231
75251
75382
75373
75379
75214
75210
75246
75363
75393
75216
75326
75238
75242
75250
75247
75207
75263
75285
75223
75287
75336
75323
75232
75376
75283
75240
75381
75388
75212
75245
75204
75226
75353
75264
75205
75301
75230
75254
75364
75221
75237
75398
75357
75219
75315
75228
75387
75359
75262
75386
75239
75222
75390
75260
75229
75397
75206
75235
75209
75360
75201
75350
75342
75346
75378
75249
75236
75284
75367
75248
75265
75267
75225
75258
75208
75313
75217
75354
75396
75370
75371
75310
75295
75233
75320
75380
75277
75227
75243
75339
75394
75312
75224
75303
75241
75286
75244
75253
75261
75211
75218
75372
75395
75391
75203
75266


In [19]:
print(dallas_venues.shape)
dallas_venues.head()

(201, 7)


Unnamed: 0,Zipcode,Zipcode Latitude,Zipcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,75255,32.669783,-96.614921,Sid's Food Mart,32.669854,-96.614021,Deli / Bodega
1,75202,32.77988,-96.80502,Y. O. Ranch Steakhouse,32.781296,-96.806402,Steakhouse
2,75202,32.77988,-96.80502,Tutta's,32.781305,-96.807423,Pizza Place
3,75202,32.77988,-96.80502,Record Grill,32.779976,-96.806781,American Restaurant
4,75202,32.77988,-96.80502,Latin Deli,32.778747,-96.805873,Latin American Restaurant


In [20]:
dallas_count = dallas_venues
dallas_count.groupby(['Venue Category']).size()

Venue Category
American Restaurant               7
Asian Restaurant                  7
BBQ Joint                         4
Bakery                            2
Bistro                            1
Breakfast Spot                    3
Burger Joint                      6
Burrito Place                     3
Café                              7
Cajun / Creole Restaurant         2
Chinese Restaurant                3
Comfort Food Restaurant           1
Cuban Restaurant                  2
Deli / Bodega                     3
Diner                             1
Donut Shop                        8
Fast Food Restaurant             17
Food                              6
Food Court                        1
Food Truck                        6
French Restaurant                 3
Fried Chicken Joint              10
Italian Restaurant                6
Japanese Restaurant               2
Korean Restaurant                 1
Latin American Restaurant         1
Mediterranean Restaurant          1
Mexican Resta

## Analyze each zipcode

In [21]:
# one hot encoding
dallas_onehot = pd.get_dummies(dallas_venues[['Venue Category']], prefix="", prefix_sep="")

# add zipcode column back to dataframe
dallas_onehot['Zipcode'] = dallas_venues['Zipcode'] 

# move zipcode column to the first column
fixed_columns = [dallas_onehot.columns[-1]] + list(dallas_onehot.columns[:-1])
dallas_onehot = dallas_onehot[fixed_columns]

dallas_onehot.head()

Unnamed: 0,Zipcode,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Burger Joint,Burrito Place,Café,...,Salad Place,Sandwich Place,Seafood Restaurant,Soup Place,Steakhouse,Sushi Restaurant,Taco Place,Thai Restaurant,Vegetarian / Vegan Restaurant,Wings Joint
0,75255,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,75202,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,75202,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,75202,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,75202,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
dallas_onehot.columns

Index(['Zipcode', 'American Restaurant', 'Asian Restaurant', 'BBQ Joint',
       'Bakery', 'Bistro', 'Breakfast Spot', 'Burger Joint', 'Burrito Place',
       'Café', 'Cajun / Creole Restaurant', 'Chinese Restaurant',
       'Comfort Food Restaurant', 'Cuban Restaurant', 'Deli / Bodega', 'Diner',
       'Donut Shop', 'Fast Food Restaurant', 'Food', 'Food Court',
       'Food Truck', 'French Restaurant', 'Fried Chicken Joint',
       'Italian Restaurant', 'Japanese Restaurant', 'Korean Restaurant',
       'Latin American Restaurant', 'Mediterranean Restaurant',
       'Mexican Restaurant', 'New American Restaurant', 'Noodle House',
       'Pizza Place', 'Restaurant', 'Salad Place', 'Sandwich Place',
       'Seafood Restaurant', 'Soup Place', 'Steakhouse', 'Sushi Restaurant',
       'Taco Place', 'Thai Restaurant', 'Vegetarian / Vegan Restaurant',
       'Wings Joint'],
      dtype='object')

In [23]:
dallas_onehot.describe() 

Unnamed: 0,Zipcode,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Burger Joint,Burrito Place,Café,...,Salad Place,Sandwich Place,Seafood Restaurant,Soup Place,Steakhouse,Sushi Restaurant,Taco Place,Thai Restaurant,Vegetarian / Vegan Restaurant,Wings Joint
count,201.0,201.0,201.0,201.0,201.0,201.0,201.0,201.0,201.0,201.0,...,201.0,201.0,201.0,201.0,201.0,201.0,201.0,201.0,201.0,201.0
mean,75230.79602,0.034826,0.034826,0.0199,0.00995,0.004975,0.014925,0.029851,0.014925,0.034826,...,0.0199,0.109453,0.004975,0.004975,0.014925,0.014925,0.039801,0.004975,0.004975,0.004975
std,27.168607,0.183796,0.183796,0.140007,0.099501,0.070535,0.121557,0.1706,0.121557,0.183796,...,0.140007,0.312986,0.070535,0.070535,0.121557,0.121557,0.195979,0.070535,0.070535,0.070535
min,75201.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,75202.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,75223.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,75270.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,75287.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


## Group rows by zipcode, taking the mean of the frequency of occurrence of each category

In [24]:
zipcodes_grouped = dallas_onehot.groupby('Zipcode').mean().reset_index()
zipcodes_grouped.head()

Unnamed: 0,Zipcode,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Burger Joint,Burrito Place,Café,...,Salad Place,Sandwich Place,Seafood Restaurant,Soup Place,Steakhouse,Sushi Restaurant,Taco Place,Thai Restaurant,Vegetarian / Vegan Restaurant,Wings Joint
0,75201,0.105263,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,...,0.0,0.105263,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0
1,75202,0.060606,0.090909,0.030303,0.0,0.0,0.0,0.060606,0.030303,0.030303,...,0.030303,0.151515,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0
2,75204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,75205,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,75206,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,...,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0
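
To make the grouped values easier to read: the mean of a 0/1 column within a zipcode is the share of that zipcode's venues falling in the category, so each row sums to 1. A small sketch on a toy venue list (the counts are made up; only the category names come from this notebook):

```python
import pandas as pd

# Toy venue list with hypothetical counts.
venues = pd.DataFrame({
    'Zipcode': [75202, 75202, 75202, 75202, 75255],
    'Venue Category': ['Sandwich Place', 'Sandwich Place', 'Pizza Place',
                       'Steakhouse', 'Deli / Bodega'],
})

# One-hot encode the category, then put the zipcode back as the first column.
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
onehot.insert(0, 'Zipcode', venues['Zipcode'])

# The mean of each 0/1 column is the share of venues in that category.
grouped = onehot.groupby('Zipcode').mean().reset_index()
print(grouped)
```

In this toy data, half of the venues in 75202 are sandwich places, while 75255 has a single venue and therefore a share of 1.0 in one category and 0.0 everywhere else.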


In [25]:
# Sort a zipcode's venue categories by frequency, descending, and return the top ones
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

## Create the new dataframe and display the top 5 venues for each zipcode.

In [26]:
num_top_venues = 5

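# NOTE: with `indicators` commented out, the lookup below raises NameError for every
# column, so all names get the 'th' suffix ('1th', '2th', ...). Uncomment the next
# line to get '1st', '2nd', '3rd' instead.
#indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zipcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except (NameError, IndexError):
        columns.append('{}th Most Common Venue'.format(ind+1))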

# create a new dataframe
zipcodes_venues_sorted = pd.DataFrame(columns=columns)
zipcodes_venues_sorted['Zipcode'] = zipcodes_grouped['Zipcode']

for ind in np.arange(zipcodes_grouped.shape[0]):
    zipcodes_venues_sorted.iloc[ind, 1:] = return_most_common_venues(zipcodes_grouped.iloc[ind, :], num_top_venues)

zipcodes_venues_sorted.head()

Unnamed: 0,Zipcode,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,75201,Food Truck,American Restaurant,Sandwich Place,New American Restaurant,BBQ Joint
1,75202,Sandwich Place,Asian Restaurant,Fried Chicken Joint,Fast Food Restaurant,American Restaurant
2,75204,Fast Food Restaurant,Wings Joint,Chinese Restaurant,Food Court,Food
3,75205,Food,Wings Joint,Chinese Restaurant,Food Court,Fast Food Restaurant
4,75206,Mexican Restaurant,French Restaurant,Restaurant,Donut Shop,Café
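
One caveat when reading this table: for a zipcode with very few venues, most categories have frequency 0, so the lower ranks are filled by whichever zero-valued categories the sort happens to place first. The recurring `Wings Joint, Chinese Restaurant, Food Court, Food` tail appears to be such a tie-break artifact rather than evidence that these venues exist there. A possible variant, sketched below under the hypothetical name `top_nonzero_venues`, ranks only categories that actually occur and pads the remaining slots with `None`.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Rank categories by frequency, ignoring those with zero frequency."""
    freqs = row.iloc[1:].astype(float)  # skip the leading Zipcode entry
    ranked = freqs[freqs > 0].sort_values(ascending=False)
    names = list(ranked.index[:num_top_venues])
    # Pad with None so every zipcode yields the same number of columns.
    return names + [None] * (num_top_venues - len(names))


# Toy rows with hypothetical frequencies.
single = pd.Series({'Zipcode': 75255, 'Café': 0.0, 'Deli / Bodega': 1.0,
                    'Pizza Place': 0.0, 'Wings Joint': 0.0})
mixed = pd.Series({'Zipcode': 75202, 'Café': 0.25, 'Deli / Bodega': 0.0,
                   'Pizza Place': 0.75, 'Wings Joint': 0.0})

print(top_nonzero_venues(single, 5))
print(top_nonzero_venues(mixed, 5))
```

With this variant, a category such as `Wings Joint` only shows up in a zipcode's top five when at least one such venue was returned for it, which matters for the client's exclusion rule.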


## Cluster Zipcodes

Run k-means to cluster the zipcodes into 5 clusters.

In [27]:
# set number of clusters
kclusters = 5

dallas_grouped_clustering = zipcodes_grouped.drop(columns='Zipcode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, init='k-means++', random_state=0, n_init = 10).fit(dallas_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20] 

array([0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 2, 0, 3, 0, 2, 0],
      dtype=int32)
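
Before interpreting the clusters it can help to check how many zipcodes fall into each one. The sketch below counts the 20 labels printed above; this is only a prefix of `kmeans.labels_`, so the counts are illustrative, not the full cluster sizes.

```python
import numpy as np

# The 20 labels printed above (only a prefix of kmeans.labels_).
labels = np.array([0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 2, 0, 3, 0, 2, 0])
kclusters = 5

# minlength keeps a slot for clusters that do not occur in this prefix.
sizes = np.bincount(labels, minlength=kclusters)
for cluster, size in enumerate(sizes):
    print(f'cluster {cluster}: {size} zipcodes')
```

On the real data, `np.bincount(kmeans.labels_, minlength=kclusters)` gives the complete picture. A result dominated by one large cluster plus several very small ones suggests the small clusters reflect individual zipcodes rather than broad patterns.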

In [28]:
# add clustering labels
zipcodes_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dallas_merged = dallas_data

# merge Dallas_grouped with Dallas_data to add latitude/longitude for each Zipcode
dallas_merged = dallas_merged.join(zipcodes_venues_sorted.set_index('Zipcode'), on='Zipcodes')

dallas_merged.head() # check the last columns!

Unnamed: 0,Zipcodes,City,State,Latitude,Longitude,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,75294,Dallas,TX,32.767268,-96.777626,,,,,,
1,75255,Dallas,TX,32.669783,-96.614921,4.0,Deli / Bodega,Wings Joint,Chinese Restaurant,Food Court,Food
2,75374,Dallas,TX,32.767268,-96.777626,,,,,,
3,75252,Dallas,TX,32.998132,-96.79088,,,,,,
4,75275,Dallas,TX,32.767268,-96.777626,,,,,,
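
The cluster labels show up as `4.0` rather than `4` because the join leaves unmatched zipcodes with NaN, and a column containing NaN is stored as float. The toy sketch below reproduces this and shows one way to drop the unmatched rows and restore the integer type; `zips` and `labels` are small stand-ins for `dallas_data` and `zipcodes_venues_sorted`.

```python
import pandas as pd

# Toy stand-ins for dallas_data and zipcodes_venues_sorted.
zips = pd.DataFrame({'Zipcodes': [75294, 75255, 75202]})
labels = pd.DataFrame({'Zipcode': [75255, 75202], 'Cluster Labels': [4, 0]})

# Left join: 75294 has no match, so its label becomes NaN.
merged = zips.join(labels.set_index('Zipcode'), on='Zipcodes')
print(merged['Cluster Labels'].dtype)

# Drop unmatched zipcodes on an explicit copy, then restore the integer dtype.
clean = merged.dropna().copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean)
```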


## Visualizing the resulting clusters.

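### Clean up NaN values

Zipcodes for which no venues were returned have no cluster label and no top venues after the join, so those rows are dropped before mapping.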

In [29]:
dallas_drop = dallas_merged.copy() # copy, so that dallas_merged stays unchanged
print(dallas_drop)

dallas_fin = dallas_drop.dropna()
print(dallas_fin)

     Zipcodes    City State   Latitude  Longitude  Cluster Labels  \
0       75294  Dallas    TX  32.767268 -96.777626             NaN   
1       75255  Dallas    TX  32.669783 -96.614921             4.0   
2       75374  Dallas    TX  32.767268 -96.777626             NaN   
3       75252  Dallas    TX  32.998132 -96.790880             NaN   
4       75275  Dallas    TX  32.767268 -96.777626             NaN   
..        ...     ...   ...        ...        ...             ...   
117     75372  Dallas    TX  32.767268 -96.777626             NaN   
118     75395  Dallas    TX  32.767268 -96.777626             NaN   
119     75391  Dallas    TX  32.767268 -96.777626             NaN   
120     75203  Dallas    TX  32.745831 -96.806720             NaN   
121     75266  Dallas    TX  32.767268 -96.777626             NaN   

    1th Most Common Venue 2th Most Common Venue 3th Most Common Venue  \
0                     NaN                   NaN                   NaN   
1           Deli / Bodega

In [30]:
dallas_fin.head()

Unnamed: 0,Zipcodes,City,State,Latitude,Longitude,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,75255,Dallas,TX,32.669783,-96.614921,4.0,Deli / Bodega,Wings Joint,Chinese Restaurant,Food Court,Food
5,75202,Dallas,TX,32.77988,-96.80502,0.0,Sandwich Place,Asian Restaurant,Fried Chicken Joint,Fast Food Restaurant,American Restaurant
10,75270,Dallas,TX,32.78133,-96.80198,0.0,Sandwich Place,Mexican Restaurant,Salad Place,Fast Food Restaurant,Fried Chicken Joint
11,75220,Dallas,TX,32.867977,-96.86306,0.0,Pizza Place,Food,Bakery,Mexican Restaurant,Chinese Restaurant
12,75234,Dallas,TX,32.925975,-96.88322,0.0,Breakfast Spot,Mexican Restaurant,American Restaurant,Fast Food Restaurant,Donut Shop


## Get longitude and latitude of Irving,TX (Location of client's home)

In [31]:
address = 'Irving, Texas'

geolocator = Nominatim(user_agent="To_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Irving, Texas, USA is {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Irving, Texas, USA is 32.8295183, -96.9442177.


In [32]:
# The join introduced NaN values, which turned the cluster labels into floats.
# Cast them back to int so they can index the color list on the map
# (otherwise: "TypeError: list indices must be integers or slices, not float").
# Working on an explicit copy avoids pandas' SettingWithCopyWarning.
dallas_fin = dallas_fin.copy()
dallas_fin['Cluster Labels'] = dallas_fin['Cluster Labels'].astype(int)


In [33]:
#help(folium.Icon)

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Irving latitude and longitude (client's home), as geocoded above
lat = 32.8295183
lon = -96.9442177

folium.Marker([lat, lon]).add_to(map_clusters)

# concentric circles around the client's home in 5 km increments (5 km to 25 km);
# the two outer rings use a red fill, and the outermost ring carries the popup
for radius in range(5000, 30000, 5000):
    folium.Circle([lat, lon],
                  radius=radius,
                  popup=folium.Popup('5km radius increment', parse_html=True) if radius == 25000 else None,
                  color='blue',
                  weight=1,
                  fill=True,
                  fill_color='blue' if radius <= 15000 else 'red',
                  fill_opacity=0.03,
                 ).add_to(map_clusters)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dallas_fin['Latitude'], dallas_fin['Longitude'], dallas_fin['Zipcodes'], dallas_fin['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.9).add_to(map_clusters)

map_clusters
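
The circles on the map give a visual estimate of distance from the client's home. The same check can be done numerically with the haversine (great-circle) formula. The sketch below is one way to do it, assuming a spherical Earth with a mean radius of 6371 km and using three zipcode points taken from the tables above; `haversine_km` is a hypothetical helper name.

```python
import math

IRVING = (32.8295183, -96.9442177)  # geocoded above
EARTH_RADIUS_KM = 6371.0            # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Zipcode points copied from the tables above.
centroids = {
    75220: (32.867977, -96.86306),
    75202: (32.77988, -96.80502),
    75234: (32.925975, -96.88322),
}

distances = {z: haversine_km(IRVING[0], IRVING[1], lat, lon)
             for z, (lat, lon) in centroids.items()}
within_10km = [z for z, d in distances.items() if d <= 10]

for z, d in distances.items():
    print(f'{z}: {d:.1f} km')
print(within_10km)
```

These are straight-line distances between a zipcode's reference point and the geocoded center of Irving, so actual driving distance from the client's apartment will differ.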

## Examine Clusters

### Examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster.

In [35]:
dall_cat0=dallas_fin.loc[dallas_fin['Cluster Labels'] == 0, dallas_fin.columns[[0] + list(range(5, dallas_fin.shape[1]))]]
dall_cat0

Unnamed: 0,Zipcodes,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,75202,0,Sandwich Place,Asian Restaurant,Fried Chicken Joint,Fast Food Restaurant,American Restaurant
10,75270,0,Sandwich Place,Mexican Restaurant,Salad Place,Fast Food Restaurant,Fried Chicken Joint
11,75220,0,Pizza Place,Food,Bakery,Mexican Restaurant,Chinese Restaurant
12,75234,0,Breakfast Spot,Mexican Restaurant,American Restaurant,Fast Food Restaurant,Donut Shop
22,75246,0,American Restaurant,Food,New American Restaurant,Chinese Restaurant,Fast Food Restaurant
27,75238,0,Breakfast Spot,Wings Joint,Chinese Restaurant,Food Court,Food
31,75207,0,American Restaurant,Café,Restaurant,BBQ Joint,Bakery
34,75223,0,Mexican Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Taco Place,Fried Chicken Joint
35,75287,0,Sushi Restaurant,Fried Chicken Joint,Food,Wings Joint,Chinese Restaurant
38,75232,0,Pizza Place,Fried Chicken Joint,Donut Shop,Sandwich Place,Cajun / Creole Restaurant


In [36]:
dall_cat1=dallas_fin.loc[dallas_fin['Cluster Labels'] == 1, dallas_fin.columns[[0] + list(range(5, dallas_fin.shape[1]))]]
dall_cat1

Unnamed: 0,Zipcodes,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
92,75217,1,New American Restaurant,Wings Joint,Chinese Restaurant,Food,Fast Food Restaurant


In [37]:
dall_cat2=dallas_fin.loc[dallas_fin['Cluster Labels'] == 2, dallas_fin.columns[[0] + list(range(5, dallas_fin.shape[1]))]]
dall_cat2

Unnamed: 0,Zipcodes,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
99,75233,2,Mexican Restaurant,Wings Joint,Chinese Restaurant,Food,Fast Food Restaurant
103,75227,2,Mexican Restaurant,Wings Joint,Chinese Restaurant,Food,Fast Food Restaurant
115,75211,2,Mexican Restaurant,Wings Joint,Chinese Restaurant,Food,Fast Food Restaurant


In [38]:
dall_cat3=dallas_fin.loc[dallas_fin['Cluster Labels'] == 3, dallas_fin.columns[[0] + list(range(5, dallas_fin.shape[1]))]]
dall_cat3

Unnamed: 0,Zipcodes,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
70,75229,3,Taco Place,Wings Joint,Cajun / Creole Restaurant,Food,Fast Food Restaurant


In [39]:
dall_cat4=dallas_fin.loc[dallas_fin['Cluster Labels'] == 4, dallas_fin.columns[[0] + list(range(5, dallas_fin.shape[1]))]]
dall_cat4

Unnamed: 0,Zipcodes,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,75255,4,Deli / Bodega,Wings Joint,Chinese Restaurant,Food Court,Food
