# Description of the Problem and Discussion of the Background (Introduction Section)

### North Dallas-Fort Worth Metroplex, TX relocation and restaurant prospects

Per [CultureMap Dallas](https://dallas.culturemap.com/news/city-life/01-09-20-dfw-lead-population-growth-2020-2029-cushman-wakefield/), following a decade of eye-popping population growth, Dallas-Fort Worth is expected to lead the nation’s metro areas once again this decade in the number of new residents.   
New data from commercial real estate services company Cushman & Wakefield shows DFW gained 1,349,378 residents from 2010 through 2019. In terms of the number of new residents tallied during the past decade, DFW ranked first among U.S. metro areas, the data indicates.   
From 2020 through 2029, DFW is projected to tack on another 1,393,623 residents, Cushman & Wakefield says. For the second decade in a row, that would be the highest number of new residents for any metro area, the company says.   
Also, per [bizjournals](https://www.bizjournals.com/dallas/news/2019/11/21/study-660-companies-moving-facilities-out-of.html), some 660 companies moved 765 facilities out of California in the past two years, and Dallas-Fort Worth has been the beneficiary of many of the relocations, according to a new report. The departures from the Golden State between January 2018 and now involve corporate headquarters, manufacturing facilities, data centers, research hubs, software and engineering centers and a few warehouses.

With this information at hand, a team is looking for a location in the North Dallas-Fort Worth Metroplex to set up a restaurant where it can make the most profit.

# Target audience for this analysis

  * Business people looking to capitalize on the growth of the North Dallas region by opening restaurants or other businesses
  * People looking for homes near the North Dallas business district who would like to live near areas with many restaurants

# A description of the data and how it will be used to solve the problem. 

Two data sources were used for this analysis. The first contains the ZIP codes, ZIP code names and populations of Collin and Denton counties, both located in North Dallas, where most of the migration has taken place and where the majority of the companies that moved to the Dallas-Fort Worth Metroplex are located. 
The second data source contains the ZIP codes, ZIP code names, latitude and longitude of all ZIP codes.
Both data sources were merged on the ZIP code and ZIP code name.

**Obtaining the data and analyzing the neighborhoods**

  * Pandas will be used to scrape the county data from [zipdatamaps](https://www.zipdatamaps.com/list-of-zip-codes-in-texas.php)
  * The CSV for the second data source will be downloaded from [opendatasoft](https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=TX)
  * Two counties (Collin and Denton) will be selected for the analysis
  * Use Foursquare data to obtain information about restaurants
  * Data visualization and statistical analysis
  * Analysis using clustering, specifically k-means clustering
    - Maximize the number of clusters.
    - Visualization using a choropleth map
  * Compare the neighborhoods to find the best place to start a restaurant

# Data preparation

### Import necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


**Use pandas to get data from [zipdatamaps](https://www.zipdatamaps.com/list-of-zip-codes-in-texas.php)**

We will create two dataframes, then concatenate them into one main dataframe that contains all the ZIP codes for both counties.

In [2]:
collin_county_df = pd.read_html("https://www.zipdatamaps.com/collin-tx-county-zipcodes")
denton_county_df = pd.read_html("https://www.zipdatamaps.com/denton-tx-county-zipcodes")
collin_county_df = collin_county_df[1]
collin_county_df.columns = ['ZIP Code', 'ZIP Code Name', 'Population', 'Type']
denton_county_df = denton_county_df[1]
denton_county_df.columns = ['ZIP Code', 'ZIP Code Name', 'Population', 'Type']

In [3]:
collin_county_df.head()

Unnamed: 0,ZIP Code,ZIP Code Name,Population,Type
0,75002.0,Allen,63140.0,Non-Unique
1,75009.0,Celina,8785.0,Non-Unique
2,75013.0,Allen,30347.0,Non-Unique
3,75023.0,Plano,45452.0,Non-Unique
4,75024.0,Plano,36039.0,Non-Unique


In [4]:
denton_county_df.head()

Unnamed: 0,ZIP Code,ZIP Code Name,Population,Type
0,75007.0,Carrollton,51624.0,Non-Unique
1,75009.0,Celina,8785.0,Non-Unique
2,75010.0,Carrollton,21607.0,Non-Unique
3,75019.0,Coppell,38666.0,Non-Unique
4,75022.0,Flower Mound,22545.0,Non-Unique


Concatenating both dataframes

In [5]:
frames = [collin_county_df, denton_county_df]

In [6]:
north_dfw_df = pd.concat(frames, ignore_index=True)

Removing duplicate zip codes

In [7]:
north_dfw_df.drop_duplicates(subset=['ZIP Code'], inplace=True)

In [8]:
north_dfw_df.dropna(inplace=True)

To reduce memory use later, we drop all ZIP codes with zero population, since they are not needed for the analysis.

In [9]:
north_dfw_df = north_dfw_df[north_dfw_df['Population'] != 0]

We will drop the 'Type' column, as it is not needed for this project.

In [10]:
north_dfw_df.drop(columns=['Type'], axis=1, inplace=True)
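
The cleaning steps above (concatenate, drop duplicate ZIP codes, drop rows with missing values, drop zero-population ZIP codes, drop `Type`) can also be written as a single chain that avoids `inplace=True`. This is only a sketch: the two small frames below are made-up stand-ins for the county tables, not data from zipdatamaps.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two county tables (values are made up for illustration).
county_a = pd.DataFrame({
    'ZIP Code': [75002.0, 75009.0, np.nan],
    'ZIP Code Name': ['Allen', 'Celina', None],
    'Population': [63140.0, 8785.0, np.nan],
    'Type': ['Non-Unique', 'Non-Unique', None],
})
county_b = pd.DataFrame({
    'ZIP Code': [75009.0, 75026.0],
    'ZIP Code Name': ['Celina', 'Plano'],
    'Population': [8785.0, 0.0],
    'Type': ['Non-Unique', 'PO Box'],
})

cleaned = (
    pd.concat([county_a, county_b], ignore_index=True)
      .drop_duplicates(subset=['ZIP Code'])   # a ZIP code listed in both counties is kept once
      .dropna()                               # remove rows with missing values
      .query('Population != 0')               # remove zero-population ZIP codes
      .drop(columns=['Type'])                 # column not needed later
      .reset_index(drop=True)
)
print(cleaned)
```

Each step returns a new frame, so the intermediate results never alias one another.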

In [11]:
north_dfw_df.head()

Unnamed: 0,ZIP Code,ZIP Code Name,Population
0,75002.0,Allen,63140.0
1,75009.0,Celina,8785.0
2,75013.0,Allen,30347.0
3,75023.0,Plano,45452.0
4,75024.0,Plano,36039.0


### Get zip code data for Dallas-Fort-Worth from [opendatasoft](https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=TX)

In [12]:
zip_df = pd.read_csv('us-zip-code-latitude-and-longitude.csv', sep=';', usecols=['Zip','City', 'State', 'Latitude', 'Longitude'])

In [13]:
zip_df = zip_df.rename(columns={'Zip': 'ZIP Code', 'City': 'ZIP Code Name'})

In [14]:
zip_df.head()

Unnamed: 0,ZIP Code,ZIP Code Name,State,Latitude,Longitude
0,75475,Randolph,TX,33.485315,-96.25525
1,75757,Bullard,TX,32.136787,-95.3671
2,78650,McDade,TX,30.283941,-97.23563
3,75010,Carrollton,TX,33.030556,-96.89328
4,76054,Hurst,TX,32.858398,-97.17681


In [15]:
north_dfw_df['ZIP Code'] = north_dfw_df['ZIP Code'].astype('int32')

In [16]:
north_dfw_df = pd.merge(left=north_dfw_df, right=zip_df, on=['ZIP Code', 'ZIP Code Name'])
north_dfw_df

Unnamed: 0,ZIP Code,ZIP Code Name,Population,State,Latitude,Longitude
0,75002,Allen,63140.0,TX,33.092846,-96.62447
1,75009,Celina,8785.0,TX,33.327927,-96.76129
2,75013,Allen,30347.0,TX,33.106582,-96.69402
3,75023,Plano,45452.0,TX,33.054671,-96.73506
4,75024,Plano,36039.0,TX,33.07707,-96.79859
5,75025,Plano,50926.0,TX,33.086868,-96.74504
6,75034,Frisco,72723.0,TX,33.143792,-96.83938
7,75035,Frisco,47553.0,TX,33.130086,-96.78177
8,75069,McKinney,34108.0,TX,33.195073,-96.60363
9,75070,McKinney,74734.0,TX,33.212203,-96.67522


We select only the cities in North Dallas that are located close to the business district in Plano.

In [17]:
north_dfw_df = north_dfw_df.loc[north_dfw_df['ZIP Code Name'].isin(['Plano', 'Frisco', 'Richardson', 'Allen', 'The Colony', 'Carrollton', 'McKinney'])]

In [18]:
north_dfw_df.reset_index(drop=True, inplace=True)

In [19]:
north_dfw_df.shape

(18, 6)

In [20]:
north_dfw_df

Unnamed: 0,ZIP Code,ZIP Code Name,Population,State,Latitude,Longitude
0,75002,Allen,63140.0,TX,33.092846,-96.62447
1,75013,Allen,30347.0,TX,33.106582,-96.69402
2,75023,Plano,45452.0,TX,33.054671,-96.73506
3,75024,Plano,36039.0,TX,33.07707,-96.79859
4,75025,Plano,50926.0,TX,33.086868,-96.74504
5,75034,Frisco,72723.0,TX,33.143792,-96.83938
6,75035,Frisco,47553.0,TX,33.130086,-96.78177
7,75069,McKinney,34108.0,TX,33.195073,-96.60363
8,75070,McKinney,74734.0,TX,33.212203,-96.67522
9,75074,Plano,44622.0,TX,33.028921,-96.68102


Now we get the latitude and longitude of Plano, Texas, which is the business district of North Dallas.

In [21]:
geolocator = Nominatim(user_agent="dfw_explorer")
address = 'Plano, TX'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Plano, TX are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Plano, TX are 33.0136764, -96.6925096.
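
The coordinates printed above can serve as a reference point for judging how close each ZIP code is to Plano. The notebook does not compute distances itself, so the haversine helper below is an added sketch, and `haversine_km` is a name made up for it. It measures the great-circle distance from the geocoded point to the centroid of ZIP code 75002 from the merged table.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

plano = (33.0136764, -96.6925096)      # geocoder output shown above
allen_75002 = (33.092846, -96.62447)   # centroid of ZIP code 75002 from the merged table
distance_km = haversine_km(*plano, *allen_75002)
print(round(distance_km, 1))
```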


Use folium to show the zip codes we want to explore

In [22]:
# create map of Dallas-Fort-Worth using latitude and longitude values
map_dfw = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(north_dfw_df['Latitude'], north_dfw_df['Longitude'], north_dfw_df['ZIP Code Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dfw)  
    
map_dfw

**Define Foursquare Credentials and Version**

In [49]:
CLIENT_ID = 'your-client-ID' # your Foursquare ID
CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: your-client-ID
CLIENT_SECRET:your-client-secret


**Explore zip codes in North Dallas-Fort-Worth, Texas**

Get all venues within 30 km (about 18.6 miles) of each ZIP code's coordinates
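
The `radius` argument in the request below is in meters, so a distance stated in miles has to be converted before it is passed in. A small sketch of the conversion follows; the helper names are made up for illustration.

```python
METERS_PER_MILE = 1609.344

def miles_to_meters(miles):
    """Convert statute miles to meters."""
    return miles * METERS_PER_MILE

def meters_to_miles(meters):
    """Convert meters to statute miles."""
    return meters / METERS_PER_MILE

print(round(miles_to_meters(10)))        # 16093
print(round(meters_to_miles(30000), 1))  # 18.6
```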

In [24]:
LIMIT = 100  # assumed value; LIMIT is not defined in the cells shown

def getNearbyVenues(names, latitudes, longitudes, radius=30000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['ZIP Code Name', 
                  'ZIP Code Latitude', 
                  'ZIP Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
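
The list comprehension above assumes that every response contains at least one group and that every venue has at least one category; a venue with an empty `categories` list would raise an `IndexError`. A more defensive parser might look like the sketch below. The function name `parse_explore_items` and the sample payload are made up for illustration; only the key layout is taken from the code above.

```python
def parse_explore_items(payload, name, lat, lng):
    """Extract one row per venue from an explore-style response dict.

    Returns an empty list when the expected keys are missing.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or [{}]
        rows.append((
            name, lat, lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            categories[0].get('name'),  # None when the venue has no category
        ))
    return rows

# Made-up payload with the same nesting as the response parsed above.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 33.1, 'lng': -96.6},
               'categories': [{'name': 'Greek Restaurant'}]}},
    {'venue': {'name': 'Shop B', 'location': {'lat': 33.2, 'lng': -96.7},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample, 'Allen', 33.092846, -96.62447)
print(rows)
```

Rows with a missing category come back with `None` in the last position, so they can be filtered out later instead of stopping the whole loop.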

In [25]:
north_dfw_venues = getNearbyVenues(names=north_dfw_df['ZIP Code Name'],
                                   latitudes=north_dfw_df['Latitude'],
                                   longitudes=north_dfw_df['Longitude']
                                  )

Allen
Allen
Plano
Plano
Plano
Frisco
Frisco
McKinney
McKinney
Plano
Plano
Richardson
Richardson
Plano
Plano
Carrollton
Carrollton
The Colony


In [26]:
# Create a Data-Frame out of it to Concentrate Only on Restaurants 

north_dfw_Venues_only_restaurant = north_dfw_venues[north_dfw_venues['Venue Category']\
                                                          .str.contains('Restaurant')].reset_index(drop=True)
north_dfw_Venues_only_restaurant.index = np.arange(1, len(north_dfw_Venues_only_restaurant)+1)
print ("Shape of the Data-Frame with Venue Category only Restaurant: ", north_dfw_Venues_only_restaurant.shape)
north_dfw_Venues_only_restaurant.head()

Shape of the Data-Frame with Venue Category only Restaurant:  (468, 7)


Unnamed: 0,ZIP Code Name,ZIP Code Latitude,ZIP Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,Allen,33.092846,-96.62447,Chick-fil-A,33.129069,-96.650699,Fast Food Restaurant
2,Allen,33.092846,-96.62447,Pho Crystal Vietnamese Cuisine,33.130105,-96.643033,Vietnamese Restaurant
3,Allen,33.092846,-96.62447,Mio Nonno Wood Fire Pizza,33.129612,-96.674279,Italian Restaurant
4,Allen,33.092846,-96.62447,Black Walnut Café - Allen,33.129348,-96.675808,American Restaurant
5,Allen,33.092846,-96.62447,Yanni's Greek Cafe,33.011051,-96.61011,Greek Restaurant
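
Filtering on the substring `'Restaurant'` keeps only categories whose label contains that word. Food venues labeled differently would be excluded. The sketch below contrasts the notebook's filter with a broader pattern; the category labels and the extra keywords are illustrative assumptions, not values taken from the results above.

```python
import pandas as pd

categories = pd.Series([
    'Fast Food Restaurant', 'Pizza Place', 'Burger Joint',
    'Italian Restaurant', 'Coffee Shop', 'Steakhouse',
])

# Filter used in the notebook: substring match on 'Restaurant'.
strict = categories[categories.str.contains('Restaurant')]

# Hypothetical broader filter: the extra keywords are an assumption.
food_pattern = r'Restaurant|Pizza|Burger|Steakhouse'
broad = categories[categories.str.contains(food_pattern, regex=True)]

print(len(strict), len(broad))
```

Which filter is appropriate depends on whether such venues should count as competition for a new restaurant.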


In [27]:
## Show the restaurants on a map, colored by city

map_restaurants = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the Venues based on the Major Districts
Districts = ['Plano', 'Frisco', 'Richardson', 'Allen', 'The Colony', 'Carrollton', 'McKinney']

x = np.arange(len(Districts))

rainbow = ['#00ff00', '#ff00ff','#0000ff','#ffa500' ,'#ff0000', '#000000', '#ffffff']

# add markers to the map
# markers_colors = []
for lat, lon, poi, distr in zip(north_dfw_Venues_only_restaurant['Venue Latitude'], 
                                  north_dfw_Venues_only_restaurant['Venue Longitude'], 
                                  north_dfw_Venues_only_restaurant['Venue Category'], 
                                  north_dfw_Venues_only_restaurant['ZIP Code Name']):
    label = folium.Popup(str(poi) + ' ' + str(distr), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[Districts.index(distr)-1],
        fill=True,
        fill_color=rainbow[Districts.index(distr)-1],
        fill_opacity=0.3).add_to(map_restaurants)
       
map_restaurants

In [28]:
north_dfw_Venues_only_restaurant.head()

Unnamed: 0,ZIP Code Name,ZIP Code Latitude,ZIP Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,Allen,33.092846,-96.62447,Chick-fil-A,33.129069,-96.650699,Fast Food Restaurant
2,Allen,33.092846,-96.62447,Pho Crystal Vietnamese Cuisine,33.130105,-96.643033,Vietnamese Restaurant
3,Allen,33.092846,-96.62447,Mio Nonno Wood Fire Pizza,33.129612,-96.674279,Italian Restaurant
4,Allen,33.092846,-96.62447,Black Walnut Café - Allen,33.129348,-96.675808,American Restaurant
5,Allen,33.092846,-96.62447,Yanni's Greek Cafe,33.011051,-96.61011,Greek Restaurant


In [29]:
north_dfw_Venues_only_restaurant.groupby('ZIP Code Name').count()

ZIP Code Name,ZIP Code Latitude,ZIP Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Allen,50,50,50,50,50,50
Carrollton,53,53,53,53,53,53
Frisco,55,55,55,55,55,55
McKinney,48,48,48,48,48,48
Plano,185,185,185,185,185,185
Richardson,52,52,52,52,52,52
The Colony,25,25,25,25,25,25


In [30]:
print('There are {} unique categories.'.format(len(north_dfw_Venues_only_restaurant['Venue Category'].unique())))

There are 20 unique categories.


## Analyze Each Neighborhood

In [31]:
# one hot encoding
north_dfw_onehot = pd.get_dummies(north_dfw_Venues_only_restaurant[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
north_dfw_onehot['ZIP Code Name'] = north_dfw_Venues_only_restaurant['ZIP Code Name'] 

# move neighborhood column to the first column
col = north_dfw_onehot.pop("ZIP Code Name")
north_dfw_onehot.insert(0, col.name, col)

north_dfw_onehot.head()

Unnamed: 0,ZIP Code Name,American Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Fast Food Restaurant,Greek Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,New American Restaurant,Restaurant,Seafood Restaurant,Southern / Soul Food Restaurant,Sushi Restaurant,Tex-Mex Restaurant,Thai Restaurant,Vietnamese Restaurant
1,Allen,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Allen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,Allen,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Allen,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Allen,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Next, let's group rows by ZIP code name and take the mean of the frequency of occurrence of each category**

In [32]:
north_dfw_grouped = north_dfw_onehot.groupby('ZIP Code Name').mean().reset_index()
north_dfw_grouped

Unnamed: 0,ZIP Code Name,American Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Fast Food Restaurant,Greek Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,New American Restaurant,Restaurant,Seafood Restaurant,Southern / Soul Food Restaurant,Sushi Restaurant,Tex-Mex Restaurant,Thai Restaurant,Vietnamese Restaurant
0,Allen,0.06,0.04,0.04,0.0,0.18,0.08,0.12,0.0,0.04,0.0,0.04,0.16,0.02,0.0,0.04,0.02,0.1,0.0,0.02,0.04
1,Carrollton,0.075472,0.037736,0.0,0.0,0.150943,0.037736,0.113208,0.037736,0.0,0.018868,0.075472,0.150943,0.037736,0.056604,0.018868,0.037736,0.075472,0.037736,0.037736,0.0
2,Frisco,0.072727,0.036364,0.0,0.0,0.218182,0.072727,0.181818,0.0,0.036364,0.0,0.036364,0.163636,0.036364,0.0,0.0,0.036364,0.090909,0.0,0.0,0.018182
3,McKinney,0.083333,0.041667,0.041667,0.041667,0.208333,0.0625,0.104167,0.0,0.041667,0.0,0.041667,0.166667,0.0,0.0,0.020833,0.041667,0.020833,0.0,0.041667,0.041667
4,Plano,0.07027,0.037838,0.005405,0.0,0.183784,0.064865,0.140541,0.021622,0.0,0.0,0.075676,0.075676,0.032432,0.032432,0.059459,0.027027,0.075676,0.032432,0.037838,0.027027
5,Richardson,0.115385,0.038462,0.0,0.0,0.134615,0.096154,0.057692,0.038462,0.0,0.038462,0.076923,0.019231,0.057692,0.057692,0.076923,0.0,0.076923,0.038462,0.057692,0.019231
6,The Colony,0.08,0.04,0.0,0.0,0.16,0.04,0.16,0.0,0.0,0.0,0.08,0.08,0.04,0.04,0.04,0.04,0.12,0.04,0.04,0.0
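
Because each one-hot row contains a single 1, the per-city mean of a column equals the share of that city's restaurants in that category, and each row of the grouped table sums to 1. A minimal sketch on made-up venues:

```python
import pandas as pd

venues = pd.DataFrame({
    'ZIP Code Name': ['Allen', 'Allen', 'Allen', 'Allen', 'Plano', 'Plano'],
    'Venue Category': ['Fast Food Restaurant', 'Fast Food Restaurant',
                       'Greek Restaurant', 'Italian Restaurant',
                       'Italian Restaurant', 'Sushi Restaurant'],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'ZIP Code Name', venues['ZIP Code Name'])

# Mean of the indicator columns = relative frequency per city.
freq = onehot.groupby('ZIP Code Name').mean()
print(freq)
```

Here two of Allen's four venues are fast food, so its `Fast Food Restaurant` frequency is 0.5.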


Function to sort the venues in descending order

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
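
To see what the function returns, it can be applied to a single row. The sketch below repeats the function so that it runs on its own and uses four of Allen's frequencies from the grouped table above:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'ZIP Code Name' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

row = pd.Series({
    'ZIP Code Name': 'Allen',
    'Fast Food Restaurant': 0.18,
    'Greek Restaurant': 0.08,
    'Mexican Restaurant': 0.16,
    'Italian Restaurant': 0.12,
})
top3 = list(return_most_common_venues(row, 3))
print(top3)
```

Categories with equal frequencies may come out in either order, since the default sort is not guaranteed to be stable.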

New dataframe to display the top 10 venues for each neighborhood

In [34]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['ZIP Code Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
zip_code_name_venues_sorted = pd.DataFrame(columns=columns)
zip_code_name_venues_sorted['ZIP Code Name'] = north_dfw_grouped['ZIP Code Name']

for ind in np.arange(north_dfw_grouped.shape[0]):
    zip_code_name_venues_sorted.iloc[ind, 1:] = return_most_common_venues(north_dfw_grouped.iloc[ind, :], num_top_venues)

zip_code_name_venues_sorted

Unnamed: 0,ZIP Code Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allen,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,Sushi Restaurant,Greek Restaurant,American Restaurant,Korean Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Vietnamese Restaurant
1,Carrollton,Mexican Restaurant,Fast Food Restaurant,Italian Restaurant,American Restaurant,Sushi Restaurant,Mediterranean Restaurant,Restaurant,Greek Restaurant,Tex-Mex Restaurant,Southern / Soul Food Restaurant
2,Frisco,Fast Food Restaurant,Italian Restaurant,Mexican Restaurant,Sushi Restaurant,American Restaurant,Greek Restaurant,Korean Restaurant,Brazilian Restaurant,Southern / Soul Food Restaurant,New American Restaurant
3,McKinney,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,American Restaurant,Greek Restaurant,Thai Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Korean Restaurant
4,Plano,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
5,Richardson,Fast Food Restaurant,American Restaurant,Greek Restaurant,Sushi Restaurant,Seafood Restaurant,Mediterranean Restaurant,Restaurant,New American Restaurant,Thai Restaurant,Italian Restaurant
6,The Colony,Italian Restaurant,Fast Food Restaurant,Sushi Restaurant,American Restaurant,Mexican Restaurant,Mediterranean Restaurant,Tex-Mex Restaurant,Southern / Soul Food Restaurant,Seafood Restaurant,Restaurant


Cluster Neighborhoods

In [35]:
# set number of clusters
kclusters = 7

north_dfw_grouped_clustering = north_dfw_grouped.drop(columns='ZIP Code Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(north_dfw_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([5, 3, 1, 4, 0, 2, 6], dtype=int32)
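
With seven rows and `kclusters = 7`, every city lands in its own cluster, which is why the labels above are all distinct. If fewer, more informative clusters are wanted, one common approach is to compare silhouette scores for several values of k. Silhouette scoring is a swapped-in technique here, not something the notebook does, and the sketch uses made-up 2-D points rather than the restaurant frequencies:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated toy groups (made-up points).
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)

scores = {}
for k in range(2, len(points)):  # silhouette needs 2 <= k <= n_samples - 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

With only seven cities the range of usable k values is small (2 through 6), so any such comparison should be read with caution.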

In [36]:
# add clustering labels
zip_code_name_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

north_dfw_merged = north_dfw_df

# join the cluster labels and top venues onto the ZIP code table, which holds latitude/longitude
north_dfw_merged = north_dfw_merged.join(zip_code_name_venues_sorted.set_index('ZIP Code Name'), on='ZIP Code Name')

north_dfw_merged # check the last columns!

Unnamed: 0,ZIP Code,ZIP Code Name,Population,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75002,Allen,63140.0,TX,33.092846,-96.62447,5,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,Sushi Restaurant,Greek Restaurant,American Restaurant,Korean Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Vietnamese Restaurant
1,75013,Allen,30347.0,TX,33.106582,-96.69402,5,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,Sushi Restaurant,Greek Restaurant,American Restaurant,Korean Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Vietnamese Restaurant
2,75023,Plano,45452.0,TX,33.054671,-96.73506,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
3,75024,Plano,36039.0,TX,33.07707,-96.79859,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
4,75025,Plano,50926.0,TX,33.086868,-96.74504,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
5,75034,Frisco,72723.0,TX,33.143792,-96.83938,1,Fast Food Restaurant,Italian Restaurant,Mexican Restaurant,Sushi Restaurant,American Restaurant,Greek Restaurant,Korean Restaurant,Brazilian Restaurant,Southern / Soul Food Restaurant,New American Restaurant
6,75035,Frisco,47553.0,TX,33.130086,-96.78177,1,Fast Food Restaurant,Italian Restaurant,Mexican Restaurant,Sushi Restaurant,American Restaurant,Greek Restaurant,Korean Restaurant,Brazilian Restaurant,Southern / Soul Food Restaurant,New American Restaurant
7,75069,McKinney,34108.0,TX,33.195073,-96.60363,4,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,American Restaurant,Greek Restaurant,Thai Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Korean Restaurant
8,75070,McKinney,74734.0,TX,33.212203,-96.67522,4,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,American Restaurant,Greek Restaurant,Thai Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Korean Restaurant
9,75074,Plano,44622.0,TX,33.028921,-96.68102,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant


In [37]:
north_dfw_merged.dropna(inplace=True)

In [38]:
north_dfw_merged = north_dfw_merged.astype({'Cluster Labels': 'int32'})

Visualize the clusters

In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(north_dfw_merged['Latitude'], north_dfw_merged['Longitude'], north_dfw_merged['ZIP Code Name'], north_dfw_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
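
In the cell above, the `ys` list is used only for its length, so the palette can be built directly from `kclusters`. Indexing with `rainbow[cluster-1]` also sends cluster 0 to the last color through a negative index, which works but is easy to misread. A possible simplification, sketched with an explicit label-to-color mapping (`palette` and `cluster_color` are names made up here):

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 7

# One hex color per cluster, evenly spaced along the rainbow colormap.
palette = [colors.to_hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Explicit mapping from cluster label to color, with no index arithmetic.
cluster_color = {label: palette[label] for label in range(kclusters)}
print(cluster_color)
```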

In [40]:
# count the restaurants found for each city with groupby
dfw_5_zip_Venues_restaurant = north_dfw_Venues_only_restaurant.groupby(['ZIP Code Name'])['Venue Category'].apply(lambda x: x[x.str.contains('Restaurant')].count())
dfw_5_zip_Venues_restaurant_df = dfw_5_zip_Venues_restaurant.to_frame().reset_index()
dfw_5_zip_Venues_restaurant_df.columns = ['ZIP Code Name', 'Number of Restaurant']

dfw_5_zip_Venues_restaurant_df.index = np.arange(1, len(dfw_5_zip_Venues_restaurant_df)+1)

list_rest_no = dfw_5_zip_Venues_restaurant_df['Number of Restaurant'].to_list()
print (list_rest_no)

[50, 53, 55, 48, 185, 52, 25]
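
`list_rest_no` is ordered alphabetically by city because it comes from a `groupby`, so looking up a count by position only works if the list used for indexing follows the same order. A dictionary keyed by city name avoids that dependency. A minimal sketch using the counts printed above (`restaurant_count` and `marker_radius` are names made up here):

```python
# Cities in groupby (alphabetical) order, with the counts printed above.
cities = ['Allen', 'Carrollton', 'Frisco', 'McKinney', 'Plano', 'Richardson', 'The Colony']
list_rest_no = [50, 53, 55, 48, 185, 52, 25]

restaurant_count = dict(zip(cities, list_rest_no))

def marker_radius(city, scale=0.5):
    """Marker radius proportional to the number of restaurants found for a city."""
    return restaurant_count[city] * scale

print(marker_radius('Plano'))
```

The same dictionary can be built from the dataframe with `dfw_5_zip_Venues_restaurant_df.set_index('ZIP Code Name')['Number of Restaurant'].to_dict()`.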


**Map in which each marker's radius represents the number of restaurants found for that city**

In [41]:

# create map
map_restaurants10 = folium.Map(location=[latitude, longitude])

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
#rainbow = ['#00ff00', '#ff00ff','#0000ff','#ffa500' ,'#ff0000']
Districts = dfw_5_zip_Venues_restaurant_df['ZIP Code Name'].to_list()  # same (alphabetical) order as list_rest_no

# add markers to the map
for lat, lon, poi, cluster in zip(north_dfw_merged['Latitude'], north_dfw_merged['Longitude'], north_dfw_merged['ZIP Code Name'], north_dfw_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=list_rest_no[Districts.index(poi)]*0.5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_restaurants10)
       
map_restaurants10

### Examine Clusters

In [42]:
north_dfw_merged.loc[north_dfw_merged['Cluster Labels'] == 0, north_dfw_merged.columns[[1] + list(range(5, north_dfw_merged.shape[1]))]]

Unnamed: 0,ZIP Code Name,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Plano,-96.73506,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
3,Plano,-96.79859,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
4,Plano,-96.74504,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
9,Plano,-96.68102,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
10,Plano,-96.74038,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
13,Plano,-96.80492,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant
14,Plano,-96.61113,0,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Mediterranean Restaurant,American Restaurant,Greek Restaurant,Seafood Restaurant,Brazilian Restaurant,Thai Restaurant


In [43]:
north_dfw_merged.loc[north_dfw_merged['Cluster Labels'] == 1, north_dfw_merged.columns[[1] + list(range(5, north_dfw_merged.shape[1]))]]

Unnamed: 0,ZIP Code Name,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Frisco,-96.83938,1,Fast Food Restaurant,Italian Restaurant,Mexican Restaurant,Sushi Restaurant,American Restaurant,Greek Restaurant,Korean Restaurant,Brazilian Restaurant,Southern / Soul Food Restaurant,New American Restaurant
6,Frisco,-96.78177,1,Fast Food Restaurant,Italian Restaurant,Mexican Restaurant,Sushi Restaurant,American Restaurant,Greek Restaurant,Korean Restaurant,Brazilian Restaurant,Southern / Soul Food Restaurant,New American Restaurant


In [44]:
north_dfw_merged.loc[north_dfw_merged['Cluster Labels'] == 2, north_dfw_merged.columns[[1] + list(range(5, north_dfw_merged.shape[1]))]]

Unnamed: 0,ZIP Code Name,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Richardson,-96.74093,2,Fast Food Restaurant,American Restaurant,Greek Restaurant,Sushi Restaurant,Seafood Restaurant,Mediterranean Restaurant,Restaurant,New American Restaurant,Thai Restaurant,Italian Restaurant
12,Richardson,-96.65901,2,Fast Food Restaurant,American Restaurant,Greek Restaurant,Sushi Restaurant,Seafood Restaurant,Mediterranean Restaurant,Restaurant,New American Restaurant,Thai Restaurant,Italian Restaurant


In [45]:
north_dfw_merged.loc[north_dfw_merged['Cluster Labels'] == 3, north_dfw_merged.columns[[1] + list(range(5, north_dfw_merged.shape[1]))]]

Unnamed: 0,ZIP Code Name,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Carrollton,-96.89773,3,Mexican Restaurant,Fast Food Restaurant,Italian Restaurant,American Restaurant,Sushi Restaurant,Mediterranean Restaurant,Restaurant,Greek Restaurant,Tex-Mex Restaurant,Southern / Soul Food Restaurant
16,Carrollton,-96.89328,3,Mexican Restaurant,Fast Food Restaurant,Italian Restaurant,American Restaurant,Sushi Restaurant,Mediterranean Restaurant,Restaurant,Greek Restaurant,Tex-Mex Restaurant,Southern / Soul Food Restaurant


In [46]:
north_dfw_merged.loc[north_dfw_merged['Cluster Labels'] == 4, north_dfw_merged.columns[[1] + list(range(5, north_dfw_merged.shape[1]))]]

Unnamed: 0,ZIP Code Name,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,McKinney,-96.60363,4,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,American Restaurant,Greek Restaurant,Thai Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Korean Restaurant
8,McKinney,-96.67522,4,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,American Restaurant,Greek Restaurant,Thai Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Korean Restaurant


In [47]:
north_dfw_merged.loc[north_dfw_merged['Cluster Labels'] == 5, north_dfw_merged.columns[[1] + list(range(5, north_dfw_merged.shape[1]))]]

Unnamed: 0,ZIP Code Name,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allen,-96.62447,5,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,Sushi Restaurant,Greek Restaurant,American Restaurant,Korean Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Vietnamese Restaurant
1,Allen,-96.69402,5,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,Sushi Restaurant,Greek Restaurant,American Restaurant,Korean Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Vietnamese Restaurant


In [48]:
north_dfw_merged.loc[north_dfw_merged['Cluster Labels'] == 6, north_dfw_merged.columns[[1] + list(range(5, north_dfw_merged.shape[1]))]]

Unnamed: 0,ZIP Code Name,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,The Colony,-96.88957,6,Italian Restaurant,Fast Food Restaurant,Sushi Restaurant,American Restaurant,Mexican Restaurant,Mediterranean Restaurant,Tex-Mex Restaurant,Southern / Soul Food Restaurant,Seafood Restaurant,Restaurant
