# Dance Studio Placement in Atlanta, GA

This notebook follows the report "Dance Studio Placement" also included in this repository.  The aim of this project is to find locations to start a dance studio that would be beneficial for students, parents, the studio owner, and studio staff.  The included descriptions in this notebook are an abbreviated version of what is contained in the PDF report.

## 1) Import Libraries and functions

In [1]:
#Import libraries

import numpy as np  #library for taking care of data vectors and arrays

import pandas as pd  #data analysis library
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas.io.json import json_normalize # make json file into a pandas dataframe

import json #json library

import requests #request handling

import matplotlib.cm as cm #plotting libraries
import matplotlib.colors as colors

import folium #map making library

import urllib.request #used for web request handling
from bs4 import BeautifulSoup #used for scraping the webpage
import html5lib #used for reading html

import geopy 
from geopy.geocoders import Nominatim #used to convert an address into latitude and longitude

from time import sleep #used to prevent timing out the geocoder

#Libraries for DBSCAN
import pandas as pd, numpy as np, matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint

from haversine import haversine, Unit

print('All done!')

All done!


## 2) Gather Data

For this analysis, the best place to start a dance studio for all parties involved will be a location that:

1. Avoids competition with existing studios (not near existing studios)
2. Has a good student base (near public schools)
3. Has potential dance teachers in the area (near universities and colleges)
4. Is conveneient for parents (near schools and grocery stores)

Data from Foursquare will be used to locate existing studios, universities, and grocery stores.  

Data from the http://georgia.educationbug.org/public-schools/county-fulton.html will be used for locations of schools.

Data from https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Georgia_(U.S._state) will be used to find Universities.

### 2.1) Atlanta Zip Codes

Having a list of zip codes around the greater Atlanta area will allow us to constrain our search and inform which universities and schools we consider.

In [2]:
#Raw web scrape

url_zc = 'https://www.realsourcebrokers.com/atlanta-zip-code-map/'
response_zc = requests.get(url_zc)
soup_zc = BeautifulSoup(response_zc.content, 'lxml')

In [3]:
#Create lists with all zip codes and neighborhood names
zip_paragraph = soup_zc.find_all('p')

zip_code = []  #empty lists to fill
neighborhoods = []

#Create lists with zip codes and neighborhoods
for x in zip_paragraph :  
    row = x.get_text()
    zip_code.append( row[0:5] )
    nhood = row.replace( row[0:8], '').replace('Real Estate','')
    neighborhoods.append( nhood )
    
    
#Delete description paragraphs
del neighborhoods[0:2]  
del zip_code[0:2]


#Organize lists into data frame
atldf = pd.DataFrame({'Zip Code':zip_code, 'Neighborhood': neighborhoods})

In [4]:
#Add latitude and longitude to zip code dataframe

geolocator = Nominatim(user_agent='foursquare_agent') #setup geolocator to find coordinates

lat_zc = [] #empty lists to fill
long_zc = []

#Create lists of latitude and longitude
for x in atldf['Zip Code'] :
    zc = x + ' , GA'
    location = geolocator.geocode(zc)
    lat_zc.append(location.latitude)
    long_zc.append(location.longitude)
    sleep(1)

#Add lat and long to dataframe
atldf['Latitude'] = lat_zc
atldf['Longitude'] = long_zc

#Preview dataframe
atldf.head() 

Unnamed: 0,Zip Code,Neighborhood,Latitude,Longitude
0,30002,Avondale Estates,33.779842,-84.257246
1,30021,Clarkston,33.818102,-84.231757
2,30030,City of Decatur - Oakhurst - Winnona Park,33.764085,-84.31239
3,30032,East Lake - South Decatur,33.742619,-84.265073
4,30033,North Decatur,33.823763,-84.284779


In [5]:
#Make a map of atlanta to get a feel for the area we're working in

latitude = atldf.iloc[16]['Latitude']  #Use neighborhood coordinates from Midtown Atlanta
longitude = atldf.iloc[16]['Longitude']
map_Atlanta = folium.Map(location=[latitude, longitude], zoom_start=11)

#Add markers to map
for lat, lng, label in zip( atldf['Latitude'], atldf['Longitude'], atldf['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Atlanta)

map_Atlanta

### 2.2) List of Schools in Georgia

Our list of zip codes will help us to figure out which schools we need.  

On EduBug, public schools are listed by county, so finding which counties are the most common among our zip codes will allow us to find the right lists of schools.

In [6]:
#Figure out which counties are most prevalent among these zip codes
for x in atldf['Zip Code'] :
    zc = x + ' , GA'
    print(geolocator.geocode(zc))

DeKalb County, Georgia, 30002, United States of America
Scottdale, DeKalb County, Georgia, 30021, United States of America
Oakhurst, Decatur, DeKalb County, Georgia, 30030, United States of America
DeKalb County, Georgia, 30032, United States of America
North Druid Hills, DeKalb County, Georgia, 30033, United States of America
Marietta, Cobb County, Georgia, 30080, United States of America
Tucker, DeKalb County, Georgia, 30084, United States of America
Atlanta, Fulton County, Georgia, 30303, United States of America
Atlanta, Fulton County, Georgia, 30305, United States of America
Atlanta, Fulton County, Georgia, 30306, United States of America
East Lake, Atlanta, Fulton County, Georgia, 30307, United States of America
Vine City, Atlanta, Fulton County, Georgia, 30308, United States of America
Vine City, Atlanta, Fulton County, Georgia, 30309, United States of America
Cascade Heights, Atlanta, Fulton County, Georgia, 30310, United States of America
Atlanta, Fulton County, Georgia, 30311

Pretty clearly, Fulton and DeKalb counties are the most common.

In [7]:
#Raw web scrape - Fulton County and Dekalb County

url_f = 'http://georgia.educationbug.org/public-schools/county-fulton.html'
response_f = requests.get(url_f)
soup_f = BeautifulSoup(response_f.content, 'lxml')

url_DK = 'http://georgia.educationbug.org/public-schools/county-dekalb.html'
response_DK = requests.get(url_DK)
soup_DK = BeautifulSoup(response_DK.content, 'lxml')

In [8]:
#Arrange into lists
schools_f = soup_f.findAll('a') #find all link text
schools_DK = soup_DK.findAll('a')

public_schools_f = [] #blank lists to fill
public_schools_DK = []

for link in schools_f:
    public_schools_f.append(link.get_text())
    
for link in schools_DK:
    public_schools_DK.append(link.get_text())
    

#Make data frame for Fulton, clean up
psdf = pd.DataFrame({'School Name':public_schools_f}) #psdf stands for public schools data frame
psdf = psdf.drop(psdf.index[0:33])
psdf = psdf[:-7]
psdf = psdf.drop_duplicates()

#Make data frame for DeKalb, clean up
psdf2 = pd.DataFrame({'School Name':public_schools_DK})
psdf2 = psdf2.drop(psdf.index[0:33])
psdf2 = psdf2[:-7]
psdf2 = psdf2.drop_duplicates()

#Combine dataframes into 1
psdf = psdf.append(psdf2).reset_index(drop=True)
psdf = psdf.drop_duplicates().reset_index(drop=True)

In [10]:
#Geocode Schools
lats = [] #empty lists to fill
longs = []
idx = 0 #index tracker
psdf = psdf.reset_index(drop=True) #reset index in case rerun after timeout

for x in psdf['School Name']:
    sn = x + ' , GA'
    location = geolocator.geocode(sn)
    y = isinstance(location, geopy.location.Location)
    
    if 'Georgia' in str(location) and y==True:  #Check if the school is in Georgia; if a value was returned
        lats.append(location.latitude)          #Update latitude and longitude
        longs.append(location.longitude)
    else:
        psdf = psdf.drop(idx)  #Delete entry in dataframe otherwise
        
    sleep(1)
    idx +=1

psdf['Latitude'] = lats
psdf['Longitude'] = longs

print('All done!')

All done!


In [11]:
#Weed out schools that were mistaken for others in Georgia
#Constrain to maximimum and minimum ATL zip code coords

psdf = psdf.drop_duplicates(subset=['Latitude','Longitude']).reset_index(drop=True) #Drop any schools that have multiple names/schools for one location

max_lat = atldf['Latitude'].max() 
max_lon = atldf['Longitude'].max()
min_lat = atldf['Latitude'].min()
min_lon = atldf['Longitude'].min()

# #Constrain by Latitude
psdf = psdf.set_index('Latitude')
for lat in psdf.index:
    if lat>max_lat or lat<min_lat:
        psdf = psdf.drop(lat)
        
psdf = psdf.reset_index(drop=False)

#Constrain by Longitude
psdf = psdf.set_index('Longitude')
for lon in psdf.index:
    if lon>max_lon or lon<min_lon:
        psdf = psdf.drop(lon)
    
        
psdf = psdf.reset_index(drop=False)
psdf['Category'] = 'School'

psdf.to_csv('psdf.csv')  #Save dataframe to csv locally

print('All done!')

All done!


In [12]:
#Make a map of Atlanta with schools

latitude = atldf.iloc[16]['Latitude']  #Use neighborhood coordinates from Midtown Atlanta
longitude = atldf.iloc[16]['Longitude']
map_Features = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map for schools
for lat, lng, label in zip( psdf['Latitude'], psdf['Longitude'], psdf['Category']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Features)

map_Features

### 2.3) List of Universities and Colleges in Georgia

The Wikipedia page https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Georgia_(U.S._state) contains a list of universities and colleges in Georgia.  Obviously, not all of them will be around Atlanta, so it will be necessary to constrain the list of colleges to those around Atlanta.

In [13]:
#Raw web scrape

url_U = 'https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Georgia_(U.S._state)'
response_U = requests.get(url_U)
soup_U = BeautifulSoup(response_U.content, 'lxml')

In [14]:
#Arrange the table text into lists

college_table = soup_U.findAll('table')[0] #Get 1st table on page
college_table = college_table.findAll('tr')
del college_table[0] #Delete headers

College_Name=[] #Empty list to fill

for entry in college_table:
    row = entry.findAll('td')
    College_Name.append(row[0].find(text=True)) #Assign first element to list College_Name

GA_Colleges = pd.DataFrame({'College Name':College_Name})

In [15]:
#Geocode colleges
lat = [] #empty lists to fill
long = []
idx = 0


#Create lists of latitude and longitude
for college in GA_Colleges['College Name']:
    location = geolocator.geocode(college)
    y = isinstance(location, geopy.location.Location)
    if y == True:
        lat.append(location.latitude)
        long.append(location.longitude)
    else:
        GA_Colleges = GA_Colleges.drop(idx)
    sleep(1)
    idx +=1
    
    
#Add lat and long to dataframe
GA_Colleges
print(lat)
GA_Colleges['Latitude'] = lat
GA_Colleges['Longitude'] = long
GA_Colleges.head() #preview dataframe

[33.776033, 33.9404278, 33.4693345, 33.754794000000004, 32.421438050000006, 34.03883185, 33.5750493, 30.8471471, 31.5678798, 38.6456168, 32.502024500000005, 32.5336633, 33.0820873, 32.80878545, 32.0229237, 33.8661343, 31.4652158, 33.709279050000006, 31.1829103, 34.7748018, 32.5921084, 33.9782299, 34.1709319, 33.0473406, 33.7501687]


Unnamed: 0,College Name,Latitude,Longitude
0,Georgia Institute of Technology,33.776033,-84.398841
1,University of Georgia,33.940428,-83.373049
2,Augusta University,33.469335,-81.988562
3,Georgia State University,33.754794,-84.387896
4,Georgia Southern University,32.421438,-81.784505


In [16]:
#Constrain Colleges to relevant zipcodes

# #Constrain by Latitude
GA_Colleges = GA_Colleges.set_index('Latitude')
for lat in GA_Colleges.index:
    if lat>max_lat or lat<min_lat:
        GA_Colleges = GA_Colleges.drop(lat)
        
GA_Colleges = GA_Colleges.reset_index(drop=False)

#Constrain by Longitude
GA_Colleges = GA_Colleges.set_index('Longitude')
for lon in GA_Colleges.index:
    if lon>max_lon or lon<min_lon:
        GA_Colleges = GA_Colleges.drop(lon)
    
        
GA_Colleges = GA_Colleges.reset_index(drop=False)
GA_Colleges['Category'] = 'College'
print('All done!')


All done!


In [17]:
#Add colleges to map

# add markers to map for colleges
for lat, lng, label in zip( GA_Colleges['Latitude'], GA_Colleges['Longitude'], GA_Colleges['Category']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=20,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Features)

map_Features

### 2.4) Import Foursquare API

The FourSquare API will provide locations for dance studios, grocery stores, and performing arts venues.

In [18]:
#Define Foursquare credentials

CLIENT_ID = '4GM1A03YOPGDEIR0CVZDIDKOQTPGJCSAOVGUW5WH122BKJPF' # your Foursquare ID
CLIENT_SECRET = 'MBYZDG2IIVPUYMCFKCJ2O52ZZT4VCDCRGMVLK3L3NY5O5GIP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 4GM1A03YOPGDEIR0CVZDIDKOQTPGJCSAOVGUW5WH122BKJPF
CLIENT_SECRET:MBYZDG2IIVPUYMCFKCJ2O52ZZT4VCDCRGMVLK3L3NY5O5GIP


In [19]:
#Define a function to find all venues in Atlanta

def getNearbyVenues(names, latitudes, longitudes, radius=1400):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [21]:
#Run function to find venues in neighborhoods

atlanta_venues = getNearbyVenues(names=atldf['Neighborhood'],
                                   latitudes=atldf['Latitude'],
                                   longitudes=atldf['Longitude']
                                  )


Avondale Estates 
Clarkston 
City of Decatur - Oakhurst - Winnona Park 
East Lake - South Decatur
North Decatur  
Smyrna  
Tucker  
Downtown - Central Business District - Fairlee Poplar
Buckhead - Garden Hills - Haynes Manor - Peachtree Battle - Peachtree Hills - Tuxedo Park
Virginia Highlands - Morningside/Lenox Park - Poncey-Highland - Druid Hills
Candler Park - Druid Hills - Edgewood - Emory - Inman Park - Lake Claire - Little Five Points
Midtown - Old Fourth Ward
Midtown - Ansley Park - Brookwood Hills - Loring Heights
Adair Park - Capitol View - Oakland City - West End
Cascade  
Downtown Atlanta - Grant Park
Downtown Atlanta - Castlebury Hill
Vines City - Mozely Park
Grant Park - Peoplestown - Lakewood
Cabbagetown - East Atlanta Village - Ormewood Park - South DeKalb
East Lake   - Kirkwood   - Edgewood  
Home Park - Northwest Atlanta - Collier Hills - Underwood Hills - Midtown West
Brookhaven - North Atlanta - Dunwoody
Morningside/Lenox Park - Piedmont Heights - Lenox -Lavista Par

In [22]:
#Compile dataframes for each kind of venue
dance_studios = atlanta_venues[['Neighborhood', 'Neighborhood Latitude','Neighborhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']].where(atlanta_venues['Venue Category']=='Dance Studio').dropna()
grocery_stores = atlanta_venues[['Neighborhood', 'Neighborhood Latitude','Neighborhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']].where(atlanta_venues['Venue Category']=='Grocery Store').dropna()
performance_venues = atlanta_venues[['Neighborhood', 'Neighborhood Latitude','Neighborhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']].where(atlanta_venues['Venue Category']=='Performing Arts Venue').dropna()

## 3) Preliminary Analysis -- Maps

Qualitatively, it looks like the southwest and eastern areas might be good candidates.  

It's a bit tough to say for sure, though, what areas might be better than others.  Doing some more analysis might help to narrow this down further.

In [23]:
#Add dance studios, grocery stores, and performance venues to map


# add markers to map for dance studios
for lat, lng, label in zip( dance_studios['Venue Latitude'], dance_studios['Venue Longitude'], dance_studios['Venue Category']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Features)
    
#Add markers for grocery stores
for lat, lng, label in zip( grocery_stores['Venue Latitude'], grocery_stores['Venue Longitude'], grocery_stores['Venue Category']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='white',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Features)

    
#Add markers for performance venues
for lat, lng, label in zip( performance_venues['Venue Latitude'], performance_venues['Venue Longitude'], performance_venues['Venue Category']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Features)

map_Features

## 4) Fine Analysis -- DBSCAN Clustering

The DBSCAN algorithm will be an appropriate and useful tool for this situation.  It can cluster together these different features based on how close together they are.  That would yield more discrete areas to consider locating in.

First, it will make sense to seek out and drop any locations within 2 miles of an existing dance studio.  That way, DBSCAN will automatically make clusters which not only exclude existing studios, but also only include relevant locations which are 2 miles or more away from existing studios.

### 4.1) Compile all dataframes and info together

In [24]:
#Make a dataframe with all the locations

#Combine venue data frames
all_venues = dance_studios.append(grocery_stores)
all_venues = all_venues.append(performance_venues)
all_venues = all_venues.reset_index(drop=True)

#Remove all information except venue coordinates
all_venues = all_venues.drop(['Neighborhood Latitude', 'Neighborhood Longitude','Neighborhood','Venue'], axis=1)
all_venues = all_venues.rename(columns={"Venue Latitude": "Latitude", "Venue Longitude": "Longitude",'Venue Category':'Category'})

#Add in college and school data frames
all_venues = all_venues.append(GA_Colleges.drop(['College Name'], axis = 1))
all_venues = all_venues.append(psdf.drop(['School Name'], axis=1))
all_venues = all_venues.reset_index(drop=True)

#View dataframe
all_venues

Unnamed: 0,Latitude,Longitude,Category
0,33.825482,-84.289883,Dance Studio
1,33.74112,-84.34961,Dance Studio
2,33.830702,-84.363622,Dance Studio
3,33.681068,-84.381351,Dance Studio
4,33.819506,-84.228496,Grocery Store
5,33.810461,-84.239526,Grocery Store
6,33.73944,-84.253608,Grocery Store
7,33.75402,-84.270818,Grocery Store
8,33.751289,-84.274384,Grocery Store
9,33.901594,-84.487494,Grocery Store


In [25]:
#Add zip codes for each venue

from uszipcode import SearchEngine
from uszipcode import Zipcode
search = SearchEngine(simple_zipcode=True)
all_zips = []

for lat, lng in zip( all_venues['Latitude'], all_venues['Longitude']):
    result = search.by_coordinates(lat, lng, radius=10, returns=1)
    zipstring = str(result)
    zipcode = int(zipstring[24:29])
    all_zips.append(zipcode)
    
all_venues['Zip Code'] = all_zips

### 4.2) Drop all locations within 2 miles of Dance Studios

In [27]:
# Create new dataframe for drop
all_venues_d = all_venues

#Create new column with a coordinate pair
all_venues_d['Coords Tuple'] = list(zip(all_venues_d['Latitude'],all_venues_d['Longitude']))

#Setup the dataframe for dropping rows based on Category
all_venues_d = all_venues_d.set_index('Category')

#Make dataframe with only dance studios
dance_studios_d = all_venues_d.loc['Dance Studio']
dance_studios_d = dance_studios_d.reset_index(drop=False)

#Drop dance studios from all_venues_d
all_venues_d = all_venues_d.reset_index(drop=False)
all_venues_d = all_venues_d[ all_venues_d.Category != 'Dance Studio' ] #Drop dance studios
all_venues_d = all_venues_d.reset_index(drop=True)

In [28]:
ind = 0 #index tracker

#Drop all locations within 2 miles of an existing dance studio
for coord in all_venues_d['Coords Tuple']:
    dist = []
    
    for ds in dance_studios_d['Coords Tuple']:
        dist.append( haversine( coord, ds, unit=Unit.MILES ) )
        
    if any(d <= 2 for d in dist) == True:
        all_venues_d = all_venues_d.drop(ind)
    ind +=1

all_venues_d = all_venues_d.reset_index(drop=True) #Clean up dataframe

### 4.3) DBSCAN

DBSCAN will be run with an epsilon value of 1.25 miles.  This means that within each cluster, nearby locations will be at most 1.25 miles apart.

In [29]:
#Setup for DBSCAN
cluster_df = all_venues_d[['Latitude','Longitude']]
coords = cluster_df.to_numpy()
mi_per_radian = 3958.76133
epsilon = 1.25 / mi_per_radian

#Run DBSCAN
db = DBSCAN(eps=epsilon, min_samples=1, algorithm='auto', metric='haversine').fit(np.radians(coords))
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))

#Add cluster labels to dataframe
all_venues_d['Cluster'] = cluster_labels

Number of clusters: 29


In [31]:
#Create a map displaying the clusters

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)


x = np.arange(num_clusters)
ys = [i + x + (i*x)**2 for i in range(num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
label_color = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, zipc, cluster in zip(all_venues_d['Latitude'], all_venues_d['Longitude'], all_venues_d['Category'], all_venues_d['Zip Code'], all_venues_d['Cluster']):
    label = folium.Popup(str(zipc) +'  ' + str(poi) + ' Cluster: ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=label_color[cluster-1],
        fill=True,
        fill_color=label_color[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

### 4.4) Cluster Sorting

It's pretty clear where the dance studios were removed, and there are plenty of areas that look dense with locations.  In order to ensure we're getting as many of the desired features and locations within the we might locate, let's sort the clusters based on whether or not they have particular location categories.

In other words, let's figure out which clusters are best.

In [32]:
#Identify clusters with each a school, grocery store, and university as ideal candidates

idealness = []
categories = ['School', 'College', 'Grocery Store', 'Performing Arts Venue']

for x in range(0, num_clusters):
    true_false = []
    for cat in categories:
        cluster_cats = list ( all_venues_d[ all_venues_d.Cluster == x ]['Category'] )
        true_false.append( any( ven == cat for ven in cluster_cats ) )
    idealness.append(true_false)

In [33]:
#Sort clusters based on containing most important locations

ideal_df = pd.DataFrame(columns=categories)
row = 0 #index tracker

for entry in idealness:
    ideal_df.loc[row, 'School'] = entry[0]
    ideal_df.loc[row, 'College'] = entry[1]
    ideal_df.loc[row, 'Grocery Store'] = entry[2]
    ideal_df.loc[row, 'Performing Arts Venue'] = entry[3]
    row +=1

In [34]:
#Add centerpoint of cluster, add zipcode of centerpoint to give an idea of where to start looking

def get_centermost_point(cluster):
    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    return tuple(centermost_point)

centermost_points = clusters.map(get_centermost_point)

In [35]:
#Add Center point, zipcode of center point to dataframe

lat = [] #empty lists to fill
lon = []

#Add centerpoint coordinates
for coord in centermost_points:
    lat.append( coord[0] )
    lon.append( coord[1] )

ideal_df['Center Lat'] = lat
ideal_df['Center Lon'] = lon


#Add zipcode of centerpoint
zips = []
for lat, lng in zip( ideal_df['Center Lat'], ideal_df['Center Lon']):
    result = search.by_coordinates(lat, lng, radius=10, returns=1)
    zipstring = str(result)
    zipcode = int(zipstring[24:29])
    zips.append(zipcode)

ideal_df['Center Zip Code'] = zips

In [36]:
#Sort dataframe by priority
ideal_df = ideal_df.reset_index(drop=False)
ideal_df = ideal_df.rename(columns={'index':'Cluster'})
ideal_df = ideal_df.sort_values(categories, ascending=False).reset_index(drop=True)

#View top 10 clusters of the dataframe
top10 = ideal_df.head(10)
top10

Unnamed: 0,Cluster,School,College,Grocery Store,Performing Arts Venue,Center Lat,Center Lon,Center Zip Code
0,3,True,True,True,True,33.767884,-84.374371,30308
1,8,True,True,False,False,33.715108,-84.417983,30310
2,0,True,False,True,False,33.819506,-84.228496,30021
3,1,True,False,True,False,33.73944,-84.253608,30032
4,5,True,False,True,False,33.866952,-84.252094,30345
5,9,True,False,False,False,33.759273,-84.454374,30314
6,10,True,False,False,False,33.6602,-84.4067,30354
7,11,True,False,False,False,33.91044,-84.426028,30328
8,12,True,False,False,False,33.899269,-84.369926,30342
9,13,True,False,False,False,33.757051,-84.417428,30314


In [49]:
#Plot center points of top 10 clusters

map_topclusters = folium.Map(location=[latitude, longitude], zoom_start=12)


x = np.arange(10)
ys = [i + x + (i*x)**2 for i in range(0,10)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
label_color = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, rank, zipc, cluster in zip(top10['Center Lat'], top10['Center Lon'], top10.index, top10['Center Zip Code'], top10['Cluster']):
    label = folium.Popup(str(zipc) +'  ' + 'Rank: ' + str(rank + 1) + ' Cluster: ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=label_color[rank],
        fill=True,
        fill_color=label_color[rank],
        fill_opacity=.8).add_to(map_topclusters)

map_topclusters

### 5) Final Thoughts

On this map the centerpoints of the top 10 potential areas to open a dance studio in Atlanta, Georgia.  The ranks start with 1 being the best, and are color coded so that 1 is purple, and as the rank decreases the color shifts toward the red end of the color spectrum.

Obviously, personal preferences, financial limitations, and other factors will come into play when considering where to start a dance studio, and this analysis does not address all of these other factors.  Instead, the aim here was to supply a potential studio owner with targeted regions that will support the basic needs of a new studio, and maybe even provide a slight edge over their competition.