# Capstone Project - The Battle of Neighborhoods 

## Business Understanding

_For an inexperienced entrepreneur finding the most optimal place to open your business can be pretty hard nowadays and the location of
your business could be the differential of failure or success. The objective of this project will consist in finding the best place to open your **Pet Shop in Florianopolis,Brazil**._

_We will try to find a neighborhood that have a high **population by pet venues**,indicating possible demand.We will also try to find a neighborhood with parks in the vicinity._

_Using Foursquare location data to analyze Florianopolis neighborhoods ,we will find the most **promissing neighborhoods** and expose them in a map for ease in visualization._


## Data section

 <h> <i> 
This project data will be divided in three parts : 
<ul>
  <li><b >Location Data :</b> Will be mainly used to collect pet and park venues location data using Foursquare database.</li>
  <li><b >Population Data :</b> Will be used to collect Florianopolis neighborhoods population.
    <a href="https://pt.wikipedia.org/wiki/Lista_de_distritos_e_bairros_de_Florianópolis">ref </a>
    </li>
  <li><b >Neighborhoods Data :</b> Will be used to collect the neighborhoods names and location.
      <a href="https://pt.wikipedia.org/wiki/Lista_de_distritos_e_bairros_de_Florianópolis">ref¹ </a>
<a href="https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#place_id%3DChIJ1zLGsk45J5URRscEagtVvIE">ref² </a> 
    </li>
</ul>
     <i/> <h/>

## Importing Libraries

In [1]:
# Data Manipulation
import numpy as np
import pandas as pd

# Web
from bs4 import BeautifulSoup
import requests

# Locations and Maps
import geopy
import folium
from folium.plugins import HeatMap

# Data Visualization
import matplotlib.cm as cm
import matplotlib.colors as colors

# Machine Learning
from sklearn.cluster import KMeans


# Data Gathering

### Population Data

In [2]:
# Web Scraping
res = requests.get("https://pt.wikipedia.org/wiki/Lista_de_distritos_e_bairros_de_Florianópolis")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[3]
df = pd.read_html(str(table))
df = df[0]
# For this analysis we will use only neighborhoods with populations bigger than 1000.
df = df.head(59)
df.head()




Unnamed: 0,Posição,Bairro,População
0,1,Centro,44.074
1,2,Capoeiras,19.323
2,3,Trindade,15.031
3,4,Agronômica,14.591
4,5,Saco dos Limões,13.771


In [3]:
# Drop the useless column
df = df.drop(columns = 'Posição')

# Translate the columns name to english
df.columns = ['Neighborhood','Population']
df.head()

Unnamed: 0,Neighborhood,Population
0,Centro,44.074
1,Capoeiras,19.323
2,Trindade,15.031
3,Agronômica,14.591
4,Saco dos Limões,13.771


<i>Since the Population Data is using  a <b> dot </b> to represent the thousands division , when converted to a DataFrame the <b>  dot </b>  now represents a float first decimal place.
To solve this we will multiply the population by <b>1000</b> and change the population column type to <b>int</b>.</i>

In [4]:
df['Population'] = df['Population'] * 1000
df = df.astype(dtype = {'Population': 'int64'}, copy = False)

### Location Data

I tried to use google geocoding API to gather the coordinates of each neighboorhood , unfortunately the API isn't free , so i manualy used it's <a href="https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#place_id%3DChIJ1zLGsk45J5URRscEagtVvIE"> geocoding website </a> to gather all neighborhood coordinates.

In [5]:
coordinates = {'Centro':[-27.592269,-48.549027],'Capoeiras':[-27.598399,-48.591291],
              'Trindade':[-27.594124,-48.526226],'Agronômica':[-27.578449,-48.536231],
              'Saco dos Limões':[-27.605591,-48.531228],'Coqueiros':[-27.607956,-48.581662],
              'Monte Cristo':[-27.590706,-48.600056],'Jardim Atlântico':[-27.580896,-48.5963],
              'Itacorubi':[-27.591887,-48.493989],'Costeira do Pirajubaé':[-27.634991,-48.521946],
              'Capivari':[-27.4523,-48.401314],'Tapera da Base':[-27.688876,-48.561252],
              'Estreito':[-27.592205,-48.577521],'Monte Verde':[-27.558532,-48.49711],
              'Balneário':[-27.579533,-48.582528],'São João do Rio Vermelho':[-27.491495,-48.416288],
              'Canto':[-27.585268,-48.585032],'Abraão':[-27.605795,-48.595048],
              'Santa Mônica':[-27.590339,-48.512471],'Lagoa':[-27.603092,-48.47123],
              'Saco Grande':[-27.540117,-48.503721],'Córrego Grande':[-27.593609,-48.502741],
              'Canasvieiras':[-27.432257,-48.458211],'Pantanal':[-27.614619,-48.516222],
               'Coloninha':[-27.590402,-48.592543],'Barra da Lagoa':[-27.574089,-48.431267],
              'Carianos':[-27.662146,-48.537481],'José Mendes':[-27.610284,-48.547489],
              'Ingleses Centro':[-27.433011,-48.401314],'João Paulo':[-27.559727,-48.511221],
              'Campeche Leste':[-27.687526,-48.491222],'Campeche Sul':[-27.699395,-48.501026],
               'Rio Tavares Central':[-27.663414,-48.491222],'Santinho':[-27.459512,-48.381354],
              'Ponta das Canas':[-27.413127,-48.426274],'Vargem do Bom Jesus':[-27.442069,-48.426274],
              'Armação':[-27.750104,-48.507471],'Cachoeira do Bom Jesus Leste':[-27.430611,-48.421281],
              'Pântano do Sul':[-27.779987,-48.507594],'Itaguaçu':[-27.614546,-48.592543],
              'Jurere Leste':[-27.444496,-48.486223],'Campeche Norte':[-27.676062,-48.486223],
              'Vargem Grande':[-27.455797,-48.447498],'Campeche Central':[-27.675197,-48.503721],
              'Ressacada':[-27.666543,-48.531652],'Morro das Pedras':[-27.709706,-48.502471],
              'Alto Ribeirão Leste':[-27.704021,-48.519973],'Alto Ribeirão':[-27.703906,-48.536231],
              'Ribeirão da Ilha':[-27.714585,-48.560626],'Santo Antônio':[-27.510713,-48.512471],
              'Sambaqui':[-27.490126,-48.530603],'Ingleses Sul':[-27.442792,-48.385096],
              'Bom Abrigo':[-27.611454,-48.595674],'Jurere Oeste':[-27.442079,-48.506221],
              'Porto da Lagoa':[-27.632027,-48.47123],'Cachoeira do Bom Jesus':[-27.43122,-48.436261],
              'Rio Tavares do Norte':[-27.645742,-48.472479],'Pedregal':[-27.690174,-48.544675],
              'Ratones':[-27.508891,-48.487473]}


# Merging the coordinates data and the population data to form the Florianópolis DataFrame
cor_df = pd.DataFrame(columns = ['Latitude','Longitude'],data = coordinates.values())
fln_df = df.join(cor_df)
fln_df.head()

Unnamed: 0,Neighborhood,Population,Latitude,Longitude
0,Centro,44074,-27.592269,-48.549027
1,Capoeiras,19323,-27.598399,-48.591291
2,Trindade,15030,-27.594124,-48.526226
3,Agronômica,14591,-27.578449,-48.536231
4,Saco dos Limões,13770,-27.605591,-48.531228


<h2>Map of Florianópolis Neighborhoods </h2>

In [6]:
# Getting Florianópolis coordinates using geopy
geolocator = geopy.Nominatim(user_agent="fln_explorer")
location = geolocator.geocode(query = {'Florianopolis'})
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Florianopolis are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Florianopolis are -27.5973002, -48.5496098.


In [51]:
# Create a folium map
map_fln = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers for each neighborhood
for lat, lng,neighborhood,population in zip(fln_df['Latitude'], fln_df['Longitude'], fln_df['Neighborhood'], fln_df['Population']):
    label = '{}, {}'.format(neighborhood, population)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1,
        parse_html=False).add_to(map_fln)

# Show The map
map_fln




<h3> Venues Data </h3>

In [8]:
# Foursquare Credencials
CLIENT_ID = 'BEGUHX3LX512OSNNSTGRRA5YDBCYGBUINTY5UV4JFCCXH4RK'
CLIENT_SECRET = 'W301A1SX1F3YWEZL4NABLM2TJWA30OWEJU1PP11A3CPA3WSO'
VERSION = '20180323'

<i> Using the Foursquare API we will gather for each neighborhood all the  <ul> 
    <li><b> Pet Related Venues </b> : Pet Café, Pet Service, Pet Store and Veterinarian. </li>
    <li><b> Park Venues </b> : National Park, Park, Playground ,Dog Run and more. </li>
    </ul>   </i>

In [9]:
pet_categories = ['56aa371be4b08b9a8d573508', # Pet Café
                  '5032897c91d4c4b30a586d69', # Pet Service
                  '4bf58dd8d48988d100951735', # Pet Store
                  '4d954af4a243a5684765b473'] # Veterinarian

park_categories = ['4bf58dd8d48988d1e5941735', # Dog Run
                   '4bf58dd8d48988d15f941735', # Field
                   '4bf58dd8d48988d161941735', # Lake
                   '52e81612bcbc57f1066b7a21', # National Park
                   '4bf58dd8d48988d162941735', # Other Great Outdoors
                   '4bf58dd8d48988d163941735', # Park 
                   '52e81612bcbc57f1066b7a25', # Pedestrian Plaza
                   '4bf58dd8d48988d1e7941735'] # Playground

<i>Using the foursquare API we will gather in each neighborhood all the pet venues information name,location,neighborhood,category and 
return a panda DataFrame with each row representing a pet venue and columns for each information.</i> 


In [10]:
def getNearbyPet(names, latitudes, longitudes, radius = 500):
    count = 0
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print('\n',name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={},{},{},{}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            pet_categories[0],pet_categories[1],pet_categories[2],pet_categories[3]) # Pet Categories
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
    
        venues_list.append([(
            name, 
            lat, 
            lng,
            fln_df['Population'][count],
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        count += 1

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude',
                  'Neighborhood Population',  
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


In [11]:
# Generate the Florianópolis pet venues DataFrame
fln_pet = getNearbyPet(names = fln_df['Neighborhood'],
                                   latitudes = fln_df['Latitude'],
                                   longitudes = fln_df['Longitude'],
                                   radius = 1500
                                  )

In [12]:
# Show the number of pet venues
print('There are {} pet venues in Florianópolis'.format(fln_pet.shape[0]))

# Show the pet venues DataFrame
fln_pet.head()

There are 262 pet venues in Florianópolis


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Neighborhood Population,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,-27.592269,-48.549027,44074,Bicho De Luxo Pet Shop,-27.591899,-48.552906,Pet Store
1,Centro,-27.592269,-48.549027,44074,Pet Care Center,-27.591784,-48.55691,Pet Store
2,Centro,-27.592269,-48.549027,44074,Veterinária 3 Irmãos,-27.588739,-48.546933,Pet Store
3,Centro,-27.592269,-48.549027,44074,King Of Dogs,-27.593366,-48.548713,Pet Store
4,Centro,-27.592269,-48.549027,44074,Clínica Veterinária Estimacao,-27.591184,-48.552696,Pet Store


In [13]:
# We will do the same process with the park venues
def getNearbyPark(names, latitudes, longitudes, radius = 500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            #print('\n',name)
            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={},{},{},{},{},{},{},{}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                radius,
                park_categories[0],park_categories[1],park_categories[2],park_categories[3],
                park_categories[4],park_categories[5],park_categories[6],park_categories[7])
            
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
               name,
               v['venue']['name'], 
               v['venue']['location']['lat'], 
               v['venue']['location']['lng'],  
               v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [15]:
# Generate the Florianópolis park DataFrame
fln_park = getNearbyPark(names = fln_df['Neighborhood'],
                                   latitudes = fln_df['Latitude'],
                                   longitudes = fln_df['Longitude'],
                                   radius = 1500
                                  )

In [16]:
# Show the number of park venues
print('There are {} parks in Florianópolis'.format(fln_park.shape[0]))

# Show the Florianópolis Park DataFrame
fln_park.head()

There are 266 parks in Florianópolis


Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,Travessa Ratclif,-27.599276,-48.548737,Pedestrian Plaza
1,Centro,Calçadão da Beira-Mar,-27.584658,-48.545238,Pedestrian Plaza
2,Centro,Parque da Luz,-27.590094,-48.55926,Park
3,Centro,Calçadão da Felipe Schmidt,-27.596666,-48.551672,Pedestrian Plaza
4,Centro,Largo São Sebastião,-27.588708,-48.551762,Other Great Outdoors


In [17]:
# Analyzing what pet category appears the most
fln_pet.groupby(['Venue Category'],sort = False).count()



Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Neighborhood Population,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Pet Store,160,160,160,160,160,160,160
Veterinarian,54,54,54,54,54,54,54
Pet Service,47,47,47,47,47,47,47
Pet Café,1,1,1,1,1,1,1


<i>Since the <b>pet venues</b> are the focus of the analysis we will make a column for each category , as for the <b>park venues</b> we will make a column with the total sum of all categories<i/>

### Pet Venues

In [18]:
fln_DataFrame = fln_df.copy() # Make a copy of the original dataframe
list_of_neighborhoods  = list(fln_DataFrame['Neighborhood'].values) # Make a list with all neighborhoods

# Make a DataFrame with double index and a column with the time each category repeats
df_join_pet = pd.DataFrame(fln_pet.groupby(['Neighborhood','Venue Category'],sort = False).count()['Venue'])
df_join_pet.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Venue
Neighborhood,Venue Category,Unnamed: 2_level_1
Centro,Pet Store,7
Centro,Veterinarian,3
Centro,Pet Service,1
Capoeiras,Pet Store,4
Capoeiras,Veterinarian,2


In [19]:
# Set the default value of each category to zero
fln_DataFrame['Pet Store'] = 0 
fln_DataFrame['Veterinarian'] = 0 
fln_DataFrame['Pet Service'] = 0 
fln_DataFrame['Pet Café'] = 0 

count = 0
# iterate over each row of the Dataframe adding it's value if exists for each neighborhood
for neighborhood,venue in df_join_pet.index:
    #print(neighborhood,venue)
    if venue == 'Pet Store':
        fln_DataFrame.loc[list_of_neighborhoods.index(neighborhood), venue] = df_join_pet.values[count]
    elif venue == 'Veterinarian':
        fln_DataFrame.loc[list_of_neighborhoods.index(neighborhood), venue] = df_join_pet.values[count]
    elif venue == 'Pet Service':
        fln_DataFrame.loc[list_of_neighborhoods.index(neighborhood), venue] = df_join_pet.values[count]
    elif venue == 'Pet Café':
        fln_DataFrame.loc[list_of_neighborhoods.index(neighborhood), venue] = df_join_pet.values[count]
    count += 1
    
fln_DataFrame.head()


Unnamed: 0,Neighborhood,Population,Latitude,Longitude,Pet Store,Veterinarian,Pet Service,Pet Café
0,Centro,44074,-27.592269,-48.549027,7,3,1,0
1,Capoeiras,19323,-27.598399,-48.591291,4,2,0,0
2,Trindade,15030,-27.594124,-48.526226,4,2,1,0
3,Agronômica,14591,-27.578449,-48.536231,6,5,0,0
4,Saco dos Limões,13770,-27.605591,-48.531228,2,2,0,0


### Park Venues

In [20]:
# Show how many venues each neighborhood have
fln_park.groupby('Neighborhood',sort = False).count().head()

Unnamed: 0_level_0,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Centro,15,15,15,15
Capoeiras,5,5,5,5
Trindade,2,2,2,2
Agronômica,4,4,4,4
Saco dos Limões,3,3,3,3


In [21]:
df_join_park = pd.DataFrame(fln_park.groupby('Neighborhood',sort = False).count()['Venue'])
df_join_park.reset_index(level=0, inplace=True) # Transforming the index in a column
fln_DataFrame['Number of Parks'] = 0 # setting zero as the default value
# iterate in all neighborhoods to see what neighborhoods don't have park venues
for index,j in enumerate(fln_DataFrame['Neighborhood']):
    if j not in df_join_park['Neighborhood'].values:
        print('Index : {} , Park Name : {}'.format(index,j))    

Index : 48 , Park Name : Ribeirão da Ilha[1]
Index : 49 , Park Name : Santo Antônio


In [22]:
a = 0
# For each neighborhood add the total park venue value if the venue don't have any pass to the next
for i,j in enumerate(fln_DataFrame['Neighborhood']):
    if i in [48,49]:
        pass
    else:
        fln_DataFrame.iloc[i,8] = df_join_park.iloc[a,1]
        a += 1
        
fln_DataFrame.head()  

Unnamed: 0,Neighborhood,Population,Latitude,Longitude,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks
0,Centro,44074,-27.592269,-48.549027,7,3,1,0,15
1,Capoeiras,19323,-27.598399,-48.591291,4,2,0,0,5
2,Trindade,15030,-27.594124,-48.526226,4,2,1,0,2
3,Agronômica,14591,-27.578449,-48.536231,6,5,0,0,4
4,Saco dos Limões,13770,-27.605591,-48.531228,2,2,0,0,3


## Methodology

_In order to not overcomplicate the analysis ,we will only focus neighborhoods with a population larger than a thousand._

_The first part of our analysis we collected all the data mentioned in the Data section of our notebook._

_The second part of our analysis we will cluster the neighborhoods with theirs population,number of parks and pet venues categories using the **k-means clustering** aproach. On top of that we will use maps to ilustrate and facilate the analysis of each cluster._

_In our final part of our analysis we will select the clusters that are more appropriate to our criterias **(population by pet venues,number of parks)**.Then we will present a final map with all the selected neighborhoods and theirs pet venues and parks, and a final dataframe with all information regarding those neighborhoods._

## Clustering the neighborhoods

In [23]:
# To try to decrease the bias in the analysis , i shuffled the order of the DataFrame's rows
fln_final_df = fln_DataFrame.copy()
fln_final_df = fln_final_df.sample(frac=1,random_state=42).reset_index(drop=True)

# set number of clusters
kclusters = 7

# Drop non clusterable variables 
fln_cluster = fln_DataFrame.drop(['Neighborhood','Latitude','Longitude'], 1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 42).fit(fln_cluster)

In [24]:
# add the clustering labels to the DataFrame
fln_final_df.insert(0, 'Cluster Labels', kmeans.labels_)

In [25]:
fln_final_df.head()

Unnamed: 0,Cluster Labels,Neighborhood,Population,Latitude,Longitude,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks
0,1,Centro,44074,-27.592269,-48.549027,7,3,1,0,15
1,5,Coqueiros,13592,-27.607956,-48.581662,3,1,0,0,6
2,2,Ponta das Canas,2473,-27.413127,-48.426274,0,0,1,0,3
3,2,Monte Verde,6197,-27.558532,-48.49711,3,0,2,0,5
4,2,Morro das Pedras,1527,-27.709706,-48.502471,2,0,0,0,2


In [26]:
# Show the distribution of the clusters
fln_final_df.groupby('Cluster Labels').count()

Unnamed: 0_level_0,Neighborhood,Population,Latitude,Longitude,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
0,14,14,14,14,14,14,14,14,14
1,1,1,1,1,1,1,1,1,1
2,6,6,6,6,6,6,6,6,6
3,19,19,19,19,19,19,19,19,19
4,3,3,3,3,3,3,3,3,3
5,1,1,1,1,1,1,1,1,1
6,15,15,15,15,15,15,15,15,15


## Map of Florianópolis Clustered Neighborhoods

In [70]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers for each neighborhood
markers_colors = []
for lat, lon, poi, cluster in zip(fln_final_df['Latitude'], fln_final_df['Longitude'], fln_final_df['Neighborhood'], fln_final_df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 6,
        popup = label,
        color = rainbow[cluster-1],
        fill = True,
        fill_color = rainbow[cluster-1],
        fill_opacity = 0.8).add_to(map_clusters)
       
map_clusters

In [28]:
# In order to facilitate the analysis we will add a new column with the amount of people by Pet venues for each neighborhood
fln_final_df['Population by Pet Venues'] = fln_final_df['Population'] // (fln_final_df['Pet Store']+fln_final_df['Veterinarian']+fln_final_df['Pet Service']+fln_final_df['Pet Café'])
fln_final_df.head()

Unnamed: 0,Cluster Labels,Neighborhood,Population,Latitude,Longitude,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
0,1,Centro,44074,-27.592269,-48.549027,7,3,1,0,15,4006.0
1,5,Coqueiros,13592,-27.607956,-48.581662,3,1,0,0,6,3398.0
2,2,Ponta das Canas,2473,-27.413127,-48.426274,0,0,1,0,3,2473.0
3,2,Monte Verde,6197,-27.558532,-48.49711,3,0,2,0,5,1239.0
4,2,Morro das Pedras,1527,-27.709706,-48.502471,2,0,0,0,2,763.0


## Analyzing the Clusters


### Cluster 0

In [29]:
# Create a DataFrame with the all neighborhoods inside the cluster
clus_0 = fln_final_df.loc[fln_final_df['Cluster Labels'] == 0, fln_final_df.columns[[1] + [2] + list(range(5, fln_final_df.shape[1]))]]
clus_0

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
11,Agronômica,14591,6,5,0,0,4,1326.0
12,Armação,2247,1,1,0,0,3,1123.0
13,Campeche Sul,2802,4,0,0,0,5,700.0
14,Itacorubi,10307,4,0,0,0,3,2576.0
15,Abraão,5210,5,1,0,0,7,868.0
16,Monte Cristo,12634,10,7,0,0,7,743.0
17,Saco dos Limões,13770,2,2,0,0,3,3442.0
18,Porto da Lagoa,1200,2,1,1,0,2,300.0
19,Lagoa,5081,2,1,1,0,7,1270.0
20,Campeche Leste,2973,3,2,0,0,5,594.0


In [30]:
# Create a map with the all the cluster neighborhoods ,parks and pet venues

# Create a folium map
result_map_fln = folium.Map(location=[latitude, longitude], zoom_start=11)

# Define a function to make cluster maps
def cluster_maps(clus_df,result_map):
    # Add all the selected neighborhoods
    for lat, lng,neighborhood,population in zip(fln_final_df['Latitude'], fln_final_df['Longitude'],
                                            fln_final_df['Neighborhood'], fln_final_df['Population']):
        if neighborhood in clus_df['Neighborhood'].values:
            label = '{}, {}'.format(neighborhood, population)
            label = folium.Popup(label, parse_html=True)
            folium.CircleMarker(
                [lat, lng],
                radius = 20,
                popup = label,
                color = 'blue',
                fill = True,
                fill_color = '#3186cc',
                fill_opacity = 0.5,
                parse_html = False).add_to(result_map)
        # Add all the pet venues in the selected neighborhoods
    for lat,lng,neighborhood,category in zip(fln_pet['Venue Latitude'], fln_pet['Venue Longitude'],
                                            fln_pet['Neighborhood'], fln_pet['Venue Category']):
        if neighborhood in clus_df['Neighborhood'].values:
            label = '{}, {}'.format(neighborhood, category)
            label = folium.Popup(label, parse_html=True)
            folium.CircleMarker(
                [lat, lng],
                radius = 5,
                popup = label,
                color = 'red',
                fill = True,
                fill_color = '#FF0000',
                fill_opacity = 0.5,
                parse_html = False).add_to(result_map)
                
        # Add all the park venues in the selected neighborhoods        
    for lat,lng,neighborhood,category in zip(fln_park['Venue Latitude'], fln_park['Venue Longitude'],
                                            fln_park['Neighborhood'], fln_park['Venue Category']):
        if neighborhood in clus_df['Neighborhood'].values:
            label = '{}, {}'.format(neighborhood, category)
            label = folium.Popup(label, parse_html=True)
            folium.CircleMarker(
                [lat, lng],
                radius = 5,
                popup = label,
                color = 'green',
                fill = True,
                fill_color = '#008000',
                fill_opacity = 0.5,
                parse_html = False).add_to(result_map)
            
    # Add a legend to the map
    legend_html = """
     <div style="position:fixed;
     bottom: 50px; 
     left: 50px; 
     width: 120px; 
     height: 90px; 
     border:2px solid grey; 
     z-index: 9999;
     font-size:14px;">
     &nbsp;<b>Labels</b><br>
     &nbsp;<i class="fa fa-circle fa-1x" style="color:green"></i>&nbsp;Parks<br>
     &nbsp;<i class="fa fa-circle fa-1x" style="color:red"></i>&nbsp;Pet Venues<br>
     &nbsp;<i class="fa fa-circle fa-1x" style="color:blue"></i>&nbsp;Neighborhoods
     </div>"""
    result_map.get_root().html.add_child(folium.Element(legend_html))
    return result_map

cluster_maps(clus_0,result_map_fln)

In [31]:
# Analyzing some useful information about the cluster
clus_0 = clus_0.drop([21]) # Drop the columns with undefined values 
mean_0 = clus_0.mean(skipna = True)
median_0 = clus_0.median(skipna = True)
std_0 = clus_0.std(skipna = True)
maxx_0 = clus_0.max(skipna = True,numeric_only = True)
minn_0 = clus_0.min(skipna = True,numeric_only = True)

# Create a DataFrame to facilitate the analysis
info_clus_0 = pd.DataFrame(data = {'mean':mean_0,'median':median_0,'std':std_0,'max':maxx_0,'min':minn_0})
info_clus_0

Unnamed: 0,mean,median,std,max,min
Population,6098.076923,5081.0,4969.965484,14591.0,1199.0
Pet Store,3.384615,3.0,2.534379,10.0,1.0
Veterinarian,1.615385,1.0,2.103111,7.0,0.0
Pet Service,0.538462,0.0,1.126601,4.0,0.0
Pet Café,0.0,0.0,0.0,0.0,0.0
Number of Parks,4.153846,4.0,1.951331,7.0,1.0
Population by Pet Venues,1169.230769,868.0,895.631449,3442.0,299.0


### Cluster 1

In [32]:
clus_1 = fln_final_df.loc[fln_final_df['Cluster Labels'] == 1, fln_final_df.columns[[1] + [2] + list(range(5, fln_final_df.shape[1]))]]
clus_1

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
0,Centro,44074,7,3,1,0,15,4006.0


### Cluster 2

In [49]:
clus_2 = fln_final_df.loc[fln_final_df['Cluster Labels'] == 2, fln_final_df.columns[[1] + [2] + list(range(5, fln_final_df.shape[1]))]]
clus_2

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
2,Ponta das Canas,2473,0,0,1,0,3,2473.0
3,Monte Verde,6197,3,0,2,0,5,1239.0
4,Morro das Pedras,1527,2,0,0,0,2,763.0
5,Jurere Oeste,1220,0,0,1,0,4,1220.0
6,Pedregal,1034,0,0,1,0,3,1034.0
7,Barra da Lagoa,3812,1,0,0,0,6,3812.0


In [34]:
# Create a map with the all the cluster neighborhoods ,parks and pet venues

# Create a folium map
result_map_fln = folium.Map(location=[latitude, longitude], zoom_start=11)
cluster_maps(clus_2,result_map_fln)

In [35]:
# Analyzing some useful information about the cluster
mean_2 = clus_2.mean(skipna = True)
median_2 = clus_2.median(skipna = True)
std_2 = clus_2.std(skipna = True)
maxx_2 = clus_2.max(skipna = True,numeric_only = True)
minn_2 = clus_2.min(skipna = True,numeric_only = True)

# Create a DataFrame to facilitate the analysis
info_clus_2 = pd.DataFrame(data = {'mean':mean_2,'median':median_2,'std':std_2,'max':maxx_2,'min':minn_2})
info_clus_2

Unnamed: 0,mean,median,std,max,min
Population,2710.5,2000.0,1992.902682,6197.0,1034.0
Pet Store,1.0,0.5,1.264911,3.0,0.0
Veterinarian,0.0,0.0,0.0,0.0,0.0
Pet Service,0.833333,1.0,0.752773,2.0,0.0
Pet Café,0.0,0.0,0.0,0.0,0.0
Number of Parks,3.833333,3.5,1.47196,6.0,2.0
Population by Pet Venues,1756.833333,1229.5,1166.459501,3812.0,763.0


### Cluster 3

In [36]:
clus_3 = fln_final_df.loc[fln_final_df['Cluster Labels'] == 3, fln_final_df.columns[[1] + [2] + list(range(5, fln_final_df.shape[1]))]]
clus_3

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
40,Capoeiras,19323,4,2,0,0,5,3220.0
41,Córrego Grande,4833,6,0,0,0,5,805.0
42,Trindade,15030,4,2,1,0,2,2147.0
43,Alto Ribeirão Leste,1493,2,0,1,0,1,497.0
44,Itaguaçu,2229,4,0,0,0,4,557.0
45,Vargem do Bom Jesus,2286,1,0,0,0,4,2286.0
46,Pantanal,4703,3,1,0,0,4,1175.0
47,Ribeirão da Ilha[1],1376,0,0,1,0,0,1376.0
48,Capivari,8686,2,2,0,0,2,2171.0
49,Canasvieiras,4822,2,1,1,0,5,1205.0


In [37]:
# Create a map with the all the cluster neighborhoods ,parks and pet venues

# Create a folium map
result_map_fln = folium.Map(location=[latitude, longitude], zoom_start=11)
cluster_maps(clus_3,result_map_fln)

In [68]:
result_map_fln = folium.Map(location=[latitude, longitude], zoom_start=11)
HeatMap(fln_pet[['Venue Latitude','Venue Longitude']].values.tolist(),radius = 19).add_to(result_map_fln)
result_map_fln 


In [38]:
# Analyzing some useful information about the cluster
clus_3 = clus_3.drop([58]) # Drop the columns with undefined values 
mean_3 = clus_3.mean(skipna = True)
median_3 = clus_3.median(skipna = True)
std_3 = clus_3.std(skipna = True)
maxx_3 = clus_3.max(skipna = True,numeric_only = True)
minn_3 = clus_3.min(skipna = True,numeric_only = True)

# Create a DataFrame to facilitate the analysis
info_clus_3 = pd.DataFrame(data = {'mean':mean_3,'median':median_3,'std':std_3,'max':maxx_3,'min':minn_3})
info_clus_3

Unnamed: 0,mean,median,std,max,min
Population,5576.833333,4762.5,5130.259049,19323.0,1022.0
Pet Store,3.055556,3.0,2.235337,8.0,0.0
Veterinarian,0.833333,1.0,0.857493,2.0,0.0
Pet Service,0.722222,1.0,0.826442,3.0,0.0
Pet Café,0.0,0.0,0.0,0.0,0.0
Number of Parks,4.611111,5.0,2.37979,8.0,0.0
Population by Pet Venues,1293.388889,1098.5,855.772519,3220.0,264.0


### Cluster 4

In [39]:
clus_4 = fln_final_df.loc[fln_final_df['Cluster Labels'] == 4, fln_final_df.columns[[1] + [2] + list(range(5, fln_final_df.shape[1]))]]
clus_4

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
8,Alto Ribeirão,1486,0,0,2,0,1,743.0
9,Estreito,7007,8,1,2,0,9,637.0
10,Santo Antônio,1351,3,1,3,1,0,168.0


In [40]:
# Analyzing some useful information about the cluster
mean_4 = clus_4.mean(skipna = True)
median_4 = clus_4.median(skipna = True)
std_4 = clus_4.std(skipna = True)
maxx_4 = clus_4.max(skipna = True,numeric_only = True)
minn_4 = clus_4.min(skipna = True,numeric_only = True)

# Create a DataFrame to facilitate the analysis
info_clus_4 = pd.DataFrame(data = {'mean':mean_4,'median':median_4,'std':std_4,'max':maxx_4,'min':minn_4})
info_clus_4

Unnamed: 0,mean,median,std,max,min
Population,3281.333333,1486.0,3227.227964,7007.0,1351.0
Pet Store,3.666667,3.0,4.041452,8.0,0.0
Veterinarian,0.666667,1.0,0.57735,1.0,0.0
Pet Service,2.333333,2.0,0.57735,3.0,2.0
Pet Café,0.333333,0.0,0.57735,1.0,0.0
Number of Parks,3.333333,1.0,4.932883,9.0,0.0
Population by Pet Venues,516.0,637.0,306.001634,743.0,168.0


### Cluster 5

In [41]:
clus_5 = fln_final_df.loc[fln_final_df['Cluster Labels'] == 5, fln_final_df.columns[[1] + [2] + list(range(5, fln_final_df.shape[1]))]]
clus_5

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
1,Coqueiros,13592,3,1,0,0,6,3398.0


### Cluster 6

In [42]:
clus_6 = fln_final_df.loc[fln_final_df['Cluster Labels'] == 6, fln_final_df.columns[[1] + [2] + list(range(5, fln_final_df.shape[1]))]]
clus_6

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
25,Costeira do Pirajubaé,9301,1,0,1,0,3,4650.0
26,José Mendes,3514,1,1,0,0,4,1757.0
27,Carianos,3656,1,0,1,0,1,1828.0
28,Canto,5560,7,1,2,0,10,556.0
29,Coloninha,4431,7,4,2,0,8,340.0
30,Santinho,2521,1,0,0,0,5,2521.0
31,Rio Tavares do Norte,1082,2,0,1,0,3,360.0
32,Campeche Norte,2009,1,2,1,0,6,502.0
33,Tapera da Base,7081,0,0,0,0,5,inf
34,Rio Tavares Central,2613,3,0,2,0,2,522.0


In [43]:
# Create a map with the all the cluster neighborhoods ,parks and pet venues

# Create a folium map
result_map_fln = folium.Map(location=[latitude, longitude], zoom_start=11)
cluster_maps(clus_6,result_map_fln)

In [44]:
# Analyzing some useful information about the cluster
clus_6 = clus_6.drop([33]) # Drop the columns with undefined values 
mean_6 = clus_6.mean(skipna = True)
median_6 = clus_6.median(skipna = True)
std_6 = clus_6.std(skipna = True)
maxx_6 = clus_6.max(skipna = True,numeric_only = True)
minn_6 = clus_6.min(skipna = True,numeric_only = True)

# Create a DataFrame to facilitate the analysis
info_clus_6 = pd.DataFrame(data = {'mean':mean_6,'median':median_6,'std':std_6,'max':maxx_6,'min':minn_6})
info_clus_6

Unnamed: 0,mean,median,std,max,min
Population,3220.928571,2567.0,2135.745896,9301.0,1082.0
Pet Store,2.428571,2.0,2.13809,7.0,0.0
Veterinarian,0.857143,0.5,1.167321,4.0,0.0
Pet Service,1.0,1.0,0.784465,2.0,0.0
Pet Café,0.0,0.0,0.0,0.0,0.0
Number of Parks,4.714286,4.5,2.729569,10.0,1.0
Population by Pet Venues,1230.428571,539.0,1252.193627,4650.0,315.0


## Results

_After analysis of the clusters ,clusters maps and statistical variables , we will exclude from the analysis the **Clusters 0,3 and 4**._

_We will now select the best neighborhoods within the remaining clusters , following the criteria established in the business understanding._

In [45]:
# Create a boolean list with all the neighborhoods that have values above the overall dataframe mean
cl = (clus_2[['Population by Pet Venues','Number of Parks']] > clus_2[['Population by Pet Venues','Number of Parks']].mean()).values.tolist()
index = 2
ana_clus_2 = clus_2.copy()
for pop,park in cl:
    if pop is False: # If the population is below the global mean discart
        ana_clus_2.drop([index],inplace = True)
    elif park is False and clus_2.loc[[index],'Population by Pet Venues'].values < 2000: # if the number of parks is below the global mean and the population is below 2000 discart
        print(index)
        ana_clus_2.drop([index],inplace = True)
    index += 1
    
# Return the neighborhoods that supply the conditions above    
ana_clus_2

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
2,Ponta das Canas,2473,0,0,1,0,3,2473.0
7,Barra da Lagoa,3812,1,0,0,0,6,3812.0


In [46]:
# Create a boolean list with all the neighborhoods that have values above the overall dataframe mean
cl_6 = (clus_6[['Population by Pet Venues','Number of Parks']] > clus_6[['Population by Pet Venues','Number of Parks']].mean()).values.tolist()
ana_clus_6 = clus_6.copy()
ana_clus_6 = ana_clus_6.reset_index(drop=True)
index = 0
for pop,park in cl_6:
    if pop is False: # If the population is below the global mean discart
        ana_clus_6.drop([index],inplace = True) 
    elif park is False and ana_clus_6.loc[[index],'Population by Pet Venues'].values < 2000: # if the number of parks is below the global mean and the population is below 2000 discart
        ana_clus_6.drop([index],inplace = True)
    index += 1
    
# Return the neighborhoods that supply the conditions above    
ana_clus_6

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
0,Costeira do Pirajubaé,9301,1,0,1,0,3,4650.0
5,Santinho,2521,1,0,0,0,5,2521.0
11,Cachoeira do Bom Jesus Leste,2241,0,0,1,0,3,2241.0


In [47]:
# Make a dataframe with all the selected the neighborhoods
final_cluster = pd.concat([ana_clus_2,ana_clus_6,clus_1,clus_5]).reset_index(drop = True)
final_cluster

Unnamed: 0,Neighborhood,Population,Pet Store,Veterinarian,Pet Service,Pet Café,Number of Parks,Population by Pet Venues
0,Ponta das Canas,2473,0,0,1,0,3,2473.0
1,Barra da Lagoa,3812,1,0,0,0,6,3812.0
2,Costeira do Pirajubaé,9301,1,0,1,0,3,4650.0
3,Santinho,2521,1,0,0,0,5,2521.0
4,Cachoeira do Bom Jesus Leste,2241,0,0,1,0,3,2241.0
5,Centro,44074,7,3,1,0,15,4006.0
6,Coqueiros,13592,3,1,0,0,6,3398.0


In [71]:
# Create a folium map
final_map_fln = folium.Map(location=[latitude, longitude], zoom_start=11)

cluster_maps(final_cluster,final_map_fln)

## Final Results and Discussion

_In this project we analyzed 59 out of the 85 neighborhoods of Florianópolis:_
* Using the IBGE, the brazilian institute of geography and statistic, database we collected the population and name of each neighborhood.
* Using the foursquare API, we found 263 pet venues and 262 parks in the vicinity of each selected neighborhood.

_After gathering all the data we clustered the neighborhood and selected the most appropriate among our criteria, **(population by pet venues and number of parks)** the selected ones were:_
* **Barra da Lagoa** : <a href="https://i.pinimg.com/originals/f1/a3/75/f1a375f5eb7298fa2569942c4777db13.jpg">photo </a>
    * Is located near a lagoon
    * Is very nature friendly
* **Cachoeira do Bom Jesus Leste** : 
<a href = "https://cdnstatic8.com/viagensecaminhos.com//wp-content/uploads/2011/02/florianopolis-praia-cachoeira-bom-jesus.jpg">photo </a>
    * Mostly residential buildings  
    * Neighborhood in current expansion
    * Is very nature friendly
* **Centro** :
<a href = "https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcTg9WssZIb8W4aG852bkmgGXxUPbhVqGZPBDiiIc3Y7c142x9Ww&usqp=CAU">photo </a>
   * Is located in downtown Florianópolis
   * Have the biggest population amoung Florianópolis neighborhoods
   * Have the most parks amoung Florianópolis neighborhoods
   * Is the most urbanized area in Florianópolis
   * Is one of the most expensive neighborhoods
* **Coqueiros** :
<a href = "https://i.ytimg.com/vi/3epuMB7h0B8/maxresdefault.jpg">photo </a>
    * Is located close to downtown
    * Is the biggest neighborhood off the island
    * Has a lot of restaurants nearby
* **Costeira do Pirajubaé** :
<a href="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRkkT2RxAean_G2A8Vly23OPxR_g-3S00VtIc1ZoCwdBzw_N--3&usqp=CAU">photo </a>
    * Is near a soccer stadium
    * Has a beautiful sidewalk by the sea
    * Mostly residential buildings 
* **Ponta das Canas** :
<a href = "https://guiafloripa.com.br/wp-content/uploads/2020/01/ponta-das-canas-cachoeira-400x600.jpg">photo </a>
    * Is located near a beach
    * Very crowded in the summer
* **Santinho** :
<a href = "https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcQOn7-KqXqi7YLKPUOSwBKnBDJvYAUMmdVR0nR_zhPHTmp0V4dh&usqp=CAU">photo </a>
    * Is located near a beach
    * Very crowded in the summer
    * Is very nature friendly
    * Mostly residential buildings 