# Coursera Capstone Project

## Introduction

The city of Campinas is one the biggest and most important cities in the state of São Paulo in Brazil. With a population over 1 million, it is the the third most populous municipality in the state and the fourteenth most populous Brazilian city. The city's metropolitan area, Metropolitan Region of Campinas, is responsible for 1.8% of all Brazilian GDP and 11.4% of the State of São Paulo GDP, being campinas the 10th richest city in Brazil, with a gross domestic product of 36.68 billion reais (2010) [1].
Due to its relevance for the local economy, the city is a popular target for entrepreneurs and companies looking for opportunities to expand their business. In this context, the current analysis aims at answering a gym owner where would be the best best neighborhood to place his new gym branch focusing areas with low level of competition.


## Analysis

The JSON file created from the shapefile contains the information that defines the limits of each neighborhood in the city. This data comprises a set of latitude and longitude pairs that when connected define the contour of each neighborhoods. These contours are shown on the picture below over the map of campinas.



In [3]:
import pandas as pd

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium --yes 
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

Getting the latitude and longitude data from campinas

In [4]:
address = 'Avenida Barão de Itapura, Campinas, São Paulo, Brazil'

geolocator = Nominatim(user_agent="campinas_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Campinas are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Campinas are -22.88373, -47.057304.


The geographical boundaries of each neighborhood were needed, as well as demographic data. This information is available at the citiy's data portal (https://informacao-didc.campinas.sp.gov.br/metadados.php) under the "PD2018 Unidades Territoriais Básicas (UTB) e Unidades Territoriais Rurais (UTR)" tab in the form a shapefile (.shp extension), which contains the population and demographic density of each region as well as the geometric data that defines its boundaries. In order to be handled by the folium library, the .shp format needed to be first converted into GeoJson file. Additionally, the geospatial data provided by the city portal used the coordinate system SIRGAS 2000 UTM 23 S (EPSG:31983) which needed to be converted to the Latitude and Longitude coordinate system (EPSG:4326) which can be interpreted by the folium library. For both file extension and coordinate system conversion, the website OGRE (https://ogre.adc4gis.com/) was used, resulting in the file "campinas_geo_data.json".



In [11]:
import requests
import json
r = requests.get('https://raw.githubusercontent.com/brunobsalles/Coursera_Capstone/master/Campinas_geo_data.json')
geo_json_data = r.json()

In [12]:

map = folium.Map(location=[latitude, longitude], zoom_start=11)

#folium.GeoJson(geo_json_data).add_to(map)

folium.GeoJson(
    geo_json_data,
    style_function=lambda feature: {
        'fillColor': 'green',
        'color': 'darkred',
        'weight': 0.5,
    }
).add_to(map)


map

Converting the JSON file into a pandas dataframe

In [14]:
from pandas.io.json import json_normalize
df = json_normalize(geo_json_data['features'])

# define the dataframe columns
column_names = ['Neighborhood', 'Population', 'Density', 'Latitude', 'Longitude'] 

# instantiate the dataframe
campinas_neighborhoods = pd.DataFrame(columns=column_names)
    
df = json_normalize(geo_json_data['features'])

The data from each neighborhood was organized in the "campinas_neighborhoods" dataframe, where the latitude and longitude values of each neighborhood are the average values of the points that form the neighborhood contour.

In [15]:
import numpy as np
for index, row1 in df.iterrows():
    a = row1['geometry.coordinates'][0]
    longitude = np.mean([row[0] for row in a])
    latitude = np.mean([row[1] for row in a])
    campinas_neighborhoods = campinas_neighborhoods.append({
                                  'Neighborhood': row1['properties.denominaca'],
                                  'Latitude': latitude,
                                  'Longitude': longitude,
                                   'Population': row1['properties.tot_pop'],
                                    'Density': row1['properties.densidade_']}, ignore_index=True)

In [16]:
for col in campinas_neighborhoods:
    try:
        campinas_neighborhoods[col] = campinas_neighborhoods[col].astype(float)
    except ValueError:
        pass

In [17]:
campinas_neighborhoods.head()

Unnamed: 0,Neighborhood,Population,Density,Latitude,Longitude
0,Pq. Valenca/Pq. Itajai,49026.0,3906.320993,-22.952906,-47.196391
1,Joaquim Egidio,849.0,0.0,-22.886818,-46.932617
2,UTR - Pedra Branca,1883.0,0.0,-22.995581,-47.065494
3,Pq. Ecologico,0.0,0.0,-22.909946,-47.020844
4,UTR - Amarais / Barao Geraldo 1,2664.0,39.06321,-22.753775,-47.069453


Plotting the neighborhoods as marks over the map of campinas

In [18]:
map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(campinas_neighborhoods['Latitude'], campinas_neighborhoods['Longitude'], campinas_neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)  
    
map

The same information can be viewed as a choropleth map using the population of each neighborhood as parameter

In [19]:
bins = np.linspace(0, campinas_neighborhoods['Population'].max(),10)


map = folium.Map(location=[latitude, longitude], zoom_start=11)

# generate choropleth map
map.choropleth(
    geo_data=geo_file,
    data=campinas_neighborhoods,
    columns=['Neighborhood', 'Population'],
    key_on='feature.properties.denominaca',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    bins=bins,
    legend_name='Population in the neighbohoods of Campinas'
)

# display map
map



The Foursquare API was used to get the number and type of venues in each neighborhood from Campinas. This data enabled also the classification of the regions into clusters according to its venues, helping to target at the most promising neighborhood to place the new gym facility.

In [20]:
# The code was removed by Watson Studio for sharing.

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

We can limit the number of venues returned by Foursquare API by 200 and get the venues within a radius of 700 meters around the coordinates of each neighborhood.

In [22]:

LIMIT = 200 # limit of number of venues returned by Foursquare API
radius = 700 # define radius

campinas_venues = getNearbyVenues(names=campinas_neighborhoods['Neighborhood'],
                                   latitudes=campinas_neighborhoods['Latitude'],
                                   longitudes=campinas_neighborhoods['Longitude']
                                  )


Pq. Valenca/Pq. Itajai
Joaquim Egidio
UTR - Pedra Branca
Pq. Ecologico
UTR - Amarais / Barao Geraldo 1
UTR - Gargantilha / Sousas / Joaquim Egidio
UTR - Campo Grande 2
Fazenda Santa Elisa
Jd. Santa Maria
Swift/Jd.Sao Vicente/Jd.Esmeraldina
UTR - Amarais / Barao Geraldo 2
UTR - Samambaia
MM-70
UTR - Friburgo / Fogueteiro
Distrito Industrial de Campinas
Aeroporto de Viracopos
N. Campinas / Vila Brandina/ Jd. Flamboyant
Pq. Xangrila
Vila Nova / Guanabara/ Castelo
Pq. Imperador/ Notre Dame
UTR - Descampado 1
Bananal
Galleria
Ceasa
Center Santa Genebra
UTR - Furnas / Tanquinho 2
UTR - Furnas / Tanquinho 1
Jd. das Bandeiras/ Jd. Sao Jose
Bairro das Palmeiras
Jd. Maria Rosa/ Pq. Sao Paulo
Jd. Fernanda/ Jd. Itaguacu
UTR - Campo Grande 3
Abaete/Pedra Branca
Carlos Gomes/ Monte Belo
UTR - Descampado 3
Jd. Conceicao-Sousas
V. Costa e Silva/ Primavera/ Pq. Taquaral
Jd. Miriam/ Alphaville Campinas
Jd. Nova Mercedes
Swiss Park
J. Santa Genebra/ Mansoes Santo Antonio
Jd. N. Sra. Auxiliadora/ Taquaral

In [23]:
print('There are {} uniques categories.'.format(len(campinas_venues['Venue Category'].unique())))

There are 180 uniques categories.


In [24]:
campinas_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Pq. Valenca/Pq. Itajai,-22.952906,-47.196391,Fran Bazar,-22.954944,-47.197322,Arts & Crafts Store
1,Pq. Valenca/Pq. Itajai,-22.952906,-47.196391,Toca Do Pastel,-22.952035,-47.193625,Dumpling Restaurant
2,Pq. Valenca/Pq. Itajai,-22.952906,-47.196391,Tia Da Sorveteria,-22.956122,-47.1995,Ice Cream Shop
3,Joaquim Egidio,-22.886818,-46.932617,Restaurante Rancho Vô Joaquim,-22.888362,-46.934044,Brazilian Restaurant
4,Pq. Ecologico,-22.909946,-47.020844,D2,-22.910333,-47.018909,Badminton Court


Exploring the data returned by the Foursquare API by making a one hot encoding with the categories of each venue and the neighborhoods

In [25]:
# one hot encoding
campinas_onehot = pd.get_dummies(campinas_venues[['Venue Category']], prefix="", prefix_sep="")

# move neighborhood column to the first column
fixed_columns = list(campinas_onehot.columns)
campinas_onehot = campinas_onehot[fixed_columns]

# add neighborhood column back to dataframe
campinas_onehot['Neighborhood'] = campinas_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns= list(campinas_onehot.columns)
fixed_columns.remove('Neighborhood')
fixed_columns = ['Neighborhood'] + fixed_columns
campinas_onehot = campinas_onehot[fixed_columns]

campinas_onehot

Unnamed: 0,Neighborhood,Acai House,Accessories Store,Afghan Restaurant,Airport Terminal,Art Museum,Arts & Crafts Store,Athletics & Sports,Auto Workshop,Automotive Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Store,Volleyball Court,Warehouse Store,Water Park,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Pq. Valenca/Pq. Itajai,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Pq. Valenca/Pq. Itajai,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Pq. Valenca/Pq. Itajai,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Joaquim Egidio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Pq. Ecologico,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,UTR - Amarais / Barao Geraldo 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,UTR - Amarais / Barao Geraldo 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,UTR - Amarais / Barao Geraldo 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,UTR - Campo Grande 2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,UTR - Campo Grande 2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
campinas_grouped = campinas_onehot.groupby('Neighborhood').sum().reset_index()
campinas_grouped

Unnamed: 0,Neighborhood,Acai House,Accessories Store,Afghan Restaurant,Airport Terminal,Art Museum,Arts & Crafts Store,Athletics & Sports,Auto Workshop,Automotive Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Store,Volleyball Court,Warehouse Store,Water Park,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Aeroporto de Viracopos,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Alto da Nova Campinas/Gramado,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bairro das Palmeiras,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
3,Bosque das Palmeiras,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bosque/ Jd. Proenca,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
5,CIATEC II,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Cambui,1,0,0,0,0,0,2,0,0,...,0,0,0,0,0,0,1,1,0,2
7,Campo Grande/Jd. Florence,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Carlos Gomes/ Monte Belo,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Ceasa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Going futher in the analysis by defining the most common venues in each neighborhood.

In [39]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = campinas_grouped['Neighborhood']

for ind in np.arange(campinas_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(campinas_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aeroporto de Viracopos,Airport Terminal,Rental Car Location,Yoga Studio,Drugstore,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant
1,Alto da Nova Campinas/Gramado,Restaurant,Yoga Studio,Drugstore,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant,Electronics Store
2,Bairro das Palmeiras,Tennis Court,Pool,Lake,Brazilian Restaurant,BBQ Joint,Diner,Restaurant,Volleyball Court,Gym,Social Club
3,Bosque das Palmeiras,Pizza Place,Fishing Spot,Pet Store,Yoga Studio,Drugstore,Flea Market,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant
4,Bosque/ Jd. Proenca,Bar,Japanese Restaurant,Health & Beauty Service,Plaza,Park,Restaurant,Drugstore,Campground,Snack Place,Bakery
5,CIATEC II,Breakfast Spot,Gym / Fitness Center,General Entertainment,Brazilian Restaurant,Bakery,Electronics Store,Food,Flower Shop,Flea Market,Fishing Spot
6,Cambui,Japanese Restaurant,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Bar,Pizza Place,Brazilian Restaurant,Ice Cream Shop,Yoga Studio
7,Campo Grande/Jd. Florence,Hot Dog Joint,Bar,Electronics Store,Food,Flower Shop,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm
8,Carlos Gomes/ Monte Belo,Plaza,Farm,Food,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Empanada Restaurant,Electronics Store,Dumpling Restaurant
9,Ceasa,Farmers Market,Garden Center,Pier,Diner,Yoga Studio,Flea Market,Fishing Spot,Fast Food Restaurant,Farm,Empanada Restaurant


Clustering the neighborhoods into 3 clusters using k-means 

In [41]:
# set number of clusters
kclusters = 3

grouped_clustering = campinas_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1], dtype=int32)

In [42]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

campinas_merged = campinas_venues[['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude']].groupby('Neighborhood').mean().reset_index()

# merging dataframes to add latitude/longitude for each neighborhood
campinas_merged = campinas_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

campinas_merged 

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aeroporto de Viracopos,-23.017966,-47.142582,1,Airport Terminal,Rental Car Location,Yoga Studio,Drugstore,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant
1,Alto da Nova Campinas/Gramado,-22.911052,-46.999863,1,Restaurant,Yoga Studio,Drugstore,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant,Electronics Store
2,Bairro das Palmeiras,-22.899202,-47.016513,1,Tennis Court,Pool,Lake,Brazilian Restaurant,BBQ Joint,Diner,Restaurant,Volleyball Court,Gym,Social Club
3,Bosque das Palmeiras,-22.793723,-47.038488,1,Pizza Place,Fishing Spot,Pet Store,Yoga Studio,Drugstore,Flea Market,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant
4,Bosque/ Jd. Proenca,-22.915268,-47.043093,1,Bar,Japanese Restaurant,Health & Beauty Service,Plaza,Park,Restaurant,Drugstore,Campground,Snack Place,Bakery
5,CIATEC II,-22.817927,-47.045844,1,Breakfast Spot,Gym / Fitness Center,General Entertainment,Brazilian Restaurant,Bakery,Electronics Store,Food,Flower Shop,Flea Market,Fishing Spot
6,Cambui,-22.894652,-47.052444,0,Japanese Restaurant,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Bar,Pizza Place,Brazilian Restaurant,Ice Cream Shop,Yoga Studio
7,Campo Grande/Jd. Florence,-22.948952,-47.153379,1,Hot Dog Joint,Bar,Electronics Store,Food,Flower Shop,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm
8,Carlos Gomes/ Monte Belo,-22.757406,-46.990825,1,Plaza,Farm,Food,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Empanada Restaurant,Electronics Store,Dumpling Restaurant
9,Ceasa,-22.841832,-47.095432,1,Farmers Market,Garden Center,Pier,Diner,Yoga Studio,Flea Market,Fishing Spot,Fast Food Restaurant,Farm,Empanada Restaurant


In [31]:
campinas_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aeroporto de Viracopos,-23.017966,-47.142582,1,Airport Terminal,Rental Car Location,Yoga Studio,Drugstore,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant
1,Alto da Nova Campinas/Gramado,-22.911052,-46.999863,1,Restaurant,Yoga Studio,Drugstore,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant,Electronics Store
2,Bairro das Palmeiras,-22.899202,-47.016513,1,Tennis Court,Pool,Lake,Brazilian Restaurant,BBQ Joint,Diner,Restaurant,Volleyball Court,Gym,Social Club
3,Bosque das Palmeiras,-22.793723,-47.038488,1,Pizza Place,Fishing Spot,Pet Store,Yoga Studio,Drugstore,Flea Market,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant
4,Bosque/ Jd. Proenca,-22.915268,-47.043093,1,Bar,Japanese Restaurant,Health & Beauty Service,Plaza,Park,Restaurant,Drugstore,Campground,Snack Place,Bakery


In [32]:
campinas_merged.sort_values(by='Cluster Labels') 

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Cambui,-22.894652,-47.052444,0,Japanese Restaurant,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Bar,Pizza Place,Brazilian Restaurant,Ice Cream Shop,Yoga Studio
0,Aeroporto de Viracopos,-23.017966,-47.142582,1,Airport Terminal,Rental Car Location,Yoga Studio,Drugstore,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm,Empanada Restaurant
51,Sao Femando/ V.Orozimbo Maia/ Carlos Lourenco,-22.923972,-47.025517,1,Plaza,Ice Cream Shop,Café,Bakery,Salad Place,Deli / Bodega,Dumpling Restaurant,Fishing Spot,Fast Food Restaurant,Farmers Market
50,Recanto dos Dourados,-22.789224,-46.996815,1,Farm,Yoga Studio,Food & Drink Shop,Flower Shop,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Empanada Restaurant,Electronics Store
49,Real Parque,-22.830889,-47.100153,1,Gym / Fitness Center,Gym,School,Brazilian Restaurant,Bakery,Yoga Studio,Dumpling Restaurant,Flea Market,Fishing Spot,Fast Food Restaurant
48,Pq.Industrial/ Sao Bernardo,-22.919642,-47.080356,1,Brazilian Restaurant,Dessert Shop,Pharmacy,Bagel Shop,Seafood Restaurant,Sandwich Place,Market,Supermarket,Sushi Restaurant,Farmers Market
47,Pq. das Universidades/ Santa Candida,-22.835862,-47.050505,1,Bar,Soccer Field,Deli / Bodega,Brazilian Restaurant,College Cafeteria,Burger Joint,Empanada Restaurant,Flower Shop,Flea Market,Fishing Spot
46,Pq. Xangrila,-22.801838,-47.018790,1,Bakery,IT Services,Yoga Studio,Dumpling Restaurant,Flower Shop,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm
45,Pq. Valenca/Pq. Itajai,-22.952906,-47.196391,1,Dumpling Restaurant,Ice Cream Shop,Arts & Crafts Store,Yoga Studio,Flower Shop,Flea Market,Fishing Spot,Fast Food Restaurant,Farmers Market,Farm
44,Pq. Sao Quirino,-22.861827,-47.034831,1,Fast Food Restaurant,Farmers Market,Video Store,Steakhouse,Bakery,Yoga Studio,Dumpling Restaurant,Flower Shop,Flea Market,Fishing Spot


Showing the clusters in the map using different colors for each cluster

In [33]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(campinas_merged['Neighborhood Latitude'], campinas_merged['Neighborhood Longitude'], campinas_merged['Neighborhood'], campinas_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Selecting only the venues with the categories related to gyms

In [34]:
gym = []
gym = campinas_grouped[['Neighborhood','Gym / Fitness Center', 'Gym']]
gym['Total Gyms'] = gym['Gym / Fitness Center'] + gym['Gym']
gym = campinas_neighborhoods.merge(gym, on='Neighborhood')


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()


Ensuring that the values are floats in order to calculate the ratio population/gym

In [35]:
gym['Population'] = gym['Population'].astype(float)
gym['Total Gyms'] = gym['Total Gyms'].astype(float)
gym['Pop/gym'] = gym['Population'] / gym['Total Gyms']

Removing infinite and NaN values

In [36]:
gym = gym.loc[(gym['Pop/gym'] != np.inf) & (gym['Pop/gym'] != 'NaN') & (gym['Pop/gym'] != np.nan)]
gym = gym.sort_values(by='Pop/gym', ascending=False)
gym.head()

  result = method(y)


Unnamed: 0,Neighborhood,Population,Density,Latitude,Longitude,Gym / Fitness Center,Gym,Total Gyms,Pop/gym
69,Jd. Santa Lucia/ V. Uniao/ Jd. do Lago,85844.0,0.0,-22.945369,-47.104134,1,0,1.0,85844.0
58,Centro,34961.0,7.578698,-22.902441,-47.061105,0,1,1.0,34961.0
59,Jd. Eulina/ Jd. Chapadao/ Bonfim,31037.0,4.447642,-22.893458,-47.094633,0,1,1.0,31037.0
6,Swift/Jd.Sao Vicente/Jd.Esmeraldina,25858.0,64.858324,-22.937642,-47.0176,1,0,1.0,25858.0
25,V. Costa e Silva/ Primavera/ Pq. Taquaral,23896.0,36.744077,-22.864328,-47.056631,0,1,1.0,23896.0


Ploting the results of the ratio population/gym as a choropleth map

In [43]:
map2 = folium.Map(location=[latitude, longitude], zoom_start=11)


# generate choropleth map using the total immigration of each country to Canada from 1980 to 2013
map2.choropleth(
    geo_data=geo_file,
    data=gym,
    columns=['Neighborhood', 'Pop/gym'],
    key_on='feature.properties.denominaca',
    fill_color='BuPu', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Population/gym'
)
folium.LayerControl().add_to(map2)
# display map
map2



## Conclusion

The neighborhood Jd. Santa Lucia/V. Uniao/Jd. do Lago is clearly the best place to place the new branch due its high population that is served by only one fitness center. All the other neighborhoods have less than half of the population per gym ratio, which represents a much lower potential for the new business.
