<h1>Fernando's New Business</h1>
<h2> Description</h2>

Fernando is a mexican salesman, he has been working at a tec company for the last 30 years. He is finally retiring and wants to start a business with his savings. He doesn't know what kind of business he wants to run but he knows where he wants to open it, in Mexico City's wealthiest borough: Cuauhtemoc.

Fernando asked a Data Scientist for help. He wants him to find using data what kind of business to open and where he can open it. He wants little or no competition so he can earn big amounts of money so the data scientist will have to find the less common tipes of venues/businesses in each neighborhood.

<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#Section_1"> Gathering the Data </a></li>
     <li><a href="#Section_2">Preparing the Data</a> </li>
     <li><a href="#Section_3">Obtaining the Venues</a></li>
     <li><a href="#Section_4">Finding out the least type of Venue per Neighborhood</a></li>
     <li><a href="#Section_5">Finding out the Where to Open the Business </a></li>
</div>

<hr>

<h2 id="section1"> Gathering the Data </h2>

The data scientist will get the neighborhood info from Foursquare, but for that he needs the list of neighborhoods inside the Cuauhtemoc borough. 

Fortunately, Mexico City's has a public database about the list of neighborhoods in the city. The information is puclic and esay to download in a csv file.

First import the necessary libraries for the project.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
import requests
import matplotlib.pyplot as plt
import matplotlib.cm as cm 
import matplotlib.colors as colors
import json
from pandas.io.json import json_normalize
from sklearn.cluster import KMeans
import folium

Then load the csv file and save it as a dataframe and translate the columns names.

The data is located in the following webpage.
https://datos.cdmx.gob.mx/dataset/coloniascdmx/resource/03368e1e-f05e-4bea-ac17-58fc650f6fee

In [2]:
mex_neigh = pd.read_csv("coloniascdmx.csv")
mex_neigh.rename(columns = {'nombre':'Neighborhood','alcaldia':'Borough'},inplace=True)
mex_neigh.head()

Unnamed: 0,id,Neighborhood,entidad,geo_point_2d,geo_shape,cve_alc,Borough,cve_col,secc_com,secc_par
0,0,LOMAS DE CHAPULTEPEC,9.0,"19.4228411174,-99.2157935754","{""type"": ""Polygon"", ""coordinates"": [[[-99.2201...",16,MIGUEL HIDALGO,16-042,"4924, 4931, 4932, 4935, 4936, 4940, 4987","4923, 4937, 4938, 4939, 4942"
1,1,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),9.0,"19.4106158914,-99.2262487268","{""type"": ""Polygon"", ""coordinates"": [[[-99.2296...",16,MIGUEL HIDALGO,16-044,4963,4964
2,2,DEL BOSQUE (POLANCO),9.0,"19.4342189235,-99.2094037513","{""type"": ""Polygon"", ""coordinates"": [[[-99.2082...",16,MIGUEL HIDALGO,16-026,,"4918, 4919"
3,3,PEDREGAL DE SANTA URSULA I,9.0,"19.314862237,-99.1477954505","{""type"": ""Polygon"", ""coordinates"": [[[-99.1458...",3,COYOACAN,03-135,"433, 500, 431, 513, 501","424, 425, 426, 430, 499"
4,4,AJUSCO I,9.0,"19.324571116,-99.1561602234","{""type"": ""Polygon"", ""coordinates"": [[[-99.1585...",3,COYOACAN,03-128,"376, 377, 378, 379, 404, 493, 498",374


The mex_neigh dataframe has all the neighborhoods from the city. The column Borough shows the neighborhood's borough and the geo_point_2d shows each neighborhood's coordinates. The rest of the columns are not useful so we can drop them.

<h2 id="section2"> Preparing the Data </h2>

The only thing that we need on this projects are the list of neighborhoods and their coordinates so the first step is to drop the columns that not useful for us.

In [3]:
mex_neigh.drop(columns=['id','entidad','geo_shape','cve_alc','cve_col','secc_com','secc_par'],axis=1,inplace=True)
mex_neigh.head()

Unnamed: 0,Neighborhood,geo_point_2d,Borough
0,LOMAS DE CHAPULTEPEC,"19.4228411174,-99.2157935754",MIGUEL HIDALGO
1,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),"19.4106158914,-99.2262487268",MIGUEL HIDALGO
2,DEL BOSQUE (POLANCO),"19.4342189235,-99.2094037513",MIGUEL HIDALGO
3,PEDREGAL DE SANTA URSULA I,"19.314862237,-99.1477954505",COYOACAN
4,AJUSCO I,"19.324571116,-99.1561602234",COYOACAN


The column 'geo_point_2d' has the coordinates of each neighborhood. The next step is to split the coordinates and save it in Latitude and Longitude coordinates and drop the geo_point_2d column.

In [4]:
mex_split = mex_neigh['geo_point_2d'].str.split(',',expand=True)
mex_neigh['Latitude']=mex_split[0]
mex_neigh['Longitude']=mex_split[1]
mex_neigh.drop(columns=['geo_point_2d'],axis=1,inplace=True)
mex_neigh.head()

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
0,LOMAS DE CHAPULTEPEC,MIGUEL HIDALGO,19.4228411174,-99.2157935754
1,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),MIGUEL HIDALGO,19.4106158914,-99.2262487268
2,DEL BOSQUE (POLANCO),MIGUEL HIDALGO,19.4342189235,-99.2094037513
3,PEDREGAL DE SANTA URSULA I,COYOACAN,19.314862237,-99.1477954505
4,AJUSCO I,COYOACAN,19.324571116,-99.1561602234


Fernando wants to open his bussiness in the Cuauhtemoc borough so let's create a new dataframe called cuau_neigh that has all the neighborhoods from the Cuauhtemoc borough.

In [5]:
mex_cuau = mex_neigh[mex_neigh['Borough']=='CUAUHTEMOC'].reset_index(drop=True)
mex_cuau.head()

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
0,TABACALERA,CUAUHTEMOC,19.4357759781,-99.1539492806
1,CENTRO VII,CUAUHTEMOC,19.4302248036,-99.1281413675
2,GUERRERO I,CUAUHTEMOC,19.4490761845,-99.1437494279
3,NONOALCO-TLATELOLCO (U HAB) II,CUAUHTEMOC,19.4533147946,-99.1417694775
4,JUAREZ,CUAUHTEMOC,19.4270038256,-99.1616054122


Check if there are any non denifed values in the Latitude and Longitude columns.

In [6]:
mex_cuau[mex_cuau.Latitude.isnull()]

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
13,MAZA,CUAUHTEMOC,,


The zip code for the Maza Neighborhood is 06270. The pgeocode library finds the location of a neighborhood based on its zip code. Find the zip code for the Maza neighoborhood and replace the unknown coordinates.

In [7]:
import pgeocode
nomi = pgeocode.Nominatim('mx')
Maza = nomi.query_postal_code("06270")
mex_cuau['Latitude']=mex_cuau['Latitude'].replace(np.nan,Maza.latitude)
mex_cuau['Longitude'] =mex_cuau['Longitude'].replace(np.nan,Maza.longitude)
mex_cuau

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
0,TABACALERA,CUAUHTEMOC,19.4357759781,-99.1539492806
1,CENTRO VII,CUAUHTEMOC,19.4302248036,-99.1281413675
2,GUERRERO I,CUAUHTEMOC,19.4490761845,-99.1437494279
3,NONOALCO-TLATELOLCO (U HAB) II,CUAUHTEMOC,19.4533147946,-99.1417694775
4,JUAREZ,CUAUHTEMOC,19.4270038256,-99.1616054122
5,SANTA MARIA (U HAB),CUAUHTEMOC,19.4564342667,-99.157053889
6,CENTRO II,CUAUHTEMOC,19.4398500953,-99.1285178964
7,ROMA NORTE I,CUAUHTEMOC,19.4194185761,-99.1691619817
8,CENTRO IV,CUAUHTEMOC,19.4336362466,-99.1360300552
9,ROMA SUR I,CUAUHTEMOC,19.4088498024,-99.1613175937


Using geopy we can verify that all the neighborhoods on the dataframe are located on the Cuauhtemoc borough.

In [8]:
from geopy.geocoders import Nominatim
# create map
address = 'Centro, Mexico City'

geolocator = Nominatim(user_agent="mex_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of the borough Cuauhtemoc in Mexico City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of the borough Cuauhtemoc in Mexico City are 19.4065152, -99.1550183.


In [9]:
import folium
# create map of Mxico City using latitude and longitude values
map_mex = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(mex_cuau['Latitude'], mex_cuau['Longitude'], mex_cuau['Borough'], mex_cuau['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mex)  
    
map_mex

With this data, the data scientist now can find the venues using foursquare.

<h2 id="section3"> Obtaining the Venues </h2>

The next step is to find out what kind of venues are located in the Cuahtemoc borough and where exactly are they located. In order to know that we can use the Foursquare API. First we have to geo our Foursquare credentials 

In [10]:
CLIENT_ID = 'PWJUI1C335YAEP5EZ2U0PR1K4AV5Z51BFC1LIOG5OGFCUXGC' # your Foursquare ID
CLIENT_SECRET = 'PM2FU40TUSHEIZ20ZOVOGC3GY4M2BK3SAJZ5EKNOEUV05YUV' # your Foursquare Secret
ACCESS_TOKEN = 'FEXV1SH3YRZIGPV1C3QA2KB21XRF510AALHTRBAHMWFS0L5P' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: PWJUI1C335YAEP5EZ2U0PR1K4AV5Z51BFC1LIOG5OGFCUXGC
CLIENT_SECRET:PM2FU40TUSHEIZ20ZOVOGC3GY4M2BK3SAJZ5EKNOEUV05YUV


With the credentials ready, we must find all the venues inside the Cuauhtemoc Borough, I will use the getNearbyvenues function that was defined in this course. Also, I need to impor the requests library to get the venues.

In [11]:
import requests
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
#        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(\
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

I will get the venues for each neighboorhod within the Cuauhtemoc borough and its coordinates and save them in a dataframe called cuau_venues.

In [12]:
cuau_venues = getNearbyVenues(mex_cuau['Neighborhood'],
                              mex_cuau['Latitude'],
                              mex_cuau['Longitude'])
cuau_venues.head()
print(cuau_venues.shape)
print("The are "+str(len(cuau_venues["Venue Category"].unique())) +" venue categories.")

(1726, 7)
The are 208 venue categories.


In conclusion, there are 1726 venues and 208 types of venue in the Cuahtemoc borough. 

<h2 id="section4"> Finding out the least type of Venue per Neighborhood </h2>

To help Fernando to decide what kind of bussiness he wants to open, is convenienly to look what types of venues are more common in each neigboorhood. So the first step is to find out the category of each venue. We can use the get_dummies function on Venue_Category column.

In [13]:
cuau_one_hot = pd.get_dummies(cuau_venues[["Venue Category"]])
cuau_one_hot["Neighborhood"] = cuau_venues["Neighborhood"]
# move neighborhood column to the first column
fixed_columns = [cuau_one_hot.columns[-1]] + list(cuau_one_hot.columns[:-1])
cuau_one_hot = cuau_one_hot[fixed_columns]
cuau_one_hot.head()

Unnamed: 0,Neighborhood,Venue Category_Advertising Agency,Venue Category_American Restaurant,Venue Category_Antique Shop,Venue Category_Arcade,Venue Category_Argentinian Restaurant,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Arts & Crafts Store,Venue Category_Arts & Entertainment,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,Venue Category_Auto Garage,Venue Category_Auto Workshop,Venue Category_BBQ Joint,Venue Category_Bakery,Venue Category_Bar,Venue Category_Basketball Court,Venue Category_Bed & Breakfast,Venue Category_Beer Bar,Venue Category_Beer Garden,Venue Category_Big Box Store,Venue Category_Bistro,Venue Category_Bookstore,Venue Category_Boutique,Venue Category_Breakfast Spot,Venue Category_Brewery,Venue Category_Bridal Shop,Venue Category_Bridge,Venue Category_Bubble Tea Shop,Venue Category_Building,Venue Category_Burger Joint,Venue Category_Burrito Place,Venue Category_Bus Station,Venue Category_Cafeteria,Venue Category_Café,Venue Category_Camera Store,Venue Category_Candy Store,Venue Category_Casino,Venue Category_Cheese Shop,Venue Category_Chinese Restaurant,Venue Category_Chocolate Shop,Venue Category_Church,Venue Category_Clothing Store,Venue Category_Cocktail Bar,Venue Category_Coffee Shop,Venue Category_College Administrative Building,Venue Category_Comfort Food Restaurant,Venue Category_Concert Hall,Venue Category_Convenience Store,Venue Category_Cosmetics Shop,Venue Category_Coworking Space,Venue Category_Creperie,Venue Category_Cuban Restaurant,Venue Category_Cupcake Shop,Venue Category_Dance Studio,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Diner,Venue Category_Dive Bar,Venue Category_Dog Run,Venue Category_Donut Shop,Venue Category_Dry Cleaner,Venue Category_Escape Room,Venue Category_Event Space,Venue Category_Exhibit,Venue Category_Eye Doctor,Venue Category_Factory,Venue Category_Falafel Restaurant,Venue Category_Farmers Market,Venue Category_Fast Food Restaurant,Venue Category_Film Studio,Venue Category_Flea Market,Venue Category_Food,Venue Category_Food & Drink Shop,Venue Category_Food Court,Venue Category_Food Stand,Venue Category_Food Truck,Venue Category_French Restaurant,Venue Category_Fried Chicken Joint,Venue Category_Frozen Yogurt Shop,Venue Category_Fruit & Vegetable Store,Venue Category_Furniture / Home Store,Venue Category_Gaming Cafe,Venue Category_Garden,Venue Category_Gastropub,Venue Category_General College & University,Venue Category_General Entertainment,Venue Category_German Restaurant,Venue Category_Gift Shop,Venue Category_Gourmet Shop,Venue Category_Greek Restaurant,Venue Category_Grocery Store,Venue Category_Gym,Venue Category_Gym / Fitness Center,Venue Category_Gymnastics Gym,Venue Category_Health & Beauty Service,Venue Category_Health Food Store,Venue Category_Herbs & Spices Store,Venue Category_Historic Site,Venue Category_History Museum,Venue Category_Hostel,Venue Category_Hotel,Venue Category_Hotel Bar,Venue Category_Ice Cream Shop,Venue Category_Indie Movie Theater,Venue Category_Indie Theater,Venue Category_Indoor Play Area,Venue Category_Italian Restaurant,Venue Category_Japanese Restaurant,Venue Category_Jazz Club,Venue Category_Jewelry Store,Venue Category_Juice Bar,Venue Category_Korean Restaurant,Venue Category_Latin American Restaurant,Venue Category_Laundromat,Venue Category_Library,Venue Category_Lingerie Store,Venue Category_Liquor Store,Venue Category_Lounge,Venue Category_Luggage Store,Venue Category_Market,Venue Category_Martial Arts School,Venue Category_Mediterranean Restaurant,Venue Category_Memorial Site,Venue Category_Men's Store,Venue Category_Mexican Restaurant,Venue Category_Middle Eastern Restaurant,Venue Category_Miscellaneous Shop,Venue Category_Monument / Landmark,Venue Category_Motorcycle Shop,Venue Category_Movie Theater,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_New American Restaurant,Venue Category_Nightclub,Venue Category_Non-Profit,Venue Category_North Indian Restaurant,Venue Category_Optical Shop,Venue Category_Other Great Outdoors,Venue Category_Paper / Office Supplies Store,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Performing Arts Venue,Venue Category_Perfume Shop,Venue Category_Peruvian Restaurant,Venue Category_Pet Service,Venue Category_Pet Store,Venue Category_Pharmacy,Venue Category_Photography Studio,Venue Category_Pie Shop,Venue Category_Pizza Place,Venue Category_Playground,Venue Category_Plaza,Venue Category_Pool,Venue Category_Pool Hall,Venue Category_Print Shop,Venue Category_Pub,Venue Category_Public Art,Venue Category_Ramen Restaurant,Venue Category_Record Shop,Venue Category_Recording Studio,Venue Category_Restaurant,Venue Category_Rock Club,Venue Category_Roof Deck,Venue Category_Russian Restaurant,Venue Category_Salad Place,Venue Category_Salon / Barbershop,Venue Category_Salsa Club,Venue Category_Sandwich Place,Venue Category_Scenic Lookout,Venue Category_Science Museum,Venue Category_Sculpture Garden,Venue Category_Seafood Restaurant,Venue Category_Shoe Store,Venue Category_Shopping Mall,Venue Category_Snack Place,Venue Category_Soccer Field,Venue Category_Soccer Stadium,Venue Category_Southern / Soul Food Restaurant,Venue Category_Spa,Venue Category_Spanish Restaurant,Venue Category_Speakeasy,Venue Category_Sporting Goods Shop,Venue Category_Sports Club,Venue Category_Stadium,Venue Category_Stationery Store,Venue Category_Steakhouse,Venue Category_Supermarket,Venue Category_Sushi Restaurant,Venue Category_Taco Place,Venue Category_Tapas Restaurant,Venue Category_Tattoo Parlor,Venue Category_Tea Room,Venue Category_Thai Restaurant,Venue Category_Theater,Venue Category_Thrift / Vintage Store,Venue Category_Toy / Game Store,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Warehouse Store,Venue Category_Water Park,Venue Category_Wine Bar,Venue Category_Wings Joint,Venue Category_Women's Store,Venue Category_Yoga Studio,Venue Category_Yucatecan Restaurant
0,TABACALERA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,TABACALERA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,TABACALERA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,TABACALERA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,TABACALERA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now, let's group the dataframe by neighborhood and take the mean of the frequency. 

In [14]:
cuau_group = cuau_one_hot.groupby("Neighborhood").mean().reset_index()
print(cuau_group.shape)
cuau_group.head()

(64, 209)


Unnamed: 0,Neighborhood,Venue Category_Advertising Agency,Venue Category_American Restaurant,Venue Category_Antique Shop,Venue Category_Arcade,Venue Category_Argentinian Restaurant,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Arts & Crafts Store,Venue Category_Arts & Entertainment,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,Venue Category_Auto Garage,Venue Category_Auto Workshop,Venue Category_BBQ Joint,Venue Category_Bakery,Venue Category_Bar,Venue Category_Basketball Court,Venue Category_Bed & Breakfast,Venue Category_Beer Bar,Venue Category_Beer Garden,Venue Category_Big Box Store,Venue Category_Bistro,Venue Category_Bookstore,Venue Category_Boutique,Venue Category_Breakfast Spot,Venue Category_Brewery,Venue Category_Bridal Shop,Venue Category_Bridge,Venue Category_Bubble Tea Shop,Venue Category_Building,Venue Category_Burger Joint,Venue Category_Burrito Place,Venue Category_Bus Station,Venue Category_Cafeteria,Venue Category_Café,Venue Category_Camera Store,Venue Category_Candy Store,Venue Category_Casino,Venue Category_Cheese Shop,Venue Category_Chinese Restaurant,Venue Category_Chocolate Shop,Venue Category_Church,Venue Category_Clothing Store,Venue Category_Cocktail Bar,Venue Category_Coffee Shop,Venue Category_College Administrative Building,Venue Category_Comfort Food Restaurant,Venue Category_Concert Hall,Venue Category_Convenience Store,Venue Category_Cosmetics Shop,Venue Category_Coworking Space,Venue Category_Creperie,Venue Category_Cuban Restaurant,Venue Category_Cupcake Shop,Venue Category_Dance Studio,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Diner,Venue Category_Dive Bar,Venue Category_Dog Run,Venue Category_Donut Shop,Venue Category_Dry Cleaner,Venue Category_Escape Room,Venue Category_Event Space,Venue Category_Exhibit,Venue Category_Eye Doctor,Venue Category_Factory,Venue Category_Falafel Restaurant,Venue Category_Farmers Market,Venue Category_Fast Food Restaurant,Venue Category_Film Studio,Venue Category_Flea Market,Venue Category_Food,Venue Category_Food & Drink Shop,Venue Category_Food Court,Venue Category_Food Stand,Venue Category_Food Truck,Venue Category_French Restaurant,Venue Category_Fried Chicken Joint,Venue Category_Frozen Yogurt Shop,Venue Category_Fruit & Vegetable Store,Venue Category_Furniture / Home Store,Venue Category_Gaming Cafe,Venue Category_Garden,Venue Category_Gastropub,Venue Category_General College & University,Venue Category_General Entertainment,Venue Category_German Restaurant,Venue Category_Gift Shop,Venue Category_Gourmet Shop,Venue Category_Greek Restaurant,Venue Category_Grocery Store,Venue Category_Gym,Venue Category_Gym / Fitness Center,Venue Category_Gymnastics Gym,Venue Category_Health & Beauty Service,Venue Category_Health Food Store,Venue Category_Herbs & Spices Store,Venue Category_Historic Site,Venue Category_History Museum,Venue Category_Hostel,Venue Category_Hotel,Venue Category_Hotel Bar,Venue Category_Ice Cream Shop,Venue Category_Indie Movie Theater,Venue Category_Indie Theater,Venue Category_Indoor Play Area,Venue Category_Italian Restaurant,Venue Category_Japanese Restaurant,Venue Category_Jazz Club,Venue Category_Jewelry Store,Venue Category_Juice Bar,Venue Category_Korean Restaurant,Venue Category_Latin American Restaurant,Venue Category_Laundromat,Venue Category_Library,Venue Category_Lingerie Store,Venue Category_Liquor Store,Venue Category_Lounge,Venue Category_Luggage Store,Venue Category_Market,Venue Category_Martial Arts School,Venue Category_Mediterranean Restaurant,Venue Category_Memorial Site,Venue Category_Men's Store,Venue Category_Mexican Restaurant,Venue Category_Middle Eastern Restaurant,Venue Category_Miscellaneous Shop,Venue Category_Monument / Landmark,Venue Category_Motorcycle Shop,Venue Category_Movie Theater,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_New American Restaurant,Venue Category_Nightclub,Venue Category_Non-Profit,Venue Category_North Indian Restaurant,Venue Category_Optical Shop,Venue Category_Other Great Outdoors,Venue Category_Paper / Office Supplies Store,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Performing Arts Venue,Venue Category_Perfume Shop,Venue Category_Peruvian Restaurant,Venue Category_Pet Service,Venue Category_Pet Store,Venue Category_Pharmacy,Venue Category_Photography Studio,Venue Category_Pie Shop,Venue Category_Pizza Place,Venue Category_Playground,Venue Category_Plaza,Venue Category_Pool,Venue Category_Pool Hall,Venue Category_Print Shop,Venue Category_Pub,Venue Category_Public Art,Venue Category_Ramen Restaurant,Venue Category_Record Shop,Venue Category_Recording Studio,Venue Category_Restaurant,Venue Category_Rock Club,Venue Category_Roof Deck,Venue Category_Russian Restaurant,Venue Category_Salad Place,Venue Category_Salon / Barbershop,Venue Category_Salsa Club,Venue Category_Sandwich Place,Venue Category_Scenic Lookout,Venue Category_Science Museum,Venue Category_Sculpture Garden,Venue Category_Seafood Restaurant,Venue Category_Shoe Store,Venue Category_Shopping Mall,Venue Category_Snack Place,Venue Category_Soccer Field,Venue Category_Soccer Stadium,Venue Category_Southern / Soul Food Restaurant,Venue Category_Spa,Venue Category_Spanish Restaurant,Venue Category_Speakeasy,Venue Category_Sporting Goods Shop,Venue Category_Sports Club,Venue Category_Stadium,Venue Category_Stationery Store,Venue Category_Steakhouse,Venue Category_Supermarket,Venue Category_Sushi Restaurant,Venue Category_Taco Place,Venue Category_Tapas Restaurant,Venue Category_Tattoo Parlor,Venue Category_Tea Room,Venue Category_Thai Restaurant,Venue Category_Theater,Venue Category_Thrift / Vintage Store,Venue Category_Toy / Game Store,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Warehouse Store,Venue Category_Water Park,Venue Category_Wine Bar,Venue Category_Wings Joint,Venue Category_Women's Store,Venue Category_Yoga Studio,Venue Category_Yucatecan Restaurant
0,ALGARIN,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.26087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.173913,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ASTURIAS,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.166667,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,ASTURIAS (AMPL),0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,ATLAMPA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BUENAVISTA I,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now, using the next two functions we can see the most and least common venues for each neighborhood.

In [15]:
def most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

def least_common_venues(row, num_least_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=True)
    
    return row_categories_sorted.index.values[0:num_least_venues]

First we are going to look at 20 the most common venues.

In [17]:
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns_top = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns_top.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns_top.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_common_venues_sorted = pd.DataFrame(columns=columns_top)
neighborhoods_common_venues_sorted['Neighborhood'] = cuau_group['Neighborhood']

for ind in np.arange(cuau_group.shape[0]):
    neighborhoods_common_venues_sorted.iloc[ind, 1:] = most_common_venues(cuau_group.iloc[ind, :], num_top_venues)
neighborhoods_common_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,ALGARIN,Venue Category_Mexican Restaurant,Venue Category_Taco Place,Venue Category_Bakery,Venue Category_Clothing Store,Venue Category_Advertising Agency,Venue Category_Print Shop,Venue Category_Coffee Shop,Venue Category_Food Truck,Venue Category_Bed & Breakfast,Venue Category_Steakhouse,Venue Category_Bar,Venue Category_Brewery,Venue Category_Gym / Fitness Center,Venue Category_Gift Shop,Venue Category_Escape Room,Venue Category_Film Studio,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market,Venue Category_Falafel Restaurant,Venue Category_Factory
1,ASTURIAS,Venue Category_Bakery,Venue Category_Mexican Restaurant,Venue Category_Ice Cream Shop,Venue Category_Juice Bar,Venue Category_Café,Venue Category_Gift Shop,Venue Category_Bar,Venue Category_Taco Place,Venue Category_BBQ Joint,Venue Category_Cupcake Shop,Venue Category_Burger Joint,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Dessert Shop,Venue Category_Argentinian Restaurant,Venue Category_Falafel Restaurant,Venue Category_Food Court,Venue Category_Department Store,Venue Category_Food & Drink Shop,Venue Category_Food,Venue Category_Flea Market
2,ASTURIAS (AMPL),Venue Category_Taco Place,Venue Category_Mexican Restaurant,Venue Category_Ice Cream Shop,Venue Category_Park,Venue Category_Diner,Venue Category_Bar,Venue Category_Hotel,Venue Category_Gift Shop,Venue Category_Café,Venue Category_Flea Market,Venue Category_Restaurant,Venue Category_Dessert Shop,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Argentinian Restaurant,Venue Category_Burrito Place,Venue Category_Eye Doctor,Venue Category_Film Studio,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market
3,ATLAMPA,Venue Category_Mexican Restaurant,Venue Category_Optical Shop,Venue Category_Photography Studio,Venue Category_Convenience Store,Venue Category_Pharmacy,Venue Category_Gym / Fitness Center,Venue Category_Nightclub,Venue Category_Yucatecan Restaurant,Venue Category_Film Studio,Venue Category_Fast Food Restaurant,Venue Category_Farmers Market,Venue Category_Falafel Restaurant,Venue Category_Factory,Venue Category_Eye Doctor,Venue Category_Exhibit,Venue Category_Dry Cleaner,Venue Category_Event Space,Venue Category_Escape Room,Venue Category_Food,Venue Category_Donut Shop
4,BUENAVISTA I,Venue Category_Movie Theater,Venue Category_Furniture / Home Store,Venue Category_Brewery,Venue Category_Ice Cream Shop,Venue Category_Boutique,Venue Category_Clothing Store,Venue Category_Frozen Yogurt Shop,Venue Category_Sporting Goods Shop,Venue Category_Lingerie Store,Venue Category_Shopping Mall,Venue Category_Garden,Venue Category_Luggage Store,Venue Category_Bubble Tea Shop,Venue Category_Rock Club,Venue Category_Candy Store,Venue Category_Flea Market,Venue Category_Salad Place,Venue Category_Mexican Restaurant,Venue Category_American Restaurant,Venue Category_Donut Shop


For example, giving a quick look we can see that the most common venues are the Mexican restaurants, Taco places, bakeries and ice cream shops. Fernando doesn't want to much competition so he won't be interested in this types of venues. The next step is to look at the least common venues. 

In [19]:
num_least_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns_least = ['Neighborhood']
for ind in np.arange(num_least_venues):
    try:
        columns_least.append('{}{} Least Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns_least.append('{}th Least Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_least_venues_sorted = pd.DataFrame(columns=columns_least)
neighborhoods_least_venues_sorted['Neighborhood'] = cuau_group['Neighborhood']

for ind in np.arange(cuau_group.shape[0]):
    neighborhoods_least_venues_sorted.iloc[ind, 1:] = least_common_venues(cuau_group.iloc[ind, :], num_top_venues)
neighborhoods_least_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Least Common Venue,2nd Least Common Venue,3rd Least Common Venue,4th Least Common Venue,5th Least Common Venue,6th Least Common Venue,7th Least Common Venue,8th Least Common Venue,9th Least Common Venue,10th Least Common Venue,11th Least Common Venue,12th Least Common Venue,13th Least Common Venue,14th Least Common Venue,15th Least Common Venue,16th Least Common Venue,17th Least Common Venue,18th Least Common Venue,19th Least Common Venue,20th Least Common Venue
0,ALGARIN,Venue Category_Hotel Bar,Venue Category_Movie Theater,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_New American Restaurant,Venue Category_Nightclub,Venue Category_Non-Profit,Venue Category_North Indian Restaurant,Venue Category_Optical Shop,Venue Category_Other Great Outdoors,Venue Category_Paper / Office Supplies Store,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Performing Arts Venue,Venue Category_Perfume Shop,Venue Category_Peruvian Restaurant,Venue Category_Pet Service,Venue Category_Pet Store,Venue Category_Pharmacy
1,ASTURIAS,Venue Category_Advertising Agency,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_New American Restaurant,Venue Category_Nightclub,Venue Category_Non-Profit,Venue Category_North Indian Restaurant,Venue Category_Optical Shop,Venue Category_Other Great Outdoors,Venue Category_Paper / Office Supplies Store,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Performing Arts Venue,Venue Category_Perfume Shop,Venue Category_Peruvian Restaurant,Venue Category_Pet Service,Venue Category_Pet Store,Venue Category_Pharmacy,Venue Category_Photography Studio
2,ASTURIAS (AMPL),Venue Category_Advertising Agency,Venue Category_Motorcycle Shop,Venue Category_Movie Theater,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_New American Restaurant,Venue Category_Nightclub,Venue Category_Non-Profit,Venue Category_North Indian Restaurant,Venue Category_Optical Shop,Venue Category_Other Great Outdoors,Venue Category_Paper / Office Supplies Store,Venue Category_Pedestrian Plaza,Venue Category_Performing Arts Venue,Venue Category_Perfume Shop,Venue Category_Peruvian Restaurant,Venue Category_Pet Service,Venue Category_Pet Store,Venue Category_Pharmacy
3,ATLAMPA,Venue Category_Advertising Agency,Venue Category_Motorcycle Shop,Venue Category_Movie Theater,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_New American Restaurant,Venue Category_Non-Profit,Venue Category_North Indian Restaurant,Venue Category_Other Great Outdoors,Venue Category_Paper / Office Supplies Store,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Performing Arts Venue,Venue Category_Perfume Shop,Venue Category_Peruvian Restaurant,Venue Category_Pet Service,Venue Category_Pet Store,Venue Category_Pie Shop,Venue Category_Pizza Place
4,BUENAVISTA I,Venue Category_Advertising Agency,Venue Category_Monument / Landmark,Venue Category_Motorcycle Shop,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_New American Restaurant,Venue Category_Nightclub,Venue Category_Non-Profit,Venue Category_North Indian Restaurant,Venue Category_Miscellaneous Shop,Venue Category_Other Great Outdoors,Venue Category_Park,Venue Category_Pedestrian Plaza,Venue Category_Performing Arts Venue,Venue Category_Perfume Shop,Venue Category_Peruvian Restaurant,Venue Category_Pet Service,Venue Category_Pet Store,Venue Category_Pharmacy


Doing the same as the last example, Fernando could decide into opening an advertising agecency, a movie theater, a music store or a new american restaurant and he could have little or no competition. This is the type of business that Fernando is looking.

<h2 id="section5"> Finding out the Where to Open the Business </h2>

Let's assume that Fernando has decided for a business to open thanks for the data we gave him. He went to look for a place to open his venue but he finds out that the rent is way to expensive or the space is to small, etc. Fernando has already made his mind and doesn't want to look for a new type of business, what can we do for him?

Well, using ML we can make clusters of neighborhood with similar properties, like having similar least common venues. So we are going to use k-means clustering to find similar neighborhoods where Fernando can look around for a place to rent and still have little to no competition. In this prohect we are going to make 15 clusters.

In [20]:
from sklearn.cluster import KMeans
kclusters = 15

cuau_cluster = cuau_group.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cuau_cluster)

We have our cluster so now we are going to update our least common venues dataframe.

In [21]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

cuau_final = mex_cuau
cuau_final = cuau_final.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
cuau_final 

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude,Cluster Labels,1st Least Common Venue,2nd Least Common Venue,3rd Least Common Venue,4th Least Common Venue,5th Least Common Venue,6th Least Common Venue,7th Least Common Venue,8th Least Common Venue,9th Least Common Venue,10th Least Common Venue,11th Least Common Venue,12th Least Common Venue,13th Least Common Venue,14th Least Common Venue,15th Least Common Venue,16th Least Common Venue,17th Least Common Venue,18th Least Common Venue,19th Least Common Venue,20th Least Common Venue
0,TABACALERA,CUAUHTEMOC,19.4357759781,-99.1539492806,1,,,,,,,,,,,,,,,,,,,,
1,CENTRO VII,CUAUHTEMOC,19.4302248036,-99.1281413675,9,,,,,,,,,,,,,,,,,,,,
2,GUERRERO I,CUAUHTEMOC,19.4490761845,-99.1437494279,5,,,,,,,,,,,,,,,,,,,,
3,NONOALCO-TLATELOLCO (U HAB) II,CUAUHTEMOC,19.4533147946,-99.1417694775,6,,,,,,,,,,,,,,,,,,,,
4,JUAREZ,CUAUHTEMOC,19.4270038256,-99.1616054122,7,,,,,,,,,,,,,,,,,,,,
5,SANTA MARIA (U HAB),CUAUHTEMOC,19.4564342667,-99.157053889,12,,,,,,,,,,,,,,,,,,,,
6,CENTRO II,CUAUHTEMOC,19.4398500953,-99.1285178964,9,,,,,,,,,,,,,,,,,,,,
7,ROMA NORTE I,CUAUHTEMOC,19.4194185761,-99.1691619817,7,,,,,,,,,,,,,,,,,,,,
8,CENTRO IV,CUAUHTEMOC,19.4336362466,-99.1360300552,1,,,,,,,,,,,,,,,,,,,,
9,ROMA SUR I,CUAUHTEMOC,19.4088498024,-99.1613175937,13,,,,,,,,,,,,,,,,,,,,


For the final step, to make thinfgs easier for Fernando, we are going to show all the clister into the Cuahtemoc map so he can find the similar neighborghoods easier.

In [22]:
import matplotlib.pyplot as plt
import matplotlib.cm as cm 
import matplotlib.colors as colors
# create map
address = 'Centro,Mexico City'

geolocator = Nominatim(user_agent="mex_explorer")
location = geolocator.geocode(address)
lat = location.latitude
long = location.longitude
map_clusters = folium.Map(location=[lat, long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cuau_final['Latitude'], cuau_final['Longitude'], cuau_final['Neighborhood'], cuau_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

With all this information, Fernando have the tools to pick what kind of business he want to open and places where he can open it.