<h1>Fernando's New Business</h1>

<h2> Introduction/Business Problem </h2>

Fernando is a Mexican salesman who has worked at a tech company for the last 30 years. He is finally retiring and wants to start a business with his savings. He doesn't know what kind of business he wants to run, but he knows where he wants to open it: Cuauhtemoc, Mexico City's wealthiest borough.

Fernando asked a data scientist for help. He wants the data scientist to use data to find what kind of business to open and where to open it. Fernando wants little or no competition so that he can earn as much as possible, so the data scientist will have to find the least common types of venues/businesses in each neighborhood.

<h2> Data </h2>

The data scientist will get the neighborhood info from Foursquare, but first he needs the list of neighborhoods inside the Cuauhtemoc borough.

Fortunately, Mexico City has a public database listing the neighborhoods in the city. The information is public and easy to download as a csv file.

First import the necessary libraries for the project.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np

Then load the csv file into a dataframe and translate the column names.

In [2]:
mex_neigh = pd.read_csv("coloniascdmx.csv")
mex_neigh.rename(columns = {'nombre':'Neighborhood','alcaldia':'Borough'},inplace=True)
mex_neigh.head()

Unnamed: 0,id,Neighborhood,entidad,geo_point_2d,geo_shape,cve_alc,Borough,cve_col,secc_com,secc_par
0,0,LOMAS DE CHAPULTEPEC,9.0,"19.4228411174,-99.2157935754","{""type"": ""Polygon"", ""coordinates"": [[[-99.2201...",16,MIGUEL HIDALGO,16-042,"4924, 4931, 4932, 4935, 4936, 4940, 4987","4923, 4937, 4938, 4939, 4942"
1,1,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),9.0,"19.4106158914,-99.2262487268","{""type"": ""Polygon"", ""coordinates"": [[[-99.2296...",16,MIGUEL HIDALGO,16-044,4963,4964
2,2,DEL BOSQUE (POLANCO),9.0,"19.4342189235,-99.2094037513","{""type"": ""Polygon"", ""coordinates"": [[[-99.2082...",16,MIGUEL HIDALGO,16-026,,"4918, 4919"
3,3,PEDREGAL DE SANTA URSULA I,9.0,"19.314862237,-99.1477954505","{""type"": ""Polygon"", ""coordinates"": [[[-99.1458...",3,COYOACAN,03-135,"433, 500, 431, 513, 501","424, 425, 426, 430, 499"
4,4,AJUSCO I,9.0,"19.324571116,-99.1561602234","{""type"": ""Polygon"", ""coordinates"": [[[-99.1585...",3,COYOACAN,03-128,"376, 377, 378, 379, 404, 493, 498",374


The mex_neigh dataframe has all the neighborhoods in the city. The Borough column shows each neighborhood's borough and the geo_point_2d column shows each neighborhood's coordinates. The rest of the columns are not needed for this analysis, so we can drop them.

In [3]:
mex_neigh.drop(columns=['id','entidad','geo_shape','cve_alc','cve_col','secc_com','secc_par'],inplace=True)
mex_neigh.head()

Unnamed: 0,Neighborhood,geo_point_2d,Borough
0,LOMAS DE CHAPULTEPEC,"19.4228411174,-99.2157935754",MIGUEL HIDALGO
1,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),"19.4106158914,-99.2262487268",MIGUEL HIDALGO
2,DEL BOSQUE (POLANCO),"19.4342189235,-99.2094037513",MIGUEL HIDALGO
3,PEDREGAL DE SANTA URSULA I,"19.314862237,-99.1477954505",COYOACAN
4,AJUSCO I,"19.324571116,-99.1561602234",COYOACAN
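
As an alternative to dropping columns after loading, the needed columns can be selected while reading the file. The sketch below assumes the csv uses the column names shown in the output above; `sample_csv` is a small inline stand-in (two rows copied from that output, fewer columns) so the snippet runs without the real file.

```python
import io
import pandas as pd

# Stand-in for coloniascdmx.csv: two rows, reduced set of columns (assumption)
sample_csv = io.StringIO(
    "id,nombre,entidad,geo_point_2d,alcaldia\n"
    '0,LOMAS DE CHAPULTEPEC,9.0,"19.4228411174,-99.2157935754",MIGUEL HIDALGO\n'
    '3,PEDREGAL DE SANTA URSULA I,9.0,"19.314862237,-99.1477954505",COYOACAN\n'
)

# Read only the columns the analysis uses, then translate their names
keep = ["nombre", "geo_point_2d", "alcaldia"]
sample = pd.read_csv(sample_csv, usecols=keep).rename(
    columns={"nombre": "Neighborhood", "alcaldia": "Borough"}
)
print(sample.columns.tolist())
```

Selecting columns at read time also avoids carrying along the large geo_shape strings.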


The column 'geo_point_2d' holds the coordinates of each neighborhood as a single "latitude,longitude" string. The next step is to split that string and save the two parts in separate Latitude and Longitude columns.

In [4]:
mex_split = mex_neigh['geo_point_2d'].str.split(',',expand=True)
mex_neigh['Latitude']=mex_split[0]
mex_neigh['Longitude']=mex_split[1]
mex_neigh.head()

Unnamed: 0,Neighborhood,geo_point_2d,Borough,Latitude,Longitude
0,LOMAS DE CHAPULTEPEC,"19.4228411174,-99.2157935754",MIGUEL HIDALGO,19.4228411174,-99.2157935754
1,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),"19.4106158914,-99.2262487268",MIGUEL HIDALGO,19.4106158914,-99.2262487268
2,DEL BOSQUE (POLANCO),"19.4342189235,-99.2094037513",MIGUEL HIDALGO,19.4342189235,-99.2094037513
3,PEDREGAL DE SANTA URSULA I,"19.314862237,-99.1477954505",COYOACAN,19.314862237,-99.1477954505
4,AJUSCO I,"19.324571116,-99.1561602234",COYOACAN,19.324571116,-99.1561602234
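
Note that `str.split` returns text, so at this point Latitude and Longitude hold strings rather than numbers. A minimal sketch of converting them to floats, using a hypothetical two-row stand-in dataframe named `demo` (one row has no coordinates, to show how missing values behave):

```python
import pandas as pd

# Hypothetical stand-in for mex_neigh; the second row has no coordinates
demo = pd.DataFrame({"geo_point_2d": ["19.4228411174,-99.2157935754", None]})

# Split "lat,lon" into two columns, then convert the text to floats
parts = demo["geo_point_2d"].str.split(",", expand=True)
demo["Latitude"] = pd.to_numeric(parts[0], errors="coerce")
demo["Longitude"] = pd.to_numeric(parts[1], errors="coerce")
print(demo.dtypes)
```

With `errors="coerce"`, anything that cannot be parsed becomes NaN instead of raising an error, so missing coordinates can still be detected afterwards.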


Drop the geo_point_2d column and create a new dataframe called mex_cuau that has all the neighborhoods in the Cuauhtemoc borough.

In [5]:
mex_neigh.drop(columns=['geo_point_2d'],inplace=True)
mex_cuau = mex_neigh[mex_neigh['Borough']=='CUAUHTEMOC'].reset_index(drop=True)
mex_cuau.head()

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
0,TABACALERA,CUAUHTEMOC,19.4357759781,-99.1539492806
1,CENTRO VII,CUAUHTEMOC,19.4302248036,-99.1281413675
2,GUERRERO I,CUAUHTEMOC,19.4490761845,-99.1437494279
3,NONOALCO-TLATELOLCO (U HAB) II,CUAUHTEMOC,19.4533147946,-99.1417694775
4,JUAREZ,CUAUHTEMOC,19.4270038256,-99.1616054122


Check whether any neighborhood is missing its coordinates by looking for undefined values in the Latitude column.

In [6]:
mex_cuau[mex_cuau.Latitude.isnull()]

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
13,MAZA,CUAUHTEMOC,,
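
The check above looks only at Latitude. A small sketch of testing both coordinate columns at once, on a hypothetical stand-in dataframe `demo` (the TABACALERA values are taken from the output above):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mex_cuau
demo = pd.DataFrame({
    "Neighborhood": ["TABACALERA", "MAZA"],
    "Latitude": [19.4357759781, np.nan],
    "Longitude": [-99.1539492806, np.nan],
})

# Keep rows where either coordinate is missing
missing = demo[demo[["Latitude", "Longitude"]].isna().any(axis=1)]
print(missing["Neighborhood"].tolist())
```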


The zip code for the Maza neighborhood is 06270. The pgeocode library looks up a location from its zip code. Query the coordinates for that zip code and use them to replace the NaN values.

In [8]:
import pgeocode
nomi = pgeocode.Nominatim('mx')
Maza = nomi.query_postal_code("06270")
mex_cuau['Latitude']=mex_cuau['Latitude'].replace(np.nan,Maza.latitude)
mex_cuau['Longitude'] =mex_cuau['Longitude'].replace(np.nan,Maza.longitude)
mex_cuau

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
0,TABACALERA,CUAUHTEMOC,19.4357759781,-99.1539492806
1,CENTRO VII,CUAUHTEMOC,19.4302248036,-99.1281413675
2,GUERRERO I,CUAUHTEMOC,19.4490761845,-99.1437494279
3,NONOALCO-TLATELOLCO (U HAB) II,CUAUHTEMOC,19.4533147946,-99.1417694775
4,JUAREZ,CUAUHTEMOC,19.4270038256,-99.1616054122
5,SANTA MARIA (U HAB),CUAUHTEMOC,19.4564342667,-99.157053889
6,CENTRO II,CUAUHTEMOC,19.4398500953,-99.1285178964
7,ROMA NORTE I,CUAUHTEMOC,19.4194185761,-99.1691619817
8,CENTRO IV,CUAUHTEMOC,19.4336362466,-99.1360300552
9,ROMA SUR I,CUAUHTEMOC,19.4088498024,-99.1613175937
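
Replacing every NaN in a column works here because only one neighborhood is missing its coordinates. If more rows were missing, each would receive Maza's location. A sketch of a more targeted fill, again on a hypothetical stand-in `demo`; the `maza_lat` and `maza_lon` values are placeholders for illustration, not the actual pgeocode result:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mex_cuau
demo = pd.DataFrame({
    "Neighborhood": ["TABACALERA", "MAZA"],
    "Latitude": [19.4357759781, np.nan],
    "Longitude": [-99.1539492806, np.nan],
})

# Placeholder coordinates (assumption, NOT the real lookup result)
maza_lat, maza_lon = 19.47, -99.13

# Fill only the row that belongs to the Maza neighborhood
is_maza = demo["Neighborhood"] == "MAZA"
demo.loc[is_maza, ["Latitude", "Longitude"]] = [maza_lat, maza_lon]
print(demo)
```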


Last, using geopy and folium, plot the Cuauhtemoc neighborhoods on a map of Mexico City.

In [9]:
from geopy.geocoders import Nominatim
# geocode the address that will be used as the map center
address = 'Centro, Mexico City'

geolocator = Nominatim(user_agent="mex_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the borough Cuauhtemoc in Mexico City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the borough Cuauhtemoc in Mexico City are 19.4065152, -99.1550183.
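
Geocoding an address needs a network call, and the result depends on how the geocoder resolves the address string. One possible alternative is to center the map on the mean of the neighborhood coordinates. A sketch under that assumption, using the first three Cuauhtemoc rows from above in a stand-in dataframe `demo`:

```python
import pandas as pd

# First three Cuauhtemoc neighborhoods from the table above (stand-in data)
demo = pd.DataFrame({
    "Latitude": [19.4357759781, 19.4302248036, 19.4490761845],
    "Longitude": [-99.1539492806, -99.1281413675, -99.1437494279],
})

# Use the average position of the points as an approximate map center
center = [demo["Latitude"].mean(), demo["Longitude"].mean()]
print(center)
```

The resulting `center` list could be passed as the `location` argument of `folium.Map`, provided the coordinate columns are numeric.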


In [11]:
import folium
# create map of Mexico City using latitude and longitude values
map_mex = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(mex_cuau['Latitude'], mex_cuau['Longitude'], mex_cuau['Borough'], mex_cuau['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mex)  
    
map_mex

With this data, the data scientist can now find the venues using Foursquare.
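
Once venues have been collected, the least common venue types per neighborhood could be found by counting categories. The sketch below is only an illustration: the `venues` dataframe, its column names, and its rows are all made up, and the real Foursquare results may be shaped differently.

```python
import pandas as pd

# Made-up venue data for illustration (NOT real Foursquare results)
venues = pd.DataFrame({
    "Neighborhood": ["JUAREZ"] * 4 + ["TABACALERA"] * 3,
    "Venue Category": ["Cafe", "Cafe", "Bar", "Bakery", "Hotel", "Hotel", "Cafe"],
})

# Count how many venues of each category every neighborhood has
counts = (
    venues.groupby(["Neighborhood", "Venue Category"])
    .size()
    .reset_index(name="Count")
)

# Keep the categories whose count equals the neighborhood's minimum
min_count = counts.groupby("Neighborhood")["Count"].transform("min")
rarest = counts[counts["Count"] == min_count]
print(rarest)
```

This only ranks categories that appear at least once in a neighborhood. Categories that are absent altogether would need a separate comparison against the full list of categories.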