# The Battle of the neighborhoods

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction:  <a name="introduction"></a>

A young adult is in the process of making a decision of where to buy a property to live there but is undecided on in which city and neighborhood should he buy it. He currently wants to compare and have a recommendation about neighborhoods in two cities in South America where he has family: Santiago de Chile, Chile and Cali, Colombia, specifically in the communes of Ñuñoa and #22 respectively.

This kind of problem of deciding where to buy is very important since the price of a property is higher than other kind of purchases and people usually get indebted for many years. The place where you will live for many or some years can determine the time you take to get to work or study, the perceived security, food you can get by foot or in the surroundings and in summary the quality of life.

For this project the focus will be on comparing the neighborhoods of both communes, in the cities mentioned, with a variety of spaces such as markets, gyms, parks that are at least perceived as good and can make a place more worthy to live there.

## Data <a name="data"></a>

Information about neighborhoods about commune 22 in Cali can be found in https://es.wikipedia.org/wiki/Comuna_22_(Cali)
Information about neighborhoods in Ñuñoa, Chile can be found in https://es.wikipedia.org/wiki/%C3%91u%C3%B1oa. Since both links only provide information about the name of the Neighborhoods location data will be retrieved from geocoder library: https://geocoder.readthedocs.io/index.html.

Information for the recommended neighborhoods will be fetched from the Foursquare API using the _explore_ endpoint to get an idea about the nearby venues and later classify it.

>`https://api.foursquare.com/v2/venues/`**explore**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&limit=`**LIMIT**

and _likes_ endpoint for how users qualify them:
> `https://api.foursquare.com/v2/venues/`**VENUE_ID**`/likes?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&v=`**VERSION**`&limit=`**LIMIT**

In [1]:
import requests # library to handle requests
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

#### First get the neighborhoods

In [3]:
from bs4 import BeautifulSoup
cali_communes_text = requests.get('https://es.wikipedia.org/wiki/Comuna_22_(Cali)').text
soup = BeautifulSoup(cali_communes_text, 'html.parser')
c_neighborhoods = [(raw_neighborhood.string) for raw_neighborhood in soup.find_all('ul')[0].find_all('li')]
c_neighborhoods

['Urbanización Ciudad Jardín',
 'Altos de Ciudad de Jardín',
 'La Bocha',
 'Bochalema',
 'Parcelaciones Pance',
 'Urbanización Río Lili',
 'Alférez Real',
 'Ciudad Campestre',
 'Ciudad Pacífica']

In [4]:
nunoa_communes_text = requests.get('https://es.wikipedia.org/wiki/%C3%91u%C3%B1oa').text
soup = BeautifulSoup(nunoa_communes_text, 'html.parser')
n_neighborhoods = list(filter(None,[(raw_neighborhood.string) for raw_neighborhood in soup.find(id='Listado_de_barrios').parent.next_sibling.next_sibling.find_all('li')]))
n_neighborhoods

['Barrio Plaza Ñuñoa',
 'Barrio Diego de Almagro',
 'Barrio Amapolas',
 'Barrio Pucará',
 'Barrio Parque Botánico',
 'Barrio Montenegro',
 'Barrio Los Guindos o Plaza Egaña',
 'Barrio Micalvi',
 'Barrio Parque Juan XXIII',
 'Barrio José Pedro Alessandri',
 'Barrio Regina Pacis',
 'Barrio El Aguilucho',
 'Barrio Villaseca',
 'Barrio Plaza Sucre',
 'Barrio Suboficiales de Caballería',
 'Barrio Guillermo Franke',
 'Barrio Italia',
 'Barrio Pedro de Valvidia',
 'Barrio Plaza Zañartu',
 'Barrio Dr. Luis Bisquert',
 'Barrio Simón Bolívar',
 'Barrio Eusebio Lillo',
 'Barrio Parque del Deporte',
 'Barrio Estadio Nacional',
 "Barrio Bernardo O'Higgins",
 'Barrio Javiera Carrera',
 'Barrio Irarrázaval',
 'Barrio Hernán Cortés',
 'Barrio Suárez Mujica',
 'Barrio Industrial Lo Encalada',
 'Barrio Colo Colo',
 'Barrio Parque San Eugenio',
 'Barrio Empart',
 'Villa Presidente Frei',
 'Villa Los Jardines',
 'Villa Lo Plaza',
 'Villa Los Presidentes',
 'Villa Los Alerces',
 'Villa Olímpica',
 'Villa S

#### Now get locations

In [5]:
import geocoder
c_lats = []
c_lngs = []
for c in c_neighborhoods:
    g = geocoder.arcgis('{}, Cali, Colombia'.format(c))
    c_lats.append(g.latlng[0])
    c_lngs.append(g.latlng[1])
df_c = pd.DataFrame(data={'Commune': '22, Cali, Colombia', 'Neighbourhood': c_neighborhoods, 'Latitude': c_lats, 'Longitude': c_lngs})
df_c.head()
    

Unnamed: 0,Commune,Neighbourhood,Latitude,Longitude
0,"22, Cali, Colombia",Urbanización Ciudad Jardín,3.36348,-76.5355
1,"22, Cali, Colombia",Altos de Ciudad de Jardín,3.42448,-76.51734
2,"22, Cali, Colombia",La Bocha,5.35,-74.71667
3,"22, Cali, Colombia",Bochalema,7.61094,-72.64755
4,"22, Cali, Colombia",Parcelaciones Pance,3.34642,-76.53621


In [6]:
n_lats = []
n_lngs = []
for c in n_neighborhoods:
    g = geocoder.arcgis('{}, Ñuñoa, Santiago de Chile'.format(c))
    n_lats.append(g.latlng[0])
    n_lngs.append(g.latlng[1])
df_n = pd.DataFrame(data={'Commune': 'Ñuñoa, Santiago de Chile', 'Neighbourhood': n_neighborhoods, 'Latitude': n_lats, 'Longitude': n_lngs})
df_n.head()

Unnamed: 0,Commune,Neighbourhood,Latitude,Longitude
0,"Ñuñoa, Santiago de Chile",Barrio Plaza Ñuñoa,-33.461709,-70.585497
1,"Ñuñoa, Santiago de Chile",Barrio Diego de Almagro,-33.437963,-70.579519
2,"Ñuñoa, Santiago de Chile",Barrio Amapolas,-33.440349,-70.573968
3,"Ñuñoa, Santiago de Chile",Barrio Pucará,-33.445045,-70.58255
4,"Ñuñoa, Santiago de Chile",Barrio Parque Botánico,-33.45521,-70.59388


In [25]:
df = pd.concat([df_c, df_n], ignore_index=True, sort=False)
print(df.shape)
df.head(10)

(53, 4)


Unnamed: 0,Commune,Neighbourhood,Latitude,Longitude
0,"22, Cali, Colombia",Urbanización Ciudad Jardín,3.36348,-76.5355
1,"22, Cali, Colombia",Altos de Ciudad de Jardín,3.42448,-76.51734
2,"22, Cali, Colombia",La Bocha,5.35,-74.71667
3,"22, Cali, Colombia",Bochalema,7.61094,-72.64755
4,"22, Cali, Colombia",Parcelaciones Pance,3.34642,-76.53621
5,"22, Cali, Colombia",Urbanización Río Lili,3.36501,-76.52762
6,"22, Cali, Colombia",Alférez Real,3.38997,-76.54797
7,"22, Cali, Colombia",Ciudad Campestre,3.3699,-76.53945
8,"22, Cali, Colombia",Ciudad Pacífica,3.4576,-76.53554
9,"Ñuñoa, Santiago de Chile",Barrio Plaza Ñuñoa,-33.461709,-70.585497


#### Obtaining the venues and their likes

In [17]:
CLIENT_ID = '1CG02IBFH3BMIUN3FX43OFI3KMTM2FD00ESGHSGP23ODY2M2' # your Foursquare ID
CLIENT_SECRET = 'PRD3I02CL3KM5I5YU4TZ5HWFBEHTX4S0F5WDUND2EB5EEOLR' # your Foursquare Secret
VERSION = '20200719'
LIMIT = 10
radius = 200

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        #Explore the venue
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        response = requests.get(url).json()["response"]
        print(response)
        results= response['groups'][0]['items']
    
        urlLikesFormat = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['id'], 
            requests.get(urlLikesFormat.format(v['venue']['id'], CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)).json()["response"]["likes"]["count"],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue',  
                  'Venue Id',   
                  'Venue Likes', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [23]:
venues = getNearbyVenues(df['Neighbourhood'].to_list(), df['Latitude'].to_list(), df['Longitude'].to_list())
venues

Urbanización Ciudad Jardín
{}


KeyError: 'groups'

## Methodology <a name="methodology"></a> 

In this project the effort will be on detecting areas of the communes of Ñuñoa and 22 that have a variety of places nearby, specially markets, restaurants and parks, highly recognized, using likes in Foursquare. The analysis will be limited by the area covered by both communes.

In first step the collected data have been collected: location, type (category) and likes of every place within the limits of the neighborhoods.

Second step will be the calculation and exploration of variety of places across different areas, being prioritary markets and parks of both communes and the likes - heatmaps will be used to identify promising areas with a variety of places in general and likes.

In third and final step the focus will be on the most promising areas and within those create clusters of locations that meet basic requirements: at least 1 market, 1 park, 1 restaurant and 1 gym with at least 100 likes within 300 meters. Finally a map will show the resulting locations and the clusters (using k-means clustering) to identify general zones recommended for living and help with the decision making.


## Analysis <a name="analysis"></a> 

#### First, obtain let's cluster the data 

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a> 