# Capstone Project - The Battle of the Neighborhoods

## Applied Data Science Capstone by IBM/Coursera

### Marta Martinez Lopez

# Introduction - Business Problem

## Description of the problem and a discussion of the background

The Lopez family lives in Madrid and, for work reasons, has to move to a new city, Córdoba. They will be in Córdoba for the next month and a half, and during their stay they will live in the city center because it is close to the new office. They will look for a hotel in the city center that is surrounded by restaurants so that they can lead a life similar to the one they had in Madrid.

## Description of the data and how it will be used to solve the problem

We need enough data to determine where the center of the new city is. We must also know which hotels exist in the center of Córdoba in order to choose one that is surrounded by many restaurants. Therefore, we need data on the location of the center of Córdoba, on the number and location of hotels, and on the number and location of restaurants.

# Data

Based on the definition of our problem, the factors that will influence our decision are:

- Number of existing hotels in the center of Córdoba (any type of hotel)
- Number of and distance to restaurants in the center
- Distance of hotels and restaurants from the city center

The following data sources will be needed to extract/generate the required information:

- Centers of candidate areas will be generated algorithmically, and approximate addresses of the centers of those areas will be obtained using Google Maps API reverse geocoding.
- The number of restaurants and their type and location in every neighborhood will be obtained using the Foursquare API.
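
As a rough illustration of what "generated algorithmically" could mean, the sketch below lays a square grid over the area of interest and keeps the grid points that fall inside a circle around the center. It is only one possible approach, not necessarily the one used in this notebook. The function name `generate_grid_centers`, the 500 m spacing, and the use of planar coordinates in meters (for example UTM) are assumptions made for the example.

```python
import math

def generate_grid_centers(center_x, center_y, radius_m=2000.0, spacing_m=500.0):
    """Return (x, y) grid points, in planar meters, within radius_m of the center.

    spacing_m is a hypothetical value chosen for illustration only.
    """
    steps = int(radius_m // spacing_m)  # grid steps needed to reach the radius
    centers = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            x = center_x + i * spacing_m
            y = center_y + j * spacing_m
            # Keep only the points that lie inside the circle of interest
            if math.hypot(x - center_x, y - center_y) <= radius_m:
                centers.append((x, y))
    return centers

# Example with the center placed at the origin of a local coordinate system
centers = generate_grid_centers(0.0, 0.0, radius_m=2000.0, spacing_m=500.0)
print('Candidate area centers generated:', len(centers))
```

In practice the center would first be converted from longitude/latitude to projected coordinates, and each grid point would be converted back before calling a geocoding or venue API.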


# Methodology

In this project we will focus on detecting areas of Córdoba's center that have hotels and restaurants.
We will limit our analysis to an area of ~2 km around the city center.

We have collected the required data: the location and type (category) of every hotel and restaurant within 2 km of the center (Tendillas Square).
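
The 2 km limit can be checked directly on latitude/longitude pairs with the haversine formula. The sketch below is a minimal illustration under stated assumptions: the coordinates used for Tendillas Square are approximate, the venue records are made-up sample data, and the helper names `haversine_m` and `venues_within` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def venues_within(center, venues, radius_m=2000.0):
    """Keep the venues whose distance from center (lat, lon) is at most radius_m."""
    lat0, lon0 = center
    return [v for v in venues if haversine_m(lat0, lon0, v['lat'], v['lon']) <= radius_m]

# Approximate coordinates assumed for Tendillas Square
center = (37.8845, -4.7794)
# Made-up sample venues, for illustration only
venues = [
    {'name': 'Sample hotel', 'category': 'Hotel', 'lat': 37.8850, 'lon': -4.7790},
    {'name': 'Sample restaurant', 'category': 'Restaurant', 'lat': 37.8800, 'lon': -4.7800},
    {'name': 'Far venue', 'category': 'Restaurant', 'lat': 37.9200, 'lon': -4.7794},
]
nearby = venues_within(center, venues)
print([v['name'] for v in nearby])
```

For distances of a few kilometers the haversine result and a planar UTM distance should agree closely, so either can serve for this filter.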

Next, we will focus on the most promising areas and, within those, create clusters of locations that meet some basic requirements established in discussion with stakeholders: we will take into consideration locations with no more than two hotels and restaurants within a radius of 250 meters. We will present a map of all such locations and also create clusters of those locations (using k-means clustering) to identify general zones / neighborhoods / addresses that should be a starting point for the final 'street level' exploration and search for the optimal venue location by stakeholders.
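
To make the clustering step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) on planar coordinates. It is a stand-in for a library implementation such as scikit-learn's `KMeans`, not the code used in this notebook. The sample points, `k=2`, and the function names are assumptions made for the example.

```python
import math
import random

def assign_labels(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=50, seed=0):
    """Cluster 2D points with Lloyd's algorithm; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        labels = assign_labels(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the mean of its members
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, assign_labels(points, centroids)

# Made-up candidate locations in meters: two well-separated groups
points = [(0, 0), (50, 20), (20, 60), (1000, 1000), (1040, 980), (990, 1050)]
centroids, labels = kmeans(points, k=2)
print('Cluster centers:', centroids)
print('Labels:', labels)
```

The resulting cluster centers are what would then be reverse geocoded to obtain the addresses of the zones of interest.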

# Analysis

In [3]:
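import os
import requests

def get_coordinates(api_key, address, verbose=False):
    """Geocode an address with the Google Maps Geocoding API; return [lat, lon] or [None, None]."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # Passing params lets requests URL-encode the address (spaces, commas, accents)
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Network failure, malformed JSON, or no match for the address
        return [None, None]

# Assumption: the API key is read from a GOOGLE_API_KEY environment variable (hypothetical name)
google_api_key = os.environ.get('GOOGLE_API_KEY')
address = 'Tendillas, Cordoba, Spain'
cordoba_center = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, cordoba_center))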

In [4]:
# Requires shapely and pyproj (install once, e.g. `pip install shapely pyproj`)
import shapely.geometry
import pyproj

import math

# Córdoba (longitude ~ -4.8) lies in UTM zone 30N (EPSG:32630), not zone 33.
# always_xy=True keeps the (lon, lat) / (x, y) argument order.
to_utm = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32630', always_xy=True)
to_lonlat = pyproj.Transformer.from_crs('EPSG:32630', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = to_utm.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Cordoba center longitude={}, latitude={}'.format(cordoba_center[1], cordoba_center[0]))
x, y = lonlat_to_xy(cordoba_center[1], cordoba_center[0])
print('Cordoba center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Cordoba center longitude={}, latitude={}'.format(lo, la))

In [1]:
import numpy as np

# location_restaurants (one list of restaurants per candidate area) and df_locations
# are assumed to come from the Foursquare retrieval step, which is not shown here
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

In [5]:
#!pip install folium

import folium

In [6]:
# latitudes and longitudes of the candidate area centers are assumed to be defined in an earlier step (not shown)
map_cordoba = folium.Map(location=cordoba_center, zoom_start=13)
folium.Marker(cordoba_center, popup='Tendillas').add_to(map_cordoba)
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_cordoba)
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_cordoba)
    #folium.Marker([lat, lon]).add_to(map_cordoba)
map_cordoba

# Results

Our analysis shows that there is a great number of hotels and restaurants in Córdoba.
The highest concentration of restaurants was detected west of Tendillas Square (the center point), and there are many hotels to the north of Tendillas.

After directing our attention to this narrower area of interest (covering approx. 5x5 km south-east of Tendillas), we first created a dense grid of location candidates (spaced 100 m apart); those locations were then filtered so that those with more than two hotels and restaurants within a radius of 250 m were removed.
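
The filtering step can be sketched as a neighbor count on planar coordinates in meters. The example below keeps candidates with at most `max_nearby` venues inside the radius, mirroring the threshold of two within 250 m stated in the Methodology; the comparison is easy to flip if dense areas are preferred instead. The function names and the sample coordinates are assumptions made for illustration.

```python
import math

def count_within(point, venues, radius_m):
    """Count the venues that lie within radius_m of point (planar meters)."""
    return sum(1 for v in venues if math.dist(point, v) <= radius_m)

def filter_candidates(candidates, venues, radius_m=250.0, max_nearby=2):
    """Keep the candidates with no more than max_nearby venues within radius_m."""
    return [c for c in candidates if count_within(c, venues, radius_m) <= max_nearby]

# Made-up candidate locations and venue positions, in meters
candidates = [(0.0, 0.0), (100.0, 0.0), (1000.0, 0.0)]
venues = [(10.0, 0.0), (50.0, 50.0), (120.0, -30.0), (1100.0, 0.0)]

kept = filter_candidates(candidates, venues)
print('Candidates kept:', kept)
```

With a grid spaced 100 m apart, this brute-force count compares every candidate with every venue, which should be acceptable for an area of a few square kilometers.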

Those location candidates were then clustered to create zones of interest that contain the greatest number of location candidates. Addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.
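
Reverse geocoding a zone center amounts to sending its latitude and longitude to the same Google Maps Geocoding endpoint used earlier, this time through the `latlng` parameter, and reading the `formatted_address` of the first result. The sketch below only builds the request URL and parses a response, so it runs without network access. The helper names and the sample response are made up for illustration, and the API key is a placeholder.

```python
from urllib.parse import urlencode

GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_reverse_geocode_url(api_key, lat, lon):
    """Build a reverse geocoding request URL for a latitude/longitude pair."""
    query = urlencode({'key': api_key, 'latlng': '{},{}'.format(lat, lon)})
    return '{}?{}'.format(GEOCODE_URL, query)

def extract_address(response_json):
    """Return the first formatted address in a geocoding response, or None."""
    results = response_json.get('results', [])
    if not results:
        return None
    return results[0].get('formatted_address')

# Placeholder key and approximate coordinates, for illustration only
url = build_reverse_geocode_url('YOUR_API_KEY', 37.8845, -4.7794)
print(url)

# Made-up response with the same shape as a geocoding result
sample_response = {'status': 'OK', 'results': [{'formatted_address': 'Sample Street 1, Sample City'}]}
print(extract_address(sample_response))
```

In the notebook, the JSON passed to `extract_address` would come from `requests.get(url).json()`, once per cluster center.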


# Conclusion

The purpose of this project was to identify areas of Córdoba close to the center that contain the main hotels and restaurants, in order to aid stakeholders in narrowing down the search for an optimal location. By calculating the restaurant density distribution from Foursquare data, we first identified general areas that justify further analysis, and then generated an extensive collection of locations that satisfy some basic requirements regarding existing nearby restaurants. Clustering of those locations was then performed in order to create major zones of interest (containing the greatest number of potential locations), and addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.

The final decision on the optimal hotel location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.