### Introduction
#### In this project we use geospatial data for Los Angeles to suggest which districts are most suitable for opening a new restaurant. We identify the districts that have fewer restaurants than the rest, cluster them, and make a recommendation. The Foursquare database is used to retrieve venue information for every neighborhood in our dataset. An analysis of this kind is a useful step before committing to something as expensive as starting a new business, and the same approach could be used to identify areas suitable for new cafes. Of course, this is a simplified scenario. To address the question at its core, we should also take into account other factors, such as the average income in each neighborhood, crime levels, and the average age of residents. This information would be very hard to obtain for this project, so we restrict our analysis to the data we can retrieve from Foursquare.
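
#### As a rough illustration of the "fewer restaurants than the rest" idea, the sketch below flags neighborhoods whose restaurant count falls below the median. The counts are invented placeholders (not Foursquare results), the median cut-off is an assumption made for this example, and `underserved` is a hypothetical helper name; the project itself relies on clustering rather than a fixed threshold.

```python
from statistics import median

# Hypothetical restaurant counts per neighborhood
# (placeholder values, NOT real Foursquare data).
restaurant_counts = {
    "Neighborhood A": 42,
    "Neighborhood B": 3,
    "Neighborhood C": 17,
    "Neighborhood D": 8,
    "Neighborhood E": 25,
}

def underserved(counts):
    """Return neighborhoods whose count is below the median, fewest first."""
    cutoff = median(counts.values())
    below = [name for name, n in counts.items() if n < cutoff]
    return sorted(below, key=counts.get)

candidates = underserved(restaurant_counts)
print(candidates)
```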

### The dataset
#### The data for this study have been retrieved from: https://usc.data.socrata.com/dataset/Los-Angeles-Neighborhood-Map/r8qd-yxsr. 
#### Let's first explore our data and plot a map of Los Angeles, highlighting with blue dots the neighborhoods of our dataset. 

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
with open('/home/christos/Downloads/LosAngelesNeighborhoodMap.geojson') as json_data:
    la_data = json.load(json_data)

In [3]:
neighborhoods_data = la_data['features']
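
#### The two cells above assume that the file exists and that it holds a `features` list. A minimal sketch of a loader that checks this explicitly is shown below. It assumes the download is a standard GeoJSON `FeatureCollection`; `load_features` is a hypothetical helper name, and the tiny sample file only stands in for the real download so that the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

def load_features(path):
    """Load a GeoJSON file and return its list of features."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the file is a GeoJSON FeatureCollection.
    is_collection = isinstance(data, dict) and data.get("type") == "FeatureCollection"
    if not is_collection or not isinstance(data.get("features"), list):
        raise ValueError("expected a GeoJSON FeatureCollection")
    return data["features"]

# Tiny stand-in file so the sketch runs without the real download.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Acton"}, "geometry": None}
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.geojson"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    features = load_features(sample_path)

print(len(features), features[0]["properties"]["name"])
```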

In [4]:
neighborhoods_data[0]

{'type': 'Feature',
 'properties': {'external_i': 'acton',
  'name': 'Acton',
  'location': 'POINT(34.497355239240846 -118.16981019229348)',
  'latitude': '-118.16981019229348',
  'slug_1': None,
  'sqmi': '39.3391089485',
  'display_na': 'Acton L.A. County Neighborhood (Current)',
  'set': 'L.A. County Neighborhoods (Current)',
  'slug': 'acton',
  'longitude': '34.497355239240846',
  'name_1': None,
  'kind': 'L.A. County Neighborhood (Current)',
  'type': 'unincorporated-area'},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[-118.20261747920541, 34.53898972076929],
     [-118.18946958918568, 34.5385546636616],
     [-118.18950400422953, 34.5349457732411],
     [-118.185124836341, 34.53482956044709],
     [-118.18516440876348, 34.53124651970553],
     [-118.17601577983017, 34.531354702430015],
     [-118.1761893084381, 34.523803185624594],
     [-118.16702561365965, 34.52351227823281],
     [-118.16294026595281, 34.523716853632315],
     [-118.16298888279476, 34.527586918
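
#### Note that in the record above the `latitude` and `longitude` properties appear to be swapped: the `latitude` field holds about -118.17, which is outside the valid range of -90 to 90, while the `location` string lists the latitude first. The sketch below reads the coordinates from the `location` string instead. It assumes every record follows the `POINT(<lat> <lon>)` pattern seen here, and `parse_point` is a hypothetical helper name.

```python
import re

# Matches strings such as 'POINT(34.497355239240846 -118.16981019229348)'.
POINT_RE = re.compile(r"POINT\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")

def parse_point(location):
    """Return (latitude, longitude) as floats from a 'POINT(lat lon)' string."""
    match = POINT_RE.fullmatch(location.strip())
    if match is None:
        raise ValueError(f"unexpected location format: {location!r}")
    lat, lon = float(match.group(1)), float(match.group(2))
    # A latitude outside [-90, 90] suggests the two values are swapped.
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    return lat, lon

# Properties copied from the first record shown above.
props = {
    "name": "Acton",
    "location": "POINT(34.497355239240846 -118.16981019229348)",
    "latitude": "-118.16981019229348",
    "longitude": "34.497355239240846",
}

lat, lon = parse_point(props["location"])
print(lat, lon)
```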

In [5]:
column_names = ['City', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    properties = data['properties']
    # NOTE: the source file has the two fields swapped: 'latitude' holds
    # the longitude and 'longitude' holds the latitude (see the record above)
    rows.append({'City': 'L.A.',
                 'Neighborhood': properties['name'],
                 'Latitude': float(properties['longitude']),
                 'Longitude': float(properties['latitude'])})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [6]:
neighborhoods.sort_values('Neighborhood')

| | City | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | L.A. | Acton | 34.49735523924085 | -118.16981019229348 |
| 1 | L.A. | Adams-Normandie | 34.03146149912416 | -118.30020800000013 |
| 2 | L.A. | Agoura Hills | 34.146736499122795 | -118.75988450000015 |
| 3 | L.A. | Agua Dulce | 34.50492699979684 | -118.3171036690717 |
| 4 | L.A. | Alhambra | 34.08553899912357 | -118.1365120000002 |
| 5 | L.A. | Alondra Park | 33.889617004889644 | -118.3351559860816 |
| 7 | L.A. | Altadena | 34.19387050223217 | -118.13623898201556 |
| 8 | L.A. | Angeles Crest | 34.31393700589531 | -117.9223952817848 |
| 9 | L.A. | Arcadia | 34.13322999912302 | -118.03041899311202 |
| 10 | L.A. | Arleta | 34.24309999912158 | -118.4307575 |


In [7]:
address = 'Los Angeles'

# Nominatim asks for a user agent that identifies the application
geolocator = Nominatim(user_agent="la_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of L.A. are {}, {}.'.format(latitude, longitude))

The geographical coordinates of L.A. are 34.0536909, -118.2427666.
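
#### `geolocator.geocode` returns `None` when the service finds no match, in which case reading `location.latitude` raises an `AttributeError`. A minimal sketch of a guard follows. `resolve_center` and the fallback coordinates are hypothetical, and the two stub functions stand in for a real `geolocator.geocode` so that the example runs offline.

```python
from types import SimpleNamespace

def resolve_center(geocode, address, fallback):
    """Return (lat, lon) from a geocode callable, or fallback if no match."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)

# Stub geocoders standing in for geolocator.geocode (assumption: the real
# result exposes .latitude and .longitude, as used in the cell above).
def found(address):
    return SimpleNamespace(latitude=34.0536909, longitude=-118.2427666)

def not_found(address):
    return None

fallback = (34.05, -118.24)  # assumed rough center of Los Angeles

print(resolve_center(found, "Los Angeles", fallback))
print(resolve_center(not_found, "Los Angeles", fallback))
```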


In [8]:
map_la = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.9).add_to(map_la)
    
map_la
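
#### An alternative to geocoding the city name is to center the map on the mean of the neighborhood coordinates, which avoids a network call. The sketch below uses three rows copied from the table above; `mean_center` is a hypothetical helper, and a plain arithmetic mean is only an approximation that is assumed to be adequate over an area this small. With the full dataframe, the pairs could come from `zip(neighborhoods['Latitude'], neighborhoods['Longitude'])`.

```python
from statistics import mean

# Three (latitude, longitude) pairs copied from the table above.
points = [
    (34.49735523924085, -118.16981019229348),   # Acton
    (34.03146149912416, -118.30020800000013),   # Adams-Normandie
    (34.146736499122795, -118.75988450000015),  # Agoura Hills
]

def mean_center(coords):
    """Return the arithmetic mean of (lat, lon) pairs as a (lat, lon) tuple."""
    lats, lons = zip(*coords)
    return mean(lats), mean(lons)

center = mean_center(points)
print(center)
```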