# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

This project is the capstone of the [IBM Data Science Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-science). The aim is to define an interesting business problem whose solution requires Foursquare location data.

## Instructions

Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

- In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.
- In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

These are just a couple of many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to provide sufficient justification of why you think what you want to do or solve is important and why a client or a group of people would be interested in your project.

Review criteria:
This capstone project will be graded by your peers. This capstone project is worth 70% of your total grade. The project will be completed over the course of 2 weeks. Week 1 submissions will be worth 30% whereas week 2 submissions will be worth 40% of your total grade.

**For this week, you will be required to submit the following:**

1. A description of the problem and a discussion of the background. (15 marks)
2. A description of the data and how it will be used to solve the problem. (15 marks)

**For the second week, the final deliverables of the project will be:**

1. A link to your Notebook on your GitHub repository, showing your code. (15 marks)
2. A full report consisting of all of the following components (15 marks):
    1. Introduction where you discuss the business problem and who would be interested in this project.
    2. Data where you describe the data that will be used to solve the problem and the source of the data.
    3. Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learning techniques were used and why.
    4. Results section where you discuss the results.
    5. Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
    6. Conclusion section where you conclude the report.
3. Your choice of a presentation or blogpost. (10 marks)

<br/><br/>

Examples of other reports:
- https://medium.com/@dougm_9851/the-battle-of-neighborhoods-coursera-ibm-capstone-project-52b4292ef410
- https://medium.com/@kunal_chhabra/coursera-capstone-project-the-battle-of-neighbourhoods-7a4aa3e70086
- http://www.zinkohlaing.com/data-science/using-machine-learning-to-find-locations-to-open-a-burmese-restaurant-in-toronto-ibm-capstone-project/

## Table of contents
* [Business Problem](#businessProblem)
* [Introduction](#introduction)
* [Methodology](#methodology)
* [Dependencies](#dependencies)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Business Problem <a name="businessProblem"></a>

### Project aim: Find the best location in Berlin to open a Vietnamese restaurant 

**Background:** Suppose you are the owner of a successful Vietnamese restaurant chain in the UK that wants to open its first restaurant in Berlin, Germany. You ask yourself: _"Where should I even open my first restaurant?"_. In an increasingly data-driven world, you realize that you could leverage the hard-won skills of a Data Scientist to uncover business value from relevant data and so reach a decision. You find a Data Scientist and give him/her this description: _"I want to open a restaurant in Berlin that would serve Vietnamese food. Which is the best location and why? Show me facts that would support your analysis and conclusion."_


What follows is the report handed in by the Data Scientist.


## Introduction <a name="introduction"></a>

Factors that can influence the location choice:
* **Target customer?** Who are my target customers and where do they live, hang out or work?
* **Area specific information:** people's age range, education level, income level, and area crime rate. --> Question to ask: Will this area attract my target customers?
* **Area traffic:** The higher the traffic (pedestrians or cars) the better the chances for attracting people off the street.
* **Ease of access and visibility:** Is the location easily accessible (by car or on foot) and does it have good visibility? Is it easy to drive to the location? Are there any obstacles to getting there, such as one-way streets, obstructions or ongoing construction?
* **Parking availability** at the property itself if possible or nearby. 
* **Competition** (other restaurants, fast food chains, coffee shops offering light meals etc.)

<br/><br/>

Sources:
* [Why the Location of Your Restaurant is So Important](https://www.tigerchef.com/why-the-location-of-your-business-is.html)
* [4 Important Factors When Choosing a Location to Open a Restaurant](https://www.thebalancesmb.com/choosing-a-location-for-your-restaurant-2888635)

## Methodology <a name="methodology"></a>

1. **General questions asked:**
    1. Is it sensible at all to open a Vietnamese restaurant in Berlin?
    2. Where are the most successful Vietnamese restaurants in Berlin? (metric to measure success?) --> since we are only interested in the location, it may make sense to measure success by number of customers (per month/week) or by revenue (including delivery and in-restaurant visits)
    3. Have several other restaurants opened and closed in the same spot?
        - How much of the success is attributed to the location?

2. **Analysis:**
    1. **Exploratory:**
        1. Visualisations (interactive):
            1. Map of Berlin that shows all restaurants
            2. Map that shows all Vietnamese restaurants (identify N best restaurants: criterion/a?)
            3. Map that shows boroughs/neighborhoods by tourist visits
            4. Map that shows boroughs/neighborhoods by population
            5. Map that shows boroughs/neighborhoods by purchasing power (i.e. how much money do the people that live in those areas have?)
            6. What kind of people live in each area? (i.e. families, students, couples without children, singles that have a job, which nationalities? etc.)
        2. Analysis:
            1. Check nationality of people (or by language) living in an area and correlate that to the restaurants found in that area. **--> What can we uncover from this?**
            2. **Profile of people that prefer Viet cuisine.**
                * Build profile from the people that have rated the Vietnamese restaurants?
            3. Where are all the Vietnamese restaurants? (neighbourhoods or boroughs)
                - Nationality/Language (1st/2nd) of people living in each of the locations?  (Census Data)
                - Is there anything common in these locations?
                - What is the distribution of the ratings/likes/people turnover/money spent in each of these locations? (Foursquare API)
    - Rank the Viet restaurants in all of Berlin (metric?). Where are the N best located?
    2. **Regression analysis? Clustering? Unsupervised learning?**. Factors to consider:
        * **Target customer?**
        * **Area specific information**  -> **HOW TO BEST DIVIDE THE AREAS??** Using a custom grid? Postcodes?
        * **Area traffic**
        * **Ease of access and visibility**. To this respect: proximity to U, S and bus stops
        * **Parking availability** at the property itself if possible or nearby. 
        * **Competition**

3. **Evaluation:**
    1. Area population and by purchasing power
    2. Tourist visits

4. **Assumptions:**
    1. The locations recommended in this report are focused on potential dine-in and take-away visits (i.e. location suited for delivery was not considered)
    2. Travel distance from the recommended restaurant locations to the manager's house is not taken into account.
    3. Target market: 
        1. Families
        2. Everyone else. Most of the families in Berlin live in the outskirts of the city and would likely go out for food on weekends.

## Dependencies <a name="dependencies"></a>

* numpy
* numpy.matlib
* pyproj
* folium
* geopy

## Data <a name="data"></a>

Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

<br/><br/>

* Foursquare API for restaurant location data:
    1. Search for specific type of venues or stores around a given location (regular call)
    2. Learn more about a specific venue/store/shop e.g. full address, working hours, menu etc. (premium call)
    3. Learn more about a specific Foursquare user, their full name and any tips or photos that they have posted about venues and stores (regular call)
    4. Explore a given location by finding what popular spots exist in the vicinity of the location (regular call)
    5. Explore trending venues around a given location. These are venues with the highest foot traffic at the time of the API call (regular call)
* More restaurant data from other sources (TripAdvisor) --> need a web scraper for that (future work)
* Census data for people living in each area/borough/neighborhood
    - https://www.statistikportal.de/en
    - https://www.statistik-berlin-brandenburg.de/home.asp
        - [Demographic data](https://www.businesslocationcenter.de/en/business-location/berlin-at-a-glance/demographic-data/). Used data from the above link.
* Census data for language (1st/2nd) of each person living in each area/borough/neighborhood
    - Same links as above (especially the second one)
* Number of tourists for each area and per week or month
* Traffic data: Check this [link](https://www.berlin.de/senuvk/verkehr/) -> For pedestrian traffic
* Other sources (might be useful):
    - [Berlin economic atlas](https://www.businesslocationcenter.de/wab/maps/main/#/). Has a map of all restaurants --> can I download the data? Homepage [here](https://www.businesslocationcenter.de/en/economic-atlas/) with a download link. Check what kind of data it is before download because they are large files.
    - [European Data Incubator](https://edincubator.eu/data-providers-main/)
    - Senate Department for the Environment, Transport and Climate Protection, Berlin data download
    - https://ergebnisse.zensus2011.de/#StaticContent:11,EINWOHNERZAHLEN,m, (in German)
    - Check the downloaded pdf

--> Check Yelp?

<br/><br/>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Vietnamese restaurants in the neighborhood, if any
* distance of neighborhood from city center
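
The distance-from-centre factor can be illustrated with a short sketch. It uses the haversine (great-circle) formula on latitude/longitude pairs, which is an alternative to the UTM conversion used later in this notebook rather than the notebook's own method. The candidate coordinates are hypothetical; the centre is the rounded Alexanderplatz position.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


centre = (52.52, 13.41)     # Alexanderplatz, rounded to two decimals
candidate = (52.50, 13.35)  # hypothetical neighbourhood centre
distance = haversine_m(*centre, *candidate)
print(f"{distance:.0f} m from the centre")
```

For areas only a few kilometres across, the difference between this and a planar UTM distance is small, so either could feed the distance factor.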

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Google Maps API reverse geocoding**
* the number of restaurants and their type and location in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of the Berlin center will be obtained by geocoding a well-known Berlin location (Alexanderplatz) with the **Nominatim geocoder** (OpenStreetMap, via `geopy`)

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent `'foursquare_agent'`, as shown below.

In [1]:
import sys
sys.path.append("../")

In [2]:
#!pip install geopy  # if geopy is not already installed
from geopy.geocoders import Nominatim  # to convert an address into latitude and longitude values


def get_address_coordinates(address):
    """Convert a specific address to latitude & longitude
    using the Nominatim geocoder from OpenStreetMap.
    """
    try:
        # define a custom user_agent and get the location
        location = Nominatim(user_agent="foursquare_agent").geocode(address)
        print(location)
        # geocode() returns None when the address cannot be resolved
        if location is None:
            return [None, None]
        return [location.latitude, location.longitude]
    except Exception:  # e.g. network or geocoder service errors
        return [None, None]


ADDRESS = 'Alexanderplatz, Berlin, Germany'  # approximately at the centre of Berlin
BERLIN_CENTRE = get_address_coordinates(ADDRESS)
print(f'Coordinates of {ADDRESS} are [{BERLIN_CENTRE[0]:.2f}, {BERLIN_CENTRE[1]:.2f}]')


Alexanderplatz, Mitte, Berlin, 10178, Deutschland
Coordinates of Alexanderplatz, Berlin, Germany are [52.52, 13.41]


### Define the search area

The search area is defined by a grid centred at `BERLIN_CENTRE`. Location data is first converted to Cartesian (UTM) coordinates so that distances can be computed in metres. The grid consists of circular areas of radius `minor_radius`. The aim is to get a list of all restaurants per grid area.

In [3]:
from utils import lonlat_to_xy, create_grid
import numpy as np


#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#
# TODO: Remove the code below and create a unittest instead
# sanity check
x, y = lonlat_to_xy(Lon=BERLIN_CENTRE[1], Lat=BERLIN_CENTRE[0])
lo, la = lonlat_to_xy(x, y, inverse=True)
assert(round(BERLIN_CENTRE[0], 4)==round(la, 4))
assert(round(BERLIN_CENTRE[1], 4)==round(lo, 4))
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#


# convert BERLIN_CENTRE to UTM coordinates
berlin_centre_xy = lonlat_to_xy(Lon=BERLIN_CENTRE[1], Lat=BERLIN_CENTRE[0])

# parameters
RADIUS = 6000  # search radius with area centre: berlin_centre_xy
MINOR_RADIUS = 500  # grid shapes' radius
# Note: The values above should take into account the number of API calls
# a developer can make. Check out https://developer.foursquare.com/docs/api/troubleshooting/rate-limits

# create the grid
grid_centres_x, grid_centres_y, area_radius = create_grid(centre=berlin_centre_xy,
                                                          radius=RADIUS,
                                                          minor_radius=MINOR_RADIUS,
                                                          area_shape='circle',
                                                          ov=True)
# convert cartesian grid centres to longitude and latitude
grid_centres_lo, grid_centres_la = lonlat_to_xy(grid_centres_x, grid_centres_y, inverse=True)

# store into a single numpy array for easier manipulation
grid_centres_longla = np.concatenate((grid_centres_lo.T.reshape(np.prod(grid_centres_lo.shape), 1),
                                      grid_centres_la.reshape(np.prod(grid_centres_la.shape), 1)),
                                     axis=1)

print(f'=====> Number of Foursquare API calls to make={grid_centres_longla.shape[0]}.')

# sanity check
assert(np.unique(np.sum(grid_centres_longla, axis=1)).shape[0]==grid_centres_longla.shape[0])

=====> Number of Foursquare API calls to make=400.
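
The grid construction itself lives in `utils.create_grid`, which is not shown in this notebook. The following is a minimal sketch of the general idea only, under stated assumptions: the function `sketch_grid` is hypothetical, cells are placed on a square lattice one diameter apart with no overlap, and a cell is kept when its centre lies inside the search circle. The number of cells it produces will generally differ from the 400 reported above, since the actual `create_grid` settings (for example `ov`) are not reproduced.

```python
import math


def sketch_grid(centre_xy, radius, minor_radius):
    """Return centres of lattice cells lying within `radius` of `centre_xy`.

    Hypothetical stand-in for utils.create_grid: cells are spaced one
    diameter apart and kept if their centre is inside the search circle.
    """
    cx, cy = centre_xy
    step = 2 * minor_radius
    n = int(radius // step)  # number of lattice steps on each side of the centre
    centres = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = cx + i * step, cy + j * step
            if math.hypot(x - cx, y - cy) <= radius:
                centres.append((x, y))
    return centres


# planar coordinates in metres, with the origin standing in for the city centre
centres = sketch_grid(centre_xy=(0.0, 0.0), radius=6000, minor_radius=500)
print(f"{len(centres)} grid cells")
```

Each grid cell costs one API call, so the cell count is the quantity to watch against the rate limits mentioned in the code comments above.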


The following map of Berlin depicts the grid search area. Markers indicate the centre of each minor grid area.

In [4]:
#!pip install folium
import folium

In [5]:

map_berlin = folium.Map(location=BERLIN_CENTRE, zoom_start=12)
folium.Marker(BERLIN_CENTRE, popup=ADDRESS.split(',')[0]).add_to(map_berlin)
for longlat in grid_centres_longla:
    lon = longlat[0]
    lat = longlat[1]
    folium.Circle([lat, lon], radius=2, color='blue', fill=True, fill_opacity=1).add_to(map_berlin)

map_berlin  # show the map

### Get a list of restaurants per search area

In [6]:
from dotenv import load_dotenv  # to read .env
from pathlib import Path
import os


# load the Foursquare credentials from .env (hidden file)
if load_dotenv(dotenv_path=Path('..') / '.env'):
    CLIENT_ID = os.getenv("CLIENT_ID")
    CLIENT_SECRET = os.getenv("CLIENT_SECRET")
    print('Credentials loaded from file')
else:
    CLIENT_ID = "Your Foursquare Client ID"
    CLIENT_SECRET = "Your Foursquare Client Secret"


Credentials loaded from file


In [7]:
import json
import fsquare  # my module for making api calls and fetching relevant data


## Parameters ##
# Categories:  (source: https://developer.foursquare.com/docs/resources/categories)
SEARCH_CATEGORY = ('4d4b7105d754a06374d81259',)  # make it immutable to avoid changing it
load_results_from_file = True  # Set to False to make api calls

# create the foursquare class instance
fsq = fsquare.fsquare(CLIENT_ID, CLIENT_SECRET)

## Search parameters ##
fixed_search_params, search_params = fsq.fsquare_search_settings(BERLIN_CENTRE[0], BERLIN_CENTRE[1])
search_params['radius'] = MINOR_RADIUS  # grid area radius
api_params = {'search_params': search_params, 'fixed_search_params': fixed_search_params}
# queries = ['restaurant']#['Vietnamesische', 'Vietnamese']
queries = [SEARCH_CATEGORY[0]]

## Get the data ##
if load_results_from_file:
    with open('../data/foursquare_data.json') as json_file:
        fsqdata = json.load(json_file)
else:
    fsqdata = fsq.get_fsquare_data(api_params,
                                   queries,
                                   tp='cat',  # perform a category search
                                   coords=grid_centres_longla,
                                   verbose=1)
    if fsqdata:
        print('Obtained Foursquare data successfully')
    else:
        raise ValueError('Something went wrong!')

    # Check for duplicates
    assert(len(fsqdata['venue_ids'])==len(set(fsqdata['venue_ids'])))
    
    # save the data
    with open('../data/foursquare_data.json', 'w') as json_file:
        json.dump(fsqdata, json_file)
    print('Data saved successfully!')

In [8]:
fsqdata.keys()

dict_keys(['venues', 'venue_ids'])

To obtain the list of restaurants we can either use a search endpoint (see the Foursquare documentation) with a relevant search string, or search for specific venue category identities. The former is probably not suitable here: it would take more than one or two search strings to capture all the restaurants, Biergartens, Imbiss stands, bistro cafes etc. that serve more than light meals and are likely competitors to your business. Searching by category identity is therefore more practical and more likely to return all relevant venues.
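
The difference between the two approaches can be sketched by building (not sending) the two kinds of request. This is an illustration only and not the code of the `fsquare` module used in this notebook. It assumes the Foursquare v2 `venues/search` endpoint with its `ll`, `radius`, `query` and `categoryId` parameters; the credentials and the version date are placeholders, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint: Foursquare v2 venue search
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lon, radius, *, query=None, category_id=None, limit=50):
    """Build (but do not send) a venue search request URL."""
    params = {
        "client_id": "YOUR_CLIENT_ID",          # placeholder
        "client_secret": "YOUR_CLIENT_SECRET",  # placeholder
        "v": "20200101",                        # placeholder API version date
        "ll": f"{lat},{lon}",
        "radius": radius,
        "limit": limit,
    }
    if query is not None:
        params["query"] = query  # free-text search string
    if category_id is not None:
        params["categoryId"] = category_id  # search by category identity
    return f"{SEARCH_URL}?{urlencode(params)}"


by_text = build_search_url(52.52, 13.41, 500, query="restaurant")
by_category = build_search_url(52.52, 13.41, 500,
                               category_id="4d4b7105d754a06374d81259")  # 'Food'
print(by_text)
print(by_category)
```

A text query would have to be repeated for every relevant word, whereas a single category identity covers every venue filed under that category.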

So we first need to obtain the category identities. This can be achieved by making an API call or by manually collecting the relevant identities from the documentation [here](https://developer.foursquare.com/docs/resources/categories). In our case there are a lot of categories, so we go with the first option. An additional benefit of this method is that you don't need to keep checking the documentation for changes and manually updating your list of category identities; this is taken care of programmatically.
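
As a rough illustration of the programmatic route, the sketch below walks a nested category tree and collects every category whose name contains a given word. It assumes each category is a dictionary with `id`, `name` and a nested `categories` list; the miniature tree is made up for the example (the three restaurant-related identities are the ones used in this notebook, the bakery identity is invented), and `collect_category_ids` is a hypothetical helper, not the `fsquare` implementation.

```python
def collect_category_ids(categories, word):
    """Recursively collect {id: name} for categories whose name contains `word`."""
    found = {}
    for cat in categories:
        if word.lower() in cat["name"].lower():  # case-insensitive match
            found[cat["id"]] = cat["name"]
        # descend into sub-categories, if any
        found.update(collect_category_ids(cat.get("categories", []), word))
    return found


# Made-up miniature tree; the nesting is illustrative only
sample_tree = [
    {"id": "4d4b7105d754a06374d81259", "name": "Food", "categories": [
        {"id": "4bf58dd8d48988d142941735", "name": "Asian Restaurant", "categories": [
            {"id": "4bf58dd8d48988d14a941735", "name": "Vietnamese Restaurant",
             "categories": []},
        ]},
        {"id": "made-up-bakery-id", "name": "Bakery", "categories": []},
    ]},
]

restaurant_ids = collect_category_ids(sample_tree, "restaurant")
print(restaurant_ids)
```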

In [9]:
# this string must exist in the name of a venue category
word_to_include = 'restaurant'  # case insensitive

# Get all the venue categories (from the Foursquare API docs)
# (See: https://developer.foursquare.com/docs/resources/categories)
all_categories, categories_to_include = fsq.get_all_fsquare_categories(api_params["fixed_search_params"],
                                                                       categ_name='Food',  # main category name
                                                                       w=word_to_include)
# sanity check
assert(len(categories_to_include)==len(set(categories_to_include)))
print('Ignore status code 200')

Ignore status code 200


  warn(f"API response status code: {code}")


Now we need to find all the restaurants within each area. We separate Vietnamese and other Asian restaurants from the rest; this grouping will be used for plotting later.

In [10]:
from typing import List, Tuple


def get_restaurants(tp: List[str], categs: dict) -> Tuple[dict, List[str]]:
    """Group the venues in fsqdata by venue type.

    tp: venue type names; tp[0] is the catch-all for other restaurants.
    categs: maps a Foursquare category id to one of the names in tp.
    """
    # to store the API responses of the venues of interest
    restaurants = {item: [] for item in tp}

    # names of venues that are not added to the restaurants dictionary
    not_added = []

    # assign each venue to a type based on its primary category id
    for venue in fsqdata['venues']:
        v_id = venue['categories'][0]['id']
        if v_id in categs:
            restaurants[categs[v_id]].append(venue)
        elif v_id in categories_to_include:
            restaurants[tp[0]].append(venue)  # append to the 'Other restaurants'
        else:
            # don't add any venues that are not direct competitors (e.g. bakeries, cafes)
            not_added.append(venue['name'])
    print(f"{len(not_added)} venues not added.")
    return restaurants, not_added


# Venue category names of interest
venue_types = ['Other restaurants', 'Other Asian', 'Vietnamese']

# Categories:  (source: https://developer.foursquare.com/docs/resources/categories)
venue_categories = {'4d4b7105d754a06374d81259': venue_types[0],  # Food
                    '4bf58dd8d48988d142941735': venue_types[1],  # Asian Restaurant
                    '4bf58dd8d48988d14a941735': venue_types[2]}  # Vietnamese Restaurant

restaurants, not_added = get_restaurants(tp=venue_types, categs=venue_categories)

# an example of a venue that is not considered as a competitor
print(f"Example: '{not_added[0]}'\n")
    
# sanity check: check the venues that don't have word_to_include in their category name
# These should only come from the primary category 'Food'
print(f"Check the venues that don't have the word '{word_to_include}' in their name:")
catIDs = list()
for item in restaurants['Other restaurants']:
    name = item['categories'][0]['name']
    if word_to_include.lower() not in name.lower():
        print(f'Category name: {name}. Venue name: {item["name"]}')
        catIDs.append(item['categories'][0]['id'])
assert(len(set(catIDs))==1)  # ensure that it's only 1 unique category
assert(catIDs[0]==SEARCH_CATEGORY[0])  # and it is the SEARCH_CATEGORY
print('No problems found!\n')


print('Number of restaurants per category:')
for i,c in enumerate(venue_types):
    print(f'{(" "*4)}{i+1}) {c}: {len(restaurants[c])} venues.')



3306 venues not added.
Example: 'Lula Deli & Grill'

Check the venues that don't have the word 'restaurant' in their name:
Category name: Food. Venue name: Bohemia Restaurant
Category name: Food. Venue name: einhorn catering
Category name: Food. Venue name: City 54 Bistro
Category name: Food. Venue name: Küche im Blogger Apartment
Category name: Food. Venue name: Imbiss International
Category name: Food. Venue name: Metropol Café Bar
Category name: Food. Venue name: Biergarten
Category name: Food. Venue name: Mimo
Category name: Food. Venue name: Treffpunkt Bistro Cafe
No problems found!

Number of restaurants per category:
    1) Other restaurants: 2220 venues.
    2) Other Asian: 175 venues.
    3) Vietnamese: 204 venues.


_"A position on the Earth is given by the UTM zone number and the easting and northing planar coordinate pair in that zone. The point of origin of each UTM zone is the intersection of the equator and the zone's central meridian. To avoid dealing with negative numbers, the central meridian of each zone is defined to coincide with 500000 meters East. In any zone a point that has an easting of 400000 meters is about 100 km west of the central meridian"_

[source](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system)
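
The quoted definition can be made concrete with a few lines of arithmetic. This is a sketch of the standard 6-degree zone rule only (it ignores the special zones around Norway and Svalbard) and does not perform the actual projection, which in this notebook is delegated to `lonlat_to_xy`. The function names are hypothetical.

```python
import math

FALSE_EASTING_M = 500_000  # easting assigned to each zone's central meridian


def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number for a longitude in [-180, 180)."""
    return int(math.floor((lon_deg + 180) / 6)) + 1


def central_meridian(zone):
    """Longitude (degrees) of the central meridian of a UTM zone."""
    return zone * 6 - 183


def offset_from_central_meridian_m(easting_m):
    """Signed planar offset in metres; negative values lie west of the meridian."""
    return easting_m - FALSE_EASTING_M


zone = utm_zone(13.41)  # rounded longitude of Alexanderplatz
print(zone, central_meridian(zone))
print(offset_from_central_meridian_m(400_000))
```

Under this rule Berlin's longitude falls in zone 33, whose central meridian is 15°E, and an easting of 400000 m corresponds to an offset of roughly 100 km to the west, as in the quote.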

## Analysis <a name="analysis"></a>

### Visualisations



Plotting the location of each restaurant will help to derive general observations regarding the restaurant concentration in Berlin.

We can therefore plot the restaurants we previously found and colour code them for easier inspection.

In [11]:
# available colours:
list(folium.map.Icon.color_options)

['red',
 'cadetblue',
 'lightred',
 'lightblue',
 'purple',
 'white',
 'green',
 'lightgreen',
 'pink',
 'darkgreen',
 'darkred',
 'beige',
 'gray',
 'darkpurple',
 'black',
 'blue',
 'orange',
 'lightgray',
 'darkblue']

In [12]:
# choose colours that give good visual contrast
folium_colours = ['cadetblue', 'red', 'black', 'yellow']

# create a colour dictionary for the different venues
venue_colors = {name:folium_colours[i] for i,name in enumerate(venue_types)}
venue_colors['Other'] = folium_colours[len(venue_types)]
venue_colors

{'Other restaurants': 'cadetblue',
 'Other Asian': 'red',
 'Vietnamese': 'black',
 'Other': 'yellow'}

In [13]:
V = venue_categories.values()

# create map
map_berlin = folium.Map(location=BERLIN_CENTRE, zoom_start=12)
folium.Marker(BERLIN_CENTRE, popup=ADDRESS.split(',')[0]).add_to(map_berlin)

# plot the restaurant locations with different colours
for venue_categ,venues in restaurants.items():
    for venue in venues:
        # Get the latitude and longitude of the venue
        lat = venue['location']['lat']
        lon = venue['location']['lng']
        # TODO: 
            # 1. convert to x,y coords
            # 2. correct the location using the distance
                 #distance = venue['location']['distance'])
            # 3. convert back to long, lat

        # now plot using the correct colour
        if venue_categ in V:
            folium.Circle([lat, lon], radius=1,
                          color=venue_colors[venue_categ],
                          fill=True, fill_opacity=1).add_to(map_berlin)
        else:
            folium.Circle([lat, lon], radius=1,
                          color=venue_colors['Other'],
                          fill=True, fill_opacity=1).add_to(map_berlin)

# DIFFERENT PLOT METHOD:
#folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin) 
# ADD MARKER:
# folium.Marker([lat, lon]).add_to(map_berlin)

map_berlin  # show the map

Observations:
- Most of the venues are concentrated in the centre, as expected, but a high number of restaurants are also found in the west and south west.
- Some streets in the north west appear crowded with venues as well.
- A large part of the north east has very few venues.
- A few venues also appear in the south and around the Tempelhofer Feld.

### Decision methodology

#### What to consider as important factors that will affect our decision:

1. Distance to:
    1. other Vietnamese restaurants
    2. other Asian restaurants.
    3. any other restaurants.
    4. ?Distance to centre? (easier to do than other more complex but better solutions)
2. Within a touristy area (defined by distance to the centre of the area.)
3. Number of tourists in the touristy area
4. Buying power of people living in the area (the higher, the better the area).
5. ?Type of companies and number of workers in the area (banks, hedge funds, consultancies --> get people for business lunches)


Decision:
1. Compute a ranking score? How to set the weights? -> then how to get the street name (make the search radius small)?
2. Something with Machine Learning?
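
One possible shape for the ranking-score option is a weighted sum of normalised factors. The sketch below is only an illustration of that idea: the candidate areas, factor values and weights are all made up, the function `rank_locations` is hypothetical, and choosing the weights remains the open question noted above. Factors where a larger value is worse (such as distance to the centre) are given a negative weight.

```python
def rank_locations(candidates, weights):
    """Rank candidates by a weighted sum of min-max normalised factors (best first)."""
    names = list(weights)
    lo = {k: min(c[k] for c in candidates) for k in names}
    hi = {k: max(c[k] for c in candidates) for k in names}

    def score(c):
        total = 0.0
        for k in names:
            span = hi[k] - lo[k]
            norm = (c[k] - lo[k]) / span if span else 0.0  # scale to [0, 1]
            total += weights[k] * norm
        return total

    return sorted(candidates, key=score, reverse=True)


# Made-up candidate areas and factor values
candidates = [
    {"name": "A", "dist_to_vietnamese_m": 800, "dist_to_centre_m": 1500, "purchasing_power": 0.6},
    {"name": "B", "dist_to_vietnamese_m": 200, "dist_to_centre_m": 500, "purchasing_power": 0.9},
    {"name": "C", "dist_to_vietnamese_m": 1200, "dist_to_centre_m": 4000, "purchasing_power": 0.3},
]
# Made-up weights: far from competitors is good, far from the centre is bad
weights = {"dist_to_vietnamese_m": 0.5, "dist_to_centre_m": -0.2, "purchasing_power": 0.2}

ranked = rank_locations(candidates, weights)
print([c["name"] for c in ranked])
```

Min-max scaling keeps factors measured in different units (metres, indices) comparable before the weights are applied.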


## Taken from the other notebook
In this project we will direct our efforts towards detecting areas of Berlin that have a low restaurant density, particularly those with a low number of Italian restaurants. We will limit our analysis to the area within ~6 km of the city center.

In the first step we have collected the required **data: location and type (category) of every restaurant within 6 km of the Berlin center** (Alexanderplatz). We have also **identified Italian restaurants** (according to the Foursquare categorization).

The second step in our analysis will be the calculation and exploration of '**restaurant density**' across different areas of Berlin - we will use **heatmaps** to identify a few promising areas close to the center with a low number of restaurants in general (*and* no Italian restaurants in the vicinity) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than two restaurants within a radius of 250 meters**, and we want locations **without Italian restaurants within a radius of 400 meters**. We will present a map of all such locations but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for the final 'street level' exploration and search for the optimal venue location by stakeholders.
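
The location filter described in the final step can be sketched as follows. This is a minimal illustration assuming planar coordinates in metres (for example UTM); all points are made up and the helper names are hypothetical. The thresholds are the ones stated above (at most two restaurants within 250 m, none of the cuisine of interest within 400 m). The k-means clustering of the surviving locations is not shown.

```python
import math


def count_within(point, others, radius):
    """Number of points in `others` within `radius` metres of `point`."""
    px, py = point
    return sum(1 for (x, y) in others if math.hypot(x - px, y - py) <= radius)


def filter_candidates(candidates, restaurants, same_cuisine,
                      max_nearby=2, r_any=250, r_same=400):
    """Keep candidates with few restaurants nearby and no same-cuisine restaurant close by."""
    return [c for c in candidates
            if count_within(c, restaurants, r_any) <= max_nearby
            and count_within(c, same_cuisine, r_same) == 0]


# Made-up planar coordinates in metres
candidates = [(0, 0), (1000, 0), (2000, 0)]
restaurants = [(50, 0), (100, 100), (-100, 50), (1100, 0), (2300, 0)]
same_cuisine = [(1100, 0)]  # subset of restaurants serving the cuisine of interest

good_locations = filter_candidates(candidates, restaurants, same_cuisine)
print(good_locations)
```

Here the first candidate is rejected for having three restaurants within 250 m, the second for having a same-cuisine restaurant within 400 m, and only the third survives.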

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>

Maybe have a separate section "Recommendation":

For best visibility it is advisable to open the restaurant on a major road within the recommended areas.