# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

This notebook is part of the capstone project of the [IBM Data Science Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-science). The aim is to define an interesting business problem that can be solved by leveraging Foursquare location data.

## Instructions

Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

1. In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.
2. In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

These are just a couple of many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to provide sufficient justification of why you think what you want to do or solve is important and why a client or a group of people would be interested in your project.

Review criteria:
This capstone project will be graded by your peers. This capstone project is worth 70% of your total grade. The project will be completed over the course of 2 weeks. Week 1 submissions will be worth 30% whereas week 2 submissions will be worth 40% of your total grade.

**For this week, you will be required to submit the following:**

1. A description of the problem and a discussion of the background. (15 marks)
2. A description of the data and how it will be used to solve the problem. (15 marks)

**For the second week, the final deliverables of the project will be:**

1. A link to your Notebook on your GitHub repository, showing your code. (15 marks)
2. A full report consisting of all of the following components (15 marks):
    1. Introduction where you discuss the business problem and who would be interested in this project.
    2. Data where you describe the data that will be used to solve the problem and the source of the data.
    3. Methodology section, which represents the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and which machine learning techniques were used and why.
    4. Results section where you discuss the results.
    5. Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
    6. Conclusion section where you conclude the report.
3. Your choice of a presentation or blogpost. (10 marks)

<br/><br/>

Examples of other reports:
- https://medium.com/@dougm_9851/the-battle-of-neighborhoods-coursera-ibm-capstone-project-52b4292ef410
- https://medium.com/@kunal_chhabra/coursera-capstone-project-the-battle-of-neighbourhoods-7a4aa3e70086
- http://www.zinkohlaing.com/data-science/using-machine-learning-to-find-locations-to-open-a-burmese-restaurant-in-toronto-ibm-capstone-project/

## Table of contents
* [Business Problem](#businessProblem)
* [Introduction](#introduction)
* [Methodology](#methodology)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Business Problem <a name="businessProblem"></a>

### Project aim: Find the best location in Berlin to open a Vietnamese restaurant 

**Background:** Suppose you are the owner of a successful Vietnamese restaurant chain in the UK and want to open your first restaurant in Berlin, Germany. You ask yourself: _"Where should I even open my first restaurant?"_ In an increasingly data-driven world, you realize that you could leverage the skills of a Data Scientist to uncover business value from relevant data and reach a decision. You find a Data Scientist and give them this brief: _"I want to open a restaurant in Berlin that would serve Vietnamese food. Which is the best location and why? Use your skills and show me facts that would support your analysis and conclusion."_


What follows is the report handed in by the Data Scientist.


## Introduction <a name="introduction"></a>

Factors that can influence the location choice:
* **Target customer?** Who are my target customers and where do they live, hang out or work?
* **Area specific information:** people's age range, education level, income level, and area crime rate. --> Question to ask: Will this area attract my target customers?
* **Area traffic:** The higher the traffic (pedestrians or cars) the better the chances for attracting people off the street.
* **Ease of access and visibility:** Is the location easily accessible (by car or on foot) and does it have good visibility? Are there any obstacles to getting there, such as one-way streets, obstructions, or ongoing construction?
* **Parking availability** at the property itself if possible or nearby. 
* **Competition** (other restaurants, fast food chains, coffee shops offering light meals etc.)

<br/><br/>

Sources:
* [Why the Location of Your Restaurant is So Important](https://www.tigerchef.com/why-the-location-of-your-business-is.html)
* [4 Important Factors When Choosing a Location to Open a Restaurant](https://www.thebalancesmb.com/choosing-a-location-for-your-restaurant-2888635)

## Methodology <a name="methodology"></a>

1. **General questions asked:**
    1. Is it sensible at all to open a Vietnamese restaurant in Berlin?
    2. Where are the most successful Vietnamese restaurants in Berlin? (Metric to measure success?) --> Since we are only interested in the location, it may make sense to measure success by the number of customers (per week/month) or by revenue (including delivery and in-restaurant visits).
    3. Have several other restaurants opened and closed in the same spot?
        - How much of the success is attributed to the location?

2. **Analysis:**
    1. **Exploratory:**
        1. Visualisations (interactive):
            1. Map of Berlin that shows all restaurants
            2. Map that shows all Vietnamese restaurants (identify N best restaurants: criterion/a?)
            3. Map that shows boroughs/neighborhoods by tourist visits
            4. Map that shows boroughs/neighborhoods by population
            5. Map that shows boroughs/neighborhoods by purchasing power (i.e. how much money do the people that live in those areas have?)
            6. What kind of people live in each area? (i.e. families, students, couples without children, singles that have a job, which nationalities? etc.)
        2. Analysis:
            1. Check nationality of people (or by language) living in an area and correlate that to the restaurants found in that area. **--> What can we uncover from this?**
            2. **Profile of people that prefer Viet cuisine.**
                * Build profile from the people that have rated the Vietnamese restaurants?
            3. Where are all the Vietnamese restaurants? (neighbourhoods or boroughs)
                - Nationality/Language (1st/2nd) of people living in each of the locations?  (Census Data)
                - Is there anything common in these locations?
                - What is the distribution of the ratings/likes/people turnover/money spent in each of these locations? (Foursquare API)
    - Rank the Viet restaurants in all of Berlin (metric?). Where are the N best located?
    2. **Regression analysis? Clustering? Unsupervised learning?**

3. **Evaluation:**
    1. Area population and by purchasing power
    2. Tourist visits

4. **Assumptions:**
    1. The locations recommended in this report are focused on potential dine-in and take-away visits (i.e. location suited for delivery was not considered)
    2. Target market: 
        1. Families. Most families in Berlin are assumed to live on the outskirts of the city and would likely go out for food on weekends.
        2. Everyone else.

## Data <a name="data"></a>

Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

<br/><br/>

* Foursquare API for restaurant location data:
    1. Search for specific type of venues or stores around a given location (regular call)
    2. Learn more about a specific venue/store/shop e.g. full address, working hours, menu etc. (premium call)
    3. Learn more about a specific Foursquare user, their full name and any tips or photos that they have posted about venues and stores (regular call)
    4. Explore a given location by finding what popular spots exist in the vicinity of the location (regular call)
    5. Explore trending venues around a given location. These are venues with the highest foot traffic at the time of the API call (regular call)
* More restaurant data from other sources (TripAdvisor)
* Census data for people living in each area/borough/neighborhood
* Census data for language (1st/2nd) of each person living in each area/borough/neighborhood
* Number of tourists for each area and per week or month

--> Check Yelp?

<br/><br/>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Vietnamese restaurants in the neighborhood, if any
* distance of neighborhood from city center
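
The distance-based factors above can be computed directly from coordinates. A minimal sketch, assuming the haversine great-circle formula and a mean Earth radius of 6,371 km. The function name `haversine_m`, the rounded Alexanderplatz coordinates, and the candidate location are illustrative assumptions, not part of this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# rounded coordinates of Alexanderplatz (assumption for illustration)
center = (52.52, 13.41)
# a hypothetical candidate location north-west of the center
candidate = (52.5459, 13.3584)

distance_m = haversine_m(*center, *candidate)
print(f'Distance from city center: {distance_m / 1000:.1f} km')
```

The same function could be used to count restaurants near a candidate location by keeping only the venues whose distance to it is below a chosen radius.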

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Google Maps API reverse geocoding**
* the number of restaurants and their type and location in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of the Berlin city center will be obtained by geocoding a well-known Berlin location (Alexanderplatz) with the **Nominatim geocoder (OpenStreetMap)**, as in the code below
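
A regularly spaced grid of candidate locations could be generated along the following lines. This is a sketch under stated assumptions: a square grid with 600 m spacing clipped to a 6 km circle, a flat-Earth (equirectangular) approximation that should be adequate at city scale, and rounded Alexanderplatz coordinates. The function name `make_grid`, the spacing, and the radius are illustrative choices, not values from this report.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def make_grid(center_lat, center_lon, spacing_m=600.0, radius_m=6000.0):
    """Return (lat, lon) grid points within radius_m of the center."""
    # one degree of longitude shrinks with the cosine of the latitude
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(center_lat))
    steps = int(radius_m // spacing_m)
    points = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            dx, dy = i * spacing_m, j * spacing_m  # offsets in meters (east, north)
            if math.hypot(dx, dy) <= radius_m:  # clip the square to a circle
                points.append((center_lat + dy / METERS_PER_DEG_LAT,
                               center_lon + dx / meters_per_deg_lon))
    return points


# rounded coordinates of Alexanderplatz (assumption for illustration)
grid = make_grid(52.52, 13.41)
print(f'{len(grid)} candidate locations generated')
```

Each grid point can then serve as the center of one candidate neighborhood for the Foursquare queries.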

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent `'foursquare_agent'`, as shown below.

In [1]:
#!pip install geopy
from geopy.geocoders import Nominatim  # to convert an address into latitude and longitude values


def get_address_coordinates(address):
    """Convert a specific address to latitude & longitude
    using the Nominatim geocoder from OpenStreetMap.
    """
    try:
        # define a custom user_agent and get the location
        location = Nominatim(user_agent="foursquare_agent").geocode(address)
        print(location)
        # decode the location data
        lat = location.latitude
        lon = location.longitude
        return [lat, lon]
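    except Exception:
        # geocode() returns None for unknown addresses (so the attribute
        # access above fails) and may raise on network errors;
        # both cases are signalled with [None, None]
        return [None, None]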


address = 'Alexanderplatz, Berlin, Germany'  # approximately at the centre of Berlin
lat, lon = get_address_coordinates(address)
print(f'Coordinates of {address} are [{lat:.2f}, {lon:.2f}]')


Alexanderplatz, Mitte, Berlin, 10178, Deutschland
Coordinates of Alexanderplatz, Berlin, Germany are [52.52, 13.41]


In [2]:
from dotenv import load_dotenv  # to read .env
from pathlib import Path
import os


# load the Foursquare credentials from .env (hidden file)
if load_dotenv(dotenv_path=Path('.') / '.env'):
    CLIENT_ID = os.getenv("CLIENT_ID")
    CLIENT_SECRET = os.getenv("CLIENT_SECRET")
    print('Credentials loaded from file')
else:
    CLIENT_ID = "Your Foursquare Client ID"
    CLIENT_SECRET = "Your Foursquare Client Secret"


Credentials loaded from file


In [44]:
import requests  # library to handle requests
from typing import List, Tuple


def get_foursquare_data(search_params: dict, fixed_params: dict):
    """Calls the Foursquare API for a search endpoint.
    
    Arguments:
    ----------
        search_params: dict, Holds the specific search parameters i.e. the
                       search query, coordinates and the radius of search
                       
        fixed_params: dict, Holds the API credentials, version and limit
    
    
    Returns:
    --------
        response: dict, the API call response parsed from JSON
    """
    url = ('https://api.foursquare.com/v2/venues/search'
           f'?client_id={fixed_params["client_id"]}'
           f'&client_secret={fixed_params["client_secret"]}'
           f'&ll={search_params["latitude"]},{search_params["longitude"]}'
           f'&v={fixed_params["version"]}'
           f'&query={search_params["search_query"]}'
           f'&radius={search_params["radius"]}'
           f'&limit={fixed_params["limit"]}')
    response = requests.get(url).json()
    return response


def foursquare_search_settings() -> Tuple:
    # construct the fixed search parameters
    fixed_search_params = {'client_id': CLIENT_ID,
                           'client_secret': CLIENT_SECRET,
                           'version': '20200121',  # YYYYMMDD
                           'limit': 50  # max is 50
                          }

    # change the below as per your request
    search_params = {'search_query': 'replace_this',
                     'latitude': lat,
                     'longitude': lon,
                     'radius': 14000  # meters, distance from the specified latitude and longitude
                    }
    return search_params, fixed_search_params
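
Search strings such as `'Vietnamesische Küche'` contain spaces and non-ASCII characters, so it can be safer to let a library encode the query string instead of concatenating it by hand. A minimal sketch using only the standard library. The helper name `build_search_url` and the placeholder credentials are hypothetical, while the endpoint and parameter names are the ones used in `get_foursquare_data` above.

```python
from urllib.parse import urlencode


def build_search_url(search_params: dict, fixed_params: dict) -> str:
    """Build the venues/search URL with properly encoded query parameters."""
    query = {
        'client_id': fixed_params['client_id'],
        'client_secret': fixed_params['client_secret'],
        'v': fixed_params['version'],
        'll': f'{search_params["latitude"]},{search_params["longitude"]}',
        'query': search_params['search_query'],
        'radius': search_params['radius'],
        'limit': fixed_params['limit'],
    }
    # urlencode percent-encodes special characters in every value
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(query)


# placeholder values for illustration only
example_url = build_search_url(
    {'search_query': 'Vietnamesische Küche', 'latitude': 52.52,
     'longitude': 13.41, 'radius': 14000},
    {'client_id': 'ID', 'client_secret': 'SECRET',
     'version': '20200121', 'limit': 50},
)
print(example_url)
```

Passing the same dictionary as `requests.get(url, params=query)` performs an equivalent encoding.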


Now we need to find all the Vietnamese restaurants in Berlin:

In [45]:
def prepare_foursquare_data(api_params: dict, queries: List[str]) -> Tuple:
    """Calls the Foursquare API for the search strings in queries
       and outputs the final (merged) responses.
       
       Arguments:
       ----------
           api_params: dict, The parameters for the FourSquare API call (see get_foursquare_data)
           
           queries: List[str], The search strings for the FourSquare API call
       
       Returns:
       --------
           fsq_data: dict, The merged responses from the API calls
           
           venue_ids: List[str], The unique id strings of each venue
    """
    fsq_data = {'venues': []}  # to be populated by the API responses
    venue_ids = list()  # to be populated by the unique venue ids
    for item in queries:
        api_params['search_params']['search_query'] = item
        response = get_foursquare_data(api_params['search_params'], api_params['fixed_search_params'])
        venues = response['response']['venues']
        for venue in venues:
            # avoid adding duplicates to fsq_data
            if venue['id'] not in venue_ids:
                fsq_data['venues'].append(venue)
                venue_ids.append(venue['id'])
    return fsq_data, venue_ids


# get the data
search_params, fixed_search_params = foursquare_search_settings()
api_params = {'search_params': search_params, 'fixed_search_params': fixed_search_params}
queries = ['Vietnamesische', 'Vietnamese']
fsq_data, venue_ids = prepare_foursquare_data(api_params, queries)
    


In [46]:
# example of a venue data
fsq_data['venues'][0]

{'id': '4dad6bce8154b108febd9e15',
 'name': 'Vietnamesische Küche - Com Pho Viet',
 'location': {'address': 'Luxemburger Str. 32',
  'lat': 52.54590945569875,
  'lng': 13.358392132334648,
  'labeledLatLngs': [{'label': 'display',
    'lat': 52.54590945569875,
    'lng': 13.358392132334648}],
  'distance': 4592,
  'postalCode': '13353',
  'cc': 'DE',
  'city': 'Berlin',
  'state': 'Berlin',
  'country': 'Deutschland',
  'formattedAddress': ['Luxemburger Str. 32', '13353 Berlin', 'Deutschland']},
 'categories': [{'id': '4bf58dd8d48988d14a941735',
   'name': 'Vietnamese Restaurant',
   'pluralName': 'Vietnamese Restaurants',
   'shortName': 'Vietnamese',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vietnamese_',
    'suffix': '.png'},
   'primary': True}],
 'referralId': 'v-1579800272',
 'hasPerk': False}
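
For analysis it is convenient to reduce each venue record to a few flat fields. A minimal sketch, assuming the response structure shown in the example above. The helper name `flatten_venue` and the chosen fields are illustrative, and the sample record is abbreviated from that example.

```python
def flatten_venue(venue: dict) -> dict:
    """Extract a few flat fields from one Foursquare venue record."""
    categories = venue.get('categories') or []  # may be missing or empty
    location = venue.get('location', {})
    return {
        'id': venue['id'],
        'name': venue['name'],
        'lat': location.get('lat'),
        'lng': location.get('lng'),
        'postal_code': location.get('postalCode'),
        'category': categories[0]['name'] if categories else None,
    }


# abbreviated copy of the example venue shown above
sample_venue = {
    'id': '4dad6bce8154b108febd9e15',
    'name': 'Vietnamesische Küche - Com Pho Viet',
    'location': {'lat': 52.54590945569875,
                 'lng': 13.358392132334648,
                 'postalCode': '13353'},
    'categories': [{'name': 'Vietnamese Restaurant'}],
}
row = flatten_venue(sample_venue)
print(row)
```

A list of such rows can be passed to `pandas.DataFrame(rows)` to obtain a table with one venue per row.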

In [69]:
# NOTE: `response` is assumed to hold the raw result of an earlier
# get_foursquare_data(...) call; it is not defined by the cells shown above
venues = response['response']['venues']
print(len(venues))

for venue in venues:
    print(venue['name'])
    # some venues have no category, so guard against an empty list
    if venue['categories']:
        print('\t' + venue['categories'][0]['name'])


17
Vietnamesische Küche - Com Pho Viet
	Vietnamese Restaurant
Tom Café - Foodstore • Vietnamesische Spezialitäten
	Vietnamese Restaurant
Vietnamesische Nachrichtenagentur
	Community Center
Tônis Vietnamesische Küche
	Vietnamese Restaurant
Phò12 - Traditionelle Vietnamesische Küche
	Vietnamese Restaurant
Vietnamesisches Erlebnis Restaurant
Van Hoa
	Vietnamese Restaurant
Otito
	Vietnamese Restaurant
Mamay
	Vietnamese Restaurant
Pho - Sushi + vietnamesische Küche
	Vietnamese Restaurant
A.nam Sushi Bar
	Sushi Restaurant
H&H Imbiss Vietnamesische Küche
	Snack Place
Linh Linh Vietnamese Food & Sushi
	Sushi Restaurant
Eiscafé & Vietnamesische Küche
	Vietnamese Restaurant
Botschaft der Sozialistischen Republik Vietnam
	Embassy / Consulate
Unami
	Vietnamese Restaurant
Soy (Sushi & Vietnamesisches Restaurant) @Berlin-Spandau
	Vietnamese Restaurant


## Analysis <a name="analysis"></a>

### Visualisations



## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>

Maybe have a separate section "Recommendation":

For best visibility it is advisable to open the restaurant on a major road within the recommended areas.