# An analysis of neighborhoods in Hamburg for couples with children


## Table of Contents
1. Introduction
2. Data
3. Methodology
4. Analysis
5. Results and discussion
6. Conclusion and outlook


## 1. Introduction
<div style="text-align: justify"> 
Hamburg is a beautiful and modern city. According to Numbeo, Hamburg ranks in the top 20 of 87 European cities by Quality of Life Index (https://www.numbeo.com/quality-of-life/in/Hamburg). The city is divided into seven boroughs with 104 neighborhoods, each with its own character. Whether you are a worker moving in from outside the city, a student looking for an affordable flat close to college, or a new parent who wants a quiet place with plenty of activities for children, finding the neighborhood that best meets your personal requirements is challenging. This project aims to give an overview of the neighborhoods in Hamburg that could fit such requirements by clustering them into several groups based on the features of each neighborhood (rental price, the most common venue categories, etc.).
</div>
<br/><br/>
<div style="text-align: justify"> 
More specifically, in this project I am trying to find a neighborhood in Hamburg for my friend Betty, who is expecting twins at the end of 2021. Betty currently lives in the neighborhood of Eimsbüttel with her partner and a 4-year-old daughter. With the twins on the way, she needs a bigger apartment for 2 adults and 3 children. She actually quite enjoys her current living area. Every morning, after taking her daughter to kindergarten, she grabs a coffee and sits on the beach in the park along the Alster River near her home. There is also a big playground nearby, where she goes in the afternoon after picking up her daughter. There are usually other couples with children at the playground, so she can easily chat with them. She also enjoys the convenience of shopping, because the area has several supermarkets and grocery stores. When she is busy and doesn’t have time to cook, she can easily get food from the restaurants downstairs from her apartment. However, there are also things she does not like about her current living area. For example, there are usually many people walking, running and cycling in the nearby park, which makes it a little too crowded. The restaurants are crowded too, and she usually has to wait quite a long time for the food she ordered. She would like an apartment in a quieter neighborhood. Besides, she doesn’t like driving, so it would be great if the new living area had convenient public transportation.
</div>

After talking with her, I summarized her key requirements for a neighborhood as follows:
* Fewer people
* Playground
* Park
* Supermarket
* Bus stops or U-Bahn stations
* Restaurants
* Coffee shops

I then discussed with her how important each of these characteristics is, and we agreed on the following importance ratios:
* Playground, **importance ratio: 0.25**;
* Nature, including parks, rivers, etc., **importance ratio: 0.25**;
* Supermarkets, stores to buy groceries, etc., **importance ratio: 0.15**;
* Public transportation, including bus stops, U-Bahn, S-Bahn, train stations, etc., **importance ratio: 0.15**;
* Food, including restaurants, fast food shops, etc., **importance ratio: 0.1**;
* Coffee shops, **importance ratio: 0.1**;
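
As a minimal sketch, the ratios above could be written down as a dictionary and applied as a weighted sum. The short category labels and the example feature values are placeholders chosen for illustration, and the 0 to 1 scale of the feature values is an assumption.

```python
# Importance ratios agreed with Betty (taken from the list above).
WEIGHTS = {
    "Playground": 0.25,
    "Nature": 0.25,
    "Stores": 0.15,
    "Transportation": 0.15,
    "Food": 0.10,
    "Coffee": 0.10,
}

# The ratios are shares of a whole, so they should sum to 1.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def weighted_score(features):
    """Return the weighted sum of feature values; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


# Hypothetical feature values on a 0-1 scale, for illustration only.
example = {
    "Playground": 1.0,
    "Nature": 0.5,
    "Stores": 0.2,
    "Transportation": 0.4,
    "Food": 0.0,
    "Coffee": 0.1,
}
print(round(weighted_score(example), 3))
```

Because the ratios sum to 1, the score stays on the same scale as the individual feature values.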

Other considerations:
* **Lower population density** than the current neighborhood;
* Other couples with children, so she can chat with other parents;
* A **lower rental price** is better;
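
The first two considerations can be turned into a simple filter. The sketch below is only an illustration: the records are made up, the neighborhood names are placeholders, and using the median share of families with children as the cut-off is my assumption, not a threshold taken from the data.

```python
import statistics

# Hypothetical records; the numbers are invented for illustration and are
# not taken from the datasets used in this project.
neighborhoods = [
    {"name": "A", "density": 14000, "families_pct": 16.0},
    {"name": "B", "density": 3000, "families_pct": 22.0},
    {"name": "C", "density": 1500, "families_pct": 12.0},
    {"name": "D", "density": 5000, "families_pct": 25.0},
]


def filter_candidates(rows, max_density, min_families_pct):
    """Keep rows that are less dense than max_density and have at least
    min_families_pct percent of families with children."""
    return [
        r for r in rows
        if r["density"] < max_density and r["families_pct"] >= min_families_pct
    ]


# Assumption: "A" plays the role of the current neighborhood.
current_density = 14000
# Assumption: the median share of families with children is the cut-off.
min_pct = statistics.median(r["families_pct"] for r in neighborhoods)

candidates = filter_candidates(neighborhoods, current_density, min_pct)
print([r["name"] for r in candidates])
```

Rental price is deliberately left out of the filter here, since it is a preference ("lower is better") rather than a hard limit.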

<div style="text-align: justify"> 
Overall, Betty would like to keep the things she likes about her current neighborhood (playground, parks/rivers nearby, restaurants and coffee shops, easy public transportation, other parents she can talk with) and avoid the things she dislikes (too many people). I will help her identify the most promising neighborhoods based on these wishes.
</div>

## 2. Data 

### 2.1 Information needed
After identifying Betty’s wishes for the new neighborhood, I decided to collect the following information:
* Information on the neighborhoods in Hamburg, including names etc.;
* Population density of each neighborhood;
* Percentage of families with children living in each neighborhood;
* Rental price of each neighborhood;
* Venues in each neighborhood;

### 2.2 Data source
To get the needed information, the following datasets were collected from the internet:
* Information about each neighborhood, including names, coordinates and population density.  
https://de.zxc.wiki/wiki/Liste_der_Bezirke_und_Stadtteile_Hamburgs
<br/><br/>
* Percentage of families with children in each neighborhood **(31.12.2020)**. 
This information is downloaded from:
https://www.govdata.de/web/guest/suchen/-/details/statistisches-jahrbuch-hamburg-2020-2021
<br/><br/>
* Rental price of each neighborhood **(06/2021)**. 
This information is downloaded from:
https://www.wohnungsboerse.net/mietspiegel-Hamburg/3195 
<br/><br/>
* Information about the venues in each neighborhood **(06/2021)**. 
This information is gathered through the **Foursquare API**, which is used to get the most common venue categories in each neighborhood.
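
To illustrate what "most common venue categories" means in practice, here is a small sketch that counts categories per neighborhood. It does not call the Foursquare API; the (neighborhood, category) pairs are made-up stand-ins for the rows a venue search would return, and the function name is my own.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs. Names are placeholders.
venues = [
    ("A", "Playground"), ("A", "Coffee Shop"), ("A", "Coffee Shop"),
    ("A", "Supermarket"),
    ("B", "Park"), ("B", "Bus Stop"), ("B", "Park"),
]


def most_common_categories(pairs, top_n=3):
    """Return the top_n most frequent venue categories per neighborhood."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        name: [category for category, _ in counter.most_common(top_n)]
        for name, counter in counts.items()
    }


top = most_common_categories(venues, top_n=2)
print(top)
```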

## 3. Methodology
* **Collect and clean the data**. 
I will first collect all the needed information and clean the dataset, dealing with missing values, etc. <br/><br/>
* **Explore the data**.
Once the data is cleaned, I will do an exploratory data analysis. Descriptive analysis gives valuable insights into the distributions of the important features, such as ‘population density’, ‘rental price’ and ‘percentage of families with children’ in each neighborhood.
<br/><br/>
* **Set criteria to filter out the unwanted neighborhoods**. 
Betty really does not like crowded neighborhoods, and she would like to live in a neighborhood with many couples with children. Based on these two wishes, I first filter out the neighborhoods with a high population density and a low percentage of families with children. 
<br/><br/>
* **Visualize the neighborhoods that she might be interested in**. 
A Folium map is used to visualize the distribution of the neighborhoods that she might be interested in (green) and those she is not interested in (gray). This will give her a first impression of the locations that might fit her wishes.
<br/><br/>
* **Get the venue categories in each neighborhood**. 
The Foursquare API is used to get the most common venue categories in each neighborhood. One-hot encoding is used to put all the venue categories of each neighborhood into one data frame. Then, I will group the venues into the categories Betty wishes to have (see the list in the introduction).
<br/><br/>
* **Get the interested index**.
Based on her wishes, the interested index for each neighborhood is calculated as a weighted sum of the features, using the importance ratios agreed in the introduction:
**Interested index = Playground * 0.25 + Nature * 0.25 + Stores * 0.15 + Transportation * 0.15 + Food * 0.1 + Coffee * 0.1**
<br/><br/>
* **Cluster the neighborhoods**.
k-means is used to cluster the neighborhoods into 6 groups. The features are the ‘interested index’, ‘rental price’ and ‘percentage of families with children’ of each neighborhood. 
A Folium map is then used to visualize the locations of the 6 clusters. This will give Betty an overview of the 6 clusters.
<br/><br/>
* **Check each cluster and pick the one that fits Betty's wishes best**. 
Based on the principle of lower ‘rental price’, higher ‘percentage of families with children’ and higher ‘interested index’, I will pick the cluster that fits Betty's wishes best. A Folium map is then used to visualize the neighborhoods in that cluster, to give her an impression of where they are located. 
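
The clustering and cluster-inspection steps can be sketched as follows. This is a dependency-free illustration of Lloyd's algorithm for k-means, not the code used in the analysis; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice. The six rows are made up, `k=2` is used only because the toy data has two obvious groups (the project uses 6 clusters), and both the standardization step and the farthest-point initialization are my assumptions, chosen to keep the example deterministic.

```python
import statistics


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(statistics.fmean(c) for c in zip(*members))
    return labels


def cluster_means(rows, labels):
    """Mean of each original (unscaled) feature per cluster, for inspection."""
    result = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        result[lab] = tuple(round(statistics.fmean(c), 2) for c in zip(*members))
    return result


# Hypothetical rows: (interested index, rental price, % families with children).
rows = [
    (0.60, 11.0, 24.0), (0.55, 11.5, 23.0), (0.58, 10.8, 25.0),
    (0.20, 16.0, 14.0), (0.25, 15.5, 15.0), (0.22, 16.5, 13.0),
]

labels = kmeans(standardize(rows), k=2)
means = cluster_means(rows, labels)
print(labels)
print(means)
```

Standardizing first keeps the rental price, which has the largest raw numbers, from dominating the distance. The per-cluster means are then compared by eye against the three criteria, as described in the last step above.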

## 4. Analysis

### Thanks to the IBM Data Science certificate on Coursera!
This notebook is part of a course on **Coursera** called _Applied Data Science Capstone_.

## Change Log

| Date (YYYY-MM-DD) | Version | Changed By    | Change Description         |
| ----------------- | ------- | ------------- | -------------------------- |
| 2021-06-22        | 1.0     | Xiaxia        |                            |