# Finding the optimum spot for the Halewijn Award Ceremony in Haarlem, the Netherlands. 
#### Applied Data Science Capstone by IBM/Coursera
   
## *CRW Korver*, the Netherlands

December 15, 2019 \
email author: <crwkorver@hotmail.com>
***

## Table of contents
* [Introduction: Business Problem](#1)
* [Data](#2)
* [Methodology](#3)
* [Analysis](#4)
* [Results and Discussion](#5)
* [Conclusion](#6)

***



## Introduction: Business Problem <a name="1"></a>

The [**Halewijn Prijs**](https://nl.wikipedia.org/wiki/Halewijnprijs,_literatuurprijs_van_de_stad_Roermond) is a Dutch national award for literary talent whose published work, on the strength of its quality and irresistibility, deserves extra attention. The prize consists of a cash sum and a small bronze statue made by Dick van Wijk.

Recently, it was decided that the yearly Award Ceremony will take place in the Haarlem region of the Netherlands. To maximize public attendance, the ideal neighborhood would be close to the railway station, which makes it easy to travel by public transport in a car-crowded city like Haarlem. Furthermore, to maximize potential book sales, the location should have book shops nearby. Finally, ample restaurants should be present as well, to accommodate the audience (and the winner and jury members, who will dine together afterwards).

The aim of this project is to find an optimal location for the Award Ceremony. The report is targeted at the organizing committee of the Foundation Circle Halewijn, but could also be useful to other stakeholders such as book retailers and literary organizations.

First, we will locate **existing book shops in the Haarlem region**. Second, we are interested in **areas with ample restaurants in the vicinity**. The results will be combined and finally confined to an area **within walking distance (2.5 kilometers) of the Central Railway Station**.

Some eligible neighborhoods, based on the above-mentioned criteria, will be presented. Additional advantages of each area will be given to support a well-founded choice of location for the forthcoming Halewijn Award Ceremony in Haarlem.


## Data <a name="2"></a>

In order to meet the above-mentioned criteria, we need to gain insight into
* the number of existing book shops in the various neighborhoods of Haarlem;
* the distance of these neighborhoods from the Central Railway Station;
* the number of and distance to restaurants in the neighborhood, if any;
* the number of popular spots in each neighborhood.

Definition of neighborhoods is based on the Dutch Postal Code dictionary.

The following data sources were used to extract the information required:
* **Nederlandse Postcodetabel** > Dutch Postal Code Dictionary including coordinates, in Excel xlsx format ([documentation](http://www.sqlblog.nl/postcodetabel-nederland-sql-script/)). 
    NB: A Dutch Postal Code API is [available](https://www.postcodeapi.nu/?gclid=EAIaIQobChMIpPeUz86P5gIVC553Ch2RcwO5EAAYASAAEgLMEvD_BwE), but only as a paid service.
* **Foursquare API** > number and location of book shops and restaurants in every neighborhood. ([documentation](https://developer.foursquare.com/docs))
* **GeoPy** > a Python 3 client for third-party geocoding web services: coordinates of Haarlem Railway Station, book shops, restaurants as well as popular spots. ([documentation](https://geopy.readthedocs.io/en/stable/))
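
As an illustration of how the Foursquare source could be queried, the sketch below builds a request URL for the legacy v2 venue search endpoint. It is not the code used in this project: the helper name `build_search_url`, the placeholder credentials, the version date, the search term and the example coordinates are all assumptions made for the example. No request is sent.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 venue search endpoint.
FOURSQUARE_SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, query, radius_m=500, limit=50,
                     client_id="YOUR_CLIENT_ID",
                     client_secret="YOUR_CLIENT_SECRET",
                     version="20191215"):
    """Return a search URL for venues matching `query` around (lat, lng).

    Hypothetical helper: credentials and version date are placeholders.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, format YYYYMMDD
        "ll": f"{lat},{lng}",    # latitude,longitude of the search center
        "query": query,          # free-text search term
        "radius": radius_m,      # search radius in meters
        "limit": limit,          # maximum number of venues to return
    }
    return f"{FOURSQUARE_SEARCH_URL}?{urlencode(params)}"


# Hypothetical neighborhood center, for illustration only.
url = build_search_url(52.38, 4.64, "bookstore")
print(url)
```

The same helper could be called once per neighborhood center, with a second search term for restaurants.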

This section covers the data only: the coordinates of all postcodes in Haarlem, book shops from the Foursquare API, and a map containing the station, book shops and popular spots.



## Methodology <a name="3"></a>

In this project we will identify areas of Haarlem with a high book shop density, particularly those with restaurants present. The analysis will be limited to an area of approximately 2.5 km around the Central Station.

First, the city of Haarlem was split up into Postal Code neighborhoods. For each neighborhood, data were collected (number of book shops and their locations). Then, for the resulting top 5 neighborhoods, the number of restaurants present was extracted. Finally, the results were further confined to only those neighborhoods within walking distance of the Central Station.
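
The selection described above (rank neighborhoods by book shop count, keep the top 5, then keep only those within walking distance of the station) can be sketched as follows. This is a minimal illustration, not the project's actual code: the neighborhood labels, centers and counts are invented, the station coordinates are approximate, and the great-circle (haversine) distance stands in for real walking distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# Approximate coordinates of Haarlem railway station (assumption).
STATION = (52.3878, 4.6383)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def shortlist(neighborhoods, station, top_n=5, max_km=2.5):
    """Keep the top_n neighborhoods by book shop count, then drop those
    whose center lies farther than max_km from the station."""
    ranked = sorted(neighborhoods.items(),
                    key=lambda item: item[1][1], reverse=True)[:top_n]
    return [name for name, (center, _count) in ranked
            if haversine_km(center, station) <= max_km]


# Hypothetical data: label -> ((lat, lon) of center, number of book shops).
neighborhoods = {
    "N1": ((52.3810, 4.6360), 6),
    "N2": ((52.3900, 4.6500), 4),
    "N3": ((52.3500, 4.6000), 5),  # many book shops, but far from the station
    "N4": ((52.3700, 4.6300), 3),
    "N5": ((52.4000, 4.6400), 2),
    "N6": ((52.3850, 4.6350), 1),  # close by, but outside the top 5
}

selected = shortlist(neighborhoods, STATION)
print(selected)
```

Straight-line distance underestimates the actual walking route, so a stricter threshold may be preferable in practice.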

The second step will be the calculation and exploration of '**book shop density**' across different areas of Haarlem. We will use Folium to visually identify promising areas close to the Railway Station and with a high number of restaurants. Results will be confined to those areas.

Finally, we will focus on 1-3 promising areas and extract potential popular spots, if any. 
We will present a map with these locations.
This will enable the organizing committee of the Halewijn Foundation to do on-site research and decide where the Halewijn Award Ceremony will be held. 


## Analysis <a name="4"></a>

We start with exploratory data analysis, deriving additional information from the raw data, such as the **number of book shops in each candidate area**.

The analysis proceeds in the following steps:
1. Number of book shops in each postcode neighborhood
2. Distance from each neighborhood center to the station
3. Visualization
4. Number of restaurants in the vicinity (radius of 500 meters)
5. Map with optimum locations
6. K-means clustering of locations


Next, we **cluster** the locations to identify **centers with high book shop concentrations**. These zones, their centers and addresses will be the final result of the analysis. 
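
To make the clustering step concrete, here is a minimal sketch of k-means (plain Lloyd's iteration) in pure Python. It is an illustration under stated assumptions rather than the project's implementation: the coordinates are invented, the first `k` points seed the centroids so the result is reproducible, and latitude/longitude are treated as planar coordinates, which is only a rough approximation at city scale. In practice a library routine such as scikit-learn's `KMeans` would likely be used instead.

```python
from math import dist  # available from Python 3.8


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, labels


# Hypothetical book shop coordinates (lat, lon) forming two groups.
shops = [
    (52.380, 4.630),
    (52.395, 4.650),
    (52.381, 4.631),
    (52.396, 4.651),
    (52.379, 4.629),
]

centroids, labels = kmeans(shops, k=2)
print(centroids)
print(labels)
```

Each resulting centroid is a candidate center of a zone with a high book shop concentration; the number of clusters `k` is a free choice in this sketch.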


## Results and Discussion <a name="5"></a>

Our analysis shows bla bla


## Conclusion <a name="6"></a>

From the results so far, we might conclude that optimal neighborhoods for the Halewijn Award Ceremony comprise ***
These neighborhoods are located within walking distance of the Central Railway Station and contain many book shops, while more than 5 restaurants are in the immediate vicinity. 
The final decision on optimal location might depend on specific characteristics of these neighborhoods, e.g. proximity to popular spots. 