# Capstone Project - The Battle of Neighborhoods (Week 1)

Adam Rubins

## Introduction

This is my final project for the [IBM Data Science Professional Certificate program](https://www.coursera.org/professional-certificates/ibm-data-science), submitted in the course [“Applied Data Science Capstone”](https://www.coursera.org/learn/applied-data-science-capstone) on [Coursera](https://www.coursera.org/). The project is peer-reviewed. Its objective is to come up with a creative idea that leverages Foursquare location data to explore or compare neighborhoods or cities or, alternatively, to come up with a problem that can be solved using Foursquare location data.

### Business Problem

#### Project Goal

My project is devoted to exploring and clustering elementary schools in Tel Aviv (a major city located in the center of Israel), based on their proximity to other venues that are of particular interest to the client. The project aims to provide clients, who wish to live in close proximity to a school, with data concerning other venues in the area, in order to assist them in understanding and narrowing down their choices and eventually deciding where to rent an apartment. 

#### Background

I was inspired to create this project by a friend of mine who is interested in moving with his family to Tel Aviv. My friend, who currently lives in a suburban area, works in Tel Aviv and spends hours in traffic each day. He and his wife have decided to move to the city, sell their car and travel mainly on foot or, if necessary, by public transportation. Although he works in Tel Aviv, my friend is unfamiliar with most of its neighborhoods and has asked for my advice on the best area in which to search for an apartment.

Although Tel Aviv is my hometown, I soon realized how easily one could become overwhelmed when approaching such a decision. Tel Aviv is Israel’s second most populous city, with approximately 450,000 residents. It is the financial capital of Israel, with the largest economy per capita in the Middle East. It hosts countless venues, including numerous schools and other educational institutions. Its many neighborhoods differ greatly and offer a wide variety of living environments.

My friend and his wife have no special financial limitations, as their employers subsidize their rent. However, they have a rather **long list of preferences**, as they do not wish to rely on private transportation and are looking for the most suitable environment in which to raise their two children: a six-year-old boy who is about to start elementary school and a baby girl. They also have a dog. They wish to live in an area where many venues are within walking distance, which they consider to be approximately 500 meters. They are looking for a child-friendly environment, preferably a neighborhood with a relatively young population and many families with children.

The most fundamental requirement, as far as my friend and his wife are concerned, is to rent a place adjacent to an elementary school. However, they would also like to live near a daycare for toddlers; a playground for toddlers; a kindergarten; green areas; a dog garden; a pizzeria and an ice cream parlor for their weekly “family tradition”; and, preferably, a neighborhood bar (or pub), so that they can go on dates and return home quickly in case of an emergency. Additionally, my friend’s background is somewhat traditional. He therefore strongly prefers not to expose his children to shady places such as strip clubs and would rather live far away from them.

As indicated above, one of my friend’s main concerns is education. In Israel, there are different types of schools (public/private; Jewish/non-Jewish; secular/religious), and the education a student receives may differ greatly depending on the school in which he or she is enrolled. My friend, for instance, wants to send his son to a public, Jewish, secular school, and does not know which schools in Tel Aviv fit these criteria. The Israeli Board of Education publishes this data. The Board of Education also [provides data](http://ic.education.gov.il/QvAJAXZfc/opendoc_pc.htm?document=ShkifutReports.qvw&host=qvsprodlb&sheet=SH02&lang=en-US) on the quality of each school (test scores, etc.); however, when I attempted to integrate this data, I realized that it was partial and outdated (last updated in 2017). I consulted the “client” (i.e., my friend), and he asked me to discard this data, as he is currently less interested in the quality of the school and more focused on finding a neighborhood that meets the holistic needs of his family.

My project was originally intended to help my friend find a suitable place to live by addressing his specific needs. However, the parameters he has set are not unique: many of my friends consider similar parameters when deciding where to live, and it is easy to adapt this analysis to different parameters. I believe that this project could help them make informed decisions about where to live. Thus, while I initially planned a project in the field of finance aimed at a business-oriented audience, I decided that it would be preferable to focus on a concrete problem that helps an actual client, rather than on an abstract problem designed for theoretical clients. In so doing, I aspire to create a project that has real practical value without compromising the main objective of the assignment, which is to demonstrate my skill set in using location data.

Therefore, the objective of my project is to explore and segment schools in Tel Aviv based on the parameters that the client has set. The project is designed so that the client receives the relevant data on each cluster, allowing him to understand his options, narrow down his choices and reach an informed decision independently.


### Target Audience - the clients

My project was designed for a specific client: my friend, a 32-year-old married man with two children and a pet dog. This is a middle-class family that intends to move to Tel Aviv. However, the target audience for this project is far wider and includes young families that wish to live in Tel Aviv, in close proximity to an elementary school and in a child- and family-friendly environment.

**The Client's Particular Interest**

The client's main particular interest is proximity to elementary schools. These schools must meet several specific parameters:
  * State (public); ownership: city.
  * Stream of education: Jewish 
  * Type of education: regular (secular)
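
To make this filter concrete, here is a minimal Python sketch. It assumes the school records from the GIS layer have already been translated into dictionaries with the hypothetical keys `supervision`, `ownership`, `stream` and `education_type`; the real layer uses phonetic-Hebrew field names, and the values below are illustrative rather than the layer's actual vocabulary.

```python
from typing import Dict, List

# Hypothetical translated field names and values; the actual layer uses
# phonetic-Hebrew field names that must be translated first.
CLIENT_CRITERIA: Dict[str, str] = {
    "supervision": "State",       # public school
    "ownership": "City",          # owned by the municipality
    "stream": "Jewish",           # stream of education
    "education_type": "Regular",  # secular
}


def matches_criteria(school: Dict[str, str], criteria: Dict[str, str]) -> bool:
    """True if every criterion field equals the required value (case-insensitive)."""
    return all(
        school.get(field, "").strip().lower() == value.lower()
        for field, value in criteria.items()
    )


def filter_schools(schools: List[Dict[str, str]],
                   criteria: Dict[str, str] = CLIENT_CRITERIA) -> List[Dict[str, str]]:
    """Keep only the schools that satisfy all of the client's criteria."""
    return [s for s in schools if matches_criteria(s, criteria)]


if __name__ == "__main__":
    sample = [
        {"id": "1", "supervision": "State", "ownership": "City", "stream": "Jewish", "education_type": "Regular"},
        {"id": "2", "supervision": "State", "ownership": "City", "stream": "Jewish", "education_type": "Religious"},
        {"id": "3", "supervision": "Private", "ownership": "Other", "stream": "Jewish", "education_type": "Regular"},
    ]
    print([s["id"] for s in filter_schools(sample)])  # ['1']
```

Keeping the criteria in a single dictionary makes it easy to adapt the analysis to another family's preferences.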
  
There are additional parameters that are of particular interest to the client, as listed below. He wishes to live within walking distance of these venues, which he considers to be 500 meters. The client strongly prefers a large variety of these venues of interest in close proximity (for example, a wide variety of kindergartens and daycares is a great advantage, as many may have long waiting lists):
* Kindergartens (with the same specific parameters listed above for elementary schools)
* Playgrounds with at least one facility for toddlers (mostly interested in the number of playground facilities available nearby)
* Green public areas (mostly interested in the total green area, in square meters)
* Dog gardens
* Pizza places
* Ice cream parlors
* Pubs/bars (specific types of venues that are of no interest to the client were filtered out, as detailed in the Foursquare data collection section of the full report).
* The client does not wish to live in close proximity to strip clubs.
* Population distribution by age: the client prefers to live in a relatively young neighborhood and considers residents aged up to 34 to be young.
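
The 500-meter walking radius can be expressed directly in code. The sketch below is illustrative, not the project's actual implementation: it assumes venues are given as (latitude, longitude) pairs, uses the haversine great-circle distance to count point venues within the radius, and approximates each green area with the shoelace formula on a local flat projection. It treats a park as within walking distance if any of its vertices is, which is a simplification of measuring the distance to the park's boundary. All function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters
WALKING_RADIUS_M = 500.0      # the client's definition of walking distance

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points, in meters."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def count_within(school: LatLon, venues: Iterable[LatLon],
                 radius_m: float = WALKING_RADIUS_M) -> int:
    """Number of point venues (e.g. kindergartens, dog gardens) within radius_m of the school."""
    return sum(1 for v in venues if haversine_m(school, v) <= radius_m)


def polygon_area_m2(ring: List[LatLon]) -> float:
    """Approximate area of a small polygon, in square meters.

    Projects the ring onto a local equirectangular plane at its mean latitude
    and applies the shoelace formula; adequate for city-scale parks.
    """
    lat0 = math.radians(sum(lat for lat, _ in ring) / len(ring))
    pts = [(math.radians(lon) * EARTH_RADIUS_M * math.cos(lat0),
            math.radians(lat) * EARTH_RADIUS_M) for lat, lon in ring]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice_area) / 2.0


def green_area_within(school: LatLon, parks: Iterable[List[LatLon]],
                      radius_m: float = WALKING_RADIUS_M) -> float:
    """Total area of the parks that have at least one vertex within radius_m of the school."""
    return sum(polygon_area_m2(ring) for ring in parks
               if any(haversine_m(school, v) <= radius_m for v in ring))
```

A projected coordinate system and a GIS library would give more precise areas and edge distances; for comparing parks within one city, this approximation is usually sufficient.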

### Data introduction

I have used two main data sources. The first is **[TLV OpenData](https://opendata.tel-aviv.gov.il/en/Pages/home.aspx)**, a free, publicly available website provided by the Tel Aviv-Yafo Municipality. This website is dedicated to exposing the public datasets of the Tel Aviv-Yafo Municipality and is a great resource for data about Tel Aviv. However, many of its APIs return results in Hebrew that need to be translated.

I use this website to retrieve the following data:

1.	[Population distribution by city neighborhoods](https://opendata.tel-aviv.gov.il/en/pages/item.aspx?ids=1) – the data format is Google Sheets JSON. I use this data to calculate the proportion of young people (aged up to 34) in each neighborhood and will eventually match each school to the age distribution of the population in its neighborhood. Unfortunately, the most recent data available is from 2017. After consulting with the client, I will not use this data in the clustering process. However, I will use it to give the client an indication of the age distribution of the population, as well as to filter out schools located in neighborhoods with no residents aged up to 34.

1.	[API - developer portal](https://apiportal.tel-aviv.gov.il/) – among other things, this portal provides [GisLayers](https://apiportal.tel-aviv.gov.il/docs/services/59493e269f9e531074c17205/operations/59493ec69f9e531074c17209): a REST API for GIS (Geographic Information System) layers covering the municipality's geographic data. This data is provided in Hebrew and will be translated using Google Cloud Translate (see below). From this API, I will request the following JSON data:

   1.   Green areas (layer code 503, geometry: polygons) – for each school (from the Elementary schools 2021 data), I will calculate the distance to each green area and sum the total area (in square meters) of all green areas whose territory begins within 500 meters of the school.
   1. Neighborhoods (layer code 511, geometry: polygons) – I will match each school to the neighborhood in which it resides and will incorporate the population age distribution data to allow filtering out those schools that reside in neighborhoods with no young residents.
   1. Dog gardens (layer code 586, geometry: points) – for each school, I will count the number of dog gardens that are within 500 meters.   
   1. Kindergartens 2021 (layer code 598, geometry: points) – for each school, I will count the number of kindergartens that are within 500 meters.     
   1.	Elementary schools 2021 (layer code 599, geometry: points) - I will filter the data in accordance with the preferences set by the client and will use them as a proxy for their preferred place of residence.  
   1.	Recognized daycares for toddlers (layer code 624, geometry: points) – for each school, I will count the number of daycares that are within 500 meters.     
   1.   Playgrounds (layer code 696, geometry: points) – I will keep only playgrounds with at least one facility for toddlers and will sum up the total number of playground facilities within 500 meters of each school.
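
Matching each school to its neighborhood (layer 511) and applying the age filter from the population data can be sketched with a ray-casting point-in-polygon test. This is a dependency-free illustration, assuming neighborhood polygons are single rings of (latitude, longitude) vertices and that the population data has been reshaped into per-neighborhood counts keyed by hypothetical age-group labels such as `"0-34"`; a GIS library such as Shapely would normally handle polygons with holes or multiple parts.

```python
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def point_in_polygon(point: LatLon, ring: List[LatLon]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon ring."""
    lat, lon = point
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lon1 = ring[i]
        lat2, lon2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:  # crossing lies east of the point
                inside = not inside
    return inside


def find_neighborhood(school: LatLon, neighborhoods: Dict[str, List[LatLon]]) -> Optional[str]:
    """Name of the first neighborhood polygon containing the school, or None."""
    for name, ring in neighborhoods.items():
        if point_in_polygon(school, ring):
            return name
    return None


def young_share(age_counts: Dict[str, int], young_groups: List[str]) -> float:
    """Proportion of residents who fall in the given (hypothetical) young age groups."""
    total = sum(age_counts.values())
    young = sum(age_counts.get(group, 0) for group in young_groups)
    return young / total if total else 0.0


def keep_school(school: LatLon, neighborhoods: Dict[str, List[LatLon]],
                population: Dict[str, Dict[str, int]], young_groups: List[str]) -> bool:
    """Drop a school only if its neighborhood is known to have no young residents."""
    name = find_neighborhood(school, neighborhoods)
    if name is None or name not in population:
        return True  # design choice of this sketch: no data means no evidence to filter
    return young_share(population[name], young_groups) > 0
```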


The second main data source I use is the **Foursquare [Places API](https://developer.foursquare.com/docs/api)**, which offers real-time access to Foursquare’s global database of venue data and user content. As this is part of the project requirements, it was the first source I explored; however, it contained hardly any data on schools and other educational venues in Tel Aviv. Therefore, I searched for the missing data in the above-mentioned TLV OpenData, while using the Foursquare Places API for data on the other categories. I used the Foursquare Places API [explore endpoint](https://developer.foursquare.com/docs/api/venues/explore) to find a list of the relevant venues near each school location. I only requested the following [venue categories](https://developer.foursquare.com/docs/resources/categories):

* For the client’s family weekly tradition:
  *	Ice cream shop (category id.: 4bf58dd8d48988d1c9941735) – I group all venues that are returned under “ice cream”.
  *	Pizza place (category id.: 4bf58dd8d48988d1ca941735) – I group all venues that are returned under “pizza”.
* For fun for parents – I group all venues that are returned under both categories below under “parents’ fun”, apart from venue types that the client has instructed me to ignore (e.g., restaurants, hotels; the full list will be detailed in the filtering section of the full report):
  * Bar (category id.: 4bf58dd8d48988d116941735)
  *	Pub (category id.: 4bf58dd8d48988d11b941735)
* Client’s request not to live near strip clubs (category id.: 4bf58dd8d48988d1d6941735)
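
The per-category requests can be sketched with the standard library alone. The sketch builds one explore URL per school and category, assuming the `client_id`, `client_secret`, `v`, `ll`, `radius`, `categoryId` and `limit` query parameters described in the v2 documentation linked above (worth checking against the current API before use), and then counts the returned items per group. The version date, helper names and response shape (`response.groups[].items[]`) are assumptions, and the filtering of excluded bar and pub types is not shown.

```python
from typing import Dict, Mapping
from urllib.parse import urlencode

# Explore endpoint linked above; parameter names follow its v2 documentation.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

# Category ids listed above, mapped to the groups used in the analysis.
CATEGORY_GROUPS: Dict[str, str] = {
    "4bf58dd8d48988d1c9941735": "ice cream",     # Ice cream shop
    "4bf58dd8d48988d1ca941735": "pizza",         # Pizza place
    "4bf58dd8d48988d116941735": "parents' fun",  # Bar
    "4bf58dd8d48988d11b941735": "parents' fun",  # Pub
    "4bf58dd8d48988d1d6941735": "strip club",    # Strip club
}


def build_explore_url(client_id: str, client_secret: str, lat: float, lon: float,
                      category_id: str, radius_m: int = 500, limit: int = 50,
                      version: str = "20210101") -> str:
    """URL for one explore request: venues of a single category around one school."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,  # API version date (placeholder value)
        "ll": f"{lat},{lon}",
        "radius": radius_m,
        "categoryId": category_id,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def count_venues(explore_json: Mapping) -> int:
    """Count the venue items in a parsed explore response."""
    groups = explore_json.get("response", {}).get("groups", [])
    return sum(len(group.get("items", [])) for group in groups)


def group_counts(responses: Mapping[str, Mapping]) -> Dict[str, int]:
    """Sum venue counts per analysis group; `responses` maps category id to parsed JSON."""
    counts: Dict[str, int] = {}
    for category_id, payload in responses.items():
        group = CATEGORY_GROUPS[category_id]
        counts[group] = counts.get(group, 0) + count_venues(payload)
    return counts
```

Issuing one request per category, rather than mapping each returned venue's category id, mirrors the grouping described above: everything returned for the bar and pub requests lands under “parents’ fun”, including subcategories.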

After gathering all of the relevant information for each school, I use it to segment the schools with the k-means clustering algorithm. This will allow me to characterize each cluster and, hopefully, to find the clusters most relevant to the client. Either way, it will help the client understand all the available options. I will try to highlight at least one school in each cluster that is most compatible with the parameters set by the client. Additionally, I will create a function that allows the client to get a full report for any school ID.
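
For readers unfamiliar with the algorithm, k-means (Lloyd's algorithm) alternates between assigning each school to its nearest centroid and moving each centroid to the mean of the schools assigned to it. The dependency-free sketch below also min-max scales the features first, so that, for instance, green area in square meters does not dominate the venue counts; in practice scikit-learn's `KMeans` would typically be used instead. The feature layout (one row of numbers per school) is an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Scale each feature column to [0, 1] so no single feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]  # avoid dividing by 0
    return [[(x - lo) / s for x, lo, s in zip(row, lows, spans)] for row in rows]


def kmeans(points: Sequence[Vector], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]  # k distinct starting points
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because the result depends on the starting centroids, running with several seeds and keeping the lowest within-cluster distance is a common safeguard.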

In addition to these two main sources, I use two Google APIs for the following purposes:

* Google Maps – used to retrieve Tel Aviv-Yafo coordinates (latitude and longitude) and center most of my visualizations according to these coordinates. 
* Google Cloud Translate Client – most of the data returned by the Tel Aviv Municipality API is in Hebrew. Even the field names that are written in English letters are mostly phonetic Hebrew. Even though my client and I are fluent in Hebrew, in order to facilitate peer review of this analysis, I will translate into English the field names and any field values that are used for filtering or for understanding the data analysis. I will not translate the names of the venues (such as schools, daycares, kindergartens, etc.) and will use system IDs in the report. Because manually translating all of the relevant data would be tedious, I use the **Google Cloud Translate client** to do the translation for me. When it does not perform properly, I manually update my dictionary.
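
The translate-then-correct workflow can be sketched as a dictionary builder that calls the translator once per distinct term and lets manual entries override it. The translator is passed in as a function so that the logic can be tested without credentials; `google_translator` wraps the `translate_v2` client from the `google-cloud-translate` package and assumes credentials are already configured. The override entry is a hypothetical example, not the project's actual dictionary.

```python
from typing import Callable, Dict, Iterable

# Hypothetical manual corrections (source term -> English); the project's
# actual dictionary is not reproduced here.
MANUAL_OVERRIDES: Dict[str, str] = {
    "shem_mosad": "institution_name",
}


def build_translation_dict(terms: Iterable[str],
                           translate: Callable[[str], str],
                           overrides: Dict[str, str] = MANUAL_OVERRIDES) -> Dict[str, str]:
    """Translate each distinct term once; manual overrides take precedence."""
    mapping: Dict[str, str] = {}
    for term in terms:
        if term in mapping:
            continue  # already handled: avoid a repeated API call
        if term in overrides:
            mapping[term] = overrides[term]
        else:
            mapping[term] = translate(term)
    return mapping


def google_translator(target: str = "en") -> Callable[[str], str]:
    """Wrap the google-cloud-translate v2 client (requires configured credentials)."""
    from google.cloud import translate_v2 as translate  # imported lazily
    client = translate.Client()
    return lambda text: client.translate(
        text, target_language=target, format_="text")["translatedText"]
```

For example, `df.rename(columns=build_translation_dict(df.columns, google_translator()))` would rename the columns of a hypothetical pandas DataFrame `df` holding one municipal layer in a single step.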
