# Coursera - Applied Data Science Capstone
## Week 4
### The Battle of Neighborhoods

This notebook is part of the Week 4 assignment for the Applied Data Science Capstone course on Coursera.

### Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li>Introduction</li>
    <li>Data</li>
    <li>Methodology</li>
    <li>XXX</li>
    <li>XXX</li>
</ol>
    
</div>
 
<hr>

### <b>1. Introduction</b>

Universities in the UK are among the best in the world and attract thousands of students from around the globe every year. Most attention usually goes to the larger, better-known cities such as London, Cambridge or Oxford (Figure 1).

<img src="UniRanking.PNG" width=1000/>


<center>Figure 1. University League Table 2020 <a href=https://www.thecompleteuniversityguide.co.uk/league-tables/rankings>[1]</a></center> <br>


But what about other parts of the UK? One region where students can study at some of the top universities is the East Midlands, one of the nine official regions of England, located in the eastern part of central England. It comprises the counties of Derbyshire, Leicestershire, Rutland, Nottinghamshire, Lincolnshire and Northamptonshire and is home to the beautiful Peak District <a href=https://en.wikivoyage.org/wiki/East_Midlands>[2]</a>.


Drawing on experience of university life both abroad and here in the UK, this project aims to give students in the East Midlands some additional information for their life at university. While there are many things to consider when starting your studies, one important aspect is housing. For many students, it is the first time living away from home and taking responsibility for things such as paying bills. We will therefore look at places to rent, concentrating on the city of Nottingham. In addition, we will investigate other aspects that might matter to students when choosing where to live, such as entertainment, proximity to university, and public transport.

### <b>2. Data</b>

For this project, we require data about the properties available for rent in Nottingham, geolocation data, and information about nearby things to do. We will use datasets from the following sources:

1. Data from <a href=https://www.zoopla.co.uk/ rel="nofollow" >Zoopla</a>. Zoopla is one of the UK's largest online real estate portals, listing properties for rent and for sale. Its data can be accessed for further processing through the <a href=https://developer.zoopla.co.uk/ rel="nofollow" >Zoopla API</a>. We use the Python wrapper <a href=https://pypi.org/project/zoopla/ >zoopla</a>, which makes working with the API easier. One limitation to note is that the Zoopla API caps each page of results at 100 listings. To obtain a larger working dataset, the API call was repeated for several result pages.

2. To plot the properties in their respective districts, the <a href='https://www.opendatanottingham.org.uk/dataset.aspx?id=160' >data</a> for the 2019 electoral ward boundaries of Nottingham City were downloaded as a JSON file and used in conjunction with a folium map.

3. The <a href='https://developer.foursquare.com/' >Foursquare City Guide Developer API</a> provides us with things to do in Nottingham, such as a list of places to eat, shop and visit.

#### <b>2.1 Load Dependencies</b>

In [5]:
%%capture

# install third-party packages first (their output is hidden by %%capture)
!pip install lxml html5lib beautifulsoup4
!pip install seaborn
!pip install zoopla

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from json import JSONDecoder
from scipy import stats
from zoopla import Zoopla

%matplotlib inline

# note: %%capture also suppresses this message
print('Libraries imported.')

#### <b>2.2 Scrape required Data from Websites</b>

<b>_a. Zoopla_</b>

In [2]:
# read the API key from a local file (the first line holds the key)
with open("zooplaAuthentication.txt", "r") as key_file:
    lines = key_file.readlines()

In [3]:
# make call to Zoopla API
# strip() removes the trailing newline that readlines() keeps on each line
zoopla = Zoopla(api_key=lines[0].strip())

search_res = zoopla.property_listings({
    'page_size': 100,
    'page_number': 1,
    'listing_status': 'rent',
    'area': 'Nottingham',
    'summarised': 'yes',
    'radius': 0.1,
    'new_homes': 'no'
})
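
The call above retrieves only the first page of results. As noted in the Data section, the API caps each page at 100 listings, so the call was repeated for several pages. The following is a minimal sketch of such a loop, not the exact code used for this project. The helper `fetch_all_pages` and the caller-supplied `fetch_page` function are hypothetical names introduced for illustration and are not part of the zoopla package. A stand-in data source is used so the sketch runs without an API key.

```python
def fetch_all_pages(fetch_page, max_pages=5, page_size=100):
    """Collect results page by page until a short page or max_pages is reached.

    fetch_page is a caller-supplied function (hypothetical, not part of the
    zoopla package) that takes (page_number, page_size) and returns a list.
    """
    results = []
    for page_number in range(1, max_pages + 1):
        batch = list(fetch_page(page_number, page_size))
        results.extend(batch)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(batch) < page_size:
            break
    return results


# Stand-in data source so the sketch runs without an API key.
fake_listings = list(range(250))


def fake_fetch(page_number, page_size):
    start = (page_number - 1) * page_size
    return fake_listings[start:start + page_size]


all_listings = fetch_all_pages(fake_fetch)
print(len(all_listings))
```

With the Zoopla wrapper, `fetch_page` could wrap the `zoopla.property_listings` call shown above, passing `page_number` and `page_size` into the query dictionary and returning the `listing` attribute of the result. That attribute name is taken from the wrapper's README and should be checked against the installed version.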

<b>_b. Wikipedia_</b>

In [4]:
# scrape the Wikipedia page on the NG postcode area to obtain its table of postal codes
# (pd.read_html returns a list with one DataFrame per HTML table found on the page)
urlWNG = 'https://en.wikipedia.org/wiki/NG_postcode_area'
dfs = pd.read_html(urlWNG)
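
Because `pd.read_html` returns a list with one DataFrame per table on the page, the postcode table still has to be picked out of `dfs`. Below is a minimal sketch of one way to do that. The helper `pick_table` is a hypothetical name, not part of pandas, and the column name used to recognise the table is an assumption that should be checked against the live page.

```python
def pick_table(tables, required_columns):
    """Return the first table whose columns include every required name.

    Works with any objects exposing a `.columns` attribute, such as the
    DataFrames returned by pd.read_html. Hypothetical helper, not a pandas API.
    """
    required = set(required_columns)
    for table in tables:
        column_names = {str(name) for name in table.columns}
        if required.issubset(column_names):
            return table
    raise ValueError(f"No table with columns {sorted(required)}")
```

Assuming the postcode table has a column called `Postcode district`, it could be selected with `pick_table(dfs, ['Postcode district'])`. Raising an error when no table matches makes a change in the page layout visible immediately instead of silently selecting the wrong table.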

##### Now that we have collected the data we need, let's take a closer look at them in the next section. 

### <b>3. Methodology</b>

<b>Analyse Data</b>

##### Let's have a look at the data we collected. We will inspect the data and run some statistical analysis to better understand which further steps are needed.

_to follow..._