This notebook presents an analysis of Barcelona's second-hand housing market. Barcelona's real estate market has been analyzed many times: it is strongly influenced by foreign investors, so it tends to show more activity, and somewhat different prices, than surrounding areas where foreign demand is low.

Urban residential house prices depend on two broad factors:
1. tangible factors: characteristics of the house
2. intangible factors: neighborhood characteristics, services, and environment.

In other words, house prices depend not only on the characteristics of the dwelling but also on its residential environment and location, and the relationship between housing preferences and neighborhood characteristics is complex.
It is hard to isolate the contribution of each neighborhood attribute to the property price, because the attributes of the residential environment are highly correlated with each other. It is also necessary to consider that distinct housing submarkets exist, in which dwellings in one submarket are not realistic substitutes for dwellings in another.
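
One common way to quantify how strongly such attributes overlap is the variance inflation factor: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing attribute j on all the others; equivalently, VIF_j is the j-th diagonal element of the inverse correlation matrix. The sketch below is only an illustration on synthetic random data: the attribute names are hypothetical and the values are not Barcelona measurements.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = observations, columns = attributes).

    Uses the identity VIF_j = [inv(R)]_jj, where R is the correlation matrix of X.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic, hypothetical neighborhood attributes (illustration only)
rng = np.random.default_rng(0)
n = 500
n_restaurants = rng.normal(size=n)
n_bars = n_restaurants + 0.1 * rng.normal(size=n)  # almost a copy of n_restaurants
green_space = rng.normal(size=n)                   # generated independently of the other two

X = np.column_stack([n_restaurants, n_bars, green_space])
vif = variance_inflation_factors(X)
print(vif.round(1))
```

The first two VIFs come out far above 10 (a common rule-of-thumb threshold), while the third stays close to 1: the joint effect of restaurants and bars on price could be estimated, but not their separate contributions.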

Since the 1970s, numerous neighborhood externalities have been evaluated for their impact on residential property values, including rail transit stations, greenbelts and open spaces, brownfields, churches, and landfills.
However, fewer studies (especially in Europe) consider the impact of commercial property development on residential property values.

Prime examples of commercial property that raises the property values of a neighborhood are lively nightlife venues, art galleries, and coffee shops. Reports have shown, for example, that a small neighborhood movie theater can increase a home's property value by as much as <a href="http://homeguides.sfgate.com/effects-commercial-property-residential-value-7923.html">14 to 30 percent</a>. Small movie theaters are very welcome in residential areas, especially in trending inner-city neighborhoods. Grocery stores have a similar impact, and homes <a href="https://www.zillow.com/research/starbucks-home-value-appreciation-8912/">near Starbucks locations appreciate</a> faster than the typical U.S. home.

This analysis, which can be used by Barcelona home buyers and sellers, real estate agents, and investors, aims to uncover the effects of different venues (obtained using the Foursquare API) on the second-hand housing market. These effects are assessed at the neighborhood level.
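
Assessing venues at the neighborhood level means turning a list of individual venues into one row per neighborhood. A minimal sketch, assuming the venues have already been collected into a DataFrame with hypothetical columns `Barrio` and `Venue Category` (the Foursquare request itself is not shown, and the venue rows are made up), computes each neighborhood's share of venues per category with `pd.crosstab` and joins it to the prices:

```python
import pandas as pd

# Made-up example venues; in the notebook these would come from the Foursquare API
venues = pd.DataFrame({
    'Barrio': ['el Raval', 'el Raval', 'el Raval', 'la Barceloneta', 'la Barceloneta'],
    'Venue Category': ['Bar', 'Café', 'Bar', 'Beach', 'Bar'],
})

# One row per neighborhood, one column per venue category;
# normalize='index' turns counts into each neighborhood's share of venues
venue_share = pd.crosstab(venues['Barrio'], venues['Venue Category'], normalize='index')

# Small price table keyed by neighborhood name
prices = pd.DataFrame({'Barrio': ['el Raval', 'la Barceloneta'],
                       'Avg Price m2': [4034, 4815]})

# Left join: match prices['Barrio'] against the index of venue_share
df_venues_price = prices.join(venue_share, on='Barrio')
print(df_venues_price)
```

Using shares rather than raw counts keeps neighborhoods with many venues from dominating the comparison; that normalization is a choice of this sketch, not necessarily of the notebook.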

## 1. Loading Data and Packages

In [105]:
# Basic libraries
import pandas as pd
import numpy as np
import re


# Web-scraping libraries
import requests
import lxml.html as lh
from bs4 import BeautifulSoup

# Plot libraries
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import seaborn as sns
%matplotlib inline
plt.style.use('bmh')

# Geo-rendering library
import folium

# IPython libraries
from IPython.display import Image
from IPython.display import HTML

# KMeans for the clustering step later on
from sklearn.cluster import KMeans

In [2]:
!conda install -c conda-forge shapely -y

In [244]:
# planar features library
from shapely.geometry import Polygon

### Loading price data
The Barcelona city website publishes the 2018 average price per m² of second-hand homes, by neighborhood: http://www.bcn.cat/estadistica/catala/dades/timm/ipreus/hab2mave/evo/t2mab.htm

In [10]:
url = 'http://www.bcn.cat/estadistica/catala/dades/timm/ipreus/hab2mave/evo/t2mab.htm'
# Download the page and fail early if the request did not succeed
page = requests.get(url)
page.raise_for_status()

# Parse the HTML content into an lxml document tree
doc = lh.fromstring(page.content)
# Keep the table rows (<tr>..</tr>) that hold the neighborhood data
tr_elements = doc.xpath('//tr')[9:82]
# Empty list of dicts, used later to build a DataFrame
barrios_list = []

# Loop through the <tr> elements and use regex to extract the relevant information
for t in tr_elements:
    string_html = t.text_content()
    # Neighborhood number (e.g. "1.") and name
    barrio_id = re.search(r"(\d+\.) ([a-zA-Z\w'\- ]*)", string_html).groups()
    # Prices use "." as thousands separator (e.g. "4.034"); the last match is kept as the 2018 price
    barrio_num = re.findall(r'(\d+\.\d{3})', string_html)
    barrios_list.append({'Barrio Id': barrio_id[0],
                         'Barrio': barrio_id[1],
                         'Avg Price m2': barrio_num[-1] if barrio_num else None})
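
Applying the two regular expressions to a single row shows what each one extracts. The row string below is a hypothetical example of what `t.text_content()` might return, assuming the number and name share one cell and the yearly prices sit in separate cells separated by newlines; it is not text copied from the page.

```python
import re

# Hypothetical row text: "number. name" cell, then one price per cell ("." = thousands separator)
row_text = "1. el Raval\n2.876\n3.210\n4.034"

# Neighborhood number (with its trailing dot) and name; the name stops at the newline
barrio_id = re.search(r"(\d+\.) ([a-zA-Z\w'\- ]*)", row_text).groups()
# All prices in the row; the dot is escaped so it only matches a literal "."
barrio_num = re.findall(r'(\d+\.\d{3})', row_text)

print(barrio_id)       # ('1.', 'el Raval')
print(barrio_num[-1])  # 4.034
```

Note that the name pattern includes a space and `\w` (which also matches digits), so it only stops at characters outside that class, such as a newline or a comma; a name containing a comma would be cut at the comma.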

Let's create a DataFrame. Not all the neighborhoods have a price, so I will exclude those that don't:

In [11]:
# Create the DataFrame of prices
df_house_price = pd.DataFrame(barrios_list, columns=['Barrio', 'Barrio Id', 'Avg Price m2'])

# Remove the thousands separator from the price strings ("x.xxx" -> "xxxx")
df_house_price['Avg Price m2'] = df_house_price['Avg Price m2'].str.replace('.', '', regex=False)

df_house_price.dropna(inplace=True)  # drop neighborhoods without a price
df_house_price['Avg Price m2'] = df_house_price['Avg Price m2'].astype(int)

# Strip the trailing dot from Barrio Id and add a leading 0 to one-digit ids ("1." -> "01"), for GeoJSON compatibility
df_house_price['Barrio Id'] = df_house_price['Barrio Id'].str.replace('.', '', regex=False).str.zfill(2)

df_house_price.head()

|   | Barrio | Barrio Id | Avg Price m2 |
|---|---|---|---|
| 0 | el Raval | 1 | 4034 |
| 1 | el Barri Gòtic | 2 | 4660 |
| 2 | la Barceloneta | 3 | 4815 |
| 3 | Sant Pere | 4 | 4689 |
| 4 | el Fort Pienc | 5 | 4500 |
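
The two string conversions can be checked on a toy Series. `regex=False` makes pandas treat `'.'` as a literal dot; as a regular expression, `'.'` would match every character. The values below are illustrative, not scraped.

```python
import pandas as pd

# Illustrative raw values: price strings with a thousands separator, ids with a trailing dot
prices = pd.Series(['4.034', None, '4.815'])
ids = pd.Series(['1.', '2.', '10.'])

# Literal replacement of the separator, drop missing prices, then convert to integers
clean_prices = prices.str.replace('.', '', regex=False).dropna().astype(int)

# Strip the trailing dot and left-pad with zeros to two characters
clean_ids = ids.str.replace('.', '', regex=False).str.zfill(2)

print(clean_prices.tolist())  # [4034, 4815]
print(clean_ids.tolist())     # ['01', '02', '10']

# For contrast: as a regular expression, '.' matches any character, so everything is removed
print(ids.str.replace('.', '', regex=True).tolist())  # ['', '', '']
```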


### Obtaining neighborhood coordinates

Next, I download the GeoJSON file with the boundaries of Barcelona's neighborhoods:

In [12]:
url = 'https://raw.githubusercontent.com/martgnz/bcn-geodata/master/barris/barris_geo.json'
geojson = requests.get(url).json()

# Loop through the GeoJSON features and compute each neighborhood's centroid
poly_list = []
for p in geojson['features']:
    geometry = p['geometry']
    if geometry['type'] == 'MultiPolygon':
        # More than one polygon: use the exterior ring of the first one
        ring = geometry['coordinates'][0][0]
    else:
        # Polygon: the first ring is the exterior boundary
        ring = geometry['coordinates'][0]
    centroid = Polygon(ring).centroid
    # GeoJSON positions are [longitude, latitude], so x is longitude and y is latitude
    poly_list.append({'Barrio Id': p['properties']['C_Barri'],
                      'Latitude': centroid.y,
                      'Longitude': centroid.x})
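
`Polygon(...).centroid` returns the area-weighted centroid of the shape, not the average of its vertices. For a simple polygon without holes this is the shoelace-formula centroid, sketched below in plain Python on toy coordinates given as GeoJSON-style `[x, y]` pairs. The helper `ring_centroid` is hypothetical, written only for illustration; treating degrees of longitude and latitude as planar coordinates is an approximation that is usually acceptable at the scale of a single neighborhood.

```python
def ring_centroid(ring):
    """Area-weighted centroid (x, y) of a simple polygon given by its exterior ring.

    `ring` is a list of [x, y] pairs; the ring may or may not repeat its first point.
    """
    pts = [tuple(p) for p in ring]
    if pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the closing point; the loop below wraps around
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

# Toy Polygon: a 2x2 square with an extra vertex on its bottom edge (ring closed as in GeoJSON).
# The vertex average would be (1.0, 0.8), but the area-weighted centroid is (1.0, 1.0).
polygon_geom = {'type': 'Polygon',
                'coordinates': [[[0, 0], [1, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}
# Toy MultiPolygon: one extra level of nesting, one list of rings per polygon
multi_geom = {'type': 'MultiPolygon',
              'coordinates': [[[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
                              [[[10, 10], [11, 10], [11, 11], [10, 11], [10, 10]]]]}

print(ring_centroid(polygon_geom['coordinates'][0]))   # (1.0, 1.0)
print(ring_centroid(multi_geom['coordinates'][0][0]))  # first polygon only: (1.0, 1.0)
```

Taking only the first part of a MultiPolygon ignores the other parts; as an alternative, `shapely.geometry.shape(geometry).centroid` weights all parts by their area.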

In [13]:
df_price_coord= df_house_price.merge(pd.DataFrame(poly_list), how= 'left', on= 'Barrio Id')
df_price_coord[:3]


|   | Barrio | Barrio Id | Avg Price m2 | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | el Raval | 1 | 4034 | 41.378963 | 2.170491 |
| 1 | el Barri Gòtic | 2 | 4660 | 41.381099 | 2.177446 |
| 2 | la Barceloneta | 3 | 4815 | 41.377203 | 2.190159 |
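
Because the merge is a left join on the `Barrio Id` strings, a neighborhood whose id is formatted differently in the two sources (for example `'3'` versus `'03'`) keeps its price but silently gets missing coordinates. A minimal check, shown on small toy frames with one id deliberately left unpadded, uses the `validate` and `indicator` arguments of `pd.merge`:

```python
import pandas as pd

# Toy frames: prices keyed by zero-padded ids, coordinates where one id lacks the padding
prices = pd.DataFrame({'Barrio Id': ['01', '02', '03'], 'Avg Price m2': [4034, 4660, 4815]})
coords = pd.DataFrame({'Barrio Id': ['01', '02', '3'],
                       'Latitude': [41.378963, 41.381099, 41.377203],
                       'Longitude': [2.170491, 2.177446, 2.190159]})

# validate='one_to_one' raises MergeError if an id is duplicated on either side;
# indicator=True adds a '_merge' column telling where each row was found
merged = prices.merge(coords, how='left', on='Barrio Id',
                      validate='one_to_one', indicator=True)

# Ids present in the price table but with no matching coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Barrio Id'].tolist()
print(unmatched)  # ['03']
```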
