This notebook reads the dataset created by 'merge_and_clean_the dataset.ipynb' and builds a new dataset with the mean price per room type per neighbourhood and the overall mean price per neighbourhood.

The geojson files list all neighbourhoods, but the dataset has data for only some of them. The notebook therefore also creates new geojson files that contain only the neighbourhoods present in the dataset, together with their geometries.
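The idea of keeping only the neighbourhoods that have data can be sketched with the standard library alone. This is an illustrative sketch, not the approach used later in the notebook (which relies on a merge); the helper `keep_features` and the toy feature collection are hypothetical, and the property name `neighbourhood` is assumed from the column that is renamed below.

```python
def keep_features(feature_collection, names, key="neighbourhood"):
    """Return a new FeatureCollection holding only features whose name is in `names`."""
    kept = [feature for feature in feature_collection["features"]
            if feature["properties"].get(key) in names]
    return {"type": "FeatureCollection", "features": kept}


# Toy collection with made-up names and no real geometries, for illustration only
toy_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"neighbourhood": "A"}, "geometry": None},
        {"type": "Feature", "properties": {"neighbourhood": "B"}, "geometry": None},
        {"type": "Feature", "properties": {"neighbourhood": "C"}, "geometry": None},
    ],
}

# Pretend the dataset only contains listings for neighbourhoods A and C
filtered = keep_features(toy_collection, {"A", "C"})
print([feature["properties"]["neighbourhood"] for feature in filtered["features"]])
```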

#### Import the libraries

In [1]:
import os
# Tell GeoPandas to use Shapely instead of PyGEOS; this must be set before geopandas is imported
os.environ['USE_PYGEOS'] = '0'

import numpy as np
import pandas as pd
import geopandas
import folium
from folium.features import DivIcon



#### Read the data

Read the csv file that was created by the notebook 'merge_and_clean_the dataset.ipynb'.

In [2]:
data=pd.read_csv('airbnb_prices_in_6_european_cities.csv')
data

,Unnamed: 0,Price,Room type,Person capacity,Longitude,Latitude,City,Neighbourhood
0,0,194.03,Private room,2,4.90569,52.41772,Amsterdam,Noord-West
1,1,344.25,Private room,4,4.90005,52.37432,Amsterdam,Centrum-West
2,2,264.10,Private room,2,4.97512,52.36103,Amsterdam,IJburg - Zeeburgereiland
3,3,433.53,Private room,4,4.89417,52.37663,Amsterdam,Centrum-West
4,4,485.55,Private room,2,4.90051,52.37508,Amsterdam,Centrum-West
...,...,...,...,...,...,...,...,...
17784,17784,310.45,Private room,2,-0.21207,51.48667,London,Hammersmith and Fulham
17785,17785,265.06,Entire home,4,-0.05459,51.52018,London,Tower Hamlets
17786,17786,142.29,Private room,2,-0.12056,51.42875,London,Lambeth
17787,17787,372.30,Private room,2,-0.12810,51.44023,London,Lambeth


#### Delete the column 'Unnamed: 0'

In [3]:
data = data.drop('Unnamed: 0', axis=1)
data.head()

,Price,Room type,Person capacity,Longitude,Latitude,City,Neighbourhood
0,194.03,Private room,2,4.90569,52.41772,Amsterdam,Noord-West
1,344.25,Private room,4,4.90005,52.37432,Amsterdam,Centrum-West
2,264.1,Private room,2,4.97512,52.36103,Amsterdam,IJburg - Zeeburgereiland
3,433.53,Private room,4,4.89417,52.37663,Amsterdam,Centrum-West
4,485.55,Private room,2,4.90051,52.37508,Amsterdam,Centrum-West
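
A column named 'Unnamed: 0' typically appears when a DataFrame is written with `to_csv` including its index and is later read back without `index_col`. The sketch below reproduces this with a small made-up frame (`toy` and its values are hypothetical, not taken from the dataset).

```python
import io
import pandas as pd

# Toy frame with made-up values, for illustration only
toy = pd.DataFrame({"Price": [100.0, 200.0], "City": ["Amsterdam", "London"]})

# By default, to_csv writes the index as a first column with an empty header
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading the file back without index_col turns that column into 'Unnamed: 0'
buffer.seek(0)
with_extra_column = pd.read_csv(buffer)

# Passing index_col=0 treats the first column as the index instead
buffer.seek(0)
without_extra_column = pd.read_csv(buffer, index_col=0)

print(list(with_extra_column.columns))
print(list(without_extra_column.columns))
```

Writing with `to_csv(..., index=False)` avoids the extra column in the first place, provided nothing downstream relies on it.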


# Read the geojson files

I read the geojson file of each city and rename the column 'neighbourhood' to 'Neighbourhood'.

In [4]:
nbh_geo_amsterdam = geopandas.read_file('Amsterdam_neighbourhoods/neighbourhoods.geojson', driver='GeoJSON')
nbh_geo_amsterdam = nbh_geo_amsterdam.rename({'neighbourhood': 'Neighbourhood'}, axis=1)

nbh_geo_lisbon = geopandas.read_file('Lisbon_neighbourhoods/neighbourhoods.geojson', driver='GeoJSON')
nbh_geo_lisbon = nbh_geo_lisbon.rename({'neighbourhood': 'Neighbourhood'}, axis=1)

nbh_geo_london = geopandas.read_file('London_neighbourhoods/neighbourhoods.geojson', driver='GeoJSON')
nbh_geo_london = nbh_geo_london.rename({'neighbourhood': 'Neighbourhood'}, axis=1)

nbh_geo_paris = geopandas.read_file('Paris_neighbourhoods/neighbourhoods.geojson', driver='GeoJSON')
nbh_geo_paris = nbh_geo_paris.rename({'neighbourhood': 'Neighbourhood'}, axis=1)

nbh_geo_rome = geopandas.read_file('Rome_neighbourhoods/neighbourhoods.geojson', driver='GeoJSON')
nbh_geo_rome = nbh_geo_rome.rename({'neighbourhood': 'Neighbourhood'}, axis=1)

nbh_geo_vienna = geopandas.read_file('Vienna_neighbourhoods/neighbourhoods.geojson', driver='GeoJSON')
nbh_geo_vienna = nbh_geo_vienna.rename({'neighbourhood': 'Neighbourhood'}, axis=1)
# Fix the Vienna neighbourhood names whose special characters are garbled in the geojson file
# (the first column, 'Neighbourhood', holds the names)
nbh_geo_vienna['Neighbourhood'] = nbh_geo_vienna['Neighbourhood'].replace({
    'Rudolfsheim-Fnfhaus': 'Rudolfsheim-Fünfhaus',
    'Landstra§e': 'Landstrasse',
    'Whring': 'Währing',
    'Dbling': 'Döbling'})


nbh_geo_files = {'Amsterdam': nbh_geo_amsterdam,
                 'Lisbon': nbh_geo_lisbon,
                 'London': nbh_geo_london,
                 'Paris': nbh_geo_paris,
                 'Rome': nbh_geo_rome,
                 'Vienna': nbh_geo_vienna}
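
Because the means are later joined to the geometries on the 'Neighbourhood' column, names that differ by even one character silently drop out of the result. A minimal sketch of how such mismatches could be spotted, using toy values (the helper `unmatched_names` and both series are hypothetical):

```python
import pandas as pd


def unmatched_names(listing_names, geo_names):
    """Return the listing neighbourhood names with no exact match in geo_names."""
    return sorted(set(listing_names) - set(geo_names))


# Toy values for illustration; the garbled spelling mimics the Vienna geojson file
listings = pd.Series(["Währing", "Döbling", "Leopoldstadt", "Währing"])
geo = pd.Series(["Whring", "Döbling", "Leopoldstadt"])

print(unmatched_names(listings, geo))

# After repairing the garbled name, every listing name has a match
geo_fixed = geo.replace({"Whring": "Währing"})
print(unmatched_names(listings, geo_fixed))
```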

In [5]:
nbh_geo_files['Vienna']

,Neighbourhood,neighbourhood_group,geometry
0,Leopoldstadt,,"MULTIPOLYGON (((16.38484 48.22616, 16.38495 48..."
1,Landstrasse,,"MULTIPOLYGON (((16.38681 48.21271, 16.38683 48..."
2,Innere Stadt,,"MULTIPOLYGON (((16.36497 48.21590, 16.36498 48..."
3,Brigittenau,,"MULTIPOLYGON (((16.38595 48.24764, 16.38611 48..."
4,Floridsdorf,,"MULTIPOLYGON (((16.37817 48.28858, 16.37819 48..."
5,Donaustadt,,"MULTIPOLYGON (((16.48378 48.17615, 16.48358 48..."
6,Liesing,,"MULTIPOLYGON (((16.33924 48.15405, 16.33948 48..."
7,Alsergrund,,"MULTIPOLYGON (((16.34255 48.21837, 16.34259 48..."
8,Penzing,,"MULTIPOLYGON (((16.27508 48.21508, 16.27512 48..."
9,Mariahilf,,"MULTIPOLYGON (((16.34200 48.19634, 16.34424 48..."


The function below calculates the mean price per room type and per neighbourhood, as well as the overall mean per neighbourhood. It takes the data for a specific city and returns a dataset with selected columns, including the calculated means per neighbourhood, together with a GeoDataFrame for the new geojson file.

In [6]:
def find_the_means(df,city,geo_file):
    
    # Calculate the mean price per Neighbourhood for the room type 'Private room'
    # (use the df argument rather than the global variable data)
    check_priv_room = pd.DataFrame(df[(df['Room type']=="Private room") & 
                                      (df.City==city)].groupby("Neighbourhood").Price.mean()).reset_index()
    check_priv_room = check_priv_room.merge(geo_file, on="Neighbourhood")
    check_priv_room = check_priv_room.drop(["neighbourhood_group"], axis=1)
    check_priv_room = check_priv_room.rename(columns={'Price':'Mean Priv. room',
                                                      'geometry':'geometry_priv_room'})
    
    
    # Calculate the mean price per Neighbourhood for the room type 'Entire home'
    check_ent_home = pd.DataFrame(df[(df['Room type']=="Entire home") & 
                                     (df.City==city)].groupby("Neighbourhood").Price.mean()).reset_index()
    check_ent_home = check_ent_home.merge(geo_file, on="Neighbourhood")
    check_ent_home = check_ent_home.drop(["neighbourhood_group"], axis=1)
    check_ent_home = check_ent_home.rename(columns={'Price':'Mean Ent. home',
                                                    'geometry':'geometry_ent_home'})

    
    # Calculate the mean price per Neighbourhood (regardless of the Room type)
    overall_mean_per_neighbourhood = pd.DataFrame(df[df.City==city].groupby("Neighbourhood").Price.mean()).reset_index()
    overall_mean_per_neighbourhood = overall_mean_per_neighbourhood.rename(columns={'Price':'Overall Mean'})


    # Merge (join) the two datasets, keeping the neighbourhoods of the larger one.
    # NaN values appear where a neighbourhood has no mean for the other room type
    if len(check_priv_room.geometry_priv_room.values)>=len(check_ent_home.geometry_ent_home.values):
        merged = check_ent_home.merge(check_priv_room, how='right', on='Neighbourhood')
        merged = merged.drop(['geometry_ent_home'], axis=1)
        merged = merged.rename(columns={'geometry_priv_room':'geometry'})
    else:
        merged = check_ent_home.merge(check_priv_room, how='left', on='Neighbourhood')
        merged = merged.drop('geometry_priv_room', axis=1)
        merged = merged.rename(columns={'geometry_ent_home':'geometry'})
    
    
    # Build the final merged dataset:
    # add a column for the city and
    # put the columns in a specific order
    final_merged = overall_mean_per_neighbourhood.merge(merged,on="Neighbourhood")
    column_city = [city]*final_merged.shape[0]
    final_merged["City"] = column_city
    final_merged = final_merged[['City', 'Neighbourhood', 'Overall Mean','Mean Ent. home', 
                                 'Mean Priv. room', 'geometry']]
    
    
    # Calculate the centroid of each Neighbourhood and add it to the dataset
    geo1 = geo_file.copy()
    geo1['Centroid']=geo1.geometry.centroid.values
    geo1 = geo1.drop(['neighbourhood_group','geometry'], axis=1)
    final_merged = final_merged.merge(geo1,on="Neighbourhood")
    
    
    # Keep only the neighbourhoods that are included in the final dataset,
    # together with their geometries, and return them as a GeoDataFrame
    # as well
    geo_new = final_merged[["Neighbourhood",'geometry']]
    geo_new = geopandas.GeoDataFrame(geo_new, geometry='geometry')
    
    
    return final_merged, geo_new
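
The per-room-type and overall means that `find_the_means` computes can also be expressed more compactly. The sketch below is an alternative formulation under stated assumptions, not the notebook's method: the frame `toy`, its neighbourhood names and its prices are made up, and the geometry handling is left out.

```python
import pandas as pd

# Toy listings with made-up values, for illustration only
toy = pd.DataFrame({
    "City": ["Lisbon"] * 5,
    "Neighbourhood": ["North", "North", "North", "South", "South"],
    "Room type": ["Private room", "Entire home", "Entire home",
                  "Private room", "Private room"],
    "Price": [100.0, 200.0, 300.0, 80.0, 120.0],
})

# Mean price per neighbourhood and room type; missing combinations become NaN
per_room_type = toy.pivot_table(index="Neighbourhood", columns="Room type",
                                values="Price", aggfunc="mean")

# Overall mean per neighbourhood, regardless of room type
overall = toy.groupby("Neighbourhood")["Price"].mean().rename("Overall Mean")

# One row per neighbourhood with all three means side by side
summary = per_room_type.join(overall).reset_index()
print(summary)
```

In this toy data 'South' has no 'Entire home' listing, so its 'Entire home' mean is NaN, which mirrors the NaN values produced by the merge in the function above.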

#### Apply the function above to each city and append the results to one dataset

In [7]:
means_summary, new_nbh_geo_file_amsterdam = find_the_means(data,'Amsterdam',nbh_geo_files['Amsterdam'])

new_nbh_geo_file = {'Amsterdam':new_nbh_geo_file_amsterdam}

for city in data.City.unique()[1:]:
    
    city_means, new_geo_file = find_the_means(data,city,nbh_geo_files[city])
    
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    means_summary = pd.concat([means_summary, city_means], ignore_index=True)
    
    new_nbh_geo_file[city] = new_geo_file




In [8]:
means_summary.City.value_counts()

Vienna       23
Lisbon       23
Amsterdam    22
Paris        20
Rome         14
London       13
Name: City, dtype: int64

I define a function to visualize the results, only to check that the output looks right.

In [9]:
def visualization_with_folium(df, x, center, geo_city):
    
    map = folium.Map(location=center,
                     tiles="cartodbpositron",zoom_start=12.5) 
    folium.Marker(location=center,
                  icon = folium.Icon(color='black')).add_to(map)

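    # Pass the full DataFrame so that `columns` is used, and match each row
    # to its geojson feature by neighbourhood name
    folium.Choropleth(geo_data=geo_city,
                      name="choropleth",
                      data=df,
                      columns=["Neighbourhood", x],
                      key_on="feature.properties.Neighbourhood",
                      fill_color="RdYlGn",
                      fill_opacity=0.6,
                      line_opacity=0.6,
                      legend_name="Mean of Total Prices",).add_to(map)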

    folium.LayerControl().add_to(map)
    
    for i,location in enumerate(df.Centroid.values):
        loc=[location.y,location.x] #+0.004
        folium.map.Marker(loc, icon=DivIcon(icon_size=(20,20),
                                            icon_anchor=(0,0),
                                            html='<div style="font-size: 10pt">%s</div>' %df['Neighbourhood'][i])).add_to(map)


    return map

In [10]:
#[Latitude,Longitude]
capitals_lat_lng = {'Amsterdam': [52.377956,4.897070],
                    'Lisbon': [38.736946, -9.142685],
                    'London': [51.509865, -0.118092],
                    'Paris': [48.864716, 2.349014],
                    'Rome': [41.902782, 12.496366],
                    'Vienna': [48.210033, 16.363449]}

### Check Lisbon

In [11]:
visualization_with_folium(means_summary, 'Overall Mean',
                          capitals_lat_lng['Lisbon'],
                          new_nbh_geo_file['Lisbon'])

In [12]:
visualization_with_folium(means_summary, 'Mean Priv. room',
                          capitals_lat_lng['Lisbon'],
                          new_nbh_geo_file['Lisbon'])

In [13]:
visualization_with_folium(means_summary, 'Mean Ent. home',
                          capitals_lat_lng['Lisbon'],
                          new_nbh_geo_file['Lisbon'])

# Store the mean values into a new file

I store the mean values in a new csv file, which will be used directly in streamlit.

In [14]:
means_summary.to_csv("mean_values_per_neighbourhoods.csv")

# Store the new neighbourhoods geojson files

I will store the neighbourhoods with mean values into geojson files for each city.

In [15]:
new_nbh_geo_file['Amsterdam'].to_file("Amsterdam_neighbourhoods/new_neighbourhoods.geojson", driver="GeoJSON")
new_nbh_geo_file['Vienna'].to_file("Vienna_neighbourhoods/new_neighbourhoods.geojson", driver="GeoJSON")
new_nbh_geo_file['Rome'].to_file("Rome_neighbourhoods/new_neighbourhoods.geojson", driver="GeoJSON")
new_nbh_geo_file['Lisbon'].to_file("Lisbon_neighbourhoods/new_neighbourhoods.geojson", driver="GeoJSON")
new_nbh_geo_file['Paris'].to_file("Paris_neighbourhoods/new_neighbourhoods.geojson", driver="GeoJSON")
new_nbh_geo_file['London'].to_file("London_neighbourhoods/new_neighbourhoods.geojson", driver="GeoJSON")