# Amsterdam Neighborhoods, properties and conditions

Let's import useful packages:

In [1]:
import gzip
import json
import csv
import pandas as pd

In [2]:
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
import numpy as np

Spatial visualizations use **folium** to create interactive maps with Python and Leaflet.js:

In [4]:
import sys
# !{sys.executable} -m pip install -U folium

In [5]:
import folium

In [6]:
listing = pd.read_csv('../Data/raw/listings.csv.gz', 
                      compression='gzip',
                      on_bad_lines='skip',  # pandas >= 1.3; older versions used error_bad_lines=False
                      low_memory=False)

`info()` gives us the big picture of the listing data:

In [7]:
listing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20025 entries, 0 to 20024
Columns: 106 entries, id to reviews_per_month
dtypes: float64(23), int64(21), object(62)
memory usage: 16.2+ MB


In [8]:
listing.drop_duplicates(inplace=True)
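Before dropping duplicates it can be useful to know how many there are. A minimal sketch on a made-up toy table (not the real listings) using `DataFrame.duplicated()`, which flags every row that is identical to an earlier one:

```python
import pandas as pd

# Hypothetical toy table standing in for the listings data
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "name": ["Loft", "Canal view", "Canal view", "Studio"],
})

# duplicated() marks each row that repeats an earlier row exactly
n_duplicates = int(df.duplicated().sum())
print(f"{n_duplicates} duplicate row(s)")

deduplicated = df.drop_duplicates()
print(len(df), "->", len(deduplicated))
```

Both `info()` outputs in this notebook report 20025 rows, which suggests the raw file contained no exact duplicate rows; running `listing.duplicated().sum()` before the drop would confirm it.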

A brief inspection of the columns lets us split them into the following **categories of information**:

1. **General information about the listing**: id, name, summary, description, space, experiences_offered, neighborhood_overview, notes, transit, access, interaction, house_rules. To access them, select the columns from `id` to `house_rules`.

2. **Host information**: id, name, location, about, response time, response rate, acceptance rate, listings count, verifications. In this case, select columns from `host_id` to `host_identity_verified`.

3. All about the **neighborhood**: street, city, state, market and the cleansed neighborhood names, among others. Extract this information by selecting the columns from `street` to `is_location_exact`.

4. Details of **property and conditions** are between the columns `property_type` and `maximum_nights_avg_ntm`.

5. **Calendar updates and availability** over the next 30, 60 and 90 days, for instance. You can find this information between the columns `calendar_updated` and `calendar_last_scraped`.

6. **Reviews**: first and last review, number of reviews, overall rating, and scores for accuracy, cleanliness, check-in, communication, location and value. Columns from `number_of_reviews` to `review_scores_value`.

7. **Verifications and licenses**: license requirements, instant bookability, cancellation policy, and whether guests must have a profile picture or a verified phone number. Columns from `requires_license` to `require_guest_phone_verification`.

8. **Host listings counts** and reviews per month. Select from `calculated_host_listings_count` to `reviews_per_month`.
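The categories above are all contiguous column ranges, so each one can be cut out with a label-based `.loc` slice, which (unlike positional slicing) includes both endpoints. A sketch on a hypothetical miniature table that keeps only a few of the real column names; `CATEGORIES` and `toy_listing` are illustrative names, not part of the notebook:

```python
import pandas as pd

# Hypothetical miniature listings table (no rows, a subset of the real columns)
toy_listing = pd.DataFrame(columns=[
    "id", "name", "house_rules",
    "host_id", "host_name", "host_identity_verified",
    "street", "city", "is_location_exact",
])

# (first column, last column) of each category; .loc label slices include both ends
CATEGORIES = {
    "general": ("id", "house_rules"),
    "host": ("host_id", "host_identity_verified"),
    "neighborhood": ("street", "is_location_exact"),
}

blocks = {name: toy_listing.loc[:, first:last]
          for name, (first, last) in CATEGORIES.items()}

for name, block in blocks.items():
    print(name, list(block.columns))
```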

## 3. Neighborhoods

In this section, we analyze the location and the features of each neighborhood.

In [9]:
# copy, so that new columns can be added later without a SettingWithCopyWarning
neighborhoods = listing.loc[:, 'street':'is_location_exact'].copy()

In [10]:
neighborhoods.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20025 entries, 0 to 20024
Data columns (total 14 columns):
street                          20025 non-null object
neighbourhood                   19221 non-null object
neighbourhood_cleansed          20025 non-null object
neighbourhood_group_cleansed    0 non-null float64
city                            20020 non-null object
state                           19820 non-null object
zipcode                         19198 non-null object
market                          19991 non-null object
smart_location                  20025 non-null object
country_code                    20025 non-null object
country                         20025 non-null object
latitude                        20025 non-null float64
longitude                       20025 non-null float64
is_location_exact               20025 non-null object
dtypes: float64(3), object(11)
memory usage: 2.3+ MB


Define a generic city center as the average latitude and longitude of all listings:

In [11]:
AMSTERDAM_COORDINATES = (neighborhoods.latitude.mean(), neighborhoods.longitude.mean())

The function **`baseMap()`** creates a generic map object with default values for `default_location` (the city center, taken as the average latitude and longitude of all listings in the dataset), `default_tiles` (one of the map styles available in **folium**), `default_zoom_start` (the initial zoom level of the map) and `default_control_scale`, which enables or disables the map scale display.

In [12]:
def baseMap(default_location=AMSTERDAM_COORDINATES, 
            default_tiles='OpenStreetMap', 
            default_zoom_start=12, 
            default_control_scale=False):
    base_map = folium.Map(location=default_location, 
                          tiles = default_tiles, 
                          control_scale=default_control_scale, 
                          zoom_start=default_zoom_start)
    return base_map

In [13]:
base_map = baseMap()
display(base_map)

In [14]:
base_map.save('../Data/maps/base_map.html')

The **`HeatMap`** plugin class overlays a heat map on the map object created previously. We add an auxiliary column `count`, equal to 1 for every listing, so that summing it per coordinate gives the number of listings at each point.

In [15]:
neighborhoods['count'] = 1

In [16]:
from folium.plugins import HeatMap

# rows of [lat, lon, weight]: number of listings at each distinct coordinate
heat_data = (neighborhoods.groupby(['latitude', 'longitude'])['count']
             .sum().reset_index().values.tolist())
HeatMap(data=heat_data, radius=8, max_zoom=13).add_to(base_map)

<folium.plugins.heat_map.HeatMap at 0x1069e3ac8>
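`HeatMap` accepts rows of `[lat, lon]` or `[lat, lon, weight]`. The groupby above produces the weighted form, where the weight is the number of listings sharing a coordinate. The same aggregation in plain Python with `collections.Counter`, on three hypothetical points:

```python
from collections import Counter

# Hypothetical coordinates; the repeated point stands for two listings at one address
points = [
    (52.3731, 4.8922),
    (52.3731, 4.8922),
    (52.3584, 4.8811),
]

# Equivalent of groupby(['latitude', 'longitude'])['count'].sum(): listings per coordinate
counts = Counter(points)

# Weighted rows of [lat, lon, weight], sorted by coordinate like the groupby output
heat_data = [[lat, lon, n] for (lat, lon), n in sorted(counts.items())]
print(heat_data)
```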

In [17]:
base_map

In [18]:
base_map.save('../Data/maps/head_map.html')

The heat map above shows that listings concentrate near the city center. In the following map, every listing is drawn as a circle. To do that, the tiles are switched to `Stamen Terrain` to avoid extra colorful information on the map. Only 500 listings are displayed.

In [19]:
base_map_circles = baseMap(default_tiles='Stamen Terrain')

# for speed purposes, draw only the first MAX_RECORDS listings
MAX_RECORDS = 500

# add one circle marker per listing
for lat, lon in neighborhoods[['latitude', 'longitude']].head(MAX_RECORDS).itertuples(index=False):
    folium.CircleMarker(location=[lat, lon]).add_to(base_map_circles)
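The loop draws the first `MAX_RECORDS` rows of the table. If the file happens to be ordered (by `id`, for instance), those rows may not represent the whole city. A possible alternative, sketched on a hypothetical 1,200-row table, is a seeded random sample with `DataFrame.sample`:

```python
import pandas as pd

MAX_RECORDS = 500

# Hypothetical stand-in for the neighborhoods table, 1,200 rows in a fixed order
toy = pd.DataFrame({
    "latitude": [52.30 + i * 1e-4 for i in range(1200)],
    "longitude": [4.80 + i * 1e-4 for i in range(1200)],
})

# Seeded random sample instead of the first rows; min() guards small tables
sample = toy.sample(n=min(MAX_RECORDS, len(toy)), random_state=42)
print(len(sample))
```

In the notebook, the loop could iterate over `neighborhoods.sample(n=MAX_RECORDS, random_state=42)` instead; the seed keeps the map identical between runs.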

In [20]:
display(base_map_circles)

In [21]:
base_map_circles.save('../Data/maps/base_map_circles.html')

## Geo-Data Amsterdam Neighborhoods

Using the geodata of the neighborhoods, we can draw the different zones and investigate how some features are distributed across them, such as the number of listings or the number of superhosts in each area:

In [22]:
with open("../Data/raw/neighbourhoods.geojson") as json_file:
    geo_data = json.load(json_file)

In [23]:
geo_neighborhoods = baseMap(default_tiles='Stamen Terrain')

folium.Choropleth(
    geo_data=geo_data,
    name='choropleth',
    fill_opacity=0.5,
    line_opacity=0.8
).add_to(geo_neighborhoods)

folium.LayerControl().add_to(geo_neighborhoods)

<folium.map.LayerControl at 0x1069e3518>

In [24]:
display(geo_neighborhoods)

In [25]:
geo_neighborhoods.save('../Data/maps/geo_neighborhoods.html')

### Spatial representation of neighborhoods: number of listings

In [26]:
# neighbourhoods = pd.read_csv('../Data/interim/neighbourhoods.csv')
# neighbourhoods

The `neighbourhoods.csv` file in the `raw` folder is read to extract the names of the neighborhoods available in `geo_data`. This list is used later to join the spatial outline of every neighborhood with the counts in the neighborhoods DataFrame.

In [27]:
neighbourhoods_geo = pd.read_csv('../Data/raw/neighbourhoods.csv', usecols=['neighbourhood'])

In [28]:
list(neighbourhoods_geo.neighbourhood)

['Bijlmer-Centrum',
 'Bijlmer-Oost',
 'Bos en Lommer',
 'Buitenveldert - Zuidas',
 'Centrum-Oost',
 'Centrum-West',
 'De Aker - Nieuw Sloten',
 'De Baarsjes - Oud-West',
 'De Pijp - Rivierenbuurt',
 'Gaasperdam - Driemond',
 'Geuzenveld - Slotermeer',
 'IJburg - Zeeburgereiland',
 'Noord-Oost',
 'Noord-West',
 'Oostelijk Havengebied - Indische Buurt',
 'Osdorp',
 'Oud-Noord',
 'Oud-Oost',
 'Slotervaart',
 'Watergraafsmeer',
 'Westerpark',
 'Zuid']
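The same names also live inside the GeoJSON, under each feature's `properties.neighbourhood` key (the path the choropleths use as `key_on`), so they could be read from `geo_data` directly. A sketch on a hypothetical two-feature excerpt with empty geometries:

```python
import json

# Hypothetical excerpt shaped like neighbourhoods.geojson; only the
# properties.neighbourhood key is taken from the notebook's key_on
geo_text = """
{"type": "FeatureCollection",
 "features": [
  {"type": "Feature", "properties": {"neighbourhood": "Zuid"}, "geometry": null},
  {"type": "Feature", "properties": {"neighbourhood": "Westerpark"}, "geometry": null}
 ]}
"""
geo_data = json.loads(geo_text)

# Collect the name stored in every feature
geo_names = sorted(f["properties"]["neighbourhood"] for f in geo_data["features"])
print(geo_names)
```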

We group the neighborhoods DataFrame by `neighbourhood_cleansed` instead of `neighbourhood` because the former matches the list in `neighbourhoods_geo` exactly. The latter has more than forty unique values, whereas the 22 cleansed neighborhoods are more appropriately aggregated zones.

In [29]:
neighborhoods.neighbourhood_cleansed.unique()

array(['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost',
       'Centrum-West', 'Zuid', 'De Pijp - Rivierenbuurt',
       'De Baarsjes - Oud-West', 'Bos en Lommer', 'Westerpark',
       'Oud-Oost', 'Noord-West', 'Slotervaart', 'Oud-Noord',
       'Geuzenveld - Slotermeer', 'Watergraafsmeer',
       'IJburg - Zeeburgereiland', 'Noord-Oost', 'Gaasperdam - Driemond',
       'Buitenveldert - Zuidas', 'Bijlmer-Oost', 'De Aker - Nieuw Sloten',
       'Osdorp', 'Bijlmer-Centrum'], dtype=object)

In [30]:
neighborhoods_cleansed = neighborhoods.groupby('neighbourhood_cleansed').count().reset_index().loc[:, ['neighbourhood_cleansed', 'count']]
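`groupby(...).count()` counts the non-null values of every column per group; since `count` is never null, the `count` column it keeps equals the number of listings per neighborhood. A more direct equivalent, sketched on toy data, is `Series.value_counts()`:

```python
import pandas as pd

# Hypothetical toy column of neighborhood names
toy = pd.DataFrame({"neighbourhood_cleansed": ["Zuid", "Zuid", "Westerpark", "Zuid"]})

# value_counts() counts rows per neighborhood (sorted descending);
# rename_axis/reset_index turn the result into a two-column DataFrame
counts = (toy["neighbourhood_cleansed"]
          .value_counts()
          .rename_axis("neighbourhood_cleansed")
          .reset_index(name="count"))
print(counts)
```

`value_counts()` also sorts the counts in descending order, so the later `sort_values` step would become optional.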

In [31]:
neighborhoods_cleansed = neighborhoods_cleansed[neighborhoods_cleansed['neighbourhood_cleansed'].isin(list(neighbourhoods_geo.neighbourhood))]
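`isin()` silently drops any name that has no polygon. A quick set comparison, shown here on hypothetical name lists, reveals mismatches in both directions before plotting:

```python
# Hypothetical name sets; in the notebook they would come from
# neighborhoods.neighbourhood_cleansed.unique() and neighbourhoods_geo.neighbourhood
listing_names = {"Zuid", "Westerpark", "Centrum-West"}
geo_names = {"Zuid", "Westerpark", "Centrum-West", "Osdorp"}

missing_in_geo = listing_names - geo_names    # names that isin() would drop
without_listings = geo_names - listing_names  # polygons with no matching count
print(missing_in_geo, without_listings)
```

For this dataset both outputs printed above contain the same 22 names, so neither set would be expected to contain anything.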

In [32]:
listings_neighborhoods = baseMap(default_tiles='Stamen Terrain')

folium.Choropleth(
    geo_data=geo_data,
    name='choropleth',
    data=neighborhoods_cleansed,
    columns=['neighbourhood_cleansed', 'count'],
    fill_color='Reds',
    key_on='properties.neighbourhood',
    fill_opacity=0.7,
    line_opacity=0.5,
    legend_name='Number of listings'
).add_to(listings_neighborhoods)

folium.LayerControl().add_to(listings_neighborhoods)

<folium.map.LayerControl at 0x1a1ac3e080>

In [33]:
display(listings_neighborhoods)

In [34]:
listings_neighborhoods.save('../Data/maps/listings_neighborhoods.html')

In [35]:
neighborhoods_cleansed.sort_values(by=['count'], ascending=False)

| | neighbourhood_cleansed | count |
|---|---|---|
| 7 | De Baarsjes - Oud-West | 3391 |
| 8 | De Pijp - Rivierenbuurt | 2477 |
| 5 | Centrum-West | 2186 |
| 4 | Centrum-Oost | 1744 |
| 20 | Westerpark | 1471 |
| 21 | Zuid | 1407 |
| 17 | Oud-Oost | 1323 |
| 2 | Bos en Lommer | 1152 |
| 14 | Oostelijk Havengebied - Indische Buurt | 972 |
| 16 | Oud-Noord | 609 |



### Spatial representation of neighborhoods: number of Superhosts

In [36]:
def str2boolean(row):
    if row == 't':
        return True
    elif row == 'f':
        return False
    else:
        return np.nan

In [37]:
neighborhoods['host_is_superhost_boolean'] = listing.host_is_superhost.apply(str2boolean)
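`str2boolean` can also be replaced by a vectorized `Series.map` with a dictionary: values found in the dictionary are translated, and anything else (including missing values) becomes NaN, as with `str2boolean`. A sketch on a toy series:

```python
import pandas as pd

# Hypothetical raw flags as stored in host_is_superhost
flags = pd.Series(["t", "f", None, "t"])

# 't'/'f' become True/False; values not in the dict become NaN
superhost = flags.map({"t": True, "f": False})
print(superhost.tolist())
```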

In [38]:
neighborhoods_superhosts = neighborhoods[neighborhoods['host_is_superhost_boolean'] == True]

In [39]:
superhosts = neighborhoods_superhosts.groupby(['neighbourhood_cleansed']).count().reset_index().loc[:, ['neighbourhood_cleansed', 'count']]
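Counting rows, as above, gives the number of *listings* whose host is a superhost; a superhost with several listings in one neighborhood is counted once per listing. To count distinct superhosts instead, one could count unique `host_id` values (a column of the host-information block). The neighborhoods frame built here does not carry `host_id`, so it would have to be taken from `listing`. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical listings: superhost 10 has two listings in Zuid
toy = pd.DataFrame({
    "neighbourhood_cleansed": ["Zuid", "Zuid", "Zuid", "Westerpark"],
    "host_id": [10, 10, 11, 12],
    "host_is_superhost": ["t", "t", "t", "f"],
})

superhost_rows = toy[toy["host_is_superhost"] == "t"]

# Listings by superhosts vs. distinct superhosts, per neighborhood
listings_per_area = superhost_rows.groupby("neighbourhood_cleansed").size()
hosts_per_area = superhost_rows.groupby("neighbourhood_cleansed")["host_id"].nunique()
print(listings_per_area["Zuid"], hosts_per_area["Zuid"])  # 3 listings, 2 distinct hosts
```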

In [40]:
superhosts_neighborhoods = baseMap(default_tiles='Stamen Terrain')

folium.Choropleth(
    geo_data=geo_data,
    name='choropleth',
    data=superhosts,
    columns=['neighbourhood_cleansed', 'count'],
    fill_color='Blues',
    key_on='properties.neighbourhood',
    fill_opacity=0.7,
    line_opacity=0.5,
    legend_name='Number of listings by superhosts'
).add_to(superhosts_neighborhoods)

folium.LayerControl().add_to(superhosts_neighborhoods)

<folium.map.LayerControl at 0x1a1ac2f710>

In [41]:
display(superhosts_neighborhoods)

In [42]:
superhosts_neighborhoods.save('../Data/maps/superhosts_neighborhoods.html')

In [43]:
superhosts.sort_values(by=['count'], ascending=False)

| | neighbourhood_cleansed | count |
|---|---|---|
| 5 | Centrum-West | 480 |
| 7 | De Baarsjes - Oud-West | 440 |
| 4 | Centrum-Oost | 361 |
| 8 | De Pijp - Rivierenbuurt | 317 |
| 21 | Zuid | 188 |
| 20 | Westerpark | 183 |
| 17 | Oud-Oost | 169 |
| 2 | Bos en Lommer | 146 |
| 14 | Oostelijk Havengebied - Indische Buurt | 141 |
| 16 | Oud-Noord | 87 |
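Since the central neighborhoods also have the most listings overall, the raw superhost counts partly mirror the listing counts. Dividing one by the other gives the share of listings offered by superhosts. A sketch using four rows copied from the two tables above; in the notebook, the full `superhosts` and `neighborhoods_cleansed` frames could be merged on `neighbourhood_cleansed` instead:

```python
import pandas as pd

# Counts copied from the two tables above (four neighborhoods only)
all_listings = pd.Series({"Centrum-West": 2186, "De Baarsjes - Oud-West": 3391,
                          "Centrum-Oost": 1744, "De Pijp - Rivierenbuurt": 2477})
superhost_listings = pd.Series({"Centrum-West": 480, "De Baarsjes - Oud-West": 440,
                                "Centrum-Oost": 361, "De Pijp - Rivierenbuurt": 317})

# Share of each neighborhood's listings offered by superhosts (index-aligned division)
share = (superhost_listings / all_listings).round(3).sort_values(ascending=False)
print(share)
```

With these numbers, roughly one listing in five in Centrum-West and Centrum-Oost belongs to a superhost, against about one in eight in De Baarsjes - Oud-West and De Pijp - Rivierenbuurt.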
