# Amsterdam Neighborhoods, properties and conditions

Let's import useful packages:

In [1]:
import gzip
import json
import csv
import pandas as pd

In [2]:
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
import numpy as np

Folium, an open-source visualization package:

In [4]:
import sys
!{sys.executable} -m pip install folium



Pandas is a powerful and flexible library. By specifying the compression type, we can read gzip-compressed files directly, which is convenient for large CSV files. We set `low_memory=False` because some columns have mixed types: with the default chunked parsing, pandas infers types chunk by chunk and warns when a column ends up with mixed types. Another option is to declare the data types explicitly, but the CSV comes without a schema or any accompanying file that describes them.

In [5]:
listing = pd.read_csv('dataset/listings.csv.gz', compression='gzip',
                      on_bad_lines='skip',  # pandas >= 1.3; older versions use error_bad_lines=False
                      low_memory=False)
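The paragraph above mentions declaring data types as an alternative to relying on `low_memory=False`. A minimal sketch of that idea, using a tiny hypothetical CSV compressed in memory (the rows are illustrative, not taken from the dataset): passing `dtype` to `pd.read_csv` fixes each column's type up front, so a column that only sometimes looks numeric, such as a Dutch postcode, is kept as text.

```python
import gzip
import io

import pandas as pd

# Hypothetical miniature listings file with illustrative values only;
# the real listings data has 106 columns and 20025 rows.
raw_csv = (
    "id,zipcode,city\n"
    "1,1012 AB,Amsterdam\n"
    "2,1071,Amsterdam\n"
)

# Compress the text in memory so the sketch needs no file on disk.
buffer = io.BytesIO(gzip.compress(raw_csv.encode("utf-8")))

# Declaring dtypes up front means pandas does not have to guess them;
# Dutch postcodes such as "1012 AB" mix digits and letters, so keep them as text.
listing_sample = pd.read_csv(
    buffer,
    compression="gzip",
    dtype={"id": "int64", "zipcode": "string", "city": "string"},
)
print(listing_sample.dtypes)
```

Without the `dtype` mapping, pandas would still read this small sample as text, but on a large file the explicit mapping removes any dependence on which rows happen to land in which parsing chunk.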

`info()` and `head()` give us the big picture of the listing data:

In [6]:
listing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20025 entries, 0 to 20024
Columns: 106 entries, id to reviews_per_month
dtypes: float64(23), int64(21), object(62)
memory usage: 16.2+ MB


In [7]:
listing.drop_duplicates(inplace=True)

A brief inspection of the columns lets us group them into the following **categories of information**:

1. **General information about the listing**: id, name, summary, description, space, experiences_offered, neighborhood overview, notes, transit, access, interaction, house_rules. To access it, select the columns from `id` to `house_rules`.

2. **Host information**: id, name, location, about, response time, response rate, acceptance rate, listings count, verifications. Select the columns from `host_id` to `host_identity_verified`.

3. All about the **neighborhood**: street, neighbourhood (raw and cleansed), city, state, zipcode, market, country and coordinates. Extract this information by selecting the columns from `street` to `is_location_exact`.

4. Details of **property and conditions** are between the columns `property_type` and `maximum_nights_avg_ntm`.

5. **Calendar updates and availability**, for instance over the next 30, 60 and 90 days. You can find this information between the columns `calendar_updated` and `calendar_last_scraped`.

6. **Reviews**: first and last review, number of reviews, overall rating, and scores for accuracy, cleanliness, check-in, communication, location and value. Columns from `number_of_reviews` to `review_scores_value`.

7. **Verifications and licenses**: required guest profile picture, required phone verification, cancellation policy, instant bookable. Columns from `requires_license` to `require_guest_phone_verification`.

8. **Host listings counts** and reviews per month. Select from `calculated_host_listings_count` to `reviews_per_month`.
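Because each category is a contiguous range of columns, the split can be written as a table of `(first, last)` column pairs passed to `.loc`. The sketch below uses a toy DataFrame holding only a few of the real column names (a hypothetical stand-in for `listing`, with made-up values) to show that label-based slicing with `.loc` includes both endpoints, which is why `listing.loc[:, 'street':'is_location_exact']` keeps `is_location_exact`.

```python
import pandas as pd

# Toy stand-in for `listing`: a subset of the real column names in the same
# left-to-right order; the real file has many more columns in between.
columns = ["id", "house_rules", "host_id", "host_identity_verified",
           "street", "is_location_exact", "property_type"]
listing_toy = pd.DataFrame([list(range(len(columns)))], columns=columns)

# Each category is a (first column, last column) pair; .loc label slices
# include BOTH endpoints, unlike positional (iloc) slices.
categories = {
    "general": ("id", "house_rules"),
    "host": ("host_id", "host_identity_verified"),
    "neighborhood": ("street", "is_location_exact"),
}

blocks = {name: listing_toy.loc[:, start:end]
          for name, (start, end) in categories.items()}

for name, block in blocks.items():
    print(name, list(block.columns))
```

Label slicing relies on the column order of the file, so if a future export reorders or renames columns, the ranges above would silently select different data.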

## 3. Neighborhoods

In this section, we analyze the location and features of each neighborhood.

In [8]:
neighborhoods = listing.loc[:, 'street':'is_location_exact'].copy()  # explicit copy avoids SettingWithCopyWarning when adding columns later

In [9]:
neighborhoods.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20025 entries, 0 to 20024
Data columns (total 14 columns):
street                          20025 non-null object
neighbourhood                   19221 non-null object
neighbourhood_cleansed          20025 non-null object
neighbourhood_group_cleansed    0 non-null float64
city                            20020 non-null object
state                           19820 non-null object
zipcode                         19198 non-null object
market                          19991 non-null object
smart_location                  20025 non-null object
country_code                    20025 non-null object
country                         20025 non-null object
latitude                        20025 non-null float64
longitude                       20025 non-null float64
is_location_exact               20025 non-null object
dtypes: float64(3), object(11)
memory usage: 2.3+ MB
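The `info()` output shows that `neighbourhood_group_cleansed` has 0 non-null values (it is stored as `float64` because it only holds `NaN`), so it carries no information. A minimal sketch on a toy frame with hypothetical values, measuring the share of missing values per column and dropping only the columns that are entirely empty:

```python
import numpy as np
import pandas as pd

# Toy stand-in for `neighborhoods`: one fully empty column, mirroring
# neighbourhood_group_cleansed, and one partly missing column.
neighborhoods_toy = pd.DataFrame({
    "neighbourhood_cleansed": ["Centrum-West", "Zuid", "De Pijp - Rivierenbuurt"],
    "neighbourhood_group_cleansed": [np.nan, np.nan, np.nan],
    "zipcode": ["1015", None, "1072"],
})

# Fraction of missing values per column (1.0 means completely empty).
missing_share = neighborhoods_toy.isna().mean()
print(missing_share)

# how="all" drops only columns with no values at all; the partly missing
# zipcode column is kept.
cleaned = neighborhoods_toy.dropna(axis=1, how="all")
print(list(cleaned.columns))
```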


In [10]:
neighborhoods.neighbourhood.unique()

array(['Indische Buurt', 'Grachtengordel', 'Westelijke Eilanden',
       'Amsterdam Centrum', 'Oud-Zuid', 'Jordaan', 'Oud-West',
       'Bos en Lommer', 'Frederik Hendrikbuurt', 'Oost', 'De Pijp',
       'Spaarndammer en Zeeheldenbuurt', 'Nieuwmarkt en Lastage',
       'Banne Buiksloot', 'Museumkwartier', 'Slotervaart',
       'Rivierenbuurt', 'Buiksloterham', 'Stadionbuurt',
       'Hoofddorppleinbuurt', 'Slotermeer-Noordoost', 'De Wallen',
       'Watergraafsmeer', 'Oosterparkbuurt', 'Volewijck', nan,
       'Oostelijke Eilanden en Kadijken', 'Weesperbuurt en Plantage',
       'Zeeburg', 'Slotermeer-Zuidwest', 'Buitenveldert-West',
       'Overtoomse Veld', 'IJplein en Vogelbuurt', 'Buikslotermeer',
       'Oostzanerwerf', 'Nieuwendam-Noord', 'Landelijk Noord', 'Osdorp',
       'Tuindorp Oostzaan', 'Kadoelen', 'Tuindorp Nieuwendam',
       'Buitenveldert-Oost', 'Nieuwendammerham', 'Tuindorp Buiksloot',
       'Nieuwendammerdijk en Buiksloterdijk'], dtype=object)

In [11]:
neighborhoods.neighbourhood_cleansed.unique()

array(['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost',
       'Centrum-West', 'Zuid', 'De Pijp - Rivierenbuurt',
       'De Baarsjes - Oud-West', 'Bos en Lommer', 'Westerpark',
       'Oud-Oost', 'Noord-West', 'Slotervaart', 'Oud-Noord',
       'Geuzenveld - Slotermeer', 'Watergraafsmeer',
       'IJburg - Zeeburgereiland', 'Noord-Oost', 'Gaasperdam - Driemond',
       'Buitenveldert - Zuidas', 'Bijlmer-Oost', 'De Aker - Nieuw Sloten',
       'Osdorp', 'Bijlmer-Centrum'], dtype=object)
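Comparing the two outputs, `neighbourhood` holds finer-grained names and includes `nan`, whereas `neighbourhood_cleansed` has no missing values and groups the listings into a smaller set of districts. A sketch on a few illustrative rows (the name-to-district pairings are assumptions for the example, not read from the dataset) of counting listings per cleansed district and collecting the raw names found under each:

```python
import pandas as pd

# Illustrative rows pairing a raw neighbourhood name with a cleansed
# district; None stands for a missing raw name.
sample = pd.DataFrame({
    "neighbourhood": ["Jordaan", "De Pijp", None, "Rivierenbuurt", "De Pijp"],
    "neighbourhood_cleansed": ["Centrum-West", "De Pijp - Rivierenbuurt",
                               "Zuid", "De Pijp - Rivierenbuurt",
                               "De Pijp - Rivierenbuurt"],
})

# Listings per cleansed district, most frequent first.
per_district = sample["neighbourhood_cleansed"].value_counts()
print(per_district)

# Raw names found under each cleansed district, ignoring missing raw names.
raw_names = (sample.dropna(subset=["neighbourhood"])
             .groupby("neighbourhood_cleansed")["neighbourhood"]
             .unique())
print(raw_names)
```

Since the cleansed column is complete, it is the safer choice for grouping listings by area.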

In [12]:
neighborhoods.latitude.mean()

52.36532782070294

In [13]:
import folium

In [14]:
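def baseMap(default_location=None, default_zoom_start=12):
    # Center on the mean coordinates of the listings unless a location is given
    if default_location is None:
        default_location = [neighborhoods.latitude.mean(), neighborhoods.longitude.mean()]
    base_map = folium.Map(location=default_location, control_scale=True, zoom_start=default_zoom_start)
    return base_map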

In [15]:
base_map = baseMap()
base_map

In [16]:
neighborhoods['count'] = 1

In [17]:
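from folium.plugins import HeatMap
# One row per distinct coordinate pair: [latitude, longitude, number of listings]
heat_data = (neighborhoods[['latitude', 'longitude', 'count']]
             .groupby(['latitude', 'longitude']).sum()
             .reset_index().values.tolist())
HeatMap(data=heat_data, radius=8, max_zoom=13).add_to(base_map)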


<folium.plugins.heat_map.HeatMap at 0x1a2096d7b8>
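The `HeatMap` cell aggregates listings that share exactly the same coordinates, so each `[latitude, longitude, weight]` row carries the number of listings at that point. The sketch below reproduces that aggregation on three toy points (hypothetical coordinates) without folium, and checks that `groupby(...).size()` yields the same rows without the helper `count` column.

```python
import pandas as pd

# Toy coordinates: the first two listings share exactly the same point.
points = pd.DataFrame({
    "latitude": [52.3730, 52.3730, 52.3600],
    "longitude": [4.8920, 4.8920, 4.8800],
})
points["count"] = 1  # one unit of weight per listing, as in the notebook

# Sum the counts per coordinate pair, giving [lat, lon, weight] rows.
heat_data = (points.groupby(["latitude", "longitude"])["count"].sum()
             .reset_index().values.tolist())
print(heat_data)

# Equivalent without the helper column: size() counts rows per group.
heat_data_alt = (points.groupby(["latitude", "longitude"]).size()
                 .reset_index(name="count").values.tolist())
assert heat_data == heat_data_alt
```

Because the grouping uses exact coordinate equality, listings a few meters apart stay separate points; the `radius` of the heat map is what visually blends them.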

In [18]:
base_map