---
title: "Airbnb Dataset Description"
author: Daniel Redel
date: today
format:
  html:
    toc: true
    code-fold: true
    html-math-method: katex
jupyter: python3
---

# Amsterdam Inside Airbnb Dataset

**From the project website**: <http://insideairbnb.com/about/>

Inside Airbnb is a mission-driven project that provides data and advocacy about Airbnb's impact on residential communities.

The dataset contains 7,000 Amsterdam listings (each with 75 features) and 300k text reviews from airbnb.com, scraped in March 2023. The oldest listings have reviews dating back to 2009. In addition, for each listing the dataset gives the nightly price for every day from March 2023 to March 2024, along with whether that day is available for booking; this calendar was also scraped in March 2023.

Descriptions of the features are available in this [sheet from Inside Airbnb](https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit#gid=1322284596).

**File descriptions**:

- `listings_detailed.csv`: Detailed listings data.
- `calendar.csv`: Detailed calendar data.
- `reviews_detailed.csv`: Detailed review data.
- `listings.csv`: Summary information and metrics for listings in Amsterdam (good for visualisations).
- `reviews.csv`: Summary review data and listing ID (to facilitate time-based analytics and visualisations linked to a listing).
- `neighbourhoods.csv`: Neighbourhood list for geo filter. Sourced from city or open source GIS files.
- `neighbourhoods.geojson`: GeoJSON file of the city's neighbourhoods.
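
The files are linked by listing identifiers: judging from the sample rows in this notebook, `id` in the listings files corresponds to `listing_id` in the calendar and review files (listing 2818 appears in all of them). A minimal sketch of attaching per-listing review counts to the listings table with `pandas.merge`, using toy frames built from the sample values rather than the real files (the `_toy` names are hypothetical):

```python
import pandas as pd

# Toy stand-ins for listings.csv and reviews.csv, built from the sample rows.
listings_toy = pd.DataFrame({
    "id": [2818, 20168],
    "name": ["Quiet Garden View Room & Super Fast Wi-Fi",
             "Studio with private bathroom in the centre 1"],
})
reviews_toy = pd.DataFrame({
    "listing_id": [2818, 2818, 2818, 2818],
    "date": ["2009-03-30", "2009-04-24", "2009-05-03", "2009-05-18"],
})

# One row per listing_id with the number of reviews it received.
review_counts = (reviews_toy.groupby("listing_id").size()
                 .reset_index(name="n_reviews"))

# Left join keeps every listing, including those without any review.
merged = (listings_toy
          .merge(review_counts, left_on="id", right_on="listing_id", how="left")
          .drop(columns="listing_id"))
merged["n_reviews"] = merged["n_reviews"].fillna(0).astype(int)
print(merged)
```

The left join means listings with no rows in the review table (here, 20168 in the toy data) are kept with a count of zero instead of being dropped.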

## Motivation

1. Exploratory Data Analysis
    - Geomaps
2. Price and Rating Prediction

## Data Import

In [55]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

file = "D:/Data Science/Inside AirBnB - Netherlands/Amsterdam/"

### Listings dataset

In [56]:
listings = pd.read_csv(file + "listings.csv")
listings.head(2)

|   | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | license |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2818 | Quiet Garden View Room & Super Fast Wi-Fi | 3159 | Daniel |  | Oostelijk Havengebied - Indische Buurt | 52.36435 | 4.94358 | Private room | 69 | 3 | 322 | 2023-02-28 | 1.9 | 1 | 44 | 37 | 0363 5F3A 5684 6750 D14D |
| 1 | 20168 | Studio with private bathroom in the centre 1 | 59484 | Alexander |  | Centrum-Oost | 52.36407 | 4.89393 | Private room | 106 | 1 | 339 | 2020-04-09 | 2.14 | 2 | 0 | 0 | 0363 CBB3 2C10 0C2A 1E29 |
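
The summary file carries `last_review` and `availability_365`, which together hint at whether a listing is still active. A minimal sketch using the two sample rows, assuming the scrape date 2023-03-09 (the `last_scraped` value in the detailed listings) as the reference day; the one-year threshold and the `likely_inactive` flag are hypothetical choices, not an Inside Airbnb definition:

```python
import pandas as pd

# Two rows taken from the summary listings sample.
listings_toy = pd.DataFrame({
    "id": [2818, 20168],
    "last_review": ["2023-02-28", "2020-04-09"],
    "availability_365": [44, 0],
})

# Assumption: the scrape date serves as "today".
scrape_date = pd.Timestamp("2023-03-09")
listings_toy["last_review"] = pd.to_datetime(listings_toy["last_review"])
listings_toy["days_since_review"] = (scrape_date - listings_toy["last_review"]).dt.days

# Hypothetical rule: no review for over a year and no open days -> likely inactive.
listings_toy["likely_inactive"] = (
    (listings_toy["days_since_review"] > 365) & (listings_toy["availability_365"] == 0)
)
print(listings_toy)
```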


In [46]:
listings_detailed = pd.read_csv(file + "listings_detailed.csv")
listings_detailed.head(2)

|   | id | listing_url | scrape_id | last_scraped | source | name | description | neighborhood_overview | picture_url | host_id | ... | review_scores_communication | review_scores_location | review_scores_value | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2818 | https://www.airbnb.com/rooms/2818 | 20230309202119 | 2023-03-09 | city scrape | Quiet Garden View Room & Super Fast Wi-Fi | Quiet Garden View Room & Super Fast Wi-Fi\<br /... | Indische Buurt ("Indies Neighborhood") is a ne... | https://a0.muscache.com/pictures/10272854/8dcc... | 3159 | ... | 4.98 | 4.69 | 4.81 | 0363 5F3A 5684 6750 D14D | f | 1 | 0 | 1 | 0 | 1.9 |
| 1 | 311124 | https://www.airbnb.com/rooms/311124 | 20230309202119 | 2023-03-10 | city scrape | \*historic centre\* \*bright\* \*canal view\* \*jordaan\* | > Please be so kind to book ONLY AFTER conta... | Perfect location in the lively centre. All his... | https://a0.muscache.com/pictures/5208672/5bb60... | 1600010 | ... | 4.92 | 4.93 | 4.6 | 0363 59D8 7D30 6CFA DC81 | f | 1 | 1 | 0 | 0 | 0.66 |
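
Boolean fields in the detailed listings, such as `instant_bookable`, are stored as the strings `t` and `f`. A small helper (`tf_to_bool` is hypothetical, not part of pandas or Inside Airbnb) that converts every column whose non-missing values are only `t`/`f`, shown on a toy frame with values from the sample rows:

```python
import pandas as pd

def tf_to_bool(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where columns holding only 't'/'f' (plus NaN) become booleans."""
    out = df.copy()
    for col in out.columns:
        values = set(out[col].dropna().unique())
        if values and values <= {"t", "f"}:
            # Missing values stay NaN after the mapping.
            out[col] = out[col].map({"t": True, "f": False})
    return out

listings_detailed_toy = pd.DataFrame({
    "id": [2818, 311124],
    "source": ["city scrape", "city scrape"],
    "instant_bookable": ["f", "f"],
    "review_scores_value": [4.81, 4.6],
})
converted = tf_to_bool(listings_detailed_toy)
print(converted.dtypes)
```

Detecting the columns by their values, rather than listing them by name, avoids having to know every `t`/`f` field among the 75 features in advance.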


### Calendar dataset

In [47]:
calendar = pd.read_csv(file + 'calendar.csv')
calendar.head(4)

|   | listing_id | date | available | price | adjusted_price | minimum_nights | maximum_nights |
|---|---|---|---|---|---|---|---|
| 0 | 2818 | 2023-03-09 | f | \$69.00 | \$69.00 | 3.0 | 1125.0 |
| 1 | 2818 | 2023-03-10 | f | \$69.00 | \$69.00 | 3.0 | 1125.0 |
| 2 | 2818 | 2023-03-11 | f | \$69.00 | \$69.00 | 3.0 | 1125.0 |
| 3 | 2818 | 2023-03-12 | f | \$69.00 | \$69.00 | 3.0 | 1125.0 |
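
In `calendar.csv`, `price` and `adjusted_price` are strings such as `$69.00` and `available` is `t`/`f`, so they need converting before numeric analysis. A minimal sketch on a toy frame; the `t` rows and the `$1,234.00` price are hypothetical, the latter assuming that large prices include a thousands separator:

```python
import pandas as pd

calendar_toy = pd.DataFrame({
    "listing_id": [2818, 2818, 2818],
    "date": ["2023-03-09", "2023-03-10", "2023-03-11"],
    "available": ["f", "t", "t"],
    "price": ["$69.00", "$69.00", "$1,234.00"],
})

# Strip the currency symbol and thousands separators, then convert to float.
calendar_toy["price"] = (calendar_toy["price"]
                         .str.replace("$", "", regex=False)
                         .str.replace(",", "", regex=False)
                         .astype(float))
# 't' -> True, anything else -> False.
calendar_toy["available"] = calendar_toy["available"].eq("t")
calendar_toy["date"] = pd.to_datetime(calendar_toy["date"])

# Share of calendar days open for booking, per listing.
availability_rate = calendar_toy.groupby("listing_id")["available"].mean()
print(calendar_toy.dtypes)
print(availability_rate)
```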


### Reviews dataset

In [49]:
reviews = pd.read_csv(file + 'reviews.csv')
reviews.head(4)

|   | listing_id | date |
|---|---|---|
| 0 | 2818 | 2009-03-30 |
| 1 | 2818 | 2009-04-24 |
| 2 | 2818 | 2009-05-03 |
| 3 | 2818 | 2009-05-18 |
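
Because `reviews.csv` holds one row per review with its date, it suits time-based analysis. A minimal sketch counting reviews per month with `Series.dt.to_period`, using the four sample rows (`reviews_toy` is a hypothetical stand-in for the real file):

```python
import pandas as pd

reviews_toy = pd.DataFrame({
    "listing_id": [2818, 2818, 2818, 2818],
    "date": ["2009-03-30", "2009-04-24", "2009-05-03", "2009-05-18"],
})
reviews_toy["date"] = pd.to_datetime(reviews_toy["date"])

# Group by calendar month; months without reviews are simply absent.
monthly = reviews_toy.groupby(reviews_toy["date"].dt.to_period("M")).size()
print(monthly)
```

For a continuous series that also shows months with zero reviews, resampling on the datetime, for example `reviews_toy.set_index("date").resample("MS").size()`, is an alternative.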


In [50]:
reviews_detailed = pd.read_csv(file + 'reviews_detailed.csv')
reviews_detailed.head(4)

|   | listing_id | id | date | reviewer_id | reviewer_name | comments |
|---|---|---|---|---|---|---|
| 0 | 2818 | 1191 | 2009-03-30 | 10952 | Lam | Daniel is really cool. The place was nice and ... |
| 1 | 2818 | 1771 | 2009-04-24 | 12798 | Alice | Daniel is the most amazing host! His place is ... |
| 2 | 2818 | 1989 | 2009-05-03 | 11869 | Natalja | We had such a great time in Amsterdam. Daniel ... |
| 3 | 2818 | 2797 | 2009-05-18 | 14064 | Enrique | Very professional operation. Room is very clea... |
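
Before using the `comments` text, it usually needs light cleaning. A minimal sketch, assuming that some comments are missing and that some contain HTML line breaks like the `<br /` fragment visible in the listing descriptions; the toy rows are hypothetical, not real reviews:

```python
import pandas as pd

# Hypothetical toy rows shaped like reviews_detailed.csv.
reviews_detailed_toy = pd.DataFrame({
    "listing_id": [2818, 2818, 20168],
    "comments": ["Great host.<br/>Would stay again.", "Nice and quiet.", None],
})

# Drop reviews without text, turn HTML line breaks into spaces, collapse whitespace.
text = reviews_detailed_toy.dropna(subset=["comments"]).copy()
text["comments"] = (text["comments"]
                    .str.replace(r"<br\s*/?>", " ", regex=True)
                    .str.replace(r"\s+", " ", regex=True)
                    .str.strip())
text["n_words"] = text["comments"].str.split().str.len()

# Per-listing review count and average length in words.
summary = text.groupby("listing_id").agg(
    n_reviews=("comments", "size"),
    mean_words=("n_words", "mean"),
)
print(summary)
```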


### Neighbourhoods dataset

In [51]:
neighbourhoods = pd.read_csv(file + 'neighbourhoods.csv')
neighbourhoods.head(4)

|   | neighbourhood_group | neighbourhood |
|---|---|---|
| 0 |  | Bijlmer-Centrum |
| 1 |  | Bijlmer-Oost |
| 2 |  | Bos en Lommer |
| 3 |  | Buitenveldert - Zuidas |
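
In the sample, `neighbourhood_group` is empty for Amsterdam, and the neighbourhood list is meant as a geo filter. A minimal sketch that drops all-empty columns and flags listings whose `neighbourhood` appears in the list, using toy frames; the toy list is deliberately incomplete so that one sample listing is not matched, and `in_list` is a hypothetical column name:

```python
import numpy as np
import pandas as pd

neighbourhoods_toy = pd.DataFrame({
    "neighbourhood_group": [np.nan, np.nan, np.nan],
    "neighbourhood": ["Bijlmer-Centrum", "Bos en Lommer", "Centrum-Oost"],
})
listings_toy = pd.DataFrame({
    "id": [2818, 20168],
    "neighbourhood": ["Oostelijk Havengebied - Indische Buurt", "Centrum-Oost"],
})

# Remove columns that contain no values at all (here: neighbourhood_group).
neighbourhoods_toy = neighbourhoods_toy.dropna(axis=1, how="all")

# True where the listing's neighbourhood is in the (toy) neighbourhood list.
listings_toy["in_list"] = listings_toy["neighbourhood"].isin(neighbourhoods_toy["neighbourhood"])
print(listings_toy)
```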


In [54]:
import json
from pathlib import Path

neighbourhoods_geojson = json.loads(Path(file + 'neighbourhoods.geojson').read_text(encoding='utf-8'))
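
A GeoJSON file (RFC 7946) is a `FeatureCollection` whose `features` each have `properties` and a `geometry`, with positions written as `[longitude, latitude]`. A minimal sketch of reading neighbourhood names and a bounding box from such a structure; it assumes the properties use the same `neighbourhood` key as `neighbourhoods.csv` and that geometries are `Polygon` or `MultiPolygon`, and the coordinates below are invented, not a real boundary:

```python
import json

# Hypothetical minimal FeatureCollection shaped like neighbourhoods.geojson.
raw = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"neighbourhood": "Bijlmer-Centrum", "neighbourhood_group": null},
     "geometry": {"type": "MultiPolygon",
                  "coordinates": [[[[4.95, 52.31], [4.97, 52.31], [4.97, 52.33], [4.95, 52.31]]]]}}
  ]
}"""
geo = json.loads(raw)

names = [feature["properties"]["neighbourhood"] for feature in geo["features"]]

def bbox(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a Polygon or MultiPolygon."""
    polygons = geometry["coordinates"]
    if geometry["type"] == "Polygon":
        polygons = [polygons]  # treat a single polygon as a one-element MultiPolygon
    points = [pt for polygon in polygons for ring in polygon for pt in ring]
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lons), min(lats), max(lons), max(lats)

print(names, bbox(geo["features"][0]["geometry"]))
```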