# European Airbnb Listing Price Analysis - a Report

## Preliminary Data Loading

In [1]:
options(tidyverse.quiet = TRUE)
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidymodels))
# Cities: amsterdam, athens, barcelona, berlin, budapest, lisbon, london, paris, rome, vienna.
# Day types: weekdays, weekends.
url_first <- "https://zenodo.org/record/4446043/files/"
cities <- c("amsterdam", "athens", "barcelona", "berlin", "budapest", "lisbon", "london", "paris", "rome", "vienna")
total_dataset <- tibble()
for (i in seq_along(cities)) {
    sub_data <- read_csv(paste(url_first, cities[i], "_weekdays.csv", sep ="")) |>
        mutate(city = cities[i]) |>
        mutate(day = "weekday") |>
        suppressMessages()
    Sys.sleep(0.5) # don't overload their server
    sub_data_end <- read_csv(paste(url_first, cities[i], "_weekends.csv", sep ="")) |>
        mutate(city = cities[i]) |>
        mutate(day = "weekend") |>
        suppressMessages()
    total_dataset <- bind_rows(total_dataset, sub_data, sub_data_end)
    Sys.sleep(0.5) # don't overload their server
}
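
The download loop above can also be written without growing `total_dataset` inside a `for` loop. The sketch below is one possible restructuring, not the report's own code: `listing_url()` and `read_listings()` are hypothetical helper names, and the download call is left commented out because it needs network access.

```r
library(tidyverse)

url_first <- "https://zenodo.org/record/4446043/files/"
cities <- c("amsterdam", "athens", "barcelona", "berlin", "budapest",
            "lisbon", "london", "paris", "rome", "vienna")

# Hypothetical helper: build the URL of one city/day-type file.
listing_url <- function(city_name, day_type) {
  paste0(url_first, city_name, "_", day_type, ".csv")
}

# Hypothetical helper: download one file and tag its rows with city and day.
read_listings <- function(city_name, day_type) {
  Sys.sleep(0.5) # don't overload their server
  read_csv(listing_url(city_name, day_type), show_col_types = FALSE) |>
    mutate(city = city_name,
           day = str_remove(day_type, "s$")) # "weekdays" becomes "weekday"
}

# One row per file to download, in the same order as the loop above.
files <- expand_grid(city_name = cities, day_type = c("weekdays", "weekends"))

# Uncomment to download all files (requires network access):
# total_dataset <- pmap(files, read_listings) |> bind_rows()
```

`pmap()` passes the columns of `files` to `read_listings()` by name, so the column names must match the function's argument names.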

In [3]:
set.seed(420)

# clean data
airbnb_clean <- total_dataset |>
    select(-1) |> # drop the first column, an incrementing row index
    # according to https://zenodo.org/record/4446043#.Y9Y9ENJBwUE these columns are dummies
    # that duplicate room_type:
    select(-room_private, -room_shared) |>
    # attr_index_norm and rest_index_norm are apparently the same indices rescaled to [0, 100],
    # so drop the unscaled versions
    select(-attr_index, -rest_index) |>
    mutate(room_type = as_factor(room_type)) |>
    mutate(city = as_factor(city), day = as_factor(day), multi = as_factor(multi), biz = as_factor(biz)) |>
    rename(dist_from_city_centre = dist,
           cost = realSum,
           attraction_index = attr_index_norm,
           restaurant_index = rest_index_norm)

airbnb_clean |>
    select(room_type) |>
    pull() |>
    levels()

glimpse(airbnb_clean)

airbnb_split <- initial_split(airbnb_clean, prop = 0.75, strata = room_type)
airbnb_train <- training(airbnb_split)
airbnb_test <- testing(airbnb_split)

Rows: 51,707
Columns: 17
$ cost                       <dbl> 194.0337, 344.2458, 264.1014, 433.5294, 485…
$ room_type                  <fct> Private room, Private room, Private room, P…
$ person_capacity            <dbl> 2, 4, 2, 4, 2, 3, 2, 4, 4, 2, 2, 2, 4, 2, 2…
$ host_is_superhost          <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FA…
$ multi                      <fct> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1…
$ biz                        <fct> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cleanliness_rating         <dbl> 10, 8, 9, 9, 10, 8, 10, 10, 9, 10, 10, 10, …
$ guest_satisfaction_overall <dbl> 93, 85, 87, 90, 98, 100, 94, 100, 96, 88, 9…
$ bedrooms                   <dbl> 1, 1, 1, 2, 1, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1…
$ dist_from_city_centre      <dbl> 5.0229638, 0.4883893, 5.748311
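
Because the split is stratified on `room_type`, each room type should appear in roughly the same proportion in the training and testing sets. A minimal sketch of that check follows. It runs on a made-up `toy_listings` tibble (hypothetical values, not the real listings) so that it works without downloading anything; the same pipeline could be pointed at `airbnb_split`.

```r
library(tidyverse)
library(tidymodels)

set.seed(420)

# Toy stand-in for airbnb_clean: made-up values, not the real listings.
toy_listings <- tibble(
  room_type = factor(sample(c("Private room", "Entire home/apt", "Shared room"),
                            size = 400, replace = TRUE,
                            prob = c(0.40, 0.45, 0.15))),
  cost = runif(400, min = 50, max = 500)
)

# Same call shape as the real split above.
toy_split <- initial_split(toy_listings, prop = 0.75, strata = room_type)

# Share of each room type within the training and testing sets.
split_check <- bind_rows(
  training(toy_split) |> mutate(set = "train"),
  testing(toy_split) |> mutate(set = "test")
) |>
  count(set, room_type) |>
  group_by(set) |>
  mutate(share = n / sum(n)) |>
  ungroup()

split_check
```

If the shares for a room type differ noticeably between `train` and `test`, the stratification is worth a second look before modelling.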

## Introduction


Europe is the most visited region in the world (UNWTO, 2023), with many countries to explore. An enjoyable trip requires careful planning and budgeting, and Airbnb is one of the most popular ways to find accommodation while traveling. We therefore want to identify which factors influence the price of an Airbnb listing in ten popular European cities, measure how strongly each one affects the price, and build a model that predicts the price from a listing's properties. We will use the datasets from a study published in Tourism Management, “Determinants of Airbnb prices in European cities: A spatial econometrics approach”. The datasets are separated by city and by type of day (weekday or weekend) and include the price and room type of each listing (private room, shared room, or entire home), the maximum capacity of the rental, and its cleanliness rating, among other factors.

## Methods and Results
- describe in written English the methods you used to perform your analysis from beginning to end, narrating the code that does the analysis.
- your report should include code which:
  - loads data from the original source on the web
  - wrangles and cleans the data from its original (downloaded) format to the format necessary for the planned analysis
  - performs a summary of the data set that is relevant for exploratory data analysis related to the planned analysis
  - creates a visualization of the dataset that is relevant for exploratory data analysis related to the planned analysis
  - performs the data analysis
  - creates a visualization of the analysis
  - note: all tables and figures should have a figure/table number and a legend
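
The summary, visualization, and analysis steps listed above could take roughly the following shape. This is only a sketch under stated assumptions: it uses a made-up `toy_listings` tibble rather than the real listings, and it assumes linear regression as the model, which this report does not specify. All `toy_*`, `lm_*`, and `cost_*` names are hypothetical.

```r
library(tidyverse)
library(tidymodels)

set.seed(420)

# Toy stand-in for the cleaned listings: made-up values, not real data.
toy_listings <- tibble(
  city = factor(sample(c("amsterdam", "athens", "barcelona"), 300, replace = TRUE)),
  person_capacity = sample(2:6, 300, replace = TRUE),
  dist_from_city_centre = runif(300, min = 0, max = 10)
) |>
  mutate(cost = 100 + 40 * person_capacity - 8 * dist_from_city_centre +
           rnorm(300, sd = 20))

toy_split <- initial_split(toy_listings, prop = 0.75, strata = city)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)

# Exploratory summary: listing count and typical cost per city (training set only).
cost_by_city <- toy_train |>
  group_by(city) |>
  summarise(n = n(), mean_cost = mean(cost), median_cost = median(cost))

# Exploratory plot: cost against distance from the city centre.
cost_plot <- ggplot(toy_train, aes(x = dist_from_city_centre, y = cost)) +
  geom_point(alpha = 0.4) +
  labs(x = "Distance from city centre", y = "Cost")

# Linear regression as one candidate model (an assumption, not the report's stated method).
lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")
lm_recipe <- recipe(cost ~ person_capacity + dist_from_city_centre, data = toy_train)
lm_fit <- workflow() |>
  add_recipe(lm_recipe) |>
  add_model(lm_spec) |>
  fit(data = toy_train)

# Evaluate on the held-out test set (RMSE, R squared, MAE).
test_metrics <- predict(lm_fit, toy_test) |>
  bind_cols(toy_test) |>
  metrics(truth = cost, estimate = .pred)
```

Keeping the exploratory summary and plot on the training set alone means the test set is only touched once, at evaluation time.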

## Discussion
- summarize what you found
- discuss whether this is what you expected to find
- discuss what impact such findings could have
- discuss what future questions this could lead to

## References

- At least 2 citations of literature relevant to the project (format is your choice, just be consistent across the references).
- Make sure to cite the source of your data as well.