# STAT 301 Group 40 Final Report

Wasay Hayat, Juhi Grover, Qi Xu, Allen Zheng

## Introduction

(add background information)

**Overarching question:** How can listing features and location be used to predict the full price of an Airbnb stay for two people and two nights in Barcelona, Paris, or Vienna?

### Data Description

We are using data on Airbnb prices in European cities, downloaded from [Kaggle](https://www.kaggle.com/datasets/thedevastator/airbnb-prices-in-european-cities).

Original data source: Gyódi, K. and Ł. Nawaro. Determinants of Airbnb Prices in European Cities: A Spatial Econometrics Approach (Supplementary Material). Zenodo, 13 Jan. 2021, [doi:10.5281/zenodo.4446043](https://doi.org/10.5281/zenodo.4446043).

Combining data from 3 cities (Paris, Barcelona, and Vienna), we have 13,058 observations and 22 columns, including 2 new columns created for this project.

**Variable descriptions:**
- realSum: the full price of accommodation for two people and two nights in EUR
- room_type: the type of the accommodation 
- room_shared: dummy variable for shared rooms
- room_private: dummy variable for private rooms
- person_capacity: the maximum number of guests 
- host_is_superhost: dummy variable for superhost status
- multi: dummy variable indicating that the listing belongs to a host with 2-4 offers
- biz: dummy variable indicating that the listing belongs to a host with more than 4 offers
- cleanliness_rating: cleanliness rating
- guest_satisfaction_overall: overall rating of the listing
- bedrooms: number of bedrooms (0 for studios)
- dist: distance from city centre in km
- metro_dist: distance from nearest metro station in km
- attr_index: attraction index of the listing location
- attr_index_norm: normalised attraction index (0-100)
- rest_index: restaurant index of the listing location
- rest_index_norm: normalised restaurant index (0-100)
- lng: longitude of the listing location
- lat: latitude of the listing location
- **(added)** city: city of the listing location
- **(added)** day_type: weekday or weekend
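The descriptions above suggest that `room_shared` and `room_private` simply re-encode `room_type`. A minimal sketch of how that could be checked is given below. It uses a small hand-made table rather than the real data, and it assumes that the shared-room level is labelled `"Shared room"` (that label does not appear in the rows previewed in this report).

```r
library(dplyr)

# Toy rows imitating the listing structure; these are illustrative values,
# not rows taken from the dataset. "Shared room" is an assumed level name.
toy_rooms <- tibble::tribble(
  ~room_type,        ~room_shared, ~room_private,
  "Private room",    FALSE,        TRUE,
  "Entire home/apt", FALSE,        FALSE,
  "Shared room",     TRUE,         FALSE
)

# Assumed encoding: each dummy is TRUE only for its matching room_type level.
dummy_check <- toy_rooms |>
  mutate(
    shared_ok  = room_shared  == (room_type == "Shared room"),
    private_ok = room_private == (room_type == "Private room")
  )

# TRUE when every row agrees with the assumed encoding.
all_consistent <- all(dummy_check$shared_ok, dummy_check$private_ok)
print(all_consistent)
```

If the same check holds on the full data, the two dummies carry no information beyond `room_type`, so a regression model would likely need only one of the two encodings.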

## Methods and Results

### Exploratory Data Analysis

In [1]:
library(tidyverse)
library(repr)
library(tidymodels)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──

✔ broom        1.0.6     ✔ rsample

In [2]:
# Main developer: Wasay
# Read the weekday and weekend file for each city and tag every row with its city and day type.

barcelona_weekdays <- read_csv("https://raw.githubusercontent.com/awhayat/stat-301-project/refs/heads/main/barcelona_weekdays.csv") |>
    mutate(city = "Barcelona", day_type = "weekday")
barcelona_weekends <- read_csv("https://raw.githubusercontent.com/awhayat/stat-301-project/refs/heads/main/barcelona_weekends.csv") |>
    mutate(city = "Barcelona", day_type = "weekend")
paris_weekdays <- read_csv("https://raw.githubusercontent.com/awhayat/stat-301-project/refs/heads/main/paris_weekdays.csv") |>
    mutate(city = "Paris", day_type = "weekday")
paris_weekends <- read_csv("https://raw.githubusercontent.com/awhayat/stat-301-project/refs/heads/main/paris_weekends.csv") |>
    mutate(city = "Paris", day_type = "weekend")
vienna_weekdays <- read_csv("https://raw.githubusercontent.com/awhayat/stat-301-project/refs/heads/main/vienna_weekdays.csv") |>
    mutate(city = "Vienna", day_type = "weekday")
vienna_weekends <- read_csv("https://raw.githubusercontent.com/awhayat/stat-301-project/refs/heads/main/vienna_weekends.csv") |>
    mutate(city = "Vienna", day_type = "weekend")

# Stack the six files into a single data frame.
airbnb_data <- bind_rows(paris_weekdays, paris_weekends, barcelona_weekdays, barcelona_weekends, vienna_weekdays, vienna_weekends)
head(airbnb_data)

[1m[22mNew names:
[36m•[39m `` -> `...1`
[1mRows: [22m[34m1555[39m [1mColumns: [22m[34m20[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m  (1): room_type
[32mdbl[39m (16): ...1, realSum, person_capacity, multi, biz, cleanliness_rating, gu...
[33mlgl[39m  (3): room_shared, room_private, host_is_superhost

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1m[22mNew names:
[36m•[39m `` -> `...1`
[1mRows: [22m[34m1278[39m [1mColumns: [22m[34m20[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m  (1): room_type
[32mdbl[39m (16): ...1, realSum, person_capacity, multi, biz, cleanliness_rating, gu...
[33mlgl[39m  (3): room_shared, room_pr

| ...1 | realSum | room_type | room_shared | room_private | person_capacity | host_is_superhost | multi | biz | cleanliness_rating | ⋯ | dist | metro_dist | attr_index | attr_index_norm | rest_index | rest_index_norm | lng | lat | city | day_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<chr>` | `<lgl>` | `<lgl>` | `<dbl>` | `<lgl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 0 | 296.1599 | Private room | False | True | 2 | True | 0 | 0 | 10 | ⋯ | 0.6998206 | 0.1937094 | 518.4789 | 25.23938 | 1218.6622 | 71.60803 | 2.35385 | 48.86282 | Paris | weekday |
| 1 | 288.2375 | Private room | False | True | 2 | True | 0 | 0 | 10 | ⋯ | 2.1000054 | 0.1072207 | 873.217 | 42.50791 | 1000.5433 | 58.79146 | 2.32436 | 48.85902 | Paris | weekday |
| 2 | 211.3431 | Private room | False | True | 2 | False | 0 | 0 | 10 | ⋯ | 3.3023251 | 0.2347238 | 444.5561 | 21.64084 | 902.8545 | 53.05131 | 2.31714 | 48.87475 | Paris | weekday |
| 3 | 298.9561 | Entire home/apt | False | False | 2 | False | 0 | 1 | 9 | ⋯ | 0.5475667 | 0.1959965 | 542.142 | 26.39129 | 1199.1842 | 70.46351 | 2.356 | 48.861 | Paris | weekday |
| 4 | 247.9262 | Entire home/apt | False | False | 4 | False | 0 | 0 | 7 | ⋯ | 1.1979209 | 0.1035729 | 406.929 | 19.80916 | 1070.7755 | 62.91827 | 2.35915 | 48.86648 | Paris | weekday |
| 5 | 527.0761 | Entire home/apt | False | False | 4 | True | 0 | 0 | 10 | ⋯ | 1.5432015 | 0.5491303 | 967.4781 | 47.09651 | 1095.8704 | 64.39284 | 2.33201 | 48.85891 | Paris | weekday |
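A natural first exploratory step, given the overarching question, is to compare the response `realSum` across cities. The sketch below shows one way this might be done with `group_by()` and `summarise()`. The table `toy_airbnb` is a hypothetical stand-in with made-up prices, not the real `airbnb_data`, and the choice of summary statistics is an assumption rather than the group's stated plan.

```r
library(dplyr)

# Hypothetical stand-in for airbnb_data; prices are invented for illustration.
toy_airbnb <- tibble::tribble(
  ~city,       ~day_type, ~realSum,
  "Paris",     "weekday", 300,
  "Paris",     "weekend", 500,
  "Barcelona", "weekday", 200,
  "Barcelona", "weekend", 240,
  "Vienna",    "weekday", 150,
  "Vienna",    "weekend", 250
)

# Count listings and summarise the price distribution within each city.
price_by_city <- toy_airbnb |>
  group_by(city) |>
  summarise(
    n_listings   = n(),
    median_price = median(realSum),
    mean_price   = mean(realSum),
    .groups      = "drop"
  )

print(price_by_city)
```

Replacing `toy_airbnb` with `airbnb_data` should give the corresponding per-city summary for the full data; adding `day_type` to `group_by()` would split it further by weekday and weekend.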


### Plan

### Computational Code and Output

## Discussion

## References

(at least two references expected in the introduction)

- reference 1
- reference 2