# **Analyzing Key Factors Affecting Airbnb Prices Across Berlin, London, and Paris**

## **Summary**

This project aims to develop a predictive model to estimate Airbnb listing prices in Berlin, London, and Paris using key property and location-based attributes. The dataset, sourced from a publicly available Zenodo repository (Gyódi & Nawaro, 2021), includes variables such as room type, guest ratings, host status, distance to the city center, and neighborhood attractiveness scores. To identify the most significant predictors of price, we apply exploratory data analysis (EDA), correlation analysis, and multiple linear regression. Our findings align with previous research that highlights the influence of location, property characteristics, and host reputation on pricing strategies in the short-term rental market. We evaluate the model using R-squared to measure explanatory power and Root Mean Squared Error (RMSE) to assess prediction accuracy. By identifying key pricing factors, this study provides insights that can help Airbnb hosts optimize their pricing strategies and help travelers make informed booking decisions.

## **Introduction**
The rapid expansion of peer-to-peer accommodation platforms like Airbnb has significantly transformed the hospitality industry, enabling homeowners to monetize their properties while offering travelers diverse lodging options. Unlike hotel rates, Airbnb prices vary widely and are influenced by factors such as property type, host reputation, and location-specific characteristics **(Guttentag, 2015)**. Understanding the determinants of Airbnb pricing is crucial for both hosts seeking to maximize revenue and travelers looking for cost-effective stays.

Given the dynamic nature of Airbnb pricing, identifying the key factors that influence listing prices has been a focal point of research. Prior studies have investigated how aspects such as property characteristics, location, and host reputation contribute to price variations. However, price determinants may differ across cities due to unique market conditions and demand fluctuations. Thus, a more comprehensive analysis is needed to quantify these effects across different urban contexts.

This study aims to analyze the factors that influence Airbnb prices across Berlin, London, and Paris. By leveraging publicly available Airbnb listing data, we examine the role of various attributes, including room type, guest satisfaction ratings, host status, and proximity to city landmarks. Previous research has shown that location and property attributes are key drivers of pricing **(Oskam & Boswijk, 2016)**, but this study seeks to further quantify their impact using regression-based modeling.

Prior studies have also emphasized the importance of host-related attributes, such as Superhost status and guest review scores, in determining Airbnb listing prices **(Teubner, Hawlitschek, & Dann, 2017)**. A highly rated host can often command a higher price due to increased trust and perceived quality from potential guests.

Beyond host characteristics, broader market factors also influence short-term rental pricing. Research suggests that urban tourism trends and local demand-supply dynamics play a significant role in price fluctuations, particularly during peak seasons and major events, when demand surges and hosts adjust prices accordingly **(Toader & Negrușa, 2021)**.

Given these insights, it is evident that Airbnb pricing is influenced by both micro-level attributes (e.g., host reputation and property features) and macro-level market factors (e.g., tourism trends and seasonal demand fluctuations). However, most prior studies have focused on isolated factors, such as host-related attributes or location-based pricing, without integrating multiple key determinants into a unified predictive framework.

To address this gap, this study aims to develop a comprehensive pricing prediction model that incorporates both property-specific features (e.g., room type, capacity, and cleanliness rating) and external market influences (e.g., demand fluctuations and urban attractiveness). By analyzing Airbnb listings across Berlin, London, and Paris, we seek to determine the most significant predictors of price and assess their relative impact.

Thus, our central research question is:
***"What are the most significant factors that influence Airbnb listing prices in Berlin, London, and Paris?"***


By identifying these key pricing determinants, this study aims to provide practical insights for Airbnb hosts optimizing their pricing strategies, travelers making cost-effective booking decisions, and market analysts studying rental price trends.


### **Dataset Description:**
The dataset contains 19,165 listings from Berlin, London, and Paris, with variables describing room type, capacity, host status, guest ratings, and location. The key variables include:

- **realSum**: The price of the listing (target variable).
- **room_type**: Type of accommodation (entire home/apartment, private room, or shared room).
- **person_capacity**: Maximum number of guests.
- **bedrooms**: Number of bedrooms.
- **host_is_superhost**: Whether the host is a Superhost.
- **cleanliness_rating, guest_satisfaction_overall**: Guest ratings.
- **dist, metro_dist**: Distance (km) to the city center and to the nearest metro station.
- **attr_index, rest_index** (and their normalized versions **attr_index_norm, rest_index_norm**): Attraction (tourism) and restaurant indices for the listing's location.
- **weekdays**: Whether the record comes from the weekday (TRUE) or weekend (FALSE) file, i.e., whether the price refers to a weekday or weekend stay; added when the files are combined.
- **city**: The city where the listing is located (Berlin, London, or Paris); added when the files are combined.

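```r
# Install any required packages that are not yet available
required_pkgs <- c("ggplot2", "dplyr", "car", "corrplot", "tidyr", "cowplot")
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

library(ggplot2)
library(dplyr)
library(car)       # vif()
library(corrplot)  # corrplot()
library(tidyr)
library(cowplot)   # plot_grid()
```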



```r
# Load the weekday and weekend files for each city from Zenodo
berlin_weekdays <- read.csv("https://zenodo.org/records/4446043/files/berlin_weekdays.csv")
berlin_weekends <- read.csv("https://zenodo.org/records/4446043/files/berlin_weekends.csv")
london_weekdays <- read.csv("https://zenodo.org/records/4446043/files/london_weekdays.csv")
london_weekends <- read.csv("https://zenodo.org/records/4446043/files/london_weekends.csv")
paris_weekdays <- read.csv("https://zenodo.org/records/4446043/files/paris_weekdays.csv")
paris_weekends <- read.csv("https://zenodo.org/records/4446043/files/paris_weekends.csv")
```

```r
# Add 'weekdays' and 'city' columns
berlin_weekdays <- berlin_weekdays %>% mutate(weekdays = TRUE, city = "berlin")
berlin_weekends <- berlin_weekends %>% mutate(weekdays = FALSE, city = "berlin")

london_weekdays <- london_weekdays %>% mutate(weekdays = TRUE, city = "london")
london_weekends <- london_weekends %>% mutate(weekdays = FALSE, city = "london")

paris_weekdays <- paris_weekdays %>% mutate(weekdays = TRUE, city = "paris")
paris_weekends <- paris_weekends %>% mutate(weekdays = FALSE, city = "paris")

# Combine all datasets
airbnb <- bind_rows(berlin_weekdays, berlin_weekends,
                    london_weekdays, london_weekends,
                    paris_weekdays, paris_weekends)

# Optionally save the combined dataset
# write.csv(airbnb, "airbnb.csv", row.names = FALSE)

# Display first few rows to verify
head(airbnb)
```

**Table 1.** First few rows of Airbnb data.

```r
# Convert categorical variables to factors
airbnb$room_type <- as.factor(airbnb$room_type)
airbnb$host_is_superhost <- as.factor(airbnb$host_is_superhost)
airbnb$city <- as.factor(airbnb$city)
airbnb$weekdays <- as.factor(airbnb$weekdays)

# Check structure and missing values
str(airbnb)
cat("The number of missing values is:", sum(is.na(airbnb)), "\n")
```

**Figure 1.** Structure of the Airbnb dataset: variable types, sample values, and missing-value count.

```r
# Summary of the data
summary(airbnb[, c("realSum", "person_capacity", "bedrooms", "dist",
                   "metro_dist", "attr_index", "attr_index_norm", "rest_index", "rest_index_norm")])
```

**Table 2.** Summary statistics of selected numeric variables.

```r
options(repr.plot.width = 15, repr.plot.height = 20)

# Selected numeric variables (price plus predictors)
numeric_vars <- c("realSum", "person_capacity", "bedrooms", "dist",
                  "metro_dist", "attr_index", "attr_index_norm",
                  "rest_index", "rest_index_norm")

# One histogram per variable; .data[[var]] is used because aes_string() is deprecated
plot_list <- lapply(numeric_vars, function(var) {
  ggplot(airbnb, aes(x = .data[[var]])) +
    geom_histogram(bins = 50, fill = "blue", color = "black") +
    ggtitle(paste("Distribution of", var)) +
    xlab(var) + ylab("Count") +
    theme_minimal()
})

# Display plots in a 2-column layout
plot_grid(plotlist = plot_list, ncol = 2)
```

**Figure 2.** Distributions of price (realSum) and the numeric predictor variables used in modeling Airbnb price.

```r
cor_matrix <- cor(airbnb[, c("realSum", "person_capacity", "bedrooms", "dist", "metro_dist", "attr_index")])
corrplot(cor_matrix, method = "color", tl.col = "black", tl.srt = 45)
```

**Figure 3.** Correlation matrix of price and selected numeric predictors, with color indicating the strength and direction of each relationship.

```r
# Run multiple linear regression including 'weekdays' and 'city'
model <- lm(realSum ~ person_capacity + bedrooms + dist + metro_dist + attr_index_norm +
              host_is_superhost + room_type + weekdays + city + rest_index_norm, data = airbnb)

# Display summary
summary(model)
```

**Table 3.** Linear regression results for Airbnb price as a function of listing characteristics.
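
The regression in Table 3 can be written out as the specification below, a sketch read directly from the `lm()` formula. Factor terms enter as 0/1 indicator variables relative to their baseline (first) levels, which under R's default alphabetical ordering are an entire home/apartment, Berlin, a weekend stay, and a non-Superhost host:

$$
\begin{aligned}
\mathrm{realSum}_i = {} & \beta_0 + \beta_1\,\mathrm{person\_capacity}_i + \beta_2\,\mathrm{bedrooms}_i + \beta_3\,\mathrm{dist}_i + \beta_4\,\mathrm{metro\_dist}_i \\
& + \beta_5\,\mathrm{attr\_index\_norm}_i + \beta_6\,\mathrm{superhost}_i + \beta_7\,\mathrm{private}_i + \beta_8\,\mathrm{shared}_i \\
& + \beta_9\,\mathrm{weekday}_i + \beta_{10}\,\mathrm{london}_i + \beta_{11}\,\mathrm{paris}_i + \beta_{12}\,\mathrm{rest\_index\_norm}_i + \varepsilon_i
\end{aligned}
$$

Each coefficient is the expected change in realSum for a one-unit change in its variable with all other terms held fixed; for example, $\beta_2$ is the +143.87 bedroom coefficient reported in the Key Findings.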

```r
options(repr.plot.width = 10, repr.plot.height = 10)
plot(model$residuals, main = "Residuals Plot", col = "blue", pch = 20)
abline(h = 0, col = "red")

# Check multicollinearity (generalized VIFs from the car package)
vif(model)
```

**Figure 4.** Residuals of the regression model plotted by observation index, showing the distribution of errors around zero.
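
Plotting residuals against observation order, as in Figure 4, mainly reveals patterns tied to row order (here, the order in which the six city and day files were combined). Two standard complements are a residuals-versus-fitted plot, which exposes non-linearity or unequal variance, and a normal Q-Q plot, which exposes skewed or heavy-tailed errors. Base R's `plot()` method for `lm` objects draws both with `which = 1:2`. The sketch below runs on R's built-in `mtcars` data purely as a stand-in; for this analysis the equivalent call would be `plot(model, which = 1:2)`.

```r
# Stand-in example on R's built-in mtcars data (not the Airbnb listings);
# in the notebook, `fit` would be the fitted `model` object.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Draw two diagnostic panels side by side
op <- par(mfrow = c(1, 2))
# which = 1: residuals vs fitted values (non-linearity, unequal variance)
# which = 2: normal Q-Q plot of standardized residuals (skew, heavy tails)
plot(fit, which = 1:2)
par(op)

# With an intercept, OLS residuals average to zero by construction
mean(residuals(fit))
```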

## **Discussion**

Our analysis of Airbnb listings in Berlin, London, and Paris identified several key factors associated with pricing. The number of bedrooms and guest capacity are among the strongest predictors of price, with larger accommodations commanding higher rates. Location also matters: listings closer to metro stations tend to be priced higher, likely reflecting better accessibility. Superhost status has a positive but relatively minor effect compared with these factors. Guest satisfaction and cleanliness ratings were not included in the regression, so their influence relative to property size and location cannot be assessed from this model.

## **Key Findings**

### *Strongest Predictors of Price*:

**Person Capacity (+48.29 per additional person)**: Listings accommodating more guests tend to be more expensive.

**Bedrooms (+143.87 per additional bedroom)**: More bedrooms significantly increase the price.

**Tourism Attraction Index (+8.86 per unit increase in attr_index_norm)**: A higher attraction index is associated with higher prices, consistent with listings near popular tourist attractions being more expensive.

### *Location-Based Effects*:

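**Distance from the City Center (+9.73 per km)**: Surprisingly, prices increase slightly with distance from the city center. Because the model also holds metro distance and the attraction and restaurant indices constant, this coefficient reflects the effect of distance beyond those location amenities rather than the raw relationship between distance and price. It may also indicate that some high-end properties lie farther from the city core, for example in suburban or scenic areas.

**Distance from Metro (-19.50 per km)**: Listings farther from metro stations are cheaper, suggesting that metro accessibility is a key pricing factor.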

### *Effect of Host Status*:

**Superhost Status (+25.78 for superhosts)**: Superhosts charge slightly higher prices, likely due to better reviews and perceived trustworthiness.

### *Room Type*:

**Private Rooms (-145.59)**: Private rooms are significantly cheaper than entire homes.

**Shared Rooms (-249.30)**: The cheapest option, with prices much lower than entire homes.

### *City-Based Price Differences*:

**Paris (+86.64 compared to Berlin)** and **London (+75.57 compared to Berlin)** are significantly more expensive than Berlin.
Holding the other variables constant, Paris has the highest Airbnb prices, followed by London, while Berlin is the most affordable.

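### *Weekdays vs. Weekends*:

The weekday indicator is not statistically significant (p = 0.654).
Prices appear similar on weekdays and weekends, which is consistent with steady demand across the week rather than weekend peak pricing, although a non-significant result does not by itself rule out a small difference.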
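
Because the model is additive, the coefficients above can be combined to compare two listings that differ in several attributes while everything else is held fixed. The sketch below hard-codes the estimates reported in this section and uses a hypothetical helper, `price_difference`, to compare a Paris private room that has one extra bedroom with an otherwise identical entire home in Berlin. The result is a model-implied difference, not an observed price gap, and terms whose estimates are not reported above (such as `rest_index_norm` and `weekdays`) are left out.

```r
# Coefficient estimates as reported in the Key Findings
# (same units as realSum; baselines: entire home, Berlin, non-Superhost)
coefs <- c(
  person_capacity = 48.29,
  bedrooms        = 143.87,
  attr_index_norm = 8.86,
  dist            = 9.73,
  metro_dist      = -19.50,
  superhost       = 25.78,
  private_room    = -145.59,
  shared_room     = -249.30,
  paris           = 86.64,
  london          = 75.57
)

# Hypothetical helper: model-implied price difference between two listings,
# given how much each term changes (unchanged terms are simply omitted)
price_difference <- function(coefs, changes) {
  stopifnot(all(names(changes) %in% names(coefs)))
  sum(coefs[names(changes)] * changes)
}

# Paris private room with one extra bedroom vs. a Berlin entire home:
# 86.64 + 143.87 - 145.59
delta <- price_difference(coefs, c(paris = 1, bedrooms = 1, private_room = 1))
round(delta, 2)  # 84.92
```

The large room-type penalty nearly cancels the combined city and bedroom premiums, which illustrates why single coefficients should be read with the other listing attributes in mind.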


## **Expected vs. Unexpected Results**

### **Expected:**

>-Person capacity, bedrooms, and attraction index significantly influence price, as anticipated.
>
>-Superhosts charge higher prices, likely due to trust and better service.
>
>-Room type heavily affects pricing, with entire homes being the most expensive.

### **Unexpected:**

>-City center distance shows a positive relationship with price (instead of negative). This could mean luxury properties exist in suburban areas, or city centers have budget accommodations.
>
>-Weekday vs. weekend pricing is not significantly different. We expected weekend prices to be slightly higher, but the data suggests a steady demand throughout the week.

## **Model Performance & Validity**

### **R-Squared**
>-The model has an R-squared of 0.24 (adjusted R-squared 0.2395), meaning it explains about 24% of the variation in Airbnb prices. The large unexplained share suggests that other factors, such as seasonal trends, demand fluctuations, or special events, also influence pricing.
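
The Summary names RMSE as the measure of prediction accuracy, but only R-squared is reported above. A minimal sketch of a hold-out evaluation follows; the 80/20 split, the seed, and the helper name `evaluate_holdout` are assumptions, and it is demonstrated on synthetic data rather than the Airbnb listings. Applied to this analysis it would be called as `evaluate_holdout(airbnb, realSum ~ person_capacity + bedrooms + ...)` with the full Table 3 formula.

```r
# Hypothetical helper: fit a linear model on a random training split and
# report in-sample R-squared plus RMSE on the held-out test rows.
# Note: set.seed() here also resets the global random number stream.
evaluate_holdout <- function(data, formula, train_frac = 0.8, seed = 42) {
  set.seed(seed)
  n <- nrow(data)
  train_idx <- sample.int(n, size = floor(train_frac * n))
  train <- data[train_idx, ]
  test  <- data[-train_idx, ]

  fit <- lm(formula, data = train)
  response <- all.vars(formula)[1]
  pred <- predict(fit, newdata = test)

  list(
    r_squared = summary(fit)$r.squared,
    rmse_test = sqrt(mean((test[[response]] - pred)^2))
  )
}

# Synthetic demo data (not Airbnb data): price depends on bedrooms plus noise
set.seed(1)
demo <- data.frame(bedrooms = sample(1:4, 1000, replace = TRUE))
demo$price <- 100 + 150 * demo$bedrooms + rnorm(1000, sd = 50)

res <- evaluate_holdout(demo, price ~ bedrooms)
res
```

Since the synthetic noise has a standard deviation of 50, the test RMSE should land close to 50; on real data, RMSE is expressed in the same units as realSum, which makes it easier to interpret than R-squared for hosts and travelers.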

### **Variance Inflation Factor (VIF) Checks:**

>-rest_index_norm (GVIF = 5.64) is the highest, meaning it might have moderate collinearity with other features.
>
>-dist (GVIF = 3.64) and city (GVIF = 5.38) are also relatively high.
>
>-While no extreme multicollinearity exists, some correlation between location-related variables could affect estimates.
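
For a numeric predictor, the variance inflation factor is $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the R-squared from regressing predictor $j$ on the other predictors. For multi-level factors such as `city` and `room_type` (2 degrees of freedom each), `car::vif()` reports a generalized VIF together with $\mathrm{GVIF}^{1/(2\,\mathrm{Df})}$, whose square is comparable to an ordinary VIF; for `city` this gives $5.38^{1/4} \approx 1.52$ (about 2.32 when squared). The sketch below computes numeric-only VIFs by hand on synthetic data to show the mechanics; the helper `vif_numeric` is hypothetical and does not handle factors.

```r
# Hypothetical helper: VIF for numeric predictors via auxiliary regressions
vif_numeric <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on all remaining predictors
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Synthetic data (not Airbnb data): x2 is nearly a copy of x1, x3 is independent
set.seed(123)
n  <- 500
x1 <- rnorm(n)
demo <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),
  x3 = rnorm(n)
)

vifs <- vif_numeric(demo, c("x1", "x2", "x3"))
round(vifs, 2)
```

The near-duplicate pair gets very large VIFs while the independent predictor stays near 1; for a model with only numeric terms, `car::vif()` should agree with this calculation.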

## **Impact & Future Research**

### **Business Impact:**
Understanding these pricing factors can give property owners and hosts practical guidance for optimizing their listings:

-Guest capacity and bedroom count are the features most strongly associated with higher prices, so they are worth weighing when planning or configuring a listing.

-Listings near metro stations tend to command higher prices, likely due to better accessibility.

-Being a Superhost is associated with slightly higher prices, but it is not a major pricing factor.

### **Consumer Insights**
These results also provide valuable insights for consumers, helping them make more informed decisions when booking Airbnbs:

-Consumers can use these findings to assess whether a listing's price is reasonable given key factors such as location, room type, and size, and to avoid overpriced listings.

-They may adjust their booking strategies, such as selecting less central locations with good transit access, to get better deals while still maintaining convenience.

### **Future Research Directions**  
While this analysis provides valuable insights into Airbnb pricing, several areas warrant further investigation to refine and expand upon these findings.

1. **Enhancing Model Performance**  
   - The current model explains only about 24% of the variation in listing prices. Future studies could explore more advanced modeling techniques, such as non-linear regression methods, tree-based models, or ensemble approaches, to improve predictive accuracy.  
   - Incorporating additional features, such as review sentiment scores, booking frequency, or cancellation policies, may also enhance the model’s ability to capture key pricing drivers.

2. **Exploring Seasonal and Event-Based Price Variations**  
   - Seasonal trends, such as holiday periods and peak travel seasons, likely influence short-term rental prices. Analyzing how pricing fluctuates throughout the year could offer a deeper understanding of demand-driven adjustments.  
   - Major events, including conferences, festivals, and sporting events, may create short-term price surges. Examining the extent of these effects across different cities could provide valuable insights for both hosts and policymakers.

3. **Assessing Neighborhood-Level Demand Patterns**  
   - Price variations are often shaped by local factors, such as proximity to tourist attractions, business districts, or public transportation hubs. A more granular spatial analysis could help identify high-demand areas and their characteristics.  
   - Studying shifts in neighborhood popularity over time may also reveal emerging trends in the short-term rental market.

4. **Understanding the Impact of Policy and Economic Conditions**  
   - Government regulations on short-term rentals, such as caps on rental days or stricter licensing requirements, may have significant implications for pricing and market dynamics. Investigating policy interventions across different regions could shed light on their long-term effects.  
   - Broader economic factors, including inflation rates, tourism activity, and local cost of living, may also play a role in shaping Airbnb pricing strategies. Future research could examine how these external influences interact with host-driven pricing decisions.

5. **Comparing Consumer Preferences Across Markets**  
   - Pricing strategies are often influenced by guest expectations and booking behaviors. A deeper analysis of guest reviews, satisfaction ratings, and preferred amenities across cities may provide insights into how consumer preferences shape listing prices.  
   - Understanding why Paris consistently commands higher prices than Berlin or London could reveal cultural or market-driven differences that extend beyond property attributes alone.
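
For the first direction above (enhancing model performance), k-fold cross-validation comparing the current linear specification with a tree-based alternative is a natural starting point. The sketch below uses a regression tree from the `rpart` package (one of R's recommended packages) and synthetic data with a deliberately non-linear, threshold-shaped price pattern; the helper `cv_rmse`, the choice of 5 folds, and the data-generating process are all assumptions for illustration.

```r
library(rpart)  # regression trees

# Hypothetical helper: k-fold cross-validated RMSE for a model-fitting function
cv_rmse <- function(data, formula, fit_fun, k = 5, seed = 2024) {
  set.seed(seed)  # same seed gives identical folds for every model compared
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  fold_rmse <- sapply(seq_len(k), function(f) {
    train <- data[folds != f, ]
    test  <- data[folds == f, ]
    fit   <- fit_fun(formula, train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test[[response]] - pred)^2))
  })
  mean(fold_rmse)
}

# Synthetic data (not Airbnb data): a price jump within 3 km of the center,
# which a straight line cannot capture
set.seed(7)
demo <- data.frame(dist = runif(2000, 0, 15))
demo$price <- ifelse(demo$dist < 3, 300, 150) + rnorm(2000, sd = 20)

rmse_lm   <- cv_rmse(demo, price ~ dist, function(f, d) lm(f, data = d))
rmse_tree <- cv_rmse(demo, price ~ dist,
                     function(f, d) rpart(f, data = d, method = "anova"))
c(linear = rmse_lm, tree = rmse_tree)
```

Evaluating both models on the same folds keeps the comparison fair; with the real listings, the tree (or an ensemble built on it) would only be preferable if its cross-validated RMSE is clearly lower than the linear model's.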

## **References**
Guttentag, D. (2015). Airbnb: Disruptive Innovation and the Rise of an Informal Tourism Accommodation Sector. Current Issues in Tourism, 18(12), 1192-1217. https://www.researchgate.net/publication/271624904_Airbnb_Disruptive_Innovation_and_the_Rise_of_an_Informal_Tourism_Accommodation_Sector

Oskam, J., & Boswijk, A. (2016). Airbnb: The Future of Networked Hospitality Businesses. Journal of Tourism Futures, 2(1), 22-42. https://www.researchgate.net/publication/298305479_Airbnb_The_Future_of_Networked_Hospitality_Businesses

Teubner, T., Hawlitschek, F., & Dann, D. (2017). Price Determinants on Airbnb: How Reputation Pays Off in the Sharing Economy. Journal of Self-Governance and Management Economics, 5(4), 53-80. https://www.researchgate.net/publication/315838775_Price_Determinants_on_Airbnb_How_Reputation_Pays_Off_in_the_Sharing_Economy

Toader, V., & Negrușa, A. L. (2021). Analysis of Price Determinants in the Case of Airbnb Listings. Economic Research-Ekonomska Istraživanja, 34(1), 123-139. https://www.tandfonline.com/doi/full/10.1080/1331677X.2021.1962380

Alharbi, Z. H. (2023). A Sustainable Price Prediction Model for Airbnb Listings Using Machine Learning and Sentiment Analysis. Sustainability, 15(17), 13159. https://doi.org/10.3390/su151713159

Gyódi, K., & Nawaro, Ł. (2021). Determinants of Airbnb prices in European cities: A spatial econometrics approach (Supplementary Material) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4446043

Pittala, T. S. S. R., Meleti, U. M. R., & Vasireddy, H. (2024). Unveiling patterns in European Airbnb prices: A comprehensive analytical study using machine learning techniques. arXiv. https://arxiv.org/abs/2407.01555

TheDevastator. (n.d.). Airbnb prices in European cities. Kaggle. https://www.kaggle.com/datasets/thedevastator/airbnb-prices-in-european-cities/data

