This cell is for loading the data and importing the relevant libraries

Hypothesis: Impact of Review Scores on Price

Null Hypothesis (H0): Review scores (e.g., cleanliness, location, communication) do not significantly affect the listing price.

Alternative Hypothesis (H1): Higher review scores lead to higher listing prices.

EDA Approach: Calculate correlations between review scores and price. Fit a multiple linear regression model (e.g., price ~ cleanliness + location_score + communication_score).


In this experiment, we aim to investigate the relationship between review scores and listing prices in the context of a rental property marketplace. The overarching question we seek to address is whether higher review scores, encompassing factors such as cleanliness, location, and communication, correspond to higher listing prices. Our null hypothesis (H0) posits that review scores do not significantly affect listing prices, while the alternative hypothesis (H1) suggests that higher review scores lead to higher listing prices.

Exploratory Data Analysis (EDA):
To explore the relationship between review scores and listing prices, we employ various techniques aimed at understanding the data distribution and identifying correlations. These techniques include:

Calculating Correlations: We compute correlations between review scores (e.g., cleanliness, location, communication) and listing prices. This helps us understand the strength and direction of the relationship between these variables.

Fitting a Multiple Linear Regression Model: We fit a multiple linear regression model, where the listing price is the dependent variable and the review scores are the independent variables. This model allows us to quantify the association between each review score and the listing price while holding the other review scores in the model constant.

Calculating Spearman's Rank Correlation: Spearman's rank correlation coefficient measures the strength and direction of the monotonic relationship between two variables by comparing their ranks rather than their raw values. It is well suited to ordinal data, that is, data with a meaningful order where the numerical differences between values may not be meaningful. In our analysis, review scores (e.g., cleanliness, location) are treated as ordinal because they can be ranked from lowest to highest. Because it works on ranks, the method is less sensitive to outliers than Pearson's correlation and can capture non-linear relationships as long as they are monotonic.
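To make the idea of "correlating ranks" concrete, here is a minimal standard-library sketch. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy scores and prices are made up for illustration; the analysis itself relies on `scipy.stats.spearmanr`.

```python
# Minimal illustration of Spearman's rank correlation using only the
# standard library. All names and data here are hypothetical.

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: price rises with score, but not in a straight line
toy_scores = [3.0, 3.5, 4.0, 4.5, 5.0]
toy_prices = [40.0, 45.0, 60.0, 120.0, 900.0]

rho = spearman(toy_scores, toy_prices)
r = pearson(toy_scores, toy_prices)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

In this toy example the ordering of prices follows the ordering of scores exactly, so Spearman's rho is 1, while Pearson's r is pulled down by the single very expensive listing.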

Calculating Summary Statistics: We compute summary statistics, including mean, median, standard deviation, minimum, maximum, and quartiles, for both review scores and listing prices. These statistics provide insights into the central tendency and variability of the data.

Visualizations: We create various visualizations, including box plots, scatter plots, and histograms, to further explore the distribution and relationships within the data. Box plots help us visualize the distribution of review scores and their relationship with listing prices. Scatter plots provide insights into the linear relationship between review scores and listing prices, while histograms offer a visual representation of the distribution of review scores and listing prices individually.

In [44]:
# Import necessary libraries
import matplotlib                # Imported first so the backend can be set before pyplot loads

# Set backend for matplotlib to TkAgg for interactive plotting
# Note: This line may not be necessary if you're not using an interactive backend
matplotlib.use('TkAgg')

import pandas as pd              # For data manipulation
import numpy as np               # For numerical operations
import matplotlib.pyplot as plt  # For plotting graphs
import seaborn as sns            # For enhanced visualization
from sklearn.model_selection import train_test_split  # For splitting data into training and testing sets
from sklearn.linear_model import LinearRegression    # For linear regression modeling
from sklearn.metrics import mean_squared_error, r2_score  # For model evaluation metrics

# Specify the file path
file = '/Users/olisa/Downloads/clean_df.csv'  # Update file path as needed

# Read the CSV file into a DataFrame
data = pd.read_csv(file)


URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)>

In [5]:
# Checking whether the interactive backend has been set right
import matplotlib
print(matplotlib.get_backend())

TkAgg


The next step is cleaning the dataframe and removing the columns that are not necessary for the analysis.

In [6]:
print(data.columns)

Index(['id', 'listing_url', 'name', 'description', 'neighborhood_overview',
       'host_id', 'host_url', 'host_name', 'host_since', 'host_location',
       'host_about', 'host_response_time', 'host_response_rate',
       'host_acceptance_rate', 'host_is_superhost', 'host_picture_url',
       'host_listings_count', 'host_total_listings_count',
       'host_verifications', 'host_identity_verified',
       'neighbourhood_cleansed', 'latitude', 'longitude', 'property_type',
       'room_type', 'accommodates', 'bathrooms_text', 'bedrooms', 'beds',
       'amenities', 'price', 'minimum_nights', 'maximum_nights',
       'minimum_nights_avg_ntm', 'maximum_nights_avg_ntm', 'number_of_reviews',
       'number_of_reviews_ltm', 'first_review', 'last_review',
       'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'instant_bookable',
       'ca

In [7]:
# Creating a list of relevant columns for analysis
relevant_columns = ['price', 'review_scores_rating', 'review_scores_accuracy', 
                    'review_scores_cleanliness', 'review_scores_checkin', 
                    'review_scores_communication', 'review_scores_location',
                    'review_scores_value']

# Creating a subset of the original DataFrame with only the relevant columns
# (.copy() makes it an independent DataFrame, so later column assignments are safe)
subset_data = data[relevant_columns].copy()

# Display the first 5 rows of the new DataFrame
print(subset_data.head())


     price  review_scores_rating  review_scores_accuracy  \
0   $50.00                  4.90                    4.82   
1   $75.00                  4.79                    4.84   
2   $90.00                  4.32                    4.53   
3   $55.00                  4.84                    4.91   
4  $379.00                  4.74                    4.82   

   review_scores_cleanliness  review_scores_checkin  \
0                       4.89                   4.86   
1                       4.88                   4.87   
2                       4.03                   4.72   
3                       4.71                   4.93   
4                       4.69                   4.69   

   review_scores_communication  review_scores_location  review_scores_value  
0                         4.93                    4.75                 4.82  
1                         4.82                    4.93                 4.73  
2                         4.86                    4.72                 4.3

Now let's check for missing values, handle any we find, and convert datatypes if needed.

In [8]:
# Print the data types of columns in the subset DataFrame
print(subset_data.dtypes)

# Print the number of missing values in each column of the subset DataFrame
print(subset_data.isna().sum(axis=0))


price                           object
review_scores_rating           float64
review_scores_accuracy         float64
review_scores_cleanliness      float64
review_scores_checkin          float64
review_scores_communication    float64
review_scores_location         float64
review_scores_value            float64
dtype: object
price                              7
review_scores_rating           16785
review_scores_accuracy         17807
review_scores_cleanliness      17794
review_scores_checkin          17841
review_scores_communication    17808
review_scores_location         17839
review_scores_value            17842
dtype: int64


We have confirmed that there are missing values, so we will need to handle them. In addition, price is stored as text (object) rather than as a number, so let's convert it to a numeric type. After that we will proceed with some descriptive statistics.
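One detail worth keeping in mind when stripping the currency symbol: `$` is also a regular-expression anchor. The sketch below uses Python's built-in `re` module and a made-up price string to show the difference between treating `$` as a pattern and as a literal character. With pandas' `Series.str.replace`, the default of the `regex` parameter has changed across pandas versions, so passing `regex=False` explicitly is the safer choice.

```python
import re

raw_price = "$1,250.00"  # hypothetical value in the same format as the 'price' column

# As a regular expression, '$' matches the (empty) end of the string,
# so nothing visible is removed
as_regex = re.sub("$", "", raw_price)

# As a literal string, '$' is the dollar character itself
as_literal = raw_price.replace("$", "").replace(",", "")

print(repr(as_regex))     # the dollar sign is still there
print(float(as_literal))  # numeric value ready for analysis
```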

In [12]:
# As there are a significant amount of missing values for the review score we will have to 'drop' these rows from the dataframe entirely.

subset_data = subset_data.dropna()

In [14]:
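# Remove commas from the 'price' column
# (regex=False treats the pattern as a literal character, not a regular expression)
subset_data['price'] = subset_data['price'].str.replace(',', '', regex=False)

# Remove dollar sign ('$') from the 'price' column
# ('$' is a regex anchor, so regex=False makes the intent explicit)
subset_data['price'] = subset_data['price'].str.replace('$', '', regex=False)

# Convert the 'price' column to float datatype
subset_data['price'] = subset_data['price'].astype(float)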

# Print the 'price' column and its datatype after conversion
print(subset_data['price'])
print(subset_data['price'].dtypes)



0         50.0
1         75.0
2         90.0
3         55.0
4        379.0
         ...  
69346     55.0
69347    201.0
69348    246.0
69349    250.0
69350    134.0
Name: price, Length: 51505, dtype: float64
float64


In [26]:
# Checking that we have successfully cleaned the data

# Check if all data types are float64
if subset_data.dtypes.eq('float64').all():
    # Check if there are no missing values
    if subset_data.isna().sum().sum() == 0:
        print("Data successfully cleaned.")
    else:
        print("Error: Missing values detected after cleaning.")
else:
    print("Error: Data types not successfully converted to float64.")


price                          float64
review_scores_rating           float64
review_scores_accuracy         float64
review_scores_cleanliness      float64
review_scores_checkin          float64
review_scores_communication    float64
review_scores_location         float64
review_scores_value            float64
dtype: object
price                          0
review_scores_rating           0
review_scores_accuracy         0
review_scores_cleanliness      0
review_scores_checkin          0
review_scores_communication    0
review_scores_location         0
review_scores_value            0
dtype: int64


In [25]:
# Calculate summary statistics for the 'price' column

# Mean
mean_price = subset_data['price'].mean()

# Median
median_price = subset_data['price'].median()

# Standard Deviation
std_price = subset_data['price'].std()

# Variance
var_price = subset_data['price'].var()

# Minimum and Maximum
min_price = subset_data['price'].min()
max_price = subset_data['price'].max()

# Quartiles
q25_price = subset_data['price'].quantile(0.25)
q75_price = subset_data['price'].quantile(0.75)

# Store summary statistics in a list
summ_stats = [(f'mean = {mean_price}'), (f'median = {median_price}'), 
              (f'standard deviation = {std_price}'), (f'variance = {var_price}'), 
              (f'min = {min_price}'), (f'max = {max_price}'), 
              (f'q25 = {q25_price}'), (f'q75 = {q75_price}')]

# Print each summary statistic
for stat in summ_stats:
    print(stat)


mean = 157.2412330841666
median = 99.0
standard deviation = 296.7888059691308
variance = 88083.59534858238
min = 0.0
max = 23000.0
q25 = 56.0
q75 = 171.0


Mean: The mean listing price is $157.24, indicating the average price of listings in the dataset.

Median: The median listing price is $99.00. This suggests that half of the listings have prices below $99.00 and half have prices above $99.00.

Standard Deviation: The standard deviation of $296.79 indicates how far listing prices typically lie from the mean price of $157.24. It is almost twice the mean, so listing prices vary widely around the average.

Variance: The variance of 88083.60 (in squared dollars) is the square of the standard deviation and measures the overall spread of listing prices around the mean. Because it is expressed in squared units, the standard deviation is usually easier to interpret.

Minimum: The minimum listing price is $0.00, which suggests the presence of listings with no recorded cost, possibly data-entry errors or outliers.

Maximum: The maximum listing price is $23000.00, indicating the highest price in the dataset.

25th Percentile (Q1): The 25th percentile is $56.00. This means that 25% of listings have prices below $56.00.

75th Percentile (Q3): The 75th percentile is $171.00. This means that 75% of listings have prices below $171.00.
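The quartiles above can be turned into a quick outlier check. The sketch below applies the common 1.5 × IQR rule of thumb (the same rule box plots typically use for their whiskers) to the reported values. The variable names are illustrative and the numbers are copied from the summary statistics.

```python
# Quartiles and maximum as reported in the summary statistics above
q25_price = 56.0
q75_price = 171.0
max_price = 23000.0

iqr = q75_price - q25_price
# Rule of thumb: points beyond 1.5 * IQR from the quartiles are flagged as outliers
upper_fence = q75_price + 1.5 * iqr
lower_fence = q25_price - 1.5 * iqr

print(f"IQR = {iqr}, upper fence = {upper_fence}, lower fence = {lower_fence}")
print(f"Maximum price is about {max_price / upper_fence:.0f}x the upper fence")
```

Under this rule any price above $343.50 counts as an outlier, which puts the maximum of $23000.00 far outside the typical range. The lower fence is negative, so the rule flags no low prices, including the $0.00 minimum.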

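These statistics on their own are not very informative, although a mean well above the median already hints at a right-skewed price distribution, so let's explore the data further.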

# Now we are going to start with data visualisation

# Boxplot - A boxplot provides a concise summary of the distribution of the data, including the median, quartiles, and potential outliers. Here is a boxplot for the 'price' variable:

In [35]:
# Create a boxplot to visualize the distribution of prices

# Set the figure size
plt.figure(figsize=(8, 6))

# Plot the boxplot
sns.boxplot(x='price', data=subset_data, color=sns.color_palette('pastel')[0])

# Add title and labels
plt.title('Boxplot of Price')
plt.xlabel('Price')

# Display the plot
plt.show()





Scatter plots visualize the relationship between each review score and price

In [42]:
# Create scatter plots to visualize the relationship between each review score and the listing price

# Creating a list of review score columns
review_scores = ['review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness', 
                 'review_scores_checkin', 'review_scores_communication', 'review_scores_location',
                 'review_scores_value']

# Iterate over each review score and create a scatter plot
for score in review_scores:
    # Create a new figure for every plot so the figure size applies to all of them
    plt.figure(figsize=(10, 6))
    # No palette argument: it only has an effect together with hue
    sns.scatterplot(x=score, y='price', data=subset_data)
    plt.xlabel(f'Review Score ({score.replace("review_scores_", "")})')
    plt.ylabel('Listing Price')
    plt.title(f'Listing Price vs. {score}')
    plt.tight_layout()
    plt.show()




Histograms show us the frequency distribution of the given variable

In [43]:
# Create histograms to visualize the distribution of each review score

# Creating a list of review score columns
review_scores = ['review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness', 
                 'review_scores_checkin', 'review_scores_communication', 'review_scores_location',
                 'review_scores_value']

# Iterate over each review score and create a histogram
for score in review_scores:
    # Create a new figure for every plot so the figure size applies to all of them
    plt.figure(figsize=(10, 6))
    # No palette argument: it only has an effect together with hue
    sns.histplot(x=score, data=subset_data, bins=10)
    plt.xlabel(f'Review Score ({score.replace("review_scores_", "")})')
    plt.ylabel('Count')
    plt.title(f'Distribution of {score}')
    plt.tight_layout()
    plt.show()




At first glance, the scatter plots seem to suggest that higher review scores are associated with higher listing prices, which would support the hypothesis that better-reviewed properties command higher prices. However, most scores cluster between 4 and 5 and a few very expensive listings dominate the price axis, so this visual impression is unreliable and is tested with Spearman's rank correlation in the next step.

One valuable insight from the histograms is the distribution of review scores among the properties. The predominance of ratings between 4 and 5 indicates that the majority of properties receive high ratings from guests. This could suggest that the properties in the dataset generally provide satisfactory experiences for guests, which could positively influence their pricing and overall desirability. It also means that review scores vary little between listings, which limits how much of the variation in price they can explain.

Analysis of summary statistics

Correlation analysis - using Spearman's rank
Spearman's rank correlation is a way to see whether there is a monotonic pattern between two variables (in this case, price and a review score).
The stronger the correlation (closer to 1 or -1), the more closely review scores and prices are connected.


In [20]:
# Import necessary library for Spearman's rank correlation
from scipy.stats import spearmanr

# Dictionary to store correlation coefficients and p-values for each review score
correlation_results = {}

# Iterating over each review score variable
for score in review_scores:
    # Calculating Spearman's rank correlation coefficient between the review score and prices
    spearman_corr, p_value = spearmanr(subset_data[score], subset_data['price'])
    
    # Storing correlation coefficient and p-value in the dictionary
    correlation_results[score] = {'correlation_coefficient': spearman_corr, 'p_value': p_value}

# Printing the results
for score, result in correlation_results.items():
    print(f"Spearman's Rank Correlation for {score}:")
    print("Correlation Coefficient:", result['correlation_coefficient'])
    print("P-value:", result['p_value'])
    print()


Spearman's Rank Correlation for review_scores_rating:
Correlation Coefficient: -0.015812341544848587
P-value: 0.00033232744870826764

Spearman's Rank Correlation for review_scores_accuracy:
Correlation Coefficient: -0.044544580908478904
P-value: 4.786891612422521e-24

Spearman's Rank Correlation for review_scores_cleanliness:
Correlation Coefficient: 0.018445622110721985
P-value: 2.8336935429981428e-05

Spearman's Rank Correlation for review_scores_checkin:
Correlation Coefficient: -0.0626855392342194
P-value: 5.1754473558540985e-46

Spearman's Rank Correlation for review_scores_communication:
Correlation Coefficient: -0.06263411655087699
P-value: 6.118655276762388e-46

Spearman's Rank Correlation for review_scores_location:
Correlation Coefficient: 0.14657953741885937
P-value: 2.963534779491126e-245

Spearman's Rank Correlation for review_scores_value:
Correlation Coefficient: -0.16359875053693257
P-value: 8.42714947732203e-306



In [21]:
# Visualize Spearman's rank correlation coefficients using a heatmap

# Extract correlation coefficients from the dictionary
correlation_coefficients = [[correlation_results[score]['correlation_coefficient'] for score in review_scores]]

# Set the figure size
plt.figure(figsize=(8, 6))

# Create a heatmap
sns.heatmap(correlation_coefficients, annot=True, cmap='coolwarm', xticklabels=review_scores, yticklabels=['Price'], fmt=".2f")

# Add title and labels
plt.title("Spearman's Rank Correlation Coefficients between Review Scores and Price")
plt.xlabel('Review Scores')
plt.ylabel('Price')

# Display the heatmap
plt.show()


Correlation Coefficient: This value indicates the strength and direction of the relationship between review scores and listing prices. A correlation coefficient closer to 1 or -1 suggests a strong positive or negative correlation, respectively, while a value closer to 0 indicates a weaker correlation. For example, a correlation coefficient of -0.016 for review_scores_rating suggests a very weak negative correlation between rating scores and listing prices.

P-value: This is the probability of observing a correlation at least as strong as the one measured if the null hypothesis were true (i.e., no correlation). A small p-value (typically below 0.05) indicates that the observed correlation is unlikely to be due to random chance alone. For instance, the p-value of 0.00033 for review_scores_rating indicates a statistically significant correlation between rating scores and listing prices. Statistical significance says nothing about the size of the effect: with tens of thousands of listings, even a very small correlation can be significant.

In the summary below, coefficients with an absolute value under 0.1 are described as very weak and those between 0.1 and 0.3 as weak.

Review Scores Rating: There is a very weak negative correlation (-0.016) between review scores rating and listing prices, but it is statistically significant.

Review Scores Accuracy: There is a very weak negative correlation (-0.045) between accuracy scores and listing prices, and it is statistically significant.

Review Scores Cleanliness: There is a very weak positive correlation (0.018) between cleanliness scores and listing prices, and it is statistically significant.

Review Scores Check-in: There is a very weak negative correlation (-0.063) between check-in scores and listing prices, and it is statistically significant.

Review Scores Communication: There is a very weak negative correlation (-0.063) between communication scores and listing prices, and it is statistically significant.

Review Scores Location: There is a weak positive correlation (0.147) between location scores and listing prices, and it is statistically significant. This is the strongest positive association among the review scores.

Review Scores Value: There is a weak negative correlation (-0.164) between value scores and listing prices, and it is statistically significant. This is the strongest association overall.
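To see why a coefficient as small as -0.016 can still be statistically significant, the sketch below reworks the test for review_scores_rating by hand. It uses the reported coefficient and the 51505 rows left after dropping missing values, and it approximates the t distribution with a normal distribution, an assumption that is reasonable only because the sample is so large. This is an illustration, not a replacement for `scipy.stats.spearmanr`.

```python
import math

rho = -0.015812341544848587  # reported for review_scores_rating
n = 51505                    # rows left after dropna(), per the price output

# Test statistic for a correlation coefficient under H0: no correlation
t_stat = rho * math.sqrt((n - 2) / (1 - rho ** 2))

# Two-sided p-value, using the normal approximation to the t distribution
# (an assumption that is reasonable only because n is very large)
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

# Squared coefficient: a rough measure of how much rank variance is shared
rho_squared = rho ** 2

print(f"t = {t_stat:.2f}, approximate p = {p_approx:.5f}, rho^2 = {rho_squared:.5f}")
```

The approximate p-value lands close to the reported 0.00033, yet the squared coefficient shows that the two variables share only a tiny fraction of their rank variance. The large sample, not a strong relationship, is what drives the significance.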

Based on the Spearman's rank correlation results, we omit certain features. This fine-tunes the linear regression model and brings other benefits, such as improved interpretability and a simpler model:

Review Scores Rating: This feature has a very weak negative correlation with listing prices (-0.016). While statistically significant (p = 0.00033), the coefficient is close to zero, suggesting a negligible effect on listing prices.

Review Scores Accuracy: Similar to the rating scores, accuracy scores exhibit a very weak negative correlation with listing prices (-0.045). The correlation is statistically significant (p = 4.79e-24), but the small coefficient indicates a minor association with prices.

Review Scores Cleanliness: Cleanliness scores show a very weak positive correlation with listing prices (0.018). The correlation is statistically significant (p = 2.83e-05), but given how small it is, this feature is unlikely to contribute much to predicting prices.

Review Scores Communication: Communication scores show a very weak negative correlation with listing prices (-0.063). The correlation is statistically significant (p = 6.12e-46), but the coefficient suggests only a modest association with prices compared with location and value.
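The selection above can be expressed as a simple ranking by the absolute size of the coefficient. The sketch below uses the reported coefficients rounded to four decimals. The "keep the three strongest" rule is an assumption made for illustration, chosen because it reproduces the three features used in the model that follows.

```python
# Spearman coefficients as reported above, rounded to four decimals
spearman_rho = {
    'review_scores_rating': -0.0158,
    'review_scores_accuracy': -0.0445,
    'review_scores_cleanliness': 0.0184,
    'review_scores_checkin': -0.0627,
    'review_scores_communication': -0.0626,
    'review_scores_location': 0.1466,
    'review_scores_value': -0.1636,
}

# Rank features by the absolute size of the coefficient, largest first
ranked = sorted(spearman_rho, key=lambda name: abs(spearman_rho[name]), reverse=True)

# Hypothetical selection rule: keep the three strongest features
top_k = 3
selected_features = ranked[:top_k]

for name in ranked:
    marker = 'keep' if name in selected_features else 'drop'
    print(f"{name:<30} {spearman_rho[name]:+.4f}  {marker}")
```

Note that check-in and communication are almost tied, so the choice between them is sensitive to very small differences in the coefficients.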


In [29]:
# Prepare features (X) and target variable (y)
features = ['review_scores_checkin', 'review_scores_location', 'review_scores_value']
X = subset_data[features]
y = subset_data['price']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')

# Interpretation (coefficients)
coefficients = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_})
print(coefficients)

# Set Seaborn color palette to pastel
sns.set_palette("pastel")

# Visualization (actual vs. predicted)
sns.scatterplot(x=y_test, y=y_pred, alpha=0.5)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='--')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Regression Model: Actual vs. Predicted Price')
plt.show()


Mean Squared Error: 92082.93
R-squared: 0.01
                  Feature  Coefficient
0   review_scores_checkin   -52.506745
1  review_scores_location    95.160796
2     review_scores_value   -61.526620


The linear regression model aimed to predict listing prices from three review scores: check-in, location, and value. Its performance was poor: the mean squared error (MSE) was 92082.93 and the R-squared value was 0.01, meaning the model explains roughly 1% of the variance in price on the test set. The coefficients themselves are not small (a one-point increase in location score corresponds to about $95 more in predicted price), but most review scores fall between 4 and 5, so they account for very little of the variation in prices. Let's try to refine the model.
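As a worked reading of the coefficients, the sketch below computes how the predicted price changes when review scores change. Only differences are shown because the model's intercept was not printed. The helper `predicted_price_change` and the score changes are hypothetical.

```python
# Coefficients as printed by the model above (price units per one-point score change)
coefficients = {
    'review_scores_checkin': -52.506745,
    'review_scores_location': 95.160796,
    'review_scores_value': -61.526620,
}

def predicted_price_change(score_changes):
    """Change in predicted price for the given changes in review scores."""
    return sum(coefficients[name] * delta for name, delta in score_changes.items())

# Hypothetical comparison: two listings that differ only in location score
change_location = predicted_price_change({'review_scores_location': 0.5})

# Hypothetical comparison: location up by 0.5 but value down by 0.5
change_mixed = predicted_price_change({'review_scores_location': 0.5,
                                       'review_scores_value': -0.5})

print(f"Location +0.5: {change_location:+.2f}")
print(f"Location +0.5, value -0.5: {change_mixed:+.2f}")
```

These are differences in the model's predictions, not causal effects, and with an R-squared of 0.01 they say little about any individual listing.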

By minimizing the root mean squared error (RMSE), we aim to improve the accuracy of our model's predictions of listing prices. A lower RMSE means the predictions are, on average, closer to the actual listing prices, which makes the model more reliable.
Ordinary least squares, which LinearRegression uses, already chooses the coefficients that minimize squared error (and hence RMSE) on the training data. The refinement we try here is therefore to change the inputs: we refit the model with the location score only and compare the RMSE on the test set.
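For reference, the RMSE of the first model follows directly from its reported MSE. The sketch below also compares it with the standard deviation of price from the summary statistics. That figure was computed on the full cleaned dataset rather than the test set, so the comparison is only indicative.

```python
import math

mse = 92082.93       # mean squared error reported for the test set
std_price = 296.79   # standard deviation of price from the summary statistics

rmse = math.sqrt(mse)
print(f"RMSE = {rmse:.2f}")
print(f"RMSE / standard deviation of price = {rmse / std_price:.2f}")
```

An RMSE of roughly the same size as the standard deviation of price means the model predicts little better than always guessing the average price, which agrees with the R-squared of 0.01.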

In [34]:
# LinearRegression already finds the coefficients that minimise squared error
# (and therefore RMSE) on the training data, so no separate optimiser is needed.
# The refinement here is to refit the model with the location score only.

# Keep only the location score as a feature
X_train_subset = X_train.drop(columns=['review_scores_checkin', 'review_scores_value'])
X_test_subset = X_test.drop(columns=['review_scores_checkin', 'review_scores_value'])

# Fit linear regression model on the reduced feature set
model = LinearRegression()
model.fit(X_train_subset, y_train)

# Make predictions on test set
y_pred = model.predict(X_test_subset)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'Root Mean Squared Error: {rmse:.2f}')
print(f'R-squared: {r2:.2f}')

# Interpretation (coefficients)
coefficients = pd.DataFrame({'Feature': X_train_subset.columns, 'Coefficient': model.coef_})
print(coefficients)

# Visualization (actual vs. predicted)
sns.scatterplot(x=y_test, y=y_pred, alpha=0.5)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='--')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Regression Model: Actual vs. Predicted Price')
plt.show()


In the context of this project, where we aimed to analyze the impact of review scores on listing prices, the scatter plot showing the disparity between actual and predicted prices highlights a crucial issue. It suggests that our linear regression models, built on review scores alone, fail to predict listing prices accurately. This weakens the case for the alternative hypothesis that higher review scores lead to higher prices.

In conclusion, our analysis focused on investigating the impact of review scores on listing prices in the context of our project. We began by formulating hypotheses, with the null hypothesis stating that review scores do not significantly affect listing prices, and the alternative hypothesis proposing that higher review scores lead to higher prices. Through exploratory data analysis (EDA), we employed various techniques such as correlation analysis, linear regression modeling, and visualization to examine the relationship between review scores and prices.

Our EDA revealed several key findings. Firstly, Spearman's rank correlation analysis indicated statistically significant but weak correlations between review scores and listing prices. The largest were for location (0.15) and value (-0.16), and the rest were below 0.07 in absolute value. With more than 51,000 listings, even such small coefficients reach statistical significance, so significance here does not imply a practically meaningful effect. Secondly, the linear regression model yielded a high mean squared error (MSE) and an R-squared value of 0.01, indicating poor model fit and predictive performance. Despite an attempt to refine the model with the aim of lowering the root mean squared error (RMSE), the model's predictions remained inconsistent with the actual prices.

Furthermore, visualizations such as box plots, scatter plots, and histograms provided additional insights into the distribution and relationship of review scores and prices. While the scatter plots hinted at a positive relationship between review scores and prices, the correlation analysis showed it to be weak, and the box plots and histograms offered limited value because price is heavily skewed and review scores show little variability.

Overall, our analysis found only weak evidence that review scores influence listing prices, and the predictive capabilities of our models were limited. The associations are too small to support the alternative hypothesis in any practical sense, and some of them (value, check-in, communication) run in the opposite direction to the one it predicts. In practical terms, our findings are consistent with the null hypothesis: review scores do not appear to have a meaningful impact on listing prices. Future research could explore alternative modeling techniques, feature engineering, or additional variables to improve the accuracy and robustness of price prediction models in the vacation rental industry. Additionally, incorporating factors such as property amenities, location characteristics, and the text of guest reviews could offer a more comprehensive understanding of pricing dynamics in the market.