# IS 4487 Assignment 11: Predicting Airbnb Prices with Regression

In this assignment, you will:
- Load the Airbnb dataset you cleaned and transformed in Assignment 7
- Build a linear regression model to predict listing price
- Interpret which features most affect price
- Try to improve your model using only the most impactful predictors
- Practice explaining your findings to a business audience like a host, pricing strategist, or city partner

## Why This Matters

Pricing is one of the most important levers for hosts and Airbnb's business teams. Understanding what drives price, and being able to predict it accurately, helps improve search results, revenue management, and guest satisfaction.

This assignment gives you hands-on practice turning a cleaned dataset into a predictive model. You'll focus not just on code, but on what the results mean and how you'd communicate them to stakeholders.

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Assignments/assignment_11_regression.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>



## Original Source: Dataset Description

The dataset you'll be using is a **detailed Airbnb listing file**, available from [Inside Airbnb](https://insideairbnb.com/get-the-data/).

Each row represents one property listing. The columns include:

- **Host attributes** (e.g., host ID, host name, host response time)
- **Listing details** (e.g., price, room type, minimum nights, availability)
- **Location data** (e.g., neighborhood, latitude/longitude)
- **Property characteristics** (e.g., number of bedrooms, amenities, accommodates)
- **Calendar/booking variables** (e.g., last review date, number of reviews)

The schema is consistent across cities, so you can expect similar columns regardless of the location you choose.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score


## 1. Load Your Transformed Airbnb Dataset

**Business framing:**  
Before building any models, we must start with clean, prepared data. In Assignment 7, you exported a cleaned version of your Airbnb dataset. You'll now import that file for analysis.

### Do the following:
- Import your CSV file called `cleaned_airbnb_data_7.csv`. (Note: If you had significant errors with Assignment 7, you can use the file named `airbnb_listings.csv` in the DataSets folder on GitHub as a backup starting point.)
- Use `pandas` to load and preview the dataset

### In Your Response:
1. What does the dataset include?
2. How many rows and columns are present?
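
A minimal sketch of the loading step, assuming the primary and backup file names mentioned above. The helper `load_first_available` is hypothetical (it is not part of pandas) and the tiny stand-in file exists only so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_first_available(candidates):
    """Return (DataFrame, path) for the first candidate CSV that exists."""
    for name in candidates:
        path = Path(name)
        if path.exists():
            return pd.read_csv(path), path
    raise FileNotFoundError(f"None of these files were found: {candidates}")


with tempfile.TemporaryDirectory() as tmp:
    # Only the backup file is created, to show the fallback being used
    primary = Path(tmp) / "cleaned_airbnb_data_7.csv"
    backup = Path(tmp) / "airbnb_listings.csv"
    pd.DataFrame({"price": [100.0, 150.0], "bedrooms": [1, 2]}).to_csv(backup, index=False)

    demo_df, used = load_first_available([primary, backup])

n_rows, n_cols = demo_df.shape  # shape is always (rows, columns)
print(f"Loaded {used.name}: {n_rows} rows, {n_cols} columns")
```

Reading `df.shape` as `(rows, columns)` answers the second question directly; `df.head()` only previews the first five rows and says nothing about the total row count.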


In [2]:
import pandas as pd

# Load the CSV file
df = pd.read_csv('cleaned_airbnb_data.csv')

# Display the first few rows of the dataset
print("First 5 rows of the dataset:")
display(df.head())

# Display general information about the dataset
print("\nDataset Information:")
df.info()  # info() prints its own report and returns None, so it is not wrapped in display()

# Display the number of rows and columns
print("\nDataset Shape (rows, columns):")
display(df.shape)

First 5 rows of the dataset:


Unnamed: 0,id,listing_url,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,...,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,18674,https://www.airbnb.com/rooms/18674,2025-06-21,city scrape,Huge flat for 8 people close to Sagrada Familia,110m2 apartment to rent in Barcelona. Located ...,Apartment in Barcelona located in the heart of...,https://a0.muscache.com/pictures/13031453/413c...,71615,https://www.airbnb.com/users/show/71615,...,4.62,4.6,4.81,4.28,t,28,28,0,0,0.33
1,23197,https://www.airbnb.com/rooms/23197,2025-06-23,city scrape,"Forum CCIB DeLuxe, Spacious, Large Balcony, relax",Beautiful and Spacious Apartment with Large Te...,"Strategically located in the Parc del Fòrum, a...",https://a0.muscache.com/pictures/miso/Hosting-...,90417,https://www.airbnb.com/users/show/90417,...,4.94,4.99,4.65,4.68,f,1,1,0,0,0.51
2,32711,https://www.airbnb.com/rooms/32711,2025-06-22,city scrape,Sagrada Familia area - Còrsega 1,A lovely two bedroom apartment only 250 m from...,What's nearby <br />This apartment is located...,https://a0.muscache.com/pictures/357b25e4-f414...,135703,https://www.airbnb.com/users/show/135703,...,4.88,4.89,4.89,4.47,f,3,3,0,0,0.87
3,34241,https://www.airbnb.com/rooms/34241,2025-06-22,city scrape,Stylish Top Floor Apartment - Ramblas Plaza Real,Located in close proximity to Plaza Real and L...,,https://a0.muscache.com/pictures/2437facc-2fe7...,73163,https://www.airbnb.com/users/show/73163,...,4.68,4.68,4.73,4.23,f,3,3,0,0,0.14
4,347824,https://www.airbnb.com/rooms/347824,2025-06-22,city scrape,"Ideal Happy Location Barceloneta Beach, Old Town!",Please send us a message to confirm availabili...,,https://a0.muscache.com/pictures/miso/Hosting-...,1447144,https://www.airbnb.com/users/show/1447144,...,2.67,3.67,5.0,4.0,f,355,355,0,0,0.02



Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18927 entries, 0 to 18926
Data columns (total 76 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            18927 non-null  int64  
 1   listing_url                                   18927 non-null  object 
 2   last_scraped                                  18927 non-null  object 
 3   source                                        18927 non-null  object 
 4   name                                          18927 non-null  object 
 5   description                                   18189 non-null  object 
 6   neighborhood_overview                         9154 non-null   object 
 7   picture_url                                   18927 non-null  object 
 8   host_id                                       18927 non-null  int64  
 9   host_url                               

None


Dataset Shape (rows, columns):


(18927, 76)

### ✍️ Your Response: 🔧
1. The dataset includes details about Airbnb listings, such as host information, listing identifiers, location data, pricing and availability, reviews, and property characteristics.

2. There are 18,927 rows and 76 columns.

## 2. Drop Columns Not Useful for Modeling

**Business framing:**  
Some columns, like post IDs or text, may not help us predict price and could add noise or bias.

### Do the following:
- Drop columns like `post_id`, `title`, `descr`, `details`, and `address` if they're still in your dataset

### In Your Response:
1. What columns did you drop, and why?
2. What risks might occur if you included them in your model?
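
One way to spot candidates for dropping, sketched under assumptions: text columns whose values are almost all distinct (URLs, names, free text) behave like identifiers and carry little signal for a linear model. The helper `flag_identifier_like` and its `unique_ratio` threshold are hypothetical, and the toy data is illustrative only.

```python
import pandas as pd


def flag_identifier_like(df, unique_ratio=0.9):
    """Return non-numeric columns whose values are nearly all distinct."""
    flagged = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # numeric columns are handled separately
        ratio = df[col].nunique(dropna=True) / max(len(df), 1)
        if ratio >= unique_ratio:
            flagged.append(col)
    return flagged


toy = pd.DataFrame({
    "listing_url": ["u1", "u2", "u3", "u4"],
    "name": ["Loft A", "Flat B", "Room C", "Flat D"],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Entire home/apt"],
    "price": [120.0, 60.0, 55.0, 140.0],
})

to_drop = flag_identifier_like(toy)
trimmed = toy.drop(columns=to_drop)
print("Flagged:", to_drop)
print("Remaining:", trimmed.columns.tolist())
```

A check like this is only a starting point; a flagged column should still be reviewed by hand before it is dropped.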


In [3]:
# List of columns to drop that are identifiers, URLs, or free-form text
columns_to_drop = [
    'id', 'listing_url', 'last_scraped', 'name', 'description',
    'neighborhood_overview', 'picture_url', 'host_id', 'host_url',
    'host_name', 'host_since', 'host_location', 'host_about',
    'host_thumbnail_url', 'host_picture_url', 'host_verifications',
    'neighbourhood', 'calendar_last_scraped', 'first_review', 'last_review',
    'bathrooms_text'
]

# Drop the columns if they exist in the DataFrame (errors='ignore' skips any that are missing)
df = df.drop(columns=columns_to_drop, errors='ignore')

print("Columns dropped successfully. Displaying the first few rows of the updated DataFrame:")
display(df.head())
print(f"\nNew DataFrame shape: {df.shape}")

Columns dropped successfully. Displaying the first few rows of the updated DataFrame:


Unnamed: 0,source,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_neighbourhood,host_listings_count,host_total_listings_count,host_has_profile_pic,host_identity_verified,...,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,city scrape,within an hour,96%,91%,f,la Sagrada Família,44.0,46.0,t,t,...,4.62,4.6,4.81,4.28,t,28,28,0,0,0.33
1,city scrape,within an hour,100%,96%,t,El Besòs i el Maresme,6.0,9.0,t,t,...,4.94,4.99,4.65,4.68,f,1,1,0,0,0.51
2,city scrape,within an hour,100%,100%,f,Camp d'en Grassot i Gràcia Nova,3.0,15.0,t,t,...,4.88,4.89,4.89,4.47,f,3,3,0,0,0.87
3,city scrape,within an hour,80%,94%,f,El Gòtic,5.0,5.0,t,t,...,4.68,4.68,4.73,4.23,f,3,3,0,0,0.14
4,city scrape,within an hour,87%,28%,f,El Poble-sec,356.0,565.0,t,t,...,2.67,3.67,5.0,4.0,f,355,355,0,0,0.02



New DataFrame shape: (18927, 55)


### ✍️ Your Response: 🔧
1. I dropped identifier and URL columns (`id`, `listing_url`, `picture_url`, `host_id`, `host_url`, `host_name`, `host_thumbnail_url`, `host_picture_url`), free-text columns (`name`, `description`, `neighborhood_overview`, `host_about`), and scrape and review date columns, because these variables do not contain predictive information for a regression model.

2. A model with many irrelevant features becomes harder to read, which makes it difficult to understand which factors truly influence price.

## 3. Explore Relationships Between Numeric Features

**Business framing:**  
Understanding how features relate to each other, and to the target, helps guide feature selection and modeling.

### Do the following:
- Generate a correlation matrix
- Identify which variables are strongly related to `price`

### In Your Response:
1. Which variables had the strongest positive or negative correlation with price?
2. Which variables might be useful predictors?
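
A small sketch of ranking correlations by absolute value, so that strong negative relationships are not buried at the bottom of a descending sort. The toy values are made up for illustration and are not taken from the Airbnb file.

```python
import pandas as pd

toy = pd.DataFrame({
    "price":          [50, 80, 120, 150, 200],
    "accommodates":   [1, 2, 4, 5, 8],
    "minimum_nights": [30, 7, 3, 2, 1],
    "room_type":      ["Private", "Private", "Entire", "Entire", "Entire"],
})

# numeric_only=True skips the text column; drop the target's correlation with itself
corr_with_price = toy.corr(numeric_only=True)["price"].drop("price")

# Rank by strength regardless of sign, keeping the signed values for reading
ranked = corr_with_price.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Keeping the sign in the output matters for interpretation: a strongly negative value is as informative as a strongly positive one.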


In [4]:
# Calculate the correlation matrix
correlation_matrix = df.corr(numeric_only=True)

# Get correlations with 'price'
price_correlations = correlation_matrix['price'].sort_values(ascending=False)

print("Correlation with 'price':")
display(price_correlations)

Correlation with 'price':


Unnamed: 0,price
price,1.0
accommodates,0.468353
bedrooms,0.455812
bathrooms,0.412265
beds,0.348858
estimated_revenue_l365d,0.30518
calculated_host_listings_count_entire_homes,0.182985
availability_30,0.156078
host_total_listings_count,0.154843
availability_eoy,0.139778


### ✍️ Your Response: 🔧
1. `accommodates` has the highest correlation with price (about 0.47); the lowest is `minimum_minimum_nights`.

2. `accommodates`, `bedrooms`, `bathrooms`, and `beds` look like the most useful predictors.

## 4. Define Features and Target Variable

**Business framing:**  
To build a regression model, you need to define what you're predicting (target) and what you're using to make that prediction (features).

### Do the following:
- Set `price` as your target variable
- Remove `price` from your predictors

### In Your Response:
1. What features are you using?
2. Why is this a regression problem and not a classification problem?
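
A compact sketch of separating target and features on toy data, assuming median imputation for numeric gaps and one-hot encoding for categories. The column names mirror the Airbnb schema, but the values are invented.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "price": [100.0, 60.0, np.nan, 140.0],
    "bedrooms": [2.0, np.nan, 1.0, 3.0],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Shared room"],
})

rows = toy.dropna(subset=["price"])  # the target must be known for every row
y_demo = rows["price"]

# Numeric features: fill gaps with the column median
X_num = rows[["bedrooms"]].fillna(rows[["bedrooms"]].median())

# Categorical features: one column per level, minus the first level as the baseline
X_cat = pd.get_dummies(rows[["room_type"]], drop_first=True)

X_demo = pd.concat([X_num, X_cat], axis=1)
print(X_demo)
```

Because `price` never enters `X_demo`, the model cannot simply read the answer off its inputs.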


In [5]:
# Drop rows where 'price' is NaN, as it's our target variable
df_cleaned = df.dropna(subset=['price']).copy()

# --- Feature Engineering & Cleaning ---

# Convert host_response_rate and host_acceptance_rate from strings like '96%' to fractions
# Missing values stay NaN through this conversion and are median-imputed further down
df_cleaned['host_response_rate'] = df_cleaned['host_response_rate'].str.replace('%', '', regex=False).astype(float) / 100
df_cleaned['host_acceptance_rate'] = df_cleaned['host_acceptance_rate'].str.replace('%', '', regex=False).astype(float) / 100

# Identify numerical features (excluding 'price' and other non-numeric/identifier columns)
numerical_features = [
    'host_response_rate', 'host_acceptance_rate',
    'host_total_listings_count', 'accommodates', 'bathrooms', 'bedrooms', 'beds',
    'minimum_nights', 'maximum_nights', 'minimum_minimum_nights', 'maximum_minimum_nights',
    'minimum_maximum_nights', 'maximum_maximum_nights', 'minimum_nights_avg_ntm',
    'maximum_nights_avg_ntm', 'availability_30', 'availability_60', 'availability_90',
    'availability_365', 'number_of_reviews', 'number_of_reviews_ltm', 'number_of_reviews_l30d',
    'availability_eoy', 'number_of_reviews_ly', 'estimated_occupancy_l365d',
    'estimated_revenue_l365d', 'review_scores_rating', 'review_scores_accuracy',
    'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication',
    'review_scores_location', 'review_scores_value', 'calculated_host_listings_count',
    'calculated_host_listings_count_entire_homes', 'calculated_host_listings_count_private_rooms',
    'calculated_host_listings_count_shared_rooms', 'reviews_per_month',
    'latitude', 'longitude' # Including lat/long as numerical features
]

# Identify categorical features to one-hot encode
categorical_features = [
    'source', 'host_response_time', 'host_is_superhost', 'property_type', 'room_type',
    'host_has_profile_pic', 'host_identity_verified', 'neighbourhood_cleansed',
    'neighbourhood_group_cleansed', 'has_availability', 'instant_bookable'
]

# Select numerical columns and impute missing values with the median
# Using df_cleaned to avoid modifying the original df until necessary
X_numerical = df_cleaned[numerical_features].fillna(df_cleaned[numerical_features].median())

# Select categorical columns and apply one-hot encoding
# drop_first=True drops one level per feature so the dummy columns are not perfectly collinear
X_categorical = pd.get_dummies(df_cleaned[categorical_features], drop_first=True)

# Combine numerical and one-hot encoded categorical features
X = pd.concat([X_numerical, X_categorical], axis=1)

# Define target variable
y = df_cleaned['price']

print(f"Shape of features (X): {X.shape}")
print(f"Shape of target (y): {y.shape}")

print("\nFirst 5 rows of features (X):")
display(X.head())
print("\nFirst 5 rows of target (y):")
display(y.head())

Shape of features (X): (14913, 177)
Shape of target (y): (14913,)

First 5 rows of features (X):


Unnamed: 0,host_response_rate,host_acceptance_rate,host_total_listings_count,accommodates,bathrooms,bedrooms,beds,minimum_nights,maximum_nights,minimum_minimum_nights,...,neighbourhood_group_cleansed_Eixample,neighbourhood_group_cleansed_Gràcia,neighbourhood_group_cleansed_Horta-Guinardó,neighbourhood_group_cleansed_Les Corts,neighbourhood_group_cleansed_Nou Barris,neighbourhood_group_cleansed_Sant Andreu,neighbourhood_group_cleansed_Sant Martí,neighbourhood_group_cleansed_Sants-Montjuïc,neighbourhood_group_cleansed_Sarrià-Sant Gervasi,instant_bookable_t
0,0.96,0.91,46.0,8,2.0,3.0,6.0,1,1125,1.0,...,True,False,False,False,False,False,False,False,False,True
1,1.0,0.96,9.0,5,2.0,3.0,4.0,3,32,2.0,...,False,False,False,False,False,False,True,False,False,False
2,1.0,1.0,15.0,6,1.5,2.0,3.0,1,31,1.0,...,False,True,False,False,False,False,False,False,False,False
3,0.8,0.94,5.0,2,1.0,1.0,1.0,31,180,31.0,...,False,False,False,False,False,False,False,False,False,False
4,0.87,0.28,565.0,3,1.0,1.0,3.0,2,330,2.0,...,False,False,False,False,False,False,False,False,False,False



First 5 rows of target (y):


Unnamed: 0,price
0,232.0
1,382.0
2,186.0
3,131.0
4,285.0


### ✍️ Your Response: 🔧
1. I am using host-related features and property features such as bathrooms, bedrooms, and beds: essentially all of the numeric columns, plus one-hot encoded categorical columns such as room type and neighbourhood, for 177 features in total.

2. This is a regression problem because the target, price, is a continuous numeric value rather than a category.

## 5. Split Data into Training and Testing Sets

### Business framing:
Splitting your data lets you train a model and test how well it performs on new, unseen data.

### Do the following:
- Use `train_test_split()` to split into 80% training, 20% testing
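
A minimal sketch of an 80/20 split on a ten-row toy array, to show what `test_size=0.2` and `random_state` do. The data is a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(20).reshape(10, 2)  # 10 rows, 2 features
y_demo = np.arange(10)

# random_state fixes the shuffle so the same rows land in each set on every run
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print("train:", X_tr.shape, "test:", X_te.shape)
```

Every row ends up in exactly one of the two sets. When the row count is not divisible evenly, scikit-learn appears to round the test count up, which is consistent with 14,913 rows producing 2,983 test rows.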



In [6]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")

X_train shape: (11930, 177)
X_test shape: (2983, 177)
y_train shape: (11930,)
y_test shape: (2983,)


## 6. Fit a Linear Regression Model

### Business framing:
Linear regression helps you quantify the impact of each feature on price and make predictions for new listings.

### Do the following:
- Fit a linear regression model to your training data
- Use it to predict prices for the test set
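
To make "quantify the impact of each feature" concrete, here is a sketch on synthetic, noise-free data where the true relationship is known in advance. The numbers (a base of 40, plus 30 per bedroom and 10 per guest) are invented for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
bedrooms = rng.integers(1, 5, size=200)
accommodates = bedrooms * 2 + rng.integers(0, 2, size=200)

# Noise-free on purpose, so the fitted coefficients should match the formula
price_demo = 40 + 30 * bedrooms + 10 * accommodates

X_demo = np.column_stack([bedrooms, accommodates])
demo_model = LinearRegression().fit(X_demo, price_demo)

print("intercept:", round(float(demo_model.intercept_), 2))
print("coefficients:", np.round(demo_model.coef_, 2))

# Predict the price of a new 2-bedroom listing that sleeps 4
new_listing = np.array([[2, 4]])
print("predicted price:", round(float(demo_model.predict(new_listing)[0]), 2))
```

Each coefficient reads as the change in predicted price for a one-unit increase in that feature while the others are held fixed. With real data the recovered values are estimates, not exact.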



In [7]:
from sklearn.linear_model import LinearRegression

# Initialize the Linear Regression model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict prices on the test set
y_pred = model.predict(X_test)

print("Model fitted and predictions made.")

Model fitted and predictions made.


## 7. Evaluate Model Performance

### Business framing:  
A good model should make accurate predictions. We'll use Mean Squared Error (MSE) and R² to evaluate how close our predictions were to the actual prices.

### Do the following:
- Print MSE and R² score for your model

### In Your Response:
1. What is your R² score? How well does your model explain price variation?
2. Is your MSE large or small? What could you do to improve it?
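
A worked example of the two metrics computed by hand, which can help when judging whether an MSE is "large". The four price pairs are illustrative. RMSE is added here because it is in the same units as price.

```python
import math

actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

n = len(actual)

# MSE: the average squared error
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
demo_mse = ss_res / n

# RMSE: square root of MSE, in the same units as the target
demo_rmse = math.sqrt(demo_mse)

# R²: 1 minus (unexplained variation / total variation around the mean)
mean_actual = sum(actual) / n
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
demo_r2 = 1 - ss_res / ss_tot

print(f"MSE={demo_mse:.2f}  RMSE={demo_rmse:.2f}  R²={demo_r2:.3f}")
```

Because MSE is in squared units, taking its square root usually gives a more intuitive sense of the typical error size per listing.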


In [8]:
from sklearn.metrics import mean_squared_error, r2_score

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")

# Calculate R-squared (R²) score
r2 = r2_score(y_test, y_pred)
print(f"R-squared (R²) Score: {r2:.2f}")

Mean Squared Error (MSE): 86271.40
R-squared (R²) Score: 0.41


### ✍️ Your Response: 🔧
1. The R² score is 0.41. This means the model explains about 41% of the variance in Airbnb listing prices, which leaves most of the variation unexplained.

2. The MSE is large at 86271.40, which corresponds to a typical error (RMSE) of roughly 294 in price units. Ways to improve it include creating more sophisticated features and handling extreme values in the price column.

## 8. Interpret Model Coefficients

### Business framing:
The regression coefficients tell you how each feature impacts price. This can help Airbnb guide hosts and partners.

### Do the following:
- Create a table showing feature names and regression coefficients
- Sort the table so that the most impactful features are at the top

### In Your Response:
1. Which features increased price the most?
2. Were any surprisingly negative?
3. What business insight could you draw from this?
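
One caveat when ranking by coefficient size: raw coefficients depend on each feature's units, so a feature that varies over a very narrow range (such as latitude within one city) can receive a very large coefficient without contributing much to the predictions. The sketch below uses synthetic data and `StandardScaler` to show the ranking flipping once features are put on a common scale. It is an illustration under assumed numbers, not a reanalysis of the Airbnb data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
latitude = 41.35 + 0.1 * rng.random(n)  # varies by only about 0.1 degrees
accommodates = rng.integers(1, 9, size=n).astype(float)
price_demo = 50 + 200 * (latitude - 41.35) + 25 * accommodates + rng.normal(0, 5, n)

X_raw = np.column_stack([latitude, accommodates])

# Coefficients in original units: price change per degree vs. per guest
raw_coef = LinearRegression().fit(X_raw, price_demo).coef_

# Coefficients per standard deviation of each feature
X_std = StandardScaler().fit_transform(X_raw)
std_coef = LinearRegression().fit(X_std, price_demo).coef_

print("raw:         ", np.round(raw_coef, 2))
print("standardized:", np.round(std_coef, 2))
```

In this toy setup latitude has the largest raw coefficient, yet after standardizing, `accommodates` dominates. Standardized coefficients are usually the safer basis for a "most impactful features" table.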


In [9]:
import pandas as pd

# Get feature names from X
feature_names = X.columns

# Get coefficients from the fitted model
coefficients = model.coef_

# Create a DataFrame to display coefficients
coefficients_df = pd.DataFrame({
    'Feature': feature_names,
    'Coefficient': coefficients
})

# Sort by the absolute value of coefficients to see the most impactful features
coefficients_df['Absolute_Coefficient'] = abs(coefficients_df['Coefficient'])
coefficients_df = coefficients_df.sort_values(by='Absolute_Coefficient', ascending=False)

# Drop the temporary 'Absolute_Coefficient' column for cleaner output
coefficients_df = coefficients_df.drop(columns=['Absolute_Coefficient'])

print("Top 20 most impactful features and their coefficients:")
display(coefficients_df.head(20))

Top 20 most impactful features and their coefficients:


Unnamed: 0,Feature,Coefficient
38,latitude,-1949.166391
144,neighbourhood_cleansed_la Clota,-672.294983
44,property_type_Boat,457.655517
48,property_type_Entire chalet,364.664303
87,property_type_Shared room in home,-331.987724
65,property_type_Private room in chalet,202.703229
99,neighbourhood_cleansed_Canyelles,-185.832805
116,neighbourhood_cleansed_Sants - Badal,180.663185
131,neighbourhood_cleansed_el Coll,176.367783
95,room_type_Shared room,-168.995197


### ✍️ Your Response: 🔧
1. `property_type_Boat`, `property_type_Entire chalet`, and `property_type_Entire loft`.

2. `latitude` and `neighbourhood_cleansed_la Clota`.

3. Listings that offer unique experiences command higher prices, indicating that guests are willing to pay a premium for distinctive property characteristics.


## 9. Try to Improve the Linear Regression Model

### Business framing:
The first version of your model included all available features, but not all features are equally useful. Removing weak or noisy predictors can often improve performance and interpretation.

### Do the following:
1. Choose your top 3–5 features with the strongest absolute coefficients
2. Rebuild the regression model using just those features
3. Compare MSE and R² between the baseline and refined model

### In Your Response:
1. What features did you keep in the refined model, and why?
2. Did model performance improve? Why or why not?
3. Which model would you recommend to stakeholders?
4. How does this relate to the customized learning outcome you created in Canvas?
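
As a point of comparison with picking features by raw coefficient size, the sketch below uses a different, swapped-in technique: univariate selection with scikit-learn's `SelectKBest` and `f_regression`, which scores each feature by how strongly it relates to the target on the training set. The synthetic data, in which only the first two of six features matter, is an assumption made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 400
X_demo = rng.normal(size=(n, 6))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the selector on training data only, so the test set stays unseen
selector = SelectKBest(score_func=f_regression, k=2).fit(X_tr, y_tr)
kept = selector.get_support(indices=True)

refined = LinearRegression().fit(selector.transform(X_tr), y_tr)
r2_refined_demo = r2_score(y_te, refined.predict(selector.transform(X_te)))

print("kept feature indices:", kept.tolist())
print(f"refined R²: {r2_refined_demo:.2f}")
```

Whichever selection rule is used, the comparison is only fair if the baseline and refined models are evaluated on the same test split.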


In [16]:
# Add code here 🔧
top_5_features = ['latitude', 'neighbourhood_cleansed_la Clota', 'property_type_Boat', 'property_type_Entire chalet', 'property_type_Shared room in home']
print("Top 5 features identified:", top_5_features)


Top 5 features identified: ['latitude', 'neighbourhood_cleansed_la Clota', 'property_type_Boat', 'property_type_Entire chalet', 'property_type_Shared room in home']


In [17]:
X_refined = X[top_5_features]

print(f"Shape of refined features (X_refined): {X_refined.shape}")
print("\nFirst 5 rows of refined features (X_refined):")
display(X_refined.head())

Shape of refined features (X_refined): (14913, 5)

First 5 rows of refined features (X_refined):


Unnamed: 0,latitude,neighbourhood_cleansed_la Clota,property_type_Boat,property_type_Entire chalet,property_type_Shared room in home
0,41.40556,False,False,False,False
1,41.412432,False,False,False,False
2,41.40566,False,False,False,False
3,41.38062,False,False,False,False
4,41.3764,False,False,False,False


In [18]:
from sklearn.model_selection import train_test_split

X_train_refined, X_test_refined, y_train_refined, y_test_refined = train_test_split(X_refined, y, test_size=0.2, random_state=42)

print(f"X_train_refined shape: {X_train_refined.shape}")
print(f"X_test_refined shape: {X_test_refined.shape}")
print(f"y_train_refined shape: {y_train_refined.shape}")
print(f"y_test_refined shape: {y_test_refined.shape}")

X_train_refined shape: (11930, 5)
X_test_refined shape: (2983, 5)
y_train_refined shape: (11930,)
y_test_refined shape: (2983,)


In [19]:
from sklearn.linear_model import LinearRegression

# Initialize the Linear Regression model for the refined features
refined_model = LinearRegression()

# Fit the model to the refined training data
refined_model.fit(X_train_refined, y_train_refined)

# Predict prices on the refined test set
y_pred_refined = refined_model.predict(X_test_refined)

print("Refined model fitted and predictions made.")

Refined model fitted and predictions made.


In [20]:
from sklearn.metrics import mean_squared_error, r2_score

# Calculate Mean Squared Error (MSE) for the refined model
mse_refined = mean_squared_error(y_test_refined, y_pred_refined)
print(f"Refined Model - Mean Squared Error (MSE): {mse_refined:.2f}")

# Calculate R-squared (R²) score for the refined model
r2_refined = r2_score(y_test_refined, y_pred_refined)
print(f"Refined Model - R-squared (R²) Score: {r2_refined:.2f}")


Refined Model - Mean Squared Error (MSE): 144850.90
Refined Model - R-squared (R²) Score: 0.00


### ✍️ Your Response: 🔧
1. I kept the following top 5 features: 'latitude', 'neighbourhood_cleansed_la Clota', 'property_type_Boat', 'property_type_Entire chalet', and 'property_type_Shared room in home'. These features were selected because they showed the strongest absolute coefficients in the initial model, suggesting they had the most significant impact on price according to that model.

2. No, the model performance did not improve; it worsened significantly. The R² score for the refined model is 0.00, compared to 0.41 for the baseline model, and the MSE rose from 86271.40 to 144850.90. This means the refined model explains almost none of the variance in Airbnb listing prices. A likely reason is that a large coefficient does not imply a large contribution: latitude varies by only a fraction of a degree across the city, and the selected dummy features probably apply to very few listings.

3. I would recommend the baseline model to stakeholders. Although its R² score of 0.41 is not exceptionally high, it is significantly better than the refined model's R² of 0.00. The baseline model provides a much better explanation of price variation and makes more accurate predictions.

4. It's important to build and understand the model in order to get the best R² score, which indicates how well the model predicts the target variable.




## 10. Reflect and Recommend

### Business framing:  
Ultimately, the value of your model comes from how well it can guide business decisions. Use your results to make real-world recommendations.

### In Your Response:
1. What business question did your model help answer?
2. What would you recommend to Airbnb or its hosts?
3. What could you do next to improve this model or make it more useful?
4. How does this relate to the customized learning outcome you created in Canvas?


### ✍️ Your Response: 🔧
1. The model helped answer which features most strongly influence Airbnb listing prices.

2. Recommendations:
   - For hosts: Focus on highlighting unique property types (if applicable) and ensuring competitive pricing relative to the specific neighborhood. Emphasize amenities and overall guest experience, especially if located in areas with lower base prices. For entire homes, chalets, and lofts, leverage the premium guests are willing to pay for exclusivity.
   - For Airbnb: Use the insights on high-impact features for dynamic pricing strategies, personalized recommendations to hosts on how to optimize their listings, and targeted marketing to guests seeking specific property types or locations. Further geographical analysis could reveal specific high-value and low-value zones.

3. Next steps to improve the model:
   - Advanced feature engineering: Explore more complex feature interactions or create features from amenity lists using text analysis.
   - Non-linear models: Experiment with machine learning models that can capture non-linear relationships and interactions better than linear regression.
   - Outlier treatment: Identify and handle outliers in price or features more rigorously so they don't disproportionately affect the model.

4. The goal of this course is to gain the skills needed for the final project, where a high R² score is needed to predict the target variable well.
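
One possible way to act on the outlier point, sketched under assumptions: model the logarithm of price instead of price itself, so very expensive listings pull less on the fit. scikit-learn's `TransformedTargetRegressor` applies the transform before fitting and inverts it when predicting. The synthetic prices below are invented to have a long right tail. Whether this helps on the real data would need to be checked.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 300
bedrooms = rng.integers(1, 6, size=n).astype(float)

# Prices grow multiplicatively with size, producing a long right tail
price_demo = np.exp(4.0 + 0.3 * bedrooms + rng.normal(0, 0.2, n))

X_demo = bedrooms.reshape(-1, 1)

# Fit on log1p(price); predictions are mapped back to price units with expm1
log_model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log1p, inverse_func=np.expm1
).fit(X_demo, price_demo)

preds = log_model.predict(X_demo)
log_slope = float(np.ravel(log_model.regressor_.coef_)[0])

print(f"slope on the log scale: {log_slope:.3f}")
print(f"prediction range: {preds.min():.0f} to {preds.max():.0f}")
```

On the log scale a coefficient reads roughly as a percentage change in price per unit of the feature, which can be easier to explain to a host than a fixed amount.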


## Submission Instructions
✅ Checklist:
- All code cells run without error
- All markdown responses are complete
- Submit on Canvas as instructed

In [21]:
!jupyter nbconvert --to html "assignment_11_MillerAaron.ipynb"

[NbConvertApp] Converting notebook assignment_11_MillerAaron.ipynb to html
[NbConvertApp] Writing 394469 bytes to assignment_11_MillerAaron.html
