# Exploratory Data Analysis on üè† **Real Estate Price Prediction**

<h2 style="font-family: 'poppins'; font-weight: bold;">👨‍💻 Author: Muhammad Hassan Saboor</h2>

[![GitHub](https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github)](https://github.com/MuhammadHassanSaboor) 
[![Kaggle](https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle)](https://www.kaggle.com/mhassansaboor) 
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/muhammad-hassan-saboor/)  
[![Facebook](https://img.shields.io/badge/Facebook-Profile-blue?style=for-the-badge&logo=facebook)](https://www.facebook.com/profile.php?id=61555194218257) 
[![Twitter/X](https://img.shields.io/badge/Twitter-Profile-blue?style=for-the-badge&logo=twitter)](https://twitter.com/MUHAMMA84929767) 
[![Instagram](https://img.shields.io/badge/Instagram-Profile-blue?style=for-the-badge&logo=instagram)](https://www.instagram.com/m_hassan_saboor/) 

# 📚 Notebook Purpose
## 📊 **Exploratory Data Analysis and Machine Learning**

### **Dataset Description**
This task involves predicting the **Price** of real estate properties based on various features that influence the property value. The dataset includes attributes such as property size, number of rooms, amenities, location score, and distance from the city center.

| Feature               | Description                                                                 |
|-----------------------|-----------------------------------------------------------------------------|
| 🆔 **ID**             | A unique identifier for each property.                                      |
| 📐 **Square_Feet**    | The area of the property in square meters (despite the column name).        |
| 🛌 **Num_Bedrooms**   | The number of bedrooms in the property.                                     |
| 🚿 **Num_Bathrooms**  | The number of bathrooms in the property.                                    |
| 🏢 **Num_Floors**     | The number of floors in the property.                                       |
| 📅 **Year_Built**     | The year the property was built.                                            |
| 🌳 **Has_Garden**     | Indicates if the property has a garden (1 = Yes, 0 = No).                   |
| 🏊 **Has_Pool**       | Indicates if the property has a pool (1 = Yes, 0 = No).                     |
| 🚗 **Garage_Size**    | The size of the garage in square meters.                                    |
| 📍 **Location_Score** | A score from 0 to 10 indicating the quality of the neighborhood.            |
| 🛣️ **Distance_to_Center** | The distance from the property to the city center in kilometers.      |
| 💰 **Price**          | The target variable representing the price of the property.                 |

---

### **Objective** üéØ
- Develop a regression model to predict the **Price** of real estate properties based on the features provided.
- Learn the relationship between the features and the property price to make accurate predictions for unseen data.

---

### **Planned Workflow** 🛠️
1. **🔍 Exploratory Data Analysis (EDA):**
   - General Descriptive Statistics
   - Distribution Analysis
   - Correlation Analysis
   - Relationship Analysis
   - Categorical Analysis
   - Time-Based Analysis
   - Spatial Analysis
   - Outlier Detection and Handling
   - Segmentation and Filtering
   - Hypothesis Testing

2. **📈 Regression Models:**
   - **Linear Regression**: Establish an ordinary least squares baseline.
   - **Ridge Regression**: Add an L2 penalty that shrinks coefficients toward zero to reduce overfitting, and explore different levels of regularization.
   - **Lasso Regression**: Add an L1 penalty that can shrink some coefficients exactly to zero, acting as implicit feature selection.

3. **⚙️ Model Evaluation:**
   - Use **Root Mean Square Error (RMSE)** as the evaluation metric to measure performance.
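For reference, RMSE is the square root of the mean squared difference between actual and predicted prices, $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, so it is expressed in the same units as **Price**. The minimal NumPy sketch below computes it. The helper name `rmse` is our own and is not a library function. It should agree with the `np.sqrt(mean_squared_error(...))` pattern used in the modelling cells.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the residuals, average them, then take the square root
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy example: residuals are 0, -2 and 2, so MSE = 8/3 and RMSE is about 1.633
print(rmse([3.0, 5.0, 7.0], [3.0, 7.0, 5.0]))
```

Because the residuals are squared before averaging, RMSE penalizes a few large pricing errors more heavily than many small ones.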

---

### **Goal** 🏆
To build an accurate, robust regression model that provides reliable price predictions based on the given dataset.

---



# 📚 Importing Libraries for EDA

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import warnings

# 📚 Importing Libraries for Machine Learning

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# ⚙️ Basic Settings

In [3]:
warnings.filterwarnings("ignore")

# 📥 Loading the Dataset

In [4]:
df = pd.read_csv("/kaggle/input/housing-prices-regression/real_estate_dataset.csv")

# 📊 Dataset Overview

In [5]:
df.head()

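|   | ID | Square_Feet | Num_Bedrooms | Num_Bathrooms | Num_Floors | Year_Built | Has_Garden | Has_Pool | Garage_Size | Location_Score | Distance_to_Center | Price |
|---|----|-------------|--------------|---------------|------------|------------|------------|----------|-------------|----------------|--------------------|-------|
| 0 | 1 | 143.63503 | 1 | 3 | 3 | 1967 | 1 | 1 | 48 | 8.297631 | 5.935734 | 602134.816747 |
| 1 | 2 | 287.678577 | 1 | 2 | 1 | 1949 | 0 | 1 | 37 | 6.061466 | 10.827392 | 591425.135386 |
| 2 | 3 | 232.998485 | 1 | 3 | 2 | 1923 | 1 | 0 | 14 | 2.911442 | 6.904599 | 464478.69688 |
| 3 | 4 | 199.664621 | 5 | 2 | 2 | 1918 | 0 | 0 | 17 | 2.070949 | 8.284019 | 583105.655996 |
| 4 | 5 | 89.00466 | 4 | 3 | 3 | 1999 | 1 | 0 | 34 | 1.523278 | 14.648277 | 619879.142523 |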


In [6]:
df.sample(5)

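|   | ID | Square_Feet | Num_Bedrooms | Num_Bathrooms | Num_Floors | Year_Built | Has_Garden | Has_Pool | Garage_Size | Location_Score | Distance_to_Center | Price |
|---|----|-------------|--------------|---------------|------------|------------|------------|----------|-------------|----------------|--------------------|-------|
| 26 | 27 | 99.918446 | 5 | 1 | 2 | 1921 | 1 | 0 | 48 | 2.762373 | 6.551043 | 563531.476928 |
| 67 | 68 | 250.549245 | 3 | 3 | 1 | 1935 | 1 | 0 | 45 | 6.365386 | 0.310167 | 685065.871514 |
| 162 | 163 | 208.382428 | 5 | 3 | 3 | 1931 | 1 | 1 | 26 | 6.360425 | 9.279803 | 783157.159525 |
| 418 | 419 | 218.129614 | 1 | 1 | 3 | 1974 | 1 | 1 | 40 | 2.914957 | 12.198831 | 585468.451937 |
| 374 | 375 | 67.797162 | 4 | 1 | 1 | 1959 | 1 | 1 | 15 | 7.394082 | 17.392122 | 532377.580195 |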


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   ID                  500 non-null    int64  
 1   Square_Feet         500 non-null    float64
 2   Num_Bedrooms        500 non-null    int64  
 3   Num_Bathrooms       500 non-null    int64  
 4   Num_Floors          500 non-null    int64  
 5   Year_Built          500 non-null    int64  
 6   Has_Garden          500 non-null    int64  
 7   Has_Pool            500 non-null    int64  
 8   Garage_Size         500 non-null    int64  
 9   Location_Score      500 non-null    float64
 10  Distance_to_Center  500 non-null    float64
 11  Price               500 non-null    float64
dtypes: float64(4), int64(8)
memory usage: 47.0 KB


In [8]:
df.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|-------|------|-----|-----|-----|-----|-----|-----|
| ID | 500.0 | 250.5 | 144.481833 | 1.0 | 125.75 | 250.5 | 375.25 | 500.0 |
| Square_Feet | 500.0 | 174.640428 | 74.672102 | 51.265396 | 110.319923 | 178.290937 | 239.03122 | 298.241199 |
| Num_Bedrooms | 500.0 | 2.958 | 1.440968 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| Num_Bathrooms | 500.0 | 1.976 | 0.820225 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Num_Floors | 500.0 | 1.964 | 0.802491 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Year_Built | 500.0 | 1957.604 | 35.491781 | 1900.0 | 1926.0 | 1959.0 | 1988.0 | 2022.0 |
| Has_Garden | 500.0 | 0.536 | 0.499202 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Has_Pool | 500.0 | 0.492 | 0.500437 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| Garage_Size | 500.0 | 30.174 | 11.582575 | 10.0 | 20.0 | 30.0 | 41.0 | 49.0 |
| Location_Score | 500.0 | 5.16441 | 2.853489 | 0.004428 | 2.76065 | 5.206518 | 7.732933 | 9.995439 |


# 🔍 Exploratory Data Analysis (EDA)

## 📈 General Descriptive Statistics

In [9]:
stats = df.describe().T
stats["range"] = stats["max"] - stats["min"]

# Plotly table
stats_table = go.Figure(
    data=[
        go.Table(
            header=dict(
                values=["Feature"] + list(stats.columns),
                font=dict(color="white"),
                fill_color="black",
                align="left",
            ),
            cells=dict(
                values=[stats.index,
                        stats["count"],stats["mean"],stats["std"],stats["min"],stats["25%"],
                        stats["50%"],stats["75%"],stats["max"],stats["range"],
                ],
                font=dict(color="white"),
                fill_color="gray",
                align="left",
            ),
        )
    ]
)
stats_table.show()

## 📊 Distribution Analysis

#### 📊 Distribution of Price

In [10]:
fig_price = px.histogram(
    df,
    x="Price",
    nbins=50,
    title="Distribution of House Prices",
    template="plotly_dark",
)
fig_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Price",
    yaxis_title="Count",
)
fig_price.show()
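The histogram shows the shape of the **Price** distribution. A single number, the sample skewness, can summarize its asymmetry: positive values mean a longer right tail. The sketch below implements the adjusted Fisher-Pearson coefficient using NumPy only. To our understanding, this is the same estimator behind pandas' `Series.skew()`, so `df["Price"].skew()` should be a one-line equivalent on this notebook's data. The function name `sample_skewness` is illustrative.

```python
import numpy as np


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (G1); needs at least 3 values."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # biased second central moment
    m3 = np.mean(deviations ** 3)  # biased third central moment
    if m2 == 0:
        return 0.0  # constant data: treat as symmetric
    g1 = m3 / m2 ** 1.5
    # Small-sample bias correction turns g1 into G1
    return float(np.sqrt(n * (n - 1)) / (n - 2) * g1)


print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric data -> 0.0
print(sample_skewness([1, 1, 1, 1, 10]))  # long right tail -> positive
```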

#### 📊 Distribution of Square_Feet

In [11]:
fig_square_feet = px.histogram(
    df,
    x="Square_Feet",
    nbins=50,
    title="Distribution of Square Feet",
    template="plotly_dark",
)
fig_square_feet.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Square Feet",
    yaxis_title="Count",
)
fig_square_feet.show()

#### 📊 Distribution of Location_Score

In [12]:
fig_location_score = px.histogram(
    df,
    x="Location_Score",
    nbins=10,
    title="Distribution of Location Score",
    template="plotly_dark",
)
fig_location_score.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Location Score",
    yaxis_title="Count",
)
fig_location_score.show()

#### 📊 Distribution of Distance_to_Center

In [13]:
fig_distance_to_center = px.histogram(
    df,
    x="Distance_to_Center",
    nbins=20,
    title="Distribution of Distance to Center",
    template="plotly_dark",
)
fig_distance_to_center.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Distance to Center (km)",
    yaxis_title="Count",
)
fig_distance_to_center.show()

## 🔗 Correlation Analysis

#### üü¶‚¨úüü•‚¨úüü© Heatmap for the correlation matrix

In [14]:
correlation_matrix = df.drop(columns=["ID"]).corr()  # ID is only an identifier, not a feature

fig_corr = ff.create_annotated_heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns.tolist(),
    y=correlation_matrix.columns.tolist(),
    annotation_text=np.round(correlation_matrix.values, 2),
    colorscale="Blues",
    showscale=True,
)

# Update layout for a dark background
fig_corr.update_layout(
    title="Correlation Matrix",
    template="plotly_dark",
    paper_bgcolor="black",
    font=dict(color="white"),
)
fig_corr.show()

In [15]:
# Highlighting features with high correlation to Price (threshold = 0.5)
high_corr_to_price = correlation_matrix["Price"][
    correlation_matrix["Price"].abs() > 0.5
].sort_values(ascending=False)

print("Features with high correlation to Price (|correlation| > 0.5):")
print(high_corr_to_price)

Features with high correlation to Price (|correlation| > 0.5):
Price           1.000000
Num_Bedrooms    0.563973
Square_Feet     0.558604
Name: Price, dtype: float64
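`DataFrame.corr()` computes the Pearson correlation by default. Each cell of the matrix is the covariance of two columns divided by the product of their standard deviations. The NumPy sketch below makes that explicit. The name `pearson_r` is our own, and the function should match `np.corrcoef` on the same data. Pearson's coefficient only measures linear association, so a feature below the 0.5 threshold used above can still be related to **Price** in a non-linear way.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of std devs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # The 1/n factors in covariance and variance cancel, so they are omitted
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = 3 * a + rng.normal(size=100)
print(pearson_r(a, b), np.corrcoef(a, b)[0, 1])  # the two values should agree
```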


## 🔗 Relationship Analysis

#### 🔵⚫ Scatter plot for Price vs. Square_Feet

In [16]:
fig_sqft = px.scatter(
    df,
    x="Square_Feet",
    y="Price",
    title="Price vs. Square Feet",
    template="plotly_dark",
    trendline="lowess",  # LOWESS smoother (requires statsmodels) to reveal non-linear trends
)
fig_sqft.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Square Feet",
    yaxis_title="Price",
)
fig_sqft.show()

#### 🔵⚫ Scatter plot for Price vs. Num_Bedrooms

In [17]:
fig_bedrooms = px.scatter(
    df,
    x="Num_Bedrooms",
    y="Price",
    title="Price vs. Number of Bedrooms",
    template="plotly_dark",
    trendline="lowess",
)
fig_bedrooms.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Number of Bedrooms",
    yaxis_title="Price",
)
fig_bedrooms.show()

#### 🔵⚫ Scatter plot for Price vs. Num_Bathrooms

In [18]:
fig_bathrooms = px.scatter(
    df,
    x="Num_Bathrooms",
    y="Price",
    title="Price vs. Number of Bathrooms",
    template="plotly_dark",
    trendline="lowess",
)
fig_bathrooms.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Number of Bathrooms",
    yaxis_title="Price",
)
fig_bathrooms.show()

#### 🔵⚫ Scatter plot for Price vs. Garage_Size

In [19]:
fig_garage = px.scatter(
    df,
    x="Garage_Size",
    y="Price",
    title="Price vs. Garage Size",
    template="plotly_dark",
    trendline="lowess",
)
fig_garage.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Garage Size",
    yaxis_title="Price",
)
fig_garage.show()

## 📊 Categorical Analysis

#### 📊 Frequency and average price by Has_Garden

In [20]:
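garden_data = df.groupby("Has_Garden").agg(
    Frequency=("Has_Garden", "count"),
    Avg_Price=("Price", "mean"),
).reset_index()

# Note: Frequency (a few hundred) and Avg_Price (hundreds of thousands) share
# one y-axis, so the frequency bars are barely visible at this scale
fig_garden = px.bar(
    garden_data,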
    x="Has_Garden",
    y=["Frequency", "Avg_Price"],
    barmode="group",
    title="Frequency and Average Price by Garden Presence",
    template="plotly_dark",
    labels={"Has_Garden": "Has Garden", "value": "Value"},
)
fig_garden.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(tickvals=[0, 1], ticktext=["No Garden", "Has Garden"]),
    yaxis_title="Frequency / Average Price",
)
fig_garden.show()

#### 📊 Frequency and average price by Has_Pool

In [21]:
fig_pool = px.bar(
    df.groupby("Has_Pool").agg(
        Frequency=("Has_Pool", "count"),
        Avg_Price=("Price", "mean")
    ).reset_index(),
    x="Has_Pool",
    y=["Frequency", "Avg_Price"],
    barmode="group",
    title="Frequency and Average Price by Pool Presence",
    template="plotly_dark",
    labels={"Has_Pool": "Has Pool", "value": "Value"},
)
fig_pool.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(tickvals=[0, 1], ticktext=["No Pool", "Has Pool"]),
    yaxis_title="Frequency / Average Price",
)
fig_pool.show()

#### 📊 Frequency and average price by Num_Floors

In [22]:
fig_floors = px.bar(
    df.groupby("Num_Floors").agg(
        Frequency=("Num_Floors", "count"),
        Avg_Price=("Price", "mean")
    ).reset_index(),
    x="Num_Floors",
    y=["Frequency", "Avg_Price"],
    barmode="group",
    title="Frequency and Average Price by Number of Floors",
    template="plotly_dark",
    labels={"Num_Floors": "Number of Floors", "value": "Value"},
)
fig_floors.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    yaxis_title="Frequency / Average Price",
)
fig_floors.show()

## ⏳ Time-Based Analysis

#### 📈 Line plot for average Price with respect to Year_Built

In [23]:
# Grouping data by Year_Built for average Price
avg_price_by_year = df.groupby("Year_Built")["Price"].mean().reset_index()

fig_year_price = px.line(
    avg_price_by_year,
    x="Year_Built",
    y="Price",
    title="Trends in Average Price with Respect to Year Built",
    template="plotly_dark",
    labels={"Year_Built": "Year Built", "Price": "Average Price"},
)

# Enhancing plot with a black background and style
fig_year_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis_title="Year Built",
    yaxis_title="Average Price",
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_year_price.show()
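Build years span 1900 to 2022, yet there are only 500 houses. Each yearly average therefore rests on only a few sales, which makes the line noisy. One option is to bin **Year_Built** by decade before averaging. This is an assumption on our part, not part of the original analysis, and the helper `mean_price_by_decade` is hypothetical. In pandas the same idea would be `df.groupby((df["Year_Built"] // 10) * 10)["Price"].mean()`. The NumPy-only sketch below shows the arithmetic.

```python
import numpy as np


def mean_price_by_decade(years, prices):
    """Return (decades, mean price per decade), e.g. 1923 -> 1920."""
    years = np.asarray(years, dtype=int)
    prices = np.asarray(prices, dtype=float)
    decades = (years // 10) * 10
    # inverse maps every house to the index of its decade in unique_decades
    unique_decades, inverse = np.unique(decades, return_inverse=True)
    inverse = inverse.ravel()  # keep 1-D across NumPy versions
    sums = np.bincount(inverse, weights=prices)
    counts = np.bincount(inverse)
    return unique_decades, sums / counts


decades, means = mean_price_by_decade([1901, 1905, 1912, 1999], [100.0, 200.0, 300.0, 400.0])
print(decades, means)  # [1900 1910 1990] [150. 300. 400.]
```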

## 🔵⚫⚪ Spatial Analysis

#### 🔵⚫⚪ Scatter plot for Distance_to_Center vs. Price

In [24]:
fig_distance_price = px.scatter(
    df,
    x="Distance_to_Center",
    y="Price",
    title="Relationship Between Distance to Center and Price",
    template="plotly_dark",
    labels={"Distance_to_Center": "Distance to City Center", "Price": "Price"},
    color="Price",
    color_continuous_scale="viridis",
)

# Customize plot layout
fig_distance_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_distance_price.show()

#### 🔵⚫⚪ Scatter plot for Location_Score vs. Price

In [25]:
fig_location_price = px.scatter(
    df,
    x="Location_Score",
    y="Price",
    title="Relationship Between Location Score and Price",
    template="plotly_dark",
    labels={"Location_Score": "Location Score", "Price": "Price"},
    color="Price",
    color_continuous_scale="viridis",
)

# Customize plot layout
fig_location_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_location_price.show()

## 🚨 Outlier Detection

#### 🚨 Outlier Detection for Numerical Features

In [26]:
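# Price is on a far larger scale than Square_Feet and Distance_to_Center,
# so their boxes look compressed on this shared axis
fig_boxplots = px.box(
    df,
    y=["Square_Feet", "Price", "Distance_to_Center"],
    points="outliers",
    template="plotly_dark",
    title="Outlier Detection for Numerical Features",
)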
fig_boxplots.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
)
fig_boxplots.show()

#### 🚨 Boxplot for Price to detect outliers

In [27]:
fig_price_outliers = px.box(
    df,
    y="Price",
    title="Outlier Detection in Price",
    template="plotly_dark",
    labels={"Price": "Price"},
    color_discrete_sequence=["cyan"],
)

# Customize layout for Price outlier detection
fig_price_outliers.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    yaxis=dict(showgrid=False),
)
fig_price_outliers.show()

#### 🚨 Boxplot for Square_Feet to detect outliers

In [28]:
fig_sqft_outliers = px.box(
    df,
    y="Square_Feet",
    title="Outlier Detection in Square Feet",
    template="plotly_dark",
    labels={"Square_Feet": "Square Feet"},
    color_discrete_sequence=["magenta"],
)

# Customize layout for Square_Feet outlier detection
fig_sqft_outliers.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    yaxis=dict(showgrid=False),
)
fig_sqft_outliers.show()
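Box plots conventionally flag points lying more than 1.5 × IQR below the first quartile or above the third quartile (Tukey's fences). To our understanding, Plotly places its whiskers by a similar rule by default, so `points="outliers"` shows roughly the same points. Plotly's quartile computation may differ slightly from NumPy's. The sketch below makes the rule explicit. It also shows capping (winsorizing) as one possible way to handle the flagged values rather than dropping rows. The helper name and the choice of capping are assumptions, not the author's method.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] and return a capped copy."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # linear interpolation (NumPy default)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (x < lower) | (x > upper)
    capped = np.clip(x, lower, upper)  # winsorize instead of dropping rows
    return is_outlier, capped, (float(lower), float(upper))


mask, capped, fences = iqr_outliers([1, 2, 3, 4, 100])
print(mask)    # only the last value is flagged
print(fences)  # (-1.0, 7.0)
print(capped)  # [1. 2. 3. 4. 7.]
```

On this notebook's data it might be applied per column, for example `iqr_outliers(df["Price"])`.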

## 📊 Segmentation and Filtering

#### 📊 Bar plot for Price segmented by Num_Bedrooms

In [29]:
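# Average Price per bedroom count (passing the raw rows to px.bar would stack
# every house's price, so bar heights would show the sum, not the average)
avg_price_by_bedrooms = df.groupby("Num_Bedrooms", as_index=False)["Price"].mean()

fig_bedrooms_price = px.bar(
    avg_price_by_bedrooms,
    x="Num_Bedrooms",
    y="Price",
    title="Average Price by Number of Bedrooms",
    template="plotly_dark",
    labels={"Num_Bedrooms": "Number of Bedrooms", "Price": "Average Price"},
    color="Num_Bedrooms",
    color_continuous_scale="viridis",
)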

# Customize plot layout for Num_Bedrooms segmentation
fig_bedrooms_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_bedrooms_price.show()

#### 📊 Bar plot for Price segmented by Num_Bathrooms

In [30]:
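# Average Price per bathroom count (raw rows would stack into per-group sums)
avg_price_by_bathrooms = df.groupby("Num_Bathrooms", as_index=False)["Price"].mean()

fig_bathrooms_price = px.bar(
    avg_price_by_bathrooms,
    x="Num_Bathrooms",
    y="Price",
    title="Average Price by Number of Bathrooms",
    template="plotly_dark",
    labels={"Num_Bathrooms": "Number of Bathrooms", "Price": "Average Price"},
    color="Num_Bathrooms",
    color_continuous_scale="plasma",
)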

# Customize plot layout for Num_Bathrooms segmentation
fig_bathrooms_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_bathrooms_price.show()

#### 📊 Box plot for Price filtered by Garage_Size

In [31]:
fig_garage_price = px.box(
    df,
    x="Garage_Size",
    y="Price",
    title="Effect of Garage Size on Pricing",
    template="plotly_dark",
    labels={"Garage_Size": "Garage Size", "Price": "Price"},
    color="Garage_Size",
)

# Customize plot layout for Garage_Size filtering
fig_garage_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_garage_price.show()

#### 📊 Box plot for Price filtered by Has_Pool

In [32]:
fig_pool_price = px.box(
    df,
    x="Has_Pool",
    y="Price",
    title="Effect of Having a Pool on Pricing",
    template="plotly_dark",
    labels={"Has_Pool": "Has Pool (1=Yes, 0=No)", "Price": "Price"},
    color="Has_Pool",
)

# Customize plot layout for Has_Pool filtering
fig_pool_price.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_pool_price.show()

## ⚖️ Hypothesis Testing

In [33]:
# Standard error of the mean for each group: std / sqrt(group size)
garden_stats = df.groupby("Has_Garden")["Price"].agg(["mean", "std", "count"]).reset_index()
garden_stats["Error"] = garden_stats["std"] / np.sqrt(garden_stats["count"])

#### 📊 Bar plot with error bars for Has_Garden

In [34]:
fig_garden = px.bar(
    garden_stats,
    x="Has_Garden",
    y="mean",
    error_y="Error",
    title="Average Price with/without Garden",
    template="plotly_dark",
    labels={"Has_Garden": "Has Garden (1=Yes, 0=No)", "mean": "Average Price"},
    color="Has_Garden",
    color_continuous_scale="viridis",
)

fig_garden.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_garden.show()

#### 📊 Box plot for Has_Pool

In [35]:
fig_pool = px.box(
    df,
    x="Has_Pool",
    y="Price",
    title="Price Distribution with/without Pool",
    template="plotly_dark",
    labels={"Has_Pool": "Has Pool (1=Yes, 0=No)", "Price": "Price"},
    color="Has_Pool",  # Use color for categorical coloring
)

# Customize layout
fig_pool.update_layout(
    paper_bgcolor="black",
    font=dict(color="white"),
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
)
fig_pool.show()
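The charts above compare the groups visually but do not yet test a hypothesis. One way to test one is a two-sided permutation test of the difference in mean **Price** between houses with and without a garden (or pool). This is our suggestion, not the author's method. The test repeatedly shuffles the group labels and counts how often the shuffled difference is at least as extreme as the observed one. It needs only NumPy. Welch's t-test via `scipy.stats.ttest_ind(..., equal_var=False)` would be a parametric alternative. The function name and defaults are illustrative.

```python
import numpy as np


def permutation_test_mean_diff(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # reassign group labels at random
        diff = shuffled[: a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    return float(observed), p_value


# Synthetic check: the first group is shifted upward by 5 units
rng = np.random.default_rng(1)
shifted = rng.normal(loc=105, scale=5, size=50)
baseline = rng.normal(loc=100, scale=5, size=50)
diff, p = permutation_test_mean_diff(shifted, baseline, n_permutations=2_000)
print(diff, p)  # p should be small for a shift this large

# On this notebook's data it might be called as:
# permutation_test_mean_diff(df.loc[df["Has_Garden"] == 1, "Price"],
#                            df.loc[df["Has_Garden"] == 0, "Price"])
```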

# ⚙️ Machine Learning

In [36]:
df["Price"].describe()

count       500.000000
mean     582209.629529
std      122273.390345
min      276892.470136
25%      503080.344140
50%      574724.113347
75%      665942.301274
max      960678.274291
Name: Price, dtype: float64

## 🛠️ Preprocessing the Data for Machine Learning

In [37]:
# Features and target
X = df.drop(columns=["Price", "ID"])
y = df["Price"]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing the data (for algorithms sensitive to scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
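`StandardScaler` rescales each feature to zero mean and unit variance. `fit_transform` learns those statistics from the training split only, and `transform` reuses them on the test split, which keeps test information out of training. The NumPy sketch below reproduces the arithmetic. The helper names are our own. Like `StandardScaler`, the sketch uses the population standard deviation (`ddof=0`) and leaves constant columns unscaled.

```python
import numpy as np


def fit_standardizer(X_train):
    """Learn per-column mean and std from the training data only."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population std (ddof=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by 0
    return mean, std


def apply_standardizer(X, mean, std):
    """Apply previously learned statistics to train or test data."""
    return (np.asarray(X, dtype=float) - mean) / std


X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[4.0, 40.0]])
mean, std = fit_standardizer(X_tr)
print(apply_standardizer(X_tr, mean, std).mean(axis=0))  # approximately [0, 0]
print(apply_standardizer(X_te, mean, std))  # test rows use the train mean/std
```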

#### 📉 Linear Regression

In [38]:
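# Train Linear Regression model (plain OLS predictions do not change when
# features are rescaled, so the unscaled training data is used here)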
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Predictions
y_pred = lr_model.predict(X_test)

# Evaluate with RMSE
rmse_lr = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Linear Regression RMSE: {rmse_lr}")

Linear Regression RMSE: 20922.006588954922
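`LinearRegression` fits ordinary least squares: it finds the intercept and coefficients that minimize the sum of squared residuals on the training data. The sketch below reproduces that fit with NumPy only. It prepends a column of ones for the intercept and solves the least-squares problem with `np.linalg.lstsq`. This is an illustration, not scikit-learn's internal code, and the helper names are our own.

```python
import numpy as np


def fit_ols(X, y):
    """Return (intercept, coefficients) minimising ||y - b - Xw||^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones = intercept
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]


def predict_ols(X, intercept, coef):
    return intercept + np.asarray(X, dtype=float) @ coef


# Noise-free toy data: y = 5 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
y_toy = 5 + X_toy @ np.array([2.0, -3.0])
b, w = fit_ols(X_toy, y_toy)
print(b, w)  # approximately 5.0 and [2.0, -3.0]
```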


#### 📉 Ridge Regression

In [39]:
# Ridge Regression
ridge_model = Ridge(alpha=1.0)  # Adjust alpha as needed
ridge_model.fit(X_train_scaled, y_train)
y_pred_ridge = ridge_model.predict(X_test_scaled)
rmse_ridge = np.sqrt(mean_squared_error(y_test, y_pred_ridge))
print(f"Ridge Regression RMSE: {rmse_ridge}")

Ridge Regression RMSE: 21015.145594061134
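Ridge adds an L2 penalty to least squares. scikit-learn's `Ridge` minimizes $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$ and does not penalize the intercept. Instead of fixing `alpha=1.0`, one can solve the closed form for several α values and compare validation RMSE. The sketch below does that on synthetic toy data, which stands in for the scaled housing features; the data, the helper names, and the α grid are all assumptions. In scikit-learn, `RidgeCV` or `GridSearchCV` would be the usual tools for this search.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge: centre the data so the intercept is not penalised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha*I) w = Xc^T yc
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Toy data standing in for the scaled housing features (assumption)
rng = np.random.default_rng(42)
X_all = rng.normal(size=(200, 5))
y_all = X_all @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=2.0, size=200)
X_tr, X_val, y_tr, y_val = X_all[:150], X_all[150:], y_all[:150], y_all[150:]

for alpha in [0.0, 0.1, 1.0, 10.0, 100.0]:
    b, w = fit_ridge(X_tr, y_tr, alpha)
    print(f"alpha={alpha:>6}: validation RMSE={rmse(y_val, b + X_val @ w):.3f}, "
          f"||w||={np.linalg.norm(w):.3f}")
```

With `alpha=0` the solution is plain OLS. Larger values shrink the coefficient norm monotonically, trading a little bias for lower variance.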


#### 📉 Lasso Regression

In [40]:
# Lasso Regression
lasso_model = Lasso(alpha=0.1)  # Adjust alpha as needed
lasso_model.fit(X_train_scaled, y_train)
y_pred_lasso = lasso_model.predict(X_test_scaled)
rmse_lasso = np.sqrt(mean_squared_error(y_test, y_pred_lasso))
print(f"Lasso Regression RMSE: {rmse_lasso}")

Lasso Regression RMSE: 20922.09560262965
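Lasso replaces Ridge's squared penalty with an L1 penalty. scikit-learn's `Lasso` minimizes $\frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$, which can drive some coefficients exactly to zero. The sketch below is a simplified coordinate-descent solver with soft-thresholding, written for illustration; it is not scikit-learn's implementation, and the names are our own. It suggests why `alpha=0.1` changes little here. With standardized features, each update shrinks a coefficient by roughly α itself, here 0.1 in price units, while the coefficients are measured against a **Price** in the hundreds of thousands.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside [-t, t]."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def fit_lasso_cd(X, y, alpha, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/(2n))||y - b - Xw||^2 + alpha * ||w||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centring removes the (unpenalised) intercept
    col_sq = (Xc ** 2).sum(axis=0)
    w = np.zeros(p)
    residual = yc.copy()  # residual = yc - Xc @ w, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant column: coefficient stays 0
            # Correlation of column j with the residual that excludes feature j
            rho = Xc[:, j] @ residual + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, n * alpha) / col_sq[j]
            residual -= Xc[:, j] * (new_wj - w[j])
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break
    intercept = y_mean - x_mean @ w
    return intercept, w


# Toy data: the second feature is irrelevant
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = 4.0 + X_toy @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=200)
for alpha in [0.01, 0.5, 5.0]:
    b, w = fit_lasso_cd(X_toy, y_toy, alpha)
    print(alpha, np.round(w, 3))  # larger alpha -> more coefficients at exactly 0
```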


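`Note` All three models reach nearly the same **RMSE**: about 20,922 for Linear and Lasso Regression and about 21,015 for Ridge Regression. That is roughly 17% of the standard deviation of **Price** (about 122,273). With `alpha=0.1`, the Lasso penalty is negligible at this price scale, so Lasso behaves almost exactly like ordinary least squares. Ridge with `alpha=1.0` shrinks the coefficients only slightly, so neither form of regularization improves on the baseline for this split.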

# 💬 Thank You for Exploring!

I hope this notebook provided valuable insights into **Real Estate Prices** through advanced visualizations and analysis. Your journey here reflects a shared passion for uncovering stories hidden within data.

If you found this work helpful or have suggestions for improvement, feel free to leave feedback. Together, we can make data exploration even more impactful. üåü

Happy Analyzing! 🚀

### 👨‍💻 Muhammad Hassan Saboor