<a href="https://colab.research.google.com/github/1070rahul/1070rahul/blob/main/End_to_End_ML_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# House Price Prediction Using Linear Regression
    Dated - 7/4/2024
    By- Rahul Sati

# Aim of the Project
The primary objective is to build a model that predicts the house price per unit area from features such as the property’s age, its proximity to key amenities (MRT stations and convenience stores), and its geographical location. The project involves training a linear regression model and deploying it through a Dash web application so that users can interact with it.

# Introduction
House price prediction is a classic example of regression problems in machine learning. The ability to accurately predict real estate prices based on property attributes can significantly aid buyers, sellers, and investors in making informed decisions. This project seeks to demonstrate the process of building a predictive model and deploying it for real-world use.

# Dataset Description:
1. Transaction date: Date of the property transaction.
2. House age: Age of the property in years.
3. Distance to the nearest MRT station: Distance in meters to the nearest Mass Rapid Transit station, a key indicator of convenience and accessibility.
4. Number of convenience stores: Count of convenience stores in the vicinity, indicating the property’s accessibility to basic amenities.
5. Latitude and Longitude: Geographical coordinates of the property, reflecting its location.
6. House price of unit area: The target variable, the house price per unit area.

# Methodology:
1. Loading data: Load the real estate dataset and select the features used for prediction (distance to the nearest MRT station, number of convenience stores, latitude, and longitude).
2. Model Training: Fit the LinearRegression model from sklearn on the training split.
3. Model Evaluation: Assess the model’s performance on the held-out test split using metrics such as mean squared error and R².
4. Web Application: Develop a Dash application to deploy the model, allowing users to input these four features and receive a predicted price per unit area.

In [28]:
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
# Load the dataset
df = pd.read_csv('/content/Real_Estate.csv')

In [None]:
df.head()

,Transaction date,House age,Distance to the nearest MRT station,Number of convenience stores,Latitude,Longitude,House price of unit area
0,2012-09-02 16:42:30.519336,13.3,4082.015,8,25.007059,121.561694,6.488673
1,2012-09-04 22:52:29.919544,35.5,274.0144,2,25.012148,121.54699,24.970725
2,2012-09-05 01:10:52.349449,1.1,1978.671,10,25.00385,121.528336,26.694267
3,2012-09-05 13:26:01.189083,22.2,1055.067,5,24.962887,121.482178,38.091638
4,2012-09-06 08:29:47.910523,8.5,967.4,6,25.011037,121.479946,21.65471


In [None]:
# check the shape of dataset
df.shape

(414, 7)

In [None]:
# check the info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 414 entries, 0 to 413
Data columns (total 7 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Transaction date                     414 non-null    object 
 1   House age                            414 non-null    float64
 2   Distance to the nearest MRT station  414 non-null    float64
 3   Number of convenience stores         414 non-null    int64  
 4   Latitude                             414 non-null    float64
 5   Longitude                            414 non-null    float64
 6   House price of unit area             414 non-null    float64
dtypes: float64(5), int64(1), object(1)
memory usage: 22.8+ KB


In [None]:
df.columns

Index(['Transaction date', 'House age', 'Distance to the nearest MRT station',
       'Number of convenience stores', 'Latitude', 'Longitude',
       'House price of unit area'],
      dtype='object')

The transaction date column is stored as a string (object dtype), so we convert it to datetime format. We can then derive new year and month columns from it.


In [None]:
df['Transaction date'] = pd.to_datetime(df['Transaction date'])

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 414 entries, 0 to 413
Data columns (total 7 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 0   Transaction date                     414 non-null    datetime64[ns]
 1   House age                            414 non-null    float64       
 2   Distance to the nearest MRT station  414 non-null    float64       
 3   Number of convenience stores         414 non-null    int64         
 4   Latitude                             414 non-null    float64       
 5   Longitude                            414 non-null    float64       
 6   House price of unit area             414 non-null    float64       
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 22.8 KB


In [None]:
# Create new year and month features from the transaction date
df['year'] = df['Transaction date'].dt.year
df['month'] = df['Transaction date'].dt.month

In [None]:
df.head()

,Transaction date,House age,Distance to the nearest MRT station,Number of convenience stores,Latitude,Longitude,House price of unit area,year,month
0,2012-09-02 16:42:30.519336,13.3,4082.015,8,25.007059,121.561694,6.488673,2012,9
1,2012-09-04 22:52:29.919544,35.5,274.0144,2,25.012148,121.54699,24.970725,2012,9
2,2012-09-05 01:10:52.349449,1.1,1978.671,10,25.00385,121.528336,26.694267,2012,9
3,2012-09-05 13:26:01.189083,22.2,1055.067,5,24.962887,121.482178,38.091638,2012,9
4,2012-09-06 08:29:47.910523,8.5,967.4,6,25.011037,121.479946,21.65471,2012,9


In [None]:
df.columns

Index(['Transaction date', 'House age', 'Distance to the nearest MRT station',
       'Number of convenience stores', 'Latitude', 'Longitude',
       'House price of unit area', 'year', 'month'],
      dtype='object')

In [None]:
# Selecting features and target variables
features = ['Distance to the nearest MRT station','Number of convenience stores', 'Latitude', 'Longitude']
target = 'House price of unit area'  # a plain string, so y is a 1-D Series and predictions are scalars

In [None]:
# Store features and target in X and y variables for training and testing
X = df[features]
y = df[target]

In [None]:
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Model initialization
model = LinearRegression()

In [None]:
# training the model
model.fit(X_train, y_train)
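
The methodology lists a model evaluation step, and `mean_squared_error` and `r2_score` are imported above, but the notebook does not show them being used. The cell below is a minimal sketch of how that step could look. The helper name `evaluate_regression` and the `demo_*` variables are hypothetical, and the data is synthetic stand-in data (not the real estate dataset), generated only so the cell runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_regression(fitted_model, X_eval, y_eval):
    """Return MSE, RMSE and R^2 of a fitted regressor on held-out data."""
    y_pred = fitted_model.predict(X_eval)
    mse = mean_squared_error(y_eval, y_pred)
    return {"mse": mse, "rmse": float(np.sqrt(mse)), "r2": r2_score(y_eval, y_pred)}


# Synthetic stand-in data (NOT the real dataset); the coefficients are made up.
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({
    "Distance to the nearest MRT station": rng.uniform(20, 6000, 200),
    "Number of convenience stores": rng.integers(0, 11, 200),
})
demo_y = (40
          - 0.005 * demo_X["Distance to the nearest MRT station"]
          + 1.2 * demo_X["Number of convenience stores"]
          + rng.normal(0, 2, 200))

# Same split settings as the notebook: 20% held out, fixed random seed
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=42)
demo_model = LinearRegression().fit(demo_X_train, demo_y_train)

demo_metrics = evaluate_regression(demo_model, demo_X_test, demo_y_test)
print(demo_metrics)
```

With the notebook's own objects, the equivalent call would be `evaluate_regression(model, X_test, y_test)`. Scoring on the test split rather than the training split gives a fairer estimate of how the model behaves on unseen properties.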

In [None]:
!pip install dash

Collecting dash
  Downloading dash-2.16.1-py3-none-any.whl (10.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 26.8 MB/s eta 0:00:00
Collecting dash-html-components==2.0.0 (from dash)
  Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)
Collecting dash-core-components==2.0.0 (from dash)
  Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Collecting dash-table==5.0.0 (from dash)
  Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)
Collecting retrying (from dash)
  Downloading retrying-1.3.4-py3-none-any.whl (11 kB)
Installing collected packages: dash-table, dash-html-components, dash-core-components, retrying, dash
Successfully installed dash-2.16.1 dash-core-components-2.0.0 dash-html-components-2.0.0 dash-table-5.0.0 retrying-1.3.4


In [29]:
import dash
from dash import html, dcc, Input, Output, State

In [30]:
# Initialize the dash app
app = dash.Dash(__name__)

In [31]:
app.layout = html.Div([
    html.Div([
        html.H1("Real Estate Price Prediction", style={'text-align': 'center'}),

        html.Div([
            dcc.Input(id='distance_to_mrt', type='number', placeholder='Distance to MRT Station (meters)',
                      style={'margin': '10px', 'padding': '10px'}),
            dcc.Input(id='num_convenience_stores', type='number', placeholder='Number of Convenience Stores',
                      style={'margin': '10px', 'padding': '10px'}),
            dcc.Input(id='latitude', type='number', placeholder='Latitude',
                      style={'margin': '10px', 'padding': '10px'}),
            dcc.Input(id='longitude', type='number', placeholder='Longitude',
                      style={'margin': '10px', 'padding': '10px'}),
            html.Button('Predict Price', id='predict_button', n_clicks=0,
                        style={'margin': '10px', 'padding': '10px', 'background-color': '#007BFF', 'color': 'white'}),
        ], style={'text-align': 'center'}),

        html.Div(id='prediction_output', style={'text-align': 'center', 'font-size': '20px', 'margin-top': '20px'})
    ], style={'width': '50%', 'margin': '0 auto', 'border': '2px solid #007BFF', 'padding': '20px', 'border-radius': '10px'})
])

# Define callback to update output
@app.callback(
    Output('prediction_output', 'children'),
    [Input('predict_button', 'n_clicks')],
    [State('distance_to_mrt', 'value'),
     State('num_convenience_stores', 'value'),
     State('latitude', 'value'),
     State('longitude', 'value')]
)
def update_output(n_clicks, distance_to_mrt, num_convenience_stores, latitude, longitude):
    if n_clicks > 0 and all(v is not None for v in [distance_to_mrt, num_convenience_stores, latitude, longitude]):
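        # Build a one-row frame whose column names match those used during training;
        # recent scikit-learn versions raise an error on mismatched feature names
        input_df = pd.DataFrame([[distance_to_mrt, num_convenience_stores, latitude, longitude]],
                                columns=['Distance to the nearest MRT station', 'Number of convenience stores',
                                         'Latitude', 'Longitude'])
        # Predict; ravel() gives a scalar whether predict returns a 1-D or 2-D array
        prediction = float(model.predict(input_df).ravel()[0])
        return f'Predicted House Price of Unit Area: {prediction:.2f}'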
    elif n_clicks > 0:
        return 'Please enter all values to get a prediction'
    return ''

# Run the app
if __name__ == '__main__':
    app.run(debug=True)  # app.run supersedes the older app.run_server
