# Scenario

Imagine you are a real estate analyst, and you want to predict the selling price of houses in a particular neighborhood. The price of a house (dependent variable) can depend on several factors (independent variables), such as:

- Size of the house (in square feet)
- Number of bedrooms
- Number of bathrooms
- Age of the house (in years)
- Distance to the nearest city center (in miles)

## Model

To build a multiple linear regression model, you would use the equation:

House Price = β0 + β1 × House Size + β2 × Number of Bedrooms + β3 × Number of Bathrooms + β4 × Age of the House + β5 × Distance to City Center + ϵ

Where:

- β0 is the intercept
- β1, β2, β3, β4, β5 are the coefficients for each independent variable
- ϵ is the error term
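To make the equation concrete, here is a minimal sketch that evaluates it for two houses at once using NumPy. The coefficient values in `beta` are hypothetical, picked only for illustration; they are not estimates from any data.

```python
import numpy as np

# Hypothetical coefficients [β0, β1, β2, β3, β4, β5], for illustration only.
beta = np.array([50_000.0, 150.0, 10_000.0, 8_000.0, -1_000.0, -2_000.0])

# Each row: size (sq ft), bedrooms, bathrooms, age (years), distance (miles).
houses = np.array([
    [2000, 3, 2, 10, 5],
    [1500, 2, 1, 5, 3],
], dtype=float)

# Prepend a column of ones so that β0 acts as the intercept.
design = np.column_stack([np.ones(len(houses)), houses])

# Matrix form of the equation (without the error term): price = design @ beta.
prices = design @ beta
print(prices)
```

The error term ϵ is left out because it represents the part of the price the features do not explain.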

## Steps to Perform Multiple Linear Regression (MLR)

1. Data Collection: Gather historical data on house prices and the corresponding features (size, bedrooms, bathrooms, age, distance).

2. Data Preprocessing: Clean the data by handling missing values and outliers, and make sure all data is in a suitable format for analysis.

3. Model Training: Split the data into training and testing sets. Use the training set to fit the multiple linear regression model and estimate the coefficients.

4. Model Evaluation: Use the testing set to evaluate the model's performance by checking metrics like R-squared, Mean Absolute Error (MAE), and Mean Squared Error (MSE).

5. Prediction: Use the trained model to predict house prices for new data.
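The preprocessing step is not shown in the cells that follow, so here is one way it might look in pandas. The `raw` DataFrame, the median fill, and the 1.5 × IQR rule are illustrative assumptions, not choices taken from this walkthrough; real data may call for different handling.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing size and one implausible price.
raw = pd.DataFrame({
    "House Size": [2000, 1500, np.nan, 1800, 2200, 2100],
    "House Price": [300000, 200000, 400000, 250000, 350000, 9000000],
})

# Fill missing numeric values with each column's median.
clean = raw.fillna(raw.median(numeric_only=True))

# Keep only rows whose price lies within 1.5 * IQR of the quartiles.
q1, q3 = clean["House Price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["House Price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].reset_index(drop=True)

print(clean)
```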

## Step 1: Importing All Relevant Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

## Step 2: Collecting Data

In [2]:
# Sample data
data = {
    'House Size': [2000, 1500, 2500, 1800, 2200],
    'Number of Bedrooms': [3, 2, 4, 3, 4],
    'Number of Bathrooms': [2, 1, 3, 2, 3],
    'Age of the House': [10, 5, 20, 15, 8],
    'Distance to City Center': [5, 3, 10, 7, 6],
    'House Price': [300000, 200000, 400000, 250000, 350000]
}


## Step 3: Defining Features and Target

In [3]:
df = pd.DataFrame(data)

# Features and target variable
X = df[['House Size', 'Number of Bedrooms', 'Number of Bathrooms', 'Age of the House', 'Distance to City Center']]
y = df['House Price']

## Step 4: Splitting Data into Training and Testing Sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
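It can help to check how many rows land in each subset. The sketch below applies the same `test_size` and `random_state` to five placeholder row indices; the `rows` list is only a stand-in for the sample data.

```python
from sklearn.model_selection import train_test_split

# Five placeholder row indices, standing in for the five sample houses.
rows = list(range(5))

train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=42)

# With five rows and test_size=0.2, only one row is held out for testing.
print("Training rows:", len(train_rows))
print("Testing rows:", len(test_rows))
```

A single test row is enough to compute an error, but not enough for metrics that need variation in the target, such as R-squared.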

## Step 5: Model Selection and Fitting

In [5]:
model = LinearRegression()
model.fit(X_train, y_train)
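Fitting a linear regression means finding the coefficients that minimize the sum of squared residuals. As a rough sketch of that idea (not a description of scikit-learn's internals), the same kind of estimate can be obtained with `np.linalg.lstsq`. The data here are hypothetical and noise-free, so the known coefficients should be recovered.

```python
import numpy as np

# Hypothetical noise-free data generated from known coefficients.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1]

# Add a column of ones so the first coefficient is the intercept.
design = np.column_stack([np.ones(len(X_demo)), X_demo])

# Least-squares solution of design @ beta ≈ y_demo.
beta, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

print("Intercept:", beta[0])
print("Coefficients:", beta[1:])
```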

## Step 6: Making Predictions

In [6]:
y_pred = model.predict(X_test)

## Step 7: Evaluating the Model

In [7]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

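Mean Squared Error: 26009914.067432713
R-squared: nan

R-squared is nan here because, with five rows and test_size=0.2, the test set contains a single house, and R-squared is undefined for fewer than two samples. The training set is also too small (four rows for six parameters) to give meaningful coefficients, which is why a larger synthetic dataset is used further below.
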

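To see where these metrics come from, the sketch below computes MAE, MSE, and R-squared by hand with NumPy. The helper `regression_metrics` and the numbers passed to it are hypothetical, and returning nan whenever the total sum of squares is zero is a simplification; scikit-learn's handling of edge cases may differ.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R-squared) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)

    # R-squared = 1 - SS_res / SS_tot; SS_tot is zero for a single sample.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, mse, r2

# One test sample: MAE and MSE are defined, R-squared is not.
single = regression_metrics([200000], [205100])
print(single)

# Three test samples (hypothetical values): all three metrics are defined.
multi = regression_metrics([200000, 300000, 250000], [210000, 290000, 255000])
print(multi)
```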
In [8]:
# Print the coefficients
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

Intercept: -131228.71399360884
Coefficients: [  219.06383696   877.65285258   877.65285258 -1672.34293504
  1087.24103332]
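These coefficients should not be interpreted. In the five-row sample data the number of bathrooms always equals the number of bedrooms minus one, and there are fewer rows than parameters, so the least-squares solution is not unique. A quick rank check, sketched here with NumPy on the sample feature values, illustrates the problem.

```python
import numpy as np

# Feature values from the five-row sample data:
# size, bedrooms, bathrooms, age, distance.
X_small = np.array([
    [2000, 3, 2, 10, 5],
    [1500, 2, 1, 5, 3],
    [2500, 4, 3, 20, 10],
    [1800, 3, 2, 15, 7],
    [2200, 4, 3, 8, 6],
], dtype=float)

# Design matrix with an intercept column: 5 rows, 6 parameters.
design = np.column_stack([np.ones(len(X_small)), X_small])
rank = np.linalg.matrix_rank(design)

print("Rows, parameters:", design.shape)
print("Rank of design matrix:", rank)
print("Bathrooms == bedrooms - 1 in every row:",
      bool(np.all(X_small[:, 2] == X_small[:, 1] - 1)))
```

A rank smaller than the number of parameters means infinitely many coefficient vectors fit the data equally well.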


## Repeating the Workflow on a Larger Synthetic Dataset

In [9]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate a larger synthetic dataset
np.random.seed(42)
n_samples = 100

house_size = np.random.randint(1000, 4000, n_samples)
bedrooms = np.random.randint(1, 6, n_samples)
bathrooms = np.random.randint(1, 4, n_samples)
house_age = np.random.randint(0, 50, n_samples)
distance_to_city = np.random.randint(1, 20, n_samples)

# Generate house prices with some noise
house_price = (
    house_size * 200 + 
    bedrooms * 5000 + 
    bathrooms * 7000 - 
    house_age * 1000 - 
    distance_to_city * 2000 + 
    np.random.normal(0, 10000, n_samples)
)

data = {
    'House Size': house_size,
    'Number of Bedrooms': bedrooms,
    'Number of Bathrooms': bathrooms,
    'Age of the House': house_age,
    'Distance to City Center': distance_to_city,
    'House Price': house_price
}

df = pd.DataFrame(data)

# Features and target variable
X = df[['House Size', 'Number of Bedrooms', 'Number of Bathrooms', 'Age of the House', 'Distance to City Center']]
y = df['House Price']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

# Print the coefficients
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)


Mean Squared Error: 91119496.56747133
R-squared: 0.9965744372726427
Intercept: -10593.98945441487
Coefficients: [  202.43186409  4719.83528624  9070.76270508 -1011.35948176
 -1734.67714872]
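Because the synthetic prices were generated from known coefficients, the fit can be sanity-checked against them. The sketch below copies the printed values from the output above and compares them with the generating formula; the variable names are illustrative.

```python
import math

names = [
    "House Size",
    "Number of Bedrooms",
    "Number of Bathrooms",
    "Age of the House",
    "Distance to City Center",
]

# Coefficients used to generate house_price in the cell above.
true_coefs = [200, 5000, 7000, -1000, -2000]

# Fitted coefficients copied from the printed output.
fitted_coefs = [202.43186409, 4719.83528624, 9070.76270508,
                -1011.35948176, -1734.67714872]

# Relative error of each fitted coefficient, in percent.
rel_errors = {}
for name, true, fitted in zip(names, true_coefs, fitted_coefs):
    rel_errors[name] = abs(fitted - true) / abs(true) * 100
    print(f"{name}: true={true}, fitted={fitted:.2f}, "
          f"error={rel_errors[name]:.1f}%")

# Root mean squared error, in the same units as the price.
mse = 91119496.56747133
rmse = math.sqrt(mse)
print(f"RMSE: {rmse:.2f} (noise standard deviation in the generator: 10000)")
```

The RMSE lands close to the standard deviation of the injected noise, which suggests the remaining error is mostly the noise itself rather than a modeling problem.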


## Predicting the Price of a Single House

In [10]:
# Example single house features
single_house = {
    'House Size': 2500,
    'Number of Bedrooms': 3,
    'Number of Bathrooms': 2,
    'Age of the House': 10,
    'Distance to City Center': 5
}

# Convert the dictionary to a DataFrame
single_house_df = pd.DataFrame([single_house])

# Make a prediction for the single house
predicted_price = model.predict(single_house_df)

print("Predicted House Price for the single house:", predicted_price[0])


Predicted House Price for the single house: 508999.72149014875
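The prediction is simply the regression equation evaluated with the fitted values. As a check, the sketch below recomputes it by hand from the intercept and coefficients printed earlier; the variable names are illustrative.

```python
# Intercept and coefficients copied from the printed output of the fitted model.
intercept = -10593.98945441487
coefs = [202.43186409, 4719.83528624, 9070.76270508,
         -1011.35948176, -1734.67714872]

# Same feature order as the training columns:
# size, bedrooms, bathrooms, age, distance.
features = [2500, 3, 2, 10, 5]

# β0 + β1*x1 + ... + β5*x5
manual_price = intercept + sum(c * x for c, x in zip(coefs, features))
print("Manually computed price:", manual_price)
```

The result agrees with `model.predict` up to the rounding of the printed coefficients.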
