# **House Price Prediction**
---
Basic Regression Model
---
This notebook builds a basic house price prediction model using Support Vector Regression (SVR) from scikit-learn. It:
1. Imports necessary libraries for data handling, modeling, and evaluation.
2. Loads the California housing dataset (the Boston housing dataset used originally was removed in scikit-learn 1.2, so `load_boston` fails there).
3. Shuffles the dataset for randomness.
4. Splits the data into training (80%) and testing (20%) sets.
5. Creates and trains an SVR model with a linear kernel.
6. Evaluates the model’s performance using mean squared error and explained variance score.
7. Predicts the price for a sample test datapoint and prints the result.

# Import Libraries

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, explained_variance_score
from sklearn.utils import shuffle

# Loading Data

In [2]:
# data = datasets.load_boston()
# `load_boston` was removed in scikit-learn 1.2; calling it raises the ImportError shown below

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[2], line 1
----> 1 data = datasets.load_boston()

File c:\Users\aashi\GitHub\9781786464392\venv\Lib\site-packages\sklearn\datasets\__init__.py:161, in __getattr__(name)
    110 if name == "load_boston":
    111     msg = textwrap.dedent(
    112         """
    113         `load_boston` has been removed from scikit-learn since version 1.2.
   (...)    159         """
    160     )
--> 161     raise ImportError(msg)
    162 try:
    163     return globals()[name]

ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as
investigated in [1], the authors of this dataset engineered a
non-invertible variable "B" assuming that racial self-segregation had a
positive impact on house prices [2]. Furthermore the goal of the
research that led to the creation of this dataset was to study the
impact of air quality but it did not give adequate demonstration of the
validity of this assumption.

The scikit-learn maintainers therefore strongly discourage the use of
this dataset unless the purpose of the code is to study and educate
about ethical issues in data science and machine learning.

In this special case, you can fetch the dataset from the original
source::

    import pandas as pd
    import numpy as np

    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]

Alternative datasets include the California housing dataset and the
Ames housing dataset. You can load the datasets as follows::

    from sklearn.datasets import fetch_california_housing
    housing = fetch_california_housing()

for the California housing dataset and::

    from sklearn.datasets import fetch_openml
    housing = fetch_openml(name="house_prices", as_frame=True)

for the Ames housing dataset.

[1] M Carlisle.
"Racist data destruction?"
<https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8>

[2] Harrison Jr, David, and Daniel L. Rubinfeld.
"Hedonic housing prices and the demand for clean air."
Journal of environmental economics and management 5.1 (1978): 81-102.
<https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air>

In [3]:
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing()

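This loads the California housing dataset, a regression dataset with 20,640 samples and 8 numeric features whose target is the median house value in units of $100,000. It is a suitable replacement for house price prediction tasks.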

# Shuffle the data

In [4]:
X, y = shuffle(data.data, data.target, random_state=7)
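`shuffle` applies the same random permutation to every array it receives, so each feature row stays paired with its target. A minimal sketch on toy arrays (the values are made up for illustration):

```python
import numpy as np
from sklearn.utils import shuffle

# Toy data: the target equals the first feature, so pairing is easy to verify
X_toy = np.array([[0, 10], [1, 11], [2, 12], [3, 13], [4, 14]])
y_toy = X_toy[:, 0].copy()

X_shuf, y_shuf = shuffle(X_toy, y_toy, random_state=7)

# Rows are reordered, but each row still lines up with its own target
print(X_shuf[:, 0])
print(y_shuf)
rows_still_paired = bool(np.array_equal(X_shuf[:, 0], y_shuf))
print("Rows still paired:", rows_still_paired)
```

Fixing `random_state` makes the permutation reproducible between runs.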

# Split the data into training and testing datasets 

In [5]:
num_training = int(0.8 * len(X))
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]
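The manual slice above is only a fair split because the data was shuffled first. An alternative, sketched here on synthetic data rather than the notebook's own arrays, is scikit-learn's `train_test_split`, which shuffles and splits in one call. The `*_demo` names are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (X, y): 10 samples with 8 features each
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(10, 8))
y_demo = rng.normal(size=10)

# test_size=0.2 reserves 20% of the rows for testing; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7
)
print(X_tr.shape, X_te.shape)
```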

# Create Support Vector Regression model

In [6]:
sv_regressor = SVR(kernel='linear', C=1.0, epsilon=0.1)

# Train Support Vector Regressor

In [None]:
sv_regressor.fit(X_train, y_train)
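SVR is sensitive to feature scale, and the California housing features span very different ranges (population counts versus latitude, for example). Fitting a linear-kernel SVR on roughly 16,500 unscaled rows can also take a long time. One common remedy, sketched here on synthetic data, is to standardize the features inside a pipeline. The synthetic dataset and the name `scaled_svr` are illustrative assumptions, not part of the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data standing in for the housing features
X_syn, y_syn = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=7)

# Scale each feature to zero mean and unit variance, then fit the same SVR settings
scaled_svr = make_pipeline(StandardScaler(), SVR(kernel='linear', C=1.0, epsilon=0.1))
scaled_svr.fit(X_syn, y_syn)

preds = scaled_svr.predict(X_syn[:5])
print(preds.shape)
```

Because the scaler lives inside the pipeline, it is fitted on the training rows only and then reused unchanged at prediction time.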

# Evaluate performance of Support Vector Regressor

In [None]:
y_test_pred = sv_regressor.predict(X_test)
mse = mean_squared_error(y_test, y_test_pred)
evs = explained_variance_score(y_test, y_test_pred) 
print("\n#### Performance ####")
print("Mean squared error =", round(mse, 2))
print("Explained variance score =", round(evs, 2))
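To make the two metrics concrete: mean squared error is the average of the squared residuals, and the explained variance score is 1 - Var(y - y_pred) / Var(y). A small worked sketch in plain Python, using made-up values:

```python
from statistics import fmean, pvariance

# Made-up true and predicted values, for illustration only
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

residuals = [t - p for t, p in zip(y_true, y_pred)]

# Mean squared error: average squared residual (lower is better)
mse_demo = fmean(r ** 2 for r in residuals)

# Explained variance: 1 - Var(residuals) / Var(y_true), using population variance
evs_demo = 1 - pvariance(residuals) / pvariance(y_true)

print("Mean squared error =", round(mse_demo, 3))
print("Explained variance score =", round(evs_demo, 3))
```

An explained variance score of 1.0 means the predictions capture all of the variation in the target. Unlike R², it is unaffected by a constant offset in the predictions, so it is worth reading alongside the mean squared error.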

# Test the regressor on test datapoint

In [None]:
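# An illustrative sample with the 8 California housing features, in order:
# MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude
test_data = [8.3252, 41.0, 6.9841, 1.0238, 322.0, 2.5556, 37.88, -122.23]
print("\nPredicted price (in units of $100,000):", sv_regressor.predict([test_data])[0])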
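`predict` expects a 2D array with one row per sample and the same number of columns the model was trained on. A fitted scikit-learn estimator records that count in `n_features_in_`, which allows a quick check before predicting. The sketch below trains on synthetic 8-feature data; `model` and `sample` are hypothetical names.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic training data with 8 features, matching the California housing layout
rng = np.random.default_rng(7)
X_small = rng.normal(size=(50, 8))
y_small = X_small @ np.arange(1, 9) + rng.normal(scale=0.1, size=50)

model = SVR(kernel='linear', C=1.0, epsilon=0.1).fit(X_small, y_small)

sample = [0.1] * 8  # one datapoint; must have model.n_features_in_ values
if len(sample) != model.n_features_in_:
    raise ValueError(f"expected {model.n_features_in_} features, got {len(sample)}")

# Wrap the sample in a list so predict receives shape (1, 8)
prediction = model.predict([sample])
print(prediction.shape)
```

A sample with a different number of values, such as the 13 features of the Boston dataset, would fail this check instead of reaching `predict`.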