HOUSE PRICE PREDICTION:


1. Load the Boston housing dataset.
2. Preprocess the dataset (e.g., handle missing values, normalize features).
3. Split the dataset into training and testing sets.
4. Train a linear regression model using the training set.
5. Develop a GUI for user input of housing features.
6. Predict house prices based on user input using the trained model.
7. Visualize the prediction results using a bar chart.

INDEPENDENT VARIABLES (X):

1. CRIM: Per capita crime rate by town.
2. ZN: Proportion of residential land zoned for lots over 25,000 sq. ft.
3. INDUS: Proportion of non-retail business acres per town.
4. CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise).
5. NOX: Nitric oxide concentration (parts per 10 million).
6. RM: Average number of rooms per dwelling.
7. AGE: Proportion of owner-occupied units built prior to 1940.
8. DIS: Weighted distances to five Boston employment centers.
9. RAD: Index of accessibility to radial highways.
10. TAX: Full-value property tax rate per $10,000.
11. PTRATIO: Pupil-teacher ratio by town.
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of Black residents by town.
13. LSTAT: Percentage of the population classified as lower status.
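The B feature is the only one defined by a formula, so a short worked calculation may help. The sketch below simply evaluates 1000(Bk - 0.63)^2 for a few proportions; the function name `b_feature` is hypothetical and not part of the notebook. Note that Bk = 0 gives 396.9, which matches the maximum of the `b` column reported by `df.describe()`.

```python
def b_feature(bk: float) -> float:
    """Evaluate B = 1000 * (Bk - 0.63)^2 for a proportion Bk in [0, 1]."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("bk must be a proportion between 0 and 1")
    return 1000.0 * (bk - 0.63) ** 2


# Bk = 0 yields the largest value a proportion can produce
print(round(b_feature(0.0), 2))   # 396.9
# Bk = 0.63 is the minimum of the parabola
print(round(b_feature(0.63), 2))  # 0.0
```

Because the formula is a parabola, it is not invertible: proportions equally far from 0.63 on either side (for example 0.53 and 0.73) map to the same B value.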

DEPENDENT VARIABLE (Y):

1. MEDV: Median value of owner-occupied homes in $1000s (target variable).

In [1]:
import pandas as pd
import numpy as np

In [2]:
df=pd.read_csv('BostonHousing.csv')

In [3]:
df.head()

,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2


In [4]:
df.shape

(506, 14)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   crim     506 non-null    float64
 1   zn       506 non-null    float64
 2   indus    506 non-null    float64
 3   chas     506 non-null    int64  
 4   nox      506 non-null    float64
 5   rm       506 non-null    float64
 6   age      506 non-null    float64
 7   dis      506 non-null    float64
 8   rad      506 non-null    int64  
 9   tax      506 non-null    int64  
 10  ptratio  506 non-null    float64
 11  b        506 non-null    float64
 12  lstat    506 non-null    float64
 13  medv     506 non-null    float64
dtypes: float64(11), int64(3)
memory usage: 55.5 KB
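The `df.info()` output above reports 506 non-null values in every column, so this particular file needs no imputation. For the "handle missing values" part of step 2, a minimal sketch of how one might check and fill gaps is shown below. The helper names `missing_report` and `fill_numeric_with_median` and the toy frame are hypothetical, and median imputation is only one possible choice.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, for columns that have any."""
    counts = frame.isna().sum()
    return counts[counts > 0]


def fill_numeric_with_median(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which numeric NaNs are replaced by the column median."""
    filled = frame.copy()
    numeric_cols = filled.select_dtypes(include="number").columns
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())
    return filled


# Toy stand-in data (not taken from BostonHousing.csv)
toy = pd.DataFrame({"rm": [6.5, np.nan, 7.1], "chas": [0, 1, 0]})
print(missing_report(toy))           # only 'rm' is listed, with 1 missing value
clean = fill_numeric_with_median(toy)
print(clean["rm"].isna().sum())      # 0
```

On the notebook's data, `missing_report(df)` would be expected to return an empty Series, given the non-null counts shown above.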


In [6]:
for column in df.columns:
    print(df[column].value_counts())
    print("*"*20)

crim
0.01501     2
14.33370    2
0.03466     1
0.03113     1
0.03049     1
           ..
1.51902     1
1.83377     1
1.46336     1
1.27346     1
0.04741     1
Name: count, Length: 504, dtype: int64
********************
zn
0.0      372
20.0      21
80.0      15
22.0      10
12.5      10
25.0      10
40.0       7
45.0       6
30.0       6
90.0       5
95.0       4
60.0       4
21.0       4
33.0       4
55.0       3
70.0       3
34.0       3
52.5       3
35.0       3
28.0       3
75.0       3
82.5       2
85.0       2
17.5       1
100.0      1
18.0       1
Name: count, dtype: int64
********************
indus
18.10    132
19.58     30
8.14      22
6.20      18
21.89     15
        ... 
1.21       1
1.38       1
3.75       1
0.46       1
2.31       1
Name: count, Length: 76, dtype: int64
********************
chas
0    471
1     35
Name: count, dtype: int64
********************
nox
0.538    23
0.713    18
0.437    17
0.871    16
0.624    15
         ..
0.394     1
0.518     1
0.385     1
0.3

In [7]:
df.describe()

,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
count,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0
mean,3.613524,11.363636,11.136779,0.06917,0.554695,6.284634,68.574901,3.795043,9.549407,408.237154,18.455534,356.674032,12.653063,22.532806
std,8.601545,23.322453,6.860353,0.253994,0.115878,0.702617,28.148861,2.10571,8.707259,168.537116,2.164946,91.294864,7.141062,9.197104
min,0.00632,0.0,0.46,0.0,0.385,3.561,2.9,1.1296,1.0,187.0,12.6,0.32,1.73,5.0
25%,0.082045,0.0,5.19,0.0,0.449,5.8855,45.025,2.100175,4.0,279.0,17.4,375.3775,6.95,17.025
50%,0.25651,0.0,9.69,0.0,0.538,6.2085,77.5,3.20745,5.0,330.0,19.05,391.44,11.36,21.2
75%,3.677083,12.5,18.1,0.0,0.624,6.6235,94.075,5.188425,24.0,666.0,20.2,396.225,16.955,25.0
max,88.9762,100.0,27.74,1.0,0.871,8.78,100.0,12.1265,24.0,711.0,22.0,396.9,37.97,50.0
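The summary statistics show features on very different scales (for example `nox` stays below 1 while `tax` reaches 711), which is what the "normalize features" part of step 2 refers to. The sketch below uses synthetic stand-in data, not the housing file, and shows one way to standardize inside a scikit-learn pipeline. For ordinary least squares with an intercept, standardizing does not change the predictions; it mainly makes the coefficients comparable across features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: two features on very different scales
X_demo = np.column_stack([
    rng.uniform(0.3, 0.9, size=100),   # small-scale feature
    rng.uniform(180, 720, size=100),   # large-scale feature
])
y_demo = 30 - 20 * X_demo[:, 0] - 0.01 * X_demo[:, 1] + rng.normal(0, 1, size=100)

# Fit once on raw features and once with standardization in front
plain = LinearRegression().fit(X_demo, y_demo)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)

# Predictions agree; only the coefficient scale differs
same = np.allclose(plain.predict(X_demo), scaled.predict(X_demo))
print(same)
print(plain.coef_)
print(scaled.named_steps["linearregression"].coef_)
```

Keeping the scaler inside the pipeline means the same transformation is applied automatically to any later input, such as values typed into a GUI.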


In [9]:
import tkinter as tk
from tkinter import messagebox
from tkinter import ttk
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg

In [10]:
df = pd.read_csv('D:\\Quant\\House\\BostonHousing.csv')
X = df[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'b', 'lstat']]
y = df['medv']

In [11]:
model = LinearRegression()
model.fit(X, y)
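The cell above fits the model on all 506 rows, so there is no held-out data to judge it on, although step 3 of the outline calls for a train/test split. A minimal sketch of that step follows. The function name `fit_and_evaluate`, the 80/20 split, the seed, and the synthetic demo data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_evaluate(features, target, test_size=0.2, seed=42):
    """Fit a linear regression on a training split and score it on the held-out rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=test_size, random_state=seed
    )
    fitted = LinearRegression().fit(X_train, y_train)
    y_pred = fitted.predict(X_test)
    metrics = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, y_pred))),
        "r2": float(r2_score(y_test, y_pred)),
        "n_test": len(y_test),
    }
    return fitted, metrics


# Synthetic stand-in data (not the housing file): a known linear signal plus noise
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(200, 3))
demo_y = demo_X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

demo_model, demo_metrics = fit_and_evaluate(demo_X, demo_y)
print(demo_metrics)
```

With the notebook's data the call would be `fit_and_evaluate(X, y)`; the returned metrics are then in the units of MEDV ($1000s) for RMSE.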

In [12]:
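def predict():
    # Get user inputs; reject blank or non-numeric entries
    try:
        inputs = [float(entry.get()) for entry in entries]
    except ValueError:
        messagebox.showerror("Invalid input", "Please enter a number for every feature.")
        return

    # Predict using the model; passing a DataFrame with the training column
    # names avoids scikit-learn's feature-name warning
    global prediction
    row = pd.DataFrame([inputs], columns=X.columns)
    prediction = model.predict(row)[0]

    # Display prediction (MEDV is in $1000s)
    messagebox.showinfo("Prediction", f"The predicted MEDV value is: {prediction:.2f}")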
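Converting the entry text with `float()` raises `ValueError` on blank or non-numeric input. One way to give the user a message that names the offending field is to move the parsing into a small helper that can be tested without a GUI. The function `parse_feature_values` below is a hypothetical sketch, not part of the notebook.

```python
def parse_feature_values(raw_values, feature_names):
    """Convert raw entry strings to floats, naming the field in any error."""
    if len(raw_values) != len(feature_names):
        raise ValueError("expected one value per feature")
    parsed = []
    for name, raw in zip(feature_names, raw_values):
        text = raw.strip()
        if not text:
            raise ValueError(f"{name}: a value is required")
        try:
            parsed.append(float(text))
        except ValueError:
            # Re-raise with a message that identifies the field
            raise ValueError(f"{name}: {raw!r} is not a number") from None
    return parsed


print(parse_feature_values([" 6.5 ", "0"], ["rm", "chas"]))  # [6.5, 0.0]

try:
    parse_feature_values(["6.5", "abc"], ["rm", "chas"])
except ValueError as err:
    print(err)  # chas: 'abc' is not a number
```

In the GUI, the callback could pass `[entry.get() for entry in entries]` and `X.columns` to this helper and show the error text in a message box.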

In [13]:
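def visualize():
    # Ensure a prediction has been made
    # (this is the value from the most recent Predict click)
    try:
        prediction
    except NameError:
        messagebox.showwarning("Warning", "Please make a prediction first.")
        return

    # Get user inputs; reject blank or non-numeric entries
    try:
        inputs = [float(entry.get()) for entry in entries]
    except ValueError:
        messagebox.showerror("Invalid input", "Please enter a number for every feature.")
        return

    # Plotting
    plot_prediction(inputs, prediction)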

In [14]:
def plot_prediction(inputs, prediction):
    # Prepare plot data
    features = X.columns
    x_values = range(1, len(features) + 1)
    input_values = inputs
    
    # Create figure and plot
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.bar(x_values, input_values, label='Input Values', color='b', alpha=0.5)
    ax.scatter(len(features) + 1, prediction, color='r', marker='o', label='Predicted MEDV')
    
    ax.set_xlabel('Features')
    ax.set_ylabel('Values')
    ax.set_title('Input Values and Predicted MEDV')
    ax.set_xticks(list(x_values) + [len(features) + 1])
    ax.set_xticklabels(features.tolist() + ['Predicted MEDV'], rotation=45)
    
    ax.legend()
    
    # Create new window for plot
    new_window = tk.Toplevel(root)
    new_window.title("Prediction Graph")
    
    canvas = FigureCanvasTkAgg(fig, master=new_window)
    canvas.get_tk_widget().pack(fill=tk.BOTH, expand=True)
    canvas.draw()

In [15]:
root = tk.Tk()
root.title("Machine Learning Prediction Model")

''

In [16]:
entries = []
for i, col in enumerate(X.columns):
    label = ttk.Label(root, text=col)
    label.grid(row=i, column=0, padx=10, pady=5)
    
    entry = ttk.Entry(root)
    entry.grid(row=i, column=1, padx=10, pady=5)
    
    entries.append(entry)


In [17]:
# Create Predict button
predict_button = ttk.Button(root, text="Predict", command=predict)
predict_button.grid(row=len(X.columns), columnspan=2, padx=10, pady=10)

# Create Visualize button
visualize_button = ttk.Button(root, text="Visualize", command=visualize)
visualize_button.grid(row=len(X.columns)+1, columnspan=2, padx=10, pady=10)

root.mainloop()