### Project Name: *Housing Data Analysis: Prediction, Segmentation, and Trends*
### Authors
- [Ahmed Abdullah](https://github.com/ahmedembeddedx)
- [Zaeem ul Islam](https://github.com/mightyflavor)

# 🏡 **Unlocking the Secrets of Real Estate: A Journey into Homeownership and Beyond!**

![Home Image](https://i.ibb.co/QCjZjh6/Apartment-Square-Karachi-Gulshan-e-Iqbal.jpg)


# Machine Learning Modeling and Evaluation

## 1 *Importing Necessary Libraries*

In [29]:
import tkinter as tk
from tkinter import ttk

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

## 2 *Reading the Preprocessed Data*

In [30]:
# Load the cleaned, model-ready dataset
data = pd.read_csv("cleaned_data.csv")

## 3 *Applying Gradient Boosting Regressor*

In [31]:
# Separate the features (X) from the target variable (price)
X = data.drop('price', axis=1)
y = data['price']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Gradient Boosting Regressor model
gb_regressor = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Fit the model to the training data
gb_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gb_regressor.predict(X_test)

## 4 *Results* 

In [32]:

print(f'R^2 Score: {r2_score(y_test, y_pred)}')
print(f'Correlation Coefficient (r): {np.corrcoef(y_test, y_pred)[0, 1]}')
print(f'Root Mean Squared Error (RMSE): {np.sqrt(mean_squared_error(y_test, y_pred))}')
print(f'Mean Squared Error (MSE): {mean_squared_error(y_test, y_pred)}')
print(f'Mean Absolute Error (MAE): {mean_absolute_error(y_test, y_pred)}')

R^2 Score: 0.8553317718765172
Correlation Coefficient (r): 0.9249815520494189
Root Mean Squared Error (RMSE): 0.06733343609963005
Mean Squared Error (MSE): 0.0045337916169829635
Mean Absolute Error (MAE): 0.04924091544847553


## 5 *Model Accuracy Evaluation*

### R² Score:
The R² score measures how much of the variance in the actual values is accounted for by the model's predictions. Here it is approximately 0.86 on the held-out test set, so the model explains about 86% of the variance in `price`. A score of 1 would mean perfect predictions, and a score of 0 would mean the model does no better than always predicting the mean.
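To make the definition concrete, the sketch below computes R² by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The four values in `y_true` and `y_hat` are made-up illustration data, not taken from `cleaned_data.csv`, and `r2_by_hand` is a hypothetical helper name; `r2_score` from scikit-learn applies the same formula.

```python
def r2_by_hand(y_true, y_hat):
    """R^2 = 1 - SS_res / SS_tot, for plain Python sequences."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up illustration data (assumption, not from the housing dataset).
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 8.0]

r2_toy = r2_by_hand(y_true, y_hat)
print(f"R^2 on the toy data: {r2_toy:.3f}")
```

Here the residual sum of squares is 1.5 and the total sum of squares is 20, so the toy R² is 1 - 1.5/20 = 0.925.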

### Correlation Coefficient (r):
The correlation coefficient measures the strength and direction of the linear relationship between the predicted and actual values. At approximately 0.92, it indicates a strong positive linear relationship: higher predicted values go with higher actual values. Because r is unaffected by a constant offset or rescaling of the predictions, it should be read alongside the error metrics below and not on its own.
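As a quick consistency check on the reported numbers, the square of the correlation coefficient can be compared with the R² score. For predictions compared against actual values, r² is an upper bound on R²: r² ignores any constant offset or rescaling of the predictions, while R² penalizes them. The sketch below uses the two values printed in the Results cell; the variable names are illustrative.

```python
# Values copied from the Results cell above.
r2_reported = 0.8553317718765172   # R^2 score
r_reported = 0.9249815520494189    # Pearson correlation coefficient

r_squared = r_reported ** 2
gap = r_squared - r2_reported

print(f"r squared: {r_squared:.4f}")
print(f"r squared minus R^2: {gap:.4f}")
```

The gap is positive and well under 0.001, which suggests the predictions carry little systematic offset or scale error on the test set.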

### Root Mean Squared Error (RMSE):
The RMSE is the square root of the mean squared error, so it is expressed in the same units as the target and weights large errors more heavily than small ones. Here it is approximately 0.067, meaning a typical prediction error is about 0.067 in the units of the `price` column as stored in `cleaned_data.csv`.

### Mean Squared Error (MSE):
The MSE is the average of the squared differences between predicted and actual values; the RMSE is its square root. Here the MSE is approximately 0.0045. Because it is expressed in squared units it is harder to interpret directly, but squared error is the loss that `GradientBoostingRegressor` minimizes by default.

### Mean Absolute Error (MAE):
The MAE is the average absolute difference between predicted and actual values. At approximately 0.049, it means the model's predictions are off by about 0.049 units on average. Unlike the RMSE, it weights all errors equally, so it is less sensitive to a few large misses.
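The three error metrics differ only in how they aggregate the per-sample errors, which the plain-Python sketch below makes explicit. The data is made up for illustration, and the helper name `error_metrics` is hypothetical; the notebook itself uses `mean_squared_error` and `mean_absolute_error` from scikit-learn.

```python
import math

def error_metrics(y_true, y_hat):
    """Return (MSE, RMSE, MAE) for two equal-length sequences."""
    errors = [t - p for t, p in zip(y_true, y_hat)]
    mse = sum(e ** 2 for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, math.sqrt(mse), mae

# Made-up illustration data (assumption, not from the housing dataset).
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 8.0]

mse, rmse, mae = error_metrics(y_true, y_hat)
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  MAE={mae:.4f}")
```

RMSE is never smaller than MAE, and the two are equal only when every absolute error has the same size. In the reported results the RMSE (about 0.067) exceeds the MAE (about 0.049), which is consistent with a minority of predictions having noticeably larger errors than the rest.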

## 6 *Conclusion*
Overall, the model fits the test data well: an R² of about 0.86 and a correlation coefficient of about 0.92 show that the predictions track the actual prices closely. The error values (RMSE about 0.067, MSE about 0.0045, MAE about 0.049) are small in absolute terms, but they are expressed in the units of the preprocessed `price` column, so their practical size depends on how the price was scaled during preprocessing.
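One way to put the error values in context is to compare them with the spread of the target itself. Because R² = 1 - MSE / Var(y) on the test set, the reported R² and MSE together imply the variance of `price` among the test samples. The sketch below does that arithmetic using the values printed in the Results cell; the variable names are illustrative.

```python
import math

# Values copied from the Results cell above.
r2_reported = 0.8553317718765172
mse_reported = 0.0045337916169829635

# R^2 = 1 - MSE / Var(y)  =>  Var(y) = MSE / (1 - R^2)
implied_variance = mse_reported / (1 - r2_reported)
implied_std = math.sqrt(implied_variance)
rmse_to_std = math.sqrt(mse_reported) / implied_std

print(f"Implied std of test-set price: {implied_std:.3f}")
print(f"RMSE as a fraction of that std: {rmse_to_std:.2f}")
```

The RMSE comes out at roughly 38% of the target's standard deviation on the test set, compared with 100% for a model that always predicts the test-set mean.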


## 7 *Ready-to-Use Desktop Application*

In [11]:
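def validate_dropdown(value, options):
    """Return True if the selected value is one of the dropdown's options."""
    return value in options

def calculate_features():
    """Validate every dropdown, then display the result for the selection."""
    # An unselected dropdown returns "", which is not among its options.
    if not all(validate_dropdown(d.get(), d["values"]) for d in dropdowns):
        result_var.set("Error: Please check input bounds.")
        return

    # Non-numeric choices such as "Nil" are skipped.
    values = [int(d.get()) for d in dropdowns if d.get().isdigit()]
    # Placeholder output: this sums the inputs and does not call gb_regressor yet.
    result_var.set(f"Sum of selected values: {sum(values)}")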
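As written, `calculate_features` reports the sum of the selected values instead of a price. The sketch below shows one way the selections could be turned into a model input. It rests on several assumptions that the notebook does not confirm: that the model's features are exactly the 13 dropdown fields, in the same order as the training columns; that `"Nil"` for the renovation year can be encoded as `0`; and that the raw values are on the same scale as `cleaned_data.csv` (if the data was scaled or transformed during preprocessing, the same transformation would have to be applied first). The names `selections_to_row`, `predict_from_selections`, and `StubModel` are hypothetical; `StubModel` only stands in for the fitted `gb_regressor` so the example runs on its own.

```python
def selections_to_row(selections):
    """Convert dropdown strings to numbers; "Nil" becomes 0 (assumption)."""
    return [0 if s == "Nil" else int(s) for s in selections]

def predict_from_selections(model, selections):
    """Build a single-row input and return the model's prediction for it."""
    row = selections_to_row(selections)
    # scikit-learn estimators expect a 2D input: one inner list per sample.
    return model.predict([row])[0]

class StubModel:
    """Stand-in for the fitted regressor (hypothetical, for illustration)."""
    def predict(self, rows):
        # Returns the feature count per row, only to show the data flow.
        return [len(row) for row in rows]

# Example selections in the order of the 13 labels (illustrative values).
selections = ["3", "2", "1800", "5000", "2", "3", "7",
              "1800", "1000", "1995", "Nil", "1800", "5000"]

print(selections_to_row(selections))
print(predict_from_selections(StubModel(), selections))
```

Inside the GUI, the list would come from `[d.get() for d in dropdowns]` and the model would be `gb_regressor`. Since the regressor was fitted on a DataFrame, recent scikit-learn versions warn about missing feature names when given a plain list; passing a one-row DataFrame with the training column names would avoid that.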

In [12]:

# Tkinter setup
root = tk.Tk()
root.title("House Price Predicting Model")
root.geometry("600x700")
root.configure(bg="white")

# Title
title_label1 = ttk.Label(root, text="House Price Prediction Model", font=("Calibri", 30, "bold"))
title_label1.grid(row=0, column=0, columnspan=1, pady=8, padx=10)

# Styling
style = ttk.Style()
style.configure("TLabel", foreground="black", background='white', font=("Calibri", 15))
style.configure("TButton", foreground="green", background='white', font=("Calibri", 15, "bold"))


In [13]:
# Labels and Dropdown Menus
labels = ["BEDROOMS", 
          "BATHROOMS", 
          "SQ. LIVING", 
          "SQ. LOT", 
          "FLOORS",
          "CONDITION", 
          "GRADE", 
          "SQ. ABOVE (TOP FLOOR)", 
          "SQ. BELOW (BASEMENT)",
          "YEAR BUILT", 
          "YEAR RENOVATED", 
          "SQ. LIVING'15", 
          "SQ. LOT'15"]

dropdowns = [ttk.Combobox(root, values=[str(i) for i in range(1, 9)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1, 9)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1000, 6001)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1000, 8001)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1, 4)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1, 8)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(4, 11)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1000, 6001)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1000, 6001)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1900, 2021)], state="readonly"),
             ttk.Combobox(root, values=["Nil"] + [str(i) for i in range(1950, 2021)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1000, 6001)], state="readonly"),
             ttk.Combobox(root, values=[str(i) for i in range(1000, 6001)], state="readonly")]


In [14]:
for i, (label, dropdown) in enumerate(zip(labels, dropdowns)):
    ttk.Label(root, text=label).grid(row=i+2, column=0, sticky="w", padx=10, pady=5)
    dropdown.grid(row=i+2, column=0, sticky="e", padx=10, pady=0)

# Button to validate the inputs and display the result
calculate_button = ttk.Button(root, text=" PREDICT THE PRICE ", command=calculate_features)
calculate_button.grid(row=len(labels)+2, column=0, sticky="w", padx=10, pady=10)

# Result display
result_var = tk.StringVar()
result_label = ttk.Label(root, textvariable=result_var, foreground="red", font=("Calibri", 15))
result_label.grid(row=len(labels)+3, column=0, columnspan=2, sticky="w", pady=10, padx=10)

In [15]:
# Start the Tkinter event loop
root.mainloop()