## 1. Project Overview
   - **Description:** Predicting house prices in Bangladesh using the beproperty.com dataset.
   - **Objective:** Develop an accurate and interpretable model to predict house prices based on various features.


## 2. Data Collection and Exploration


### 2.1 Data Collection
- **Accessing beproperty.com Dataset:**
  - The dataset has been accessed from beproperty.com.
  - Further steps include confirming data integrity and completeness.

- **Download and Save:**
  - The dataset has been downloaded and saved in the project's `data/raw/` directory.



In [188]:

import pandas as pd  # For data manipulation and analysis
import numpy as np  # For numerical computations
import matplotlib.pyplot as plt  # For data visualization
import seaborn as sns  # For advanced data visualization
import plotly.express as px


from sklearn.model_selection import train_test_split  # For splitting data into training and testing sets
from sklearn.preprocessing import StandardScaler  # For feature scaling
from sklearn.impute import SimpleImputer  # For handling missing values
from sklearn.pipeline import Pipeline  # For creating machine learning pipelines

# Core machine learning algorithms for regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Evaluation metrics
from sklearn.metrics import mean_squared_error, r2_score

from sklearn.preprocessing import LabelEncoder
from scipy.stats import zscore
import joblib



In [189]:
# Loading data
def load_data(file_path, file_type="csv"):
    """Load a CSV or Excel file into a pandas DataFrame.

    Args:
        file_path: Path to the data file.
        file_type: Either "csv" or "excel".

    Returns:
        The loaded DataFrame.

    Raises:
        ValueError: If file_type is not supported.
    """
    if file_type == "csv":
        df = pd.read_csv(file_path)
    elif file_type == "excel":
        df = pd.read_excel(file_path)
    else:
        raise ValueError(f"Unsupported file type: {file_type!r}")

    return df

In [190]:
# Calling the load_data function 
data_path = "../data/raw/Bangladesh_property_prices.csv"  # Replace with the actual path to your data file
raw_data = load_data(data_path)

# Check the DataFrame:
print(raw_data.shape)

(4704, 12)


### 2.2 Data Exploration
- **Dataset Structure:**
  - The dataset consists of the following columns: `Unnamed: 0.1`, `Unnamed: 0`, `Location`, `Price`, `Type`, `No. Beds`, `No. Baths`, `Area`, `Latitude`, `Longitude`, `Region`, `Sub-region`.

- **Target Variable:**
  - The target variable for house price prediction is `Price`.

- **Potential Predictor Variables:**
  - Possible predictor variables include `Type`, `No. Beds`, `No. Baths`, `Area`, `Latitude`, `Longitude`, `Region`, `Sub-region`.

- **Data Distributions and Correlations:**
  - Further exploration is required to visualize data distributions, correlations, and potential outliers.

- **Documentation:**
  - Detailed findings and insights will be documented in the `reports/documentation.md` file.
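
The exploration step described above could start with a quick numeric summary. The sketch below is an assumption about how that might look, not the project's actual analysis: `summarize_numeric` is a hypothetical helper that reports summary statistics for each numeric column, each column's correlation with `Price`, and the number of exact duplicate rows. The tiny DataFrame is made up for illustration.

```python
import pandas as pd


def summarize_numeric(df: pd.DataFrame, target: str = "Price") -> dict:
    """Hypothetical EDA helper: summary stats, correlations with the target,
    and a count of fully duplicated rows."""
    numeric = df.select_dtypes(include="number")
    return {
        # count, mean, std, min, quartiles and max for each numeric column
        "describe": numeric.describe(),
        # Pearson correlation of every other numeric column with the target
        "corr_with_target": numeric.corr()[target].drop(target).sort_values(ascending=False),
        # rows that repeat an earlier row in every column
        "n_duplicates": int(df.duplicated().sum()),
    }


# Usage on a tiny made-up frame (values are illustrative only)
sample = pd.DataFrame({
    "Price": [7_500_000, 7_280_000, 13_000_000, 4_950_000, 4_950_000],
    "Area": [1300.0, 1456.0, 1550.0, 1100.0, 1100.0],
    "No. Beds": [3.0, 4.0, 3.0, 3.0, 3.0],
    "Region": ["Uttara", "Mirpur", "Khilgaon", "Mirpur", "Mirpur"],
})
summary = summarize_numeric(sample)
print(summary["corr_with_target"])
print("Duplicate rows:", summary["n_duplicates"])
```

The correlation series can also be visualized, for example with `sns.heatmap(numeric.corr())`, once the real data is loaded.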

## 3. Data Preprocessing
 

### 3.1 Cleaning

In [210]:
df = raw_data.copy()

In [211]:
df.head()

| | Unnamed: 0.1 | Unnamed: 0 | Location | Price | Type | No. Beds | No. Baths | Area | Latitude | Longitude | Region | Sub-region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | Sector 10, Uttara, Dhaka | 7500000 | Apartment | 3.0 | 3.0 | 1300.0 | 23.86846 | 90.3928 | Uttara | Sector 10 |
| 1 | 1 | 1.0 | Section 11, Mirpur, Dhaka | 7280000 | Apartment | 4.0 | 4.0 | 1456.0 | 23.81223 | 90.35967 | Mirpur | Section 11 |
| 2 | 2 | 2.0 | Chowdhuripara, Khilgaon, Dhaka | 13000000 | Apartment | 3.0 | 3.0 | 1550.0 | 23.75349 | 90.42469 | Khilgaon | Chowdhuripara |
| 3 | 3 | 3.0 | Road No 4, Banani, Dhaka | 37000000 | Apartment | 3.0 | 3.0 | 2669.0 | 23.78855 | 90.40081 | Banani | Road No 4 |
| 4 | 4 | 4.0 | South Banasree Project, Banasree, Dhaka | 3600000 | Apartment | 2.0 | 2.0 | 835.0 | 23.76354 | 90.4318 | Banasree | South Banasree Project |


In [212]:
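# Drop the two leftover index columns and the free-text Location column (its parts are already in Region and Sub-region)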
df.drop(columns=["Unnamed: 0.1", "Unnamed: 0", "Location"], inplace=True)

In [213]:
# Display floats with six decimal places instead of scientific notation
pd.set_option('display.float_format', lambda x: '%.06f' % x)

In [214]:
# Display basic information about the dataset
print("Original Data Information:")
print(df.info())

Original Data Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4704 entries, 0 to 4703
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Price       4704 non-null   int64  
 1   Type        4704 non-null   object 
 2   No. Beds    4500 non-null   float64
 3   No. Baths   4500 non-null   float64
 4   Area        4704 non-null   float64
 5   Latitude    4704 non-null   float64
 6   Longitude   4704 non-null   float64
 7   Region      4704 non-null   object 
 8   Sub-region  4680 non-null   object 
dtypes: float64(5), int64(1), object(3)
memory usage: 330.9+ KB
None


In [215]:
df.isnull().sum()

Price           0
Type            0
No. Beds      204
No. Baths     204
Area            0
Latitude        0
Longitude       0
Region          0
Sub-region     24
dtype: int64

In [216]:
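# Fill missing bed/bath counts with the column median (robust to outliers).
# Assign the result back instead of calling fillna(inplace=True) on a column,
# which is chained assignment and is deprecated in recent pandas versions.
df['No. Beds'] = df['No. Beds'].fillna(df['No. Beds'].median())
df['No. Baths'] = df['No. Baths'].fillna(df['No. Baths'].median())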

In [217]:
# Drop the remaining rows with missing values (the 24 rows without a Sub-region)
df.dropna(inplace=True)


In [218]:
# Display information after handling missing values
print("\nData Information after Handling Missing Values:")
print(df.info())



Data Information after Handling Missing Values:
<class 'pandas.core.frame.DataFrame'>
Index: 4680 entries, 0 to 4703
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Price       4680 non-null   int64  
 1   Type        4680 non-null   object 
 2   No. Beds    4680 non-null   float64
 3   No. Baths   4680 non-null   float64
 4   Area        4680 non-null   float64
 5   Latitude    4680 non-null   float64
 6   Longitude   4680 non-null   float64
 7   Region      4680 non-null   object 
 8   Sub-region  4680 non-null   object 
dtypes: float64(5), int64(1), object(3)
memory usage: 365.6+ KB
None


In [219]:
# Numeric columns screened for outliers
num_cols = ['Price', 'No. Beds', 'No. Baths', 'Area']

# Outliers using the standard deviation method: flag values more than
# `threshold` standard deviations away from the column mean
mean = df[num_cols].mean()
std_dev = df[num_cols].std()
threshold = 3  # 3 is a common choice; adjust as needed
outliers_std = (df[num_cols] - mean).abs() > threshold * std_dev


In [220]:
# Outliers using the box plot (IQR) method: flag values below Q1 - 1.5*IQR
# or above Q3 + 1.5*IQR
Q1 = df[num_cols].quantile(0.25)
Q3 = df[num_cols].quantile(0.75)
IQR = Q3 - Q1
outliers_iqr = (df[num_cols] < Q1 - 1.5 * IQR) | (df[num_cols] > Q3 + 1.5 * IQR)


In [221]:
# Outliers using the z-score method; this is equivalent to the standard
# deviation method above, expressed as |z| > threshold
z_scores = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
outliers_z = z_scores.abs() > threshold

# Note: these masks only flag outliers; no rows are removed here

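The outlier cells in this section only build boolean masks; none of them removes rows. If you decide to act on the IQR rule, a minimal sketch could look like the following. The helper name `remove_iqr_outliers` and the policy "drop a row if any screened column is flagged" are assumptions, not the project's choice, and the sample data is made up.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, cols: list, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any of `cols` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    # True where a value falls outside the box-plot whiskers
    flagged = (df[cols] < q1 - k * iqr) | (df[cols] > q3 + k * iqr)
    # keep only the rows in which no column is flagged
    return df[~flagged.any(axis=1)]


sample = pd.DataFrame({
    "Price": [5_000_000, 6_000_000, 7_000_000, 8_000_000, 900_000_000],
    "Area": [900.0, 1000.0, 1100.0, 1200.0, 1300.0],
})
filtered = remove_iqr_outliers(sample, ["Price", "Area"])
print(len(sample), "->", len(filtered))
```

Filtering on `Price` also removes genuinely expensive properties, so it may be worth checking how many rows each column flags (for example `flagged.sum()`) before dropping anything.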

### 3.2 Feature Engineering

In [222]:
# Price per Square Foot
df['Price_per_sqft'] = df['Price'] / df['Area']

In [223]:
# Interaction Features
df['Beds_Baths_Ratio'] = df['No. Beds'] / df['No. Baths']

In [224]:
# Binning - Convert Continuous Variable to Categorical
bins = [0, 1000, 2000, 3000, float('inf')]
labels = ['Small', 'Medium', 'Large', 'Very Large']
df['Area_Category'] = pd.cut(df['Area'], bins=bins, labels=labels, right=False)

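With `right=False`, every bin includes its left edge and excludes its right edge, so an area of exactly 1000 sq ft falls into "Medium". The short check below illustrates this on made-up areas. It also shows that `LabelEncoder` assigns codes in alphabetical order, whereas the categorical's own `.cat.codes` keep the Small < Medium < Large < Very Large order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

bins = [0, 1000, 2000, 3000, float("inf")]
labels = ["Small", "Medium", "Large", "Very Large"]
areas = pd.Series([999.0, 1000.0, 2500.0, 3000.0])

# right=False makes each bin closed on the left: [0, 1000), [1000, 2000), ...
area_cat = pd.cut(areas, bins=bins, labels=labels, right=False)
print(area_cat.tolist())  # ['Small', 'Medium', 'Large', 'Very Large']

# Category codes follow the label order given above
codes = area_cat.cat.codes.tolist()
print(codes)  # [0, 1, 2, 3]

# LabelEncoder sorts the labels alphabetically, so the size order is lost
alpha_codes = LabelEncoder().fit_transform(area_cat.astype(str)).tolist()
print(alpha_codes)  # [2, 1, 0, 3]
```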
In [225]:
# Log Transformation for Skewed Variables
df['Log_Price'] = np.log1p(df['Price'])

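`Log_Price` is created here, but the models below are trained on raw `Price`. If the intent is to model the log of the price, one option (an assumption, not what this notebook does) is scikit-learn's `TransformedTargetRegressor`, which fits the regressor on `log1p(y)` and converts predictions back with `expm1`, so predictions and metrics stay in the original price units. The data below is synthetic.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic, strictly positive prices that grow multiplicatively with the features
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(200, 3))
y_demo = 1e6 * np.exp(X_demo @ np.array([1.0, 0.5, 2.0]))

# The regressor is fitted on log1p(y); predict() applies expm1 automatically
log_model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p,
    inverse_func=np.expm1,
)
log_model.fit(X_demo, y_demo)
pred = log_model.predict(X_demo)
print(pred[:3])
```

Any of the regressors used later (for example `RandomForestRegressor`) can be passed as `regressor=` in the same way.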
In [226]:
# Assuming 'Type' has categories like 'Apartment', 'Plot', etc.
#df = pd.get_dummies(df, columns=['Type'], prefix='Type', dtype=int)


In [227]:
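# Use a separate LabelEncoder per categorical column and keep them in a dict,
# so each column's code <-> name mapping survives (needed to decode codes
# or to show names in a web form). A single shared encoder only remembers
# the last column it was fitted on.
encoders = {}
for col in ['Type', 'Region', 'Sub-region']:
    encoders[col] = LabelEncoder()
    df[f'{col}_n'] = encoders[col].fit_transform(df[col])

# Area_Category is ordered (Small < Medium < Large < Very Large); the
# categorical codes keep that order, unlike LabelEncoder's alphabetical codes
df['Area_Category'] = df['Area_Category'].cat.codes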

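A `LabelEncoder` keeps only the mapping from its most recent `fit`, so reusing one instance across columns loses the code-to-name mapping that is needed later, for example to show region names in a web form. The sketch below keeps one encoder per column in a dict, saves it with joblib and reloads it. The tiny DataFrame and the file name `label_encoders.pkl` are illustrative assumptions.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import LabelEncoder

demo_df = pd.DataFrame({
    "Type": ["Apartment", "Plot", "Apartment"],
    "Region": ["Uttara", "Mirpur", "Banani"],
})

# One encoder per column, so each column keeps its own code <-> name mapping
demo_encoders = {}
for col in ["Type", "Region"]:
    demo_encoders[col] = LabelEncoder()
    demo_df[f"{col}_n"] = demo_encoders[col].fit_transform(demo_df[col])

# Persist the encoders next to the model (the file name is hypothetical)
path = os.path.join(tempfile.mkdtemp(), "label_encoders.pkl")
joblib.dump(demo_encoders, path)
loaded = joblib.load(path)

# Code -> name (e.g. for display) and name -> code (e.g. before predicting)
print(loaded["Region"].inverse_transform([0, 1, 2]).tolist())  # ['Banani', 'Mirpur', 'Uttara']
print(loaded["Region"].transform(["Mirpur"]).tolist())  # [1]
```

`encoder.classes_` holds the names in code order, so `classes_[i]` is the name behind code `i`.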

In [228]:
df.sample(5)

| | Price | Type | No. Beds | No. Baths | Area | Latitude | Longitude | Region | Sub-region | Price_per_sqft | Beds_Baths_Ratio | Area_Category | Log_Price | Type_n | Region_n | Sub-region_n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2846 | 9240000 | Apartment | 3.0 | 3.0 | 1155.0 | 23.74545 | 90.38776 | Kathalbagan | Crescent Road | 8000.0 | 1.0 | 1 | 16.039053 | 0 | 29 | 79 |
| 1850 | 8500000 | Apartment | 3.0 | 3.0 | 1275.0 | 23.75596 | 90.37065 | AftabNagar | Block C | 6666.666667 | 1.0 | 1 | 15.955577 | 0 | 1 | 50 |
| 802 | 10000000 | Apartment | 3.0 | 4.0 | 1560.0 | 23.76427 | 90.36547 | Mohammadpur | PC Culture Housing | 6410.25641 | 0.75 | 1 | 16.118096 | 0 | 43 | 276 |
| 1369 | 8200000 | Apartment | 3.0 | 3.0 | 1169.0 | 23.75349 | 90.42469 | Khilgaon | Chowdhuripara | 7014.542344 | 1.0 | 1 | 15.919645 | 0 | 31 | 75 |
| 439 | 6250000 | Apartment | 3.0 | 3.0 | 1250.0 | 23.81467 | 90.37306 | Mirpur | Kallyanpur | 5000.0 | 1.0 | 1 | 15.648092 | 0 | 40 | 163 |


In [229]:
new_df = df[['Price', 'No. Beds', 'No. Baths', 'Area', 'Type_n', 'Region_n', 'Sub-region_n']]

In [230]:
new_df

| | Price | No. Beds | No. Baths | Area | Type_n | Region_n | Sub-region_n |
|---|---|---|---|---|---|---|---|
| 0 | 7500000 | 3.000000 | 3.000000 | 1300.000000 | 0 | 66 | 350 |
| 1 | 7280000 | 4.000000 | 4.000000 | 1456.000000 | 0 | 40 | 342 |
| 2 | 13000000 | 3.000000 | 3.000000 | 1550.000000 | 0 | 31 | 75 |
| 3 | 37000000 | 3.000000 | 3.000000 | 2669.000000 | 0 | 4 | 320 |
| 4 | 3600000 | 2.000000 | 2.000000 | 835.000000 | 0 | 6 | 406 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4699 | 4950000 | 3.000000 | 2.000000 | 1100.000000 | 0 | 40 | 223 |
| 4700 | 4950000 | 3.000000 | 2.000000 | 1100.000000 | 0 | 40 | 223 |
| 4701 | 4950000 | 3.000000 | 2.000000 | 1100.000000 | 0 | 40 | 223 |
| 4702 | 4950000 | 3.000000 | 2.000000 | 1100.000000 | 0 | 40 | 223 |


### 3.3 Train-Test Split

In [231]:
df = new_df.copy()

In [232]:
# Define features (X) and target variable (y)
X = df.drop("Price", axis=1)  # Features
y = df["Price"]  # Target variable

# Perform train-test split
test_size = 0.2  # You can adjust the test size based on your preference
random_state = 42  # Set a random state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)

# Display the shape of the resulting sets
print("Training set shape:", X_train.shape, y_train.shape)
print("Testing set shape:", X_test.shape, y_test.shape)

Training set shape: (3744, 6) (3744,)
Testing set shape: (936, 6) (936,)


### 3.4 Save Processed Data

In [117]:
import os

# Create the 'processed' directory if it doesn't exist
processed_dir = '../data/processed/'
os.makedirs(processed_dir, exist_ok=True)

# Save the processed data to 'data/processed/'
processed_data_path = os.path.join(processed_dir, 'processed_data.csv')
df.to_csv(processed_data_path, index=False)


## 4. Model Development


### 4.1 Model Selection

In [233]:
# Import necessary libraries
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor


In [234]:
# 1. Linear Regression Model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
linear_predictions = linear_model.predict(X_test)

In [235]:

# Evaluate Linear Regression model
linear_mae = mean_absolute_error(y_test, linear_predictions)
linear_mse = mean_squared_error(y_test, linear_predictions)
linear_r2 = r2_score(y_test, linear_predictions)

In [236]:

print("Linear Regression Model:")
print(f"Mean Absolute Error: {linear_mae}")
print(f"Mean Squared Error: {linear_mse}")
print(f"R-squared: {linear_r2}\n")

Linear Regression Model:
Mean Absolute Error: 3442509.7832360277
Mean Squared Error: 38339350324442.04
R-squared: 0.5817791564940148



In [237]:
# 2. Decision Tree Model
decision_tree_model = DecisionTreeRegressor(random_state=random_state)
decision_tree_model.fit(X_train, y_train)
dt_predictions = decision_tree_model.predict(X_test)

In [238]:
# Evaluate Decision Tree model
dt_mae = mean_absolute_error(y_test, dt_predictions)
dt_mse = mean_squared_error(y_test, dt_predictions)
dt_r2 = r2_score(y_test, dt_predictions)

In [239]:
print("Decision Tree Model:")
print(f"Mean Absolute Error: {dt_mae}")
print(f"Mean Squared Error: {dt_mse}")
print(f"R-squared: {dt_r2}\n")

Decision Tree Model:
Mean Absolute Error: 1910511.684115801
Mean Squared Error: 28082623193487.41
R-squared: 0.6936636051353859



In [240]:
# 3. Random Forest Model
random_forest_model = RandomForestRegressor(random_state=random_state)
random_forest_model.fit(X_train, y_train)
rf_predictions = random_forest_model.predict(X_test)

In [241]:
# Evaluate Random Forest model
rf_mae = mean_absolute_error(y_test, rf_predictions)
rf_mse = mean_squared_error(y_test, rf_predictions)
rf_r2 = r2_score(y_test, rf_predictions)

In [242]:

print("Random Forest Model:")
print(f"Mean Absolute Error: {rf_mae}")
print(f"Mean Squared Error: {rf_mse}")
print(f"R-squared: {rf_r2}")


Random Forest Model:
Mean Absolute Error: 1550345.7269322567
Mean Squared Error: 15613324938029.072
R-squared: 0.8296836573844426


In [243]:
# 4. K-Nearest Neighbors (KNN) Model
knn_model = KNeighborsRegressor()
knn_model.fit(X_train, y_train)
knn_predictions = knn_model.predict(X_test)

In [244]:
# Evaluate KNN model
knn_mae = mean_absolute_error(y_test, knn_predictions)
knn_mse = mean_squared_error(y_test, knn_predictions)
knn_r2 = r2_score(y_test, knn_predictions)


In [245]:
print("K-Nearest Neighbors (KNN) Model:")
print(f"Mean Absolute Error: {knn_mae}")
print(f"Mean Squared Error: {knn_mse}")
print(f"R-squared: {knn_r2}\n")

K-Nearest Neighbors (KNN) Model:
Mean Absolute Error: 1941915.666239316
Mean Squared Error: 20735168527090.316
R-squared: 0.7738125555531392



In [246]:
# 5. Support Vector Machine (SVM) Model
svm_model = SVR()
svm_model.fit(X_train, y_train)
svm_predictions = svm_model.predict(X_test)

In [247]:
# Evaluate SVM model
svm_mae = mean_absolute_error(y_test, svm_predictions)
svm_mse = mean_squared_error(y_test, svm_predictions)
svm_r2 = r2_score(y_test, svm_predictions)

In [248]:
print("Support Vector Machine (SVM) Model:")
print(f"Mean Absolute Error: {svm_mae}")
print(f"Mean Squared Error: {svm_mse}")
print(f"R-squared: {svm_r2}\n")

Support Vector Machine (SVM) Model:
Mean Absolute Error: 5049360.640682706
Mean Squared Error: 100335839800140.11
R-squared: -0.09450314624512823

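The negative R² suggests the default `SVR` struggles with unscaled inputs: its RBF kernel is sensitive to feature scale, and its `C` and `epsilon` defaults are absolute values in target units, while `Area` is in the thousands and `Price` in the millions. A sketch, assuming standardization is the fix worth trying (not verified on this dataset), wraps the features in a `StandardScaler` pipeline and standardizes the target with `TransformedTargetRegressor`. The data below is synthetic, on roughly similar scales.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: area in sq ft, number of beds, price in the millions
rng = np.random.default_rng(42)
area = rng.uniform(500, 3000, 400)
beds = rng.integers(1, 6, 400)
X_svr = np.column_stack([area, beds])
y_svr = 7000 * area + 500_000 * beds + rng.normal(0, 100_000, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X_svr, y_svr, test_size=0.2, random_state=42)

# Default SVR on raw features and raw target
raw_svr = SVR().fit(X_tr, y_tr)
raw_r2 = r2_score(y_te, raw_svr.predict(X_te))

# Standardize features inside a pipeline and the target via TransformedTargetRegressor
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR()),
    transformer=StandardScaler(),
)
scaled_svr.fit(X_tr, y_tr)
scaled_r2 = r2_score(y_te, scaled_svr.predict(X_te))

print(f"raw SVR R^2: {raw_r2:.3f}, scaled SVR R^2: {scaled_r2:.3f}")
```

The same pipeline idea also applies to `KNeighborsRegressor`, whose distance computations are dominated by `Area` and the encoded `Sub-region_n` when features are unscaled.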


In [249]:

# 6. Gradient Boosting Model
gradient_boosting_model = GradientBoostingRegressor(random_state=random_state)
gradient_boosting_model.fit(X_train, y_train)
gb_predictions = gradient_boosting_model.predict(X_test)

In [250]:
# Evaluate Gradient Boosting model
gb_mae = mean_absolute_error(y_test, gb_predictions)
gb_mse = mean_squared_error(y_test, gb_predictions)
gb_r2 = r2_score(y_test, gb_predictions)

In [251]:
print("Gradient Boosting Model:")
print(f"Mean Absolute Error: {gb_mae}")
print(f"Mean Squared Error: {gb_mse}")
print(f"R-squared: {gb_r2}\n")

Gradient Boosting Model:
Mean Absolute Error: 1993652.8230149613
Mean Squared Error: 16496248067779.414
R-squared: 0.8200523816077187



In [252]:
# 7. XGBoost Model
xgb_model = XGBRegressor(random_state=random_state)
xgb_model.fit(X_train, y_train)
xgb_predictions = xgb_model.predict(X_test)

In [253]:
# Evaluate XGBoost model
xgb_mae = mean_absolute_error(y_test, xgb_predictions)
xgb_mse = mean_squared_error(y_test, xgb_predictions)
xgb_r2 = r2_score(y_test, xgb_predictions)


In [254]:
print("XGBoost Model:")
print(f"Mean Absolute Error: {xgb_mae}")
print(f"Mean Squared Error: {xgb_mse}")
print(f"R-squared: {xgb_r2}\n")

XGBoost Model:
Mean Absolute Error: 1624963.4977964743
Mean Squared Error: 15999254449299.773
R-squared: 0.8254737851677351



In [255]:
# 8. LightGBM Model
lgbm_model = LGBMRegressor(random_state=random_state)
lgbm_model.fit(X_train, y_train)
lgbm_predictions = lgbm_model.predict(X_test)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.263137 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 576
[LightGBM] [Info] Number of data points in the train set: 3744, number of used features: 6
[LightGBM] [Info] Start training from score 10001098.946314


In [256]:
# Evaluate LightGBM model
lgbm_mae = mean_absolute_error(y_test, lgbm_predictions)
lgbm_mse = mean_squared_error(y_test, lgbm_predictions)
lgbm_r2 = r2_score(y_test, lgbm_predictions)


In [257]:
print("LightGBM Model:")
print(f"Mean Absolute Error: {lgbm_mae}")
print(f"Mean Squared Error: {lgbm_mse}")
print(f"R-squared: {lgbm_r2}")

LightGBM Model:
Mean Absolute Error: 1703656.169709199
Mean Squared Error: 15726989639441.65
R-squared: 0.8284437577278405


In [258]:
# Collect the evaluation metrics of every model into one DataFrame
results = [
    ('Linear Regression', linear_mae, linear_mse, linear_r2),
    ('Decision Tree', dt_mae, dt_mse, dt_r2),
    ('Random Forest', rf_mae, rf_mse, rf_r2),
    ('K-Nearest Neighbors (KNN)', knn_mae, knn_mse, knn_r2),
    ('Support Vector Machine (SVM)', svm_mae, svm_mse, svm_r2),
    ('Gradient Boosting', gb_mae, gb_mse, gb_r2),
    ('XGBoost', xgb_mae, xgb_mse, xgb_r2),
    ('LightGBM', lgbm_mae, lgbm_mse, lgbm_r2),
]
metrics_df = pd.DataFrame(results, columns=['Model', 'Mean Absolute Error', 'Mean Squared Error', 'R-squared'])

# Display the metrics DataFrame
metrics_df


| | Model | Mean Absolute Error | Mean Squared Error | R-squared |
|---|---|---|---|---|
| 0 | Linear Regression | 3442509.783236 | 38339350324442.04 | 0.581779 |
| 1 | Decision Tree | 1910511.684116 | 28082623193487.406 | 0.693664 |
| 2 | Random Forest | 1550345.726932 | 15613324938029.072 | 0.829684 |
| 3 | K-Nearest Neighbors (KNN) | 1941915.666239 | 20735168527090.32 | 0.773813 |
| 4 | Support Vector Machine (SVM) | 5049360.640683 | 100335839800140.1 | -0.094503 |
| 5 | Gradient Boosting | 1993652.823015 | 16496248067779.414 | 0.820052 |
| 6 | XGBoost | 1624963.497796 | 15999254449299.771 | 0.825474 |
| 7 | LightGBM | 1703656.169709 | 15726989639441.65 | 0.828444 |

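The eight fit / evaluate / print cells in this section repeat one pattern. A loop-based sketch (the helper `evaluate_models` is hypothetical) fits each model, reports RMSE alongside MAE because RMSE is in the same units as `Price` while MSE is in squared units, and sorts by R². It runs on synthetic data with a subset of the models so that it works without XGBoost or LightGBM installed; in the notebook you would pass `X_train, X_test, y_train, y_test` and all eight models.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate_models(models: dict, X_tr, X_te, y_tr, y_te) -> pd.DataFrame:
    """Fit each model and return MAE, RMSE and R^2 on the test split, best R^2 first."""
    rows = []
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        rows.append({
            "Model": name,
            "Mean Absolute Error": mean_absolute_error(y_te, pred),
            # RMSE is in the same units as the target, unlike MSE
            "Root Mean Squared Error": np.sqrt(mean_squared_error(y_te, pred)),
            "R-squared": r2_score(y_te, pred),
        })
    return pd.DataFrame(rows).sort_values("R-squared", ascending=False, ignore_index=True)


# Usage on synthetic data
X_demo, y_demo = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=42)
splits = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
candidates = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}
demo_metrics = evaluate_models(candidates, *splits)
print(demo_metrics)
```

A single train/test split can rank close models (here Random Forest, XGBoost and LightGBM) differently from run to run; `sklearn.model_selection.cross_val_score` would give a steadier comparison.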

### 5.3 Saving models

In [260]:

# Save every trained model to the models/ folder
import os

os.makedirs('../models', exist_ok=True)

models = [
    ('Linear Regression', linear_model),
    ('Decision Tree', decision_tree_model),
    ('Random Forest', random_forest_model),
    ('K-Nearest Neighbors (KNN)', knn_model),
    ('Support Vector Machine (SVM)', svm_model),
    ('Gradient Boosting', gradient_boosting_model),
    ('XGBoost', xgb_model),
    ('LightGBM', lgbm_model),
    # Add other models if needed
]

# Save each model, e.g. 'Random Forest' -> ../models/random_forest_model.pkl
for model_name, model in models:
    model_path = f'../models/{model_name.lower().replace(" ", "_")}_model.pkl'
    joblib.dump(model, model_path)
    print(f'{model_name} model saved to: {model_path}')


Linear Regression model saved to: ../models/linear_regression_model.pkl
Decision Tree model saved to: ../models/decision_tree_model.pkl
Random Forest model saved to: ../models/random_forest_model.pkl
K-Nearest Neighbors (KNN) model saved to: ../models/k-nearest_neighbors_(knn)_model.pkl
Support Vector Machine (SVM) model saved to: ../models/support_vector_machine_(svm)_model.pkl
Gradient Boosting model saved to: ../models/gradient_boosting_model.pkl
XGBoost model saved to: ../models/xgboost_model.pkl
LightGBM model saved to: ../models/lightgbm_model.pkl


### 5.5 Loading Models

In [261]:
# Import necessary libraries
import joblib

# List of models and their corresponding paths
models_info = [
    ('Linear Regression', '../models/linear_regression_model.pkl'),
    ('Decision Tree', '../models/decision_tree_model.pkl'),
    ('Random Forest', '../models/random_forest_model.pkl'),
    ('K-Nearest Neighbors (KNN)', '../models/k-nearest_neighbors_(knn)_model.pkl'),
    ('Support Vector Machine (SVM)', '../models/support_vector_machine_(svm)_model.pkl'),
    ('Gradient Boosting', '../models/gradient_boosting_model.pkl'),
    ('XGBoost', '../models/xgboost_model.pkl'),
    ('LightGBM', '../models/lightgbm_model.pkl'),
    # Add other models if needed
]

# Load each model
loaded_models = {}
for model_name, model_path in models_info:
    loaded_model = joblib.load(model_path)
    loaded_models[model_name] = loaded_model
    print(f'{model_name} model loaded from: {model_path}')

# Now, you can use the loaded_models dictionary to access each model as needed
# For example, loaded_models['Linear Regression'].predict(X_test)


Linear Regression model loaded from: ../models/linear_regression_model.pkl
Decision Tree model loaded from: ../models/decision_tree_model.pkl
Random Forest model loaded from: ../models/random_forest_model.pkl
K-Nearest Neighbors (KNN) model loaded from: ../models/k-nearest_neighbors_(knn)_model.pkl
Support Vector Machine (SVM) model loaded from: ../models/support_vector_machine_(svm)_model.pkl
Gradient Boosting model loaded from: ../models/gradient_boosting_model.pkl
XGBoost model loaded from: ../models/xgboost_model.pkl
LightGBM model loaded from: ../models/lightgbm_model.pkl


In [262]:
# Load the model
loaded_model = joblib.load("../models/lightgbm_model.pkl")
loaded_model

In [263]:
# Provide new data for prediction. The model was trained on six features, so all
# six must be supplied with the same names and in the same order as X_train.
# The encoded Type/Region/Sub-region values are the placeholders also used in app.py.
new_data = pd.DataFrame({
    'No. Beds': [2.0],
    'No. Baths': [2.0],
    'Area': [1000.0],
    'Type_n': [0],
    'Region_n': [3],
    'Sub-region_n': [432],
})


In [266]:
# A batch of ten example listings with all six features
test_data = pd.DataFrame({
    'No. Beds': [2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 3.0, 2.0],
    'No. Baths': [2.0, 3.0, 2.0, 3.0, 3.0, 2.0, 4.0, 2.0, 3.0, 2.0],
    'Area': [735.0, 3640.0, 1110.0, 1450.0, 1066.0, 1185.0, 1406.0, 650.0, 1250.0, 620.0],
    'Type_n': [0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    'Region_n': [3, 1, 31, 20, 12, 40, 15, 40, 13, 35],
    'Sub-region_n': [432, 60, 125, 297, 209, 284, 310, 342, 113, 134]
})

print(test_data)


   No. Beds  No. Baths        Area  Type_n  Region_n  Sub-region_n
0  2.000000   2.000000  735.000000       0         3           432
1  3.000000   3.000000 3640.000000       3         1            60
2  3.000000   2.000000 1110.000000       0        31           125
3  3.000000   3.000000 1450.000000       0        20           297
4  3.000000   3.000000 1066.000000       0        12           209
5  3.000000   2.000000 1185.000000       0        40           284
6  3.000000   4.000000 1406.000000       0        15           310
7  2.000000   2.000000  650.000000       0        40           342
8  3.000000   3.000000 1250.000000       0        13           113
9  2.000000   2.000000  620.000000       0        35           134


In [271]:
# Predict prices for test_data and show the prediction for its second row (index 1)
predictions = loaded_model.predict(test_data)
print(f'Predicted price for test_data row 1: {predictions[1]}')


Predicted price for test_data row 1: 26074699.18101142


In [None]:
I wrote this code to deploy my house price prediction project:
{
# app.py
from flask import Flask, render_template, request
import pandas as pd
import joblib

app = Flask(__name__)

# Load the trained model once at startup
loaded_model = joblib.load("../models/lightgbm_model.pkl")

# Render the main page
@app.route('/')
def index():
    return render_template('index.html')

# Handle the form submission (this route only accepts POST)
@app.route('/predict', methods=['POST'])
def predict():
    # Read all six features from the form; the select fields submit encoded integers
    no_of_beds = float(request.form['no_of_beds'])
    no_of_baths = float(request.form['no_of_baths'])
    area = float(request.form['area'])
    type_n = int(request.form['type_n'])
    region_n = int(request.form['region_n'])
    sub_region_n = int(request.form['sub_region_n'])

    # One-row DataFrame with the same column names and order used in training
    new_data = pd.DataFrame({
        'No. Beds': [no_of_beds],
        'No. Baths': [no_of_baths],
        'Area': [area],
        'Type_n': [type_n],
        'Region_n': [region_n],
        'Sub-region_n': [sub_region_n],
    })

    # Predict and display the result on the prediction page
    prediction = loaded_model.predict(new_data)[0]
    return render_template('prediction.html', prediction=prediction)

if __name__ == '__main__':
    app.run(debug=True)


# templates/index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>House Price Prediction</title>
    <style>
        body {
            font-family: 'Arial', sans-serif;
            background-color: #f4f4f4;
            margin: 0;
            padding: 0;
            display: flex;
            align-items: center;
            justify-content: center;
            height: 100vh;
        }

        .container {
            background-color: #fff;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
            text-align: center;
        }

        .form-group {
            margin-bottom: 15px;
        }

        label {
            display: block;
            font-size: 16px;
            margin-bottom: 5px;
        }

        input, select {
            width: 100%;
            padding: 10px;
            font-size: 16px;
            border: 1px solid #ccc;
            border-radius: 4px;
            box-sizing: border-box;
        }

        button {
            background-color: #4CAF50;
            color: white;
            padding: 15px 20px;
            font-size: 16px;
            border: none;
            border-radius: 4px;
            cursor: pointer;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>House Price Prediction</h1>
        <form action="/predict" method="post">
            <div class="form-group">
                <label for="no_of_beds">No. of Beds:</label>
                <input type="number" id="no_of_beds" name="no_of_beds" required>
            </div>

            <div class="form-group">
                <label for="no_of_baths">No. of Baths:</label>
                <input type="number" id="no_of_baths" name="no_of_baths" required>
            </div>

            <div class="form-group">
                <label for="area">Area:</label>
                <input type="number" id="area" name="area" required>
            </div>

            <div class="form-group">
                <label for="type_n">Type:</label>
                <select id="type_n" name="type_n" required>
                    <option value="0">Type 0</option>
                    <option value="1">Type 1</option>
                    <!-- Add more options as needed -->
                </select>
            </div>

            <div class="form-group">
                <label for="region_n">Region:</label>
                <select id="region_n" name="region_n" required>
                    <option value="0">Region 0</option>
                    <option value="1">Region 1</option>
                    <!-- Add more options as needed -->
                </select>
            </div>

            <div class="form-group">
                <label for="sub_region_n">Sub-region:</label>
                <select id="sub_region_n" name="sub_region_n" required>
                    <option value="0">Sub-region 0</option>
                    <option value="1">Sub-region 1</option>
                    <!-- Add more options as needed -->
                </select>
            </div>

            <button type="submit">Predict</button>
        </form>
    </div>
</body>
</html>


# templates/prediction.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Prediction Result</title>
    <style>
        body {
            font-family: 'Arial', sans-serif;
            background-color: #f4f4f4;
            margin: 0;
            padding: 0;
            display: flex;
            align-items: center;
            justify-content: center;
            height: 100vh;
        }

        .container {
            background-color: #fff;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
            text-align: center;
        }

        h1 {
            color: #4CAF50;
        }

        .result {
            font-size: 24px;
            margin-top: 20px;
        }

        a {
            display: inline-block;
            margin-top: 20px;
            text-decoration: none;
            color: #4CAF50;
            font-weight: bold;
        }

        a:hover {
            text-decoration: underline;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Prediction Result</h1>
        <div class="result">
            <p>The predicted house price is: {{ prediction }}</p>
        </div>
        <a href="/">Go back to the main page</a>
    </div>
</body>
</html>
}

The problem is that the Type, Region and Sub-region fields show numbers (Type 0, Region 1, and so on) instead of the actual type or region names. How can I solve this?
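
One common approach, sketched here under assumptions: keep a separate fitted `LabelEncoder` per categorical column (saved with joblib alongside the model), use `encoder.classes_` as the visible option text in each `<select>`, and convert the submitted name back to its integer code with `encoder.transform` before predicting. The encoders and names below are illustrative stand-ins, the form field names `type`, `region` and `sub_region` are hypothetical (they differ from the `*_n` names above because the options would submit names, not codes), and a plain dict stands in for Flask's `request.form`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical encoders; in the project these would be the fitted, saved per-column encoders
encoders = {
    "Type": LabelEncoder().fit(["Apartment", "Plot"]),
    "Region": LabelEncoder().fit(["Banani", "Mirpur", "Uttara"]),
    "Sub-region": LabelEncoder().fit(["Road No 4", "Section 11", "Sector 10"]),
}


def dropdown_options(encoder: LabelEncoder) -> list:
    """Names to show in a <select>; each option submits the name itself."""
    return encoder.classes_.tolist()


def form_to_features(form: dict, encoders: dict) -> pd.DataFrame:
    """Convert submitted names back to the integer codes the model was trained on."""
    return pd.DataFrame({
        "No. Beds": [float(form["no_of_beds"])],
        "No. Baths": [float(form["no_of_baths"])],
        "Area": [float(form["area"])],
        "Type_n": encoders["Type"].transform([form["type"]]),
        "Region_n": encoders["Region"].transform([form["region"]]),
        "Sub-region_n": encoders["Sub-region"].transform([form["sub_region"]]),
    })


# A dict stands in for Flask's request.form
form = {"no_of_beds": "3", "no_of_baths": "3", "area": "1300",
        "type": "Apartment", "region": "Uttara", "sub_region": "Sector 10"}
print(dropdown_options(encoders["Region"]))  # ['Banani', 'Mirpur', 'Uttara']
print(form_to_features(form, encoders))
```

In Flask, the lists could be passed with `render_template('index.html', regions=dropdown_options(encoders['Region']), ...)` and rendered with a Jinja loop such as `{% for r in regions %}<option value="{{ r }}">{{ r }}</option>{% endfor %}`; `predict()` would then call `form_to_features(request.form, encoders)`.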