Predicting Gurgaon City House Prices
Why Are We Building This?
Gurgaon, a rapidly growing city in India, has seen a sharp rise in real estate development over the past decade. With its proximity to Delhi, booming IT hubs, and modern infrastructure, Gurgaon has become a major attraction for both homebuyers and investors. However, the real estate market here is highly dynamic and often difficult to assess without proper data-driven tools.

We’re building a Gurgaon house price prediction model to help:

Understand how factors like location, size, number of rooms, and amenities affect property prices in Gurgaon.
Assist buyers in identifying fair prices based on historical trends.
Help sellers estimate an appropriate asking price.
Empower real estate agents and platforms to improve recommendations and negotiations.
How Will We Build It?
While we don’t have access to a large, clean dataset of house prices in Gurgaon right now, we will use a well-known, largely clean public dataset (the California housing dataset) as a proxy. This will allow us to build, test, and evaluate a working model with real-world variables such as:

Median income of the area
Proximity to the ocean (the ocean_proximity category)
Number of rooms and bedrooms
Latitude and longitude
Population and number of households
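In the standard housing.csv file, total_rooms and population are totals for a whole block group, so a large block and a small block are not directly comparable. The sketch below normalizes them per household. It uses a few illustrative rows (not read from the file), assumes the usual housing.csv column names, and, because the file has no area column, treats population per household only as a rough stand-in for density.

```python
import pandas as pd

# Illustrative rows using housing.csv column names; the values are made up.
blocks = pd.DataFrame({
    "total_rooms": [1000, 3000],
    "population": [700, 1500],
    "households": [250, 500],
})

# total_rooms and population are block-group totals, so dividing by the number
# of households gives per-household figures that can be compared across blocks.
blocks["rooms_per_household"] = blocks["total_rooms"] / blocks["households"]
blocks["population_per_household"] = blocks["population"] / blocks["households"]

print(blocks[["rooms_per_household", "population_per_household"]])
```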
We’ll treat this as a simulation: we pretend the California data describes Gurgaon, and that we are building the model for a neighborhood near where you and I live or work.

Once the model is developed and understood, we can later adapt the same approach to real Gurgaon data when available, using the same techniques and logic.
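One way to make the approach reusable once real Gurgaon data is available is to wrap preprocessing and the model in a single scikit-learn Pipeline, so only the column names change. This is a minimal sketch under stated assumptions: build_price_model is a hypothetical helper, the columns sector, area_sqft, bedrooms and price_lakh are hypothetical, and the rows are randomly generated rather than real listings.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_price_model(df, target, categorical_cols):
    """Hypothetical helper: fit a price regressor on any table with a numeric target."""
    X = df.drop(columns=[target])
    y = df[target]
    numeric_cols = [c for c in X.columns if c not in categorical_cols]

    preprocess = ColumnTransformer([
        # Fill missing numeric values with the column median
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        # One-hot encode text columns; categories unseen during fit become all zeros
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("regressor", RandomForestRegressor(n_estimators=100, random_state=42)),
    ])
    return model.fit(X, y)


# Randomly generated, Gurgaon-style toy rows (hypothetical columns, not real data)
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "sector": rng.choice(["Sector 56", "Sector 45", "Sector 29"], size=n),
    "area_sqft": rng.integers(800, 3000, size=n).astype(float),
    "bedrooms": rng.integers(1, 5, size=n),
})
toy.loc[0, "area_sqft"] = np.nan  # one missing value, handled by the imputer
toy["price_lakh"] = toy["area_sqft"].fillna(1500) * 0.08 + toy["bedrooms"] * 5

model = build_price_model(toy, target="price_lakh", categorical_cols=["sector"])
new_home = pd.DataFrame({"sector": ["Sector 56"], "area_sqft": [1800.0], "bedrooms": [3]})
print(model.predict(new_home))
```

Keeping imputation and encoding inside the pipeline means the same fitted object handles new rows at prediction time, and the statistics used for imputation come only from the data passed to fit.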

In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv("housing.csv")  # Make sure the file name has .csv extension

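# Split data into features (X) and target (y)
# NOTE: in housing.csv the last column is ocean_proximity, so this cell classifies
# each district's ocean-proximity category; it does not predict house prices
X = data.iloc[:, :-1]  # all columns except the last one
y = data.iloc[:, -1]   # last column (ocean_proximity) as the class label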

# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Random Forest Classifier model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Print results
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)


Accuracy: 0.9740794573643411

Confusion Matrix:
 [[1771    8    0    4   12]
 [  24 1300    0    0    0]
 [   0    0    0    0    1]
 [   2    0    0  430    4]
 [  42    0    0   10  520]]

Classification Report:
               precision    recall  f1-score   support

   <1H OCEAN       0.96      0.99      0.97      1795
      INLAND       0.99      0.98      0.99      1324
      ISLAND       0.00      0.00      0.00         1
    NEAR BAY       0.97      0.99      0.98       436
  NEAR OCEAN       0.97      0.91      0.94       572

    accuracy                           0.97      4128
   macro avg       0.78      0.77      0.78      4128
weighted avg       0.97      0.97      0.97      4128



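The ISLAND row in the report has a support of 1 and the model never predicts ISLAND, so its precision is 0/0. scikit-learn reports such undefined scores as 0.00 and, by default, emits an UndefinedMetricWarning; the single ISLAND row is also what pulls the macro average down to 0.78. A small sketch with made-up labels showing how the zero_division parameter of classification_report sets that value explicitly and silences the warning:

```python
from sklearn.metrics import classification_report

# Made-up labels: "ISLAND" appears once in y_true but is never predicted
y_true = ["INLAND", "INLAND", "NEAR BAY", "ISLAND"]
y_pred = ["INLAND", "INLAND", "NEAR BAY", "NEAR BAY"]

# zero_division=0 returns 0.0 for the undefined precision without a warning
report = classification_report(y_true, y_pred, zero_division=0, output_dict=True)
print(report["ISLAND"])
print(report["NEAR BAY"])
```

With only one ISLAND example in the test set, its 0.00 scores say very little about the model; the weighted average is the more representative summary here.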


In [3]:
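import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv("housing.csv")  # Ensure the filename and path are correct

# The last column (ocean_proximity) holds text labels such as 'NEAR OCEAN', so using
# it as a regression target raised "ValueError: could not convert string to float".
# Predict the house value instead (column names assume the standard housing.csv layout).
y = data["median_house_value"]                 # Target: median house value
X = data.drop(columns=["median_house_value"])  # Features: all other columns

# One-hot encode the text column so every feature is numeric
X = pd.get_dummies(X, columns=["ocean_proximity"])

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# total_bedrooms has missing values; fill them with the training-set median
bedroom_median = X_train["total_bedrooms"].median()
X_train = X_train.fillna({"total_bedrooms": bedroom_median})
X_test = X_test.fillna({"total_bedrooms": bedroom_median})

# Create and train the Random Forest Regressor
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print evaluation metrics
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", mse ** 0.5)
print("R-squared (R² Score):", r2)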
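The regression cell evaluates the model with mean squared error and R². A small worked sketch, using made-up house values, of what the two metrics compute and how they relate to the scikit-learn functions used above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up house values (in dollars) and model predictions
y_true = np.array([200_000.0, 300_000.0, 400_000.0])
y_pred = np.array([210_000.0, 290_000.0, 420_000.0])

# MSE: mean of the squared errors, in squared dollars
mse_manual = np.mean((y_true - y_pred) ** 2)
# RMSE: square root of MSE, back in dollars
rmse = np.sqrt(mse_manual)
# R²: 1 - (residual sum of squares / total sum of squares around the mean)
r2_manual = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mse_manual, mean_squared_error(y_true, y_pred))
print(rmse)
print(r2_manual, r2_score(y_true, y_pred))
```

Here the MSE is 2 × 10^8 squared dollars while the RMSE is about 14,142 dollars, which is why RMSE is usually easier to read next to actual house prices. An R² of 0.97 means the predictions explain 97% of the variance around the mean value.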