## House Price Prediction Notebook

Problem Statement:

In this project, we aim to predict house prices based on the size of the lot. Accurate predictions of house prices can assist real estate agents, buyers, and sellers in making informed decisions.



Objective:

The goal is to build a machine learning model that accurately predicts house prices based on the `LotArea` feature.


## Data Loading and Exploration


Loading the Dataset:


In [1]:
import pandas as pd

# Load the dataset
data = pd.read_csv('house_price_by_area.csv')

# Preview the first few rows
data.head()



Data Overview:

In [None]:
# Summary statistics (printed explicitly, since a cell only displays its last expression)
print(data.describe())

# Check for missing values
print(data.isnull().sum())
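
Before modelling, it can help to confirm that the two columns the rest of the notebook relies on are present, numeric, and complete. The sketch below is one possible way to do that. `check_required_columns` is a hypothetical helper (not part of pandas), and the small frame is made-up data standing in for the CSV.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_required_columns(df, required=("LotArea", "SalePrice")):
    """Return a list of problems found; an empty list means the frame looks usable.

    Hypothetical helper for illustration only.
    """
    problems = []
    for col in required:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not is_numeric_dtype(df[col]):
            problems.append(f"non-numeric column: {col}")
        elif df[col].isnull().any():
            n_missing = int(df[col].isnull().sum())
            problems.append(f"{col} has {n_missing} missing value(s)")
    return problems


# Made-up rows standing in for house_price_by_area.csv (assumption)
sample = pd.DataFrame({
    "LotArea": [8450, 9600, None, 11250],
    "SalePrice": [208500, 181500, 223500, 140000],
})
problems = check_required_columns(sample)
print(problems)
```

An empty list would mean both columns passed all three checks; how to handle any reported problem (dropping rows, imputing, fixing the source file) is left open here.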


## Exploratory Data Analysis (EDA)
Visualizing the Relationship between LotArea and SalePrice:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot
plt.figure(figsize=(8, 6))
sns.scatterplot(x='LotArea', y='SalePrice', data=data)
plt.title('Scatter Plot of LotArea vs SalePrice')
plt.show()
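
A scatter plot shows the shape of the relationship; a correlation coefficient puts a number on how linear it is. The sketch below computes Pearson's r with `Series.corr`. The data is synthetic (an assumed linear trend plus noise), so the printed value says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): price rises roughly linearly with lot area
rng = np.random.default_rng(0)
lot_area = rng.uniform(2000, 20000, size=200)
sale_price = 50000 + 10 * lot_area + rng.normal(0, 30000, size=200)
demo = pd.DataFrame({"LotArea": lot_area, "SalePrice": sale_price})

# Pearson correlation coefficient between the two columns (always in [-1, 1])
r = demo["LotArea"].corr(demo["SalePrice"])
print(f"Pearson r: {r:.3f}")
```

On the real data the same call would be `data['LotArea'].corr(data['SalePrice'])`. A value near zero would suggest that a straight-line model on `LotArea` alone has little to work with.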


Checking for Outliers:

In [None]:
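# Boxplots to check for outliers; separate axes because area and price are in different units
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
sns.boxplot(y=data['LotArea'], ax=axes[0])
sns.boxplot(y=data['SalePrice'], ax=axes[1])
plt.tight_layout()
plt.show()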
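
The points a boxplot draws beyond its whiskers can also be listed numerically. The sketch below applies the 1.5 × IQR rule, which is the default whisker convention in matplotlib and seaborn boxplots. `iqr_outlier_mask` is a hypothetical helper and the lot areas are made up.

```python
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration only.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up lot areas with one very large lot (assumption)
lot_area = pd.Series(
    [7000, 8000, 8500, 9000, 9500, 10000, 11000, 150000], name="LotArea"
)
mask = iqr_outlier_mask(lot_area)

# Show only the flagged values
print(lot_area[mask])
```

Flagging a value is not the same as deciding to drop it; a very large lot may be a legitimate sale that the model should still see.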


## Data Preprocessing
Train-Test Split:

In [None]:
from sklearn.model_selection import train_test_split

# Select the feature and the target
X = data[['LotArea']]
y = data['SalePrice']

# Split the dataset into training and testing sets (80/20 split)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


Normalization/Scaling:

In [None]:
from sklearn.preprocessing import StandardScaler

# Initialize the scaler
scaler = StandardScaler()

# Fit the scaler on the training set only, so test-set statistics do not leak into training
X_train = scaler.fit_transform(X_train_raw)

# Apply the same transformation to the test set
X_test = scaler.transform(X_test_raw)
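
One way to keep the scaling statistics tied to the training rows is to wrap the scaler and the model in a scikit-learn pipeline. This is a sketch of that alternative, not the notebook's own method. The data is synthetic, and the names `X_demo`, `y_demo`, and `pipe` are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), shaped like data[['LotArea']] and data['SalePrice']
rng = np.random.default_rng(42)
X_demo = rng.uniform(2000, 20000, size=(100, 1))
y_demo = 50000 + 10 * X_demo[:, 0] + rng.normal(0, 20000, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() learns the scaler's mean and standard deviation from the training
# rows only; score() and predict() reuse those statistics on new data
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

fitted_scaler = pipe.named_steps["standardscaler"]
print("Scaler mean learned from training rows:", fitted_scaler.mean_[0])
print("Test R-squared:", pipe.score(X_te, y_te))
```

The pipeline also makes it harder to forget the scaling step later, since `pipe.predict` takes raw `LotArea` values directly.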


## Model Building
Linear Regression Model:

In [None]:
from sklearn.linear_model import LinearRegression

# Initialize the model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)
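
Because the model is trained on a standardized feature, its coefficient is the price change per one standard deviation of `LotArea`, not per unit of area. The sketch below shows how the slope and intercept could be converted back to original units using the scaler's `scale_` and `mean_` attributes. The data is a made-up exact line (intercept 50000, slope 10) so the result is easy to check, and the `demo_` names are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up data on an exact line (assumption): price = 50000 + 10 * lot_area
lot_area = np.array([[4000.0], [8000.0], [12000.0], [16000.0]])
sale_price = 50000 + 10 * lot_area[:, 0]

demo_scaler = StandardScaler()
demo_model = LinearRegression().fit(demo_scaler.fit_transform(lot_area), sale_price)

# y = b0 + b1 * (x - mean) / std  =>  slope per unit = b1 / std,
# intercept in raw units = b0 - (b1 / std) * mean
slope_per_unit = demo_model.coef_[0] / demo_scaler.scale_[0]
intercept_raw = demo_model.intercept_ - slope_per_unit * demo_scaler.mean_[0]

print(f"Price change per unit of lot area: {slope_per_unit:.4f}")
print(f"Intercept in original units: {intercept_raw:.2f}")
```

Applied to the notebook's own objects, the same arithmetic would use `model.coef_[0]`, `model.intercept_`, `scaler.scale_[0]`, and `scaler.mean_[0]`.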


## Model Evaluation
Making Predictions:

In [None]:
# Predicting on the test set
y_pred = model.predict(X_test)


Evaluating the Model:

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

# Calculate MSE and R-squared
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error (MSE): {mse}')
print(f'R-squared: {r2}')
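
MSE is expressed in squared price units, which makes it hard to read directly. As a supplementary sketch, the block below derives RMSE (the square root of MSE) and MAE, both of which are in the same units as `SalePrice`. The four actual and predicted prices are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up actual and predicted prices (assumption), for illustration only
y_true_demo = np.array([200000.0, 150000.0, 300000.0, 250000.0])
y_pred_demo = np.array([210000.0, 140000.0, 280000.0, 250000.0])

mse_demo = mean_squared_error(y_true_demo, y_pred_demo)
rmse_demo = np.sqrt(mse_demo)  # square root brings the error back to price units
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)

print(f"MSE:  {mse_demo:.2f}")
print(f"RMSE: {rmse_demo:.2f}")
print(f"MAE:  {mae_demo:.2f}")
```

RMSE penalizes large misses more heavily than MAE, so a wide gap between the two can hint that a few predictions are far off.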


Visualizing Predictions vs Actual Values:

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.xlabel('Actual SalePrice')
plt.ylabel('Predicted SalePrice')
plt.title('Actual vs Predicted SalePrice')
plt.show()
