# Exploratory Data Analysis for House Price Prediction

This notebook performs exploratory data analysis (EDA) on the housing dataset to understand its features, their distributions, and the relationships between them. The findings guide preprocessing and linear regression modeling.
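
The cells below read from a DataFrame named `df` with at least the columns `price`, `RM`, and `LSTAT`, but the notebook does not show how it is created. The following is a minimal sketch of one way to load and sanity-check such a frame, assuming the data lives in a CSV file. The names `load_housing` and `REQUIRED_COLUMNS` and the sample rows are illustrative only and are not part of the original notebook.

```python
import io

import pandas as pd

# Columns that the later cells index by name (taken from the plots in this notebook).
REQUIRED_COLUMNS = ("price", "RM", "LSTAT")


def load_housing(source):
    """Read a CSV (path or file-like object) and check the expected columns exist.

    Hypothetical helper: the notebook does not show how `df` is created.
    """
    frame = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return frame


# Tiny made-up sample, used only to show the expected layout of the input.
sample_csv = io.StringIO(
    "RM,LSTAT,price\n"
    "6.5,4.9,24.0\n"
    "5.9,12.4,19.5\n"
    "7.1,3.1,33.2\n"
)
sample_df = load_housing(sample_csv)
print(sample_df.shape)
```

In the notebook itself this would become something like `df = load_housing("path/to/your/data.csv")`, where the path is a placeholder. If your dataset uses different column names, adjust `REQUIRED_COLUMNS` and the plotting cells to match.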

## Import Libraries
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
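%matplotlib inline
```

## Explore the Data
```python
# `df` is assumed to already hold the housing data as a pandas DataFrame;
# the loading step is not shown in this notebook.

# Shape, data types, missing values, and summary statistics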
print("Dataset Shape:", df.shape)
print("\nData Types:\n", df.dtypes)
print("\nMissing Values:\n", df.isnull().sum())
print("\nSummary Statistics:\n", df.describe())


# Distribution of house prices
plt.figure(figsize=(8, 6))
sns.histplot(df['price'], kde=True, bins=30)
plt.title('Distribution of House Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

# Scatter plot: Price vs. Number of Rooms (RM in Boston Housing, adjust if needed)
plt.figure(figsize=(8, 6))
plt.scatter(df['RM'], df['price'], alpha=0.5)
plt.title('Price vs. Number of Rooms')
plt.xlabel('Number of Rooms (RM)')
plt.ylabel('Price')
plt.show()

# Scatter plot: Price vs. Lower Status Population (LSTAT, adjust if needed)
plt.figure(figsize=(8, 6))
plt.scatter(df['LSTAT'], df['price'], alpha=0.5)
plt.title('Price vs. Lower Status Population (LSTAT)')
plt.xlabel('Lower Status Population (LSTAT)')
plt.ylabel('Price')
plt.show()

# Correlation heatmap
plt.figure(figsize=(10, 8))
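# Restrict to numeric columns: corr() can raise on non-numeric data in pandas 2.x
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='coolwarm', fmt='.2f')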
plt.title('Correlation Matrix')
plt.show()

# Skewness of numerical features (non-numeric columns are excluded explicitly)
print("\nSkewness of Features:\n", df.select_dtypes(include='number').skew())
```