# Housing Data Analysis – Sushant Dumbre
## Introduction
**Purpose:** Analyze a housing dataset to extract insights and answer key questions.

**Dataset:** Contains information about houses, prices, and other attributes.

**Objective:** Understand patterns in the data, explore relationships, and summarize insights.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
# Replace 'housing.csv' with your dataset file
df = pd.read_csv('housing.csv')

# Display first 5 rows (print explicitly: only the last expression in a cell is shown automatically)
print(df.head())

# Check dataset shape and info
print('Shape of dataset:', df.shape)
df.info()
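
The later cells assume the file has columns named `Price`, `SqFt`, and `Bedrooms`. Since the dataset file is meant to be swapped in, a quick check right after loading can catch a mismatch early. The following is a minimal sketch; the helper name `missing_columns` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Column names the later cells rely on; adjust to match your own file.
REQUIRED_COLUMNS = ["Price", "SqFt", "Bedrooms"]

def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from frame."""
    return [name for name in required if name not in frame.columns]

# Tiny made-up frame for illustration (not taken from housing.csv)
sample = pd.DataFrame({"Price": [250000, 310000], "SqFt": [1400, 1850]})
print(missing_columns(sample))  # ['Bedrooms']
```

In the notebook itself, calling `missing_columns(df)` after `pd.read_csv` and stopping if the list is non-empty would avoid a `KeyError` several cells later.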

In [None]:
# Check for missing values
print('Missing values:\n', df.isnull().sum())

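# Fill missing numeric values with the column median
# (assign the result back; fillna(inplace=True) on a column selection is unreliable in recent pandas)
numeric_cols = df.select_dtypes(include=np.number).columns
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].median())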

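# Fill missing categorical values with the most frequent value (mode)
categorical_cols = df.select_dtypes(include='object').columns
for col in categorical_cols:
    modes = df[col].mode()
    if not modes.empty:  # skip columns that are entirely missing
        df[col] = df[col].fillna(modes.iloc[0])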

# Check cleaned data
df.info()
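
When imputing, it helps to keep a record of how much data was filled in, because median and mode fills can hide how incomplete a column was. The sketch below wraps the same median/mode strategy in a function that works on a copy and reports the missing counts it found. The function name `impute_with_report` and the `Location` column are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def impute_with_report(frame: pd.DataFrame):
    """Return (cleaned copy, missing-value counts found before imputation)."""
    cleaned = frame.copy()  # leave the caller's DataFrame untouched
    missing_before = cleaned.isnull().sum()
    for col in cleaned.columns:
        if not cleaned[col].isnull().any():
            continue
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            fill_value = cleaned[col].median()
        else:
            modes = cleaned[col].mode()
            if modes.empty:  # column is entirely missing; nothing to infer
                continue
            fill_value = modes.iloc[0]
        cleaned[col] = cleaned[col].fillna(fill_value)
    return cleaned, missing_before[missing_before > 0]

# Made-up sample: one missing price, one missing (hypothetical) location
sample = pd.DataFrame({
    "Price": [200000.0, np.nan, 300000.0],
    "Location": ["Urban", "Urban", None],
})
cleaned, missing_before = impute_with_report(sample)
print(missing_before)
print(cleaned)
```

Working on a copy keeps the raw data available for comparison, which is useful if the imputation choice needs to be revisited later.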

In [None]:
# Summary statistics for numeric columns (printed explicitly so they appear mid-cell)
print(df.describe())

# Mode for categorical columns
for col in categorical_cols:
    print(f'Mode of {col}: {df[col].mode()[0]}')

# Example: Check min, max, median
print('Median house price:', df['Price'].median())
print('Min house price:', df['Price'].min())
print('Max house price:', df['Price'].max())

In [None]:
# Histogram of house prices
plt.figure(figsize=(8,5))
sns.histplot(df['Price'], bins=30, kde=True)
plt.title('Distribution of House Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

# Boxplot to check outliers
plt.figure(figsize=(8,5))
sns.boxplot(x=df['Price'])
plt.title('Boxplot of House Prices')
plt.show()

# Scatterplot: Price vs Square Footage
plt.figure(figsize=(8,5))
sns.scatterplot(x='SqFt', y='Price', data=df)
plt.title('Price vs Square Footage')
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.show()

# Countplot for a categorical feature (example: Bedrooms)
plt.figure(figsize=(8,5))
sns.countplot(x='Bedrooms', data=df)
plt.title('Count of Houses by Bedrooms')
plt.show()

# Heatmap of pairwise correlations (numeric columns only; text columns would raise an error in pandas 2.x)
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
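
The boxplot above shows outliers only visually. To count or inspect them, one option is the 1.5 × IQR rule, which is the convention boxplot whiskers use by default. This is a minimal sketch with made-up prices; the helper name `iqr_outliers` is an assumption, and other outlier definitions may suit a given dataset better.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up prices with one obviously extreme value
prices = pd.Series([100, 110, 120, 130, 140, 150, 1000], name="Price")
outliers = iqr_outliers(prices)
print(outliers.tolist())  # [1000]
```

In the notebook, `iqr_outliers(df['Price'])` would return the flagged rows' prices, which can then be reviewed before deciding whether to keep, cap, or drop them.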

## Analysis / Insights
- The scatterplot shows how house price varies with square footage; note the direction and rough strength of the trend.
- Houses with more bedrooms tend to have higher prices.
- The correlation heatmap shows which numeric features are most strongly related to price.
- Describe any other trends you notice here.
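
The points above can be backed with numbers rather than read off the plots. The sketch below ranks numeric features by the strength of their Pearson correlation with price and compares median price across bedroom counts. The function names and the sample values are illustrative assumptions; only the column names `Price`, `SqFt`, and `Bedrooms` come from the notebook.

```python
import pandas as pd

def rank_price_correlations(frame: pd.DataFrame, target: str = "Price") -> pd.Series:
    """Correlation of each numeric column with target, strongest first."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Sort by absolute value so strong negative relationships rank high too
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)

def median_price_by(frame: pd.DataFrame, group_col: str = "Bedrooms",
                    target: str = "Price") -> pd.Series:
    """Median of target within each value of group_col."""
    return frame.groupby(group_col)[target].median()

# Made-up sample data for illustration only
sample = pd.DataFrame({
    "Price":    [100, 150, 200, 250, 300, 350],
    "SqFt":     [1000, 1500, 2000, 2500, 3000, 3500],
    "Bedrooms": [2, 2, 3, 3, 4, 4],
})
ranked = rank_price_correlations(sample)
medians = median_price_by(sample)
print(ranked)
print(medians)
```

The median is used for the group comparison because it is less sensitive to a few very expensive houses than the mean. Correlation here describes association only and does not by itself show that a feature drives price.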

## Conclusion
- Summarize the main insights from your analysis.
- Highlight key patterns or trends.
- Optionally, give recommendations based on the analysis (e.g., which types of houses are in high demand).

In [None]:
# Optional: save the cleaned dataset (uncomment to run)
# df.to_csv('cleaned_housing.csv', index=False)