# Boston Housing Analysis Visualizations

This notebook explores how individual features in the Boston Housing dataset relate to median home value (`MEDV`, in thousands of dollars). The plots are intended to help real estate professionals, policymakers, and data analysts see which factors move together with housing prices in the Boston area.

In [None]:

# Import necessary libraries for data analysis and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name; sns.set is an alias for it)
sns.set_theme(style="whitegrid")

# Load the Boston Housing dataset
data = pd.read_csv('Data/HousingData.csv')
data.head()
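
Before plotting, it can help to check whether any columns have missing values, since correlations and plots silently skip them. The notebook does not say whether `Data/HousingData.csv` contains gaps, so the sketch below uses a toy frame with made-up numbers; `summarize_missing` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing-value counts and percentages, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Toy frame with made-up values, used only to show the shape of the output
toy = pd.DataFrame({
    "RM": [6.5, np.nan, 5.9, 7.1],
    "LSTAT": [4.9, 9.1, np.nan, np.nan],
    "MEDV": [24.0, 21.6, 34.7, 33.4],
})
missing_summary = summarize_missing(toy)
print(missing_summary)
```

On the real data, the same call would be `summarize_missing(data)`.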


## Correlation Heatmap

This heatmap displays the pairwise Pearson correlations between all numeric variables. The row and column for `MEDV` (median home value) are the ones to read first: features with a strong positive or negative correlation, such as the number of rooms (`RM`) and the percentage of lower-status population (`LSTAT`), are candidate predictors of housing prices.

In [None]:

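# Plot the correlation heatmap (numeric columns only, so re-running this cell
# after the categorical *_Group columns are added does not raise an error)
plt.figure(figsize=(12, 8))
sns.heatmap(data.corr(numeric_only=True), annot=True, fmt='.2f', cmap='coolwarm')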
plt.title('Correlation Heatmap of Boston Housing Features')
plt.show()
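
A full heatmap can be hard to scan for a single target. As a rough complement, the sketch below ranks features by the absolute value of their correlation with `MEDV`. The helper name `rank_by_target_correlation` and the toy values are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

def rank_by_target_correlation(df: pd.DataFrame, target: str = "MEDV") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")   # ignore text/categorical columns
    corr = numeric.corr()[target].drop(target)     # drop the target's self-correlation
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)

# Toy frame with made-up values, used only to demonstrate the call
toy = pd.DataFrame({
    "RM": [5.0, 6.0, 7.0, 8.0],
    "LSTAT": [20.0, 15.0, 12.0, 4.0],
    "MEDV": [15.0, 20.0, 26.0, 40.0],
    "GROUP": ["a", "b", "a", "b"],  # non-numeric column, skipped by the helper
})
ranked = rank_by_target_correlation(toy)
print(ranked)
```

On the real data, `rank_by_target_correlation(data)` would list the features most strongly associated with `MEDV` at the top.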


## Distribution of Home Prices (Histogram)

This histogram shows the distribution of median home values (`MEDV`). Its shape indicates whether prices are roughly symmetric or skewed, which matters when choosing summary statistics and when deciding whether to transform the target before modeling.

In [None]:

# Plot the distribution of home prices
plt.figure(figsize=(10, 6))
sns.histplot(data['MEDV'], bins=30, kde=True)
plt.title('Distribution of Home Prices (MEDV)')
plt.xlabel('Median Value of Homes ($1000s)')
plt.ylabel('Frequency')
plt.show()
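
Skew can also be checked numerically rather than judged by eye. The sketch below compares the mean with the median and reports the sample skewness; a mean above the median together with a positive skewness suggests a right-skewed distribution. `describe_shape` is a hypothetical helper and the values are made up for illustration.

```python
import pandas as pd

def describe_shape(values: pd.Series) -> dict:
    """Summarize the center and asymmetry of a numeric series, ignoring NaNs."""
    clean = values.dropna()
    return {
        "mean": clean.mean(),
        "median": clean.median(),
        "skew": clean.skew(),  # > 0 means a longer right tail
    }

# Toy series with one large value to create a right tail (made-up numbers)
toy_medv = pd.Series([10.0, 12.0, 13.0, 15.0, 50.0])
shape = describe_shape(toy_medv)
print(shape)
```

On the real data, the call would be `describe_shape(data['MEDV'])`.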


## Scatter Plot: Number of Rooms (RM) vs. Price (MEDV)

This scatter plot shows the relationship between `RM` (average number of rooms per dwelling) and `MEDV` (median home value). A positive association is expected: areas whose homes have more rooms tend to have higher median values, which makes dwelling size a likely key factor in property valuation.

In [None]:

# Scatter plot to show relationship between number of rooms and price
plt.figure(figsize=(10, 6))
sns.scatterplot(x='RM', y='MEDV', data=data)
plt.title('Relationship Between Number of Rooms and Price')
plt.xlabel('Average Number of Rooms')
plt.ylabel('Median Price ($1000s)')
plt.show()
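
To put a number on the trend in the scatter plot, one option is a least-squares line plus the Pearson correlation. The sketch below uses `np.polyfit` on a toy frame with made-up, perfectly linear values; `fit_line` is a hypothetical helper, and a straight line is only an assumption about the shape of the relationship.

```python
import numpy as np
import pandas as pd

def fit_line(df: pd.DataFrame, x: str = "RM", y: str = "MEDV"):
    """Return (slope, intercept, pearson_r) for y regressed on x, dropping NaN rows."""
    pair = df[[x, y]].dropna()
    slope, intercept = np.polyfit(pair[x], pair[y], deg=1)
    r = pair[x].corr(pair[y])  # Pearson correlation by default
    return slope, intercept, r

# Toy frame: MEDV = 5 * RM - 10 exactly, plus one row with a missing RM
toy = pd.DataFrame({
    "RM": [5.0, 6.0, 7.0, 8.0, np.nan],
    "MEDV": [15.0, 20.0, 25.0, 30.0, 22.0],
})
slope, intercept, r = fit_line(toy)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

Here the slope reads as the change in median value (in $1000s) per additional room on average; on the real data the call would be `fit_line(data)`.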


## Box Plot: Prices by River Proximity (CHAS)

This box plot compares median home values for tracts that border the Charles River (`CHAS=1`) with those that do not (`CHAS=0`). Proximity to the river may act as a premium feature that adds value to a property.

In [None]:

# Box plot to show effect of river proximity on home prices
plt.figure(figsize=(8, 6))
sns.boxplot(x='CHAS', y='MEDV', data=data)
plt.title('Housing Prices by River Proximity')
plt.xlabel('Charles River Proximity (1 = Near, 0 = Not Near)')
plt.ylabel('Median Price ($1000s)')
plt.show()
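
A box plot does not show how many observations sit behind each box, which matters when one group is much smaller than the other. A minimal sketch of a per-group summary follows, using made-up values rather than the real data.

```python
import pandas as pd

# Toy frame with made-up values; CHAS is 1 for river tracts, 0 otherwise
toy = pd.DataFrame({
    "CHAS": [0, 0, 0, 1, 1],
    "MEDV": [20.0, 22.0, 24.0, 30.0, 34.0],
})

# Count, median, and mean of MEDV for each CHAS value
chas_summary = toy.groupby("CHAS")["MEDV"].agg(["count", "median", "mean"])
print(chas_summary)
```

On the real data, replacing `toy` with `data` gives the group sizes alongside the medians drawn in the box plot.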


## Scatter Plot: Crime Rate (CRIM) vs. Price (MEDV)

This plot examines the relationship between `CRIM` (per capita crime rate) and `MEDV` (median home value). Higher crime rates are generally associated with lower home values, which is consistent with safety concerns reducing property demand and pricing.

In [None]:

# Scatter plot for crime rate versus home price
plt.figure(figsize=(10, 6))
sns.scatterplot(x='CRIM', y='MEDV', data=data)
plt.title('Relationship Between Crime Rate and Home Price')
plt.xlabel('Crime Rate per Capita')
plt.ylabel('Median Price ($1000s)')
plt.show()
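
When a few areas have crime rates far above the rest, the points bunch up near zero and a linear correlation can understate the relationship. One common remedy, shown here as a sketch on made-up values, is to compare the correlation before and after a `log1p` transform of `CRIM`. Whether the transform helps on the real data is something to check, not assume.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values spanning several orders of magnitude
toy = pd.DataFrame({
    "CRIM": [0.01, 0.1, 1.0, 10.0, 80.0],
    "MEDV": [35.0, 30.0, 24.0, 17.0, 8.0],
})

# Pearson correlation on the raw scale and on the log(1 + x) scale
r_raw = toy["CRIM"].corr(toy["MEDV"])
r_log = np.log1p(toy["CRIM"]).corr(toy["MEDV"])
print(f"raw r = {r_raw:.2f}, log1p r = {r_log:.2f}")
```

For the plot itself, calling `plt.xscale('log')` before `plt.show()` gives a similar view without changing the data.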


## Bar Plot: Average Price by Pupil-Teacher Ratio Groupings

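This bar plot shows the average home value for three `PTRATIO` (pupil-teacher ratio) groups: Low (up to 15), Medium (above 15 to 20), and High (above 20). A lower ratio means fewer pupils per teacher and is used here as a rough proxy for school quality, so areas in the Low group may command higher prices, reflecting the importance of education to families and buyers.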

In [None]:

# Bin PTRATIO into three groups; pd.cut intervals are right-inclusive: (0, 15], (15, 20], (20, 30]
data['PTRATIO_Group'] = pd.cut(data['PTRATIO'], bins=[0, 15, 20, 30], labels=['Low', 'Medium', 'High'])

plt.figure(figsize=(8, 6))
# errorbar=None replaces the deprecated ci=None (requires seaborn >= 0.12)
sns.barplot(x='PTRATIO_Group', y='MEDV', data=data, estimator=np.mean, errorbar=None)
plt.title('Average Home Price by Pupil-Teacher Ratio Group')
plt.xlabel('Pupil-Teacher Ratio Group')
plt.ylabel('Average Median Price ($1000s)')
plt.show()
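
The bar heights can be reproduced as a table, which also shows how many rows fall into each bin and where boundary values land. The sketch below applies the same bins to made-up values; note that `pd.cut` closes intervals on the right by default, so 15.0 is Low and 20.0 is Medium.

```python
import pandas as pd

# Toy frame with made-up values, including the bin edges 15.0 and 20.0
toy = pd.DataFrame({
    "PTRATIO": [13.0, 15.0, 17.0, 20.0, 21.0, 22.0],
    "MEDV": [40.0, 36.0, 25.0, 23.0, 18.0, 16.0],
})
toy["PTRATIO_Group"] = pd.cut(
    toy["PTRATIO"], bins=[0, 15, 20, 30], labels=["Low", "Medium", "High"]
)

# Row count and mean MEDV per group (observed=True keeps only groups that occur)
ptratio_summary = toy.groupby("PTRATIO_Group", observed=True)["MEDV"].agg(["count", "mean"])
print(ptratio_summary)
```

A group with very few rows would make its bar less trustworthy, which the count column makes visible.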


## Box Plot: Socioeconomic Status (LSTAT) Impact on Price

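Binning `LSTAT` (percentage of lower-status population) into groups shows how home values vary across socioeconomic brackets. Higher `LSTAT` values typically correlate with lower property values. Note that the group labels describe the `LSTAT` level, so 'Very Low' marks the areas with the smallest lower-status share.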

In [None]:

# Define LSTAT groups and plot their impact on price
data['LSTAT_Group'] = pd.cut(data['LSTAT'], bins=[0, 10, 20, 30, 40], labels=['Very Low', 'Low', 'Medium', 'High'])

plt.figure(figsize=(10, 6))
sns.boxplot(x='LSTAT_Group', y='MEDV', data=data)
plt.title('Home Price by LSTAT Group')
plt.xlabel('LSTAT Group (% Lower-Status Population)')
plt.ylabel('Median Price ($1000s)')
plt.show()
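
Fixed bins have a side effect worth checking: `pd.cut` assigns `NaN` to any value outside the bin range (and to missing inputs), and those rows then drop out of the box plot without warning. The sketch below counts unbinned rows and reports the per-group median on made-up values; the 45.0 entry is deliberately outside the 0 to 40 range.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: one out-of-range LSTAT and one missing LSTAT
toy = pd.DataFrame({
    "LSTAT": [5.0, 10.0, 15.0, 25.0, 35.0, 45.0, np.nan],
    "MEDV": [34.0, 30.0, 21.0, 14.0, 10.0, 8.0, 22.0],
})
toy["LSTAT_Group"] = pd.cut(
    toy["LSTAT"],
    bins=[0, 10, 20, 30, 40],
    labels=["Very Low", "Low", "Medium", "High"],
)

# Rows that received no group and would be excluded from the plot
unbinned = int(toy["LSTAT_Group"].isna().sum())

# Median MEDV per group, matching the center line of each box
lstat_medians = toy.groupby("LSTAT_Group", observed=True)["MEDV"].median()
print(f"unbinned rows: {unbinned}")
print(lstat_medians)
```

If the real data has unbinned rows, widening the last bin edge or using `pd.qcut` for quantile-based groups are two possible adjustments.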
