# Boston Housing Analysis

This project analyzes the Boston Housing dataset to determine factors affecting housing prices. The analysis includes data exploration, hypothesis testing, and regression modeling to derive meaningful insights. This is relevant for understanding real estate markets and identifying key indicators of property value.


## Import Libraries and Load Dataset

We start by importing essential libraries and loading the dataset.


In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm

# Set visualization style
sns.set(style="whitegrid")

# Load the Boston Housing dataset
data = pd.read_csv('Data/HousingData.csv')

# Display the first few rows
data.head()


## Data Overview

We explore the structure and summary statistics of the dataset to understand its characteristics.


In [None]:
# Data types and non-null counts (printed by info), then summary statistics
data.info()
data.describe()
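
Before running tests or regressions, it can help to check how many values are missing in each column, since some distributed copies of this dataset contain gaps. The helper below is a minimal sketch: the name `missing_summary` and the tiny `demo` frame are hypothetical and use made-up numbers, not values from `HousingData.csv`.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Made-up frame standing in for the real data (illustration only)
demo = pd.DataFrame({
    "CRIM": [0.1, None, 0.3, 0.2],
    "RM": [6.0, 6.5, 7.1, 5.9],
    "MEDV": [24.0, 21.6, None, None],
})
report = missing_summary(demo)
print(report)
```

In this notebook the same helper could be called as `missing_summary(data)`; columns with a nonzero count need to be handled (for example, by dropping rows) before they are passed to functions that do not accept NaN.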


## Correlation Analysis

A correlation heatmap shows the relationships between variables, helping us identify potential predictors for housing prices.


In [None]:
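plt.figure(figsize=(12, 8))
# Fix the color scale to [-1, 1] so zero correlation sits at the neutral midpoint,
# and round the annotations to two decimals to keep the cells readable
sns.heatmap(data.corr(), annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap of Boston Housing Features')
plt.show()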
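
A heatmap is easy to scan but hard to rank by eye. As a rough complement, the sketch below sorts features by the absolute value of their correlation with a target column. The function name `rank_by_correlation` and the `demo` frame are hypothetical, and the numbers are made up for illustration.

```python
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Correlations with `target`, ordered from strongest to weakest in absolute value."""
    numeric = df.select_dtypes(include="number")  # ignore non-numeric columns
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up numbers, not taken from the dataset
demo = pd.DataFrame({
    "RM": [1, 2, 3, 4, 5],
    "LSTAT": [5, 3, 4, 1, 2],
    "ZN": [1, 1, 2, 1, 1],
    "TOWN": ["a", "b", "c", "d", "e"],  # non-numeric, skipped
    "MEDV": [2, 4, 6, 8, 10],
})
ranking = rank_by_correlation(demo, "MEDV")
print(ranking)
```

Applied to the notebook's frame, `rank_by_correlation(data, 'MEDV')` would list candidate predictors in order of linear association with price. Sorting by absolute value keeps strong negative correlations near the top.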


## Hypothesis Testing

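We test whether `CRIM` (per-capita crime rate) and `MEDV` (median home value, in $1000s) are linearly correlated. The null hypothesis is that the population correlation is zero; a small p-value (for example, below 0.05) is evidence against it. A significant negative correlation would be consistent with neighborhood safety being reflected in property values.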


In [None]:
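# Hypothesis test: Correlation between CRIM (crime rate) and MEDV (price)
# pearsonr does not skip NaN by default, so drop rows missing either value first
crim_medv = data[['CRIM', 'MEDV']].dropna()
crim_price_corr, p_value = stats.pearsonr(crim_medv['CRIM'], crim_medv['MEDV'])
crim_price_corr, p_value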
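
To make the test less of a black box, the sketch below computes the Pearson coefficient by hand, together with the statistic t = r * sqrt((n - 2) / (1 - r^2)). Under the null hypothesis of zero correlation (and assuming roughly bivariate normal data), t follows a Student t distribution with n - 2 degrees of freedom. The function name `pearson_r_and_t` and the sample values are hypothetical.

```python
import numpy as np


def pearson_r_and_t(x, y):
    """Return (r, t, n) after dropping pairs where either value is NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    n = x.size
    dx, dy = x - x.mean(), y - y.mean()
    r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    # Test statistic for H0: population correlation is zero (undefined if |r| == 1)
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return r, t, n


# Made-up values; the last pair is dropped because x is NaN
r, t, n = pearson_r_and_t([1, 2, 3, 4, 5, np.nan], [2, 4, 5, 4, 5, 3])
print(f"r = {r:.4f}, t = {t:.4f}, n = {n}")
```

In the notebook, `pearson_r_and_t(data['CRIM'], data['MEDV'])` should give an `r` that agrees with `stats.pearsonr` on the same non-missing rows.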


## Simple Linear Regression

We use `RM` (average number of rooms per dwelling) to predict `MEDV` (median house price). The number of rooms is often a strong predictor of home value.


In [None]:
# Simple linear regression
X = data['RM']
y = data['MEDV']
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
model.summary()
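
For a single predictor, the fitted line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below is an illustration of that arithmetic, not a replacement for `sm.OLS`; the name `simple_ols` and the sample values are hypothetical.

```python
import numpy as np


def simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x; returns (intercept, slope, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    slope = (dx * dy).sum() / (dx ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    r_squared = 1 - (residuals ** 2).sum() / (dy ** 2).sum()
    return intercept, slope, r_squared


# Made-up rooms and prices, for illustration only
intercept, slope, r_squared = simple_ols([4, 5, 6, 7, 8], [10, 20, 25, 20, 25])
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, R^2 = {r_squared:.2f}")
```

On complete data, the intercept and slope should match the `const` and `RM` coefficients in the statsmodels summary, and with one predictor R² equals the squared Pearson correlation.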


## Multiple Linear Regression

We now use several predictors together to predict `MEDV`: `RM`, `LSTAT` (percent of lower status population), and `PTRATIO` (pupil-teacher ratio).


In [None]:
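# Multiple regression
X = data[['RM', 'LSTAT', 'PTRATIO']]
X = sm.add_constant(X)  # add the intercept column
# missing='drop' excludes rows with NaN in any predictor or in MEDV
model = sm.OLS(data['MEDV'], X, missing='drop').fit()
model.summary()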
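
Raw coefficients are hard to compare because the predictors are measured in different units. One common workaround, sketched here under the assumption that a plain least-squares fit is adequate, is to z-score every variable first so that each coefficient is expressed in standard deviations. The function name `standardized_coefs` is hypothetical, and the `demo` values are made up (constructed as an exact linear function so the result is easy to check).

```python
import numpy as np
import pandas as pd


def standardized_coefs(df: pd.DataFrame, predictors, target: str) -> pd.Series:
    """Least-squares coefficients after z-scoring the predictors and the target."""
    predictors = list(predictors)
    clean = df[predictors + [target]].dropna()  # listwise deletion of incomplete rows
    z = (clean - clean.mean()) / clean.std(ddof=0)
    X = np.column_stack([np.ones(len(z)), z[predictors].to_numpy()])
    beta, *_ = np.linalg.lstsq(X, z[target].to_numpy(), rcond=None)
    return pd.Series(beta[1:], index=predictors)  # skip the intercept


# Made-up data: MEDV = 10 + 4 * RM - 0.5 * LSTAT exactly; the last row is incomplete
demo = pd.DataFrame({
    "RM": [5.0, 6.0, 7.0, 8.0, 6.0, 7.0, 6.5],
    "LSTAT": [20.0, 15.0, 10.0, 5.0, 12.0, 9.0, None],
    "MEDV": [20.0, 26.5, 33.0, 39.5, 28.0, 33.5, 30.0],
})
betas = standardized_coefs(demo, ["RM", "LSTAT"], "MEDV")
print(betas)
```

In the notebook this could be called as `standardized_coefs(data, ['RM', 'LSTAT', 'PTRATIO'], 'MEDV')`. The magnitudes give only a rough sense of relative importance, especially when predictors are correlated with each other.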


## Visualizations

### 1. Rooms vs. Price


In [None]:
# Scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='RM', y='MEDV', data=data)
plt.title('Relationship Between Number of Rooms and Price')
plt.xlabel('Average Number of Rooms')
plt.ylabel('Median Price ($1000s)')
plt.show()


### 2. Prices by River Proximity


In [None]:
# Boxplot
plt.figure(figsize=(8, 6))
sns.boxplot(x='CHAS', y='MEDV', data=data)
plt.title('Housing Prices by River Proximity')
plt.xlabel('Charles River Proximity (1 = Yes, 0 = No)')
plt.ylabel('Median Price ($1000s)')
plt.show()
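
A boxplot shows the shape of each group but not the group sizes or the exact gap between them. The sketch below tabulates count, mean, and median per group as a numeric companion to the plot. The helper name `summarize_by_group` is hypothetical and the `demo` values are made up.

```python
import pandas as pd


def summarize_by_group(df: pd.DataFrame, group: str, value: str) -> pd.DataFrame:
    """Count, mean, and median of `value` within each level of `group`."""
    clean = df[[group, value]].dropna()  # ignore rows missing either column
    return clean.groupby(group)[value].agg(["count", "mean", "median"])


# Made-up values; the last row has no CHAS flag and is excluded
demo = pd.DataFrame({
    "CHAS": [0, 0, 0, 1, 1, None],
    "MEDV": [20.0, 22.0, 24.0, 30.0, 34.0, 50.0],
})
table = summarize_by_group(demo, "CHAS", "MEDV")
gap = table.loc[1.0, "mean"] - table.loc[0.0, "mean"]
print(table)
print(f"difference in means: {gap:.1f}")
```

Running `summarize_by_group(data, 'CHAS', 'MEDV')` in the notebook would also show how few tracts border the river, which matters when judging how much weight the difference deserves.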


## Conclusion

This analysis of the Boston Housing dataset reveals several important factors that influence housing prices. Here are the main findings:

1. **Number of Rooms (RM)**:
   - There’s a strong positive correlation between the average number of rooms per dwelling and the median home price. Our simple linear regression confirmed that `RM` is a statistically significant predictor of `MEDV`, making it a key indicator of home value.
   - This suggests that buyers are willing to pay more for larger homes with more rooms, a trend that could guide real estate developers and investors when assessing potential property improvements.

2. **Crime Rate (CRIM)**:
   - Higher crime rates are associated with lower housing prices, as shown by the correlation and hypothesis test between `CRIM` and `MEDV`. Neighborhood safety appears to be a crucial factor in determining property value.
   - For urban planners and policymakers, this insight underscores the importance of reducing crime in order to stabilize or boost property values.

3. **Socioeconomic Status (LSTAT)**:
   - The percentage of lower status population (`LSTAT`) is negatively associated with housing prices, as seen in our multiple regression model. This is likely due to buyers’ perceptions of neighborhood quality, which includes socioeconomic status.
   - Real estate professionals can use this metric to identify undervalued areas that may offer investment opportunities, particularly if the area shows signs of socioeconomic improvement.

4. **Education Quality (PTRATIO)**:
   - The pupil-teacher ratio (`PTRATIO`) was another significant predictor of housing prices. Lower ratios, which are associated with better-quality schools, corresponded with higher housing prices.
   - This finding is relevant for families prioritizing educational quality and provides insights for real estate marketers who could emphasize proximity to quality schools in listings.

5. **Proximity to Charles River (CHAS)**:
   - Tracts that border the Charles River (`CHAS` = 1) show somewhat higher median prices in the boxplot. However, since this feature only applies to certain properties, it’s less informative than other factors like room count and crime rate.

### Implications

The findings from this analysis have several practical implications for different stakeholders:
- **Real Estate Investors**: Can use predictors like `RM`, `LSTAT`, and `CRIM` to assess and compare property values.
- **Urban Planners**: Addressing high crime rates and improving local education can contribute to higher property values.
- **Homebuyers**: Families may prioritize homes with lower `PTRATIO` for quality schooling, and those near the Charles River for scenic advantages.

Overall, this analysis highlights several factors associated with housing prices in the Boston Housing dataset, with possible applications to other urban areas with similar characteristics.
