This project analyzes the Boston Housing dataset using Python. It includes visualizations, descriptive statistics, and statistical tests to explore factors that affect housing prices in the Boston area.
Each row describes one census tract and records crime rate, property tax rate, average number of rooms, proximity to the Charles River, and other features.
The project has the following goals:
- Explore the distribution of housing values
- Visualize relationships between housing features
- Test whether certain factors significantly affect house prices
- Apply t-tests, ANOVA, Pearson correlation, and linear regression
- Practice working with real-world data in a data science context
The dataset contains the following variables:
| Variable | Description |
|---|---|
| CRIM | Per capita crime rate by town |
| ZN | Proportion of residential land zoned for lots over 25,000 sq.ft. |
| INDUS | Proportion of non-retail business acres per town |
| CHAS | Charles River dummy variable (1 if tract bounds river; 0 otherwise) |
| NOX | Nitric oxides concentration (parts per 10 million) |
| RM | Average number of rooms per dwelling |
| AGE | Proportion of owner-occupied units built prior to 1940 |
| DIS | Weighted distances to five Boston employment centers |
| RAD | Index of accessibility to radial highways |
| TAX | Full-value property tax rate per $10,000 |
| PTRATIO | Pupil-teacher ratio by town |
| LSTAT | Percentage of the population classified as lower status |
| MEDV | Median value of owner-occupied homes, in $1,000s |
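
As a minimal sketch of how the table above maps onto a pandas workflow, the snippet below loads a CSV, checks that every documented column is present, and computes descriptive statistics for `MEDV`. The helper name `load_housing` is hypothetical, and the three rows are made-up placeholder values used only so the example runs without the real file. They are not taken from the dataset.

```python
import io

import pandas as pd

# Column names as listed in the variable table.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "LSTAT", "MEDV"]


def load_housing(source):
    """Read the CSV (hypothetical helper) and verify the documented columns exist."""
    df = pd.read_csv(source)
    missing = [c for c in COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[COLUMNS]


# Placeholder rows with made-up values (NOT real data); in practice,
# pass the path of the actual dataset file to load_housing instead.
sample_csv = io.StringIO(
    "CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,LSTAT,MEDV\n"
    "0.10,0.0,5.0,0,0.45,6.0,40.0,4.0,3,250,17.0,8.0,22.0\n"
    "0.50,12.5,8.0,1,0.52,6.5,65.0,3.2,4,300,15.5,6.0,30.0\n"
    "3.00,0.0,18.0,0,0.70,5.5,95.0,1.8,24,666,20.2,18.0,14.0\n"
)

df = load_housing(sample_csv)

# Descriptive statistics for the median home value (in $1,000s).
summary = df["MEDV"].describe()

# Number of tracts that do (1) and do not (0) border the Charles River.
river_counts = df["CHAS"].value_counts()

print(summary)
print(river_counts)
```

Validating the column list up front is a design choice for this sketch: it makes a mismatched or truncated file fail early instead of surfacing later as a `KeyError` inside a plot or test.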
The notebook includes the following:
- Boxplot of median home values (MEDV)
- Bar chart showing how many tracts border the Charles River (CHAS)
- Grouped boxplots comparing median home values across brackets of AGE
- Scatter plot of industrial land use (INDUS) against nitric oxides concentration (NOX)
- Histogram of the pupil-teacher ratio (PTRATIO) across towns

It also reports the following statistical tests:
- t-test: compared median home values for tracts that border the Charles River with those that do not
- ANOVA: compared median home values across AGE brackets
- Pearson correlation: assessed the relationship between NOX and INDUS
- Linear regression: estimated the effect of weighted distance to employment centers (DIS) on median home values
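
The four tests listed above could be run roughly as follows with SciPy and statsmodels. This is a sketch under stated assumptions, not the notebook's actual code: the data frame is synthetic stand-in data generated with NumPy, the AGE cut points (35 and 70) and the bracket labels are illustrative choices, and the t-test assumes equal variances (`equal_var=True`, the SciPy default).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Synthetic stand-in data (NOT the real dataset), so the sketch is runnable.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "CHAS": rng.integers(0, 2, n),
    "AGE": rng.uniform(0, 100, n),
    "INDUS": rng.uniform(0, 28, n),
    "DIS": rng.uniform(1, 12, n),
})
df["NOX"] = 0.4 + 0.01 * df["INDUS"] + rng.normal(0, 0.03, n)
df["MEDV"] = 18 + 1.0 * df["DIS"] + rng.normal(0, 5, n)

# Assumed AGE brackets; the cut points are an illustrative choice.
df["age_group"] = pd.cut(
    df["AGE"],
    bins=[0, 35, 70, 100],
    labels=["35 and younger", "35 to 70", "70 and older"],
    include_lowest=True,
)

# Independent two-sample t-test: MEDV for river vs non-river tracts.
river = df.loc[df["CHAS"] == 1, "MEDV"]
no_river = df.loc[df["CHAS"] == 0, "MEDV"]
t_stat, t_p = stats.ttest_ind(river, no_river, equal_var=True)

# One-way ANOVA: MEDV across the three AGE brackets.
groups = [g["MEDV"].to_numpy()
          for _, g in df.groupby("age_group", observed=True)]
f_stat, f_p = stats.f_oneway(*groups)

# Pearson correlation between NOX and INDUS.
r, r_p = stats.pearsonr(df["NOX"], df["INDUS"])

# Simple linear regression of MEDV on DIS using ordinary least squares.
model = smf.ols("MEDV ~ DIS", data=df).fit()

print(f"t-test:     t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"ANOVA:      F = {f_stat:.3f}, p = {f_p:.4f}")
print(f"Pearson:    r = {r:.3f}, p = {r_p:.4f}")
print(f"Regression: slope = {model.params['DIS']:.3f}, "
      f"p = {model.pvalues['DIS']:.4f}")
```

In each case the p-value is compared with a chosen significance level (0.05 is a common convention) to decide whether to reject the null hypothesis. If the equal-variance assumption is in doubt, `scipy.stats.levene` can be used to check it before running the t-test.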

The analysis uses the following tools:
- Python
- pandas
- seaborn
- Matplotlib
- SciPy
- statsmodels