# Housing Prices Analysis in Boston, MA

## Introduction
This notebook analyzes the Boston housing dataset, focusing on factors associated with the median value of owner-occupied homes (MEDV). Five plots examine the distribution of MEDV, the Charles River indicator (CHAS), MEDV across AGE groups, the relationship between NOX and INDUS, and the distribution of PTRATIO.

## Dataset Description
The dataset includes the following variables:
- **CRIM**: Per capita crime rate by town
- **ZN**: Proportion of residential land zoned for lots over 25,000 sq.ft.
- **INDUS**: Proportion of non-retail business acres per town
- **CHAS**: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- **NOX**: Nitric oxides concentration (parts per 10 million)
- **RM**: Average number of rooms per dwelling
- **AGE**: Proportion of owner-occupied units built prior to 1940
- **DIS**: Weighted distances to five Boston employment centres
- **RAD**: Index of accessibility to radial highways
- **TAX**: Full-value property-tax rate per $10,000
- **PTRATIO**: Pupil-teacher ratio by town
- **LSTAT**: Percentage of the population of lower status
- **MEDV**: Median value of owner-occupied homes in $1000s
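
The plotting code in this notebook refers to columns by lowercase names (`medv`, `chas`, `age`, and so on), while the list above uses the conventional uppercase names. A minimal sketch of a sanity check is shown here, assuming the data is already loaded into a pandas DataFrame. The helper name `normalize_columns` and the `EXPECTED` list are illustrative and not part of the original notebook.

```python
import pandas as pd

# Column names from the dataset description, lowercased to match the plotting code.
EXPECTED = ["crim", "zn", "indus", "chas", "nox", "rm", "age",
            "dis", "rad", "tax", "ptratio", "lstat", "medv"]

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lowercase, stripped column names.

    Raises KeyError if any expected column is missing afterwards.
    """
    out = df.copy()
    out.columns = [str(c).strip().lower() for c in out.columns]
    missing = [c for c in EXPECTED if c not in out.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return out

# Tiny made-up frame (not real data), used only to exercise the helper.
demo = pd.DataFrame({name.upper(): [0.0] for name in EXPECTED})
demo = normalize_columns(demo)
```

Running a check like this once, right after loading, avoids a `KeyError` surfacing later inside a plotting call.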

## Visualizations

### 1. Boxplot of Median Value of Owner-Occupied Homes (MEDV)
```python
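import matplotlib.pyplot as plt
import pandas as pd  # used later for pd.cut
import seaborn as sns

# Assumes `df` is a pandas DataFrame already holding the dataset,
# with lowercase column names (e.g. 'medv', 'chas', 'age').

# Create a boxplot for MEDV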
plt.figure(figsize=(10, 6))
sns.boxplot(y=df['medv'])
plt.title('Boxplot of Median Value of Owner-Occupied Homes (MEDV)')
plt.ylabel('Median Value of Homes ($1000s)')
plt.grid(axis='y')
plt.show()
```

### Findings:
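The boxplot summarizes the distribution of median home values: the box spans the first to third quartiles, the line inside it marks the median, and points beyond the whiskers (1.5 times the interquartile range by default) are drawn as potential outliers. Many points above the upper whisker would indicate a long upper tail, that is, a small number of tracts with much higher home values than the rest.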
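
To put numbers on what the boxplot draws, the sketch below applies the 1.5 × IQR rule, which is seaborn's default whisker setting, to flag potential outliers. The helper name `iqr_outliers` and the sample values are made up for illustration; on the real data one would pass `df['medv']`, and the quantile interpolation may differ slightly from what the plotting library uses.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up values for illustration only (not taken from the dataset).
sample = pd.Series([18.0, 20.0, 21.0, 22.0, 23.0, 24.0, 50.0])
flagged = iqr_outliers(sample)
print(flagged.tolist())  # [50.0]
```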

### 2. Bar Plot for the Charles River Variable (CHAS)
```python
# Create a bar plot for the Charles River variable
plt.figure(figsize=(10, 6))
sns.countplot(x='chas', data=df)
plt.title('Count of Tracts by Charles River Indicator (CHAS)')
plt.xlabel('Tract Bounds the Charles River')
plt.ylabel('Count of Tracts')
plt.xticks([0, 1], ['Not Bounded', 'Bounded'])
plt.grid(axis='y')
plt.show()
```

### Findings:
The bar plot displays the number of tracts that do and do not bound the Charles River. It shows how large the riverside share of the data is, which matters when comparing home values between the two groups because a small group gives a less reliable comparison.
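
The same information can be read as proportions instead of counts. This is a small sketch, assuming the indicator column holds only 0 and 1; the helper name `chas_share` and the sample values are illustrative.

```python
import pandas as pd

def chas_share(chas: pd.Series) -> pd.Series:
    """Return the fraction of rows in each CHAS category, indexed by 0 and 1."""
    return chas.astype(int).value_counts(normalize=True).sort_index()

# Made-up indicator values for illustration only.
sample = pd.Series([0, 0, 0, 1])
share = chas_share(sample)
print(share.loc[1])  # 0.25
```

On the real data this would be called as `chas_share(df['chas'])`.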

### 3. Boxplot for MEDV vs. AGE Groups
```python
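# Discretize the AGE variable into three groups.
# Bins are right-closed: [0, 35], (35, 70], (70, max].
# include_lowest=True keeps AGE values of exactly 0, and closing the
# right edge keeps the maximum AGE value inside the last bin.
bins = [0, 35, 70, df['age'].max()]
labels = ['35 years and younger', '35 to 70 years', 'Older than 70 years']
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels, include_lowest=True)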

# Create a boxplot for MEDV vs. age groups
plt.figure(figsize=(12, 6))
sns.boxplot(x='age_group', y='medv', data=df)
plt.title('Boxplot of Median Value of Homes by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Median Value of Homes ($1000s)')
plt.grid(axis='y')
plt.show()
```

### Findings:
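This boxplot compares median home values across the three AGE groups. Because AGE records the proportion of owner-occupied units built before 1940, the groups are best read as tracts with a small, moderate, or large share of older housing. Comparing the medians and spreads of the boxes shows whether tracts with older housing stock tend to have lower or higher home values.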
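
A numeric summary can accompany the boxplot. The sketch below computes the median MEDV per AGE band, assuming `age` and `medv` columns and an upper bound of 100 for AGE (since it is a percentage). The helper name `medv_by_age_group`, the band labels, and the sample rows are illustrative.

```python
import pandas as pd

def medv_by_age_group(df: pd.DataFrame) -> pd.Series:
    """Median MEDV per AGE band, using right-closed bins [0, 35], (35, 70], (70, 100]."""
    bands = pd.cut(
        df["age"],
        bins=[0, 35, 70, 100],  # assumes AGE is a percentage in [0, 100]
        labels=["35 and under", "35 to 70", "over 70"],
        include_lowest=True,  # keep AGE values of exactly 0
    )
    return df.groupby(bands, observed=False)["medv"].median()

# Made-up rows for illustration only (not taken from the dataset).
demo = pd.DataFrame({
    "age": [10, 35, 50, 70, 90, 100],
    "medv": [30.0, 26.0, 22.0, 20.0, 15.0, 17.0],
})
summary = medv_by_age_group(demo)
print(summary.tolist())  # [28.0, 21.0, 16.0]
```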

### 4. Scatter Plot for Nitric Oxide Concentrations vs. Non-Retail Business Acres
```python
# Create a scatter plot for NOx vs. non-retail business acres
plt.figure(figsize=(10, 6))
sns.scatterplot(x='nox', y='indus', data=df)
plt.title('Scatter Plot of Nitric Oxide Concentrations vs. Non-Retail Business Acres')
plt.xlabel('Nitric Oxide Concentration (parts per 10 million)')
plt.ylabel('Proportion of Non-Retail Business Acres')
plt.grid()
plt.show()
```

### Findings:
The scatter plot shows how nitric oxide concentration relates to the proportion of non-retail business acres. An upward trend would indicate that tracts with more land devoted to non-retail business tend to have higher nitric oxide levels, which is consistent with, though not proof of, business activity contributing to pollution.
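
The strength of a linear trend in the scatter plot can be summarized with the Pearson correlation coefficient. This is a minimal sketch using `Series.corr`, assuming `nox` and `indus` columns; the helper name `nox_indus_correlation` and the sample values are made up for illustration.

```python
import pandas as pd

def nox_indus_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation coefficient between the nox and indus columns."""
    return float(df["nox"].corr(df["indus"]))

# Made-up, perfectly linear values for illustration only.
demo = pd.DataFrame({
    "nox": [0.4, 0.5, 0.6, 0.7],
    "indus": [2.0, 6.0, 10.0, 14.0],
})
r = nox_indus_correlation(demo)
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship.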

### 5. Histogram for the Pupil-Teacher Ratio Variable (PTRATIO)
```python
# Create a histogram for the pupil-teacher ratio
plt.figure(figsize=(10, 6))
sns.histplot(df['ptratio'], bins=30, kde=True)
plt.title('Histogram of Pupil-Teacher Ratio')
plt.xlabel('Pupil-Teacher Ratio')
plt.ylabel('Frequency')
plt.grid(axis='y')
plt.show()
```

### Findings:
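The histogram shows the distribution of pupil-teacher ratios across towns, with a kernel density estimate overlaid. A right-skewed shape (most of the mass at low values, with a tail toward high values) would mean most towns have low ratios, which is generally considered favorable for education and may be reflected in housing prices. A left-skewed shape would mean the opposite.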
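
Skew direction can be checked numerically instead of judged by eye. The sketch below uses `Series.skew`, where a positive value indicates a right tail and a negative value a left tail. The helper name `describe_skew`, the 0.1 tolerance, and the sample values are illustrative assumptions.

```python
import pandas as pd

def describe_skew(values: pd.Series, tol: float = 0.1) -> str:
    """Label a distribution by the sign of its sample skewness."""
    skew = values.skew()
    if skew > tol:
        return "right-skewed"
    if skew < -tol:
        return "left-skewed"
    return "roughly symmetric"

# Made-up ratios for illustration only (not taken from the dataset).
right_tail = pd.Series([14.0, 14.0, 15.0, 15.0, 16.0, 22.0])
left_tail = pd.Series([22.0, 22.0, 21.0, 21.0, 20.0, 14.0])
print(describe_skew(right_tail))  # right-skewed
print(describe_skew(left_tail))   # left-skewed
```

On the real data this would be called as `describe_skew(df['ptratio'])`.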

## Conclusion
The visualizations provide insight into how several neighborhood factors relate to the median value of owner-occupied homes in the Boston area. Understanding these relationships can inform decisions about housing policy and urban planning.
