# Predicting House Prices
Welcome to the first project in the ZeroToHeroML series! In this project, we will dive into a classic machine learning problem: predicting house prices. We'll be using the Boston Housing dataset. It was long distributed with scikit-learn through `sklearn.datasets.load_boston`, but that loader was deprecated in scikit-learn 1.0 and removed in 1.2, so on recent versions the data has to be read from its original source.

## Objective
Our goal is to build a simple linear regression model to predict the prices of houses based on various features like the number of rooms, age, distance to employment centers, and more.

## The Boston Housing Dataset
The Boston Housing dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. It has 506 entries with 13 features that might help us predict the median value of owner-occupied homes (`MEDV`, in thousands of dollars). Let's get started!

In [None]:
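# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset.
# `load_boston` was removed in scikit-learn 1.2, so we read the raw file from
# its original source instead (the workaround given in scikit-learn's own
# deprecation notice). Each record spans two physical lines in the raw file.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

# The 13 feature names, in the column order used by the raw file
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
df = pd.DataFrame(data, columns=feature_names)
df['MEDV'] = target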

## Data Exploration
Before we dive into building our model, let's take a moment to explore our dataset. We'll look at the first few rows, the distribution of our target variable, and a correlation matrix to understand how different features relate to each other.

In [None]:
# Display the first few rows of the dataset
df.head()

In [None]:
# Descriptive statistics for the dataset
df.describe()

In [None]:
# Plot the distribution of the target variable - MEDV
plt.hist(df['MEDV'], bins=30)
plt.xlabel('House Prices ($1000s)')
plt.ylabel('Count')
plt.show()

In [None]:
# Correlation matrix
import seaborn as sns

plt.figure(figsize=(12, 10))
correlation_matrix = df.corr().round(2)
# annot=True prints the rounded correlation value inside each cell
sns.heatmap(data=correlation_matrix, annot=True)
plt.show()
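
A heatmap of all pairwise correlations can be hard to scan, so it may help to pull out just the column for the target and sort it. The sketch below does that on a small synthetic DataFrame (`demo_df` is a made-up stand-in, not the real housing data, and its coefficients are arbitrary); the same three-line pattern should work on `df` from the cells above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing DataFrame (NOT the real data):
# RM is built to correlate positively with MEDV, LSTAT negatively,
# and NOISE is unrelated to MEDV by construction.
rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.0, 0.7, n)
lstat = rng.normal(12.0, 7.0, n)
demo_df = pd.DataFrame({
    'RM': rm,
    'LSTAT': lstat,
    'NOISE': rng.normal(0.0, 1.0, n),
    'MEDV': 9.0 * rm - 0.6 * lstat + rng.normal(0.0, 2.0, n) - 25.0,
})

# Correlation of every feature with the target, strongest (by magnitude) first
target_corr = (
    demo_df.corr()['MEDV']
    .drop('MEDV')
    .sort_values(key=np.abs, ascending=False)
)
print(target_corr.round(2))
```

Sorting by absolute value keeps strongly negative features near the top, since a large negative correlation is as informative for a linear model as a large positive one.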

## Preprocessing the Data
Now that we have a basic understanding of our dataset, let's prepare our data for modeling. This involves splitting our dataset into features (X) and the target variable (y), and then into training and testing sets.

In [None]:
# Splitting the dataset into features and target variable
X = df.drop('MEDV', axis=1)
y = df['MEDV']

# Splitting into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
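
To make the effect of `test_size=0.2` and `random_state=42` concrete, here is a minimal sketch on placeholder arrays with the same shape as the housing data (506 rows, 13 features). The names `X_demo`, `y_demo`, and `idx` are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays shaped like the housing data: 506 rows, 13 features
X_demo = np.zeros((506, 13))
y_demo = np.zeros(506)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)

# Splitting row indices twice with the same seed gives the same test rows
idx = np.arange(506)
_, test_a = train_test_split(idx, test_size=0.2, random_state=42)
_, test_b = train_test_split(idx, test_size=0.2, random_state=42)
same_split = np.array_equal(test_a, test_b)
print(same_split)
```

With 506 rows, a 20% test share works out to 102 test rows and 404 training rows. Fixing `random_state` makes the split reproducible, which keeps the evaluation numbers comparable between runs.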

## Building and Training the Model
With our data prepared, we can now build and train our linear regression model.

In [None]:
# Initialize the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
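
After fitting, a `LinearRegression` exposes what it learned through `coef_` (one weight per feature) and `intercept_`. The sketch below fits a model to synthetic data generated from a known linear rule, so the recovered weights can be compared with the true ones. The data, the rule, and the names `X_demo`, `y_demo`, and `demo_model` are assumptions for illustration, not part of the housing dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic features (NOT the real housing data)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'RM': rng.normal(6.0, 0.7, 300),
    'LSTAT': rng.normal(12.0, 7.0, 300),
})
# Known rule: target = 5 + 4*RM - 0.5*LSTAT + small Gaussian noise
y_demo = (5.0 + 4.0 * X_demo['RM'] - 0.5 * X_demo['LSTAT']
          + rng.normal(0.0, 0.5, 300))

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each learned weight with its feature name for readability
coefs = pd.Series(demo_model.coef_, index=X_demo.columns)
print(coefs.round(2))
print(round(demo_model.intercept_, 2))
```

The same `pd.Series(model.coef_, index=X_train.columns)` pattern should apply to the model trained above. Keep in mind that raw coefficients depend on each feature's scale, so their magnitudes are not directly comparable across features.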

## Model Evaluation
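After training our model, it's important to evaluate its performance on the test set, which the model has not seen during training. We'll use two metrics: the mean squared error (MSE), the average squared difference between predicted and actual values (lower is better, expressed in squared target units), and the coefficient of determination (R^2), the share of variance in the target that the model explains (closer to 1 is better).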

In [None]:
# Predicting the test set results
y_pred = model.predict(X_test)

# Calculating the model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'Coefficient of Determination (R^2): {r2:.2f}')
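
To see what these two metrics measure, the sketch below computes them by hand with NumPy and checks the results against scikit-learn. The values in `y_true` and `y_hat` are made up for illustration and are not model output.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values (in $1000s), for illustration only
y_true = np.array([24.0, 21.6, 34.7, 33.4, 36.2])
y_hat = np.array([25.0, 20.0, 33.0, 35.0, 30.0])

residuals = y_true - y_hat

# MSE: mean of the squared residuals
mse_manual = np.mean(residuals ** 2)
# RMSE: square root of MSE, back in the target's own units
rmse = np.sqrt(mse_manual)

# R^2: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The hand-computed values should agree with scikit-learn
print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))
```

Because MSE is in squared units, its square root (RMSE) is often easier to interpret: it reads as a typical prediction error in thousands of dollars.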

## Conclusion
Congratulations on completing the first project! You've taken your first steps into machine learning by building and evaluating a simple linear regression model to predict house prices. As you progress through ZeroToHeroML, you'll encounter more complex models and techniques. Keep experimenting and learning!