## Case 1: Housing Price Prediction

To predict house prices from features such as square footage, number of bedrooms, and location, we'll follow an approach similar to the previous example. Because the target variable (house price) is continuous, linear regression is a suitable choice.

##### 1. Import Libraries
We need libraries for data manipulation, visualization, and modeling.

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

##### 2. Load and Preprocess the Data 
We'll use a dataset of houses with their respective features and sale prices.

In [None]:
# Load the dataset
df = pd.read_csv('housing_data.csv')

# Display the first few rows of the dataset
print(df.head())

# Check for missing values, then drop any rows that contain them
print(df.isna().sum())
df = df.dropna()  # Alternatively, use df.fillna() to fill missing values instead
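
Before dropping rows, it can help to see how much data is missing and what each option costs. The sketch below uses a small made-up DataFrame (the values and the name `toy` are illustrative assumptions, not taken from `housing_data.csv`) to compare dropping incomplete rows with filling numeric gaps using each column's median.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for housing_data.csv
toy = pd.DataFrame({
    'square_footage': [1400.0, np.nan, 2000.0, 1750.0],
    'num_bedrooms': [3.0, 2.0, np.nan, 4.0],
    'price': [240000.0, 180000.0, 330000.0, 290000.0],
})

# Count missing values per column
missing_counts = toy.isna().sum()
print(missing_counts)

# Option 1: drop every row that has at least one missing value
dropped = toy.dropna()

# Option 2: fill each numeric column's gaps with that column's median
filled = toy.fillna(toy.median(numeric_only=True))

print(f'Rows after dropna: {len(dropped)}, rows after fillna: {len(filled)}')
```

Dropping is simplest but discards whole rows; filling keeps every row at the cost of introducing estimated values, so the better choice depends on how much data is missing.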

##### 3. Feature Selection and Target Variable 
We need to define the features (independent variables) and the target variable (dependent variable, which in this case is the house price).

In [None]:
# Select the features for prediction
X = df[['square_footage', 'num_bedrooms', 'num_bathrooms', 'location_score']]

# Target variable: house price
y = df['price']

##### 4. Split the Data into Training and Test Sets 
To evaluate the performance of the model, split the data into training and test sets.

In [None]:
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

##### 5. Train the Linear Regression Model 
Now, we'll fit the linear regression model to the training data.

In [None]:
# Initialize the Linear Regression model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

##### 6. Make Predictions on the Test Set 
After training, we can use the model to predict house prices on the test set.

In [None]:
# Predict the house prices on the test data
y_pred = model.predict(X_test)

##### 7. Evaluate the Model 
We can evaluate the performance of the model using metrics like Mean Squared Error (MSE) and R-squared (R²) score to see how well the model predicts house prices.

In [None]:
# Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

# R-squared score
r2 = r2_score(y_test, y_pred)
print(f'R-squared: {r2}')

The $R^2$ score is the proportion of the variance in the target variable that the model explains from the features. A score closer to 1 indicates a better fit; on held-out data it can fall below 0 when the model predicts worse than simply using the mean of the test targets.
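
To make the two metrics concrete, the sketch below computes MSE and $R^2$ by hand with NumPy on a few made-up prices and compares them with scikit-learn's results. The arrays `y_true` and `y_hat` are illustrative assumptions, not output from the housing dataset. It also reports the root mean squared error (RMSE), which is expressed in the same units as the price and is often easier to read than MSE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted prices, for illustration only
y_true = np.array([200000.0, 250000.0, 300000.0, 350000.0])
y_hat = np.array([210000.0, 240000.0, 310000.0, 340000.0])

# MSE: average of the squared prediction errors
mse_manual = np.mean((y_true - y_hat) ** 2)

# RMSE: square root of MSE, expressed in price units
rmse_manual = np.sqrt(mse_manual)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f'Manual  MSE: {mse_manual:.1f}, RMSE: {rmse_manual:.1f}, R-squared: {r2_manual:.3f}')
print(f'sklearn MSE: {mean_squared_error(y_true, y_hat):.1f}, '
      f'R-squared: {r2_score(y_true, y_hat):.3f}')
```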

##### 8. Visualize the Results 
We can plot the predicted house prices against the actual house prices for visual inspection.

In [None]:
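# Plot predicted vs. actual house prices
plt.scatter(y_test, y_pred, alpha=0.6)
# Reference line: points that fall on it are perfect predictions
limits = [y_test.min(), y_test.max()]
plt.plot(limits, limits, 'r--')
plt.xlabel('Actual House Prices')
plt.ylabel('Predicted House Prices')
plt.title('Actual vs Predicted House Prices')
plt.show()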

##### 9. Interpret the Model Coefficients 
We can also inspect the coefficients of the linear regression model to understand how each feature contributes to the predicted price.

In [None]:
# Display the coefficients of the model
coefficients = pd.DataFrame(model.coef_, index=X.columns, columns=['Coefficient'])
print(coefficients)

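Each coefficient is the change in the predicted price for a one-unit increase in that feature, assuming all other features remain constant. Because the features are measured on different scales (square feet versus a count of bedrooms, for example), the magnitudes of the coefficients are not directly comparable.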
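
One way to check this interpretation is to rebuild a prediction by hand: the model's output is the intercept plus the sum of each coefficient times its feature value. The sketch below fits a model on a tiny made-up dataset (the numbers and the names `toy_X`, `toy_y`, and `toy_model` are illustrative assumptions) and confirms that adding one unit to a single feature shifts the prediction by that feature's coefficient.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, for illustration only
toy_X = pd.DataFrame({
    'square_footage': [1000, 1500, 2000, 2500, 3000],
    'num_bedrooms': [2, 3, 3, 4, 5],
})
toy_y = pd.Series([200000, 270000, 330000, 400000, 480000])

toy_model = LinearRegression().fit(toy_X, toy_y)

# A prediction is intercept + sum(coefficient * feature value)
house = toy_X.iloc[[0]]
manual = toy_model.intercept_ + np.dot(toy_model.coef_, house.values[0])
predicted = toy_model.predict(house)[0]

# Increase square_footage by one unit, holding num_bedrooms fixed
bigger = house.copy()
bigger['square_footage'] += 1
delta = toy_model.predict(bigger)[0] - predicted

print(f'Manual: {manual:.2f}, model.predict: {predicted:.2f}')
print(f'Change from +1 sq ft: {delta:.4f}, coefficient: {toy_model.coef_[0]:.4f}')
```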

##### 10. Conclusion
This linear regression model predicts house prices from square footage, number of bedrooms, number of bathrooms, and a location score. It assumes that the relationship between these features and the house price is linear, which may not hold in real-world data. Depending on the complexity of the data, we may want to explore more flexible models, such as polynomial regression or decision trees, to improve predictions.
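
As a rough sketch of such a comparison, the code below fits a plain linear model, a degree-2 polynomial regression, and a decision tree on synthetic data with a deliberately curved relationship. The data, the random seed, and the hyperparameters (`degree=2`, `max_depth=4`) are assumptions chosen for illustration; results on real housing data may look quite different.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: price grows quadratically with size, plus noise
rng = np.random.default_rng(42)
size = rng.uniform(500, 4000, size=300).reshape(-1, 1)
price = 50000 + 0.03 * size[:, 0] ** 2 + rng.normal(0, 10000, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    size, price, test_size=0.3, random_state=42
)

# Candidate models, all evaluated on the same held-out split
candidates = {
    'linear': LinearRegression(),
    'polynomial (degree 2)': make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    'decision tree': DecisionTreeRegressor(max_depth=4, random_state=42),
}

scores = {}
for name, estimator in candidates.items():
    estimator.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, estimator.predict(X_te))
    print(f'{name}: R-squared = {scores[name]:.3f}')
```

Evaluating every candidate on the same test split keeps the comparison fair; with a small dataset, cross-validation would likely give a more stable estimate than a single split.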