Create a Linear Regression Model using Python/R to predict home prices using the Boston Housing Dataset. Find the performance of your model.

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the Boston housing dataset
df = pd.read_csv("Boston.csv")

# Check if there are any missing values
missing_values = df.isnull().sum()
if missing_values.any():
    print("Missing values exist in the dataset.")
    # Handle missing values by dropping every row that contains one
    df.dropna(inplace=True)
else:
    print("No missing values in the dataset.")

# Split the data into features (X) and the target variable (y)
X = df.drop(columns=['medv'])
y = df['medv']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = model.predict(X_test)

# Calculate the model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared Score:", r2)

Missing values exist in the dataset.
Mean Squared Error: 20.687720473048493
R-squared Score: 0.7200277678580315


In [None]:
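#The Mean Squared Error (MSE) of approximately 20.69 is the average of the squared differences between the predicted and actual values, so it is expressed in squared units of medv rather than in the units of the target itself. Taking the square root gives a Root Mean Squared Error (RMSE) of about 4.55; since medv is recorded in thousands of dollars, a typical prediction error is roughly $4,550. Lower values indicate better performance.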
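
Because MSE is in squared units, it is often easier to read after taking the square root. The sketch below converts the MSE printed above into an RMSE and also computes RMSE directly from two lists. The helper names `rmse_from_mse` and `rmse`, and the sample values, are illustrative assumptions and not part of the original notebook.

```python
import math


def rmse_from_mse(mse):
    """Convert a mean squared error into a root mean squared error."""
    return math.sqrt(mse)


def rmse(y_true, y_pred):
    """Compute RMSE directly from actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# MSE value copied from the notebook output above
reported_mse = 20.687720473048493
reported_rmse = rmse_from_mse(reported_mse)
print(f"RMSE: {reported_rmse:.2f}")  # prints "RMSE: 4.55"

# Hypothetical actual/predicted values, only to exercise the helper
sample_true = [24.0, 21.6, 34.7]
sample_pred = [25.0, 20.6, 32.7]
sample_rmse = rmse(sample_true, sample_pred)
print(f"Sample RMSE: {sample_rmse:.3f}")
```

In the notebook itself, the same number could be obtained by taking the square root of the `mse` variable already computed.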

In [None]:
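#The R-squared Score of approximately 0.72 means that the model explains about 72% of the variance in the target variable on the test set. A score of 1 indicates perfect predictions and a score of 0 matches a model that always predicts the mean of the target; on held-out data the score can even be negative when the model does worse than that baseline. In this case, 0.72 indicates a moderately good fit of the model to the data.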
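
To make the range of R-squared concrete, here is a minimal sketch that computes it from its definition, 1 - SS_res / SS_tot. The function name `r_squared` and the toy values are assumptions for illustration; the notebook itself uses `r2_score` from scikit-learn.

```python
def r_squared(y_true, y_pred):
    """R-squared from its definition: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical target values, not taken from the Boston data
y_true = [10.0, 20.0, 30.0]

good = r_squared(y_true, [12.0, 19.0, 29.0])      # close predictions
baseline = r_squared(y_true, [20.0, 20.0, 20.0])  # always predict the mean
bad = r_squared(y_true, [30.0, 20.0, 10.0])       # worse than the mean

print(good, baseline, bad)
```

The three cases give a score near 1, exactly 0, and a negative score, which is why 0 is best read as "no better than predicting the mean" and not as a hard lower bound.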