# Model Training and Evaluation using House Price Prediction Dataset

In [6]:
# Step 1: Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

In this step, the required libraries are imported: pandas for data manipulation, train_test_split for splitting the dataset, Ridge for the regularized linear regression model, StandardScaler for feature scaling, and mean_squared_error for evaluating the model's performance.

In [2]:

# Step 2: Load the dataset
data = pd.read_csv('/content/kc_house_data.csv')

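The dataset is loaded using the pandas read_csv function. The path '/content/kc_house_data.csv' assumes the file has been uploaded to the /content directory, which is the usual working directory in Google Colab; adjust the path if the file is stored elsewhere.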

In [3]:
# Step 3: Data preprocessing
# Drop unnecessary columns
data = data.drop(['id', 'date'], axis=1)
# Handle missing values if any
data = data.dropna()

In this step, the 'id' and 'date' columns, which are not used as features, are dropped from the dataset using the drop function. Any rows containing missing values are then removed using dropna().
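
Before dropping rows, it can help to check how many values are actually missing. The following is a minimal sketch on a hypothetical four-row frame (the values are made up and only stand in for the house data); it counts missing values per column and reports how many rows dropna() removes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the house data (values are made up).
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": ["20141013T000000", "20141209T000000", "20150225T000000", "20141209T000000"],
    "price": [221900.0, 538000.0, np.nan, 604000.0],
    "sqft_living": [1180, 2570, 770, np.nan],
})

# Drop the identifier and date columns, as in Step 3.
toy = toy.drop(["id", "date"], axis=1)

# Count missing values per column before deciding how to handle them.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Remove every row that has at least one missing value and report the loss.
rows_before = len(toy)
toy_clean = toy.dropna()
print(f"Dropped {rows_before - len(toy_clean)} of {rows_before} rows")
```

If a large share of rows would be lost this way, imputing values may be a better choice than dropping them.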

In [4]:
# Step 4: Split the dataset into training and testing sets
X = data.drop('price', axis=1)
y = data['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

The dataset is split into the features (X, the independent variables) and the target (y, the dependent variable). The train_test_split function is then used to split X and y into training and testing sets. The test size is set to 0.2, so 20% of the data is held out for testing. The random_state parameter is set to 42 so that the same split is produced on every run.
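
The effect of test_size and random_state can be checked on a small example. This sketch uses a hypothetical 10-row frame (made-up values, with column names chosen only for illustration) and confirms the 80/20 split sizes and that repeating the call with the same random_state gives the same test rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row frame; values are made up for illustration.
toy = pd.DataFrame({
    "sqft_living": range(1000, 2000, 100),
    "bedrooms": [2, 3, 3, 4, 2, 3, 4, 5, 3, 4],
    "price": range(200000, 300000, 10000),
})
X_toy = toy.drop("price", axis=1)
y_toy = toy["price"]

# 20% of 10 rows -> 2 test rows, 8 training rows.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state selects the same rows.
X_tr2, X_te2, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
same_split = X_te.index.equals(X_te2.index)
print("Same test rows on second call:", same_split)
```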

In [7]:
# Step 5: Data Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
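
Step 5 standardizes the features. The scaler is fit on the training set only, and the mean and standard deviation learned there are reused to transform the test set, so no information from the test data leaks into preprocessing. The sketch below illustrates this on small made-up arrays (not the house data): the scaled training columns have mean 0 and standard deviation 1, while the scaled test columns generally do not.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrices: column 0 ~ square footage, column 1 ~ bedroom count.
X_tr = np.array([[1000.0, 2.0], [1500.0, 3.0], [2000.0, 4.0], [2500.0, 3.0]])
X_te = np.array([[1800.0, 3.0], [3000.0, 5.0]])

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learns mean/std from the training rows only
X_te_scaled = scaler.transform(X_te)      # reuses the training mean/std

print("Train means:", X_tr_scaled.mean(axis=0))
print("Train stds: ", X_tr_scaled.std(axis=0))
print("Test means: ", X_te_scaled.mean(axis=0))
```

Scaling matters for Ridge regression in particular because the penalty acts on coefficient size, which depends on the scale of each feature.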

In [8]:
# Step 6: Train the model
model = Ridge(alpha=0.1)  # Ridge regression with regularization
model.fit(X_train_scaled, y_train)
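
Step 6 fits a Ridge regression, which is linear regression with an L2 penalty on the coefficients. The alpha parameter sets the strength of that penalty: larger values shrink the coefficients further toward zero. The following is a rough sketch on synthetic data (the data, the coefficient values, and the alpha grid are all assumptions for illustration, not taken from the house dataset) showing how the coefficient norm changes with alpha.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem with known coefficients (illustrative only).
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(50, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y_syn = X_syn @ true_coef + rng.normal(scale=0.5, size=50)

# Fit Ridge with increasing penalty strength and record the coefficient norm.
coef_norms = {}
for alpha in [0.1, 10.0, 1000.0]:
    ridge = Ridge(alpha=alpha).fit(X_syn, y_syn)
    coef_norms[alpha] = float(np.linalg.norm(ridge.coef_))
    print(f"alpha={alpha}: coefficient norm = {coef_norms[alpha]:.3f}")
```

With alpha=0.1, as used above, the penalty is light and the fit is likely close to ordinary least squares.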

In [9]:
# Step 7: Evaluate the model
y_pred_train = model.predict(X_train_scaled)  # Predict on the training set
y_pred_test = model.predict(X_test_scaled)  # Predict on the testing set

mse_train = mean_squared_error(y_train, y_pred_train)
mse_test = mean_squared_error(y_test, y_pred_test)

print("Mean Squared Error (Train):", mse_train)
print("Mean Squared Error (Test):", mse_test)

Mean Squared Error (Train): 39311882354.39527
Mean Squared Error (Test): 45173072024.31945


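The trained model is used to make predictions on both the training and testing data using the predict method. The mean squared error (MSE) is calculated for each set by comparing the predicted values (y_pred_train, y_pred_test) with the actual target values (y_train, y_test) using the mean_squared_error function. Finally, both MSE values are printed to the console. The test MSE is somewhat higher than the training MSE, which is common because the model has not seen the test data during training.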
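
Because MSE is expressed in squared units of price, the raw numbers are hard to interpret. Taking the square root gives the root mean squared error (RMSE), which is in the same units as price. This short sketch applies that to the two MSE values printed above.

```python
import math

# MSE values copied from the output above.
mse_train = 39311882354.39527
mse_test = 45173072024.31945

# RMSE is in the same units as the target (price).
rmse_train = math.sqrt(mse_train)
rmse_test = math.sqrt(mse_test)
mse_ratio = mse_test / mse_train

print(f"RMSE (Train): {rmse_train:,.0f}")
print(f"RMSE (Test):  {rmse_test:,.0f}")
print(f"Test/Train MSE ratio: {mse_ratio:.2f}")
```

An RMSE of this size means a typical prediction error on the order of a couple of hundred thousand price units, which is worth comparing against the spread of prices in the dataset.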