## Option 1: Linear Regression using the Normal Equation

This option implements linear regression using the Normal Equation, a closed-form solution that finds the best-fit line for the synthetic data without any iteration.
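
For reference, the closed-form solution computed in the cells below can be written as follows. Here $X$ is the feature matrix with a leading column of ones and $y$ is the column vector of house prices; the symbols $x_i$ (square footage of house $i$) and $m$ (number of houses) are notation introduced here for illustration.

```latex
\hat{\theta} = (X^{\top} X)^{-1} X^{\top} y,
\qquad
X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix},
\qquad
\hat{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}
```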

In [2]:
import numpy as np

# Set the random seed for reproducibility
np.random.seed(0)

# Generate synthetic square feet data (feature)
square_feet = np.random.uniform(500, 4000, 100)  # random values between 500 and 4000

# Define the true relationship (e.g., price per square foot is 300, with a base price of 50,000)
base_price = 50000
price_per_sqft = 300

# Generate the target values with some noise
noise = np.random.normal(0, 20000, 100)  # adding some noise
house_price = base_price + price_per_sqft * square_feet + noise

In [5]:
# Add bias term to the feature matrix
X = np.c_[np.ones(square_feet.shape[0]), square_feet]  # Adding a column of ones for the intercept term
y = house_price.reshape(-1, 1)  # Target variable in column vector form

# Calculate the parameters using the Normal Equation
theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# Print the parameters
print(f"Intercept (theta_0): {theta[0][0]}")
print(f"Slope (theta_1): {theta[1][0]}")


Intercept (theta_0): 54623.20720208263
Slope (theta_1): 299.6396286937254
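
Forming an explicit inverse works for a small 2x2 system like this one, but a least-squares solver is generally the more numerically robust choice. As a minimal sketch, assuming the same synthetic data as above, `np.linalg.lstsq` should recover the same parameters. The names `theta_normal` and `theta_lstsq` are introduced here only for the comparison.

```python
import numpy as np

# Recreate the synthetic data from the cells above
np.random.seed(0)
square_feet = np.random.uniform(500, 4000, 100)
noise = np.random.normal(0, 20000, 100)
house_price = 50000 + 300 * square_feet + noise

# Feature matrix with a bias column, and the target as a column vector
X = np.c_[np.ones(square_feet.shape[0]), square_feet]
y = house_price.reshape(-1, 1)

# Normal Equation with an explicit inverse, as in the cell above
theta_normal = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# Least-squares solver; rcond=None selects NumPy's default cutoff
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("Solutions agree:", np.allclose(theta_normal, theta_lstsq))
```

Both approaches minimize the same squared error, so any difference between them should be at the level of floating-point rounding.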


In [6]:
def predict(square_feet):
    return theta[0][0] + theta[1][0] * square_feet

# Example prediction
sample_square_feet = 2000
predicted_price = predict(sample_square_feet)
print(f"Predicted house price for {sample_square_feet} square feet: ${predicted_price:.2f}")


Predicted house price for 2000 square feet: $653902.46


## Option 2: Linear Regression using Gradient Descent

This version implements linear regression using gradient descent, where the parameters `theta_0` (intercept) and `theta_1` (slope) are iteratively updated. If the learning rate is too small, convergence is slow; if it is too large, the updates overshoot and the parameters may diverge. Adjust `learning_rate` and `iterations` to find a good balance.
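
To make the learning-rate trade-off concrete, here is a minimal sketch, assuming the same synthetic data used throughout. The helper `cost_after` and the two rates `1e-7` and `1e-6` are illustrative choices, not part of the original notebook. It runs a short gradient descent from zero and reports the final halved mean squared error for each rate.

```python
import numpy as np

# Recreate the synthetic data used in this notebook
np.random.seed(0)
square_feet = np.random.uniform(500, 4000, 100)
noise = np.random.normal(0, 20000, 100)
house_price = 50000 + 300 * square_feet + noise


def cost_after(learning_rate, iterations=50):
    """Run gradient descent from zero and return the final halved mean squared error."""
    theta_0, theta_1 = 0.0, 0.0
    for _ in range(iterations):
        error = theta_0 + theta_1 * square_feet - house_price
        # Both updates use the error computed before either parameter changes
        theta_0 -= learning_rate * error.mean()
        theta_1 -= learning_rate * (error * square_feet).mean()
    final_error = theta_0 + theta_1 * square_feet - house_price
    return (final_error ** 2).mean() / 2


# Cost at the starting point theta_0 = theta_1 = 0
initial_cost = (house_price ** 2).mean() / 2
cost_small = cost_after(1e-7)   # small enough to reduce the cost
cost_large = cost_after(1e-6)   # large enough to overshoot on this data

print(f"Initial cost:        {initial_cost:.3e}")
print(f"Cost with lr = 1e-7: {cost_small:.3e}")
print(f"Cost with lr = 1e-6: {cost_large:.3e}")
```

On this unscaled data, the larger rate should leave the cost far above its starting value, while the smaller rate reduces it.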

In [7]:
import numpy as np

# Set the random seed for reproducibility
np.random.seed(0)

# Generate synthetic square feet data (feature)
square_feet = np.random.uniform(500, 4000, 100)  # random values between 500 and 4000

# Define the true relationship (e.g., price per square foot is 300, with a base price of 50,000)
base_price = 50000
price_per_sqft = 300

# Generate the target values with some noise
noise = np.random.normal(0, 20000, 100)  # adding some noise
house_price = base_price + price_per_sqft * square_feet + noise


For gradient descent, we initialize both parameters to zero and repeatedly step them against the gradient of the mean squared error loss.
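
The gradients used in the loop below can be derived from a squared-error loss. The loss itself is not written out in the notebook, so the halved form here is an assumption chosen to match the `1 / m` factors in the code; $\alpha$ stands for `learning_rate`, and $x_i$, $y_i$ are the square footage and price of house $i$.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right)^2

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right),
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right) x_i

\theta_j \leftarrow \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j}, \qquad j \in \{0, 1\}
```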

In [8]:
# Hyperparameters for gradient descent
learning_rate = 0.0000001
iterations = 1000

# Initialize parameters
theta_0 = 0  # intercept
theta_1 = 0  # slope

# Alias the raw 1-D arrays; no bias column is needed because theta_0 is updated separately
X = square_feet
y = house_price
m = len(y)  # number of data points

# Gradient descent loop
for _ in range(iterations):
    # Compute predictions
    y_pred = theta_0 + theta_1 * X

    # Calculate the gradients
    d_theta_0 = (1 / m) * np.sum(y_pred - y)
    d_theta_1 = (1 / m) * np.sum((y_pred - y) * X)

    # Update parameters
    theta_0 -= learning_rate * d_theta_0
    theta_1 -= learning_rate * d_theta_1

# Print the final parameters
print(f"Intercept (theta_0): {theta_0}")
print(f"Slope (theta_1): {theta_1}")


Intercept (theta_0): 1.104363188714821
Slope (theta_1): 320.4297820144939


With the parameters learned from gradient descent, we can predict the house price for a new `square_feet` value.

In [9]:
def predict(square_feet):
    return theta_0 + theta_1 * square_feet

# Example prediction
sample_square_feet = 2000
predicted_price = predict(sample_square_feet)
print(f"Predicted house price for {sample_square_feet} square feet: ${predicted_price:.2f}")


Predicted house price for 2000 square feet: $640860.67
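
The gradient descent run above ends with an intercept of about 1.1, far from the base price of 50,000 used to generate the data. With unscaled square footage, the slope direction dominates the gradient, and the very small learning rate barely moves the intercept within 1000 iterations. One common remedy is to standardize the feature before running gradient descent. The sketch below is a variation on the loop above, not the original method; `x_scaled`, `w0`, `w1`, and the learning rate of `0.1` are illustrative assumptions.

```python
import numpy as np

# Recreate the synthetic data used in this notebook
np.random.seed(0)
square_feet = np.random.uniform(500, 4000, 100)
noise = np.random.normal(0, 20000, 100)
house_price = 50000 + 300 * square_feet + noise

# Standardize the feature to zero mean and unit variance
mu, sigma = square_feet.mean(), square_feet.std()
x_scaled = (square_feet - mu) / sigma

# Gradient descent on the scaled feature; a much larger step is now stable
learning_rate = 0.1
iterations = 1000
w0, w1 = 0.0, 0.0  # intercept and slope in scaled units
for _ in range(iterations):
    error = w0 + w1 * x_scaled - house_price
    w0 -= learning_rate * error.mean()
    w1 -= learning_rate * (error * x_scaled).mean()

# Convert the parameters back to the original units (dollars and square feet)
gd_slope = w1 / sigma
gd_intercept = w0 - w1 * mu / sigma

# Reference least-squares fit; np.polyfit returns [slope, intercept] for degree 1
ref_slope, ref_intercept = np.polyfit(square_feet, house_price, 1)

print(f"Gradient descent: intercept={gd_intercept:.2f}, slope={gd_slope:.4f}")
print(f"Least squares:    intercept={ref_intercept:.2f}, slope={ref_slope:.4f}")
```

With the feature standardized, both parameters converge at a similar rate, so the result after conversion back to the original units should closely match the least-squares fit.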
