Sure! I'd be happy to explain the cost function in linear regression for beginners in machine learning.

In linear regression, the cost function measures how far the model's predictions are from the actual values in the training dataset: the lower the cost, the better the fit. Training a linear regression model means finding the line (or, with several input features, the hyperplane) that minimizes this cost and so best represents the relationship between the input features and the target variable.

The most common cost function for linear regression is the mean squared error (MSE). It applies to simple linear regression, where there is only one input feature, and equally to models with several features. The MSE is the average of the squared differences between the predicted values (obtained from the linear regression equation) and the actual values in the training data. Mathematically, it can be expressed as:

MSE = (1/n) * Σ(yᵢ - ŷᵢ)²

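where n is the number of training examples, yᵢ is the actual value of the target variable for the i-th example, ŷᵢ is the value predicted by the linear regression equation for that example, and the sum Σ runs over all n examples. In simple linear regression, ŷᵢ is just the slope times the input value plus the intercept.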
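As a quick illustration of the formula, here is a minimal sketch that computes the MSE for three made-up data points. The helper name `mean_squared_error` and the numbers are assumptions chosen for this example only.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical data: three actual values and three predictions
y_actual = [3.0, 4.0, 5.0]
y_predicted = [2.5, 4.0, 6.0]

# Squared errors are 0.25, 0.0 and 1.0, so the mean is 1.25 / 3 (about 0.417)
mse = mean_squared_error(y_actual, y_predicted)
print("MSE:", mse)
```

The second prediction is exact and adds nothing to the cost, while the third, which is off by 1, accounts for most of it.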

Because the differences are squared, the cost function penalizes large errors much more heavily than small ones: an error of 4 contributes 16 to the sum, while an error of 1 contributes only 1. Squaring also keeps positive and negative errors from canceling each other out. Minimizing the MSE means adjusting the model's parameters (the slope and intercept in simple linear regression) until the overall squared difference between the predicted and actual values is as small as possible.

To find the parameters that minimize the cost function, an algorithm called gradient descent is commonly used. Gradient descent starts from an initial guess and repeatedly updates the parameters by taking a step in the direction of the negative gradient of the cost function, with the size of each step scaled by a learning rate, gradually converging towards the minimum.
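The update loop described above can be sketched as follows for simple linear regression. This is a minimal illustration, not a tuned implementation: the function name `gradient_descent`, the learning rate, the iteration count, and the data are all assumptions made for this example.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, n_iterations=2000):
    """Fit y = slope * x + intercept by minimizing the MSE with gradient descent."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    slope, intercept = 0.0, 0.0  # arbitrary starting guess

    for _ in range(n_iterations):
        errors = (slope * x + intercept) - y
        # Partial derivatives of MSE = (1/n) * sum(errors ** 2)
        grad_slope = (2.0 / n) * np.sum(errors * x)
        grad_intercept = (2.0 / n) * np.sum(errors)
        # Step against the gradient, scaled by the learning rate
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept

    return slope, intercept

# Hypothetical data generated from y = x + 1
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 4.0, 5.0]

slope, intercept = gradient_descent(x, y)
print("slope:", slope, "intercept:", intercept)
```

On this data both values should end up very close to 1. A learning rate that is too large can make the updates diverge instead of converge, so the value used here should be treated as something to adjust for other datasets.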

Once the cost has been minimized, the learned parameters define the line that best fits the training data, and the model can use them to make predictions on new, unseen data.

It's worth noting that the cost function and optimization techniques may vary depending on the type of regression problem and the specific requirements of the model. However, the concept of minimizing the cost function to find the best fit remains fundamental in many regression algorithms, including linear regression.
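As one example of a different optimization technique, the squared-error cost of linear regression can also be minimized directly, without iterating. The sketch below uses NumPy's `np.linalg.lstsq` on made-up data. The variable names and values are assumptions for illustration, and the first column of ones in `X` stands for the intercept.

```python
import numpy as np

# Hypothetical data; the first column of ones models the intercept
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 3.0, 4.0, 5.0])

# Least-squares solution computed in one step instead of by repeated updates
theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print("theta:", theta)
```

Here `theta` holds the intercept followed by the slope, and for this data both should come out very close to 1.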

In [1]:
import numpy as np

def compute_cost(X, y, theta):
    """
    Compute the cost function for linear regression.

    Arguments:
    X -- input features, a matrix of shape (m, n+1) whose first column is all ones (intercept term)
    y -- target variable, a vector of shape (m, 1)
    theta -- parameters of the linear regression model, a vector of shape (n+1, 1)

    Here m is the number of training examples and n is the number of input features.

    Returns:
    cost -- the computed cost value (half the mean squared error)
    """

    m = len(y)  # number of training examples

    # Compute predictions
    predictions = np.dot(X, theta)

    # Compute squared differences
    squared_diff = np.square(predictions - y)

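    # Compute cost. Dividing by 2 * m instead of m gives half the MSE, a common
    # convention that simplifies the gradient and leaves the best theta unchanged.
    cost = (1 / (2 * m)) * np.sum(squared_diff)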

    return cost


In [3]:
# Example usage
X = np.array([[1, 2], [1, 3], [1, 4]])  # input features
y = np.array([[3], [4], [5]])  # target variable
theta = np.array([[1], [2]])  # parameters

cost = compute_cost(X, y, theta)
print("Cost:", cost)


Cost: 4.833333333333333


In [5]:
# Example where theta fits the data exactly (y = 1 + x), so the cost is zero
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]])  # input features
y = np.array([[2], [3], [4], [5]])  # target variable
theta = np.array([[1], [1]])  # parameters

cost = compute_cost(X, y, theta)
print("Cost:", cost)

Cost: 0.0
