# Cost Function
- A cost function measures how wrong a machine learning model's predictions are. It's used to optimize the model by minimizing the "error" between predictions and actual values. Think of it as a penalty for bad predictions.

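- Mean Squared Error (MSE): This is a commonly used cost function for regression problems. It calculates the average squared difference between the predicted and actual values, so large errors are penalized more heavily than small ones.
- MSE = 1/n * sum((y_pred - y_actual)^2)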



![image.png](attachment:image.png)

In [14]:
import numpy as np

y_pred = np.array([1.2, 2.1, 3.5, 4.8])
y_actual = np.array([1, 2, 3, 5])
n = len(y_pred)

mse = 1/n * np.sum((y_pred - y_actual)**2)
print('MSE:', mse)



MSE: 0.08500000000000002


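- Binary Cross-Entropy/Log Loss: This is a commonly used cost function for binary classification problems. It compares the predicted probability of the positive class with the actual 0/1 label and penalizes confident wrong predictions most heavily.
- BCE = -1/n * sum(y_actual * log(y_pred) + (1 - y_actual) * log(1 - y_pred))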

![image.png](attachment:image.png)



In [16]:
import numpy as np

y_pred = np.array([0.2, 0.8, 0.6, 0.3])
y_actual = np.array([0, 1, 1, 0])
n = len(y_pred)

bce = -1/n * np.sum(y_actual * np.log(y_pred) + (1 - y_actual) * np.log(1 - y_pred))
print('Binary Cross-Entropy:', bce)



Binary Cross-Entropy: 0.32844691758328565
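
The computation above takes `log(y_pred)` and `log(1 - y_pred)`, which breaks down when a prediction is exactly 0 or 1. A minimal sketch of one common workaround, clipping the predictions into a small open interval before taking the log, is shown here. The function name `binary_cross_entropy` and the value of `eps` are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def binary_cross_entropy(y_actual, y_pred, eps=1e-12):
    """Mean binary cross-entropy; eps is an assumed clipping constant."""
    # Clip so that np.log never receives exactly 0.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_actual * np.log(y_pred) + (1 - y_actual) * np.log(1 - y_pred))

y_actual = np.array([0, 1, 1, 0])

# Same predictions as the cell above: clipping does not change the result.
bce_soft = binary_cross_entropy(y_actual, np.array([0.2, 0.8, 0.6, 0.3]))

# Hard 0/1 predictions, the last one confidently wrong: large but finite loss.
bce_hard = binary_cross_entropy(y_actual, np.array([0.0, 1.0, 1.0, 1.0]))

print('Soft predictions:', bce_soft)
print('Hard predictions:', bce_hard)
```

Without the clipping step, the second call would evaluate `log(0)` and return `inf` or `nan` instead of a usable loss value.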


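- Categorical Cross-Entropy: This is a commonly used cost function for multi-class classification problems. It compares the predicted class probabilities with the one-hot encoded actual labels, so only the probability assigned to the true class contributes to the loss.
- CCE = -1/n * sum(y_actual * log(y_pred)), summed over all samples and classes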



In [17]:
import numpy as np

y_pred = np.array([[0.1, 0.3, 0.6], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05]])
y_actual = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
n = len(y_pred)

# np.sum without an axis already adds up every entry, so one call is enough
cce = -1/n * np.sum(y_actual * np.log(y_pred))
print('Categorical Cross-Entropy:', cce)

Categorical Cross-Entropy: 0.3242870277875165


- Hinge Loss: This cost function is commonly used to train Support Vector Machines (SVMs) for binary classification. It penalizes samples that are misclassified or that fall inside the margin, which encourages a large margin between the decision boundary and the training samples. Labels are encoded as -1 and +1, and y_pred is the raw model score rather than a probability.
- HL = max(0, 1 - y_actual * y_pred)



In [18]:
import numpy as np

y_pred = np.array([0.2, 0.8, 0.6, 0.3])
y_actual = np.array([-1, 1, 1, -1])
n = len(y_pred)

hl = np.mean(np.maximum(0, 1 - y_actual * y_pred))
print('Hinge Loss:', hl)

Hinge Loss: 0.7749999999999999


- Kullback-Leibler Divergence: This cost function measures how much the predicted probability distribution differs from the actual one. It is zero when the two distributions are identical and is commonly used in tasks such as generative modeling.
- KL = sum(y_actual * log(y_actual / y_pred))





In [19]:
import numpy as np

y_pred = np.array([0.1, 0.5, 0.4])
y_actual = np.array([0.2, 0.3, 0.5])

kl = np.sum(y_actual * np.log(y_actual / y_pred))
print('Kullback-Leibler Divergence:', kl)


Kullback-Leibler Divergence: 0.09695352463929671
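
One property worth checking numerically is that KL divergence is not symmetric: swapping the two distributions generally gives a different value. The sketch below reuses the same two distributions; the helper name `kl_divergence` is an assumption for illustration, and it only handles distributions whose entries are all strictly positive.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return np.sum(p * np.log(p / q))

y_actual = np.array([0.2, 0.3, 0.5])
y_pred = np.array([0.1, 0.5, 0.4])

forward = kl_divergence(y_actual, y_pred)  # KL(actual || predicted)
reverse = kl_divergence(y_pred, y_actual)  # KL(predicted || actual)
same = kl_divergence(y_actual, y_actual)   # identical distributions

print('KL(actual || predicted):', forward)
print('KL(predicted || actual):', reverse)
print('KL(actual || actual):', same)
```

For these two distributions the forward and reverse values are close but not equal, and the divergence of a distribution from itself is zero.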


- For more cost functions, see https://medium.com/@anishnama20/understanding-cost-functions-in-machine-learning-types-and-applications-cd7d8cc4b47d

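- The chain rule is a formula for finding the derivative of a composite function: if h(x) = f(g(x)), then h'(x) = f'(g(x)) * g'(x).
- Gradient descent relies on it to compute the derivative of the cost function with respect to each model parameter.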
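
As an illustration, consider a line `y_pred = m * x + b` scored with the MSE cost. The cost depends on `m` only through `y_pred`, so the chain rule gives dJ/dm = (dJ/dy_pred) * (dy_pred/dm). The sketch below computes that product and compares it with a finite-difference estimate. The toy data, the starting values of `m` and `b`, and the step `h` are assumptions chosen for the example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])

def cost(m, b):
    """Mean squared error of the line y_pred = m * x + b."""
    return np.mean((y - (m * x + b)) ** 2)

def grad_m_chain_rule(m, b):
    """dJ/dm via the chain rule, averaged over the samples."""
    y_pred = m * x + b
    dJ_dy_pred = -2 * (y - y_pred)  # outer derivative, per sample
    dy_pred_dm = x                  # inner derivative
    return np.mean(dJ_dy_pred * dy_pred_dm)

m, b, h = 0.5, 1.0, 1e-6
analytic = grad_m_chain_rule(m, b)
numeric = (cost(m + h, b) - cost(m - h, b)) / (2 * h)  # central finite difference

print('chain rule:', analytic)
print('finite difference:', numeric)
```

The two numbers should agree to several decimal places, which is a quick way to sanity-check a hand-derived gradient before using it in an update loop.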

# Gradient Descent Algorithm in Machine Learning

- Gradient descent is a fundamental optimization technique used to train many models, including linear regression, logistic regression, support vector machines, and neural networks. It minimizes the cost function by iteratively adjusting the model parameters to reduce the difference between predicted and actual values, which improves the model's performance.
- It is an iterative method: at each step the parameters move a small distance, set by the learning rate, in the direction that reduces the loss.
- In linear regression, it finds the slope and intercept of the best-fit line.
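
The update rule itself is short: repeat `w = w - lr * gradient`. A minimal sketch on a one-parameter problem makes the behaviour easy to follow. The cost f(w) = (w - 3)^2, the function name `gradient_descent_1d`, and the chosen learning rate and step count are all hypothetical and used only for illustration.

```python
def gradient_descent_1d(grad, w0, lr, steps):
    """Repeat w <- w - lr * grad(w) and return every value visited."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * grad(w)
        history.append(w)
    return history

# Hypothetical cost f(w) = (w - 3)^2 with derivative 2 * (w - 3); its minimum is at w = 3.
history = gradient_descent_1d(lambda w: 2 * (w - 3), w0=0.0, lr=0.1, steps=50)

print('first steps:', history[:4])
print('final w:', history[-1])
```

With this learning rate the distance to the minimum shrinks by a constant factor at each step, so `w` approaches 3 steadily. A learning rate that is too large would make the steps overshoot and grow instead.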




In [None]:
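# Note: LinearRegression fits by ordinary least squares (a direct solver), not gradient descent.
# SGDRegressor is the gradient-descent-based option in scikit-learn.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the house price data (the path is specific to the author's machine)
df = pd.read_csv("C:\\Users\\shanmukh.adari\\Documents\\GitHub\\DSML_PRACTISE_FILES\\DATASETS\\home_single_prices.csv")
df.head()

# Fit price (in lakhs) as a linear function of area (in square feet)
model = LinearRegression()
model.fit(df[['area_sqr_ft']], df['price_lakhs'])

# Predict the price of a 1500 sq ft home
model.predict([[1500]])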




array([95.56913434])

In [26]:
#Let's do it without using the function
import numpy as np

x=np.array([1,2,3,4,5])
y=np.array([5,7,9,11,13])
b,m=0,0   # intercept and slope, both start at 0
lr=0.05   # lr=0.1 is too large for this data and makes the updates diverge

epochs=3000
for i in range(epochs):
    y_pred=m*x+b   # predict with b, the same intercept that is updated below
    error=y-y_pred
    cost_func=np.mean(error**2)
    # Calculate gradients
    db = -2 * np.mean(error)  # Derivative w.r.t. intercept b
    dm = -2 * np.mean(error * x)  # Derivative w.r.t. slope m
    # Update parameters
    b -= lr * db
    m -= lr * dm

print(round(b, 4))
print(round(m, 4))




3.0
2.0
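
Because the toy data above lie exactly on the line y = 2x + 3, a well-behaved gradient descent run should approach a slope of 2 and an intercept of 3. As an independent cross-check, the sketch below solves the same least-squares problem directly with `np.linalg.lstsq`. The variable names `A`, `m_ls`, and `b_ls` are assumptions for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 7, 9, 11, 13], dtype=float)

# Design matrix with a column of ones so that the intercept is fitted too.
A = np.column_stack([x, np.ones_like(x)])

# lstsq returns (solution, residuals, rank, singular values); only the solution is needed.
(m_ls, b_ls), *_ = np.linalg.lstsq(A, y, rcond=None)

print('slope:', m_ls)
print('intercept:', b_ls)
```

If the gradient descent loop produces very different values, or values that grow without bound, the usual suspects are a learning rate that is too large or a parameter that is updated but never used in the prediction.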
