# Cost Function

- A cost function measures how well a machine learning model performs on a given dataset.
- It quantifies how wrong the model is in estimating the relationship between the input X and the output Y.
- Minimizing it is how we find the best-fit line, plane, etc.

### What is a Cost Function?
- A cost function (also known as a loss function or error function) is a fundamental concept in machine learning and deep learning.
- It quantifies the error between the values predicted by a model and the actual target values.
- The goal of training a model is to minimize this cost function, i.e., to make predictions as close as possible to the true outputs.
- More generally, a cost function maps an event or the values of one or more variables onto a real number representing some "cost" associated with the event.
- In machine learning, that number is the error (difference) between the predicted values ŷ and the actual target values y.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
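As a small sketch of the definition above, the snippet below scores two candidate lines on the same data. The data values and the helper name `cost` are made up for illustration, and MSE is assumed as the error measure.

```python
import numpy as np

# Illustrative data that roughly follows y = 2x (values are made up)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def cost(w, b):
    """Mean squared error of the line y_hat = w * x + b on the data above."""
    y_hat = w * x + b
    return np.mean((y - y_hat) ** 2)

cost_good = cost(2.0, 0.0)   # line close to the data
cost_bad = cost(0.5, 1.0)    # line far from the data
print("cost of y = 2x:       ", cost_good)
print("cost of y = 0.5x + 1: ", cost_bad)
```

The line that follows the data more closely gets the smaller cost, which is what lets us compare models with a single number.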

### Why Do We Need a Cost Function?
- To measure how well (or poorly) our model is performing.
- To guide the optimization process (e.g., Gradient Descent).
- To evaluate and compare different models.
- To train models by minimizing this cost.

- We minimize the cost function to reduce the model's error: the larger the error, the worse the model performs.

### Why Minimize Cost?
- Minimizing the cost function during training moves the model parameters θ (theta) toward values that give better predictions.

![image.png](attachment:image.png)
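To make the link between θ and the cost concrete, here is a minimal sketch that evaluates the cost for a range of candidate slopes and picks the smallest. It assumes a one-parameter model ŷ = θ·x and MSE as the cost; the data and variable names are illustrative only.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])    # generated from y = 2x

# Candidate values of theta: 0.0, 0.1, ..., 4.0
thetas = np.linspace(0.0, 4.0, 41)

# MSE of the model y_hat = theta * x for every candidate
costs = np.array([np.mean((y - t * x) ** 2) for t in thetas])

best_theta = thetas[np.argmin(costs)]
print("theta with the lowest cost:", best_theta)
```

A brute-force sweep like this only works for one or two parameters; it is shown here just to illustrate that the best θ is the one where the cost is lowest.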

---

## Different Cost Functions 

### A. For Regression
- Regression models are used to predict continuous variables.

Types:
- MSE (Mean Square Error)
- MAE (Mean Absolute Error)
- RMSE (Root Mean Square Error)
- R² Score

##### 1. MSE (Mean Square Error)
- MSE is the average of the squared differences between the actual and predicted values.
- Because the errors are squared, MSE penalizes large errors (such as those caused by outliers) heavily.
- It is also known as L2 loss.
- To compute it, square the difference between each actual value and its prediction, sum the squares, and divide by the number of data points.
- Its weakness is data with outliers: a few large errors can dominate the result.
  
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

Key Properties:
- Always non-negative (squared error)
- Sensitive to outliers because large errors are squared.
- Units are the square of the target variable's units (e.g., if predicting price in ₹, MSE is in ₹²).

``` Python Code
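from sklearn.metrics import mean_squared_error
# Illustrative values; replace with your own data
y_true, y_pred = [3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0]
mse = mean_squared_error(y_true, y_pred)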
```
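The same quantity can be computed by hand, which also makes the outlier sensitivity visible. This is a minimal NumPy sketch with made-up values; the variable names are illustrative.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# Square each error, then average
mse_manual = np.mean((y_true - y_pred) ** 2)

# Same predictions, but the last one is now badly wrong (an outlier error of 10)
y_pred_outlier = np.array([2.5, 5.5, 7.0, -1.0])
mse_outlier = np.mean((y_true - y_pred_outlier) ** 2)

print("MSE without outlier:", mse_manual)
print("MSE with outlier:   ", mse_outlier)
```

A single bad prediction accounts for almost all of the second MSE value, because its error is squared before averaging.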




##### 2. MAE (Mean Absolute Error)
- MAE is the average of the absolute differences between the actual and predicted values.
- MAE is more robust to outliers than MSE because each error contributes in proportion to its size rather than its square.

![image-3.png](attachment:image-3.png)
![image-4.png](attachment:image-4.png)

Key Properties:
- Linear error metric: each error contributes in proportion to its magnitude.
- Robust to outliers: large errors are not squared.
- Units are same as the target variable.

``` Python Code
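from sklearn.metrics import mean_absolute_error
# Illustrative values; replace with your own data
y_true, y_pred = [3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0]
mae = mean_absolute_error(y_true, y_pred)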
```
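To see the robustness claim in numbers, the sketch below computes MAE and MSE by hand on the same made-up data, with and without one outlier prediction. Names and values are illustrative only.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])
y_pred_outlier = np.array([2.5, 5.5, 7.0, -1.0])   # last prediction is off by 10

def mae(y, y_hat):
    """Mean of the absolute errors."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean of the squared errors."""
    return np.mean((y - y_hat) ** 2)

mae_clean, mae_outlier = mae(y_true, y_pred), mae(y_true, y_pred_outlier)
mse_clean, mse_outlier = mse(y_true, y_pred), mse(y_true, y_pred_outlier)

# How much each metric grows when the outlier is introduced
print("MAE grows by a factor of", mae_outlier / mae_clean)
print("MSE grows by a factor of", mse_outlier / mse_clean)
```

Both metrics get worse, but MSE grows far more than MAE, since the outlier error is squared in one case and only taken in absolute value in the other.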




##### 3. RMSE (Root Mean Square Error)
- RMSE is the square root of the average of the squared differences between predicted and actual values.
- It measures how far the predictions typically deviate from the actual values, in the same units as the target variable.
- Because it is the square root of MSE, RMSE still emphasizes large errors, but its value is on the scale of the target variable rather than the squared scale.
- The lower the RMSE, the better the model's performance. Like MSE, it is sensitive to outliers.

![image-5.png](attachment:image-5.png)

``` Python Code
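from sklearn.metrics import mean_squared_error
import numpy as np
# Illustrative values; replace with your own data
y_true, y_pred = [3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0]
rmse = np.sqrt(mean_squared_error(y_true, y_pred))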
```
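Since RMSE is just the square root of MSE, the two always rank models in the same order; only the scale of the reported number changes. A minimal sketch, assuming two hypothetical sets of predictions `pred_a` and `pred_b`:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
pred_a = np.array([2.5, 5.5, 7.0, 8.0])   # hypothetical model A
pred_b = np.array([1.0, 5.0, 9.0, 9.0])   # hypothetical model B

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))

mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)
rmse_a, rmse_b = rmse(y_true, pred_a), rmse(y_true, pred_b)

# The better model under MSE is also the better model under RMSE
same_ranking = (mse_a < mse_b) == (rmse_a < rmse_b)
print("MSE: ", mse_a, mse_b)
print("RMSE:", rmse_a, rmse_b)
print("Same ranking:", same_ranking)
```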



##### 4. R² Score (Coefficient of Determination)
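- R² tells us how much of the variance in the dependent variable is explained by the model.
- Unlike the error measures above, higher is better: 1 is a perfect fit, and 0 means the model does no better than always predicting the mean of the targets.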

![image-6.png](attachment:image-6.png)


``` Python Code
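from sklearn.metrics import r2_score
# Illustrative values; replace with your own data
y_true, y_pred = [3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0]
r2 = r2_score(y_true, y_pred)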
```
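For intuition, R² can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition; the helper name `r2_manual` and the data are illustrative.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

def r2_manual(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # error left over after the model
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

r2_model = r2_manual(y_true, y_pred)

# Baseline: always predict the mean of the targets
mean_pred = np.full_like(y_true, y_true.mean())
r2_baseline = r2_manual(y_true, mean_pred)

print("R² of the model:        ", r2_model)
print("R² of the mean baseline:", r2_baseline)
```

The baseline that always predicts the mean scores 0, which is why R² is read as the improvement over that baseline.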

---

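### B. For Classification
- Classification models are used to predict categorical variables, such as 0 or 1, cat or dog, etc.

##### 1. Binary Classification Cost Function
- Binary classification has exactly two classes. Its usual cost function is binary cross-entropy (log loss), which heavily penalizes confident predictions that turn out to be wrong.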
  
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)

``` Python Code:
from sklearn.metrics import log_loss
y_true = [1, 0, 1, 0]
# Predicted probability of class 1 for each sample (the last two are confidently wrong)
y_pred_proba = [0.9, 0.1, 0.1, 0.9]
loss = log_loss(y_true, y_pred_proba)
print("Binary Cross-Entropy Loss:", loss)
```
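The log loss above can also be written out directly. This is a minimal NumPy sketch of binary cross-entropy, assuming the standard formula; the helper name `binary_cross_entropy` and the clipping constant `eps` are choices made here for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, p, eps=1e-15):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log() never sees 0
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = [1, 0, 1, 0]

# Two of the four predictions are confidently wrong
loss_bad = binary_cross_entropy(y_true, [0.9, 0.1, 0.1, 0.9])

# All four predictions are confident and correct
loss_good = binary_cross_entropy(y_true, [0.9, 0.1, 0.9, 0.1])

print("Loss with two wrong predictions:", loss_bad)
print("Loss with all correct:          ", loss_good)
```

Confidently wrong predictions dominate the first value, which is the behaviour that makes cross-entropy useful for training classifiers.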




##### 2. Multi-Class Classification Cost Function 
- A multi-class classification cost function is used when there are more than two classes and each input is assigned to exactly one of them.


![image-4.png](attachment:image-4.png)
![image-5.png](attachment:image-5.png)

``` Python Code:
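import numpy as np

# Illustrative data: 3 samples, 3 classes (each row of y_pred sums to 1)
y_true_sparse = np.array([0, 1, 2])      # integer class labels
y_true = np.eye(3)[y_true_sparse]        # the same labels, one-hot encoded
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])

# Using Keras
from tensorflow.keras.losses import CategoricalCrossentropy, SparseCategoricalCrossentropy
# One-hot labels
loss_fn = CategoricalCrossentropy()
loss = loss_fn(y_true, y_pred).numpy()
# Sparse (integer) labels
sparse_loss_fn = SparseCategoricalCrossentropy()
sparse_loss = sparse_loss_fn(y_true_sparse, y_pred).numpy()

# Using Scikit-learn
from sklearn.metrics import log_loss
# Integer class labels; y_pred has one probability column per class
log_loss_value = log_loss(y_true_sparse, y_pred)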
```
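To show what these library calls compute, here is a minimal NumPy sketch of categorical cross-entropy, assuming the standard definition (the average negative log of the probability assigned to the true class). The data and variable names are illustrative.

```python
import numpy as np

# 3 samples, 3 classes; each row of y_pred is a probability distribution
y_true_sparse = np.array([0, 1, 2])           # integer class labels
y_true_onehot = np.eye(3)[y_true_sparse]      # same labels, one-hot encoded
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])

# One-hot form: the one-hot row zeroes out every class except the true one
ce_onehot = -np.mean(np.sum(y_true_onehot * np.log(y_pred), axis=1))

# Sparse form: pick the true-class probability directly by index
true_class_probs = y_pred[np.arange(len(y_true_sparse)), y_true_sparse]
ce_sparse = -np.mean(np.log(true_class_probs))

print("One-hot cross-entropy:", ce_onehot)
print("Sparse cross-entropy: ", ce_sparse)
```

Both forms give the same number; the one-hot and sparse variants differ only in how the labels are encoded.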

## Notes: 
- We do not need to implement the cost function manually when using standard machine learning libraries or deep learning frameworks such as:
  - scikit-learn
  - TensorFlow/Keras
  - PyTorch

👉 These frameworks compute the cost (loss) automatically when you:
- Define a model
- Specify the loss function
- Call fit() or run your training loop
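As a small illustration of the note above, the sketch below fits scikit-learn's `LinearRegression`, which performs an ordinary least-squares fit, without any hand-written cost function. The data is made up and follows y = 2x + 1 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Illustrative data generated from y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)   # the least-squares cost is minimized inside fit()

slope = model.coef_[0]
intercept = model.intercept_
train_mse = mean_squared_error(y, model.predict(X))

print("slope:", slope, "intercept:", intercept, "training MSE:", train_mse)
```

The cost is only computed explicitly at the end, to evaluate the fitted model.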

---

# How to Find the Best Fit Line or Plane

- We use gradient descent to find the minimum cost for the dataset.
- The best-fit line (in 2D) or plane (in higher dimensions) is the one that minimizes the prediction error between the actual and predicted values.
- That error is usually measured using the Mean Squared Error (MSE):

![image.png](attachment:image.png)
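For a small dataset, the line with the lowest MSE can be obtained directly with a least-squares fit. The sketch below uses `np.polyfit` with degree 1 as one way to do this; the data values are made up, and the check at the end simply confirms that nudging the slope makes the MSE worse.

```python
import numpy as np

# Illustrative noisy data around y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Degree-1 least-squares fit: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

def mse_of_line(w, b):
    return np.mean((y - (w * x + b)) ** 2)

best_mse = mse_of_line(slope, intercept)
nudged_mse = mse_of_line(slope + 0.1, intercept)   # slightly different line

print("slope:", slope, "intercept:", intercept)
print("MSE of fitted line:", best_mse)
print("MSE of nudged line:", nudged_mse)
```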


###### Use Gradient Descent
- If the dataset is large or the model is non-linear, gradient descent is usually the preferred way to minimize the cost:

![image.png](attachment:image.png)
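A minimal sketch of gradient descent for a straight line ŷ = w·x + b, assuming MSE as the cost. The learning rate, step count, and data are illustrative choices, and the gradients are the derivatives of MSE with respect to w and b (the constant factor may differ from the formula shown above if it uses a different scaling).

```python
import numpy as np

# Illustrative data generated from y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b = 0.0, 0.0          # start from an arbitrary guess
learning_rate = 0.05     # step size (chosen by hand for this data)
n = len(x)

for step in range(5000):
    y_hat = w * x + b
    error = y_hat - y

    # Gradients of MSE = mean((y_hat - y)^2) with respect to w and b
    grad_w = (2.0 / n) * np.sum(error * x)
    grad_b = (2.0 / n) * np.sum(error)

    # Move the parameters a small step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_cost = np.mean((w * x + b - y) ** 2)
print("w:", w, "b:", b, "final cost:", final_cost)
```

If the learning rate is too large the updates can diverge, and if it is too small convergence is slow, so in practice it is tuned for the problem at hand.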