# **OBJECTIVE FUNCTIONS:**

Objective functions, also known as loss functions or cost functions, are central to machine learning and statistics because they measure how well a model's predictions align with the actual data.

Some widely used objective functions are:

- RMSE (Root Mean Squared Error)
- MAE (Mean Absolute Error)
- Logistic Loss
- Cross Entropy

The sections below discuss each of these in turn.



### 1. RMSE (Root Mean Squared Error)

**Use case:** Regression

**Formula:**
$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} 
$$

**Explanation:**

- **$y_i$**: The true value for the $i$-th observation.
- **$\hat{y}_i$**: The predicted value for the $i$-th observation.
- **$n$**: The number of observations.

**Details:**

- RMSE measures the square root of the average squared differences between the predicted values and the actual values.
- Because errors are squared before averaging, larger errors receive disproportionately more weight, which makes RMSE sensitive to outliers.
- A lower RMSE value indicates a better fit of the model to the data.

**Interpretation:**

- RMSE is in the same units as the response variable $y$.
- It is useful for comparing the predictive accuracy of different models on the same dataset.
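
As a minimal sketch of the formula above, the snippet below computes RMSE directly with NumPy. The helper name `rmse` and the toy fare values are illustrative assumptions, not part of the dataset used in this notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, computed term by term from the formula."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    squared_errors = (y_true - y_pred) ** 2   # (y_i - y_hat_i)^2
    return np.sqrt(np.mean(squared_errors))   # sqrt of the average

# Hypothetical fares (in dollars) and predictions, for illustration only
y_true_demo = [4.5, 16.5, 5.5, 7.5]
y_pred_demo = [5.5, 14.5, 5.5, 8.5]

rmse_demo = rmse(y_true_demo, y_pred_demo)
print(f"RMSE: {rmse_demo:.4f}")
```

The errors here are -1, 2, 0 and -1, so the mean squared error is 1.5 and the RMSE is its square root, expressed in the same units as the fares.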


In [1]:
import pandas as pd
import numpy as np
NY_data = pd.read_csv('NewYorkFareData.csv')

ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

In [7]:
NY_data.head()

Unnamed: 0,key,fare_amount,pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count
0,2009-06-15 17:26:21.0000001,4.5,2009-06-15 17:26:21 UTC,-73.844311,40.721319,-73.84161,40.712278,1
1,2010-01-05 16:52:16.0000002,16.9,2010-01-05 16:52:16 UTC,-74.016048,40.711303,-73.979268,40.782004,1
2,2011-08-18 00:35:00.00000049,5.7,2011-08-18 00:35:00 UTC,-73.982738,40.76127,-73.991242,40.750562,2
3,2012-04-21 04:30:42.0000001,7.7,2012-04-21 04:30:42 UTC,-73.98713,40.733143,-73.991567,40.758092,1
4,2010-03-09 07:51:00.000000135,5.3,2010-03-09 07:51:00 UTC,-73.968095,40.768008,-73.956655,40.783762,1


In [9]:
NY_data.describe()

Unnamed: 0,fare_amount,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count
count,55423860.0,55423860.0,55423860.0,55423480.0,55423480.0,55423860.0
mean,11.34505,-72.50968,39.91979,-72.51121,39.92068,1.68538
std,20.71083,12.84888,9.642353,12.7822,9.633346,1.327664
min,-300.0,-3442.06,-3492.264,-3442.025,-3547.887,0.0
25%,6.0,-73.99207,40.73493,-73.9914,40.73403,1.0
50%,8.5,-73.9818,40.75265,-73.98015,40.75316,1.0
75%,12.5,-73.96708,40.76713,-73.96367,40.7681,2.0
max,93963.36,3457.626,3408.79,3457.622,3537.133,208.0


In [10]:
NY_data

Unnamed: 0,key,fare_amount,pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count
0,2009-06-15 17:26:21.0000001,4.5,2009-06-15 17:26:21 UTC,-73.844311,40.721319,-73.841610,40.712278,1
1,2010-01-05 16:52:16.0000002,16.9,2010-01-05 16:52:16 UTC,-74.016048,40.711303,-73.979268,40.782004,1
2,2011-08-18 00:35:00.00000049,5.7,2011-08-18 00:35:00 UTC,-73.982738,40.761270,-73.991242,40.750562,2
3,2012-04-21 04:30:42.0000001,7.7,2012-04-21 04:30:42 UTC,-73.987130,40.733143,-73.991567,40.758092,1
4,2010-03-09 07:51:00.000000135,5.3,2010-03-09 07:51:00 UTC,-73.968095,40.768008,-73.956655,40.783762,1
...,...,...,...,...,...,...,...,...
55423851,2014-03-15 03:28:00.00000070,14.0,2014-03-15 03:28:00 UTC,-74.005272,40.740027,-73.963280,40.762555,1
55423852,2009-03-24 20:46:20.0000002,4.2,2009-03-24 20:46:20 UTC,-73.957784,40.765530,-73.951640,40.773959,1
55423853,2011-04-02 22:04:24.0000004,14.1,2011-04-02 22:04:24 UTC,-73.970505,40.752325,-73.960537,40.797342,1
55423854,2011-10-26 05:57:51.0000002,28.9,2011-10-26 05:57:51 UTC,-73.980901,40.764629,-73.870605,40.773963,1


In [12]:
NY_data=NY_data.dropna()

In [14]:
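# 'key' and 'pickup_datetime' are string columns, which LinearRegression
# cannot fit on, so they are dropped along with the target.
X = NY_data.drop(columns=['fare_amount', 'key', 'pickup_datetime'])
y = NY_data['fare_amount']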

In [15]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE: {rmse}')


### 2. MAE (Mean Absolute Error)

**Use case:** Regression

**Formula:**

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

**Explanation:**

- **$y_i$**: The true value for the $i$-th observation.
- **$\hat{y}_i$**: The predicted value for the $i$-th observation.
- **$n$**: The number of observations.

**Details:**

- MAE measures the average magnitude of the errors in a set of predictions, without considering their direction.
- It is the average of the absolute differences between the predicted values and the actual values.
- MAE weights each error in proportion to its magnitude, so a single large error influences it less than it influences RMSE.

**Interpretation:**

- MAE is in the same units as the response variable $y$.
- It is useful for understanding the typical size of the errors in predictions.
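
The following sketch illustrates the difference in outlier sensitivity between MAE and RMSE. The helper names and the toy values are assumptions chosen so that both prediction sets have the same MAE.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error, for comparison."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical true values and two sets of predictions
y_true_demo = [10.0, 10.0, 10.0, 10.0]
pred_spread = [9.0, 11.0, 9.0, 11.0]     # four errors of size 1
pred_outlier = [10.0, 10.0, 10.0, 14.0]  # one error of size 4

print("spread :", mae(y_true_demo, pred_spread), rmse(y_true_demo, pred_spread))
print("outlier:", mae(y_true_demo, pred_outlier), rmse(y_true_demo, pred_outlier))
```

Both prediction sets have an MAE of 1, but the RMSE rises from 1 to 2 when the same total error is concentrated in a single observation.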



In [None]:
mae = mean_absolute_error(y_test, y_pred)
print(f'MAE: {mae}')


### 3. Logistic Loss (Binary Cross-Entropy Loss)

**Use case:** Binary classification

**Formula:**

$$\text{Logistic Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

**Explanation:**

- **$y_i$**: The true binary label (0 or 1) for the $i$-th observation.
- **$\hat{y}_i$**: The predicted probability that the $i$-th observation belongs to class 1.
- **$n$**: The number of observations.

**Details:**

- Logistic loss measures the performance of a classification model where the output is a probability value between 0 and 1.
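- It penalizes confident but wrong predictions most heavily: the loss grows without bound as the probability assigned to the true class approaches 0, while a higher probability assigned to the correct class results in a lower loss.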
- The loss increases as the predicted probability diverges from the actual label.

**Interpretation:**

- A lower logistic loss indicates better performance of the classifier.
- It is particularly useful for models that output probabilities, such as logistic regression.
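
A minimal NumPy sketch of the logistic loss formula follows. The function name `logistic_loss`, the `eps` clipping (added here as an assumption to avoid `log(0)`), and the toy labels and probabilities are illustrative only.

```python
import numpy as np

def logistic_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy averaged over all observations."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithm stays finite
    # (an assumption of this sketch, not part of the formula itself).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical labels and predicted probabilities of class 1
labels = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.9, 0.1]
confident_wrong = [0.1, 0.9, 0.1, 0.9]

loss_right = logistic_loss(labels, confident_right)
loss_wrong = logistic_loss(labels, confident_wrong)
print(f"confident and right: {loss_right:.4f}")
print(f"confident and wrong: {loss_wrong:.4f}")
```

In the first case every observation contributes $-\log(0.9)$, and in the second every observation contributes $-\log(0.1)$, so the confidently wrong predictions incur a much larger loss.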


### 4. Cross Entropy Loss (Categorical Cross-Entropy Loss)

**Use case:** Multi-class classification

**Formula:**

$$\text{Cross Entropy} = -\sum_{i=1}^{n} \sum_{c=1}^{k} y_{i,c} \log(\hat{y}_{i,c})$$

**Explanation:**

- **$n$**: The number of observations.
- **$k$**: The number of classes.
- **$y_{i,c}$**: A binary indicator that is 1 if class $c$ is the correct class for observation $i$, and 0 otherwise.
- **$\hat{y}_{i,c}$**: The predicted probability that observation $i$ belongs to class $c$.

**Details:**

- Cross entropy loss measures the performance of a classification model where the output is a probability distribution over multiple classes.
- It is the sum of the negative log probabilities that the model assigns to the true classes.
- The lower the probability assigned to the correct class, the larger the penalty.

**Interpretation:**

- A lower cross entropy loss indicates better performance of the classifier.
- It is especially useful for multi-class classification problems where the output is a probability distribution.
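
The formula above can be sketched in NumPy as follows. The function name `cross_entropy`, the `eps` clipping, and the three-class toy data are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(y_onehot, p_pred, eps=1e-15):
    """Categorical cross-entropy summed over observations and classes."""
    y = np.asarray(y_onehot, dtype=float)
    # Clip so that log(0) cannot occur (an assumption of this sketch).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0)
    # Only the true-class entries survive the multiplication by y.
    return -np.sum(y * np.log(p))

# Hypothetical one-hot labels and predicted distributions for 3 classes
y_onehot_demo = [[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]]
p_pred_demo = [[0.70, 0.20, 0.10],
               [0.10, 0.80, 0.10],
               [0.25, 0.25, 0.50]]

total_loss = cross_entropy(y_onehot_demo, p_pred_demo)
mean_loss = total_loss / len(y_onehot_demo)
print(f"summed cross entropy: {total_loss:.4f}")
print(f"mean per observation: {mean_loss:.4f}")
```

Dividing the sum by $n$ gives the per-observation average, which makes the value comparable across datasets of different sizes and matches the $\frac{1}{n}$ factor in the logistic loss formula.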

### Summary of Interpretations:

- **RMSE and MAE** are used for regression tasks, with RMSE being more sensitive to outliers.
- **Logistic loss** is used for binary classification and scores predicted probabilities against the true binary labels.
- **Cross entropy loss** is used for multi-class classification and scores predicted probability distributions against the true class labels.

Each of these objective functions helps quantify how well a model's predictions match the actual data, guiding the optimization process during model training.
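
To illustrate how an objective function can guide training, here is a minimal sketch that fits a one-feature linear model by plain gradient descent on the mean squared error. The data, learning rate, and iteration count are hypothetical choices for this example and are not taken from the notebook's model.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # initial slope and intercept
learning_rate = 0.05     # assumed step size, small enough to converge here

for _ in range(2000):
    residual = (w * x + b) - y               # prediction minus truth
    grad_w = 2.0 * np.mean(residual * x)     # d(MSE)/dw
    grad_b = 2.0 * np.mean(residual)         # d(MSE)/db
    w -= learning_rate * grad_w              # step against the gradient
    b -= learning_rate * grad_b

final_rmse = np.sqrt(np.mean((w * x + b - y) ** 2))
print(f"w = {w:.4f}, b = {b:.4f}, RMSE = {final_rmse:.6f}")
```

Each step moves the parameters in the direction that lowers the objective, so the fitted slope and intercept approach the values used to generate the data.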