# **OBJECTIVE FUNCTIONS:**

#### Objective functions, also known as loss functions or cost functions, are crucial in machine learning and statistics because they provide a measure of how well a model's predictions align with the actual data.

### Some of the most widely used functions are:
#### - RMSE [ Root Mean Squared Error]
#### - MAE [ Mean Absolute Error ]
#### - Logistic Loss
#### - Cross Entropy

### **We will discuss these one by one.**



### 1. RMSE (Root Mean Squared Error)

**Use case:** Regression

**Formula:**
$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

**Explanation:**

- **$y_i$**: The true value for the $i$-th observation.
- **$\hat{y}_i$**: The predicted value for the $i$-th observation.
- **$n$**: The number of observations.

**Details:**

- RMSE measures the square root of the average squared differences between the predicted values and the actual values.
- It gives higher weight to larger errors, meaning it is sensitive to outliers.
- A lower RMSE value indicates a better fit of the model to the data.

**Interpretation:**

- RMSE is in the same units as the response variable $y$.
- It is useful for comparing the predictive accuracy of different models on the same dataset.
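
To make the formula concrete, here is a minimal sketch that computes RMSE by hand with NumPy. The function name `rmse` and the fare values are hypothetical and chosen only for illustration; they are not taken from the dataset used below.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical fares (illustrative values only)
y_true = [10.0, 12.0, 8.0, 15.0]
y_pred = [11.0, 11.0, 9.0, 14.0]          # every prediction is off by exactly 1
print(rmse(y_true, y_pred))                # 1.0

# Same predictions, except the last one is now off by 9 instead of 1
y_pred_outlier = [11.0, 11.0, 9.0, 24.0]
print(rmse(y_true, y_pred_outlier))        # sqrt((1 + 1 + 1 + 81) / 4) = sqrt(21), about 4.58
```

A single large miss moves the RMSE from 1.0 to about 4.58, which is the outlier sensitivity described above.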


In [None]:
import pandas as pd
import numpy as np
NY_data = pd.read_csv('NYCTaxiFareData.csv')

In [None]:
NY_data.head()

Unnamed: 0,key,fare_amount,pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count
0,2009-06-15 17:26:21.0000001,4.5,2009-06-15 17:26:21 UTC,-73.844311,40.721319,-73.84161,40.712278,1
1,2010-01-05 16:52:16.0000002,16.9,2010-01-05 16:52:16 UTC,-74.016048,40.711303,-73.979268,40.782004,1
2,2011-08-18 00:35:00.00000049,5.7,2011-08-18 00:35:00 UTC,-73.982738,40.76127,-73.991242,40.750562,2
3,2012-04-21 04:30:42.0000001,7.7,2012-04-21 04:30:42 UTC,-73.98713,40.733143,-73.991567,40.758092,1
4,2010-03-09 07:51:00.000000135,5.3,2010-03-09 07:51:00 UTC,-73.968095,40.768008,-73.956655,40.783762,1


In [None]:
NY_data.describe()

Unnamed: 0,fare_amount,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,11.235464,-72.46666,39.920448,-72.474094,39.893281,1.6447
std,9.584258,10.609729,7.318932,10.579732,6.339919,1.271229
min,-2.9,-74.438233,-74.006893,-74.429332,-73.994392,0.0
25%,6.0,-73.992058,40.734547,-73.991112,40.73523,1.0
50%,8.5,-73.981758,40.752693,-73.980083,40.753738,1.0
75%,12.5,-73.966925,40.767694,-73.963504,40.768186,2.0
max,180.0,40.766125,401.083332,40.802437,41.366138,6.0


In [None]:
NY_data['pickup_datetime'] = pd.to_datetime(NY_data['pickup_datetime'])

# Extract useful datetime features
NY_data['year'] = NY_data['pickup_datetime'].dt.year
NY_data['month'] = NY_data['pickup_datetime'].dt.month
NY_data['day'] = NY_data['pickup_datetime'].dt.day
NY_data['hour'] = NY_data['pickup_datetime'].dt.hour

NY_data = NY_data.drop(columns=['pickup_datetime'])

In [None]:
NY_data

Unnamed: 0,key,fare_amount,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count,year,month,day,hour
0,2009-06-15 17:26:21.0000001,4.5,-73.844311,40.721319,-73.841610,40.712278,1,2009,6,15,17
1,2010-01-05 16:52:16.0000002,16.9,-74.016048,40.711303,-73.979268,40.782004,1,2010,1,5,16
2,2011-08-18 00:35:00.00000049,5.7,-73.982738,40.761270,-73.991242,40.750562,2,2011,8,18,0
3,2012-04-21 04:30:42.0000001,7.7,-73.987130,40.733143,-73.991567,40.758092,1,2012,4,21,4
4,2010-03-09 07:51:00.000000135,5.3,-73.968095,40.768008,-73.956655,40.783762,1,2010,3,9,7
...,...,...,...,...,...,...,...,...,...,...,...
9995,2011-10-26 10:44:00.00000086,11.7,-73.988277,40.748970,-73.963712,40.773958,2,2011,10,26,10
9996,2011-12-16 15:37:00.000000179,5.7,-74.002112,40.748727,-73.992467,40.756252,1,2011,12,16,15
9997,2013-11-16 22:47:17.0000001,12.0,-73.992093,40.729071,-73.974470,40.763050,2,2013,11,16,22
9998,2010-01-28 11:38:00.00000022,6.5,-73.992548,40.735652,-73.998802,40.723085,1,2010,1,28,11


In [None]:
X = NY_data.drop(columns=['fare_amount', 'key'])
y = NY_data['fare_amount']

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE: {rmse}')

RMSE: 10.055300400668683


### Understanding RMSE

* An RMSE of 10.055 means that the model's predictions are typically about 10 fare units away from the actual values. Because the errors are squared before averaging, RMSE is not the plain average error: large misses count for more, so RMSE is always at least as large as MAE.

* The significance of an RMSE value depends on the context and scale of the data. For example, if fare amounts range from 0 to 100, an RMSE of 10.055 might be considered high; if they range from 0 to 1000, it might be considered low. Here, the `describe()` output shows a mean `fare_amount` of about 11.24 and a standard deviation of about 9.58, so an RMSE of about 10 is large relative to typical fares.

To improve RMSE, consider more advanced models, feature engineering, or better data preprocessing techniques. These are discussed in later sections.
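
One way to put an RMSE value in context is to compare it with a naive baseline that always predicts the training mean. The sketch below uses small hypothetical arrays (`y_train_demo`, `y_test_demo`, `model_pred_demo`), not the taxi data, so the numbers are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical values for illustration only
y_train_demo = np.array([6.0, 8.0, 10.0, 12.0, 14.0])
y_test_demo = np.array([7.0, 9.0, 13.0])
model_pred_demo = np.array([7.5, 9.5, 12.0])

# Baseline: ignore the features and always predict the training mean (10.0 here)
baseline_pred = np.full_like(y_test_demo, y_train_demo.mean())

baseline_rmse = rmse(y_test_demo, baseline_pred)   # sqrt(19 / 3), about 2.52
model_rmse = rmse(y_test_demo, model_pred_demo)    # sqrt(0.5), about 0.71

print(f"Baseline RMSE: {baseline_rmse:.3f}")
print(f"Model RMSE:    {model_rmse:.3f}")
```

If a model's RMSE is not clearly below the baseline's, the features are adding little predictive value.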


### 2. MAE (Mean Absolute Error)

**Use case:** Regression

**Formula:**

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

**Explanation:**

- **$y_i$**: The true value for the $i$-th observation.
- **$\hat{y}_i$**: The predicted value for the $i$-th observation.
- **$n$**: The number of observations.

**Details:**

- MAE measures the average magnitude of the errors in a set of predictions, without considering their direction.
- It is the average of the absolute differences between the predicted values and the actual values.
- MAE weights every error in proportion to its size, so a single large error does not dominate the result the way it does in RMSE.

**Interpretation:**

- MAE is in the same units as the response variable $y$.
- It is useful for understanding the typical size of the errors in predictions.
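
As a rough illustration of how MAE and RMSE differ, the sketch below computes both on the same hypothetical predictions, three of which are off by 1 and one of which is off by 9. The helper names `mae` and `rmse` and the values are assumptions made for this example.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, ignoring sign."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error, for comparison."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical values: three errors of size 1 and one error of size 9
y_true = [10.0, 12.0, 8.0, 15.0]
y_pred = [11.0, 11.0, 9.0, 24.0]

mae_value = mae(y_true, y_pred)     # (1 + 1 + 1 + 9) / 4 = 3.0
rmse_value = rmse(y_true, y_pred)   # sqrt(84 / 4) = sqrt(21), about 4.58

print(f"MAE:  {mae_value}")
print(f"RMSE: {rmse_value:.2f}")
```

The gap between the two numbers is a quick hint that a few large errors are present.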



In [None]:
mae = mean_absolute_error(y_test, y_pred)
print(f'MAE: {mae}')

MAE: 6.054968368599895



### 3. Logistic Loss (Binary Cross-Entropy Loss)

**Use case:** Binary classification

**Formula:**

$$\text{Logistic Loss} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$

**Explanation:**

- **$y_i$**: The true binary label (0 or 1) for the $i$-th observation.
- **$\hat{y}_i$**: The predicted probability that the $i$-th observation belongs to class 1.
- **$n$**: The number of observations.

**Details:**

- Logistic loss measures the performance of a classification model where the output is a probability value between 0 and 1.
- It penalizes confident wrong predictions most heavily: the loss grows without bound as the probability assigned to the correct class approaches 0, and falls toward 0 as that probability approaches 1.
- The loss increases as the predicted probability diverges from the actual label.

**Interpretation:**

- A lower logistic loss indicates better performance of the classifier.
- It is particularly useful for models that output probabilities, such as logistic regression.
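
The formula can be sketched directly in NumPy. The function name `binary_log_loss`, the clipping constant `eps`, and the example probabilities are assumptions made for this illustration; clipping is a common way to avoid `log(0)`.

```python
import numpy as np

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred is the predicted probability of class 1."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithm stays finite
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

y_true = [1, 0, 1, 0]

confident_right = binary_log_loss(y_true, [0.9, 0.1, 0.9, 0.1])  # -ln(0.9), about 0.105
uninformative = binary_log_loss(y_true, [0.5, 0.5, 0.5, 0.5])    # ln(2), about 0.693
confident_wrong = binary_log_loss(y_true, [0.1, 0.9, 0.1, 0.9])  # -ln(0.1), about 2.303

print(confident_right, uninformative, confident_wrong)
```

An uninformative model that always predicts 0.5 scores ln(2), about 0.693, which is a handy reference point when reading a logistic loss value.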


In [None]:
TitanicData = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")

In [None]:
TitanicData.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [None]:
TitanicData = TitanicData.drop(columns=['Name', 'Ticket', 'Cabin'])

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

In [None]:
# Assign the result back; inplace=True on a column selection raises a FutureWarning and stops working in pandas 3.0
TitanicData['Age'] = TitanicData['Age'].fillna(TitanicData['Age'].median())
TitanicData['Embarked'] = TitanicData['Embarked'].fillna(TitanicData['Embarked'].mode()[0])


In [None]:
X = TitanicData.drop(columns=['Survived'])
y = TitanicData['Survived']

In [None]:
numeric_features = ['Age', 'Fare']
numeric_transformer = StandardScaler()

categorical_features = ['Sex', 'Embarked', 'Pclass']
categorical_transformer = OneHotEncoder(drop='first')

In [None]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

In [None]:
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
])

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
pipeline.fit(X_train, y_train)

In [None]:
y_pred_proba = pipeline.predict_proba(X_test)
loss = log_loss(y_test, y_pred_proba)

print(f'Logistic Loss: {loss}')

Logistic Loss: 0.43295800972366816


### 4. Cross Entropy Loss (Categorical Cross-Entropy Loss)

**Use case:** Multi-class classification

**Formula:**

$$\text{Cross Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{k} y_{i,c} \log(\hat{y}_{i,c})$$

**Explanation:**

- **$n$**: The number of observations.
- **$k$**: The number of classes.
- **$y_{i,c}$**: A binary indicator (0 or 1) of whether class $c$ is the correct classification for observation $i$.
- **$\hat{y}_{i,c}$**: The predicted probability that observation $i$ belongs to class $c$.

**Details:**

- Cross entropy loss measures the performance of a classification model where the output is a probability distribution over multiple classes.
- It is the average, over observations, of the negative log probability assigned to the true class. The $\frac{1}{n}$ factor matches the logistic loss above; with $k = 2$ the two formulas coincide.
- It penalizes the probability of the correct class being low.

**Interpretation:**

- A lower cross entropy loss indicates a better performance of the classifier.
- It is especially useful for multi-class classification problems where the output is a probability distribution.
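
As a small sketch of the double sum, the code below computes the per-observation terms $-\sum_{c} y_{i,c} \log(\hat{y}_{i,c})$ with NumPy; these can then be summed or averaged over the $n$ observations. The function name `per_sample_cross_entropy`, the `eps` clipping constant, and the example probabilities are assumptions for illustration.

```python
import numpy as np

def per_sample_cross_entropy(y_onehot, probs, eps=1e-15):
    """Return one cross-entropy term per observation (rows are observations)."""
    y = np.asarray(y_onehot, dtype=float)
    # Clip from below so log(0) cannot occur
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    # Only the true class contributes, because y is zero everywhere else
    return -np.sum(y * np.log(p), axis=1)

# Hypothetical 3-class example: each row of probs sums to 1
y_onehot = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.2, 0.3, 0.5]]

losses = per_sample_cross_entropy(y_onehot, probs)  # [-ln 0.7, -ln 0.8, -ln 0.5]
print("Per-observation:", losses)
print("Sum: ", losses.sum())
print("Mean:", losses.mean())
```

The third observation contributes the most because its true class received the lowest probability (0.5).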

### Summary of Interpretations:

- **RMSE and MAE** are used for regression tasks, with RMSE being more sensitive to outliers.
- **Logistic loss** is used for binary classification and measures the probability predictions against the true binary labels.
- **Cross entropy loss** is used for multi-class classification and measures the probability distributions against the true class labels.

Each of these objective functions helps quantify how well a model's predictions match the actual data, guiding the optimization process during model training.

In [None]:
pip install tensorflow


In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.losses import CategoricalCrossentropy

In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [3]:
X_train = X_train / 255.0
X_test = X_test / 255.0

In [4]:
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

In [5]:
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

In [6]:
model.compile(optimizer='adam',
              loss=CategoricalCrossentropy(),
              metrics=['accuracy'])

In [7]:
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x79d0bc98b160>

In [8]:
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Categorical Cross-Entropy Loss: {loss}')
print(f'Accuracy: {accuracy}')

Categorical Cross-Entropy Loss: 0.0780465304851532
Accuracy: 0.9758999943733215


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

These values tell us whether the model is performing well, whether more feature engineering is needed, or whether a different model should be tried. This is why **objective functions** are so important.

Beyond the four covered above, other commonly used objective functions include:

### For Regression:
1. **Mean Squared Error (MSE)**:
   $$
   \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
   $$
   - Similar to RMSE but without taking the square root. It also emphasizes larger errors.

2. **Huber Loss**:
   $$
   L_\delta(a) = \begin{cases} 
   \frac{1}{2}a^2 & \text{for } |a| \leq \delta, \\
   \delta(|a| - \frac{1}{2}\delta) & \text{for } |a| > \delta,
   \end{cases}
   $$
   where $a = y_i - \hat{y}_i$.
   - Combines properties of MAE and MSE: it is quadratic for small errors ($|a| \leq \delta$) and linear for large ones, so it is less sensitive to outliers than MSE while remaining smooth around zero.

3. **Quantile Loss**:
   $$
   \text{Quantile Loss} = \sum_{i=1}^{n} \left( \tau (y_i - \hat{y}_i)^+ + (1-\tau) (\hat{y}_i - y_i)^+ \right)
   $$
   - Used in quantile regression to predict quantiles (e.g., the median, $\tau = 0.5$) of the target variable. Here $(x)^+ = \max(x, 0)$ and $\tau \in (0, 1)$ is the target quantile.
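
The Huber and quantile losses listed above can be sketched in a few lines of NumPy. The function names, the default `delta` and `tau` values, and the example numbers are assumptions made for this illustration.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Per-observation Huber loss: quadratic when |a| <= delta, linear otherwise."""
    a = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_a = np.abs(a)
    quadratic = 0.5 * a ** 2
    linear = delta * (abs_a - 0.5 * delta)
    return np.where(abs_a <= delta, quadratic, linear)

def quantile_loss(y_true, y_pred, tau=0.5):
    """Summed quantile (pinball) loss for target quantile tau."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    under = np.maximum(diff, 0.0)    # (y - y_hat)^+ : prediction too low
    over = np.maximum(-diff, 0.0)    # (y_hat - y)^+ : prediction too high
    return float(np.sum(tau * under + (1 - tau) * over))

# Huber: a small error (0.5) is squared, a large error (3.0) grows linearly
huber_values = huber_loss([0.0, 0.0], [0.5, 3.0], delta=1.0)   # [0.125, 2.5]
print(huber_values)

# Quantile loss with tau = 0.9 punishes under-prediction more than over-prediction
loss_under = quantile_loss([10.0, 10.0], [8.0, 8.0], tau=0.9)    # 0.9 * (2 + 2) = 3.6
loss_over = quantile_loss([10.0, 10.0], [12.0, 12.0], tau=0.9)   # 0.1 * (2 + 2) = 0.4
print(loss_under, loss_over)
```

With `tau=0.9`, predicting too low costs nine times as much as predicting too high by the same amount, which pushes the fitted value toward the 90th percentile.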

### For Classification:
1. **Hinge Loss (for Support Vector Machines)**:
   $$
   \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)
   $$
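   - Used for training SVMs, focusing on correctly classifying samples with a margin. Here the labels are $y_i \in \{-1, +1\}$ and $\hat{y}_i$ is the raw decision score, not a probability.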

2. **Kullback-Leibler Divergence (KL Divergence)**:
   $$
   \text{KL}(P || Q) = \sum_{i} P(i) \log \left( \frac{P(i)}{Q(i)} \right)
   $$
   - Measures how one probability distribution diverges from a second, expected probability distribution.

3. **Focal Loss**:
   $$
   \text{Focal Loss} = -\sum_{i=1}^{n} (1 - \hat{p}_{i})^\gamma \log(\hat{p}_{i})
   $$
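   - Used to address class imbalance by focusing on hard-to-classify examples. Here $\hat{p}_i$ is the predicted probability of the true class, and $\gamma \geq 0$ controls how strongly well-classified examples are down-weighted; with $\gamma = 0$ it reduces to the summed cross entropy.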

4. **Cosine Similarity Loss**:
   $$
   \text{Cosine Similarity Loss} = 1 - \cos(\theta) = 1 - \frac{A \cdot B}{\|A\| \|B\|}
   $$
   - Measures the cosine of the angle between two vectors, used in tasks like text similarity.
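
A minimal NumPy sketch of three of the classification losses above (hinge, KL divergence, and focal loss) follows. The function names and example values are hypothetical. The sketch assumes hinge labels are in {-1, +1}, that `q` is positive wherever `p` is positive, and that focal loss receives the predicted probability of the true class.

```python
import numpy as np

def hinge_loss(y_pm1, scores):
    """Mean hinge loss; y_pm1 holds labels in {-1, +1}, scores are raw decision values."""
    y = np.asarray(y_pm1, dtype=float)
    s = np.asarray(scores, dtype=float)
    return float(np.mean(np.maximum(0.0, 1.0 - y * s)))

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions; terms with P(i) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def focal_loss(p_true_class, gamma=2.0):
    """Summed focal loss given the predicted probability of each true class."""
    p = np.asarray(p_true_class, dtype=float)
    return float(-np.sum((1 - p) ** gamma * np.log(p)))

# Hinge: correct with margin (loss 0), correct but inside margin (0.5), wrong (2.0)
hinge_value = hinge_loss([1, -1, 1], [2.0, -0.5, -1.0])   # (0 + 0.5 + 2) / 3
print(hinge_value)

# KL divergence is zero for identical distributions and positive otherwise
kl_same = kl_divergence([0.5, 0.5], [0.5, 0.5])   # 0.0
kl_diff = kl_divergence([0.5, 0.5], [0.9, 0.1])   # about 0.511
print(kl_same, kl_diff)

# Focal loss down-weights the easy example (p = 0.9) relative to plain log loss
p_true = [0.9, 0.3]
focal_value = focal_loss(p_true, gamma=2.0)
plain_value = focal_loss(p_true, gamma=0.0)   # gamma = 0 gives -sum(log p)
print(focal_value, plain_value)
```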

### For Sequence Models and Others:
1. **Connectionist Temporal Classification (CTC) Loss**:
   $$
   \text{CTC Loss} = -\log P(y|x)
   $$
   - Used in sequence modeling tasks where the alignment between input and output sequences is unknown (e.g., speech recognition).

2. **Earth Mover's Distance (Wasserstein Loss)**:
   $$
   W(p, q) = \inf_{\gamma \in \Pi(p, q)} \mathbb{E}_{(x,y) \sim \gamma} [ \|x - y\| ]
   $$
   - Measures the minimum cost of transforming one distribution into another, useful in generative models.
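
For intuition about the Earth Mover's Distance, the sketch below handles only a simple special case: two one-dimensional samples of equal size with equal weights. In that case the optimal transport plan pairs the sorted values, so the distance is the mean absolute difference between them. The function name `wasserstein_1d` and the sample values are assumptions for illustration; the general definition above covers far more than this case.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equally sized, equally weighted 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of the same size")
    # In 1-D, matching the i-th smallest of x with the i-th smallest of y is optimal
    return float(np.mean(np.abs(x - y)))

# Shifting every point by 5 moves the whole distribution a distance of 5
shifted = wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0])     # 5.0
# The order of the samples does not matter, only the distribution
same_dist = wasserstein_1d([0.0, 1.0, 3.0], [3.0, 0.0, 1.0])   # 0.0
print(shifted, same_dist)
```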

### For Ranking Problems:
1. **Pairwise Ranking Loss**:
   $$
   L = \sum_{i, j} \max(0, 1 - (s_i - s_j))
   $$
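   - Used in ranking tasks where the goal is to order items correctly. The sum runs over pairs $(i, j)$ in which item $i$ should rank above item $j$, and $s_i$, $s_j$ are the model's scores.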

2. **Listwise Loss (e.g., ListNet)**:
   $$
   L = - \sum_{i} P(y_i) \log(Q(y_i))
   $$
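   - Used in learning-to-rank tasks, considering the entire list of items. In ListNet, $P$ and $Q$ are the top-one probability distributions obtained by applying a softmax to the true and predicted scores, respectively.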
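
Both ranking losses can be sketched with NumPy under some stated assumptions. For the pairwise loss, the sum is taken over pairs where item `i` has higher relevance than item `j`. For the listwise loss, the sketch follows the ListNet top-one idea of turning the true and predicted scores into distributions with a softmax. All function names and example scores are hypothetical.

```python
import numpy as np

def pairwise_ranking_loss(scores, relevance):
    """Sum of max(0, 1 - (s_i - s_j)) over pairs where i should rank above j."""
    total = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if relevance[i] > relevance[j]:
                total += max(0.0, 1.0 - (scores[i] - scores[j]))
    return total

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def listwise_loss(true_scores, pred_scores):
    """Cross entropy between softmax(true_scores) and softmax(pred_scores)."""
    p = softmax(true_scores)
    q = softmax(pred_scores)
    return float(-np.sum(p * np.log(q)))

relevance = [2, 1, 0]   # item 0 is most relevant, item 2 least

# Scores that respect the order with enough margin incur no pairwise loss
good_pairwise = pairwise_ranking_loss([3.0, 1.0, 0.0], relevance)   # 0.0
# Reversed scores are penalized on every pair: 2 + 4 + 3 = 9.0
bad_pairwise = pairwise_ranking_loss([0.0, 1.0, 3.0], relevance)
print(good_pairwise, bad_pairwise)

# The listwise loss is lower when predicted scores agree with the true scores
good_listwise = listwise_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
bad_listwise = listwise_loss([2.0, 1.0, 0.0], [0.0, 1.0, 2.0])
print(good_listwise, bad_listwise)
```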

### Regularization Terms:
While not objective functions by themselves, regularization terms are often added to objective functions to prevent overfitting:

1. **L1 Regularization (Lasso)**:
   $$
   \text{L1} = \lambda \sum_{j=1}^{p} |w_j|
   $$
   - Encourages sparsity in the model parameters.

2. **L2 Regularization (Ridge)**:
   $$
   \text{L2} = \lambda \sum_{j=1}^{p} w_j^2
   $$
   - Encourages smaller model parameters, leading to simpler models.
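
As a brief sketch of how these penalties combine with an objective, the code below adds an L1 or L2 term to a mean squared error. The weights `w`, the strength `lam`, and the data are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 term: lam times the sum of absolute weights."""
    return float(lam * np.sum(np.abs(w)))

def l2_penalty(w, lam):
    """L2 term: lam times the sum of squared weights."""
    return float(lam * np.sum(np.square(w)))

# Hypothetical model weights and regularization strength
w = np.array([0.5, -2.0, 0.0, 1.5])
lam = 0.1

l1 = l1_penalty(w, lam)   # 0.1 * (0.5 + 2.0 + 0.0 + 1.5) = 0.4
l2 = l2_penalty(w, lam)   # 0.1 * (0.25 + 4.0 + 0.0 + 2.25) = 0.65

# Hypothetical data-fit term: MSE of three predictions
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.5])
mse = float(np.mean((y_true - y_pred) ** 2))   # (0.25 + 0 + 0.25) / 3

# The regularized objective is the data-fit term plus the penalty
ridge_objective = mse + l2
lasso_objective = mse + l1
print(ridge_objective, lasso_objective)
```

The optimizer minimizes the sum, so larger `lam` values trade some fit on the training data for smaller weights.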

### Conclusion:
The choice of objective function depends on the specific characteristics of the problem being solved, the type of data, and the desired properties of the solution. The objective functions mentioned above are just some of the many options available, each suited to different types of tasks and model requirements.