In [1]:
import numpy as np

# Evaluation Metrics

#### Summary:

* Precision: Focuses on the proportion of relevant instances (true positives) among the retrieved instances.
* Recall: Measures the proportion of actual positives correctly identified.
* F1 Score: A balance between precision and recall.
* Confusion Matrix: Provides a comprehensive view of the model’s performance.
* ROC Curve and AUC: Useful for evaluating the trade-off between recall and false positives at different thresholds, with AUC providing a summary measure.
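
As an illustration of the confusion matrix listed above, here is a minimal sketch for binary labels in {0, 1}. The function name, the `[[TN, FP], [FN, TP]]` layout (rows are actual classes, columns are predicted classes), and the sample arrays are assumptions chosen for this example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """Return a 2x2 array [[TN, FP], [FN, TP]] for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Count each of the four outcome types with boolean masks
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))

    return np.array([[tn, fp], [fn, tp]])

# Made-up labels, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_pred)
print(cm)
```

Precision, recall, and the false positive rate can all be read off these four counts.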


#### Precision

Precision measures the proportion of **true positive predictions** out of all positive predictions made.

$$ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

It is important to look at precision when the cost of false positives is high (e.g., spam detection).


In [2]:
def precision_score(y_true, y_pred):

    true_positive = np.sum((y_true==1) & (y_pred==1))
    false_positive = np.sum((y_true==0) & (y_pred==1) )

    return true_positive/(true_positive + false_positive)

#### Recall

Recall measures the proportion of **true positive predictions** out of all 
actual positive cases. 

$$ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

It is important to look at recall when it is critical to capture as many positives as possible, such as in disease detection, where missing an actual positive case is dangerous.



In [3]:
def recall_score(y_true, y_pred):

    true_positive = np.sum( (y_true == 1) & (y_pred == 1) )
    false_negative = np.sum( (y_true == 1) & (y_pred == 0) )

    return true_positive/(true_positive + false_negative)

#### F1 Score
The F1 Score is the **harmonic mean** of precision and recall. 

$$ \text{F1 Score} = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

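This metric is useful when we need to balance the trade-off between precision and recall. Because the harmonic mean is pulled toward the smaller of the two values, the F1 score is high only when both precision and recall are high.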
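
A minimal sketch of the F1 formula, written in the same style as the precision and recall functions. Returning 0.0 when a denominator is zero is an assumption of this sketch, not something the text specifies, and the sample arrays are made up.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    true_positive = np.sum((y_true == 1) & (y_pred == 1))
    false_positive = np.sum((y_true == 0) & (y_pred == 1))
    false_negative = np.sum((y_true == 1) & (y_pred == 0))

    # Assumption: treat an undefined ratio (zero denominator) as 0.0
    predicted_positive = true_positive + false_positive
    actual_positive = true_positive + false_negative
    precision = true_positive / predicted_positive if predicted_positive > 0 else 0.0
    recall = true_positive / actual_positive if actual_positive > 0 else 0.0

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 2 true positives, 1 false positive, 2 false negatives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])

f1 = f1_score(y_true, y_pred)
print(f1)  # precision = 2/3, recall = 1/2, so F1 = 4/7
```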

#### ROC Curve and AUC

The ROC curve is a plot that shows the trade-off between the **true positive rate (recall)** and the **false positive rate (FPR)** as the classification threshold varies.

The AUC (Area Under the Curve) is the area under the ROC curve, a single number that summarizes the model's ability to distinguish between classes: the higher the AUC, the better the model. An AUC of 0.5 corresponds to random guessing and an AUC of 1.0 to perfect separation.
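
The ROC curve and AUC can be sketched with plain NumPy by sweeping a threshold over the predicted scores and integrating with the trapezoidal rule. The function names `roc_curve_points` and `auc` and the sample scores are assumptions for illustration; the sketch also assumes binary labels in {0, 1} with at least one example of each class.

```python
import numpy as np

def roc_curve_points(y_true, y_score):
    """Return (fpr, tpr) arrays, one point per threshold, from strict to lenient."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)

    # Distinct scores from high to low; start at +inf so the curve begins at (0, 0)
    thresholds = np.concatenate(([np.inf], np.unique(y_score)[::-1]))

    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)

    # A sample is predicted positive when its score is >= the threshold
    tpr = np.array([np.sum((y_score >= t) & (y_true == 1)) / n_pos for t in thresholds])
    fpr = np.array([np.sum((y_score >= t) & (y_true == 0)) / n_neg for t in thresholds])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve via the trapezoidal rule."""
    return np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

# Made-up labels and scores, for illustration only
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr = roc_curve_points(y_true, y_score)
roc_auc = auc(fpr, tpr)
print(roc_auc)
```

Each threshold yields one (FPR, TPR) point; lowering the threshold can only move the point up or to the right, which is why the curve runs from (0, 0) to (1, 1).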

# K-Fold Cross-Validation

In [4]:
import numpy as np

def k_fold(model, X, y, k = 5):
    """
    Perform a K-fold cross-validation using a given model. 

    -----------
    Parameters
    -----------
    model: object
        Machine learning model.
    X: numpy array
        Feature matrix.
    y: numpy array
        Target vector
    k: positive integer
        Number of folds
    
    -----------
    Returns
    -----------
    scores: list
        List of evaluation scores for each fold
    """

    # Shuffle data indices for random splitting
    indices = np.arange(len(X))
    np.random.shuffle(indices)

    # Define size of each fold
    fold_size = len(X) // k

    scores = []

    for i in range(k):
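        # Define train and test indices
        test_indices  = indices[i*fold_size : (i+1)*fold_size]
        # np.delete removes by position, so pass the fold's positions, not the shuffled index values
        train_indices = np.delete(indices, np.arange(i*fold_size, (i+1)*fold_size))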
        
        # Split the data
        X_train, y_train =  X[train_indices], y[train_indices]
        X_test, y_test   =  X[test_indices], y[test_indices]
       

        # Train the model and evaluate
        ## This assumes the model class defines both a fit and a score method
        model.fit(X_train, y_train)
        score = model.score(X_test, y_test)
        scores.append(score)

    return scores
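
Because `fold_size` is computed with floor division, the last `len(X) % k` shuffled samples never land in a test fold when the data size is not a multiple of `k`. The sketch below shows one possible variant that uses `np.array_split` so every sample is tested exactly once. The generator `k_fold_split`, the `seed` parameter, and the toy `MajorityClassifier` are hypothetical names introduced for illustration, and the variant assumes `k >= 2`.

```python
import numpy as np

class MajorityClassifier:
    """Hypothetical toy model: always predicts the most frequent training label."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[np.argmax(counts)]
        return self

    def score(self, X, y):
        # Accuracy of the constant prediction
        return float(np.mean(y == self.label_))

def k_fold_split(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)  # fold sizes differ by at most one
    for i in range(k):
        test_indices = folds[i]
        train_indices = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_indices, test_indices

# Made-up data: 23 samples, which is not divisible by k = 5
X = np.arange(23).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 8)

scores = []
tested = []
for train_idx, test_idx in k_fold_split(len(X), k=5):
    model = MajorityClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
    tested.append(test_idx)

print(scores)
print(np.mean(scores))
```

Seeding the generator makes the split reproducible, which helps when comparing models on the same folds.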

# Time-Series Rolling Windows