# Chapter 0.2: Assessing Results

TO COME: 
- Evaluation metrics (accuracy, precision, recall, F1-score)
- Cross-validation and model selection
- Bias and fairness in evaluation

## Section 1: Environment Setup

Before we can evaluate the performance of a machine learning model, we need to set up our environment with the necessary libraries and tools. Here are some commonly used Python libraries for machine learning:

- Scikit-learn: A popular library for machine learning in Python, with support for a variety of models and evaluation metrics.
- TensorFlow: A powerful library for deep learning, with support for building and training complex neural networks.
- PyTorch: Another popular library for deep learning, with a user-friendly interface and support for dynamic computation graphs.

Scikit-learn is typically installed from the command line with `pip install scikit-learn` and is imported in Python under the name `sklearn`.
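
As a quick sanity check before going further, the following minimal sketch reports which of the libraries listed above can be imported in the current environment. The helper name `is_installed` is introduced here for illustration only; the module names `sklearn`, `tensorflow`, and `torch` are the import names of the three libraries.

```python
import importlib.util

# Scikit-learn is usually installed from a terminal with:
#   pip install scikit-learn


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Import names differ from the project names: scikit-learn -> sklearn, PyTorch -> torch.
for name in ["sklearn", "tensorflow", "torch"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Only Scikit-learn is needed for the examples in this chapter, so a "missing" result for the other two is not a problem here.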

## Section 2: Evaluation Metrics

In machine learning, evaluation metrics are used to assess the performance of a model on a given dataset. Evaluation metrics can be broadly classified into two categories: classification metrics and regression metrics.

### 2.1 Classification Metrics

Classification is a supervised learning task in which the goal is to assign input data points to one of several discrete categories. Examples include spam detection, image classification, and sentiment analysis.

The most commonly used classification metrics are accuracy, precision, recall, and F1-score. Let's define these metrics mathematically:

**Accuracy** measures the proportion of correct predictions among all predictions. It is defined as:
$$Accuracy = \frac{TP+TN}{TP+TN+FP+FN}$$ 
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.

**Precision** measures the proportion of true positives among all positive predictions. It is defined as:
$$Precision = \frac{TP}{TP+FP}$$

**Recall** measures the proportion of true positives among all actual positives. It is defined as:
$$Recall = \frac{TP}{TP+FN}$$

**F1-score** is the harmonic mean of precision and recall, which balances the trade-off between them. It is defined as:
$$F1\text{-}score = 2\times\frac{Precision\times Recall}{Precision+Recall}$$
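
To make the four definitions concrete, the sketch below counts TP, TN, FP, and FN by hand and plugs them into the formulas. The labels are small illustrative values, not real data, and the code assumes a binary problem where `1` is the positive class. It does not guard against division by zero, which would occur if the model made no positive predictions or the data contained no positives.

```python
# Illustrative labels (1 = positive class, 0 = negative class); not real data.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

pairs = list(zip(y_true, y_pred))

# Count each cell of the confusion matrix directly from the definitions.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
```

With these labels there are two true positives, two true negatives, one false positive, and one false negative, so accuracy is 4/6 and precision, recall, and F1-score are all 2/3.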

We can use Python libraries such as Scikit-learn to compute these metrics for our classification models. The following example computes accuracy, precision, recall, and F1-score for a binary classification model:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Ground-truth labels and model predictions for six samples (1 = positive class)
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
```

In addition to these metrics, we may also need the confusion matrix, the ROC curve, or the area under the ROC curve (AUC). Scikit-learn provides `confusion_matrix`, `roc_curve`, and `roc_auc_score` in `sklearn.metrics` for these.
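
To show what these quantities measure, here is a from-scratch sketch in plain Python rather than a Scikit-learn call. The scores in `y_score` and the `0.5` threshold are hypothetical values chosen for illustration. The confusion matrix is laid out with true labels as rows and predicted labels as columns, and the AUC is computed through its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half.

```python
# Illustrative labels and hypothetical model scores; not real data.
y_true = [0, 1, 1, 0, 1, 0]
y_score = [0.1, 0.9, 0.4, 0.3, 0.8, 0.6]

# Turn scores into hard predictions with an assumed threshold of 0.5.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in y_score]

# Confusion matrix: matrix[true_label][predicted_label],
# so the layout is [[TN, FP], [FN, TP]].
matrix = [[0, 0], [0, 0]]
for t, p in zip(y_true, y_pred):
    matrix[t][p] += 1

# AUC as the probability that a random positive outscores a random negative.
pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

print("Confusion matrix:", matrix)
print("AUC:", auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration but not how a library would compute the AUC on large datasets.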

### 2.2 Regression Metrics

Regression is a supervised learning task in which the goal is to predict a continuous output variable given one or more input variables. Examples include house price prediction, stock price prediction, and weather forecasting.

The most commonly used regression metrics are mean squared error, mean absolute error, and R-squared. Let's define these metrics mathematically:

- Mean squared error (MSE) measures the average squared difference between the predicted and actual values. It is defined as:
$$MSE = \frac{1}{n}\sum_{i=1}^n(y_i - \hat{y}_i)^2$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of data points.

- Mean absolute error (MAE) measures the average absolute difference between the predicted and actual values. It is defined as:
$$MAE= \frac{1}{n}\sum_{i=1}^n|y_i - \hat{y}_i|$$

- R-squared (R²) measures the proportion of variance in the output variable that is explained by the model's predictions. It is defined as:
$$R^2 = 1 - \frac{\sum_{i=1}^n(y_i - \hat{y}_i)^2}{\sum_{i=1}^n(y_i - \bar{y})^2}$$
where $\bar{y}$ is the mean of the output variable.

We can use Python libraries such as Scikit-learn to compute these metrics for our regression models. The following example computes the mean squared error for a regression model:

```python
from sklearn.metrics import mean_squared_error

# Actual and predicted values for five samples
y_true = [3, 5, 2, 7, 8]
y_pred = [2, 4, 3, 8, 6]

print("Mean squared error:", mean_squared_error(y_true, y_pred))
```

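In addition to these metrics, we may also need other regression metrics such as the mean absolute percentage error (MAPE) or the root mean squared logarithmic error (RMSLE). Scikit-learn provides `mean_absolute_percentage_error` and `mean_squared_log_error` in `sklearn.metrics`; taking the square root of the latter gives the RMSLE.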
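
As a rough illustration of these two metrics, the sketch below computes them in plain Python. It assumes the usual definitions: MAPE as the mean of $|y_i - \hat{y}_i| / |y_i|$, reported as a fraction rather than a percentage, and RMSLE as the square root of the mean squared difference between $\log(1 + y_i)$ and $\log(1 + \hat{y}_i)$. The values are illustrative, and the code assumes all actual values are nonzero and all values are non-negative.

```python
import math

# Illustrative actual and predicted values; not real data.
y_true = [3, 5, 2, 7, 8]
y_pred = [2, 4, 3, 8, 6]

n = len(y_true)

# Mean absolute percentage error, expressed as a fraction (0.25 means 25%).
mape = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n

# Root mean squared logarithmic error, using log(1 + x) for each value.
msle = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n
rmsle = math.sqrt(msle)

print("MAPE:", mape)
print("RMSLE:", rmsle)
```

Because both metrics measure relative rather than absolute error, an error of 1 on an actual value of 2 weighs more heavily than an error of 2 on an actual value of 8.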