# Machine Learning

This tutorial explains how to train a machine learning model in Safe-DS and use it to make predictions.

## Create a `TaggedTable`

First, we need to create a `TaggedTable` from the training data. `TaggedTable`s are used to train supervised machine learning models, because they keep track of the target
column. A `TaggedTable` can be created from a `Table` by calling the `tag_columns` method:

```python
from safeds.data.tabular.containers import Table

training_set = Table({
    "a":      [3, 4,  8,  6, 5],
    "b":      [2, 2,  1,  6, 3],
    "c":      [1, 1,  1,  1, 1],
    "result": [6, 7, 10, 13, 9]
})

tagged_table = training_set.tag_columns(
    target_name="result"
)
```

## Create and train model

In this example, we want to predict the column `result`, which is the sum of `a`, `b`, and `c` in the training data. We will train a linear regression model on this data. In Safe-DS, machine learning models are represented as classes. First, call the constructor to configure the hyperparameters; this returns a model object. Then start training by calling the `fit` method on the model object and passing the training data:

```python
from safeds.ml.classical.regression import LinearRegression

model = LinearRegression()
fitted_model = model.fit(tagged_table)
```

## Predicting new values

The `fit` method returns the fitted model; the original model is **not** changed. Predictions are made by calling the `predict` method on the fitted model. The `predict` method takes a `Table` as input and returns a `Table` with the predictions:

```python
test_set = Table({
    "a": [1, 1, 0, 2, 4],
    "b": [2, 0, 5, 2, 7],
    "c": [1, 4, 3, 2, 1]
})

fitted_model.predict(dataset=test_set)
```


|   | a | b | c | result |
|---|---|---|---|--------|
| 0 | 1 | 2 | 1 | 4.0    |
| 1 | 1 | 0 | 4 | 2.0    |
| 2 | 0 | 5 | 3 | 6.0    |
| 3 | 2 | 2 | 2 | 5.0    |
| 4 | 4 | 7 | 1 | 12.0   |
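
The predicted values can be checked by hand. The plain-Python sketch below (no Safe-DS required; the variable names are illustrative) copies the test set and the predictions from the output above and compares them with two candidate rules. The predictions match `a + b + 1` rather than `a + b + c`. A plausible reading is that, because `c` is always 1 in the training data, a linear model cannot tell its contribution apart from a constant offset:

```python
# Test set and predicted values copied from the example above.
a = [1, 1, 0, 2, 4]
b = [2, 0, 5, 2, 7]
c = [1, 4, 3, 2, 1]
predicted = [4.0, 2.0, 6.0, 5.0, 12.0]

# Candidate rule 1: the relationship that holds in the training data.
sum_abc = [x + y + z for x, y, z in zip(a, b, c)]

# Candidate rule 2: "c" replaced by the constant 1 it had during training.
sum_ab_plus_1 = [x + y + 1 for x, y in zip(a, b)]

print(sum_abc)                     # [4, 5, 8, 6, 12]
print(sum_ab_plus_1)               # [4, 2, 6, 5, 12]
print(sum_ab_plus_1 == predicted)  # True
```

A training set in which `c` varies would be needed for the model to learn how `c` affects `result`.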


## Metrics

A machine learning metric, also known as an evaluation metric, is a measure of how well a machine learning model performs. Metrics are computed on a test set, or during cross-validation, to gain insight into performance and to compare different models or parameter settings.
In Safe-DS, the available metrics are: `Accuracy`, `Confusion Matrix`, `F1-Score`, `Precision`, and `Recall`. All of them evaluate classification models. Before going through each in detail, we need to understand the components of evaluation metrics.


## Components of evaluation metrics

Every metric below is computed from four counts, obtained by comparing each predicted label with the true label of the same tuple (instance) in a binary classification task:
* `True positives` (TP): the positive tuples that the classifier correctly labeled as positive.
* `False positives` (FP): the negative tuples that were falsely labeled as positive.
* `True negatives` (TN): the negative tuples that the classifier correctly labeled as negative.
* `False negatives` (FN): the positive tuples that were falsely labeled as negative.
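
As a minimal sketch of these definitions, the four counts can be tallied in plain Python. The helper `count_outcomes` and the example labels are hypothetical and not part of the Safe-DS API; the label `1` is assumed to mark the positive class:

```python
def count_outcomes(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for two equally long label sequences."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Made-up labels: 4 positive and 6 negative tuples.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(count_outcomes(actual, predicted))  # (3, 1, 5, 1)
```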


## Accuracy

Accuracy, also known as the `classification rate`, is the proportion of correctly classified instances out of the total number of instances. Formula: `Accuracy = (TP + TN) / (TP + FP + TN + FN)`.
* Accuracy is suitable when the classes are balanced. With a significant class imbalance, a model that always predicts the majority class can still reach a high accuracy.
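
The formula can be tried out with a small sketch. The function `accuracy` and the counts are made up for illustration and do not come from Safe-DS; the second call shows why accuracy can mislead on imbalanced classes:

```python
def accuracy(tp, fp, tn, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)


# Roughly balanced example: 8 of 10 tuples are classified correctly.
print(accuracy(tp=3, fp=1, tn=5, fn=1))  # 0.8

# Imbalanced example: 95 negatives, 5 positives, and a classifier that
# always predicts "negative" (TP = 0, FP = 0, TN = 95, FN = 5).
print(accuracy(tp=0, fp=0, tn=95, fn=5))  # 0.95
```

The second classifier never finds a single positive tuple, yet scores higher than the first.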

## Confusion Matrix

A confusion matrix is a table that summarizes the performance of a classification algorithm. It counts how many predictions are true positives, true negatives, false positives, and false negatives. It has no formula, because it is a table of counts rather than a single number.
* It is useful for evaluating the performance of a classification model and understanding the types of errors it makes.
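
One way to build such a table by hand is sketched below. The helper `confusion_matrix` is hypothetical, and the layout (rows are actual labels, columns are predicted labels, positive class first) is an assumed convention that may differ from what a given library prints:

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=(1, 0)):
    """Return rows of counts: rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(truth, guess)] for guess in labels] for truth in labels]


# Made-up labels: 4 positive and 6 negative tuples.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

matrix = confusion_matrix(actual, predicted)
print(matrix)  # [[3, 1], [1, 5]]
```

With this layout the matrix reads `[[TP, FN], [FP, TN]]`: correct predictions sit on the diagonal and the two kinds of error sit off it.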

## F1-Score

The F1-Score is the harmonic mean of precision (`P`) and recall (`R`). Formula: `F1-Score = 2PR / (P + R)`.
* The `F1-Score` is suitable when there is an imbalance between the classes, especially when the numbers of false positives and false negatives differ.
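
The following sketch computes the score from raw counts, using `P = TP / (TP + FP)` and `R = TP / (TP + FN)`. The function `f1_score` is illustrative rather than the Safe-DS implementation, and returning 0.0 when there are no true positives is an assumed convention to avoid dividing by zero:

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), with P = TP / (TP + FP) and R = TP / (TP + FN)."""
    if tp == 0:
        return 0.0  # assumed convention: no true positives means a score of 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Equal precision and recall (both 0.75) give the same F1-Score.
print(f1_score(tp=3, fp=1, fn=1))  # 0.75

# Precision 0.8 but recall 0.5: the harmonic mean sits closer to the lower value.
print(round(f1_score(tp=8, fp=2, fn=8), 3))  # 0.615
```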

## Precision

Precision is the ability of a classification model to identify only the relevant data points, that is, the fraction of tuples labeled as positive that are actually positive. Formula: `P = TP / (TP + FP)`.
* Precision is useful when the focus is on minimizing false positives, the negative tuples that were falsely labeled as positive.
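
A short sketch of the formula follows. The function name and the counts are made up for illustration, and returning 0.0 when nothing was labeled positive is an assumed convention:

```python
def precision(tp, fp):
    """P = TP / (TP + FP); 0.0 if nothing was labeled positive (assumed convention)."""
    labeled_positive = tp + fp
    return tp / labeled_positive if labeled_positive else 0.0


print(precision(tp=3, fp=1))  # 0.75
print(precision(tp=3, fp=0))  # 1.0, no false positives at all
```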

## Recall

Recall, also known as `sensitivity` or `true positive rate`, is the ability of a classification model to identify all the relevant data points, that is, the fraction of actually positive tuples that are labeled as positive. Formula: `R = TP / (TP + FN)`.
* Recall is useful when the focus is on minimizing false negatives, the positive tuples that were falsely labeled as negative.
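
The sketch below applies the formula to made-up counts. The function `recall` is illustrative, not the Safe-DS implementation, and returning 0.0 when there are no positive tuples is an assumed convention. The second example shows why recall is usually read together with precision:

```python
def recall(tp, fn):
    """R = TP / (TP + FN); 0.0 if there are no positive tuples (assumed convention)."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


print(recall(tp=3, fn=1))  # 0.75

# A classifier that labels all 10 tuples (4 positive, 6 negative) as positive
# misses nothing (TP = 4, FN = 0) but produces 6 false positives.
print(recall(tp=4, fn=0))  # 1.0
print(4 / (4 + 6))         # 0.4, the precision of that same classifier
```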
