# Training

## Training diagram

````{div} full-width
```{mermaid}
sequenceDiagram
    autonumber
    participant Agent
    participant RL Method
        Note left of RL Method: SVR, Actor-Critic...
    participant Environment

    loop Episode
        Agent-->>+RL Method: Start training (Data, Initial State)
        loop Step
            RL Method-->>+Environment: Select an action following its exploration strategy
            Environment-->>-RL Method: Return next state, action, reward and done flag
            RL Method->>RL Method: Store transition to memory
        end
        RL Method->>RL Method: Update model
        RL Method-->>-Agent: Return episode reward
    end
```
````
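
The loop in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ostatslib's implementation: `Transition`, `ToyEnvironment`, `ToyMethod`, and the epsilon-greedy rule are hypothetical stand-ins chosen only to make the episode/step structure concrete.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One stored step: (state, action, reward, next state, done flag)."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class ToyEnvironment:
    """Hypothetical environment: walk from state 0 up to state `goal`."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, int, float, bool]:
        # Action 1 moves one state forward, action 0 stays in place.
        self.state += action
        done = self.state >= self.goal
        reward = 1.0 if done else -0.1
        return self.state, action, reward, done


class ToyMethod:
    """Hypothetical RL method with an epsilon-greedy exploration strategy."""

    def __init__(self, epsilon: float = 0.2, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.memory: List[Transition] = []
        self.updates = 0

    def select_action(self, state: int) -> int:
        # Explore with probability epsilon, otherwise move forward.
        if self.rng.random() < self.epsilon:
            return self.rng.choice([0, 1])
        return 1

    def update_model(self) -> None:
        # Placeholder: a real method would refit its model on self.memory.
        self.updates += 1

    def run_episode(self, env: ToyEnvironment, max_steps: int = 50) -> float:
        state = env.reset()
        episode_reward = 0.0
        for _ in range(max_steps):  # "loop Step" in the diagram
            action = self.select_action(state)
            next_state, action, reward, done = env.step(action)
            # Store the transition to memory.
            self.memory.append(Transition(state, action, reward, next_state, done))
            episode_reward += reward
            state = next_state
            if done:
                break
        self.update_model()  # one model update per episode
        return episode_reward  # returned to the agent


# "loop Episode" in the diagram: the agent starts one episode at a time.
method = ToyMethod()
env = ToyEnvironment()
rewards = [method.run_episode(env) for _ in range(5)]
```

As in the diagram, transitions are stored at every step, while the model is updated once at the end of each episode.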

## Example

Training an Agent powered by an SVR model on 600 datasets split among binary classification, linear regression, and Poisson regression problems.

In [1]:
from docs.workflows.utils.generate_training_datasets import generate_training_datasets
datasets = generate_training_datasets(600)

Agent training:

In [2]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import warnings
warnings.filterwarnings('ignore')

from ostatslib.agents import Agent
from ostatslib.reinforcement_learning_methods import SupportVectorRegression

agent = Agent(rl_method=SupportVectorRegression())
for _, dataset in datasets:  # the dataset type is not needed for training
    agent.train(dataset)

Checking the Agent's analysis.

Getting the first dataset of each trained dataset type:

In [3]:
from datacooker.recipes import LogitRecipe, PoissonRecipe, Recipe

logistic_regression_dataset = [dataset for dataset_type, dataset in datasets if dataset_type == LogitRecipe][0]
linear_regression_dataset = [dataset for dataset_type, dataset in datasets if dataset_type == Recipe][0]
poisson_regression_dataset = [dataset for dataset_type, dataset in datasets if dataset_type == PoissonRecipe][0]
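
The same lookup can be written with `next`, which stops at the first match and allows a default when a type is absent. This is a generic Python sketch: `LogitKind`, `PoissonKind`, `first_of_type`, and the string "datasets" are hypothetical stand-ins for the recipe classes and generated datasets used in the cell above.

```python
class LogitKind:
    """Hypothetical stand-in for a recipe class used as a type tag."""


class PoissonKind:
    """Hypothetical stand-in for a recipe class used as a type tag."""


def first_of_type(pairs, wanted_type):
    """Return the first dataset tagged with `wanted_type`, or None if absent."""
    return next(
        (dataset for dataset_type, dataset in pairs if dataset_type is wanted_type),
        None,
    )


# Stand-in (type, dataset) pairs shaped like the `datasets` list.
pairs = [(LogitKind, "logit-0"), (PoissonKind, "poisson-0"), (LogitKind, "logit-1")]
first_logit = first_of_type(pairs, LogitKind)
missing = first_of_type(pairs, int)
```

Unlike indexing a list comprehension with `[0]`, this does not raise `IndexError` when no dataset of the requested type exists.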

- Binary classification:

In [4]:
analysis = agent.analyze(logistic_regression_dataset)

for step in analysis:
    print(f'Action: {step.result}, reward: {step.reward}, next state features: {step.state.features_vector}')

Action: is_response_quantitative, reward: 0.75, next state features: [1 0 0 0 1 0 0]
Action: get_log_rows_count, reward: 0.75, next state features: [1.         0.         0.15323742 0.         1.         0.
 0.        ]
Action: is_response_positive_values_only_check, reward: 0.75, next state features: [1.         0.         0.15323742 0.         1.         0.
 1.        ]
Action: is_response_discrete_check, reward: 0.75, next state features: [1.         0.         0.15323742 0.         1.         1.
 1.        ]
Action: is_response_dichotomous_check, reward: 0.75, next state features: [1.         0.         0.15323742 1.         1.         1.
 1.        ]
Action: LogisticRegressionCV(cv=5), reward: 0.7231884057971014, next state features: [1.         0.62318841 0.15323742 1.         1.         1.
 1.        ]


- Linear regression:

In [5]:
analysis = agent.analyze(linear_regression_dataset)

for step in analysis:
    print(f'Action: {step.result}, reward: {step.reward}, next state features: {step.state.features_vector}')

Action: is_response_quantitative, reward: 0.75, next state features: [1 0 0 0 1 0 0]
Action: get_log_rows_count, reward: 0.75, next state features: [1.         0.         0.20431554 0.         1.         0.
 0.        ]
Action: is_response_positive_values_only_check, reward: 0.75, next state features: [ 1.          0.          0.20431554  0.          1.          0.
 -1.        ]
Action: SVR(), reward: 0.6955879307065417, next state features: [ 1.          0.79558793  0.20431554  0.          1.          0.
 -1.        ]


- Poisson regression:

In [6]:
analysis = agent.analyze(poisson_regression_dataset)

for step in analysis:
    print(f'Action: {step.result}, reward: {step.reward}, next state features: {step.state.features_vector}')

Action: is_response_quantitative, reward: 0.75, next state features: [1 0 0 0 1 0 0]
Action: get_log_rows_count, reward: 0.75, next state features: [1.         0.         0.23885728 0.         1.         0.
 0.        ]
Action: is_response_positive_values_only_check, reward: 0.75, next state features: [1.         0.         0.23885728 0.         1.         0.
 1.        ]
Action: is_response_discrete_check, reward: 0.75, next state features: [1.         0.         0.23885728 0.         1.         1.
 1.        ]
Action: is_response_dichotomous_check, reward: 0.75, next state features: [ 1.          0.          0.23885728 -1.          1.          1.
  1.        ]
Action: <statsmodels.genmod.generalized_linear_model.GLMResultsWrapper object at 0x7fa6103a1240>, reward: 0.8979244159479015, next state features: [ 1.          0.79792442  0.23885728 -1.          1.          1.
  1.        ]
