# Training

## Training diagram

````{div} full-width
```{mermaid}
sequenceDiagram
    autonumber
    participant Agent
    participant RL Method
        Note left of RL Method: SVR, Actor-Critic...
    participant Environment

    loop Episode
        Agent-->>+RL Method: Start training (Data, Initial State)
        loop Step
            RL Method-->>+Environment: Select an action following its exploration strategy
            Environment-->>-RL Method: Return next state, action, reward and done flag
            RL Method->>RL Method: Store transition to memory
        end
        RL Method->>RL Method: Update model
        RL Method-->>-Agent: Return episode reward
    end
```
````
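
The loop in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ostatslib's implementation: `ToyEnvironment`, `ToyRLMethod`, the epsilon-greedy exploration strategy and the averaging update are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyEnvironment:
    """Hypothetical environment: an episode ends once the state reaches `goal`."""
    goal: int = 3
    state: int = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        # Action 1 moves towards the goal and is rewarded; action 0 is penalized.
        self.state += action
        reward = 1.0 if action == 1 else -1.0
        done = self.state >= self.goal
        return self.state, action, reward, done


@dataclass
class ToyRLMethod:
    """Hypothetical RL method with epsilon-greedy exploration."""
    epsilon: float = 0.2
    memory: list = field(default_factory=list)
    action_values: dict = field(default_factory=lambda: {0: 0.0, 1: 0.0})
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def select_action(self, state: int) -> int:
        # Explore with probability epsilon, otherwise pick the best-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice([0, 1])
        return max(self.action_values, key=self.action_values.get)

    def update_model(self) -> None:
        # Toy update: each action's value is its average reward seen so far.
        for action in self.action_values:
            rewards = [r for (_, a, r, _, _) in self.memory if a == action]
            if rewards:
                self.action_values[action] = sum(rewards) / len(rewards)

    def run_episode(self, env: ToyEnvironment, max_steps: int = 50) -> float:
        state = env.reset()
        episode_reward = 0.0
        for _ in range(max_steps):  # "Step" loop in the diagram
            action = self.select_action(state)
            next_state, action, reward, done = env.step(action)
            # Store the transition to memory.
            self.memory.append((state, action, reward, next_state, done))
            episode_reward += reward
            state = next_state
            if done:
                break
        self.update_model()  # Update the model once per episode.
        return episode_reward


method = ToyRLMethod()
env = ToyEnvironment()
# "Episode" loop in the diagram: each call returns the episode reward.
episode_rewards = [method.run_episode(env) for _ in range(5)]
```

As in the diagram, the model is updated once per episode, after the step loop has filled the memory with transitions.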

## Example

This example trains an Agent powered by an SVR model on 400 datasets, split between binary classification problems (even indexes) and regression problems (odd indexes).

```python
import os
# Silence TensorFlow's C++ log output.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import warnings
warnings.filterwarnings('ignore')

from docs.workflows.utils.generate_training_datasets import generate_training_datasets
from ostatslib.agents import Agent
from ostatslib.reinforcement_learning_methods import SupportVectorRegression

# Generate the 400 training datasets and train the Agent on each one in turn.
datasets = generate_training_datasets(400)
agent = Agent(rl_method=SupportVectorRegression())
for dataset in datasets:
    agent.train(dataset)
```
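
Because the two problem types alternate by index, list slicing is one way to separate them. The sketch below uses placeholder strings in place of real datasets (hypothetical data, for illustration only):

```python
# Placeholder labels stand in for the generated datasets (hypothetical data).
datasets = [f"dataset_{i}" for i in range(6)]

classification_datasets = datasets[0::2]  # even indexes
regression_datasets = datasets[1::2]      # odd indexes
```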

Checking the Agent's analysis on one dataset of each type:

- Binary classification:

```python
analysis = agent.analyze(datasets[0])

for step in analysis:
    print(f'Action: {step.result}, reward: {step.reward}, next state features: {step.state.features_vector}')
```

```text
Action: get_response_variable_type, reward: 1, next state features: [0 0 1 1]
Action: LogisticRegressionCV(cv=5), reward: 1, next state features: [0.95290424 0.         1.         1.        ]
```

- Regression:

```python
analysis = agent.analyze(datasets[1])

for step in analysis:
    print(f'Action: {step.result}, reward: {step.reward}, next state features: {step.state.features_vector}')
```

```text
Action: get_response_variable_type, reward: 1, next state features: [ 0  0 -1  1]
Action: <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7efe24750100>, reward: 1, next state features: [ 0.98989314  0.         -1.          1.        ]
```
