<a href="https://colab.research.google.com/github/AI4Finance-LLC/FinRL-Library/blob/master/FinRL_ensemble_stock_trading_ICAIF_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading Using Ensemble Strategy

Tutorial on using DRL with OpenAI Gym environments to trade multiple stocks with an ensemble strategy, in one Jupyter Notebook | Presented at ICAIF 2020

* This notebook is a reimplementation, using FinRL, of our paper: Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy.
* Check out our Medium blog post for detailed explanations: https://medium.com/@ai4finance/deep-reinforcement-learning-for-automated-stock-trading-f1dad0126a02
* Please report any issues on our GitHub: https://github.com/AI4Finance-LLC/FinRL-Library/issues
* **PyTorch Version** 



# Content

* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Environment](#4)
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)
    * [5.3. Initialize Environment](#4.3)
* [6. Implement DRL Algorithms](#5)
* [7. Backtesting Performance](#6)
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)
    * [7.3. Baseline Stats](#6.3)
    * [7.4. Compare to Stock Market Index](#6.4)

<a id='0'></a>
# Part 1. Problem Definition

This problem is to design an automated trading solution for multiple stock trading. We model the stock trading process as a Markov Decision Process (MDP) and formulate our trading goal as a maximization problem.

The agent is trained using Deep Reinforcement Learning (DRL) algorithms, and the components of the reinforcement learning environment are:


* Action: The action space describes the allowed actions through which the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one share. An action can also be carried out on multiple shares. We use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively.

* Reward function: r(s, a, s′) is the incentive mechanism for the agent to learn a better action. It is the change in portfolio value when action a is taken at state s and the environment moves to the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v are the portfolio values at states s′ and s, respectively.

* State: The state space describes the observations that the agent receives from the environment. Just as a human trader needs to analyze various information before executing a trade, our trading agent observes many different features to learn better in an interactive environment.

* Environment: Dow 30 constituents
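To make the reward definition concrete, here is a minimal sketch that computes portfolio value as cash plus the market value of the share holdings, and takes the reward as the one-step change v′ − v. The helper names `portfolio_value` and `reward` are hypothetical, and transaction costs are ignored for simplicity.

```python
import numpy as np


def portfolio_value(cash: float, prices: np.ndarray, holdings: np.ndarray) -> float:
    """Cash plus the market value of all share holdings."""
    return float(cash + np.dot(prices, holdings))


def reward(v_before: float, v_after: float) -> float:
    """r(s, a, s') = v' - v: the change in portfolio value over one step."""
    return v_after - v_before


# Two stocks; the agent holds 20 shares of the second one.
cash, holdings = 10_000.0, np.array([0, 20])
prices = np.array([100.0, 50.0])
v = portfolio_value(cash, prices, holdings)            # 10,000 + 20 * 50 = 11,000

# Action: buy 10 shares of the first stock at 100.0 (no fees in this sketch).
cash -= 10 * prices[0]
holdings = holdings + np.array([10, 0])

# Next state: prices move to [105, 48].
new_prices = np.array([105.0, 48.0])
v_next = portfolio_value(cash, new_prices, holdings)   # 9,000 + 1,050 + 960 = 11,010
print(reward(v, v_next))                               # 10.0
```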


The data for the stocks used in this case study is obtained from the Yahoo Finance API. It contains Open-High-Low-Close prices and volume.


<a id='1'></a>
# Part 2. Getting Started - Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages through the FinRL library


In [1]:
# ## install finrl library
# !pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git


<a id='1.2'></a>
## 2.2. Check whether the additional packages are present; if not, install them.
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI Gym
* stable-baselines3
* PyTorch
* pyfolio
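A quick way to check which of these packages are already importable is sketched below. The import names are assumptions: the Yahoo Finance API is assumed to come from `yfinance`, and the PyTorch stack is assumed to be `stable_baselines3` plus `torch` (judging from the training log later in this notebook). Note that pip package names can differ from import names (for example, `stable-baselines3` is imported as `stable_baselines3`).

```python
import importlib.util

# Assumed import names for the packages listed above; adjust for your setup.
REQUIRED = ["yfinance", "pandas", "numpy", "matplotlib", "stockstats",
            "gym", "stable_baselines3", "torch", "pyfolio"]


def missing_packages(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing modules (install them with pip):", " ".join(missing))
else:
    print("All required modules are available.")
```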

<a id='1.3'></a>
## 2.3. Import Packages

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
import sys
# Add the local FinRL-Library checkout to the import path before importing finrl
sys.path.append("../FinRL-Library")

import datetime
import itertools
from pprint import pprint

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# matplotlib.use('Agg')

%matplotlib inline
from finrl.config import config
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.env_stocktrading import StockTradingEnv
from finrl.model.models import DRLAgent, DRLEnsembleAgent
from finrl.trade.backtest import get_baseline, backtest_stats, backtest_plot

<a id='1.4'></a>
## 2.4. Create Folders

In [3]:
import os

# Create the output folders defined in config.py if they do not exist yet
for folder in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
               config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + folder, exist_ok=True)

<a id='2'></a>
# Part 3. Download Data
Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.
* FinRL uses a class **YahooDownloader** to fetch data from Yahoo Finance API
* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).
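At 2,000 requests per hour, a downloader can issue one request every 3600 / 2000 = 1.8 seconds on average (and 2,000 × 24 = 48,000 matches the daily cap). The hypothetical `RequestThrottle` below is a minimal sketch of spacing out calls accordingly; it is not part of FinRL, and `YahooDownloader` is not assumed to use it.

```python
import time


class RequestThrottle:
    """Hypothetical helper: space out calls so at most `max_per_hour` happen per hour."""

    def __init__(self, max_per_hour: int = 2_000):
        # 2,000 requests/hour -> at least 1.8 s between consecutive requests
        self.min_interval = 3600.0 / max_per_hour
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval, then record the call."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

For example, call `throttle.wait()` once before each per-ticker download inside a loop over the ticker list.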




-----
class YahooDownloader:
    Provides methods for retrieving daily stock data from
    Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()
        Fetches data from the Yahoo Finance API


In [3]:
# START_DATE in config.py is a string
config.START_DATE

'2004-08-11'

In [4]:
print(config.DOW_30_TICKER)

['AAPL', 'MSFT', 'JPM', 'RTX', 'PG', 'GS', 'NKE', 'DIS', 'AXP', 'HD', 'INTC', 'WMT', 'IBM', 'MRK', 'UNH', 'KO', 'CAT', 'TRV', 'JNJ', 'CVX', 'MCD', 'VZ', 'CSCO', 'XOM', 'BA', 'MMM', 'PFE', 'WBA', 'DD']


In [5]:
%load_ext autoreload
%autoreload 2

In [6]:
# from finrl.config.config import MISSING3
# df = YahooDownloader(start_date = '2006-01-01',
#                      end_date = '2021-06-11',
#                      ticker_list = config.DOW_30_TICKER).fetch_data()

In [7]:
# df = pd.concat([df, df2, df3, df4, df5])  # DataFrame.append was removed in pandas 2.0

In [17]:
import pickle
# df.drop('Unnamed: 0')
# with open('dji_df_2004-2021.pkl', 'rb') as f:
#     df = pickle.load(f)

# Local path to the saved Dow 30 price data; adjust it to your environment
df = pd.read_csv('/home/roman/Work/trading-bot/notebooks/dji_prices_2020_04_09.csv')
df

,Unnamed: 0,date,open,high,low,close,volume,tic,day
0,0,2006-01-03,2.585000,2.669643,2.580357,2.295634,807234400,AAPL,1
1,1,2006-01-03,51.700001,52.580002,51.049999,41.155041,7825700,AXP,1
2,2,2006-01-03,70.400002,70.599998,69.330002,50.119705,4943000,BA,1
3,3,2006-01-03,57.869999,58.110001,57.049999,38.086823,3697500,CAT,1
4,4,2006-01-03,17.209999,17.490000,17.180000,12.956775,55426000,CSCO,1
...,...,...,...,...,...,...,...,...,...
116020,116020,2021-06-10,233.100006,234.259995,232.130005,233.949997,4452500,V,3
116021,116021,2021-06-10,57.330002,57.610001,57.220001,57.340000,12013600,VZ,3
116022,116022,2021-06-10,53.799999,55.580002,53.570000,55.310001,6638000,WBA,3
116023,116023,2021-06-10,139.080002,140.190002,139.080002,139.880005,5459500,WMT,3


In [22]:
df = df.drop(['Unnamed: 0'], axis=1)

In [23]:
df = df.sort_values(['date', 'tic'])

In [24]:
df.head()

,date,open,high,low,close,volume,tic,day
0,2006-01-03,2.585,2.669643,2.580357,2.295634,807234400,AAPL,1
1,2006-01-03,51.700001,52.580002,51.049999,41.155041,7825700,AXP,1
2,2006-01-03,70.400002,70.599998,69.330002,50.119705,4943000,BA,1
3,2006-01-03,57.869999,58.110001,57.049999,38.086823,3697500,CAT,1
4,2006-01-03,17.209999,17.49,17.18,12.956775,55426000,CSCO,1


In [27]:
df.tail()

,date,open,high,low,close,volume,tic,day
74625,2021-06-10,155.630005,155.889999,153.929993,154.020004,900500,TRV,3
74626,2021-06-10,57.330002,57.610001,57.220001,57.34,12013600,VZ,3
74627,2021-06-10,53.799999,55.580002,53.57,55.310001,6638000,WBA,3
74628,2021-06-10,139.080002,140.190002,139.080002,139.880005,5459500,WMT,3
74629,2021-06-10,63.610001,63.98,62.25,62.75,27488700,XOM,3


In [28]:
df.shape

(74630, 8)

In [25]:
df.tic.unique()

array(['AAPL', 'AXP', 'BA', 'CAT', 'CSCO', 'CVX', 'DD', 'DIS', 'GS', 'HD',
       'IBM', 'INTC', 'JNJ', 'JPM', 'KO', 'MCD', 'MMM', 'MRK', 'MSFT',
       'NKE', 'PFE', 'PG', 'RTX', 'TRV', 'UNH', 'VZ', 'WBA', 'WMT', 'XOM',
       'V'], dtype=object)
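Before preprocessing, it can help to sanity-check the loaded frame. The hypothetical `check_price_frame` helper below compares the tickers against an expected list and verifies that `day` is the weekday (Monday = 0). That reading of `day` is consistent with the rows shown above: 2006-01-03 was a Tuesday (`day` 1) and 2021-06-10 a Thursday (`day` 3). Run against `config.DOW_30_TICKER`, it would report `V` as an extra ticker, since `V` appears in `df.tic.unique()` but not in the printed config list.

```python
import pandas as pd


def check_price_frame(df: pd.DataFrame, expected_tickers) -> dict:
    """Basic sanity checks on a long-format price frame (one row per date and tic)."""
    tickers = set(df["tic"].unique())
    weekday = pd.to_datetime(df["date"]).dt.dayofweek  # Monday=0 ... Sunday=6
    return {
        "extra_tickers": sorted(tickers - set(expected_tickers)),
        "missing_tickers": sorted(set(expected_tickers) - tickers),
        "rows_per_ticker": df.groupby("tic").size().to_dict(),
        "day_matches_weekday": bool((df["day"] == weekday).all()),
    }


# Usage on the notebook data: check_price_frame(df, config.DOW_30_TICKER)
```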

In [28]:
# import pickle
# with open('dji_df_2004-2021.pkl', 'wb') as f:
#     pickle.dump(df, f)

<a id='3'></a>
# Part 4. Preprocess Data
Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and perform feature engineering to convert the data into a model-ready state.
* Add technical indicators. In practical trading, a variety of information needs to be taken into account, for example historical stock prices, current share holdings, and technical indicators. In this notebook we use the indicators in `config.TECHNICAL_INDICATORS_LIST`, which include the trend-following MACD and the RSI momentum oscillator.
* Add turbulence index. Risk aversion reflects whether an investor prefers to preserve capital, and it also influences one's trading strategy when facing different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index, which measures extreme asset price fluctuation.
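The turbulence index from the ensemble-strategy paper measures how unusual a day's returns are relative to their history, as the Mahalanobis distance turbulence_t = (y_t − μ) Σ⁻¹ (y_t − μ)′, where y_t is the vector of asset returns on day t and μ and Σ are the mean and covariance of historical returns. The NumPy sketch below computes it; the 252-day lookback and the use of a pseudo-inverse are assumptions, and FinRL's `FeatureEngineer` may differ in details such as how it handles missing prices.

```python
import numpy as np


def turbulence(returns: np.ndarray, lookback: int = 252) -> np.ndarray:
    """Mahalanobis-distance turbulence for each day after the lookback window.

    returns: array of shape (n_days, n_assets) with daily asset returns.
    Day t is scored against the mean and covariance of the `lookback` days
    before it; the first `lookback` days are left at 0.
    """
    n_days = returns.shape[0]
    out = np.zeros(n_days)
    for t in range(lookback, n_days):
        hist = returns[t - lookback:t]
        mu = hist.mean(axis=0)
        # atleast_2d keeps the single-asset case a (1, 1) matrix
        cov = np.atleast_2d(np.cov(hist, rowvar=False))
        diff = returns[t] - mu
        # pinv guards against a singular covariance matrix
        out[t] = diff @ np.linalg.pinv(cov) @ diff
    return out
```

A day on which all assets move far outside their usual range (as in a crash) gets a much larger value than a typical day, which is what allows a turbulence threshold to act as a risk switch.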

In [26]:
import pickle
with open('/home/roman/Work/trading-bot/notebooks/2004-2021_imputed_sentiment_final_df.pkl', 'rb') as f:
    daily_sentiment_df = pickle.load(f)

In [None]:
with open('doc2vec_2004_2021_expanded_world_df.pkl', 'rb') as f:
    doc2vec_2004_2021_expanded_world_df = pickle.load(f)

In [27]:
daily_sentiment_df

,date,business_score,environment_score,politics_score,science_score,technology_score,world_score,business_magnitude,environment_magnitude,politics_magnitude,science_magnitude,technology_magnitude,world_magnitude
0,2004-08-11,-0.033333,-0.366667,-0.133333,0.066667,-0.033333,-0.333333,4.533333,9.400000,17.900000,3.500,4.900000,10.100000
1,2004-08-12,-0.166667,-0.100000,-0.366667,-0.100000,-0.066667,-0.233333,5.166667,15.433333,19.966667,5.300,10.066666,13.033334
2,2004-08-13,-0.300000,-0.100000,-0.066667,-0.100000,-0.133333,-0.100000,3.833333,13.000000,13.233334,5.300,2.266667,19.966667
3,2004-08-14,-0.233333,-0.233333,-0.300000,-0.100000,0.033333,-0.200000,9.400000,10.600000,16.100000,5.300,7.533334,13.933333
4,2004-08-15,-0.100000,-0.333333,-0.300000,-0.100000,-0.050000,-0.333333,8.600000,13.733334,19.200000,5.300,7.200000,5.166667
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6141,2021-06-07,-0.152000,-0.111765,-0.313043,-0.050000,-0.208333,-0.226000,15.572000,10.935294,18.982609,6.175,14.133333,21.184000
6142,2021-06-08,-0.213793,-0.085714,-0.310526,-0.133333,-0.285714,-0.238000,14.465517,12.633334,25.405263,6.100,12.400000,19.104000
6143,2021-06-09,-0.200000,-0.084211,-0.256000,-0.150000,-0.200000,-0.234000,15.242857,11.794737,27.012000,7.000,8.875000,18.166000
6144,2021-06-10,-0.118519,-0.107692,-0.213636,-0.080000,-0.150000,-0.178000,14.151852,11.176923,23.786363,6.260,19.525000,17.912000


In [28]:
fe = FeatureEngineer(
                    use_technical_indicator=True,
                    tech_indicator_list = config.TECHNICAL_INDICATORS_LIST,
                    use_turbulence=True)

processed = fe.preprocess_data(df)

Successfully added technical indicators
Successfully added turbulence index


In [29]:
list(processed.columns)

['date',
 'open',
 'high',
 'low',
 'close',
 'volume',
 'tic',
 'day',
 'macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma',
 'turbulence']

In [30]:
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))

processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])

processed_full = processed_full.fillna(0)
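The cell above builds a full date × ticker grid so that every trading day has exactly one row per ticker. The toy example below wraps the same steps in a hypothetical `full_grid` function (using `strftime` to produce the same `YYYY-MM-DD` strings): calendar days that are not trading days (here a weekend) are dropped, and a ticker with no data on a trading day gets a row of zeros from `fillna(0)`. This means a ticker that has not started trading yet appears to the agent with zero prices.

```python
import itertools

import pandas as pd


def full_grid(processed: pd.DataFrame) -> pd.DataFrame:
    """Expand to one row per (trading date, ticker), filling gaps with 0."""
    tickers = processed["tic"].unique().tolist()
    dates = pd.date_range(processed["date"].min(), processed["date"].max()).strftime("%Y-%m-%d")
    grid = pd.DataFrame(list(itertools.product(dates, tickers)), columns=["date", "tic"])
    full = grid.merge(processed, on=["date", "tic"], how="left")
    full = full[full["date"].isin(processed["date"])]  # keep trading days only
    return full.sort_values(["date", "tic"]).fillna(0)


# Toy data: a Friday and the following Monday; "V" has no row on the Friday.
toy = pd.DataFrame({
    "date": ["2008-01-04", "2008-01-07", "2008-01-07"],
    "tic": ["AAPL", "AAPL", "V"],
    "close": [1.0, 2.0, 3.0],
})
print(full_grid(toy))
```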

In [31]:
processed_full.sample(5)

,date,tic,open,high,low,close,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,turbulence
145404,2019-04-11,UNH,246.050003,246.919998,232.679993,226.98526,8223400.0,3.0,-1.950808,247.703096,229.131758,41.193396,-120.91729,34.839594,236.935162,245.672912,30.206898
56381,2011-02-25,INTC,21.52,22.0,21.459999,16.079168,53475700.0,4.0,0.095167,16.270748,15.49247,56.164736,77.443325,2.075169,15.754646,15.619415,26.364106
124973,2017-05-30,TRV,123.360001,124.230003,123.160004,111.790138,1139200.0,1.0,0.542851,111.698529,107.366933,59.091724,206.565871,35.8198,109.403681,109.583163,20.428736
26478,2008-06-03,MSFT,27.91,28.309999,27.27,20.466591,86616700.0,1.0,-0.296617,22.960626,20.394584,42.687388,-146.669316,20.872006,21.848735,21.691082,19.402154
13448,2007-03-27,GS,211.009995,211.729996,209.740005,172.573776,4806100.0,1.0,0.606949,176.135296,156.423773,53.246927,33.933005,17.47191,169.790019,171.120632,16.54163


In [32]:
processed_full['date'][0]

'2006-01-03'

<a id='4'></a>
# Part 5. Design Environment
Considering the stochastic and interactive nature of automated stock trading, the trading task is modeled as a **Markov Decision Process (MDP)**. The training process involves observing stock price changes, taking an action, and calculating the reward so that the agent can adjust its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.

Our trading environments, based on OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.

The action space describes the allowed actions that the agent interacts with the environment. Normally, action a includes three actions: {-1, 0, 1}, where -1, 0, 1 represent selling, holding, and buying one share. Also, an action can be carried upon multiple shares. We use an action space {-k,…,-1, 0, 1, …, k}, where k denotes the number of shares to buy and -k denotes the number of shares to sell. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or -10, respectively. The continuous action space needs to be normalized to [-1, 1], since the policy is defined on a Gaussian distribution, which needs to be normalized and symmetric.
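The mapping from the normalized action space back to share counts can be sketched as follows, assuming each action is clipped to [-1, 1], multiplied by `hmax` (the maximum number of shares per trade, set to 100 in `env_kwargs` below) and truncated toward zero. This mirrors the description above but is not taken from the `StockTradingEnv` source, so treat the hypothetical `actions_to_shares` as illustrative.

```python
import numpy as np


def actions_to_shares(actions: np.ndarray, hmax: int = 100) -> np.ndarray:
    """Map policy outputs in [-1, 1] to integer share counts in [-hmax, hmax].

    Positive values mean buy, negative values mean sell; astype(int)
    truncates toward zero, so small actions become "hold" (0).
    """
    scaled = np.clip(actions, -1.0, 1.0) * hmax
    return scaled.astype(int)


# 0.25 -> buy 25, -0.5 -> sell 50, 0.004 -> hold, 1.7 is clipped -> buy 100
print(actions_to_shares(np.array([0.25, -0.5, 0.004, 1.7])))
```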

In [33]:
config.TECHNICAL_INDICATORS_LIST

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']
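For reference, `macd`, `close_30_sma` and `close_60_sma` follow standard definitions: MACD is the 12-day EMA of the close minus the 26-day EMA, and `close_N_sma` is the N-day simple moving average of the close. The pandas sketch below computes them for a single ticker. `FeatureEngineer` is expected to compute its indicators through stockstats, whose EMA warm-up may differ, and `min_periods=1` is an assumption made so that early rows are not NaN.

```python
import pandas as pd


def add_macd_and_sma(close: pd.Series) -> pd.DataFrame:
    """Standard MACD line and 30/60-day simple moving averages for one ticker."""
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    return pd.DataFrame({
        "close": close,
        "macd": ema12 - ema26,  # MACD line = EMA12 - EMA26
        "close_30_sma": close.rolling(30, min_periods=1).mean(),
        "close_60_sma": close.rolling(60, min_periods=1).mean(),
    })


# On a steadily rising series the fast EMA stays above the slow one, so MACD > 0.
print(add_macd_and_sma(pd.Series(range(1, 101), dtype=float)).tail(3))
```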

In [34]:
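stock_dimension = len(processed_full.tic.unique())
# State = cash balance (1) + close prices and share holdings (2 per stock)
#         + technical indicators (one per indicator per stock) + daily market-wide features (here, news sentiment)
state_space = 1 + 2*stock_dimension + len(config.TECHNICAL_INDICATORS_LIST)*stock_dimension + config.NUMBER_OF_DAILY_FEATURES
print(f"Stock Dimension: {stock_dimension}, User Features: {config.NUMBER_OF_USER_FEATURES}, State Space: {state_space}")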


Stock Dimension: 30, User Features: 0, State Space: 313
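The state-space formula can be checked term by term. With 30 stocks and 8 technical indicators, the first three terms give 1 + 60 + 240 = 301, so the printed value of 313 implies that the daily-feature term contributes 12 dimensions. That matches the six `*_score` and six `*_magnitude` columns of `daily_sentiment_df`. The helper `state_space_size` is hypothetical and only restates the formula from the cell above.

```python
def state_space_size(n_stocks: int, n_indicators: int, n_daily_features: int) -> int:
    """1 cash balance + n_stocks prices + n_stocks holdings
    + one value per (indicator, stock) pair + market-wide daily features."""
    return 1 + 2 * n_stocks + n_indicators * n_stocks + n_daily_features


print(state_space_size(30, 8, 12))  # 313
```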


In [35]:
env_kwargs = {
    "hmax": 100,
    # In Indonesia the minimum trade size is one lot of 100 shares, so the initial amount is divided by 100
    "initial_amount": 50_000_000/100,
    "buy_cost_pct": 0.0019,   # IPOT charges a 0.19% fee on buys
    "sell_cost_pct": 0.0029,  # IPOT charges a 0.29% fee on sells
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
    "print_verbosity": 5
}
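Assuming the `*_cost_pct` values are charged proportionally on the traded value, as their names suggest, the cash impact of a single trade can be sketched as below. The helper `trade_cash_flow` is hypothetical and the environment's exact accounting may differ; with these rates, buying 100 shares at 50.0 costs 5,009.50 and selling them returns 4,985.50.

```python
def trade_cash_flow(price: float, shares: int,
                    buy_cost_pct: float = 0.0019, sell_cost_pct: float = 0.0029) -> float:
    """Cash change for one trade: negative when buying, positive when selling.

    shares > 0 means buy, shares < 0 means sell, 0 means no trade.
    """
    notional = price * abs(shares)
    if shares > 0:
        return -notional * (1 + buy_cost_pct)   # pay the price plus the buy fee
    return notional * (1 - sell_cost_pct)       # receive the price minus the sell fee


print(trade_cash_flow(50.0, 100))   # about -5009.5
print(trade_cash_flow(50.0, -100))  # about 4985.5
```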

<a id='5'></a>
# Part 6. Implement DRL Algorithms
* The implementation of the DRL algorithms is based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines with a major structural refactoring and code cleanups.
* The FinRL library includes fine-tuned standard DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to design their own DRL algorithms by adapting these DRL algorithms.

* In this notebook, we are training and validating 3 agents (A2C, PPO, DDPG) using Rolling-window Ensemble Method ([reference code](https://github.com/AI4Finance-LLC/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020/blob/80415db8fa7b2179df6bd7e81ce4fe8dbf913806/model/models.py#L92))
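In the ensemble strategy of the ICAIF 2020 paper, the agent used for the next trading window is the one with the highest Sharpe ratio over the validation window. The sketch below shows that selection rule on precomputed daily validation returns. The zero risk-free rate, the √252 annualization and the helper names `sharpe_ratio` and `pick_agent` are assumptions, not the `DRLEnsembleAgent` API.

```python
import numpy as np


def sharpe_ratio(daily_returns: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (0.0 if returns are constant)."""
    std = daily_returns.std(ddof=1)
    if std == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * daily_returns.mean() / std)


def pick_agent(validation_returns: dict) -> str:
    """Name of the agent whose validation-window Sharpe ratio is highest."""
    return max(validation_returns,
               key=lambda name: sharpe_ratio(np.asarray(validation_returns[name])))


# Usage: pick_agent({"a2c": a2c_returns, "ppo": ppo_returns, "ddpg": ddpg_returns})
```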

In [36]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [40]:
rebalance_window = 63 # rebalance_window is the number of trading days between model retrainings
validation_window = 63 # validation_window is the number of days used for validation and for trading (e.g. if validation_window=63, both the validation and trading periods are 63 days)
train_start = '2006-01-03'
train_end = '2016-01-01'
val_test_start = '2016-01-01'
val_test_end = '2021-06-11'

ensemble_agent = DRLEnsembleAgent(df=processed_full,
                 train_period=(train_start,train_end),
                 val_test_period=(val_test_start,val_test_end),
                 rebalance_window=rebalance_window, 
                 validation_window=validation_window,
                 daily_features=daily_sentiment_df,
                 **env_kwargs)
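The rolling-window scheme can be sketched as index arithmetic over the trading dates of the validation/trading period. In this hypothetical `rolling_windows` helper, each step `i` validates on the `validation_window` days ending at `i - rebalance_window` and trades on the following `rebalance_window` days; retraining on all data before the validation window is assumed rather than shown, and the exact indexing inside `DRLEnsembleAgent` may differ. With 63-day windows the first step index is 63 + 63 = 126, which matches the `a2c_126_...` run name in the training log.

```python
def rolling_windows(trade_dates, rebalance_window=63, validation_window=63):
    """Yield (i, validation_dates, trading_dates) for each rebalance step.

    Windows are counted in trading days; a final partial window is dropped
    (the range end is len + 1 so that a window ending exactly at the last date is kept).
    """
    start = rebalance_window + validation_window
    for i in range(start, len(trade_dates) + 1, rebalance_window):
        validation = trade_dates[i - rebalance_window - validation_window:i - rebalance_window]
        trading = trade_dates[i - rebalance_window:i]
        yield i, validation, trading


for i, val, trade in rolling_windows(list(range(300))):
    print(i, (val[0], val[-1]), (trade[0], trade[-1]))
```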

In [41]:
A2C_model_kwargs = {
                    'n_steps': 5,
                    'ent_coef': 0.01,
                    'learning_rate': 0.0005
                    }

PPO_model_kwargs = {
                    "ent_coef":0.01,
                    "n_steps": 2048,
                    "learning_rate": 0.00025,
                    "batch_size": 128
                    }

DDPG_model_kwargs = {
                      "action_noise":"ornstein_uhlenbeck",
                      "buffer_size": 50_000,
                      "learning_rate": 0.000005,
                      "batch_size": 128
                    }

A2C2_model_kwargs = {
                    'n_steps': 5,
                    'ent_coef': 0.01,
                    'learning_rate': 0.001
                    }

PPO2_model_kwargs = {
                    "ent_coef":0.01,
                    "n_steps": 2048,
                    "learning_rate": 0.0005,
                    "batch_size": 128
                    }

DDPG2_model_kwargs = {
                      "action_noise":"ornstein_uhlenbeck",
                      "buffer_size": 50_000,
                      "learning_rate": 0.00001,
                      "batch_size": 128
                    }

timesteps_dict = {'a2c': 50_000,
                  'ppo': 50_000,
                  'ddpg': 25_000,
                  'a2c2': 25_000,
                  'ppo2': 25_000,
                  'ddpg2': 12_000
                  }

In [None]:
import time

start = time.time()
df_summary = ensemble_agent.run_ensemble_strategy(A2C_model_kwargs, PPO_model_kwargs, DDPG_model_kwargs,
                                                  A2C2_model_kwargs, PPO2_model_kwargs, DDPG2_model_kwargs,
                                                  timesteps_dict)
time_elapsed = time.time() - start
print(time_elapsed)  # wall-clock seconds

39.25064963834649
turbulence_threshold:  458.4056541260132
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0005}
Using cuda device
Logging to tensorboard_log/a2c/a2c_126_16
------------------------------------
| time/                 |          |
|    fps                | 156      |
|    iterations         | 100      |
|    time_elapsed       | 3        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -42.8    |
|    explained_variance | 0.104    |
|    learning_rate      | 0.0005   |
|    n_updates          | 99       |
|    policy_loss        | -30.6    |
|    std                | 1.01     |
|    value_loss         | 1.01     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 157      |
|    iterations         | 200      |
|    time_elapsed       | 6        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entr

------------------------------------
| time/                 |          |
|    fps                | 157      |
|    iterations         | 1500     |
|    time_elapsed       | 47       |
|    total_timesteps    | 7500     |
| train/                |          |
|    entropy_loss       | -43.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1499     |
|    policy_loss        | -8.06    |
|    std                | 1.02     |
|    value_loss         | 0.321    |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 7.94e+05 |
|    total_cost         | 1.64e+05 |
|    total_reward       | 2.94e+05 |
|    total_reward_pct   | 58.8     |
|    total_trades       | 53109    |
| time/                 |          |
|    fps                | 157      |
|    iterations         | 1600     |
|    time_elapsed       | 50       |
|    total_timesteps    | 8000     |
|

------------------------------------
| time/                 |          |
|    fps                | 158      |
|    iterations         | 2900     |
|    time_elapsed       | 91       |
|    total_timesteps    | 14500    |
| train/                |          |
|    entropy_loss       | -43.5    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0005   |
|    n_updates          | 2899     |
|    policy_loss        | -21.9    |
|    std                | 1.03     |
|    value_loss         | 0.389    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 158      |
|    iterations         | 3000     |
|    time_elapsed       | 94       |
|    total_timesteps    | 15000    |
| train/                |          |
|    entropy_loss       | -43.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 2999     |
|    policy_loss        | 66.1     |
|

------------------------------------
| time/                 |          |
|    fps                | 152      |
|    iterations         | 4300     |
|    time_elapsed       | 140      |
|    total_timesteps    | 21500    |
| train/                |          |
|    entropy_loss       | -44      |
|    explained_variance | 0.567    |
|    learning_rate      | 0.0005   |
|    n_updates          | 4299     |
|    policy_loss        | 16.4     |
|    std                | 1.05     |
|    value_loss         | 0.36     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 152      |
|    iterations         | 4400     |
|    time_elapsed       | 144      |
|    total_timesteps    | 22000    |
| train/                |          |
|    entropy_loss       | -44.1    |
|    explained_variance | 0.0339   |
|    learning_rate      | 0.0005   |
|    n_updates          | 4399     |
|    policy_loss        | -31.6    |
|

------------------------------------
| time/                 |          |
|    fps                | 150      |
|    iterations         | 5700     |
|    time_elapsed       | 189      |
|    total_timesteps    | 28500    |
| train/                |          |
|    entropy_loss       | -44.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5699     |
|    policy_loss        | -26      |
|    std                | 1.07     |
|    value_loss         | 0.841    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 150      |
|    iterations         | 5800     |
|    time_elapsed       | 192      |
|    total_timesteps    | 29000    |
| train/                |          |
|    entropy_loss       | -44.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5799     |
|    policy_loss        | -38.2    |
|

------------------------------------
| environment/          |          |
|    portfolio_value    | 7.31e+05 |
|    total_cost         | 6.37e+03 |
|    total_reward       | 2.31e+05 |
|    total_reward_pct   | 46.3     |
|    total_trades       | 45026    |
| time/                 |          |
|    fps                | 149      |
|    iterations         | 7100     |
|    time_elapsed       | 237      |
|    total_timesteps    | 35500    |
| train/                |          |
|    entropy_loss       | -45.4    |
|    explained_variance | 2.99e-05 |
|    learning_rate      | 0.0005   |
|    n_updates          | 7099     |
|    policy_loss        | 43.8     |
|    std                | 1.1      |
|    value_loss         | 1.1      |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 149      |
|    iterations         | 7200     |
|    time_elapsed       | 241      |
|    total_timesteps    | 36000    |
|

------------------------------------
| time/                 |          |
|    fps                | 149      |
|    iterations         | 8500     |
|    time_elapsed       | 284      |
|    total_timesteps    | 42500    |
| train/                |          |
|    entropy_loss       | -45.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 8499     |
|    policy_loss        | 46.3     |
|    std                | 1.12     |
|    value_loss         | 1.18     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 9.07e+05 |
|    total_cost         | 6.51e+03 |
|    total_reward       | 4.07e+05 |
|    total_reward_pct   | 81.3     |
|    total_trades       | 43022    |
| time/                 |          |
|    fps                | 149      |
|    iterations         | 8600     |
|    time_elapsed       | 287      |
|    total_timesteps    | 43000    |
|

------------------------------------
| time/                 |          |
|    fps                | 149      |
|    iterations         | 9900     |
|    time_elapsed       | 331      |
|    total_timesteps    | 49500    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 9899     |
|    policy_loss        | -17      |
|    std                | 1.14     |
|    value_loss         | 0.279    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 149      |
|    iterations         | 10000    |
|    time_elapsed       | 334      |
|    total_timesteps    | 50000    |
| train/                |          |
|    entropy_loss       | -46.6    |
|    explained_variance | 0.274    |
|    learning_rate      | 0.0005   |
|    n_updates          | 9999     |
|    policy_loss        | 19.3     |
|

------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 11300    |
|    time_elapsed       | 379      |
|    total_timesteps    | 56500    |
| train/                |          |
|    entropy_loss       | -47      |
|    explained_variance | 0.0309   |
|    learning_rate      | 0.0005   |
|    n_updates          | 11299    |
|    policy_loss        | 18.4     |
|    std                | 1.16     |
|    value_loss         | 0.252    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 11400    |
|    time_elapsed       | 382      |
|    total_timesteps    | 57000    |
| train/                |          |
|    entropy_loss       | -47      |
|    explained_variance | -0.723   |
|    learning_rate      | 0.0005   |
|    n_updates          | 11399    |
|    policy_loss        | 52.5     |
|

------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 12700    |
|    time_elapsed       | 427      |
|    total_timesteps    | 63500    |
| train/                |          |
|    entropy_loss       | -47.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 12699    |
|    policy_loss        | -35.7    |
|    std                | 1.18     |
|    value_loss         | 0.536    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 12800    |
|    time_elapsed       | 431      |
|    total_timesteps    | 64000    |
| train/                |          |
|    entropy_loss       | -47.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 12799    |
|    policy_loss        | -4.91    |
|

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.05e+06 |
|    total_cost         | 7.92e+03 |
|    total_reward       | 5.5e+05  |
|    total_reward_pct   | 110      |
|    total_trades       | 42847    |
| time/                 |          |
|    fps                | 148      |
|    iterations         | 14100    |
|    time_elapsed       | 476      |
|    total_timesteps    | 70500    |
| train/                |          |
|    entropy_loss       | -48      |
|    explained_variance | 0.142    |
|    learning_rate      | 0.0005   |
|    n_updates          | 14099    |
|    policy_loss        | 15.9     |
|    std                | 1.2      |
|    value_loss         | 0.207    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 14200    |
|    time_elapsed       | 479      |
|    total_timesteps    | 71000    |
|

------------------------------------
| time/                 |          |
|    fps                | 147      |
|    iterations         | 15500    |
|    time_elapsed       | 524      |
|    total_timesteps    | 77500    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 15499    |
|    policy_loss        | 4.48     |
|    std                | 1.22     |
|    value_loss         | 0.21     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 147      |
|    iterations         | 15600    |
|    time_elapsed       | 527      |
|    total_timesteps    | 78000    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 15599    |
|    policy_loss        | 0.65     |
|

------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 16900    |
|    time_elapsed       | 569      |
|    total_timesteps    | 84500    |
| train/                |          |
|    entropy_loss       | -49.1    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 16899    |
|    policy_loss        | -32.9    |
|    std                | 1.25     |
|    value_loss         | 1.59     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 17000    |
|    time_elapsed       | 572      |
|    total_timesteps    | 85000    |
| train/                |          |
|    entropy_loss       | -49.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 16999    |
|    policy_loss        | 81.1     |

------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 18300    |
|    time_elapsed       | 617      |
|    total_timesteps    | 91500    |
| train/                |          |
|    entropy_loss       | -49.6    |
|    explained_variance | 0.143    |
|    learning_rate      | 0.0005   |
|    n_updates          | 18299    |
|    policy_loss        | -21.9    |
|    std                | 1.27     |
|    value_loss         | 0.454    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 18400    |
|    time_elapsed       | 620      |
|    total_timesteps    | 92000    |
| train/                |          |
|    entropy_loss       | -49.6    |
|    explained_variance | 0.256    |
|    learning_rate      | 0.0005   |
|    n_updates          | 18399    |
|    policy_loss        | 46.9     |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.14e+06 |
|    total_cost         | 4.09e+03 |
|    total_reward       | 6.36e+05 |
|    total_reward_pct   | 127      |
|    total_trades       | 42981    |
| time/                 |          |
|    fps                | 147      |
|    iterations         | 19700    |
|    time_elapsed       | 666      |
|    total_timesteps    | 98500    |
| train/                |          |
|    entropy_loss       | -50.2    |
|    explained_variance | -0.0383  |
|    learning_rate      | 0.0005   |
|    n_updates          | 19699    |
|    policy_loss        | 1.12     |
|    std                | 1.29     |
|    value_loss         | 0.028    |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 147       |
|    iterations         | 19800     |
|    time_elapsed       | 670       |
|    total_timesteps    | 99000  

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 3.2e+05     |
|    total_cost           | 3.74e+05    |
|    total_reward         | -1.8e+05    |
|    total_reward_pct     | -36.1       |
|    total_trades         | 67477       |
| time/                   |             |
|    fps                  | 158         |
|    iterations           | 8           |
|    time_elapsed         | 103         |
|    total_timesteps      | 16384       |
| train/                  |             |
|    approx_kl            | 0.024424216 |
|    clip_fraction        | 0.281       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.9       |
|    explained_variance   | 0.00393     |
|    learning_rate        | 0.00025     |
|    loss                 | -0.142      |
|    n_updates            | 70          |
|    policy_gradient_loss | -0.0345     |
|    std                  | 1.01        |
|    value_loss           | 0.946 

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 7.07e+05    |
|    total_cost           | 3.42e+05    |
|    total_reward         | 2.07e+05    |
|    total_reward_pct     | 41.3        |
|    total_trades         | 65348       |
| time/                   |             |
|    fps                  | 161         |
|    iterations           | 16          |
|    time_elapsed         | 202         |
|    total_timesteps      | 32768       |
| train/                  |             |
|    approx_kl            | 0.026867568 |
|    clip_fraction        | 0.233       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.5       |
|    explained_variance   | 0.199       |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0795      |
|    n_updates            | 150         |
|    policy_gradient_loss | -0.0211     |
|    std                  | 1.03        |
|    value_loss           | 1.07  

day: 2516, episode: 60
begin_total_asset: 500000.00
end_total_asset: 490199.71
total_reward: -9800.29
total_cost: 342726.61
total_trades: 64501
Sharpe: 0.074
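Each end-of-episode summary and the `environment/` table that follows it appear to be linked by plain arithmetic:

- `total_reward` is `end_total_asset - begin_total_asset`;
- `total_reward_pct` is that reward as a percentage of the starting capital.

The numbers printed here fit this reading: 490199.71 - 500000.00 = -9800.29, which is -1.96% of 500000. Below is a minimal sketch; `episode_summary` is a hypothetical name.

```python
def episode_summary(begin_total_asset: float, end_total_asset: float) -> tuple[float, float]:
    """Hypothetical helper returning (total_reward, total_reward_pct)."""
    total_reward = end_total_asset - begin_total_asset
    total_reward_pct = 100.0 * total_reward / begin_total_asset
    return total_reward, total_reward_pct


reward, pct = episode_summary(500_000.00, 490_199.71)
print(f"{reward:.2f} {pct:.2f}")  # -9800.29 -1.96
```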
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 4.9e+05     |
|    total_cost           | 3.43e+05    |
|    total_reward         | -9.8e+03    |
|    total_reward_pct     | -1.96       |
|    total_trades         | 64501       |
| time/                   |             |
|    fps                  | 163         |
|    iterations           | 25          |
|    time_elapsed         | 312         |
|    total_timesteps      | 51200       |
| train/                  |             |
|    approx_kl            | 0.035490323 |
|    clip_fraction        | 0.291       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44         |
|    explained_variance   | 0.0868      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.133       |

-----------------------------------------
| time/                   |             |
|    fps                  | 164         |
|    iterations           | 33          |
|    time_elapsed         | 409         |
|    total_timesteps      | 67584       |
| train/                  |             |
|    approx_kl            | 0.027280852 |
|    clip_fraction        | 0.253       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.6       |
|    explained_variance   | 0.432       |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0977      |
|    n_updates            | 320         |
|    policy_gradient_loss | -0.0115     |
|    std                  | 1.07        |
|    value_loss           | 1.27        |
-----------------------------------------
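In the PPO tables, `approx_kl` and `clip_fraction` track how far each update moves the policy away from the one that collected the rollout. This sketch assumes the usual Stable-Baselines3 definitions, where `r` is the new-to-old probability ratio:

- `approx_kl` is the sample mean of `(r - 1) - log r`;
- `clip_fraction` is the share of samples whose ratio lies outside `[1 - clip_range, 1 + clip_range]`.

Read that way, the clip fractions in this log (about 0.23 to 0.44, with `clip_range` 0.2) mean that roughly a fifth to two fifths of the samples fall outside the clipping interval. The NumPy sketch below computes both quantities from log-probabilities; `ppo_diagnostics` is a hypothetical name.

```python
import numpy as np


def ppo_diagnostics(new_log_prob, old_log_prob, clip_range: float = 0.2):
    """Hypothetical helper returning (approx_kl, clip_fraction) for one batch."""
    log_ratio = np.asarray(new_log_prob, dtype=float) - np.asarray(old_log_prob, dtype=float)
    ratio = np.exp(log_ratio)  # pi_new(a|s) / pi_old(a|s)
    # Per-sample term (r - 1) - log r is non-negative; its mean estimates
    # the KL divergence between the old and new policy.
    approx_kl = float(np.mean((ratio - 1.0) - log_ratio))
    # Fraction of samples whose ratio lies outside the clipping interval.
    clip_fraction = float(np.mean(np.abs(ratio - 1.0) > clip_range))
    return approx_kl, clip_fraction


old = np.log(np.array([0.5, 0.2, 0.1, 0.4]))
new = np.log(np.array([0.5, 0.3, 0.1, 0.3]))
print(ppo_diagnostics(new, old))
```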
---------------------------------------
| environment/            |           |
|    portfolio_value      | 7.89e+05  |
|    total_cost           | 2.66e+05  |
|    total_reward         | 2.89e+05  |

----------------------------------------
| environment/            |            |
|    portfolio_value      | 9.16e+05   |
|    total_cost           | 3e+05      |
|    total_reward         | 4.16e+05   |
|    total_reward_pct     | 83.1       |
|    total_trades         | 62611      |
| time/                   |            |
|    fps                  | 164        |
|    iterations           | 42         |
|    time_elapsed         | 522        |
|    total_timesteps      | 86016      |
| train/                  |            |
|    approx_kl            | 0.04240474 |
|    clip_fraction        | 0.354      |
|    clip_range           | 0.2        |
|    entropy_loss         | -45.1      |
|    explained_variance   | -0.0342    |
|    learning_rate        | 0.00025    |
|    loss                 | -0.131     |
|    n_updates            | 410        |
|    policy_gradient_loss | -0.027     |
|    std                  | 1.09       |
|    value_loss           | 1.24       |
----------------------------------------

----------------------------------
| environment/        |          |
|    portfolio_value  | 1.02e+06 |
|    total_cost       | 1.37e+03 |
|    total_reward     | 5.2e+05  |
|    total_reward_pct | 104      |
|    total_trades     | 29740    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 118      |
|    time_elapsed     | 85       |
|    total timesteps  | 10068    |
| train/              |          |
|    actor_loss       | -7.42    |
|    critic_loss      | 2.48     |
|    learning_rate    | 5e-06    |
|    n_updates        | 7551     |
----------------------------------
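This table reports `actor_loss` and `critic_loss` instead of the on-policy `policy_loss`/`value_loss`, which points to an off-policy actor-critic agent. The sketch below assumes it is DDPG, one of the three agents (with A2C and PPO) in the ensemble paper this notebook reimplements. TD3 and SAC report the same two keys with somewhat different definitions.

Under that assumption, the two numbers are defined roughly as shown. The toy linear `toy_q` and `toy_actor` functions stand in for the neural networks, and `ddpg_losses` is a hypothetical helper.

```python
import numpy as np


def ddpg_losses(q, q_target, actor, actor_target, s, a, r, s_next, done, gamma=0.99):
    """Hypothetical helper returning (actor_loss, critic_loss) for one minibatch."""
    # Bootstrapped target from the target networks: r + gamma * (1 - done) * Q'(s', mu'(s')).
    target = r + gamma * (1.0 - done) * q_target(s_next, actor_target(s_next))
    # Critic: mean squared error between Q(s, a) and the target.
    critic_loss = float(np.mean((q(s, a) - target) ** 2))
    # Actor: maximize Q(s, mu(s)) by minimizing its negative.
    actor_loss = float(-np.mean(q(s, actor(s))))
    return actor_loss, critic_loss


def toy_q(s, a):
    return s + a  # toy stand-in for the critic network


def toy_actor(s):
    return 0.5 * s  # toy stand-in for the deterministic policy network


s = np.array([1.0, 2.0])
a = np.array([0.0, 1.0])
r = np.array([1.0, 0.0])
s_next = np.array([2.0, 3.0])
done = np.array([0.0, 1.0])
print(ddpg_losses(toy_q, toy_q, toy_actor, toy_actor, s, a, r, s_next, done, gamma=0.5))
# (-2.25, 5.625)
```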
day: 2516, episode: 85
begin_total_asset: 500000.00
end_total_asset: 1103667.99
total_reward: 603667.99
total_cost: 1480.64
total_trades: 24279
Sharpe: 0.460
----------------------------------
| environment/        |          |
|    portfolio_value  | 1.04e+06 |
|    total_cost       | 948      |
|    total_reward     | 5.43e+05 |
|    total_reward_pct | 109      |

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.62e+05    |
|    total_cost           | 4.26e+05    |
|    total_reward         | 6.24e+04    |
|    total_reward_pct     | 12.5        |
|    total_trades         | 70894       |
| time/                   |             |
|    fps                  | 166         |
|    iterations           | 6           |
|    time_elapsed         | 73          |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.014018025 |
|    clip_fraction        | 0.238       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.8       |
|    explained_variance   | -0.0564     |
|    learning_rate        | 0.00025     |
|    loss                 | 0.312       |
|    n_updates            | 50          |
|    policy_gradient_loss | -0.0418     |
|    std                  | 1.01        |
|    value_loss           | 1.99  

----------------------------------------
| environment/            |            |
|    portfolio_value      | 5.43e+05   |
|    total_cost           | 4.03e+05   |
|    total_reward         | 4.31e+04   |
|    total_reward_pct     | 8.61       |
|    total_trades         | 69820      |
| time/                   |            |
|    fps                  | 162        |
|    iterations           | 14         |
|    time_elapsed         | 176        |
|    total_timesteps      | 28672      |
| train/                  |            |
|    approx_kl            | 0.03183069 |
|    clip_fraction        | 0.372      |
|    clip_range           | 0.2        |
|    entropy_loss         | -43.4      |
|    explained_variance   | 0.0589     |
|    learning_rate        | 0.00025    |
|    loss                 | 0.0795     |
|    n_updates            | 130        |
|    policy_gradient_loss | -0.0403    |
|    std                  | 1.03       |
|    value_loss           | 1.17       |
----------------------------------------

----------------------------------------
| environment/            |            |
|    portfolio_value      | 4.9e+05    |
|    total_cost           | 3.58e+05   |
|    total_reward         | -1.02e+04  |
|    total_reward_pct     | -2.04      |
|    total_trades         | 67147      |
| time/                   |            |
|    fps                  | 161        |
|    iterations           | 23         |
|    time_elapsed         | 291        |
|    total_timesteps      | 47104      |
| train/                  |            |
|    approx_kl            | 0.05428914 |
|    clip_fraction        | 0.352      |
|    clip_range           | 0.2        |
|    entropy_loss         | -44.1      |
|    explained_variance   | -0.0896    |
|    learning_rate        | 0.00025    |
|    loss                 | 0.509      |
|    n_updates            | 220        |
|    policy_gradient_loss | -0.0335    |
|    std                  | 1.05       |
|    value_loss           | 2.02       |
----------------------------------------

day: 2579, episode: 25
begin_total_asset: 500000.00
end_total_asset: 532207.91
total_reward: 32207.91
total_cost: 318777.41
total_trades: 64550
Sharpe: 0.146
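The `Sharpe` line that closes each episode summary is assumed here to follow the convention used in FinRL's stock-trading environments. Under that convention, it is an annualized ratio of the mean to the standard deviation of the account value's daily returns, `sqrt(252) * mean / std`, with no risk-free rate subtracted.

The sketch below implements that convention with NumPy. `annualized_sharpe` is a hypothetical name, and the sample standard deviation (`ddof=1`) is chosen to match pandas' default.

```python
import numpy as np


def annualized_sharpe(account_values, periods_per_year: int = 252) -> float:
    """Hypothetical helper: annualized Sharpe ratio of a daily account-value series.

    Assumes a zero risk-free rate and sqrt(periods_per_year) annualization.
    """
    values = np.asarray(account_values, dtype=float)
    # Simple daily returns, equivalent to pandas' pct_change(1) without the leading NaN.
    daily_returns = values[1:] / values[:-1] - 1.0
    if daily_returns.size < 2:
        return 0.0  # a sample standard deviation needs at least two returns
    std = daily_returns.std(ddof=1)  # sample std, matching pandas' default
    if std == 0:
        return 0.0  # flat account value: Sharpe is undefined, report 0
    return float(np.sqrt(periods_per_year) * daily_returns.mean() / std)


curve = [500_000.0, 505_000.0, 500_000.0, 510_000.0]
print(round(annualized_sharpe(curve), 3))
```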
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.32e+05    |
|    total_cost           | 3.19e+05    |
|    total_reward         | 3.22e+04    |
|    total_reward_pct     | 6.44        |
|    total_trades         | 64550       |
| time/                   |             |
|    fps                  | 162         |
|    iterations           | 32          |
|    time_elapsed         | 403         |
|    total_timesteps      | 65536       |
| train/                  |             |
|    approx_kl            | 0.034775224 |
|    clip_fraction        | 0.272       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.9       |
|    explained_variance   | 0.143       |
|    learning_rate        | 0.00025     |
|    loss                 | 0.616       |

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 4.73e+05    |
|    total_cost           | 2.77e+05    |
|    total_reward         | -2.68e+04   |
|    total_reward_pct     | -5.36       |
|    total_trades         | 62182       |
| time/                   |             |
|    fps                  | 162         |
|    iterations           | 40          |
|    time_elapsed         | 502         |
|    total_timesteps      | 81920       |
| train/                  |             |
|    approx_kl            | 0.028300587 |
|    clip_fraction        | 0.226       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.4       |
|    explained_variance   | 0.0313      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.827       |
|    n_updates            | 390         |
|    policy_gradient_loss | -0.0213     |
|    std                  | 1.1         |
|    value_loss           | 3.03  

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 6.65e+05    |
|    total_cost           | 2.76e+05    |
|    total_reward         | 1.65e+05    |
|    total_reward_pct     | 33.1        |
|    total_trades         | 61666       |
| time/                   |             |
|    fps                  | 162         |
|    iterations           | 48          |
|    time_elapsed         | 603         |
|    total_timesteps      | 98304       |
| train/                  |             |
|    approx_kl            | 0.041971773 |
|    clip_fraction        | 0.437       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.8       |
|    explained_variance   | 0.0729      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.225       |
|    n_updates            | 470         |
|    policy_gradient_loss | -0.0171     |
|    std                  | 1.11        |
|    value_loss           | 1.76  

------------------------------------
| time/                 |          |
|    fps                | 150      |
|    iterations         | 1200     |
|    time_elapsed       | 39       |
|    total_timesteps    | 6000     |
| train/                |          |
|    entropy_loss       | -42.8    |
|    explained_variance | 0.0582   |
|    learning_rate      | 0.0005   |
|    n_updates          | 1199     |
|    policy_loss        | -3.1     |
|    std                | 1.01     |
|    value_loss         | 0.102    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 150      |
|    iterations         | 1300     |
|    time_elapsed       | 43       |
|    total_timesteps    | 6500     |
| train/                |          |
|    entropy_loss       | -42.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1299     |
|    policy_loss        | 1.78     |

day: 2579, episode: 5
begin_total_asset: 500000.00
end_total_asset: 1189949.37
total_reward: 689949.37
total_cost: 8000.54
total_trades: 42296
Sharpe: 0.520
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.19e+06 |
|    total_cost         | 8e+03    |
|    total_reward       | 6.9e+05  |
|    total_reward_pct   | 138      |
|    total_trades       | 42296    |
| time/                 |          |
|    fps                | 152      |
|    iterations         | 2600     |
|    time_elapsed       | 85       |
|    total_timesteps    | 13000    |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0.0637   |
|    learning_rate      | 0.0005   |
|    n_updates          | 2599     |
|    policy_loss        | -20.3    |
|    std                | 1.03     |
|    value_loss         | 0.451    |
------------------------------------
------------------------------------
| time/                 |    

------------------------------------
| time/                 |          |
|    fps                | 152      |
|    iterations         | 4000     |
|    time_elapsed       | 131      |
|    total_timesteps    | 20000    |
| train/                |          |
|    entropy_loss       | -43.9    |
|    explained_variance | -0.497   |
|    learning_rate      | 0.0005   |
|    n_updates          | 3999     |
|    policy_loss        | 37.1     |
|    std                | 1.05     |
|    value_loss         | 0.8      |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 152      |
|    iterations         | 4100     |
|    time_elapsed       | 134      |
|    total_timesteps    | 20500    |
| train/                |          |
|    entropy_loss       | -44      |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 4099     |
|    policy_loss        | -69.2    |

------------------------------------
| time/                 |          |
|    fps                | 152      |
|    iterations         | 5400     |
|    time_elapsed       | 177      |
|    total_timesteps    | 27000    |
| train/                |          |
|    entropy_loss       | -44.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5399     |
|    policy_loss        | 11.1     |
|    std                | 1.07     |
|    value_loss         | 0.105    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 152      |
|    iterations         | 5500     |
|    time_elapsed       | 180      |
|    total_timesteps    | 27500    |
| train/                |          |
|    entropy_loss       | -44.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5499     |
|    policy_loss        | -71.3    |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.1e+06  |
|    total_cost         | 1.05e+04 |
|    total_reward       | 6.01e+05 |
|    total_reward_pct   | 120      |
|    total_trades       | 36906    |
| time/                 |          |
|    fps                | 151      |
|    iterations         | 6800     |
|    time_elapsed       | 223      |
|    total_timesteps    | 34000    |
| train/                |          |
|    entropy_loss       | -45      |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 6799     |
|    policy_loss        | -9.93    |
|    std                | 1.09     |
|    value_loss         | 0.863    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 151      |
|    iterations         | 6900     |
|    time_elapsed       | 227      |
|    total_timesteps    | 34500    |

------------------------------------
| time/                 |          |
|    fps                | 150      |
|    iterations         | 8200     |
|    time_elapsed       | 272      |
|    total_timesteps    | 41000    |
| train/                |          |
|    entropy_loss       | -45.4    |
|    explained_variance | 1.43e-06 |
|    learning_rate      | 0.0005   |
|    n_updates          | 8199     |
|    policy_loss        | 45.2     |
|    std                | 1.1      |
|    value_loss         | 1.44     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 9.13e+05 |
|    total_cost         | 2.16e+04 |
|    total_reward       | 4.13e+05 |
|    total_reward_pct   | 82.7     |
|    total_trades       | 41681    |
| time/                 |          |
|    fps                | 150      |
|    iterations         | 8300     |
|    time_elapsed       | 276      |
|    total_timesteps    | 41500    |

------------------------------------
| time/                 |          |
|    fps                | 150      |
|    iterations         | 9600     |
|    time_elapsed       | 319      |
|    total_timesteps    | 48000    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | 8.34e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 9599     |
|    policy_loss        | -31.6    |
|    std                | 1.12     |
|    value_loss         | 0.633    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 150      |
|    iterations         | 9700     |
|    time_elapsed       | 323      |
|    total_timesteps    | 48500    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 9699     |
|    policy_loss        | -49      |

------------------------------------
| time/                 |          |
|    fps                | 149      |
|    iterations         | 11000    |
|    time_elapsed       | 367      |
|    total_timesteps    | 55000    |
| train/                |          |
|    entropy_loss       | -46.1    |
|    explained_variance | 0.0888   |
|    learning_rate      | 0.0005   |
|    n_updates          | 10999    |
|    policy_loss        | -58.1    |
|    std                | 1.13     |
|    value_loss         | 2.11     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 149      |
|    iterations         | 11100    |
|    time_elapsed       | 371      |
|    total_timesteps    | 55500    |
| train/                |          |
|    entropy_loss       | -46.2    |
|    explained_variance | -0.0763  |
|    learning_rate      | 0.0005   |
|    n_updates          | 11099    |
|    policy_loss        | 6.55     |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.03e+06 |
|    total_cost         | 6.88e+03 |
|    total_reward       | 5.26e+05 |
|    total_reward_pct   | 105      |
|    total_trades       | 35818    |
| time/                 |          |
|    fps                | 148      |
|    iterations         | 12400    |
|    time_elapsed       | 417      |
|    total_timesteps    | 62000    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | -0.0997  |
|    learning_rate      | 0.0005   |
|    n_updates          | 12399    |
|    policy_loss        | 38.2     |
|    std                | 1.14     |
|    value_loss         | 0.856    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 12500    |
|    time_elapsed       | 421      |
|    total_timesteps    | 62500    |

------------------------------------
| time/                 |          |
|    fps                | 147      |
|    iterations         | 13800    |
|    time_elapsed       | 468      |
|    total_timesteps    | 69000    |
| train/                |          |
|    entropy_loss       | -47.1    |
|    explained_variance | 0.533    |
|    learning_rate      | 0.0005   |
|    n_updates          | 13799    |
|    policy_loss        | -30.1    |
|    std                | 1.17     |
|    value_loss         | 0.412    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 147      |
|    iterations         | 13900    |
|    time_elapsed       | 471      |
|    total_timesteps    | 69500    |
| train/                |          |
|    entropy_loss       | -47.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 13899    |
|    policy_loss        | -59.2    |

------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 15200    |
|    time_elapsed       | 518      |
|    total_timesteps    | 76000    |
| train/                |          |
|    entropy_loss       | -47.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 15199    |
|    policy_loss        | 72.3     |
|    std                | 1.19     |
|    value_loss         | 2.72     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 15300    |
|    time_elapsed       | 521      |
|    total_timesteps    | 76500    |
| train/                |          |
|    entropy_loss       | -47.7    |
|    explained_variance | -0.3     |
|    learning_rate      | 0.0005   |
|    n_updates          | 15299    |
|    policy_loss        | -7.43    |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.44e+06 |
|    total_cost         | 1.17e+04 |
|    total_reward       | 9.4e+05  |
|    total_reward_pct   | 188      |
|    total_trades       | 35661    |
| time/                 |          |
|    fps                | 145      |
|    iterations         | 16600    |
|    time_elapsed       | 568      |
|    total_timesteps    | 83000    |
| train/                |          |
|    entropy_loss       | -48      |
|    explained_variance | -0.034   |
|    learning_rate      | 0.0005   |
|    n_updates          | 16599    |
|    policy_loss        | 3.61     |
|    std                | 1.2      |
|    value_loss         | 0.0295   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 145      |
|    iterations         | 16700    |
|    time_elapsed       | 572      |
|    total_timesteps    | 83500    |

------------------------------------
| time/                 |          |
|    fps                | 145      |
|    iterations         | 18000    |
|    time_elapsed       | 618      |
|    total_timesteps    | 90000    |
| train/                |          |
|    entropy_loss       | -48.5    |
|    explained_variance | 3.45e-05 |
|    learning_rate      | 0.0005   |
|    n_updates          | 17999    |
|    policy_loss        | 6.69     |
|    std                | 1.22     |
|    value_loss         | 0.822    |
------------------------------------
day: 2579, episode: 35
begin_total_asset: 500000.00
end_total_asset: 1090388.01
total_reward: 590388.01
total_cost: 41893.39
total_trades: 37870
Sharpe: 0.507
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.09e+06 |
|    total_cost         | 4.19e+04 |
|    total_reward       | 5.9e+05  |
|    total_reward_pct   | 118      |
|    total_trades       | 37870    |
| time/                 |  

------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 19400    |
|    time_elapsed       | 669      |
|    total_timesteps    | 97000    |
| train/                |          |
|    entropy_loss       | -49      |
|    explained_variance | 0.0544   |
|    learning_rate      | 0.0005   |
|    n_updates          | 19399    |
|    policy_loss        | -9.78    |
|    std                | 1.24     |
|    value_loss         | 0.0992   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 19500    |
|    time_elapsed       | 672      |
|    total_timesteps    | 97500    |
| train/                |          |
|    entropy_loss       | -49      |
|    explained_variance | 0.108    |
|    learning_rate      | 0.0005   |
|    n_updates          | 19499    |
|    policy_loss        | -126     |

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 4e+05       |
|    total_cost           | 4.15e+05    |
|    total_reward         | -1e+05      |
|    total_reward_pct     | -20.1       |
|    total_trades         | 70172       |
| time/                   |             |
|    fps                  | 154         |
|    iterations           | 6           |
|    time_elapsed         | 79          |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.016210493 |
|    clip_fraction        | 0.264       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.8       |
|    explained_variance   | 0.0621      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0281      |
|    n_updates            | 50          |
|    policy_gradient_loss | -0.0396     |
|    std                  | 1.01        |
|    value_loss           | 1.04  

day: 2579, episode: 50
begin_total_asset: 500000.00
end_total_asset: 421846.45
total_reward: -78153.55
total_cost: 389771.32
total_trades: 67886
Sharpe: 0.023
----------------------------------------
| environment/            |            |
|    portfolio_value      | 4.22e+05   |
|    total_cost           | 3.9e+05    |
|    total_reward         | -7.82e+04  |
|    total_reward_pct     | -15.6      |
|    total_trades         | 67886      |
| time/                   |            |
|    fps                  | 152        |
|    iterations           | 14         |
|    time_elapsed         | 188        |
|    total_timesteps      | 28672      |
| train/                  |            |
|    approx_kl            | 0.04037021 |
|    clip_fraction        | 0.305      |
|    clip_range           | 0.2        |
|    entropy_loss         | -43.4      |
|    explained_variance   | -0.093     |
|    learning_rate        | 0.00025    |
|    loss                 | 0.102      |
|    n_updates       

----------------------------------------
| environment/            |            |
|    portfolio_value      | 6.45e+05   |
|    total_cost           | 4.01e+05   |
|    total_reward         | 1.45e+05   |
|    total_reward_pct     | 29         |
|    total_trades         | 68585      |
| time/                   |            |
|    fps                  | 151        |
|    iterations           | 23         |
|    time_elapsed         | 310        |
|    total_timesteps      | 47104      |
| train/                  |            |
|    approx_kl            | 0.03718037 |
|    clip_fraction        | 0.294      |
|    clip_range           | 0.2        |
|    entropy_loss         | -43.9      |
|    explained_variance   | 0.0571     |
|    learning_rate        | 0.00025    |
|    loss                 | 0.302      |
|    n_updates            | 220        |
|    policy_gradient_loss | -0.029     |
|    std                  | 1.05       |
|    value_loss           | 1.92       |
----------------------------------------

----------------------------------------
| environment/            |            |
|    portfolio_value      | 8.41e+05   |
|    total_cost           | 3.45e+05   |
|    total_reward         | 3.41e+05   |
|    total_reward_pct     | 68.2       |
|    total_trades         | 66461      |
| time/                   |            |
|    fps                  | 152        |
|    iterations           | 32         |
|    time_elapsed         | 430        |
|    total_timesteps      | 65536      |
| train/                  |            |
|    approx_kl            | 0.03745857 |
|    clip_fraction        | 0.309      |
|    clip_range           | 0.2        |
|    entropy_loss         | -44.5      |
|    explained_variance   | 0.192      |
|    learning_rate        | 0.00025    |
|    loss                 | 0.503      |
|    n_updates            | 310        |
|    policy_gradient_loss | -0.0224    |
|    std                  | 1.07       |
|    value_loss           | 2.3        |
----------------------------------------
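The `time/` and `train/` counters in these PPO tables follow a fixed pattern. The logged values fit `total_timesteps = iterations * 2048`, i.e. 2,048 environment steps collected per rollout; the PPO `n_steps` setting itself is not shown in this excerpt. They also fit `n_updates = (iterations - 1) * 10`. That is consistent with 10 optimization epochs per rollout and with each table being written before the current rollout is trained on. Here is a small check over rows copied from the tables (both constants are inferred from the numbers, not read from the config):

```python
# (iterations, total_timesteps, n_updates) copied from the PPO tables in this log
ppo_rows = [(23, 47104, 220), (32, 65536, 310), (41, 83968, 400), (49, 100352, 480)]

STEPS_PER_ROLLOUT = 2048  # inferred from the logged numbers, not read from the config
EPOCHS_PER_ROLLOUT = 10   # inferred from the logged numbers

for iterations, total_timesteps, n_updates in ppo_rows:
    assert total_timesteps == iterations * STEPS_PER_ROLLOUT
    # the table is logged before the current rollout's epochs run, hence iterations - 1
    assert n_updates == (iterations - 1) * EPOCHS_PER_ROLLOUT
print("all PPO rows consistent")
```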

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 1.11e+06    |
|    total_cost           | 3.05e+05    |
|    total_reward         | 6.11e+05    |
|    total_reward_pct     | 122         |
|    total_trades         | 63888       |
| time/                   |             |
|    fps                  | 153         |
|    iterations           | 41          |
|    time_elapsed         | 546         |
|    total_timesteps      | 83968       |
| train/                  |             |
|    approx_kl            | 0.035693597 |
|    clip_fraction        | 0.354       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.9       |
|    explained_variance   | -0.0639     |
|    learning_rate        | 0.00025     |
|    loss                 | 0.616       |
|    n_updates            | 400         |
|    policy_gradient_loss | -0.0127     |
|    std                  | 1.08        |
|    value_loss           | 2.77        |
-----------------------------------------

-----------------------------------------
| time/                   |             |
|    fps                  | 154         |
|    iterations           | 49          |
|    time_elapsed         | 647         |
|    total_timesteps      | 100352      |
| train/                  |             |
|    approx_kl            | 0.040811047 |
|    clip_fraction        | 0.354       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.4       |
|    explained_variance   | 0.0454      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.756       |
|    n_updates            | 480         |
|    policy_gradient_loss | 0.000119    |
|    std                  | 1.1         |
|    value_loss           | 2.87        |
-----------------------------------------
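`entropy_loss` tracks `std` because the policy outputs a diagonal Gaussian over the action vector. The sketch below assumes 30 action dimensions, one per stock, matching the 30-element `mu` of the DDPG action noise printed in this log. It also assumes the same standard deviation σ in every dimension. Under those assumptions the entropy is `30 * 0.5 * ln(2 * pi * e * σ²)`, and `entropy_loss` is its negative. Because the logged `std` is an average over dimensions, the match is only approximate:

```python
import math

N_ACTIONS = 30  # assumed: one continuous action per stock


def gaussian_entropy(std: float, n_dims: int = N_ACTIONS) -> float:
    """Entropy (in nats) of a diagonal Gaussian with the same std in every dimension."""
    return n_dims * 0.5 * math.log(2 * math.pi * math.e * std ** 2)


# (logged std, logged entropy_loss) pairs taken from tables in this log
for std, logged_entropy_loss in [(1.01, -42.9), (1.1, -45.4), (1.23, -48.7)]:
    predicted = -gaussian_entropy(std)
    print(f"std={std}: predicted {predicted:.2f}, logged {logged_entropy_loss}")
```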
PPO Sharpe Ratio:  -0.10807044224389023
{'action_noise': OrnsteinUhlenbeckActionNoise(mu=[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.], sigma=[0.1 0.1 0.1 
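
The `action_noise` printed above is an Ornstein-Uhlenbeck process with `mu = 0` and `sigma = 0.1` in each of 30 action dimensions. Its `theta` and `dt` are not shown in that printout. The sketch below assumes stable-baselines3's defaults (`theta=0.15`, `dt=1e-2`) and reimplements the update rule in plain NumPy rather than calling the library; `OUNoise` is a hypothetical name:

```python
import numpy as np


class OUNoise:
    """Discretised Ornstein-Uhlenbeck process: temporally correlated noise pulled toward mu."""

    def __init__(self, mu, sigma, theta=0.15, dt=1e-2, seed=None):
        self.mu = np.asarray(mu, dtype=float)
        self.sigma = np.asarray(sigma, dtype=float)
        self.theta = theta  # assumed default, not shown in the printout
        self.dt = dt        # assumed default, not shown in the printout
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # start each episode from zero noise
        self.x = np.zeros_like(self.mu)

    def __call__(self):
        # x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.x = (
            self.x
            + self.theta * (self.mu - self.x) * self.dt
            + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.mu.shape)
        )
        return self.x


n_stocks = 30  # length of the mu/sigma vectors in the printout
noise = OUNoise(mu=np.zeros(n_stocks), sigma=0.1 * np.ones(n_stocks), seed=0)
samples = np.array([noise() for _ in range(1000)])
print(samples.shape, samples.std())
```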

----------------------------------
| environment/        |          |
|    portfolio_value  | 1.1e+06  |
|    total_cost       | 2.41e+03 |
|    total_reward     | 6.02e+05 |
|    total_reward_pct | 120      |
|    total_trades     | 20607    |
| time/               |          |
|    episodes         | 16       |
|    fps              | 101      |
|    time_elapsed     | 416      |
|    total timesteps  | 42288    |
| train/              |          |
|    actor_loss       | -16.9    |
|    critic_loss      | 3.27     |
|    learning_rate    | 5e-06    |
|    n_updates        | 39645    |
----------------------------------
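The DDPG counters are consistent with training on whole episodes of 2,643 steps each. This episode length is inferred from the numbers, not read from the environment. 16 episodes × 2,643 = 42,288 `total timesteps`, and 15 × 2,643 = 39,645 `n_updates`. That fits one gradient step per collected step, with the table written before the 16th episode's updates are applied:

```python
# Values copied from the DDPG table above
episodes, total_timesteps, n_updates = 16, 42288, 39645

# Inferred, not read from the environment: steps per training episode
episode_length, remainder = divmod(total_timesteps, episodes)
assert remainder == 0 and episode_length == 2643

# One gradient step per environment step, for every episode except the one just logged
assert n_updates == (episodes - 1) * episode_length
print(episode_length)
```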
12.081121283288907
turbulence_threshold:  458.4056541260132
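
This reading assumes the notebook follows FinRL's ensemble-strategy code. On that assumption, the bare number 12.08 printed above is the mean turbulence index over the most recent quarter (63 trading days), and 458.41 is the turbulence threshold chosen for the next trading window. The rule, as assumed here, compares the recent mean with the 90th percentile of in-sample turbulence. A calm recent quarter relaxes the threshold to the in-sample maximum; a turbulent one tightens it to that percentile. Here is a sketch on made-up data (`select_turbulence_threshold` is a hypothetical helper):

```python
import numpy as np


def select_turbulence_threshold(insample_turbulence, recent_turbulence, quantile=0.90):
    """Hypothetical helper mirroring the assumed threshold rule of the ensemble strategy."""
    insample_threshold = np.quantile(insample_turbulence, quantile)
    recent_mean = np.mean(recent_turbulence)
    if recent_mean > insample_threshold:
        # recent market looks volatile: cap turbulence at the in-sample percentile
        return insample_threshold
    # recent market looks calm: allow anything up to the in-sample maximum
    return np.quantile(insample_turbulence, 1.0)


rng = np.random.default_rng(0)
insample = rng.gamma(shape=2.0, scale=20.0, size=2500)  # made-up turbulence history
calm_quarter = rng.gamma(shape=2.0, scale=6.0, size=63)   # made-up recent 63 trading days
print(select_turbulence_threshold(insample, calm_quarter))
```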
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0005}
Using cuda device
Logging to tensorboard_log/a2c/a2c_252_5
------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 100      |
|    time_elapsed       | 3        |
|    tot

-------------------------------------
| time/                 |           |
|    fps                | 143       |
|    iterations         | 1400      |
|    time_elapsed       | 48        |
|    total_timesteps    | 7000      |
| train/                |           |
|    entropy_loss       | -43.2     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 1399      |
|    policy_loss        | -3.54     |
|    std                | 1.02      |
|    value_loss         | 0.0708    |
-------------------------------------
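The A2C counters follow from `n_steps = 5` in the printed model parameters. Each iteration collects a 5-step rollout, so `total_timesteps = 5 * iterations`, assuming a single environment. Each rollout then produces exactly one gradient step. The logged `n_updates` is one less than `iterations`, which is consistent with the table being written before the current rollout's update is applied. Here is a quick check on rows from these tables:

```python
# (iterations, total_timesteps, n_updates) copied from the A2C tables in this log
a2c_rows = [(1400, 7000, 1399), (2800, 14000, 2799), (4200, 21000, 4199), (19600, 98000, 19599)]

N_STEPS = 5  # from the printed A2C parameters; a single environment is assumed

for iterations, total_timesteps, n_updates in a2c_rows:
    assert total_timesteps == iterations * N_STEPS
    # one gradient step per rollout, logged before the current rollout is trained on
    assert n_updates == iterations - 1
print("all A2C rows consistent")
```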
------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 1500     |
|    time_elapsed       | 52       |
|    total_timesteps    | 7500     |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1499     |
|    policy_loss       

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 2800     |
|    time_elapsed       | 99       |
|    total_timesteps    | 14000    |
| train/                |          |
|    entropy_loss       | -43.8    |
|    explained_variance | -0.0172  |
|    learning_rate      | 0.0005   |
|    n_updates          | 2799     |
|    policy_loss        | 32.4     |
|    std                | 1.04     |
|    value_loss         | 0.763    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 2900     |
|    time_elapsed       | 103      |
|    total_timesteps    | 14500    |
| train/                |          |
|    entropy_loss       | -43.8    |
|    explained_variance | 0.395    |
|    learning_rate      | 0.0005   |
|    n_updates          | 2899     |
|    policy_loss        | 0.0279   |
|

-------------------------------------
| time/                 |           |
|    fps                | 140       |
|    iterations         | 4200      |
|    time_elapsed       | 149       |
|    total_timesteps    | 21000     |
| train/                |           |
|    entropy_loss       | -44.2     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 4199      |
|    policy_loss        | 77        |
|    std                | 1.06      |
|    value_loss         | 4.42      |
-------------------------------------
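`explained_variance` measures how much of the variance in the empirical returns the critic's value predictions account for: `1 - Var(returns - values) / Var(returns)`. A value of 1 is a perfect fit, 0 means the critic does no better than a constant prediction, and negative values mean it does worse. The many A2C rows showing `0` or `-1.19e-07` (a residue on the order of float32 machine epsilon) therefore suggest the critic explains essentially none of the return variance at those points. Here is a NumPy version of the formula:

```python
import numpy as np


def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """1 - Var(y_true - y_pred) / Var(y_true); NaN when y_true has zero variance."""
    var_y = np.var(y_true)
    if var_y == 0:
        return float("nan")
    return float(1.0 - np.var(y_true - y_pred) / var_y)


returns = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
print(explained_variance(returns, returns))                     # perfect critic
print(explained_variance(np.full(5, returns.mean()), returns))  # constant critic
```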
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.28e+06 |
|    total_cost         | 4.78e+04 |
|    total_reward       | 7.84e+05 |
|    total_reward_pct   | 157      |
|    total_trades       | 47596    |
| time/                 |          |
|    fps                | 140      |
|    iterations         | 4300     |
|    time_elapsed       | 153      |
|    total_timesteps   

------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 5600     |
|    time_elapsed       | 198      |
|    total_timesteps    | 28000    |
| train/                |          |
|    entropy_loss       | -44.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5599     |
|    policy_loss        | -19.9    |
|    std                | 1.07     |
|    value_loss         | 0.348    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 5700     |
|    time_elapsed       | 201      |
|    total_timesteps    | 28500    |
| train/                |          |
|    entropy_loss       | -44.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5699     |
|    policy_loss        | -21.7    |
|

-------------------------------------
| time/                 |           |
|    fps                | 142       |
|    iterations         | 7000      |
|    time_elapsed       | 246       |
|    total_timesteps    | 35000     |
| train/                |           |
|    entropy_loss       | -44.9     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 6999      |
|    policy_loss        | -100      |
|    std                | 1.08      |
|    value_loss         | 6.21      |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 142       |
|    iterations         | 7100      |
|    time_elapsed       | 249       |
|    total_timesteps    | 35500     |
| train/                |           |
|    entropy_loss       | -45       |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 7099      |
|    policy_

------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 8400     |
|    time_elapsed       | 294      |
|    total_timesteps    | 42000    |
| train/                |          |
|    entropy_loss       | -45.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 8399     |
|    policy_loss        | -55.6    |
|    std                | 1.1      |
|    value_loss         | 2.77     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.34e+06 |
|    total_cost         | 6.34e+04 |
|    total_reward       | 8.4e+05  |
|    total_reward_pct   | 168      |
|    total_trades       | 46213    |
| time/                 |          |
|    fps                | 142      |
|    iterations         | 8500     |
|    time_elapsed       | 298      |
|    total_timesteps    | 42500    |
|

------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 9800     |
|    time_elapsed       | 342      |
|    total_timesteps    | 49000    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 9799     |
|    policy_loss        | 83.8     |
|    std                | 1.11     |
|    value_loss         | 5.08     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 9900     |
|    time_elapsed       | 346      |
|    total_timesteps    | 49500    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | 0.123    |
|    learning_rate      | 0.0005   |
|    n_updates          | 9899     |
|    policy_loss        | -56.5    |
|

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.46e+06 |
|    total_cost         | 1.04e+04 |
|    total_reward       | 9.61e+05 |
|    total_reward_pct   | 192      |
|    total_trades       | 42849    |
| time/                 |          |
|    fps                | 141      |
|    iterations         | 11200    |
|    time_elapsed       | 394      |
|    total_timesteps    | 56000    |
| train/                |          |
|    entropy_loss       | -46.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 11199    |
|    policy_loss        | -66.2    |
|    std                | 1.13     |
|    value_loss         | 2.34     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 11300    |
|    time_elapsed       | 398      |
|    total_timesteps    | 56500    |
|

------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 12600    |
|    time_elapsed       | 446      |
|    total_timesteps    | 63000    |
| train/                |          |
|    entropy_loss       | -46.6    |
|    explained_variance | 0.00105  |
|    learning_rate      | 0.0005   |
|    n_updates          | 12599    |
|    policy_loss        | 37.1     |
|    std                | 1.15     |
|    value_loss         | 0.907    |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.07e+06 |
|    total_cost         | 3.82e+04 |
|    total_reward       | 5.71e+05 |
|    total_reward_pct   | 114      |
|    total_trades       | 43387    |
| time/                 |          |
|    fps                | 141      |
|    iterations         | 12700    |
|    time_elapsed       | 449      |
|    total_timesteps    | 63500    |
|

------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 14000    |
|    time_elapsed       | 494      |
|    total_timesteps    | 70000    |
| train/                |          |
|    entropy_loss       | -47      |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 13999    |
|    policy_loss        | -26.5    |
|    std                | 1.16     |
|    value_loss         | 0.377    |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 141       |
|    iterations         | 14100     |
|    time_elapsed       | 497       |
|    total_timesteps    | 70500     |
| train/                |           |
|    entropy_loss       | -47       |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 14099     |
|    policy_loss        | -

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.28e+06 |
|    total_cost         | 2.05e+04 |
|    total_reward       | 7.83e+05 |
|    total_reward_pct   | 157      |
|    total_trades       | 40846    |
| time/                 |          |
|    fps                | 142      |
|    iterations         | 15400    |
|    time_elapsed       | 541      |
|    total_timesteps    | 77000    |
| train/                |          |
|    entropy_loss       | -47.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 15399    |
|    policy_loss        | 4.84     |
|    std                | 1.18     |
|    value_loss         | 0.168    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 15500    |
|    time_elapsed       | 545      |
|    total_timesteps    | 77500    |
|

------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 16800    |
|    time_elapsed       | 589      |
|    total_timesteps    | 84000    |
| train/                |          |
|    entropy_loss       | -48      |
|    explained_variance | 0.211    |
|    learning_rate      | 0.0005   |
|    n_updates          | 16799    |
|    policy_loss        | -63.1    |
|    std                | 1.2      |
|    value_loss         | 1.68     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 16900    |
|    time_elapsed       | 592      |
|    total_timesteps    | 84500    |
| train/                |          |
|    entropy_loss       | -48      |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 16899    |
|    policy_loss        | -24.1    |
|

-------------------------------------
| time/                 |           |
|    fps                | 142       |
|    iterations         | 18200     |
|    time_elapsed       | 637       |
|    total_timesteps    | 91000     |
| train/                |           |
|    entropy_loss       | -48.3     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 18199     |
|    policy_loss        | -0.429    |
|    std                | 1.21      |
|    value_loss         | 0.396     |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 18300    |
|    time_elapsed       | 640      |
|    total_timesteps    | 91500    |
| train/                |          |
|    entropy_loss       | -48.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 18299    |
|    policy_loss       

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.32e+06 |
|    total_cost         | 2.98e+04 |
|    total_reward       | 8.24e+05 |
|    total_reward_pct   | 165      |
|    total_trades       | 43872    |
| time/                 |          |
|    fps                | 143      |
|    iterations         | 19600    |
|    time_elapsed       | 682      |
|    total_timesteps    | 98000    |
| train/                |          |
|    entropy_loss       | -48.7    |
|    explained_variance | 0.0434   |
|    learning_rate      | 0.0005   |
|    n_updates          | 19599    |
|    policy_loss        | 32       |
|    std                | 1.23     |
|    value_loss         | 1.04     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 19700    |
|    time_elapsed       | 685      |
|    total_timesteps    | 98500    |
|

----------------------------------------
| environment/            |            |
|    portfolio_value      | 5.42e+05   |
|    total_cost           | 3.98e+05   |
|    total_reward         | 4.24e+04   |
|    total_reward_pct     | 8.47       |
|    total_trades         | 71088      |
| time/                   |            |
|    fps                  | 154        |
|    iterations           | 7          |
|    time_elapsed         | 92         |
|    total_timesteps      | 14336      |
| train/                  |            |
|    approx_kl            | 0.02972842 |
|    clip_fraction        | 0.279      |
|    clip_range           | 0.2        |
|    entropy_loss         | -42.9      |
|    explained_variance   | 0.0224     |
|    learning_rate        | 0.00025    |
|    loss                 | 0.042      |
|    n_updates            | 60         |
|    policy_gradient_loss | -0.0431    |
|    std                  | 1.01       |
|    value_loss           | 1.34       |
----------------------------------------

day: 2642, episode: 50
begin_total_asset: 500000.00
end_total_asset: 623907.35
total_reward: 123907.35
total_cost: 378500.26
total_trades: 69631
Sharpe: 0.205
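
In the episode summary above, `total_reward` is `end_total_asset - begin_total_asset` (623,907.35 - 500,000.00 = 123,907.35), and the Sharpe ratio is computed from the daily returns of the account value. The sketch below assumes the common annualised form with 252 trading days and a zero risk-free rate, `sqrt(252) * mean(r) / std(r)`, and applies it to a made-up value series (`annualised_sharpe` is a hypothetical helper):

```python
import numpy as np
import pandas as pd


def annualised_sharpe(account_values, periods_per_year: int = 252) -> float:
    """Sharpe ratio of daily simple returns, annualised, with zero risk-free rate (assumption)."""
    returns = pd.Series(account_values, dtype=float).pct_change().dropna()
    std = returns.std()  # pandas default: sample standard deviation (ddof=1)
    if std == 0:
        return 0.0  # assumption: treat a flat account as Sharpe 0
    return float(np.sqrt(periods_per_year) * returns.mean() / std)


values = [500_000, 502_000, 499_500, 505_000, 507_250]  # made-up account values
print(round(annualised_sharpe(values), 3))
```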
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 6.24e+05    |
|    total_cost           | 3.79e+05    |
|    total_reward         | 1.24e+05    |
|    total_reward_pct     | 24.8        |
|    total_trades         | 69631       |
| time/                   |             |
|    fps                  | 153         |
|    iterations           | 16          |
|    time_elapsed         | 213         |
|    total_timesteps      | 32768       |
| train/                  |             |
|    approx_kl            | 0.034854576 |
|    clip_fraction        | 0.29        |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.4       |
|    explained_variance   | -0.119      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.576       |
|

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 1.02e+06    |
|    total_cost           | 3.49e+05    |
|    total_reward         | 5.24e+05    |
|    total_reward_pct     | 105         |
|    total_trades         | 67957       |
| time/                   |             |
|    fps                  | 153         |
|    iterations           | 24          |
|    time_elapsed         | 320         |
|    total_timesteps      | 49152       |
| train/                  |             |
|    approx_kl            | 0.024662403 |
|    clip_fraction        | 0.246       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44         |
|    explained_variance   | -0.0195     |
|    learning_rate        | 0.00025     |
|    loss                 | 0.579       |
|    n_updates            | 230         |
|    policy_gradient_loss | -0.0218     |
|    std                  | 1.05        |
|    value_loss           | 2.22        |
-----------------------------------------

-----------------------------------------
| time/                   |             |
|    fps                  | 153         |
|    iterations           | 32          |
|    time_elapsed         | 426         |
|    total_timesteps      | 65536       |
| train/                  |             |
|    approx_kl            | 0.019045807 |
|    clip_fraction        | 0.25        |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.5       |
|    explained_variance   | 0.055       |
|    learning_rate        | 0.00025     |
|    loss                 | 0.775       |
|    n_updates            | 310         |
|    policy_gradient_loss | -0.0181     |
|    std                  | 1.07        |
|    value_loss           | 3.1         |
-----------------------------------------
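In the PPO tables, `clip_fraction` is the share of samples whose probability ratio between the new and old policy fell outside `[1 - clip_range, 1 + clip_range]`. `approx_kl` is a sample estimate of the KL divergence between the old and new policy. The sketch below assumes stable-baselines3's estimator, which averages `(r - 1) - log r` over the batch, and uses made-up log-probabilities (`ppo_diagnostics` is a hypothetical helper). With `clip_range = 0.2`, the logged `clip_fraction` values near 0.25 to 0.37 mean roughly a quarter to a third of samples were clipped:

```python
import numpy as np


def ppo_diagnostics(new_log_prob, old_log_prob, clip_range=0.2):
    """Return (approx_kl, clip_fraction) for one batch of action log-probabilities."""
    log_ratio = np.asarray(new_log_prob) - np.asarray(old_log_prob)
    ratio = np.exp(log_ratio)
    # (r - 1) - log r is non-negative for every sample, so the estimate cannot go below 0
    approx_kl = float(np.mean((ratio - 1.0) - log_ratio))
    clip_fraction = float(np.mean(np.abs(ratio - 1.0) > clip_range))
    return approx_kl, clip_fraction


rng = np.random.default_rng(0)
old = rng.normal(-40.0, 2.0, size=128)       # made-up old log-probabilities
new = old + rng.normal(0.0, 0.25, size=128)  # made-up policy shift after an update
print(ppo_diagnostics(new, old))
```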
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 7.48e+05    |
|    total_cost           | 3.84e+05    |
|    total_reward         | 2.48e+

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 1.07e+06    |
|    total_cost           | 3.16e+05    |
|    total_reward         | 5.67e+05    |
|    total_reward_pct     | 113         |
|    total_trades         | 66948       |
| time/                   |             |
|    fps                  | 152         |
|    iterations           | 41          |
|    time_elapsed         | 549         |
|    total_timesteps      | 83968       |
| train/                  |             |
|    approx_kl            | 0.029193316 |
|    clip_fraction        | 0.268       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45         |
|    explained_variance   | 0.129       |
|    learning_rate        | 0.00025     |
|    loss                 | 0.956       |
|    n_updates            | 400         |
|    policy_gradient_loss | -0.0221     |
|    std                  | 1.09        |
|    value_loss           | 3.37        |
-----------------------------------------

----------------------------------------
| time/                   |            |
|    fps                  | 153        |
|    iterations           | 49         |
|    time_elapsed         | 655        |
|    total_timesteps      | 100352     |
| train/                  |            |
|    approx_kl            | 0.05424186 |
|    clip_fraction        | 0.371      |
|    clip_range           | 0.2        |
|    entropy_loss         | -45.6      |
|    explained_variance   | 0.00248    |
|    learning_rate        | 0.00025    |
|    loss                 | 0.482      |
|    n_updates            | 480        |
|    policy_gradient_loss | -0.019     |
|    std                  | 1.11       |
|    value_loss           | 2.08       |
----------------------------------------
PPO Sharpe Ratio:  -0.2645732183357934
{'action_noise': OrnsteinUhlenbeckActionNoise(mu=[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.], sigma=[0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 

------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 700      |
|    time_elapsed       | 23       |
|    total_timesteps    | 3500     |
| train/                |          |
|    entropy_loss       | -42.8    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 699      |
|    policy_loss        | -145     |
|    std                | 1.01     |
|    value_loss         | 12.9     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 800      |
|    time_elapsed       | 27       |
|    total_timesteps    | 4000     |
| train/                |          |
|    entropy_loss       | -42.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 799      |
|    policy_loss        | -18      |
|

------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 2100     |
|    time_elapsed       | 70       |
|    total_timesteps    | 10500    |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 2099     |
|    policy_loss        | -62.1    |
|    std                | 1.02     |
|    value_loss         | 2.17     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.58e+06 |
|    total_cost         | 1.02e+05 |
|    total_reward       | 1.08e+06 |
|    total_reward_pct   | 215      |
|    total_trades       | 50411    |
| time/                 |          |
|    fps                | 148      |
|    iterations         | 2200     |
|    time_elapsed       | 73       |
|    total_timesteps    | 11000    |
|

-------------------------------------
| time/                 |           |
|    fps                | 149       |
|    iterations         | 3500      |
|    time_elapsed       | 117       |
|    total_timesteps    | 17500     |
| train/                |           |
|    entropy_loss       | -43.6     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 3499      |
|    policy_loss        | -7.39     |
|    std                | 1.04      |
|    value_loss         | 0.0428    |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 3600     |
|    time_elapsed       | 120      |
|    total_timesteps    | 18000    |
| train/                |          |
|    entropy_loss       | -43.6    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 3599     |
|    policy_loss       

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.47e+06 |
|    total_cost         | 1.49e+04 |
|    total_reward       | 9.68e+05 |
|    total_reward_pct   | 194      |
|    total_trades       | 30399    |
| time/                 |          |
|    fps                | 146      |
|    iterations         | 4900     |
|    time_elapsed       | 166      |
|    total_timesteps    | 24500    |
| train/                |          |
|    entropy_loss       | -44.2    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 4899     |
|    policy_loss        | 5.52     |
|    std                | 1.06     |
|    value_loss         | 0.0488   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 5000     |
|    time_elapsed       | 170      |
|    total_timesteps    | 25000    |
|

------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 6300     |
|    time_elapsed       | 215      |
|    total_timesteps    | 31500    |
| train/                |          |
|    entropy_loss       | -44.5    |
|    explained_variance | 0.00494  |
|    learning_rate      | 0.0005   |
|    n_updates          | 6299     |
|    policy_loss        | -60.9    |
|    std                | 1.07     |
|    value_loss         | 2.22     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 6400     |
|    time_elapsed       | 218      |
|    total_timesteps    | 32000    |
| train/                |          |
|    entropy_loss       | -44.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 6399     |
|    policy_loss        | -16.4    |
|

------------------------------------
| time/                 |          |
|    fps                | 145      |
|    iterations         | 7700     |
|    time_elapsed       | 263      |
|    total_timesteps    | 38500    |
| train/                |          |
|    entropy_loss       | -44.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 7699     |
|    policy_loss        | -60.7    |
|    std                | 1.08     |
|    value_loss         | 2.21     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 145      |
|    iterations         | 7800     |
|    time_elapsed       | 267      |
|    total_timesteps    | 39000    |
| train/                |          |
|    entropy_loss       | -45      |
|    explained_variance | -0.107   |
|    learning_rate      | 0.0005   |
|    n_updates          | 7799     |
|    policy_loss        | 57.3     |
|

------------------------------------
| time/                 |          |
|    fps                | 145      |
|    iterations         | 9100     |
|    time_elapsed       | 313      |
|    total_timesteps    | 45500    |
| train/                |          |
|    entropy_loss       | -45.1    |
|    explained_variance | -0.204   |
|    learning_rate      | 0.0005   |
|    n_updates          | 9099     |
|    policy_loss        | -39.3    |
|    std                | 1.09     |
|    value_loss         | 1.44     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 145      |
|    iterations         | 9200     |
|    time_elapsed       | 316      |
|    total_timesteps    | 46000    |
| train/                |          |
|    entropy_loss       | -45.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 9199     |
|    policy_loss        | 28.7     |
|

------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 10500    |
|    time_elapsed       | 362      |
|    total_timesteps    | 52500    |
| train/                |          |
|    entropy_loss       | -45.7    |
|    explained_variance | 0.000979 |
|    learning_rate      | 0.0005   |
|    n_updates          | 10499    |
|    policy_loss        | -17.8    |
|    std                | 1.11     |
|    value_loss         | 0.164    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 10600    |
|    time_elapsed       | 365      |
|    total_timesteps    | 53000    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | -0.113   |
|    learning_rate      | 0.0005   |
|    n_updates          | 10599    |
|    policy_loss        | -5.69    |
|

------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 11900    |
|    time_elapsed       | 410      |
|    total_timesteps    | 59500    |
| train/                |          |
|    entropy_loss       | -46.1    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 11899    |
|    policy_loss        | -30.6    |
|    std                | 1.12     |
|    value_loss         | 1.25     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.36e+06 |
|    total_cost         | 6.55e+04 |
|    total_reward       | 8.6e+05  |
|    total_reward_pct   | 172      |
|    total_trades       | 47204    |
| time/                 |          |
|    fps                | 144      |
|    iterations         | 12000    |
|    time_elapsed       | 414      |
|    total_timesteps    | 60000    |

------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 13300    |
|    time_elapsed       | 460      |
|    total_timesteps    | 66500    |
| train/                |          |
|    entropy_loss       | -46.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 13299    |
|    policy_loss        | 3.2      |
|    std                | 1.15     |
|    value_loss         | 0.0255   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 13400    |
|    time_elapsed       | 464      |
|    total_timesteps    | 67000    |
| train/                |          |
|    entropy_loss       | -46.9    |
|    explained_variance | -0.14    |
|    learning_rate      | 0.0005   |
|    n_updates          | 13399    |
|    policy_loss        | 17.8     |

------------------------------------
| environment/          |          |
|    portfolio_value    | 9.85e+05 |
|    total_cost         | 7.19e+03 |
|    total_reward       | 4.85e+05 |
|    total_reward_pct   | 97       |
|    total_trades       | 38699    |
| time/                 |          |
|    fps                | 144      |
|    iterations         | 14700    |
|    time_elapsed       | 509      |
|    total_timesteps    | 73500    |
| train/                |          |
|    entropy_loss       | -47.4    |
|    explained_variance | 0.125    |
|    learning_rate      | 0.0005   |
|    n_updates          | 14699    |
|    policy_loss        | 6.21     |
|    std                | 1.18     |
|    value_loss         | 0.454    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 144      |
|    iterations         | 14800    |
|    time_elapsed       | 512      |
|    total_timesteps    | 74000    |

------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 16100    |
|    time_elapsed       | 559      |
|    total_timesteps    | 80500    |
| train/                |          |
|    entropy_loss       | -48.1    |
|    explained_variance | -0.00377 |
|    learning_rate      | 0.0005   |
|    n_updates          | 16099    |
|    policy_loss        | 74.5     |
|    std                | 1.2      |
|    value_loss         | 3.21     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 16200    |
|    time_elapsed       | 563      |
|    total_timesteps    | 81000    |
| train/                |          |
|    entropy_loss       | -48.1    |
|    explained_variance | -0.144   |
|    learning_rate      | 0.0005   |
|    n_updates          | 16199    |
|    policy_loss        | -87.4    |

------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 17500    |
|    time_elapsed       | 608      |
|    total_timesteps    | 87500    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 17499    |
|    policy_loss        | -31.8    |
|    std                | 1.22     |
|    value_loss         | 0.483    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 17600    |
|    time_elapsed       | 612      |
|    total_timesteps    | 88000    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 17599    |
|    policy_loss        | -115     |

------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 18900    |
|    time_elapsed       | 657      |
|    total_timesteps    | 94500    |
| train/                |          |
|    entropy_loss       | -49      |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 18899    |
|    policy_loss        | 42.8     |
|    std                | 1.24     |
|    value_loss         | 1.24     |
------------------------------------
day: 2705, episode: 35
begin_total_asset: 500000.00
end_total_asset: 1242745.58
total_reward: 742745.58
total_cost: 7946.11
total_trades: 41978
Sharpe: 0.552
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.24e+06 |
|    total_cost         | 7.95e+03 |
|    total_reward       | 7.43e+05 |
|    total_reward_pct   | 149      |
|    total_trades       | 41978    |
| time/                 |   

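Each episode summary printed by the environment (such as the `day: 2705, episode: 35` block above) is internally consistent: `total_reward` is `end_total_asset - begin_total_asset`, and the `total_reward_pct` in the accompanying `environment/` table is that reward as a percentage of the starting capital (742,745.58 / 500,000 ≈ 148.5%, logged as 149). The sketch below recomputes both. The Sharpe helper assumes the annualized form √252 × mean / standard deviation of daily account-value returns with a zero risk-free rate; treat that as an assumption about how the environment computes it. Both function names are hypothetical, not FinRL APIs.

```python
import math
from statistics import mean, stdev


def episode_summary(begin_total_asset: float, end_total_asset: float) -> dict:
    """Hypothetical helper: recompute total_reward and total_reward_pct from a summary."""
    total_reward = end_total_asset - begin_total_asset
    total_reward_pct = 100.0 * total_reward / begin_total_asset
    return {"total_reward": total_reward, "total_reward_pct": total_reward_pct}


def annualized_sharpe(account_values: list[float], periods_per_year: int = 252) -> float:
    """Assumed Sharpe form: sqrt(252) * mean(daily return) / std(daily return).

    Needs at least three account values (two returns) so the sample
    standard deviation is defined. Risk-free rate is taken as zero.
    """
    # Simple daily returns of the account value.
    returns = [curr / prev - 1.0 for prev, curr in zip(account_values, account_values[1:])]
    sd = stdev(returns)  # sample standard deviation (ddof=1), like pandas .std()
    if sd == 0:
        return 0.0  # flat equity curve: no volatility, report 0 instead of dividing by zero
    return math.sqrt(periods_per_year) * mean(returns) / sd


summary = episode_summary(500000.00, 1242745.58)
print(round(summary["total_reward"], 2))  # 742745.58
print(round(summary["total_reward_pct"]))  # 149
```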
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 200      |
|    time_elapsed       | 7        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -42.5    |
|    explained_variance | 0.0186   |
|    learning_rate      | 0.0005   |
|    n_updates          | 199      |
|    policy_loss        | -3.98    |
|    std                | 0.998    |
|    value_loss         | 0.1      |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 300      |
|    time_elapsed       | 10       |
|    total_timesteps    | 1500     |
| train/                |          |
|    entropy_loss       | -42.6    |
|    explained_variance | 0.695    |
|    learning_rate      | 0.0005   |
|    n_updates          | 299      |
|    policy_loss        | -40.7    |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 1600     |
|    time_elapsed       | 57       |
|    total_timesteps    | 8000     |
| train/                |          |
|    entropy_loss       | -43.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1599     |
|    policy_loss        | -17.9    |
|    std                | 1.02     |
|    value_loss         | 0.464    |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 7.17e+05 |
|    total_cost         | 2.05e+05 |
|    total_reward       | 2.17e+05 |
|    total_reward_pct   | 43.4     |
|    total_trades       | 59812    |
| time/                 |          |
|    fps                | 140      |
|    iterations         | 1700     |
|    time_elapsed       | 60       |
|    total_timesteps    | 8500     |

------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 3000     |
|    time_elapsed       | 108      |
|    total_timesteps    | 15000    |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 2999     |
|    policy_loss        | 3.57     |
|    std                | 1.03     |
|    value_loss         | 0.325    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 3100     |
|    time_elapsed       | 111      |
|    total_timesteps    | 15500    |
| train/                |          |
|    entropy_loss       | -43.4    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 3099     |
|    policy_loss        | 30.9     |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.47e+06 |
|    total_cost         | 1.77e+04 |
|    total_reward       | 9.73e+05 |
|    total_reward_pct   | 195      |
|    total_trades       | 47675    |
| time/                 |          |
|    fps                | 140      |
|    iterations         | 4400     |
|    time_elapsed       | 156      |
|    total_timesteps    | 22000    |
| train/                |          |
|    entropy_loss       | -43.7    |
|    explained_variance | -0.376   |
|    learning_rate      | 0.0005   |
|    n_updates          | 4399     |
|    policy_loss        | -7.23    |
|    std                | 1.04     |
|    value_loss         | 0.0656   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 4500     |
|    time_elapsed       | 160      |
|    total_timesteps    | 22500    |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 5800     |
|    time_elapsed       | 206      |
|    total_timesteps    | 29000    |
| train/                |          |
|    entropy_loss       | -44.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5799     |
|    policy_loss        | 22.4     |
|    std                | 1.06     |
|    value_loss         | 0.351    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 5900     |
|    time_elapsed       | 210      |
|    total_timesteps    | 29500    |
| train/                |          |
|    entropy_loss       | -44.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5899     |
|    policy_loss        | -73.4    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 7200     |
|    time_elapsed       | 258      |
|    total_timesteps    | 36000    |
| train/                |          |
|    entropy_loss       | -44.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 7199     |
|    policy_loss        | 38.1     |
|    std                | 1.08     |
|    value_loss         | 1.25     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 7300     |
|    time_elapsed       | 262      |
|    total_timesteps    | 36500    |
| train/                |          |
|    entropy_loss       | -44.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 7299     |
|    policy_loss        | 22.1     |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 8600     |
|    time_elapsed       | 309      |
|    total_timesteps    | 43000    |
| train/                |          |
|    entropy_loss       | -45.3    |
|    explained_variance | -0.175   |
|    learning_rate      | 0.0005   |
|    n_updates          | 8599     |
|    policy_loss        | 158      |
|    std                | 1.1      |
|    value_loss         | 11       |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.32e+06 |
|    total_cost         | 8.59e+04 |
|    total_reward       | 8.18e+05 |
|    total_reward_pct   | 164      |
|    total_trades       | 48079    |
| time/                 |          |
|    fps                | 138      |
|    iterations         | 8700     |
|    time_elapsed       | 312      |
|    total_timesteps    | 43500    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 10000    |
|    time_elapsed       | 359      |
|    total_timesteps    | 50000    |
| train/                |          |
|    entropy_loss       | -45.7    |
|    explained_variance | -0.0567  |
|    learning_rate      | 0.0005   |
|    n_updates          | 9999     |
|    policy_loss        | -16.5    |
|    std                | 1.11     |
|    value_loss         | 0.161    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 10100    |
|    time_elapsed       | 363      |
|    total_timesteps    | 50500    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | -0.203   |
|    learning_rate      | 0.0005   |
|    n_updates          | 10099    |
|    policy_loss        | -9.31    |

------------------------------------
| environment/          |          |
|    portfolio_value    | 9.16e+05 |
|    total_cost         | 5.72e+04 |
|    total_reward       | 4.16e+05 |
|    total_reward_pct   | 83.2     |
|    total_trades       | 50225    |
| time/                 |          |
|    fps                | 138      |
|    iterations         | 11400    |
|    time_elapsed       | 411      |
|    total_timesteps    | 57000    |
| train/                |          |
|    entropy_loss       | -46.3    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 11399    |
|    policy_loss        | 0.364    |
|    std                | 1.13     |
|    value_loss         | 0.0666   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 11500    |
|    time_elapsed       | 414      |
|    total_timesteps    | 57500    |

------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 12800    |
|    time_elapsed       | 461      |
|    total_timesteps    | 64000    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 12799    |
|    policy_loss        | -34.3    |
|    std                | 1.14     |
|    value_loss         | 0.898    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 12900    |
|    time_elapsed       | 464      |
|    total_timesteps    | 64500    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 12899    |
|    policy_loss        | 167      |

------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 14200    |
|    time_elapsed       | 512      |
|    total_timesteps    | 71000    |
| train/                |          |
|    entropy_loss       | -46.9    |
|    explained_variance | -0.00506 |
|    learning_rate      | 0.0005   |
|    n_updates          | 14199    |
|    policy_loss        | -34.6    |
|    std                | 1.16     |
|    value_loss         | 2.53     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 14300    |
|    time_elapsed       | 516      |
|    total_timesteps    | 71500    |
| train/                |          |
|    entropy_loss       | -47      |
|    explained_variance | 0.204    |
|    learning_rate      | 0.0005   |
|    n_updates          | 14299    |
|    policy_loss        | 10.1     |

------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 15600    |
|    time_elapsed       | 565      |
|    total_timesteps    | 78000    |
| train/                |          |
|    entropy_loss       | -47.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 15599    |
|    policy_loss        | 36       |
|    std                | 1.18     |
|    value_loss         | 0.952    |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 8.77e+05 |
|    total_cost         | 1.57e+04 |
|    total_reward       | 3.77e+05 |
|    total_reward_pct   | 75.3     |
|    total_trades       | 45245    |
| time/                 |          |
|    fps                | 138      |
|    iterations         | 15700    |
|    time_elapsed       | 568      |
|    total_timesteps    | 78500    |

------------------------------------
| time/                 |          |
|    fps                | 137      |
|    iterations         | 17000    |
|    time_elapsed       | 616      |
|    total_timesteps    | 85000    |
| train/                |          |
|    entropy_loss       | -48.1    |
|    explained_variance | 0.0169   |
|    learning_rate      | 0.0005   |
|    n_updates          | 16999    |
|    policy_loss        | -33.2    |
|    std                | 1.2      |
|    value_loss         | 0.824    |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 137       |
|    iterations         | 17100     |
|    time_elapsed       | 620       |
|    total_timesteps    | 85500     |
| train/                |           |
|    entropy_loss       | -48.1     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 17099     |
|    policy_loss        | -

------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 18400    |
|    time_elapsed       | 665      |
|    total_timesteps    | 92000    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 18399    |
|    policy_loss        | -79.8    |
|    std                | 1.23     |
|    value_loss         | 2.94     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 7.99e+05 |
|    total_cost         | 7.35e+03 |
|    total_reward       | 2.99e+05 |
|    total_reward_pct   | 59.7     |
|    total_trades       | 46441    |
| time/                 |          |
|    fps                | 138      |
|    iterations         | 18500    |
|    time_elapsed       | 668      |
|    total_timesteps    | 92500    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 19800    |
|    time_elapsed       | 711      |
|    total_timesteps    | 99000    |
| train/                |          |
|    entropy_loss       | -49.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 19799    |
|    policy_loss        | -64.2    |
|    std                | 1.25     |
|    value_loss         | 2.25     |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 139       |
|    iterations         | 19900     |
|    time_elapsed       | 715       |
|    total_timesteps    | 99500     |
| train/                |           |
|    entropy_loss       | -49.3     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 19899     |
|    policy_loss        | -

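Many `train/explained_variance` entries in the A2C logs above are 0, vanishingly small (1.19e-07), or negative. The metric is 1 - Var(y - ŷ)/Var(y), where y is the empirical return and ŷ the critic's value estimate: 1 means the critic predicts returns perfectly, 0 is what a prediction that is constant across the batch produces, and negative values mean the critic does worse than that constant. The standalone sketch below follows this definition with the standard library; Stable-Baselines3 provides an equivalent `explained_variance(y_pred, y_true)` utility, but this version does not depend on it.

```python
from statistics import pvariance


def explained_variance(y_pred: list[float], y_true: list[float]) -> float:
    """1 - Var(y_true - y_pred) / Var(y_true); NaN when y_true has zero variance."""
    var_y = pvariance(y_true)  # population variance, like numpy.var
    if var_y == 0:
        return float("nan")  # undefined when the targets do not vary
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / var_y


returns = [1.0, 2.0, 3.0, 4.0]
print(explained_variance(returns, returns))               # 1.0  (perfect critic)
print(explained_variance([2.5, 2.5, 2.5, 2.5], returns))  # 0.0  (constant prediction)
print(explained_variance([4.0, 3.0, 2.0, 1.0], returns))  # -3.0 (worse than a constant)
```

Because a constant offset does not change the residual variance, any batch-constant prediction scores exactly 0, not only the mean of the returns.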
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 4.22e+05    |
|    total_cost           | 4.14e+05    |
|    total_reward         | -7.79e+04   |
|    total_reward_pct     | -15.6       |
|    total_trades         | 72162       |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 8           |
|    time_elapsed         | 105         |
|    total_timesteps      | 16384       |
| train/                  |             |
|    approx_kl            | 0.033573978 |
|    clip_fraction        | 0.265       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.9       |
|    explained_variance   | 0.0453      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.201       |
|    n_updates            | 70          |
|    policy_gradient_loss | -0.0394     |
|    std                  | 1.01        |
|    value_loss           | 1.35  

-----------------------------------------
| time/                   |             |
|    fps                  | 154         |
|    iterations           | 17          |
|    time_elapsed         | 225         |
|    total_timesteps      | 34816       |
| train/                  |             |
|    approx_kl            | 0.024058575 |
|    clip_fraction        | 0.32        |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.5       |
|    explained_variance   | 0.162       |
|    learning_rate        | 0.00025     |
|    loss                 | -0.123      |
|    n_updates            | 160         |
|    policy_gradient_loss | -0.0333     |
|    std                  | 1.03        |
|    value_loss           | 1.1         |
-----------------------------------------
day: 2705, episode: 50
begin_total_asset: 500000.00
end_total_asset: 767815.41
total_reward: 267815.41
total_cost: 385369.01
total_trades: 70129
Sharpe: 0.285
-----------------------------------------

----------------------------------------
| time/                   |            |
|    fps                  | 153        |
|    iterations           | 25         |
|    time_elapsed         | 333        |
|    total_timesteps      | 51200      |
| train/                  |            |
|    approx_kl            | 0.04333575 |
|    clip_fraction        | 0.318      |
|    clip_range           | 0.2        |
|    entropy_loss         | -43.8      |
|    explained_variance   | 0.239      |
|    learning_rate        | 0.00025    |
|    loss                 | 0.246      |
|    n_updates            | 240        |
|    policy_gradient_loss | -0.0272    |
|    std                  | 1.04       |
|    value_loss           | 1.91       |
----------------------------------------
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 7.6e+05     |
|    total_cost           | 3.58e+05    |
|    total_reward         | 2.6e+05     |
|    total_

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.34e+05    |
|    total_cost           | 3.64e+05    |
|    total_reward         | 3.41e+04    |
|    total_reward_pct     | 6.83        |
|    total_trades         | 69458       |
| time/                   |             |
|    fps                  | 153         |
|    iterations           | 34          |
|    time_elapsed         | 454         |
|    total_timesteps      | 69632       |
| train/                  |             |
|    approx_kl            | 0.051322907 |
|    clip_fraction        | 0.428       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.5       |
|    explained_variance   | 0.231       |
|    learning_rate        | 0.00025     |
|    loss                 | 0.129       |
|    n_updates            | 330         |
|    policy_gradient_loss | -0.0322     |
|    std                  | 1.07        |
|    value_loss           | 1.43  

-----------------------------------------
| time/                   |             |
|    fps                  | 152         |
|    iterations           | 42          |
|    time_elapsed         | 562         |
|    total_timesteps      | 86016       |
| train/                  |             |
|    approx_kl            | 0.056249224 |
|    clip_fraction        | 0.446       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.1       |
|    explained_variance   | 0.185       |
|    learning_rate        | 0.00025     |
|    loss                 | -0.119      |
|    n_updates            | 410         |
|    policy_gradient_loss | -0.015      |
|    std                  | 1.09        |
|    value_loss           | 1.03        |
-----------------------------------------
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 9e+05       |
|    total_cost           | 3.54e+05    |
|    total_reward         | 4e+05 

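The `time/` and `train/` counters in the A2C and PPO tables above follow fixed ratios that can be checked directly. For A2C, `total_timesteps` is 5 × `iterations` (9100 → 45500) and `n_updates` is `iterations - 1` (9100 → 9099). For PPO, `total_timesteps` is 2048 × `iterations` (8 → 16384, 17 → 34816) and `n_updates` is 10 × (`iterations - 1`) (8 → 70, 17 → 160). This is consistent with a single environment, a rollout length (`n_steps`) of 5 for A2C and 2048 for PPO, and 10 PPO epochs per rollout. Those hyperparameter values are inferred from the logged numbers, not read from the training code. The one-iteration lag in `n_updates` likely arises because the logs are written before the current iteration's update runs. The hypothetical helper below encodes these inferred ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutConfig:
    """Hypothetical container for hyperparameters inferred from the logs."""
    n_steps: int       # environment steps collected per iteration (per environment)
    n_envs: int = 1    # number of parallel environments
    n_epochs: int = 1  # gradient passes per rollout (A2C takes a single step)


def expected_counters(cfg: RolloutConfig, iterations: int) -> tuple[int, int]:
    """Return (total_timesteps, n_updates) as they appear in a log at `iterations`."""
    total_timesteps = iterations * cfg.n_steps * cfg.n_envs
    # train/ values reflect the updates finished before this iteration's own update.
    n_updates = (iterations - 1) * cfg.n_epochs
    return total_timesteps, n_updates


a2c = RolloutConfig(n_steps=5)
ppo = RolloutConfig(n_steps=2048, n_epochs=10)

print(expected_counters(a2c, 9100))  # (45500, 9099)
print(expected_counters(ppo, 17))    # (34816, 160)
```

Comparing these predictions with the logged counters is a quick way to confirm which rollout settings a given run actually used.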
----------------------------------
| environment/        |          |
|    portfolio_value  | 1.01e+06 |
|    total_cost       | 1.45e+03 |
|    total_reward     | 5.11e+05 |
|    total_reward_pct | 102      |
|    total_trades     | 45927    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 109      |
|    time_elapsed     | 98       |
|    total timesteps  | 10824    |
| train/              |          |
|    actor_loss       | 28.7     |
|    critic_loss      | 67.3     |
|    learning_rate    | 5e-06    |
|    n_updates        | 8118     |
----------------------------------
day: 2705, episode: 80
begin_total_asset: 500000.00
end_total_asset: 951846.10
total_reward: 451846.10
total_cost: 1044.66
total_trades: 38781
Sharpe: 0.376
----------------------------------
| environment/        |          |
|    portfolio_value  | 9.52e+05 |
|    total_cost       | 1.04e+03 |
|    total_reward     | 4.52e+05 |
|    total_reward_pct | 90.4     |

------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 400      |
|    time_elapsed       | 13       |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -42.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 399      |
|    policy_loss        | -100     |
|    std                | 1.01     |
|    value_loss         | 6.04     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 146      |
|    iterations         | 500      |
|    time_elapsed       | 17       |
|    total_timesteps    | 2500     |
| train/                |          |
|    entropy_loss       | -42.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 499      |
|    policy_loss        | 14.4     |

------------------------------------
| time/                 |          |
|    fps                | 147      |
|    iterations         | 1800     |
|    time_elapsed       | 60       |
|    total_timesteps    | 9000     |
| train/                |          |
|    entropy_loss       | -43.1    |
|    explained_variance | 0.25     |
|    learning_rate      | 0.0005   |
|    n_updates          | 1799     |
|    policy_loss        | 18.8     |
|    std                | 1.02     |
|    value_loss         | 0.658    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 1900     |
|    time_elapsed       | 64       |
|    total_timesteps    | 9500     |
| train/                |          |
|    entropy_loss       | -43.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1899     |
|    policy_loss        | 11.6     |

------------------------------------
| time/                 |          |
|    fps                | 147      |
|    iterations         | 3200     |
|    time_elapsed       | 108      |
|    total_timesteps    | 16000    |
| train/                |          |
|    entropy_loss       | -43.7    |
|    explained_variance | 0.123    |
|    learning_rate      | 0.0005   |
|    n_updates          | 3199     |
|    policy_loss        | -1.88    |
|    std                | 1.04     |
|    value_loss         | 0.146    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 147      |
|    iterations         | 3300     |
|    time_elapsed       | 112      |
|    total_timesteps    | 16500    |
| train/                |          |
|    entropy_loss       | -43.7    |
|    explained_variance | 0.285    |
|    learning_rate      | 0.0005   |
|    n_updates          | 3299     |
|    policy_loss        | 40.9     |

------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 4600     |
|    time_elapsed       | 160      |
|    total_timesteps    | 23000    |
| train/                |          |
|    entropy_loss       | -44.1    |
|    explained_variance | -0.337   |
|    learning_rate      | 0.0005   |
|    n_updates          | 4599     |
|    policy_loss        | -0.44    |
|    std                | 1.05     |
|    value_loss         | 0.158    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 4700     |
|    time_elapsed       | 163      |
|    total_timesteps    | 23500    |
| train/                |          |
|    entropy_loss       | -44.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 4699     |
|    policy_loss        | 4.45     |

------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 6000     |
|    time_elapsed       | 209      |
|    total_timesteps    | 30000    |
| train/                |          |
|    entropy_loss       | -44.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5999     |
|    policy_loss        | 57.8     |
|    std                | 1.07     |
|    value_loss         | 4.41     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.39e+06 |
|    total_cost         | 1.81e+04 |
|    total_reward       | 8.95e+05 |
|    total_reward_pct   | 179      |
|    total_trades       | 41931    |
| time/                 |          |
|    fps                | 143      |
|    iterations         | 6100     |
|    time_elapsed       | 212      |
|    total_timesteps    | 30500    |

------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 7400     |
|    time_elapsed       | 258      |
|    total_timesteps    | 37000    |
| train/                |          |
|    entropy_loss       | -45.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 7399     |
|    policy_loss        | -29.3    |
|    std                | 1.09     |
|    value_loss         | 0.676    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 7500     |
|    time_elapsed       | 261      |
|    total_timesteps    | 37500    |
| train/                |          |
|    entropy_loss       | -45.1    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0005   |
|    n_updates          | 7499     |
|    policy_loss        | -19.5    |

------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 8800     |
|    time_elapsed       | 306      |
|    total_timesteps    | 44000    |
| train/                |          |
|    entropy_loss       | -45.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 8799     |
|    policy_loss        | 50.6     |
|    std                | 1.11     |
|    value_loss         | 1.31     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.41e+06 |
|    total_cost         | 5.53e+03 |
|    total_reward       | 9.1e+05  |
|    total_reward_pct   | 182      |
|    total_trades       | 40916    |
| time/                 |          |
|    fps                | 143      |
|    iterations         | 8900     |
|    time_elapsed       | 310      |
|    total_timesteps    | 44500    |

------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 10200    |
|    time_elapsed       | 357      |
|    total_timesteps    | 51000    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | 0.0803   |
|    learning_rate      | 0.0005   |
|    n_updates          | 10199    |
|    policy_loss        | -10.9    |
|    std                | 1.12     |
|    value_loss         | 0.25     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 142      |
|    iterations         | 10300    |
|    time_elapsed       | 361      |
|    total_timesteps    | 51500    |
| train/                |          |
|    entropy_loss       | -45.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 10299    |
|    policy_loss        | -116     |

------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 11600    |
|    time_elapsed       | 410      |
|    total_timesteps    | 58000    |
| train/                |          |
|    entropy_loss       | -46.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 11599    |
|    policy_loss        | 73.7     |
|    std                | 1.13     |
|    value_loss         | 2.63     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.79e+06 |
|    total_cost         | 1.74e+04 |
|    total_reward       | 1.29e+06 |
|    total_reward_pct   | 259      |
|    total_trades       | 46581    |
| time/                 |          |
|    fps                | 141      |
|    iterations         | 11700    |
|    time_elapsed       | 413      |
|    total_timesteps    | 58500    |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 13000    |
|    time_elapsed       | 462      |
|    total_timesteps    | 65000    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 12999    |
|    policy_loss        | -68.1    |
|    std                | 1.14     |
|    value_loss         | 2.03     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 13100    |
|    time_elapsed       | 465      |
|    total_timesteps    | 65500    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 13099    |
|    policy_loss        | 13.7     |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.74e+06 |
|    total_cost         | 9.41e+03 |
|    total_reward       | 1.24e+06 |
|    total_reward_pct   | 249      |
|    total_trades       | 41239    |
| time/                 |          |
|    fps                | 140      |
|    iterations         | 14400    |
|    time_elapsed       | 512      |
|    total_timesteps    | 72000    |
| train/                |          |
|    entropy_loss       | -47.1    |
|    explained_variance | 0.103    |
|    learning_rate      | 0.0005   |
|    n_updates          | 14399    |
|    policy_loss        | -481     |
|    std                | 1.16     |
|    value_loss         | 109      |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 14500    |
|    time_elapsed       | 516      |
|    total_timesteps    | 72500    |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 15800    |
|    time_elapsed       | 562      |
|    total_timesteps    | 79000    |
| train/                |          |
|    entropy_loss       | -47.6    |
|    explained_variance | -0.42    |
|    learning_rate      | 0.0005   |
|    n_updates          | 15799    |
|    policy_loss        | 72.5     |
|    std                | 1.19     |
|    value_loss         | 3.58     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 15900    |
|    time_elapsed       | 565      |
|    total_timesteps    | 79500    |
| train/                |          |
|    entropy_loss       | -47.6    |
|    explained_variance | 0.016    |
|    learning_rate      | 0.0005   |
|    n_updates          | 15899    |
|    policy_loss        | 9.41     |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.73e+06 |
|    total_cost         | 4.46e+03 |
|    total_reward       | 1.23e+06 |
|    total_reward_pct   | 247      |
|    total_trades       | 32849    |
| time/                 |          |
|    fps                | 140      |
|    iterations         | 17200    |
|    time_elapsed       | 612      |
|    total_timesteps    | 86000    |
| train/                |          |
|    entropy_loss       | -48.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 17199    |
|    policy_loss        | 40.3     |
|    std                | 1.2      |
|    value_loss         | 0.904    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 17300    |
|    time_elapsed       | 615      |
|    total_timesteps    | 86500    |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 18600    |
|    time_elapsed       | 661      |
|    total_timesteps    | 93000    |
| train/                |          |
|    entropy_loss       | -48.5    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0005   |
|    n_updates          | 18599    |
|    policy_loss        | 25.3     |
|    std                | 1.22     |
|    value_loss         | 1.86     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 18700    |
|    time_elapsed       | 665      |
|    total_timesteps    | 93500    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 0.000169 |
|    learning_rate      | 0.0005   |
|    n_updates          | 18699    |
|    policy_loss        | 110      |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.73e+06 |
|    total_cost         | 5.34e+03 |
|    total_reward       | 1.23e+06 |
|    total_reward_pct   | 246      |
|    total_trades       | 33320    |
| time/                 |          |
|    fps                | 140      |
|    iterations         | 20000    |
|    time_elapsed       | 710      |
|    total_timesteps    | 100000   |
| train/                |          |
|    entropy_loss       | -49      |
|    explained_variance | 0.0863   |
|    learning_rate      | 0.0005   |
|    n_updates          | 19999    |
|    policy_loss        | -34.2    |
|    std                | 1.24     |
|    value_loss         | 0.731    |
------------------------------------
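Over training, `std` rises from about 1.05 to 1.24 while `entropy_loss` falls from -44.1 to -49. Both trends fit a diagonal Gaussian policy. Its entropy is the sum over action dimensions of 0.5·ln(2πe) + ln σ, and `entropy_loss` is reported as the negative of that entropy. The sketch below checks this under two assumptions:

- There are 30 action dimensions (one per stock), all sharing a single σ. The stock universe is not shown in this part of the notebook.
- The logged `std` is an average, so the match is only approximate.

```python
import math

def diag_gaussian_entropy(stds):
    """Entropy (in nats) of a Gaussian with independent dimensions and the given std devs."""
    return sum(0.5 * math.log(2 * math.pi * math.e) + math.log(s) for s in stds)

# Assumption: one continuous action per stock in a 30-stock universe, all with std 1.24.
n_assets = 30
entropy = diag_gaussian_entropy([1.24] * n_assets)
print(f"entropy_loss ~ {-entropy:.1f}")  # entropy_loss ~ -49.0
```

A larger σ means a more spread-out (more exploratory) action distribution. That is why `entropy_loss` becomes more negative as `std` grows.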
A2C Sharpe Ratio:  0.5220818729465211
{'ent_coef': 0.01, 'n_steps': 2048, 'learning_rate': 0.00025, 'batch_size': 128}
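The logged counters follow simple arithmetic in the rollout length.

- With `n_steps` = 2048 from the PPO configuration above, iteration 9 corresponds to 9 × 2048 = 18432 timesteps.
- The A2C blocks use 5 steps per rollout, so they add 500 timesteps every 100 iterations.
- The `n_updates` values fit the log being written just before that iteration's gradient step, with PPO making `n_epochs` passes per rollout.

The helper below is a hypothetical sketch. It assumes `n_epochs=10` (the Stable-Baselines3 PPO default) and a single environment, since neither appears in the printed configuration.

```python
def expected_counters(iteration, n_steps, n_epochs=1, n_envs=1):
    """Hypothetical helper: (total_timesteps, n_updates) an on-policy learner would log at
    `iteration`, assuming the log is written before that iteration's gradient update."""
    total_timesteps = iteration * n_steps * n_envs
    n_updates = (iteration - 1) * n_epochs  # only iteration - 1 updates have run so far
    return total_timesteps, n_updates

# PPO: n_steps=2048 from the config above; n_epochs=10 is an assumed default.
print(expected_counters(9, n_steps=2048, n_epochs=10))   # (18432, 80)
print(expected_counters(27, n_steps=2048, n_epochs=10))  # (55296, 260)
# A2C: n_steps=5 and a single gradient step per rollout.
print(expected_counters(4600, n_steps=5))                # (23000, 4599)
```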
Using cuda device
Logging to tensorboard_log/ppo/ppo_378_2

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 3.76e+05    |
|    total_cost           | 4.16e+05    |
|    total_reward         | -1.24e+05   |
|    total_reward_pct     | -24.8       |
|    total_trades         | 72991       |
| time/                   |             |
|    fps                  | 153         |
|    iterations           | 9           |
|    time_elapsed         | 119         |
|    total_timesteps      | 18432       |
| train/                  |             |
|    approx_kl            | 0.021172006 |
|    clip_fraction        | 0.311       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.1       |
|    explained_variance   | -0.0947     |
|    learning_rate        | 0.00025     |
|    loss                 | -0.127      |
|    n_updates            | 80          |
|    policy_gradient_loss | -0.0412     |
|    std                  | 1.02        |
|    value_loss           | 1.16  

day: 2768, episode: 50
begin_total_asset: 500000.00
end_total_asset: 389746.54
total_reward: -110253.46
total_cost: 415224.51
total_trades: 73144
Sharpe: -0.014
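Each episode summary ends with a Sharpe ratio of the account-value path. Below is a minimal sketch. It assumes the common convention of simple daily returns, a zero risk-free rate, a sample standard deviation, and √252 annualization. The exact convention used by this environment is not shown here, so `periods_per_year` is left as a parameter, and the account-value path is made up for illustration.

```python
import numpy as np

def annualized_sharpe(account_values, periods_per_year=252):
    """Sharpe ratio of simple per-period returns with a zero risk-free rate,
    scaled by sqrt(periods_per_year)."""
    values = np.asarray(account_values, dtype=float)
    returns = values[1:] / values[:-1] - 1.0  # simple per-period returns
    if returns.size < 2:
        return 0.0  # a sample std needs at least two returns
    std = returns.std(ddof=1)  # sample std, matching pandas' default .std()
    if std == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * returns.mean() / std)

# Hypothetical account-value path, for illustration only.
path = [500_000, 502_000, 499_500, 505_000, 507_500]
print(round(annualized_sharpe(path), 3))
```

A negative value such as the -0.014 above means the average daily return was slightly below zero over the episode.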
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 3.9e+05     |
|    total_cost           | 4.15e+05    |
|    total_reward         | -1.1e+05    |
|    total_reward_pct     | -22.1       |
|    total_trades         | 73144       |
| time/                   |             |
|    fps                  | 153         |
|    iterations           | 18          |
|    time_elapsed         | 240         |
|    total_timesteps      | 36864       |
| train/                  |             |
|    approx_kl            | 0.027546087 |
|    clip_fraction        | 0.282       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.6       |
|    explained_variance   | -0.113      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0223      |

-----------------------------------------
| time/                   |             |
|    fps                  | 152         |
|    iterations           | 27          |
|    time_elapsed         | 363         |
|    total_timesteps      | 55296       |
| train/                  |             |
|    approx_kl            | 0.051956657 |
|    clip_fraction        | 0.403       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.2       |
|    explained_variance   | 0.121       |
|    learning_rate        | 0.00025     |
|    loss                 | -0.126      |
|    n_updates            | 260         |
|    policy_gradient_loss | -0.0339     |
|    std                  | 1.06        |
|    value_loss           | 0.658       |
-----------------------------------------
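For PPO, two columns track how aggressive each update is:

- `approx_kl` estimates how far the updated policy has moved from the one that collected the rollout.
- `clip_fraction` is the share of samples whose probability ratio fell outside [1 - `clip_range`, 1 + `clip_range`].

The sketch below computes both from old and new log-probabilities. It uses the (r - 1) - log r estimator, the form Stable-Baselines3 uses for `approx_kl`. The log-probability arrays are made-up inputs.

```python
import numpy as np

def ppo_diagnostics(new_log_prob, old_log_prob, clip_range=0.2):
    """Return (approx_kl, clip_fraction) for one batch of PPO samples."""
    log_ratio = np.asarray(new_log_prob) - np.asarray(old_log_prob)
    ratio = np.exp(log_ratio)  # pi_new(a|s) / pi_old(a|s)
    approx_kl = float(np.mean((ratio - 1.0) - log_ratio))  # non-negative KL estimate
    clip_fraction = float(np.mean(np.abs(ratio - 1.0) > clip_range))
    return approx_kl, clip_fraction

old = np.array([-1.0, -1.0, -1.0, -1.0])
new = np.array([-1.0, -0.9, -1.3, -0.6])
print(ppo_diagnostics(new, old))  # (0.0344..., 0.5)
```

Clip fractions of 0.26 to 0.40 in the blocks above mean that roughly a quarter to two fifths of the samples hit the clipping bound in each update.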
----------------------------------------
| environment/            |            |
|    portfolio_value      | 3.35e+05   |
|    total_cost           | 3.86e+05   |
|    total_reward         | -1.65e+05 

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 3.43e+05    |
|    total_cost           | 2.99e+05    |
|    total_reward         | -1.57e+05   |
|    total_reward_pct     | -31.5       |
|    total_trades         | 65271       |
| time/                   |             |
|    fps                  | 152         |
|    iterations           | 36          |
|    time_elapsed         | 484         |
|    total_timesteps      | 73728       |
| train/                  |             |
|    approx_kl            | 0.029845826 |
|    clip_fraction        | 0.369       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45         |
|    explained_variance   | 0.093       |
|    learning_rate        | 0.00025     |
|    loss                 | -0.141      |
|    n_updates            | 350         |
|    policy_gradient_loss | -0.0364     |
|    std                  | 1.09        |
|    value_loss           | 1.09  

day: 2768, episode: 70
begin_total_asset: 500000.00
end_total_asset: 561485.08
total_reward: 61485.08
total_cost: 211909.74
total_trades: 61096
Sharpe: 0.162
----------------------------------------
| environment/            |            |
|    portfolio_value      | 5.61e+05   |
|    total_cost           | 2.12e+05   |
|    total_reward         | 6.15e+04   |
|    total_reward_pct     | 12.3       |
|    total_trades         | 61096      |
| time/                   |            |
|    fps                  | 151        |
|    iterations           | 45         |
|    time_elapsed         | 607        |
|    total_timesteps      | 92160      |
| train/                  |            |
|    approx_kl            | 0.02894405 |
|    clip_fraction        | 0.259      |
|    clip_range           | 0.2        |
|    entropy_loss         | -45.6      |
|    explained_variance   | 0.233      |
|    learning_rate        | 0.00025    |
|    loss                 | 0.317      |

{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0005}
Using cuda device
Logging to tensorboard_log/a2c/ensemble_378_1
------------------------------------
| time/                 |          |
|    fps                | 137      |
|    iterations         | 100      |
|    time_elapsed       | 3        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -42.7    |
|    explained_variance | 0.341    |
|    learning_rate      | 0.0005   |
|    n_updates          | 99       |
|    policy_loss        | 7.7      |
|    std                | 1.01     |
|    value_loss         | 0.279    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 200      |
|    time_elapsed       | 7        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -42.8    |
|    explained_variance | 0

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 1500     |
|    time_elapsed       | 54       |
|    total_timesteps    | 7500     |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1499     |
|    policy_loss        | -65.5    |
|    std                | 1.03     |
|    value_loss         | 2.19     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 1600     |
|    time_elapsed       | 58       |
|    total_timesteps    | 8000     |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1599     |
|    policy_loss        | 6.96     |

day: 2831, episode: 5
begin_total_asset: 500000.00
end_total_asset: 1071187.98
total_reward: 571187.98
total_cost: 95419.01
total_trades: 52184
Sharpe: 0.390
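The printed episode summaries fit `total_reward` = `end_total_asset` - `begin_total_asset`, with `total_reward_pct` being that gain as a percentage of the starting capital (571187.98 / 500000 ≈ 114%). The helper below is a hypothetical restatement of that arithmetic, not the environment's own code.

```python
def episode_summary(begin_total_asset, end_total_asset):
    """Hypothetical helper: absolute and percentage gain over one episode."""
    total_reward = end_total_asset - begin_total_asset
    total_reward_pct = 100.0 * total_reward / begin_total_asset
    return total_reward, total_reward_pct

reward, pct = episode_summary(500_000.00, 1_071_187.98)
print(f"total_reward: {reward:.2f}, total_reward_pct: {pct:.1f}")
# total_reward: 571187.98, total_reward_pct: 114.2
```

The tabular logger rounds to three significant figures, so 114.2 appears as 114 in the `environment/` block.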
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.07e+06 |
|    total_cost         | 9.54e+04 |
|    total_reward       | 5.71e+05 |
|    total_reward_pct   | 114      |
|    total_trades       | 52184    |
| time/                 |          |
|    fps                | 137      |
|    iterations         | 2900     |
|    time_elapsed       | 105      |
|    total_timesteps    | 14500    |
| train/                |          |
|    entropy_loss       | -43.7    |
|    explained_variance | -0.564   |
|    learning_rate      | 0.0005   |
|    n_updates          | 2899     |
|    policy_loss        | 16       |
|    std                | 1.04     |
|    value_loss         | 0.216    |
------------------------------------

------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 4300     |
|    time_elapsed       | 154      |
|    total_timesteps    | 21500    |
| train/                |          |
|    entropy_loss       | -44.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 4299     |
|    policy_loss        | 8.13     |
|    std                | 1.06     |
|    value_loss         | 0.228    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 138      |
|    iterations         | 4400     |
|    time_elapsed       | 158      |
|    total_timesteps    | 22000    |
| train/                |          |
|    entropy_loss       | -44.2    |
|    explained_variance | 2.47e-05 |
|    learning_rate      | 0.0005   |
|    n_updates          | 4399     |
|    policy_loss        | 77.7     |

day: 2831, episode: 10
begin_total_asset: 500000.00
end_total_asset: 1644959.64
total_reward: 1144959.64
total_cost: 26814.64
total_trades: 48373
Sharpe: 0.593
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.64e+06 |
|    total_cost         | 2.68e+04 |
|    total_reward       | 1.14e+06 |
|    total_reward_pct   | 229      |
|    total_trades       | 48373    |
| time/                 |          |
|    fps                | 139      |
|    iterations         | 5700     |
|    time_elapsed       | 203      |
|    total_timesteps    | 28500    |
| train/                |          |
|    entropy_loss       | -44.7    |
|    explained_variance | -1.79    |
|    learning_rate      | 0.0005   |
|    n_updates          | 5699     |
|    policy_loss        | 14.7     |
|    std                | 1.07     |
|    value_loss         | 0.142    |
------------------------------------

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 7100     |
|    time_elapsed       | 254      |
|    total_timesteps    | 35500    |
| train/                |          |
|    entropy_loss       | -45      |
|    explained_variance | 0.688    |
|    learning_rate      | 0.0005   |
|    n_updates          | 7099     |
|    policy_loss        | 2.39     |
|    std                | 1.09     |
|    value_loss         | 0.0253   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 7200     |
|    time_elapsed       | 257      |
|    total_timesteps    | 36000    |
| train/                |          |
|    entropy_loss       | -45.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 7199     |
|    policy_loss        | 64.3     |

day: 2831, episode: 15
begin_total_asset: 500000.00
end_total_asset: 1990053.59
total_reward: 1490053.59
total_cost: 12329.69
total_trades: 40290
Sharpe: 0.769
-------------------------------------
| environment/          |           |
|    portfolio_value    | 1.99e+06  |
|    total_cost         | 1.23e+04  |
|    total_reward       | 1.49e+06  |
|    total_reward_pct   | 298       |
|    total_trades       | 40290     |
| time/                 |           |
|    fps                | 139       |
|    iterations         | 8500      |
|    time_elapsed       | 304       |
|    total_timesteps    | 42500     |
| train/                |           |
|    entropy_loss       | -45.7     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 8499      |
|    policy_loss        | 0.692     |
|    std                | 1.11      |
|    value_loss         | 0.0235    |
-------------------------------------

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 9900     |
|    time_elapsed       | 354      |
|    total_timesteps    | 49500    |
| train/                |          |
|    entropy_loss       | -46      |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 9899     |
|    policy_loss        | 4.64     |
|    std                | 1.12     |
|    value_loss         | 0.135    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 10000    |
|    time_elapsed       | 358      |
|    total_timesteps    | 50000    |
| train/                |          |
|    entropy_loss       | -46.1    |
|    explained_variance | 0.42     |
|    learning_rate      | 0.0005   |
|    n_updates          | 9999     |
|    policy_loss        | -22.3    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 11300    |
|    time_elapsed       | 405      |
|    total_timesteps    | 56500    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 11299    |
|    policy_loss        | 44.6     |
|    std                | 1.14     |
|    value_loss         | 1.27     |
------------------------------------
day: 2831, episode: 20
begin_total_asset: 500000.00
end_total_asset: 1823571.86
total_reward: 1323571.86
total_cost: 12266.47
total_trades: 38405
Sharpe: 0.625
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.82e+06 |
|    total_cost         | 1.23e+04 |
|    total_reward       | 1.32e+06 |
|    total_reward_pct   | 265      |
|    total_trades       | 38405    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 12700    |
|    time_elapsed       | 455      |
|    total_timesteps    | 63500    |
| train/                |          |
|    entropy_loss       | -46.8    |
|    explained_variance | -0.0201  |
|    learning_rate      | 0.0005   |
|    n_updates          | 12699    |
|    policy_loss        | 37.2     |
|    std                | 1.15     |
|    value_loss         | 0.898    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 12800    |
|    time_elapsed       | 459      |
|    total_timesteps    | 64000    |
| train/                |          |
|    entropy_loss       | -46.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 12799    |
|    policy_loss        | 58.9     |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 14100    |
|    time_elapsed       | 506      |
|    total_timesteps    | 70500    |
| train/                |          |
|    entropy_loss       | -47.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 14099    |
|    policy_loss        | -6.61    |
|    std                | 1.17     |
|    value_loss         | 0.449    |
------------------------------------
day: 2831, episode: 25
begin_total_asset: 500000.00
end_total_asset: 1143124.65
total_reward: 643124.65
total_cost: 11329.39
total_trades: 35736
Sharpe: 0.508
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.14e+06 |
|    total_cost         | 1.13e+04 |
|    total_reward       | 6.43e+05 |
|    total_reward_pct   | 129      |
|    total_trades       | 35736    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 15500    |
|    time_elapsed       | 556      |
|    total_timesteps    | 77500    |
| train/                |          |
|    entropy_loss       | -47.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 15499    |
|    policy_loss        | -66.1    |
|    std                | 1.19     |
|    value_loss         | 2.92     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 15600    |
|    time_elapsed       | 560      |
|    total_timesteps    | 78000    |
| train/                |          |
|    entropy_loss       | -47.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 15599    |
|    policy_loss        | 7.72     |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 16900    |
|    time_elapsed       | 606      |
|    total_timesteps    | 84500    |
| train/                |          |
|    entropy_loss       | -48.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 16899    |
|    policy_loss        | -21.3    |
|    std                | 1.21     |
|    value_loss         | 0.446    |
------------------------------------
day: 2831, episode: 30
begin_total_asset: 500000.00
end_total_asset: 1690199.00
total_reward: 1190199.00
total_cost: 14211.48
total_trades: 35903
Sharpe: 0.666
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.69e+06 |
|    total_cost         | 1.42e+04 |
|    total_reward       | 1.19e+06 |
|    total_reward_pct   | 238      |
|    total_trades       | 35903    |

-------------------------------------
| time/                 |           |
|    fps                | 139       |
|    iterations         | 18300     |
|    time_elapsed       | 655       |
|    total_timesteps    | 91500     |
| train/                |           |
|    entropy_loss       | -48.5     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 18299     |
|    policy_loss        | -4.55     |
|    std                | 1.22      |
|    value_loss         | 0.341     |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 18400    |
|    time_elapsed       | 659      |
|    total_timesteps    | 92000    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 18399    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 19700    |
|    time_elapsed       | 706      |
|    total_timesteps    | 98500    |
| train/                |          |
|    entropy_loss       | -49.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 19699    |
|    policy_loss        | 3.4      |
|    std                | 1.24     |
|    value_loss         | 2.67     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 19800    |
|    time_elapsed       | 709      |
|    total_timesteps    | 99000    |
| train/                |          |
|    entropy_loss       | -49.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 19799    |
|    policy_loss        | -64.4    |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 1100     |
|    time_elapsed       | 39       |
|    total_timesteps    | 5500     |
| train/                |          |
|    entropy_loss       | -43.2    |
|    explained_variance | 0.195    |
|    learning_rate      | 0.0005   |
|    n_updates          | 1099     |
|    policy_loss        | -4.49    |
|    std                | 1.02     |
|    value_loss         | 0.115    |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 7.69e+05 |
|    total_cost         | 2.12e+05 |
|    total_reward       | 2.69e+05 |
|    total_reward_pct   | 53.8     |
|    total_trades       | 60725    |
| time/                 |          |
|    fps                | 141      |
|    iterations         | 1200     |
|    time_elapsed       | 42       |
|    total_timesteps    | 6000     |

------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 2500     |
|    time_elapsed       | 88       |
|    total_timesteps    | 12500    |
| train/                |          |
|    entropy_loss       | -43.7    |
|    explained_variance | -0.0794  |
|    learning_rate      | 0.0005   |
|    n_updates          | 2499     |
|    policy_loss        | -21.1    |
|    std                | 1.04     |
|    value_loss         | 0.5      |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 141      |
|    iterations         | 2600     |
|    time_elapsed       | 91       |
|    total_timesteps    | 13000    |
| train/                |          |
|    entropy_loss       | -43.7    |
|    explained_variance | -0.933   |
|    learning_rate      | 0.0005   |
|    n_updates          | 2599     |
|    policy_loss        | 10.6     |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 3900     |
|    time_elapsed       | 138      |
|    total_timesteps    | 19500    |
| train/                |          |
|    entropy_loss       | -44.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 3899     |
|    policy_loss        | -125     |
|    std                | 1.05     |
|    value_loss         | 8.55     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.58e+06 |
|    total_cost         | 5.37e+04 |
|    total_reward       | 1.08e+06 |
|    total_reward_pct   | 216      |
|    total_trades       | 51520    |
| time/                 |          |
|    fps                | 140      |
|    iterations         | 4000     |
|    time_elapsed       | 141      |
|    total_timesteps    | 20000    |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 5300     |
|    time_elapsed       | 188      |
|    total_timesteps    | 26500    |
| train/                |          |
|    entropy_loss       | -44.4    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0005   |
|    n_updates          | 5299     |
|    policy_loss        | 7.06     |
|    std                | 1.06     |
|    value_loss         | 0.116    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 5400     |
|    time_elapsed       | 191      |
|    total_timesteps    | 27000    |
| train/                |          |
|    entropy_loss       | -44.4    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 5399     |
|    policy_loss        | 52.9     |

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 6700     |
|    time_elapsed       | 238      |
|    total_timesteps    | 33500    |
| train/                |          |
|    entropy_loss       | -44.9    |
|    explained_variance | 0.194    |
|    learning_rate      | 0.0005   |
|    n_updates          | 6699     |
|    policy_loss        | -53.4    |
|    std                | 1.08     |
|    value_loss         | 1.53     |
------------------------------------
-------------------------------------
| environment/          |           |
|    portfolio_value    | 1.25e+06  |
|    total_cost         | 1.86e+04  |
|    total_reward       | 7.46e+05  |
|    total_reward_pct   | 149       |
|    total_trades       | 44781     |
| time/                 |           |
|    fps                | 140       |
|    iterations         | 6800      |
|    time_elapsed       | 242       |
|    total_timesteps    | 3

------------------------------------
| time/                 |          |
|    fps                | 140      |
|    iterations         | 8100     |
|    time_elapsed       | 289      |
|    total_timesteps    | 40500    |
| train/                |          |
|    entropy_loss       | -45.4    |
|    explained_variance | -0.154   |
|    learning_rate      | 0.0005   |
|    n_updates          | 8099     |
|    policy_loss        | 17       |
|    std                | 1.1      |
|    value_loss         | 0.266    |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 140       |
|    iterations         | 8200      |
|    time_elapsed       | 292       |
|    total_timesteps    | 41000     |
| train/                |           |
|    entropy_loss       | -45.4     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 8199      |
|    policy_loss        | 5

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 9500     |
|    time_elapsed       | 339      |
|    total_timesteps    | 47500    |
| train/                |          |
|    entropy_loss       | -45.9    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0005   |
|    n_updates          | 9499     |
|    policy_loss        | 17.3     |
|    std                | 1.12     |
|    value_loss         | 0.163    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 9600     |
|    time_elapsed       | 343      |
|    total_timesteps    | 48000    |
| train/                |          |
|    entropy_loss       | -45.9    |
|    explained_variance | 0.0496   |
|    learning_rate      | 0.0005   |
|    n_updates          | 9599     |
|    policy_loss        | 34.4     |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 10900    |
|    time_elapsed       | 390      |
|    total_timesteps    | 54500    |
| train/                |          |
|    entropy_loss       | -46.5    |
|    explained_variance | 0.0507   |
|    learning_rate      | 0.0005   |
|    n_updates          | 10899    |
|    policy_loss        | -37.6    |
|    std                | 1.14     |
|    value_loss         | 2.26     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 11000    |
|    time_elapsed       | 394      |
|    total_timesteps    | 55000    |
| train/                |          |
|    entropy_loss       | -46.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 10999    |
|    policy_loss        | 20.2     |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 12300    |
|    time_elapsed       | 441      |
|    total_timesteps    | 61500    |
| train/                |          |
|    entropy_loss       | -47.3    |
|    explained_variance | 0.054    |
|    learning_rate      | 0.0005   |
|    n_updates          | 12299    |
|    policy_loss        | -27.7    |
|    std                | 1.17     |
|    value_loss         | 0.545    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 12400    |
|    time_elapsed       | 445      |
|    total_timesteps    | 62000    |
| train/                |          |
|    entropy_loss       | -47.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 12399    |
|    policy_loss        | -48.4    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 13700    |
|    time_elapsed       | 492      |
|    total_timesteps    | 68500    |
| train/                |          |
|    entropy_loss       | -47.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 13699    |
|    policy_loss        | -28      |
|    std                | 1.2      |
|    value_loss         | 0.796    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 13800    |
|    time_elapsed       | 495      |
|    total_timesteps    | 69000    |
| train/                |          |
|    entropy_loss       | -47.9    |
|    explained_variance | 0.285    |
|    learning_rate      | 0.0005   |
|    n_updates          | 13799    |
|    policy_loss        | 25.8     |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 15100    |
|    time_elapsed       | 540      |
|    total_timesteps    | 75500    |
| train/                |          |
|    entropy_loss       | -48.5    |
|    explained_variance | -0.394   |
|    learning_rate      | 0.0005   |
|    n_updates          | 15099    |
|    policy_loss        | -19      |
|    std                | 1.22     |
|    value_loss         | 0.319    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 15200    |
|    time_elapsed       | 544      |
|    total_timesteps    | 76000    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 1.56e-05 |
|    learning_rate      | 0.0005   |
|    n_updates          | 15199    |
|    policy_loss        | -73.2    |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.15e+06 |
|    total_cost         | 6.55e+03 |
|    total_reward       | 6.52e+05 |
|    total_reward_pct   | 130      |
|    total_trades       | 49387    |
| time/                 |          |
|    fps                | 139      |
|    iterations         | 16500    |
|    time_elapsed       | 590      |
|    total_timesteps    | 82500    |
| train/                |          |
|    entropy_loss       | -49.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 16499    |
|    policy_loss        | -6.84    |
|    std                | 1.25     |
|    value_loss         | 0.0369   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 16600    |
|    time_elapsed       | 593      |
|    total_timesteps    | 83000    |

------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 17900    |
|    time_elapsed       | 640      |
|    total_timesteps    | 89500    |
| train/                |          |
|    entropy_loss       | -49.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 17899    |
|    policy_loss        | -53.2    |
|    std                | 1.27     |
|    value_loss         | 1.69     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 18000    |
|    time_elapsed       | 643      |
|    total_timesteps    | 90000    |
| train/                |          |
|    entropy_loss       | -49.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 17999    |
|    policy_loss        | -16.4    |

------------------------------------
| environment/          |          |
|    portfolio_value    | 2.22e+06 |
|    total_cost         | 3.21e+04 |
|    total_reward       | 1.72e+06 |
|    total_reward_pct   | 343      |
|    total_trades       | 47391    |
| time/                 |          |
|    fps                | 139      |
|    iterations         | 19300    |
|    time_elapsed       | 690      |
|    total_timesteps    | 96500    |
| train/                |          |
|    entropy_loss       | -50.1    |
|    explained_variance | -0.0201  |
|    learning_rate      | 0.0005   |
|    n_updates          | 19299    |
|    policy_loss        | -20.1    |
|    std                | 1.29     |
|    value_loss         | 0.206    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 139      |
|    iterations         | 19400    |
|    time_elapsed       | 693      |
|    total_timesteps    | 97000    |

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 4.27e+05    |
|    total_cost           | 4.61e+05    |
|    total_reward         | -7.35e+04   |
|    total_reward_pct     | -14.7       |
|    total_trades         | 76606       |
| time/                   |             |
|    fps                  | 149         |
|    iterations           | 5           |
|    time_elapsed         | 68          |
|    total_timesteps      | 10240       |
| train/                  |             |
|    approx_kl            | 0.011772586 |
|    clip_fraction        | 0.277       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.8       |
|    explained_variance   | -0.0228     |
|    learning_rate        | 0.00025     |
|    loss                 | -0.0441     |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.0421     |
|    std                  | 1.01        |
|    value_loss           | 1.22  

------------------------------------------
| environment/            |              |
|    portfolio_value      | 5.74e+05     |
|    total_cost           | 4.69e+05     |
|    total_reward         | 7.38e+04     |
|    total_reward_pct     | 14.8         |
|    total_trades         | 76170        |
| time/                   |              |
|    fps                  | 150          |
|    iterations           | 13           |
|    time_elapsed         | 176          |
|    total_timesteps      | 26624        |
| train/                  |              |
|    approx_kl            | 0.0139583945 |
|    clip_fraction        | 0.271        |
|    clip_range           | 0.2          |
|    entropy_loss         | -43.4        |
|    explained_variance   | 0.0914       |
|    learning_rate        | 0.00025      |
|    loss                 | 0.152        |
|    n_updates            | 120          |
|    policy_gradient_loss | -0.0379      |
|    std                  | 1.03         |

----------------------------------------
| time/                   |            |
|    fps                  | 148        |
|    iterations           | 22         |
|    time_elapsed         | 303        |
|    total_timesteps      | 45056      |
| train/                  |            |
|    approx_kl            | 0.04202156 |
|    clip_fraction        | 0.356      |
|    clip_range           | 0.2        |
|    entropy_loss         | -44.1      |
|    explained_variance   | 0.0158     |
|    learning_rate        | 0.00025    |
|    loss                 | -0.0872    |
|    n_updates            | 210        |
|    policy_gradient_loss | -0.0269    |
|    std                  | 1.05       |
|    value_loss           | 1.03       |
----------------------------------------
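The `iterations`, `total_timesteps` and `n_updates` counters in these tables are tied together by how Stable-Baselines3's on-policy loop logs. Each iteration collects `n_steps` transitions per environment. The row is dumped before that iteration's own `train()` call, so `n_updates` lags by one iteration.

The sketch below rests on four assumptions:
- a single environment;
- `n_steps=5` for A2C, taken from the printed hyperparameter dict;
- `n_steps=2048` for PPO, inferred from 10240 timesteps at iteration 5;
- PPO's default `n_epochs=10`, inferred from `n_updates`.

The helper `expected_counters` is hypothetical.

```python
def expected_counters(iterations: int, n_steps: int, updates_per_train: int, n_envs: int = 1) -> dict:
    """Hypothetical helper: counters an SB3 on-policy log row should show (assumes n_envs=1 here)."""
    return {
        # every iteration collects n_steps transitions from each environment
        "total_timesteps": iterations * n_steps * n_envs,
        # the row is dumped before this iteration's train(), so one train() call is missing
        "n_updates": (iterations - 1) * updates_per_train,
    }


# A2C: the counter advances by 1 per train() call (row at iteration 1100 above)
print(expected_counters(1100, n_steps=5, updates_per_train=1))
# {'total_timesteps': 5500, 'n_updates': 1099}

# PPO: the counter advances by n_epochs=10 per train() call (row at iteration 22 above)
print(expected_counters(22, n_steps=2048, updates_per_train=10))
# {'total_timesteps': 45056, 'n_updates': 210}
```

Both outputs match the logged rows, which is a quick way to confirm which hyperparameters a truncated log was produced with.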
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.49e+05    |
|    total_cost           | 4.3e+05     |
|    total_reward         | 4.87e+04    |

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 4.1e+05     |
|    total_cost           | 4.06e+05    |
|    total_reward         | -8.98e+04   |
|    total_reward_pct     | -18         |
|    total_trades         | 73234       |
| time/                   |             |
|    fps                  | 148         |
|    iterations           | 31          |
|    time_elapsed         | 427         |
|    total_timesteps      | 63488       |
| train/                  |             |
|    approx_kl            | 0.041065685 |
|    clip_fraction        | 0.383       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.9       |
|    explained_variance   | -0.0185     |
|    learning_rate        | 0.00025     |
|    loss                 | 0.269       |
|    n_updates            | 300         |
|    policy_gradient_loss | -0.0262     |
|    std                  | 1.08        |
|    value_loss           | 1.18  

-----------------------------------------
| time/                   |             |
|    fps                  | 148         |
|    iterations           | 40          |
|    time_elapsed         | 551         |
|    total_timesteps      | 81920       |
| train/                  |             |
|    approx_kl            | 0.046096742 |
|    clip_fraction        | 0.381       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.4       |
|    explained_variance   | 0.133       |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0328      |
|    n_updates            | 390         |
|    policy_gradient_loss | -0.0202     |
|    std                  | 1.1         |
|    value_loss           | 1.32        |
-----------------------------------------
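In the PPO rows, `clip_fraction` stays between roughly 0.27 and 0.38 with `clip_range` 0.2. That means about a third of sampled actions had a probability ratio outside [0.8, 1.2]. Meanwhile `approx_kl` rises from about 0.012 early on to about 0.046.

The sketch below shows how Stable-Baselines3 computes both quantities for one minibatch from new and old log-probabilities; the logged values are averages over minibatches. It is written in plain Python rather than PyTorch, and the input log-probabilities are toy values.

```python
import math


def ppo_diagnostics(new_log_probs, old_log_probs, clip_range=0.2):
    """approx_kl and clip_fraction for one minibatch, following SB3's PPO formulas."""
    log_ratios = [new - old for new, old in zip(new_log_probs, old_log_probs)]
    ratios = [math.exp(lr) for lr in log_ratios]
    # KL estimator used by SB3: mean of (r - 1) - log r, which is always >= 0
    approx_kl = sum((r - 1) - lr for r, lr in zip(ratios, log_ratios)) / len(ratios)
    # share of samples whose ratio falls outside [1 - clip_range, 1 + clip_range]
    clip_fraction = sum(abs(r - 1) > clip_range for r in ratios) / len(ratios)
    return approx_kl, clip_fraction


# Toy log-probabilities (not taken from the run above).
new = [-1.0, -0.5, -2.0, -1.2]
old = [-1.1, -0.9, -2.0, -1.0]
kl, frac = ppo_diagnostics(new, old)
print(round(kl, 4), frac)  # 0.0289 0.25
```

A persistently high clip fraction like the one logged here suggests each policy update moves far from the data-collecting policy.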
day: 2831, episode: 65
begin_total_asset: 500000.00
end_total_asset: 868262.60
total_reward: 368262.60
total_cost: 388146.15
total_trades: 71608
Sharpe: 0.336
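The `Sharpe` value in these episode summaries is an annualized ratio of daily portfolio returns. The minimal sketch below makes four assumptions, all based on the FinRL environment's convention:
- daily returns are computed from the account-value series, as `pct_change` would do;
- the risk-free rate is zero;
- the standard deviation is the sample (n - 1) version;
- annualization uses `sqrt(252)` trading days.

The function name is hypothetical.

```python
import math
import statistics


def annualized_sharpe(account_values, periods_per_year=252):
    """Sharpe ratio of per-step portfolio returns, assuming a zero risk-free rate."""
    # simple returns between consecutive account values (like pandas pct_change)
    returns = [curr / prev - 1.0 for prev, curr in zip(account_values, account_values[1:])]
    # statistics.stdev uses the n - 1 denominator, matching pandas Series.std()
    return math.sqrt(periods_per_year) * statistics.mean(returns) / statistics.stdev(returns)


# Toy account-value path, not taken from the run above.
values = [500_000, 502_000, 499_500, 505_000, 507_500]
print(round(annualized_sharpe(values), 3))
```

Because the ratio depends only on returns, scaling the whole account-value path by a constant leaves the result unchanged.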

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 6.01e+05    |
|    total_cost           | 3.5e+05     |
|    total_reward         | 1.01e+05    |
|    total_reward_pct     | 20.3        |
|    total_trades         | 69305       |
| time/                   |             |
|    fps                  | 148         |
|    iterations           | 49          |
|    time_elapsed         | 675         |
|    total_timesteps      | 100352      |
| train/                  |             |
|    approx_kl            | 0.045395665 |
|    clip_fraction        | 0.374       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.9       |
|    explained_variance   | 0.0293      |
|    learning_rate        | 0.00025     |
|    loss                 | 0.461       |
|    n_updates            | 480         |
|    policy_gradient_loss | -0.0291     |
|    std                  | 1.12        |
|    value_loss           | 2.28  

nan
turbulence_threshold:  458.4056541260132
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0005}
Using cuda device
Logging to tensorboard_log/a2c/a2c_504_2
------------------------------------
| time/                 |          |
|    fps                | 133      |
|    iterations         | 100      |
|    time_elapsed       | 3        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -42.7    |
|    explained_variance | 0.183    |
|    learning_rate      | 0.0005   |
|    n_updates          | 99       |
|    policy_loss        | -52.8    |
|    std                | 1        |
|    value_loss         | 3.28     |
------------------------------------
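`entropy_loss` and `std` move together. Stable-Baselines3 logs `entropy_loss` as the negative mean entropy of the diagonal Gaussian policy, and `std` as the mean of the per-dimension standard deviations.

Assume a 30-dimensional action space (one action per stock in a 30-stock universe, which the log itself does not state) with all stds equal. The closed-form entropy then lands close to the logged values:

| `std` | sketch `entropy_loss` | logged `entropy_loss` |
|-------|-----------------------|-----------------------|
| 1     | -42.6                 | -42.7                 |
| 1.02  | -43.2                 | -43.2                 |
| 1.28  | -50.0                 | -49.8                 |

The small gaps likely come from rounding of the logged `std` and from the per-dimension stds not being exactly equal.

```python
import math


def diag_gaussian_entropy(stds):
    """Entropy in nats of a diagonal Gaussian with the given per-dimension std devs."""
    # each independent dimension contributes 0.5 * ln(2 * pi * e * sigma^2)
    return sum(0.5 * math.log(2 * math.pi * math.e * s * s) for s in stds)


N_ACTIONS = 30  # assumption: one action dimension per stock in a 30-stock universe

for std in (1.0, 1.02, 1.28):
    # SB3 reports entropy_loss as the negative of the mean entropy
    print(std, round(-diag_gaussian_entropy([std] * N_ACTIONS), 1))
# 1.0 -42.6
# 1.02 -43.2
# 1.28 -50.0
```

The steadily growing `std` (and more negative `entropy_loss`) in these runs therefore means the policy is becoming more random over training, not less.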
------------------------------------
| time/                 |          |
|    fps                | 132      |
|    iterations         | 200      |
|    time_elapsed       | 7        |
|    total_timesteps    | 1000     |
| train/                |          |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 1500     |
|    time_elapsed       | 54       |
|    total_timesteps    | 7500     |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1499     |
|    policy_loss        | -22.4    |
|    std                | 1.02     |
|    value_loss         | 0.294    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 1600     |
|    time_elapsed       | 58       |
|    total_timesteps    | 8000     |
| train/                |          |
|    entropy_loss       | -43.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 1599     |
|    policy_loss        | -64.4    |

day: 2894, episode: 5
begin_total_asset: 500000.00
end_total_asset: 1260626.19
total_reward: 760626.19
total_cost: 50312.02
total_trades: 52509
Sharpe: 0.524
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.26e+06 |
|    total_cost         | 5.03e+04 |
|    total_reward       | 7.61e+05 |
|    total_reward_pct   | 152      |
|    total_trades       | 52509    |
| time/                 |          |
|    fps                | 135      |
|    iterations         | 2900     |
|    time_elapsed       | 106      |
|    total_timesteps    | 14500    |
| train/                |          |
|    entropy_loss       | -43.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 2899     |
|    policy_loss        | 2.08     |
|    std                | 1.05     |
|    value_loss         | 0.012    |
------------------------------------
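Each `day: ..., episode: ...` summary and the `environment/` table that follows it describe the same finished episode. The table simply rounds to three significant figures, for example `total_cost` 50312.02 becomes 5.03e+04.

The reward fields are plain arithmetic on the starting and ending portfolio values, as this sketch with episode 5's numbers shows. The helper name is hypothetical, and the percentage formula is inferred from the logged values.

```python
def episode_summary(begin_total_asset: float, end_total_asset: float) -> dict:
    """Hypothetical helper: reward fields implied by an episode's start and end values."""
    total_reward = end_total_asset - begin_total_asset
    return {
        "total_reward": round(total_reward, 2),
        # percentage gain relative to the starting capital
        "total_reward_pct": round(100 * total_reward / begin_total_asset, 1),
    }


# Episode 5 summary above: begin 500000.00, end 1260626.19
print(episode_summary(500_000.00, 1_260_626.19))
# {'total_reward': 760626.19, 'total_reward_pct': 152.1}
```

The table's `total_reward_pct | 152` is this 152.1 at three significant figures.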

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 4300     |
|    time_elapsed       | 157      |
|    total_timesteps    | 21500    |
| train/                |          |
|    entropy_loss       | -44.5    |
|    explained_variance | 0.117    |
|    learning_rate      | 0.0005   |
|    n_updates          | 4299     |
|    policy_loss        | 23.8     |
|    std                | 1.07     |
|    value_loss         | 0.418    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 4400     |
|    time_elapsed       | 161      |
|    total_timesteps    | 22000    |
| train/                |          |
|    entropy_loss       | -44.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 4399     |
|    policy_loss        | -36.7    |

-------------------------------------
| time/                 |           |
|    fps                | 136       |
|    iterations         | 5700      |
|    time_elapsed       | 209       |
|    total_timesteps    | 28500     |
| train/                |           |
|    entropy_loss       | -44.8     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 5699      |
|    policy_loss        | -14.7     |
|    std                | 1.08      |
|    value_loss         | 0.88      |
-------------------------------------
day: 2894, episode: 10
begin_total_asset: 500000.00
end_total_asset: 1683720.55
total_reward: 1183720.55
total_cost: 62083.78
total_trades: 46219
Sharpe: 0.567
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.68e+06 |
|    total_cost         | 6.21e+04 |
|    total_reward       | 1.18e+06 |
|    total_reward_pct   | 237      |
|    total_trades       | 46219    |

------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 7100     |
|    time_elapsed       | 261      |
|    total_timesteps    | 35500    |
| train/                |          |
|    entropy_loss       | -45.2    |
|    explained_variance | 0.173    |
|    learning_rate      | 0.0005   |
|    n_updates          | 7099     |
|    policy_loss        | 73.9     |
|    std                | 1.09     |
|    value_loss         | 3.16     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 7200     |
|    time_elapsed       | 265      |
|    total_timesteps    | 36000    |
| train/                |          |
|    entropy_loss       | -45.2    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0005   |
|    n_updates          | 7199     |
|    policy_loss        | 27.9     |

------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 8500     |
|    time_elapsed       | 312      |
|    total_timesteps    | 42500    |
| train/                |          |
|    entropy_loss       | -45.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 8499     |
|    policy_loss        | 85.4     |
|    std                | 1.11     |
|    value_loss         | 3.64     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 8600     |
|    time_elapsed       | 316      |
|    total_timesteps    | 43000    |
| train/                |          |
|    entropy_loss       | -45.6    |
|    explained_variance | 0.161    |
|    learning_rate      | 0.0005   |
|    n_updates          | 8599     |
|    policy_loss        | 38.9     |

------------------------------------
| environment/          |          |
|    portfolio_value    | 7.92e+05 |
|    total_cost         | 1.06e+05 |
|    total_reward       | 2.92e+05 |
|    total_reward_pct   | 58.3     |
|    total_trades       | 51105    |
| time/                 |          |
|    fps                | 135      |
|    iterations         | 9900     |
|    time_elapsed       | 364      |
|    total_timesteps    | 49500    |
| train/                |          |
|    entropy_loss       | -46.1    |
|    explained_variance | -0.917   |
|    learning_rate      | 0.0005   |
|    n_updates          | 9899     |
|    policy_loss        | 2.67     |
|    std                | 1.13     |
|    value_loss         | 0.033    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 10000    |
|    time_elapsed       | 368      |
|    total_timesteps    | 50000    |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 11300    |
|    time_elapsed       | 414      |
|    total_timesteps    | 56500    |
| train/                |          |
|    entropy_loss       | -46.8    |
|    explained_variance | 0.0883   |
|    learning_rate      | 0.0005   |
|    n_updates          | 11299    |
|    policy_loss        | 63.8     |
|    std                | 1.15     |
|    value_loss         | 2.44     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 11400    |
|    time_elapsed       | 418      |
|    total_timesteps    | 57000    |
| train/                |          |
|    entropy_loss       | -46.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 11399    |
|    policy_loss        | -9.66    |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 12700    |
|    time_elapsed       | 466      |
|    total_timesteps    | 63500    |
| train/                |          |
|    entropy_loss       | -47.4    |
|    explained_variance | 0.2      |
|    learning_rate      | 0.0005   |
|    n_updates          | 12699    |
|    policy_loss        | 46.6     |
|    std                | 1.18     |
|    value_loss         | 1.35     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.21e+06 |
|    total_cost         | 7.22e+04 |
|    total_reward       | 7.15e+05 |
|    total_reward_pct   | 143      |
|    total_trades       | 47490    |
| time/                 |          |
|    fps                | 136      |
|    iterations         | 12800    |
|    time_elapsed       | 469      |
|    total_timesteps    | 64000    |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 14100    |
|    time_elapsed       | 517      |
|    total_timesteps    | 70500    |
| train/                |          |
|    entropy_loss       | -47.8    |
|    explained_variance | -0.0628  |
|    learning_rate      | 0.0005   |
|    n_updates          | 14099    |
|    policy_loss        | 0.697    |
|    std                | 1.19     |
|    value_loss         | 0.0375   |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 14200    |
|    time_elapsed       | 521      |
|    total_timesteps    | 71000    |
| train/                |          |
|    entropy_loss       | -47.8    |
|    explained_variance | 0.212    |
|    learning_rate      | 0.0005   |
|    n_updates          | 14199    |
|    policy_loss        | 33.5     |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 15500    |
|    time_elapsed       | 568      |
|    total_timesteps    | 77500    |
| train/                |          |
|    entropy_loss       | -48.3    |
|    explained_variance | -0.219   |
|    learning_rate      | 0.0005   |
|    n_updates          | 15499    |
|    policy_loss        | 23.3     |
|    std                | 1.21     |
|    value_loss         | 0.477    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 15600    |
|    time_elapsed       | 572      |
|    total_timesteps    | 78000    |
| train/                |          |
|    entropy_loss       | -48.4    |
|    explained_variance | 0.000484 |
|    learning_rate      | 0.0005   |
|    n_updates          | 15599    |
|    policy_loss        | -6.63    |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 16900    |
|    time_elapsed       | 620      |
|    total_timesteps    | 84500    |
| train/                |          |
|    entropy_loss       | -48.8    |
|    explained_variance | -0.2     |
|    learning_rate      | 0.0005   |
|    n_updates          | 16899    |
|    policy_loss        | 44.2     |
|    std                | 1.23     |
|    value_loss         | 0.908    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 17000    |
|    time_elapsed       | 624      |
|    total_timesteps    | 85000    |
| train/                |          |
|    entropy_loss       | -48.8    |
|    explained_variance | 0.239    |
|    learning_rate      | 0.0005   |
|    n_updates          | 16999    |
|    policy_loss        | 1.07     |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 18300    |
|    time_elapsed       | 672      |
|    total_timesteps    | 91500    |
| train/                |          |
|    entropy_loss       | -49.4    |
|    explained_variance | -0.138   |
|    learning_rate      | 0.0005   |
|    n_updates          | 18299    |
|    policy_loss        | -14.1    |
|    std                | 1.26     |
|    value_loss         | 0.12     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 18400    |
|    time_elapsed       | 675      |
|    total_timesteps    | 92000    |
| train/                |          |
|    entropy_loss       | -49.4    |
|    explained_variance | -0.0537  |
|    learning_rate      | 0.0005   |
|    n_updates          | 18399    |
|    policy_loss        | -29.3    |

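`entropy_loss` follows the logged `std`. For a diagonal Gaussian policy with $d$ action dimensions, the entropy is $H = \sum_{i=1}^{d} \left(\tfrac{1}{2}\log(2\pi e) + \log\sigma_i\right)$, and SB3 logs `entropy_loss` as the negative mean entropy. The sketch assumes 30 action dimensions (the `OrnsteinUhlenbeckActionNoise` printout in this notebook has a 30-entry mean) and treats the logged mean `std` as if every dimension shared it. Under those assumptions, the formula lands close to the A2C values logged in this notebook. Some gap is expected, because the mean of the $\sigma_i$ is not the same as $\exp$ of the mean $\log\sigma_i$:

```python
import math
from typing import Sequence


def diag_gaussian_entropy(stds: Sequence[float]) -> float:
    """Entropy (in nats) of a diagonal Gaussian with the given per-dimension std devs."""
    half_log_2pi_e = 0.5 * math.log(2.0 * math.pi * math.e)
    return sum(half_log_2pi_e + math.log(s) for s in stds)


# ASSUMPTION: 30 action dimensions, all set to the logged mean std.
for std, logged_entropy_loss in [(1.01, -42.7), (1.23, -48.8), (1.28, -49.8)]:
    approx = -diag_gaussian_entropy([std] * 30)
    print(f"std={std}: formula {approx:.2f} vs logged {logged_entropy_loss}")
```
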
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.56e+06 |
|    total_cost         | 1.01e+05 |
|    total_reward       | 1.06e+06 |
|    total_reward_pct   | 212      |
|    total_trades       | 54001    |
| time/                 |          |
|    fps                | 136      |
|    iterations         | 19700    |
|    time_elapsed       | 723      |
|    total_timesteps    | 98500    |
| train/                |          |
|    entropy_loss       | -49.8    |
|    explained_variance | -4.38    |
|    learning_rate      | 0.0005   |
|    n_updates          | 19699    |
|    policy_loss        | 33.2     |
|    std                | 1.28     |
|    value_loss         | 0.628    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 19800    |
|    time_elapsed       | 727      |
|    total_timesteps    | 99000    |

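`explained_variance` compares the critic's value predictions with the empirical returns:

* 1 is a perfect fit.
* 0 is no better than always predicting the mean.
* Negative values, such as the -4.38 logged above, are worse than a constant.
* Readings like `1.19e-07` or `0` are effectively zero.

The helper below follows the definition 1 - Var[y - ŷ] / Var[y]. It is our own sketch, written to mirror what we understand SB3's `explained_variance` utility to compute:

```python
import numpy as np


def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """1 - Var[y_true - y_pred] / Var[y_true]; NaN when y_true has zero variance."""
    var_y = np.var(y_true)
    if var_y == 0:
        return float("nan")
    return float(1.0 - np.var(y_true - y_pred) / var_y)


returns = np.array([1.0, 2.0, 3.0, 4.0])
print(explained_variance(returns, returns))                     # perfect critic: 1.0
print(explained_variance(np.full(4, returns.mean()), returns))  # constant mean: 0.0
print(explained_variance(returns[::-1], returns))               # anti-correlated: -3.0
```
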
day: 2894, episode: 40
begin_total_asset: 500000.00
end_total_asset: 550325.34
total_reward: 50325.34
total_cost: 491422.19
total_trades: 78595
Sharpe: 0.140
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.5e+05     |
|    total_cost           | 4.91e+05    |
|    total_reward         | 5.03e+04    |
|    total_reward_pct     | 10.1        |
|    total_trades         | 78595       |
| time/                   |             |
|    fps                  | 147         |
|    iterations           | 8           |
|    time_elapsed         | 111         |
|    total_timesteps      | 16384       |
| train/                  |             |
|    approx_kl            | 0.026309162 |
|    clip_fraction        | 0.265       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.9       |
|    explained_variance   | -0.00532    |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0225      |

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.12e+05    |
|    total_cost           | 4.37e+05    |
|    total_reward         | 1.22e+04    |
|    total_reward_pct     | 2.43        |
|    total_trades         | 75656       |
| time/                   |             |
|    fps                  | 145         |
|    iterations           | 17          |
|    time_elapsed         | 238         |
|    total_timesteps      | 34816       |
| train/                  |             |
|    approx_kl            | 0.032959804 |
|    clip_fraction        | 0.32        |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.4       |
|    explained_variance   | 0.0483      |
|    learning_rate        | 0.00025     |
|    loss                 | -0.0518     |
|    n_updates            | 160         |
|    policy_gradient_loss | -0.0306     |
|    std                  | 1.03        |
|    value_loss           | 1.12  

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.08e+05    |
|    total_cost           | 4.48e+05    |
|    total_reward         | 7.91e+03    |
|    total_reward_pct     | 1.58        |
|    total_trades         | 75930       |
| time/                   |             |
|    fps                  | 146         |
|    iterations           | 26          |
|    time_elapsed         | 364         |
|    total_timesteps      | 53248       |
| train/                  |             |
|    approx_kl            | 0.042959224 |
|    clip_fraction        | 0.334       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.1       |
|    explained_variance   | -0.0803     |
|    learning_rate        | 0.00025     |
|    loss                 | -0.00466    |
|    n_updates            | 250         |
|    policy_gradient_loss | -0.0299     |
|    std                  | 1.05        |
|    value_loss           | 1.14  

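For PPO, `clip_fraction` is the share of samples whose probability ratio r = π_new / π_old fell outside [1 - ε, 1 + ε], where ε is `clip_range` (0.2 here). `approx_kl` estimates the KL divergence between the old and new policies from the log-ratios. The sketch uses the estimator mean((r - 1) - log r), which we believe matches SB3's PPO. The log-probabilities are made up and are not values from this run. Clip fractions near 0.3 in these tables mean roughly a third of the samples were clipped. SB3's PPO accepts a `target_kl` argument for stopping an update early if that is a concern:

```python
import numpy as np


def ppo_diagnostics(new_log_prob: np.ndarray, old_log_prob: np.ndarray, clip_range: float = 0.2):
    """Return (approx_kl, clip_fraction) for one batch of PPO samples."""
    log_ratio = new_log_prob - old_log_prob
    ratio = np.exp(log_ratio)
    # Fraction of samples where the clipped surrogate objective is active.
    clip_fraction = float(np.mean(np.abs(ratio - 1.0) > clip_range))
    # (r - 1) - log r >= 0 for every r > 0, so this estimate is never negative.
    approx_kl = float(np.mean((ratio - 1.0) - log_ratio))
    return approx_kl, clip_fraction


old = np.log(np.array([0.5, 0.5, 0.2, 0.1]))
new = np.log(np.array([0.5, 0.55, 0.3, 0.05]))  # hypothetical post-update probabilities
print(ppo_diagnostics(new, old))
```
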
----------------------------------------
| time/                   |            |
|    fps                  | 146        |
|    iterations           | 35         |
|    time_elapsed         | 489        |
|    total_timesteps      | 71680      |
| train/                  |            |
|    approx_kl            | 0.04333859 |
|    clip_fraction        | 0.298      |
|    clip_range           | 0.2        |
|    entropy_loss         | -44.8      |
|    explained_variance   | 0.121      |
|    learning_rate        | 0.00025    |
|    loss                 | 0.837      |
|    n_updates            | 340        |
|    policy_gradient_loss | -0.0157    |
|    std                  | 1.08       |
|    value_loss           | 2.9        |
----------------------------------------
day: 2894, episode: 60
begin_total_asset: 500000.00
end_total_asset: 514561.10
total_reward: 14561.10
total_cost: 391491.13
total_trades: 73251
Sharpe: 0.116

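The `day: ..., episode: ...` summaries report related quantities:

* `total_reward` is `end_total_asset - begin_total_asset`. In the episode-40 summary, 550325.34 - 500000.00 = 50325.34.
* `total_reward_pct` in the tables is that difference as a percentage of the starting capital (10.1).
* `Sharpe` is an annualized ratio of daily returns.

The sketch assumes 252 trading days per year, a zero risk-free rate and the sample standard deviation. We believe FinRL's `StockTradingEnv` computes something similar from its account-value history, but check the environment source before relying on the exact definition. The account-value path below is hypothetical: only its endpoints come from the log, so its Sharpe will not match 0.140:

```python
import numpy as np


def episode_summary(account_values, trading_days: int = 252):
    """Return (total_reward, total_reward_pct, annualized Sharpe) for one episode.

    ASSUMPTIONS: zero risk-free rate, 252 trading days per year, sample std (ddof=1).
    """
    values = np.asarray(account_values, dtype=float)
    begin, end = values[0], values[-1]
    total_reward = end - begin
    total_reward_pct = 100.0 * total_reward / begin
    daily_returns = values[1:] / values[:-1] - 1.0
    std = daily_returns.std(ddof=1)
    sharpe = float(np.sqrt(trading_days) * daily_returns.mean() / std) if std > 0 else 0.0
    return total_reward, total_reward_pct, sharpe


# Hypothetical path that starts and ends at the episode-40 values from the log.
values = [500000.00, 512000.00, 498500.00, 530000.00, 550325.34]
reward, reward_pct, sharpe = episode_summary(values)
print(f"total_reward={reward:.2f}  total_reward_pct={reward_pct:.2f}  Sharpe={sharpe:.3f}")
```
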
-----------------------------------------
| environment/            |             |
|    portfolio_value      | 5.99e+05    |
|    total_cost           | 3.09e+05    |
|    total_reward         | 9.88e+04    |
|    total_reward_pct     | 19.8        |
|    total_trades         | 68169       |
| time/                   |             |
|    fps                  | 146         |
|    iterations           | 44          |
|    time_elapsed         | 615         |
|    total_timesteps      | 90112       |
| train/                  |             |
|    approx_kl            | 0.029988647 |
|    clip_fraction        | 0.332       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.4       |
|    explained_variance   | -0.0787     |
|    learning_rate        | 0.00025     |
|    loss                 | 0.0921      |
|    n_updates            | 430         |
|    policy_gradient_loss | -0.0184     |
|    std                  | 1.1         |
|    value_loss           | 1.54  

day: 2894, episode: 85
begin_total_asset: 500000.00
end_total_asset: 1145909.74
total_reward: 645909.74
total_cost: 1673.64
total_trades: 53570
Sharpe: 0.447
----------------------------------
| environment/        |          |
|    portfolio_value  | 1.16e+06 |
|    total_cost       | 1.68e+03 |
|    total_reward     | 6.61e+05 |
|    total_reward_pct | 132      |
|    total_trades     | 51045    |
| time/               |          |
|    episodes         | 16       |
|    fps              | 96       |
|    time_elapsed     | 479      |
|    total timesteps  | 46320    |
| train/              |          |
|    actor_loss       | -4.19    |
|    critic_loss      | 3.95     |
|    learning_rate    | 5e-06    |
|    n_updates        | 43425    |
----------------------------------
{'action_noise': OrnsteinUhlenbeckActionNoise(mu=[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.], sigma=[0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.

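The DDPG agent explores with `OrnsteinUhlenbeckActionNoise`. The printout shows μ = 0 for all 30 action dimensions, and σ = 0.1 for the entries visible before it is cut off. OU noise is temporally correlated: each sample is pulled back toward μ at rate θ and perturbed by Gaussian noise,

$x_{t+1} = x_t + \theta(\mu - x_t)\,dt + \sigma\sqrt{dt}\,\varepsilon_t$, with $\varepsilon_t \sim \mathcal{N}(0, 1)$.

`OUNoise` below is our own illustrative class, not the SB3 implementation. θ = 0.15 and dt = 0.01 are the defaults we understand SB3 to use. They do not appear in the truncated printout, so treat them as assumptions:

```python
import numpy as np


class OUNoise:
    """Illustrative Ornstein-Uhlenbeck process (ASSUMED defaults theta=0.15, dt=0.01)."""

    def __init__(self, mu, sigma, theta: float = 0.15, dt: float = 1e-2, seed=None):
        self.mu = np.asarray(mu, dtype=float)
        self.sigma = np.asarray(sigma, dtype=float)
        self.theta = theta
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self) -> None:
        # Start each episode from zero noise.
        self.state = np.zeros_like(self.mu)

    def __call__(self) -> np.ndarray:
        # Mean-reverting drift toward mu plus scaled Gaussian shock.
        self.state = (
            self.state
            + self.theta * (self.mu - self.state) * self.dt
            + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.mu.shape)
        )
        return self.state


noise = OUNoise(mu=np.zeros(30), sigma=np.full(30, 0.1), seed=0)
path = np.array([noise() for _ in range(5)])  # 5 correlated draws
print(path.shape)  # (5, 30)
```
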
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 700      |
|    time_elapsed       | 25       |
|    total_timesteps    | 3500     |
| train/                |          |
|    entropy_loss       | -42.7    |
|    explained_variance | -2.31    |
|    learning_rate      | 0.0005   |
|    n_updates          | 699      |
|    policy_loss        | -30.3    |
|    std                | 1.01     |
|    value_loss         | 0.62     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 800      |
|    time_elapsed       | 29       |
|    total_timesteps    | 4000     |
| train/                |          |
|    entropy_loss       | -42.7    |
|    explained_variance | 0.445    |
|    learning_rate      | 0.0005   |
|    n_updates          | 799      |
|    policy_loss        | 16.3     |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 2100     |
|    time_elapsed       | 76       |
|    total_timesteps    | 10500    |
| train/                |          |
|    entropy_loss       | -43.1    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 2099     |
|    policy_loss        | 26.1     |
|    std                | 1.02     |
|    value_loss         | 1.67     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 2200     |
|    time_elapsed       | 80       |
|    total_timesteps    | 11000    |
| train/                |          |
|    entropy_loss       | -43.1    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 2199     |
|    policy_loss        | -6.68    |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 3500     |
|    time_elapsed       | 128      |
|    total_timesteps    | 17500    |
| train/                |          |
|    entropy_loss       | -43.7    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 3499     |
|    policy_loss        | -5.07    |
|    std                | 1.04     |
|    value_loss         | 0.977    |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 2.08e+06 |
|    total_cost         | 3.97e+04 |
|    total_reward       | 1.58e+06 |
|    total_reward_pct   | 315      |
|    total_trades       | 44102    |
| time/                 |          |
|    fps                | 136      |
|    iterations         | 3600     |
|    time_elapsed       | 131      |
|    total_timesteps    | 18000    |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 4900     |
|    time_elapsed       | 180      |
|    total_timesteps    | 24500    |
| train/                |          |
|    entropy_loss       | -43.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 4899     |
|    policy_loss        | -35.2    |
|    std                | 1.05     |
|    value_loss         | 1.22     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 5000     |
|    time_elapsed       | 183      |
|    total_timesteps    | 25000    |
| train/                |          |
|    entropy_loss       | -44      |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0005   |
|    n_updates          | 4999     |
|    policy_loss        | 15       |

------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 6300     |
|    time_elapsed       | 230      |
|    total_timesteps    | 31500    |
| train/                |          |
|    entropy_loss       | -44.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 6299     |
|    policy_loss        | -55.8    |
|    std                | 1.07     |
|    value_loss         | 1.84     |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 6400     |
|    time_elapsed       | 234      |
|    total_timesteps    | 32000    |
| train/                |          |
|    entropy_loss       | -44.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 6399     |
|    policy_loss        | -45.2    |

------------------------------------
| environment/          |          |
|    portfolio_value    | 1.22e+06 |
|    total_cost         | 6.3e+04  |
|    total_reward       | 7.18e+05 |
|    total_reward_pct   | 144      |
|    total_trades       | 48013    |
| time/                 |          |
|    fps                | 136      |
|    iterations         | 7700     |
|    time_elapsed       | 282      |
|    total_timesteps    | 38500    |
| train/                |          |
|    entropy_loss       | -45.1    |
|    explained_variance | -0.292   |
|    learning_rate      | 0.0005   |
|    n_updates          | 7699     |
|    policy_loss        | -7.43    |
|    std                | 1.09     |
|    value_loss         | 0.105    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 7800     |
|    time_elapsed       | 286      |
|    total_timesteps    | 39000    |

-------------------------------------
| time/                 |           |
|    fps                | 135       |
|    iterations         | 9100      |
|    time_elapsed       | 334       |
|    total_timesteps    | 45500     |
| train/                |           |
|    entropy_loss       | -45.5     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 9099      |
|    policy_loss        | -13.1     |
|    std                | 1.11      |
|    value_loss         | 0.0977    |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 9200     |
|    time_elapsed       | 338      |
|    total_timesteps    | 46000    |
| train/                |          |
|    entropy_loss       | -45.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 9199     |

-------------------------------------
| time/                 |           |
|    fps                | 135       |
|    iterations         | 10500     |
|    time_elapsed       | 386       |
|    total_timesteps    | 52500     |
| train/                |           |
|    entropy_loss       | -46.3     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 10499     |
|    policy_loss        | -11.2     |
|    std                | 1.14      |
|    value_loss         | 0.702     |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 10600    |
|    time_elapsed       | 390      |
|    total_timesteps    | 53000    |
| train/                |          |
|    entropy_loss       | -46.4    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 10599    |

day: 2957, episode: 20
begin_total_asset: 500000.00
end_total_asset: 1329573.20
total_reward: 829573.20
total_cost: 5463.24
total_trades: 47940
Sharpe: 0.580
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.33e+06 |
|    total_cost         | 5.46e+03 |
|    total_reward       | 8.3e+05  |
|    total_reward_pct   | 166      |
|    total_trades       | 47940    |
| time/                 |          |
|    fps                | 136      |
|    iterations         | 11900    |
|    time_elapsed       | 437      |
|    total_timesteps    | 59500    |
| train/                |          |
|    entropy_loss       | -46.8    |
|    explained_variance | 0.273    |
|    learning_rate      | 0.0005   |
|    n_updates          | 11899    |
|    policy_loss        | 25.6     |
|    std                | 1.15     |
|    value_loss         | 0.333    |
------------------------------------

-------------------------------------
| time/                 |           |
|    fps                | 136       |
|    iterations         | 13300     |
|    time_elapsed       | 488       |
|    total_timesteps    | 66500     |
| train/                |           |
|    entropy_loss       | -47.5     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0005    |
|    n_updates          | 13299     |
|    policy_loss        | -40       |
|    std                | 1.18      |
|    value_loss         | 1.08      |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 136      |
|    iterations         | 13400    |
|    time_elapsed       | 492      |
|    total_timesteps    | 67000    |
| train/                |          |
|    entropy_loss       | -47.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 13399    |

------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 14700    |
|    time_elapsed       | 542      |
|    total_timesteps    | 73500    |
| train/                |          |
|    entropy_loss       | -48      |
|    explained_variance | 0.385    |
|    learning_rate      | 0.0005   |
|    n_updates          | 14699    |
|    policy_loss        | -98.8    |
|    std                | 1.2      |
|    value_loss         | 4.58     |
------------------------------------
day: 2957, episode: 25
begin_total_asset: 500000.00
end_total_asset: 1059293.64
total_reward: 559293.64
total_cost: 20066.99
total_trades: 52146
Sharpe: 0.436
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.06e+06 |
|    total_cost         | 2.01e+04 |
|    total_reward       | 5.59e+05 |
|    total_reward_pct   | 112      |
|    total_trades       | 52146    |

------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 16100    |
|    time_elapsed       | 595      |
|    total_timesteps    | 80500    |
| train/                |          |
|    entropy_loss       | -48.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 16099    |
|    policy_loss        | -8.97    |
|    std                | 1.23     |
|    value_loss         | 0.122    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 16200    |
|    time_elapsed       | 599      |
|    total_timesteps    | 81000    |
| train/                |          |
|    entropy_loss       | -48.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 16199    |
|    policy_loss        | -78.3    |

------------------------------------
| time/                 |          |
|    fps                | 134      |
|    iterations         | 17500    |
|    time_elapsed       | 648      |
|    total_timesteps    | 87500    |
| train/                |          |
|    entropy_loss       | -49.3    |
|    explained_variance | 0.627    |
|    learning_rate      | 0.0005   |
|    n_updates          | 17499    |
|    policy_loss        | -24.9    |
|    std                | 1.26     |
|    value_loss         | 0.386    |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 134      |
|    iterations         | 17600    |
|    time_elapsed       | 651      |
|    total_timesteps    | 88000    |
| train/                |          |
|    entropy_loss       | -49.4    |
|    explained_variance | 0.0698   |
|    learning_rate      | 0.0005   |
|    n_updates          | 17599    |
|    policy_loss        | 28.4     |

------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 18900    |
|    time_elapsed       | 699      |
|    total_timesteps    | 94500    |
| train/                |          |
|    entropy_loss       | -49.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0005   |
|    n_updates          | 18899    |
|    policy_loss        | 30.7     |
|    std                | 1.27     |
|    value_loss         | 0.48     |
------------------------------------
------------------------------------
| environment/          |          |
|    portfolio_value    | 1.46e+06 |
|    total_cost         | 1.89e+04 |
|    total_reward       | 9.61e+05 |
|    total_reward_pct   | 192      |
|    total_trades       | 53510    |
| time/                 |          |
|    fps                | 135      |
|    iterations         | 19000    |
|    time_elapsed       | 703      |
|    total_timesteps    | 95000    |

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 3.67e+05    |
|    total_cost           | 4.97e+05    |
|    total_reward         | -1.33e+05   |
|    total_reward_pct     | -26.6       |
|    total_trades         | 80083       |
| time/                   |             |
|    fps                  | 140         |
|    iterations           | 3           |
|    time_elapsed         | 43          |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.017783215 |
|    clip_fraction        | 0.215       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.7       |
|    explained_variance   | -0.0219     |
|    learning_rate        | 0.00025     |
|    loss                 | 0.16        |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.0433     |
|    std                  | 1.01        |
|    value_loss           | 1.28  

-----------------------------------------
| environment/            |             |
|    portfolio_value      | 4.65e+05    |
|    total_cost           | 4.72e+05    |
|    total_reward         | -3.55e+04   |
|    total_reward_pct     | -7.1        |
|    total_trades         | 78471       |
| time/                   |             |
|    fps                  | 141         |
|    iterations           | 12          |
|    time_elapsed         | 174         |
|    total_timesteps      | 24576       |
| train/                  |             |
|    approx_kl            | 0.030765755 |
|    clip_fraction        | 0.304       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.3       |
|    explained_variance   | 0.109       |
|    learning_rate        | 0.00025     |
|    loss                 | -0.11       |
|    n_updates            | 110         |
|    policy_gradient_loss | -0.0383     |
|    std                  | 1.03        |
|    value_loss           | 1.02  

----------------------------------------
| environment/            |            |
|    portfolio_value      | 8.29e+05   |
|    total_cost           | 4.61e+05   |
|    total_reward         | 3.29e+05   |
|    total_reward_pct     | 65.8       |
|    total_trades         | 77597      |
| time/                   |            |
|    fps                  | 142        |
|    iterations           | 21         |
|    time_elapsed         | 302        |
|    total_timesteps      | 43008      |
| train/                  |            |
|    approx_kl            | 0.03207866 |
|    clip_fraction        | 0.323      |
|    clip_range           | 0.2        |
|    entropy_loss         | -43.9      |
|    explained_variance   | -0.0819    |
|    learning_rate        | 0.00025    |
|    loss                 | 0.106      |
|    n_updates            | 200        |
|    policy_gradient_loss | -0.0384    |
|    std                  | 1.05       |
|    value_loss           | 1.6        |
----------------

In [82]:
df_summary

|   | Iter | Val Start | Val End | Model Used | A2C Sharpe | PPO Sharpe | DDPG Sharpe |
|---|------|-----------|---------|------------|------------|------------|-------------|
| 0 | 126 | 2016-01-04 | 2016-04-05 | PPO | 0.127394 | 0.303177 | 0.264977 |
| 1 | 189 | 2016-04-05 | 2016-07-05 | A2C | 0.12121 | -0.002137 | 0.070818 |
| 2 | 252 | 2016-07-05 | 2016-10-03 | DDPG | -0.042202 | -0.045131 | 0.151223 |
| 3 | 315 | 2016-10-03 | 2017-01-03 | DDPG | 0.573297 | 0.566399 | 0.705078 |
| 4 | 378 | 2017-01-03 | 2017-04-04 | DDPG | -0.122285 | -0.138578 | -0.031246 |
| 5 | 441 | 2017-04-04 | 2017-07-05 | DDPG | 0.150122 | 0.098266 | 0.268507 |
| 6 | 504 | 2017-07-05 | 2017-10-03 | A2C | 0.30975 | 0.240212 | 0.113961 |
| 7 | 567 | 2017-10-03 | 2018-01-03 | A2C | 0.832907 | 0.487907 | 0.633911 |
| 8 | 630 | 2018-01-03 | 2018-04-05 | A2C | -0.038953 | -0.120097 | -0.115019 |
| 9 | 693 | 2018-04-05 | 2018-07-05 | DDPG | -0.208591 | -0.177387 | 0.055126 |

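In every row of `df_summary`, `Model Used` is the agent with the highest validation Sharpe ratio among A2C, PPO and DDPG, which is the selection rule of the ensemble strategy. `Iter` also advances by 63 trading days (roughly one quarter) per validation window. The pandas sketch below re-derives the `Model Used` column from the logged Sharpe values. It reproduces the rule on this table and is not FinRL's own code. It builds a separate `summary` frame so the notebook's `df_summary` is left untouched:

```python
import pandas as pd

# Values copied from the df_summary output of this notebook.
summary = pd.DataFrame({
    "Iter": [126, 189, 252, 315, 378, 441, 504, 567, 630, 693],
    "Model Used": ["PPO", "A2C", "DDPG", "DDPG", "DDPG", "DDPG", "A2C", "A2C", "A2C", "DDPG"],
    "A2C Sharpe": [0.127394, 0.12121, -0.042202, 0.573297, -0.122285,
                   0.150122, 0.30975, 0.832907, -0.038953, -0.208591],
    "PPO Sharpe": [0.303177, -0.002137, -0.045131, 0.566399, -0.138578,
                   0.098266, 0.240212, 0.487907, -0.120097, -0.177387],
    "DDPG Sharpe": [0.264977, 0.070818, 0.151223, 0.705078, -0.031246,
                    0.268507, 0.113961, 0.633911, -0.115019, 0.055126],
})

sharpe_cols = ["A2C Sharpe", "PPO Sharpe", "DDPG Sharpe"]
# idxmax(axis=1) gives the column label holding each row's highest Sharpe.
selected = summary[sharpe_cols].idxmax(axis=1).str.replace(" Sharpe", "", regex=False)

print((selected == summary["Model Used"]).all())  # True
print(summary["Iter"].diff().dropna().unique())   # window length in trading days
```
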

In [83]:
del fe

NameError: name 'fe' is not defined

In [84]:
import dill
dill.dump_session('17_year_guardian_sentiment_fixed_fixed_dow.db')

PicklingError: Can't pickle <class 'finrl.preprocessing.preprocessors.FeatureEngineer'>: it's not the same object as finrl.preprocessing.preprocessors.FeatureEngineer

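This `PicklingError` means that the class attached to an existing instance is no longer the object found under its module path. The same error appears again when `ensemble_agent` is pickled. This typically happens after a module is re-imported or reloaded (for example with `importlib.reload` or IPython's autoreload) while old instances are still alive. The notebook does not show which reload happened, so this is a likely cause rather than a confirmed one. The self-contained sketch below reproduces the error with a throwaway module. `demo_reload_mod` and `FeatureEngineerDemo` are hypothetical names:

```python
import pickle
import sys
import types

CLASS_SOURCE = "class FeatureEngineerDemo:\n    pass\n"

# Throwaway module registered in sys.modules so pickle can look the class up by name.
mod = types.ModuleType("demo_reload_mod")
sys.modules["demo_reload_mod"] = mod
exec(CLASS_SOURCE, mod.__dict__)

obj = mod.FeatureEngineerDemo()
pickle.dumps(obj)  # works: demo_reload_mod.FeatureEngineerDemo is still obj's class

# Simulate a reload: the module name now points at a brand-new class object.
exec(CLASS_SOURCE, mod.__dict__)

try:
    pickle.dumps(obj)  # obj still references the old class
    error = None
except pickle.PicklingError as exc:
    error = exc
print(error)

# Remedy: recreate objects after a reload (or restart the kernel) before pickling them.
fresh = mod.FeatureEngineerDemo()
restored = pickle.loads(pickle.dumps(fresh))
print(type(restored) is mod.FeatureEngineerDemo)  # True
```
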
In [None]:
processed_full

In [148]:
import pickle
with open('sentiment_1.pkl', 'wb') as f:
    pickle.dump(processed_full, f)

In [85]:
import pickle
with open('17_year_guardian_sentiment_fixed_fixed_dow_df_summary.pkl', 'wb') as f:
    pickle.dump(df_summary, f)

In [86]:
with open('17_year_guardian_sentiment_fixed_fixed_dow_processed_full.pkl', 'wb') as f:
    pickle.dump(processed_full, f)

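Pickling plain pandas objects, as these cells do, avoids the class-identity problem, because `pandas.DataFrame` is not reloaded along with the FinRL modules. A file written this way can be read back with `pickle.load` or `pandas.read_pickle`. The sketch round-trips a small stand-in DataFrame through a temporary file; the path and data are illustrative only. Pickle files should only be loaded from trusted sources, and they are not guaranteed to load under a different pandas version:

```python
import os
import pickle
import tempfile

import pandas as pd

summary = pd.DataFrame({"Iter": [126, 189], "Model Used": ["PPO", "A2C"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_summary.pkl")  # illustrative file name
    with open(path, "wb") as f:
        pickle.dump(summary, f)      # same pattern as the cells above
    restored = pd.read_pickle(path)  # pandas also reads plain pickle files

print(restored.equals(summary))  # True
```
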
In [92]:
with open('17_year_guardian_sentiment_fixed_fixed_dow_ensemble_agent.pkl', 'wb') as f:
    pickle.dump(ensemble_agent, f)

PicklingError: Can't pickle <class 'finrl.env.env_stocktrading.StockTradingEnv'>: it's not the same object as finrl.env.env_stocktrading.StockTradingEnv
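Pickling `ensemble_agent` fails with the same class-identity error, this time on `StockTradingEnv`. If a Python object must be pickled, one general workaround is to leave the live environment out of its pickled state via `__getstate__` and rebuild the environment after loading. The sketch uses a hypothetical `AgentBundle` class and a stand-in environment that refuses to be pickled. It does not modify FinRL's ensemble agent. For trained SB3 models, the models' own `save()` / `load()` methods are the usual route:

```python
import pickle


class UnpicklableEnv:
    """Stand-in for a live environment object that cannot be pickled."""

    def __reduce__(self):
        raise pickle.PicklingError("environment objects cannot be pickled")


class AgentBundle:
    """Hypothetical container holding results plus a live environment."""

    def __init__(self, summary, env):
        self.summary = summary
        self.env = env

    def __getstate__(self):
        state = self.__dict__.copy()
        state["env"] = None  # drop the environment; rebuild it after loading
        return state


bundle = AgentBundle({"Iter": [126, 189]}, UnpicklableEnv())
restored = pickle.loads(pickle.dumps(bundle))
print(restored.summary, restored.env)  # {'Iter': [126, 189]} None
```
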