
Error encountered on 'Unreal Stacked Lstm example' #80

Closed · JaCoderX opened this issue Nov 21, 2018 · 14 comments

@JaCoderX (Contributor) commented Nov 21, 2018

I encountered an error and I'm trying to figure out whether it is a wrong setting on my end or a possible bug.

I took the 'Unreal Stacked Lstm example' and set the trainer to work with PPO.
In the strategy params I made the following changes:

MyCerebro.addstrategy(
    ...
    skip_frame=60,  # skip_frame_period <= avg_period <= time_embedding_period
    time_dim=128,
    avg_period=100,
    ...

I'm experimenting with skip_frame and time_dim to compare two models trained on different time frames, with the following settings:

  • First model -
    one year of data at 1 min resolution, skip_frame ~ one hour (60 frames), time_dim ~ 2 hours (or above)

  • Second model -
    one year of data at 1 hour resolution, skip_frame ~ one hour (1 frame), time_dim ~ 120 hours (or above)

With these settings the models make decisions at the same times (every hour), but after learning two different representations of the same data, as sketched below.
I'm curious to see whether the '1 min model' can also learn the representation of the '1 hour model'.
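Roughly, the two configurations I have in mind (just my intended settings written out as plain values for clarity, not actual BTgym config):

# Both models end up acting roughly once per hour of market time,
# but each sees a different representation of the same year of data:
model_1_min = dict(data_resolution='1 min', skip_frame=60, time_dim=128)    # ~2+ hours of embedding
model_1_hour = dict(data_resolution='1 hour', skip_frame=1, time_dim=120)   # ~120 hours of embedding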

I only get this error after changing to the above values (there is no error with the default values: skip_frame=10, time_dim=30, avg_period=20).
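Both sets of values satisfy the ordering constraint from the strategy comment (skip_frame <= avg_period <= time_dim), so that constraint itself isn't what I'm violating; a quick check:

# skip_frame_period <= avg_period <= time_embedding_period must hold for both settings:
for name, (skip_frame, avg_period, time_dim) in {
    'default': (10, 20, 30),
    'changed': (60, 100, 128),
}.items():
    assert skip_frame <= avg_period <= time_dim, name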

This is the error:

Traceback (most recent call last):
File "/home/jack/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
return fn(*args)
File "/home/jack/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/jack/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: PPO/model/advantage
[[{{node PPO/model/advantage}} = HistogramSummary[T=DT_FLOAT, _device="/job:worker/replica:0/task:0/device:CPU:0"](PPO/model/advantage/tag, _recv_PPO/PPO/on_policy_advantage_pl_0)]]
[[{{node PPO/clip_by_global_norm/mul_47_S555}} = _Recvclient_terminated=false, recv_device="/job:ps/replica:0/task:0/device:CPU:0", send_device="/job:worker/replica:0/task:0/device:CPU:0", send_device_incarnation=-1203984878618912631, tensor_name="edge_4873_PPO/clip_by_global_norm/mul_47", tensor_type=DT_FLOAT, _device="/job:ps/replica:0/task:0/device:CPU:0"]]

@Kismuz (Owner) commented Nov 21, 2018

@JacobHanouna ,

  • The log says the advantage estimation graph returned NaN, which has been passed to the freaky, NaN-intolerant tf.summary.histogram:
    "InvalidArgumentError (see above for traceback): Nan in summary histogram for: PPO/model/advantage".
  • It's unclear what caused the NaN; I'd recommend trying the A3C class with the same settings to check whether the error persists.
  • While playing with env. parameters it is good practice to run the environment manually a couple of times before launching the TF cluster; some inconsistent behaviour can be spotted right away. I do it so often that I wrote a very simple wrapper to collect data; it can be used like this:
from btgym import BTGymEnv  # assumed import; the original snippet had a placeholder import here
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

from btgym.research.misc_utils import EnvRunner

env = BTGymEnv(**my_env_config_kwargs)
data_provider = EnvRunner(env=env)

# Get episode and prepare data,
# here - from train dataset, set sample_type=1 to get test:

obs = data_provider.get_episode(sample_type=0)

image = data_provider.env.render('episode')

data_provider.close()

# Get stream of external observations: 
data = np.concatenate(obs['external'], axis=1)

# See what data looks like for an agent:
plt.figure(num=1, figsize=(14, 8))
plt.title('External data:')
plt.grid(True)
_ = plt.plot(data[-1, :, :])

# Show rendered episode:
plt.figure(num=2, figsize=(22, 30))
plt.title('Episode summary:')
_ = plt.imshow(image)

# See rewards closeup:
r = np.asarray(obs['reward'])
plt.figure(num=3, figsize=(14, 8))
plt.title('Reward:')
_ = plt.plot(r)
plt.grid(True)

JaCoderX changed the title from "Error encountered when using PPO on 'Unreal Stacked Lstm example'" to "Error encountered on 'Unreal Stacked Lstm example'" on Nov 22, 2018
@JaCoderX (Contributor, Author):

While playing with env. parameters it is good practice to run the environment manually a couple of times before launching the TF cluster; some inconsistent behaviour can be spotted right away. I do it so often that I wrote a very simple wrapper to collect data; it can be used like this:

Thanks for the tip. I will use it :)

It's unclear what caused the NaN; I'd recommend trying the A3C class with the same settings to check whether the error persists.

OK, you are right, this is not PPO-related. I get the same error when using the BaseAAC trainer with StackedLstmPolicy as well.

I also tried using BaseAacPolicy, which raises a different error (let me know if you need the full traceback):

INFO:tensorflow:Restoring parameters from /home/jack/tmp/test/train/model.ckpt-0
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key AAC/global/conv2d/_layer_1/W/Adam not found in checkpoint
	 [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:ps/replica:0/task:0/device:CPU:0"](_recv_save/Const_0_S1, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]] 

All of these errors happen only if I change to the above settings in the strategy.

@Kismuz (Owner) commented Nov 23, 2018

@JacobHanouna, as the traceback says, it failed to load saved model weights into the new graph; this usually appears when you have changed the tf.graph definition (e.g. switched from PPO to A3C or made some other alterations to the graph) and then tried to load a previously saved model. Discard old checkpoints and start from scratch, e.g. as in the sketch below.
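A minimal way to do that (the path here is just the log directory visible in the traceback above; adjust it to whatever log_dir your launcher uses):

import shutil

# Wipe stale checkpoints/summaries so training starts from a freshly initialized graph:
shutil.rmtree('/home/jack/tmp/test', ignore_errors=True)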

@JaCoderX (Contributor, Author):

@Kismuz
This was tested on a clean model (I made sure to delete the result folder after every test).

@Kismuz (Owner) commented Nov 23, 2018

What is it trying to restore then? Have you got a clean manual env. run with the strategy settings altered your way?

  • Another option for the orig. error: switch back to PPO and alter the BaseAAC method _combine_summaries - comment out the following line:
                         tf.summary.histogram('advantage', self.local_network.on_pi_adv_target),

@JaCoderX (Contributor, Author):

@JacobHanouna, as the traceback says, it failed to load saved model weights into the new graph; this usually appears when you have changed the tf.graph definition (e.g. switched from PPO to A3C or made some other alterations to the graph) and then tried to load a previously saved model. Discard old checkpoints and start from scratch.

You were right, I probably didn't clear that result. Using BaseAacPolicy gives the same error as the original.

from btgym.research.misc_utils import EnvRunner

Can you share EnvRunner as well (if it is not private, of course :) )?

Another option for the orig. error: switch back to PPO and alter the BaseAAC method _combine_summaries - comment out the following line

After commenting out this line I get a different error:

InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had NaN values
[[{{node AAC/VerifyFinite/CheckNumerics}} = CheckNumerics[T=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:worker/replica:0/task:0/device:CPU:0"](AAC/global_norm/global_norm)]]
INFO:tensorflow:Error reported to Coordinator: <class 'RuntimeError'>, process() exception occurred

@Kismuz (Owner) commented Nov 23, 2018

can you share EnvRunner

it's here, update BTgym:

git pull
pip install --upgrade -e .

Found Inf or NaN global norm. : Tensor had NaN values

Well, time to manually check the environment run.
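One cheap thing to look at on a manually collected episode is whether any observation or reward already contains NaN/Inf before it ever reaches the trainer; a minimal sketch on top of the EnvRunner snippet from above:

import numpy as np

# `obs` is the dict returned by data_provider.get_episode(...) in the earlier snippet:
external = np.concatenate(obs['external'], axis=1)
rewards = np.asarray(obs['reward'])

print('external observations finite:', np.isfinite(external).all())
print('rewards finite:', np.isfinite(rewards).all())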

@JaCoderX (Contributor, Author):

I ran it manually but got no error.

@Kismuz (Owner) commented Nov 24, 2018

@JacobHanouna, can you share the exact env. setup code so I can replicate the error?

@JaCoderX (Contributor, Author):

All env settings are the ones used in the 'unreal example'; I only changed the strategy a bit, as follows (skip_frame, time_dim, avg_period):

# Define strategy and broker account parameters:
MyCerebro.addstrategy(
    DevStrat_4_11,
    start_cash=2000,  # initial broker cash
    commission=0.0001,  # commission to imitate spread
    leverage=10.0,
    order_size=2000,  # fixed stake, mind leverage
    drawdown_call=10,  # max % to lose, in percent of initial cash
    target_call=10,  # max % to win, same
    skip_frame=60,  # skip_frame_period <= avg_period <= time_embedding_period
    time_dim=128,
    avg_period=100,
    gamma=0.99,
    reward_scale=7,  # gradient's nitrox, touch with care!
    state_ext_scale=np.linspace(3e3, 1e3, num=5)
)

@Kismuz (Owner) commented Nov 24, 2018

@JacobHanouna,
Well, there is a caveat with setting time_dim and avg_period. Namely, the BTgym API shell needs to infer the observation state shape before an actual instance of the class DevStrat_4_11 is created; that means (taking into account the backtrader paradigm that strategy instances are created at actual backtest runtime) the observation state shape and all variables it depends upon have to be class attributes, and therefore cannot be set via the parameters dictionary (and, consequently, via the addstrategy() method).
So, redefining skip_frame is fine, but for the others mentioned there is no easier way than to subclass the strategy and explicitly set all class attributes; in your case it could look like this:

import numpy as np

from gym import spaces
from btgym import DictSpace
from btgym.research import DevStrat_4_11  # assumed import path for the base strategy

class DevStrat_4_11_prime(DevStrat_4_11):
    time_dim = 128  
    skip_frame = 60
    avg_period = 100
    portfolio_actions = ('hold', 'buy', 'sell', 'close')
    gamma = 0.99  
    state_ext_scale = np.linspace(3e3, 1e3, num=5)
    params = dict(
        # Note: fake `Width` dimension to use 2d conv etc.:
        state_shape=
        {
            'external': spaces.Box(low=-100, high=100, shape=(time_dim, 1, 5), dtype=np.float32),
            'internal': spaces.Box(low=-2, high=2, shape=(avg_period, 1, 6), dtype=np.float32),
            'metadata': DictSpace(
                {
                    'type': spaces.Box(
                        shape=(),
                        low=0,
                        high=1,
                        dtype=np.uint32
                    ),
                    'trial_num': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'trial_type': spaces.Box(
                        shape=(),
                        low=0,
                        high=1,
                        dtype=np.uint32
                    ),
                    'sample_num': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'first_row': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'timestamp': spaces.Box(
                        shape=(),
                        low=0,
                        high=np.finfo(np.float64).max,
                        dtype=np.float64
                    ),
                }
            )
        },
        cash_name='default_cash',
        asset_names=['default_asset'],
        start_cash=None,
        commission=None,
        leverage=1.0,
        drawdown_call=5,
        target_call=19,
        portfolio_actions=portfolio_actions,
        initial_action=None,
        initial_portfolio_action=None,
        skip_frame=skip_frame,
        gamma=gamma,
        reward_scale=1.0,
        state_ext_scale=state_ext_scale,  # EURUSD
        state_int_scale=1.0,
        metadata={},
    )
...............
MyCerebro.addstrategy(
    DevStrat_4_11_prime,
    start_cash=2000,  
    commission=0.0001,  
    leverage=10.0,
    order_size=2000,  
    drawdown_call=10, 
    target_call=10,  
    skip_frame=60, 
    gamma=0.99,
    reward_scale=7, 
)

Ugly, I know; I'll try to address it in the future.

@JaCoderX (Contributor, Author):

OK Thanks :)

@JaCoderX (Contributor, Author) commented Dec 8, 2018

@Kismuz, I came across another issue while playing around with this example.

Under the trainer_config I have enabled use_value_replay=True and got the following error:

File "/home/jack/btgym/btgym/algorithms/policy/base.py", line 373, in get_pc_target
feeder = {self.pc_change_state_in: state['external'], self.pc_change_last_state_in: last_state['external']}
AttributeError: 'AacStackedRL2Policy' object has no attribute 'pc_change_state_in'

I checked the code, and in StackedLstmPolicy you have disabled Aux task 1 - Pixel Control - so pc_change_state_in never gets declared... but it appears that the base class is expecting it.
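A simplified illustration of the mismatch (hypothetical class names, not the actual BTgym code):

class BasePolicyLike:
    def get_pc_target(self, state, last_state):
        # The base implementation unconditionally references the pixel-control placeholders:
        return {
            self.pc_change_state_in: state['external'],
            self.pc_change_last_state_in: last_state['external'],
        }

class StackedLstmLikePolicy(BasePolicyLike):
    def __init__(self):
        # Pixel-control aux task disabled -> pc_change_state_in is never created.
        pass

policy = StackedLstmLikePolicy()
try:
    policy.get_pc_target({'external': 0}, {'external': 0})
except AttributeError as e:
    print(e)  # ... object has no attribute 'pc_change_state_in'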

@Kismuz (Owner) commented Dec 9, 2018

@JacobHanouna,
Fixed, update the package; thanks for spotting it.
