Gymnasium support #40
Conversation
Hi @Alex2782, is this PR ready to review?
I already changed your 'README.ipynb'. I find Jupyter Notebook terrible; I can no longer track what I have customized and what changes are still needed.
You could use VSCode for editing the notebooks. It highlights the changes and provides some other useful features.
I'll check the render function.
How about the GitHub changes page here?
It seems that Gymnasium doesn't allow passing additional args to the render method. I managed to fix it with the code below; however, I don't know whether it is a correct solution under the Gymnasium rules.

```python
import random

import numpy as np
import torch
import gymnasium as gym
from stable_baselines3 import A2C

from gym_mtsim import (
    Timeframe, SymbolInfo,
    MtSimulator, OrderType, Order, SymbolNotFound, OrderNotFound,
    MtEnv,
    FOREX_DATA_PATH, STOCKS_DATA_PATH, CRYPTO_DATA_PATH, MIXED_DATA_PATH,
)

env_name = 'forex-hedge-v0'
env = gym.make(env_name)

# Seed everything so training and testing are reproducible
seed = 42
env.reset(seed=seed)
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)

model = A2C('MultiInputPolicy', env, verbose=0)
model.learn(total_timesteps=10000)

# Run one episode with the trained policy
observation, info = env.reset(seed=seed)
while True:
    action, _states = model.predict(observation)
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    if done:
        break

# Bypass the wrapper stack to pass custom args to render()
env.unwrapped.render('advanced_figure', time_format='%Y-%m-%d')
```
I think that is okay: https://gymnasium.farama.org/api/env/#gymnasium.Env.render
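For context: in the standard Gymnasium API the render mode is chosen once at construction time and `render()` takes no per-call arguments, which is why custom arguments have to go through `env.unwrapped`. A minimal sketch using the built-in CartPole env:

```python
import gymnasium as gym

# The render mode is fixed when the env is created, not per render() call
env = gym.make('CartPole-v1', render_mode='rgb_array')
env.reset(seed=42)

# render() takes no arguments in the standard API; here it returns an RGB frame
frame = env.render()
print(frame.shape)  # e.g. (400, 600, 3)
```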
Great! Let's continue with this then. Thanks @Alex2782 for checking the docs.
Hi @Alex2782
Hi @AminHP, which changes exactly are still necessary? Only adjusting the examples?
Yes, the examples should probably use the changes we discussed in the code above. It is also better to have them in ipynb format, like the one in anytrading.
> `env.unwrapped.render`

Hi @AminHP, I have checked and adjusted your file 'README.ipynb'. Note: without the first change, this error occurs; it may also be due to my Python or NumPy version.
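The actual traceback is not preserved in the thread, but it is presumably the wrapper stack rejecting the extra render arguments. A sketch of the failure mode I would expect, assuming the env id and render args from the code earlier in the thread:

```python
import gymnasium as gym
import gym_mtsim  # importing registers the MtSim envs
'forex-hedge-v0'
env = gym.make('forex-hedge-v0')
env.reset(seed=42)

# The default Gymnasium wrappers define render() without parameters,
# so passing custom args through the wrapped env raises a TypeError.
try:
    env.render('advanced_figure', time_format='%Y-%m-%d')
except TypeError as e:
    print('wrapped render failed:', e)

# Going through unwrapped reaches MtEnv.render, which accepts these args.
env.unwrapped.render('advanced_figure', time_format='%Y-%m-%d')
```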
I think the documentation would also have to be adapted; for example, there is still the file … My example: …
Thanks for the changes. I will update and revise the docs later. The remaining change is providing the examples in ipynb format.
Any updates, @Alex2782?
Should I remove the file 'train_SB3_gymnasium.py'? I think 'README.ipynb' is enough.
No, there is no need to remove the example. I will try to change it to an ipynb version. The example is beneficial as it compares A2C and PPO.
@AminHP, I also had 'train_SB3_gymnasium.py' in the 'gym-anytrading' project to test Gymnasium support.
Hello ✋🏿, any updates on the merge of this feature?
I have finally fixed the minor issues, updated the examples, and merged the PR :))
Thanks guys, appreciate the effort 👍🏿
The render() function is not yet compatible with SB3 + Gymnasium (DLR-RM/stable-baselines3#1327); a possible workaround is sketched after these notes.
Training works with A2C, PPO, RecurrentPPO, and TRPO (added `gym-mtsim/examples/train_SB3_gymnasium.py`).
Figure: random actions vs. SB3 agents at [50K, 250K, 500K] learning timesteps.
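Until DLR-RM/stable-baselines3#1327 is resolved, a possible workaround (my sketch, not part of this PR) is to reach the underlying MtEnv through SB3's vectorized wrapper and call its render directly:

```python
import gymnasium as gym
import gym_mtsim  # importing registers the MtSim envs
from stable_baselines3 import A2C

env = gym.make('forex-hedge-v0')
model = A2C('MultiInputPolicy', env, verbose=0)
model.learn(total_timesteps=1000)

# SB3 wraps the env in a DummyVecEnv; index into it and unwrap to
# reach MtEnv.render, which accepts the custom plotting arguments.
vec_env = model.get_env()
vec_env.envs[0].unwrapped.render('advanced_figure', time_format='%Y-%m-%d')
```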