Evaluate a trained model #18

Open
dvalbuena1 opened this issue Oct 21, 2022 · 1 comment


@dvalbuena1

Hello!
I have a question.

During the implementation of ARES, did you ever think of implementing a way to evaluate an already trained model, rather than only continuing the training? I don't see this mentioned as a limitation in your paper.
Besides, I am currently trying to implement it myself, but I am stuck on what may be a limitation of Stable Baselines3: the action space has to be changed each time you try to predict an action.

Like this:

import logging
import os

from stable_baselines3 import SAC

logger = logging.getLogger(__name__)

# env and file_path come from the surrounding ARES setup.
env.action_space.high[0] = env.env.ACTION_SPACE
logger.info("Loading policy...")
model = SAC.load(os.path.splitext(file_path)[0], env)

obs = env.reset()
for i in range(10):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)

As you can see, the predict method only receives the observation, not the environment, so modifying the action space beforehand has no effect on the prediction.
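For what it's worth, one thing I have considered but not verified is Stable Baselines3's custom_objects argument to load, which replaces saved attributes such as action_space at load time, combined with clipping the predicted action before env.step. An untested sketch, reusing env, file_path, and env.env.ACTION_SPACE from the snippet above (whether SAC's internal action rescaling actually picks up the new bounds is exactly my open question):

import os
import numpy as np
from stable_baselines3 import SAC

# Replace the action space stored in the checkpoint with the environment's
# current one (custom_objects overrides saved attributes on load).
model = SAC.load(
    os.path.splitext(file_path)[0],
    env,
    custom_objects={"action_space": env.action_space},
)

obs = env.reset()
for _ in range(10):
    action, _states = model.predict(obs)
    # predict() never sees later action-space updates, so clip the action
    # index to the number of actions currently available in the app.
    action[0] = np.clip(action[0], 0, env.env.ACTION_SPACE)
    obs, rewards, dones, info = env.step(action)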

I just wanted to ask whether you, as the developers of this tool, have any ideas or clarifications you could share; I would appreciate it. 😉

@H2SO4T
Owner

H2SO4T commented Oct 23, 2022

Hi!

During the implementation of ARES, did you ever think of implementing a way to evaluate an already trained model, rather than only continuing the training? I don't see this mentioned as a limitation in your paper.

I have not evaluated a trained model, but you can try to do it. However, it would be best to let the testing run for more than one hour so that the learned policy can choose the correct action for the observed state.

Besides, I am currently trying to implement it myself, but I am stuck on what may be a limitation of Stable Baselines3: the action space has to be changed each time you try to predict an action.

In the function env.step() there is an if statement that checks whether the environment can apply the action generated by the NN to the current state of the application under test. Unfortunately, you cannot modify the output dimension of the NN (the action space you are referring to). The only way to always output a "correct" action is to learn an optimal policy, which means training the RL algorithm for hours (or more, since reaching such a policy is neither quick nor guaranteed).
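To make the idea concrete, the check is roughly of this shape (a paraphrased sketch with made-up names, not the actual ARES code):

def step(self, action):
    # The policy may propose an action index that the current screen of the
    # app under test cannot accept; in that case the step is not executed.
    if not self.action_is_applicable(action, self.current_state):
        return self.current_observation, self.invalid_action_reward, False, {}
    return self.perform_action(action)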

Let me know if I can help again!
