Evaluate a trained model #18

Open
dvalbuena1 opened this issue Oct 21, 2022 · 1 comment


@dvalbuena1

Hello!
I have a question.

During the implementation of ARES, did you ever think of implementing a way to evaluate an already trained model, rather than only continuing the training? I don't see this mentioned as a limitation in your paper.
Besides, I am currently trying to implement it myself, but I am stuck on what may be a limitation of Stable Baselines3: the action space has to be changed each time you try to predict an action.

Like this:

import logging
import os

from stable_baselines3 import SAC

logger = logging.getLogger(__name__)

# env and file_path come from the surrounding ARES setup.
env.action_space.high[0] = env.env.ACTION_SPACE
logger.info("Loading policy...")
model = SAC.load(os.path.splitext(file_path)[0], env)

obs = env.reset()
for i in range(10):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)

As you can see, the predict method only receives the observation, not the environment, so modifying the action space beforehand has no effect on the prediction.
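For what it's worth, one thing I have considered but not verified is Stable Baselines3's custom_objects argument to load, which replaces saved attributes such as action_space at load time, combined with clipping the predicted action before env.step. An untested sketch, reusing env, file_path, and env.env.ACTION_SPACE from the snippet above (whether SAC's internal action rescaling actually picks up the new bounds is exactly my open question):

import os
import numpy as np
from stable_baselines3 import SAC

# Replace the action space stored in the checkpoint with the environment's
# current one (custom_objects overrides saved attributes on load).
model = SAC.load(
    os.path.splitext(file_path)[0],
    env,
    custom_objects={"action_space": env.action_space},
)

obs = env.reset()
for _ in range(10):
    action, _states = model.predict(obs)
    # predict() never sees later action-space updates, so clip the action
    # index to the number of actions currently available in the app.
    action[0] = np.clip(action[0], 0, env.env.ACTION_SPACE)
    obs, rewards, dones, info = env.step(action)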

I just wanted to ask whether you, as the developers of this tool, have any ideas or clarifications you could share; I would appreciate it. 😉

@H2SO4T
Owner

H2SO4T commented Oct 23, 2022

Hi!

During the implementation of ARES, did you ever think of implementing a way to evaluate an already trained model, rather than only continuing the training? I don't see this mentioned as a limitation in your paper.

I have not evaluated a trained model, but you can try to do it. However, it would be best to let the testing run for more than one hour so that the learned policy can choose the correct action for the observed state.

Besides, I am currently trying to implement it myself, but I am stuck on what may be a limitation of Stable Baselines3: the action space has to be changed each time you try to predict an action.

In the function env.step() there is an if statement that checks whether the environment can apply the action generated by the NN to the current state of the application under test. Unfortunately, you cannot modify the output dimension of the NN (the action space you are referring to). The only way to always output a "correct" action is to learn an optimal policy, which means training the RL algorithm for hours (or more, since reaching such a policy is neither quick nor guaranteed).
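To make the idea concrete, the check is roughly of this shape (a paraphrased sketch with made-up names, not the actual ARES code):

def step(self, action):
    # The policy may propose an action index that the current screen of the
    # app under test cannot accept; in that case the step is not executed.
    if not self.action_is_applicable(action, self.current_state):
        return self.current_observation, self.invalid_action_reward, False, {}
    return self.perform_action(action)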

Let me know if I can help again!
