execute_action function? #19
I like the idea. I think the current state would need to be an input argument as well. The function could roughly look something like this (without any concerns for memory allocation):

```julia
function execute_action(rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)
    sp = create_state(pomdp)
    o = create_observation(pomdp)
    td = create_transition_distribution(pomdp)
    od = create_observation_distribution(pomdp)
    transition!(td, pomdp, s, a)
    rand!(rng, sp, td)
    observation!(od, pomdp, sp, a)
    rand!(rng, o, od)
    r = reward(pomdp, s, a)
    return (sp, o, r)
end
```

One thing we can do is have a concrete type that has a state, observation, and their distributions pre-allocated in it, but something like that should not live in POMDPs.jl. Maybe this is something that can go in POMDPToolbox?
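For concreteness, a minimal sketch of what such a pre-allocated container might look like. The type name `SimulationBuffer` is hypothetical; the `create_*` functions are the ones from the snippet above, and the syntax follows the Julia of the time:

```julia
# Hypothetical pre-allocated container (name is made up); it bundles the
# scratch objects that execute_action would otherwise allocate on each call.
type SimulationBuffer
    sp  # next state, reused across calls
    o   # observation, reused across calls
    td  # transition distribution
    od  # observation distribution
end

# Allocate everything once up front using the create_* functions above.
SimulationBuffer(pomdp::POMDP) = SimulationBuffer(create_state(pomdp),
                                                  create_observation(pomdp),
                                                  create_transition_distribution(pomdp),
                                                  create_observation_distribution(pomdp))
```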
Yes, you are right about the current state and the rng.
You could have it overwrite a given state and observation:

```julia
execute_action!(sp::State, o::Observation, rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)
```
I agree with Tim.
That seems like too many input arguments to a function to me, especially since they are not optional or keyword arguments. The distributions need to be initialized as well, so I'm not sure we solve the memory allocation problem entirely by passing in the state and observation.
Oh, yeah. I forgot that you need the transition distribution initialized. Maybe keyword it, with the default being a call to the various create functions. Then, if folks want to be memory efficient, they can be... but allow simpler calls if not.
Sounds good to me.
Is everyone ok with the following syntax?

```julia
(r, sp, o) = execute_action!(sp::State, o::Observation,                           # being modified
                             rng::AbstractRNG, pomdp::POMDP, s::State, a::Action; # not modified
                             td=create_transition_distribution(pomdp),
                             od=create_observation_distribution(pomdp))           # optional kwargs
```

A tuple is being returned for the reward value. It's pretty messy, but we can update it later if everyone agrees on this for now.
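As a sketch of how a solver might use this memory-efficient form (the loop, the policy call, and the buffer swap below are illustrative, not from the thread):

```julia
# Illustrative only: pre-allocate the scratch objects once, then let
# execute_action! mutate them on every step instead of allocating.
sp = create_state(pomdp)
o  = create_observation(pomdp)
td = create_transition_distribution(pomdp)
od = create_observation_distribution(pomdp)
s  = create_state(pomdp)  # current state; assumed initialized elsewhere
for step in 1:n_steps
    a = action(policy, s)  # hypothetical policy call
    (r, sp, o) = execute_action!(sp, o, rng, pomdp, s, a; td=td, od=od)
    s, sp = sp, s  # swap buffers: the old state object becomes next scratch space
end
```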
I'm going to reply to this later this morning after taking care of a few things.
What about something like:

```julia
(reward, next_state, observation) = execute_action(rng::AbstractRNG, pomdp::POMDP, state::State, action::Action;
                                                   transition_distribution=create_transition_distribution(pomdp),
                                                   observation_distribution=create_observation_distribution(pomdp),
                                                   next_state=create_state(pomdp),
                                                   next_observation=create_observation(pomdp))
```

Note the lack of !.
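Putting the two proposals together, a sketch of how this keyword-argument version could be implemented, reusing the body of the first snippet in this thread (illustrative, not the final API):

```julia
# Sketch only: keyword arguments default to fresh allocations, so callers who
# want memory efficiency can pass pre-allocated objects instead.
function execute_action(rng::AbstractRNG, pomdp::POMDP, state::State, action::Action;
                        transition_distribution=create_transition_distribution(pomdp),
                        observation_distribution=create_observation_distribution(pomdp),
                        next_state=create_state(pomdp),
                        next_observation=create_observation(pomdp))
    transition!(transition_distribution, pomdp, state, action)        # fill distribution in place
    rand!(rng, next_state, transition_distribution)                   # sample s' into the buffer
    observation!(observation_distribution, pomdp, next_state, action)
    rand!(rng, next_observation, observation_distribution)            # sample o into the buffer
    r = reward(pomdp, state, action)
    return (r, next_state, next_observation)
end
```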
I like Mykel's version better. Both next_state and next_observation would be modified in the function, and the function would still return references to them, correct?
Yep! I kind of like this style.
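To make the aliasing behavior concrete, a hypothetical check using `===` for object identity (variable names are illustrative):

```julia
# Illustrative: the returned next_state is the very object passed in as a
# keyword argument, mutated in place rather than freshly allocated.
sp = create_state(pomdp)
(r, sp_out, o) = execute_action(rng, pomdp, s, a; next_state=sp)
sp_out === sp  # true: sp was modified in place and returned
```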
Works for me. I presume you meant 'next_observation' in the return tuple?
Yep.
I think we should clarify something about this function. If I am interpreting the discussion correctly, execute_action is a convenience function, not part of the problem interface. If anyone is thinking of this as part of the actual interface, i.e. something the problem-writer can define themselves, we should discuss that explicitly. Conceptually, POMDPs.jl can be divided into two parts: the interface that problem-writers implement, and convenience functions* layered on top of it.

*convenient for the solver-writers, not any more convenient for the problem-writer
Yes, it is a convenience function, not part of the interface. It is perhaps better to move this to POMDPToolbox so that folks don't get confused about it being part of the interface, but I'm happy to hear other opinions.
The way I understand POMDPToolbox.jl (and please feel free to correct me if I'm wrong) ...
I think the idea is to keep POMDPs.jl as clean, simple, and pure as possible, without any implementation. That way folks can look at a page of code and see the entire API and not be overwhelmed. POMDPToolbox.jl would be a collection of commonly used utilities and so forth that makes POMDPs.jl easy to work with. I would anticipate that most solvers would import POMDPToolbox.
Ok, if that's the philosophy, I am fine with it. In that case both execute_action and simulate could live in POMDPToolbox.
There is an argument for keeping simulate in POMDPs.jl. I would be happy keeping execute_action in POMDPToolbox.
I was thinking that simulate would be kept abstract in POMDPs.jl, and POMDPToolbox.jl would provide some implementations. You might have different kinds of simulators: some that collect all sorts of diagnostic information, some that make neat displays, and other simulators that are just designed to run quickly. The inspiration comes from RLGlue. See fig. 2 of this paper. You will see the "agent program" (policy), the "experiment program" (simulator), and the "environment program" (pomdp).
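A rough sketch of that split, with hypothetical type names, in the Julia syntax of the time:

```julia
# In POMDPs.jl: only the abstraction, no implementation.
abstract Simulator
# simulate(sim::Simulator, pomdp::POMDP, policy) is declared but left to
# implementations in other packages.

# In POMDPToolbox.jl: concrete simulators for different purposes.
type FastSimulator <: Simulator       # just runs quickly, accumulates reward
    rng::AbstractRNG
    max_steps::Int
end

type HistorySimulator <: Simulator    # collects diagnostic information
    rng::AbstractRNG
    max_steps::Int
    states::Vector{Any}               # trajectory recorded for inspection
end
```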
What exactly do you mean by "keeping simulate abstract"? I would advocate for including a default easy-to-read implementation in POMDPToolbox. My initial thought is that a simulator function that does something other than return the accumulated reward should probably be a separate function. We should really open a different issue for this if we discuss it further.
I think we decided to put this in POMDPToolbox. The simulate issue is separate.
It seems to me that we may want the following function in the API (or something similar):

```julia
obs, reward = execute_action(pomdp, action)
```

Then we can use the observation to update our belief state. This is handy for online solvers that generate one action at a time. Am I missing something in the API that can support this already? simulate(...) is not quite the same thing.
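For context, a hypothetical online-solver loop using this function; initial_belief, select_action, and belief_update are placeholder names, not part of any actual API:

```julia
# Hypothetical: how an online solver might use execute_action, with
# placeholder names for belief updating and action selection.
b = initial_belief(pomdp)
for t in 1:horizon
    action = select_action(solver, pomdp, b)  # plan one action at a time
    obs, reward = execute_action(pomdp, action)
    b = belief_update(pomdp, b, action, obs)  # fold the observation into the belief
end
```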