Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

execute_action function? #19

Closed
ebalaban opened this issue Aug 22, 2015 · 23 comments
Closed

execute_action function? #19

ebalaban opened this issue Aug 22, 2015 · 23 comments

Comments

@ebalaban
Copy link
Contributor

It seems to me that we may want the following function in the API (or something similar):

obs, reward = execute_action (pomdp, action)

Then we can use the observation to update our belief state. This is handy for online solvers that generate one action at a time. Am I missing something in the API that can support this already? Simulate(...) is not quite the same thing.

@etotheipluspi
Copy link
Member

I like the idea. I think the current state would need to be an input argument as well. The function could roughly look something like this (without any concerns for memory allocation):

function execute_action(rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)
    sp = create_state(pomdp)
    o = create_observation(pomdp)
    td = create_transition_distribution(pomdp)
    od = create_observation_distribution(pomdp)
    transition!(td, pomdp, s, a)
    rand!(rng, sp, td)
    observation!(od, pomdp, sp, a)
    rand!(rng, o, od)
    r = reward(pomdp, s, a)
    return (sp, o, r)
end

One thing we can do is have a concrete type that has a state, observation, and their distributions pre-allocated in it, but something like that should not live in POMDPs.jl. Maybe this is something that can go in POMDPToolbox?

@ebalaban
Copy link
Contributor Author

Yes, you are right about the current state and the rng. I had those
maintained within my POMDP structure, but, as we discussed yesterday
with Mykel, that's not the design philosophy that you guys have been
going for.

On 8/22/2015 9:46 AM, Maxim Egorov wrote:

I like the idea. I think the current state would need to be an input
argument as well. The function could roughly look something like this
(without any concerns for memory allocation):

function execute_action(rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)
sp= create_state(pomdp)
o= create_observation(pomdp)
td= create_transition_distribution(pomdp)
od= create_observation_distribution(pomdp)
transition!(td, pomdp, s, a)
rand!(rng, sp, td)
observation!(od, pomdp, sp, a)
rand!(rng, o, od)
r= reward(pomdp, s, a)
return (sp, o, r)
end

One thing we can do is have a concrete type that has a state,
observation, and their distributions pre-allocated in it, but
something like that should not live in POMDPs.jl. Maybe this is
something that can go in POMDPToolbox?


Reply to this email directly or view it on GitHub
#19 (comment).

@tawheeler
Copy link
Contributor

You could have it override a given state and observation:

execute_action!(sp::State, o::Observation, rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)

@mykelk
Copy link
Member

mykelk commented Aug 22, 2015

I agree with Tim.

@etotheipluspi
Copy link
Member

That seems like too many input arguments into a function to me. Especially since they are not optional or key worded. The distributions need to be initialized as well, so I'm not quite sure we solve the memory allocation problem entirely by passing in the state and observation.

@mykelk
Copy link
Member

mykelk commented Aug 22, 2015

Oh, yeah. I forgot that you need the transition distribution initialized. Maybe keyword it with the default being a call to the various create functions. Then, if folks want to be memory efficient, they can be... but allow simpler calls if not.

@ebalaban
Copy link
Contributor Author

Sounds good to me.

On 8/22/2015 4:40 PM, Mykel Kochenderfer wrote:

Oh, yeah. I forgot that you need the transition distribution
initialized. Maybe keyword it with the default being a call to the
various create functions. Then, if folks want to be memory efficient,
they can be... but allow simpler calls if not.


Reply to this email directly or view it on GitHub
#19 (comment).

@etotheipluspi
Copy link
Member

Is everyone ok with the following syntax:

(r, sp, o) = execute_action!(sp::State, o::Observation, # being modified
                             rng::AbstractRNG, pomdp::POMDP, s::State, a::Action; # not modified
                             td=create_transition_distribution(pomdp), od=create_observation_distribution(pomdp) # optional kargs 
                             )

A tuple is being returned for the reward value. It's pretty messy, but we can update it later if everyone agrees on this for now.

@zsunberg
Copy link
Member

I'm going to reply to this later this morning after taking care of a few
other things (you can wait for my input or just go ahead)

On Wed, Aug 26, 2015, 09:32 Maxim Egorov notifications@github.com wrote:

Is everyone ok with the following syntax:

(r, sp, o) = execute_action!(sp::State, o::Observation, # being modified
rng::AbstractRNG, pomdp::POMDP, s::State, a::Action; # not modified
td=create_transition_distribution(pomdp), od=create_observation_distribution # optional kargs
)

A tuple is being returned for the reward value. It's pretty messy, but we
can update it later if everyone agrees on this for now.


Reply to this email directly or view it on GitHub
#19 (comment).

@mykelk
Copy link
Member

mykelk commented Aug 26, 2015

What about something like:

(reward, next_state, observation) = execute_action(rng::AbstractRNG, pomdp::POMDP, state::State, action::Action;
transistion_distribution=create_transition_distribution(pomdp), 
observation_distribution=create_observation_distribution(pomdp),
next_state=create_state(pomdp),
next_observation=create_observation(pomdp)
)

Note the lack of !.

@etotheipluspi
Copy link
Member

I like Mykel's version better. Both next_state and next_observation would be modified in the function, and the function would still return pointers to them correct?

@mykelk
Copy link
Member

mykelk commented Aug 26, 2015

Yep! I kind of like this style.

@ebalaban
Copy link
Contributor Author

Works for me. I presume you meant 'next_observation' in the return tuple?

On 8/26/2015 11:02 AM, Mykel Kochenderfer wrote:

Yep! I kind of like this style.


Reply to this email directly or view it on GitHub
#19 (comment).

@mykelk
Copy link
Member

mykelk commented Aug 26, 2015

Yep.

@zsunberg
Copy link
Member

I think we should clarify something about this function. If I am interpreting execute_action correctly, it is simply a convenience* function that will act as a generative model for a pomdp defined by the standard interface**, and the implementation of it will be included in POMDPs.jl. Am I correct to think this?

If anyone is thinking of this as part of the actual interface, i.e. the problem-writer can define execute_action instead of transition!, observation!, etc., we should have a longer discussion about it. This parallel interface is exactly what I proposed to Mykel at Graphex, and we sort of decided that it wouldn't be worth it.

Conceptually, POMDPs.jl can be divided into two parts:

  1. the standard interface, which consists of the functions that the problem-writer uses to define the problem, and
  2. convenience functions, simulate and execute_action.
    I think we should make this distinction clear in the documentation, or we should put the convenience functions in POMDPToolbox.

*convenient for the solver-writers, not any more convenient for the problem-writer
**by "standard interface" I mean reward, transition!, observation!, etc.

@mykelk
Copy link
Member

mykelk commented Aug 26, 2015

Yes, it is a convenience function, not part of the interface. It is perhaps better to move this to POMDPToolbox so that folks don't get confused about it being part of the interface, but I'm happy to hear other opinions.

@ebalaban
Copy link
Contributor Author

The way I understand POMDPToolbox.jl (and please feel free to correct me
if I am wrong) is that it is a container for the more high-level (and
more optional) POMDP-related tools (various interpolants, belief
updaters, etc). execute_action and simulate are more "fundamental", for
the lack of a better term, as almost every POMDP can take advantage of
them. My gut feeling is that POMDPs.jl is a better place for these
functions, but I am ok with them being moved to POMDPToolbox.jl as well.

On 8/26/2015 12:28 PM, Mykel Kochenderfer wrote:

Yes, it is a convenience function, not part of the interface. It is
perhaps better to move this to POMDPToolbox so that folks don't get
confused about it being part of the interface, but I'm happy to hear
other opinions.


Reply to this email directly or view it on GitHub
#19 (comment).

@mykelk
Copy link
Member

mykelk commented Aug 26, 2015

I think the idea is to keep POMDPs.jl as clean, simple, and pure as possible without any implementation. That way folks can look at a page of code and see the entire API and not be overwhelmed. POMDPToolbox.jl would be a collection of commonly used utilities and so forth that makes POMDPs.jl easy to work with. I would anticipate that most solvers would import POMDPToolbox.

@ebalaban
Copy link
Contributor Author

Ok, if that's the philosophy, I am fine with it. In that case both
execute_action and simulate should be moved to POMDPToolbox.

On 8/26/2015 1:56 PM, Mykel Kochenderfer wrote:

I think the idea is to keep POMDPs.jl as clean, simple, and pure as
possible without any implementation. That way folks can look at a page
of code and see the entire API and not be overwhelmed. POMDPToolbox.jl
would be a collection of commonly used utilities and so forth that
makes POMDPs.jl easy to work with. I would anticipate that most
solvers would import POMDPToolbox.


Reply to this email directly or view it on GitHub
#19 (comment).

@zsunberg
Copy link
Member

There is an argument for keeping simulate() in POMDPs.jl as an exception to the rule - it serves as documentation. If someone wants to see how the interface functions are used, they can look at simulate(). Sometimes concise code is a good thing to augment text documentation (maybe I'm biased because I proposed writing simulate though :)).

I would be happy keeping simulate here or moving it and saying "refer to..." in the POMDPs.jl documentation.

@mykelk
Copy link
Member

mykelk commented Aug 26, 2015

I was thinking that simulate would be kept abstract in POMDPs.jl, and POMDPToolbox.jl would provide some implementations. You might have different kinds of simulators---some that collect all sorts of diagnostic information, some that make neat displays, and other simulators that are just designed to run quickly.

The inspiration comes from RLGlue. See fig. 2 of this paper. You will see the "agent program" (policy), the "experiment program" (simulator), and the "environment program" (pomdp).

@zsunberg
Copy link
Member

What exactly do you mean by "keeping simulate abstract"? Do you mean to
have it print an error like the other functions in POMDPs.jl now do by
default? Would problem-writers often want to redefine simulate for a
specific pomdp?

I would advocate for including a default easy-to-read implementation in
POMDPs.jl like we have now for clarity on what simulate should do. It
might be hard to unambiguously describe what simulate should do in text.

My initial thought is that a simulator function that does something other
than just simulate (like creating a display) should have a different name,
especially if simulate is meant to be used inside solvers.

We should really open a different issue for this if we discuss it further.

On Wed, Aug 26, 2015 at 2:38 PM Mykel Kochenderfer notifications@github.com
wrote:

I was thinking that simulate would be kept abstract in POMDPs.jl, and
POMDPToolbox.jl would provide some implementations. You might have
different kinds of simulators---some that collect all sorts of diagnostic
information, some that make neat displays, and other simulators that are
just designed to run quickly.

The inspiration comes from RLGlue. See fig. 2 of this paper
http://www.jmlr.org/papers/volume10/tanner09a/tanner09a.pdf. You will
see the "agent program" (policy), the "experiment program" (simulator), and
the "environment program" (pomdp).


Reply to this email directly or view it on GitHub
#19 (comment).

@zsunberg
Copy link
Member

zsunberg commented Sep 5, 2015

I think we decided to put this in POMDPToolbox. the simulate issue is separate.

@zsunberg zsunberg closed this as completed Sep 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants