execute_action function? #19

ebalaban · 2015-08-22T01:30:40Z

It seems to me that we may want the following function in the API (or something similar):

obs, reward = execute_action (pomdp, action)

Then we can use the observation to update our belief state. This is handy for online solvers that generate one action at a time. Am I missing something in the API that can support this already? Simulate(...) is not quite the same thing.

etotheipluspi · 2015-08-22T16:45:59Z

I like the idea. I think the current state would need to be an input argument as well. The function could roughly look something like this (without any concerns for memory allocation):

function execute_action(rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)
    sp = create_state(pomdp)
    o = create_observation(pomdp)
    td = create_transition_distribution(pomdp)
    od = create_observation_distribution(pomdp)
    transition!(td, pomdp, s, a)
    rand!(rng, sp, td)
    observation!(od, pomdp, sp, a)
    rand!(rng, o, od)
    r = reward(pomdp, s, a)
    return (sp, o, r)
end

One thing we can do is have a concrete type that has a state, observation, and their distributions pre-allocated in it, but something like that should not live in POMDPs.jl. Maybe this is something that can go in POMDPToolbox?

ebalaban · 2015-08-22T16:56:03Z

Yes, you are right about the current state and the rng. I had those
maintained within my POMDP structure, but, as we discussed yesterday
with Mykel, that's not the design philosophy that you guys have been
going for.

On 8/22/2015 9:46 AM, Maxim Egorov wrote:

I like the idea. I think the current state would need to be an input
argument as well. The function could roughly look something like this
(without any concerns for memory allocation):

function execute_action(rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)
sp= create_state(pomdp)
o= create_observation(pomdp)
td= create_transition_distribution(pomdp)
od= create_observation_distribution(pomdp)
transition!(td, pomdp, s, a)
rand!(rng, sp, td)
observation!(od, pomdp, sp, a)
rand!(rng, o, od)
r= reward(pomdp, s, a)
return (sp, o, r)
end

One thing we can do is have a concrete type that has a state,
observation, and their distributions pre-allocated in it, but
something like that should not live in POMDPs.jl. Maybe this is
something that can go in POMDPToolbox?

—
Reply to this email directly or view it on GitHub
#19 (comment).

tawheeler · 2015-08-22T18:40:51Z

You could have it override a given state and observation:

execute_action!(sp::State, o::Observation, rng::AbstractRNG, pomdp::POMDP, s::State, a::Action)

mykelk · 2015-08-22T20:25:33Z

I agree with Tim.

etotheipluspi · 2015-08-22T22:07:34Z

That seems like too many input arguments into a function to me. Especially since they are not optional or key worded. The distributions need to be initialized as well, so I'm not quite sure we solve the memory allocation problem entirely by passing in the state and observation.

mykelk · 2015-08-22T23:40:02Z

Oh, yeah. I forgot that you need the transition distribution initialized. Maybe keyword it with the default being a call to the various create functions. Then, if folks want to be memory efficient, they can be... but allow simpler calls if not.

ebalaban · 2015-08-25T20:22:44Z

Sounds good to me.

On 8/22/2015 4:40 PM, Mykel Kochenderfer wrote:

Oh, yeah. I forgot that you need the transition distribution
initialized. Maybe keyword it with the default being a call to the
various create functions. Then, if folks want to be memory efficient,
they can be... but allow simpler calls if not.

—
Reply to this email directly or view it on GitHub
#19 (comment).

etotheipluspi · 2015-08-26T16:32:10Z

Is everyone ok with the following syntax:

(r, sp, o) = execute_action!(sp::State, o::Observation, # being modified
                             rng::AbstractRNG, pomdp::POMDP, s::State, a::Action; # not modified
                             td=create_transition_distribution(pomdp), od=create_observation_distribution(pomdp) # optional kargs 
                             )

A tuple is being returned for the reward value. It's pretty messy, but we can update it later if everyone agrees on this for now.

zsunberg · 2015-08-26T16:36:30Z

I'm going to reply to this later this morning after taking care of a few
other things (you can wait for my input or just go ahead)

On Wed, Aug 26, 2015, 09:32 Maxim Egorov notifications@github.com wrote:

Is everyone ok with the following syntax:

(r, sp, o) = execute_action!(sp::State, o::Observation, # being modified
rng::AbstractRNG, pomdp::POMDP, s::State, a::Action; # not modified
td=create_transition_distribution(pomdp), od=create_observation_distribution # optional kargs
)

A tuple is being returned for the reward value. It's pretty messy, but we
can update it later if everyone agrees on this for now.

—
Reply to this email directly or view it on GitHub
#19 (comment).

mykelk · 2015-08-26T16:59:59Z

What about something like:

(reward, next_state, observation) = execute_action(rng::AbstractRNG, pomdp::POMDP, state::State, action::Action;
transistion_distribution=create_transition_distribution(pomdp), 
observation_distribution=create_observation_distribution(pomdp),
next_state=create_state(pomdp),
next_observation=create_observation(pomdp)
)

Note the lack of !.

etotheipluspi · 2015-08-26T17:55:15Z

I like Mykel's version better. Both next_state and next_observation would be modified in the function, and the function would still return pointers to them correct?

mykelk · 2015-08-26T18:02:39Z

Yep! I kind of like this style.

ebalaban · 2015-08-26T18:13:26Z

Works for me. I presume you meant 'next_observation' in the return tuple?

On 8/26/2015 11:02 AM, Mykel Kochenderfer wrote:

Yep! I kind of like this style.

—
Reply to this email directly or view it on GitHub
#19 (comment).

mykelk · 2015-08-26T18:21:06Z

Yep.

zsunberg · 2015-08-26T19:18:30Z

I think we should clarify something about this function. If I am interpreting execute_action correctly, it is simply a convenience* function that will act as a generative model for a pomdp defined by the standard interface**, and the implementation of it will be included in POMDPs.jl. Am I correct to think this?

If anyone is thinking of this as part of the actual interface, i.e. the problem-writer can define execute_action instead of transition!, observation!, etc., we should have a longer discussion about it. This parallel interface is exactly what I proposed to Mykel at Graphex, and we sort of decided that it wouldn't be worth it.

Conceptually, POMDPs.jl can be divided into two parts:

the standard interface, which consists of the functions that the problem-writer uses to define the problem, and
convenience functions, simulate and execute_action.
I think we should make this distinction clear in the documentation, or we should put the convenience functions in POMDPToolbox.

*convenient for the solver-writers, not any more convenient for the problem-writer
**by "standard interface" I mean reward, transition!, observation!, etc.

mykelk · 2015-08-26T19:28:16Z

Yes, it is a convenience function, not part of the interface. It is perhaps better to move this to POMDPToolbox so that folks don't get confused about it being part of the interface, but I'm happy to hear other opinions.

ebalaban · 2015-08-26T20:50:47Z

The way I understand POMDPToolbox.jl (and please feel free to correct me
if I am wrong) is that it is a container for the more high-level (and
more optional) POMDP-related tools (various interpolants, belief
updaters, etc). execute_action and simulate are more "fundamental", for
the lack of a better term, as almost every POMDP can take advantage of
them. My gut feeling is that POMDPs.jl is a better place for these
functions, but I am ok with them being moved to POMDPToolbox.jl as well.

On 8/26/2015 12:28 PM, Mykel Kochenderfer wrote:

Yes, it is a convenience function, not part of the interface. It is
perhaps better to move this to POMDPToolbox so that folks don't get
confused about it being part of the interface, but I'm happy to hear
other opinions.

—
Reply to this email directly or view it on GitHub
#19 (comment).

mykelk · 2015-08-26T20:56:13Z

I think the idea is to keep POMDPs.jl as clean, simple, and pure as possible without any implementation. That way folks can look at a page of code and see the entire API and not be overwhelmed. POMDPToolbox.jl would be a collection of commonly used utilities and so forth that makes POMDPs.jl easy to work with. I would anticipate that most solvers would import POMDPToolbox.

ebalaban · 2015-08-26T21:15:23Z

Ok, if that's the philosophy, I am fine with it. In that case both
execute_action and simulate should be moved to POMDPToolbox.

On 8/26/2015 1:56 PM, Mykel Kochenderfer wrote:

I think the idea is to keep POMDPs.jl as clean, simple, and pure as
possible without any implementation. That way folks can look at a page
of code and see the entire API and not be overwhelmed. POMDPToolbox.jl
would be a collection of commonly used utilities and so forth that
makes POMDPs.jl easy to work with. I would anticipate that most
solvers would import POMDPToolbox.

—
Reply to this email directly or view it on GitHub
#19 (comment).

zsunberg · 2015-08-26T21:33:44Z

There is an argument for keeping simulate() in POMDPs.jl as an exception to the rule - it serves as documentation. If someone wants to see how the interface functions are used, they can look at simulate(). Sometimes concise code is a good thing to augment text documentation (maybe I'm biased because I proposed writing simulate though :)).

I would be happy keeping simulate here or moving it and saying "refer to..." in the POMDPs.jl documentation.

mykelk · 2015-08-26T21:38:34Z

I was thinking that simulate would be kept abstract in POMDPs.jl, and POMDPToolbox.jl would provide some implementations. You might have different kinds of simulators---some that collect all sorts of diagnostic information, some that make neat displays, and other simulators that are just designed to run quickly.

The inspiration comes from RLGlue. See fig. 2 of this paper. You will see the "agent program" (policy), the "experiment program" (simulator), and the "environment program" (pomdp).

zsunberg · 2015-08-26T22:42:37Z

What exactly do you mean by "keeping simulate abstract"? Do you mean to
have it print an error like the other functions in POMDPs.jl now do by
default? Would problem-writers often want to redefine simulate for a
specific pomdp?

I would advocate for including a default easy-to-read implementation in
POMDPs.jl like we have now for clarity on what simulate should do. It
might be hard to unambiguously describe what simulate should do in text.

My initial thought is that a simulator function that does something other
than just simulate (like creating a display) should have a different name,
especially if simulate is meant to be used inside solvers.

We should really open a different issue for this if we discuss it further.

On Wed, Aug 26, 2015 at 2:38 PM Mykel Kochenderfer notifications@github.com
wrote:

I was thinking that simulate would be kept abstract in POMDPs.jl, and
POMDPToolbox.jl would provide some implementations. You might have
different kinds of simulators---some that collect all sorts of diagnostic
information, some that make neat displays, and other simulators that are
just designed to run quickly.

The inspiration comes from RLGlue. See fig. 2 of this paper
http://www.jmlr.org/papers/volume10/tanner09a/tanner09a.pdf. You will
see the "agent program" (policy), the "experiment program" (simulator), and
the "environment program" (pomdp).

—
Reply to this email directly or view it on GitHub
#19 (comment).

zsunberg · 2015-09-05T01:19:46Z

I think we decided to put this in POMDPToolbox. the simulate issue is separate.

mykelk mentioned this issue Aug 26, 2015

Abstract simulate #22

Closed

zsunberg closed this as completed Sep 5, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

execute_action function? #19

execute_action function? #19

ebalaban commented Aug 22, 2015

etotheipluspi commented Aug 22, 2015

ebalaban commented Aug 22, 2015

tawheeler commented Aug 22, 2015

mykelk commented Aug 22, 2015

etotheipluspi commented Aug 22, 2015

mykelk commented Aug 22, 2015

ebalaban commented Aug 25, 2015

etotheipluspi commented Aug 26, 2015

zsunberg commented Aug 26, 2015

mykelk commented Aug 26, 2015

etotheipluspi commented Aug 26, 2015

mykelk commented Aug 26, 2015

ebalaban commented Aug 26, 2015

mykelk commented Aug 26, 2015

zsunberg commented Aug 26, 2015

mykelk commented Aug 26, 2015

ebalaban commented Aug 26, 2015

mykelk commented Aug 26, 2015

ebalaban commented Aug 26, 2015

zsunberg commented Aug 26, 2015

mykelk commented Aug 26, 2015

zsunberg commented Aug 26, 2015

zsunberg commented Sep 5, 2015

execute_action function? #19

execute_action function? #19

Comments

ebalaban commented Aug 22, 2015

etotheipluspi commented Aug 22, 2015

ebalaban commented Aug 22, 2015

tawheeler commented Aug 22, 2015

mykelk commented Aug 22, 2015

etotheipluspi commented Aug 22, 2015

mykelk commented Aug 22, 2015

ebalaban commented Aug 25, 2015

etotheipluspi commented Aug 26, 2015

zsunberg commented Aug 26, 2015

mykelk commented Aug 26, 2015

etotheipluspi commented Aug 26, 2015

mykelk commented Aug 26, 2015

ebalaban commented Aug 26, 2015

mykelk commented Aug 26, 2015

zsunberg commented Aug 26, 2015

mykelk commented Aug 26, 2015

ebalaban commented Aug 26, 2015

mykelk commented Aug 26, 2015

ebalaban commented Aug 26, 2015

zsunberg commented Aug 26, 2015

mykelk commented Aug 26, 2015

zsunberg commented Aug 26, 2015

zsunberg commented Sep 5, 2015