Consistent distribution semantics (change initialstate? change action?) #308

Closed
zsunberg opened this issue Jun 11, 2020 · 7 comments · Fixed by #307

@zsunberg
Member

I just added ImplicitDistribution to POMDPModelTools. It makes it easier to create a distribution object when you can't write down the distribution explicitly, but do have a function that samples from it.
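
For anyone who hasn't seen it, the basic idea is roughly this (a minimal sketch; ImplicitDistribution lives in POMDPModelTools, and the coin-flip example here is made up):

using POMDPModelTools
using Random

# the constructor takes a function that maps an rng to a sample
d = ImplicitDistribution(rng -> rand(rng) < 0.5 ? :heads : :tails)

# rand(rng, d) draws a sample by calling that function; pdf is not defined
rand(MersenneTwister(1), d)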

We may want to change a couple of things in light of this:

1. initialstate

This makes initialstate(m, rng) somewhat unnecessary because now it is easy enough to do

initialstate_distribution(m::MyPOMDP) = ImplicitDistribution(rng -> #=whatever used to be in initialstate=#)

I see three options for eliminating this redundancy:

  1. Make initialstate return a distribution (see the sketch after this list)
    • Pros: Consistent with transition and observation
    • Cons:
      • Inconsistent with previous versions (but luckily there is a clear deprecation pattern: @deprecate initialstate(m, rng) rand(rng, initialstate(m)) and @deprecate initialstate_distribution(m) initialstate(m))
      • Breaks QuickMDP(initialstate=1, ...)
  2. Get rid of initialstate altogether (keep initialstate_distribution)
    • Pros: No change to meaning of initialstate
    • Cons: Inconsistent with transition and observation
  3. Get rid of initialstate and initialstate_distribution and create a new function initial(m::Union{MDP, POMDP}) that returns the initial state distribution.
    • Pros: Easy deprecation pattern and no confusion
    • Cons: Name is a bit less clear
    • Notes: Still need to figure out what to do about initialobs
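
To make option (1) concrete, here is roughly what a problem definition and the deprecation path could look like (a sketch of the proposal, not final API; MyPOMDP is a placeholder):

using POMDPs
using POMDPModelTools

struct MyPOMDP <: POMDP{Int, Int, Int} end

# under (1), initialstate returns a distribution object...
POMDPs.initialstate(m::MyPOMDP) = Deterministic(1)
# ...or an ImplicitDistribution when only sampling is possible:
# POMDPs.initialstate(m::MyPOMDP) = ImplicitDistribution(rng -> rand(rng, 1:10))

# and the deprecations in POMDPs.jl would be:
@deprecate initialstate(m, rng) rand(rng, initialstate(m))
@deprecate initialstate_distribution(m) initialstate(m)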

2. action

If we are going to make everything distribution-focused, should we also make action return a distribution?

Options:

  1. Make no changes
    • Pros: No changes
    • Cons:
      • Inconsistent with transition and observation (but this does not seem too bad since policies are different from POMDPs)
      • No standard interface for stochastic policies.
  2. Make action return a distribution
    • Pros: Consistent with transition and observation
    • Cons:
      • Big change with no clear deprecation pattern
      • rand(action(policy, b)) does not feel that clean
      • Breaks a bunch of things like FunctionPolicy
  3. Add action_distribution(policy, b) or just distribution(policy, b) (sketched after this list)
    • Pros: Standard interface for stochastic policies
    • Cons: Inconsistent with transition and observation
    • Notes: Some details would need to be ironed out, e.g. should we have action(policy, b, rng)? What is the default fallback pattern?
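
For option (3), the interface might look something like this (action_distribution is hypothetical, and the fallback shown is only one possibility):

using POMDPs
using POMDPModelTools

# hypothetical new interface function for stochastic policies
function action_distribution end

# one possible fallback: a deterministic policy yields a Deterministic distribution
action_distribution(p::Policy, b) = Deterministic(action(p, b))

# a stochastic policy would implement action_distribution directly; whether to
# also provide action(p, b, rng) = rand(rng, action_distribution(p, b)) is one
# of the details that would need to be ironed out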

Though I'm very undecided at this point, my initial feeling is that we should introduce initial (option (3)) and not make any changes to action (option (1)).

@zsunberg zsunberg added this to the 0.9 milestone Jun 11, 2020
@lassepe
Member

lassepe commented Jun 12, 2020

For initialstate I would actually prefer (1), i.e. have initialstate return a distribution. I think initial is a weird name, and when reading the code it is not clear that it refers to the state (plus there is the issue with initialobs). I think it would be clearer if initial took a trait-like type:

abstract type InitialType end

struct InitialState <: InitialType end
struct InitialObs <: InitialType end

initial(m::Union{MDP, POMDP}, ::InitialType) = ...

But this is also not really consistent with transition and observation.


For action I would leave things as they are right now and not change anything.

@zsunberg
Member Author

Yeah, I think (1) might result in the best final outcome... It just might be a little bumpy to transition to it.

@MaximeBouton
Contributor

I am more in favor of 1.1, but can you clarify the following:

How would one define a problem using the generative interface with (1)?

  • initialstate would return a distribution
  • gen would return a named tuple

I think it is a bit inconsistent: one would need a call to rand and the other would not.
I now see almost three ways of defining a problem: explicit distribution (everything returns a distribution with pdfs defined), implicit distribution, and generative.
In the first two approaches you need to create a distribution object and call rand (whether it is explicit or implicit), while with the generative approach you just sample without calling rand.
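
For illustration, here is roughly what that mix looks like under (1) (a sketch; MyPOMDP and the dynamics are placeholders):

using POMDPs
using POMDPModelTools

# initial state: a distribution object that must be sampled with rand
POMDPs.initialstate(m::MyPOMDP) = ImplicitDistribution(rng -> rand(rng, 1:10))

# dynamics: gen samples directly and returns a NamedTuple, no rand at the call site
function POMDPs.gen(m::MyPOMDP, s, a, rng)
    sp = s + a + rand(rng, -1:1)
    o = sp + rand(rng, -1:1)
    return (sp=sp, o=o, r=-abs(sp))
end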

For 1.3, I think initial is too vague.

@lassepe
Member

lassepe commented Jun 12, 2020

I guess that with ImplicitDistribution we are stepping towards #269, and it would be a really consistent way of describing things. Maybe 0.9 should be used to deprecate the gen syntax altogether?

@zsunberg
Member Author

zsunberg commented Jun 13, 2020

Thanks for the thoughts! This is very helpful!

initialstate could be thought of as a static part of the problem, like states or discount (see #237). Most POMDP writers actually needed to implement initialstate_distribution anyway to use their problem with a particle filter. The gen interface is worth considering independently (#309).

@zsunberg
Member Author

Moving forward in development and documentation with initialstate returning a distribution. This will make some details a lot simpler and I am very happy about it!

@zsunberg
Member Author

Another question is what to do about initialobs. Options are:

  1. Eliminate it in favor of observation(m, s)
  2. Have initialobs be completely separate from observation(m, s), but still return a distribution (sketched below)
  3. Have initialobs fall back to observation(m, s)

I guess my vote is for 2. It is really only for the reinforcement learning case, and it is not the observation for a particular step, so it is a different concept. Making it fall back adds some difficulties with throwing errors.
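
A rough sketch of (2), with initialobs as its own interface function (illustrative only; the signature here assumes initialobs takes the model and an initial state, and MyPOMDP is a placeholder):

using POMDPs
using POMDPModelTools

# completely separate from observation(m, s), but still returns a distribution
POMDPs.initialobs(m::MyPOMDP, s) = ImplicitDistribution(rng -> s + rand(rng, -1:1))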

@zsunberg zsunberg mentioned this issue Jul 8, 2020
zsunberg added a commit that referenced this issue Jul 27, 2020
* fix #250

* travis only tests 1.1 and 1

* removed inferred_in_latest

* removed all of the old deprecated generative stuff

* removed ddn code

* before removing programatic deprecation macros

* tests pass

* before switching back to master

* initial steps

* tests pass

* started

* got rid of errors, switched to distribution initialstate (#308)

* DDNOut -> Val

* brought back DDNOut

* tests pass

* working on docs

* working on docs

* cleaned up example

* a bit more cleanup

* finished documentation to fix #280

* added deprecation case for when initialstate_distribution is implemented

* Changed emphasis of explit/generative explanation

* Update README.md

* fixed typo

* Update docs/src/def_solver.md

Co-authored-by: Jayesh K. Gupta <mail@rejuvyesh.com>

* Update runtests.jl

* moved available() and add_registry() to deprecated.jl

* Update def_pomdp.md

Co-authored-by: Jayesh K. Gupta <mail@rejuvyesh.com>