Consistent distribution semantics (change initialstate? change action?) #308

Closed
zsunberg opened this issue Jun 11, 2020 · 7 comments · Fixed by #307

@zsunberg
Member

I just added ImplicitDistribution to POMDPModelTools. It makes it easier to create a distribution object when you can't write down the distribution explicitly, but do have a function that samples from it.
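
For anyone who hasn't seen it, the basic idea is roughly this (a minimal sketch; ImplicitDistribution lives in POMDPModelTools, and the coin-flip example here is made up):

using POMDPModelTools
using Random

# the constructor takes a function that maps an rng to a sample
d = ImplicitDistribution(rng -> rand(rng) < 0.5 ? :heads : :tails)

# rand(rng, d) draws a sample by calling that function; pdf is not defined
rand(MersenneTwister(1), d)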

We may want to change a couple of things in light of this:

1. initialstate

This makes initialstate(m, rng) somewhat unnecessary because now it is easy enough to do

initialstate_distribution(m::MyPOMDP) = ImplicitDistribution(rng -> #=whatever used to be in initialstate=#)

I see three options for eliminating this redundancy:

  1. Make initialstate return a distribution (see the sketch after this list)
    • Pros: Consistent with transition and observation
    • Cons:
      • Inconsistent with previous versions (but luckily there is a clear deprecation pattern: @deprecate initialstate(m, rng) rand(rng, initialstate(m)) and @deprecate initialstate_distribution(m) initialstate(m))
      • Breaks QuickMDP(initialstate=1, ...)
  2. Get rid of initialstate altogether (keep initialstate_distribution)
    • Pros: No change to meaning of initialstate
    • Cons: Inconsistent with transition and observation
  3. Get rid of initialstate and initialstate_distribution and create a new function initial(m::Union{MDP, POMDP}) that returns the initial state distribution.
    • Pros: Easy deprecation pattern and no confusion
    • Cons: Name is a bit less clear
    • Notes: Still need to figure out what to do about initialobs
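
To make option (1) concrete, here is roughly what a problem definition and the deprecation path could look like (a sketch of the proposal, not final API; MyPOMDP is a placeholder):

using POMDPs
using POMDPModelTools

struct MyPOMDP <: POMDP{Int, Int, Int} end

# under (1), initialstate returns a distribution object...
POMDPs.initialstate(m::MyPOMDP) = Deterministic(1)
# ...or an ImplicitDistribution when only sampling is possible:
# POMDPs.initialstate(m::MyPOMDP) = ImplicitDistribution(rng -> rand(rng, 1:10))

# and the deprecations in POMDPs.jl would be:
@deprecate initialstate(m, rng) rand(rng, initialstate(m))
@deprecate initialstate_distribution(m) initialstate(m)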

2. action

If we are going to make everything distribution-focused, should we also make action return a distribution?

Options:

  1. Make no changes
    • Pros: No changes
    • Cons:
      • Inconsistent with transition and observation (but this does not seem too bad since policies are different from POMDPs)
      • No standard interface for stochastic policies.
  2. Make action return a distribution
    • Pros: Consistent with transition and observation
    • Cons:
      • Big change with no clear deprecation pattern
      • rand(action(policy, b)) does not feel that clean
      • Breaks a bunch of things like FunctionPolicy
  3. Add action_distribution(policy, b) or just distribution(policy, b) (sketched after this list)
    • Pros: Standard interface for stochastic policies
    • Cons: Inconsistent with transition and observation
    • Notes: Some details would need to be ironed out, e.g. should we have action(policy, b, rng)? What is the default fallback pattern?
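
For option (3), the interface might look something like this (action_distribution is hypothetical, and the fallback shown is only one possibility):

using POMDPs
using POMDPModelTools

# hypothetical new interface function for stochastic policies
function action_distribution end

# one possible fallback: a deterministic policy yields a Deterministic distribution
action_distribution(p::Policy, b) = Deterministic(action(p, b))

# a stochastic policy would implement action_distribution directly; whether to
# also provide action(p, b, rng) = rand(rng, action_distribution(p, b)) is one
# of the details that would need to be ironed out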

Though I'm very undecided at this point, my initial feeling is that we should introduce initial (option (3)) and not make any changes to action (option (1)).

@zsunberg zsunberg added this to the 0.9 milestone Jun 11, 2020
@lassepe
Member

lassepe commented Jun 12, 2020

For initialstate I would actually prefer (1), i.e. have initialstate return a distribution. I think initial is a weird name, and when reading the code it is not clear that it refers to the state (plus there is the issue with initialobs). I think it would be clearer if initial took a trait-like type:

abstract type InitialType end

struct InitialState <: InitialType end
struct InitialObs <: InitialType end

initial(m::Union{MDP, POMDP}, ::InitialType) = ...

But this is also not really consistent with transition and observation.


For action I would leave things as they are right now and not change anything.

@zsunberg
Member Author

Yeah, I think (1) might result in the best final outcome... It just might be a little bumpy to transition to it.

@MaximeBouton
Contributor

I am more in favor of 1.1, but can you clarify the following:

How would one define a problem using the generative interface with (1)?

  • initialstate would return a distribution
  • gen would return a named tuple

I think it is a bit inconsistent: one would need a call to rand and the other would not.
I now see almost three ways of defining a problem: explicit distribution (everything returns a distribution with pdfs defined), implicit distribution, and generative.
In the first two approaches you need to create a distribution object and call rand (whether it is explicit or implicit), while with the generative approach you just sample without calling rand.
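
For illustration, here is roughly what that mix looks like under (1) (a sketch; MyPOMDP and the dynamics are placeholders):

using POMDPs
using POMDPModelTools

# initial state: a distribution object that must be sampled with rand
POMDPs.initialstate(m::MyPOMDP) = ImplicitDistribution(rng -> rand(rng, 1:10))

# dynamics: gen samples directly and returns a NamedTuple, no rand at the call site
function POMDPs.gen(m::MyPOMDP, s, a, rng)
    sp = s + a + rand(rng, -1:1)
    o = sp + rand(rng, -1:1)
    return (sp=sp, o=o, r=-abs(sp))
end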

For 1.3, I think initial is too vague.

@lassepe
Member

lassepe commented Jun 12, 2020

I guess that with ImplicitDistribution we are stepping towards #269, and it would be a really consistent way of describing things. Maybe 0.9 should be used to deprecate the gen syntax altogether?

@zsunberg
Member Author

zsunberg commented Jun 13, 2020

Thanks for the thoughts! This is very helpful!

initialstate could be thought of as a static part of the problem, like states or discount (see #237). Most POMDP writers actually needed to implement initialstate_distribution anyway to use their problem with a particle filter. The gen interface is worth considering independently (#309).

@zsunberg
Member Author

Moving forward in development and documentation with initialstate returning a distribution. This will make some details a lot simpler and I am very happy about it!

@zsunberg
Member Author

Another question is what to do about initialobs. Options are:

  1. Eliminate it in favor of observation(m, s)
  2. Have initialobs be completely separate from observation(m, s), but still return a distribution (sketched below)
  3. Have initialobs fall back to observation(m, s)

I guess my vote is for 2. It is really only for the reinforcement learning case, and it is not the observation for a particular step, so it is a different concept. Making it fall back adds some difficulties with throwing errors.
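
A rough sketch of (2), with initialobs as its own interface function (illustrative only; the signature here assumes initialobs takes the model and an initial state, and MyPOMDP is a placeholder):

using POMDPs
using POMDPModelTools

# completely separate from observation(m, s), but still returns a distribution
POMDPs.initialobs(m::MyPOMDP, s) = ImplicitDistribution(rng -> s + rand(rng, -1:1))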

@zsunberg zsunberg mentioned this issue Jul 8, 2020
zsunberg added a commit that referenced this issue Jul 27, 2020
* fix #250

* travis only tests 1.1 and 1

* removed inferred_in_latest

* removed all of the old deprecated generative stuff

* removed ddn code

* before removing programatic deprecation macros

* tests pass

* before switching back to master

* initial steps

* tests pass

* started

* got rid of errors, switched to distribution initialstate (#308)

* DDNOut -> Val

* brought back DDNOut

* tests pass

* working on docs

* working on docs

* cleaned up example

* a bit more cleanup

* finished documentation to fix #280

* added deprecation case for when initialstate_distribution is implemented

* Changed emphasis of explit/generative explanation

* Update README.md

* fixed typo

* Update docs/src/def_solver.md

Co-authored-by: Jayesh K. Gupta <mail@rejuvyesh.com>

* Update runtests.jl

* moved available() and add_registry() to deprecated.jl

* Update def_pomdp.md

Co-authored-by: Jayesh K. Gupta <mail@rejuvyesh.com>