Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove latest observation from CommonRLEnv POMDP state #516

Merged
merged 4 commits into from
Sep 11, 2023

Conversation

johannes-fischer
Copy link
Contributor

I find it inconsistent that state(::MDPCommonRLEnv) returns the state (converted to type RLO) but state(::POMDPCommonRLEnv) returns a state-observation tuple, without converting anything. Since the observation is not really part of the state, I think it makes sense to not provide it as part of state(::POMDPCommonRLEnv). Additionally, the state should also be optionally converted to another type, just like the observation.

This PR implements the proposed changes.

Add RLS parameter for optional state type conversion. Return only
converted state instead of state-observation tuple in `RL.state()`
@johannes-fischer johannes-fischer changed the title Remove last observation from CommonRLEnv POMDP state Remove latest observation from CommonRLEnv POMDP state Aug 23, 2023
@zsunberg
Copy link
Member

The original reason for including the observation in the env state is: If the observation is not included, someone can call observe(env) multiple times and get different results, and by calling observe many times, could get a better estimate of the state than is actually "allowed" by the POMDP.

I suppose that perhaps this is more confusing than beneficial though.

Does hearing the original reason change your opinion at all @johannes-fischer ?

The best path forward would probably be to make it an option.

@johannes-fischer
Copy link
Contributor Author

I get the reason for including the observation in the POMDPCommonRLEnv struct, which I left unchanged. This allows to return the same observation for subsequent calls to observe(env). However, this does not require to also return the observation as part of state(env) does it?

With this PR, subsequent calls to observe will still return the same observation. Getting different observations for the same env state can only happen with

observe(env)
state = state(env)
setstate!(state) # calls `initialobs`
observe(env)

I don't think this will be called accidentally, so I would not worry about this potential state information leak. The same information leak is accessible in the current implementation by using o = rand(initialobs(env.m, state(env)[1]).

@zsunberg
Copy link
Member

zsunberg commented Sep 9, 2023

Sorry for the delay - the semester started up last week.

I see what you're saying now and you're definitely right. Thanks for the fix! The only thing I changed is that you no longer need CommonRLInterface.@provide. will merge once the tests pass!

@johannes-fischer
Copy link
Contributor Author

No worries, sounds good!

@zsunberg zsunberg merged commit 851d26e into JuliaPOMDP:master Sep 11, 2023
4 checks passed
@johannes-fischer johannes-fischer deleted the common_rl_pomdp_state branch September 21, 2023 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants