Taking account of selected action in next action space #455

DonalOCois · 2023-01-24T15:07:48Z

Hi all,
I have been working on something for a few days now that I hope is somehow doable, but I haven't found a solution.
Basically, I want the set of actions available at the next time step to depend on the action I have just selected.

My code is as follows. My state has 4 elements: the first two represent the position of a robot on an x-y plane, and the last two represent an internal state that depends on whichever action was selected. I think of the action as steering your car to the left, and the internal state variables (isx and isy) as the position of your steering wheel.
In the actions function, the available action space gets updated on the basis of those internal and also external state variables.

mdp = QuickMDP(
    function gen(s, a, rng)
        x, y, isx, isy = s
        xₚ = clamp(x+a[1], -10, 10)
        yₚ = clamp(y+a[2], -10, 10)
        isxₚ = clamp(isx+a[1], -1, 1)
        isyₚ = clamp(isy+a[2], -1, 1)
        return (sp=[xₚ, yₚ, isxₚ, isyₚ])
    end,

    actions = function ((a=[0,0]),(s=(0.,0.,1.,0.)))
        if s[3] == 1
            if s[1] >= 5
                return [[-1,1], [-1,-1]]
            else
                return [[1,0]]
            end
        elseif.. _(other possible movements)_
        end
    end,

Unfortunately, this doesn't seem to work. To test it, I set s[3] to 1 by default, as you can see, so the first if condition is true. The second if condition should also become true once the robot's position on the x-axis (s[1]) reaches 5. This does not happen; instead the code keeps returning [1,0] as the action space. In other words, after the first iteration the action space returned by the second snippet is no longer updated to reflect the changed state.

So my question is simply if anyone sees something wrong with my approach here. If anyone does have a suggestion (which would be hugely appreciated of course), please do give the most explicit detail you can. Sadly, if there is a mistake to be made, I am quite capable of finding it!
Thanks very much
Donal

Answered by zsunberg

zsunberg · 2023-01-27T04:21:57Z

Maintainer

Hi @DonalOCois , I think you have the right idea - you should augment the state space to include the previous action if the current action space depends on the previous action (the steering wheel you mention is the right idea).

However, it doesn't look like you are implementing it quite right. It might be easier to debug and the syntax might be easier to understand if you break the function out separately from the QuickMDP constructor like this:

function my_actions(s = #=default state=#)
    # return the action space
end

mdp = QuickMDP(
    #...
    actions = my_actions,
    #...
)

Then you can debug the my_actions function more easily. Note that the function should take only one argument: the state.
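The single-argument point can be checked in plain Julia, without QuickPOMDPs. The sketch below is one plausible explanation of the symptom in the question, assuming the actions function is called with the state as its only positional argument: with two optional positional arguments, that single value is bound to the first parameter (`a`), so `s` silently keeps its default and `s[1] >= 5` is never true. The name `two_arg_actions`, the `[[0, 0]]` fallback, and the default state (copied from the question) are placeholders for illustration only.

```julia
# Suspected pitfall (an assumption, not confirmed in the thread):
# two optional positional arguments. A call with a single argument
# binds that argument to `a`, and `s` keeps its default value.
function two_arg_actions(a = [0, 0], s = (0.0, 0.0, 1.0, 0.0))
    if s[3] == 1
        return s[1] >= 5 ? [[-1, 1], [-1, -1]] : [[1, 0]]
    end
    return [[0, 0]]  # placeholder for the cases elided in the question
end

# Single-argument version: the only parameter is the state.
function my_actions(s = (0.0, 0.0, 1.0, 0.0))
    if s[3] == 1
        return s[1] >= 5 ? [[-1, 1], [-1, -1]] : [[1, 0]]
    end
    return [[0, 0]]  # placeholder for the cases elided in the question
end

s = (6.0, 0.0, 1.0, 0.0)          # robot has moved past x = 5

stuck   = two_arg_actions(s)      # s was bound to `a`; the default state is used
updated = my_actions(s)           # the passed state is actually inspected

println(stuck)
println(updated)
```

Calling each function directly with a few hand-written states like this is also the kind of debugging that becomes easy once the function lives outside the QuickMDP constructor.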

11 replies

zsunberg Feb 15, 2023
Maintainer

This mountaincar example implements a generative model: https://github.com/JuliaPOMDP/QuickPOMDPs.jl/blob/master/examples/mountaincar.jl. The first argument of a QuickMDP is interpreted as the gen function if no keyword is given. A solver or simulator could then, for example, call @gen(:sp)(mountaincar, (0.0, 0.0), -1.0) to get the next state. Does that answer your question?
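To make the generative-model idea concrete without depending on any package, here is a plain-Julia sketch of a step function that returns the next state in a named tuple with an `sp` field. The dynamics are the robot dynamics from the original question; `robot_gen` is a hypothetical name and the reward `r` is a made-up placeholder. The last lines illustrate a Julia detail worth checking in the question's code: `(sp = v)` without a trailing comma is an assignment in parentheses and evaluates to `v` itself, whereas `(sp = v,)` is a named tuple. Whether QuickMDP accepts the bare value is not something this sketch establishes.

```julia
using Random

# Hypothetical stand-in for a generative model: given state s and action a,
# return the next state and a placeholder reward as a named tuple.
# rng is accepted for interface parity but unused (deterministic dynamics).
function robot_gen(s, a, rng)
    x, y, isx, isy = s
    xp   = clamp(x + a[1], -10, 10)
    yp   = clamp(y + a[2], -10, 10)
    isxp = clamp(isx + a[1], -1, 1)
    isyp = clamp(isy + a[2], -1, 1)
    return (sp = [xp, yp, isxp, isyp], r = -1.0)  # r is a placeholder
end

rng = MersenneTwister(1)
out = robot_gen([4.0, 0.0, 1.0, 0.0], [1, 0], rng)
next_state = out.sp               # pick the next state out of the result

# Trailing-comma pitfall:
not_a_tuple = (sp = [1.0, 2.0])   # plain Vector; also assigns a variable `sp`
is_a_tuple  = (sp = [1.0, 2.0],)  # NamedTuple with a single field

println(next_state)
println(typeof(not_a_tuple), " vs ", typeof(is_a_tuple))
```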

DonalOCois Feb 17, 2023
Author

Well I can see how that returns the next state, given the specific states and actions that you pass to it. I can also see how you might use a for loop to iterate over a bunch of possible state-action pairs and get 'sp' returned for each of them.

But, and sorry if it should be obvious, it's not clear to me how that's used with solvers, which seem to require the mdp itself to be passed to them so that they can iterate over many possible state-action pairs. I.e., something like: policy = solve(solver, mountaincar)
Could you show how a solver would do it in this case?

zsunberg Feb 24, 2023
Maintainer

Ah, I think I might understand what you're getting at. Are you saying that it's only possible to reach a small number of states? For example, if the state is represented by a real number, but it is only possible to reach 1.1, 2.3, 4.7, and 6.1 ?

DonalOCois Feb 24, 2023
Author

Simpler than that - it's just a syntax issue. You said that a "solver or simulator could, for example call @gen(:sp)(mountaincar, (0.0, 0.0), -1.0) to get the next state". I'm just asking how that would explicitly be done, because in examples that I have found, the "solve" function calls a given solver and MDP.

zsunberg Mar 10, 2023
Maintainer

Here is an example of where it is used in MCTS: https://github.com/JuliaPOMDP/MCTS.jl/blob/397195917faf35d02aa9ffe13eaec428fd7c1861/src/vanilla.jl#L287

Note that since MCTS is an online solver, no actual work is carried out in the solve(::MCTSSolver, ::MDP) function. All of the planning is done online in the action(::MCTSPlanner, state) function. That way @gen is only called on states that are reached within the MCTS tree, so there is no need to enumerate all of the state space.
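The pattern of planning online and sampling only the states that are actually reached can be sketched in plain Julia. The example below is not MCTS and is not the MCTS.jl code; it is a much simpler random-rollout planner on a made-up 1-D chain, and every name in it (`toy_gen`, `toy_actions`, `rollout_value`, `plan_action`) is hypothetical. It only illustrates that the planner calls the generative function from the current state outward, so the full state space (here 2001 integers) is never enumerated.

```julia
using Random

# Made-up problem: integer states in -1000:1000, reward for landing on 3.
toy_gen(s, a, rng) = (sp = clamp(s + a, -1000, 1000), r = (s + a == 3 ? 1.0 : 0.0))
toy_actions(s) = [-1, 1]

# Discounted return of taking `a` in `s`, then acting randomly.
# Every sampled next state is recorded in `visited`.
function rollout_value(s, a, rng, visited; depth = 5, discount = 0.95)
    total, weight = 0.0, 1.0
    for _ in 1:depth
        out = toy_gen(s, a, rng)          # the only place the model is queried
        push!(visited, out.sp)
        total += weight * out.r
        weight *= discount
        s = out.sp
        a = rand(rng, toy_actions(s))
    end
    return total
end

# Online planning: all the work happens when an action is requested for `s`.
function plan_action(s, rng; n_rollouts = 20)
    visited = Set{Int}()
    best_a, best_v = first(toy_actions(s)), -Inf
    for a in toy_actions(s)
        v = sum(rollout_value(s, a, rng, visited) for _ in 1:n_rollouts) / n_rollouts
        if v > best_v
            best_a, best_v = a, v
        end
    end
    return best_a, visited
end

rng = MersenneTwister(42)
chosen, visited = plan_action(0, rng)
println("chosen action: ", chosen, ", distinct states sampled: ", length(visited))
```

With a rollout depth of 5 starting from state 0, only states within 5 steps of the start can ever be sampled, no matter how large the state space is.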
