Hi,
Fantastic job with this library, it looks really nice. I am trying to implement a POMDP that is essentially a scaled-down version of the battleship problem from the original Silver & Veness POMCP paper (https://papers.nips.cc/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html).
Each state in the state space is a 3x3 grid (an array of 9 numbers) of ones and zeros denoting the locations of the battleships, where 1 marks a "hit" tile and 0 a "miss" tile. There are 9 actions, one for each tile in the 3x3 grid, and two observations, "hit" or "miss."
This should be a simple POMDP to implement, but the catch is that the agent cannot click on a tile twice until the state has changed (i.e. as long as the agent is in state s, each action can only be chosen once). I've implemented this as a field of the POMDP struct called board, which is an array of -1 (tile not yet chosen by the agent), 0 (tile chosen, miss), and 1 (tile chosen, hit). I am using the gen function for the transition so that I can generate the next state, observation, and reward at the same time. In gen, once the agent has found all the ship tiles, the board is reset to all -1s and the state changes. The code is short and shown below.
I tried using BasicPOMCP on this problem (roughly as shown in the snippet right after my model code below), but I am noticing that the board isn't getting cleared across state changes. This is probably because of the rather "hack-y" way I have implemented checking whether the agent has already taken a specific action.
What would be the best way to implement this problem? I briefly considered making the entire board the observation, but I am not sure whether I can pass the current observation into the gen function; as far as I can tell it only receives the state and action.
The larger question is: what is the best way to implement constraints on the agent's action space based on the current history/observations? (A rough sketch of the kind of thing I am imagining is at the end of this post.)
Thanks for the help in advance!
using POMDPs
using POMDPModelTools: SparseCat
using Distributions: Categorical
using NPZ: npzread
using Random

struct GridClickPOMDP <: POMDP{Array{Int64},Int64,Bool}
    board::Array{Int64}    # -1 = tile not chosen yet, 0 = chosen & miss, 1 = chosen & hit
    discount::Float64
    GridClickPOMDP() = new([-1 for n in 1:9], 1.0)
end

# Load the state space (each state is a length-9 vector of 0s and 1s)
# and its prior probabilities from disk.
state_matrix = npzread("data/state_space.npy")
state_space = [r[:] for r in eachrow(state_matrix)]
state_idxs = Dict()
for i = 1:length(state_space)
    state_idxs[state_space[i]] = i
end
state_probs = npzread("data/state_probs.npy")

POMDPs.states(pomdp::GridClickPOMDP) = state_space
POMDPs.stateindex(pomdp::GridClickPOMDP, s::Array{Int64,1}) = state_idxs[s]
POMDPs.actions(pomdp::GridClickPOMDP) = collect(1:9)
POMDPs.actionindex(pomdp::GridClickPOMDP, a::Int64) = a

function POMDPs.gen(pomdp::GridClickPOMDP, s, a, rng)
    if pomdp.board[a] != -1
        # Tile already clicked in this state: heavy penalty, nothing changes.
        return (sp=s, o=false, r=-200)
    else
        pomdp.board[a] = s[a]    # record the click on the board
        obs = (s[a] == 1)        # observation: hit or miss
        if length(pomdp.board[pomdp.board .== 1]) == length(s[s .== 1])
            # All ship tiles have been hit: draw a new state and clear the board.
            s_next = state_space[rand(Categorical(state_probs))]
            for i = 1:9
                pomdp.board[i] = -1
            end
            return (sp=s_next, o=obs, r=10)
        else
            # Otherwise stay in the same state; +1 for a hit, -1 for a miss.
            rew = obs ? 1 : -1
            return (sp=s, o=obs, r=rew)
        end
    end
end

POMDPs.initialstate_distribution(pomdp::GridClickPOMDP) = SparseCat(state_space, state_probs)
POMDPs.discount(pomdp::GridClickPOMDP) = 1.0
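In case it helps, this is roughly how I am running the solver on it. The solver parameters here are just placeholders I happened to try, nothing principled:

using BasicPOMCP
using POMDPSimulators: stepthrough
using Random

pomdp = GridClickPOMDP()
solver = POMCPSolver(tree_queries=1000, rng=MersenneTwister(1))   # placeholder parameters
planner = solve(solver, pomdp)

# Step through a short simulated episode with the POMCP planner.
for (s, a, o, r) in stepthrough(pomdp, planner, "s,a,o,r", max_steps=20)
    println("a = $a, o = $o, r = $r")
end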
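To make the larger question a bit more concrete, here is the kind of refactor I have been imagining as one possible answer: fold the clicked-tile board into the state itself, so it travels with s through the solver instead of living as a mutable field on the POMDP struct, and then express the "no clicking the same tile twice" rule through the state-dependent actions(pomdp, s) method. Everything below is just my guess; the BoardState type and all the names are mine, and I don't know whether BasicPOMCP actually consults actions(pomdp, s), so please correct me if this is the wrong direction:

# Hypothetical sketch only. Reuses state_space and state_probs from the code above.
using POMDPs
using Distributions: Categorical
using Random

struct BoardState
    ships::Vector{Int64}      # 1 = ship tile, 0 = empty tile (same encoding as my states above)
    clicked::Vector{Int64}    # -1 = not clicked yet, 0 = clicked & miss, 1 = clicked & hit
end

struct GridClickPOMDP2 <: POMDP{BoardState,Int64,Bool} end

POMDPs.actions(pomdp::GridClickPOMDP2) = 1:9
# State-dependent action space: only tiles that have not been clicked in this state.
POMDPs.actions(pomdp::GridClickPOMDP2, s::BoardState) = [a for a in 1:9 if s.clicked[a] == -1]

function POMDPs.gen(pomdp::GridClickPOMDP2, s::BoardState, a::Int64, rng)
    clicked = copy(s.clicked)    # never mutate the incoming state
    clicked[a] = s.ships[a]
    obs = s.ships[a] == 1
    if count(==(1), clicked) == count(==(1), s.ships)
        # All ship tiles found: draw a fresh layout and reset the clicked mask.
        ships_next = state_space[rand(rng, Categorical(state_probs))]
        return (sp=BoardState(ships_next, fill(-1, 9)), o=obs, r=10)
    else
        return (sp=BoardState(s.ships, clicked), o=obs, r=obs ? 1 : -1)
    end
end

POMDPs.discount(pomdp::GridClickPOMDP2) = 1.0

I left out the initial state distribution and the == / hash definitions for BoardState that a real implementation would presumably need; this is just meant to show what I mean by constraining the action space through the state.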