
Solving "Battleship" with POMDPS.jl/BasicPOMCP (legal action spaces that change overtime) #335

Closed
sreejank opened this issue Feb 9, 2021 · 1 comment



sreejank commented Feb 9, 2021

Hi,
Fantastic job with this library. Looks really nice. I am trying to implement a POMDP problem that is pretty much a scaled-down version of the Battleship problem in the original Silver & Veness POMCP paper (https://papers.nips.cc/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html).

Each state in the state space is a 3x3 grid (or an array of 9 numbers) of ones and zeros that denote the locations of the battleships, where 1 marks a "hit" tile and 0 a "miss" tile. There are 9 actions corresponding to the tiles in the 3x3 grid. There are two observations, "hit" or "miss."

This should be a simple POMDP to implement, but the catch is that the agent cannot click on a tile twice until the state has changed (i.e., as long as the agent is in state s, each action can only be chosen once). I've implemented this as a field of the POMDP struct called "board": an array of -1 (tile not chosen by the agent yet), 0 (tile chosen but a miss), and 1 (tile chosen and a hit). I am using the gen function to implement the transition so that I can generate the next state, observation, and reward simultaneously. In gen, once the agent has found all the hit tiles, the board is cleared back to -1s and the state changes. The code is pretty short and can be seen below.

I tried using BasicPOMCP on this problem, but I am noticing that the board isn't getting cleared across state changes. This is probably because of the rather "hack-y" way I have implemented checking whether the agent has already taken a specific action.

What would be the best way to implement this problem? I briefly considered making the entire board the observation, but I am not sure I can pass the current observation into the gen function, only the state and action. (One direction I considered, sketched after my code below, is to fold the board into the state itself.)

The larger question here is: what is the best way to implement constraints on the agent's action space based on the current history/observations? (A possible actions overload for this is also sketched below.)

Thanks for the help in advance!

using POMDPs
using POMDPModelTools   # SparseCat
using Distributions     # Categorical
using NPZ               # npzread

struct GridClickPOMDP <: POMDP{Vector{Int64},Int64,Bool}
	board::Vector{Int64}   # -1 = tile not chosen yet, 0 = chosen/miss, 1 = chosen/hit
	discount::Float64
	GridClickPOMDP() = new(fill(-1, 9), 1.0)
end

# Enumerate the state space from file and build a reverse index for stateindex.
state_matrix = npzread("data/state_space.npy")
state_space = [r[:] for r in eachrow(state_matrix)]
state_idxs = Dict{Vector{Int64},Int}()
for i in 1:length(state_space)
	state_idxs[state_space[i]] = i
end

state_probs = npzread("data/state_probs.npy")

POMDPs.states(pomdp::GridClickPOMDP) = state_space
POMDPs.stateindex(pomdp::GridClickPOMDP, s::Vector{Int64}) = state_idxs[s]

POMDPs.actions(pomdp::GridClickPOMDP) = collect(1:9)
POMDPs.actionindex(pomdp::GridClickPOMDP, a::Int64) = a

function POMDPs.gen(pomdp::GridClickPOMDP, s, a, rng)
	if pomdp.board[a] != -1
		# Tile was already chosen in this state: heavy penalty, no state change.
		return (sp=s, o=false, r=-200)
	else
		pomdp.board[a] = s[a]   # record the outcome of clicking tile a
		obs = (s[a] == 1)
		if count(==(1), pomdp.board) == count(==(1), s)
			# All hit tiles found: clear the board and sample a new state,
			# using the solver-provided rng for reproducibility.
			s_next = state_space[rand(rng, Categorical(state_probs))]
			fill!(pomdp.board, -1)
			return (sp=s_next, o=obs, r=10)
		else
			return (sp=s, o=obs, r=obs ? 1 : -1)
		end
	end
end

POMDPs.initialstate_distribution(pomdp::GridClickPOMDP) = SparseCat(state_space, state_probs)
POMDPs.discount(pomdp::GridClickPOMDP) = pomdp.discount
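
To make the fold-the-board-into-the-state direction concrete, here is a rough, untested sketch of what I was imagining. Everything in it (BoardState, GridClickPOMDP2, the field names) is hypothetical rather than code I have run; the point is just that the solver's copies of the state would carry the visited tiles, so nothing on the POMDP struct has to mutate:

using POMDPs
using Distributions: Categorical

# Hypothetical sketch: the visited-tile board lives in the state itself.
struct BoardState
	ships::Vector{Int64}   # hidden ship layout: 1 = ship, 0 = empty
	board::Vector{Int64}   # -1 = unvisited, 0 = visited miss, 1 = visited hit
end

struct GridClickPOMDP2 <: POMDP{BoardState,Int64,Bool}
	layouts::Vector{Vector{Int64}}   # possible ship layouts
	layout_probs::Vector{Float64}
end

function POMDPs.gen(pomdp::GridClickPOMDP2, s::BoardState, a::Int64, rng)
	s.board[a] != -1 && return (sp=s, o=false, r=-200)
	board = copy(s.board)   # never mutate the incoming state
	board[a] = s.ships[a]
	obs = (s.ships[a] == 1)
	if count(==(1), board) == count(==(1), s.ships)
		# All ships found: sample a fresh layout with a cleared board.
		ships = pomdp.layouts[rand(rng, Categorical(pomdp.layout_probs))]
		return (sp=BoardState(ships, fill(-1, 9)), o=obs, r=10)
	end
	return (sp=BoardState(s.ships, board), o=obs, r=obs ? 1 : -1)
end

Since POMCP only needs a generative model, I believe this version could drop states/stateindex entirely, at the cost of a much larger implicit state space.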
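And for the larger action-constraint question: POMDPs.jl documents a state-dependent actions(m, s) overload, so with the board inside the state the legality rule might be expressible declaratively, something like the lines below. I am not sure whether BasicPOMCP actually consults this overload during tree search, though - that is part of what I am asking.

# Hypothetical: only unvisited tiles are legal actions from a given state.
POMDPs.actions(pomdp::GridClickPOMDP2, s::BoardState) = [a for a in 1:9 if s.board[a] == -1]
POMDPs.actions(pomdp::GridClickPOMDP2) = collect(1:9)   # full action space as a fallback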
@zsunberg (Member)

Going to make this a discussion - hope we can provide some help!

@JuliaPOMDP JuliaPOMDP locked and limited conversation to collaborators Feb 10, 2021

This issue was moved to a discussion. You can continue the conversation there.
