This repository contains the source code for my submission to the CoderOne AI challenge in December 2020.
Our bot uses a method that has been applied to AI control in other games such as Brogue, known in the gaming space as Dijkstra maps. These models have also been referred to as potential or flow methods in other domains. For a write-up demonstrating the theory and the usefulness of these methods, see here. Our code makes use of a Python implementation written by Kehvarl.
To populate a Dijkstra map, the following algorithm is applied:

1. Set each grid cell to a default value of 100
2. Given a set of goals, set the value of each goal cell to 0
3. For each cell `c` in the grid:
   - If there is an adjacent cell `c_a` whose value is lower than `c`'s by more than 1, set `c = c_a + 1`
   - Else do nothing
4. Repeat step 3 until no changes are recorded
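As a concrete illustration, the loop above can be sketched in Python. This is a minimal reimplementation for exposition, not the Kehvarl implementation our bot actually uses, and all names here are illustrative:

```python
DEFAULT = 100

def dijkstra_map(width, height, goals, walls=frozenset()):
    """Build a Dijkstra map: 0 at each goal, increasing with distance."""
    grid = [[DEFAULT] * width for _ in range(height)]
    for (x, y) in goals:
        grid[y][x] = 0
    changed = True
    while changed:  # repeat until no changes are recorded
        changed = False
        for y in range(height):
            for x in range(width):
                if (x, y) in walls or (x, y) in goals:
                    continue
                # lowest value among the four orthogonal neighbours
                best = min(
                    (grid[ny][nx]
                     for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                     if 0 <= nx < width and 0 <= ny < height
                     and (nx, ny) not in walls),
                    default=grid[y][x],
                )
                if grid[y][x] - best > 1:  # neighbour lower by more than 1
                    grid[y][x] = best + 1
                    changed = True
    return grid
```

Sweeping until nothing changes is the simplest fixed-point formulation; a priority-queue flood fill would converge faster but is harder to follow.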
The end result will be a gridmap with 0s at our goal locations, and increasing values as we move away from the goal. For example, a 5x5 grid with a goal at the center would return:
```
43234
32123
21012
32123
43234
```
The same grid with goals at each corner would look like:
```
01210
12321
23432
12321
01210
```
To navigate a Dijkstra map, our agent moves to the adjacent cell with the lowest value. If the lowest value is at the current position, the agent remains where it is. If there are multiple options, the agent chooses between them at random. In this way, the pathfinding problem has been reduced to "rolling downhill". Note that the goal value of 0 can be replaced with any other chosen value: negative goal values will have the agent seek out the goal from further away.
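The "rolling downhill" rule is straightforward to implement. A sketch, again with illustrative names rather than the bot's actual code:

```python
import random

def next_move(grid, x, y):
    """Move to the lowest-valued adjacent cell; stay put if the current
    cell is already the lowest. Ties are broken randomly."""
    height, width = len(grid), len(grid[0])
    candidates = [(x, y)]  # staying put is always an option
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height:
            candidates.append((nx, ny))
    lowest = min(grid[cy][cx] for cx, cy in candidates)
    # choose randomly among all positions sharing the lowest value
    return random.choice([c for c in candidates
                          if grid[c[1]][c[0]] == lowest])
```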
A key observation is that in the second example, an agent placed at the center can navigate to any of the four goals. This demonstrates that the agent already has some capacity to decide between different goals (even if the decision process is random at this point).
Given two Dijkstra maps `D_1` and `D_2`, we can define their elementwise sum by `D_3[i][j] = D_1[i][j] + D_2[i][j]`. Taking our previous examples, this yields:
```
43234   01210   44444
32123   12321   44444
21012 + 23432 = 44444
32123   12321   44444
43234   01210   44444
```
The resulting uniform map is not particularly useful, as the agent will simply wander the field at random. To remedy this, we define multiplication of a Dijkstra map `D` by a scalar `a` as `(aD)[i][j] = a * D[i][j]`. Multiplying a map by a constant can be thought of as scaling the height of the hills in the Dijkstra map, so that the agent will be more likely to navigate towards the goals of maps with larger coefficients.
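Both operations are one-liners over plain lists of lists; a sketch with illustrative names:

```python
def add_maps(d1, d2):
    """Elementwise sum: D3[i][j] = D1[i][j] + D2[i][j]."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(d1, d2)]

def scale_map(d, a):
    """Scalar multiple: (aD)[i][j] = a * D[i][j]."""
    return [[a * v for v in row] for row in d]
```

A weighted combination of maps is then just repeated application, e.g. `add_maps(scale_map(d1, a1), scale_map(d2, a2))`.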
Our agent relies on the following logic to decide its action each turn:
- If the previous action hasn't been completed, skip its turn (this accounts for desync conditions)
- Update a list of bombs in play and the tiles that will be hit with an explosion
- Update an internal list of 'desires' based on game state
- Generate a list of Dijkstra maps based on game state
- Create a final 'goal' Dijkstra map as a linear combination of the maps from step 4, with coefficients calculated in step 3
- Check which surrounding tile the agent can move to based on its current position
- If the action returned is to do nothing, the agent checks if it should place a bomb
- If the action returned would move the agent into the opponent, the agent does nothing
The following functions are used to generate the maps in step 4:
This map has its goals set to the tiles on the map where placing a bomb would yield points. The function takes into account that explosions can't travel through walls, as well as the health of blocks. If there are fewer than 6 blocks remaining on the map, goals are only set on tiles one block away.
This map is used to give our agent a general aversion to being near bombs. Using just this map, the agent would try to flee from all bombs on the map.
If we were simply to set the positions of bombs as goals and multiply the map by a negative coefficient, our agent would pathfind into the corners of rooms. Instead, we take the resulting map, multiply it by a number in the vicinity of -1.2, and pass it through the algorithm again, which yields a map with smarter fleeing behaviour. Re-running the algorithm essentially places goals at the parts of the map farthest from the bombs, so that our agent won't get trapped between bombs and walls.
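The negate-and-rescan trick can be sketched as below. The relaxation loop is restated so the snippet stands alone; the coefficient -1.2 follows the text, but everything else (names, the absence of walls) is illustrative:

```python
def relax(grid, walls=frozenset()):
    """Repeatedly lower each cell toward (lowest neighbour + 1) until
    stable -- the core Dijkstra map loop, restated here so this sketch
    is self-contained."""
    height, width = len(grid), len(grid[0])
    changed = True
    while changed:
        changed = False
        for y in range(height):
            for x in range(width):
                if (x, y) in walls:
                    continue
                best = min(
                    (grid[ny][nx]
                     for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                     if 0 <= nx < width and 0 <= ny < height
                     and (nx, ny) not in walls),
                    default=grid[y][x],
                )
                if grid[y][x] > best + 1:
                    grid[y][x] = best + 1
                    changed = True
    return grid

def flee_map(bomb_map, coeff=-1.2):
    """Multiply the bomb-approach map by a negative coefficient, then
    relax again: the cells farthest from bombs become the deepest
    valleys, so 'rolling downhill' now leads away from the bombs."""
    scaled = [[coeff * v for v in row] for row in bomb_map]
    return relax(scaled)
```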
This map sets every game tile that isn't in the explosion radius of a bomb as a goal. This gives our agent a notion of the tiles that are safe to walk on.
Similar to the previous function, this map only sets tiles to be safe if they're outside the explosion radius of a bomb that is close to exploding. This will allow our agent to avoid being in the explosion of a bomb that is close to expiring.
This map is calculated after the goal Dijkstra map is created. This is to account for the fact that explosions last a tick after the bomb explodes.
This map places goals at the locations of ammo and treasure pickups. Ammo is weighted slightly higher than treasure.
Similar to the ammo map, this map only places goals on treasure drops.
This map places goals in a square ranging from (3,3) to (8,6). This is used in the late game to keep our agent near the center of the map.
This map works the same way as our bomb flee map, but flees from the opponent agent.
Before placing a bomb, our agent checks that both of the following conditions hold:
- The agent will receive points if it places a bomb, or the bomb will damage an ore block
- There is a path to a safe tile