In [98]:
workspace()

In [6]:
include("PixelArts/PixelArts.jl")
include("RL/RL.jl")
using RL
using PixelArts



# Using RL and Pixel Arts

## Simple use

For simple use, we can create an environment and an agent with the constructors `RLEnv` and `RLAgent` and their methods. Julia's `fieldnames` function lists the fields of the types `RLEnv` and `RLAgent`. For a more detailed description, we can also apply the `validate` function to the empty objects `RLEnv()` and `RLAgent()`. `validate` is also useful in more advanced use, when creating new environment types other than the minimal types `RLEnv` and `RLAgent` (more on this later). Order is important: the constructors take their arguments in the order in which `fieldnames` lists the fields.
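A field check of this kind can be written with Julia's `isdefined`. The sketch below uses a hypothetical `ToyEnv` type with the same four fields (it is not the actual `RLEnv` definition from RL.jl) and a hypothetical `missing_fields` helper. The real `validate` also prints the expected signature of each field, which this sketch does not attempt.

```julia
# Hypothetical stand-in for RLEnv; the real type in RL.jl may differ.
mutable struct ToyEnv
    states
    actions::Function   # s -> A(s)
    step::Function      # (s, a) -> (s', r)
    task_end::Function  # s -> Bool
    ToyEnv() = new()    # every field left undefined, like RLEnv()
    ToyEnv(states, actions, step, task_end) = new(states, actions, step, task_end)
end

# Hypothetical helper: names of the fields that have not been set yet.
missing_fields(x) = [f for f in fieldnames(typeof(x)) if !isdefined(x, f)]

println(missing_fields(ToyEnv()))
println(missing_fields(ToyEnv([1], s -> ["a"], (s, a) -> (s, 0), s -> false)))
```

Because the fields are positional, `ToyEnv(states, actions, step, task_end)` has to receive its arguments in the order `fieldnames` reports them, which is presumably why order matters for `RLEnv` as well.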

In [7]:
println(fieldnames(RLEnv))
println(fieldnames(RLAgent))

Symbol[:states, :actions, :step, :task_end]
Symbol[:policy, :state]


In [8]:
validate(RLEnv())

[1m[36mINFO: [39m[22m[36mField states::Any missing
[39m[1m[36mINFO: [39m[22m[36mField actions::Function (s) -> (A(s)) missing
[39m[1m[36mINFO: [39m[22m[36mField step::Function (s, a) -> (s', r) missing
[39m[1m[36mINFO: [39m[22m[36mField task_end::Function missing, for non-episodic tasks use s -> false
[39m

In [9]:
validate(RLAgent())

[1m[36mINFO: [39m[22m[36mField policy::Function (s) -> (a) missing from type
[39m

Here is an example: suppose a task in which you play coin tosses. You start by betting one dollar; every time you lose, you may either bet the same amount again or double the bet for the next round. The game stops as soon as you win a toss. The reward at each round is the money lost or won.
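Under these rules, always doubling after a loss has a fixed outcome. If the agent loses $k$ times and then wins, its stakes were $1, 2, \dots, 2^{k}$ dollars, so the net reward of the episode is

$$
-\sum_{i=0}^{k-1} 2^{i} + 2^{k} = -\left(2^{k} - 1\right) + 2^{k} = 1 .
$$

The price is that the stake $2^{k}$ needed for the final toss grows without bound with the length of the losing streak.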

In [153]:
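# states will be [current_bet, last_game_status]; the list below enumerates possible bets
# (a large space of integer options). Caution: 2^n overflows Int64 for n >= 63, which is
# why the printed list ends in zeros; big(2)^n would give exact values.
states = [2^n for n in 0:1000]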
actions(state) = ["same_bet" , "double_bet"] # independent of state
function step(state, action)
    lost = rand() < 0.5
    current_bet = state[1] # default
    if lost 
        new_bet = (action == "double_bet") ? 2current_bet : current_bet 
        state = [new_bet, "lost"]
        reward = -current_bet
    else
        state = [0, "won"]
        reward = current_bet
    end
    return state, reward
end
task_end(state) = state[2] == "won"   # the episode ends once a toss is won
tossgame = RLEnv(states, actions, step, task_end)


RL.RLEnv([1, 2, 4, 8, 16, 32, 64, 128, 256, 512  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], actions, step, task_end)

In [154]:
# Some examples: ten independent tosses, each starting from the same initial state
init_step = [1, "lost"] # bet 1 dollar
for i in 1:10
    new_state, reward = tossgame.step(init_step, "double_bet")
    println(" $(new_state[2]) last game receiving $reward now betting $(new_state[1]). Task ended? $(tossgame.task_end(new_state))")
end

 lost last game receiving -1 now betting 2. Task ended? false
 won last game receiving 1 now betting 0. Task ended? true
 won last game receiving 1 now betting 0. Task ended? true
 won last game receiving 1 now betting 0. Task ended? true
 won last game receiving 1 now betting 0. Task ended? true
 lost last game receiving -1 now betting 2. Task ended? false
 lost last game receiving -1 now betting 2. Task ended? false
 lost last game receiving -1 now betting 2. Task ended? false
 lost last game receiving -1 now betting 2. Task ended? false
 won last game receiving 1 now betting 0. Task ended? true


Let's create some agents

In [155]:
init_state = [1, "lost"] # agents start betting one dollar
# Agent that picks an action uniformly at random
policy_random(s) = rand(tossgame.actions(s))
agent_random = RLAgent(policy_random)

RL.RLAgent(policy_random, #undef)

In [156]:
# Evaluate the random agent over ntrials episodes
state = [1, "lost"] # bet 1 dollar
total_reward = 0
ntrials = 10
for i in 1:ntrials
    state = [1, "lost"]
    episode_reward = 0
    tosses = 0
    while !tossgame.task_end(state)
        action = agent_random.policy(state)
        state, reward = tossgame.step(state, action)
        episode_reward += reward
        tosses += 1 
    end
    total_reward += episode_reward
    println("Episode $i: received $episode_reward playing $tosses")
end
println("Agent average reward: $(total_reward/ntrials)")


Episode 1: received 1 playing 1
Episode 2: received 1 playing 1
Episode 3: received 1 playing 1
Episode 4: received 1 playing 1
Episode 5: received 1 playing 1
Episode 6: received 1 playing 1
Episode 7: received 1 playing 1
Episode 8: received -1 playing 3
Episode 9: received 1 playing 2
Episode 10: received 1 playing 2
Agent average reward: 0.8
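A quick way to compare policies is to simulate them. The sketch below re-implements the coin-toss `step` and `task_end` functions under the hypothetical names `toss_step` and `toss_end`, so it runs without the RL module, and plays 1000 episodes with a hypothetical `policy_double` that always doubles. Since the losses $1 + 2 + \dots + 2^{k-1}$ add up to $2^{k} - 1$, every episode should end with a net reward of exactly 1, unlike the random agent above.

```julia
# Same rules as the `step` function above, renamed so this block is self-contained.
function toss_step(state, action)
    current_bet = state[1]
    if rand() < 0.5                      # lost this toss
        new_bet = action == "double_bet" ? 2current_bet : current_bet
        return Any[new_bet, "lost"], -current_bet
    else                                 # won: the episode ends
        return Any[0, "won"], current_bet
    end
end
toss_end(state) = state[2] == "won"

# Hypothetical policy that always doubles the bet after a loss.
policy_double(state) = "double_bet"

function play_episode(policy)
    state = Any[1, "lost"]               # start by betting one dollar
    total, tosses = 0, 0
    while !toss_end(state)
        state, reward = toss_step(state, policy(state))
        total += reward
        tosses += 1
    end
    return total, tosses
end

results = [play_episode(policy_double) for _ in 1:1000]
println("distinct episode rewards: ", unique(first.(results)))
println("longest episode: ", maximum(last.(results)), " tosses")
```

The episode length varies from run to run, but the net reward does not; Int64 bets would only overflow after 62 consecutive losses.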


# Preloaded environments

In [101]:
using PixelArts

## Right turn

In [103]:
right_turn, colour_array = get_env("right_turn")

(RL.RLEnv(Array{Int64,1}[[1, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [1, 7], [1, 8], [1, 9], [1, 10]  …  [18, 8], [18, 9], [18, 10], [18, 11], [18, 12], [18, 13], [18, 14], [18, 15], [18, 16], [18, 17]], RL.actions, RL.step, RL.task_end), String["green" "green" … "green" "green"; "green" "green" … "green" "green"; … ; "green" "green" … "green" "green"; "green" "green" … "green" "green"])

In [145]:
rtcanvas = create_canvas(size(colour_array)..., 400, 400)
add_pixels(rtcanvas, colour_array)

In [53]:
racer = add_pixel_cross(rtcanvas, 17, 4, "blue")

"pixel_crosssz9"

In [142]:
policy(state) = rand(right_turn.actions(state))
init_state = [18, 3]
agent = RLAgent(policy, init_state)

RL.RLAgent(policy, [18, 3])

# animate 100 random interactions: after each step, move the racer to the agent's new state
for i in 1:100
    interact!(agent, right_turn)
    translate_element(racer, agent.state...)
    sleep(0.05)
end
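The animation loop relies on `interact!` advancing the agent by one step and storing the new position in `agent.state`, which `translate_element` then reads. A minimal sketch of that pattern, using a hypothetical `ToyAgent` type and `toy_interact!` function (names assumed, not taken from RL.jl):

```julia
# Hypothetical agent type and single-step interaction, for illustration only.
mutable struct ToyAgent
    policy::Function
    state
end

function toy_interact!(agent::ToyAgent, step::Function)
    a = agent.policy(agent.state)        # choose an action in the current state
    s_next, r = step(agent.state, a)     # let the environment respond
    agent.state = s_next                 # the agent remembers where it ended up
    return a, r
end

# Toy usage: an agent that always moves up on a grid with row 1 at the top.
move_up(s, a) = ([max(s[1] - 1, 1), s[2]], -1)
agent = ToyAgent(s -> "up", [18, 3])
for _ in 1:3
    toy_interact!(agent, move_up)
end
println(agent.state)
```

Keeping the state inside a mutable agent is what lets the drawing code read `agent.state` directly after each call.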

In [107]:
agent.state = [18, 3]
s, r, a = episode!(agent, right_turn, verbose = true)

step: 0, choosing down, moving to state [18, 3], obtaining reward -1
step: 1, choosing up, moving to state [17, 3], obtaining reward -1
step: 2, choosing up, moving to state [16, 3], obtaining reward -1
step: 3, choosing right, moving to state [16, 4], obtaining reward -1
step: 4, choosing down, moving to state [17, 4], obtaining reward -1
step: 5, choosing down, moving to state [18, 4], obtaining reward -1
step: 6, choosing up, moving to state [17, 4], obtaining reward -1
step: 7, choosing down, moving to state [18, 4], obtaining reward -1
step: 8, choosing down, moving to state [18, 4], obtaining reward -1
step: 9, choosing up, moving to state [17, 4], obtaining reward -1
step: 10, choosing left, moving to state [17, 3], obtaining reward -1
step: 11, choosing right, moving to state [17, 4], obtaining reward -1
step: 12, choosing down, moving to state [18, 4], obtaining reward -1
step: 13, choosing right, moving to state [18, 5], obtaining reward -1
step: 14, choosing left, moving to 

(Array{Int64,1}[[18, 3], [18, 3], [17, 3], [16, 3], [16, 4], [17, 4], [18, 4], [17, 4], [18, 4], [18, 4]  …  [3, 13], [3, 14], [2, 14], [2, 15], [2, 16], [2, 17], [2, 16], [1, 16], [1, 16], [1, 15]], Any[-1, -1, -1, -1, -1, -1, -1, -1, -1, -1  …  -1, -1, -1, -1, -10, -10, -10, -10, -10, 0], Any["down", "up", "up", "right", "down", "down", "up", "down", "down", "up"  …  "right", "right", "up", "right", "right", "right", "left", "up", "up", "left"])
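The verbose log suggests how the grid environment moves the agent: states are `[row, col]` pairs on an 18 by 17 grid, "up" decreases the row, a move into the border leaves the agent where it was (steps 0 and 8), and ordinary moves cost -1. The sketch below reproduces only those observed moves with a hypothetical `grid_step` function; it is not the PixelArts/RL implementation and ignores the -10 and 0 rewards that also appear in the log.

```julia
# Hypothetical moves for a grid whose states are [row, col] (row 1 at the top).
const GRID_MOVES = Dict("up" => (-1, 0), "down" => (1, 0),
                        "left" => (0, -1), "right" => (0, 1))

# Hypothetical step: a move that would leave the grid keeps that coordinate in place,
# and every move costs -1.
function grid_step(state, action; nrows = 18, ncols = 17)
    dr, dc = GRID_MOVES[action]
    row = clamp(state[1] + dr, 1, nrows)
    col = clamp(state[2] + dc, 1, ncols)
    return [row, col], -1
end

println(grid_step([18, 3], "down"))   # blocked by the bottom edge
println(grid_step([18, 3], "up"))
```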

### Q-matrix

In [131]:
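# Monte Carlo estimate of Q(s, a): for every state x and action, run n episodes that
# start in x with that first action (the 4-argument `episode` appears to take the
# start state and first action) and average their total rewards.
Q = zeros(size(right_turn.states, 1), 4)   # 4 = up, down, left, right
greedy = Dict()
n = 1                                       # episodes per (state, action) pair
for (i, x) in enumerate(right_turn.states)
    action_set = right_turn.actions(x)
    for (j, action) in enumerate(action_set)
        totalr = 0
        for k in 1:n
            s, r, a = episode(agent, right_turn, x, action)
            totalr += (sum(r) - totalr) / k   # running mean of the returns
        end
        Q[i, j] = totalr
    end
    # compare only the actions available in x, so unused zero entries are never picked
    val, k = findmax(Q[i, 1:length(action_set)])
    greedy[x] = action_set[k]
end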
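The cell above estimates each entry Q(s, a) by Monte Carlo: it averages the total reward of episodes that begin in state s with action a, then keeps the best action per state. The same idea fits in a self-contained sketch on a hypothetical five-cell corridor (goal in cell 5, reward -1 per move, a uniformly random policy after the first action). `corridor_step`, `rollout_return` and the other names are illustrative, not part of RL.jl.

```julia
using Random

# Hypothetical corridor environment: cells 1..5, goal in cell 5, -1 per move.
const CORRIDOR_GOAL = 5
corridor_actions(s) = ["left", "right"]
function corridor_step(s, a)
    s_next = a == "right" ? min(s + 1, CORRIDOR_GOAL) : max(s - 1, 1)
    return s_next, -1
end
corridor_end(s) = s == CORRIDOR_GOAL

random_policy(rng, s) = rand(rng, corridor_actions(s))

# Total reward of one rollout that takes action a in state s, then follows `policy`.
function rollout_return(rng, s, a, policy; max_steps = 1_000)
    total = 0
    for _ in 1:max_steps
        s, r = corridor_step(s, a)
        total += r
        corridor_end(s) && break
        a = policy(rng, s)
    end
    return total
end

rng = MersenneTwister(42)
n = 5_000                                  # rollouts per (state, action) pair
Q = zeros(CORRIDOR_GOAL - 1, 2)            # rows: non-goal cells, columns: actions
greedy = Dict{Int,String}()
for s in 1:CORRIDOR_GOAL-1
    acts = corridor_actions(s)
    for (j, a) in enumerate(acts)
        mean_ret = 0.0
        for k in 1:n                       # running mean, as in the notebook cell
            mean_ret += (rollout_return(rng, s, a, random_policy) - mean_ret) / k
        end
        Q[s, j] = mean_ret
    end
    greedy[s] = acts[argmax(Q[s, 1:length(acts)])]
end
println(greedy)
```

Here the values can be checked by hand: from cell 4, moving right reaches the goal at once, so Q(4, right) is exactly -1, while Q(4, left) should come out near -15. With this many rollouts the greedy choice should be "right" in every cell; with few rollouts (such as `n = 1` above) the estimates are much noisier.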

In [146]:
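# arrow rotation in degrees for each action
rotation = Dict("up" => 90, "down" => 270, "right" => 0, "left" => 180)
# draw the greedy action of every state as an arrow on the canvas
for key in keys(greedy)
    add_pixel_arrow(rtcanvas, key..., rotation[greedy[key]])
end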

In [143]:
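# Approximate policy iteration: five rounds of evaluation (n episodes per
# state-action pair) followed by improvement (epsilon-greedy on the new estimates).
n = 10
for outi in 1:5
    Q = zeros(size(right_turn.states, 1), 4)
    greedy = Dict()
    for (i, x) in enumerate(right_turn.states)
        action_set = right_turn.actions(x)
        for (j, action) in enumerate(action_set)
            totalr = 0
            for k in 1:n
                s, r, a = episode(agent, right_turn, x, action)
                totalr += (sum(r) - totalr) / k
            end
            Q[i, j] = totalr
        end
        val, k = findmax(Q[i, 1:length(action_set)])
        greedy[x] = action_set[k]
    end
    old_greedy = copy(greedy)
    # store an epsilon-greedy policy (epsilon = 0.1) in the agent's policy field
    agent.policy = s -> (rand() < 0.1) ? rand(right_turn.actions(s)) : old_greedy[s]
end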
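The last step of each round replaces the agent's policy with an ε-greedy one (ε = 0.1): mostly follow the greedy table, occasionally explore. A hypothetical helper `make_eps_greedy` packages that pattern as a closure; the helper name and the four-action grid below are illustrative only.

```julia
using Random

# Hypothetical helper: turn a table of greedy actions into an ε-greedy policy.
# With probability ε it returns a random legal action, otherwise the greedy one.
# The closure keeps a reference to `greedy`, so pass a copy if the table will change.
function make_eps_greedy(greedy::AbstractDict, actions::Function, ε::Real;
                         rng::AbstractRNG = Random.default_rng())
    return s -> rand(rng) < ε ? rand(rng, actions(s)) : greedy[s]
end

grid_actions(s) = ["up", "down", "left", "right"]
greedy_table = Dict([1, 1] => "right", [1, 2] => "down")

policy = make_eps_greedy(greedy_table, grid_actions, 0.1; rng = MersenneTwister(0))
picks = [policy([1, 1]) for _ in 1:10_000]
println("share of greedy picks: ", count(==("right"), picks) / length(picks))
```

With ε = 0.1 and four actions, the greedy action should be picked about 0.9 + 0.1/4 = 92.5% of the time, since random exploration can also land on it. Capturing the table by reference is why the notebook copies `greedy` into `old_greedy` before building the policy.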

## Circuit

In [36]:
circuit, colour_array = get_env("circuit")
circuit_canvas = create_canvas(size(colour_array)..., 360, 720)
add_pixels(circuit_canvas, colour_array)

In [37]:
agent = RLAgent(s -> rand(circuit.actions(s)))

RL.RLAgent(#1, #undef)

In [48]:
s, r, a = episode(agent, circuit, [16, 6], max_steps = 300)



(Array{Int64,1}[[16, 6], [17, 6], [18, 6], [17, 6], [18, 6], [18, 6], [17, 6], [16, 6], [16, 5], [15, 5]  …  [5, 7], [5, 8], [5, 7], [4, 7], [5, 7], [5, 6], [6, 6], [6, 5], [6, 4], [7, 4]], Any[-1, -1, -1, -1, -1, -1, -1, -1, -0.0703065, 0.0475831  …  -0.0454233, -0.0554985, 0.0554985, -1, 0.0949517, 0.0454233, 0.0964738, 0.0302938, -1, -1], Any["down", "down", "up", "down", "down", "up", "up", "left", "up", "right"  …  "right", "right", "left", "up", "down", "left", "down", "left", "left", "down"])
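The `max_steps` keyword caps the length of an episode, which matters for a random agent that may wander for a long time. A hypothetical `run_episode` function shows one way such a capped loop could look. It returns `(states, rewards, actions)` in the same order as the notebook's `episode` and stores rewards as `Float64`, since the circuit produces fractional rewards, but it is a sketch, not the RL module's implementation.

```julia
# Hypothetical capped episode loop; `step(s, a)` must return (s', r).
function run_episode(policy, step, task_end, s0; max_steps = 300)
    states = [s0]                 # the start state is recorded first
    rewards = Float64[]
    actions = String[]
    s = s0
    for _ in 1:max_steps
        task_end(s) && break
        a = policy(s)
        s, r = step(s, a)
        push!(states, s)
        push!(rewards, r)
        push!(actions, a)
    end
    return states, rewards, actions
end

# Toy usage: keep moving right along a line until position 5 is reached.
s, r, a = run_episode(x -> "right", (x, act) -> (x + 1, -1), x -> x >= 5, 1)
println(s, " ", sum(r))
```

If `task_end` is never satisfied, the loop simply stops after `max_steps` transitions.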

In [49]:
racer = add_pixel_cross(circuit_canvas, 16, 6, "blue")
# replay the episode: move the racer through the visited states
for x in s
    translate_element(racer, x...)
    sleep(0.1)
end
