# REINFORCEMENT LEARNING
---

In [1]:
versioninfo() # prints the build and platform report below (Julia 1.11.1)

Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = /home/mhamdi/torch/install/lib:/home/mhamdi/torch/install/lib:/home/mhamdi/torch/install/lib:
  DYLD_LIBRARY_PATH = /home/mhamdi/torch/install/lib:/home/mhamdi/torch/install/lib:/home/mhamdi/torch/install/lib:
  JULIA_NUM_THREADS = 8


In [None]:
using ReinforcementLearning
using Flux: Descent

Define the environment

In [None]:
env = RandomWalk1D()
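To make the environment less of a black box, here is a minimal, dependency-free sketch of a bounded 1D random walk. It is not the library's implementation: `ToyWalk`, `step!`, and `is_done` are hypothetical names, and the size (7 positions), the start in the middle, and the rewards (-1 at the left end, +1 at the right end, 0 elsewhere) are assumptions chosen for illustration.

```julia
# Hypothetical stand-in for a 1D random walk; all names and numbers are illustrative.
mutable struct ToyWalk
    n::Int      # number of positions, indexed 1..n
    pos::Int    # current position
end

# Start in the middle of the line (assumed convention).
ToyWalk(n::Int) = ToyWalk(n, (n + 1) ÷ 2)

# The episode ends when either end of the line is reached.
is_done(w::ToyWalk) = w.pos == 1 || w.pos == w.n

# Action 1 moves left, action 2 moves right; returns the reward for the step.
function step!(w::ToyWalk, action::Int)
    w.pos = clamp(w.pos + (action == 1 ? -1 : 1), 1, w.n)
    return w.pos == 1 ? -1.0 : (w.pos == w.n ? 1.0 : 0.0)
end

w = ToyWalk(7)
rewards = [step!(w, 2) for _ in 1:3]   # walk right from the middle to the right end
```

With two actions and a handful of positions, a table with one entry per (action, state) pair is enough to represent the value function, which is why a tabular approximator is a natural fit for this kind of task.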

Instantiate the agent: a tabular SARSA learner with an ε-greedy explorer (ε = 0.1)

In [None]:
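agent = Agent(
    policy = QBasedPolicy(
        learner = TDLearner(
            approximator = TabularQApproximator(
                n_state = 11,       # Table size; must be at least length(state_space(env))
                n_action = 2,       # Two actions: step left or step right
                init = 0.0,         # Initial Q-value for every (state, action) pair
                opt = Descent(0.1)  # Learning rate
            ),
            method = :SARSA,        # On-policy temporal-difference control
            γ = 0.99                # Discount factor
        ),
        explorer = EpsilonGreedyExplorer(0.1),  # Random action with probability 0.1
    ),
    trajectory = VectorSARTTrajectory(),  # Buffer of state, action, reward, terminal
)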
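The agent above hides the learning rule inside `TDLearner`. As a rough, hand-written sketch of what a tabular SARSA step with ε-greedy exploration looks like (not the library's code), the update can be written in plain Julia. `sarsa_update!` and `epsilon_greedy` are hypothetical helpers; the sketch assumes the table is laid out as `n_action × n_state` and that `Descent(0.1)` plays the role of the step size `α`.

```julia
# One tabular SARSA update, written out by hand for illustration.
# Q is assumed to be an n_action × n_state matrix.
function sarsa_update!(Q, s, a, r, s_next, a_next; α = 0.1, γ = 0.99, terminal = false)
    # At a terminal transition there is nothing to bootstrap from.
    target = terminal ? r : r + γ * Q[a_next, s_next]
    Q[a, s] += α * (target - Q[a, s])   # move Q[a, s] a fraction α toward the target
    return Q
end

# ε-greedy choice: a uniformly random action with probability ε, else the greedy one.
function epsilon_greedy(Q, s; ε = 0.1)
    return rand() < ε ? rand(1:size(Q, 1)) : argmax(view(Q, :, s))
end

Q = zeros(2, 11)   # same dimensions as the notebook's table (layout assumed)
sarsa_update!(Q, 10, 2, 1.0, 11, 1; terminal = true)   # reward 1.0 for action 2 in state 10
```

SARSA is on-policy: the target uses the action `a_next` that the ε-greedy policy actually selects in the next state, rather than the maximum over actions as Q-learning would.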

Run the experiment for 10,000 episodes, recording the total reward of each episode

In [None]:
hook = TotalRewardPerEpisode()
run(agent, env, StopAfterEpisode(10_000), hook)

Print rewards

In [None]:
println("Total reward per episode:")
println(hook.rewards)
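Printing 10,000 individual rewards is hard to read, so a summary may be more useful. The sketch below averages the rewards over consecutive blocks of episodes; `block_means` is a hypothetical helper, and the `demo` vector is made-up data rather than output from this notebook.

```julia
using Statistics: mean

# Mean of each consecutive block of `width` episodes; a trailing partial block is dropped.
function block_means(rewards::AbstractVector{<:Real}, width::Integer)
    n_blocks = length(rewards) ÷ width
    return [mean(view(rewards, (i - 1) * width + 1 : i * width)) for i in 1:n_blocks]
end

demo = [-1.0, -1.0, 1.0, 1.0, 1.0, 1.0]   # made-up episode returns, not real results
demo_means = block_means(demo, 2)         # → [-1.0, 1.0, 1.0]
```

Applied to the hook, `block_means(hook.rewards, 1_000)` would give ten numbers showing how the average episode reward changes over training.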

Print the learned Q-table

In [None]:
q_table = agent.policy.learner.approximator.table
println("\nLearned Q-table:")
println(q_table)
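The raw table is easier to interpret once it is reduced to the greedy action in each state. A minimal sketch, assuming rows index actions and columns index states; `greedy_actions` is a hypothetical helper and `Q_demo` is a made-up table, not the one learned above.

```julia
# Greedy action per state, assuming rows index actions and columns index states.
greedy_actions(Q::AbstractMatrix) = [argmax(view(Q, :, s)) for s in axes(Q, 2)]

# Made-up 2 × 4 table for illustration only.
Q_demo = [0.1 0.5 0.2 0.0;
          0.3 0.4 0.9 0.0]

policy_demo = greedy_actions(Q_demo)   # → [2, 1, 2, 1]
```

Ties, such as the all-zero last column, resolve to the first action because `argmax` returns the first maximal index. States that are never updated (for example terminal ones) therefore always report action 1, which should not be read as a learned preference.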