#### Installation

The simplest way to install gymnasium is with *pip*. The only subtlety is that an extra option has to be enabled to get all the available environments installed. The command below installs the dependencies listed in `requirements.txt`:

In [None]:
!pip install -r requirements.txt

Collecting gymnasium
  Using cached gymnasium-0.27.1-py3-none-any.whl (883 kB)
Collecting jax-jumpy>=0.2.0
  Using cached jax_jumpy-0.2.0-py3-none-any.whl (11 kB)
Collecting gymnasium-notices>=0.0.1
  Using cached gymnasium_notices-0.0.1-py3-none-any.whl (2.8 kB)
Collecting cloudpickle>=1.2.0
  Using cached cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Installing collected packages: gymnasium-notices, jax-jumpy, cloudpickle, gymnasium
Successfully installed cloudpickle-2.2.1 gymnasium-0.27.1 gymnasium-notices-0.0.1 jax-jumpy-0.2.0
Note: you may need to restart the kernel to use updated packages.


In [6]:
import gymnasium as gym
import pygame
import sys
import argparse
from tools.qlearning import *
import numpy as np

In [10]:
q_table = np.fromfile("data/q_table.dat", dtype=float)
q_table.shape

(64,)
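`np.fromfile` reads raw values and does not store the array's shape, which is why the loaded table comes back flat. Here is a minimal sketch of restoring the layout. It assumes the file was written with `ndarray.tofile` from a 16 x 4 (states x actions) float array in row-major order. The temporary file is only a stand-in for `data/q_table.dat`.

```python
import os
import tempfile

import numpy as np

# Stand-in for data/q_table.dat: a 16 x 4 table written as raw floats.
original = np.arange(64, dtype=float).reshape(16, 4)
path = os.path.join(tempfile.mkdtemp(), "q_table.dat")
original.tofile(path)

flat = np.fromfile(path, dtype=float)  # shape information is lost: (64,)
restored = flat.reshape(16, 4)         # assumed layout: (num_states, num_actions)
```

If the table was saved with a different shape or dtype, the `reshape` arguments and `dtype` have to match that instead.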

You may also have to install the *pygame* library.

#### Interacting with the environment

The gymnasium library is a collection of test problems, called **environments**, that you can use to develop and test your reinforcement learning algorithms. These environments share a common interface, which lets you write general algorithms. Let's have a look at the FrozenLake environment and play a few games.
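To illustrate what the shared interface buys you, here is a small sketch of an episode runner that only relies on `reset()` returning `(observation, info)` and `step()` returning `(observation, reward, terminated, truncated, info)`. `CorridorEnv` and `run_episode` are hypothetical names made up for this example. The toy class is not part of gymnasium; it only mimics the same call signatures so the snippet runs without any dependency.

```python
class CorridorEnv:
    """Hypothetical toy environment (not part of gymnasium): walk from cell 0 to the last cell."""

    def __init__(self, length=4):
        self.length = length
        self.position = 0

    def reset(self, seed=None):
        self.position = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        # Action 1 moves right; any other action moves left (clamped at cell 0).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        terminated = self.position == self.length - 1
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, False, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode on any object exposing reset() and step(); return the total reward."""
    observation, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


total = run_episode(CorridorEnv(), lambda observation: 1)  # always move right
```

Because `run_episode` never looks inside the environment, the same function should work unchanged with an environment created by `gym.make`.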

In [None]:
env = gym.make("FrozenLake-v1", render_mode="human")
observation, info = env.reset(seed=42)
num_actions = env.action_space.n
num_states = env.observation_space.n



In [3]:
q_table = np.zeros((16, 4))

In [5]:


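while True:
   action = env.action_space.sample()  # this is where you would insert your policy
   result = env.step(action)  # step the environment exactly once per iteration
   observation, reward, terminated, truncated, info = result
   print(result)
   if terminated or truncated:
      observation, info = env.reset()  # start a new episode only when the current one ends
   pygame.event.pump()  # refresh pygame's key state before reading it
   keys = pygame.key.get_pressed()
   if keys[pygame.K_ESCAPE]:
      env.close()
      break  # stop looping once the window is closed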

(4, 0.0, False, False, {'prob': 0.3333333333333333})
(0, 0.0, False, False, {'prob': 0.3333333333333333})
(4, 0.0, False, False, {'prob': 0.3333333333333333})
(5, 0.0, True, False, {'prob': 0.3333333333333333})
(4, 0.0, False, False, {'prob': 0.3333333333333333})
(0, 0.0, False, False, {'prob': 0.3333333333333333})
(1, 0.0, False, False, {'prob': 0.3333333333333333})


error: video system not initialized
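Each printed tuple is `(observation, reward, terminated, truncated, info)`. In FrozenLake the observation is a single integer, `row * n_cols + col`. The sketch below decodes it back to a grid cell, assuming the default 4x4 map. `state_to_cell` and `tile_at` are hypothetical helper names, not gymnasium functions.

```python
# Default 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal.
DEFAULT_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]


def state_to_cell(state, n_cols=4):
    """Convert a state index to (row, col) on a grid with n_cols columns."""
    return divmod(state, n_cols)


def tile_at(state, grid=DEFAULT_MAP):
    """Return the map letter under the given state index."""
    row, col = state_to_cell(state, len(grid[0]))
    return grid[row][col]
```

For example, state 5 decodes to row 1, column 1, which is a hole on this map. That is consistent with the printed step where state 5 comes with `terminated=True` and a reward of 0.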

In [None]:
num_actions = env.action_space.n
num_states = env.observation_space.n
q_table = np.zeros((num_states, num_actions))

In [3]:

import numpy as np
from tqdm import trange


def q_train_greedy(env, alpha=0.9, gamma=0.95, max_epsilon=1, min_epsilon=0.001, decay_rate=0.001, max_n_steps=100, n_episodes=100):
    """Q-learning algorithm (epsilon-greedy).

    Returns the learned Q-table and the list of total rewards per episode.
    """
    num_actions = env.action_space.n
    num_states = env.observation_space.n
    q_table = np.zeros((num_states, num_actions))

    rewards = []

    epsilon = max_epsilon

    print()
    print("Starting Q-learning algorithm...")
    for episode in trange(n_episodes):
        s, _ = env.reset()  # reset() returns (observation, info)
        total_reward = 0
        for i in range(max_n_steps):
            U = np.random.uniform(0, 1)
            if U < epsilon:
                a = env.action_space.sample()  # explore: select action a at random from A
            else:
                a = np.argmax(q_table[s])  # exploit: select the greedy action for state s under q

            s_new, r, terminated, truncated, _ = env.step(a)
            # move q(s, a) toward the TD target r + gamma * max_a' q(s_new, a')
            q_table[s, a] = q_table[s, a] + alpha*(r + gamma*np.max(q_table[s_new]) - q_table[s, a])

            s, total_reward = s_new, total_reward+r

            # if the episode has ended (terminal state or time limit) then go to next episode
            if terminated or truncated:
                break

        # record the episode and decay epsilon even when max_n_steps is reached
        rewards.append(total_reward)
        epsilon = min_epsilon + (max_epsilon-min_epsilon)*np.exp(-decay_rate*episode)
    env.close()
    return q_table, rewards
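As a worked example of a single step, the standard Q-learning update moves `Q(s, a)` toward the target `r + gamma * max Q(s_new, .)` by a fraction `alpha`. The sketch below applies it to a tiny two-state table stored as plain lists. `q_update` is a hypothetical helper written for illustration, not part of the training function.

```python
def q_update(q_table, s, a, r, s_new, alpha=0.9, gamma=0.95):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    td_target = r + gamma * max(q_table[s_new])
    q_table[s][a] += alpha * (td_target - q_table[s][a])
    return q_table[s][a]


# Two states, two actions; only Q(1, 1) is non-zero before the update.
q = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(q, s=0, a=1, r=0.0, s_new=1)  # 0 + 0.9 * (0.95 * 1 - 0), about 0.855
```

Even with zero reward, the value of the successor state is propagated back to `Q(0, 1)`. This is how the reward at the goal tile gradually spreads through the table.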

In [4]:
q_train_greedy(env)

Starting Q-learning algorithm...


  0%|          | 0/100 [00:00<?, ?it/s]


error: display Surface quit
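It can help to check how quickly the exploration rate actually shrinks under the exponential schedule used in `q_train_greedy`. The sketch below reproduces the same formula with the function's default parameters. `epsilon_at` is a hypothetical helper name.

```python
import math


def epsilon_at(episode, max_epsilon=1.0, min_epsilon=0.001, decay_rate=0.001):
    """Exploration rate after the given episode under exponential decay."""
    return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-decay_rate * episode)


# Exploration rate at the start, after the default 100 episodes, and much later.
schedule = {episode: epsilon_at(episode) for episode in (0, 99, 1000, 5000)}
```

With the defaults, epsilon is still roughly 0.9 after 100 episodes, so the agent acts almost entirely at random throughout such a short run. A larger `decay_rate` or more episodes would be needed before the greedy branch is used often.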