
# Section 6: Python Example - Simple Reinforcement Learning Model

Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward. This section provides a practical example of building a simple reinforcement learning model using Python, specifically implementing the Q-learning algorithm, a popular method for learning optimal policies in finite Markov decision processes (MDPs).

1. Setting Up the Environment:

To get started with our reinforcement learning example, make sure NumPy and Gym are installed. If they are not, install them with pip:

In [None]:
pip install numpy gym

Gym by OpenAI is a toolkit for developing and comparing reinforcement learning algorithms. It provides a variety of environments that simulate different physical and virtual settings, which we will use to test our RL agent.

2. Importing Required Libraries:

Import the necessary libraries for creating the environment and performing numerical computations:

In [None]:
import gym
import numpy as np
import random

3. Setting Up the RL Environment:

For this example, we'll use the "FrozenLake-v1" environment from Gym, which represents a grid world where an agent must navigate across a frozen lake without falling into holes:

In [None]:
# Create the FrozenLake environment
env = gym.make('FrozenLake-v1', is_slippery=False)  # is_slippery=False makes the environment deterministic
env.reset() # Reset the environment to the initial state
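
To make the grid world concrete, here is a minimal sketch of how FrozenLake numbers its states. It assumes the default 4x4 map, where the state index counts tiles row by row from the top-left corner; `LAKE_MAP` and `state_to_cell` are hypothetical names used only for illustration and are not part of Gym.

```python
# Default 4x4 FrozenLake layout (assumed here):
# S = start, F = frozen (safe), H = hole, G = goal
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(LAKE_MAP[0])


def state_to_cell(state):
    """Convert a flat state index into (row, column, tile letter)."""
    row, col = divmod(state, N_COLS)
    return row, col, LAKE_MAP[row][col]


# Inspect the start tile, one of the holes, and the goal
for s in (0, 5, 15):
    print(s, state_to_cell(s))
```

With this layout the agent starts in state 0 and the goal is state 15, so the Q-table for this environment has 16 rows (states) and 4 columns (actions).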

4. Implementing Q-Learning:

Q-learning is an off-policy algorithm that learns the best action to take in each state. It does this by maintaining a Q-value for every state-action pair (an estimate of the expected cumulative reward for taking that action in that state) and updating it with the rule:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

where:

*  **s** is the current state,
*  **a** is the current action,
*  **r** is the reward received after executing the action,
*  **s′** is the new state reached after taking action a,
*  **α** is the learning rate,
*  **γ** is the discount factor.
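
To see the update rule at work, the following sketch applies it by hand to two consecutive steps. The function `q_update` is a hypothetical helper written for this illustration, and the values α = 0.8 and γ = 0.95 are assumed to match the hyperparameters used in the training cell.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.8, gamma=0.95):
    """Return the updated Q(s, a) after one Q-learning step."""
    td_target = reward + gamma * max_q_next  # r + gamma * max_a' Q(s', a')
    return q_sa + alpha * (td_target - q_sa)


# Step that reaches the goal: reward 1, and the terminal state has value 0
q_goal = q_update(q_sa=0.0, reward=1.0, max_q_next=0.0)

# One step earlier: no reward yet, but the next state is now worth q_goal
q_prev = q_update(q_sa=0.0, reward=0.0, max_q_next=q_goal)

print(round(q_goal, 3), round(q_prev, 3))
```

The first update moves the estimate 80% of the way from 0 towards the reward of 1. The second shows how that value propagates backwards to earlier states, shrunk by the discount factor.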

In [None]:
# Initialize the Q-table to zero
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set hyperparameters
alpha = 0.8  # Learning rate
gamma = 0.95  # Discount factor
num_episodes = 2000

# Q-learning algorithm
# (assumes the gym >= 0.26 API: reset() returns (obs, info), step() returns five values)
for i in range(num_episodes):
    state, _ = env.reset()
    done = False

    while not done:
        # Choose an action greedily from the Q-table, adding Gaussian noise
        # that shrinks as 1 / (i + 1) so early episodes explore more
        noise = np.random.randn(env.action_space.n) / (i + 1)
        action = int(np.argmax(Q[state, :] + noise))

        # Take the action and observe the new state and reward
        new_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated

        # Update the Q-table using the Q-learning update rule
        Q[state, action] += alpha * (reward + gamma * np.max(Q[new_state, :]) - Q[state, action])

        state = new_state

# Display the Q-table
print("Q-table:")
print(Q)
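
One way to read a Q-table is to take the highest-valued action in each state. The sketch below does this for a small hand-made table rather than the trained one, so its values are illustrative only. `ACTION_SYMBOLS` and `greedy_policy_grid` are hypothetical names, and the action order (0 = left, 1 = down, 2 = right, 3 = up) is assumed to follow FrozenLake's convention.

```python
import numpy as np

# Arrow symbols indexed by FrozenLake action id: 0=left, 1=down, 2=right, 3=up
ACTION_SYMBOLS = np.array(["<", "v", ">", "^"])


def greedy_policy_grid(q_table, n_cols=4):
    """Return the greedy action for every state, laid out like the lake grid."""
    best_actions = np.argmax(q_table, axis=1)  # best action per state (row)
    return ACTION_SYMBOLS[best_actions].reshape(-1, n_cols)


# Toy 16-state, 4-action table with made-up values (not a trained result)
toy_q = np.zeros((16, 4))
toy_q[0, 2] = 0.5  # state 0 prefers "right"
toy_q[1, 1] = 0.7  # state 1 prefers "down"

grid = greedy_policy_grid(toy_q)
print(grid)
```

The same function can be applied to a trained table. Rows that are all zeros (states that were never updated, such as holes and the goal) show `<` only because `argmax` returns the first index on ties.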


5. Testing the Agent:

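After training, you can test the agent by letting it navigate the environment greedily using the learned Q-values. Graphical rendering (`render_mode='human'`) needs a local display and will not work in Colab, so this cell uses the text renderer (`render_mode='ansi'`), which returns the grid as a string.

In [None]:
# Recreate the environment with text rendering (assumes the gym >= 0.26 API)
env = gym.make('FrozenLake-v1', is_slippery=False, render_mode='ansi')
state, _ = env.reset()
print(env.render())

for _ in range(1000):  # Limit the number of steps
    action = int(np.argmax(Q[state, :]))  # Choose the best action from the Q-table
    new_state, reward, terminated, truncated, _ = env.step(action)
    print(env.render())  # Display the environment as text
    if terminated or truncated:
        break
    state = new_state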

6. Conclusion:

This example implements a Q-learning agent in the deterministic version of the FrozenLake environment. The agent learns a table of Q-values from experience and then acts greedily with respect to it. Although the example is basic, the same principles extend to more complex and realistic environments, which makes reinforcement learning a useful approach to a wide range of decision-making problems.