<a href="https://colab.research.google.com/github/Armin-Abdollahi/Machine-Learning/blob/main/Q_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Q-Learning

Q-Learning is a model-free reinforcement learning algorithm for finding an optimal action-selection policy for a given problem. It learns by interacting with an environment and updating a Q-table (a matrix of state-action values) so that acting greedily on the table maximizes the expected cumulative reward. Q-Learning is effective in problems where the environment can be represented by discrete states and actions.
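To make the Q-table update concrete, the following is a minimal sketch of a single update step. The helper name `q_update`, the two-state toy table, and the default values for `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for illustration only.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update in place and return the new value."""
    # Target: immediate reward plus the discounted best value of the next state.
    td_target = reward + gamma * np.max(q_table[next_state])
    # Move the current estimate a fraction alpha toward the target.
    td_error = td_target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return q_table[state, action]


# Hypothetical toy table: 2 states, 2 actions.
q = np.zeros((2, 2))
q[1] = [0.5, 1.0]  # pretend state 1 already has learned values

new_value = q_update(q, state=0, action=1, reward=0.0, next_state=1)
print(new_value)  # approximately 0.09, i.e. 0.1 * (0 + 0.9 * 1.0 - 0)
```

Only the visited entry `q[0, 1]` changes; every other entry of the table is left untouched, which is why many episodes are usually needed before values propagate back from the goal.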

Here’s a simple implementation of Q-Learning in a grid environment:

```python
import random

import numpy as np

# Define the environment (4x4 grid)
num_states = 16  # 4x4 grid, states numbered 0-15
num_actions = 4  # Nominally Up, Right, Down, Left
goal_state = 15
q_table = np.zeros((num_states, num_actions))

# Define the parameters
alpha = 0.1  # Learning rate
gamma = 0.9  # Discount factor
epsilon = 0.2  # Exploration rate
num_episodes = 1000

# Define a simple reward structure: entering the goal state pays 1, everything else 0
rewards = np.zeros(num_states)
rewards[goal_state] = 1

# Transition table: state -> list of (next_state, action_index) pairs.
# action_index selects the q_table column updated for that move.
# Note: as written, the indices do not consistently follow the
# 0: Up, 1: Right, 2: Down, 3: Left convention, and some grid moves
# (for example 0 -> 1) are not listed.
actions = {
    0: [(4, 1)], 1: [(5, 1), (0, 3)], 2: [(6, 1), (1, 3)], 3: [(7, 1), (2, 3)],
    4: [(0, 0), (8, 1), (5, 3)], 5: [(1, 0), (4, 2), (6, 1), (9, 3)], 6: [(2, 0), (5, 2), (7, 1), (10, 3)], 7: [(3, 0), (6, 2), (11, 1)],
    8: [(4, 0), (12, 1), (9, 3)], 9: [(5, 0), (8, 2), (10, 1), (13, 3)], 10: [(6, 0), (9, 2), (11, 1), (14, 3)], 11: [(7, 0), (10, 2), (15, 1)],
    12: [(8, 0), (13, 1)], 13: [(9, 0), (12, 2), (14, 1)], 14: [(10, 0), (13, 2), (15, 1)], 15: [(11, 0), (14, 2)]
}

# Q-Learning algorithm
for episode in range(num_episodes):
    state = random.randint(0, num_states - 1)  # Start from a random state (both ends inclusive)
    while state != goal_state:  # Loop until reaching the goal state
        if random.uniform(0, 1) < epsilon:
            # Exploration: pick one of the listed moves uniformly at random
            next_state, action = random.choice(actions[state])
        else:
            # Exploitation: pick the listed move with the highest Q-value
            # (ties resolve to the first such move)
            action_values = [q_table[state, a] for _, a in actions[state]]
            action_index = np.argmax(action_values)
            next_state, action = actions[state][action_index]

        reward = rewards[next_state]
        old_value = q_table[state, action]
        next_max = np.max(q_table[next_state])

        # Q-Learning update rule
        new_value = old_value + alpha * (reward + gamma * next_max - old_value)
        q_table[state, action] = new_value

        state = next_state  # Move to the next state

# Display the learned Q-Table (rows: states, columns: action indices)
print("Learned Q-Table:")
print(q_table)
```

```
Learned Q-Table:
[[0.         0.59039776 0.         0.        ]
 [0.         0.65607715 0.         0.24417907]
 [0.         0.72880909 0.         0.43138882]
 [0.         0.80976195 0.         0.35251017]
 [0.38871906 0.43473295 0.         0.6561    ]
 [0.53841595 0.729      0.54922315 0.53314126]
 [0.62550393 0.79200721 0.62143845 0.81      ]
 [0.51579154 0.9        0.60517024 0.        ]
 [0.5901829  0.29707423 0.         0.34652622]
 [0.65609942 0.55581246 0.31245698 0.19701784]
 [0.72576144 0.9        0.58636115 0.71988224]
 [0.80911148 1.         0.80851334 0.        ]
 [0.1479626  0.6535386  0.         0.        ]
 [0.43949131 0.72899826 0.22177056 0.        ]
 [0.81       0.6861894  0.55963409 0.        ]
 [0.         0.         0.         0.        ]]
```
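A learned Q-table is usually read by following, from each state, the move with the highest value. The sketch below shows one way to do that. The helper `greedy_path`, its `max_steps` guard, and the three-state corridor with its Q-values are hypothetical and chosen only for illustration. The transition table uses the same `(next_state, action_index)` layout as the `actions` dictionary above.

```python
import numpy as np


def greedy_path(q_table, actions, start, goal, max_steps=50):
    """Follow the highest-valued listed move from start until goal or max_steps."""
    path = [start]
    state = start
    for _ in range(max_steps):
        if state == goal:
            break
        # Compare only the Q-values of moves that are listed for this state.
        values = [q_table[state, a] for _, a in actions[state]]
        next_state, _ = actions[state][int(np.argmax(values))]
        path.append(next_state)
        state = next_state
    return path


# Hypothetical 3-state corridor 0 <-> 1 <-> 2 with the goal at state 2.
# Action 0 moves toward lower states, action 1 toward higher states.
toy_actions = {0: [(1, 1)], 1: [(0, 0), (2, 1)], 2: [(1, 0)]}
toy_q = np.array([[0.0, 0.9],
                  [0.81, 1.0],
                  [0.0, 0.0]])

print(greedy_path(toy_q, toy_actions, start=0, goal=2))  # prints [0, 1, 2]
```

The `max_steps` guard matters because a partially trained table can send the greedy walk around a cycle that never reaches the goal.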
