# Q-Learning

Q-Learning is a model-free reinforcement learning algorithm for finding an optimal action-selection policy. It learns by interacting with an environment and updating a Q-table (a matrix of state-action values) that estimates the expected cumulative discounted reward of taking each action in each state; acting greedily with respect to the learned values then maximizes that reward. Tabular Q-Learning is effective when the environment can be represented by a small set of discrete states and actions.
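
For reference, the standard Q-Learning update can be written as follows. The symbols $s$, $a$, $r$, and $s'$ (current state, action taken, reward received, next state) are notation introduced here for illustration; $\alpha$ is the learning rate and $\gamma$ is the discount factor.

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

The bracketed term is the gap between the current estimate $Q(s, a)$ and the target $r + \gamma \max_{a'} Q(s', a')$; $\alpha$ controls how far each update moves toward that target.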

import numpy as np
import random

# Define the environment: a 4x4 grid whose cells are numbered 0-15 row by row
num_states = 16  # 4x4 grid
num_actions = 4  # 0 = Up, 1 = Right, 2 = Down, 3 = Left
q_table = np.zeros((num_states, num_actions))

# Define the parameters
alpha = 0.1          # Learning rate
gamma = 0.9          # Discount factor
epsilon = 0.2        # Exploration rate
num_episodes = 1000

# Define a simple reward structure
rewards = np.zeros(num_states)
rewards[15] = 1  # Goal state with a reward

# Function to determine the next state based on the action
def get_next_state(state, action):
    if action == 0 and state >= 4:               # Up
        return state - 4
    elif action == 1 and (state + 1) % 4 != 0:   # Right
        return state + 1
    elif action == 2 and state < 12:             # Down
        return state + 4
    elif action == 3 and state % 4 != 0:         # Left
        return state - 1
    else:
        return state  # If action goes out of bounds, remain in the same state

# Q-Learning algorithm
for episode in range(num_episodes):
    state = random.randint(0, num_states - 1)  # Start from a random state
    while state != 15:  # Loop until reaching the goal state
        if random.uniform(0, 1) < epsilon:
            action = random.randint(0, num_actions - 1)  # Random action (exploration)
        else:
            action = np.argmax(q_table[state])           # Best known action (exploitation)

        next_state = get_next_state(state, action)       # Get the resulting state
        reward = rewards[next_state]                     # Get the reward for the next state
        old_value = q_table[state, action]               # Current Q-value
        next_max = np.max(q_table[next_state])           # Max Q-value for next state

        # Q-Learning update rule
        new_value = old_value + alpha * (reward + gamma * next_max - old_value)
        q_table[state, action] = new_value

        state = next_state                               # Move to the next state

# Display the learned Q-table
print("Learned Q-Table:")
print(q_table)

Learned Q-Table:
[[0.4020224  0.59049    0.36421062 0.38513481]
 [0.57665866 0.6561     0.52347809 0.49894848]
 [0.65059386 0.729      0.58464436 0.57671042]
 [0.72479514 0.72546503 0.81       0.65246629]
 [0.53144096 0.35525816 0.27318436 0.29910465]
 [0.59049    0.4186653  0.33124861 0.31882168]
 [0.6560995  0.49618813 0.41007105 0.40397628]
 [0.71562109 0.80356867 0.9        0.57959449]
 [0.47828442 0.15764974 0.17108197 0.20759823]
 [0.53143891 0.522374   0.19078838 0.20795275]
 [0.4983186  0.89999993 0.31319413 0.31636065]
 [0.80672235 0.89795832 1.         0.80672197]
 [0.42866037 0.13114445 0.08510469 0.07988133]
 [0.47663759 0.23396905 0.14567781 0.03832717]
 [0.80755639 0.40951    0.15587229 0.11805592]
 [0.         0.         0.         0.        ]]
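
One way to read a table like this is to take the highest-valued action in each row, which gives the greedy policy. The following is a minimal sketch, assuming the action order used in the code above (0 = Up, 1 = Right, 2 = Down, 3 = Left) and the goal at state 15. The helper name `greedy_policy`, the arrow symbols, and the small demo table are illustrative and not part of the original notebook.

```python
import numpy as np

ACTION_SYMBOLS = ["^", ">", "v", "<"]  # assumed order: Up, Right, Down, Left


def greedy_policy(q_table, goal_state=15, grid_width=4):
    """Return the greedy action per state as rows of symbols (hypothetical helper)."""
    best_actions = np.argmax(q_table, axis=1)  # ties resolve to the lowest action index
    symbols = [ACTION_SYMBOLS[a] for a in best_actions]
    symbols[goal_state] = "G"  # the goal is terminal, so no action is taken there
    return [
        " ".join(symbols[row:row + grid_width])
        for row in range(0, len(symbols), grid_width)
    ]


# Illustrative table (not the learned values): every state prefers Right,
# except the last column, which prefers Down.
demo_q = np.zeros((16, 4))
demo_q[:, 1] = 0.5      # Right looks good everywhere...
demo_q[3::4, 2] = 1.0   # ...but Down is better in the last column
policy_rows = greedy_policy(demo_q)
for line in policy_rows:
    print(line)
```

Reading the printed Q-table above by hand in the same way gives `> > > v`, `^ ^ ^ v`, `^ ^ > v`, `^ ^ ^ G`. Every state reaches the goal under this policy, but several states in the lower-left still prefer a detour through the top row. For example, Right from state 14 leads straight to the goal yet holds 0.40951, below the 0.80755639 for Up, which suggests those entries had not converged after 1000 episodes. Since the run is unseeded, exact values will differ between runs.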
