## **ε-Greedy Exploration Algorithm**

The ε-greedy algorithm is a simple and widely used strategy for balancing exploration and exploitation. At each step, with probability ε the agent picks an action uniformly at random (exploration), and with probability 1 - ε it picks the greedy action, i.e. the one with the highest current action-value estimate Q(s, a) (exploitation).
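Because the random branch samples uniformly over *all* actions, including the greedy one, the greedy action is actually chosen with probability $1 - \varepsilon + \varepsilon/|\mathcal{A}|$ and every other action with probability $\varepsilon/|\mathcal{A}|$. The sketch below checks this empirically for one state; the helper name `epsilon_greedy` and the 3-action Q-row are hypothetical values chosen for illustration, not part of the original notebook.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Hypothetical helper: random action with probability epsilon, else argmax of q_values."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform over all actions
    return int(np.argmax(q_values))              # exploit: highest estimated value

rng = np.random.default_rng(0)
q_row = np.array([0.2, 0.5, 0.1])  # hypothetical Q(s, .) for a single state
epsilon = 0.1
n = 100_000

# Empirical frequency of each action over n draws
counts = np.bincount([epsilon_greedy(q_row, epsilon, rng) for _ in range(n)], minlength=len(q_row))
freq = counts / n

greedy_p = 1 - epsilon + epsilon / len(q_row)  # ≈ 0.9333 for the argmax action (index 1)
other_p = epsilon / len(q_row)                 # ≈ 0.0333 for each remaining action
print(freq, greedy_p, other_p)
```

With ε = 0.1 and three actions, the greedy action is taken roughly 93% of the time rather than 90%, which matters when reasoning about how much exploration a given ε really buys.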


**Imports**

In [3]:
import numpy as np
import gymnasium as gym  # maintained successor of OpenAI Gym; same reset()/step() API as gym>=0.26

**Environment Setup**

In [None]:
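env = gym.make('CartPole-v1')  # assumes gym>=0.26 or gymnasium (new reset/step API)
# Hyperparameters
epsilon = 0.1  # Exploration rate
alpha = 0.1  # Learning rate
gamma = 0.99  # Discount factor

# CartPole observations are 4 continuous values, so a tabular Q-function needs them binned.
# Ranges (cart position, cart velocity, pole angle, pole angular velocity) are illustrative.
n_bins = 6
bin_edges = [np.linspace(lo, hi, n_bins - 1) for lo, hi in [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]]

def discretize(obs):
    # Map each component to a bin index in [0, n_bins - 1]; the tuple indexes the Q-table
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, bin_edges))
# Q-table initialization: one entry per (discretized state, action) pair
Q = np.zeros((n_bins,) * len(bin_edges) + (env.action_space.n,))

**Model Building**

In [None]:
def epsilon_greedy_policy(state):
    if np.random.rand() < epsilon:
        return np.random.choice(env.action_space.n)  # Explore: uniform random action
    else:
        return int(np.argmax(Q[state]))  # Exploit: action with the highest Q-value

def q_learning(env, n_episodes=1000):
    for episode in range(n_episodes):
        state = discretize(env.reset()[0])  # reset() returns (observation, info)
        done = False
        while not done:
            action = epsilon_greedy_policy(state)  # Select action based on epsilon-greedy policy
            obs, reward, terminated, truncated, _ = env.step(action)
            next_state = discretize(obs)
            done = terminated or truncated

            # Q-value update; a terminal state has no future value to bootstrap from
            target = reward if terminated else reward + gamma * np.max(Q[next_state])
            Q[state + (action,)] += alpha * (target - Q[state + (action,)])

            state = next_state

q_learning(env)
env.close()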
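The Q-value update moves the estimate for (s, a) a fraction α toward the TD target $r + \gamma \max_{a'} Q(s', a')$. The sketch below works through one such step with made-up numbers on a tiny 2-state table; `q_update` is a hypothetical helper, and its `terminal` flag is an assumption reflecting the standard rule that no future value is bootstrapped from a terminal state.

```python
import numpy as np

def q_update(q, s, a, r, s_next, alpha, gamma, terminal=False):
    """Hypothetical helper: one tabular Q-learning step on q of shape (n_states, n_actions)."""
    target = r if terminal else r + gamma * np.max(q[s_next])  # TD target
    td_error = target - q[s, a]
    q[s, a] += alpha * td_error  # move a fraction alpha toward the target
    return td_error

q = np.zeros((2, 2))
q[1] = [0.5, 2.0]  # hypothetical current estimates for state 1
td = q_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.99)
print(td, q[0, 1])  # td ≈ 2.98 (1 + 0.99 * 2.0), q[0, 1] ≈ 0.298 (0 + 0.1 * 2.98)
```

With α = 0.1 each transition closes only 10% of the gap between the old estimate and the target, which is why many episodes are needed before the greedy branch of ε-greedy becomes reliable.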