# SephsBiome Adaptive Learning Model using Deep Q-Networks

## Introduction

This notebook implements a Deep Q-Network (DQN) agent for the SephsBiome project. A DQN combines Q-learning with a neural network that approximates action values. This lets an agent learn in environments whose state spaces are too large for a lookup table. The notebook is a first step toward the project's goal of combining deep reinforcement learning with evolutionary algorithms.

### The Role of Reinforcement Learning in SephsBiome

In SephsBiome, reinforcement learning is the mechanism through which agents interact with, learn from, and adapt to dynamic environments. DQNs extend classic Q-learning to high-dimensional observations by replacing the Q-table with a neural network, which makes them suitable for more complex decision-making tasks.
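In the form used later in this notebook, a DQN trains a network with parameters $\theta$. For the action actually taken, the goal is for its prediction $Q(s, a; \theta)$ to match a one-step Q-learning target. That target is built from the observed reward $r$, the discount factor $\gamma$, the next state $s'$, and a done flag $d$, where $d = 1$ if the episode terminated. The loss is the mean squared error over a minibatch of $N$ transitions, matching the `mse` loss and the target computed in the training loop:

$$
y_i = r_i + \gamma \, (1 - d_i) \max_{a'} Q(s'_i, a'; \theta), \qquad
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - Q(s_i, a_i; \theta) \bigr)^2
$$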

### Objective of This Notebook

In this notebook, we aim to:
1. Introduce the concepts and mechanics of Deep Q-Networks.
2. Demonstrate the implementation of a DQN model.
3. Train and evaluate the model in a standard benchmark environment (CartPole) as a first test bed before applying it to SephsBiome environments.

We begin by installing the dependencies and setting up the environment.


In [1]:
%pip install gymnasium tensorflow numpy matplotlib



In [2]:
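import collections  # deque for the replay buffer
import random       # sampling and random exploration

import gymnasium as gym  # maintained successor of OpenAI Gym
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam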


### Setting Up the Environment
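We use the classic CartPole-v1 control task from Gymnasium. DQN selects actions by taking an argmax over a finite set of Q-values, so it needs a discrete action space. CartPole has two actions: push the cart left or push it right. Continuous-control tasks such as BipedalWalker-v3 use a continuous `Box` action space with no `action_space.n`, so they would need a different algorithm or a discretized action set.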

In [None]:
env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]  # 4 state variables for CartPole
action_size = env.action_space.n             # 2 discrete actions for CartPole

With the environment set up, we can now proceed to discuss the components of the DQN algorithm.

## DQN Components
In this section, we will explore the key components of a Deep Q-Network.
### Neural Network Architecture
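The neural network in a DQN approximates the Q-function: given a state, it outputs one estimated Q-value per action. For our CartPole example, we use a small network with two fully connected hidden layers of 128 units each. The network is wrapped in a `build_model` function so it can be reused below.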
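To make "function approximator" concrete, here is a NumPy-only sketch of what such a network computes. It is not the Keras model itself. `mlp_q_values` and `init_params` are hypothetical helpers, and the weight initialization is an arbitrary illustrative choice. Only the layer layout mirrors this notebook: two ReLU layers of 128 units and one linear output per action.

```python
import numpy as np


def mlp_q_values(state, params):
    """Forward pass of a 2-hidden-layer ReLU network with a linear output layer.

    `state` has shape (batch, state_size); the result has shape (batch, action_size),
    i.e. one estimated Q-value per action for each state in the batch.
    """
    w1, b1, w2, b2, w3, b3 = params
    h1 = np.maximum(0.0, state @ w1 + b1)  # like Dense(128, activation='relu')
    h2 = np.maximum(0.0, h1 @ w2 + b2)     # like Dense(128, activation='relu')
    return h2 @ w3 + b3                    # like Dense(action_size, activation='linear')


def init_params(state_size, action_size, hidden=128, seed=0):
    """Random weights scaled by 1/sqrt(fan_in) and zero biases (illustrative only)."""
    rng = np.random.default_rng(seed)
    shapes = [(state_size, hidden), (hidden, hidden), (hidden, action_size)]
    params = []
    for fan_in, fan_out in shapes:
        params.append(rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out)))
        params.append(np.zeros(fan_out))
    return params


# CartPole-sized example: 4 state variables, 2 actions, a batch of 3 states
params = init_params(state_size=4, action_size=2)
states = np.zeros((3, 4))
q = mlp_q_values(states, params)
greedy_actions = q.argmax(axis=1)  # the action a greedy policy would pick for each state
print(q.shape)  # (3, 2)
```

A greedy policy simply takes the argmax of each row, which is exactly what `choose_action` does with the Keras model's output.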

In [None]:
def build_model(state_size, action_size):
    """Q-network: maps a state vector to one estimated Q-value per action."""
    model = Sequential([
        tf.keras.Input(shape=(state_size,)),      # flat state vector (4 values for CartPole)
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(action_size, activation='linear'),  # one linear output per discrete action
    ])
    return model

### Experience Replay
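Experience replay stores past transitions `(state, action, reward, next_state, done)` in a replay buffer, and the agent trains on randomly sampled minibatches of them. Random sampling breaks the correlation between consecutive steps and lets each transition be reused many times, which helps stabilize learning.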

In [None]:
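class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, buffer_size):
        # Oldest transitions are discarded automatically once maxlen is reached
        self.buffer = collections.deque(maxlen=buffer_size)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample transitions uniformly without replacement, then transpose the
        # list of tuples into one NumPy array per field
        minibatch = random.sample(self.buffer, batch_size)
        return tuple(np.array(field) for field in zip(*minibatch))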
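The `zip(*minibatch)` idiom in `sample` transposes a list of transition tuples into one tuple per field. The standalone sketch below uses made-up transitions to show the resulting array shapes. Its states are CartPole-sized and stored with shape `(1, state_size)`, as the training loop stores them. Note the extra middle axis: it has to be reshaped away before the states are fed to a network that expects `(batch, state_size)`.

```python
import collections
import random

import numpy as np

state_size = 4
buffer = collections.deque(maxlen=100)

# Fill the buffer with dummy transitions; each state is stored as (1, state_size),
# matching the np.reshape(state, [1, state_size]) calls in the training loop
for i in range(10):
    state = np.full((1, state_size), float(i))
    next_state = state + 1.0
    buffer.append((state, i % 2, 1.0, next_state, i == 9))

minibatch = random.sample(buffer, 5)
states, actions, rewards, next_states, dones = (np.array(f) for f in zip(*minibatch))
print(states.shape)  # (5, 1, 4): batch axis, the stored leading 1, then state_size

# Drop the singleton axis so the batch matches a (batch, state_size) network input
flat_states = states.reshape(len(minibatch), state_size)
print(flat_states.shape)  # (5, 4)
```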


### Exploration vs Exploitation
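A key challenge in RL is the trade-off between exploration (trying actions whose value is still uncertain) and exploitation (choosing the action currently believed to be best). We manage it with an ε-greedy strategy: with probability ε the agent picks a random action; otherwise it picks the action with the highest predicted Q-value. ε starts at 1.0 and decays during training.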

In [None]:
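def choose_action(state, epsilon):
    """ε-greedy: random action with probability epsilon, else the greedy action."""
    if np.random.rand() <= epsilon:
        return random.randrange(action_size)    # explore
    q_values = model.predict(state, verbose=0)  # shape (1, action_size)
    return int(np.argmax(q_values[0]))          # exploit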
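The decision rule can be isolated from the model and checked on its own. In this NumPy sketch, `epsilon_greedy` is a hypothetical helper that takes precomputed Q-values instead of calling `model.predict`, and the Q-values are made up. The sketch also shows a detail that is easy to miss. With ε > 0, the greedy action is chosen slightly more than 1 − ε of the time, because random exploration can pick it too.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else argmax of q_values."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: any action, including the greedy one
    return int(np.argmax(q_values))              # exploit


rng = np.random.default_rng(42)
q_values = np.array([0.1, 0.7, 0.2])  # hypothetical Q-values for 3 actions

# epsilon = 0: always the greedy action (index 1)
greedy = [epsilon_greedy(q_values, 0.0, rng) for _ in range(1000)]

# epsilon = 0.3: the greedy action is expected 0.7 + 0.3 / 3 = 0.8 of the time
mixed = [epsilon_greedy(q_values, 0.3, rng) for _ in range(10000)]
greedy_fraction = mixed.count(1) / len(mixed)
print(f"greedy fraction at epsilon=0.3: {greedy_fraction:.3f}")
```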

With these components in place, we are ready to build and train our DQN model.

## Implementing DQN
Now that we have discussed the components of DQN, let's implement it.
### Building the DQN Model
Using the function `build_model` we defined earlier, we can create our DQN model.

In [None]:
model = build_model(state_size, action_size)
model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))  # `lr` is deprecated in Keras

### Defining the Replay Buffer
We instantiate our ReplayBuffer class for storing and sampling experiences.

In [None]:
replay_buffer = ReplayBuffer(buffer_size=100000)


### Agent Hyperparameters
The environment was already initialized in the *Setting Up the Environment* section above. Here we define the remaining parameters for the DQN agent.

In [None]:
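gamma = 0.95           # Discount factor for future rewards (assumed value; tune as needed)
epsilon = 1.0          # Exploration rate
epsilon_min = 0.01     # Minimum exploration rate
epsilon_decay = 0.995  # Multiplicative decay applied to epsilon once per episode
batch_size = 64        # Minibatch size for training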
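With these values, the exploration schedule can be worked out before training. The loop decays ε once per episode while ε > ε_min. So ε reaches its floor after the smallest n with 0.995ⁿ ≤ 0.01, that is n = ⌈ln 0.01 / ln 0.995⌉ = 919. The snippet below checks this by simulating the same rule. With `num_episodes = 1000`, the agent therefore explores at the minimum rate only during roughly the last 80 episodes.

```python
epsilon = 1.0
epsilon_min = 0.01
epsilon_decay = 0.995

# Apply the same once-per-episode rule as the training loop and count
# how many episodes it takes for epsilon to stop decaying
episodes = 0
while epsilon > epsilon_min:
    epsilon *= epsilon_decay
    episodes += 1

print(episodes, round(epsilon, 5))  # 919 0.00999
```

If exploration should last longer or shorter, `epsilon_decay` is the parameter to adjust. For example, a decay closer to 1 stretches the schedule over more episodes.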

With the model, replay buffer, and environment set up, we are ready to train our DQN agent.

## Training the DQN Agent
Training a DQN agent involves interacting with the environment and using the gathered experiences to improve our policy.
### Training Loop
Below is the main loop for training our DQN agent.

In [None]:
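num_episodes = 1000  # Total number of episodes for training
max_steps = 500      # Step cap per episode (CartPole-v1 itself truncates at 500)
total_rewards = []   # Return of each episode, used for the plots below

for e in range(num_episodes):
    state, _ = env.reset()  # Gymnasium's reset returns (observation, info)
    state = np.reshape(state, [1, state_size])
    total_reward = 0

    for step in range(max_steps):
        action = choose_action(state, epsilon)
        # Gymnasium's step returns (obs, reward, terminated, truncated, info)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        next_state = np.reshape(next_state, [1, state_size])

        # Store only true termination as the "done" flag: a time-limit
        # truncation is not a terminal state, so bootstrapping stays valid
        replay_buffer.add(state, action, reward, next_state, terminated)

        state = next_state
        total_reward += reward

        if len(replay_buffer.buffer) > batch_size:
            # Sample a minibatch; stored states have shape (1, state_size),
            # so flatten the batch to (batch_size, state_size)
            states, actions, rewards, next_states, dones = replay_buffer.sample(batch_size)
            states = states.reshape(batch_size, state_size)
            next_states = next_states.reshape(batch_size, state_size)

            # Predict Q-values for the current states and the next states
            q_values = model.predict(states, verbose=0)
            next_q_values = model.predict(next_states, verbose=0)

            # Q-learning target: r + gamma * max_a' Q(s', a'), no bootstrap at terminal states
            target_q_values = rewards + gamma * np.max(next_q_values, axis=1) * (1 - dones.astype(np.float32))

            # Move only the chosen action's Q-value toward its target; the other
            # entries equal the predictions, so they add no error to the MSE loss
            targets_full = q_values.copy()
            targets_full[np.arange(batch_size), actions] = target_q_values

            # Perform one gradient descent step on the minibatch
            model.fit(states, targets_full, epochs=1, verbose=0)

        if done:
            break

    total_rewards.append(total_reward)
    print(f"Episode: {e + 1}/{num_episodes}, Score: {total_reward}")

    # Decay the exploration rate once per episode
    if epsilon > epsilon_min:
        epsilon *= epsilon_decay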
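The target computation inside the loop is easiest to follow on a tiny batch. The numbers below are made up, and `gamma = 0.9` is an illustrative value for this example only. The arithmetic and indexing follow the loop.

```python
import numpy as np

gamma = 0.9  # illustrative discount factor for this example

# A hypothetical minibatch of two transitions, two actions each
q_values = np.array([[0.1, 0.2],
                     [0.3, 0.4]])       # Q(s, .) predicted for the current states
next_q_values = np.array([[0.5, 2.0],
                          [3.0, 1.0]])  # Q(s', .) predicted for the next states
actions = np.array([1, 0])              # actions actually taken
rewards = np.array([1.0, 1.0])
dones = np.array([False, True])         # the second transition ended the episode

# y = r + gamma * max_a' Q(s', a') * (1 - done)
target_q_values = rewards + gamma * np.max(next_q_values, axis=1) * (1 - dones.astype(np.float64))
# first:  1 + 0.9 * 2.0 = 2.8
# second: 1 + 0         = 1.0 (no bootstrapping from a terminal state)

# Overwrite only the taken action's entry in each row
targets_full = q_values.copy()
targets_full[np.arange(len(actions)), actions] = target_q_values
print(targets_full)
```

The untouched entries (0.1 and 0.4) keep their predicted values. Their squared error is therefore zero, and the MSE loss only pushes on the actions that were actually taken.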

### Tracking Learning Progress
We can track the agent's learning progress by plotting the total rewards obtained in each episode.

In [None]:
plt.plot(total_rewards)  # one point per training episode
plt.ylabel('Total Rewards')
plt.xlabel('Episodes')
plt.show()

After training, we can evaluate the performance of our DQN agent.

## Evaluation and Visualization
Once the DQN agent is trained, it's important to evaluate its performance.
### Evaluating the Trained Model
We can test the trained model by running it on the environment without exploration.

In [None]:
for e in range(10):  # Test for 10 episodes
    state, _ = env.reset()
    state = np.reshape(state, [1, state_size])
    total_reward = 0

    for step in range(500):
        # Greedy action only: no exploration during evaluation
        action = int(np.argmax(model.predict(state, verbose=0)[0]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        state = np.reshape(next_state, [1, state_size])
        total_reward += reward
        if terminated or truncated:
            break

    print(f"Test Episode: {e}, Score: {total_reward}")
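Test scores can vary from episode to episode, so a summary over all test episodes says more than any single score. Here is a minimal sketch, assuming the printed test scores are collected into a list. The helper `summarize_returns` and the example scores are hypothetical, not results from this notebook.

```python
import numpy as np


def summarize_returns(returns):
    """Mean, standard deviation, min and max of a list of episode returns."""
    r = np.asarray(returns, dtype=float)
    return {"mean": r.mean(), "std": r.std(), "min": r.min(), "max": r.max()}


# Hypothetical scores from 5 test episodes (illustrative values only)
stats = summarize_returns([500.0, 480.0, 500.0, 350.0, 500.0])
print(f"mean {stats['mean']:.1f} ± {stats['std']:.1f} "
      f"(min {stats['min']:.0f}, max {stats['max']:.0f})")
```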



### Visualizing Performance Metrics
Plotting the rewards or other metrics over time can give insights into the learning process and performance.

In [None]:
plt.plot(total_rewards)
plt.xlabel('Episodes')
plt.ylabel('Reward')
plt.title('Rewards per Episode')
plt.show()
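Per-episode returns in RL are noisy, which can make the raw curve hard to read. A common remedy is to also plot a moving average. Below is a minimal sketch with a hypothetical `moving_average` helper built on `np.convolve`.

```python
import numpy as np


def moving_average(values, window):
    """Mean of each run of `window` consecutive values ('valid' mode: no padding)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
print(smoothed)  # [1.5 2.5 3.5 4.5]
```

In this notebook it could be drawn over the raw curve with, for example, `plt.plot(range(window - 1, len(total_rewards)), moving_average(total_rewards, window))`. That places each smoothed point at the last episode of its window. The output is `window - 1` points shorter than the input, because `'valid'` mode does not pad.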

In [None]:
model.save('SEPHDQN_model.h5')  # Save the model