# Introduction
This notebook optimizes resource allocation on mobile devices using Reinforcement Learning (RL) with Deep Q-Learning (DQL). The agent decides whether each task should be offloaded to the cloud or executed locally on the mobile device.


**Objective:** Maximize the performance and resource efficiency (e.g., battery life, execution time) of mobile devices.

**Environment:** Mobile device with varying battery levels, bandwidth availability, and task sizes.


## Setup


In [None]:
%%capture
# @title Install required packages { display-mode: "form" }
# @markdown This may take a minute to complete.
!pip install tensorflow
!pip install numpy
!pip install matplotlib


In [None]:
%%capture
# @title Import required packages (run me) { display-mode: "form" }
import numpy as np
import random
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from collections import deque # for data structures
import gym # reinforcement learning environments
import matplotlib.pyplot as plt # graph plotting library
import os

# Hide warnings
import warnings
warnings.filterwarnings('ignore')

# Environment
We start by defining an environment whose state includes the device's battery level, the available bandwidth, and the size of the current task. The agent's actions are to offload the task to the cloud or execute it locally.
1. States: (Represents the current status of the mobile device and network.)
  - Battery level of the mobile device.
  - Available bandwidth.
  - Task size.
2. Actions: (Possible actions the agent can take)
  - Offload the task to the cloud.
  - Execute the task locally on the mobile device.
3. Reward Function: (Determines the immediate feedback from the environment after taking an action)
  - Positive reward for reducing energy consumption.
  - Positive reward for faster task completion.
  - Negative reward for high latency or failure to offload.
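
The three state features live on different scales (battery 0 to 100, bandwidth around 0 to 10, small integer task sizes), so it can help to scale them before feeding a neural network. The sketch below is an assumption and not part of the notebook's own code: `encode_state` and its `max_*` bounds are hypothetical names and values chosen to roughly match the ranges used in this notebook.

```python
import numpy as np

def encode_state(battery, bandwidth, task_size,
                 max_battery=100.0, max_bandwidth=10.0, max_task_size=10.0):
    """Scale (battery, bandwidth, task size) into [0, 1] (hypothetical helper)."""
    raw = np.array([battery, bandwidth, task_size], dtype=np.float32)
    scale = np.array([max_battery, max_bandwidth, max_task_size], dtype=np.float32)
    # Clip so that out-of-range inputs still produce a valid network input.
    return np.clip(raw / scale, 0.0, 1.0)

state = encode_state(battery=80, bandwidth=2, task_size=5)
print(state)
```

If such an encoding were used, the same function would have to be applied both during training and when querying the trained agent.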


In [None]:
class Environment:
    def __init__(self):
        self.initial_battery_level = 100
        self.initial_bandwidth = 10
        self.battery_level = self.initial_battery_level # Battery level in percentage
        self.bandwidth = self.initial_bandwidth # Network bandwidth in Mbps
        self.task_size = np.random.randint(1, 10) # Task size in bytes

    def reset(self):
        self.battery_level = self.initial_battery_level
        self.bandwidth = self.initial_bandwidth
        self.task_size = np.random.randint(1, 10)
        return np.array([self.battery_level, self.bandwidth, self.task_size])

    def step(self, action):
        if action == 0:
            reward, done = self.offload_task() # Offload to cloud
        else:
            reward, done = self.execute_task() # Execute locally
        self.task_size = np.random.randint(1, 10)
        next_state = np.array([self.battery_level, self.bandwidth, self.task_size])
        return next_state, reward, done

    def offload_task(self):
        self.battery_level -= 5 # Offloading consumes less battery and more bandwidth
        self.bandwidth -= 2
        reward = (self.battery_level / 10) - self.bandwidth
        if self.battery_level <= 0 or self.bandwidth <= 0:
            return reward-10, True
        return reward, False

    def execute_task(self):
        self.battery_level -= 10 # Local execution consumes more battery and no bandwidth
        reward = (self.battery_level / 10)
        if self.battery_level <= 0:
            return reward-10, True
        return reward, False

    def render(self):
        print(f"Battery Level: {self.battery_level}")
        print(f"Bandwidth: {self.bandwidth}")
        print(f"Task Size: {self.task_size}")
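
To see what these reward rules imply, the standalone sketch below replays them for two fixed policies from the initial state (battery 100, bandwidth 10). `simulate` is a hypothetical helper that copies the arithmetic of `offload_task` and `execute_task`; it leaves out the task size, which the rewards above do not use.

```python
def simulate(actions, battery=100, bandwidth=10):
    """Replay the environment's reward rules for a fixed action sequence."""
    rewards = []
    for action in actions:
        if action == 0:  # Offload: -5 battery, -2 bandwidth
            battery -= 5
            bandwidth -= 2
            reward = battery / 10 - bandwidth
            done = battery <= 0 or bandwidth <= 0
        else:  # Execute locally: -10 battery, bandwidth unchanged
            battery -= 10
            reward = battery / 10
            done = battery <= 0
        if done:
            reward -= 10  # Terminal penalty, as in the environment
        rewards.append(reward)
        if done:
            break
    return rewards

always_local = simulate([1] * 20)
always_offload = simulate([0] * 20)
print(len(always_local), sum(always_local))
print(len(always_offload), sum(always_offload))
```

Under these rules, always executing locally lasts 10 steps and collects a total reward of 35, while always offloading ends after 5 steps, when the bandwidth reaches zero, with a total of 12.5.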


## Deep Q-Learning (DQL) agent
Building a Deep Q-Learning (DQL) agent involves using a neural network to approximate the Q-values for each state-action pair. This neural network, often referred to as the Q-network or Deep Q-Network (DQN), takes the current state as input and outputs one Q-value for each possible action.

**We select Deep Q-Learning (DQL) because it scales to large or continuous state spaces, where a Q-table would be impractical, while the action set stays small and discrete.**

In [None]:
class DQLAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95
        self.epsilon = 1.0
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self.build_model()

    def build_model(self):
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(tf.keras.layers.Dense(24, activation='relu'))
        model.add(tf.keras.layers.Dense(self.action_size, activation='linear'))
        model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        q_values = self.model.predict(state, verbose=0)  # verbose=0 silences the per-call progress bar
        return np.argmax(q_values[0])

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = reward + self.gamma * np.amax(self.model.predict(next_state, verbose=0)[0])
            target_f = self.model.predict(state, verbose=0)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        self.model.load_weights(name)

    def save(self, name):
        self.model.save_weights(name)
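
`replay` calls `predict` and `fit` once per sampled transition, which is slow. One possible alternative, sketched here with NumPy only, builds the targets for the whole minibatch at once. `batch_targets` and `q_fn` are hypothetical: `q_fn` stands for any function that maps a batch of states to Q-values, such as a wrapper around `model.predict`, and the toy linear `q_fn` below exists only to make the example runnable.

```python
import numpy as np

def batch_targets(q_fn, states, actions, rewards, next_states, dones, gamma=0.95):
    """Build Q-learning targets for a whole minibatch (hypothetical helper)."""
    targets = q_fn(states).copy()          # Start from the current predictions
    q_next = q_fn(next_states)             # Q-values of the successor states
    # Terminal transitions (done == 1) get no bootstrapped future value.
    td = rewards + gamma * np.max(q_next, axis=1) * (1.0 - dones)
    targets[np.arange(len(actions)), actions] = td
    return targets

# Toy stand-in for the Q-network: Q-values depend linearly on the battery level.
W = np.array([[0.1, 0.05], [0.0, 0.0], [0.0, 0.0]])
q_fn = lambda s: s @ W

states = np.array([[90.0, 10.0, 3.0], [10.0, 10.0, 4.0]])
actions = np.array([1, 1])
rewards = np.array([8.0, -10.0])
next_states = np.array([[80.0, 10.0, 5.0], [0.0, 10.0, 2.0]])
dones = np.array([0.0, 1.0])

targets = batch_targets(q_fn, states, actions, rewards, next_states, dones)
print(targets)
```

The resulting array could then be passed to a single `fit` call per minibatch instead of one call per transition.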


Implementation of the RL model, step by step:
1. Initialize the Environment:
  - Create a simulation environment for the mobile device and its mobile edge computing (MEC) setup.

2. Initialize the Q-Network:
  - Define the architecture of the neural network used to approximate the Q-values.

3. Experience Replay:
  - Store past experiences (state, action, reward, next state, done flag) in a replay buffer and sample from it at random to break the correlation between consecutive experiences.

4. Target Network:
  - A separate target network, periodically updated with the weights of the main Q-network, is a common way to stabilize training. Note that the `DQLAgent` class above does not implement one: it computes its targets with the same network it is training.

5. Training Loop:
  - For each episode, initialize the state.
  - For each step in the episode, select an action using an epsilon-greedy policy.
  - Execute the action and observe the reward and next state.
  - Store the experience in the replay buffer.
  - Sample a mini-batch of experiences from the replay buffer.
  - Compute the target Q-value.
  - Update the Q-network by minimizing the loss between the predicted and target Q-values.
  - Update the state to the next state.
  - Periodically update the target network, if one is used.
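
Step 4 can be sketched as follows. The helper relies only on `get_weights` and `set_weights`, which Keras models provide; `TinyNet`, `maybe_sync_target`, and the `sync_every` interval are hypothetical names and values used for illustration, with `TinyNet` standing in for a real model so that the example runs without TensorFlow.

```python
import numpy as np

class TinyNet:
    """Stand-in exposing the same get_weights/set_weights pair as a Keras model."""
    def __init__(self, value):
        self._weights = [np.full((3, 2), value, dtype=np.float32)]

    def get_weights(self):
        return [w.copy() for w in self._weights]

    def set_weights(self, weights):
        self._weights = [np.array(w, copy=True) for w in weights]

def maybe_sync_target(online, target, step, sync_every=100):
    """Copy the online network's weights into the target every `sync_every` steps."""
    if step % sync_every == 0:
        target.set_weights(online.get_weights())
        return True
    return False

online, target = TinyNet(1.0), TinyNet(0.0)
sync_steps = [s for s in range(1, 251) if maybe_sync_target(online, target, s)]
print(sync_steps)
```

With a target network in place, the bootstrapped term in the Q-target would be computed from the target network's predictions rather than from the network being trained.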

## Main

In [None]:
def evaluate_agent(agent, env, episodes=100):
    total_rewards = []
    total_battery_consumed = []
    total_tasks_completed = []
    total_latency = []

    for e in range(episodes):
        state = env.reset()
        state = np.reshape(state, [1, agent.state_size])
        episode_reward = 0
        tasks_completed = 0
        latency = 0
        done = False
        while not done:
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            next_state = np.reshape(next_state, [1, agent.state_size])
            episode_reward += reward
            tasks_completed += 1
            latency += 1  # Placeholder: one time step per task, no real latency model
            state = next_state

        # Battery drained over the whole episode, measured once at the end
        battery_consumed = env.initial_battery_level - env.battery_level

        total_rewards.append(episode_reward)
        total_battery_consumed.append(battery_consumed)
        total_tasks_completed.append(tasks_completed)
        total_latency.append(latency)

    avg_reward = np.mean(total_rewards)
    avg_battery_consumed = np.mean(total_battery_consumed)
    avg_tasks_completed = np.mean(total_tasks_completed)
    avg_latency = np.mean(total_latency)

    print(f"Average Reward: {avg_reward}")
    print(f"Average Battery Consumed: {avg_battery_consumed}")
    print(f"Average Tasks Completed: {avg_tasks_completed}")
    print(f"Average Latency: {avg_latency}")

    # Plotting the rewards over episodes
    plt.plot(total_rewards)
    plt.xlabel('Episodes')
    plt.ylabel('Total Reward')
    plt.title('Total Rewards over Episodes')
    plt.show()

def test_agent(agent, state):
    state = np.reshape(state, [1, agent.state_size])
    action = agent.act(state)
    action_name = "Offload" if action == 0 else "Execute Locally"
    print(f"Given state: {state}, Action chosen: {action_name}")

if __name__ == "__main__":
    env = Environment()
    state_size = 3  # Battery level, bandwidth, task size
    action_size = 2  # Offload or execute locally
    agent = DQLAgent(state_size, action_size)
    episodes = 1000  # Use a smaller value (e.g. 100) for quicker training
    batch_size = 32

    rewards = []

    for e in range(episodes):
        state = env.reset()
        state = np.reshape(state, [1, state_size])
        total_reward = 0
        for time in range(200):
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            reward = reward if not done else -10
            next_state = np.reshape(next_state, [1, state_size])
            agent.remember(state, action, reward, next_state, done)
            state = next_state
            total_reward += reward
            if done:
                print(f"episode: {e}/{episodes}, score: {time}, e: {agent.epsilon:.2}")
                break
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)
        rewards.append(total_reward)
        # Optionally, save the model weights
        agent.save("dqn_mcc_model.h5")

    # Plot the training rewards
    plt.plot(rewards)
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    plt.title('Training Rewards over Episodes')
    plt.show()

    # Evaluate the agent
    evaluate_agent(agent, env, episodes=100)

    # Test the agent with a specific state
    test_state = np.array([20, 8, 5])  # Example state: 20% battery, 8 bandwidth, task size 5
    test_agent(agent, test_state)




Streaming output truncated to the last 5000 lines.
episode: 83/1000, score: 9, e: 0.02
episode: 84/1000, score: 9, e: 0.019
episode: 85/1000, score: 9, e: 0.019
episode: 86/1000, score: 9, e: 0.018
episode: 87/1000, score: 9, e: 0.017
episode: 88/1000, score: 9, e: 0.016
episode: 89/1000, score: 9, e: 0.016
episode: 90/1000, score: 9, e: 0.015
episode: 91/1000, score: 9, e: 0.014
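
The `e` value in this log is the agent's exploration rate `epsilon`, which `replay` multiplies by `epsilon_decay` on each call until it is no longer above `epsilon_min`. As a rough check, the sketch below counts how many `replay` calls that takes with the notebook's settings (1.0, 0.995, 0.01); `calls_until_min` is a hypothetical helper.

```python
def calls_until_min(epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
    """Count the decay steps until epsilon stops being above epsilon_min."""
    calls = 0
    while epsilon > epsilon_min:
        epsilon *= epsilon_decay
        calls += 1
    return calls, epsilon

calls, final_epsilon = calls_until_min()
print(calls, final_epsilon)
```

With these settings the loop takes 919 decay steps, so exploration is close to its floor after fewer than a thousand `replay` calls.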


## Test
Test the trained agent with specific states and print whether it decides to offload each task or execute it locally.

In [None]:
# Test the agent with a specific state
def test_agent(agent, state):
    state = np.reshape(state, [1, agent.state_size])
    action = agent.act(state)
    action_name = "Offload" if action == 0 else "Execute Locally"
    print(f"Given state: {state}, Action chosen: {action_name}")

if __name__ == "__main__":
    # Load the trained agent model
    state_size = 3  # Battery level, bandwidth, task size
    action_size = 2  # Offload or execute locally
    agent = DQLAgent(state_size, action_size)
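    agent.load("dqn_mcc_model.h5")  # Make sure the model weights are saved in this file during training
    agent.epsilon = 0.0  # A new agent starts with epsilon = 1.0, so act() would otherwise choose at random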

    # Test the agent with a specific state
    test_state1 = np.array([20, 8, 5])  # Example state: 20% battery, 8 bandwidth, task size 5
    test_agent(agent, test_state1)

    test_state2 = np.array([80, 2, 12])  # Example state: 80% battery, 2 bandwidth, task size 12
    test_agent(agent, test_state2)
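
Note that `agent.act` picks a random action with probability `epsilon`, so a single call does not always return the network's preferred action. A purely greedy readout takes the arg-max over the Q-values. The sketch below uses made-up Q-values, and `greedy_action` and `ACTION_NAMES` are hypothetical names; the mapping 0 = offload, 1 = execute locally follows `Environment.step`.

```python
import numpy as np

ACTION_NAMES = {0: "Offload", 1: "Execute Locally"}

def greedy_action(q_values):
    """Return the index and name of the action with the highest Q-value."""
    q_values = np.asarray(q_values).reshape(-1)  # Accepts shape (2,) or (1, 2)
    action = int(np.argmax(q_values))
    return action, ACTION_NAMES[action]

# Made-up Q-values in the (1, action_size) shape that a Keras predict call returns
action, name = greedy_action([[1.5, 9.0]])
print(action, name)
```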

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Conclusion: Deployment and Integration of the Trained Reinforcement Learning Model

In this project, we developed a Reinforcement Learning (RL) agent using Deep Q-Learning (DQL) to optimize task offloading and execution on mobile devices. The agent aims to maximize battery efficiency while minimizing network usage, with the goal of longer battery life, efficient network usage, and improved overall performance of mobile applications.

- We trained the RL agent using a simulated environment, which included states representing battery level, bandwidth, and task size.
- The reward function was designed to encourage actions that maximize battery level and minimize network usage, balancing the trade-offs between local execution and offloading tasks.
- Through training, the agent learned to make decisions that optimize the overall system performance.

- Training progress was visualized using matplotlib to plot rewards over episodes, providing insights into the agent's learning curve and performance improvements.

- Once trained and evaluated, the model can be deployed by embedding it into the mobile device's task scheduler.

- Integrating the trained model into the task scheduler enables it to decide in real time whether to offload a task or execute it locally.
- This integration applies the learned policy at run time to optimize battery usage and minimize network consumption dynamically.