# REINFORCEMENT LEARNING PROJECT
*This is the notebook for the eighth project of the AI Engineering Master with Professio AI*

It is organized in 3 sections:
1. IDS-Game environment exploration
2. SARSA algorithm
3. DDQN algorithm

## IDS-Game Environment Exploration

### Import Libraries

In [1]:
import time
import warnings

# Silence library warnings so the cell outputs stay readable
warnings.filterwarnings('ignore')

from src.environment.explorer import IdsGameExplorer, IDSGAME_ENV_MAPPING

### Show the available environments

In [2]:
print("Available IdsGame Environments:")

for env_name in IDSGAME_ENV_MAPPING.keys():
    print(f"- {env_name}")

Available IdsGame Environments:
- idsgame-random-attack-v0
- idsgame-maximal-attack-v0
- idsgame-minimal-defense-v0
- idsgame-random-defense-v0


### Environment metrics analysis

In [2]:
explorer = IdsGameExplorer("idsgame-random-attack-v0")

metrics = explorer.get_environment_metrics()

print("\nEnvironment Metrics:")
print(f"Number of nodes: {metrics.num_nodes}")
print(f"Number of attack types: {metrics.num_attack_types}")
print(f"Maximum value: {metrics.max_value}")
print(f"\nDefense position shape: {metrics.defense_position.shape}")
print(f"Attack position shape: {metrics.attack_position.shape}")



Environment Metrics:
Number of nodes: 3
Number of attack types: 10
Maximum value: 9

Defense position shape: (11,)
Attack position shape: (11,)



### Analyze the action space

In [3]:
action_info = explorer.explore_action_space()

print("\nAction Space Analysis:")
for key, value in action_info.items():
    print(f"{key}: {value}")


Action Space Analysis:
type: <class 'gymnasium.spaces.discrete.Discrete'>
shape: ()
sample: 17
n: 30
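
The metrics above report 3 nodes and 10 attack types, and `n: 30` matches their product. The following is a minimal sketch of how a flat action id could be split into a (node, attack type) pair. It assumes a row-major layout, `action_id = node * num_attack_types + attack_type`; that layout is an assumption made for illustration and has not been checked against the encoding gym-idsgame actually uses. `decode_action` is a hypothetical helper, not part of the library.

```python
NUM_NODES = 3          # from the environment metrics above
NUM_ATTACK_TYPES = 10  # from the environment metrics above
NUM_ACTIONS = NUM_NODES * NUM_ATTACK_TYPES


def decode_action(action_id: int) -> tuple[int, int]:
    """Split a flat action id into (node, attack_type).

    Assumes a row-major layout; the real gym-idsgame encoding may differ.
    """
    if not 0 <= action_id < NUM_ACTIONS:
        raise ValueError(f"action id {action_id} is outside [0, {NUM_ACTIONS})")
    # Quotient selects the node, remainder selects the attack type
    return divmod(action_id, NUM_ATTACK_TYPES)


# Decode the action id that was sampled in the output above
print(decode_action(17))
```

If the assumption holds, the sampled action 17 would target node 1 with attack type 7.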


### Explore some state transitions

In [4]:
transitions = explorer.analyze_state_transition(num_steps=5, render=True)

print("\nAnalyzing State Transitions:")
for transition in transitions['transitions']:
    print(f"\nStep {transition['step']}:")
    print("Actions taken:")
    print(f"  Attack: {transition['action']['attack']}")
    print(f"  Defense: {transition['action']['defense']}")
    print(f"Reward received: {transition['reward']}")
    print(f"Episode terminated: {transition['terminated']}")
    print(f"State changed: {transition['observation_change']}")

# Keep the render window visible for a few seconds before closing the environment
time.sleep(10)
explorer.close()



Analyzing State Transitions:

Step 1:
Actions taken:
  Attack: 27
  Defense: 1
Reward received: (0, 0)
Episode terminated: False
State changed: True

Step 2:
Actions taken:
  Attack: 22
  Defense: 3
Reward received: (0, 0)
Episode terminated: False
State changed: True

Step 3:
Actions taken:
  Attack: 24
  Defense: 16
Reward received: (-1, 1)
Episode terminated: True
State changed: True

Step 4:
Actions taken:
  Attack: 27
  Defense: 5
Reward received: (0, 0)
Episode terminated: False
State changed: True

Step 5:
Actions taken:
  Attack: 19
  Defense: 3
Reward received: (0, 0)
Episode terminated: False
State changed: True
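
The per-step printout above can be condensed into a few totals. The sketch below aggregates a list of transition dictionaries that use the same keys as the loop above (`reward`, `terminated`). It assumes the reward tuple is ordered (attacker, defender), mirroring the attack/defense action pair; that ordering is an assumption. `summarize_transitions` is a hypothetical helper, and the sample data is copied from the five steps printed above.

```python
def summarize_transitions(transitions):
    """Aggregate per-agent reward and count the steps that ended an episode."""
    return {
        # Assumed ordering of the reward tuple: (attacker, defender)
        "attacker_total": sum(t["reward"][0] for t in transitions),
        "defender_total": sum(t["reward"][1] for t in transitions),
        "episodes_ended": sum(1 for t in transitions if t["terminated"]),
        "num_steps": len(transitions),
    }


# Rewards and termination flags copied from the five steps printed above
sample_transitions = [
    {"step": 1, "reward": (0, 0), "terminated": False},
    {"step": 2, "reward": (0, 0), "terminated": False},
    {"step": 3, "reward": (-1, 1), "terminated": True},
    {"step": 4, "reward": (0, 0), "terminated": False},
    {"step": 5, "reward": (0, 0), "terminated": False},
]

summary = summarize_transitions(sample_transitions)
print(summary)
```

With the real explorer output, the same helper could be called as `summarize_transitions(transitions['transitions'])`.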


## SARSA algorithm
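
As a starting point for this section, here is a minimal sketch of the tabular SARSA update, not the project's final implementation. It assumes that observations are NumPy arrays (turned into hashable keys through their raw bytes) and that the agent picks one of the 30 discrete actions reported earlier. The hyperparameter values and the helper names `state_key`, `epsilon_greedy` and `sarsa_update` are illustrative assumptions.

```python
import random
from collections import defaultdict

import numpy as np

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # illustrative hyperparameters
N_ACTIONS = 30                          # Discrete(30), as reported above

# Q-table: one row of action values per visited state, created on first access
q_table = defaultdict(lambda: np.zeros(N_ACTIONS))


def state_key(obs):
    """Turn an observation array into a hashable dictionary key."""
    return np.asarray(obs).tobytes()


def epsilon_greedy(key):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_table[key]))


def sarsa_update(s, a, r, s_next, a_next, done):
    """On-policy TD update: the target uses the action actually chosen next."""
    target = r if done else r + GAMMA * q_table[s_next][a_next]
    q_table[s][a] += ALPHA * (target - q_table[s][a])
```

The difference from Q-learning is the target: SARSA bootstraps from `Q(s', a')` for the action the policy actually selected, not from the maximum over actions.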

## DDQN algorithm

In [4]:
import gymnasium as gym
import gym_idsgame

env_name = "idsgame-random-attack-v0"

# Register the environment with Gymnasium only if it is not registered yet
if env_name not in gym.envs.registry:
    gym.register(
        id=env_name,
        entry_point="gym_idsgame.envs.idsgame_env:IdsGameRandomAttackV0Env",
        kwargs={"idsgame_config": None, "save_dir": None, "initial_state_path": None}
    )

def explore_environment():
    try:
        env = gym.make(env_name)
        
        print("\n=== Environment Information ===")
        print(f"Observation Space Type: {type(env.observation_space)}")
        print(f"Action Space Type: {type(env.action_space)}")
        print(f"Action Space: {env.action_space}")
        
        initial_obs, _ = env.reset()
        print("\n=== Initial Observation ===")
        print(f"Type: {type(initial_obs)}")
        print(f"Shape: {initial_obs.shape}")
        
        print("\n=== Testing Random Actions ===")
        for i in range(3):
            attack_action = env.action_space.sample()
            defense_action = env.action_space.sample()
            action = (attack_action, defense_action)
            
            obs, reward, terminated, truncated, info = env.step(action)
            
            print(f"\nStep {i+1}:")
            print(f"Action taken - Attack: {attack_action}, Defense: {defense_action}")
            print(f"Reward: {reward}")
            print(f"Terminated: {terminated}")
            print(f"Truncated: {truncated}")
            print(f"Info: {info}")
            print(f"New Observation Shape: {obs.shape}")
            
            try:
                env.render()
            except Exception as e:
                print(f"Rendering failed: {e}")
                
            if terminated or truncated:
                obs, _ = env.reset()
        
        env.close()
        
    except Exception as e:
        print(f"An error occurred: {e}")
        print(f"Available environments: {list(gym.envs.registry.keys())}")
        # Re-raise with the original traceback intact
        raise

if __name__ == "__main__":
    explore_environment()
    print("\nExploration completed!")



=== Environment Information ===
Observation Space Type: <class 'gymnasium.spaces.box.Box'>
Action Space Type: <class 'gymnasium.spaces.discrete.Discrete'>
Action Space: Discrete(30)

=== Initial Observation ===
Type: <class 'numpy.ndarray'>
Shape: (3, 11)

=== Testing Random Actions ===

Step 1:
Action taken - Attack: 9, Defense: 9
Reward: (0, 0)
Terminated: False
Truncated: False
Info: {'moved': False}
New Observation Shape: (3, 11)



Step 2:
Action taken - Attack: 28, Defense: 1
Reward: (0, 0)
Terminated: False
Truncated: False
Info: {'moved': False}
New Observation Shape: (3, 11)

Step 3:
Action taken - Attack: 17, Defense: 13
Reward: (0, 0)
Terminated: False
Truncated: False
Info: {'moved': False}
New Observation Shape: (3, 11)

Exploration completed!
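
With the environment confirmed to work, the core of the learning step can be sketched. The block below assumes that DDQN here refers to Double DQN: the online network selects the next action and the target network evaluates it. It only computes the training targets from arrays of Q-values; the networks themselves are left out. `double_dqn_targets` is a hypothetical helper, and the toy batch uses 3 actions for brevity where the environment has 30.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Compute y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    Terminal transitions (dones == 1) keep only the immediate reward.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    # Action selection uses the online network's estimates
    best_actions = np.argmax(q_next_online, axis=1)
    # Action evaluation uses the target network's estimates
    next_values = np.asarray(q_next_target)[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values


# Toy batch of two transitions; the second one is terminal
rewards = [1.0, 0.0]
dones = [0.0, 1.0]
q_next_online = np.array([[0.1, 0.9, 0.2], [0.5, 0.4, 0.3]])
q_next_target = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

targets = double_dqn_targets(rewards, dones, q_next_online, q_next_target)
print(targets)
```

Splitting selection from evaluation is what distinguishes this target from the plain DQN target, which takes the maximum of the target network's own values.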

