This tutorial shows, step by step, how to use RL-ADN to train a DDPG agent on the 34-node environment.


### Import the environment and show its configuration

In [1]:
from rl_adn.environments.env import PowerNetEnv, env_config
import pandas as pd
env_config

{'voltage_limits': [0.95, 1.05],
 'algorithm': 'Laurent',
 'battery_list': [11, 15, 26, 29, 33],
 'year': 2020,
 'month': 1,
 'day': 1,
 'train': True,
 'state_pattern': 'default',
 'network_info': {'vm_pu': 1.0,
  's_base': 1000,
  'bus_info_file': '../data_sources/network_data/node_34/Nodes_34.csv',
  'branch_info_file': '../data_sources/network_data/node_34/Lines_34.csv'},
 'time_series_data_path': '../data_sources/time_series_data/34_node_time_series.csv'}
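`env_config` is a plain nested dict defined at module level in `rl_adn.environments.env`. Assigning to its keys in place, as the next cell does, changes that shared object for the rest of the kernel session. If you prefer to leave the defaults untouched, one option is to edit a deep copy. The helper below is a minimal sketch; `with_overrides` is a hypothetical name, not part of RL-ADN, and the demo uses a small stand-in dict rather than the real config.

```python
import copy
from typing import Any, Dict, Tuple


def with_overrides(config: Dict[str, Any],
                   overrides: Dict[Tuple[str, ...], Any]) -> Dict[str, Any]:
    """Return a deep copy of `config` with the given nested keys replaced.

    Each key of `overrides` is a path such as ('network_info', 'bus_info_file');
    the original `config` is never modified.
    """
    new_config = copy.deepcopy(config)
    for path, value in overrides.items():
        node = new_config
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create missing intermediate dicts
        node[path[-1]] = value
    return new_config


# Stand-in for the imported env_config (a subset of the keys printed above)
base_config = {
    'year': 2020,
    'network_info': {'vm_pu': 1.0,
                     'bus_info_file': '../data_sources/network_data/node_34/Nodes_34.csv'},
}
local_config = with_overrides(base_config, {
    ('network_info', 'bus_info_file'):
        '../power_network_rl/data_sources/network_data/node_34/Nodes_34.csv',
})
```

With the real config you would pass the same key paths the next cell assigns, then build the environment from the returned dict, e.g. `PowerNetEnv(with_overrides(env_config, {...}))`.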

### Prepare the data for the environment

In [4]:
# Point the environment at the local copies of the 34-node network and time-series data
env_config['network_info']['bus_info_file'] = '../power_network_rl/data_sources/network_data/node_34/Nodes_34.csv'
env_config['network_info']['branch_info_file'] = '../power_network_rl/data_sources/network_data/node_34/Lines_34.csv'
env_config['network_info']['time_series_data_path'] = '../power_network_rl/data_sources/time_series_data/34_node_time_series.csv'
env = PowerNetEnv(env_config)

Data scale: from 2020-07-17 to 2021-01-01
Data time interval: 15 minutes
Dataset loaded from ../power_network_rl/data_sources/time_series_data/34_node_time_series.csv
Dataset dimensions: (16224, 69)
Dataset contains the following types of data:
Active power columns: ['active_power_node_1', 'active_power_node_2', 'active_power_node_3', 'active_power_node_4', 'active_power_node_5', 'active_power_node_6', 'active_power_node_7', 'active_power_node_8', 'active_power_node_9', 'active_power_node_10', 'active_power_node_11', 'active_power_node_12', 'active_power_node_13', 'active_power_node_14', 'active_power_node_15', 'active_power_node_16', 'active_power_node_17', 'active_power_node_18', 'active_power_node_19', 'active_power_node_20', 'active_power_node_21', 'active_power_node_22', 'active_power_node_23', 'active_power_node_24', 'active_power_node_25', 'active_power_node_26', 'active_power_node_27', 'active_power_node_28', 'active_power_node_29', 'active_power_node_30', 'active_power_node_31
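The reported dataset size follows from the date range and the 15-minute interval: one day is 96 steps, which is also the `max_step` value that appears in the printed training configuration. The sketch below checks this arithmetic with pandas, assuming the series starts at midnight on 2020-07-17 and ends at 23:45 on 2021-01-01 (i.e. it covers whole days); that assumption is consistent with the 16224 rows reported above but is not stated by the loader.

```python
import pandas as pd

# Assumption: whole days from 2020-07-17 00:00 through 2021-01-01 23:45
index = pd.date_range('2020-07-17 00:00', '2021-01-01 23:45', freq='15min')

steps_per_day = pd.Timedelta(days=1) // pd.Timedelta(minutes=15)  # 96 steps per day
num_days = len(index) // steps_per_day

print(f'{len(index)} rows = {num_days} days x {steps_per_day} steps per day')
```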

In [5]:
import torch
from torch.nn.utils import clip_grad_norm_
from power_network_rl.DRL_algorithms.Agent import AgentDDPG
from power_network_rl.DRL_algorithms.utility import Config, ReplayBuffer, SumTree, build_mlp, get_episode_return, get_optim_param
import time

### Set parameters for the experiment
The `args` below configure the training of a reinforcement learning agent with the DDPG (Deep Deterministic Policy Gradient) algorithm. This explanation assumes familiarity with basic reinforcement learning concepts and terminology.

**Environment Arguments (`env_args`)**
- `env_name`: Name of the environment, set to 'PowerNetEnv'.
- `state_dim`: Dimension of the state space, read from `env.state_space` (46 here).
- `action_dim`: Dimension of the action space, read from `env.action_space` (5 here).
- `if_discrete`: Indicates if the action space is discrete (False for continuous action space).

**General Training Arguments**
- `agent_class`: Specifies the class of the agent, set to AgentDDPG.
- `env_class`: The class of the environment, set to None.
- `run_name`: Identifier for the training run, set as 'DDPG_test'.

**Buffer Configuration**
- `gamma`: Discount factor for future rewards, set to 0.99.
- `target_step`: Number of environment steps collected per call to `agent.explore_env`, set to 1000.
- `warm_up`: Minimum number of transitions the replay buffer must hold before network updates start, set to 2000.
- `buffer_size`: Capacity of the replay buffer, set to 400,000.
- `repeat_times`: Number of training repeats, set to 1 (suitable for PER).
- `batch_size`: Size of the training batch, set to 512.

**Device Configuration**
- `GPU_ID`: Identifier for the GPU, set to 0.
- `gpu_id`: GPU ID for training, matching GPU_ID.
- `num_workers`: Number of parallel workers, set to 4.
- `random_seed`: Seed for random number generation, set to 521.

**Agent Configuration**
- `net_dims`: Dimensions of the neural network, set to (256, 256, 256).
- `learning_rate`: Learning rate for training, set to 6e-5.
- `num_episode`: Number of training episodes, set to 5 in this example; increase it (e.g. to 1000) for a full training run.

**Initialization and Execution**
- `init_before_training()`: Initializes components before training.
- `print()`: Prints the current configuration.
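To see where `gamma` enters, and where the `soft_update_tau` value of 0.005 shown in the printed configuration is used, the sketch below writes out the two textbook DDPG formulas: the critic's bootstrapped target and the Polyak (soft) update of the target networks. It assumes `AgentDDPG` follows the standard DDPG recipe; `critic_target` and `soft_update` are hypothetical helper names, not RL-ADN functions, and plain NumPy arrays stand in for network outputs and weights.

```python
import numpy as np


def critic_target(reward, done, next_q, gamma=0.99):
    """Standard DDPG critic target: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `next_q` is the target critic's value of the target actor's action in s'.
    """
    return reward + gamma * (1.0 - done) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * online + (1.0 - tau) * target
            for target, online in zip(target_params, online_params)]


# Toy batch of two transitions; the second one is terminal (done = 1)
y = critic_target(reward=np.array([1.0, 1.0]),
                  done=np.array([0.0, 1.0]),
                  next_q=np.array([10.0, 10.0]))
# Toy "network" with one weight vector: target weights move 0.5% toward the online weights
new_target = soft_update([np.zeros(3)], [np.ones(3)])
```

The critic is regressed toward `y`, while the actor is updated to increase the critic's value of its own action; a small `tau` makes the target networks change slowly, which stabilizes the bootstrapped target.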


In [None]:
env_args = {
    'env_name': 'PowerNetEnv',
    'state_dim': env.state_space.shape[0],
    'action_dim': env.action_space.shape[0],
    'if_discrete': False,
}
args = Config(agent_class=AgentDDPG, env_class=None, env_args=env_args)  # see `Config` for explanation
args.run_name = 'DDPG_test'

# init buffer configuration
args.gamma = 0.99            # discount factor of future rewards
args.target_step = 1000      # environment steps collected per exploration call
args.warm_up = 2000          # minimum buffer size before network updates start
args.buffer_size = int(4e5)  # replay buffer capacity (400,000 transitions)
args.repeat_times = 1
args.batch_size = 512

# init device
GPU_ID = 0
args.gpu_id = GPU_ID
args.num_workers = 4
args.random_seed = 521

# init agent configuration
args.net_dims = (256, 256, 256)
args.learning_rate = 6e-5
args.num_episode = 5         # 5 episodes as a quick example

# init before training
args.init_before_training()
# print configuration
args.print()

# init agent
agent = args.agent_class(args.net_dims, args.state_dim, args.action_dim, gpu_id=args.gpu_id, args=args)
# init buffer (DDPG is off-policy, so a replay buffer is created)
if args.if_off_policy:
    buffer = ReplayBuffer(
        gpu_id=args.gpu_id,
        num_seqs=args.num_envs,
        max_size=args.buffer_size,
        state_dim=args.state_dim,
        action_dim=1 if args.if_discrete else args.action_dim,
        if_use_per=args.if_use_per,
        args=args,
    )


| Arguments Remove cwd: ./DDPG/DDPG_test
{'action_dim': 5,
 'agent_class': <class 'power_network_rl.DRL_algorithms.Agent.AgentDDPG'>,
 'batch_size': 512,
 'buffer_size': 400000,
 'clip_grad_norm': 3.0,
 'cwd': './DDPG/DDPG_test',
 'env_args': {'action_dim': 5,
              'env_name': 'PowerNetEnv',
              'if_discrete': False,
              'max_step': 96,
              'num_envs': 1,
              'state_dim': 46},
 'env_class': None,
 'env_name': 'PowerNetEnv',
 'gamma': 0.99,
 'gpu_id': 0,
 'if_discrete': False,
 'if_off_policy': True,
 'if_remove': True,
 'if_use_per': False,
 'learner_gpus': 0,
 'learning_rate': 6e-05,
 'max_step': 96,
 'net_dims': (256, 256, 256),
 'num_envs': 1,
 'num_episode': 5,
 'num_threads': 8,
 'num_workers': 4,
 'random_seed': 521,
 'repeat_times': 1,
 'reward_scale': 1,
 'run_name': 'DDPG_test',
 'soft_update_tau': 0.005,
 'state_dim': 46,
 'state_value_tau': 0,
 'target_step': 1000,
 'train': True,
 'warm_up': 2000}
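The `ReplayBuffer` created above stores transitions up to `buffer_size` and serves random mini-batches of `batch_size` for the updates. The class below is a minimal sketch of that idea, assuming uniform sampling (the printed configuration has `if_use_per: False`); `MiniReplayBuffer` is hypothetical and is not RL-ADN's `ReplayBuffer`, whose internals (GPU tensors, `num_seqs`, PER support) are not shown in this tutorial.

```python
import numpy as np


class MiniReplayBuffer:
    """Hypothetical fixed-capacity FIFO replay buffer with uniform sampling."""

    def __init__(self, max_size: int, state_dim: int, action_dim: int, seed: int = 0):
        self.max_size = max_size
        self.states = np.zeros((max_size, state_dim), dtype=np.float32)
        self.actions = np.zeros((max_size, action_dim), dtype=np.float32)
        self.rewards = np.zeros(max_size, dtype=np.float32)
        self.next_states = np.zeros((max_size, state_dim), dtype=np.float32)
        self.dones = np.zeros(max_size, dtype=np.float32)
        self.pointer = 0   # index of the next slot to write
        self.cur_size = 0  # number of valid transitions stored
        self.rng = np.random.default_rng(seed)

    def add(self, state, action, reward, next_state, done):
        """Store one transition, overwriting the oldest one once the buffer is full."""
        i = self.pointer
        self.states[i] = state
        self.actions[i] = action
        self.rewards[i] = reward
        self.next_states[i] = next_state
        self.dones[i] = done
        self.pointer = (self.pointer + 1) % self.max_size
        self.cur_size = min(self.cur_size + 1, self.max_size)

    def sample(self, batch_size: int):
        """Draw a batch uniformly at random (with replacement) from the stored transitions."""
        idx = self.rng.integers(0, self.cur_size, size=batch_size)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])


# Dimensions taken from the printed configuration: state_dim=46, action_dim=5
buf = MiniReplayBuffer(max_size=3, state_dim=46, action_dim=5)
for t in range(5):
    buf.add(np.full(46, t), np.zeros(5), float(t), np.full(46, t + 1), 0.0)
states, actions, rewards, next_states, dones = buf.sample(batch_size=4)
```

After five insertions into a buffer of capacity 3, the two oldest transitions have been overwritten, so only rewards 2, 3 and 4 remain and every sampled batch is drawn from those.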




### Training loop
- Warm up the replay buffer with transitions from a random policy
- Once the buffer holds at least `warm_up` transitions, start updating the networks
- Evaluate the agent on a day from the training set and record the performance metrics you need
- Collect new experiences with the current policy and add them to the buffer
- Save the trained agent
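The control flow of these steps can be sketched independently of the power network. The function below mirrors the structure of the training cell using the same method names (`explore_env`, `update_net`, `buffer.update`, `buffer.cur_size`), but it is a simplified sketch: it checks the warm-up condition before each collection, and the counting agent and buffer are hypothetical stand-ins so that only the bookkeeping is exercised.

```python
from typing import Callable, List


def train_loop(agent, env, buffer, evaluate: Callable[[], object],
               target_step: int, warm_up: int, num_episode: int) -> List[int]:
    """Sketch of the loop structure; returns the buffer size seen at each evaluation."""
    # 1. Warm up: fill the buffer with transitions from a random policy
    buffer.update(agent.explore_env(env, target_step, if_random=True))
    while buffer.cur_size < warm_up:
        buffer.update(agent.explore_env(env, target_step, if_random=True))

    sizes = []
    for _ in range(num_episode):
        agent.update_net(buffer)   # 2. update actor and critic from the buffer
        evaluate()                 # 3. evaluate the policy and record metrics
        sizes.append(buffer.cur_size)
        # 4. collect fresh transitions with the current (non-random) policy
        buffer.update(agent.explore_env(env, target_step, if_random=False))
    return sizes


# Hypothetical stand-ins that only count steps, so the control flow can be checked
class CountingAgent:
    def __init__(self):
        self.num_updates = 0

    def explore_env(self, env, target_step, if_random):
        return target_step  # pretend each call yields `target_step` transitions

    def update_net(self, buffer):
        self.num_updates += 1
        return 0.0, 0.0


class CountingBuffer:
    def __init__(self):
        self.cur_size = 0

    def update(self, items):
        self.cur_size += items


agent_stub, buffer_stub = CountingAgent(), CountingBuffer()
sizes = train_loop(agent_stub, None, buffer_stub, lambda: None,
                   target_step=1000, warm_up=2000, num_episode=5)
print(sizes)  # [2000, 3000, 4000, 5000, 6000]
```

With `target_step=1000` and `warm_up=2000`, the buffer sizes at each evaluation match the `buffer_length` values printed by the real run.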



In [None]:
# Training loop
# Warm up the replay buffer with transitions from a random policy
with torch.no_grad():
    buffer_items = agent.explore_env(env, args.target_step, if_random=True)
    buffer.update(buffer_items)
if args.train:
    collect_data = True
    while collect_data:
        print(f'buffer:{buffer.cur_size}')
        with torch.no_grad():
            buffer_items = agent.explore_env(env, args.target_step, if_random=True)
            buffer.update(buffer_items)
        if buffer.cur_size >= args.warm_up:
            collect_data = False
    torch.set_grad_enabled(False)
    for i_episode in range(args.num_episode):
        # Update the actor and critic networks from the replay buffer
        torch.set_grad_enabled(True)
        critic_loss, actor_loss = agent.update_net(buffer)
        torch.set_grad_enabled(False)
        # Evaluate the current policy on one day of data
        episode_reward, violation_time, violation_value, reward_for_power, reward_for_good_action, reward_for_penalty, state_list = get_episode_return(
            env, agent.act,
            agent.device)
        print(
            f'current episode is {i_episode}, reward:{episode_reward},violation time of one day for all nodes:{violation_time},violation value is {violation_value},buffer_length: {buffer.cur_size}')
        if i_episode % 1 == 0:  # always true; raise the modulus to collect less often
            # Collect target_step new transitions with the current policy to keep updating the buffer
            buffer_items = agent.explore_env(env, args.target_step, if_random=False)
            buffer.update(buffer_items)
agent.save_or_load_agent(args.cwd, if_save=True)


buffer:1000

the year:2020,month:12,day:21 is used for testing this episode
curren epsiode is 0, reward:-15.064297996410644,violation time of one day for all nodes:32,violation value is -0.03972565685330336,buffer_length: 2000

the year:2020,month:12,day:17 is used for testing this episode
curren epsiode is 1, reward:-12.769444952407284,violation time of one day for all nodes:29,violation value is -0.03215923733220635,buffer_length: 3000

the year:2020,month:10,day:13 is used for testing this episode
curren epsiode is 2, reward:-2.5809982828595284,violation time of one day for all nodes:0,violation value is 0.0,buffer_length: 4000

the year:2020,month:10,day:12 is used for testing this episode
curren epsiode is 3, reward:-4.3901180818163335,violation time of one day for all nodes:4,violation value is 0.0,buffer_length: 5000

the year:2020,month:10,day:11 is used for testing this episode
curren epsiode is 4, reward:-3.248826404102601,violation time of one day for all nodes:0,violation value is 0.0,buffer_length: 6000

./DDPG/DDPG_test/act_target.pth
./DDPG/DDPG_test/cri_target.pth
./DDPG/DDPG_test/cri.pth
./DDPG/DDPG_test/cri_optimizer.pth
./DDPG/DDPG_test/act.pth
./DDPG/DDPG_test/act_optimizer.pth
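The training cell only prints its metrics. To compare episodes, one option is to keep them in a pandas DataFrame (pandas is already imported at the top of the notebook). The sketch below builds one from the values printed in the log above; in your own loop you would append one dict per episode instead of copying numbers by hand, and the column names are our own choice. Because each episode is evaluated on a different day, the rewards are not strictly comparable across episodes.

```python
import pandas as pd

# Values copied from the training log above; each episode is evaluated on a different day
log = pd.DataFrame({
    'episode': [0, 1, 2, 3, 4],
    'test_day': ['2020-12-21', '2020-12-17', '2020-10-13', '2020-10-12', '2020-10-11'],
    'reward': [-15.064297996410644, -12.769444952407284, -2.5809982828595284,
               -4.3901180818163335, -3.248826404102601],
    'violation_time': [32, 29, 0, 4, 0],
    'violation_value': [-0.03972565685330336, -0.03215923733220635, 0.0, 0.0, 0.0],
    'buffer_length': [2000, 3000, 4000, 5000, 6000],
}).set_index('episode')

best_episode = log['reward'].idxmax()                # episode with the highest reward
violation_free = (log['violation_time'] == 0).sum()  # episodes with zero reported violation time
print(log)
print(f'best episode: {best_episode}, violation-free episodes: {violation_free}')
```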
