
Nikita Tomin #9

Closed
frostyduck opened this issue May 23, 2020 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@frostyduck

frostyduck commented May 23, 2020

Dear colleagues! Thank you! You have developed a very interesting tool. We would like to compare the performance of your DRL-based dynamic brake with our dynamic brake model based on the sub-Grammians method.

However, when I started the RLGC tool, I ran into a strange problem. I first installed the tool on my home laptop (Windows 10, Python 3.7), following your instructions exactly, and everything worked perfectly: training of the dynamic brake model started, training steps ran, and there were no warnings or errors. Then I installed the tool on my work computer (Ubuntu, Python 3.7), a powerful workstation with GPUs. The installation succeeded, but when I started training on the Kundur two-area model, the following warning appeared:

```
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)

Caused by: py4j.Py4JNetworkException
        at py4j.GatewayServer.startSocket(GatewayServer.java:788)
        at py4j.GatewayServer.start(GatewayServer.java:763)
        at py4j.GatewayServer.start(GatewayServer.java:746)
        at org.pnnl.gov.pss_gateway.IpssPyGateway.main(IpssPyGateway.java:1143)
        ... 5 more

Caused by: java.net.BindException: Address already in use: JVM_Bind
        at java.net.DualStackPlainSocketImpl.bind0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketBind(Unknown Source)
        at java.net.AbstractPlainSocketImpl.bind(Unknown Source)
        at java.net.PlainSocketImpl.bind(Unknown Source)
        at java.net.ServerSocket.bind(Unknown Source)
        at py4j.GatewayServer.startSocket(GatewayServer.java:786)
        ... 8 more
```

In this case, the training process either ends abruptly (possibly after one iteration) or does not start at all, and the following error appears:

`py4j.protocol.Py4JJavaError: An error occurred while calling t.initStudyCase.`

When I returned home, the same Java warnings began to appear on my laptop when I ran `python trainKundur2areaGenBrakingAgent.py`. This is very strange, considering that everything had worked well on the home laptop earlier. However, the other models, `trainIEEE39LoadSheddingAgent_*.py`, work fine.

I understand this is somehow related to the py4j protocol. However, I'm not experienced with Java, and I can't understand why this happened.

@qhuang-pnl
Copy link
Collaborator

qhuang-pnl commented May 23, 2020

Hi, thanks for your interest, and sorry for the issues you are facing. This is a known issue: the py4j Java server initiated in your last run was not properly stopped (the subprocess running the Java server should be killed to release the server port). We have added a function to close the py4j connection after each training run. Please refer to this update:
97b728f

In addition, there are two easy ways to fix it: 1) change the Java server port number in line 27 of the Kundur example; 2) manually kill the corresponding Java process (refer to the PID number).

We are highly interested in learning your final comparison results. Send me a message or an email if you would like to share them.
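As a quick diagnostic, a minimal Python sketch (not part of RLGC; the port number is just the one shown in the console output) that checks whether the py4j gateway port is already occupied before launching the Java server:

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is listening on the given TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0

gateway_port = 25003  # default port seen in the RLGC console output
if not port_is_free(gateway_port):
    print("Port %d is busy: kill the leftover Java server process "
          "or pick another port." % gateway_port)
```

On Linux, `lsof -i :25003` shows the PID holding the port; on Windows, `netstat -ano | findstr 25003` does the same.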

@qhuang-pnl qhuang-pnl added the help wanted Extra attention is needed label May 23, 2020
@frostyduck
Author

frostyduck commented May 24, 2020

Thanks for the quick response! I added `env.close_connection()`, but this did not help. Then I tried changing the Java server port number, but that did not help either. However, I found that after running `python trainKundur2areaGenBrakingAgent.py`, the Java server is started twice, with two different PIDs:

```
(RL_Challenge) (test2) C:\Users\Никита\RLGC\src\py>python trainKundur2areaGenBrakingAgent.py
IPSS-RL Java server lib path: C:\Users\Никита\RLGC/lib/RLGCJavaServer0.87.jar
Java server started with PID: 144
InterPSS Engine for Reinforcement Learning (IPSS-RL) developed by Qiuhua Huang @ PNNL. Version 0.87, built on 2/16/2020
Starting Py4J org.pnnl.gov.pss_gateway.IpssPyGateway at port =25003
case files:[C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area_ver30.raw, C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area.dyr]

action type =  None
observation_history_length,observation_space_dim, action_location_num, action_level_num =
4 8 1 2
IPSS-RL Java server lib path: C:\Users\Никита\RLGC/lib/RLGCJavaServer0.87.jar
Java server started with PID: 1828
InterPSS Engine for Reinforcement Learning (IPSS-RL) developed by Qiuhua Huang @ PNNL. Version 0.87, built on 2/16/2020
Starting Py4J org.pnnl.gov.pss_gateway.IpssPyGateway at port =25003
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: py4j.Py4JNetworkException
        at py4j.GatewayServer.startSocket(GatewayServer.java:788)
        at py4j.GatewayServer.start(GatewayServer.java:763)
        at py4j.GatewayServer.start(GatewayServer.java:746)
        at org.pnnl.gov.pss_gateway.IpssPyGateway.main(IpssPyGateway.java:1143)
        ... 5 more
Caused by: java.net.BindException: Address already in use: JVM_Bind
        at java.net.DualStackPlainSocketImpl.bind0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketBind(Unknown Source)
        at java.net.AbstractPlainSocketImpl.bind(Unknown Source)
        at java.net.PlainSocketImpl.bind(Unknown Source)
        at java.net.ServerSocket.bind(Unknown Source)
        at py4j.GatewayServer.startSocket(GatewayServer.java:786)
        ... 8 more
case files:[C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area_ver30.raw, C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area.dyr]

action type =  None
observation_history_length,observation_space_dim, action_location_num, action_level_num =
4 8 1 2
2020-05-24 10:33:20.341322: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
WARNING:tensorflow:From c:\users\никита\rlgc\src\py\baselines\baselines\common\input.py:57: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From C:\anaconda3\envs\RL_Challenge\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From c:\users\никита\rlgc\src\py\baselines\baselines\common\models.py:94: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From C:\anaconda3\envs\RL_Challenge\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
case files:[C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area_ver30.raw, C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area.dyr]

Case id: 0, Fault bus id: Bus9, fault start time: 1,000000, fault duration: 0,585000
C:\anaconda3\envs\RL_Challenge\lib\site-packages\numpy\core\fromnumeric.py:2920: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
C:\anaconda3\envs\RL_Challenge\lib\site-packages\numpy\core\_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Saving final model to: ./previous_model/kundur2area_multistep_581to585_bus2_90w_lr_0.0001_90w.pkl
total running time is 13.722784757614136
Java server terminated with PID: 144
Finished!!
```

For example, when starting another model, `python trainIEEE39LoadSheddingAgent_discrete_action.py`, the Java server starts only once and everything works fine: the training process begins and runs through many steps. But for the Kundur case, only one step runs and training ends immediately, although this case also used to run normally with many modeling/training steps.

@qhuang-pnl
Collaborator

qhuang-pnl commented May 24, 2020

Hi, sorry, this is one more issue with the original Kundur test code (we hadn't updated it while updating other parts): env was initiated twice. Please check out this update, which should address your issue.

925ab5b

You are also recommended to use v7, `from PowerDynSimEnvDef_v7 import PowerDynSimEnv`, instead of v5 for the env definition. If you do this, the jar lib needs to be updated too, by setting `jar_file = '/lib/RLGCJavaServer0.93.jar'`.

@frostyduck
Author

Thank you! The problem with the Java server has been fixed. However, the training process still takes one step and ends after that single iteration. Previously, many steps ran and the modeling lasted quite a while. Moreover, the reward array is essentially empty, that is, no training occurs.

```
IPSS-RL Java server lib path: C:\Users\Никита\RLGC/lib/RLGCJavaServer0.93.jar
Java server started with PID: 1912
Working Directory = C:\Users\Никита\RLGC\src\py
InterPSS Engine for Reinforcement Learning (IPSS-RL) developed by Qiuhua Huang @ PNNL. Version 0.93, built on 5/1/2020
Starting Py4J org.pnnl.gov.pss_gateway.IpssPyGateway at port =25003

Imported power flow base case files:
[C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area_ver30.raw]

case files:[C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area_ver30.raw, C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area.dyr]

action type =  None
observation_history_length,observation_space_dim, action_location_num, action_level_num =
4 8 1 2
C:\anaconda3\envs\RL_Challenge\lib\site-packages\gym\logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
2020-05-24 12:28:36.960446: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
WARNING:tensorflow:From c:\users\никита\rlgc\src\py\baselines\baselines\common\input.py:57: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From C:\anaconda3\envs\RL_Challenge\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From c:\users\никита\rlgc\src\py\baselines\baselines\common\models.py:94: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From C:\anaconda3\envs\RL_Challenge\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.

Imported power flow base case files:
[C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area_ver30.raw]

case files:[C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area_ver30.raw, C:\Users\Никита\RLGC/testData/Kundur-2area/kunder_2area.dyr]

Case id: 0, Fault bus id: Bus9, fault start time: 1,000000, fault duration: 0,585000
C:\anaconda3\envs\RL_Challenge\lib\site-packages\numpy\core\fromnumeric.py:2920: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
C:\anaconda3\envs\RL_Challenge\lib\site-packages\numpy\core\_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Saving final model to: ./previous_model/kundur2area_multistep_581to585_bus2_90w_lr_0.0001_90w.pkl
total running time is 5.127830505371094
Java server terminated with PID: 1912
Finished!!
```

@qhuang-pnl
Collaborator

You need to change the following `callback()` or set `callback=None` in the training `main()` function:

```python
def callback(lcl, glb):
    # TODO define the early stop criteria
    is_solved = False
    return is_solved
```
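For illustration, a sketch of an early-stop criterion in this callback style; in OpenAI Baselines' deepq, `lcl` is the dict of locals inside `learn()` and carries an `episode_rewards` list. The threshold below is a made-up placeholder, to be tuned per case:

```python
def callback(lcl, glb):
    """Early-stop criterion: return True to stop training."""
    episode_rewards = lcl.get('episode_rewards', [])
    # Wait until at least 100 completed episodes are available.
    if len(episode_rewards) < 101:
        return False
    # Mean reward over the last 100 completed episodes.
    mean_100 = sum(episode_rewards[-101:-1]) / 100.0
    return mean_100 >= -500.0  # placeholder threshold, an assumption
```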

@frostyduck
Author

Thank you so much. Now I understand why the training process stopped after the first step. I made the changes to the code and ran it. The training lasted about one hour (total_timesteps = 900), and I periodically got a message that the reward had increased. It looked fine.

However, when I re-ran the code, the training went much faster; the stages passed quickly, and training could finish in maybe 5-6 steps of fault modeling. Even when I set total_timesteps = 9000, training completed in 10 minutes. Could there be some kind of model memory here? Also, on the first run after changing the callback, files were saved in the PowerGridModels folder; now training ends without saving these files.

I apologize for asking so many questions.

@qhuang-pnl
Collaborator

Great to learn you can run it now. I think the time is reasonable. I ran it on my end, and below is the printout. The summary shows it ran about 9000 steps and ended in ~7 minutes. RL training easily requires 1 million steps to reach a well-trained solution, so you will need to increase the time_steps significantly.


```
| % time spent exploring | 2 |
| episodes | 60 |
| mean 100 episode reward | -1.06e+03 |
| steps | 8.77e+03 |

Saving final model to: ./previous_model/kundur2area_multistep_581to585_bus2_90w_lr_0.0001_90w.pkl
total running time is 344.2550754547119
Java server terminated with PID: 33804
Finished!!
```

@frostyduck
Author

Yes, you are right! I remember that RL algorithms enjoy their very own Groundhog Day. I also ran it on my work computer, and the summary showed it ran about 90000 steps and ended in ~45 minutes.

```
Restored model with mean reward: -908.7
Saving final model to: ./previous_model/kundur2area_multistep_581to585_bus2_90w_lr_0.0001_90w.pkl
total running time is 2705.7513461112976
```

Thank you for your support and help!

I have another little question. I wanted to see (and visualize) the obtained training results, and I tried the following:

```python
import os
import numpy as np

dataname = "_lr_0.0001_multistep_581to585_bus2_90w.npy"
step_rewards = np.load(os.path.join(datafolder, "step_rewards" + dataname))
step_actions = np.load(os.path.join(datafolder, "step_actions" + dataname))
step_observations = np.load(os.path.join(datafolder, "step_observations" + dataname))
step_status = np.load(os.path.join(datafolder, "step_status" + dataname))
step_starttime = np.load(os.path.join(datafolder, "step_starttime" + dataname))
step_durationtime = np.load(os.path.join(datafolder, "step_durationtime" + dataname))

def episode_rewards(step_rewards, step_status):
    result = list()
    start = 0
    for i, done in enumerate(step_status):
        if done:
            result.append(sum(step_rewards[start:i+1]))
            start = i + 1
    result.append(sum(step_rewards[start:]))
    return np.array(result)

rewards = episode_rewards(step_rewards, step_status)
print("total episodes: %d" % (sum(step_rewards)))
```

total episodes: 0

I think my saved files in the storedData folder are empty.
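As a side note, the `print` above sums the rewards rather than counting episodes, so `0` can mean either that the rewards cancel out or that the arrays are empty. A small sketch (assuming the same `step_rewards`/`step_status` layout) that distinguishes the two cases:

```python
import numpy as np

def summarize(step_rewards, step_status):
    """Report step count, number of finished episodes, and total reward."""
    n_steps = len(step_rewards)
    n_episodes = int(np.sum(step_status))   # each True/1 marks an episode end
    total_reward = float(np.sum(step_rewards))
    return n_steps, n_episodes, total_reward

# Empty arrays really do mean nothing was logged:
print(summarize(np.array([]), np.array([])))        # (0, 0, 0.0)
print(summarize(np.array([1.0, -1.0, 2.0]),
                np.array([0, 1, 1])))               # (3, 2, 2.0)
```

If `n_steps` itself is zero, the `.npy` files were written before any step data was recorded.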

@qhuang-pnl
Collaborator

qhuang-pnl commented May 25, 2020

We keep this OpenAI Baselines implementation here only because our previous paper used this version.

You can switch to Stable-Baselines instead of OpenAI Baselines to get full Tensorboard support. We already use it in the IEEE 39-bus system training, as you can see in the code.
https://twitter.com/araffin2/status/1042026628753313792?lang=en

@frostyduck
Author

Following your advice, I switched to Stable-Baselines instead of OpenAI Baselines in the Kundur system training.

```python
def main(learning_rate, env):
    tf.reset_default_graph()
    graph = tf.get_default_graph()

    model = DQN(CustomDQNPolicy, env, learning_rate=learning_rate, verbose=0)
    callback = SaveOnBestTrainingRewardCallback(check_freq=1000, storedData=storedData)
    time_steps = 900000
    model.learn(total_timesteps=int(time_steps), callback=callback)

    print("Saving final model to: " + savedModel + "/" + model_name + "_lr_%s_90w.pkl" % (str(learning_rate)))
    model.save(savedModel + "/" + model_name + "_lr_%s_90w.pkl" % (str(learning_rate)))
```

However, after 900000 steps of training, the DQN agent cannot find a good policy. Please see the average reward progress plot:

https://www.dropbox.com/preview/DQN_adaptivenose.png?role=personal

I used the following env settings:

```python
case_files_array.append(folder_dir + '/testData/Kundur-2area/kunder_2area_ver30.raw')
case_files_array.append(folder_dir + '/testData/Kundur-2area/kunder_2area.dyr')
dyn_config_file = folder_dir + '/testData/Kundur-2area/json/kundur2area_dyn_config.json'
rl_config_file = folder_dir + '/testData/Kundur-2area/json/kundur2area_RL_config_multiStepObsv.json'
```

My suggestion is that in the baseline scenario kunder_2area_ver30.raw (without additional system loading), a short circuit might not lead to loss of stability during the simulation. Therefore, the DQN agent perhaps settles on a "no action" policy so as not to receive the actionPenalty = 2.0. According to the reward progress plot, during training the agent cannot find a policy better than a mean reward of 603.05, and in testing, mean_reward = 603.05 corresponds exactly to the "no action" policy (please see the figure below):

https://www.dropbox.com/preview/no%20actions%20case.png?role=personal

However, this is only my guess; I may be wrong. I am thinking of trying scenarios with increased load in order to reliably produce loss of stability during the simulation.
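One way to test this hypothesis is to roll out the do-nothing policy directly and compare its episode reward with the trained agent's mean test reward. A sketch against the standard gym step API; `NoOpEnv` below is a tiny stand-in, not the real `PowerDynSimEnv`:

```python
def rollout(env, action):
    """Run one episode applying the same fixed action every step;
    return the total (undiscounted) episode reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(action)
        total += reward
    return total

# Stand-in environment (NOT the real PowerDynSimEnv): an episode of
# 3 steps in which every step yields a reward of -1.
class NoOpEnv:
    def reset(self):
        self.t = 0
    def step(self, action):
        self.t += 1
        return None, -1.0, self.t >= 3, {}

print(rollout(NoOpEnv(), action=0))  # -3.0
```

If `rollout` on the real env with the no-op action returns the same 603.05, that would support the "no action" interpretation.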

@amahoro12

```
Traceback (most recent call last):
  File "C:/Users/HP/Desktop/Leando/RLGC-master (3)/RLGC-master/src/scripts/testScripts/test_PowerDynSimEnv_v7_continuous_action.py", line 35, in <module>
    from PowerDynSimEnvDef_v7 import PowerDynSimEnv
ModuleNotFoundError: No module named 'PowerDynSimEnvDef_v7'
```
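This error usually means the script is run from a directory where Python cannot see `PowerDynSimEnvDef_v7.py` (the thread above shows it living under `src/py` in the repo). A common workaround, assuming that layout, is to put that folder on `sys.path` before the import:

```python
import os
import sys

# Assumed repo layout: PowerDynSimEnvDef_v7.py sits under <repo>/src/py,
# and this script is run from the repo root.
env_dir = os.path.join(os.getcwd(), "src", "py")
if env_dir not in sys.path:
    sys.path.insert(0, env_dir)

# After this, `from PowerDynSimEnvDef_v7 import PowerDynSimEnv`
# should resolve when run from the repo root.
```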

@amahoro12

Can anyone help me resolve the above issue?
