Add PPO agent #126

Merged
merged 93 commits into from Nov 13, 2017

@toslunar toslunar commented Aug 1, 2017

Resolves #120.

Currently, this supports only environments with discrete actions.

@coveralls coveralls commented Aug 1, 2017

Coverage Status

Changes Unknown when pulling aa30724 on toslunar:ppo-agent into ** on chainer:master**.

@coveralls coveralls commented Aug 2, 2017

Coverage Status

Changes Unknown when pulling 7bf31c1 on toslunar:ppo-agent into ** on chainer:master**.

@coveralls coveralls commented Aug 3, 2017

Coverage Status

Changes Unknown when pulling 7197d63 on toslunar:ppo-agent into ** on chainer:master**.

@coveralls coveralls commented Aug 9, 2017

Coverage Status

Changes Unknown when pulling 7da765b on toslunar:ppo-agent into ** on chainer:master**.

@muupan muupan commented Aug 11, 2017

How is the performance?

@toslunar

Passed the GPU tests

  • nosetests -v -a 'gpu' tests
  • ./test_example.sh 0

on commit f31626c, under Python 3.

shape (int or tuple of int): Shape of input values except batch axis.
batch_axis (int): Batch axis.
eps (float): Small value for stability.
dtype (dtype): Dtype of input values.

Can you add until to the docstring?

        # clear cache
        self._cached_std_inverse = None

    def __call__(self, x, update=True):

Can you add a docstring, since update is important?


        return self._cached_std_inverse

    def experience(self, x):

Can you add a docstring?
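
To make the two docstring requests above concrete, here is a minimal, framework-free sketch of a running normalizer with documented __call__ and experience methods. It is not the code in this PR: the class name RunningNormalizer, the list-of-lists batch format, and the exact meaning of until (stop updating once that many samples have been seen) are assumptions made for illustration. Only the names update, experience, eps, until and _cached_std_inverse come from the reviewed snippets.

```python
import math


class RunningNormalizer:
    """Hypothetical stand-in for the normalizer under review (pure Python)."""

    def __init__(self, size, eps=1e-2, until=None):
        self.size = size      # number of features per sample
        self.eps = eps        # small value added to the variance for stability
        self.until = until    # assumed meaning: stop updating after this many samples
        self.count = 0
        self.mean = [0.0] * size
        self.var = [1.0] * size
        self._cached_std_inverse = None

    def _std_inverse(self):
        # Recompute 1 / sqrt(var + eps) only when the statistics have changed.
        if self._cached_std_inverse is None:
            self._cached_std_inverse = [
                1.0 / math.sqrt(v + self.eps) for v in self.var
            ]
        return self._cached_std_inverse

    def experience(self, batch):
        """Fold a batch (a list of feature vectors) into the running statistics.

        Does nothing once `until` samples have been seen, or if the batch is empty.
        """
        if self.until is not None and self.count >= self.until:
            return
        n = len(batch)
        if n == 0:
            return
        self.count += n
        rate = n / self.count
        for j in range(self.size):
            column = [row[j] for row in batch]
            batch_mean = sum(column) / n
            batch_var = sum((v - batch_mean) ** 2 for v in column) / n
            delta = batch_mean - self.mean[j]
            self.mean[j] += rate * delta
            # Merge the batch variance into the running (population) variance.
            self.var[j] += rate * (
                batch_var - self.var[j] + delta * (batch_mean - self.mean[j])
            )
        # clear cache
        self._cached_std_inverse = None

    def __call__(self, batch, update=True):
        """Return the normalized batch.

        With update=True the batch is first folded into the running statistics
        (what one would want while collecting training data). With update=False
        the statistics stay frozen (what one would want at evaluation time).
        """
        if update:
            self.experience(batch)
        inv = self._std_inverse()
        return [
            [(row[j] - self.mean[j]) * inv[j] for j in range(self.size)]
            for row in batch
        ]


normalizer = RunningNormalizer(size=1, eps=0.0)
normalizer.experience([[0.0], [4.0]])                 # statistics change
frozen = normalizer([[4.0]], update=False)            # statistics do not change
```

The sketch shows why update deserves a docstring: the same call either mutates the running statistics or leaves them untouched, and forgetting update=False during evaluation would silently shift the normalization.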

layers.append(L.Linear(n_input_channels, n_hidden_channels))
for _ in range(n_hidden_layers - 1):
    layers.append(self.nonlinearity)
    layers.append(L.Linear(n_hidden_channels, n_hidden_channels))

Nonlinearity is missing before the last layer
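
One way to address this comment is sketched below, in plain Python so that it runs without Chainer. It assumes that an output layer follows the loop shown in the snippet (the snippet itself does not show one), and the names build_mlp_layers, describe_linear and RELU are hypothetical helpers introduced only for this illustration.

```python
def build_mlp_layers(n_in, n_out, n_hidden, n_hidden_layers, linear, nonlinearity):
    """Return a flat list of layers with an activation between every pair of linears.

    `linear` is any callable taking (in_size, out_size); `nonlinearity` is any
    object standing for the activation.
    """
    layers = []
    if n_hidden_layers == 0:
        # No hidden layers: a single linear map, no activation needed.
        layers.append(linear(n_in, n_out))
        return layers
    layers.append(linear(n_in, n_hidden))
    for _ in range(n_hidden_layers - 1):
        layers.append(nonlinearity)
        layers.append(linear(n_hidden, n_hidden))
    # The activation the review comment asks for: without it, the last hidden
    # linear and the output linear would collapse into one linear map.
    layers.append(nonlinearity)
    layers.append(linear(n_hidden, n_out))
    return layers


def describe_linear(in_size, out_size):
    # Stand-in for a real linear layer; it only records the shapes.
    return ("linear", in_size, out_size)


RELU = "relu"  # stand-in for a real activation function

layers = build_mlp_layers(4, 2, 8, 2, describe_linear, RELU)
```

With Chainer, one would presumably pass L.Linear and an activation such as F.relu in place of the stand-ins.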

@muupan muupan commented Nov 13, 2017

LGTM

@muupan muupan merged commit 04e938e into chainer:master Nov 13, 2017
@toslunar toslunar deleted the ppo-agent branch November 13, 2017 10:34
@muupan muupan added this to the v0.3 milestone Nov 30, 2017
4 participants