Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to 3DNNWalkers example #849

Merged
merged 14 commits into from
Jun 5, 2017

Conversation

benelot
Copy link
Contributor

@benelot benelot commented Nov 1, 2016

I realized that the example is not deterministic. So I wanted to fix it quickly, but it took me a longer while.

Additionally I improved the TimeSeriesCanvas to support yMin and yMax instead of only yScale. It is backwards compatible with the other demos.

I have 2 issues that I could not resolve properly:

  • How do I reset a btRigidBody+btTypedConstraints setup to a certain position? I used the following method to reset all hinge joints, clear forces, angular velocity and linear velocity and reset all positions using the resetPosition as a baseposition and then apply the respective relative transform.

      void resetAt(const btVector3& position) {
      	removeFromWorld();
      	btTransform resetPosition(btQuaternion::getIdentity(), position);
    
      	for (int i = 0; i < JOINT_COUNT; i++)
      	{
      		btHingeConstraint* hingeC = static_cast<btHingeConstraint*>(getJoints()[i]);
      		hingeC->enableMotor(false);
      	}
    
      	for (int i = 0; i < BODYPART_COUNT; ++i)
      	{
      		m_bodies[i]->clearForces();
      		m_bodies[i]->setAngularVelocity(btVector3(0,0,0));
      		m_bodies[i]->setLinearVelocity(btVector3(0,0,0));
    
      		m_bodies[i]->setWorldTransform(resetPosition*m_bodyRelativeTransforms[i]);
      		if(m_bodies[i]->getMotionState()){
      			m_bodies[i]->getMotionState()->setWorldTransform(resetPosition*m_bodyRelativeTransforms[i]);
      		}
      	}
      }
    

However, this does not reset the walker properly, instead its performance varies lightly. I will make a simple example to show the issue if necessary. The problem is currently solved by recreating every creature from scratch every time it is reset. A side-effect is that you can change its morphology using the sliders and see if the same neural network can control it.

  • If I create a lot of graphics objects, it crashes. The current demo avoids this by not creating all graphics objects in headless mode, however, when running the demo with graphics it crashes after about 15 generations of 50 walkers.

Except for those problems, everything works well and the evolution converges to a certain speed.

@erwincoumans
Copy link
Member

erwincoumans commented Nov 3, 2016

However, this does not reset the walker properly, instead its performance varies lightly.

Such reset will be non-deterministic, so outcomes will be slightly different. I'm planning to fix this for the shared memory API/pybullet, most robotics and machine learning / reinforcement learning projects need it.

If I create a lot of graphics objects, it crashes.

You only have 50 walkers at one time, so you create 50 graphics shapes/instances in total? Why do you keep on creating them? And to 'delete' instances, you can move them far away and scale them to zero. Have you tried that?

@benelot
Copy link
Contributor Author

benelot commented Nov 3, 2016

I keep on instantiating the walkers because when I reset a walker, I get these non-deterministic variations of performance of a single walker. Since the performance varies strongly (the walking patterns are not robust to slight variations), the formerly best walker is no longer the best in a second generation. Therefore I create a new walker every time I want to reset it and copy over the neural network configuration. By that I get deterministic performances, but it seems that generating new graphical objects every time (that is what m_guiHelper->autogenerateGraphicsObjects(m_dynamicsWorld); does right?) causes some issues on the OpenGL side. At least it can be prevented by not generating the graphics objects as long as they are not shown (as it is the case in headless simulation mode). Interestingly, the graphics disappear as soon as I delete the old walkers, so there seems to happen some clean up. Maybe I will have a look at your OpenGL code at some point. For now, my simulation has a toggle called REBUILD_WALKER, which deletes and reinstantiates when set to true, and uses the reset method when set to false (but with the non-deterministic effect). So by default it is set to true.

Copy link
Member

@erwincoumans erwincoumans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of #include time.h please use the Bullet/examples/Utils/b3Clock instead.

@benelot
Copy link
Contributor Author

benelot commented Nov 8, 2016

I extended the b3Clock to expose the system current time in milliseconds. This functionality was missing. Now the example differs every time because the random numbers differ properly, but a single performance is deterministic.

@benelot
Copy link
Contributor Author

benelot commented Nov 8, 2016

Such reset will be non-deterministic, so outcomes will be slightly different. I'm planning to fix this for the shared memory API/pybullet, most robotics and machine learning / reinforcement learning projects need it.

In case you need a simple test case for the Bullet Example to test the non-determinism of resetting a position, tell me and I can commit the one I wrote based on the BasicExample in a separate Pull Request.

@erwincoumans
Copy link
Member

erwincoumans commented Nov 18, 2016

to test the non-determinism of resetting a position

There is no determinism when only resetting a few values and not the others, so adding a test is non-sense.

It would be good to deprecate this example and move to pybullet+TF/gym etc. See https://github.com/matpalm/cartpoleplusplus

@erwincoumans
Copy link
Member

erwincoumans commented Nov 18, 2016

I keep on instantiating the walkers because when I reset a walker,
I get these non-deterministic variations of performance of a single walker.

[edit] That is not a good idea. You could simply re-create the entire dynamics world and remove all graphics instances: instancingRenderer->removeAllInstances();

I'm working on allowing pybullet /shared memory C-API allows deterministic save/restore features for machine learning.

@benelot
Copy link
Contributor Author

benelot commented Nov 18, 2016

Thanks for the comments.

There is no determinism when only resetting a few values and not the others

What other parameters do I need to reset to complete the reset? Can this only be done by recreating the dynamics world?
I will implement the reset strategy you proposed. Yes, I wanted to go on to the pybullet example long ago, but this non-determinism problem kept me back. I just would like to give people a working non-pybullet version to play with. I already finished an urdf model of my walker. Next I am going to look into how to get it all together. For that I will probably port cartpoleplusplus into the browser just to get the hang on it. Stay tuned and give me updates on upcoming things!

@erwincoumans
Copy link
Member

Can this only be done by recreating the dynamics world?

Yes, you have to re-create everything at the moment. I'm working on fixing this in pybullet / shared memory API / future versions of Bullet.

@benelot
Copy link
Contributor Author

benelot commented Dec 29, 2016

I am working again on this in parallel with the porting of the cartpole++ example. I committed some changes which are not yet enough to fix the example. However, from what I have seen before I merged in the new commits from bullet is that rebuilding the world does not work and also rebuilding the walkers together with rebuilding the world does not work. I will look into it further because there are still some issues to be resolved.

@benelot
Copy link
Contributor Author

benelot commented May 29, 2017

I updated the application to also remove the graphics instances. Additionally I implemented the full reset of the simulation after every generation. It deletes all creatures and the whole dynamics world, however, the simulation is still non-deterministic. I have no idea what is wrong.

/// Returns the time in us since the last call to reset or since
/// the Clock was created.
/// Gets the system time in milliseconds
unsigned long int b3Clock::getSystemTimeMilliseconds() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a duplicate of the already existing 'getTimeMilliseconds', why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it is not a copy. I is not a difference to the last time (currentTime.QuadPart - m_data->mStartTime.QuadPart) but it is the absolute system time. I use this to get a seed for randomness in the evolutionary algorithm. Do you propose another way to get a random seed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please remove any changes to b3Clock? I will add an option to 'reset' using absolute 0 time reference. See #1165



return msecTicks;
LARGE_INTEGER currentTime, elapsedTime;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a formatting/layout fix? Please undo (it makes reviewing very hard to mix layout changes with actual changes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made this code consistent with all other code segments using a very similar code segment here. But you are right, I should not have mixed it. I will undo it and maybe PR for it again so that this class is a bit cleaner. Ok?

@benelot
Copy link
Contributor Author

benelot commented May 31, 2017

Thanks for reviewing!

@erwincoumans
Copy link
Member

If you remove the changes to b3Clock I can try it out and merge it. We can see why your sim is non-deterministic.

@benelot
Copy link
Contributor Author

benelot commented Jun 4, 2017

Reverted the changes from b3Clock, thanks for adding the reset method. Let us see if the CI passes.

@@ -160,7 +160,8 @@ class NN3DWalkersExample : public NN3DWalkersTimeWarpBase
m_filterCallback(NULL)
{
b3Clock clock;
srand(clock.getSystemTimeMilliseconds()); // Set random milliseconds based on system time
srand(clock.getTimeMilliseconds()); // Set random milliseconds based on system time
clock.reset(true); // Reset clock to zero to get new random seed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may want to call this reset before calling 'getTimeMilliseconds', otherwise you still get the relative time since the clock was created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I could basically reset it directly after using it, so that it is reset for the next call. But that is ok too.

@erwincoumans
Copy link
Member

Thanks, I'll merge it soon after trying it out.

Could you provide some links/paper/thesis/description about the details of your approach?

@benelot
Copy link
Contributor Author

benelot commented Jun 4, 2017

Do you need it for debugging or is it for some documentation or similar? My thesis for my master´s degree did something similar and this was basically just a first version for the example browser. Next up would be some kind of body dimensions evolution together with the neural network weights.

From the top of my head, the following is the approach of this example:
The simulation implements a classic version of evolutionary algorithm[1], which evolves a controller for a fixed morphology using a population of 8 legged individuals. The individuals start with an randomly initialized controller and compete with each other to move away from the starting point as fast as possible. The controller of each individual, which is evolved for walking speed, is a simple one layer neural network that maps from leg ground contact to a certain body joint position. Its weights are evolved either by random weight reinitialization or by crossover of two successful specimen. The population is ranked by walking speed in order to keep the best performing individuals, mutating the less performant ones and cull some of the worst performing ones. The culled individuals are either replaced by new individuals with randomly initialized or crossover-produced controllers. Using this approach, the population as a whole learns to walk and approaches a good local optimum of a walking controller.

[1] https://en.wikipedia.org/wiki/Evolutionary_algorithm

Let me know if you need anything more specific or any more details.

@erwincoumans
Copy link
Member

Thanks, it help if I'm aware what the examples (try to) do :-) Do you have a link to your thesis?

@erwincoumans erwincoumans merged commit 444f206 into bulletphysics:master Jun 5, 2017
@erwincoumans
Copy link
Member

I merged it, but get crashes on Windows when increasing the simulation speed (first slider). Reverting it: #1171

@benelot
Copy link
Contributor Author

benelot commented Jun 6, 2017

@benelot
Copy link
Contributor Author

benelot commented Jun 6, 2017

Sounds strange, on Linux it worked. I will look into it on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants