# Poppy Mimic Movements with SAC 

Let's start working on our Poppy robot project!\
The goal is to make a robot mimic the movements of a person.\
More specifically, we have a video of a person's upper-body movements and a simulation of a robot torso.

Instead of using inverse kinematics, we are going to use reinforcement learning, more specifically SAC (Soft Actor-Critic):

https://stable-baselines3.readthedocs.io/en/master/modules/sac.html

From the introductory notebook of this project, we have:
* The action is the joint positions given to each of the motors.
* The observations are the Cartesian positions, which can be accessed with commands like `poppy.l_arm_chain.position`.
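As a rough illustration of the action side, the sketch below turns a flat action vector into a per-motor dictionary of `[angle, velocity]` pairs, the format used in the commented-out cell further down. The helper `action_to_positions` is hypothetical and the two motor names are only assumed for the example; it uses plain lists so it runs without the robot.

```python
def action_to_positions(action, motor_names, velocity=0.0):
    """Map a flat action vector to {motor_name: [angle, velocity]}."""
    if len(action) != len(motor_names):
        raise ValueError("expected exactly one action value per motor")
    return {name: [float(angle), velocity]
            for name, angle in zip(motor_names, action)}

# Assumed motor names; the real list would come from [m.name for m in poppy.motors].
motor_names = ["l_shoulder_y", "l_elbow_y"]
positions = action_to_positions([10.0, -20.0], motor_names)
```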


## Imports

In [1]:
import numpy as np
import os
from pypot.creatures import PoppyTorso
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
from pypot.creatures.ik import IKChain
from pypot.primitive.move import Move
from pypot.primitive.move import MovePlayer
# To install Stable-Baselines3, I needed to downgrade wheel to 0.38.4

In [3]:
import sys
sys.path.append("./gym-examples/") # to be able to import from the gym_examples folder, which I simply put inside this project
import gym_examples
import gym

In [4]:
from stable_baselines3 import SAC

## Instantiate the robot

In [5]:
from pypot import vrep
vrep.close_all_connections()
poppy = PoppyTorso(simulator='vrep')

In [6]:
t = None
targets = None
smoothed_targets = None

Instead of going further with the robot itself, we build the environment needed to do RL with the CoppeliaSim simulation.\
I found this repository to draw inspiration from: https://github.com/chauby/CoppeliaSimRL \
Unfortunately, it works with Gym rather than Gymnasium, the newer version.
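The most visible difference between the two is the return signature of `reset` and `step`. In Gymnasium, `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)`, whereas older Gym releases (before 0.26) return `obs` and `(obs, reward, done, info)`. Here is a minimal sketch of adapters between the two conventions; the helper names are hypothetical and not part of either library.

```python
def step_to_old_api(step_result):
    """Collapse a Gymnasium-style 5-tuple into the older 4-tuple."""
    obs, reward, terminated, truncated, info = step_result
    # The old API has a single flag, so either reason ends the episode.
    return obs, reward, terminated or truncated, info

def reset_to_old_api(reset_result):
    """Keep only the observation from a Gymnasium-style (obs, info) pair."""
    obs, _info = reset_result
    return obs
```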

I cloned the gym-examples folder into another folder, one level up from the poppy-torso folder we are currently in: https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/#subclassing-gymnasium-env

Our environment has roughly two parts: one for the agent and one for the target, which moves too.\
Once the agent part is set up, let's try it and see how it goes.
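To make the two parts concrete, here is a dependency-free sketch of how such an environment could be laid out. It is not the actual `PoppyTorsoEnv`: the class name `MimicEnvSketch`, the trajectory format, and the way the action displaces the hands directly are all assumptions for illustration. It follows the older Gym convention, where `reset` returns the observation and `step` returns four values.

```python
class MimicEnvSketch:
    """Toy stand-in: the agent part holds two hand positions,
    the target part replays a pre-recorded trajectory."""

    def __init__(self, target_trajectory):
        # One frame per timestep, each frame = [left_hand_xyz, right_hand_xyz].
        self.target_trajectory = target_trajectory
        self.t = 0
        self.agent = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

    def _get_obs(self):
        return {"agent": self.agent, "target": self.target_trajectory[self.t]}

    def reset(self):
        self.t = 0
        self.agent = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
        return self._get_obs()

    def step(self, action):
        # Assumption: the action is a displacement of each hand. In the real
        # environment it would be joint positions sent to the simulator.
        self.agent = [[p + d for p, d in zip(hand, delta)]
                      for hand, delta in zip(self.agent, action)]
        # The target moves too: advance to the next frame, stopping at the last.
        last = len(self.target_trajectory) - 1
        self.t = min(self.t + 1, last)
        done = self.t == last
        reward = 0.0  # placeholder, no reward is defined in this sketch
        return self._get_obs(), reward, done, {}
```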

In [7]:
env = gym.make('gym_examples/PoppyTorso-v0')

pygame 2.1.0 (SDL 2.0.16, Python 3.9.5)
Hello from the pygame community. https://www.pygame.org/contribute.html
Initialiazing PoppyTorsoEnv


Okay, now let's try to take a step in the environment. Hopefully our robot will move.

In [8]:
# First a reset is needed !
env.reset()

{'agent': array([[ 0.11508345, -0.16457885,  0.05731346],
        [-0.10222124, -0.17925963,  0.07029861]]),
 'target': tensor([[ 0.1619, -0.1774,  0.0378],
         [-0.1155, -0.2014,  0.0246]])}

In [9]:
'''
fps = 10
move_motors = [m.name for m in poppy.motors]
for t in np.linspace(0.02,3,int(3*fps)):
    new_positions = {}
    for motor in move_motors:
        # decide for each timestep and each motor a joint angle and a velocity
        new_positions[motor] = [20*np.sin(t), 0.0]
action = new_positions
env.step(action)
'''


Our robot moves, but it does so randomly, which is normal for now. We want to know the targets that our robot has to follow. Let's focus on some end effectors, such as the hands, and set those as targets; we won't pay attention to the rest for now.

Hence we need to implement the necessary code in the environment to get the targets.
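One possible way to get such targets, sketched under assumptions: suppose the video has already been turned into one dictionary of 3D keypoints per frame. The keypoint names `left_wrist` and `right_wrist`, the helper `hand_targets`, and the coordinates below are made up for the example and are not taken from the project's code.

```python
def hand_targets(frames, keys=("left_wrist", "right_wrist")):
    """For each frame, keep only the xyz positions of the chosen end effectors."""
    return [[list(frame[key]) for key in keys] for frame in frames]

# Made-up keypoints for two frames; other body parts are simply ignored.
frames = [
    {"left_wrist": (0.16, -0.18, 0.04), "right_wrist": (-0.12, -0.20, 0.02), "head": (0.0, 0.0, 0.5)},
    {"left_wrist": (0.17, -0.17, 0.05), "right_wrist": (-0.11, -0.19, 0.03), "head": (0.0, 0.0, 0.5)},
]
hand_trajectory = hand_targets(frames)
```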

**time passing**

It is done. I'd like to see the observations at each step, including the initial one.
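A minimal sketch of how the observations could be recorded, starting with the one returned by `reset`. The helper `collect_observations` and the tiny `CountingEnv` stand-in are hypothetical; the loop assumes the older Gym convention where `step` returns `(obs, reward, done, info)`.

```python
def collect_observations(env, policy, max_steps=100):
    """Run one episode and return every observation, the initial one first."""
    obs = env.reset()
    history = [obs]
    for _ in range(max_steps):
        obs, reward, done, info = env.step(policy(obs))
        history.append(obs)
        if done:
            break
    return history

class CountingEnv:
    """Stand-in environment whose observation is just the step counter."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 3, {}

history = collect_observations(CountingEnv(), policy=lambda obs: None)
```

With the real environment, `policy` could be `lambda obs: env.action_space.sample()` to log a random rollout.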

In [10]:
'''
for _ in range(5):
    env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
'''


I also set up a reward related to the inverse of the distance to the target.
I still don't know whether what I'm doing is more or less correct, but I'll try to apply the SAC algorithm nonetheless.
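For illustration, a reward of this kind might look like the sketch below. The exact formula used in the environment is not shown in this notebook, so the `1 / (1 + d)` form, the averaging over both hands, and the function name are assumptions.

```python
import math

def inverse_distance_reward(agent, target):
    """Reward in (0, 1]: equals 1 when every hand is on its target,
    and shrinks as the mean Euclidean distance grows."""
    distances = [math.dist(a, t) for a, t in zip(agent, target)]
    mean_distance = sum(distances) / len(distances)
    # The +1 keeps the reward finite when the distance is zero.
    return 1.0 / (1.0 + mean_distance)
```

Adding 1 to the denominator is one way to avoid a division by zero when the hands reach the target exactly.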

In [11]:
env.reset()
model = SAC('MultiInputPolicy', env, verbose=1)
model.learn(total_timesteps=10000)
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()


Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


KeyboardInterrupt: 

In [None]:
# Save the agent
model.save("sac_arm")