In [1]:
from IPython.display import clear_output

In [2]:
%pip install gymnasium stable-baselines3 panda_gym

clear_output()

In [4]:
# %pip install numpy matplotlib

# Content

In this notebook, you will use the Soft Actor-Critic (SAC) algorithm to solve the **Panda Pick and Place** environment, using the SAC implementation from Stable-Baselines3. The environment comes from panda-gym, which provides robotics environments simulated in PyBullet and exposed through the Gymnasium API.

- Write code to define and train the agent.
- Include a video that visualizes the agent's performance.

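In the Panda Pick and Place environment, a robotic arm must learn to pick up an object from the table and place it at a goal position (marked by a second, target object in the scene). The actions control the robotic arm. The table below gives a brief overview of the kinds of actions and rewards used in robotic pick-and-place tasks. Not every row applies to the default `PandaPickAndPlace-v3` configuration, so inspect `env.action_space` and the rewards returned by `env.step` to confirm what this environment actually exposes: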

| **Aspect**               | **Description**                                                                                                                                                           |
|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Actions**              |                                                                                                                                                                           |
| Cartesian End-Effector Movements | Controls the position and orientation of the robot's end-effector (e.g., x, y, z coordinates and roll, pitch, yaw).                                                        |
| Joint Movements          | Directly controls the individual joint angles of the robot arm by specifying the target positions for each joint.                                                          |
| Gripper Control          | Controls the opening and closing of the gripper, which is crucial for grasping and releasing objects.                                                                      |
| Discrete or Continuous Actions | The action space can be either discrete (specific predefined movements) or continuous (exact positions or velocities).                                                         |
| **Rewards**              |                                                                                                                                                                           |
| Distance to Target       | Rewards (or penalties) based on the distance between the object and the target location. As the object gets closer to the target, the reward increases.                    |
| Grasp Success            | Rewards for successfully grasping the object. Typically a binary reward (e.g., +1 for a successful grasp, 0 otherwise).                                                    |
| Placement Success        | Rewards for successfully placing the object at the target location, often a significant reward indicating task completion.                                                 |
| Intermediate Rewards     | Rewards for intermediate steps such as moving the end-effector close to the object, aligning the gripper, or lifting the object off the table.                             |
| Penalties                | Penalties for undesirable actions such as dropping the object, moving away from the target, or colliding with the environment.                                             |


![PandaPickAndPlace Image](https://github.com/qgallouedec/panda-gym/raw/master/docs/_static/img/pickandplace.png)
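
To make the "Distance to Target" row concrete, here is a minimal sketch of two common distance-based reward shapes: a sparse one and a dense one. The function names and the 0.05 threshold are illustrative assumptions and are not taken from panda-gym's source.

```python
import math


def distance(achieved, desired):
    """Euclidean distance between two 3D positions."""
    return math.dist(achieved, desired)


def sparse_reward(achieved, desired, threshold=0.05):
    """0 when the object is within `threshold` of the goal, otherwise -1."""
    return 0.0 if distance(achieved, desired) <= threshold else -1.0


def dense_reward(achieved, desired):
    """Negative distance: rises toward 0 as the object approaches the goal."""
    return -distance(achieved, desired)


goal = (0.1, 0.0, 0.2)            # hypothetical goal position
on_table = (0.1, 0.0, 0.02)       # hypothetical object position
print(sparse_reward(on_table, goal))
print(dense_reward(on_table, goal))
```

A sparse signal says only whether the task is solved, while a dense signal also indicates progress. Which one an agent sees affects how hard exploration is.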

In [None]:
import numpy as np

import gymnasium as gym
import panda_gym

# Stable-Baselines3 imports
from stable_baselines3.common.vec_env import DummyVecEnv
# Add the remaining imports here

from IPython.display import clear_output

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

## Creating the environment

In [6]:
def make_env(render_mode="rgb_array"):

    env = gym.make('PandaPickAndPlace-v3', render_mode=render_mode)
    return env

In [7]:
# Create the environment
env = DummyVecEnv([make_env for _ in range(4)])  # adjust according to available RAM

### Solve here

Write the code to define and train the agent:
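
One possible starting point is sketched below. It is not a reference solution. It assumes that the observation is a dictionary (hence `MultiInputPolicy`), that episodes last at most 50 steps, and that pairing SAC with Stable-Baselines3's `HerReplayBuffer` is acceptable for this exercise. The helper names `build_sac_kwargs` and `train_sac`, the hyperparameter values, and the timestep budget are all hypothetical and will likely need tuning.

```python
def build_sac_kwargs(n_envs=4, max_episode_steps=50):
    """Hypothetical helper: starting hyperparameters for SAC, to be tuned."""
    return {
        # Wait until every env has finished a few full episodes, because the
        # hindsight buffer can only relabel goals from completed episodes.
        "learning_starts": 5 * n_envs * max_episode_steps,
        "batch_size": 256,
        "gamma": 0.95,
        "verbose": 1,
    }


def train_sac(env, total_timesteps=200_000):
    """Train SAC with hindsight experience replay on a goal-conditioned VecEnv."""
    # Imported here so the helper above stays free of dependencies.
    from stable_baselines3 import SAC, HerReplayBuffer

    model = SAC(
        "MultiInputPolicy",  # policy class for dict observations
        env,
        replay_buffer_class=HerReplayBuffer,
        replay_buffer_kwargs={
            "n_sampled_goal": 4,
            "goal_selection_strategy": "future",
        },
        **build_sac_kwargs(n_envs=env.num_envs),
    )
    model.learn(total_timesteps=total_timesteps)
    return model
```

With the vectorized `env` created earlier, this would be called as `model = train_sac(env)`. Plain SAC without the hindsight buffer is obtained by dropping the two `replay_buffer_*` arguments.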

### Visualization

You are provided with some functions that will help you visualize the results as a video.
Feel free to write your own visualization code if you prefer.

In [None]:
def frames_to_video(frames, fps=24):
    # Do not modify

    fig = plt.figure(figsize=(frames[0].shape[1] / 100, frames[0].shape[0] / 100), dpi=100)
    ax = plt.axes()
    ax.set_axis_off()

    if len(frames[0].shape) == 2:  # Grayscale image
        im = ax.imshow(frames[0], cmap='gray')
    else:  # Color image
        im = ax.imshow(frames[0])

    def init():
        # set_data() accepts only the image array; the colormap was already set by imshow above
        im.set_data(frames[0])
        return im,

    def update(frame):
        # Swap in the pixel data for the requested frame index
        im.set_data(frames[frame])
        return im,

    interval = 1000 / fps
    anim = FuncAnimation(fig, update, frames=len(frames), init_func=init, blit=True, interval=interval)
    plt.close()
    return HTML(anim.to_html5_video())

In [15]:
t_env = DummyVecEnv([lambda: make_env(render_mode="rgb_array")])
state = t_env.reset()
frames = []

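while True:
    # Write your code to choose an action here, e.g. by querying your trained agent.
    # Placeholder: a random action, batched to shape (n_envs, action_dim) as VecEnv.step expects.
    action = np.array([t_env.action_space.sample()])

    state_next, r, done, info = t_env.step(action)
    frames.append(t_env.render())
    state = state_next
    if done[0]:  # a VecEnv returns one done flag per environment
        break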

t_env.close()

In [None]:
frames_to_video(frames, fps=5)
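
A video shows a single episode. To summarize performance over many episodes, a success rate can be computed as well. The sketch below assumes a single-environment `VecEnv` that resets itself when an episode ends, a model exposing a Stable-Baselines3 style `predict` method, and an `is_success` entry in the final `info` dictionary of each episode. The helper name `success_rate` is hypothetical.

```python
def success_rate(model, vec_env, n_episodes=20):
    """Fraction of episodes whose final info dict reports is_success."""
    successes = 0
    obs = vec_env.reset()
    for _ in range(n_episodes):
        done = False
        while not done:
            # Deterministic actions give a less noisy estimate of the policy
            action, _ = model.predict(obs, deterministic=True)
            obs, _, dones, infos = vec_env.step(action)
            done = bool(dones[0])
        # infos[0] now belongs to the last step of the finished episode
        successes += int(bool(infos[0].get("is_success", False)))
    return successes / n_episodes
```

Because `t_env` is closed after the rollout above, pass a freshly created single-environment `DummyVecEnv` when calling this helper.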