# BAIR Camp Project Part 2: Learning to Imitate

In this project, you will make the two robots learn to run by imitating an expert! You will start by using a linear function to represent the mapping from states to actions. Then, you will try a neural network.
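
To make "a linear function from states to actions" concrete, here is a minimal sketch in plain Python. The names `linear_policy`, `weights`, and `bias`, as well as the tiny state and action sizes, are assumptions chosen for illustration; the actual model lives in project_part2.py and may be set up differently.

```python
def linear_policy(state, weights, bias):
    """Return one action value per row of `weights`: action = W * state + b."""
    return [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weights, bias)
    ]

# Hypothetical sizes: a 3-number state mapped to a 2-number action.
state = [1.0, 0.5, -2.0]
weights = [[0.5, 0.0, 0.25],
           [0.0, 2.0, 0.0]]
bias = [1.0, -0.5]

action = linear_policy(state, weights, bias)
print(action)
```

Learning, in this picture, means adjusting the numbers in `weights` and `bias` until the actions match the expert's.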

### Before starting
Before starting, take a look at the code in project_part2.py. In particular, note which objects and functions are available for you to call, and read what they do. Keep project_part2.py open as a reference while you work through the steps below.

### Deliverables
You should write down what you found and learned, and download videos of your agent, so that you can include both in your presentation on Friday.

In [1]:
from project_part2 import ImitationRobot
import tensorflow as tf
square = tf.square
mean = tf.reduce_mean

### Part 2A: Linear functions
The goal of this part of the project is to teach your simulated robot to run.

First, create one of your two robots, passing in the parameters of the robot that you designed on Tuesday.

In [11]:
my_robot = ImitationRobot('walker') ### solution

Initially, your robot will take random actions. Run it to see what it does.

In [12]:
video = my_robot.run() ### solution
# Be patient with this command. The video will take time to appear
video.ipython_display(width=280)

Done running. Your robot went 2.88 meters forward.



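Next, we need to define the loss function for imitation, which measures how far the robot's action is from the expert's action. The rest of the learning has been implemented in project_part2.py. To compute the square of a variable, use the function square(); to compute the absolute value, use abs(); and to average the values of a variable into a single number, use mean().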

_Hint_: Make sure that your loss is a single scalar value and not a list.

In [13]:
def loss_function(robot_action, expert_action):
    loss = mean(square(robot_action - expert_action)) ### solution
    return loss

In [14]:
my_robot.set_loss(loss_function)
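
The hint about returning a single scalar can be checked by hand on small numbers. The sketch below uses plain Python lists instead of TensorFlow tensors and compares a mean squared error with a mean absolute error. The function names and the sample actions are made up for illustration.

```python
def mean_squared_error(robot_action, expert_action):
    """Average of the squared differences: one number for the whole action."""
    diffs = [(r - e) ** 2 for r, e in zip(robot_action, expert_action)]
    return sum(diffs) / len(diffs)

def mean_absolute_error(robot_action, expert_action):
    """Average of the absolute differences: also one number."""
    diffs = [abs(r - e) for r, e in zip(robot_action, expert_action)]
    return sum(diffs) / len(diffs)

# Made-up actions for illustration only.
robot_action = [1.0, 0.0, -1.0, 2.0]
expert_action = [0.0, 0.0, 1.0, 2.0]

mse = mean_squared_error(robot_action, expert_action)
mae = mean_absolute_error(robot_action, expert_action)
print(mse, mae)
```

Both losses are zero only when the robot copies the expert exactly. The squared version penalizes large mistakes more heavily than the absolute version.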

Now it is time to train the robot. Training requires data collected from an expert, so first load the demonstrations and visualize what they look like.

In [17]:
video_clip = my_robot.load_demonstrations(40) ### solution

In [16]:
# Be patient with this command. The video will take time to appear
video_clip.ipython_display(width=280)


Now, decide how many iterations you want to train your robot for. Once you are done training, run the robot to see what the robot has learned!

How many iterations does the robot need to learn how to run?

*Hint*: Monitor the value of the error during training. You might need more training iterations than you think. Learning to walk takes time!
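
Individual error values can jump around from one printout to the next, which makes the trend hard to read. One way to follow the hint is to track a smoothed error alongside the raw one. The sketch below uses an exponential moving average; `smooth_errors`, the `alpha` value, and the sample errors are illustrative assumptions, not part of project_part2.py.

```python
def smooth_errors(errors, alpha=0.5):
    """Exponential moving average: blend each new error with the running average."""
    smoothed = []
    average = None
    for error in errors:
        if average is None:
            average = error  # start from the first observation
        else:
            average = alpha * error + (1 - alpha) * average
        smoothed.append(average)
    return smoothed

# Made-up error values for illustration only.
raw = [4.0, 2.0, 3.0, 1.0]
smoothed = smooth_errors(raw)
print(smoothed)
```

In a training loop, you could collect each error returned by my_robot.train_step() in a list (assuming it behaves like an ordinary number) and look at the last smoothed value. A smaller `alpha` gives a steadier but slower-reacting curve.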

In [20]:
### solution:
for i in range(20000):
    error = my_robot.train_step()
    if i % 500 == 0:
        print('Iteration ' + str(i) + ': ' + str(error))

Iteration 0: 0.15101431
Iteration 500: 0.15339857
Iteration 1000: 0.16139637
Iteration 1500: 0.16849054
Iteration 2000: 0.17196786
Iteration 2500: 0.20163947
Iteration 3000: 0.20126562
Iteration 3500: 0.18065639
Iteration 4000: 0.13343014
Iteration 4500: 0.18209647
Iteration 5000: 0.11226487
Iteration 5500: 0.15270384
Iteration 6000: 0.14649613
Iteration 6500: 0.15475166
Iteration 7000: 0.13706587
Iteration 7500: 0.1261129
Iteration 8000: 0.145474
Iteration 8500: 0.16347206
Iteration 9000: 0.16883473
Iteration 9500: 0.121183075
Iteration 10000: 0.17195521
Iteration 10500: 0.14391957
Iteration 11000: 0.1518581
Iteration 11500: 0.16276945
Iteration 12000: 0.12617151
Iteration 12500: 0.14474712
Iteration 13000: 0.14887975
Iteration 13500: 0.15655738
Iteration 14000: 0.15866968
Iteration 14500: 0.1575609
Iteration 15000: 0.10214567
Iteration 15500: 0.19147734
Iteration 16000: 0.22276227
Iteration 16500: 0.15722702
Iteration 17000: 0.14871031
Iteration 17500: 0.16412377
Iteration 18000: 0.1

In [21]:
video_learned = my_robot.run() ### solution
video_learned.ipython_display(width=280)

Done running. Your robot went 0.45 meters forward.



Try running the robot multiple times. Does the robot always run the same distance?
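
Since each run prints the distance travelled, one simple way to answer this is to write down the distances from several runs and summarize them. The distances below are placeholders, not results from the robot; replace them with the numbers you observe.

```python
import statistics

# Placeholder distances in meters; replace with values printed by your own runs.
distances = [1.0, 2.0, 3.0, 4.0, 5.0]

mean_distance = statistics.mean(distances)
spread = statistics.stdev(distances)  # sample standard deviation

print("mean: %.2f m, spread: %.2f m" % (mean_distance, spread))
```

A spread that is large compared with the mean suggests that the robot's behavior varies a lot from run to run.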

Also, try experimenting with different numbers of demonstrations by passing different arguments to the load_demonstrations function. How does the robot's behavior change as it imitates varying numbers of demonstrations?

Experiment with both of your robots. Does one robot learn faster or more efficiently than the other?

### Part 2B: Neural networks
Now it is time to teach your robots to run by training a neural network! Create your robot as you did above, but pass linear=False so that it uses a neural network instead of a linear function. We will use the same loss function that you designed earlier.
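
For intuition about what changes when moving from a linear function to a neural network, the network inserts a nonlinear hidden layer between the state and the action. The sketch below is a one-hidden-layer network in plain Python. The layer sizes, the tanh nonlinearity, and all weights are assumptions made for illustration, and the network in project_part2.py may be built differently.

```python
import math

def dense(inputs, weights, bias):
    """One fully connected layer: weights * inputs + bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]

def neural_policy(state, hidden_w, hidden_b, out_w, out_b):
    """State -> hidden layer with tanh -> action."""
    hidden = [math.tanh(h) for h in dense(state, hidden_w, hidden_b)]
    return dense(hidden, out_w, out_b)

# Hypothetical sizes: 2-number state, 3 hidden units, 1-number action.
state = [0.0, 0.0]
hidden_w = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]
hidden_b = [0.0, 0.0, 0.0]
out_w = [[1.0, 1.0, 1.0]]
out_b = [0.25]

action = neural_policy(state, hidden_w, hidden_b, out_w, out_b)
print(action)
```

Without the tanh step, the two layers would collapse into a single linear function, so the nonlinearity is what lets the network represent more complicated behaviors.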

In [44]:
my_robot = ImitationRobot('ant', linear=False)
my_robot.set_loss(loss_function)

First, load the demonstrations as before.

In [45]:
video = my_robot.load_demonstrations(40) ### solution

In [46]:
### solution:
for i in range(60000):
    error = my_robot.train_step()
    if i % 500 == 0:
        print('Iteration ' + str(i) + ': ' + str(error))

Iteration 0: 0.09296724
Iteration 500: 0.0072722537
Iteration 1000: 0.005428784
Iteration 1500: 0.006716316
Iteration 2000: 0.003965938
Iteration 2500: 0.002778334
Iteration 3000: 0.0025332673
Iteration 3500: 0.002580049
Iteration 4000: 0.0033384012
Iteration 4500: 0.003056853
Iteration 5000: 0.0020909864
Iteration 5500: 0.0024258257
Iteration 6000: 0.0016496527
Iteration 6500: 0.0014831494
Iteration 7000: 0.001603898
Iteration 7500: 0.0022101784
Iteration 8000: 0.0015961712
Iteration 8500: 0.0019964539
Iteration 9000: 0.0014170264
Iteration 9500: 0.0010924047
Iteration 10000: 0.0019455317
Iteration 10500: 0.0017932549
Iteration 11000: 0.000845082
Iteration 11500: 0.001510395
Iteration 12000: 0.00092690566
Iteration 12500: 0.001556989
Iteration 13000: 0.0018777897
Iteration 13500: 0.0008516057
Iteration 14000: 0.0012234144
Iteration 14500: 0.001625689
Iteration 15000: 0.0016979766
Iteration 15500: 0.002334737
Iteration 16000: 0.00092374755
Iteration 16500: 0.0009294763
Iteration 17000:

In [49]:
video_learned = my_robot.run() ### solution
video_learned.ipython_display(width=280)

Done running. Your robot went 72.03 meters forward.

