<a href="https://colab.research.google.com/github/Ketian-Wang/RobotLearning/blob/main/%E2%80%9Crobot_learning_proj_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This project demonstrates how neural networks can be used in a robotics setting. It continues to use the 2D maze environment introduced in the previous project, where the task is to learn to navigate an agent to a goal. However, since neural networks can be more powerful models than the ones available previously, we can afford to change the 2D maze environment and make the problem more difficult. This project consists of three parts. Part I trains a simple DNN, which takes the agent position and goal position as input. Parts II and III train CNNs, which take as input an image of the environment with the agent and the goal depicted on it.

<div>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/P1_side.png?raw=true" width="300"/>
</div>

The image above shows the simulation world. The "robot" (also called "agent") is shown by the green dot. The goal location is shown by the red square. The agent is required to navigate to the goal. **Unlike the previous project, the robot and the goal are spawned at random positions in the maze.** Also, the action space now contains all four directions: 'up', 'down', 'left' and 'right'. Another change is that, in addition to the obstacle map shown above, we introduce two new obstacle maps as shown below. However, these new maps will not be used until Part III.

<div>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map1.png?raw=true" width="300"/>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map2.png?raw=true" width="300"/>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map3.png?raw=true" width="300"/>
</div>


# Part 0. Project Setup

In [None]:
# Download from my own Google Drive!
!pip install --upgrade gdown
!gdown https://drive.google.com/drive/folders/1w-IBeDph_-CKacJBbXOsGwstJfa4-9Sh?usp=sharing --folder
!cp -av /content/project2/* /content/
!pip install pybullet==2.6.6 numpngw

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gdown
  Downloading gdown-4.6.4-py3-none-any.whl (14 kB)
Installing collected packages: gdown
  Attempting uninstall: gdown
    Found existing installation: gdown 4.4.0
    Uninstalling gdown-4.4.0:
      Successfully uninstalled gdown-4.4.0
Successfully installed gdown-4.6.4
Retrieving folder list
Retrieving folder 1s8rl59KNppjobL-sLAwKTp2ZiVGWqbsi data
Processing file 1bwCL05Z3VKUG0wpnDvmjU2yG_huXzZ6e all_maps.pkl
Processing file 17k4vNAs8d5eD5V1F1m0BZZkjvNjYyUCU map1.pkl
Processing file 124WdMqP9R4cWxeTlyne6TBY_GP-kAUU0 data_utils.py
Processing file 1wEMfNG6Yl7S5_01bATf-SujhIO3RgdGu dnn.py
Retrieving folder 1q3SDrW_Lw6p3QEYyN6gNVCXMTXDVL0_F imgs
Processing file 16P36tz3pDBQo_CzxHiLS_x9QEgALUOJa map1.png
Processing file 1rakolV0hvNw6Cecnik3I_WlSILn8XqZI map2.png
Processing file 1hhEhbqmh16XtO9QfTUpqunhXv1aJS_Q7 map3.png
Processing file 1abKu-ad3pUk3lmhekyk2Nx6s_qrgQcIl P1_sid

# Part I. Behavioral cloning with low dimensional data

This part is a natural extension of Part II in Project 1.

Learning the agent's policy here is the familiar classification problem, since labeled examples from an expert are provided. Each labeled example $i$ is a tuple of the form $(o, a)^i$, where $o$ represents an observation and $a$ represents the action taken by the expert given that observation. The policy simply learns to imitate the expert, a process also known as behavioral cloning. While the action space is the same in all parts of the project, the observation space differs.
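
Viewed as classification, behavioral cloning can be written as minimizing the average negative log-likelihood of the expert actions. The following is a sketch of that objective, assuming a policy $\pi_\theta$ that turns four raw network outputs $z_0(o), \dots, z_3(o)$ into a softmax distribution over the actions, and $N$ labeled examples. The symbols $\pi_\theta$, $\theta$, $z_k$, and $N$ are introduced here for illustration only.

$$
\min_\theta \; -\frac{1}{N} \sum_{i=1}^{N} \log \pi_\theta\left(a^i \mid o^i\right),
\qquad
\pi_\theta(a \mid o) = \frac{\exp z_a(o)}{\sum_{k=0}^{3} \exp z_k(o)}
$$

With its default mean reduction, `nn.CrossEntropyLoss` computes this quantity directly from the raw outputs $z$, so a network trained with it can return unnormalized scores rather than probabilities.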

We will be training a DNN policy to predict the action to be taken ('up', 'down', 'left', or 'right') based on the observation. **In Part I, the observation will contain the agent position and the current goal position.** (Because the goal is sampled randomly, the policy has to know the current goal to be reached.) The environment thus returns an observation array of size (4, ), where the first two entries hold the agent position and the next two hold the current goal position. **In Part I, the map that the robot is navigating is always the same.**
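
To make the observation layout concrete, here is a minimal sketch that splits a (4, ) observation into its two positions. The helper name `split_observation`, the example coordinates, and the use of Euclidean distance are assumptions made for illustration; they are not part of the project code.

```python
import numpy as np


def split_observation(obs):
    """Split a (4,) observation into (agent_pos, goal_pos).

    Hypothetical helper: the layout follows the description above
    (agent position first, goal position second).
    """
    obs = np.asarray(obs, dtype=float)
    if obs.shape != (4,):
        raise ValueError(f"expected shape (4,), got {obs.shape}")
    return obs[:2], obs[2:]


# Made-up coordinates, for illustration only.
example_obs = np.array([0.25, 0.5, 0.75, 0.5])
agent_pos, goal_pos = split_observation(example_obs)

# Euclidean distance is an assumption; the text only says "distance".
dist_to_goal = float(np.linalg.norm(goal_pos - agent_pos))
print(agent_pos, goal_pos, dist_to_goal)
```

Because the goal is part of the input, the same network can serve any sampled goal without retraining.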


In [None]:
# base class

import abc


class RobotPolicy(abc.ABC):

    @abc.abstractmethod
    def train(self, data):
        """
            Abstract method for training a policy.

            Args:
                data: a dict that contains X (key = 'obs') and y (key = 'actions').

                X is either rgb image (N, 64, 64, 3) OR  agent & goal pos (N, 4)  

            Returns:
                This method does not return anything. It will just need to update the
                property of a RobotPolicy instance.
        """

    @abc.abstractmethod
    def get_action(self, obs):
        """
            Abstract method for getting action. You can do data preprocessing and feed
            forward of your trained model here.
            Args:
                obs: an observation (64 x 64 x 3) rgb image OR (4, ) positions
            
            Returns:
                action: an integer between 0 and 3 (inclusive)
        """

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(0)


from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader

import numpy as np
import math

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt


class MyDNN(nn.Module):
    def __init__(self, input_dim):
        super(MyDNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 16)
        self.fc4 = nn.Linear(16, 4)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

    def predict(self, features):
        """
        Function receives a numpy array, converts to torch, returns numpy again
        """
        self.eval()  # Sets network in eval mode (vs training mode)
        features = torch.from_numpy(features).float()
        return self.forward(features).detach().numpy()
  

class MyDataset(Dataset):
    def __init__(self, labels, features):
        super(MyDataset, self).__init__()
        self.labels = labels
        self.features = features

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        # Tells torch how to extract a single datapoint: the DataLoader shuffles
        # the indices and needs a way to fetch the idx-th example.
        feature = self.features[idx]
        label = self.labels[idx]
        return {'feature': feature, 'label': label}


class MyDNNTrain(object):
    def __init__(self, network): 
        self.network = network
        self.learning_rate = 0.005
        self.optimizer = torch.optim.Adam(self.network.parameters(), lr=self.learning_rate)
        self.criterion = nn.CrossEntropyLoss()
        self.num_epochs = 200
        self.batchsize = 100
        self.shuffle = True

    def train(self, labels, features):
        self.network.train()
        dataset = MyDataset(labels, features)
        loader = DataLoader(dataset, shuffle=self.shuffle, batch_size=self.batchsize)
        for epoch in range(self.num_epochs):
            self.train_epoch(loader)

    def train_epoch(self, loader):
        total_loss = 0.0
        for i, data in enumerate(loader):
            features = data['feature'].float()
            labels = data['label'].long()
            self.optimizer.zero_grad()
            predictions = self.network.forward(features)
            # print(predictions)
            loss = self.criterion(predictions, labels)
            loss.backward()
            total_loss += loss.item()
            self.optimizer.step()
        # Average over the number of batches; `i` is zero-based, so it is one short.
        print('loss', total_loss / len(loader))


class POSBCRobot(RobotPolicy):

    def train(self, data):
        for key, val in data.items():
            print(key, val.shape)
        print("Using dummy solution for POSBCRobot")

        #data input
        AgentAction = data['actions']
        labels = np.asarray(AgentAction)
        labels = torch.from_numpy(labels)
        # labels = F.one_hot(labels, num_classes=4)
        # print(labels)


        obs = data['obs']
        # features = np.asarray(obs)
        obs = torch.from_numpy(obs).float()
        # print(labels)
        # print(type(features))


        self.network = MyDNN(4)
        self.trainer = MyDNNTrain(self.network)
        self.trainer.train(labels, obs)

        pass 

    def get_action(self, obs):
        self.network.eval()
        action = self.network.predict(obs)
        return np.argmax(action)

## Evaluation and Grading

The model is evaluated by having the agent follow the actions that the policy outputs. 100 different randomly sampled starting positions and goals are tested. For each goal, the trained policy is rolled out for 50 steps, and we record the closest distance to the goal that the agent reached during the rollout. If the agent gets within a distance of 0.1 of the goal, the episode ends before 50 steps and the minimum distance is recorded as 0. The score is the fraction of the initial distance to the goal covered by the agent, averaged over the 100 trials. Your final grade will be computed based on this score.

The score is calculated using the formula:

```score = avg[(init_dist -  min_dist) / init_dist]```

This assignment is worth 15 points in total. Reflecting their difficulty, Parts 1, 2, and 3 are worth 4, 5, and 6 points respectively.

- Part 1: if your score >= 0.99, you will receive 4 / 4. Otherwise, your final grade will be score / 0.99 * 4.
- Part 2: if your score >= 0.95, you will receive 5 / 5. Otherwise, your final grade will be score / 0.95 * 5.
- Part 3: if your score >= 0.95, you will receive 6 / 6. Otherwise, your final grade will be score / 0.95 * 6.
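
The scoring and grading rules above can be sketched in a few lines. This is not the implementation in `score_policy`; the function names `compute_score` and `compute_grade` and the trial distances are hypothetical and only illustrate the formula.

```python
import numpy as np


def compute_score(init_dists, min_dists, reach_threshold=0.1):
    """Average fraction of the initial distance covered per trial."""
    init_dists = np.asarray(init_dists, dtype=float)
    min_dists = np.asarray(min_dists, dtype=float)
    # A trial that gets within the threshold counts as reaching the goal.
    min_dists = np.where(min_dists < reach_threshold, 0.0, min_dists)
    return float(np.mean((init_dists - min_dists) / init_dists))


def compute_grade(score, bound, max_points):
    """Full points at or above the bound, proportional credit below it."""
    return max_points if score >= bound else score / bound * max_points


# Three made-up trials: one reaches the goal, two cover half the distance.
init = [1.0, 2.0, 0.5]
closest = [0.05, 1.0, 0.25]
score = compute_score(init, closest)
grade = compute_grade(score, bound=0.99, max_points=4)
print(f"score = {score:.4f}, Part 1 grade = {grade:.2f} / 4")
```

Note that a trial ending far from the goal lowers the average even if most trials succeed, so a high score requires the policy to be reliable across start and goal positions.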


In [None]:
import score_policy
import importlib
importlib.reload(score_policy)
from IPython.display import Image


part1_bound = 0.99
part2_bound = 0.95
part3_bound = 0.95

In [None]:
score1 = score_policy.score_pos_bc(policy=POSBCRobot(), gui=False, model=None)
grade1 = score1 / part1_bound * 4 if score1 < part1_bound else 4

print('\n---')
print(f'Part 1 Score: {score1}')
print(f'Part 1 Grade: {score1:.2f} / {part1_bound:.2f} * 4 = {grade1:.2f}')

actions (4000,)
obs (4000, 4)
Using dummy solution for POSBCRobot
loss 0.9775175635631268
loss 0.6955564786226321
loss 0.5940810869901608
loss 0.505073220301897
loss 0.4466643998256096
loss 0.4194487096407475
loss 0.4042717348306607
loss 0.3944137421173927
loss 0.39177782642535675
loss 0.38042136339040905
loss 0.36642322402734023
loss 0.387549720513515
loss 0.3570115061906668
loss 0.3440079601147236
loss 0.3507527773961043
loss 0.3380996535221736
loss 0.3305549598657168
loss 0.33064373563497496
loss 0.32209540750735843
loss 0.316408247137681
loss 0.31713431920760715
loss 0.33212955334247685
loss 0.3042853860518871
loss 0.30718862322660595
loss 0.2901652094263297
loss 0.29669795968593693
loss 0.28822548878498566
loss 0.2967775483161975
loss 0.282771103657209
loss 0.28695253072640836
loss 0.28090505378368574
loss 0.2693265752914624
loss 0.26178984114756954
loss 0.2662335550173735
loss 0.25954637160668004
loss 0.2658908038567274
loss 0.25278833966988784
loss 0.24796549746623406
loss 0.239

# Part II. Behavioral cloning with visual observations

In this part, the task is similar to that in Part I, **but the observations will be RGB images of the world**, similar to the ones that were used for localization in Part III of Project 1. To process the RGB images, a CNN is implemented using PyTorch. [The official PyTorch tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) is a good starting point. As in Part I, the map that the robot is navigating is always the same. **This means that the model really only has to learn how to figure out where the robot and the goal are located, and how to navigate around a fixed set of obstacles.**
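
When adapting a CNN to 64 x 64 images, the input size of the first fully connected layer has to match the flattened output of the last pooling layer. The sketch below works through that arithmetic, assuming two unpadded 5 x 5 convolutions with stride 1, each followed by 2 x 2 max pooling with stride 2, and 16 output channels in the second convolution (the configuration used by `MyCNN` in this notebook). The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


size = 64                                 # input images are 64 x 64
size = pool_out(conv_out(size, 5), 2, 2)  # 64 -> 60 -> 30
size = pool_out(conv_out(size, 5), 2, 2)  # 30 -> 26 -> 13

out_channels = 16                         # channels after the second convolution
flat_features = out_channels * size * size
print(flat_features)  # 2704
```

Changing the image size, kernel size, or channel count changes this number, so the first `nn.Linear` layer has to be updated along with it.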


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(0)


from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader

import numpy as np
import math

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt


class MyCNN(nn.Module):

    def __init__(self):
        super(MyCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(2704, 512)
        self.fc2 = nn.Linear(512, 64)
        self.fc3 = nn.Linear(64, 4)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def predict(self, features):
        """
        Function receives a numpy array, converts to torch, returns numpy again
        """
        self.eval()  # Sets network in eval mode (vs training mode)
        features = torch.from_numpy(features).float()
        return self.forward(features).detach().numpy()
  

class MyDataset(Dataset):
    def __init__(self, labels, features):
        super(MyDataset, self).__init__()
        self.labels = labels
        self.features = features

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        # Tells torch how to extract a single datapoint: the DataLoader shuffles
        # the indices and needs a way to fetch the idx-th example.
        feature = self.features[idx]
        label = self.labels[idx]
        return {'feature': feature, 'label': label}


class MyCNNTrain(object):
    def __init__(self, network):  
        self.network = network
        self.learning_rate = 0.01
        self.optimizer = torch.optim.SGD(self.network.parameters(), lr=self.learning_rate, momentum = 0.9)
        self.criterion = nn.CrossEntropyLoss()
        self.num_epochs = 100
        self.batchsize = 100
        self.shuffle = True

    def train(self, labels, features):
        self.network.train()
        dataset = MyDataset(labels, features)
        loader = DataLoader(dataset, shuffle=self.shuffle, batch_size=self.batchsize)
        for epoch in range(self.num_epochs):
            self.train_epoch(loader)

    def train_epoch(self, loader):
        total_loss = 0.0
        for i, data in enumerate(loader):
            features = data['feature'].float()
            labels = data['label'].long()
            self.optimizer.zero_grad()
            predictions = self.network.forward(features)
            # print(predictions)
            loss = self.criterion(predictions, labels)
            loss.backward()
            total_loss += loss.item()
            self.optimizer.step()
        # Average over the number of batches; `i` is zero-based, so it is one short.
        print('loss', total_loss / len(loader))



class RGBBCRobot1(RobotPolicy):

    def train(self, data):
        for key, val in data.items():
            print(key, val.shape)
        print("Using dummy solution for RGBBCRobot1")
        AgentAction = data['actions']
        labels = np.asarray(AgentAction)
        labels = torch.from_numpy(labels)
        # labels = F.one_hot(labels, num_classes=4)
        print(labels)


        obs = data['obs']
        obs = obs.transpose((0,3,1,2))
        obs = torch.from_numpy(obs).float()
        print(obs.shape)

        self.network = MyCNN()
        self.trainer = MyCNNTrain(self.network)
        self.trainer.train(labels, obs)
        
        pass

    def get_action(self, obs):
        self.network.eval()
        obs = obs.transpose((2,0,1))
        # print(obs.shape)
        obs = torch.from_numpy(obs).float()
        obs = obs.unsqueeze(0)
        # print(obs.shape)
        obs = obs.numpy()
        action = self.network.predict(obs)
        return np.argmax(action)

## Evaluation and Grading

In [None]:
score2 = score_policy.score_rgb_bc1(policy=RGBBCRobot1(), gui=False, model=None)
grade2 = score2 / part2_bound * 5 if score2 < part2_bound else 5

print('\n---')
print(f'Part 2 Score: {score2}')
print(f'Part 2 Grade: {score2:.2f} / {part2_bound:.2f} * 5 = {grade2:.2f}')

actions (4000,)
obs (4000, 64, 64, 3)
Using dummy solution for RGBBCRobot1
tensor([0, 0, 0,  ..., 2, 2, 3])
torch.Size([4000, 3, 64, 64])
loss 1.4167950397882707
loss 1.4151402497902894
loss 1.412885302152389
loss 1.4051742217479608
loss 1.3384582293339264
loss 1.0823844701815875
loss 1.0128055062049475
loss 0.990216137507023
loss 0.9802308953725375
loss 0.9723199881040133
loss 0.9707029171479054
loss 0.9575655857721964
loss 0.950916038109706
loss 0.9492045381130316
loss 0.9452001773394071
loss 0.9375186944619204
loss 0.932464232811561
loss 0.9236517319312463
loss 0.9294846485822629
loss 0.9205345205771618
loss 0.9160658350357642
loss 0.9064350601954337
loss 0.9015761904227428
loss 0.8973910136100574
loss 0.8913263663267478
loss 0.8950373652653817
loss 0.8866803630804404
loss 0.87954803613516
loss 0.8780925717109289
loss 0.8786089145220243
loss 0.8693983249175243
loss 0.8657529155413309
loss 0.8611573179562887
loss 0.8581631091924814
loss 0.8503025892453316
loss 0.8479902056547312
loss

# Part III. Behavioral cloning with visual observations - multiple maps

This part is the same as Part II except that it is trained and tested differently. **The training set involves expert demonstrations for the two new obstacle maps. During testing, a different obstacle map is randomly selected for each trial.** This means that the model has to learn how to reason about what an obstacle is, and how to go around it, based on nothing more than an image. The main objective of this part is to show that a CNN-based model can achieve this. The evaluation method for this part is the same as in Parts I and II.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(0)


from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader

import numpy as np
import math

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt


class MyCNN(nn.Module):

    def __init__(self):
        super(MyCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(2704, 512)
        self.fc2 = nn.Linear(512, 64)
        self.fc3 = nn.Linear(64, 4)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def predict(self, features):
        """
        Function receives a numpy array, converts to torch, returns numpy again
        """
        self.eval()  # Sets network in eval mode (vs training mode)
        features = torch.from_numpy(features).float()
        return self.forward(features).detach().numpy()
  

class MyDataset(Dataset):
    def __init__(self, labels, features):
        super(MyDataset, self).__init__()
        self.labels = labels
        self.features = features

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        # Tells torch how to extract a single datapoint: the DataLoader shuffles
        # the indices and needs a way to fetch the idx-th example.
        feature = self.features[idx]
        label = self.labels[idx]
        return {'feature': feature, 'label': label}


class MyCNNTrain(object):
    def __init__(self, network):  # network is an instance of MyCNN
        self.network = network
        self.learning_rate = 0.01
        self.optimizer = torch.optim.SGD(self.network.parameters(), lr=self.learning_rate, momentum = 0.9)
        self.criterion = nn.CrossEntropyLoss()
        self.num_epochs = 40
        self.batchsize = 100
        self.shuffle = True

    def train(self, labels, features):
        self.network.train()
        dataset = MyDataset(labels, features)
        loader = DataLoader(dataset, shuffle=self.shuffle, batch_size=self.batchsize)
        for epoch in range(self.num_epochs):
            self.train_epoch(loader)

    def train_epoch(self, loader):
        total_loss = 0.0
        for i, data in enumerate(loader):
            features = data['feature'].float()
            labels = data['label'].long()
            self.optimizer.zero_grad()
            predictions = self.network.forward(features)
            # print(predictions)
            loss = self.criterion(predictions, labels)
            loss.backward()
            total_loss += loss.item()
            self.optimizer.step()
        # Average over the number of batches; `i` is zero-based, so it is one short.
        print('loss', total_loss / len(loader))



class RGBBCRobot2(RobotPolicy):

    def train(self, data):
        for key, val in data.items():
            print(key, val.shape)
        print("Using dummy solution for RGBBCRobot1")
        AgentAction = data['actions']
        labels = np.asarray(AgentAction)
        labels = torch.from_numpy(labels)
        # labels = F.one_hot(labels, num_classes=4)
        print(labels)


        obs = data['obs']
        obs = obs.transpose((0,3,1,2))
        obs = torch.from_numpy(obs).float()
        print(obs.shape)

        self.network = MyCNN()
        self.trainer = MyCNNTrain(self.network)
        self.trainer.train(labels, obs)
        
        pass

    def get_action(self, obs):
        self.network.eval()
        obs = obs.transpose((2,0,1))
        # print(obs.shape)
        obs = torch.from_numpy(obs).float()
        obs = obs.unsqueeze(0)
        # print(obs.shape)
        obs = obs.numpy()
        action = self.network.predict(obs)
        return np.argmax(action)


## Evaluation and Grading


In [None]:
score3 = score_policy.score_rgb_bc2(policy=RGBBCRobot2(), gui=False, model=None)
grade3 = score3 / part3_bound * 6 if score3 < part3_bound else 6

print('\n---')
print(f'Part 3 Score: {score3}')
print(f'Part 3 Grade: {score3:.2f} / {part3_bound:.2f} * 6 = {grade3:.2f}')

actions (12000,)
obs (12000, 64, 64, 3)
Using dummy solution for RGBBCRobot1
tensor([0, 0, 0,  ..., 1, 0, 0])
torch.Size([12000, 3, 64, 64])
loss 1.3802767070401616
loss 1.3548624124847541
loss 1.2919049202894963
loss 1.11726879572668
loss 1.085056000396985
loss 1.0753861275039802
loss 1.0629730655365632
loss 1.055064192339152
loss 1.0435278701181172
loss 1.0304056255757308
loss 0.9303386046105072
loss 0.5912367744105202
loss 0.3969051444730839
loss 0.2788687447289459
loss 0.24782643561102763
loss 0.1995091384448925
loss 0.1694784928895846
loss 0.14901856120143617
loss 0.13251018586779842
loss 0.11747746507660682
loss 0.10194553744767894
loss 0.09707912672780641
loss 0.08538869603247452
loss 0.08032164884684216
loss 0.07215447633081123
loss 0.06717237683811358
loss 0.059370339777664975
loss 0.06173123870858876
loss 0.05109305284573
loss 0.04842735465788165
loss 0.04214916330128282
loss 0.040242882928724924
loss 0.03994752047387805
loss 0.036052964921226775
loss 0.033973311604305854
los