# **MECS6616 Spring 2025 - Project 2**

# **Introduction**

***IMPORTANT:***
- **Before starting, make sure to read the [Assignment Instructions](https://courseworks2.columbia.edu/courses/197115/pages/assignment-instructions) page on Courseworks to understand the workflow and submission requirements for this project.**

This project aims to demonstrate how neural networks can be used in a robotics setting. We will continue using the 2D maze environment introduced in Project 1 and learn to navigate an agent to a goal. Since neural networks can be more powerful models than the ones we had access to in Project 1, we can afford to make some changes to the 2D maze environment and make the problem more difficult. The project is divided into three parts: In Part I, you will train a simple Deep Neural Network (DNN) to predict the optimal action towards the goal given the agent position and the goal position. In Parts II and III, you will train Convolutional Neural Networks (CNNs) to predict the optimal action given images of the maze environment.

<div>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/P1_side.png?raw=true" width="300"/>
</div>

The figure above illustrates the simulation world, where the "robot" (also referred to as "agent") is represented by a green dot, and the goal location is marked by a red square. The agent's objective is to navigate to this goal location, avoiding any obstacles (depicted as black boxes) along the way.

**Unlike the previous project, the robot and the goal are spawned at random positions in the maze.** Also, the action space now contains all four directions: 'up', 'down', 'left' and 'right'. Another change is that, in addition to the obstacle map shown above, we introduce two new obstacle maps as shown below. However, these new maps will not be used until Part III.

<div>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map1.png?raw=true" width="300"/>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map2.png?raw=true" width="300"/>
<img src="https://github.com/roamlab/robot-learning-S2023/blob/main/project2/imgs/map3.png?raw=true" width="300"/>
</div>

We want to learn to navigate the agent by imitating demonstrations from an expert user. In all three parts, you will be using data collected by a human controlling the agent via a keyboard for training.

# **Project Setup (do NOT change)**

***IMPORTANT:***
- Do NOT change this "*Project Setup*" section
- Do NOT install any other dependencies or a different version of an already provided package. You may, however, import other packages
- Your code should go under the subsequent sections with headings "*Part 1*", "*Part 2*", and "*Part 3*"
- You may find it useful to minimize sections using the arrows located to the left of each section heading
- You may not use pre-trained models or any form of transfer learning for Part 2 and Part 3

You will be accessing data files hosted on Dropbox. The following cell downloads the data from the cloud.

In [None]:
# DO NOT CHANGE
# Download data
!wget https://www.dropbox.com/scl/fi/gy1d0ifkwuusmdjv796dl/project2.zip?rlkey=h6wresrsqxiryhlvrssjla5hn&st=cfvqccqm&dl=0
!mv project2.zip?rlkey=h6wresrsqxiryhlvrssjla5hn project2.zip

--2025-03-11 19:10:22--  https://www.dropbox.com/scl/fi/gy1d0ifkwuusmdjv796dl/project2.zip?rlkey=h6wresrsqxiryhlvrssjla5hn
Resolving www.dropbox.com (www.dropbox.com)... 162.125.5.18, 2620:100:601d:18::a27d:512
Connecting to www.dropbox.com (www.dropbox.com)|162.125.5.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://uca4329450164fe44cc525ac05f3.dl.dropboxusercontent.com/cd/0/inline/ClqTBPAjSb-q1TC1Dd7AcrA7UVLpC9k-gZQmHDdKqoYec0Gwm3Ax82J6eLVWD4nZAYTRoBEDCDIXN-Q7yN5hTfVSAJx0CAc3np3g07P9dk6KXrxfkZ8Xa_FU13Gn2ZLUX1xpNhyxnxcj1qRyGGKbs4NT/file# [following]
--2025-03-11 19:10:22--  https://uca4329450164fe44cc525ac05f3.dl.dropboxusercontent.com/cd/0/inline/ClqTBPAjSb-q1TC1Dd7AcrA7UVLpC9k-gZQmHDdKqoYec0Gwm3Ax82J6eLVWD4nZAYTRoBEDCDIXN-Q7yN5hTfVSAJx0CAc3np3g07P9dk6KXrxfkZ8Xa_FU13Gn2ZLUX1xpNhyxnxcj1qRyGGKbs4NT/file
Resolving uca4329450164fe44cc525ac05f3.dl.dropboxusercontent.com (uca4329450164fe44cc525ac05f3.dl.dropboxusercontent.com)... 162.125.5.15, 2620:1

In [None]:
# Make sure the zip file is available in Colab before running the line below.
# If wget fails to pull the zip file, download it from Dropbox and upload it to Colab manually,
# using the Dropbox link in the previous cell (the URL after wget).
# The file must be named "project2.zip"; rename it before uploading if necessary.
# Upload the entire zip file to Google Colab. Do not unzip it before uploading.

# Unzip the uploaded zip file
!unzip -o project2.zip -d /content/

Archive:  project2.zip
   creating: /content/mjcf/
  inflating: /content/mjcf/point_mass.xml  
   creating: /content/mjcf/common/
  inflating: /content/mjcf/common/skybox.xml  
  inflating: /content/mjcf/common/visual.xml  
  inflating: /content/mjcf/common/materials.xml  
  inflating: /content/mjcf/test_mjcf.xml  
  inflating: /content/dnn.py         
   creating: /content/imgs/
  inflating: /content/imgs/P1_side.png  
  inflating: /content/imgs/map1.png  
  inflating: /content/imgs/map3.png  
  inflating: /content/imgs/map2.png  
  inflating: /content/score_policy.py  
  inflating: /content/simple_maze.py  
  inflating: /content/data_utils.py  
   creating: /content/data/
  inflating: /content/data/map1.pkl  
  inflating: /content/data/all_maps.pkl  


In [None]:
# DO NOT CHANGE

# Install required packages
!pip install pybullet numpngw

Collecting pybullet
  Downloading pybullet-3.2.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.8 kB)
Collecting numpngw
  Downloading numpngw-0.1.4-py3-none-any.whl.metadata (14 kB)
Downloading pybullet-3.2.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (103.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.2/103.2 MB 7.6 MB/s eta 0:00:00
Downloading numpngw-0.1.4-py3-none-any.whl (21 kB)
Installing collected packages: pybullet, numpngw
Successfully installed numpngw-0.1.4 pybullet-3.2.7


# Part I. Behavioral cloning with low dimensional data

This part is a natural extension of Part II in Project 1, where your agent needs to learn a policy using labeled examples from an expert.

Each labeled example $i$ will contain a tuple of the form $(o, a)^i$, where $o$ represents an observation and $a$ represents the action taken by the expert given that observation. You must simply learn to imitate the expert, a process also known as behavioral cloning. Note that while the observation space will be different in each part, the action space is the same for the rest of the project.

We will be training a DNN policy to predict an action to be taken ('up', 'down', 'left', or 'right') based on the observation. **In Part I, the observation will contain the agent position and the current goal position.** (Since the goal is sampled randomly, the policy has to know the current goal to be reached.) The environment thus returns an observation array of shape (4,), where the agent position is contained in the first two elements and the current goal position in the last two. **In Part I, the map that the robot is navigating is always the same.**
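The observation layout described above can be sketched as follows. This is only an illustration: the helper name `split_observation` is hypothetical and not part of the provided project code, and the sample values are simply one plausible observation.

```python
import numpy as np


def split_observation(obs):
    """Split a (4,) observation into agent (x, y) and goal (x, y).

    Hypothetical helper for illustration; not part of the project files.
    """
    obs = np.asarray(obs, dtype=np.float32)
    if obs.shape != (4,):
        raise ValueError(f"expected shape (4,), got {obs.shape}")
    return obs[:2], obs[2:]


# Example observation: agent position first, goal position second.
obs = np.array([-0.38606754, -0.37158364, 0.0612401, 0.38303697])
agent_pos, goal_pos = split_observation(obs)

# Vector pointing from the agent to the goal.
offset = goal_pos - agent_pos
print("agent:", agent_pos, "goal:", goal_pos, "offset:", offset)
```

A policy network receives the full 4-element vector directly; splitting it like this is mainly useful for debugging or inspecting the data.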

PyTorch and TensorFlow are two popular frameworks for building and training neural networks. For this class, we will be using PyTorch exclusively, and you are allowed to use any of its features. A good starting point can be found [here](https://github.com/roamlab/robot-learning-S2024/blob/main/dnn_example.py).

You will implement a class that inherits from `RobotPolicy` by providing implementations for the abstract methods from the class. These abstract methods will be re-used by future parts of the project, so do not edit them.



**NOTES:**
- The problem is behavioral cloning: we want to train a deep neural network (so at least one hidden layer) to imitate an expert's actions based on observations.

- Agent's position: (x, y)
- Goal's position: (x, y)

- Observations: 4D vectors, array shape (4000, 4)
  - The first two values are the agent's position
  - The last two values are the goal's position
  - First few X values:

[[-0.38606754 -0.37158364  0.0612401   0.38303697]

 [-0.38606754 -0.28158364  0.0612401   0.38303697]

 [-0.38606754 -0.18158363  0.0612401   0.38303697]

 [-0.38606754 -0.08158363  0.0612401   0.38303697]

 [-0.38606754  0.01841637  0.0612401   0.38303697]]

- Actions: array shape (4000,)
  - (0) up, (1) down, (2) left, (3) right
  - This is a classification problem, so we can use cross entropy as the loss function

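**Neural Network Architecture:**

- Input:
  - 4 neurons, one per observation value (agent x, y and goal x, y)
- Hidden layers:
  - Simplest option: one hidden layer
  - Alternative: two hidden layers
  - The implementation below uses three hidden layers (32, 64, and 32 units), each followed by batch normalization
  - ReLU activation function


- Output:
  - Fully connected layer with 4 output neurons, one per action
  - Softmax over the outputs (applied inside `nn.CrossEntropyLoss` during training, so the model itself returns raw logits)

- Loss function:
  - Cross entropy loss, since this is classification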

In [None]:
# DO NOT CHANGE
# base class

import abc


class RobotPolicy(abc.ABC):

    @abc.abstractmethod
    def train(self, data):
        """
            Abstract method for training a policy.

            Args:
                data: a dict that contains X (key = 'obs') and y (key = 'actions').

                X is either rgb image (N, 64, 64, 3) OR  agent & goal pos (N, 4)

            Returns:
                This method does not return anything. It will just need to update the
                property of a RobotPolicy instance.
        """

    @abc.abstractmethod
    def get_action(self, obs):
        """
            Abstract method for getting action. You can do data preprocessing and feed
            forward of your trained model here.
            Args:
                obs: an observation (64 x 64 x 3) rgb image OR (4, ) positions

            Returns:
                action: an integer between 0 to 3
        """

In [None]:
# Implement your solution for Part 1 below
import random

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torch.optim.lr_scheduler import StepLR


def set_seed(seed=42):
    """Seed every random number generator used during training."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # No effect on CPU-only runs, but harmless.
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)


class POSBCRobot(RobotPolicy):

    def __init__(self):

      super().__init__()

      # Fully connected network: 4 inputs -> 32 -> 64 -> 32 -> 4 action logits
      self.model = nn.Sequential(
          # First hidden layer: 4 input features to 32 (2^5) units
          nn.Linear(4, 32),
          # Normalize activations across the batch
          nn.BatchNorm1d(32),
          nn.ReLU(),
          # Second hidden layer
          nn.Linear(32, 64),
          nn.BatchNorm1d(64),
          nn.ReLU(),
          # Third hidden layer
          nn.Linear(64, 32),
          nn.BatchNorm1d(32),
          nn.ReLU(),
          # Output layer: 32 units to 4 logits, one per action
          nn.Linear(32, 4)
      )

      # CrossEntropyLoss applies log-softmax internally, so the model outputs raw logits
      self.criterion = nn.CrossEntropyLoss()
      self.optimizer = optim.AdamW(self.model.parameters(), lr=0.01, weight_decay=1e-4)


    def train(self, data):
        # Convert the NumPy arrays to tensors for PyTorch
        X = torch.tensor(data['obs'], dtype=torch.float32)
        y = torch.tensor(data['actions'], dtype=torch.long)

        num_epochs = 800

        # Full-batch training loop: every epoch runs one forward/backward pass
        # over the whole dataset and updates the weights by gradient descent
        for epoch in range(num_epochs):
            # Clear gradients left over from the previous step
            self.optimizer.zero_grad()

            # Forward pass: one row of four logits per observation,
            # e.g. tensor([[-1.23, 2.56, 0.87, -0.41]])
            outputs = self.model(X)

            # Compare the model's predictions to the expert's labels
            loss = self.criterion(outputs, y)

            # Backpropagation computes the gradients
            loss.backward()

            # Use the computed gradients to adjust the model parameters
            self.optimizer.step()

            if epoch % 2 == 0:
                print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")


    def get_action(self, obs):
        # Convert to a tensor and add a batch dimension: (4,) -> (1, 4)
        obs_tensor = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)

        # Evaluation mode makes BatchNorm use its running statistics,
        # which is required for a batch of size 1
        self.model.eval()
        with torch.no_grad():  # Disable gradient computation
            output = self.model(obs_tensor)  # Action logits
            action = torch.argmax(output).item()  # Pick the highest-scoring action

        # Restore training mode
        self.model.train()
        return action



## Evaluation and Grading

We will evaluate your model by having the agent follow the actions that it predicts, for 100 different randomly sampled starting positions and goals. For each goal, we roll out the trained policy for 50 steps and then record the closest distance to the goal that the agent reached. If the agent gets within 0.1 of the goal, the episode ends before 50 steps and the minimum distance is recorded as 0. The score is the fraction of the initial distance to the goal covered by the agent, averaged over the 100 trials. Your final grade will be computed based on this score.

We will calculate the score using the formula :

```score = avg[(init_dist -  min_dist) / init_dist]```

We will auto-generate your grades using the code below. The grading of each part is separate from each other so you can get the grade right after each part is finished.

The total points of this assignment are 15. According to the difficulty level of each part, parts 1, 2, and 3 have 4, 5, 6 points respectively.

- Part 1: if your score >= 0.99, you will receive 4 / 4. Otherwise, your final grade will be score / 0.99 * 4.
- Part 2: if your score >= 0.95, you will receive 5 / 5. Otherwise, your final grade will be score / 0.95 * 5.
- Part 3: if your score >= 0.95, you will receive 6 / 6. Otherwise, your final grade will be score / 0.95 * 6.
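As a worked illustration of the scoring and grading rules above, the following sketch computes a score from a handful of made-up trials and converts scores to grades. The function names and the trial distances are hypothetical; the actual evaluation lives in `score_policy.py` and may be organized differently.

```python
def rollout_score(init_dist, min_dist):
    """Fraction of the initial distance covered in one trial."""
    return (init_dist - min_dist) / init_dist


def average_score(trials):
    """Average the per-trial fractions; `trials` holds (init_dist, min_dist) pairs."""
    return sum(rollout_score(i, m) for i, m in trials) / len(trials)


def grade(score, bound, max_points):
    """Full points at or above the bound, otherwise scaled linearly."""
    return max_points if score >= bound else score / bound * max_points


# Hypothetical distances, for illustration only. A min_dist of 0.0 stands
# for a trial in which the agent got within 0.1 of the goal.
trials = [(1.0, 0.0), (0.8, 0.2), (0.5, 0.0), (0.4, 0.1)]
score = average_score(trials)

print(f"score = {score:.4f}")
print(f"Part 1 grade = {grade(score, 0.99, 4):.2f} / 4")
print(f"Part 2 grade = {grade(score, 0.95, 5):.2f} / 5")
print(f"Part 3 grade = {grade(score, 0.95, 6):.2f} / 6")
```

Because the grade scales linearly below the bound, a score slightly under the threshold still earns most of the points.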

The score function for each part provides two extra arguments to assist your debugging.

- gui: If this is set to True, you will save the behavior of the agents during evaluation as an animation file. This animation file can be visualized using the provided code below to help you understand the behavior of the agent. **Please set it to False before your submission as it will slow down evaluation.**
- model: If you provide a path to a saved model, the score function will not train from scratch but will instead load the saved model. **Please set it to None before submission.** Any models you generate during runtime will be automatically deleted when the runtime disconnects. The grader will train the model from scratch.

In [None]:
# DO NOT CHANGE
# Set up grading

import score_policy
import importlib
importlib.reload(score_policy)
from IPython.display import Image


part1_bound = 0.99
part2_bound = 0.95
part3_bound = 0.95

In [None]:
# DO NOT CHANGE
# Getting the score and grade for Part 1

score1 = score_policy.score_pos_bc(policy=POSBCRobot(), gui=False, model=None)
grade1 = score1 / part1_bound * 4 if score1 < part1_bound else 4

print('\n---')
print(f'Part 1 Score: {score1}')
print(f'Part 1 Grade: {score1:.2f} / {part1_bound:.2f} * 4 = {grade1:.2f}')

Epoch [1/800], Loss: 1.4681
Epoch [3/800], Loss: 0.9381
Epoch [5/800], Loss: 0.8178
Epoch [7/800], Loss: 0.7304
Epoch [9/800], Loss: 0.6542
Epoch [11/800], Loss: 0.5911
Epoch [13/800], Loss: 0.5326
Epoch [15/800], Loss: 0.4871
Epoch [17/800], Loss: 0.4504
Epoch [19/800], Loss: 0.4203
Epoch [21/800], Loss: 0.3960
Epoch [23/800], Loss: 0.3762
Epoch [25/800], Loss: 0.3610
Epoch [27/800], Loss: 0.3484
Epoch [29/800], Loss: 0.3375
Epoch [31/800], Loss: 0.3270
Epoch [33/800], Loss: 0.3170
Epoch [35/800], Loss: 0.3060
Epoch [37/800], Loss: 0.2965
Epoch [39/800], Loss: 0.2870
Epoch [41/800], Loss: 0.2775
Epoch [43/800], Loss: 0.2675
Epoch [45/800], Loss: 0.2582
Epoch [47/800], Loss: 0.2493
Epoch [49/800], Loss: 0.2401
Epoch [51/800], Loss: 0.2324
Epoch [53/800], Loss: 0.2348
Epoch [55/800], Loss: 0.2288
Epoch [57/800], Loss: 0.2228
Epoch [59/800], Loss: 0.2102
Epoch [61/800], Loss: 0.2016
Epoch [63/800], Loss: 0.1943
Epoch [65/800], Loss: 0.1911
Epoch [67/800], Loss: 0.1879
Epoch [69/800], Los

In [None]:
# Optionally, uncomment and run the code below if you have saved an animation (gui = True) that you want to visualize.

# Image(filename='part_1_anim.png', width=200, height=200)

# Part II. Behavioral cloning with visual observations

In this part, you are asked to do a similar task as Part I, **but the observations will be RGB image observations of the world**, similar to the ones you used to do localization in Part III of Project 1. To process the RGB images, you will be implementing a CNN using PyTorch. [The official PyTorch tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) is a good starting point. As in Part I, the map that the robot is navigating is always the same. **This means that your model really only has to learn how to figure out where the robot and the goal are located, and how to navigate around a fixed set of obstacles.**

All requirements for your code, as well as the evaluation method, are unchanged compared to Part I. The only difference is the nature of the observation that is provided to you.

**NOTES:**

SHAPES AND DATA:
- Shape of X (observations): (4000, 64, 64, 3), i.e. 4000 RGB images of size 64×64
- Shape of y (actions): (4000,), i.e. 4000 labels (one per image)
- Unique actions in dataset: {0, 1, 2, 3}
- First image shape: (64, 64, 3)
- First action label: 0

AFTER CONVERSION:
- Converted X tensor shape: torch.Size([4000, 3, 64, 64])
- Converted y tensor shape: torch.Size([4000])

For a Convolutional Neural Network (CNN), PyTorch expects the images in (N, C, H, W) format:

- N = Number of samples (4000)
- C = Channels (RGB → 3)
- H = Height (64)
- W = Width (64)
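The axis reordering can be sketched with NumPy as below. This is a minimal illustration on a small stand-in batch (8 images rather than 4000), not the project's data loading code. On a PyTorch tensor the equivalent call is `tensor.permute(0, 3, 1, 2)`.

```python
import numpy as np

# Stand-in batch with the dataset's layout: (N, H, W, C).
batch_nhwc = np.zeros((8, 64, 64, 3), dtype=np.float32)
batch_nhwc[0, 10, 20, :] = [1.0, 2.0, 3.0]  # mark one pixel so it can be tracked

# Move the channel axis to position 1: (N, H, W, C) -> (N, C, H, W).
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))
print(batch_nchw.shape)  # (8, 3, 64, 64)

# The marked pixel keeps its three channel values at the same (row, col).
print(batch_nchw[0, :, 10, 20])

# A plain reshape yields the same shape but scrambles the pixels,
# so it is not a substitute for transposing the axes.
wrong = batch_nhwc.reshape(8, 3, 64, 64)
print(wrong[0, :, 10, 20])
```

The last print shows why `reshape` should be avoided here: it reinterprets memory order instead of moving the channel axis.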

**A typical CNN consists of:**

- Convolutional Layers:
    - Detect spatial patterns (edges, shapes).
- Activation Functions (ReLU):
    - Introduce non-linearity.
- Pooling Layers (MaxPooling):
    - Reduce spatial dimensions.
- Fully Connected (FC) Layers:
    - Classify the image into one of 4 actions.
- Softmax or Logits Output:
    - Convert model outputs into class probabilities.
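When stacking these layers, the input size of the first fully connected layer has to match the flattened output of the last pooling layer. The sketch below traces that size with the standard output-length formula for convolution and pooling (dilation 1, no ceil mode). It assumes blocks of one 3×3 convolution with padding 1 followed by one 2×2 max pool with stride 2, as in the CNN defined in the following cell; the helper names are hypothetical.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(size, channels_per_block):
    """Trace conv(3x3, pad 1) -> maxpool(2x2, stride 2) blocks and return C*H*W."""
    for _ in channels_per_block:
        size = conv2d_out(size, kernel=3, stride=1, padding=1)  # size unchanged
        size = conv2d_out(size, kernel=2, stride=2)             # size halved
    return channels_per_block[-1] * size * size


# 64x64 input, three blocks with 32, 64, and 128 output channels:
# spatial size goes 64 -> 32 -> 16 -> 8, so the flattened size is 128 * 8 * 8.
features = flattened_features(64, [32, 64, 128])
print(features)  # 8192
```

Changing the number of blocks, the kernel size, or the padding changes this number, so the `in_features` of the first `nn.Linear` layer has to be updated accordingly.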

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import random
from torch.optim.lr_scheduler import StepLR


def set_seed(seed=42):
   torch.manual_seed(seed)
   torch.cuda.manual_seed_all(seed)  # No effect on CPU-only runs, but harmless.
   np.random.seed(seed)
   random.seed(seed)
   torch.backends.cudnn.deterministic = True
   torch.backends.cudnn.benchmark = False


set_seed(42)


class RGBBCRobot1(nn.Module):
   def __init__(self, num_classes=4, lr=0.0001, betas=(0.9, 0.999), weight_decay=0):
       super(RGBBCRobot1, self).__init__()


       # Define CNN model using nn.Sequential
       self.model = nn.Sequential(
           # First Conv Block
           nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
           nn.BatchNorm2d(32),
           nn.LeakyReLU(negative_slope=0.05),
           nn.MaxPool2d(2, 2),


           # Second Conv Block
           nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
           nn.BatchNorm2d(64),
           nn.LeakyReLU(negative_slope=0.05),
           nn.MaxPool2d(2, 2),


           # Third Conv Block
           nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
           nn.BatchNorm2d(128),
           nn.LeakyReLU(negative_slope=0.05),
           nn.MaxPool2d(2, 2),


           # Flatten & Fully Connected Layers
           nn.Flatten(),
           nn.Linear(8192, 512),  # Adjusted for output size
           nn.LeakyReLU(negative_slope=0.05),
           #nn.Dropout(0.3),

           nn.Linear(512, 256),
           nn.LeakyReLU(negative_slope=0.05),
           nn.Linear(256, num_classes)
       )


       # Loss Function
       self.criterion = nn.CrossEntropyLoss()


       # Optimizer (to try: lr=.000198)
       # Note: the lr, betas, and weight_decay constructor arguments are not used;
       # the values are hard-coded below.
       #self.optimizer = optim.Adam(self.model.parameters(), lr=.000099, betas=(0.9, 0.999), weight_decay=0)
       # batch size 30
       self.optimizer = optim.Adam(self.model.parameters(), lr=.000045, betas=(0.9, 0.999), weight_decay=0)
       #self.optimizer = optim.Adam(self.model.parameters(), lr=.00022, betas=(0.9, 0.999), weight_decay=0)
       # Best so far: 11 epochs and batch size 30
       #self.scheduler = StepLR(self.optimizer, step_size=5, gamma=0.45)
       #self.scheduler = StepLR(self.optimizer, step_size=6, gamma=0.45)


   def forward(self, x):
       return self.model(x)


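   def train(self, data=True, num_epochs=12, batch_size=30):
       """Train the model with Adam on a dict with 'obs' and 'actions' keys.

       nn.Module.train(mode) and eval() also route through this override, so a
       bool argument is forwarded to the parent class to switch modes.
       """
       if isinstance(data, bool):
           return nn.Module.train(self, data)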


       # Extract images & labels
       X, y = data['obs'], data['actions']
       # Convert to tensors
       X_tensor = torch.tensor(X, dtype=torch.float32).permute(0, 3, 1, 2)  # (N, 3, 64, 64)
       y_tensor = torch.tensor(y, dtype=torch.long)


       # Create DataLoader
       dataset = TensorDataset(X_tensor, y_tensor)
       print(f"[DEBUG] Converted X_tensor shape: {X_tensor.shape}")
       dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)


       # Training Loop
       for epoch in range(num_epochs):
           total_loss = 0.0
           for inputs, targets in dataloader:
               self.optimizer.zero_grad()
               outputs = self.forward(inputs)
               loss = self.criterion(outputs, targets)
               loss.backward()
               self.optimizer.step()
               total_loss += loss.item()
           #self.scheduler.step()


           avg_loss = total_loss / len(dataloader)
           print(f"Epoch [{epoch+1}/{num_epochs}] - Avg Loss: {avg_loss:.4f}")


   def get_action(self, obs):
       """ Get predicted action from observation """
       # Note: eval() is not called here, so unless the caller switches modes,
       # BatchNorm normalizes with the current input's statistics.
       # (eval() would also disable any dropout layers.)
       obs = obs * 255.0  # Scale input if needed (must match the scaling used in training)

       # (H, W, C) -> (1, C, H, W)
       obs_tensor = torch.tensor(obs, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)

       with torch.no_grad():
           predictions = self.forward(obs_tensor)

           # Greedy alternative: pick the highest-scoring action
           #_, action_position = torch.max(predictions, dim=1)
           #action = action_position.item()

           # Instead, convert the logits to four probabilities (one per action)
           # with softmax and sample an action from that distribution
           action_probabilities = F.softmax(predictions, dim=1)
           #print("Action probabilities:", action_probabilities[0].cpu().numpy())

           action = torch.multinomial(action_probabilities[0], num_samples=1).item()

       return action

# Rather than taking the maximum, applying softmax to the output and sampling builds in stochasticity




## Evaluation and Grading

In [None]:
# DO NOT CHANGE
# Getting the score and grade for Part 2
#Next should be the

score2 = score_policy.score_rgb_bc1(policy=RGBBCRobot1(), gui=False, model=None)
grade2 = score2 / part2_bound * 5 if score2 < part2_bound else 5

print('\n---')

print(f'Part 2 Score: {score2}')
print(f'Part 2 Grade: {score2:.2f} / {part2_bound:.2f} * 5 = {grade2:.2f}')

[DEBUG] Converted X_tensor shape: torch.Size([4000, 3, 64, 64])
Epoch [1/12] - Avg Loss: 1.1412
Epoch [2/12] - Avg Loss: 0.7588
Epoch [3/12] - Avg Loss: 0.4710
Epoch [4/12] - Avg Loss: 0.3678
Epoch [5/12] - Avg Loss: 0.3146
Epoch [6/12] - Avg Loss: 0.2770
Epoch [7/12] - Avg Loss: 0.2413
Epoch [8/12] - Avg Loss: 0.2130
Epoch [9/12] - Avg Loss: 0.1925
Epoch [10/12] - Avg Loss: 0.1719
Epoch [11/12] - Avg Loss: 0.1525
Epoch [12/12] - Avg Loss: 0.1401

---
Part 2 Score: 0.93581729749514
Part 2 Grade: 0.94 / 0.95 * 5 = 4.93


In [None]:
# Optionally, uncomment and run the code below if you have saved an animation (gui = True) that you want to visualize.

# Image(filename='part_2_anim.png', width=200, height=200)

# Part III. Behavioral cloning with visual observations - multiple maps

This part is the same as Part II except that it is trained and tested differently. **The training set involves expert demonstrations for the two new obstacle maps. And while testing, for each trial, a different obstacle map is randomly selected.** This means that your model has to learn how to reason about what an obstacle is, and how to go around it, based on nothing more than an image. The main objective of this part is to show that, when using a CNN, it is possible for a model to achieve this. The evaluation method for this part is the same as in Parts I and II.

- Leaky ReLU 0.01, batch size 16, 3 epochs; takes 8 min
- Dropout (0.3) on the top FC layer
  - lr: 0.000001 = 0.48
  - lr: 0.00001 = 0.60
  - lr: 0.0001 = 0.78
  - lr: 0.0003 = 0.72
  - lr: 0.00048 = 0.67
  - lr: 0.00049 =
  - lr: 0.0005 = 0.83
  - lr: 0.00051 =
  - lr: 0.00052 =
  - lr: 0.00055 = 0.64
  - lr: 0.0007 = 0.70
  - lr: 0.001 = 0.63
  - lr: 0.01 = 0.22
- Dropout (0.35)
  - lr: 0.0005 = 0.75
- Dropout (0.25)
  - lr: 0.0005 = 0.61
- Dropout (0.31)
  - lr: 0.0005 = 0.64
- nn.LeakyReLU(negative_slope=0.001), lr: 0.0005, Dropout (0.30)
  - 0.68
- nn.LeakyReLU(negative_slope=0.05), lr: 0.0005
  - 0.56
- nn.LeakyReLU(negative_slope=0.005), lr: 0.0005
  - 0.70

----------------------------------

TWO LAYERS
- Dropout (0.3) and (0.3)
  - 0.71
- Dropout (0.3) and (0.1), lr = 0.0005
  - 0.63
- Dropout (0.3) and (0.1), lr = 0.001
  -

In [None]:
# Implement your solution for Part 3 below


class RGBBCRobot2(RobotPolicy):

    def train(self, data):
        # Placeholder: print the shape of each array in the dataset
        for key, val in data.items():
            print(key, val.shape)
        print("Using dummy solution for RGBBCRobot2")

    def get_action(self, obs):
        # Placeholder: always return action 0
        return 0

## Evaluation and Grading


In [None]:
# Optionally, uncomment and run the code below if you have saved an animation (gui = True) that you want to visualize.

# Image(filename='part_3_anim.png', width=200, height=200)

In [None]:
import random

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class RGBBCRobot2(nn.Module):
    def __init__(self, num_classes=4):

        super(RGBBCRobot2, self).__init__()  # Ensure parent constructor is called!
        self.model = nn.Sequential(
           # First Conv Block
           nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
           nn.BatchNorm2d(32),
           nn.LeakyReLU(negative_slope=0.01),
           nn.MaxPool2d(2, 2),


           # Second Conv Block
           nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
           nn.BatchNorm2d(64),
           nn.LeakyReLU(negative_slope=0.01),
           nn.MaxPool2d(2, 2),


           # Third Conv Block
           nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
           nn.BatchNorm2d(128),
           nn.LeakyReLU(negative_slope=0.01),
           nn.MaxPool2d(2, 2),


           # Flatten & Fully Connected Layers
           nn.Flatten(),
           nn.Linear(8192, 512),  # Adjusted for output size
           nn.LeakyReLU(negative_slope=0.01),
           #nn.Dropout(0.6),

           nn.Linear(512, 256),
           nn.LeakyReLU(negative_slope=0.01),
           nn.Dropout(0.35),
           nn.Linear(256, num_classes)
        )

       # Loss Function
        self.criterion = nn.CrossEntropyLoss()

        #self.optimizer = optim.Adam(self.model.parameters(), lr=.0005, betas=(0.9, 0.999), weight_decay=0)

        self.optimizer = optim.Adam(self.model.parameters(), lr=.0006, betas=(0.9, 0.99), weight_decay=0)

    def forward(self, x):
        return self.model(x)

# 128 = .23
# 64 = .239
# 30 =  .26

#DROP OUT IN ALL LAYERS
#.03 = .309

#In only bottom layer
#0.6 = .268

#DROP OUT IN TOP 3 CONV LAYERS
#dropping the drop out to .2 in all three layers =  .304
#dropping the drop out to .25 in all three layers =  .315
#upping the drop out to .3 in all three layers = .3275
#upping the drop out to .4 in all three layers = .20


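    def train(self, data=True, num_epochs=3, batch_size=15):
        """Train on a dict with 'obs' and 'actions' keys, or switch modes if given a bool."""
        if isinstance(data, bool):
            # nn.Module.train(mode) and eval() route through this override
            return nn.Module.train(self, data)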

        # Extract images & labels
        X, y = data['obs'], data['actions']

        # Convert to tensors
        X_tensor = torch.tensor(X, dtype=torch.float32).permute(0, 3, 1, 2)  # Shape: (N, 3, 64, 64)
        y_tensor = torch.tensor(y, dtype=torch.long)

        # Create DataLoader
        dataset = TensorDataset(X_tensor, y_tensor)
        print(f"[DEBUG] Converted X_tensor shape: {X_tensor.shape}")
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        # Training Loop
        for epoch in range(num_epochs):
            total_loss = 0.0
            for inputs, targets in dataloader:
                self.optimizer.zero_grad()
                outputs = self.forward(inputs)
                loss = self.criterion(outputs, targets)
                loss.backward()
                self.optimizer.step()

                total_loss += loss.item()

            avg_loss = total_loss / len(dataloader)
            print(f"Epoch [{epoch+1}/{num_epochs}] - Avg Loss: {avg_loss:.4f}")

    def get_action(self, obs):
        """ Get predicted action from observation """
        #print(f"[DEBUG] obs shape: {obs.shape}, type: {type(obs)}")
        #print(f"[DEBUG] obs min: {obs.min()}, max: {obs.max()}")

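        # Note: eval() is left disabled, so unless the caller switches modes,
        # dropout stays active and BatchNorm uses the current input's statistics.
        #self.model.eval()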

        obs = obs * 255.0  # If needed, scale input (ensure training and inference scaling match)

        obs_tensor = torch.tensor(obs, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)
        #print(f"[DEBUG] Converted obs_tensor shape: {obs_tensor.shape}")

        with torch.no_grad():
            predictions = self.forward(obs_tensor)
            action_probabilities = F.softmax(predictions, dim=1)

            action = torch.multinomial(action_probabilities[0], num_samples=1).item()

        return action

In [None]:
# DO NOT CHANGE
# Getting the score and grade for Part 3

score3 = score_policy.score_rgb_bc2(policy=RGBBCRobot2(), gui=False, model=None)
grade3 = score3 / part3_bound * 6 if score3 < part3_bound else 6

print('\n---')
print(f'Part 3 Score: {score3}')
print(f'Part 3 Grade: {score3:.2f} / {part3_bound:.2f} * 6 = {grade3:.2f}')

[DEBUG] Converted X_tensor shape: torch.Size([12000, 3, 64, 64])
Epoch [1/3] - Avg Loss: 0.6861
Epoch [2/3] - Avg Loss: 0.3333
Epoch [3/3] - Avg Loss: 0.2903

---
Part 3 Score: 0.7690467490816303
Part 3 Grade: 0.77 / 0.95 * 6 = 4.86


# Other Requirements and Hints

- **Training time**: To keep auto-grading feasible, your total training time must be strictly under 3 minutes, 15 minutes, and 10 minutes for Parts 1, 2, and 3, respectively. These time budgets are more than enough to achieve full credit on this project. Note that longer training does not necessarily mean higher performance, because of overfitting. The faster your network trains, the better!
- **Memory usage**: Make sure your code does not require too much memory. The required amount of RAM for this assignment should not be more than 8GB.
- **NO GPU**: No GPU is required or allowed for this assignment.
- **Reproducibility**: We have ensured that the randomness of the environment is deterministic. To get reproducible scores, you must ensure that your model training and prediction are also reproducible. The randomly initialized weights of the neural network should be made repeatable using seeding. You can add the PyTorch seeding call below; see [PyTorch Reproducibility](https://pytorch.org/docs/stable/notes/randomness.html) to learn more.
  ```
  import torch
  torch.manual_seed(0)
  ```
- **Classifier**: In all parts, we train a neural network to solve a classification problem, so it is important to use a reasonable loss function. For example, mean squared error (MSE) has drawbacks related to sensitivity. Cross entropy loss usually performs well for classification tasks; its documentation is [here](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss), and it is explained further below. However, note that you are free to use any loss function you like.
  - Cross entropy is a concept from information theory that is defined for two probability distributions. For a fixed target distribution, cross entropy is minimized when the two distributions are the same, and this is the property that makes it useful as a loss function in machine learning. The idea is to minimize the cross entropy between the prediction distribution and the label distribution. In our case, where we are training a neural network for classification, we can have the network output a score for each action. Cross entropy can be computed from these scores by converting them to probability values (using softmax) and comparing the result with the label distribution. The label distribution is obtained simply by assigning a probability of 1 to the ground truth action and 0 to all other actions. Once trained, the best action can be found by choosing the action with the highest probability (i.e., the highest score) as predicted by the network.
- **Optimizer**: While it is possible to use a simple optimizer to achieve the desired accuracy, the training time can be quite high. There exist a number of optimizers implemented in PyTorch that have much faster convergence.
- **Parameter tuning**: Keep your architectures simple and slowly add complexity (more layers/kernels) to improve accuracy. Remember "To Err is Human" and the expert data (collected by a human) that you are training on is not perfect. Having a 100% training accuracy (very small training loss) might not be the best for achieving the highest score. So make sure your model does not overfit during training.
- **PyTorch input shape**: Note that the expected input shape for `Conv2d` in PyTorch is (N, C, H, W), where N is the batch size, C is the number of channels, H is the image height, and W is the image width. You will need to reorder the axes of the incoming images, which arrive as (N, H, W, C), so that they are passed correctly to the first convolution layer.
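To make the cross entropy description in the **Classifier** hint concrete, here is a minimal sketch in plain Python that converts four action scores to probabilities with softmax and evaluates the loss against a one-hot label. The scores are arbitrary example values and the function names are illustrative. In PyTorch, `nn.CrossEntropyLoss` performs the softmax step internally, so it should be given the raw scores.

```python
import math


def softmax(scores):
    """Convert raw scores (logits) to probabilities; subtract the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, label):
    """Cross entropy against a one-hot label reduces to -log p(label)."""
    return -math.log(softmax(scores)[label])


# Arbitrary example scores for the four actions (up, down, left, right).
scores = [-1.23, 2.56, 0.87, -0.41]
probs = softmax(scores)

# Softmax preserves ordering, so the highest score is also the most probable action.
best = max(range(len(scores)), key=lambda i: scores[i])

print("probabilities:", [round(p, 3) for p in probs])
print("argmax action:", best)
print("loss if the expert chose action 1:", round(cross_entropy(scores, 1), 4))
print("loss if the expert chose action 0:", round(cross_entropy(scores, 0), 4))
```

The loss is small when the expert's action already has the highest score and large when the network assigns it a low probability, which is what drives the scores toward the expert's choices during training.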