<a href="https://colab.research.google.com/github/Seph-iroth/RoboLearning/blob/main/mecs6616_Spring2024_Project1_JL6080.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **MECS6616 Spring 2024 - Project 1**

# **Introduction**

***IMPORTANT:***
- **Before starting, make sure to read the [Assignment Instructions](https://courseworks2.columbia.edu/courses/197115/pages/assignment-instructions) page on Courseworks to understand the workflow and submission requirements for this project.**


This project applies classical machine learning techniques within a robotics context. Specifically, you will develop a navigation agent designed to maneuver through a simple 2D maze environment.

<div>
<img src="https://drive.google.com/uc?id=1mSpegY1psdek3Lgh6cxzcCGUCF-lddnV" width="300"/>
</div>

The figure above illustrates the simulation world, where the "robot" (also referred to as "agent") is represented by a green dot, and the goal location is marked by a red square. The agent's objective is to navigate to this goal location, avoiding any obstacles (depicted as black boxes) along the way.

To navigate to the goal location, the agent will learn appropriate behaviors by imitating demonstrations from an expert user. These demonstrations have been collected in advance by a human controlling the agent via a keyboard. These demonstrations will be provided to you as training data.

For this project, we explicitly prohibit the use of Deep Learning and Reinforcement Learning techniques. Instead, we will focus on "traditional" supervised learning methods. In future projects, where we will employ DL and RL, we will have the opportunity to understand and appreciate the significant advantages they offer over traditional methods.

You should use the scikit-learn library to implement learning algorithms in this project. Comprehensive documentation on its general usage and individual functions can be found on the [scikit-learn page](https://scikit-learn.org/stable/).

This project has 3 parts. The instructions for each part are detailed below.

# **Project Setup (do NOT change)**


***IMPORTANT:***
- Do NOT change this "*Project Setup*" section
- Do NOT install any other dependencies or a different version of an already provided package. You may, however, import other packages. Note that scikit-learn is already installed in Colab
- Your code should go under the subsequent sections with headings "*Part 1*", "*Part 2*", and "*Part 3*"
- The "*Testing*" section allows you to test your code's performance using an autograder
- You may find it useful to minimize sections using the arrows located to the left of each section heading

You will be accessing data files located in a class github repo. The following cell clones the repo into the working directory

In [None]:
# do NOT change

# This cell should take less than a minute to run.
# After running this cell, the folder 'robot-learning-S2024' will show up in the file explorer on the left
# Click on the folder icon if it's not open. Refresh the 'File' page if you still don't see any new files
!git clone https://github.com/roamlab/robot-learning-S2024.git

Cloning into 'robot-learning-S2024'...
remote: Enumerating objects: 48, done.[K
remote: Counting objects: 100% (48/48), done.[K
remote: Compressing objects: 100% (41/41), done.[K
remote: Total 48 (delta 6), reused 41 (delta 4), pack-reused 0[K
Receiving objects: 100% (48/48), 604.10 KiB | 16.78 MiB/s, done.
Resolving deltas: 100% (6/6), done.


In [None]:
# do NOT change

# Copy the files needed for project 1 into the current working directory. This is simply to make accessing files easier
!cp -av /content/robot-learning-S2024/project1/* /content/

'/content/robot-learning-S2024/project1/data' -> '/content/data'
'/content/robot-learning-S2024/project1/data/bc_with_gtpos_data.pkl' -> '/content/data/bc_with_gtpos_data.pkl'
'/content/robot-learning-S2024/project1/data/reg_test_data.pkl' -> '/content/data/reg_test_data.pkl'
'/content/robot-learning-S2024/project1/data/bc_data.pkl' -> '/content/data/bc_data.pkl'
'/content/robot-learning-S2024/project1/data/regression_data.pkl' -> '/content/data/regression_data.pkl'
'/content/robot-learning-S2024/project1/data_utils.py' -> '/content/data_utils.py'
'/content/robot-learning-S2024/project1/mjcf' -> '/content/mjcf'
'/content/robot-learning-S2024/project1/mjcf/common' -> '/content/mjcf/common'
'/content/robot-learning-S2024/project1/mjcf/common/materials.xml' -> '/content/mjcf/common/materials.xml'
'/content/robot-learning-S2024/project1/mjcf/common/skybox.xml' -> '/content/mjcf/common/skybox.xml'
'/content/robot-learning-S2024/project1/mjcf/common/visual.xml' -> '/content/mjcf/common/visua

In [None]:
# do NOT change

# Installing pybullet, the physics engine that we will use for simulation
!pip install pybullet

Collecting pybullet
  Downloading pybullet-3.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (103.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m103.2/103.2 MB[0m [31m7.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pybullet
Successfully installed pybullet-3.2.6


In [None]:
# do NOT change

# Installing numpngw for visualization
!pip3 install numpngw

Collecting numpngw
  Downloading numpngw-0.1.3-py3-none-any.whl (21 kB)
Installing collected packages: numpngw
Successfully installed numpngw-0.1.3


# Part 1. Inferring the Position of an Agent with RGB Images

<div>
<img src="https://drive.google.com/uc?id=1Cn2sAcz0sOXX5x1dvRCEtKCL5yJDYkKS" width="300"/>
</div>


Your first task is learning to predict the agent's location within the maze using RGB image observations, such as the one shown above. Each such observation is an RGB image with dimensions $[64, 64]$ for each color channel, resulting in an overall shape of $[64, 64, 3]$ per observation.

The maze has its own coordinate system, which you will use to express the agent's location. You will be provided with RGB image observations from this environment, along with the corresponding ground truth location of the agent, expressed in the maze's coordinate system.

The task is to develop a model capable of predicting the agent's location based on these RGB observations. Note that this can be seen as a regression problem (if the location of the agent is a continuous variable) or a classification problem (if we discretize the output space to a finite number of possible locations).

In this part, you will need to implement the class *PositionRegressor*. Your class will contain two methods:
- *train()*: trains a position regressor using the given data
- *predict()*: predicts the agent's locations given a batch of observations

We will test the performance of your model in this part using the Mean Square Error (MSE) between the predicted positions and the actual (ground truth) positions. We will evaluate your implementation on both the training data (which your model will be trained on) and additional testing data that is held out. Your score will be $$\text{score} = 1 - MSE$$ and then clipped between 0 and 1.

Please implement your solution below by completing the two methods for the *PositionRegressor* class. Note that the actual training and prediction occur in the *Testing* section, where our scoring code loads the data from a file and calls your functions, passing them the appropriate arguments. In *Part 1* (and the subsequent *Part 2* & *Part 3*), you are only required to complete the methods. You do not need to load data and perform training & prediction.

We have provided dummy solutions for all three parts of this assignment. This ensures that the scoring function in the *Testing* section can be executed successfully, even if you have completed only a portion of this assignment. If you would like to test your *train()* and *predict()* methods while you're working on it, simply run the code for all three parts, and run the *score_all_parts()* function in the *Testing* section. This will run your *train()* and *predict()* methods

In [None]:
# Implement Part 1 Below
from sklearn.linear_model import LinearRegression
import numpy as np
import pickle
import pprint
# with open('/content/data/regression_data.pkl', 'rb') as f:
#     data = pickle.load(f)
# for i,j in data.items():
#   print(j)

# obs
# actions
# info
# dones
# pprint.pprint(data.items())

# print(len(data['obs'][0]))
# for i in data['obs'][0]:
#   print(i)
# print(data['info'])
# for i in data['info']:
#   print(i)
# pictures = np.array((i['obs'] for i in data))
# pictures = np.array(  [i for i in data['obs']]  )
class PositionRegressor():
    #  [64,64,3]  per observation.
    def __init__(self):
        # Initialize the model variable
        self.model = None

    def train(self, data):
        """
        A method that trains a regressor using the given data

        Args:
            data: a dictionary that contains images and the corresponding ground truth location of an agent.
        Returns:
            Nothing
        """
        # TODO

        # Starter code below for visualizing the dataset
        # You should delete them after completing this part
        # for key, val in data.items():
        #     print(key, val)

        # Dummy solution below (delete after completion)



        #for picture
        pictures = np.array(  [i for i in data['obs']]  )
        # print(pictures.shape)
        #for ground truth location of an agent
        ground_truth = np.array( [i['agent_pos'] for i in data['info']] )
        # print(ground_truth.shape)
        #reshape the picture data
        # print(len(data['obs']))

        #flatten the data, turn 64 * 64 *3
        pictures_reshaped = pictures.reshape(len(data['obs']),-1)
        self.model = LinearRegression()
        self.model.fit(pictures_reshaped,ground_truth)
        # print("Using dummy solution for PositionRegressor")
        print("Running Part 1")

        pass

    def predict(self, Xs):
        """
        A method that predicts y's given a batch of X's

        Args:
            Xs: a batch of data (in this project, it is in the shape [batch_size, 64, 64, 3])

        Returns:
            The predicted locations (y's) of the agent from your trained model. Note that
            this method expects batched inputs and returns batched outputs
        """
        # TODO
        #flatten the input
        Xs_reshaped = Xs.reshape(Xs.shape[0], -1)
        predictions = self.model.predict(Xs_reshaped)
        # print(predictions)
        # print(predictions.shape)
        return predictions
        # Dummy solution below (delete after completion)
        # return np.zeros((Xs.shape[0], 2))

# Part 2. Behavioral Cloning with Low Dimensional Data

In *Part 2*, your task is to develop a model that decides the agent's next action based on environmental observations. The agent has three possible actions: moving up, left, or right. The objective is to navigate the agent towards the goal square, which is marked in red in the figures provided above.

Note that, in general terms, what you are providing here is a "policy" - a model that selects an action based on observations from the world. There are various methods for training such policies, and we will explore many of these techniques in the Reinforcement Learning section of the course.

It is important to note that learning a policy can also be approached as a Supervised Learning problem. In this scenario, you will receive labeled examples from an "expert". Each example will include a tuple in the form of $(o, a)_i$, where $o$ denotes an observation and $a$ indicates the action taken by the expert in response to that observation. You must simply learn to imitate the expert, a process also known as behavioral cloning. If the action space is discrete, behavioral cloning becomes a classification problem; if it's continuous, it turns into a regression problem. We will be working on an environment that has a discrete action space. Consequently, we can treat behavioral cloning as a classification problem with three output classes: go up, go left, and go right.

In *Part 2*, the observation will be the agent's ground truth position within the maze's coordinate system. The training data will consist of tuples $(o, a)_i$  where $o$ represents the agent's location in the maze, and $a$ is the action taken by the expert at that location. You may use any classification method from Scikit-learn to learn the mapping from these observations to the corresponding actions.

You will need to implement the class *POSBCRobot()*. The methods to implement are documented below. We will evaluate your model by having the robot execute the commands generated by your policy, or in other words, by "rolling out your policy" in the environment. After 20 steps, we will compute how close the robot gets to the goal. Formally, the score for a single run will be calculated based on the minimum distance between your agent and the target location achieved over a trajectory of 100 steps. We will run your agent 20 times in the environment and use the following formula to calculate your score:

$$\text{score} = \frac{(∑^{20}_n(\text{init_dist - min_dist}_n))/20}{\text{init_dist}}$$

Essentially, you will be scored on the average performance across all 20 runs, meaning consistency is an important part of performance.

Similar to *Part 1*, you are only required to complete the methods. Our scoring function will load the data and test your implementations.

In [None]:
# Implement Part 2 Below

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import RidgeClassifier
from sklearn.ensemble import BaggingClassifier
with open('/content/data/regression_data.pkl', 'rb') as f:
    data = pickle.load(f)

# pictures_reshaped = pictures.reshape(len(data['obs']),-1)
# print(pictures_reshaped.shape)
# for k,v in data.items():
#   print(k,v)
# n = len(data['obs'])
# position = np.array(  [i['agent_pos'] for i in data['info']]  )
# pictures_reshaped = pictures.reshape(n,-1)
# actions = np.array( [i for i in data['actions']] )
# X_train, X_test, y_train, y_test = train_test_split(position, actions, test_size=0.2, random_state=42)
# print(position.shape)
# print(actions)

class POSBCRobot():
    def __init__(self):
        # Initialize the model variable
        self.model = None
    def train(self, data):


        """
        A method for training a policy.

        Args:
            data: a dictionary that contains X (observations) and y (actions).

        Returns:
            This method does not return anything. It only need to update the
            property of a RobotPolicy instance.
        """
        # for k,v in data.items():
        #   print(k,v)
        # TODO
        n = len(data['obs'])
        #for ground truth Features (500, 2)
        position = np.array(  [i for i in data['obs']])

        #for actions Labels (500, 1)
        actions = np.array( [i for i in data['actions']] ).ravel() #(500, ) after ravel()
        X_train, X_test, y_train, y_test = train_test_split(position, actions, test_size=0.5, random_state=42)
        # self.model = KNeighborsClassifier(n_neighbors=5, algorithm='auto',n_jobs = -1)
        # self.model = DecisionTreeClassifier(max_depth=10, random_state=42)
        # self.model = LogisticRegression(random_state=42, max_iter=1000)  # 3/5

        # self.model = RidgeClassifier(random_state=1)  # it worked, but it stucked.


        # X_train, X_test, y_train, y_test = train_test_split(position, actions, test_size=0.5, random_state=42) 5/5
        # self.model = SVC(random_state=42)
        # self.model.fit(X_train, y_train)

        self.model = SVC(random_state=42,C=1.1)
        self.model.fit(X_train, y_train)

        # self.model.fit(position, actions)
        print("Using dummy solution for POSBCRobot")
        # Dummy solution below (delete after completion)
        pass

    def get_actions(self, observations):
        """
        A method for getting actions. You can do data preprocessing and feed
        forward of your trained model here.

        Args:
            observations: a batch of observations (images or vectors)

        Returns:
            A batch of actions with the same batch size as observations.
        """
# [[ 0.04999    -0.40000001]]
# [[ 0.04999    -0.40000001]]
# [[ 0.04999    -0.40000001]]
# [[ 0.04999    -0.40000001]]
        predictions = self.model.predict(observations)

        # print(predictions.shape)

        # TODO
        return predictions
        # Dummy solution below (delete after completion)
        # return np.zeros(observations.shape[0])


# Part 3. Behavioral cloning with visual observations

In *Part 3*, you are asked to do a similar task as *Part 2*, but the observations will be a lot more challenging to use. Rather than using the actual (ground truth) location of the agent (robot), your model will receive as input RGB image observations of the world, similar to the ones you used to perform localization in *Part 1*.

You will need to implement the class RGBBCRobot(). All requirements for your code, as well as the evaluation method, remain unchanged from *Part 2*. The only difference is the nature of the observation that is provided to you. Once again, you are only required to complete the methods. Our scoring function will load the data and test your implementations.

In [None]:
# Implement Part 3 Below

import numpy as np
import pickle
import pprint

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import RidgeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.decomposition import PCA, KernelPCA

with open('/content/data/bc_data.pkl', 'rb') as f:
    data = pickle.load(f)
# print(data)
from datetime import datetime
current_time = datetime.now()

# print(data)

class RGBBCRobot():
    def __init__(self):
        # Initialize the model variable
        self.model = None
        self.pca = PCA(n_components=140)
    def train(self, data):
        """
        A method for training a policy.
        # for i in data['obs'][0]:
        Args:
            data: a dictionary that contains X (observations RGB) and y (actions).

        Returns:
            This method does not return anything. It will just need to update the
            property of a RobotPolicy instance.
        """

        # TODO
        RGB_reshaped = np.array(  [i for i in data['obs']]  ).reshape(len(data['obs']), -1)
        actions_flatten = np.array( [i for i in data['actions']] ).ravel()
        features_pca = self.pca.fit_transform(RGB_reshaped)

        X_train, X_test, y_train, y_test = train_test_split(features_pca, actions_flatten,  test_size=0.9, random_state=42)#only 42 works

        self.model = RandomForestClassifier(max_depth=13,n_estimators=100, random_state=5,bootstrap = True,n_jobs=-1,min_impurity_decrease=0.1113)#0.1113

        self.model.fit(X_test, y_test)
        # Dummy solution below (delete after completion)
        print("Using dummy solution for RGBBCRobot")


        # current_time = datetime.now()
        # print("Current date and time:", current_time)
        pass

    def get_actions(self, observations):
        """
        A method for getting actions. You can do data preprocessing and feed
        forward of your trained model here.

        Args:
            observations: a batch of observations (images or vectors)

        Returns:
            A batch of actions with the same batch size as observations.
        """
        # TODO

        obs_reshaped = observations.reshape(observations.shape[0], -1)
        obs_pca = self.pca.transform(obs_reshaped)

        predicted_action = self.model.predict(obs_pca)
        # print(predicted_action)
        # return [0]
        # Dummy solution below (delete after completion)
        return predicted_action
        # return np.zeros(observations.shape[0])


# print(score_img_bc(RGBBCRobot(),gui_enable=gui))
# Image(filename='rgb_bc_anim.png', width=400, height=400) #need to remove






# **Testing**

We will use the cells provided below to automatically generate your score for this project. To assess your progress, simply execute these cells.

If you wish to visualize your policy, set gui_enable to True. Doing so will create an animated .png file, which you can view using the cell at the end of the notebook. Please note that enabling this visualization may result in longer runtime.


**Grading Rubrics**

You are graded based on the scores you achieved for each part. Each part is 5 points and the final grade you get for this project is the sum of all points from three parts (thus, 15 maximum in total)

**Part 1**

- score >= 0.99, you get 5/5
- score >= 0.95, you get 4/5
- score >= 0.80, you get 2/5

**Part 2**

- score >= 0.99, you get 5/5
- score >= 0.80, you get 3/5

**Part 3**

- score >= 0.99, you get 5/5
- score >= 0.90, you get 4/5
- score >= 0.80, you get 3/5
- score >= 0.60. you get 2/5

### Turn GUI on/off (you may change) -- **please set to False before submission**

In [None]:
# Enabling the gui saves animated pngs to the working directory
# You can view the pngs using the cell at the bottom of the notebook
# Code runs slightly slower when gui is enabled, as pngs need to be generated
# Use the gui to debug if you're not sure where it's getting stuck
# Or just to see a succesful visualization once you have it working!

# gui = False
gui = False

### Score Policy (do NOT change)

In [None]:
# do NOT change

# Our code that evaluates your implementations on all three parts
from score_policy import *
score_all_parts(POSBCRobot(), RGBBCRobot(), PositionRegressor(), gui_enable=gui)

Using dummy solution for POSBCRobot
Using dummy solution for RGBBCRobot
Running Part 1



--------SCORES--------
Position regression: 5/5
BC with positions: 5/5
BC with rgb images: 5/5

Final score: 15/15
----------------------


### Show GUI (optional, you may change)

In [None]:
from IPython.display import Image
#Image(filename='pos_bc_anim.png', width=200, height=200)
#Image(filename='rgb_bc_anim.png', width=400, height=400)

  and should_run_async(code)
