<a href="https://colab.research.google.com/github/Ketian-Wang/RobotLearning/blob/main/RobotLearning_proj_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This project demonstrates how classical machine learning methods can be used in a robotics setting. It focuses on an agent that navigates inside a simple 2D maze.

<div>
<img src="https://drive.google.com/uc?id=1mSpegY1psdek3Lgh6cxzcCGUCF-lddnV" width="300"/>
</div>


The image above shows the simulation world. The "robot" (also called "agent") is shown by the green dot. The goal location is shown by the red square. The aim of the agent is to navigate to the goal.

The ultimate goal in this project is to learn an appropriate behavior for the agent by imitating demonstrations from an expert user. These demonstrations have been collected by a human controlling the agent via a keyboard, and will be provided to you as training data.

This project has 3 parts. The instructions for each part are below.

# Part 0. Project Setup


In [None]:
!git clone https://github.com/roamlab/robot-learning-S2023.git
!cp -av /content/robot-learning-S2023/project1/* /content/
!pip install pybullet

Cloning into 'robot-learning-S2023'...
remote: Enumerating objects: 44, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 44 (delta 12), reused 39 (delta 10), pack-reused 0
Unpacking objects: 100% (44/44), 129.23 KiB | 283.00 KiB/s, done.


## Part 1. Inferring the position of an agent with RGB images

<div>
<img src="https://drive.google.com/uc?id=1Cn2sAcz0sOXX5x1dvRCEtKCL5yJDYkKS" width="300"/>
</div>

The first task is to learn to infer where the agent is inside the maze based on RGB image observations like the one shown above. Each such observation will consist of an RGB image of size [64, 64] for each color channel, so the total size of each observation is [64, 64, 3].

The maze has its own coordinate system, in which the agent's location must be expressed. The environment provides RGB image observations together with the ground-truth location of the agent in each image, expressed in the maze coordinate system. The task is to learn a model that predicts the agent's location from an RGB observation.
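As a minimal sketch of this setup, the snippet below flattens a batch of [64, 64, 3] images into feature rows and fits a multi-output linear regressor to 2D positions. The images and positions are random stand-ins invented for illustration, not the project's data, so the fitted model is meaningless; only the shapes carry over.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 20 random "images" and random 2D positions.
images = rng.integers(0, 256, size=(20, 64, 64, 3), dtype=np.uint8)
positions = rng.uniform(0.0, 1.0, size=(20, 2))

# Flatten each [64, 64, 3] image into one row of 64 * 64 * 3 = 12288 features.
features = images.reshape(len(images), -1)

# LinearRegression handles the two output coordinates jointly.
model = LinearRegression()
model.fit(features, positions)

predicted = model.predict(features[:5])
print(features.shape)   # (20, 12288)
print(predicted.shape)  # (5, 2)
```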

In [None]:
import math
import pickle
import random

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression as LR
from sklearn.model_selection import KFold



class PositionRegressor():

    print('start')

    def train(self, data):
        """A method that trains a regressor with given data

           Args:
               data: a dictionary that contains images and the groundtruth location of
                     an agent.
           Returns:
               Nothing
        """

        # Flatten each [64, 64, 3] image into a single feature row.
        imgs = np.asarray(data['obs'])
        RGBarray = imgs.reshape(len(imgs), -1)

        # Ground-truth agent positions, one per image.
        agentPos = np.array([pos['agent_pos'] for pos in data['info']])

        self.model = LR()
        self.model.fit(RGBarray, agentPos)

        # Message kept from the project template; the model above is trained.
        print("Using dummy solution for PositionRegressor")

    def predict(self, Xs):
        """A method that predicts y's given a batch of X's

           Args:
               Xs: a batch of data (in this project, it is in shape [batch_size, 64, 64, 3])
           Returns:
               The fed-forward results (predicted y's) with a trained model.
        """
        xx = Xs.reshape(Xs.shape[0], -1)
        result = self.model.predict(xx)

        return result


start


## Part 2. Behavioral cloning with low dimensional data

In this part, your model is asked to determine what action the agent should take, based on an observation from its environment. The action can be one of three choices: go up, go left, or go right. The goal of the agent is to reach the goal square, shown in red in the images above.

The environment has a discrete action space, so behavioral cloning can be seen as a classification problem with three output classes (go up, go left, go right). While the action space is the same in Parts 2 and 3, the nature of the observation used in each case is different.

In Part 2, the observation consists of the ground-truth position of the agent in the maze coordinate system. Training data thus contains tuples $(o, a)_i$ where $o$ is the agent's location in the maze, and $a$ is the action taken by the expert at that location.
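To make the classification view concrete, here is a small sketch that fits a classifier to $(o, a)$ pairs. The positions, the integer action encoding (0: up, 1: left, 2: right), and the labeling rule are all hypothetical and chosen only for illustration; the project's real encoding may differ.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical expert data: 2D positions o and integer action labels a.
# The labeling rule is invented for this sketch only.
positions = rng.uniform(-1.0, 1.0, size=(300, 2))
actions = np.zeros(len(positions), dtype=int)   # 0: go up (assumed encoding)
actions[positions[:, 0] < -0.3] = 1             # 1: go left (assumed encoding)
actions[positions[:, 0] > 0.3] = 2              # 2: go right (assumed encoding)

# Behavioral cloning as classification: positions in, action labels out.
policy = SVC(kernel="rbf")
policy.fit(positions, actions)

# Query the learned policy on a batch of new positions.
batch = np.array([[-0.9, 0.0], [0.0, 0.0], [0.9, 0.0]])
predicted = policy.predict(batch)
print(predicted.shape)  # (3,)
```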

In [None]:

import numpy as np

from sklearn.svm import SVC

import matplotlib.pyplot as plt


class POSBCRobot():
    

    def train(self, data):
        """A method for training a policy.

            Args:
                data: a dictionary that contains X (observations) and y (actions).
            
            Returns:
                This method does not return anything. It will just need to update the
                property of a RobotPolicy instance.
        """

        # Observations are agent positions; actions are flattened to a 1D label vector.
        agentPos = data['obs']
        agentAction = data['actions'].reshape(-1)

        # A degree-1 polynomial kernel with coef0=0 acts as a scaled linear kernel.
        self.svc = SVC(kernel='poly', degree=1, coef0=0)
        self.svc.fit(agentPos, agentAction)

        # Message kept from the project template; the classifier above is trained.
        print("Using dummy solution for POSBCRobot")

    def get_actions(self, observations):
        """A method for getting actions. You can do data preprocessing and feed
            forward of your trained model here.
            
            Args:
                observations: a batch of observations (images or vectors)
            
            Returns:
                A batch of actions with the same batch size as observations.
        """
        result = self.svc.predict(observations)

        return result


## Part 3. Behavioral cloning with visual observations

In this part the observations will be a lot more challenging to use. Instead of being provided with the actual robot location, the model will receive as input RGB image observations of the world.
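One possible way to handle image observations is to flatten each image, standardize the pixel features, reduce their dimensionality, and then classify. The sketch below chains `StandardScaler`, `KernelPCA`, and `SVC` in a scikit-learn pipeline. The random images, random labels, helper name `flatten`, and the hyperparameters are assumptions made for illustration, not tuned values.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 60 random images with random action labels.
images = rng.integers(0, 256, size=(60, 64, 64, 3)).astype(np.float64)
actions = rng.integers(0, 3, size=60)


def flatten(batch):
    """Reshape [N, 64, 64, 3] images into [N, 12288] feature rows."""
    batch = np.asarray(batch)
    return batch.reshape(len(batch), -1)


# Scale pixels, project to a few nonlinear components, then classify.
policy = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf"),
    SVC(kernel="rbf"),
)
policy.fit(flatten(images), actions)

predicted = policy.predict(flatten(images[:4]))
print(predicted.shape)  # (4,)
```

Wrapping the steps in one pipeline means the same scaling and projection fitted on the training images are applied automatically at prediction time.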

In [None]:
import numpy as np

import matplotlib.pyplot as plt

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC,SVC
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA


class RGBBCRobot():


    def train(self, data):
        """A method for training a policy.

            Args:
                data: a dictionary that contains X (observations) and y (actions).
            
            Returns:
                This method does not return anything. It will just need to update the
                property of a RobotPolicy instance.
        """

        # Flatten each [64, 64, 3] image into a single feature row.
        imgs = np.asarray(data['obs'])
        RGBarray = imgs.reshape(len(imgs), -1)

        # Standardize every pixel feature to zero mean and unit variance.
        self.SScaler = StandardScaler()
        self.SScaler.fit(RGBarray)
        RGBarrayScaled = self.SScaler.transform(RGBarray)

        agentAction = data['actions'].reshape(-1)

        # Reduce the pixel features to 30 components with RBF kernel PCA.
        self.KernelPCAmodel = KernelPCA(n_components=30, kernel='rbf')
        self.KernelPCAmodel.fit(RGBarrayScaled)
        RGBTransed = self.KernelPCAmodel.transform(RGBarrayScaled)

        # Classify actions in the reduced feature space.
        self.svc = SVC(kernel='poly', degree=2, C=10, coef0=0.5)
        self.svc.fit(RGBTransed, agentAction)

        # Message kept from the project template; the policy above is trained.
        print("Using dummy solution for RGBBCRobot")

    def get_actions(self, observations):
        """A method for getting actions. You can do data preprocessing and feed
            forward of your trained model here.
            
            Args:
                observations: a batch of observations (images or vectors)
            
            Returns:
                A batch of actions with the same batch size as observations.
        """
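        # Flatten image batches to 2D so they match the training features.
        observations = np.asarray(observations)
        observations = observations.reshape(len(observations), -1)
        obsScaler = self.SScaler.transform(observations)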
        obsTransed = self.KernelPCAmodel.transform(obsScaler)
        resultSVC = self.svc.predict(obsTransed)

        return resultSVC

# Testing

A grader generates the score for each part of this project.

**Grading Rubrics**

**Part 1**

- score >= 0.99, you get 5/5
- score >= 0.95, you get 4/5
- score >= 0.80, you get 2/5

**Part 2**

- score >= 0.99, you get 5/5
- score >= 0.80, you get 3/5

**Part 3**

- score >= 0.99, you get 5/5 
- score >= 0.90, you get 4/5 
- score >= 0.80, you get 3/5
- score >= 0.60, you get 2/5
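The rubric can be read as a threshold lookup, sketched below. The names `RUBRIC` and `points_for` are hypothetical, and returning 0 points below the lowest listed threshold is an assumption, since the rubric does not state what happens there. This is not the grader's actual implementation.

```python
# Thresholds copied from the rubric as (minimum score, points) pairs,
# ordered from highest to lowest.
RUBRIC = {
    "part1": [(0.99, 5), (0.95, 4), (0.80, 2)],
    "part2": [(0.99, 5), (0.80, 3)],
    "part3": [(0.99, 5), (0.90, 4), (0.80, 3), (0.60, 2)],
}


def points_for(part, score):
    """Return the points for `score`; 0 below the lowest threshold (assumed)."""
    for minimum, points in RUBRIC[part]:
        if score >= minimum:
            return points
    return 0


print(points_for("part1", 0.96))  # 4
print(points_for("part3", 0.75))  # 2
print(points_for("part2", 0.50))  # 0
```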

### Turn GUI on/off

In [None]:
gui = False

### Score Policy (do NOT change)

In [None]:
!pip3 install numpngw

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting numpngw
  Downloading numpngw-0.1.2-py3-none-any.whl (21 kB)
Installing collected packages: numpngw
Successfully installed numpngw-0.1.2


In [None]:
from score_policy import *
score_all_parts(POSBCRobot(), RGBBCRobot(), PositionRegressor(), gui_enable=gui)

Using dummy solution for POSBCRobot
Using dummy solution for RGBBCRobot
Using dummy solution for PositionRegressor



--------SCORES--------
Position regression: 5/5
BC with positions: 5/5
BC with rgb images: 5/5

Final score: 15/15
----------------------


### Show GUI

In [None]:
from IPython.display import Image
# Image(filename='pos_bc_anim.png', width=200, height=200)
# Image(filename='rgb_bc_anim.png', width=200, height=200)