# <center>A tutorial on 6D object pose estimation for AR applications</center>


<div style="text-align: right"> Ikbeom Jeon (TA) (ikbeomjeon@kaist.ac.k) </div> 
<div style="text-align: right"> Vincent Lepetit (vincent.lepetit@enpc.fr) </div>  
<br>
<div style="text-align: right"> Last update: 16/07/2021</div>
    

**This document is read-only.**  **Please duplicate it into the current folder before editing**  
(File -> Save as -> type a unique name and save).  
Please do not open documents created by others, so that nobody's edits are lost.



### Description

In this tutorial, you will run an implementation of a state-of-the-art 6D object pose estimation method.

It consists of three steps.

First, predict 2D bounding boxes and initial 6D poses using "PoseCNN".  
Second, refine the initial object poses using the "DeepIM"-based network proposed in "CosyPose".  
Third, apply the results to AR.  

There are several exercises at each step. To solve them, consult the reference materials as well as the example code in this document.

If you have any problems, please contact the TA. 


### Citation
This tutorial builds on the original CosyPose code, available here:
https://github.com/ylabbe/cosypose

If you use the code in your research, please cite the paper:

```
@inproceedings{labbe2020,
title= {CosyPose: Consistent multi-view multi-object 6D pose estimation},
author={Y. {Labbe} and J. {Carpentier} and M. {Aubry} and J. {Sivic}},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}}
```


## How to use this document.

This document is written as a "Jupyter Notebook". You can run the existing code in it or create new cells.

In [None]:
print("To run this code, click this block (called a 'cell') and then press \"Shift + Enter\".")

In [None]:
print("If '*' is displayed on the left side, as in 'In [*]:', the cell is still running; wait for it to complete.")
print("Or click the stop button in the menu.")
import time
time.sleep(10) # wait for 10 seconds
print("Done.")

In [None]:
print("To save your changes, press \"Ctrl + S\".")

In [None]:
print("You should run all cells in order.")

## 0. Load required packages
These packages are required for the exercises.

In [None]:
## These packages are required for the exercises.
## you sho
%load_ext autoreload
%autoreload 2

import os
import pickle as pkl 
import numpy as np
import json
import pandas as pd
import torch
import torch.multiprocessing

from bokeh.io import output_notebook, show; output_notebook()
from bokeh.plotting import gridplot

from cosypose.lib3d import Transform
from cosypose.config import EXP_DIR, MEMORY, RESULTS_DIR, LOCAL_DATA_DIR
from cosypose.datasets.datasets_cfg import make_scene_dataset, make_object_dataset
from cosypose.rendering.bullet_scene_renderer import BulletSceneRenderer
from cosypose.visualization.plotter import Plotter
from cosypose.visualization.singleview import make_singleview_prediction_plots, filter_predictions
from cosypose.lib3d.rigid_mesh_database import MeshDataBase
from cosypose.rendering.bullet_batch_renderer import BulletBatchRenderer
from cosypose.training.pose_models_cfg import create_model_refiner, create_model_coarse, check_update_config
from cosypose.models.efficientnet import EfficientNet
from cosypose.models.wide_resnet import WideResNet18, WideResNet34
from cosypose.models.flownet import flownet_pretrained

from cosypose.scripts.run_cosypose_eval import load_posecnn_results
from cosypose.visualization.multiview import render_predictions_wrt_camera

# Pose models 
from cosypose.models.pose import PosePredictor
import yaml
import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)


import cosypose.utils.tensor_collection as tc
import random
# Pick one of 4 GPUs (indices 0-3) at random; this assumes the machine has 4 GPUs.
os.environ['CUDA_VISIBLE_DEVICES'] = str(random.randrange(0,4))
print(f"CUDA_VISIBLE_DEVICES : {os.environ['CUDA_VISIBLE_DEVICES']}")

## 1. Load and visualize dataset 

In [None]:

dataset_name, urdf_dataset_name = 'ycbv.test.keyframes', 'ycbv'

## Load dataset
scene_dataset = make_scene_dataset(dataset_name) # Load all frames.

## Get sample from dataset
scene_id = 51
idx = 50

mask = scene_dataset.frame_index['scene_id'] == scene_id
scene_dataset.frame_index = scene_dataset.frame_index[mask].reset_index(drop=True)

input_rgb, _, state = scene_dataset[idx] # Get the sample at index `idx` within the selected scene.

view_id = state['frame_info']['view_id']
objects = state['objects']  # Ground-truth objects (names and poses).
cameras = [state['camera']] # Ground-truth camera.

print('Name and pose of the first object:')
print(objects[0]['name'])
print(objects[0]['TWO'])
#print(objects[0][0]['TWO'])

print('Extrinsic parameters (i.e. the 6D pose) of the camera:')
print(cameras[0]['TWC'])



renderer = BulletSceneRenderer(urdf_dataset_name) # Create renderer.
objects_rgb = renderer.render_scene(objects, cameras)[0]['rgb'] # Render the scene using object and camera poses.
renderer.disconnect() #disconnect renderer.


## Plotting images.
plotter = Plotter()
fig_input_rgb = plotter.plot_image(input_rgb)
fig_objects_rgb = plotter.plot_image(objects_rgb)
fig_overlay = plotter.plot_overlay(input_rgb, objects_rgb)

show(gridplot([[fig_input_rgb, fig_objects_rgb, fig_overlay]], sizing_mode='scale_width'))

### Exercise 1.1
Plot the 10th sample of scene 51 (`scene_id = 51`).

You can browse the other available data here.  
http://143.248.249.6:9000/tree/local_data/bop_datasets/ycbv/test


### Exercise 1.2
Print the names of the symmetric objects in the scene.

You can see the object properties here.  
http://143.248.249.6:9000/tree/cosypose/datasets/bop_object_datasets.py
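
A minimal sketch of one way to approach this, assuming the object metadata follows the BOP `models_info.json` layout, where a symmetric model carries a `symmetries_discrete` and/or `symmetries_continuous` entry. The records below are toy values invented for illustration (they are not the real YCB-Video metadata), and `is_symmetric` is a hypothetical helper.

```python
# Toy metadata in the style of BOP's models_info.json (all values are invented).
HALF_TURN_Z = [-1, 0, 0, 0,
               0, -1, 0, 0,
               0, 0, 1, 0,
               0, 0, 0, 1]  # 4x4 transform, row-major: 180-degree turn about z

models_info = {
    "1": {"diameter": 170.0},
    "2": {"diameter": 160.0,
          "symmetries_continuous": [{"axis": [0, 0, 1], "offset": [0, 0, 0]}]},
    "3": {"diameter": 120.0, "symmetries_discrete": [HALF_TURN_Z]},
}


def is_symmetric(info):
    """Hypothetical helper: True if the record lists at least one symmetry."""
    return bool(info.get("symmetries_discrete")) or bool(info.get("symmetries_continuous"))


symmetric_ids = sorted((k for k, v in models_info.items() if is_symmetric(v)), key=int)
print(symmetric_ids)
```

In the notebook, the same test would be applied to the records of the objects that appear in `state['objects']`.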

### Exercise 1.3
Render the scene after translating the camera by (10, -20, 40) and rotating it by (30, 10, 40), given as Euler angles.

You can see how to transform the camera pose here.  
Slides 4-8 in https://vincentlepetit.github.io/files/pdfs/kaist3_lepetit.pdf
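
Composing a rigid motion with a camera pose can be sketched with plain NumPy. This is a sketch under assumptions that the exercise does not state: the Euler angles are in degrees and combined as R = Rz Ry Rx, the translation is in the dataset's length unit, and the motion is expressed in the camera's own frame (right-multiplication). `euler_xyz_to_matrix` and `make_transform` are hypothetical helpers, and `T_WC` is a placeholder for the camera pose as a 4x4 matrix.

```python
import numpy as np


def euler_xyz_to_matrix(rx_deg, ry_deg, rz_deg):
    """Rotation matrix R = Rz @ Ry @ Rx from angles in degrees (assumed convention)."""
    rx, ry, rz = np.deg2rad([rx_deg, ry_deg, rz_deg])
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


T_WC = np.eye(4)  # placeholder: camera pose in the world frame
delta = make_transform(euler_xyz_to_matrix(30, 10, 40), [10, -20, 40])

T_WC_new = T_WC @ delta      # motion expressed in the camera frame
# T_WC_new = delta @ T_WC    # alternative: motion expressed in the world frame
print(T_WC_new)
```

The order of multiplication matters: right-multiplying moves the camera relative to itself, while left-multiplying moves it relative to the world origin.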

## 2. Load results of 2D detection and initial 6D pose estimation from PoseCNN

In [None]:
detections = load_posecnn_results()

# We can select object candidates by their score.
mask = (detections.infos['score'] >= 0.0)
detections = detections[np.where(mask)[0]]

# Find the result corresponding to the input image.
det_index = detections.infos['scene_id'] == scene_id
detections = detections[np.where(det_index)[0]]
det_index = detections.infos['view_id'] == view_id # Use 'view_id' instead of 'idx' to find the sample. 
detections = detections[np.where(det_index)[0]]

print(detections.poses[0])
print(detections.bboxes[0])

colors = ['yellow' for _ in range(len(detections.poses))]
detections.infos['color'] = colors

# Render the 3D scene using the PoseCNN results.
renderer = BulletSceneRenderer(urdf_dataset_name) 
rendered_pose_coarse = render_predictions_wrt_camera(renderer, detections, state['camera'])
renderer.disconnect()

fig_detections = fig_input_rgb  # Note: this aliases fig_input_rgb (no copy is made).
fig_detections_with_input_rgb = plotter.plot_maskrcnn_bboxes(fig_detections, detections) # Draw 2D bboxes on the input image.
fig_rendered_pose_coarse = plotter.plot_overlay(input_rgb, rendered_pose_coarse)

show(gridplot([[fig_detections_with_input_rgb, fig_rendered_pose_coarse]], sizing_mode='scale_width'))


### Exercise 2.1
What are the inputs and outputs of PoseCNN?

You can see the parameters here.  
From line 70 onward in http://143.248.249.6:9000/edit/cosypose/scripts/run_cosypose_eval.py

If you want to know more details, these references will help.  
(Korean) https://juseong-tech.tistory.com/7  
(paper) https://arxiv.org/pdf/1711.00199.pdf  

### Exercise 2.2
Evaluate the result using the ADD and ADD-S metrics, and describe why the ADD-S metric was devised.

<img src = "./imgs/add.png" width="60%">
<img src = "./imgs/add_s.png" width="60%">


You can see the implementation here.  
http://143.248.249.6:9000/edit/cosypose/evaluation/meters/pose_meters.py


You can see how to use it here.  
line: 203-208 in http://143.248.249.6:9000/edit/cosypose/scripts/run_cosypose_eval.py
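
The two metrics shown in the figures above can be sketched in NumPy, following the definitions in the PoseCNN paper: ADD averages the distance between corresponding model points under the predicted and ground-truth poses, while ADD-S averages the distance from each ground-truth point to its closest predicted point. The helper names and the toy point set are illustrative only; the tutorial's own implementation is in `pose_meters.py`.

```python
import numpy as np


def transform_points(T, points):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]


def add_metric(T_pred, T_gt, points):
    """ADD: mean distance between corresponding transformed points."""
    diff = transform_points(T_pred, points) - transform_points(T_gt, points)
    return np.linalg.norm(diff, axis=1).mean()


def add_s_metric(T_pred, T_gt, points):
    """ADD-S: mean distance from each ground-truth point to the closest predicted point."""
    p_pred = transform_points(T_pred, points)
    p_gt = transform_points(T_gt, points)
    dists = np.linalg.norm(p_gt[:, None, :] - p_pred[None, :, :], axis=2)  # (N, N)
    return dists.min(axis=1).mean()


# Toy model: four points that map onto each other under a 90-degree turn about z.
points = np.array([[1.0, 0, 0], [-1.0, 0, 0], [0, 1.0, 0], [0, -1.0, 0]])
T_gt = np.eye(4)
T_pred = np.eye(4)
T_pred[:3, :3] = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])  # 90 degrees about z

add_value = add_metric(T_pred, T_gt, points)      # about 1.414
add_s_value = add_s_metric(T_pred, T_gt, points)  # about 0.0
print(add_value, add_s_value)
```

The pairwise distance matrix makes this version quadratic in the number of points, which is fine for a sketch but may need a nearest-neighbor structure for dense meshes.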



### Exercise 2.3
Find a failure case by trying different samples, and explain in which situations you think the method fails.


## 3. Get better results using the refinement network

In [None]:
result_id = 'ycbv-n_views=1--5154971130'
pred_key = 'posecnn_init/refiner/iteration=2'

results = LOCAL_DATA_DIR / 'results' / result_id / 'results.pth.tar'
results = torch.load(results)['predictions']

results[pred_key].infos.loc[:, ['scene_id', 'view_id']].groupby('scene_id').first()

this_preds = filter_predictions(results[pred_key], scene_id, view_id)
renderer = BulletSceneRenderer(urdf_dataset_name)
figures = make_singleview_prediction_plots(scene_dataset, renderer, this_preds)
renderer.disconnect()

show(gridplot([[fig_input_rgb, fig_rendered_pose_coarse, figures['pred_overlay'] ]], sizing_mode='scale_width'))

### Exercise 3.1
What are the inputs and outputs of the refinement network?

You can see the implementation here.  
line : 76- in http://143.248.249.6:9000/edit/cosypose/integrated/pose_predictor.py


You can see how to use it here.  
line: 131-135 in http://143.248.249.6:9000/edit/cosypose/evaluation/pred_runner/multiview_predictions.py


### Exercise 3.2
Why does this network use a renderer? How is the loss function defined?

You can see the basic theory of the refinement network here.  
Slide 25 in https://vincentlepetit.github.io/files/pdfs/kaist3_lepetit.pdf

For more details, please refer to the paper.  
https://arxiv.org/pdf/1804.00175.pdf
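
To make a refinement step concrete, here is a minimal sketch of how a predicted update could be applied to the current pose estimate. It assumes the DeepIM-style parameterization described in the CosyPose paper: a rotation `dR`, an image-plane shift `(vx, vy)` in pixels, and a depth scale `vz`. `apply_pose_update` is a hypothetical helper written for this illustration and the intrinsics are toy values; check the code linked in Exercise 3.1 for what the tutorial actually does.

```python
import numpy as np


def apply_pose_update(T_CO, K, v_xyz, dR):
    """Apply a DeepIM-style update (assumed parameterization) to an object pose T_CO.

    T_CO  : 4x4 pose of the object in the camera frame
    K     : 3x3 camera intrinsics
    v_xyz : (vx, vy, vz), image-plane shift in pixels and depth scale
    dR    : 3x3 rotation update
    """
    fx, fy = K[0, 0], K[1, 1]
    x, y, z = T_CO[:3, 3]
    vx, vy, vz = v_xyz

    z_new = vz * z                        # depth is scaled multiplicatively
    x_new = (vx / fx + x / z) * z_new     # shift the projected center, then back-project
    y_new = (vy / fy + y / z) * z_new

    T_new = np.eye(4)
    T_new[:3, :3] = dR @ T_CO[:3, :3]     # rotation is composed on the left
    T_new[:3, 3] = [x_new, y_new, z_new]
    return T_new


K = np.array([[1000.0, 0, 320], [0, 1000.0, 240], [0, 0, 1]])  # toy intrinsics
T_CO = np.eye(4)
T_CO[:3, 3] = [0.1, 0.2, 1.0]

# A "do nothing" update leaves the pose unchanged.
T_same = apply_pose_update(T_CO, K, (0.0, 0.0, 1.0), np.eye(3))
# Doubling the depth keeps the projected center fixed, so x and y double as well.
T_far = apply_pose_update(T_CO, K, (0.0, 0.0, 2.0), np.eye(3))
print(T_far[:3, 3])
```

Predicting the translation in image space like this means the network's output relates directly to what it sees in the rendered and observed crops.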

### Exercise 3.3
What is the 'backbone' of this network? What is its role?

You can see information about the models here.  
http://143.248.249.6:9000/tree/local_data/experiments

You can see how the network is initialized with it here.  
http://143.248.249.6:9000/tree/cosypose/training/pose_models_cfg.py

## 4. Application for AR

In this step, we assume that images arrive in real time and that estimation and rendering are fast enough to keep up.

To play the video in this document, all result images must be rendered in advance, so the next cell takes quite a while to run.

In [None]:
%%capture
%matplotlib inline

try:
    from PIL import Image
except ImportError:
    import Image

import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["animation.html"] = "jshtml"
import matplotlib.animation as animation

fig, ax = plt.subplots()

imgs = []

renderer = BulletSceneRenderer('ycbv')  # Create renderer.

for sid, sample in enumerate(scene_dataset):
    
    ## Process only every 5th frame to reduce processing time :)
    if sid % 5 != 0:
        continue
        
        
    input_img, _, state = sample

    camera = state['camera']
    cameras = [camera]
    objects = state['objects']

    target_object = dict(name=objects[0]['name'],
                         color='white',
                         TWO=objects[0]['TWO'])

    list_objects = [target_object]

    # target_objects_pose = target_objects['TWO']
    # fig_overlay = plotter.plot_overlay(input_img, rendered_img)

    rendered_img = renderer.render_scene(list_objects, cameras)[0]['rgb']

    pil_input_img = Image.fromarray(input_img.numpy())
    pil_rendered_img = Image.fromarray(rendered_img)

    pil_blend_img = Image.blend(pil_input_img, pil_rendered_img, 0.5)
    result_img = np.array(pil_blend_img)

    ax_img = ax.imshow(result_img)

    imgs.append([ax_img])
    

renderer.disconnect()
ani = animation.ArtistAnimation(fig, imgs, interval=33, blit=True,repeat_delay=1000)
print('done')

In [None]:
## Run this cell to play the recorded images.
ani

### Exercise 4.1
Fill the area of the 'cup' with yellow to highlight it.
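
One possible approach works directly on image arrays: take the rendering of the single target object, treat every non-black pixel as belonging to it, and blend a yellow tint into the input image there. This is a sketch under the assumption that the renderer leaves the background black; `highlight_object` is a hypothetical helper, and the small arrays below stand in for `input_img` and `rendered_img`.

```python
import numpy as np


def highlight_object(input_rgb, rendered_rgb, color=(255, 255, 0), alpha=0.5):
    """Tint the pixels covered by the rendered object (assumes a black background)."""
    mask = rendered_rgb.any(axis=2)                   # True where the rendering is not black
    out = input_rgb.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return out.round().astype(np.uint8)


# Toy stand-ins: a gray 4x4 frame and a rendering that covers a single pixel.
frame = np.full((4, 4, 3), 100, dtype=np.uint8)
rendering = np.zeros((4, 4, 3), dtype=np.uint8)
rendering[1, 1] = [10, 20, 30]

highlighted = highlight_object(frame, rendering)
print(highlighted[1, 1], highlighted[0, 0])
```

Pixels outside the mask are left untouched, so the rest of the frame keeps its original appearance.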

### Exercise 4.2
If you want to augment the scene with your own 3D model or text, please refer to these.  

http://143.248.249.6:9000/edit/cosypose/recording/bop_recording_scene.py

(the package they used) https://pybullet.org/wordpress/


### Exercise 4.3
What are the problems of this 6D pose estimation method when used for augmented reality?

Keywords: processing time, occlusion, depth awareness, accuracy, etc.
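
For the processing-time keyword, a simple way to reason about it is to compare the mean time per frame with the frame budget. The sketch below uses only the standard library; `fake_pipeline` is a hypothetical stand-in for detection, refinement and rendering, and the 33 ms budget mirrors the `interval=33` used for the animation above (roughly 30 frames per second).

```python
import time


def mean_seconds_per_frame(process_frame, frames):
    """Run process_frame on every frame and return the mean wall-clock time per frame."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    return (time.perf_counter() - start) / len(frames)


def fake_pipeline(frame):
    """Hypothetical stand-in for pose estimation and rendering."""
    time.sleep(0.01)


FRAME_BUDGET_S = 0.033  # about 30 frames per second
mean_s = mean_seconds_per_frame(fake_pipeline, list(range(5)))
is_real_time = mean_s <= FRAME_BUDGET_S
print(f"{mean_s * 1000:.1f} ms per frame, real time: {is_real_time}")
```

Replacing `fake_pipeline` with the actual per-frame code would show how far the method is from the budget on a given machine.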