### Universal Manipulation Interface
- [Website](https://umi-gripper.github.io/)
- [Paper](https://arxiv.org/pdf/2402.10329.pdf)

### How it works

#### Overall Problem with Data Collection
- Teleoperation systems are expensive to set up and require expert, precise operation.
- Human video is too messy and leaves too large an inference gap between the video and the robot embodiment (this can be solved)

#### Problem with Gripper Collection
- Motion blur reduces action precision
- Too small of a context window around the environment (narrow field of view)
- Latency discrepancy: none during collection but some during inference
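
The latency discrepancy can be pictured as follows: by the time inference finishes and the robot reacts, the first few predicted actions are already stale. A minimal sketch, assuming each predicted action carries a target timestamp and that the execution latency has been measured beforehand. The function name, the action layout, and the numbers are hypothetical and not taken from the UMI code.

```python
from typing import List, Tuple

Action = Tuple[float, float]  # (target_time_s, gripper_width_m), illustrative only


def drop_stale_actions(actions: List[Action], now_s: float,
                       execution_latency_s: float) -> List[Action]:
    """Keep only actions whose target time can still be met.

    A command sent at `now_s` takes effect at `now_s + execution_latency_s`,
    so anything scheduled earlier than that is already out of date.
    """
    earliest_reachable = now_s + execution_latency_s
    return [a for a in actions if a[0] >= earliest_reachable]


# The policy observed the scene at t=0.0 and predicted one action every 0.1 s.
predicted = [(0.1 * i, 0.08 - 0.01 * i) for i in range(1, 6)]
# Inference finished at t=0.15 s; the robot needs 0.1 s to react (assumed values).
usable = drop_stale_actions(predicted, now_s=0.15, execution_latency_s=0.1)
```

During data collection there is no such delay, which is why the mismatch only shows up at inference time.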

#### Failure Cases
- Object not in FOV
- Object exceeded joint limit
- Absolute action
- No latency matching
- Incorrect classification 
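
The "absolute action" failure case is easier to see with a small example. The sketch below assumes poses are 4x4 homogeneous transforms in some world frame and expresses a target pose relative to the current gripper pose, so the action no longer depends on where the world origin happens to be. The helper names are hypothetical.

```python
import numpy as np


def make_pose(yaw_rad: float, xyz) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a yaw angle and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = xyz
    return pose


def to_relative(current: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Express `target` in the frame of `current`: inv(current) @ target."""
    return np.linalg.inv(current) @ target


current = make_pose(0.0, [0.5, 0.0, 0.2])
target = make_pose(0.0, [0.6, 0.0, 0.2])
relative = to_relative(current, target)

# Moving both poses by the same world-frame offset leaves the relative action unchanged.
offset = make_pose(np.pi / 2, [1.0, -2.0, 0.0])
relative_shifted = to_relative(offset @ current, offset @ target)
```

An absolute action would change with `offset`, while `relative` and `relative_shifted` agree.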

#### Setup
- GoPro
- IMU sensor (using that in GoPro)
- Gripper
- Side Mirrors (adds a sense of depth)

#### New Terms
- Monocular structure-from-motion (SfM) -> recovers robot actions
- Visuomotor Policy
- Prehensile (Grip/Grab) and non-prehensile (Push/Move)
- Proprioception (your body's ability to sense movement, action, and location). Tells your body where you are in space and time. Example: close your eyes and touch your index finger to your nose

#### New Insights
- Take observations (RGB images, DoF, dimensions of the controller (gripper)) and produce a sequence of actions that achieves the set goal


### DexCap
- [Website](https://dex-cap.github.io/)
- [Paper](https://arxiv.org/pdf/2403.07788.pdf)
- [Code](https://github.com/j96w/DexCap)

#### Problem
- How to scale up training data for imitation learning? Collecting it is extremely time-consuming and costly
- Videos of humans performing tasks are in 2D, which fails to capture 3D environments, thus requiring additional data to bridge the gap

#### Solution
- Capture detailed finger motion
- Accurate 6-DoF wrist pose estimation
- 3D observations from the recording aligned with the coordinate frame of the hands
- Portability and ease of use for data collection
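
Aligning 3D observations with the hand's coordinate frame comes down to moving points between frames. A minimal sketch, assuming the camera pose is available as a 4x4 homogeneous transform (for example from a SLAM estimate). The function name and the pose values are hypothetical and not taken from the DexCap code.

```python
import numpy as np


def transform_points(points_cam: np.ndarray, world_T_cam: np.ndarray) -> np.ndarray:
    """Map an (N, 3) point cloud from the camera frame into the world frame."""
    rotation = world_T_cam[:3, :3]
    translation = world_T_cam[:3, 3]
    # Row-vector form of R @ p + t, applied to every point at once.
    return points_cam @ rotation.T + translation


# Made-up camera pose: rotated 90 degrees about z and shifted 1 m along x.
world_T_cam = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
])
points_cam = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.5]])
points_world = transform_points(points_cam, world_T_cam)
```

Once the point cloud and the wrist poses live in the same frame, they can be fed to a policy together.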

#### Setup
- Intel RealSense T265 camera (fisheye of course) and L515 RGB-D LiDAR camera
- IMU sensor
- Electromagnetic conductive material
- SLAM algorithm

#### New Terms
- In-hand object re-orientation
- Mocap (motion capture)
- Electromagnetic Gloves (conductive material that measures different electrical potentials, for example the change in potential on the skin)
- RGB-D (red, green, blue, depth; LiDAR works by sweeping a laser across an environment extremely fast and measuring the time the light takes to travel to an object and back)
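
The time-of-flight idea behind the depth channel reduces to one formula: distance equals the speed of light times the round-trip time, divided by two. A minimal sketch (the function name is made up for illustration):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_of_flight_to_distance(round_trip_s: float) -> float:
    """Distance to the surface; divide by 2 because the pulse travels out and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# A round trip of 10 nanoseconds corresponds to roughly 1.5 m.
distance_m = time_of_flight_to_distance(10e-9)
```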

### Human Pose Estimation: A Survey
- [Paper](https://arxiv.org/pdf/2308.13872.pdf)


#### Problems
- There is a vast number of different poses, which depend on the shape and body of the person, what they are wearing, the lighting, the environment, etc.
- Occlusion, which occurs when an object is blocked by something else. For example, if I have a camera and want to predict the landmarks of my fingers, yet there is a cup in front of a few of them, then it will be extremely hard to predict the landmarks for the occluded fingers
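
Datasets make occlusion explicit. COCO, for instance, stores each keypoint as an `(x, y, v)` triple where `v = 0` means not labeled, `v = 1` means labeled but not visible, and `v = 2` means labeled and visible. A minimal sketch that separates the two labeled cases (the function name and the sample annotation are made up):

```python
from typing import List, Tuple


def split_by_visibility(flat_keypoints: List[float]) -> Tuple[list, list]:
    """Split COCO-style [x, y, v, x, y, v, ...] keypoints into visible and occluded."""
    visible, occluded = [], []
    for i in range(0, len(flat_keypoints), 3):
        x, y, v = flat_keypoints[i:i + 3]
        if v == 2:
            visible.append((x, y))
        elif v == 1:
            occluded.append((x, y))
        # v == 0 means the keypoint was not labeled at all, so it is skipped.
    return visible, occluded


# Made-up annotation with three keypoints: visible, occluded, unlabeled.
keypoints = [120, 80, 2, 130, 95, 1, 0, 0, 0]
visible, occluded = split_by_visibility(keypoints)
```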

#### Reading Insights
- Methods for HPE using deep learning include CNNs, RNNs, and GCNs.
- CNN-based methods are made up of two parts: 1. a backbone such as AlexNet or ResNet pretrained on a large dataset such as ImageNet, and 2. on top of this, an HPE-specific network such as Hourglass, Cascaded Pyramid Network (CPN), or HRNet.
- Prediction heads estimate human poses. Two main methods are used: directly predict joint coordinates using some regression method, or produce a heatmap over the image and then apply some regression method to it
- A Graph Convolutional Network (GCN) takes the graph of a pose as its input instead of an image, since the pose of a human can be represented as a graph, or as the magnitudes between different landmarks. It would then be interesting to understand how you can estimate a pose based on its temporal information when performing a task such as lifting a weight. This is quite interesting because you are understanding a pose not based on how it looks visually, but based on the coordinate spacing of all the predefined landmarks, so the definition of a pose is much different from the way we understand a pose.
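
To make the heatmap route concrete, here is a minimal sketch of the simplest possible decoding step: take the peak of each joint's heatmap as its location. This uses a plain argmax rather than a learned regression, and the function name and toy heatmaps are made up for illustration.

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """Return (K, 2) integer (x, y) peak locations from (K, H, W) heatmaps."""
    num_joints, height, width = heatmaps.shape
    flat_index = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_index, (height, width))
    return np.stack([xs, ys], axis=1)


# Two toy 4x5 heatmaps with a single peak each.
heatmaps = np.zeros((2, 4, 5))
heatmaps[0, 1, 3] = 1.0  # joint 0 peaks at x=3, y=1
heatmaps[1, 2, 0] = 0.7  # joint 1 peaks at x=0, y=2
joints_xy = decode_heatmaps(heatmaps)
```

The direct-regression alternative skips the heatmap and outputs the `(x, y)` pairs straight from the network.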


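The pose-as-graph idea can be sketched in a few lines: joints are nodes, bones are edges, and one graph convolution mixes each joint's features with those of its neighbours. This is a minimal NumPy sketch of a single layer with symmetric adjacency normalization, assuming a made-up 5-joint skeleton and random, untrained weights. It is not the architecture of any specific model in the survey.

```python
import numpy as np

# Hypothetical 5-joint skeleton: shoulder-elbow-wrist, wrist-thumb, wrist-index.
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, num_nodes: int) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    adj = np.eye(num_nodes)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def gcn_layer(features: np.ndarray, adj_norm: np.ndarray,
              weights: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


rng = np.random.default_rng(0)
joint_xy = rng.uniform(size=(NUM_JOINTS, 2))  # input: 2D joint coordinates
weights = rng.normal(size=(2, 8))             # random, untrained projection
adj_norm = normalized_adjacency(EDGES, NUM_JOINTS)
joint_features = gcn_layer(joint_xy, adj_norm, weights)
```

The input here is only coordinates, never pixels, which is the point made above about a pose being defined by landmark spacing rather than appearance.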

#### New Terms
- Bibliometrics (the statistical analysis of books, articles, and other resources)
- Occlusion (when something is blocked or hidden from the prominent view)
- Temporal Information (describes how something behaves or evolves over time, e.g. signal processing)
- ∀ ("for all" or "for every", is used to indicate that a particular statement holds true for a given set or domain)
- HPE Datasets (COCO, MPII, CrowdPose, and PoseTrack)