- We use GPT-4V to recognize objects in the image as the input prompt for Grounded-Segment-Anything. Alternatively, you could use any other object recognition model (e.g., RAM) to get the objects in the given image.
- We put the pre-computed GPT-4V result under each `data/${name}/obj_movable.json`. You can skip the steps below and run segmentation directly if you don't want to re-run GPT-4V.
- Copy the OpenAI API key into `gpt/gpt_configs/my_apikey`.
- Install requirements:

  ```bash
  pip install inflect openai==0.28
  ```
- Run GPT-4V based RAM:

  ```bash
  python gpt_ram.py --img_path ../data/${name}
  ```
- The default `save_path` is under the same folder as the input: `../data/${name}/intermediate/obj_movable.json`. The output is a list in JSON format:

  ```
  [
      {"obj_1": True},  # True if the object is movable, False if not
      {"obj_2": True},
      ...
  ]
  ```
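A minimal sketch of how you might read this file and turn it into prompts for the next segmentation step (the scene name `pool` and the prompt-building logic are illustrative, not part of the repo):

```python
import json

name = "pool"  # placeholder scene name; use your own data/${name} folder
with open(f"../data/{name}/intermediate/obj_movable.json") as f:
    objects = json.load(f)  # list of {object_name: is_movable}

all_objects = [obj for entry in objects for obj in entry]
movable = [obj for entry in objects for obj, flag in entry.items() if flag]
print("prompts for Grounded-SAM:", all_objects)
print("movable objects:", movable)
```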
We use Grounded-Segment-Anything to segment the input image given the input prompts. We use an earlier checked-out version for the paper. You could adapt the code to the latest version of Grounded-SAM or Grounded-SAM-2.
- Follow the Grounded-SAM setup:

  ```bash
  cd Grounded-Segment-Anything/
  git checkout 753dd6675ea7935af401a983e88d159629ad4d5b
  # Follow the Grounded-SAM readme to install requirements
  # Download pretrained weights to the current folder
  wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
  wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
  ```
- Segmentation requires the input image `input` and a prompt file `prompts_path` listing the objects in the image. The default prompt path is `../data/${name}/intermediate/obj_movable.json`.

  ```bash
  python run_gsam.py --input ../data/${name}
  ```
- The default `output` is saved under the same folder as the input `../data/${name}`, with visualizations under `../data/${name}/intermediate`, as follows:

  ```
  image folder/
  ├── intermediate/
      ├── mask.png      # FG and BG segmentation mask
      ├── mask.json     # segmentation id, object name, and movability
      ├── vis_mask.jpg  # segmentation visualization
  ```
*(Example segmentations: Pool, Domino, Pig Ball, Balls)*
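A minimal sketch of how the segmentation output could be inspected, assuming `mask.png` stores a per-pixel segmentation id and `mask.json` maps ids to object names and movability (the scene name `pool` is a placeholder; check the actual files for the exact schema):

```python
import json
import numpy as np
from PIL import Image

name = "pool"  # placeholder scene name
mask = np.array(Image.open(f"../data/{name}/intermediate/mask.png"))
with open(f"../data/{name}/intermediate/mask.json") as f:
    seg_info = json.load(f)

print("segment ids in mask:", np.unique(mask))
print("segment metadata:", seg_info)
```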
- We use GeoWizard to estimate the depth and normal of the input image. Follow the GeoWizard setup to install requirements. We recommend creating a new conda environment.
- Run GeoWizard on the input image:

  ```bash
  python run_depth_normal.py --input ../data/${name} --output ../outputs/${name} --vis
  ```
- Depth and normal are saved in `outputs/${name}`. Visualizations are saved in `outputs/${name}/intermediate`.

  ```
  image folder/
  ├── depth.npy
  ├── normal.npy
  ├── intermediate/
      ├── depth_vis.png
      ├── normal_vis.png
  ```
*(Visualization: Input | Normal | Depth)*
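A minimal sketch for sanity-checking the saved geometry, assuming `depth.npy` is an H×W depth map and `normal.npy` an H×W×3 normal map (the scene name `pool` is a placeholder):

```python
import numpy as np

name = "pool"  # placeholder scene name
depth = np.load(f"../outputs/{name}/depth.npy")
normal = np.load(f"../outputs/{name}/normal.npy")

print("depth:", depth.shape, "range:", depth.min(), depth.max())
print("normal:", normal.shape)  # each pixel is expected to hold a (roughly) unit-length vector
```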
- We separate the foreground and background using the segmentation mask. Foreground objects with complete masks are used for physics reasoning and simulation, while truncated objects are treated as static. We use edges from static objects and the background as physical boundaries for simulation.
- The module requires the following input for each image:

  ```
  image folder/
  ├── depth.npy
  ├── normal.npy
  ├── original.png      # optional: for visualization only
  ├── intermediate/
      ├── mask.png      # complete image segmentation mask
      ├── mask.json     # segmentation id, object name, and movability
  ```
- Run foreground/background separation and edge detection:

  ```bash
  python run_fg_bg.py --input ../data/${name} --vis_edge
  ```
- The default output is saved under the same folder as the input `../data/${name}`; it contains the final foreground object mask `mask.png` and the edge list `edges.json` saved in `outputs/${name}`.

  ```
  image folder/
  ├── mask.png          # final mask
  ├── edges.json
  ├── intermediate/
      ├── edge_vis.png     # red lines mark the edges
      ├── fg_mask_vis.png  # text is the segmentation id
      ├── bg_mask_vis.png
      ├── bg_mask.png
  ```
*(Visualization: Input | Foreground | Background | Edges)*

- We can simulate all foreground objects by specifying their velocity and acceleration using their segmentation ids in the simulation.
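A minimal sketch for listing the foreground segmentation ids that the simulation can reference, assuming background pixels are 0 in the final `mask.png` and without assuming any particular `edges.json` schema (the scene name `pool` is a placeholder):

```python
import json
import numpy as np
from PIL import Image

name = "pool"  # placeholder scene name
mask = np.array(Image.open(f"../data/{name}/mask.png"))
fg_ids = [int(i) for i in np.unique(mask) if i != 0]  # assumes 0 marks the background
print("foreground segment ids:", fg_ids)

with open(f"../data/{name}/edges.json") as f:
    edges = json.load(f)
print("number of edge entries:", len(edges))
```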
We use Inpaint-Anything to inpaint the background of the input image. You could adapt the code to any other, more recent inpainting model.
- Follow the Inpaint-Anything setup to install requirements and download the pretrained model. We recommend creating a new conda environment.

  ```bash
  python -m pip install torch torchvision torchaudio
  python -m pip install -e segment_anything
  python -m pip install -r lama/requirements.txt
  # Download pretrained model under Inpaint-Anything/pretrained_models/
  ```
- Inpainting requires the input image `../data/${name}/original.png` and the foreground mask `../data/${name}/mask.png` under the same folder.

  ```bash
  python run_inpaint.py --input ../data/${name} --output ../outputs/${name} --dilate_kernel_size 20
  ```

  `dilate_kernel_size` can be adjusted in the above script. For images with heavy shadows, increase `dilate_kernel_size` to get better inpainting results (see the sketch below).
- The output `inpaint.png` is saved in `outputs/${name}`.

*(Visualization: Input | Inpainting)*
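An illustrative sketch of what a larger `--dilate_kernel_size` does: the foreground mask is grown before inpainting so that shadows and halos around objects are also removed. This mirrors the idea only and is not necessarily `run_inpaint.py`'s exact implementation (the scene name `pool` is a placeholder):

```python
import cv2
import numpy as np

name = "pool"  # placeholder scene name
mask = cv2.imread(f"../data/{name}/mask.png", cv2.IMREAD_GRAYSCALE)

kernel_size = 20  # corresponds to --dilate_kernel_size
kernel = np.ones((kernel_size, kernel_size), np.uint8)
dilated = cv2.dilate((mask > 0).astype(np.uint8) * 255, kernel)

cv2.imwrite("dilated_mask_preview.png", dilated)  # preview of the enlarged inpainting region
```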
- Install requirements:

  ```bash
  pip install openai==0.28 ruamel.yaml
  ```
- Copy the OpenAI API key into `gpt/gpt_configs/my_apikey`.
- Physics reasoning requires the following input for each image:

  ```
  image folder/
  ├── original.png
  ├── mask.png   # movable segmentation mask
  ```
- Run GPT-4V physical property reasoning with the following command:

  ```bash
  python gpt_physic.py --input ../data/${name} --output ../outputs/${name}
  ```
- The output `physics.yaml` contains the physical properties and primitive shape of each object segment in the image. Note that GPT-4V outputs may vary across runs and may differ from the original setting in `data/${name}/sim.yaml`. Users can adjust accordingly for each run's output.
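A minimal sketch for inspecting the reasoned properties before simulation, using the `ruamel.yaml` package installed above; the exact keys come from `gpt_physic.py`'s output, so print the document rather than assuming a schema (the scene name `pool` is a placeholder):

```python
from ruamel.yaml import YAML

name = "pool"  # placeholder scene name
yaml = YAML()
with open(f"../outputs/{name}/physics.yaml") as f:
    physics = yaml.load(f)

print(physics)  # compare against data/${name}/sim.yaml and adjust if needed
```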
- We use Intrinsic to infer the albedo and shading of the input image. Follow the Intrinsic setup to install requirements. We recommend creating a new conda environment.

  ```bash
  git clone https://github.com/compphoto/Intrinsic
  cd Intrinsic/
  git checkout d9741e99b2997e679c4055e7e1f773498b791288
  pip install .
  ```
- Run Intrinsic decomposition on the input image:

  ```bash
  python run_albedo_shading.py --input ../data/${name} --output ../outputs/${name} --vis
  ```
- `shading.npy` is saved in `outputs/${name}`. Visualizations of albedo and shading are saved in `outputs/${name}/intermediate` (see the sketch after this list).

*(Visualization: Input | Albedo | Shading)*
- Intrinsic has released an updated trained model with better results. Feel free to use the updated model or any other model for better performance.
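A minimal sketch for sanity-checking the saved shading, without assuming its exact shape or value range (the scene name `pool` is a placeholder; verify against the actual `run_albedo_shading.py` output):

```python
import numpy as np
from PIL import Image

name = "pool"  # placeholder scene name
shading = np.load(f"../outputs/{name}/shading.npy")
print("shading:", shading.shape, "range:", shading.min(), shading.max())

# Normalize to [0, 255] for a quick visual check.
vis = (shading - shading.min()) / (shading.max() - shading.min() + 1e-8)
Image.fromarray((vis * 255).astype(np.uint8).squeeze()).save("shading_preview.png")
```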