Commit
* docs: setup documentation site using mkdocs
* docs: add readme for docs package
* docs: add vs code config for mkdocs-material syntax support
* docs: update readmes to reflect docs site
* ci: add action to deploy docs site
* refactor: update docs site for custom url
Showing 16 changed files with 1,901 additions and 136 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Deploy mkdocs site to GitHub Pages | ||
name: pages | ||
|
||
on: | ||
# Runs on pushes targeting the default branch | ||
push: | ||
branches: | ||
- main | ||
# Only run if docs or self updated | ||
paths: | ||
- packages/docs/** | ||
- .github/workflows/pages.yml | ||
|
||
# Allows you to run this workflow manually from the Actions tab | ||
workflow_dispatch: | ||
|
||
# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages | ||
permissions: | ||
contents: write | ||
pages: write | ||
id-token: write | ||
|
||
# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued. | ||
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete. | ||
concurrency: | ||
group: "pages" | ||
cancel-in-progress: false | ||
|
||
jobs: | ||
build-and-deploy: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Checkout | ||
uses: actions/checkout@v4 | ||
- name: Configure Git Credentials | ||
run: | | ||
git config user.name github-actions[bot] | ||
git config user.email 41898282+github-actions[bot]@users.noreply.github.com | ||
- name: Setup Python | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: '3.11' | ||
architecture: 'x64' | ||
- name: Install Dependencies | ||
run: | | ||
cd packages/docs | ||
pip install pdm | ||
pdm install --frozen-lockfile --production | ||
- name: Build Docs and Deploy | ||
run: | | ||
cd packages/docs | ||
pdm run deploy-pages |
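
The `paths` filter in the workflow above limits deployments to pushes that touch the docs package or the workflow file itself. As a rough sketch of that rule, the hypothetical helper `triggers_pages` below (not part of the repository) approximates the filter with shell `case` patterns. It assumes that GitHub's `packages/docs/**` glob behaves like "anything under `packages/docs/`", which a `case` pattern with `*` also matches because `*` spans slashes there.

```bash
#!/usr/bin/env bash
# Approximate the workflow's `paths` filter: succeed (exit 0) if a changed
# file would trigger the pages workflow, fail (exit 1) otherwise.
# Hypothetical helper for illustration only.
triggers_pages() {
  case "$1" in
    packages/docs/*|.github/workflows/pages.yml) return 0 ;;
    *) return 1 ;;
  esac
}

# Report which of a few sample paths would trigger a deployment.
for f in packages/docs/src/index.md packages/cli/README.md .github/workflows/pages.yml; do
  if triggers_pages "$f"; then
    echo "deploy: $f"
  else
    echo "skip:   $f"
  fi
done
```

A change under `packages/cli` alone is skipped, so CLI-only commits should not redeploy the site; `workflow_dispatch` remains available for manual runs.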
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,5 +3,6 @@ | |
"charliermarsh.ruff", | ||
"ms-python.python", | ||
"ms-python.vscode-pylance", | ||
"redhat.vscode-yaml", | ||
], | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,43 @@ | ||
# ezsam | ||
Extract foreground from images or video via text prompt | ||
# ezsam (easy segment anything model) | ||
|
||
See the command line tool readme at https://github.com/ae9is/ezsam/tree/main/packages/cli | ||
A tool to segment images and video via text prompts. | ||
|
||
Input images and videos, describe the subjects or objects you want to keep, and output new images and videos with the background removed. | ||
|
||
**Check out the docs! [ezsam.org](https://www.ezsam.org)** | ||
|
||
## Why? | ||
|
||
Meta's [Segment Anything](https://github.com/facebookresearch/segment-anything) is a powerful tool for separating parts of images, | ||
but requires coordinate prompts—either bounding boxes or points. | ||
And manual prompt generation is tedious for large collections of still images or video. | ||
|
||
In contrast, text-based prompts describing the object(s) in the foreground to segment can be constant. | ||
Inspired by [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything), | ||
this project tries to package a simpler-to-use tool. | ||
|
||
If you're not interested in text-based prompts with Segment Anything, | ||
check out [rembg](https://github.com/danielgatis/rembg). | ||
|
||
## How does it work? | ||
|
||
The foreground is selected using text prompts to [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) to detect objects. | ||
Image segments are generated using [Segment Anything](https://github.com/facebookresearch/segment-anything) | ||
or [Segment Anything HQ (SAM-HQ)](https://github.com/SysCV/SAM-HQ). | ||
|
||
## Quick start | ||
|
||
```bash | ||
# Ubuntu 22.04, Python 3.9 - 3.11 | ||
pip install ezsam | ||
sudo apt install ffmpeg imagemagick | ||
ezsam --help | ||
``` | ||
|
||
For more detailed info, see the documentation site here: [ezsam.org](https://www.ezsam.org) | ||
|
||
## Monorepo structure | ||
|
||
This repository collocates the following packages: | ||
- [cli](packages/cli): the ezsam command line tool | ||
- [docs](packages/docs): a static documentation site |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,135 +1,5 @@ | ||
# ezsam (easy segment anything model) | ||
# ezsam/cli | ||
|
||
A pipeline to extract foreground from images or video via text prompts. | ||
A command line tool to extract foreground from images or video via text prompts. | ||
|
||
## Why? | ||
|
||
Meta's Segment Anything is a powerful tool for separating parts of images, | ||
but requires coordinate prompts—either bounding boxes or points. | ||
Manual prompt generation is tedious for large collections of still images or video. | ||
In contrast, text-based prompts describing the object(s) in the foreground to segment can be constant. | ||
Inspired by [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything), | ||
this project tries to package a simpler to use tool. | ||
|
||
If you're not interested in text-based prompts with Segment Anything, | ||
check out [rembg](https://github.com/danielgatis/rembg). | ||
|
||
## How does it work? | ||
|
||
The foreground is selected using text prompts to [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) to detect objects. | ||
Image segments are generated using [Segment Anything](https://github.com/facebookresearch/segment-anything) | ||
or [Segment Anything HQ (SAM-HQ)](https://github.com/SysCV/SAM-HQ). | ||
|
||
## Installation | ||
|
||
```bash | ||
pip install ezsam | ||
``` | ||
|
||
For video output, you need to install FFmpeg and have it available on your $PATH as `ffmpeg` for | ||
all the encoding options except GIF. GIF output requires Imagemagick; `convert` must be available on your $PATH. | ||
|
||
```bash | ||
# Examples will be given for apt-based Linuxes like Ubuntu, Debian... | ||
apt install ffmpeg imagemagick | ||
``` | ||
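
The PATH requirements described above can be checked before starting a long encoding job. The following is a minimal sketch that assumes only the tool names given in the text (`ffmpeg`, and ImageMagick's `convert`). The `missing_tools` helper is hypothetical and not part of ezsam.

```bash
#!/usr/bin/env bash
# Print the name of each required tool that is not found on $PATH.
# Hypothetical helper for illustration only.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# ffmpeg covers video encoding; convert (ImageMagick) covers GIF output.
missing=$(missing_tools ffmpeg convert)
if [ -n "$missing" ]; then
  echo "Missing tools: $missing" >&2
fi
```

Nothing is printed when both tools are installed; otherwise the missing names go to standard error, so the check can sit in front of any ezsam invocation.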
|
||
For a development install, see [Development](#development). | ||
|
||
## Usage | ||
|
||
```bash | ||
ezsam --help | ||
``` | ||
|
||
## Examples | ||
|
||
Example images are sourced from [rembg](https://github.com/danielgatis/rembg/tree/main/examples) for easy comparison. | ||
|
||
Process images extracting foreground specified by prompt to `examples/animal*.out.png`. | ||
(For extractions, which require adding an alpha channel, the output image format is always `png`.) | ||
|
||
```bash | ||
ezsam examples/animal*.jpg -p animal -o examples | ||
``` | ||
|
||
Multiple objects can be selected as the foreground. The output image `./car-1.out.png` contains the car and the person. | ||
|
||
```bash | ||
ezsam examples/car-1.jpg -p car, person | ||
``` | ||
|
||
Use debug mode to fine tune or troubleshoot prompts. This writes output with foreground mask and object detections | ||
annotated over the original image file. Here we write out to `test/car-3.debug.jpg`. | ||
(Note the original image format `jpg` is preserved in debug mode!) | ||
|
||
```bash | ||
ezsam examples/car-3.jpg -p white car -o test -s .debug --debug | ||
``` | ||
|
||
The object detection box threshold parameter can be used to fine tune objects for selection. | ||
|
||
```bash | ||
ezsam examples/car-3.jpg -p white car -o test --bmin 0.45 | ||
``` | ||
|
||
Writing prompts with specificity can also help. | ||
|
||
```bash | ||
ezsam examples/anime-girl-2.jpg -o examples -s .debug -p girl, phone, bag, railway crossing sign post --debug | ||
``` | ||
|
||
## Models | ||
|
||
The tool uses [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) for object detection. | ||
|
||
To perform image segmentation, you can pick SAM or SAM-HQ: | ||
* [Segment Anything](https://github.com/facebookresearch/segment-anything) | ||
* [Segment Anything HQ (SAM-HQ)](https://github.com/SysCV/SAM-HQ) | ||
|
||
For the best results use the biggest model your GPU has memory for. ViT = Vision Transformer, the model type. From best/slowest to worst/fastest: ViT-H > ViT-L > ViT-B > ViT-tiny. | ||
|
||
Note: ViT-tiny is available for SAM-HQ only, so you must use the `--hq` flag with it. | ||
|
||
## Development | ||
|
||
This project uses [pdm](https://github.com/pdm-project/pdm) for package management. Example installation: | ||
|
||
```bash | ||
pip install pipx | ||
pipx install pdm | ||
git clone https://github.com/ae9is/ezsam.git | ||
cd ezsam/packages/cli | ||
pdm install | ||
pdm start | ||
``` | ||
|
||
Pre-commit is used for some commit hooks: | ||
```bash | ||
pip install pre-commit | ||
pre-commit install | ||
``` | ||
|
||
## GPU memory troubleshooting | ||
|
||
If you *always* get an error stating "CUDA out of memory", try using a smaller Segment Anything model (vit_tiny, vit_b), or lower-resolution (or fewer) inputs. | ||
|
||
If you only get a CUDA OOM error occasionally, or after a while, try to free up some memory by closing processes using the GPU: | ||
```bash | ||
# List commands using nvidia gpu | ||
fuser -v /dev/nvidia* | ||
``` | ||
|
||
You can also try manually getting the GPU to clear some processes: | ||
```bash | ||
# Clears all processes accounted so far | ||
sudo nvidia-smi -caa | ||
``` | ||
|
||
If you are using multiple GPUs, and so the GPU you're running CUDA on isn't driving your displays, you can also reset the GPU using: | ||
```bash | ||
# Trigger reset of one or more GPUs | ||
sudo nvidia-smi -r | ||
``` | ||
|
||
Note: nvidia-smi is in the nvidia-utils package of [NVIDIA's CUDA repo for Ubuntu](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network). | ||
Check out the docs at: [ezsam.org](https://www.ezsam.org) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# ezsam/docs | ||
|
||
Static docs site using [Material for MkDocs](https://github.com/squidfunk/mkdocs-material) | ||
|
||
## Install | ||
|
||
```bash | ||
pdm install | ||
``` | ||
|
||
## Run | ||
|
||
```bash | ||
pdm start | ||
``` | ||
|
||
## Deployment | ||
|
||
Check out the GitHub Action at [.github/workflows/pages.yml](/.github/workflows/pages.yml) | ||
|
||
To manually deploy to your local Git repository's gh-pages branch: | ||
|
||
```bash | ||
pdm deploy-pages | ||
``` | ||
|
||
## Development | ||
|
||
Check out the repos for [mkdocs-material](https://github.com/squidfunk/mkdocs-material) and [pdm](https://github.com/pdm-project/pdm) for examples (generating versioned docs, API docs from docstrings, etc.) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
site_name: ezsam | ||
site_url: https://www.ezsam.org | ||
site_author: ae9is | ||
site_description: >- | ||
Use ezsam to extract foreground from images or video via text prompts | ||
repo_name: ae9is/ezsam | ||
repo_url: https://github.com/ae9is/ezsam | ||
edit_uri: edit/main/packages/docs/src/ | ||
docs_dir: src | ||
plugins: | ||
- search | ||
markdown_extensions: | ||
- admonition | ||
- attr_list | ||
- toc: | ||
permalink: true | ||
- pymdownx.emoji: | ||
emoji_index: !!python/name:material.extensions.emoji.twemoji | ||
emoji_generator: !!python/name:material.extensions.emoji.to_svg | ||
extra: | ||
social: | ||
- icon: fontawesome/brands/github | ||
link: https://github.com/ae9is/ezsam | ||
# analytics: | ||
# provider: google | ||
# property: !ENV GOOGLE_ANALYTICS_KEY | ||
# consent: | ||
# title: Would you like a free cookie? 🍪 | ||
# description: It's just to see how this docs site is used and potentially improve it. | ||
# actions: | ||
# - manage | ||
# - reject | ||
# - accept | ||
#copyright: <a href="#__consent">Change cookie settings</a> | ||
extra_css: | ||
- assets/extra.css | ||
theme: | ||
name: material | ||
palette: | ||
- media: '(prefers-color-scheme: dark)' | ||
scheme: slate | ||
toggle: | ||
icon: material/weather-night | ||
name: Switch to light mode | ||
- media: '(prefers-color-scheme: light)' | ||
scheme: default | ||
toggle: | ||
icon: material/weather-sunny | ||
name: Switch to dark mode | ||
features: | ||
- content.action.edit | ||
- content.action.view | ||
- content.code.annotate | ||
- content.code.copy | ||
- content.tooltips | ||
- navigation.footer | ||
- navigation.indexes | ||
- navigation.tracking | ||
- navigation.path | ||
- navigation.top | ||
# - navigation.sections | ||
# - navigation.tabs | ||
- search.highlight | ||
- search.share | ||
- toc.follow | ||
icon: | ||
edit: material/pencil | ||
view: material/eye | ||
nav: | ||
- About: index.md | ||
- install.md | ||
- usage.md | ||
- changelog.md |
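
MkDocs resolves the entries under `nav:` relative to `docs_dir`, which the config above sets to `src`. As a rough sketch, the hypothetical helper `missing_nav_pages` below (not part of the repository) lists any nav page that is absent from a given docs directory. The page names are copied from the config; running it from `packages/docs` is an assumption based on the `edit_uri` value.

```bash
#!/usr/bin/env bash
# Print every nav page that does not exist under the given docs directory.
# Hypothetical helper for illustration only.
missing_nav_pages() {
  local docs_dir="$1"
  local page
  shift
  for page in "$@"; do
    [ -f "$docs_dir/$page" ] || echo "$page"
  done
}

# Pages listed under `nav:` in the config; `src` is its docs_dir.
missing_nav_pages src index.md install.md usage.md changelog.md
```

An empty result means every nav entry has a matching file, which is worth confirming before a deploy because a missing page typically makes the build fail or warn.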