12 docs site (#13)
* docs: setup documentation site using mkdocs

* docs: add readme for docs package

* docs: add vs code config for mkdocs-material syntax support

* docs: update readmes to reflect docs site

* ci: add action to deploy docs site

* refactor: update docs site for custom url
ae9is committed Jan 26, 2024
1 parent b07cb30 commit a910461
Showing 16 changed files with 1,901 additions and 136 deletions.
52 changes: 52 additions & 0 deletions .github/workflows/pages.yml
@@ -0,0 +1,52 @@
# Deploy mkdocs site to GitHub Pages
name: pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches:
      - main
    # Only run if docs or self updated
    paths:
      - packages/docs/**
      - .github/workflows/pages.yml

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: write
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
          architecture: 'x64'
      - name: Install Dependencies
        run: |
          cd packages/docs
          pip install pdm
          pdm install --frozen-lockfile --production
      - name: Build Docs and Deploy
        run: |
          cd packages/docs
          pdm run deploy-pages
1 change: 1 addition & 0 deletions .vscode/extensions.json
@@ -3,5 +3,6 @@
"charliermarsh.ruff",
"ms-python.python",
"ms-python.vscode-pylance",
"redhat.vscode-yaml",
],
}
12 changes: 12 additions & 0 deletions .vscode/settings.json
@@ -16,4 +16,16 @@
"python.analysis.fixAll": [
"source.unusedImports"
],
// ref: https://squidfunk.github.io/mkdocs-material/creating-your-site/#minimal-configuration
"yaml.schemas": {
"https://squidfunk.github.io/mkdocs-material/schema.json": "mkdocs.yml"
},
"yaml.customTags": [
"!ENV scalar",
"!ENV sequence",
"!relative scalar",
"tag:yaml.org,2002:python/name:material.extensions.emoji.to_svg",
"tag:yaml.org,2002:python/name:material.extensions.emoji.twemoji",
"tag:yaml.org,2002:python/name:pymdownx.superfences.fence_code_format"
]
}
45 changes: 42 additions & 3 deletions README.md
@@ -1,4 +1,43 @@
# ezsam
Extract foreground from images or video via text prompt
# ezsam (easy segment anything model)

See the command line tool readme at https://github.com/ae9is/ezsam/tree/main/packages/cli
A tool to segment images and video via text prompts.

Input images and videos, describe the subjects or objects you want to keep, and output new images and videos with the background removed.

**Check out the docs! [ezsam.org](https://www.ezsam.org)**

## Why?

Meta's [Segment Anything](https://github.com/facebookresearch/segment-anything) is a powerful tool for separating parts of images,
but it requires coordinate prompts (either bounding boxes or points),
and manual prompt generation is tedious for large collections of still images or video.

In contrast, a text-based prompt describing the object(s) to segment as foreground can stay constant across inputs.
Inspired by [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything),
this project tries to package a simpler-to-use tool.

If you're not interested in text-based prompts with Segment Anything,
check out [rembg](https://github.com/danielgatis/rembg).

## How does it work?

The foreground is selected using text prompts to [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) to detect objects.
Image segments are generated using [Segment Anything](https://github.com/facebookresearch/segment-anything)
or [Segment Anything HQ (SAM-HQ)](https://github.com/SysCV/SAM-HQ).
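
The detect-then-segment flow can be sketched in pure Python. Note that `detect_objects` and `segment_box` below are hypothetical stand-ins for the GroundingDINO and SAM calls, stubbed with fixed data for illustration; they are not ezsam's actual internals:

```python
# Hypothetical sketch of the two-stage pipeline: a text prompt yields
# candidate boxes (GroundingDINO), and each kept box yields a mask (SAM).

def detect_objects(image, prompt):
    # Stand-in for GroundingDINO: text prompt -> (box, confidence) pairs.
    # Stubbed with fixed data for illustration.
    return [((10, 10, 50, 50), 0.72), ((60, 20, 90, 80), 0.31)]

def segment_box(image, box):
    # Stand-in for SAM / SAM-HQ: bounding box -> pixel-accurate mask.
    return {"box": box, "mask": None}

def extract_foreground(image, prompt, box_threshold=0.35):
    # Keep only detections above the box threshold, then segment each one.
    boxes = [box for box, score in detect_objects(image, prompt) if score >= box_threshold]
    return [segment_box(image, box) for box in boxes]

segments = extract_foreground(image=None, prompt="animal")
print(len(segments))  # -> 1 (only the 0.72-confidence box clears the threshold)
```

Raising the threshold discards lower-confidence detections, which is how the tool's object detection box threshold tunes what counts as foreground.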

## Quick start

```bash
# Ubuntu 22.04, Python 3.9 - 3.11
pip install ezsam
sudo apt install ffmpeg imagemagick
ezsam --help
```

For more detailed info, see the documentation site here: [ezsam.org](https://www.ezsam.org)

## Monorepo structure

This repository collocates the following packages:
- [cli](packages/cli): the ezsam command line tool
- [docs](packages/docs): a static documentation site
136 changes: 3 additions & 133 deletions packages/cli/README.md
@@ -1,135 +1,5 @@
# ezsam (easy segment anything model)
# ezsam/cli

A pipeline to extract foreground from images or video via text prompts.
A command line tool to extract foreground from images or video via text prompts.

## Why?

Meta's Segment Anything is a powerful tool for separating parts of images,
but it requires coordinate prompts (either bounding boxes or points).
Manual prompt generation is tedious for large collections of still images or video.
In contrast, a text-based prompt describing the object(s) to segment as foreground can stay constant across inputs.
Inspired by [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything),
this project tries to package a simpler-to-use tool.

If you're not interested in text-based prompts with Segment Anything,
check out [rembg](https://github.com/danielgatis/rembg).

## How does it work?

The foreground is selected using text prompts to [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) to detect objects.
Image segments are generated using [Segment Anything](https://github.com/facebookresearch/segment-anything)
or [Segment Anything HQ (SAM-HQ)](https://github.com/SysCV/SAM-HQ).

## Installation

```bash
pip install ezsam
```

For video output, you need FFmpeg installed and available on your `$PATH` as `ffmpeg` for
all encoding options except GIF. GIF output requires ImageMagick; `convert` must be available on your `$PATH`.

```bash
# Examples will be given for apt-based Linuxes like Ubuntu, Debian...
apt install ffmpeg imagemagick
```

For a development install, see [Development](#development).

## Usage

```bash
ezsam --help
```

## Examples

Example images are sourced from [rembg](https://github.com/danielgatis/rembg/tree/main/examples) for easy comparison.

Process images, extracting the foreground specified by the prompt, to `examples/animal*.out.png`.
(For extractions, which require adding an alpha channel, the output image format is always `png`.)

```bash
ezsam examples/animal*.jpg -p animal -o examples
```
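
The `png` requirement follows from the alpha channel: removing a background means making pixels transparent, which JPEG cannot store. A toy pure-Python illustration of the idea (not the tool's actual compositing code):

```python
# Toy illustration: "removing" the background = zeroing alpha outside
# the foreground mask. PNG can store the alpha channel; JPEG cannot.

def apply_mask(pixels, mask):
    # pixels: rows of (r, g, b) tuples; mask: rows of booleans (True = foreground)
    out = []
    for row, mask_row in zip(pixels, mask):
        out.append([(r, g, b, 255 if keep else 0)
                    for (r, g, b), keep in zip(row, mask_row)])
    return out

image = [[(255, 0, 0), (0, 255, 0)]]   # one red pixel, one green pixel
mask = [[True, False]]                 # keep red, drop green
print(apply_mask(image, mask))
# -> [[(255, 0, 0, 255), (0, 255, 0, 0)]]
```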

Multiple objects can be selected as the foreground. The output image `./car-1.out.png` contains the car and the person.

```bash
ezsam examples/car-1.jpg -p car, person
```

Use debug mode to fine-tune or troubleshoot prompts. This writes output with the foreground mask and object detections
annotated over the original image. Here we write out to `test/car-3.debug.jpg`.
(Note that the original image format `jpg` is preserved in debug mode!)

```bash
ezsam examples/car-3.jpg -p white car -o test -s .debug --debug
```

The object detection box threshold parameter can be used to fine-tune which objects are selected.

```bash
ezsam examples/car-3.jpg -p white car -o test --bmin 0.45
```

Writing prompts with specificity can also help.

```bash
ezsam examples/anime-girl-2.jpg -o examples -s .debug -p girl, phone, bag, railway crossing sign post --debug
```

## Models

The tool uses [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) for object detection.

To perform image segmentation, you can pick SAM or SAM-HQ:
* [Segment Anything](https://github.com/facebookresearch/segment-anything)
* [Segment Anything HQ (SAM-HQ)](https://github.com/SysCV/SAM-HQ)

For the best results, use the biggest model your GPU has memory for. ViT = Vision Transformer, the model type. From best/slowest to worst/fastest: ViT-H > ViT-L > ViT-B > ViT-tiny.

Note: ViT-tiny is for SAM-HQ only; you must use the `--hq` flag.

## Development

This project uses [pdm](https://github.com/pdm-project/pdm) for package management. Example installation:

```bash
pip install pipx
pipx install pdm
git clone https://github.com/ae9is/ezsam.git
cd ezsam/packages/cli
pdm install
pdm start
```

Pre-commit is used for some commit hooks:
```bash
pip install pre-commit
pre-commit install
```

## GPU memory troubleshooting

If you *always* get an error stating "CUDA out of memory", try using a smaller Segment Anything model (vit_tiny, vit_b) or lower-resolution (or fewer) inputs.

If you only get a CUDA OOM error occasionally, or after a while, try freeing up memory by closing processes using the GPU:
```bash
# List commands using nvidia gpu
fuser -v /dev/nvidia*
```

You can also try manually getting the GPU to clear some processes:
```bash
# Clears all processes accounted so far
sudo nvidia-smi -caa
```

If you are using multiple GPUs, and so the GPU you're running CUDA on isn't driving your displays, you can also reset the GPU using:
```bash
# Trigger reset of one or more GPUs
sudo nvidia-smi -r
```

Note: nvidia-smi is in the nvidia-utils package of [NVIDIA's CUDA repo for Ubuntu](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network).
Check out the docs at: [ezsam.org](https://www.ezsam.org)
29 changes: 29 additions & 0 deletions packages/docs/README.md
@@ -0,0 +1,29 @@
# ezsam/docs

Static docs site using [Material for MkDocs](https://github.com/squidfunk/mkdocs-material)

## Install

```bash
pdm install
```

## Run

```bash
pdm start
```

## Deployment

Check out the GitHub Action at [.github/workflows/pages.yml](/.github/workflows/pages.yml)

To manually deploy to your local Git repository's gh-pages branch:

```bash
pdm deploy-pages
```
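
The `deploy-pages` script is presumably defined under `[tool.pdm.scripts]` in this package's `pyproject.toml`. A sketch of what such an entry might look like; the exact commands are an assumption, though `mkdocs gh-deploy` is the standard MkDocs command for pushing a built site to a `gh-pages` branch:

```toml
# Hypothetical pyproject.toml fragment; the real script definitions may differ.
[tool.pdm.scripts]
start = "mkdocs serve"                     # local dev server with live reload
deploy-pages = "mkdocs gh-deploy --force"  # build and push to the gh-pages branch
```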

## Development

Check out the repos for [mkdocs-material](https://github.com/squidfunk/mkdocs-material) and [pdm](https://github.com/pdm-project/pdm) for examples of generating versioned docs, API docs from docstrings, and more.
73 changes: 73 additions & 0 deletions packages/docs/mkdocs.yml
@@ -0,0 +1,73 @@
site_name: ezsam
site_url: https://www.ezsam.org
site_author: ae9is
site_description: >-
  Use ezsam to extract foreground from images or video via text prompts
repo_name: ae9is/ezsam
repo_url: https://github.com/ae9is/ezsam
edit_uri: edit/main/packages/docs/src/
docs_dir: src
plugins:
  - search
markdown_extensions:
  - admonition
  - attr_list
  - toc:
      permalink: true
  - pymdownx.emoji:
      emoji_index: !!python/name:material.extensions.emoji.twemoji
      emoji_generator: !!python/name:material.extensions.emoji.to_svg
extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/ae9is/ezsam
  # analytics:
  #   provider: google
  #   property: !ENV GOOGLE_ANALYTICS_KEY
  #   consent:
  #     title: Would you like a free cookie? 🍪
  #     description: It's just to see how this docs site is used and potentially improve it.
  #     actions:
  #       - manage
  #       - reject
  #       - accept
#copyright: <a href="#__consent">Change cookie settings</a>
extra_css:
  - assets/extra.css
theme:
  name: material
  palette:
    - media: '(prefers-color-scheme: dark)'
      scheme: slate
      toggle:
        icon: material/weather-night
        name: Switch to light mode
    - media: '(prefers-color-scheme: light)'
      scheme: default
      toggle:
        icon: material/weather-sunny
        name: Switch to dark mode
  features:
    - content.action.edit
    - content.action.view
    - content.code.annotate
    - content.code.copy
    - content.tooltips
    - navigation.footer
    - navigation.indexes
    - navigation.tracking
    - navigation.path
    - navigation.top
    # - navigation.sections
    # - navigation.tabs
    - search.highlight
    - search.share
    - toc.follow
  icon:
    edit: material/pencil
    view: material/eye
nav:
  - About: index.md
  - install.md
  - usage.md
  - changelog.md
