Overhaul to one package (#56)
* change pyproject name

* overhaul to one package

* remove stillwater package

* remove tox files

* remove unused quiver readme

* update ReadMe

* clean up Readme
EthanMarx committed Jan 21, 2024
1 parent 6d7b7d4 commit 59ff33d
Showing 87 changed files with 3,293 additions and 9,193 deletions.
35 changes: 2 additions & 33 deletions .github/workflows/unit-tests.yaml
Original file line number Diff line number Diff line change
@@ -7,44 +7,13 @@ on:
- main

jobs:
# JOB to run change detection
changes:
runs-on: ubuntu-latest
outputs:
# Expose matched filters as job 'packages' output variable
libraries: ${{ steps.filter.outputs.changes }}
steps:
# For pull requests it's not necessary to checkout the code
-
if: ${{ github.event_name == 'push'}}
uses: actions/checkout@v2
-
uses: dorny/paths-filter@v2
id: filter
with:
filters: |
aeriel:
- '.github/workflows/unit-tests.yaml'
- 'hermes/hermes.aeriel/**'
quiver:
- '.github/workflows/unit-tests.yaml'
- 'hermes/hermes.quiver/**'
stillwater:
- '.github/workflows/unit-tests.yaml'
- 'hermes/hermes.stillwater/**'
# cloudbreak:
# - '.github/workflows/unit-tests.yaml'
# - 'hermes/hermes.cloudbreak/**'

# For pull requests it's not necessary to checkout the code
test:
runs-on: ubuntu-latest
needs: changes
strategy:
fail-fast: false
matrix:
python-version: ['3.8', '3.9', '3.10', '3.11']
library: ${{ fromJSON(needs.changes.outputs.libraries) }}
steps:
- uses: actions/checkout@v2

@@ -63,4 +32,4 @@ jobs:
# run the library's tests
-
name: run tests
run: cd hermes/hermes.${{ matrix.library }} && tox
run: tox
67 changes: 53 additions & 14 deletions README.md
@@ -1,35 +1,74 @@
# Hermes
## Deep Learning Inference-as-a-Service Deployment Utilities
`hermes` is a set of libraries for simplifying the deployment of deep learning applications via [Triton Inference Server](https://github.com/triton-inference-server/server). Each library is installed and managed independently to keep deployments lightweight and minimize dependencies for use.
However, components are designed to play well together across libraries in order to minimize the overhead required to create new, exciting applications.
`hermes` is a set of libraries for simplifying the deployment of deep learning applications via [Triton Inference Server](https://github.com/triton-inference-server/server).

`hermes` is particularly aimed at streaming timeseries use cases, like those found in [gravitational](https://github.com/ML4GW/DeepClean) [wave](https://github.com/ML4GW/BBHNet) [physics](https://github.com/ml4gw/pe). In particular, it includes helpful APIs for exposing input and output states on the server to minimize data I/O, as outlined in [arXiv:2108.12430](https://arxiv.org/abs/2108.12430) and [doi.org/10.1145/3526058.3535454](https://dl.acm.org/doi/10.1145/3526058.3535454).

## The `hermes` libraries
### [`hermes.aeriel`](./hermes/hermes.aeriel)
## The `hermes` modules
### [`hermes.aeriel`](./hermes/aeriel)
#### Triton serving and client utilities
`aeriel` wraps Triton's `InferenceServerClient` class with neat functionality for inferring the names, shapes, and datatypes of the inputs required by complex ensembles of models with combinations of stateful and stateless inputs,
The `aeriel.client` submodule wraps Triton's `InferenceServerClient` class with neat functionality for inferring the names, shapes, and datatypes of the inputs required by complex ensembles of models with combinations of stateful and stateless inputs,
and exposing these inputs for asynchronous inference via numpy arrays.

The `aeriel.serve` submodule also includes a Python context manager for spinning up a local Triton inference service via [Singularity](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html), the preferred container runtime on the HPC clusters on which GW physics work typically takes place.

### [`hermes.cloudbreak`](./hermes/hermes.cloudbreak)
#### Cloud orchestration and deployment
`cloudbreak` contains utilities for orchestrating and deploying workloads on cloud-based resources via simple APIs in a cloud-agnostic manner (though only Google Cloud is supported as a backend [at the moment](/../../issues/2)). This includes both Kubernetes clusters and swarms of VMs to perform parallel inference.
The `aeriel.monitor` submodule contains a `ServerMonitor` context manager for monitoring Triton server-side metrics such as model latency and throughput. This can be extremely useful for diagnosing and addressing bottlenecks in deployment configurations.
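
An illustrative, pseudocode-style sketch of how such a monitor might be used (the constructor argument names below are assumptions, not a verified API; consult the submodule's docstrings for the real signature):

```python
# Illustrative sketch only: the argument names here are assumptions
# about how ServerMonitor is configured, not a documented interface.
from hermes.aeriel.monitor import ServerMonitor

monitor = ServerMonitor(
    model_name="my-ensemble",  # assumed: the model whose metrics to track
    ips="localhost",           # assumed: address of the Triton instance
    filename="metrics.csv",    # assumed: file to write metric snapshots to
)
with monitor:
    # run your client-side inference while metrics are collected
    run_inference_loop()  # placeholder for your inference code
```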

### [`hermes.quiver`](./hermes/hermes.quiver)
### [`hermes.quiver`](./hermes/quiver)
#### Model export and acceleration
`quiver` assists in exporting trained neural networks from both Torch and TensorFlow to either cloud or local [model repositories](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md), simplifying the creation of complex [model ensembles](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models) and server-side streaming innput and output states.
`quiver` assists in exporting trained neural networks from both Torch and TensorFlow to either cloud or local [model repositories](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md), simplifying the creation of complex [model ensembles](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models) and server-side streaming input and output states.

`quiver` also contains utilities for converting models from your framework of choice to NVIDIA's [TensorRT](https://developer.nvidia.com/tensorrt) inference library, which can sometimes help accelerate inference.
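
As an illustrative sketch of the export workflow (treat this as pseudocode: the repository path and toy network are placeholders, and the exact `quiver` call signatures should be checked against its documentation):

```python
# Illustrative sketch; assumes a ModelRepository/Platform-style API
import torch

from hermes import quiver as qv

# a toy network standing in for a trained model
net = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())

# point a repository at a local directory (could also be a cloud bucket)
repo = qv.ModelRepository("/tmp/model-repo")

# register a model entry using the ONNX backend and export a version of it
model = repo.add("my-model", platform=qv.Platform.ONNX)
model.export_version(net, input_shapes={"x": (1, 8)})
```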

### [`hermes.stillwater`](./hermes/hermes/stillwater)
#### Asynchronous pipeline development
The `stillwater` submodule assists in building asychronous inference pipelines by leveraging Python multiprocessing and passing data from one process to the next, with wrappers around the client in `hermes.aeriel` to support truly asynchronous inference and response handling.
## Examples
### Local Triton Server with `hermes.aeriel.serve.serve`

```python
from hermes.aeriel.serve import serve
from tritonclient import grpc as triton

with serve("/path/to/model/repository", "/path/to/container/image", wait=True):
    # wait ensures that the server comes online before we enter the context
    client = triton.InferenceServerClient("localhost:8001")
    assert client.is_server_live()

# exiting the context will spin down the server
try:
    client.is_server_live()
except triton.InferenceServerException:
    print("All done!")
```

You can even specify arbitrary GPUs to expose to Triton via the `CUDA_VISIBLE_DEVICES` environment variable:

```python
with serve(..., gpus=[0, 3, 5]):
    # do inference on 3 GPUs here
```

Note that since the mechanism for exposing these GPUs to Triton is by setting the `CUDA_VISIBLE_DEVICES` environment variable, the desired GPUs should be indexed by their _global_ indices, _not_ any indices mapped to by the current value of `CUDA_VISIBLE_DEVICES`.
For example, if `CUDA_VISIBLE_DEVICES=2,4,6,7` in my inference script's environment, setting `gpus=[0,2]` will expose the GPUs with global indices 0 and 2 to Triton, _not_ the GPUs those positions map to under the existing environment setting (2 and 6).
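
To make the indexing rule concrete, here is a small self-contained illustration (plain Python, no `hermes` required) contrasting the naive remapped interpretation with the global one that `serve` actually uses:

```python
# Self-contained illustration of the global-vs-mapped index distinction;
# this does not use hermes, it just demonstrates the rule above.
parent_env = "2,4,6,7"  # CUDA_VISIBLE_DEVICES in the parent process
gpus = [0, 2]           # what we pass to serve(..., gpus=gpus)

# naive expectation: indices interpreted relative to the parent mapping
parent_mapping = parent_env.split(",")
naive = [int(parent_mapping[i]) for i in gpus]
print(naive)  # [2, 6] -- NOT what Triton sees

# actual behavior: the indices are written directly into Triton's
# CUDA_VISIBLE_DEVICES, so they are interpreted as global device indices
actual = ",".join(str(g) for g in gpus)
print(actual)  # "0,2" -- GPUs with global indices 0 and 2
```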

You can also choose to wait for the server at any time by using the `SingularityInstance` object returned by the `serve` context:

```python

with serve(..., wait=False) as instance:
    do_some_setup_while_we_wait_for_server()

    # now wait for the server before we begin the actual inference
    instance.wait()

    client = triton.InferenceServerClient("localhost:8001")
    assert client.is_server_live()
```

Consult the function's documentation for information about other configuration and logging options.
This function is not suitable for at-scale deployment, but is useful for running self-contained inference scripts for e.g. local model validation.

## Installation
Hermes is not [currently](/../../issues/10) hosted on [PyPI](/../../issues/11), so to install you'll need to clone this repo and add the submodule(s) you require via [Poetry](https://python-poetry.org).
Hermes is pip installable via `pip install ml4gw-hermes`. Hermes is also fully compatible with [Poetry](https://python-poetry.org) for ease of use as a git submodule.


## Stability and Development
Hermes is still very much a work in progress, and the fastest path towards making it more robust is broader adoption! Users may well run into bugs as they deploy Hermes to new and novel problems; we encourage them to file [issues](/../../issues) on this page and, if they can, to contribute a [PR](https://github.com/ML4GW/hermes/pulls) fixing whatever bug they stumbled upon!
File renamed without changes.
2 changes: 1 addition & 1 deletion hermes/hermes.quiver/docs/about.rst → docs/about.rst
@@ -5,4 +5,4 @@ Installation
------------
.. code-block:: bash
pip install gravswell.quiver
pip install ml4gw-hermes
2 changes: 1 addition & 1 deletion hermes/hermes.quiver/docs/conf.py → docs/conf.py
@@ -18,7 +18,7 @@

# -- Project information -----------------------------------------------------

project = "hermes.quiver"
project = "hermes"
copyright = "2021, Alec Gunny"
author = "Alec Gunny"

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -8,8 +8,8 @@
import tritonclient.grpc as triton
import urllib3

from hermes.stillwater.logging import listener
from hermes.stillwater.process import PipelineProcess
from hermes.aeriel.monitor.logging import listener
from hermes.aeriel.monitor.process import PipelineProcess

if TYPE_CHECKING:
from io import TextIOWrapper
@@ -4,13 +4,13 @@
from queue import Empty
from typing import TYPE_CHECKING, Optional

from hermes.stillwater.logging import listener, logger
from hermes.stillwater.utils import ExceptionWrapper, Throttle
from hermes.aeriel.monitor.logging import listener, logger
from hermes.aeriel.monitor.utils import ExceptionWrapper, Throttle

if TYPE_CHECKING:
from queue import Queue

from hermes.stillwater.utils import Package
from hermes.aeriel.monitor.utils import Package


class PipelineProcess(mp.Process):
File renamed without changes.
File renamed without changes.
File renamed without changes.
49 changes: 0 additions & 49 deletions hermes/hermes.aeriel/README.md

This file was deleted.
