Collection of containerized AI/ML workloads optimized for deployment in Polaris Containers. Includes CPU and GPU variants for different use cases.
- Location: `/polaris-ai/cpu/torchserve`
  - Overview: Sets up TorchServe for CPU workloads. The start script downloads the model and starts the service.
- Location: `/polaris-ai/gpu/torchserve`
  - Overview: Sets up TorchServe for GPU workloads using a GPU-enabled TorchServe image.
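Once either TorchServe container is up, inference goes through TorchServe's standard REST inference API (`POST /predictions/<model>`). A minimal sketch of building such a request, assuming the default TorchServe port 8080 and a hypothetical model name (the actual host, port, and model depend on your deployment):

```python
import urllib.request


def torchserve_predict_request(model_name, data, host="http://localhost:8080"):
    """Build a request against TorchServe's inference API.

    The /predictions/<model> path is TorchServe's standard inference
    endpoint; the host and port here are illustrative defaults.
    """
    return urllib.request.Request(
        f"{host}/predictions/{model_name}",
        data=data,  # raw request body, e.g. image bytes or JSON
        method="POST",
    )
```

Sending the request (e.g. with `urllib.request.urlopen`) returns the model's prediction as the response body.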
- Location: `/polaris-llm/cpu/ollama`
  - Overview: Packages the Ollama service in a Docker container for CPU usage. Specify the model via the `POLARIS_LLM_OLLAMA_MODEL` environment variable.
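The containerized Ollama service exposes Ollama's documented HTTP API. A minimal sketch of building a non-streaming generation request, assuming Ollama's default port 11434 and with the model name left to the caller (it should match whatever `POLARIS_LLM_OLLAMA_MODEL` was set to):

```python
import json
import urllib.request


def ollama_generate_request(model, prompt, host="http://localhost:11434"):
    # Ollama's /api/generate endpoint; "stream": False asks for a single
    # JSON response instead of a stream of partial responses.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```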
- Location: `/polaris-llm/gpu/ollama`
  - Overview: GPU-accelerated version of the Ollama service. Uses NVIDIA CUDA for model inference.
  - Prerequisites:
    - `POLARIS_LLM_OLLAMA_MODEL`: The model identifier.
- Location: `/polaris-llm/gpu/vllm`
  - Overview: Sets up vLLM in a Docker container for GPU usage. The start script validates required environment variables, downloads the specified model (if needed), and launches vLLM.
  - Prerequisites:
    - `HF_TOKEN`: Your Hugging Face token.
    - `POLARIS_VLLM_MODEL`: The model identifier.
    - `POLARIS_VLLM_DIR`: (Optional) Directory for model storage (default: `/models`).
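The validation step performed by the vLLM start script can be sketched as follows. This is an illustrative reimplementation using the variable names listed above, not the actual script; the function name is hypothetical:

```python
import os

# Variables the vLLM workload requires, per the prerequisites above.
REQUIRED = ("HF_TOKEN", "POLARIS_VLLM_MODEL")


def read_vllm_config(env=os.environ):
    # Fail fast if a required variable is missing or empty,
    # mirroring the start script's validation step (sketch only).
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {
        "hf_token": env["HF_TOKEN"],
        "model": env["POLARIS_VLLM_MODEL"],
        # POLARIS_VLLM_DIR is optional and defaults to /models.
        "model_dir": env.get("POLARIS_VLLM_DIR", "/models"),
    }
```

Failing fast with a clear message keeps a misconfigured container from starting vLLM against a missing token or model name.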
- On first startup, if the specified model is not already present, the related workload will automatically download it (this may take some time depending on the model size and connection speed).
- Refer to each workload's README for more detailed instructions and configuration options.