This repository contains hands-on notebooks for the NVIDIA NIM workshop, demonstrating how to build, fine-tune, and deploy Large Language Models using NVIDIA's AI stack.
Learn how to:
- Use NVIDIA NIM APIs for cloud-based inference (see the example after this list)
- Deploy NIM containers locally with GPU acceleration
- Fine-tune LLMs using LoRA (Low-Rank Adaptation) with NeMo
- Deploy custom LoRA adapters with NIM for production use
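For a first taste of the cloud API (covered in notebook 01), the hosted endpoint is OpenAI-compatible. A minimal curl sketch, assuming NVIDIA_API_KEY is exported in your shell and using the workshop's base model:

```bash
# Chat completion against NVIDIA's hosted NIM endpoint (OpenAI-compatible).
# Assumes NVIDIA_API_KEY has been set up per 00_Workshop_Setup.ipynb.
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "What is a NIM?"}],
    "max_tokens": 128
  }'
```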
The workshop is organized as five sequential notebooks:
- 00_Workshop_Setup.ipynb - Initial setup and environment configuration
- 01_NIM_API_Tutorial.ipynb - Introduction to NVIDIA NIM cloud APIs
- 02_Local_NIM_Deployment.ipynb - Deploy NIM containers locally
- 03_LoRA_Training_NeMo.ipynb - Fine-tune models with LoRA
- 04_Deploy_LoRA_with_NIM.ipynb - Deploy LoRA adapters
Prerequisites:
- NVIDIA GPU (A100, V100, or similar)
- Docker with NVIDIA Container Runtime
- Python 3.8+
- NGC Account (free at ngc.nvidia.com)
- NVIDIA API Key (get one at build.nvidia.com)
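A quick way to sanity-check the GPU and container prerequisites before starting (the CUDA image tag below is illustrative; any recent tag works):

```bash
# Verify the GPU driver is visible on the host
nvidia-smi

# Verify Docker can reach the GPU through the NVIDIA Container Runtime
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Verify the Python version meets the 3.8+ requirement
python3 --version
```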
This workshop requires three API keys (stored in a .env file):
- NVIDIA_API_KEY: For accessing NVIDIA's cloud API services
- NGC_API_KEY: For downloading NIM containers from NVIDIA GPU Cloud
- NGC_CLI_API_KEY: For NGC CLI operations (optional, uses NGC_API_KEY as fallback)
The setup notebook (00_Workshop_Setup.ipynb) will guide you through obtaining and configuring these keys.
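As a sketch, the resulting .env file looks roughly like this (the values below are placeholders, not real keys):

```bash
# .env -- keep this file out of version control
# From build.nvidia.com:
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxxxxx
# From ngc.nvidia.com:
NGC_API_KEY=xxxxxxxxxxxxxxxx
# Optional; NGC_API_KEY is used as a fallback if this is omitted:
NGC_CLI_API_KEY=xxxxxxxxxxxxxxxx
```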
Quick start:
- Clone this repository:

```bash
git clone https://github.com/snaic-org/NIM-build-tune-deploy-participant.git
cd NIM-build-tune-deploy-participant
```

- Run the setup notebook:

```bash
jupyter notebook 00_Workshop_Setup.ipynb
```

- Follow the notebooks in order (00 → 01 → 02 → 03 → 04)
Project structure:

```
NIM-build-tune-deploy-participant/
├── 00_Workshop_Setup.ipynb        # Environment setup & API key configuration
├── 01_NIM_API_Tutorial.ipynb      # Cloud API tutorial
├── 02_Local_NIM_Deployment.ipynb  # Local deployment
├── 03_LoRA_Training_NeMo.ipynb    # LoRA training
├── 04_Deploy_LoRA_with_NIM.ipynb  # LoRA deployment
├── openai_example/                # OpenAI API compatibility examples
│   └── openai_api_example.ipynb   # Example using OpenAI client with NIM
├── lora_tutorial/                 # Training data and configs
│   └── data/                      # Sample datasets
├── ngc-cli/                       # NGC CLI scripts
├── img/                           # Workshop images
└── .env                           # API keys (create this file)
```
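The openai_example/ notebook exercises the same OpenAI-compatible endpoint that a locally running NIM exposes. A rough curl sketch, assuming the container from notebook 02 is serving on port 8000:

```bash
# Query a locally deployed NIM (assumes it is listening on localhost:8000)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello from the workshop!"}]
  }'
```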
Key technologies:
- NVIDIA NIM: Inference microservices for optimized model deployment
- NeMo Framework: For training and fine-tuning LLMs
- LoRA: Efficient fine-tuning technique
- Docker: Container-based deployment
- NGC (NVIDIA GPU Cloud): Container registry and model repository
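Pulling NIM containers from NGC requires authenticating Docker against the nvcr.io registry; a sketch using the NGC_API_KEY from your .env:

```bash
# Log Docker in to NVIDIA's container registry.
# The username is always the literal string "$oauthtoken";
# the password is your NGC API key.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```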
Notes:
- The workshop uses Llama 3.1 8B Instruct as the base model
- NIM containers require significant disk space (~50GB per model)
- First-time model downloads may take 5-10 minutes
- Subsequent runs use cached models for faster startup
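The caching behavior above comes from mounting a host directory into the container. A hedged sketch of the docker run pattern that notebook 02 builds up (the image tag and paths are illustrative; NIM_PEFT_SOURCE and the adapter mount are assumptions that apply only when serving LoRA adapters as in notebook 04, and can be dropped for a plain deployment):

```bash
# Run the Llama 3.1 8B Instruct NIM locally with a persistent model cache.
# The first run downloads the model (~50GB) into ~/.cache/nim; later runs reuse it.
# The NIM_PEFT_SOURCE variable and the loras mount are only needed when
# serving LoRA adapters (notebook 04); omit them for notebook 02.
mkdir -p ~/.cache/nim
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_PEFT_SOURCE=/opt/nim/loras \
  -v ~/.cache/nim:/opt/nim/.cache \
  -v "$PWD/loras:/opt/nim/loras" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```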
If you encounter issues:
- Ensure all API keys are properly set in the .env file
- Verify Docker and the NVIDIA Container Runtime are installed
- Check that your GPU has sufficient memory (16GB+ recommended)
- Confirm you have enough disk space for model caching
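A few shell one-liners that cover the checks above (paths are illustrative):

```bash
# Confirm the keys were loaded into the current shell
env | grep -E 'NVIDIA_API_KEY|NGC_API_KEY'

# Confirm the NVIDIA runtime is registered with Docker
docker info | grep -i nvidia

# Check free GPU memory (16GB+ recommended)
nvidia-smi --query-gpu=memory.free --format=csv

# Check free disk space for the model cache (~50GB per model)
df -h ~/.cache/nim
```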
This workshop material is provided for educational purposes. Model usage is subject to respective model licenses.