
Notebooks on how to Accelerate Large AI model inference

This repository provides code examples to be used alongside the workshop "Accelerating Large Models In Production: A Practical Guide", presented at the Toronto Machine Learning Summit 2023 by James Cameron. Slides and notes can be found here.

Disclaimer

This repository is meant to serve as a starting point for new projects and should be used only as a reference.

System Requirements

  • 4-core CPU (min. 3.0 GHz)
  • 8 GB system memory
  • Nvidia GPU with CUDA Compute Capability >= 7.0 (see the quick check below)
  • 6 GB GPU memory minimum
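
To confirm the GPU meets the compute capability and memory requirements, nvidia-smi can report both (this assumes the Nvidia driver is already installed; the compute_cap query field requires a reasonably recent driver):

nvidia-smi --query-gpu=name,compute_cap,memory.total --format=csv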

Installation

Setting up the environment

First, we need to install Docker and Docker Compose. Please follow the instructions from here (Docker) and here (Docker Compose).
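
If you prefer a scripted setup over following the linked pages, Docker's convenience script plus the Compose plugin can be installed roughly as follows (a sketch for Debian/Ubuntu hosts; the linked official instructions remain authoritative):

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo apt-get update && sudo apt-get install -y docker-compose-plugin

Note that the plugin provides the `docker compose` subcommand; if the project's scripts expect the standalone `docker-compose` binary, install that instead.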

Next we need to install the Nvidia Container Toolkit. Please follow the instructions from here.
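
On apt-based systems, once NVIDIA's package repository has been configured per the linked instructions, the toolkit itself is a single package (Debian/Ubuntu sketch):

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit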

Set the Nvidia container runtime as Docker's default runtime:

sudo tee /etc/docker/daemon.json <<EOF
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

Restart Docker:

sudo service docker restart

Then test the installation:

docker run nvcr.io/nvidia/cuda nvidia-smi

You should see output similar to the following:

Tue Mar 29 12:38:23 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.81       Driver Version: 472.39       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A3000    Off  | 00000000:01:00.0 Off |                  N/A |
| 25%   46C    P8    12W / 100W |    319MiB /  6144MiB |    18%       Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Running the project

To start the services, run the following command:

bash run.sh
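
run.sh presumably wraps Docker Compose; if you want to start or stop the stack manually, something like the following should be equivalent (a sketch assuming a docker-compose.yml in the repository root; check run.sh for the exact invocation):

docker-compose up --build -d     # build images and start the services in the background
docker-compose logs -f           # follow the service logs
docker-compose down              # stop and remove the services when finished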

Accessing the Notebooks

Navigate to http://localhost:8888/lab
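
If the page does not load, confirm the JupyterLab service is up and listening on port 8888 (generic checks, not specific to this project's service names):

docker ps                                        # the Jupyter container should show port 8888 mapped
curl -sI http://localhost:8888/lab | head -n 1   # expect an HTTP 200 or a redirect to the login page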
