In [2]:
!pwd
!ls
!tree

/home/lithira/Project1/19_06_24_conda_copy_for_15_07_24/triton_repo


26_07_24.txt  prometheus_config  show.mp4	   triton_run_2.ipynb
build_env.sh  requirements.txt	 triton_run.ipynb  violation
.
├── 26_07_24.txt
├── build_env.sh
├── prometheus_config
│   └── prometheus.yml
├── requirements.txt
├── show.mp4
├── triton_run.ipynb
├── triton_run_2.ipynb
└── violation
    ├── 1
    │   ├── __pycache__
    │   │   └── model.cpython-310.pyc
    │   ├── model.py
    │   ├── xgb_model.pkl
    │   ├── yolov8x-seg.engine
    │   └── yolov8x-seg.pt
    ├── config.pbtxt
    └── pyenv.tar.gz

4 directories, 14 files


In [3]:
!docker run -d --shm-size=5G --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v $PWD:/mnt/data/model_repository \
custom_triton \
tritonserver \
--model-repository=/mnt/data/model_repository \
--log-verbose=1

cae4ed38b87d3a6ac6df68f1a37450a49a943f0e953495bb7485cb12b6d2f188


In [4]:
!docker ps

CONTAINER ID   IMAGE           COMMAND                  CREATED              STATUS              PORTS                              NAMES
5db2367acefc   custom_triton   "/opt/nvidia/nvidia_…"   About a minute ago   Up About a minute   0.0.0.0:8000-8002->8000-8002/tcp   goofy_bohr


## Docker Command Explanation

This `docker run` command starts a container with the NVIDIA Triton Inference Server, configured with various options:

- `-d`: Runs the container in detached mode, meaning it runs in the background and does not block your terminal.

- `--shm-size=5G`: Sets the size of `/dev/shm` (shared memory) to 5 GB. Docker's default is only 64 MB. Triton's Python backend, which this `model.py`-based model appears to use, exchanges tensors between the server and the Python process through shared memory, so large inputs such as videos need more room.

- `--gpus all`: Grants the container access to all GPUs on the host machine. This is important for running machine learning models that utilize GPU acceleration.

- `-p 8000:8000 -p 8001:8001 -p 8002:8002`: Publishes container ports on the same host ports, so services listening inside the container are reachable from the host.
   - `8000`: Triton's default port for HTTP/REST requests.
   - `8001`: Triton's default port for gRPC requests.
   - `8002`: Triton's default port for the Prometheus metrics endpoint (`/metrics`).

- `-v $PWD:/mnt/data/model_repository`: Mounts the current working directory (`$PWD`, here the `triton_repo` directory) on the host at `/mnt/data/model_repository` inside the container. The container path is arbitrary; Triton reads models from whatever directory `--model-repository` points to.

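- `custom_triton`: Specifies the Docker image to run. This is a locally built image; its build steps are not shown in this notebook, but it is presumably derived from NVIDIA's Triton Inference Server image on NVIDIA's container registry (for example `nvcr.io/nvidia/tritonserver:24.05-py3`).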

- `tritonserver`: The command run inside the container to start the Triton Inference Server.

- `--model-repository=/mnt/data/model_repository`: Tells Triton where to find the model repository inside the container.

- `--log-verbose=1`: Enables Triton's verbose logging. A value of 0 disables verbose messages; 1 or higher enables them.

This command sets up and runs a Triton Inference Server in a Docker container with access to shared memory, GPUs, and specific ports, using your local model repository.
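Because `-d` returns as soon as the container starts, the server may still be loading models when the next cell runs. The sketch below polls Triton's HTTP readiness endpoint (`/v2/health/ready`, served on port 8000) until it answers with status 200. The helper names `http_status` and `wait_until_ready`, the retry count, and the delay are illustrative assumptions, not part of the original notebook.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. not ready yet).
        return exc.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(base_url="http://localhost:8000", attempts=30,
                     delay_s=2.0, probe=http_status, sleep=time.sleep):
    """Poll the readiness endpoint; return True once it reports 200."""
    url = f"{base_url}/v2/health/ready"
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        sleep(delay_s)
    return False
```

Calling `wait_until_ready()` before the first inference request avoids connection errors while the models are still loading. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.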


In [4]:
!docker run -d --shm-size=5G --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $PWD:/mnt/data/model_repository \
  --network bridge \
  -e TZ=Asia/Kolkata \
  custom_triton \
  tritonserver \
  --model-repository=/mnt/data/model_repository \
  --log-verbose=1

345bb785a059010ac6eb3b3de174b71337d67dce47886d458b26a157d61d91f2


In [9]:
!docker ps

CONTAINER ID   IMAGE           COMMAND                  CREATED         STATUS                  PORTS                              NAMES
e38c8ef488ed   custom_triton   "/opt/nvidia/nvidia_…"   4 seconds ago   Up Less than a second   0.0.0.0:8000-8002->8000-8002/tcp   intelligent_satoshi


In [2]:
import requests
import json
import base64

url = "http://localhost:8000/v2/models/violation/versions/1/infer"

# Read the video file
with open("show.mp4", "rb") as video_file:
    video_bytes = video_file.read()

# Encode the video file to base64
video_base64 = base64.b64encode(video_bytes).decode('utf-8')

# Set the client number (1 or 2)
client_number = 1

payload = json.dumps({
    "inputs": [
        {
            "name": "INPUT0",
            "shape": [1],
            "datatype": "BYTES",
            "data": [video_base64]
        },
        {
            "name": "CLIENT_NUMBER",
            "shape": [1],
            "datatype": "INT32",
            "data": [client_number]
        }
    ]
})

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)

print(json.loads(response.text))


{'model_name': 'violation', 'model_version': '1', 'outputs': [{'name': 'OUTPUT0', 'datatype': 'INT32', 'shape': [1], 'data': [1]}]}
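The response above follows the JSON layout of Triton's HTTP/REST inference protocol: a list of output tensors, each with a name, datatype, shape, and flat `data` list. A minimal sketch for pulling one output out of such a response follows. The helper name `extract_output` is hypothetical, and the sample dictionary is copied from the output printed above.

```python
def extract_output(response_json, name="OUTPUT0"):
    """Return the flat data list of the named output tensor."""
    # Triton reports failures as a JSON object with an "error" field.
    if "error" in response_json:
        raise RuntimeError(f"Triton returned an error: {response_json['error']}")
    for output in response_json.get("outputs", []):
        if output["name"] == name:
            return output["data"]
    raise KeyError(f"output {name!r} not found in response")


# Sample taken from the response printed by the cell above.
sample = {
    "model_name": "violation",
    "model_version": "1",
    "outputs": [
        {"name": "OUTPUT0", "datatype": "INT32", "shape": [1], "data": [1]}
    ],
}

prediction = extract_output(sample)[0]
print(prediction)
```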


# Monitoring

In [2]:
!docker run -d --name prometheus \
  -p 9090:9090 \
  -v $(pwd)/../trition-monitoring/prometheus:/etc/prometheus \
  prom/prometheus


ab6c272f07a2b34c617bccc1062045e6f14a9b8fb82d1ec3311f81c1d32effea


In [3]:
!docker run -d --name grafana \
  -p 3000:3000 \
  grafana/grafana

f29f0a4e9378388167fb19022024a9f432e81f0a5b77fd02997209b1772e72ed


The following is a step-by-step guide to setting up Grafana monitoring for the Triton Inference Server by starting each container separately with `docker run` rather than with Docker Compose:

1. Create a project directory:

```bash
mkdir triton-monitoring
cd triton-monitoring
```

2. Start Triton Inference Server:

Assuming you already have your Triton container running, make sure it's exposing the metrics port (usually 8002). If not, start it with:

```bash
docker run -d --name triton -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/your/models:/models \
  nvcr.io/nvidia/tritonserver:22.12-py3 \
  tritonserver --model-repository=/models
```

3. Set up Prometheus:

Create a Prometheus configuration file:

```bash
mkdir prometheus
cat << EOF > prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'triton'
    static_configs:
      - targets: ['host.docker.internal:8002']
EOF
```

Start Prometheus container:

```bash
docker run -d --name prometheus \
  -p 9090:9090 \
  -v $(pwd)/prometheus:/etc/prometheus \
  prom/prometheus:v2.42.0
```

4. Start Grafana:

```bash
docker run -d --name grafana \
  -p 3000:3000 \
  grafana/grafana:9.3.6
```

5. Configure Grafana:

   a. Access Grafana at http://localhost:3000
   b. Log in with default credentials (admin/admin)
   c. Change the password when prompted
   d. Add Prometheus as a data source:
      - Go to Configuration > Data Sources
      - Click "Add data source"
      - Select Prometheus
      - Set the URL to http://host.docker.internal:9090
      - Click "Save & Test"

6. Create a dashboard for Triton metrics:

   a. Click the "+" icon in the sidebar and select "Dashboard"
   b. Click "Add a new panel"
   c. In the query editor, select Prometheus as the data source
   d. Start adding queries to visualize Triton metrics. Triton's metric names are prefixed with `nv_`, for example:
      - `nv_inference_request_success`
      - `nv_inference_queue_duration_us`
      - `nv_gpu_memory_used_bytes`
   e. Customize your panels and save the dashboard

7. (Optional) Import pre-made Triton dashboards:

   a. Go to "Create" > "Import" in the Grafana sidebar
   b. Enter the ID of a Triton dashboard from grafana.com (12391 is the ID suggested here; check that it resolves to a Triton Inference Server dashboard before relying on it)
   c. Select your Prometheus data source
   d. Click "Import"

8. Monitor and analyze your Triton Inference Server using the Grafana dashboards.
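Before building panels, it can help to confirm which metric names the server actually exposes on its metrics endpoint (`http://localhost:8002/metrics` with the port mapping used here). The sketch below parses Prometheus text-format output and lists the metric names. The function names `fetch_metrics` and `metric_names` are hypothetical, and the sample text is a made-up excerpt with invented values, used only to demonstrate the parsing.

```python
import re
import urllib.request

# A metric line starts with the metric name, optionally followed by {labels}.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def fetch_metrics(url="http://localhost:8002/metrics", timeout=5.0):
    """Download the raw Prometheus text exposition from the server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(text):
    """Return the sorted, de-duplicated metric names found in the text."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _METRIC_NAME.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


# Illustrative excerpt only; the values are invented.
SAMPLE = """\
# TYPE nv_inference_request_success counter
nv_inference_request_success{model="violation",version="1"} 3
nv_inference_request_success{model="other",version="1"} 7
# TYPE nv_gpu_utilization gauge
nv_gpu_utilization{gpu_uuid="GPU-0"} 0.5
"""

print(metric_names(SAMPLE))
```

With a running server, `metric_names(fetch_metrics())` gives the list of names that can be typed into the Grafana query editor.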

Additional notes:

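- `host.docker.internal` in the Prometheus config lets the Prometheus container reach services published on the host machine. Docker Desktop (macOS and Windows) defines this name automatically; on Linux it resolves only if the container is started with `--add-host=host.docker.internal:host-gateway` (Docker 20.10 or later). The same applies to the Grafana data source URL.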
- If containers can't communicate, you may need to create a Docker network and connect all containers to it:

```bash
docker network create triton-monitoring
docker network connect triton-monitoring triton
docker network connect triton-monitoring prometheus
docker network connect triton-monitoring grafana
```

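Then update the Prometheus config to use the container name instead of `host.docker.internal` (for example `triton:8002`), restart Prometheus, and point the Grafana data source at `http://prometheus:9090`. The name `triton` works only if the Triton container was started with `--name triton`; the Triton containers started earlier in this notebook received auto-generated names such as `goofy_bohr`.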

- Remember to adjust volume mounts and ports as necessary for your specific setup.
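Switching between `host.docker.internal` and container names changes only the scrape target in `prometheus.yml`. As a sketch, the file from step 3 can be generated from a small helper so the target is easy to swap. The function name `render_prometheus_config` is hypothetical, and `triton:8002` assumes a container named `triton` on a shared Docker network.

```python
def render_prometheus_config(targets, scrape_interval="15s", job_name="triton"):
    """Return prometheus.yml text with one static scrape job."""
    target_list = ", ".join(f"'{target}'" for target in targets)
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        "\n"
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{target_list}]\n"
    )


# Assumption: the Triton container is named "triton" on a shared network.
config_text = render_prometheus_config(["triton:8002"])
print(config_text)
```

Writing `config_text` to `prometheus/prometheus.yml` and restarting the Prometheus container applies the change.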


# GUI

In [3]:
import requests
import json
import base64
import threading

def send_to_triton(video_base64, client_number):
    url = "http://localhost:8000/v2/models/violation/versions/1/infer"
    
    payload = json.dumps({
        "inputs": [
            {
                "name": "INPUT0",
                "shape": [1],
                "datatype": "BYTES",
                "data": [video_base64]
            },
            {
                "name": "CLIENT_NUMBER",
                "shape": [1],
                "datatype": "INT32",
                "data": [client_number]
            }
        ]
    })
    
    headers = {
        'Content-Type': 'application/json'
    }
    
    response = requests.post(url, headers=headers, data=payload)
    print("Triton response:", json.loads(response.text))

def send_to_flask(video_base64):
    flask_url = "http://localhost:5000/receive_video"
    
    payload = {
        "video": video_base64
    }
    
    headers = {
        'Content-Type': 'application/json'
    }
    
    response = requests.post(flask_url, headers=headers, json=payload)
    print("Flask response:", response.text)

# Read the video file
with open("show.mp4", "rb") as video_file:
    video_bytes = video_file.read()

# Encode the video file to base64
video_base64 = base64.b64encode(video_bytes).decode('utf-8')

# Set the client number (1 or 2)
client_number = 1

# Create threads for sending to Triton and Flask
triton_thread = threading.Thread(target=send_to_triton, args=(video_base64, client_number))
flask_thread = threading.Thread(target=send_to_flask, args=(video_base64,))

# Start both threads
triton_thread.start()
flask_thread.start()

# Wait for both threads to complete
triton_thread.join()
flask_thread.join()

print("Video sent to both Triton and Flask")


Flask response: {
  "message": "Video successfully received and saved."
}

Triton response: {'model_name': 'violation', 'model_version': '1', 'outputs': [{'name': 'OUTPUT0', 'datatype': 'INT32', 'shape': [1], 'data': [1]}]}
Video sent to both Triton and Flask
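The cell above prints each response from inside its thread. If the caller needs the results back, or wants exceptions raised in a worker to surface in the main thread, one option is `concurrent.futures`. This is a sketch under the assumption that `send_to_triton` and `send_to_flask` are changed to return their responses; the helper name `run_in_parallel` is hypothetical, and the demo uses stand-in functions so it runs without either server.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(tasks):
    """Run zero-argument callables concurrently; return {name: result}.

    `tasks` maps a label to a callable. An exception raised inside a
    task is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-ins for the real requests, for demonstration only.
results = run_in_parallel({
    "triton": lambda: {"outputs": [{"name": "OUTPUT0", "data": [1]}]},
    "flask": lambda: "Video successfully received and saved.",
})
print(results["flask"])
```

In the notebook, the stand-ins would be replaced by `lambda: send_to_triton(video_base64, client_number)` and `lambda: send_to_flask(video_base64)`.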


# Frame Output Docker Volume

In [11]:
!docker run -d --shm-size=5G --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $PWD:/mnt/data/model_repository \
  -v ~/saved_frames:/opt/tritonserver/xgb_capt_3 \
  --network bridge \
  -e TZ=Asia/Kolkata \
  custom_triton \
  tritonserver \
  --model-repository=/mnt/data/model_repository \
  --log-verbose=1

e14e6741f0a79dc88bb11f1869e46197bb3e467df71274488ba6a8b1407cb512
