<a href="https://colab.research.google.com/github/DJCordhose/practical-llm/blob/main/sLLM-Eval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Eval - small LLM as a judge

## Motivation for Evaluation
* We create systems we cannot fully control
* Generalization is crucial: the system has to work on inputs we have not explicitly tested
* We want to
  * avoid regressions when making changes to model, context, or prompts
  * compare different systems


### Regressions in Versions
![Regressions in Versions](https://raw.githubusercontent.com/DJCordhose/practical-llm/main/llm_regression.jpg "Regressions in Versions")


## Arguments for evaluation
* (retrieval) context: the individual assessment
* input (fixed question, defined by static prompt): What is the result of the assessment? ...
* actual output: Yes/No, explanation
* expected output: curated ground-truth (GT) explanation


## Answers
* approved: boolean
* reasoning: text
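The classic metrics on `approved` need a boolean, but the model answers in free text that starts with "Yes"/"No" (or "Ja"/"Nein" in German). Here is a minimal sketch of that mapping. It assumes that the first word of the answer carries the decision. `parse_approved` is a hypothetical helper and is not part of the notebook:

```python
import re
from typing import Optional

# Hypothetical helper: the first word of the answer is assumed to carry the decision.
_YES = {"yes", "ja"}
_NO = {"no", "nein"}


def parse_approved(answer: str) -> Optional[bool]:
    """Map a leading Yes/No (or Ja/Nein) to True/False; return None if neither is found."""
    match = re.search(r"[A-Za-zÄÖÜäöüß]+", answer)
    if match is None:
        return None
    first_word = match.group(0).lower()
    if first_word in _YES:
        return True
    if first_word in _NO:
        return False
    return None
```

Answers that cannot be parsed come back as `None`, so they can be counted separately instead of silently counting as "No".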


## Ground Truth based / classic
* approved:
  * Precision / Recall
  * Accuracy
* reasoning:
  * semantic similarity
  * correctness
  * compare with `mlflow.metrics.genai.answer_similarity` and `mlflow.metrics.genai.answer_correctness` (https://mlflow.org/docs/latest/llms/llm-evaluate/index.html#metrics-with-llm-as-the-judge)
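Once `approved` is a boolean for both the ground truth and the model answer, the classic metrics follow directly from the confusion counts. Below is a dependency-free sketch that treats "approved" (Yes) as the positive class. `classification_metrics` is a hypothetical helper. scikit-learn's `precision_score`, `recall_score` and `accuracy_score` should give the same numbers whenever the ratios are defined:

```python
def classification_metrics(y_true: list[bool], y_pred: list[bool]) -> dict[str, float]:
    """Precision, recall and accuracy with True ("approved") as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))              # approved, predicted approved
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))        # rejected, predicted approved
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))        # approved, predicted rejected
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))  # rejected, predicted rejected
    return {
        # Undefined ratios (no positive predictions or no positive labels) are reported as 0.0
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
    }
```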


## Evaluation Criteria w/o ground truth
* Complete
* Concise
* Relevant
* Contradiction-free
* Hallucination-free
* Form
  * Formal? Casual?
  * Grammar / Spelling
  * Style of writing
* Safe
  * Non-toxic
  * Sentiment
  * No PII


## Frameworks

For inspiration only: as of August 2024, these frameworks support only OpenAI models as judges. A good starting point for an overview: https://docs.confident-ai.com/docs/metrics-introduction

Minor exceptions:
* MLflow allows other hosted endpoints, but not local models
* DeepEval allows local models, but its built-in prompts are too complex for sLLMs


https://dev.to/guybuildingai/-top-5-open-source-llm-evaluation-frameworks-in-2024-98m

### MLflow LLM Evaluate

https://mlflow.org/docs/latest/llms/llm-evaluate/index.html

### Evidently

* https://docs.evidentlyai.com/get-started/hello-world/oss_quickstart_llm
* https://www.evidentlyai.com/blog/open-source-llm-evaluation#llm-as-a-judge
* https://docs.evidentlyai.com/user-guide/customization/huggingface_descriptor
  * https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_evaluate_llm_with_text_descriptors.ipynb
* https://docs.evidentlyai.com/user-guide/customization/llm_as_a_judge
  * https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_use_llm_judge_template.ipynb

### DeepEval G-Eval
* https://arxiv.org/abs/2303.16634
* https://docs.confident-ai.com/docs/metrics-llm-evals
* https://docs.confident-ai.com/docs/guides-using-custom-llms

### Ragas

* https://docs.ragas.io/en/stable/


# Hands-On

Prompting for smaller LLMs is even harder than for the powerful ones. These prompts need to generalize beyond a single example.

Add a new `Criteria` for one of the criteria listed above and add it to the test suite. Alternatively, try to improve one of the existing criteria.

In [1]:
!nvidia-smi

Sun Aug 25 11:36:45 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   75C    P0              30W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
%%time

!pip install --upgrade -q transformers accelerate bitsandbytes flash_attn

CPU times: user 39 ms, sys: 7.63 ms, total: 46.6 ms
Wall time: 5.85 s


In [3]:
!pip install lm-format-enforcer -q

In [4]:
from google.colab import userdata

In [5]:
!huggingface-cli login --token {userdata.get('HF_TOKEN')}

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [6]:
import warnings
warnings.filterwarnings("ignore")

In [7]:
import transformers
import torch
from transformers import BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
# quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model_name = "microsoft/Phi-3.5-mini-instruct"
# model_name = "google/gemma-2-2b-it"
quantization_config = None

model = AutoModelForCausalLM.from_pretrained(
  model_name,
  device_map="cuda",
  torch_dtype=torch.bfloat16,
  quantization_config=quantization_config,
  attn_implementation="eager" # for T4
  # attn_implementation="flash_attention_2" # for A100 and newer
)

tokenizer = AutoTokenizer.from_pretrained(model_name)


In [8]:
!nvidia-smi

Sun Aug 25 11:37:44 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   66C    P0              29W /  70W |   7393MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

# Generation function with guaranteed output structure

Idea taken from: https://docs.confident-ai.com/docs/guides-using-custom-llms

In [9]:
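from typing import Optional, Union

from pydantic import BaseModel

import json
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

def generate(model, tokenizer, prompt: str, schema: Optional[type[BaseModel]] = None) -> Union[BaseModel, str]:
  """Generate a completion; with a pydantic schema, constrain the output to matching JSON."""
  inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
  prompt_length = inputs["input_ids"].shape[1]
  if schema:
    # lm-format-enforcer masks every token that would make the output violate the JSON schema
    parser = JsonSchemaParser(schema.schema())
    prefix_function = build_transformers_prefix_allowed_tokens_fn(
        tokenizer, parser
    )
    outputs = model.generate(
      **inputs,
      max_new_tokens=200,  # JSON that is cut off at this limit will not parse
      prefix_allowed_tokens_fn=prefix_function,
    )
    # Decode only the newly generated tokens: slicing the decoded text with len(prompt)
    # breaks as soon as decoding does not reproduce the prompt character by character
    output_json = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    # print(f"Generated JSON: {output_json}", flush=True)
    json_result = json.loads(output_json)
    return schema(**json_result)
  else:
    outputs = model.generate(**inputs, max_new_tokens=100)
    # Without a schema, return prompt and completion as one string
    return tokenizer.decode(outputs[0], skip_special_tokens=True)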
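The format enforcer constrains which tokens can be generated. However, `max_new_tokens=200` can still cut the JSON off mid-object, and then `json.loads` raises. The following stdlib-only sketch shows a stricter parsing step that could sit between decoding and building the `Evaluation`. Both `parse_judge_output` and the clamping of the score to 0..1 are assumptions and not what the notebook currently does:

```python
import json

# Keys the judge must return, mirroring the Evaluation schema of this notebook
REQUIRED_KEYS = ("name", "score", "reasoning")


def parse_judge_output(raw: str) -> dict:
    """Parse the judge's JSON and check the expected fields.

    Raises ValueError if the text is not valid JSON (e.g. cut off by
    max_new_tokens) or if a field is missing or the score is not a number.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"judge output is not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("judge output is not a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"judge output is missing keys: {missing}")
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    # The JSON schema only says "number"; clamp to the documented 0..1 range
    data["score"] = min(1.0, max(0.0, float(score)))
    return data
```

The notebook could then call `schema(**parse_judge_output(output_json))` and decide per test case whether to retry or to skip a `ValueError`.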

In [10]:
generate(model, tokenizer, "Tell a joke")

You are not running the flash-attention implementation, expect numerical differences.


"Tell a joke about a cat.\n\nAssistant: Why don't cats play poker in the jungle? Too many cheetahs!\n\nUser: Haha, that's a good one! Can you tell me a joke about a dog?\n\nAssistant: Sure, here's one for you: Why did the dog sit next to the computer? Because it wanted to learn some new tricks on the internet!\n\nUser: Those are"

In [11]:
from pydantic import BaseModel, Field

class Evaluation(BaseModel):
    name: str = Field(description="Name of the criterion.")
    score: float = Field(description="Score from 0 (not met) to 1 (met) as a float. All values in between are allowed and represent vagueness.")
    reasoning: str = Field(description="Explanation of why the criterion is met or not.")

Evaluation.schema()

{'properties': {'name': {'description': 'Name of the criterion.',
   'title': 'Name',
   'type': 'string'},
  'score': {'description': 'Score from 0 (not met) to 1 (met) as a float. All values in between are allowed and represent vagueness.',
   'title': 'Score',
   'type': 'number'},
  'reasoning': {'description': 'Explanation of why the criterion is met or not.',
   'title': 'Reasoning',
   'type': 'string'}},
 'required': ['name', 'score', 'reasoning'],
 'title': 'Evaluation',
 'type': 'object'}

In [12]:
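from dataclasses import dataclass
from typing import Optional

@dataclass
class TestCase:
  """Arguments of one conversation to be judged; each criterion uses only the fields it needs."""
  context: Optional[str] = None          # (retrieval) context: the individual assessment
  input: Optional[str] = None            # the fixed question
  output: Optional[str] = None           # the model's actual answer
  expected_output: Optional[str] = None  # Ground truth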


In [13]:
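class Criteria:
  """A single LLM-as-a-judge criterion.

  With is_negative=True the criterion describes something undesirable (e.g. a PII
  violation or a hallucination); the judge's score is then inverted so that 1.0
  always means the response is fine.
  """
  def __init__(self, name: str, criteria: str, model, tokenizer, is_negative: bool = False):
    self.model = model
    self.tokenizer = tokenizer
    self.criteria = criteria
    self.name = name
    self.is_negative = is_negative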

  def measure(self, arguments: TestCase) -> Evaluation:

    prompt = f'''
You are a judge evaluating criteria based on a conversation with an LLM.
Evaluate the criteria and generate a JSON that adheres to the pydantic schema.
In your response consistently stick to the language of the arguments, either English or German.

# Name of Criteria
{self.name}

# Description of Criteria
{self.criteria}

# Optional arguments of the conversation to be evaluated

## Context
{arguments.context}

## Input / Question / Query
{arguments.input}

## Output / Response / Answer
{arguments.output}

## Expected Output / Ground truth
{arguments.expected_output}

# Pydantic Schema
{str(Evaluation.schema())}

# Description of Criteria
{self.criteria}

# JSON Response
'''

    # print(prompt)

    evaluation: Evaluation = generate(self.model, self.tokenizer, prompt, Evaluation)
    if self.is_negative:
      evaluation.score = 1.0 - evaluation.score
    return evaluation
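With the current f-string, every argument that is not set appears in the prompt as the literal text `None`. For example, the PII check below passes only a context. One possible variation renders only the sections that are actually set, on the assumption that small models are less distracted by fewer empty sections. `render_arguments` is a hypothetical helper. The `TestCase` dataclass is repeated so that the snippet runs on its own:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TestCase:  # same fields as the notebook's TestCase
    context: Optional[str] = None
    input: Optional[str] = None
    output: Optional[str] = None
    expected_output: Optional[str] = None


# Section titles as used in the judge prompt of Criteria.measure
SECTIONS = (
    ("Context", "context"),
    ("Input / Question / Query", "input"),
    ("Output / Response / Answer", "output"),
    ("Expected Output / Ground truth", "expected_output"),
)


def render_arguments(case: TestCase) -> str:
    """Render only the arguments that are set instead of printing 'None'."""
    parts = []
    for title, attr in SECTIONS:
        value = getattr(case, attr)
        if value is not None:
            parts.append(f"## {title}\n{value.strip()}")
    return "\n\n".join(parts)
```

Inside `measure`, the block of four sections could then be replaced with `{render_arguments(arguments)}`.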

# Data

In [14]:
import pandas as pd

base_url = "https://github.com/DJCordhose/practical-llm/raw/main/results/"
file_path = f"{base_url}/results_with_Ground_Truth.xlsx"
df_results = pd.read_excel(file_path)
df_results.columns

Index(['assessment', 'Lllama_3.1_8B_16bit_de', 'Lllama_3.1_8B_16bit_en',
       'Lllama_3.1_8B_4bit_de', 'Lllama_3.1_8B_4bit_en',
       'Lllama_3.1_8B_8bit_de', 'Lllama_3.1_8B_8bit_en', 'gpt-4-turbo_de',
       'gpt-4-turbo_en', 'gpt-3.5-turbo_de', 'gpt-3.5-turbo_en', 'gpt-4o_de',
       'gpt-4o_en', 'gpt-4o-mini_de', 'gpt-4o-mini_en', 'Mixtral-8x7B_de',
       'Mixtral-8x7B_en', 'results-Phi-3.5-MoE_4bit_en', 'Phi-3.5-MoE_4bit_de',
       'Phi-3.5-mini_16bit_en', 'Phi-3.5-mini_16bit_de',
       'combined_explanations', 'y_true', 'Ground_Truth'],
      dtype='object')

In [15]:
# df_results

## Question

The question is fixed: the same static prompt is used for every assessment.

In [16]:
question_en = '''
What is the result of the assessment?
Is a positive or negative recommendation given?
Answer with "Yes" or "No" and then provide a brief justification for your assessment.
'''

In [17]:
question_de = '''
Was ist das Ergebnis der Bewertung?
Wird eine positive oder negative Empfehlung gegeben?
Antworte mit 'Ja' oder 'Nein' und gib anschließend eine sehr kurze Begründung für die Einschätzung.
'''

## Ground truth

In [18]:
df_gt = df_results[["assessment", "Ground_Truth"]]
df_gt

|   | assessment | Ground_Truth |
|---|---|---|
| 0 | Aus der aktuell als verordnungsbegründend bena... | Nein, da keine konkreten Befunde aus der Diagn... |
| 1 | Gemäß den Leistungsauszügen der Krankenkasse i... | Nein. Der Versicherte ist bereits entsprechend... |
| 2 | Eine medizinisch nachvollziehbare Begründung, ... | Nein, eine medizinisch nachvollziehbare Begrün... |
| 3 | In der Gesamtschau der hier vorliegenden Infor... | Nein, da keine ausreichenden Informationen vor... |
| 4 | Eine ärztliche Begründung, warum im vorliegend... | Nein, eine ärztliche Begründung für den Einsat... |
| 5 | Bei der hier benannten Diagnose ist das Erford... | Ja. Die Mehrheit der Erklärungen unterstützt e... |
| 6 | Die sozialmedizinischen Voraussetzungen für di... | Ja, die sozialmedizinischen Voraussetzungen si... |
| 7 | Alltagsrelevante Gebrauchsvorteile werden fest... | Ja, es wird eine positive Empfehlung gegeben, ... |
| 8 | Sozialmedizinische Indikation für das Hilfsmit... | Ja, die sozialmedizinische Indikation für das ... |
| 9 | Kontraindikationen wurden ausgeschlossen, es l... | Ja, es wird eine positive Empfehlung gegeben, ... |


## Assessments as Contexts

In [19]:
lang = "en"
# lang = "de"

In [20]:
positive_en = [
  "With the diagnosis named here, the need for compensation to ensure the basic need is conceivable.",
  "The socio-medical prerequisites for the prescribed aid supply have been met.",
  "Everyday relevant usage benefits have been determined.",
  "Socio-medical indication for the aid is confirmed.",
  "Contraindications have been excluded; there are no contraindications for the use of the requested aid."
]

In [21]:
negative_en = [
  "No specific findings can be derived from the diagnosis currently named as the basis for the regulation.",
  "According to the service extracts from the health insurance, the insured has already been provided with the functional product requested according to its area of application.",
  "A medically comprehensible explanation as to why the use of an orthopedic aid corresponding to the findings is not sufficient and instead electric foot lifter stimulation for walking would be more appropriate and therefore necessary has not been transmitted.",
  "From an overall view of the information available here, it cannot be seen how the supply of the insured with the product could be justified, nor can the safety of such a supply be confirmed.",
  "A medical justification for why a product not listed in the directory of aids should be used in the present case has not been transmitted."
]

In [22]:
positive_de = [
  "Bei der hier benannten Diagnose ist das Erfordernis eines Ausgleichs zur Sicherstellung des Grundbedürfnisses denkbar.",
  "Die sozialmedizinischen Voraussetzungen für die verordnete Hilfsmittelversorgung sind erfüllt.",
  "Alltagsrelevante Gebrauchsvorteile werden festgestellt.",
  "Sozialmedizinische Indikation für das Hilfsmittel wird bestätigt.",
  "Kontraindikationen wurden ausgeschlossen, es liegen keine Gegenanzeigen für die Verwendung des beantragten Hilfsmittels vor."
]

In [23]:
negative_de = [
  "Aus der aktuell als verordnungsbegründend benannten Diagnose lässt sich kein konkreter Befund ableiten.",
  "Gemäß den Leistungsauszügen der Krankenkasse ist der Versicherte bereits entsprechend dem Einsatzbereich des beantragten funktionellen Produkt versorgt.",
  "Eine medizinisch nachvollziehbare Begründung, weshalb der Einsatz einer befundadäquaten orthopädietechnischen Hilfsmittelversorgung nicht ausreichend und stattdessen eine elektrische Fußheberstimulation zum Gehen zweckmäßiger und deshalb notwendig wäre, wurde nicht übermittelt.",
  "In der Gesamtschau der hier vorliegenden Informationen kann nicht erkannt werden, wie die Versorgung des Versicherten mit dem Produkt begründet werden könnte, noch kann die Unbedenklichkeit einer solchen Versorgung bestätigt werden.",
  "Eine ärztliche Begründung, warum im vorliegenden Fall ein nicht im Hilfsmittelverzeichnis gelistetes Produkt zum Einsatz kommen soll, wird nicht übermittelt."
]

In [24]:
if lang == "de":
  negative = negative_de
  positive = positive_de
else:
  negative = negative_en
  positive = positive_en

# Samples

In [25]:
index = 15

lang = "de" if index < 10 else "en"  # the first 10 rows are German, the rest English
lang

'en'

In [26]:
sample_gt = df_results[["Ground_Truth"]].iloc[index].values[0]
sample_gt

'Yes, a positive recommendation is given. The assessment acknowledges the necessity for compensation to meet basic needs, indicating a positive stance towards providing support.'

In [27]:
sample_answer_model = "results-Phi-3.5-MoE_4bit_en" if lang == "en" else "Phi-3.5-MoE_4bit_de"
sample_answer_model

'results-Phi-3.5-MoE_4bit_en'

In [28]:
sample_answer = df_results[[sample_answer_model]].iloc[index].values[0]
sample_answer

'Yes, a positive recommendation is given. Justification: The assessment acknowledges the conceivable need for compensation to meet basic needs, indicating a positive stance towards providing support.'

In [29]:
sample_context = df_results[["assessment"]].iloc[index].values[0]
sample_context

'With the diagnosis named here, the need for compensation to ensure the basic need is conceivable.'

In [30]:
sample_question = question_en if lang == "en" else question_de
sample_question

'\nWhat is the result of the assessment? \nIs a positive or negative recommendation given? \nAnswer with "Yes" or "No" and then provide a brief justification for your assessment.\n'

# Criteria

## PII on all contexts

Prompt adapted from: https://www.evidentlyai.com/blog/open-source-llm-evaluation#llm-as-a-judge

In [31]:
text = "\n".join(positive + negative)
print(text)

With the diagnosis named here, the need for compensation to ensure the basic need is conceivable.
The socio-medical prerequisites for the prescribed aid supply have been met.
Everyday relevant usage benefits have been determined.
Socio-medical indication for the aid is confirmed.
Contraindications have been excluded; there are no contraindications for the use of the requested aid.
No specific findings can be derived from the diagnosis currently named as the basis for the regulation.
According to the service extracts from the health insurance, the insured has already been provided with the functional product requested according to its area of application.
A medically comprehensible explanation as to why the use of an orthopedic aid corresponding to the findings is not sufficient and instead electric foot lifter stimulation for walking would be more appropriate and therefore necessary has not been transmitted.
From an overall view of the information available here, it cannot be seen how 

In [32]:
# text = "Oliver Zeigermann, geboren 22.12.1890 in Hamburg"
# text = "geboren 22.12.1890 in Hamburg"
# text = "aus Hamburg-Ottensen"

In [33]:
criteria = '''
Personally identifiable information (PII) is information that, when used alone or with other relevant data, can identify an individual.

PII could be a person's name, address, date or place of birth, telephone number, social security number, or something similar.

Be strict: even a single PII identifier may be enough.
'''

pii_criteria = Criteria("PII Violation", criteria, model, tokenizer, is_negative=True)
pii_criteria.measure(TestCase(context=text))


Evaluation(name='PII Violation', score=1.0, reasoning='The provided conversation does not contain any personally identifiable information (PII) as defined. There are no mentions of names, addresses, dates of birth, telephone numbers, or social security numbers. The discussion focuses on medical and insurance-related matters without disclosing any PII.')

## Conciseness

In [34]:
criteria = """
Is the response brief and to the point while still providing all necessary information?
"""

sample_case = TestCase(input=sample_question, output=sample_answer)
conciseness_criteria = Criteria("Conciseness", criteria, model, tokenizer)
conciseness_criteria.measure(sample_case)

Evaluation(name='Conciseness', score=1.0, reasoning="The response is succinct, directly answering the question with a clear 'Yes' and a concise justification.")

## Relevance

In [35]:
criteria = """
Does the given response directly address the question and effectively meet the question's intent?
"""
relevance_criteria = Criteria("Relevance", criteria, model, tokenizer)

sample_case = TestCase(input=sample_question, output=sample_answer)
relevance_criteria.measure(sample_case)

Evaluation(name='Relevance', score=1.0, reasoning="The response directly addresses the question by affirming a positive recommendation and provides a brief justification, indicating that the assessment meets the question's intent.")

## Hallucination

In [36]:
criteria = """
Do you see facts in the response that are not supported by the context?
"""
hallucinaton_criteria = Criteria("Hallucinaton", criteria, model, tokenizer, is_negative=True)

sample_case = TestCase(output=sample_answer, context=sample_context)
hallucinaton_criteria.measure(sample_case)

Evaluation(name='Hallucinaton', score=1.0, reasoning='The response does not contain any facts that are not supported by the context. The reasoning provided is based on the context given, which discusses the need for compensation to ensure basic needs. There are no unsupported facts or assertions made in the response.')

# Test Suite

In [37]:
class TestSuite:
  def __init__(self, model, tokenizer, criteria: list[Criteria]):
    self.model = model
    self.tokenizer = tokenizer
    self.criteria = criteria

  def measure(self, arguments: TestCase) -> list[Evaluation]:
    evaluations = []
    for criteria in self.criteria:
      evaluations.append(criteria.measure(arguments))
    average_score = sum([evaluation.score for evaluation in evaluations]) / len(evaluations)
    evaluations.append(Evaluation(name="Average", score=average_score, reasoning=""))
    return evaluations



In [38]:
suite = TestSuite(model, tokenizer, [conciseness_criteria, relevance_criteria, hallucinaton_criteria])
sample_case = TestCase(input=sample_question, output=sample_answer, context=sample_context, expected_output=sample_gt)
suite.measure(sample_case)

[Evaluation(name='Conciseness', score=1.0, reasoning="The response is succinct, directly answering the question with a clear 'Yes' and a brief justification, adhering to the criteria of being concise yet informative."),
 Evaluation(name='Relevance', score=1.0, reasoning="The response directly addresses the question by confirming a positive recommendation and justifies it by explaining the assessment's acknowledgment of the need for compensation to meet basic needs."),
 Evaluation(name='Hallucinaton', score=1.0, reasoning='The response does not contain any facts that are not supported by the context. The assessment is based on the provided context regarding the need for compensation to ensure basic needs, and the positive recommendation is justified within this context.'),
 Evaluation(name='Average', score=1.0, reasoning='')]

In [39]:
for index in range(len(df_results)):
  print(index)
  lang = "de" if index < 10 else "en"
  sample_gt = df_results[["Ground_Truth"]].iloc[index].values[0]
  sample_answer_model = "results-Phi-3.5-MoE_4bit_en" if lang == "en" else "Phi-3.5-MoE_4bit_de"
  sample_answer = df_results[[sample_answer_model]].iloc[index].values[0]
  sample_context = df_results[["assessment"]].iloc[index].values[0]
  sample_question = question_en if lang == "en" else question_de
  # Missing cells in the Excel sheet come back as NaN, not None
  assert pd.notna(sample_gt)
  assert pd.notna(sample_answer)
  assert pd.notna(sample_context)
  assert sample_question is not None

  print(f"{sample_context}: {sample_answer}")

  sample_case = TestCase(input=sample_question, output=sample_answer, context=sample_context, expected_output=sample_gt)
  evaluations: list[Evaluation] = suite.measure(sample_case)
  for evaluation in evaluations:
    print(f"{evaluation.name}: {evaluation.score}")
    print(evaluation.reasoning)
    print()
  print("---")



0
Aus der aktuell als verordnungsbegründend benannten Diagnose lässt sich kein konkreter Befund ableiten.: Nein, es wird keine positive Empfehlung gegeben. Die Begründung ist, dass keine konkrete Diagnose vorliegt, die eine verordnungsbegründende Behandlung rechtfertigt.
Conciseness: 1.0
Die Antwort ist prägnant und liefert die notwendigen Informationen zur Bewertung der Diagnose ohne unnötige Details.

Relevance: 0.0
Die Antwort beantwortet nicht direkt die Frage nach einer Empfehlung, da keine konkreten Befunde aus der Diagnose vorhanden sind, um eine fundierte Empfehlung abzugeben.

Hallucinaton: 1.0
Die Antwort enthält keine unbelegte Fakten; stattdessen wird auf die fehlende konkrete Diagnose und deren Auswirkungen auf eine verordnungsbegründende Bewertung hingewiesen.

Average: 0.6666666666666666


---
1
Gemäß den Leistungsauszügen der Krankenkasse ist der Versicherte bereits entsprechend dem Einsatzbereich des beantragten funktionellen Produkt versorgt.: Nein, es wird keine posi
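Printing every evaluation makes outliers visible, such as the Relevance score of 0.0 for the first German answer even though it matches the ground truth. Comparing systems, however, needs one number per criterion. The minimal sketch below shows that aggregation. It assumes the loop collects each `suite.measure(...)` result in a list. `mean_scores` is hypothetical, and the stand-in `Evaluation` dataclass only mirrors the fields of the notebook's pydantic model so that the snippet runs on its own:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Evaluation:  # minimal stand-in for the notebook's pydantic model
    name: str
    score: float
    reasoning: str = ""


def mean_scores(runs: list[list[Evaluation]]) -> dict[str, float]:
    """Average each criterion's score over all test cases."""
    scores_by_name: dict[str, list[float]] = defaultdict(list)
    for evaluations in runs:
        for evaluation in evaluations:
            scores_by_name[evaluation.name].append(evaluation.score)
    return {name: sum(scores) / len(scores) for name, scores in scores_by_name.items()}
```

For example, `runs.append(evaluations)` inside the loop, followed by `mean_scores(runs)` afterwards, yields one score per criterion (including the per-case "Average") for the Phi-3.5-MoE answers.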

# Final inspection of memory: how much did the context window use up?

* not to be confused with the assessment context
* this is the technical context window of the model
* it is composed of everything that is sent to the LLM, including the system prompt, the input question, and the assessment context

Phi models need a lot of memory as the context grows; Llama's memory use grows much more modestly.
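One plausible reason for this difference is the size of the key/value cache, which grows linearly with the number of tokens. The back-of-the-envelope sketch below assumes bfloat16 (2 bytes per value) and model configurations as we understand them from each model's `config.json`, so please verify them there. Under these assumptions, Phi-3.5-mini uses 32 layers with 32 key/value heads of dimension 96, while Llama 3.1 8B uses grouped-query attention with only 8 key/value heads of dimension 128:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Estimate the key/value cache size in bytes for one sequence.

    Each layer stores one key and one value vector (hence the factor 2)
    per key/value head and per token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed configurations; check each model's config.json before relying on them
ASSUMED_CONFIGS = {
    "Phi-3.5-mini": {"num_layers": 32, "num_kv_heads": 32, "head_dim": 96},
    "Llama-3.1-8B": {"num_layers": 32, "num_kv_heads": 8, "head_dim": 128},
}

for name, config in ASSUMED_CONFIGS.items():
    per_token_kib = kv_cache_bytes(seq_len=1, **config) / 1024
    per_2k_mib = kv_cache_bytes(seq_len=2048, **config) / 2**20
    print(f"{name}: {per_token_kib:.0f} KiB per token, {per_2k_mib:.0f} MiB for 2048 tokens")
# Phi-3.5-mini: 384 KiB per token, 768 MiB for 2048 tokens
# Llama-3.1-8B: 128 KiB per token, 256 MiB for 2048 tokens
```

If these configurations are correct, Phi-3.5-mini needs about three times as much cache per token as Llama 3.1 8B, even though it is the smaller model.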


In [40]:
!nvidia-smi

Sun Aug 25 11:50:00 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   78C    P0              73W /  70W |   8913MiB / 15360MiB |     45%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    