<a href="https://colab.research.google.com/github/DJCordhose/practical-llm/blob/main/Assessment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multilingual Medical Assessment Classification

## Llama 3.1 8B
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

### Measurements

#### 4 Bit T4
* Negatives: 35 s
* Positives: 38 s

#### 8 Bit T4
* Negatives: 23 s
* Positives: 23 s

#### Full Resolution (bfloat16) L4
* Negatives: 12.5 s
* Positives: 12.5 s

## Phi 3.5 MoE
* https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/discover-the-new-multi-lingual-high-quality-phi-3-5-slms/ba-p/4225280
* https://huggingface.co/microsoft/Phi-3.5-mini-instruct
* https://huggingface.co/microsoft/Phi-3.5-MoE-instruct

### Measurements

#### 4 Bit A100
* Negatives: 50 s
* Positives: 50 s
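
To make the timings above easier to compare, they can be normalized to seconds per assessment. This is a minimal sketch under one assumption that the measurements do not state explicitly: each timing is taken to cover one loop over the five assessments of a class, as in the evaluation cells of this notebook. The names `measurements` and `BATCH_SIZE` are illustrative only.

```python
# Wall-clock seconds per class, copied from the measurement lists above.
measurements = {
    ("Llama 3.1 8B", "4 bit", "T4"): {"negatives": 35.0, "positives": 38.0},
    ("Llama 3.1 8B", "8 bit", "T4"): {"negatives": 23.0, "positives": 23.0},
    ("Llama 3.1 8B", "full", "L4"): {"negatives": 12.5, "positives": 12.5},
    ("Phi 3.5 MoE", "4 bit", "A100"): {"negatives": 50.0, "positives": 50.0},
}

# Assumption: every timing covers the five assessments of one class.
BATCH_SIZE = 5


def seconds_per_assessment(timings: dict) -> float:
    """Average seconds per single assessment across both classes."""
    total_seconds = timings["negatives"] + timings["positives"]
    return total_seconds / (2 * BATCH_SIZE)


for (model, precision, gpu), timings in measurements.items():
    per_item = seconds_per_assessment(timings)
    print(f"{model:<13} {precision:<6} {gpu:<5} {per_item:5.2f} s per assessment")
```

The timings come from different GPUs, so the per-assessment numbers compare whole setups (model, precision, and hardware together), not quantization levels in isolation.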


# Hands-on: Inference on commodity hardware

**Run the assessment classification on the largest Llama 3.1 model possible**

1. Form a subjective impression: do you like the results in terms of, for example:
   1. Quality of language
   1. Hallucination
   1. Accuracy
   1. Speed
   1. Memory consumption
1. If more than one Llama 3.1 model actually runs, subjectively compare the results

## Optional
1. Compare to Phi Mini
1. Try a different language (German is prepared)

## Links
- How much memory Llama 3.1 needs for weights and KV cache (depending on the context length actually used): https://huggingface.co/blog/llama31#inference-memory-requirements
- Detailed but still comprehensible explanation of how LLM inference works: https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/
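
As a rough companion to the first link, the memory needed for weights and KV cache can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, not a measurement: the parameter count and architecture constants below are assumptions about Llama 3.1 8B that do not come from this notebook, and the estimate ignores quantization metadata, activations, and the CUDA context.

```python
# Rough memory estimate for model weights and KV cache.
# All model constants are assumptions for Llama 3.1 8B, not values from this notebook.
N_PARAMS = 8.03e9   # assumed parameter count
N_LAYERS = 32       # assumed number of transformer layers
N_KV_HEADS = 8      # assumed number of key/value heads (grouped-query attention)
HEAD_DIM = 128      # assumed dimension per attention head

GIB = 1024 ** 3


def weight_gib(bits_per_param: float, n_params: float = N_PARAMS) -> float:
    """Memory for the weights alone at the given precision."""
    return n_params * bits_per_param / 8 / GIB


def kv_cache_gib(context_tokens: int, bytes_per_value: int = 2) -> float:
    """Keys and values for every layer, KV head, and token of one sequence."""
    per_token_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return context_tokens * per_token_bytes / GIB


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gib(bits):5.1f} GiB")

for tokens in (1_000, 8_000, 128_000):
    print(f"KV cache for {tokens:>7,} tokens: {kv_cache_gib(tokens):6.2f} GiB")
```

Under these assumptions the 16-bit weights alone come close to the 15360 MiB of a T4, which is why the quantized variants are the practical choice on that GPU. For the short prompts used in this notebook, the KV cache is small compared to the weights.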

In [1]:
%%time

!pip install --upgrade -q transformers accelerate flash_attn torch bitsandbytes

CPU times: user 13.5 ms, sys: 8.41 ms, total: 21.9 ms
Wall time: 2.91 s


In [2]:
from google.colab import userdata

In [3]:
# Store your Hugging Face token as a Colab secret named HF_TOKEN (key icon in the left panel)
!huggingface-cli login --token {userdata.get('HF_TOKEN')}

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [4]:
import warnings
warnings.filterwarnings("ignore")

In [5]:
from IPython.display import Markdown

In [6]:
!nvidia-smi

Sun Aug 25 15:15:05 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [7]:
# kind = 'Lllama_3.1_8B_4bit'
kind = 'Lllama_3.1_8B_8bit'
# kind = 'Lllama_3.1_8B_16bit'
# kind = 'Phi-3.5-MoE_4bit'
# kind = "Phi-3.5-mini_16bit"

# lang = "de"
lang = "en"

In [8]:
if "Lllama_3.1_8B" in kind:
  model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
elif "Phi-3.5-MoE" in kind:
  model_id = "microsoft/Phi-3.5-MoE-instruct"
else:
  model_id = "microsoft/Phi-3.5-mini-instruct"

print(model_id)

meta-llama/Meta-Llama-3.1-8B-Instruct


In [9]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

***Note:*** run `watch -n 0.5 nvidia-smi` in a terminal to watch GPU usage and see when the model is loaded onto the GPU.

In [10]:
%%time

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

torch_dtype = None
quantization_config = None


if "8bit" in kind:
  print("Using 8Bit quantization")
  quantization_config = BitsAndBytesConfig(load_in_8bit=True)
elif "4bit" in kind:
  print("Using 4Bit quantization")
  quantization_config = BitsAndBytesConfig(load_in_4bit=True)
else:
  print("Using Full Resolution")
  torch_dtype = torch.bfloat16

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype=torch_dtype,
    device_map="cuda",
    trust_remote_code=True
)

Using 8Bit quantization


CPU times: user 24 s, sys: 10.8 s, total: 34.8 s
Wall time: 13.2 s


In [11]:
!nvidia-smi

Sun Aug 25 15:15:22 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0              25W /  70W |   8825MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [12]:
positive_en = [
  "With the diagnosis named here, the need for compensation to ensure the basic need is conceivable.",
  "The socio-medical prerequisites for the prescribed aid supply have been met.",
  "Everyday relevant usage benefits have been determined.",
  "Socio-medical indication for the aid is confirmed.",
  "Contraindications have been excluded; there are no contraindications for the use of the requested aid."
]

In [13]:
negative_en = [
  "No specific findings can be derived from the diagnosis currently named as the basis for the regulation.",
  "According to the service extracts from the health insurance, the insured has already been provided with the functional product requested according to its area of application.",
  "A medically comprehensible explanation as to why the use of an orthopedic aid corresponding to the findings is not sufficient and instead electric foot lifter stimulation for walking would be more appropriate and therefore necessary has not been transmitted.",
  "From an overall view of the information available here, it cannot be seen how the supply of the insured with the product could be justified, nor can the safety of such a supply be confirmed.",
  "A medical justification for why a product not listed in the directory of aids should be used in the present case has not been transmitted."
]

In [14]:
positive_de = [
  "Bei der hier benannten Diagnose ist das Erfordernis eines Ausgleichs zur Sicherstellung des Grundbedürfnisses denkbar.",
  "Die sozialmedizinischen Voraussetzungen für die verordnete Hilfsmittelversorgung sind erfüllt.",
  "Alltagsrelevante Gebrauchsvorteile werden festgestellt.",
  "Sozialmedizinische Indikation für das Hilfsmittel wird bestätigt.",
  "Kontraindikationen wurden ausgeschlossen, es liegen keine Gegenanzeigen für die Verwendung des beantragten Hilfsmittels vor."
]

In [15]:
negative_de = [
  "Aus der aktuell als verordnungsbegründend benannten Diagnose lässt sich kein konkreter Befund ableiten.",
  "Gemäß den Leistungsauszügen der Krankenkasse ist der Versicherte bereits entsprechend dem Einsatzbereich des beantragten funktionellen Produkt versorgt.",
  "Eine medizinisch nachvollziehbare Begründung, weshalb der Einsatz einer befundadäquaten orthopädietechnischen Hilfsmittelversorgung nicht ausreichend und stattdessen eine elektrische Fußheberstimulation zum Gehen zweckmäßiger und deshalb notwendig wäre, wurde nicht übermittelt.",
  "In der Gesamtschau der hier vorliegenden Informationen kann nicht erkannt werden, wie die Versorgung des Versicherten mit dem Produkt begründet werden könnte, noch kann die Unbedenklichkeit einer solchen Versorgung bestätigt werden.",
  "Eine ärztliche Begründung, warum im vorliegenden Fall ein nicht im Hilfsmittelverzeichnis gelistetes Produkt zum Einsatz kommen soll, wird nicht übermittelt."
]

In [16]:

if lang == "de":
  negative = negative_de
  positive = positive_de
else:
  negative = negative_en
  positive = positive_en



In [17]:
assessment = negative[0]
# assessment = positive[0]

In [18]:
%%time

if lang == "de":
  messages = [
    {"role": "system", "content": "Du bist ein kompetenter Experte auf dem Gebiet der gesetzlichen Krankenversicherung und sprichst Deutsch. Antworte präzise, ernst und formell."},
    {"role": "user", "content": f'''
    Was ist das Ergebnis der Bewertung? Wird eine positive oder negative Empfehlung gegeben? Antworte mit 'Ja' oder 'Nein' und gib anschließend eine sehr kurze Begründung für die Einschätzung.

    # Assessment
    {assessment}

    '''}
  ]
else:
  messages = [
    {"role": "system", "content": "You are an English-speaking, competent expert in the field of statutory health insurance. Answer concisely, seriously, and formally."},
    {"role": "user", "content": f'''
What is the result of the assessment? Is a positive or negative recommendation given? Answer with "Yes" or "No" and then provide a brief justification for your assessment.

# Assessment
{assessment}

'''}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=False
)
response = outputs[0][input_ids.shape[-1]:]
Markdown(tokenizer.decode(response, skip_special_tokens=True))

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


CPU times: user 10.9 s, sys: 327 ms, total: 11.2 s
Wall time: 10.3 s


No

The assessment indicates that there are no specific findings that can be derived from the diagnosis, implying that the diagnosis does not provide sufficient information to make a recommendation.

In [19]:
def eval_assessment(assessment):
  if lang == "de":
    yes = "Ja"
    no = "Nein"
    messages = [
  {"role": "system", "content": "Du bist ein kompetenter Experte auf dem Gebiet der gesetzlichen Krankenversicherung und sprichst Deutsch. Antworte präzise, ernst und formell."},
  {"role": "user", "content": f'''
  Was ist das Ergebnis der Bewertung? Wird eine positive oder negative Empfehlung gegeben? Antworte mit 'Ja' oder 'Nein' und gib anschließend eine sehr kurze Begründung für die Einschätzung.

  # Assessment
  {assessment}

  '''}
  ]
  else:
    yes = "Yes"
    no = "No"
    messages = [
      {"role": "system", "content": "You are an English-speaking, competent expert in the field of statutory health insurance. Answer concisely, seriously, and formally."},
      {"role": "user", "content": f'''
  What is the result of the assessment? Is a positive or negative recommendation given? Answer with "Yes" or "No" and then provide a brief justification for your assessment.

  # Assessment
  {assessment}

  '''}
  ]

  input_ids = tokenizer.apply_chat_template(
      messages,
      add_generation_prompt=True,
      return_tensors="pt"
  ).to(model.device)

  terminators = [
      tokenizer.eos_token_id,
      tokenizer.convert_tokens_to_ids("<|eot_id|>")
  ]

  outputs = model.generate(
      input_ids,
      max_new_tokens=512,
      eos_token_id=terminators,
      pad_token_id=tokenizer.eos_token_id,
      do_sample=False
  )
  response = outputs[0][input_ids.shape[-1]:]
  result = tokenizer.decode(response, skip_special_tokens=True)
  if result.startswith(yes):
    return "Positive", result
  elif result.startswith(no):
    return "Negative", result
  else:
    return "Neutral", result
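
The `startswith` check at the end of `eval_assessment` is sensitive to leading whitespace, markup, and casing, and it would also label a reply such as "Not clear" as negative because it begins with "No". A slightly more tolerant variant could look at the first whole word instead. This is only a sketch: `parse_verdict` and `LABELS` are hypothetical names that are not part of the notebook, and the helper accepts English and German answers regardless of `lang`.

```python
import re

# Hypothetical helper, not part of the notebook: maps the first word of a reply to a label.
LABELS = {
    "yes": "Positive",
    "ja": "Positive",
    "no": "Negative",
    "nein": "Negative",
}


def parse_verdict(reply: str) -> str:
    """Return "Positive", "Negative", or "Neutral" based on the first word of the reply."""
    # Skip leading non-word characters (whitespace, quotes, markdown) and capture one word.
    match = re.match(r"\W*(\w+)", reply)
    if match is None:
        return "Neutral"
    return LABELS.get(match.group(1).lower(), "Neutral")


print(parse_verdict("No\n\nThe assessment indicates ..."))
print(parse_verdict("  **Yes**, the prerequisites are met."))
print(parse_verdict("Not clear from the text."))
```

Matching on the whole first word trades a little strictness for fewer false labels. Whether that trade is appropriate depends on how closely the model follows the answer format.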

## Negative

In [20]:
%%time

negative_results = []
negative_explanations = []

for assessment in negative:
  print(f"Assessment: {assessment}")
  result, explanation = eval_assessment(assessment)
  negative_results.append(result)
  negative_explanations.append(explanation)
  print(f"{result}: {explanation}")
  print("-----")

Assessment: No specific findings can be derived from the diagnosis currently named as the basis for the regulation.
Negative: No

The assessment indicates that there are no specific findings that can be derived from the diagnosis, implying that the diagnosis does not provide sufficient information to make a recommendation.
-----
Assessment: According to the service extracts from the health insurance, the insured has already been provided with the functional product requested according to its area of application.
Negative: No

Justification: The assessment indicates that the insured has already received the requested functional product, implying that the need for it has been met. Therefore, a negative recommendation is likely, as there is no further requirement for the product.
-----
Assessment: A medically comprehensible explanation as to why the use of an orthopedic aid corresponding to the findings is not sufficient and instead electric foot lifter stimulation for walking would be mo

## Positive

In [21]:
%%time

positive_results = []
positive_explanations = []

for assessment in positive:
  print(f"Assessment: {assessment}")
  result, explanation = eval_assessment(assessment)
  positive_results.append(result)
  positive_explanations.append(explanation)
  print(f"{result}: {explanation}")
  print("-----")

Assessment: With the diagnosis named here, the need for compensation to ensure the basic need is conceivable.
Negative: No

The assessment suggests that the need for compensation to ensure the basic need is conceivable, implying a potential requirement for financial support. However, this does not necessarily imply a positive or negative recommendation, but rather a consideration of the necessity for compensation.
-----
Assessment: The socio-medical prerequisites for the prescribed aid supply have been met.
Positive: Yes

The socio-medical prerequisites for the prescribed aid supply have been met, indicating that the individual is eligible for the aid supply based on their medical needs. This suggests a positive recommendation for the aid supply.
-----
Assessment: Everyday relevant usage benefits have been determined.
Negative: No

The assessment only mentions the determination of everyday relevant usage benefits, which is a step in the assessment process, but does not provide a conclu

In [22]:
!nvidia-smi

Sun Aug 25 15:16:50 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0              35W /  70W |   9155MiB / 15360MiB |     38%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [23]:
import pandas as pd

df = pd.DataFrame({
    'assessment': negative + positive,
    'y_true': ['Negative'] * len(negative) + ['Positive'] * len(positive),
    'y_hat': negative_results + positive_results,
    'explanation': negative_explanations + positive_explanations
})
df

| | assessment | y_true | y_hat | explanation |
|---|---|---|---|---|
| 0 | No specific findings can be derived from the d... | Negative | Negative | No\n\nThe assessment indicates that there are ... |
| 1 | According to the service extracts from the hea... | Negative | Negative | No\n\nJustification: The assessment indicates ... |
| 2 | A medically comprehensible explanation as to w... | Negative | Negative | No\n\nThe assessment indicates that a medicall... |
| 3 | From an overall view of the information availa... | Negative | Negative | No\n\nThe assessment indicates that the supply... |
| 4 | A medical justification for why a product not ... | Negative | Negative | No\n\nA medical justification is required to j... |
| 5 | With the diagnosis named here, the need for co... | Positive | Negative | No\n\nThe assessment suggests that the need fo... |
| 6 | The socio-medical prerequisites for the prescr... | Positive | Positive | Yes\n\nThe socio-medical prerequisites for the... |
| 7 | Everyday relevant usage benefits have been det... | Positive | Negative | No\n\nThe assessment only mentions the determi... |
| 8 | Socio-medical indication for the aid is confir... | Positive | Positive | Yes\n\nThe assessment confirms a socio-medical... |
| 9 | Contraindications have been excluded; there ar... | Positive | Positive | Yes\n\nThe assessment indicates that there are... |
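
The table above can be summarized with a few counts. The following sketch copies the `y_true` and `y_hat` columns by hand, so that it runs without the model, and computes accuracy, recall for the positive class, and a simple confusion count using only the standard library. With ten samples, these numbers are an impression and not a reliable evaluation.

```python
from collections import Counter

# Labels copied by hand from the y_true and y_hat columns of the table above.
y_true = ["Negative"] * 5 + ["Positive"] * 5
y_hat = ["Negative"] * 5 + ["Negative", "Positive", "Negative", "Positive", "Positive"]

# Count each (true label, predicted label) pair.
confusion = Counter(zip(y_true, y_hat))

accuracy = sum(truth == pred for truth, pred in zip(y_true, y_hat)) / len(y_true)

true_positives = confusion[("Positive", "Positive")]
false_negatives = confusion[("Positive", "Negative")]
recall_positive = true_positives / (true_positives + false_negatives)

print(f"accuracy: {accuracy:.2f}")
print(f"recall (Positive): {recall_positive:.2f}")
for (truth, pred), count in sorted(confusion.items()):
    print(f"true={truth:<8} predicted={pred:<8} count={count}")
```

In this run, all errors are positive assessments classified as negative, which matches the explanations in which the model reads cautiously worded approvals as inconclusive.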


In [24]:
df.to_excel(f'results_{kind}_{lang}.xlsx', index=False)

In [25]:
!ls -l

total 12
-rw-r--r-- 1 root root 6366 Aug 25 15:16 results_Lllama_3.1_8B_8bit_en.xlsx
drwxr-xr-x 1 root root 4096 Aug 22 13:24 sample_data
