# Note-to-FHIR
In healthcare, an enormous amount of information is captured in clinical notes. Structuring that information for scientific research, administration and business intelligence is a labour-intensive task.

With the introduction of powerful LLMs, we can automate this work.
HealthSage AI has fine-tuned an open source LLM adapter that understands how to convert natural language to FHIR R4. We now support the 10 most-used resource types and we are expanding our scope.

This notebook gives you a walkthrough on how to use HealthSage's Open Beta Note-to-FHIR LLM.

# Performance and limitations

### Scope of the model
This open sourced Beta model is trained within the following scope:
- FHIR R4
- 10 Resource types:
  1. Bundle
  2. Patient
  3. Encounter
  4. Practitioner
  5. Organization
  6. Immunization
  7. Observation
  8. Condition
  9. AllergyIntolerance
  10. Procedure
- English language


### The following features are out of scope of the current release:
- Support for coding systems such as SNOMED CT and LOINC.
- FHIR extensions and profiles
- Any language, resource type or FHIR version not listed under "Scope of the model".

### Furthermore, please note:
- **No relative dates:** HealthSage AI Note-to-FHIR will not provide accurate FHIR datetime fields for text that contains relative time information such as "today" or "yesterday". Likewise, a statement of age such as "Patient John Doe is 50 years old." will not yield an accurate birth date, since the exact day and month of birth are unknown and the LLM is not aware of the current date.
- **Designed as Patient-centric:** HealthSage AI Note-to-FHIR is trained on notes describing one patient each.
- **<4k context window:** The training examples for this application contained at most 3686 tokens, which is 90% of the Llama-2 context window (4096 tokens).
- **Explicit null:** If a certain FHIR element is not present in the provided text, it is explicitly predicted as NULL. Explicitly modeling the absence of information reduces the chance of hallucinations.
- **Uses Bundles:** For consistency and simplicity, all predicted FHIR resources are bundled.
- **Conservative estimates:** Our model is designed to stick to the information explicitly provided in the text.
- **IDs are local:** ID fields and references are local enumerations (1, 2, 3, etc.). They are not yet tested for referential correctness.
- **Generation design:** The model is designed to generate a separate resource if the text contains information about that resource beyond what can be described in the reference fields of related resources.
- **This Beta application is still in early development:** Our preliminary results suggest that, within the scope of our application, HealthSage AI Note-to-FHIR is superior to the GPT-4 foundation model in terms of FHIR syntax and the ability to replicate the original FHIR resources in our test dataset. However, the model's performance on out-of-distribution and out-of-scope data is still being analyzed.
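
To illustrate the point about relative dates, the sketch below computes the range of birth dates that are consistent with an age given in whole years. The helper names and the reference date are hypothetical and not part of the HealthSage code. Even when the current date is known, which it is not for the model, the birth date can only be narrowed down to a one-year window.

```python
from datetime import date, timedelta


def _shift_years(day, years):
    """Move a date by whole years; 29 February falls back to 28 February."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def birthdate_range(age_years, as_of):
    """Earliest and latest birth dates consistent with an age in whole years."""
    # Latest: the person turns `age_years` exactly on the reference date.
    latest = _shift_years(as_of, -age_years)
    # Earliest: the person turns `age_years + 1` the day after the reference date.
    earliest = _shift_years(as_of, -(age_years + 1)) + timedelta(days=1)
    return earliest, latest


# Hypothetical reference date; a clinical note may not contain one at all.
earliest, latest = birthdate_range(50, as_of=date(2024, 6, 15))
print(earliest, latest)  # 1973-06-16 1974-06-15
```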

First, install the required libraries:

In [1]:
!pip install fhir.resources torch==1.13.1 pytorch-gpu dev-gpt einops accelerate xformers bitsandbytes pandas numpy matplotlib datasets torchvision peft trl transformers



# How to convert clinical notes to FHIR R4

In [2]:
import json
import os
import sys

import pandas as pd
import torch
from torch import cuda
from datasets import load_dataset
from google.colab import userdata
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

# Log in to the Hugging Face Hub and load the dataset

In [3]:
login_str = f"huggingface-cli login --token={userdata.get('huggingface_token')}"
os.system(login_str)

0

In [4]:
dataset_name = 'healthsageai/fhir-to-note'  # FHIR dataset
dataset = load_dataset(dataset_name)

# Inspecting the first test sample
This is the first sample in our test set. It is written in a formal sentence style in lower case. Our synthetic clinical notes contain various writing styles to maximize the generalizability of the HealthSage model.

In [5]:
print("Clinical Note: \n\n",dataset['test']['note'][0])

Clinical Note: 

 the information pertains to a collection of medical records associated with mr. chase olson. 

on the 17th of january, 2023, mr. olson visited dr. jude reynolds. the consultation commenced at 9:42 pm and concluded at 9:57 pm, as per the central european time. the nature of mr. olson's visit classified as an ambulatory patient encounter.

additional background information about mr. olson: he was born on the 29th of november, 1966, which makes him a male of 57 years of age. he can be contacted through his home phone number 555-770-7639. he resides at 792 schumm fork unit 62, palmer, massachusetts, 01069, us. his communication preference is english, specifically the version spoken in the united states.

regarding mr. olson’s health, it's important to mention an active and confirmed health condition he endures. he has an allergy towards a certain, unspecified substance, categorized as an environmental allergy.

all of this information represents critical aspects of mr. ol

Below is the associated FHIR R4 resource. FHIR is not easy to read, but a close look shows that the JSON describes the clinical note in FHIR R4. At a high level, the FHIR Bundle consists of a **Patient** (Mr. Chase Olson), an **Encounter** and an **AllergyIntolerance**.

In [6]:
print("FHIR R4 representation:")
json.loads(dataset['test']['fhir'][0])

FHIR R4 representation:


{'resourceType': 'Bundle',
 'id': '1',
 'type': 'collection',
 'entry': [{'resource': {'resourceType': 'Encounter',
    'id': '1',
    'status': 'finished',
    'class': {'system': 'http://terminology.hl7.org/CodeSystem/v3-ActCode',
     'code': 'AMB'},
    'type': None,
    'subject': {'reference': 'Patient/1', 'display': 'Mr. Chase Olson'},
    'participant': [{'type': [{'coding': [{'system': 'http://terminology.hl7.org/CodeSystem/v3-ParticipationType',
          'code': 'PPRF',
          'display': 'primary performer'}],
        'text': 'primary performer'}],
      'period': {'start': '2023-01-17T21:42:36+01:00',
       'end': '2023-01-17T21:57:36+01:00'},
      'individual': {'reference': 'Practitioner/1',
       'display': 'Dr. Jude Reynolds'}}],
    'period': {'start': '2023-01-17T21:42:36+01:00',
     'end': '2023-01-17T21:57:36+01:00'},
    'reasonCode': None,
    'serviceProvider': None}},
  {'resource': {'resourceType': 'Patient',
    'id': '1',
    'name': [{'use': 'official
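
Because the raw dictionary is long, a quick overview of which resource types a Bundle contains can help. The following is a minimal sketch; the helper name summarize_bundle and the tiny example Bundle are hypothetical and only mirror the structure shown above.

```python
from collections import Counter


def summarize_bundle(bundle):
    """Count the resources in a FHIR Bundle dictionary by resourceType."""
    entries = bundle.get("entry") or []
    return Counter(entry["resource"]["resourceType"] for entry in entries)


# Hypothetical, heavily shortened Bundle with the same layout as the sample.
example_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Encounter", "id": "1"}},
        {"resource": {"resourceType": "Patient", "id": "1"}},
        {"resource": {"resourceType": "AllergyIntolerance", "id": "1"}},
    ],
}

print(dict(summarize_bundle(example_bundle)))
```

In the notebook, the same helper could be applied to json.loads(dataset['test']['fhir'][0]).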

# Load Llama-2 and HealthSage AI note-to-FHIR-adapter

Our Note-to-FHIR model can do this conversion automatically on a relatively modest T4 GPU. We do so by loading our LoRA adapter on top of a 4-bit quantized version of Llama-2-13b.

In [7]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
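
A rough back-of-the-envelope calculation shows why 4-bit quantization matters here. The sketch below only counts the memory for the model weights, assuming a round figure of 13 billion parameters; it ignores quantization constants, activations, the KV cache and the adapter, so real usage will be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9  # assumption: round parameter count for Llama-2-13b

print(weight_memory_gb(N_PARAMS, 16))  # 26.0 -> float16 weights
print(weight_memory_gb(N_PARAMS, 4))   # 6.5  -> 4-bit weights
```

Under these assumptions the float16 weights alone would exceed the 16 GB of a T4, while the 4-bit weights leave room for the prompt and the generated output.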

In [8]:
base_model_name = "meta-llama/Llama-2-13b-chat-hf"
model_name = "healthsageai/note-to-fhir-13b-adapter"

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map='auto'
)

model.config.use_cache = False
model.load_adapter(model_name)

# LOAD TOKENIZER
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True, return_tensor="pt", padding=True)
tokenizer.pad_token = tokenizer.bos_token
tokenizer.padding_side = 'left'

For numerical stability:

In [9]:
# Cast normalization layers to float32; Module.to() converts parameters in place.
for name, module in model.named_modules():
    if "norm" in name:
        module.to(torch.float32)

# Combine the clinical note with instructions

This is the prompt template into which we insert our clinical note.

In [10]:
prompt = """[INST] <<SYS>>
INSTRUCTION
Translate the following clinical note into HL7 FHIR R4 Format.
- Do not insert any values that are not in the note.
- Do not infer or impute any values
- Only include information that is essential:
    - information that is in the clinical note
    - information that is mandatory for a valid FHIR resource.

OUTPUT FORMAT
Return the HL7 FHIR structured information as a json string. denote the start and end of the json with a markdown codeblock:
```json
[RESOURCE HERE]
```
<</SYS>>

CLINICAL NOTE
{note}

[/INST]
""".format(note=dataset['test']['note'][0])

In [11]:
print(prompt)

[INST] <<SYS>>
INSTRUCTION
Translate the following clinical note into HL7 FHIR R4 Format.
- Do not insert any values that are not in the note.
- Do not infer or impute any values
- Only include information that is essential:
    - information that is in the clinical note
    - information that is mandatory for a valid FHIR resource.

OUTPUT FORMAT
Return the HL7 FHIR structured information as a json string. denote the start and end of the json with a markdown codeblock:
```json
[RESOURCE HERE]
```
<</SYS>>

CLINICAL NOTE
the information pertains to a collection of medical records associated with mr. chase olson. 

on the 17th of january, 2023, mr. olson visited dr. jude reynolds. the consultation commenced at 9:42 pm and concluded at 9:57 pm, as per the central european time. the nature of mr. olson's visit classified as an ambulatory patient encounter.

additional background information about mr. olson: he was born on the 29th of november, 1966, which makes him a male of 57 years of a

# Inference
Now we ask our model to reconstruct the original FHIR Bundle from the note.

In [12]:
generator = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    do_sample=False,
    eos_token_id=model.config.eos_token_id,
    max_length=4096)

In [13]:
with torch.autocast("cuda"):
  fhir_pred = generator(prompt)



In [14]:
fhir_pred = fhir_pred[0]['generated_text']

# Result
First we load the JSON string into a dictionary, and then we remove any top-level None values from each resource.
As you can see, the resulting FHIR closely resembles the original FHIR Bundle:
an Encounter, a Patient and an AllergyIntolerance resource are created.

In [15]:
result = json.loads(fhir_pred.split("```")[3][4:].strip(" \t\n\r"))
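
The line above relies on fixed positions: the generated text repeats the prompt, which already contains one fenced block, so the model's answer is the fourth piece after splitting on the fence marker, and the slice drops the leading "json" tag. A slightly more defensive alternative is sketched below. The helper name extract_fhir_json is hypothetical, and the sketch assumes the completion follows the last "[/INST]" marker and wraps its answer in a fenced block, as the prompt requests.

```python
import json
import re

# Matches a fenced block, optionally tagged "json"; `{3} stands for three backticks.
_FENCED_JSON = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def extract_fhir_json(generated_text, end_of_prompt="[/INST]"):
    """Return the first fenced JSON block after the prompt as a dictionary."""
    # The pipeline output repeats the prompt, so only inspect the completion.
    _, _, completion = generated_text.rpartition(end_of_prompt)
    match = _FENCED_JSON.search(completion)
    if match is None:
        raise ValueError("no fenced JSON block found in the model output")
    return json.loads(match.group(1))


# Hypothetical generated text, built without writing the fence marker literally.
FENCE = "`" * 3
sample = (
    "[INST] ... " + FENCE + "json\n[RESOURCE HERE]\n" + FENCE + " ... [/INST]\n"
    + FENCE + 'json\n{"resourceType": "Bundle", "entry": []}\n' + FENCE
)
print(extract_fhir_json(sample))
```

If the output is cut off before the closing fence, for example because max_length was reached, the helper raises a ValueError instead of failing with an index error.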

In [16]:
for resource in result['entry']:
  key_values = [(k, v) for k, v in resource['resource'].items()]
  for k, v in key_values:
    if v is None:
      del resource['resource'][k]
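
The loop above only removes None values at the top level of each resource, so nested ones (such as a None inside a participant entry) remain. If those should go as well, a recursive variant along the following lines could be used. The function name drop_none is hypothetical; note that this sketch may leave empty dictionaries or lists behind.

```python
def drop_none(value):
    """Return a copy of a nested dict/list structure without None values."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value if v is not None]
    return value


# Hypothetical fragment shaped like the Encounter in the result.
fragment = {
    "resourceType": "Encounter",
    "type": None,
    "participant": [{"type": None, "individual": {"reference": "Practitioner/1"}}],
}
print(drop_none(fragment))
```

Unlike the loop above, this returns a new structure instead of modifying the input in place.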

In [17]:
result

{'resourceType': 'Bundle',
 'id': '1',
 'type': 'collection',
 'entry': [{'resource': {'resourceType': 'Encounter',
    'id': '1',
    'status': 'unknown',
    'class': {'system': 'http://terminology.hl7.org/CodeSystem/v3-ActCode',
     'code': 'AMB'},
    'subject': {'reference': 'Patient/1', 'display': 'Mr. Chase Olson'},
    'participant': [{'type': None,
      'period': {'start': '2023-01-17T21:42:00+01:00',
       'end': '2023-01-17T21:57:00+01:00'},
      'individual': {'reference': 'Practitioner/1',
       'display': 'Dr. Jude Reynolds'}}]}},
  {'resource': {'resourceType': 'Patient',
    'id': '1',
    'name': [{'use': 'official',
      'family': 'Olson',
      'given': ['Chase'],
      'prefix': ['Mr.']}],
    'telecom': [{'system': 'phone', 'value': '555-770-7639', 'use': 'home'}],
    'gender': 'male',
    'birthDate': '1966-11-29',
    'address': [{'line': ['792 Schumm Fork Unit 62'],
      'city': 'Palmer',
      'state': 'Massachusetts',
      'postalCode': '01069',
     

# Evaluation
Does the model produce valid FHIR?

In [18]:
from fhir.resources.R4B.bundle import Bundle

In [19]:
Bundle.parse_raw(json.dumps(result))

Bundle(resource_type='Bundle', fhir_comments=None, id='1', implicitRules=None, implicitRules__ext=None, language=None, language__ext=None, meta=None, entry=[BundleEntry(resource_type='BundleEntry', fhir_comments=None, extension=None, id=None, modifierExtension=None, fullUrl=None, fullUrl__ext=None, link=None, request=None, resource=Encounter(resource_type='Encounter', fhir_comments=None, id='1', implicitRules=None, implicitRules__ext=None, language=None, language__ext=None, meta=None, contained=None, extension=None, modifierExtension=None, text=None, account=None, appointment=None, basedOn=None, classHistory=None, class_fhir=Coding(resource_type='Coding', fhir_comments=None, extension=None, id=None, code='AMB', code__ext=None, display=None, display__ext=None, system='http://terminology.hl7.org/CodeSystem/v3-ActCode', system__ext=None, userSelected=None, userSelected__ext=None, version=None, version__ext=None), diagnosis=None, episodeOfCare=None, hospitalization=None, identifier=None, l
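
Schema validation does not check whether references point to a resource in the same Bundle. Since IDs are local and not yet tested for referential correctness, a simple check can list references that have no matching entry. This is a minimal sketch: the function name unresolved_references and the example Bundle are hypothetical, and the check assumes references of the form "ResourceType/id".

```python
def unresolved_references(bundle):
    """List 'ResourceType/id' references that match no entry in the Bundle."""
    entries = bundle.get("entry") or []
    known = {
        f'{e["resource"]["resourceType"]}/{e["resource"]["id"]}'
        for e in entries
        if e["resource"].get("id") is not None
    }
    missing = set()

    def walk(node):
        # Visit every nested dict and list, collecting unknown references.
        if isinstance(node, dict):
            ref = node.get("reference")
            if isinstance(ref, str) and ref not in known:
                missing.add(ref)
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(entries)
    return sorted(missing)


# Hypothetical Bundle: the Practitioner is referenced but has no entry.
example = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {
            "resourceType": "Encounter", "id": "1",
            "subject": {"reference": "Patient/1"},
            "participant": [{"individual": {"reference": "Practitioner/1"}}],
        }},
        {"resource": {"resourceType": "Patient", "id": "1"}},
    ],
}
print(unresolved_references(example))  # ['Practitioner/1']
```

An unresolved reference is not necessarily an error: by design, the model only generates a separate resource when the text contains information beyond what fits in the reference fields.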