# Note-to-FHIR
In healthcare, an enormous amount of information is captured in clinical notes. Structuring the information in these notes for scientific research, administration and business intelligence is a labour-intensive task.

With the introduction of powerful LLMs, we can automate this work.
HealthSage AI has fine-tuned an open source LLM adapter that understands how to convert natural language to FHIR R4. We now support the 10 most-used resource types and we are expanding our scope.

This notebook gives you a walkthrough on how to use HealthSage's Open Beta Note-to-FHIR LLM.

# Performance and limitations

### Scope of the model
This open-source beta model is trained within the following scope:
- FHIR R4
- 10 Resource types:
  1. Bundle
  2. Patient
  3. Encounter
  4. Practitioner
  5. Organization
  6. Immunization
  7. Observation
  8. Condition
  9. AllergyIntolerance
  10. Procedure.
- English language
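
The scope list above can be turned into a quick check on any Bundle, whether it is an input reference or a model prediction. The sketch below is only an illustration: `IN_SCOPE_TYPES` and `out_of_scope_types` are hypothetical names that are not part of the `healthsageai` package, and the Bundle is assumed to be a plain Python dictionary.

```python
# Resource types listed in the scope above (FHIR R4).
IN_SCOPE_TYPES = {
    "Bundle", "Patient", "Encounter", "Practitioner", "Organization",
    "Immunization", "Observation", "Condition", "AllergyIntolerance",
    "Procedure",
}


def out_of_scope_types(bundle: dict) -> set:
    """Return the resource types in a Bundle dict that are not in scope."""
    found = {bundle.get("resourceType")}
    for entry in bundle.get("entry") or []:
        resource = entry.get("resource") or {}
        found.add(resource.get("resourceType"))
    found.discard(None)  # ignore entries without a resourceType
    return found - IN_SCOPE_TYPES


# Hypothetical Bundle with one in-scope and one out-of-scope resource.
example_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "1"}},
        {"resource": {"resourceType": "MedicationRequest", "id": "1"}},
    ],
}
print(out_of_scope_types(example_bundle))  # {'MedicationRequest'}
```

An empty result means every resource in the Bundle is of a type the model was trained on. It says nothing about whether the content of those resources is correct.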


### The following features are out of scope of the current release:
- Support for coding systems such as SNOMED CT and LOINC.
- FHIR extensions and profiles.
- Any language, resource type or FHIR version not listed under "Scope of the model".

### Furthermore, please note:
- **No relative dates:** HealthSage AI Note-to-FHIR will not produce accurate FHIR datetime fields from text that contains relative time information such as "today" or "yesterday". Likewise, a statement of age such as "Patient John Doe is 50 years old." will not yield an accurate birth date, because the exact day and month of birth are unknown and the LLM is not aware of the current date.
- **Designed as Patient-centric:** HealthSage AI Note-to-FHIR is trained on notes describing one patient each.
- **<4k context window:** The training data for this application contained at most 3686 tokens per example, which is 90% of the Llama-2 context window (4096 tokens).
- **Explicit null:** If a certain FHIR element is not present in the provided text, it is explicitly predicted as NULL. Explicitly modeling the absence of information reduces the chance of hallucinations.
- **Uses Bundles:** For consistency and simplicity, all predicted FHIR resources are Bundled.
- **Conservative estimates:** Our model is designed to stick to the information explicitly provided in the text.
- **IDs are local:** ID fields and references are local enumerations (1, 2, 3, etc.). They have not yet been tested for referential correctness.
- **Generation design:** The model is designed to generate a separate resource if the text contains information about that resource beyond what can be described in the reference fields of related resources.
- **This Beta application is still in early development:** Our preliminary results suggest that HealthSage AI Note-to-FHIR is superior to the GPT-4 foundation model within the scope of our application, in terms of FHIR syntax and the ability to replicate the original FHIR resources in our test dataset. However, the model's performance on out-of-distribution and out-of-scope data is still being evaluated.
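
To make the relative-date limitation concrete: even when the current date is known, an age only narrows the birth date down to a one-year window. The sketch below illustrates this with the hypothetical helpers `shift_years` and `birthdate_range`. The reference date is an assumption supplied by the caller, which is exactly the piece of information the model does not have.

```python
from datetime import date, timedelta


def shift_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def birthdate_range(age: int, reference: date) -> tuple:
    """Return the earliest and latest birth dates consistent with an age on a given date."""
    latest = shift_years(reference, -age)  # turned `age` on the reference date
    earliest = shift_years(reference, -(age + 1)) + timedelta(days=1)  # turns `age + 1` the next day
    return earliest, latest


# "Patient John Doe is 50 years old", assuming the note was written on 2024-01-15.
earliest, latest = birthdate_range(50, date(2024, 1, 15))
print(earliest, latest)  # 1973-01-16 1974-01-15
```

Any date in that window is equally plausible, so a single predicted `birthDate` would be a guess.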

First, install the required libraries.

In [1]:
!pip install healthsageai

Collecting git+https://github.com/HealthSage-AI/healthsage-ai-llm.git@installation-config
  Cloning https://github.com/HealthSage-AI/healthsage-ai-llm.git (to revision installation-config) to /tmp/pip-req-build-xqgmycvw
  Running command git clone --filter=blob:none --quiet https://github.com/HealthSage-AI/healthsage-ai-llm.git /tmp/pip-req-build-xqgmycvw
  Running command git checkout -b installation-config --track origin/installation-config
  Switched to a new branch 'installation-config'
  Branch 'installation-config' set up to track remote branch 'installation-config' from 'origin'.
  Resolved https://github.com/HealthSage-AI/healthsage-ai-llm.git to commit 3ff88ccb18dbc25e9c6e01299af9bfa2dbbd7e59
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting fhir.resources (from healthsageai==0.1.0)
  Downloading fhir.resources-7.1.0-py2.py3-none-any.whl (3.1 MB)


In [2]:
%load_ext autoreload
%autoreload 2

# How to convert clinical notes to FHIR R4

In [3]:
import json
import os
import sys

from datasets import load_dataset
from google.colab import userdata
from peft import LoraConfig, get_peft_model

# Make the local healthsage-ai-llm checkout importable
sys.path.append("healthsage-ai-llm")

# Log in to the Hugging Face Hub and load the dataset

In [4]:
login_str = f"huggingface-cli login --token={userdata.get('huggingface_token')}"
os.system(login_str)

0

In [5]:
dataset_name = 'healthsageai/fhir-to-note'  # FHIR dataset
dataset = load_dataset(dataset_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



# Inspecting the first test sample
This is the first sample in our test set. It is written in formal sentences, all in lower case. Our synthetic clinical notes use various writing styles to maximize the generalizability of the HealthSage model.

In [6]:
print("Clinical Note: \n\n",dataset['test']['note'][0])

Clinical Note: 

 the information pertains to a collection of medical records associated with mr. chase olson. 

on the 17th of january, 2023, mr. olson visited dr. jude reynolds. the consultation commenced at 9:42 pm and concluded at 9:57 pm, as per the central european time. the nature of mr. olson's visit classified as an ambulatory patient encounter.

additional background information about mr. olson: he was born on the 29th of november, 1966, which makes him a male of 57 years of age. he can be contacted through his home phone number 555-770-7639. he resides at 792 schumm fork unit 62, palmer, massachusetts, 01069, us. his communication preference is english, specifically the version spoken in the united states.

regarding mr. olson’s health, it's important to mention an active and confirmed health condition he endures. he has an allergy towards a certain, unspecified substance, categorized as an environmental allergy.

all of this information represents critical aspects of mr. ol

Below is the associated FHIR R4 resource. FHIR is not easy to read, but if you look closely you can see that the JSON describes the clinical note in FHIR R4. Overall, the FHIR Bundle consists of a **Patient** (Mr. Chase Olson), an **Encounter** and an **AllergyIntolerance**.

In [7]:
print("FHIR R4 representation:")
json.loads(dataset['test']['fhir'][0])

FHIR R4 representation:


{'resourceType': 'Bundle',
 'id': '1',
 'type': 'collection',
 'entry': [{'resource': {'resourceType': 'Encounter',
    'id': '1',
    'status': 'finished',
    'class': {'system': 'http://terminology.hl7.org/CodeSystem/v3-ActCode',
     'code': 'AMB'},
    'type': None,
    'subject': {'reference': 'Patient/1', 'display': 'Mr. Chase Olson'},
    'participant': [{'type': [{'coding': [{'system': 'http://terminology.hl7.org/CodeSystem/v3-ParticipationType',
          'code': 'PPRF',
          'display': 'primary performer'}],
        'text': 'primary performer'}],
      'period': {'start': '2023-01-17T21:42:36+01:00',
       'end': '2023-01-17T21:57:36+01:00'},
      'individual': {'reference': 'Practitioner/1',
       'display': 'Dr. Jude Reynolds'}}],
    'period': {'start': '2023-01-17T21:42:36+01:00',
     'end': '2023-01-17T21:57:36+01:00'},
    'reasonCode': None,
    'serviceProvider': None}},
  {'resource': {'resourceType': 'Patient',
    'id': '1',
    'name': [{'use': 'official
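
Resources in the Bundle point at each other through local references such as `Patient/1` and `Practitioner/1`. The sketch below shows one way to list references that do not resolve to a resource inside the same Bundle. `unresolved_references` is a hypothetical helper, not part of the `healthsageai` package, and the small Bundle is a hand-written example modelled on the output above.

```python
def unresolved_references(bundle: dict) -> list:
    """List references such as 'Patient/1' that match no resource in the Bundle."""
    entries = bundle.get("entry") or []

    # Collect "ResourceType/id" keys for every resource in the Bundle.
    known = set()
    for entry in entries:
        resource = entry.get("resource") or {}
        if resource.get("resourceType") and resource.get("id"):
            known.add(f"{resource['resourceType']}/{resource['id']}")

    missing = []

    def walk(node):
        """Visit nested dicts and lists, recording unknown reference strings."""
        if isinstance(node, dict):
            ref = node.get("reference")
            if isinstance(ref, str) and ref not in known:
                missing.append(ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(entries)
    return missing


# Hand-written example: the Encounter refers to a Practitioner that has no entry.
example_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {
            "resourceType": "Encounter",
            "id": "1",
            "subject": {"reference": "Patient/1"},
            "participant": [{"individual": {"reference": "Practitioner/1"}}],
        }},
        {"resource": {"resourceType": "Patient", "id": "1"}},
    ],
}
print(unresolved_references(example_bundle))  # ['Practitioner/1']
```

An unresolved reference is not necessarily an error here. Under the generation design described earlier, a separate resource is only expected when the note says more about it than fits in the reference field.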

# Inference
Now we ask the model to reconstruct the original FHIR Bundle from the clinical note.

In [8]:
from healthsageai.note_to_fhir.inference import NoteToFhir13b

In [9]:
note_to_fhir = NoteToFhir13b()

config.json:   0%|          | 0.00/587 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/33.4k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/9.90G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/6.18G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/451 [00:00<?, ?B/s]

adapter_model.bin:   0%|          | 0.00/13.2M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [10]:
dataset

DatasetDict({
    train: Dataset({
        features: ['fhir', 'note'],
        num_rows: 2726
    })
    validation: Dataset({
        features: ['fhir', 'note'],
        num_rows: 48
    })
    test: Dataset({
        features: ['fhir', 'note'],
        num_rows: 79
    })
})

In [11]:
fhir_true = json.loads(dataset['test']['fhir'][0])
fhir_pred = note_to_fhir.translate(dataset['test']['note'][0])
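
Because the training examples contained at most 3686 tokens, it can be worth checking the length of a note before translating it. The guard below is a sketch under several assumptions and is not part of the `healthsageai` package. `fits_training_length` and `whitespace_token_count` are hypothetical names, and the whitespace count is only a crude stand-in that underestimates what a Llama-2 tokenizer would produce. In practice you would pass a function based on the model's own tokenizer as `count_tokens`. This notebook also does not state whether 3686 covers the note alone or the note together with the generated FHIR, so treat the limit as an upper bound.

```python
from typing import Callable

MAX_TRAINING_TOKENS = 3686  # longest training example reported in the limitations


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fits_training_length(
    text: str,
    count_tokens: Callable[[str], int] = whitespace_token_count,
    limit: int = MAX_TRAINING_TOKENS,
) -> bool:
    """Return True if the text stays within the given token limit."""
    return count_tokens(text) <= limit


short_note = "mr. chase olson visited dr. jude reynolds on the 17th of january, 2023."
print(fits_training_length(short_note))      # True
print(fits_training_length("word " * 4000))  # False
```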

# Result
First we load the JSON string into a dictionary and then we remove any None values.
As you can see, the resulting FHIR closely resembles the original FHIR Bundle.
An Encounter, a Patient and an AllergyIntolerance resource are created.
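
Removing the None values can be sketched as a small recursive function. This is an illustration only. `drop_none` is a hypothetical helper, and the way the `healthsageai` package performs this step internally is not shown in this notebook.

```python
def drop_none(value):
    """Recursively remove None values from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value if v is not None]
    return value


# Hand-written fragment with explicit nulls, modelled on the Encounter above.
encounter = {
    "resourceType": "Encounter",
    "status": "finished",
    "type": None,
    "period": {"start": "2023-01-17T21:42:36+01:00", "end": None},
    "reasonCode": None,
}
print(drop_none(encounter))
# {'resourceType': 'Encounter', 'status': 'finished', 'period': {'start': '2023-01-17T21:42:36+01:00'}}
```

Note that this version keeps containers that become empty after cleaning. Whether those should also be dropped is a design choice.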

In [12]:
fhir_pred

{'resourceType': 'Bundle',
 'id': '1',
 'type': 'collection',
 'entry': [{'resource': {'resourceType': 'Encounter',
    'id': '1',
    'status': 'finished',
    'class': {'system': 'http://terminology.hl7.org/CodeSystem/v3-ActCode',
     'code': 'AMB'},
    'subject': {'reference': 'Patient/1', 'display': 'Mr. Chase Olson'},
    'period': {'start': '2023-01-17T21:42:54+01:00',
     'end': '2023-01-17T21:57:54+01:00'}}},
  {'resource': {'resourceType': 'Patient',
    'id': '1',
    'name': [{'use': 'official',
      'family': 'Olson',
      'given': ['Chase'],
      'prefix': ['Mr.']}],
    'telecom': [{'system': 'phone', 'value': '555-770-7639', 'use': 'home'}],
    'gender': 'male',
    'birthDate': '1966-11-29',
    'address': [{'line': ['792 Schumm Fork Unit 62'],
      'city': 'Palmer',
      'state': 'Massachusetts',
      'postalCode': '01069',
      'country': 'US'}],
    'communication': [{'language': {'coding': [{'system': 'urn:ietf:bcp:47',
         'code': 'en-US',
         

# Evaluation
Does the model produce valid FHIR?

In [14]:
from healthsageai.note_to_fhir.evaluation.utils import get_diff
from healthsageai.note_to_fhir.evaluation.visuals import show_diff

In [15]:
diff = get_diff(fhir_true, fhir_pred, resource_type="Bundle")
show_diff(diff)
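
The output of `show_diff` is not reproduced here. As a rough, library-independent illustration of what comparing the two Bundles involves, the sketch below walks two nested structures and reports the paths where they differ. `simple_diff` is a hypothetical helper and is not the implementation behind `get_diff`. It treats a missing key and an explicit None as equal, in line with the explicit-null convention described earlier.

```python
def simple_diff(expected, actual, path=""):
    """Return (path, expected, actual) tuples where two nested structures differ."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}" if path else key
            diffs += simple_diff(expected.get(key), actual.get(key), child)
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        diffs = []
        for i in range(max(len(expected), len(actual))):
            e = expected[i] if i < len(expected) else None
            a = actual[i] if i < len(actual) else None
            diffs += simple_diff(e, a, f"{path}[{i}]")
        return diffs
    return [] if expected == actual else [(path, expected, actual)]


# Encounter fields taken from the true and predicted Bundles shown earlier.
true_encounter = {
    "status": "finished",
    "type": None,
    "period": {"start": "2023-01-17T21:42:36+01:00", "end": "2023-01-17T21:57:36+01:00"},
}
pred_encounter = {
    "status": "finished",
    "period": {"start": "2023-01-17T21:42:54+01:00", "end": "2023-01-17T21:57:54+01:00"},
}
for path, want, got in simple_diff(true_encounter, pred_encounter):
    print(path, want, got)
# period.end 2023-01-17T21:57:36+01:00 2023-01-17T21:57:54+01:00
# period.start 2023-01-17T21:42:36+01:00 2023-01-17T21:42:54+01:00
```

In this example only the seconds differ. The note gives the times to the minute, so the seconds cannot be recovered from the text.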