# LLM-Powered Medical Case Sheet Ingestion
## Outline
1. Data Cleansing
2. Prompt Definition
3. Entity & Relationship Extraction
4. Neo4j Cypher Generation
5. Data Ingestion

In [None]:
%%capture
%pip install graphdatascience
%pip install openai
%pip install python-dotenv
%pip install retry

In [None]:
import os
import openai
from retry import retry
import re
from string import Template
import json
import ast
import time
import pandas as pd
from graphdatascience import GraphDataScience
import glob
from timeit import default_timer as timer
from dotenv import load_dotenv

## Data Cleansing

First, let's define a function to clean the input data. To keep things simple, it does only one thing: the corpus refers to figures such as scan images, which we don't have, so we remove those references (for example, `(Fig.1)`).

In [None]:
def clean_text(text):
  # Remove figure references such as "(Fig.1)" or "(fig. 2)"; the images are not part of the corpus
  clean = re.sub(r'\(fig[^)]*\)', '', text, flags=re.IGNORECASE)
  return clean
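`clean_text` drops a reference such as `(Fig.1)` but keeps the space in front of it, so "around the bronchioles (Fig.1)." becomes "around the bronchioles .". A minimal sketch of a tidier variant follows; `clean_text_tidy` is a hypothetical name, and the extra whitespace rules are an assumption about what is worth normalizing in this corpus.

```python
import re

def clean_text_tidy(text: str) -> str:
    """Remove figure references such as "(Fig.1)" and tidy the spacing they leave behind."""
    # Same pattern as clean_text: drop any parenthesised group starting with "fig"
    text = re.sub(r'\(fig[^)]*\)', '', text, flags=re.IGNORECASE)
    # Remove spaces/tabs left in front of punctuation: "bronchioles ." -> "bronchioles."
    text = re.sub(r'[ \t]+([.,;:])', r'\1', text)
    # Collapse runs of spaces/tabs into a single space (newlines are untouched)
    text = re.sub(r'[ \t]{2,}', ' ', text)
    # Strip trailing spaces/tabs at the end of each line
    text = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
    return text
```

It can be used as a drop-in replacement, e.g. `clean_text_tidy(article_txt)`.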

Let's take this case sheet and extract entities and relationships from it using the LLM.

In [None]:
article_txt = """The patient was a 34-yr-old man who presented with complaints of fever and a chronic cough.
He was a smoker and had a history of pulmonary tuberculosis that had been treated and cured.
A computed tomographic (CT) scan revealed multiple tiny nodules in both lungs.
A thoracoscopic lung biopsy was taken from the right upper lobe.
The microscopic examination revealed a typical LCH.
The tumor cells had vesicular and grooved nuclei, and they formed small aggregations around the bronchioles (Fig.1).
The tumor cells were strongly positive for S-100 protein, vimentin, CD68 and CD1a.
There were infiltrations of lymphocytes and eosinophils around the tumor cells.
With performing additional radiologic examinations, no other organs were thought to be involved.
He quit smoking, but he received no other specific treatment.
He was well for the following one year.
After this, a follow-up CT scan was performed and it showed a 4 cm-sized mass in the left lower lobe, in addition to the multiple tiny nodules in both lungs (Fig.2).
A needle biopsy specimen revealed the possibility of a sarcoma; therefore, a lobectomy was performed.
Grossly, a 4 cm-sized poorly-circumscribed lobulated gray-white mass was found (Fig.3), and there were a few small satellite nodules around the main mass.
Microscopically, the tumor cells were aggregated in large sheets and they showed an infiltrative growth.
The cytologic features of some of the tumor cells were similar to those seen in a typical LCH.
However, many tumor cells showed overtly malignant cytologic features such as pleomorphic/hyperchromatic nuclei and prominent nucleoli (Fig.4), and multinucleated tumor giant cells were also found.
There were numerous mitotic figures ranging from 30 to 60 per 10 high power fields, and some of them were abnormal.
A few foci of typical LCH remained around the main tumor mass.
Immunohistochemically, the tumor cells were strongly positive for S-100 protein (Fig.5) and vimentin; they were also positive for CD68 (Dako N1577, Clone KPI), and focally positive for CD1a (Fig.6), and they were negative for cytokeratin, epithelial membrane antigen, CD3, CD20 and HMB45.
The ultrastructural analysis failed to demonstrate any Birbeck granules in the cytoplasm of the tumor cells.
Now, at five months after lobectomy, the patient is doing well with no significant change in the radiologic findings.
"""

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Prompt Definition

**⚠️** Duplicate the `config.env.example` file (in the file browser on the left) and rename the copy to `config.env`. Edit it to provide your OpenAI API key and your Neo4j credentials.

In [None]:
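import os
from dotenv import load_dotenv

# Load OPENAI_API_KEY (and the Neo4j settings) from config.env instead of hard-coding secrets.
# The OpenAI client reads OPENAI_API_KEY from the environment automatically.
load_dotenv("config.env")
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to config.env")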
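Because the keys are meant to come from `config.env`, it can help to fail fast with a clear message when one is missing, rather than hitting an authentication error on the first API call. The helper below is a hypothetical sketch using only the standard library; which setting names to require is up to you.

```python
import os
from typing import Dict

def require_env(*names: str) -> Dict[str, str]:
    """Return the named settings from the environment, failing fast if any is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing settings (check config.env): " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

For example, `require_env("OPENAI_API_KEY")` returns `{"OPENAI_API_KEY": "..."}` or raises before any API call is made.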


Next, we define the prompt and a helper function that sends it, together with the case-sheet text, to the LLM.

This is a simple prompt to start with. If the processing gets more complex, you can chain several prompts as required. Here I use a single prompt that extracts information strictly according to the entities and relationships it defines. This is a simplification: in a real scenario, especially with medical records, you would work with domain experts to define the ontology systematically and capture the important information. You might also need to fine-tune the model.

Also, instead of a single large model, you could chain several smaller ones, depending on your needs.
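A minimal sketch of prompt chaining, assuming each step is a `string.Template` with a `$ctext` placeholder like `prompt1`: the output of one step becomes the input of the next. `run_prompt_chain` and the `llm` callable are hypothetical; in this notebook `llm` could wrap `process_gpt`, for example `lambda p: process_gpt(system, p).choices[0].message.content`.

```python
from string import Template
from typing import Callable, List

def run_prompt_chain(llm: Callable[[str], str], templates: List[str], text: str) -> str:
    """Run the case sheet through several prompt templates in order.

    The first template receives the raw case sheet as $ctext; every later
    template receives the previous step's output as $ctext.
    """
    current = text
    for tpl in templates:
        prompt = Template(tpl).substitute(ctext=current)
        current = llm(prompt)  # the model's answer feeds the next step
    return current
```

With a stand-in model such as `lambda p: p.upper()`, two templates `"Summarise: $ctext"` and `"Extract from: $ctext"` show how each step wraps the previous output.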

We will use this graph schema for our case sheets:
![schema.png](attachment:d16d5097-785c-4343-8a48-289517fca944.png)

In [None]:
prompt1="""From the patient case sheet below, extract the following entities and relationships in the format described.
0. ALWAYS FINISH THE OUTPUT. Never send partial responses
1. First, look for these entity types in the text and generate them with the properties listed for each entity type below.
   The `id` property of each entity must be alphanumeric and unique among the entities. You will refer to this property when defining the relationships between entities. Do not create entity types that aren't listed below. Summarize the document and store the summary in the Case entity's `summary` property. Generate as many entities as needed, using the types below:
    Entity Types:
    label:'Case',id:string,summary:string //Case
    label:'Person',id:string,age:string,location:string,gender:string //Patient mentioned in the case
    label:'Symptom',id:string,description:string //Symptom Entity; `id` property is the name of the symptom, in lowercase & camel-case & should always start with an alphabet
    label:'Disease',id:string,name:string //Disease diagnosed now or previously as per the Case sheet; `id` property is the name of the disease, in lowercase & camel-case & should always start with an alphabet
    label:'BodySystem',id:string,name:string //Body Part affected. Eg: Chest, Lungs; id property is the name of the part, in lowercase & camel-case & should always start with an alphabet
    label:'Diagnosis',id:string,name:string,description:string,when:string //Diagnostic procedure conducted; `id` property is the summary of the Diagnosis, in lowercase & camel-case & should always start with an alphabet
    label:'Biological',id:string,name:string,description:string //Results identified from Diagnosis; `id` property is the summary of the Biological, in lowercase & camel-case & should always start with an alphabet

2. Next, generate each relationship as a triple of head, relationship and tail. To refer to the head and tail entities, use their respective `id` property. Relationship properties should be given inside curly braces, comma-separated. Relationships must follow the types below. Generate as many relationships as needed:
    Relationship types:
    case|FOR|person
    person|HAS_SYMPTOM{when:string,frequency:string,span:string}|symptom //the properties inside HAS_SYMPTOM gets populated from the Case sheet
    person|HAS_DISEASE{when:string}|disease //the properties inside HAS_DISEASE gets populated from the Case sheet
    symptom|SEEN_ON|chest
    disease|AFFECTS|heart
    person|HAS_DIAGNOSIS|diagnosis
    diagnosis|SHOWED|biological

The output must be valid JSON only, and should look like this:
{
    "entities": [{"label":"Case","id":string,"summary":string}],
    "relationships": ["disease|AFFECTS|heart"]
}

Case Sheet:
$ctext
"""

Let's run our completion task with our LLM

In [None]:
# This code is for v1 of the openai package: pypi.org/project/openai
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def process_gpt(system, prompt):
  """Send one system + user message pair to the chat model and return the full completion object."""
  completion = client.chat.completions.create(
      model="gpt-3.5-turbo-16k",
      messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
      ],
      temperature=1,  # a lower value such as 0 makes the extraction more repeatable
      max_tokens=3000,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0
      )
  print(completion)  # debug: show the raw response
  return completion
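The `retry` package is installed and imported at the top but never used, and the notebook later mentions API rate limits as a possible cause of failed files. The `retry` package's `@retry(tries=..., delay=..., backoff=...)` decorator, or the `max_retries` argument of the v1 `OpenAI` client, would cover this; the sketch below shows the same idea with only the standard library. `with_retries` is a hypothetical name.

```python
import functools
import time

def with_retries(tries: int = 3, delay: float = 1.0, backoff: float = 2.0, exceptions=(Exception,)):
    """Decorator: retry the wrapped call on `exceptions`, waiting longer after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)
                    wait *= backoff  # exponential backoff between attempts
        return wrapper
    return decorator
```

Applied here, it would read `@with_retries(tries=3, delay=2)` on top of `def process_gpt(...)`.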


In [None]:
def run_completion(prompt, results, ctext):
    try:
        system = "You are a helpful Medical Case Sheet expert who extracts relevant information and store them on a Neo4j Knowledge Graph"
        # Use the prompt passed in (not the global prompt1) so that every prompt in `prompts` is run
        pr = Template(prompt).substitute(ctext=ctext)
        res = process_gpt(system, pr)
        parsed_result = json.loads(res.choices[0].message.content)

        # Debugging lines
        print("Parsed Result Type:", type(parsed_result))
        print("Parsed Result:", parsed_result)

        results.append(parsed_result)
    except Exception as e:
        # Log the error but keep the results collected so far (instead of returning None)
        print(e)
    return results

prompts = [prompt1]
results = []

for p in prompts:
    results = run_completion(p, results, clean_text(article_txt))

print(results)
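Both `run_completion` and the pipeline call `json.loads` directly on the model's reply, so a reply wrapped in a Markdown code fence, or preceded by a sentence such as "Here is the output:", raises and the whole case sheet is treated as failed. A more tolerant parser is sketched below; `parse_llm_json` is a hypothetical helper, and falling back to the outermost `{...}` span is a heuristic, not a guarantee.

```python
import json
import re

def parse_llm_json(content: str) -> dict:
    """Parse a JSON object from an LLM reply, tolerating a code fence or text around it."""
    text = content.strip()
    # Strip a surrounding Markdown code fence (three backticks, optionally tagged "json")
    fenced = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(text[start:end + 1])
```

In `run_completion`, `json.loads(res.choices[0].message.content)` would become `parse_llm_json(res.choices[0].message.content)`.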


ChatCompletion(id='chatcmpl-8NepeFqhXxPbIBcyw8J6LbKjWjuAU', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='{\n    "entities": [\n        {"label":"Case","id":"case1","summary":"The patient was a 34-yr-old man who presented with complaints of fever and a chronic cough."},\n        {"label":"Person","id":"person1","age":"34-yr-old","gender":"man"},\n        {"label":"Symptom","id":"fever","description":"fever"},\n        {"label":"Symptom","id":"chronicCough","description":"a chronic cough"},\n        {"label":"Disease","id":"smoker","name":"smoker"},\n        {"label":"Disease","id":"pulmonaryTuberculosis","name":"pulmonary tuberculosis"},\n        {"label":"BodySystem","id":"lungs","name":"lungs"},\n        {"label":"Diagnosis","id":"computedTomographicScan","name":"computed tomographic scan"},\n        {"label":"Diagnosis","id":"thoracoscopicLungBiopsy","name":"thoracoscopic lung biopsy"},\n        {"label":"Biological","id":"microscopicExaminati

In [None]:
results

[{'entities': [{'label': 'Case',
    'id': 'case1',
    'summary': 'The patient was a 34-yr-old man who presented with complaints of fever and a chronic cough.'},
   {'label': 'Person', 'id': 'person1', 'age': '34-yr-old', 'gender': 'man'},
   {'label': 'Symptom', 'id': 'fever', 'description': 'fever'},
   {'label': 'Symptom',
    'id': 'chronicCough',
    'description': 'a chronic cough'},
   {'label': 'Disease', 'id': 'smoker', 'name': 'smoker'},
   {'label': 'Disease',
    'id': 'pulmonaryTuberculosis',
    'name': 'pulmonary tuberculosis'},
   {'label': 'BodySystem', 'id': 'lungs', 'name': 'lungs'},
   {'label': 'Diagnosis',
    'id': 'computedTomographicScan',
    'name': 'computed tomographic scan'},
   {'label': 'Diagnosis',
    'id': 'thoracoscopicLungBiopsy',
    'name': 'thoracoscopic lung biopsy'},
   {'label': 'Biological',
    'id': 'microscopicExamination',
    'name': 'microscopic examination'},
   {'label': 'Biological', 'id': 'S-100', 'name': 'S-100 protein'},
   {'lab

In [None]:
# case1
results1= [{
    "entities": [
        {"label": "Case", "id": "case1", "summary": "The patient was a 34-yr-old man who presented with complaints of fever and a chronic cough."},
        {"label": "Person", "id": "person1", "age": "34", "gender": "male"},
        {"label": "Symptom", "id": "fever", "description": "fever"},
        {"label": "Symptom", "id": "chronicCough", "description": "chronic cough"},
        {"label": "Disease", "id": "pulmonaryTuberculosis", "name": "pulmonary tuberculosis"},
        {"label": "BodySystem", "id": "lungs", "name": "lungs"},
        {"label": "Diagnosis", "id": "ctScan", "name": "computed tomographic scan", "description": "revealed multiple tiny nodules in both lungs", "when": "current"},
        {"label": "Biological", "id": "nodulesInLungs", "name": "multiple tiny nodules in both lungs", "description": "revealed by CT scan"},
        {"label": "Diagnosis", "id": "lungBiopsy", "name": "thoracoscopic lung biopsy", "description": "taken from the right upper lobe", "when": "current"},
        {"label": "Biological", "id": "typicalLCH", "name": "typical LCH", "description": "revealed by microscopic examination"},
        {"label": "BodySystem", "id": "bronchioles", "name": "bronchioles"},
        {"label": "Diagnosis", "id": "lobectomy", "name": "lobectomy", "description": "performed due to the possibility of a sarcoma", "when": "current"},
        {"label": "Biological", "id": "sarcomaPossibility", "name": "possibility of a sarcoma", "description": "revealed by needle biopsy specimen"},
        {"label": "BodySystem", "id": "lowerLobe", "name": "left lower lobe"},
        {"label": "Biological", "id": "massInLowerLobe", "name": "4 cm-sized mass in the left lower lobe", "description": "revealed by follow-up CT scan"},
        {"label": "Biological", "id": "mainMass", "name": "4 cm-sized poorly-circumscribed lobulated gray-white mass", "description": "found during lobectomy"},
        {"label": "Biological", "id": "tumorGiantCells", "name": "multinucleated tumor giant cells", "description": "found microscopically"},
        {"label": "Biological", "id": "malignantFeatures", "name": "overtly malignant cytologic features", "description": "found microscopically"},
        {"label": "Biological", "id": "mitoticFigures", "name": "numerous mitotic figures", "description": "ranging from 30 to 60 per 10 high power fields"},
        {"label": "Biological", "id": "typicalLCHRemnants", "name": "foci of typical LCH", "description": "remained around the main tumor mass"},
        {"label": "Biological", "id": "positiveMarkers", "name": "positive markers", "description": "S-100 protein, vimentin, CD68, CD1a"},
        {"label": "Diagnosis", "id": "ultrastructuralAnalysis", "name": "ultrastructural analysis", "description": "failed to demonstrate any Birbeck granules", "when": "current"},

    ],
    "relationships": [
        "case1|FOR|person1",
        "person1|HAS_SYMPTOM{}|fever",
        "person1|HAS_SYMPTOM{}|chronicCough",
        "person1|HAS_DISEASE{}|pulmonaryTuberculosis",
        "person1|HAS_DIAGNOSIS|ctScan",
        "ctScan|SHOWED|nodulesInLungs",
        "person1|HAS_DIAGNOSIS|lungBiopsy",
        "lungBiopsy|SHOWED|typicalLCH",
        "typicalLCH|SEEN_ON|bronchioles",
        "person1|HAS_DIAGNOSIS|lobectomy",
        "lobectomy|SHOWED|sarcomaPossibility",
        "sarcomaPossibility|AFFECTS|lowerLobe",
        "lobectomy|SHOWED|massInLowerLobe",
        "massInLowerLobe|AFFECTS|lungs",
        "massInLowerLobe|AFFECTS|mainMass",
        "mainMass|AFFECTS|tumorGiantCells",
        "mainMass|AFFECTS|malignantFeatures",
        "mainMass|AFFECTS|mitoticFigures",
        "mainMass|AFFECTS|typicalLCHRemnants",
        "mainMass|AFFECTS|positiveMarkers",
        "lobectomy|SHOWED|ultrastructuralAnalysis"
    ]
}
]
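The prompt restricts the model to seven entity labels and seven relationship types, but nothing enforces that on the reply. Below is a sketch of a validator that could run on each parsed result (such as `results1`) before generating Cypher; `validate_extraction` is hypothetical, and the allowed sets are copied from `prompt1`. It checks labels, types and ids only, not whether each relationship connects the expected labels.

```python
from typing import List

# Entity labels and relationship types copied from prompt1
ALLOWED_LABELS = {"Case", "Person", "Symptom", "Disease", "BodySystem", "Diagnosis", "Biological"}
ALLOWED_RELS = {"FOR", "HAS_SYMPTOM", "HAS_DISEASE", "SEEN_ON", "AFFECTS", "HAS_DIAGNOSIS", "SHOWED"}

def validate_extraction(result: dict) -> List[str]:
    """Return the problems found in one parsed LLM result (an empty list means it looks valid)."""
    problems = []
    ids = set()
    for ent in result.get("entities", []):
        if ent.get("label") not in ALLOWED_LABELS:
            problems.append(f"unknown label {ent.get('label')!r}")
        if "id" not in ent:
            problems.append(f"entity without id: {ent}")
        elif ent["id"] in ids:
            problems.append(f"duplicate id {ent['id']!r}")
        else:
            ids.add(ent["id"])
    for rel in result.get("relationships", []):
        parts = rel.split("|")
        if len(parts) != 3:
            problems.append(f"malformed relationship {rel!r}")
            continue
        head, rtype, tail = (p.strip() for p in parts)
        rtype = rtype.split("{", 1)[0]  # drop an optional property map, e.g. HAS_SYMPTOM{...}
        if rtype not in ALLOWED_RELS:
            problems.append(f"unknown relationship type {rtype!r}")
        for end in (head, tail):
            if end not in ids:
                problems.append(f"relationship {rel!r} refers to unknown id {end!r}")
    return problems
```

A result with a non-empty problem list could be logged and skipped instead of being ingested.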

## Neo4j Cypher Generation

The entities and relationships returned by the LLM have to be transformed into Cypher statements so that we can ingest them into Neo4j.
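Relationships arrive as strings of the form `head|TYPE|tail` or `head|TYPE{key:value,...}|tail`. `generate_cypher` splits them on `|` and places `TYPE{...}` into the Cypher pattern verbatim, so an unquoted value such as `{when:current}` could be read by Neo4j as a variable. As a sketch, the triple can be parsed into its parts first; `parse_triple` and `Triple` are hypothetical, and the naive split on `,` assumes property values contain no commas.

```python
import re
from typing import Dict, NamedTuple

class Triple(NamedTuple):
    head: str
    rel_type: str
    props: Dict[str, str]
    tail: str

# head|TYPE|tail or head|TYPE{key:value,...}|tail
TRIPLE_RE = re.compile(r"^\s*([^|]+?)\s*\|\s*([A-Z_]+)\s*(?:\{([^}]*)\})?\s*\|\s*([^|]+?)\s*$")

def parse_triple(line: str) -> Triple:
    """Split one relationship string into head id, type, property dict and tail id."""
    m = TRIPLE_RE.match(line)
    if not m:
        raise ValueError(f"not a head|TYPE|tail triple: {line!r}")
    head, rel_type, raw_props, tail = m.groups()
    props = {}
    for pair in (raw_props or "").split(","):
        if ":" in pair:
            key, value = pair.split(":", 1)
            props[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return Triple(head, rel_type, props, tail)
```

The resulting `props` dict can then be passed to Neo4j as a query parameter rather than pasted into the query text.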

In [None]:
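# Pre-processing results for uploading into Neo4j - helper functions:
def get_prop_str(prop_dict, _id):
    """Build an ON CREATE SET clause from every property except `label` and `id`."""
    s = []
    for key, val in prop_dict.items():
        if key != 'label' and key != 'id':
            # Escape backslashes and double quotes so the value stays a valid Cypher string literal
            escaped = str(val).replace('\\', '\\\\').replace('"', '\\"')
            s.append(_id + "." + key + ' = "' + escaped + '"')
    # An empty "ON CREATE SET" is invalid Cypher, so emit nothing when there are no properties
    return ' ON CREATE SET ' + ','.join(s) if s else ''

def get_cypher_compliant_var(_id):
    # Cypher variable name: strip non-word characters and underscores, prefix with "_"
    return "_" + re.sub(r'[\W_]', '', _id)

def generate_cypher(in_json):
    e_map = {}
    e_stmt = []
    r_stmt = []
    e_stmt_tpl = Template("($id:$label{id:'$key'})")
    r_stmt_tpl = Template("""
      MATCH $src
      MATCH $tgt
      MERGE ($src_id)-[:$rel]->($tgt_id)
    """)
    for obj in in_json:
        for j in obj['entities']:
            label = j['label']
            key = j['id']
            # Give Case and Person nodes unique, time-based ids so that ids such as
            # "case1"/"person1" from different case sheets do not merge into one node
            if label == 'Case':
                key = 'c' + str(time.time_ns())
            elif label == 'Person':
                key = 'p' + str(time.time_ns())
            varname = get_cypher_compliant_var(j['id'])
            stmt = e_stmt_tpl.substitute(id=varname, label=label, key=key)
            e_map[varname] = stmt
            e_stmt.append('MERGE ' + stmt + get_prop_str(j, varname))

        for st in obj['relationships']:
            rels = st.split("|")
            if len(rels) != 3:
                print(f"Skipping malformed relationship: {st}")
                continue
            src_id = get_cypher_compliant_var(rels[0].strip())
            rel = rels[1].strip()
            tgt_id = get_cypher_compliant_var(rels[2].strip())
            # Skip relationships whose head or tail was not extracted as an entity
            if src_id not in e_map or tgt_id not in e_map:
                print(f"Skipping relationship with unknown entity: {st}")
                continue
            stmt = r_stmt_tpl.substitute(
                src_id=src_id, tgt_id=tgt_id, src=e_map[src_id], tgt=e_map[tgt_id], rel=rel)
            r_stmt.append(stmt)

    return e_stmt, r_stmt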
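`generate_cypher` builds every statement by string concatenation, which is why quotes inside values need escaping. An alternative, sketched below under the assumption that the statements are run with `neo4j_graph.query(query, params)`, is to pass ids and properties as query parameters and interpolate only the label and relationship type. Cypher does not accept those as parameters, so they are checked against the schema instead. `entity_merge` and `relationship_merge` are hypothetical helpers.

```python
from typing import Any, Dict, Tuple

# Labels and relationship types cannot be query parameters, so they are whitelisted
ALLOWED_LABELS = {"Case", "Person", "Symptom", "Disease", "BodySystem", "Diagnosis", "Biological"}
ALLOWED_RELS = {"FOR", "HAS_SYMPTOM", "HAS_DISEASE", "SEEN_ON", "AFFECTS", "HAS_DIAGNOSIS", "SHOWED"}

def _check(name: str, allowed: set) -> str:
    if name not in allowed:
        raise ValueError(f"unexpected name {name!r}")
    return name

def entity_merge(entity: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Parameterized MERGE for one extracted entity."""
    label = _check(entity["label"], ALLOWED_LABELS)
    props = {k: str(v) for k, v in entity.items() if k not in ("label", "id")}
    query = f"MERGE (n:{label} {{id: $id}}) ON CREATE SET n += $props"
    return query, {"id": entity["id"], "props": props}

def relationship_merge(src: Dict[str, Any], rel_type: str, tgt: Dict[str, Any],
                       props: Dict[str, str]) -> Tuple[str, Dict[str, Any]]:
    """Parameterized MERGE for one relationship between two already-created entities."""
    src_label = _check(src["label"], ALLOWED_LABELS)
    tgt_label = _check(tgt["label"], ALLOWED_LABELS)
    rel_type = _check(rel_type, ALLOWED_RELS)
    query = (f"MATCH (a:{src_label} {{id: $src_id}}) "
             f"MATCH (b:{tgt_label} {{id: $tgt_id}}) "
             f"MERGE (a)-[r:{rel_type}]->(b) ON CREATE SET r += $props")
    return query, {"src_id": src["id"], "tgt_id": tgt["id"], "props": props}
```

Usage would be `query, params = entity_merge(entity)` followed by `neo4j_graph.query(query, params)`; the time-based Case/Person ids from `generate_cypher` would have to be applied to `entity["id"]` first.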

In [None]:
ent_cyp, rel_cyp = generate_cypher(results1)

## Data Ingestion

In [None]:
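# Read the Neo4j connection settings from config.env instead of hard-coding them.
# The key names below are assumptions; match them to the names in your config.env.
load_dotenv("config.env")
url = os.getenv("NEO4J_URI")
username = os.getenv("NEO4J_USERNAME", "neo4j")
password = os.getenv("NEO4J_PASSWORD")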

In [None]:
%pip install langchain


In [None]:
from langchain.graphs import Neo4jGraph
neo4j_graph = Neo4jGraph(url=url, username=username, password=password)

Before loading the data, create a uniqueness constraint on `id` for each node label:

In [None]:
neo4j_graph.query('CREATE CONSTRAINT unique_case_id IF NOT EXISTS FOR (n:Case) REQUIRE n.id IS UNIQUE')
neo4j_graph.query('CREATE CONSTRAINT unique_person_id IF NOT EXISTS FOR (n:Person) REQUIRE (n.id) IS UNIQUE')
neo4j_graph.query('CREATE CONSTRAINT unique_symptom_id IF NOT EXISTS FOR (n:Symptom) REQUIRE (n.id) IS UNIQUE')
neo4j_graph.query('CREATE CONSTRAINT unique_disease_id IF NOT EXISTS FOR (n:Disease) REQUIRE n.id IS UNIQUE')
neo4j_graph.query('CREATE CONSTRAINT unique_bodysys_id IF NOT EXISTS FOR (n:BodySystem) REQUIRE n.id IS UNIQUE')
neo4j_graph.query('CREATE CONSTRAINT unique_diag_id IF NOT EXISTS FOR (n:Diagnosis) REQUIRE n.id IS UNIQUE')
neo4j_graph.query('CREATE CONSTRAINT unique_biological_id IF NOT EXISTS FOR (n:Biological) REQUIRE n.id IS UNIQUE')

[]

Ingest the entities

In [None]:
%%time
for e in ent_cyp:
    neo4j_graph.query(e)


CPU times: user 36.7 ms, sys: 4.61 ms, total: 41.4 ms
Wall time: 2.24 s


Now ingest the relationships:

In [None]:
%%time
for r in rel_cyp:
    neo4j_graph.query(r)

CPU times: user 29.8 ms, sys: 7.22 ms, total: 37 ms
Wall time: 2.45 s
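Each `MERGE` above is sent as its own query, one network round trip per entity or relationship. One common way to cut round trips is to group rows and send them with `UNWIND`; a minimal sketch for entities, assuming the same `label`/`id`/properties shape as `results1`, is below. `batch_entities` is hypothetical, and it does not apply the time-based Case/Person ids that `generate_cypher` assigns.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

def batch_entities(entities: List[Dict[str, Any]]) -> List[Tuple[str, Dict[str, Any]]]:
    """Group entities by label and build one UNWIND ... MERGE query per label."""
    rows_by_label: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for ent in entities:
        # Everything except label and id becomes a node property (stringified, as in get_prop_str)
        props = {k: str(v) for k, v in ent.items() if k not in ("label", "id")}
        rows_by_label[ent["label"]].append({"id": ent["id"], "props": props})

    batches = []
    for label, rows in rows_by_label.items():
        # The label is interpolated (labels cannot be parameters), so it must come
        # from a trusted set such as the entity types in prompt1
        query = ("UNWIND $rows AS row "
                 f"MERGE (n:{label} {{id: row.id}}) "
                 "ON CREATE SET n += row.props")
        batches.append((query, {"rows": rows}))
    return batches
```

A possible usage: `for query, params in batch_entities(results1[0]["entities"]): neo4j_graph.query(query, params)`.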


These helper functions ingest all case sheets (`*.txt` files) in the `neo4j_data/20cases/` folder on Google Drive.

In [None]:
def run_pipeline(count=191):
    # sorted() makes the processing order repeatable; glob's order depends on the file system
    txt_files = sorted(glob.glob("/content/drive/MyDrive/neo4j_data/20cases/*.txt"))[0:count]
    print(f"Running pipeline for {len(txt_files)} files")
    failed_files = process_pipeline(txt_files)
    print(failed_files)
    return failed_files

def process_pipeline(files):
    failed_files = []
    for f in files:
        try:
            with open(f, 'r') as file:
                print(f"  {f}: Reading File...")
                data = file.read().rstrip()
                text = clean_text(data)
                print(f"    {f}: Extracting E & R")
                results = extract_entities_relationships(f, text)
                print(f"    {f}: Generating Cypher")
                ent_cyp, rel_cyp = generate_cypher(results)
                print(f"    {f}: Ingesting Entities")
                for e in ent_cyp:
                    neo4j_graph.query(e)
                print(f"    {f}: Ingesting Relationships")
                for r in rel_cyp:
                    neo4j_graph.query(r)
                print(f"    {f}: Processing DONE")
        except Exception as e:
            print(f"    {f}: Processing Failed with exception {e}")
            failed_files.append(f)
    return failed_files

def extract_entities_relationships(f, text):
    start = timer()
    system = "You are a helpful Medical Case Sheet expert who extracts relevant information and store them on a Neo4j Knowledge Graph"
    prompts = [prompt1]
    results = []
    for p in prompts:
      p = Template(p).substitute(ctext=text)
      res = process_gpt(system, p)
      results.append(json.loads(res.choices[0].message.content))
    end = timer()
    elapsed = (end-start)
    print(f"    {f}: E & R took {elapsed}secs")
    return results




In [None]:
%%time
failed_files = run_pipeline(200)

Running pipeline for 20 files
  /content/drive/MyDrive/neo4j_data/20cases/19214295.txt: Reading File...
    /content/drive/MyDrive/neo4j_data/20cases/19214295.txt: Extracting E & R
ChatCompletion(id='chatcmpl-8NfBHjuQQG6cuzd4rCReg820Nq2zJ', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='{\n"entities": [\n{"label":"Case","id":"case1","summary":"A 71-year-old woman with known diastolic congestive heart failure (CHF) presented to the ER with exacerbation of CHF and a decreased level of consciousness."},\n{"label":"Person","id":"person1","age":"71-year-old","gender":"female","location":"ER"},\n{"label":"Disease","id":"chf","name":"congestive heart failure"},\n{"label":"Symptom","id":"exacerbation","description":"exacerbation of CHF"},\n{"label":"Diagnosis","id":"hypertrophy","name":"concentric left ventricular hypertrophy with diastolic dysfunction","when":"recent admission"},\n{"label":"Diagnosis","id":"mitral","name":"mild to moderate mitral regurgi

If processing failed for some files, for example because of API rate limits or another error, you can retry just those files:

In [None]:
len(failed_files)

14

In [None]:
failed_files

['/content/drive/MyDrive/neo4j_data/20cases/19214295.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/19860925.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/20146086.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/20977862.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/19610147.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/19860006.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/18787726.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/16778410.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/17803823.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/15939911.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/21129213.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/21067996.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/19009665.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/18815636.txt']

In [None]:
%%time
failed_files = process_pipeline(failed_files)
failed_files

  /content/drive/MyDrive/neo4j_data/20cases/19214295.txt: Reading File...
    /content/drive/MyDrive/neo4j_data/20cases/19214295.txt: Extracting E & R
ChatCompletion(id='chatcmpl-8NfKXzbXF2q8V4Z6Vn0fq373Z0kzm', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='{\n    "entities": [\n        {"label":"Case","id":"case_1", "summary":"A 71-year-old woman with known diastolic congestive heart failure (CHF) presented to the ER with exacerbation of CHF and a decreased level of consciousness."},\n        {"label":"Person","id":"person_1","age":"71-year-old","gender":"female"},\n        {"label":"Symptom","id":"exacerbationOfChf", "description":"exacerbation of CHF"},\n        {"label":"Disease","id":"diastolicCongestiveHeartFailure", "name":"diastolic congestive heart failure"},\n        {"label":"BodySystem","id":"heart", "name":"heart"},\n        {"label":"Diagnosis","id":"cardiacCatheterization", "name":"cardiac catheterization", "description":"50% stenos

['/content/drive/MyDrive/neo4j_data/20cases/19214295.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/19860925.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/20146086.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/20977862.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/18787726.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/16778410.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/17803823.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/15939911.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/21129213.txt',
 '/content/drive/MyDrive/neo4j_data/20cases/21067996.txt']
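Re-running `process_pipeline` by hand still left 10 files failing. The retry can be looped with a pause between rounds, as in the sketch below; `retry_failed` is a hypothetical helper. One caveat: `generate_cypher` gives Case and Person nodes fresh time-based ids, so a file that failed partway through ingestion would get a second Case/Person pair when retried.

```python
import time
from typing import Callable, List

def retry_failed(process: Callable[[List[str]], List[str]], files: List[str],
                 max_rounds: int = 3, pause: float = 30.0) -> List[str]:
    """Call `process` on the still-failing files, up to `max_rounds` times in total.

    `process` takes a list of file paths and returns the ones that failed,
    like process_pipeline. Returns whatever still fails after the last round.
    """
    remaining = list(files)
    for round_no in range(max_rounds):
        if not remaining:
            break
        if round_no > 0:
            time.sleep(pause)  # give API rate limits a chance to reset between rounds
        remaining = process(remaining)
    return remaining
```

In the notebook this would be `failed_files = retry_failed(process_pipeline, failed_files)`.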