<a href="https://colab.research.google.com/github/24p11/recode-scenario/blob/main/scenario_oncology_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Create fictitious clinical notes from a code set (DRG + ICD)

A code set is the raw classification data that can be extracted from a national database (in France, the Base nationale PMSI). It is made of:
* a classification profile, built from the grouping variables of DRG records and prepared with their frequency in the national database
    - age (class)
    - sex
    - DRG (racine GHM)
    - Main diagnosis (ICD-10): see the definition below
    - Hospitalization management type: see the definition below
* the diagnoses associated with each classification profile, extracted with their frequencies
* the procedures associated with each classification profile, especially surgery and technical procedures, extracted with their frequencies

From this raw information we produce a coded clinical scenario that will be used as a seed.
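
One way to read "extracted with their frequencies" is that a seed profile can be drawn at random, with each profile weighted by how often it occurs. The sketch below illustrates that idea only: the `profiles` list, its values, and the `draw_profile` helper are hypothetical and are not part of the project's `utils` module, which may select profiles differently.

```python
import random

# Hypothetical, hand-made profiles: real ones come from the national database.
# "nb" plays the role of the frequency attached to each classification profile.
profiles = [
    {"age_class": "[18-50[", "sexe": 2, "drg_parent_code": "28Z07", "icd_primary_code": "C50", "nb": 120},
    {"age_class": "[50-70[", "sexe": 2, "drg_parent_code": "28Z07", "icd_primary_code": "C50", "nb": 300},
    {"age_class": "[50-70[", "sexe": 1, "drg_parent_code": "28Z07", "icd_primary_code": "C34", "nb": 80},
]


def draw_profile(profiles, rng):
    """Pick one profile with probability proportional to its frequency `nb`."""
    weights = [p["nb"] for p in profiles]
    return rng.choices(profiles, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the draw is reproducible
seed_profile = draw_profile(profiles, rng)
print(seed_profile)
```

Weighting by frequency keeps the generated scenarios close to the case mix of the source database, while rare profiles still appear occasionally.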

This scenario is transformed into a detailed prompt that is given to an LLM for generation.
From the combination of the primary and related diagnoses in the French discharge abstract, we derive two notions:
* Primary diagnosis: carries the notion of principal pathology. It is either the primary diagnosis of the discharge abstract or, when a related diagnosis exists and the primary diagnosis of the discharge abstract belongs to the ICD-10 chapter "Facteurs influant sur l’état de santé" (factors influencing health status), the related diagnosis.
* Hospitalization management type: either the term "Primary diagnosis" or, when a related diagnosis exists, the ICD-10 code of the related diagnosis.
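
The rule for the primary diagnosis can be sketched as a small function. This is an illustration under stated assumptions, not the project's implementation: `derive_primary_diagnosis` is a hypothetical helper, and the chapter "Facteurs influant sur l’état de santé" is identified here simply by ICD-10 codes starting with "Z" (chapter XXI). The codes reuse those of the profile shown later in this notebook, but their placement in the discharge abstract is assumed.

```python
from typing import Optional


def derive_primary_diagnosis(abstract_primary: str, abstract_related: Optional[str]) -> str:
    """Return the ICD-10 code that carries the principal pathology.

    Assumption: a code belongs to the chapter "Facteurs influant sur
    l'état de santé" when it starts with "Z" (ICD-10 chapter XXI).
    """
    primary_is_z = abstract_primary.upper().startswith("Z")
    # Use the related diagnosis only when it exists AND the abstract's
    # primary diagnosis is a Z-chapter code.
    if abstract_related and primary_is_z:
        return abstract_related
    return abstract_primary


# Assumed abstract: chemotherapy session code as primary, tumour as related.
print(derive_primary_diagnosis("Z511", "C50"))  # → C50
# No related diagnosis: the abstract's primary diagnosis is kept.
print(derive_primary_diagnosis("I10", None))    # → I10
```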


In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import pandas as pd
import numpy as np
import datetime as dt

In [4]:
from utils import *

In [5]:
gs = generate_scenario()
# Load official dictionaries
# The col_names option lets you align your column names with the project dictionary.
gs.load_offical_icd("cim_2024.xlsx",col_names={"code" : "icd_code","libelle":"icd_code_description"} )
gs.load_offical_procedures("ccam_actes_2024.xlsx",col_names={"code":"procedure","libelle_long":"procedure_description"} )
col_names={"Code CIM":"icd_parent_code","Localisation":"primary_site","Type Histologique":"histological_type",
	"Stade":"stage","Marqueurs Tumoraux":"biomarkers","Traitement":"treatment_recommandation","Protocole de Chimiothérapie":"chemotherapy_regimen"}
gs.load_cancer_treatement_recommandations("Tableau récapitulatif traitement cancer.xlsx",col_names ) 

In [6]:
# Load data from the BN PMSI (Base nationale PMSI)
col_names={"racine":"drg_parent_code","das": "icd_secondary_code","diag":"icd_primary_code","categ_cim":"icd_primary_parent_code",
            "mdp":"case_management_type","nb_situations":"nb","acte":"procedure",
            "mode_entree":"admission_mode",
            "mode_sortie":"discharge_disposition",
            "mode_hospit":"admission_type"}
gs.load_classification_profile("bn_pmsi_cases_20250819.csv", col_names)
gs.load_secondary_icd("bn_pmsi_related_diag_20250818.csv",col_names)
gs.load_procedures("bn_pmsi_procedures_20250818.csv",col_names)
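
Both loading cells above pass a `col_names` mapping from the source column names to the names the project expects. The sketch below shows the idea with the standard library only. The sample extract, its `;` delimiter, and the `rename_columns` helper are assumptions made for illustration; the loaders in `utils` may work differently (with pandas, the equivalent step would be `DataFrame.rename(columns=col_names)`).

```python
import csv
import io

# Hypothetical one-row extract using source column names from the cell above.
raw_csv = "racine;diag;mdp;nb_situations\n28Z07;C50;Z511;12\n"

# Subset of the mapping used in the notebook: source name -> project name.
col_names = {
    "racine": "drg_parent_code",
    "diag": "icd_primary_code",
    "mdp": "case_management_type",
    "nb_situations": "nb",
}


def rename_columns(rows, col_names):
    """Rename keys found in `col_names`; leave every other column untouched."""
    return [
        {col_names.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


rows = list(csv.DictReader(io.StringIO(raw_csv), delimiter=";"))
renamed = rename_columns(rows, col_names)
print(renamed[0])
```

Columns missing from the mapping keep their original names, so one shared `col_names` dictionary can be reused across files that each contain only some of the listed columns.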

In [7]:
query = "icd_primary_code=='C50' and case_management_type=='Z511'"
current_profile = gs.df_classification_profile.query(query).drop(columns="nb").iloc[1]
current_profile

icd_primary_code                                             C50
case_management_type                                        Z511
drg_parent_code                                            28Z07
age                                                        ge_18
cage                                                     [30-40[
cage2                                                    [18-50[
sexe                                                           2
admission_type                                         Inpatient
admission_mode                                          DOMICILE
discharge_disposition                                   DOMICILE
dms                                         1,32352941176471e+00
los_mean                                                     2.0
los_sd                                                       0.0
drg_parent_description    Chimiothérapie pour tumeur, en séances
da                                                           D27
libelle_da               

In [8]:
scenario = gs.generate_scenario_from_profile(current_profile)

In [11]:
case = gs.make_prompts_marks_from_scenario(scenario)

In [12]:
def prepare_prompt(prompt_path, case):
    """Fill the placeholders of a prompt template with the values of `case`."""
    with open(prompt_path, "r", encoding="utf-8") as f:
        content = f.read()
    # Each "[... here]" marker in the template is replaced by the matching entry.
    return (
        content
        .replace("[SCENARIO here]", case["SCENARIO"])
        .replace("[INSTRUCTIONS_CANCER here]", case["INSTRUCTIONS_CANCER"])
        .replace("[ICD_ALTERNATIVES here]", case["ICD_ALTERNATIVES"])
    )

print(prepare_prompt("templates/scenario_onco_v1.txt", case=case))

Vous êtes un oncologue clinicien expert. Votre tâche est de générer un compte rendu d'hospitalisation en style clinique synthétique à partir d'un scénario comprenant le résumé PMSI (codes de la classification internationale des maladies), ainsi que d'autres informations décrivant l'hospitalisation.


**SCÉNARIO DE DÉPART :**
- Âge du patient : 40ans
- Sexe du patient : 2
- Date d'entrée : 08/01/2024
- Date de sortie : 10/01/2024
- Date de naissance : 29/08/1983
- Prénom du patient : Leonisa
- Mode de prise en charge : Hospitalisation pour prise en charge du cancer
- codage CIM10 :
   * Diagnostic principal : Tumeur maligne du sein(C50)
   * Diagnostic relié : Séance de chimiothérapie pour tumeur(Z511)
   * Diagnostic associés : 
- Dépendance envers un respirateur : ventilation par un autre moyen(Z991+8)
- Hypertension essentielle (primitive)(I10)
- Tumeur maligne secondaire et non précisée des ganglions lymphatiques de l'aisselle et du membre supérieur(C773)
- Tumeur maligne secondaire