<a href="https://colab.research.google.com/github/24p11/recode-scenario/blob/main/scenario_oncology_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Create fictitious clinical notes from a code set (DRG + ICD)

A code set is the raw classification data that can be extracted from a national database (in France, the Base nationale PMSI). It is made of:
* classification profiles, built from the grouping variables of the DRG records and prepared with their frequency in the national database:
    - age (class)
    - sex
    - DRG (racine GHM)
    - main diagnosis (ICD-10), as defined below
    - hospitalization management type, as defined below
* the diagnoses associated with each classification profile, extracted with their frequencies
* the procedures associated with each classification profile, especially for surgery and technical procedures, extracted with their frequencies
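As a rough illustration of how profile frequencies could be used, the sketch below draws one profile with probability proportional to its frequency. The table content, the helper name `sample_profile` and the weighted draw itself are assumptions made for the example; the notebook's own `generate_scenario` class may proceed differently.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature profile table; the real one comes from the national database.
profiles = pd.DataFrame(
    {
        "icd_primary_code": ["C50", "C50", "C34"],
        "drg_parent_code": ["09M10", "28Z07", "04M09"],
        "cage": ["[80-[", "[30-40[", "[60-70["],
        "sexe": [2, 2, 1],
        "nb": [120, 60, 20],  # frequency of each profile (illustrative values)
    }
)


def sample_profile(df: pd.DataFrame, rng: np.random.Generator) -> pd.Series:
    """Draw one profile with probability proportional to its frequency `nb`."""
    weights = (df["nb"] / df["nb"].sum()).to_numpy()
    idx = rng.choice(df.index.to_numpy(), p=weights)
    return df.loc[idx]


rng = np.random.default_rng(seed=0)
drawn = sample_profile(profiles, rng)
print(drawn["icd_primary_code"], drawn["drg_parent_code"])
```

Frequent profiles are drawn more often than rare ones, which keeps a generated corpus close to the distribution observed in the source data.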

From this raw information we produce a coded clinical scenario that is used as a seed.

This scenario is then turned into a detailed prompt that is given to an LLM for generation.

From the combination of the primary and related diagnoses in the French discharge abstract, we derive two notions:
* Primary diagnosis: carries the notion of principal pathology. It is either the primary diagnosis of the discharge abstract, or the related diagnosis when one exists and the primary diagnosis of the discharge abstract belongs to the ICD-10 chapter "Facteurs influant sur l’état de santé" (factors influencing health status).
* Hospitalization management type: either the term "Primary diagnosis" or the ICD-10 code of the related diagnosis when it exists.
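The first rule can be sketched as a small function. This is only an illustration: the function name is hypothetical, and it assumes that the chapter "Facteurs influant sur l’état de santé" is recognised by ICD-10 codes starting with "Z". The management type rule is not covered here.

```python
from typing import Optional


def principal_pathology(abstract_primary: str, related: Optional[str]) -> str:
    """Return the code that carries the principal pathology.

    Assumption: codes of the chapter "Facteurs influant sur l'état de santé"
    are those starting with "Z".
    """
    if related and abstract_primary.upper().startswith("Z"):
        # The abstract's primary diagnosis only describes the reason for care,
        # so the related diagnosis carries the pathology.
        return related
    return abstract_primary


print(principal_pathology("C50", None))    # C50
print(principal_pathology("Z511", "C50"))  # C50
```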


In [201]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [202]:
import pandas as pd
import numpy as np
import datetime as dt

In [203]:
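# Import only the names used in this notebook instead of a wildcard import
from utils import generate_scenario, prepare_prompt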

In [204]:
gs = generate_scenario()
# Load official dictionaries.
# The col_names option lets you align your own column names with the names used in the project.
gs.load_offical_icd("cim_2024.xlsx", col_names={"code": "icd_code", "libelle": "icd_code_description"})
gs.load_offical_procedures("ccam_actes_2024.xlsx", col_names={"code": "procedure", "libelle_long": "procedure_description"})
col_names = {
    "Code CIM": "icd_parent_code",
    "Localisation": "primary_site",
    "Type Histologique": "histological_type",
    "Stade": "stage",
    "Marqueurs Tumoraux": "biomarkers",
    "Traitement": "treatment_recommandation",
    "Protocole de Chimiothérapie": "chemotherapy_regimen",
}
gs.load_cancer_treatement_recommandations("Tableau récapitulatif traitement cancer.xlsx", col_names)

In [205]:
# Load data from the national PMSI database (BN PMSI)
col_names = {
    "racine": "drg_parent_code",
    "das": "icd_secondary_code",
    "diag": "icd_primary_code",
    "categ_cim": "icd_primary_parent_code",
    "mdp": "case_management_type",
    "nb_situations": "nb",
    "acte": "procedure",
    "mode_entree": "admission_mode",
    "mode_sortie": "discharge_disposition",
    "mode_hospit": "admission_type",
}
gs.load_classification_profile("bn_pmsi_cases_20250819.csv", col_names)
gs.load_secondary_icd("bn_pmsi_related_diag_20250818.csv",col_names)
gs.load_procedures("bn_pmsi_procedures_20250818.csv",col_names)
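The same `col_names` mapping is passed to several loaders even though each file only holds some of the source columns. A minimal sketch of how such a mapping could be applied with pandas is shown below. The helper `align_columns` and the sample frames are hypothetical, and the `gs.load_*` methods may work differently internally.

```python
import pandas as pd


def align_columns(df: pd.DataFrame, col_names: dict) -> pd.DataFrame:
    """Rename source columns to project names.

    DataFrame.rename ignores mapping keys that are absent from the frame,
    so one shared mapping can serve several files.
    """
    return df.rename(columns=col_names)


# Hypothetical extracts with French source column names.
shared_names = {"racine": "drg_parent_code", "diag": "icd_primary_code", "das": "icd_secondary_code"}
cases = pd.DataFrame({"racine": ["09M10"], "diag": ["C50"]})
related = pd.DataFrame({"diag": ["C50"], "das": ["I10"]})

print(list(align_columns(cases, shared_names).columns))
print(list(align_columns(related, shared_names).columns))
```

Sharing one mapping keeps the naming consistent across files, at the cost of not detecting a misspelled source column; an explicit check for expected columns could be added if that matters.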

In [206]:
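# Scenario fields kept for each generated case
# ("icd_secondaray_code" is spelled as in the scenario dictionary).
cols_scenario = ["first_name", "last_name", "cage2", "cage", "sexe",
                 "last_name_med", "icd_primary_code",
                 "admission_type", "admission_mode", "discharge_disposition",
                 "drg_parent_code", "icd_secondaray_code", "cd_md_pec"]

# Cancer-specific fields; "score_TNM" matches the key present in the generated scenario.
cols_cancer = ["cancer_stage", "score_TNM", "histological_type", "treatment_recommandation", "chemotherapy_regimen"]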



In [207]:
# Prepare cases: drop the frequency column and keep only profiles with a cancer primary diagnosis
df_profile = gs.df_classification_profile.drop(columns="nb")
df_profile = df_profile[df_profile.icd_primary_code.isin(gs.icd_codes_cancer)]

In [208]:
test_profile = df_profile[(df_profile["icd_primary_code"]=="C50") & (df_profile["case_management_type"]=="DP")].iloc[0].copy()
test_profile

icd_primary_code                                                     C50
case_management_type                                                  DP
drg_parent_code                                                    09M10
age                                                                ge_18
cage                                                               [80-[
cage2                                                              [50-[
sexe                                                                   2
admission_type                                                 Inpatient
admission_mode                                                  URGENCES
discharge_disposition                                                SMR
dms                                                 2,08000000000000e+01
los_mean                                                       11.610687
los_sd                                                         16.134052
drg_parent_description                        Tumeu

In [209]:
test_scenario = gs.generate_scenario_from_profile(test_profile)
test_scenario

{'age': 87,
 'sexe': np.int64(2),
 'date_entry': datetime.date(2024, 6, 8),
 'date_discharge': datetime.date(2024, 6, 10),
 'date_of_birth': datetime.date(1936, 8, 12),
 'first_name': 'Renee',
 'last_name': 'Rougier',
 'icd_primary_code': 'C50',
 'case_management_type': 'DP',
 'icd_secondaray_code': ['I10',
  'J961+0',
  'R4700',
  'C773',
  'I269',
  'R33',
  'D226',
  'E755'],
 'admission_mode': 'URGENCES',
 'discharge_disposition': 'SMR',
 'cancer_stage': 'Stade III',
 'score_TNM': 'T4N3M0',
 'histological_type': 'Carcinome invasif',
 'treatment_recommandation': 'Traitement néoadjuvant (chimiothérapie, thérapie ciblée HER2), chirurgie, radiothérapie, thérapie systémique adjuvante',
 'chemotherapy_regimen': 'AC   (Doxorubicine/Cyclophosphamide) suivi de Taxane et Pertuzumab/Trastuzumab',
 'drg_parent_code': '09M10',
 'cage': '[80-[',
 'cage2': '[50-[',
 'admission_type': 'Inpatient',
 'dms': '2,08000000000000e+01',
 'los_mean': np.float64(11.6106870229008),
 'los_sd': np.float64(16.1
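Two fields in the output above need some parsing before they can be used numerically: the age class (`cage`, for example `[80-[`) and `dms`, which is stored as a string with a decimal comma. The sketch below shows one possible way to handle both. The helper names and the upper bound of 100 years for open-ended classes are assumptions made for the example, not the notebook's actual logic.

```python
import random
import re


def parse_age_class(label: str, max_age: int = 100) -> tuple:
    """Turn "[30-40[" or "[80-[" into (low, high) bounds, high excluded.

    Assumption: open-ended classes are capped at `max_age`.
    """
    match = re.fullmatch(r"\[(\d+)-(\d*)\[", label)
    if match is None:
        raise ValueError(f"Unrecognised age class: {label!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else max_age
    return low, high


def draw_age(label: str, rng: random.Random) -> int:
    """Draw an integer age uniformly inside the class."""
    low, high = parse_age_class(label)
    return rng.randrange(low, high)


def parse_french_float(text: str) -> float:
    """Convert a number written with a decimal comma into a float."""
    return float(text.replace(",", "."))


rng = random.Random(0)
print(parse_age_class("[30-40["))                  # (30, 40)
print(draw_age("[80-[", rng))
print(parse_french_float("2,08000000000000e+01"))  # 20.8
```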

In [210]:
def create_prompt(scenario):
    """Build the LLM prompt for one scenario from the matching template."""
    # The third character of the DRG root code is "C" for surgical stays.
    is_surgical = scenario["drg_parent_code"][2:3] == "C"
    if is_surgical and scenario["admission_type"] == "Inpatient":
        template_name = "surgery_complete.txt"
    elif is_surgical and scenario["admission_type"] == "Outpatient":
        template_name = "surgery_outpatient.txt"
    else:
        template_name = "scenario_onco_v1.txt"

    # Turn the scenario into the placeholder values expected by the template.
    case = gs.make_prompts_marks_from_scenario(scenario)
    return prepare_prompt("templates/" + template_name, case=case)
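The template choice can also be isolated in a pure function, which makes it easy to test without loading any data. The sketch below combines the rules of `create_prompt` with the two extra branches (`interventionnel.txt` and `bilan.txt`) found in the commented-out batch cell of this notebook. The function name `select_template` is hypothetical.

```python
def select_template(admission_type, drg_parent_code, cd_md_pec=None):
    """Return the template file name for a stay.

    Rules are checked in order: surgical stays ("C"), interventional
    stays ("K"), management type 17, then the default oncology template.
    """
    drg_letter = drg_parent_code[2:3]
    if drg_letter == "C" and admission_type == "Inpatient":
        return "surgery_complete.txt"
    if drg_letter == "C" and admission_type == "Outpatient":
        return "surgery_outpatient.txt"
    if drg_letter == "K":
        return "interventionnel.txt"
    if cd_md_pec == 17:
        return "bilan.txt"
    return "scenario_onco_v1.txt"


print(select_template("Inpatient", "09M10"))  # scenario_onco_v1.txt
```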

In [211]:
test_prompt = create_prompt(test_scenario)
print(test_prompt)

Vous êtes un oncologue clinicien expert. Votre tâche est de générer un compte rendu d'hospitalisation en style clinique synthétique.


**SCÉNARIO DE DÉPART :**
- Âge du patient : 87 ans
- Sexe du patient : Féminin
- Date d'entrée : 08/06/2024
- Date de sortie : 10/06/2024
- Date de naissance : 12/08/1936
- Prénom du patient : Renee
- Nom du patient : Rougier
- Mode de prise en charge : Première hospitalisation pour découverte de cancer
- Codage CIM10 :
   * Diagnostic principal : Tumeur maligne du sein (C50)
   * Diagnostic relié : Aucun
   * Diagnostic associés : 
- Hypertension essentielle (primitive) (I10)
- Insuffisance respiratoire chronique obstructive (J961+0)
- Aphasie récente, persistant au-delà de 24 heures (R4700)
- Tumeur maligne secondaire et non précisée des ganglions lymphatiques de l'aisselle et du membre supérieur (C773)
- Embolie pulmonaire, (sans mention de coeur pulmonaire aigu) (I269)
- Rétention d'urine (R33)
- Naevus à mélanocytes du membre supérieur, y compris l
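The list of associated diagnoses in the prompt above pairs each ICD-10 code with its label. A minimal sketch of that formatting step follows; the helper name and the small label dictionary are hypothetical, and the project's own implementation lives in `make_prompts_marks_from_scenario`.

```python
def format_diagnosis_lines(codes, labels):
    """Render one "- label (code)" line per code; codes without a label keep only the code."""
    lines = []
    for code in codes:
        label = labels.get(code)
        lines.append(f"- {label} ({code})" if label else f"- ({code})")
    return "\n".join(lines)


# Hypothetical extract of the ICD-10 dictionary.
labels = {
    "I10": "Hypertension essentielle (primitive)",
    "R33": "Rétention d'urine",
}
print(format_diagnosis_lines(["I10", "R33"], labels))
```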

In [233]:
# Batch generation over the first five profiles (currently disabled).
# df_scenario = []
# for i in range(5):
#     current_profile = df_profile.iloc[i, :]
#
#     scenario = gs.generate_scenario_from_profile(current_profile)
#     row = {k: scenario[k] for k in scenario if k in cols_scenario}
#     cancer = [scenario[k] for k in scenario if k in cols_cancer]
#     row.update({"cancer": cancer})
#
#     case = gs.make_prompts_marks_from_scenario(scenario)
#     row.update({"case": case})
#
#     # Choose the template from the admission type, the DRG letter and the management type.
#     if row["admission_type"] == "Inpatient" and row["drg_parent_code"][2:3] == "C":
#         template_name = "surgery_complete.txt"
#     elif row["admission_type"] == "Outpatient" and row["drg_parent_code"][2:3] == "C":
#         template_name = "surgery_outpatient.txt"
#     elif row["drg_parent_code"][2:3] == "K":
#         template_name = "interventionnel.txt"
#     elif row["cd_md_pec"] == 17:
#         template_name = "bilan.txt"
#     else:
#         template_name = "scenario_onco_v1.txt"
#
#     prompt = prepare_prompt("templates/" + template_name, case=case)
#     row.update({"prompt": prompt})
#
#     df_scenario.append(row)



In [226]:
# df_scenario= pd.DataFrame(df_scenario)

In [227]:
# df_scenario.to_csv(gs.path_data + "test_scenario_v1.csv")
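One point to keep in mind when saving such rows to CSV: list-valued cells (for example the secondary diagnosis codes) are written as plain text and come back as strings. The sketch below shows one possible workaround, encoding those cells as JSON before writing. The sample rows are hypothetical, and an in-memory buffer stands in for the file.

```python
import io
import json

import pandas as pd

# Hypothetical rows shaped like the scenario dictionaries.
rows = [
    {"icd_primary_code": "C50", "icd_secondaray_code": ["I10", "C773"]},
    {"icd_primary_code": "C50", "icd_secondaray_code": ["R33"]},
]
df = pd.DataFrame(rows)

# Encode list-valued cells as JSON text so they survive the CSV round trip.
df["icd_secondaray_code"] = df["icd_secondaray_code"].apply(json.dumps)

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Decode the JSON text back into Python lists after reading.
restored = pd.read_csv(buffer)
restored["icd_secondaray_code"] = restored["icd_secondaray_code"].apply(json.loads)
print(restored.loc[0, "icd_secondaray_code"])  # ['I10', 'C773']
```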

In [212]:
# df_scenario[]

{'sexe': 2,
 'first_name': 'Zeinab',
 'icd_secondaray_code': ['R2630', 'I10', 'C773', 'C795', 'C792', 'Z290'],
 'admission_mode': 'DOMICILE',
 'discharge_disposition': 'DOMICILE',
 'icd_primary_code': 'C50',
 'drg_parent_code': '28Z07',
 'cage': '[30-40[',
 'cage2': '[18-50[',
 'admission_type': 'Inpatient',
 'last_name': 'Jouhanet',
 'last_name_med': 'Lissillour',
 'cancer': ['Stade II',
  None,
  'Carcinome métaplasique',
  'Chirurgie (mastectomie ou BCS) + radiothérapie, thérapie systémique adjuvante incluant chimiothérapie, thérapie ciblée HER2 si HER2+',
  'TAC (Docetaxel, Doxorubicine, Cyclophosphamide)'],
 'case': {'SCENARIO': "**SCÉNARIO DE DÉPART :**\n- Âge du patient : 34ans\n- Sexe du patient : 2\n- Date d'entrée : 18/07/2024\n- Date de sortie : 20/07/2024\n- Date de naissance : 29/06/1990\n- Prénom du patient : Zeinab\n- Mode de prise en charge : Hospitalisation pour prise en charge du cancer\n- codage CIM10 :\n   * Diagnostic principal : Tumeur maligne du sein(C50)\n   * D