# Automating Patient Record Completion

### Goal

Develop an intelligent system that helps physicians fill in the patient record automatically.

### Core idea

A physician's work is demanding, especially when several tasks must be handled at once, such as performing and reading an ultrasound while recording the observations. An intelligent system is therefore needed that converts the audio recorded by the physician into text and automatically fills in the dedicated fields of the patient record. Starting from audio recordings, the system transcribes them to text and automatically completes the part of the patient record highlighted in yellow; the extracted information is saved to a table/JSON and then exported to a Word document.

---

### Formal Description (Mathematical / Technical)

Input data:
$X = \{\text{audio file}_i\}$
- Explanation: $X$ is the set of audio files recorded by the physician, each corresponding to an observation or a consultation.

Goal of the application:
$F: X \to Y$
- Explanation: $F$ is the function that transforms audio files into completed patient records.

Where:
$Y = \{\text{completed patient records}\}$
- Explanation: $Y$ is the set of records filled in with structured information extracted from the audio.

---

### Problem Decomposition

The problem can be decomposed into two sub-problems:
##### 1. Speech-to-Text
$f_1 : \text{audio} \to \text{text}$
- Explanation: $f_1$ uses an ASR model to turn audio files into text. The plan names Romanian Wav2Vec2; the code in this notebook loads a Romanian Whisper checkpoint (`TransferRapid/whisper-large-v3-turbo_ro`).
##### 2. Information Extraction
$f_2 : \text{text} \to \text{structured data}$
- Explanation: $f_2$ extracts the relevant fields from the text so the patient record can be filled in automatically.
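As a rough illustration of what $f_2$ could look like, the sketch below pairs a known anatomical structure with the spoken number that follows it. The vocabulary (`STRUCTURES`), the number-word table (`RO_NUMBERS`), and the default unit are assumptions made for this example. The output keys mirror the ones printed by the specialized extractor cell later in the notebook, but this is not that module's implementation.

```python
import re
from typing import Dict, List, Union

# Assumed vocabulary for illustration only; a real system would need a far larger list.
STRUCTURES = ["aorta la inel", "aorta la sinusuri", "aortă ascendentă"]

# Romanian number words 0-20 (spoken measurements arrive as words, not digits).
RO_NUMBERS = {
    "zero": 0, "unu": 1, "doi": 2, "trei": 3, "patru": 4, "cinci": 5,
    "șase": 6, "șapte": 7, "opt": 8, "nouă": 9, "zece": 10,
    "unsprezece": 11, "doisprezece": 12, "treisprezece": 13,
    "paisprezece": 14, "cincisprezece": 15, "șaisprezece": 16,
    "șaptesprezece": 17, "optsprezece": 18, "nouăsprezece": 19,
    "douăzeci": 20,
}


def extract_measurements(text: str, unit: str = "mm") -> List[Dict[str, Union[str, float]]]:
    """Return (structure, value) pairs where a known structure is directly followed by a number word."""
    # Split the transcript on commas and periods, then normalise each segment.
    segments = [s.strip().lower() for s in re.split(r"[,.]", text) if s.strip()]
    results = []
    for name, value in zip(segments, segments[1:]):
        if name in STRUCTURES and value in RO_NUMBERS:
            results.append({
                "structura_anatomica": name,
                "valoare_numerica": float(RO_NUMBERS[value]),
                "unitate_masura": unit,  # assumed default; the transcript does not state a unit
            })
    return results


sample = "Aorta la inel, opt, aorta la sinusuri, doisprezece, aortă ascendentă, zece."
for measurement in extract_measurements(sample):
    print(measurement)
```

Segments that do not match a known structure (for example misrecognized words) are simply skipped, which is a deliberate choice here: in a medical record a missing value is easier to spot than a wrong one.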

### Final Function

The final function is:
$F = f_2 \circ f_1$
- Explanation: the audio is first transcribed to text, then the structured data is extracted to fill in the patient record.
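The composition can be written directly in Python. The sketch below uses two hypothetical stand-ins (`fake_asr`, `fake_extract`) in place of the real speech-to-text and extraction steps, purely to show that the pipeline is ordinary function composition.

```python
from typing import Callable, Dict, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f2: Callable[[B], C], f1: Callable[[A], B]) -> Callable[[A], C]:
    """Return the function x -> f2(f1(x)), i.e. f2 composed with f1."""
    def composed(x: A) -> C:
        return f2(f1(x))
    return composed


def fake_asr(audio_path: str) -> str:
    """Hypothetical stand-in for f1: pretends every file contains the same phrase."""
    return "aorta la inel, opt"


def fake_extract(text: str) -> Dict[str, str]:
    """Hypothetical stand-in for f2: splits 'structure, value' into two fields."""
    name, value = [part.strip() for part in text.split(",")]
    return {"structure": name, "value_word": value}


# F = f2 o f1
fill_record = compose(fake_extract, fake_asr)
print(fill_record("consultation.ogg"))
```

Keeping the two stages as separate functions means either one can be swapped (a different ASR model, a different extractor) without touching the other.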



In [19]:
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse
import shutil
import os

In [20]:
from transformers import WhisperProcessor, WhisperForConditionalGeneration, pipeline
import soundfile as sf
import torchaudio
import torch

model_name = "TransferRapid/whisper-large-v3-turbo_ro"

# Load processor and model
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def preprocess_audio(audio_path, processor):
    """Preprocess audio: load via soundfile, resample if needed, and convert to model input format."""
    # Load audio with an explicit float32 dtype
    waveform_np, sample_rate = sf.read(audio_path, dtype='float32')
    waveform = torch.from_numpy(waveform_np)

    # Convert stereo to mono if needed; soundfile returns (frames, channels),
    # so the channel axis is dim=1 (averaging over dim=0 would collapse time instead)
    if waveform.dim() > 1:
        waveform = waveform.mean(dim=1)

    # Resample to 16kHz if needed
    if sample_rate != 16000:
        resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
        waveform = resampler(waveform)

    # Process audio into model input format (the processor expects a float32 numpy array)
    inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    # Move inputs to device
    inputs = {key: val.to(device) for key, val in inputs.items()}

    return inputs

def transcribe(audio_path):
    """Generate transcription for an audio file."""
    inputs = preprocess_audio(audio_path, processor)

    forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(language="romanian", task="transcribe")

    with torch.no_grad():
        generated_ids = model.generate(inputs["input_features"], forced_decoder_ids=forced_decoder_ids)

    transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

    return transcription[0]
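Whisper models operate on 30-second windows, and the feature extractor pads or truncates its input to that length by default, so a longer dictation passed through `preprocess_audio` would likely be cut off. One simple workaround is to split the waveform into fixed-size chunks and transcribe each one. The helper below is a minimal sketch of the splitting step only; `split_into_chunks` is a hypothetical name, and it works on any sliceable sequence (a list, a NumPy array, or a 1-D tensor).

```python
from typing import List, Sequence


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = 16000,
    chunk_seconds: float = 30.0,
) -> List[Sequence[float]]:
    """Split a mono waveform into consecutive chunks of at most `chunk_seconds`."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("sample_rate * chunk_seconds must be at least one sample")
    # The last chunk may be shorter than chunk_len.
    return [samples[start:start + chunk_len] for start in range(0, len(samples), chunk_len)]


# Toy example: 70 "samples" at 1 Hz gives chunks of 30, 30 and 10 samples.
toy = [0.0] * 70
print([len(chunk) for chunk in split_into_chunks(toy, sample_rate=1, chunk_seconds=30)])
```

Cutting at fixed boundaries can split a word in half; overlapping chunks or splitting on silence would be more robust, at the cost of extra logic to merge the partial transcripts.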

In [21]:
app = FastAPI()
UPLOAD_FOLDER = "uploads"
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

In [22]:
@app.post("/upload-audio/")
async def upload_audio(file: UploadFile = File(...)):
    # Keep only the base name so a crafted filename cannot escape UPLOAD_FOLDER
    safe_name = os.path.basename(file.filename or "") or "upload"
    file_path = os.path.join(UPLOAD_FOLDER, safe_name)
    content = await file.read()
    with open(file_path, "wb") as buffer:
        buffer.write(content)
    file_size = len(content)
    transcription = transcribe(file_path)
    return JSONResponse({
        "filename": safe_name,
        "size_bytes": file_size,
        "transcription": transcription
    })
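Because `upload_audio` is declared `async` but calls the blocking `transcribe` directly, the event loop cannot serve other requests while the model runs. A common pattern is to hand the blocking call to a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below uses a hypothetical stand-in, `blocking_transcribe`, so it runs without the model.

```python
import asyncio
import time
from typing import List


def blocking_transcribe(path: str) -> str:
    """Hypothetical stand-in for transcribe(); sleeps to mimic model inference."""
    time.sleep(0.05)
    return f"transcript of {path}"


async def transcribe_in_worker(path: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(blocking_transcribe, path)


async def demo() -> List[str]:
    # Both calls overlap instead of running one after the other.
    return list(await asyncio.gather(
        transcribe_in_worker("a.ogg"),
        transcribe_in_worker("b.ogg"),
    ))


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Inside the endpoint this would amount to replacing the direct call with `await asyncio.to_thread(transcribe, file_path)`. Whether concurrent inference on one model instance is desirable depends on the available memory and device, so this is a starting point rather than a recommendation.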

In [23]:
import nest_asyncio
import uvicorn
import threading

nest_asyncio.apply()

def run_server():
    uvicorn.run(app, host="127.0.0.1", port=8000)

thread = threading.Thread(target=run_server, daemon=True)
thread.start()

print("FastAPI server is running in the background on http://127.0.0.1:8000")


FastAPI server is running in the background on http://127.0.0.1:8000


INFO:     Started server process [407]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
ERROR:    [Errno 48] error while attempting to bind on address ('127.0.0.1', 8000): address already in use
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
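The `address already in use` error in the log above most likely means the cell was re-run while an earlier background thread (or another process) still held port 8000. A small check before calling `uvicorn.run` can avoid the failed start. The helpers below are a sketch using only the standard library; `port_is_free` and `pick_port` are hypothetical names.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(host: str, preferred: int) -> int:
    """Return `preferred` if it is free, otherwise a port chosen by the operating system."""
    if port_is_free(host, preferred):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as fallback:
        fallback.bind((host, 0))  # port 0 asks the OS for any free port
        return fallback.getsockname()[1]


if __name__ == "__main__":
    print("Port to use:", pick_port("127.0.0.1", 8000))
```

The check is inherently racy (another process can grab the port between the probe and the real bind), so it reduces the chance of the error rather than eliminating it.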


In [24]:
# Manual/local test: pick first file from UPLOAD_FOLDER and transcribe
import os
files = os.listdir(UPLOAD_FOLDER)
if not files:
    print("No files found in uploads/ to transcribe.")
else:
    test_path = os.path.join(UPLOAD_FOLDER, files[0])
    print(f"Testing file: {test_path}")
    try:
        result = transcribe(test_path)
        print("Local transcription:", result)
    except Exception as e:
        print(f"Error transcribing {test_path}: {e}")


Task exception was never retrieved
future: <Task finished name='Task-67' coro=<Server.serve() done, defined at /Users/eduard/Development/MIRPR-Voice-To-Text/.venv1/lib/python3.10/site-packages/uvicorn/server.py:69> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/Users/eduard/Development/MIRPR-Voice-To-Text/.venv1/lib/python3.10/site-packages/uvicorn/server.py", line 164, in startup
    server = await loop.create_server(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 1519, in create_server
    raise OSError(err.errno, 'error while attempting '
OSError: [Errno 48] error while attempting to bind on address ('127.0.0.1', 8000): address already in use

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Users/eduard/Devel

Testing file: uploads/test3.ogg
Local transcription: Aorta la inel, opt, aorta la sinusuri, doisprezece, aortă ascendentă, zece, a se treisprezece, vede, șase, vede, șase, vede, șase, vede, șase, vede, valva aortică, valva aortică, veice, treisprezece, bară șase, bară șase.


In [25]:
# Local-only transcription test (no HTTP endpoints)
local_audio_path = "/Users/eduard/Downloads/test3.ogg"  # adjust to your file
print(f"Loading and transcribing: {local_audio_path}")
try:
    transcript = transcribe(local_audio_path)
    print("Transcript:", transcript)
except Exception as e:
    print(f"Error during transcription: {e}")


Loading and transcribing: /Users/eduard/Downloads/test3.ogg
Transcript: Aorta la inel, opt, aorta la sinusuri, doisprezece, aortă ascendentă, zece, a se treisprezece, vede, șase, vede, șase, vede, șase, vede, șase, vede, valva aortică, valva aortică, veice, treisprezece, bară șase, bară șase.


In [26]:
# === MEDICAL ENTITY EXTRACTION WITH A GENERIC NER MODEL (FOR COMPARISON) ===
import json
CALEA_MODEL_NER = "dumitrescustefan/bert-base-romanian-ner"

try:
    print("=" * 80)
    print("METODA 1: Model NER Generic (pentru referință)")
    print("=" * 80)
    print(f"\n⚠️  AVERTISMENT: Modelul {CALEA_MODEL_NER} NU este antrenat pe date medicale!")
    print("    Rezultatele vor fi sub-optime pentru termeni medicali.\n")

    # Load the generic NER pipeline
    print(f"Se încarcă modelul NER: {CALEA_MODEL_NER}...")
    ner_pipeline = pipeline(
        "ner",
        model=CALEA_MODEL_NER,
        tokenizer=CALEA_MODEL_NER,
        aggregation_strategy="simple"
    )
    print("✅ Model încărcat cu succes.")

    # Run the transcript through the pipeline to extract entities
    entitati_extrase = ner_pipeline(transcript)

    print("\n--- Entități Extrase de NER Generic ---")
    for entitate in entitati_extrase:
        print(f"- {entitate['entity_group']}: {entitate['word']} (scor: {entitate['score']:.2f})")

    # Convert to structured JSON: group the extracted words by entity type
    fisa_pacient_json_generic = {}
    for entitate in entitati_extrase:
        tip_entitate = entitate['entity_group']
        valoare = entitate['word']
        if tip_entitate not in fisa_pacient_json_generic:
            fisa_pacient_json_generic[tip_entitate] = []
        fisa_pacient_json_generic[tip_entitate].append(valoare)

    print("\n--- JSON Structurat Generat (Model Generic) ---")
    json_output_string = json.dumps(fisa_pacient_json_generic, indent=4, ensure_ascii=False)
    print(json_output_string)

    # Save to file
    with open("fisa_pacient_output_generalist.json", "w", encoding="utf-8") as f:
        f.write(json_output_string)
    print("\n✅ Fișierul 'fisa_pacient_output_generalist.json' a fost salvat.")

except Exception as e:
    print(f"❌ A apărut o eroare la încărcarea modelului sau la procesare: {e}")


METODA 1: Model NER Generic (pentru referință)

⚠️  AVERTISMENT: Modelul dumitrescustefan/bert-base-romanian-ner NU este antrenat pe date medicale!
    Rezultatele vor fi sub-optime pentru termeni medicali.

Se încarcă modelul NER: dumitrescustefan/bert-base-romanian-ner...


Device set to use mps:0
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


✅ Model încărcat cu succes.

--- Entități Extrase de NER Generic ---
- LABEL_0: Aorta la inel, (scor: 0.99)
- LABEL_25: opt (scor: 1.00)
- LABEL_0: , aorta la sinusuri, (scor: 1.00)
- LABEL_25: doisprezece (scor: 1.00)
- LABEL_0: , aortă ascendentă, (scor: 1.00)
- LABEL_25: zece (scor: 1.00)
- LABEL_0: , a se (scor: 1.00)
- LABEL_25: treisprezece (scor: 1.00)
- LABEL_0: , vede, (scor: 1.00)
- LABEL_25: șase (scor: 1.00)
- LABEL_0: , vede, (scor: 1.00)
- LABEL_25: șase (scor: 1.00)
- LABEL_0: , vede, (scor: 1.00)
- LABEL_25: șase (scor: 1.00)
- LABEL_0: , vede, (scor: 1.00)
- LABEL_25: șase (scor: 1.00)
- LABEL_0: , vede, valva aortică, valva aortică, veice, (scor: 1.00)
- LABEL_25: treisprezece (scor: 1.00)
- LABEL_0: , bară (scor: 0.99)
- LABEL_25: șase (scor: 1.00)
- LABEL_0: , bară (scor: 0.99)
- LABEL_25: șase (scor: 0.99)
- LABEL_0: . (scor: 1.00)

--- JSON Structurat Generat (Model Generic) ---
{
    "LABEL_0": [
        "Aorta la inel,",
        ", aorta la sin
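In the output above the generic model only separates two groups: in this run `LABEL_25` lands on the number words and `LABEL_0` on everything else. Grouping by label therefore loses the link between a structure and its value. The sketch below shows one way to recover that link by pairing each text span with the number span that follows it. The interpretation of the two labels is an assumption based solely on this output, and `pair_labels_with_values` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def pair_labels_with_values(
    entities: List[Dict[str, str]],
    text_label: str = "LABEL_0",     # assumed: non-numeric spans in this run
    number_label: str = "LABEL_25",  # assumed: number-word spans in this run
) -> List[Tuple[str, str]]:
    """Pair each text span with the number span that immediately follows it."""
    pairs: List[Tuple[str, str]] = []
    pending: Optional[str] = None
    for entity in entities:
        word = entity["word"].strip(" ,.")  # drop the punctuation the model attaches to spans
        if entity["entity_group"] == text_label:
            pending = word or None
        elif entity["entity_group"] == number_label and pending:
            pairs.append((pending, word))
            pending = None
    return pairs


# Entities in the shape returned by the pipeline (scores omitted for brevity).
sample_entities = [
    {"entity_group": "LABEL_0", "word": "Aorta la inel,"},
    {"entity_group": "LABEL_25", "word": "opt"},
    {"entity_group": "LABEL_0", "word": ", aorta la sinusuri,"},
    {"entity_group": "LABEL_25", "word": "doisprezece"},
    {"entity_group": "LABEL_0", "word": "."},
]
print(pair_labels_with_values(sample_entities))
```

This still pairs noise such as `vede` with a number, so a vocabulary filter on the structure names would be needed before the pairs could be trusted.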

In [27]:
# === MEDICAL ENTITY EXTRACTION WITH THE SPECIALIZED EXTRACTOR (RECOMMENDED) ===
from medical_entity_extractor import MedicalEntityExtractor, process_medical_transcription

print("\n" + "=" * 80)
print("METODA 2: Extractor Medical Specializat (RECOMANDAT)")
print("=" * 80)
print("✅ Folosește pattern matching și reguli specifice domeniului medical\n")

# Initialize the extractor
extractor = MedicalEntityExtractor()

# Extract all medical entities
fisa_pacient = extractor.extract_all_entities(transcript)

# Print the results
print("\n📋 MĂSURĂTORI ECOGRAFICE EXTRASE:")
print("-" * 80)
if fisa_pacient.masuratori_ecografice:
    for i, masurare in enumerate(fisa_pacient.masuratori_ecografice, 1):
        print(f"{i}. {masurare['structura_anatomica']}: {masurare['valoare_numerica']} {masurare['unitate_masura']}")
else:
    print("   Nicio măsurătoare detectată")

print("\n💊 MEDICAMENTE EXTRASE:")
print("-" * 80)
if fisa_pacient.medicamente:
    for med in fisa_pacient.medicamente:
        print(f"   • {med['nume']} - {med['dozaj']} ({med['frecventa']})")
else:
    print("   Niciun medicament detectat")

print("\n🩺 SIMPTOME EXTRASE:")
print("-" * 80)
if fisa_pacient.simptome:
    for simptom in fisa_pacient.simptome:
        print(f"   • {simptom}")
else:
    print("   Niciun simptom detectat")

print("\n🔍 DIAGNOSTICE EXTRASE:")
print("-" * 80)
if fisa_pacient.diagnostice:
    for diagnostic in fisa_pacient.diagnostice:
        print(f"   • {diagnostic}")
else:
    print("   Niciun diagnostic detectat")

# Save in standard JSON format
print("\n" + "=" * 80)
print("SALVARE DATE STRUCTURATE")
print("=" * 80)

extractor.save_to_json(fisa_pacient, "fisa_pacient_medical_structured.json")

# Generate and save the FHIR format
fhir_observations = extractor.to_fhir_observation(fisa_pacient.masuratori_ecografice)
with open("fhir_observations.json", "w", encoding="utf-8") as f:
    json.dump(fhir_observations, f, indent=4, ensure_ascii=False)
print("✅ Observații FHIR salvate în: fhir_observations.json")

# Print the full JSON
print("\n" + "=" * 80)
print("JSON COMPLET (cu format FHIR)")
print("=" * 80)
print(extractor.to_json(fisa_pacient))

print("\n" + "=" * 80)
print("✅ PROCESARE COMPLETĂ!")
print("=" * 80)
print("\n📊 FIȘIERE GENERATE:")
print("   1. fisa_pacient_medical_structured.json  - Date structurate complete")
print("   2. fhir_observations.json                - Format FHIR pentru integrare")
print("   3. fisa_pacient_output_generalist.json   - Rezultat model NER generic (pentru comparație)")
print("\n💡 TIP: Folosește 'fisa_pacient_medical_structured.json' pentru generarea raportului Word!")



METODA 2: Extractor Medical Specializat (RECOMANDAT)
✅ Folosește pattern matching și reguli specifice domeniului medical


📋 MĂSURĂTORI ECOGRAFICE EXTRASE:
--------------------------------------------------------------------------------
1. aorta la inel: 8.0 mm
2. aorta la sinusuri: 12.0 mm
3. aortă ascendentă: 10.0 mm

💊 MEDICAMENTE EXTRASE:
--------------------------------------------------------------------------------
   Niciun medicament detectat

🩺 SIMPTOME EXTRASE:
--------------------------------------------------------------------------------
   Niciun simptom detectat

🔍 DIAGNOSTICE EXTRASE:
--------------------------------------------------------------------------------
   Niciun diagnostic detectat

SALVARE DATE STRUCTURATE
✅ Fișa pacientului salvată în: fisa_pacient_medical_structured.json
✅ Observații FHIR salvate în: fhir_observations.json

JSON COMPLET (cu format FHIR)
{
    "masuratori_ecografice": [
        {
            "structura_anatom
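The source of `to_fhir_observation` is not shown in this notebook, so the following is only a sketch of what a minimal FHIR `Observation` for one measurement could look like. It reuses the measurement keys printed above; `measurement_to_fhir` is a hypothetical helper, and the resource is deliberately incomplete (no coded `code`, no `subject`), which a real integration would need.

```python
import json
from typing import Any, Dict


def measurement_to_fhir(measurement: Dict[str, Any]) -> Dict[str, Any]:
    """Build a minimal FHIR Observation dict from one extracted measurement."""
    unit = measurement["unitate_masura"]
    return {
        "resourceType": "Observation",
        "status": "final",
        # Free-text code only; a terminology code (e.g. LOINC) is assumed to be added later.
        "code": {"text": measurement["structura_anatomica"]},
        "valueQuantity": {
            "value": measurement["valoare_numerica"],
            "unit": unit,
            "system": "http://unitsofmeasure.org",  # UCUM
            "code": unit,
        },
    }


example = {
    "structura_anatomica": "aorta la inel",
    "valoare_numerica": 8.0,
    "unitate_masura": "mm",
}
print(json.dumps(measurement_to_fhir(example), indent=4, ensure_ascii=False))
```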

In [28]:
# === WORD REPORT GENERATION (PER SECTION 5.2 OF THE GUIDE) ===
from word_report_generator import MedicalReportGenerator, generate_word_report
from datetime import datetime

print("\n" + "=" * 80)
print("GENERARE RAPORT WORD AUTOMAT")
print("=" * 80)

# Method 1: simple report (no template)
print("\n📄 Generare raport simplu...")
try:
    raport_path = generate_word_report(
        json_path="fisa_pacient_medical_structured.json",
        output_path=f"raport_medical_{datetime.now().strftime('%Y%m%d_%H%M%S')}.docx",
        use_template=False
    )
    print(f"✅ Raport generat: {raport_path}")
except Exception as e:
    print(f"❌ Eroare la generarea raportului: {e}")

# Method 2: report from a custom template
print("\n📝 Generare șablon Word personalizabil...")
generator = MedicalReportGenerator()
try:
    generator.create_template("template_fisa_pacient.docx")

    # Generate a report using the template
    print("\n📄 Generare raport din șablon...")
    raport_template_path = generate_word_report(
        json_path="fisa_pacient_medical_structured.json",
        output_path=f"raport_medical_template_{datetime.now().strftime('%Y%m%d_%H%M%S')}.docx",
        use_template=True,
        template_path="template_fisa_pacient.docx"
    )
    print(f"✅ Raport din șablon generat: {raport_template_path}")
except Exception as e:
    print(f"❌ Eroare la generarea raportului cu șablon: {e}")

print("\n" + "=" * 80)
print("🎉 PIPELINE COMPLET FINALIZAT!")
print("=" * 80)
print("\n📋 REZULTATE FINALE:")
print("   ✅ Audio → Text (ASR cu Whisper)")
print("   ✅ Text → Entități Structurate (Pattern Matching Medical)")
print("   ✅ Entități → JSON (cu format FHIR)")
print("   ✅ JSON → Raport Word (Formatat și Lizibil)")
print("\n💡 URMĂTORII PAȘI:")
print("   1. Editează template_fisa_pacient.docx în Word pentru personalizare")
print("   2. Fine-tuning model NER pentru domeniul medical (opțional, pentru acuratețe maximă)")
print("   3. Integrează cu sistemul medical existent (FHIR, HL7)")



GENERARE RAPORT WORD AUTOMAT

📄 Generare raport simplu...
✅ Raport Word generat cu succes: raport_medical_20251024_195754.docx
✅ Raport generat: raport_medical_20251024_195754.docx

📝 Generare șablon Word personalizabil...
✅ Șablon creat: template_fisa_pacient.docx
💡 Poți edita acest șablon în Word și apoi îl poți folosi cu create_report_with_template()

📄 Generare raport din șablon...
✅ Raport Word generat din șablon: raport_medical_template_20251024_195754.docx
✅ Raport din șablon generat: raport_medical_template_20251024_195754.docx

🎉 PIPELINE COMPLET FINALIZAT!

📋 REZULTATE FINALE:
   ✅ Audio → Text (ASR cu Whisper)
   ✅ Text → Entități Structurate (Pattern Matching Medical)
   ✅ Entități → JSON (cu format FHIR)
   ✅ JSON → Raport Word (Formatat și Lizibil)

💡 URMĂTORII PAȘI:
   1. Editează template_fisa_pacient.docx în Word pentru personalizare
   2. Fine-tuning model NER pentru domeniul medical (opțional, pentr
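The `word_report_generator` module is not included in this notebook. As a rough idea of the JSON-to-Word step, the sketch below builds table rows from the structured JSON and writes them with `python-docx`. The function names are hypothetical, the JSON is assumed to have a top-level `masuratori_ecografice` list as shown in the earlier output, and `python-docx` is imported lazily so the row-building part can be used without it.

```python
import json
from typing import Any, Dict, List


def measurement_rows(record: Dict[str, Any]) -> List[List[str]]:
    """Turn the structured record into [structure, value, unit] rows of strings."""
    rows = []
    for m in record.get("masuratori_ecografice", []):
        rows.append([m["structura_anatomica"], str(m["valoare_numerica"]), m["unitate_masura"]])
    return rows


def write_report(json_path: str, output_path: str) -> str:
    """Write a simple Word report with one table of measurements (requires python-docx)."""
    from docx import Document  # third-party: pip install python-docx

    with open(json_path, encoding="utf-8") as f:
        record = json.load(f)

    doc = Document()
    doc.add_heading("Patient record", level=1)

    # Header row followed by one row per measurement.
    table = doc.add_table(rows=1, cols=3)
    for cell, title in zip(table.rows[0].cells, ["Structure", "Value", "Unit"]):
        cell.text = title
    for row in measurement_rows(record):
        for cell, value in zip(table.add_row().cells, row):
            cell.text = value

    doc.save(output_path)
    return output_path


demo_record = {
    "masuratori_ecografice": [
        {"structura_anatomica": "aorta la inel", "valoare_numerica": 8.0, "unitate_masura": "mm"},
    ]
}
print(measurement_rows(demo_record))
```

Separating row construction from document writing keeps the formatting logic testable without opening Word files.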