# Demo for MedAgent - Guideline workflow
This notebook is a manual testing playground for basic workflows that will later be implemented properly in the MedAgent repository.

The first workflow sets up the guidelines:
- Find the required guidelines
- Download them and extract some metadata from the website
- Analyze their characteristics

In [2]:
## SETUP
import os
import sys

sys.path.append(os.path.abspath("../src"))
from general.helper.mongodb_interactor import MongoDBInterface, CollectionName
from scripts.Guideline.write_guideline_list import run_guideline_finding
from scripts.Guideline.guideline_download import run_guideline_downloader
from scripts.Guideline.guideline_analysis import *


mongo_url = os.getenv("MONGO_URL", "mongodb://mongo:mongo@localhost:27017")

search_url="https://register.awmf.org/de/suche#doctype=longVersion&association=007&sorting=relevance"
awmf_guideline_list_file = "output/guideline/guideline_list.txt"
pdf_output_dir, text_output_dir = "output/guideline/pdf", "output/guideline/plain_text"

statistics_doc = "output/guideline/evaluation/statistics_document.txt"
page_count_doc = "output/guideline/evaluation/distr_page_count.png"
dates_scatter_doc, dates_valid_doc = "output/guideline/evaluation/distr_pub_date.png", "output/guideline/evaluation/dist_validity_pub_date.png"

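# Directory paths are created as-is; for file paths, only the parent directory is created
for directory in [pdf_output_dir, text_output_dir]:
    os.makedirs(directory, exist_ok=True)
for file_path in [awmf_guideline_list_file, statistics_doc, page_count_doc, dates_scatter_doc, dates_valid_doc]:
    os.makedirs(os.path.dirname(file_path), exist_ok=True)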

# Scale for screen display and saving options for all images
screen_width, screen_height = 650, 400
width, height = 750, 500

dbi = MongoDBInterface(mongo_url)
dbi.register_collections(
    CollectionName.GUIDELINES
)

ModuleNotFoundError: No module named 'Code'

## Extract from AWMF website
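
The search URL defined in the setup cell carries its filters in the URL fragment (the part after `#`), not in the query string. As a minimal sketch using only the standard library, the filters can be inspected as follows. How `run_guideline_finding` itself interprets the URL is not shown in this notebook, and the meaning of `association=007` is not decoded here.

```python
from urllib.parse import parse_qs, urlparse

search_url = "https://register.awmf.org/de/suche#doctype=longVersion&association=007&sorting=relevance"

# The filters sit in the fragment, so the regular query string stays empty.
parsed = urlparse(search_url)
filters = {key: values[0] for key, values in parse_qs(parsed.fragment).items()}

print(filters)
# {'doctype': 'longVersion', 'association': '007', 'sorting': 'relevance'}
```

Changing the search scope would then only require editing the fragment of `search_url`.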

In [2]:
run_guideline_finding(search_url=search_url, output_file=awmf_guideline_list_file)

2025-04-07 18:31:09 [SUCCESS] Chromedriver initialized
2025-04-07 18:31:09 [INFO] Getting guideline page links from search
2025-04-07 18:31:11 [NOTE] Scrolling...
2025-04-07 18:31:14 [NOTE] Scrolling...
2025-04-07 18:31:16 [NOTE] Scrolling...
2025-04-07 18:31:18 [NOTE] Scrolling...
2025-04-07 18:31:20 [NOTE] Scrolling...
2025-04-07 18:31:22 [NOTE] Scrolling...
2025-04-07 18:31:24 [NOTE] Scrolling...
2025-04-07 18:31:26 [NOTE] Scrolling...
2025-04-07 18:31:2

[GuidelineMetadata(awmf_register_number='', awmf_class='', title='', leading_publishing_organizations=['Deutsche Gesellschaft für Hygiene und Mikrobiologie e.V. (DGHM)'], other_contributing_organizations=['Deutsche Dermatologische Gesellschaft e.V. (DDG)', 'Deutsche Gesellschaft für Allgemein- und Viszeralchirurgie e.V. (DGAV)', 'Deutsche Gesellschaft für Anästhesiologie und Intensivmedizin e.V. (DGAI)', 'Deutsche Gesellschaft für Chirurgie e.V. (DGCH)', 'Deutsche Gesellschaft für Gastroenterologie, Verdauungs- und Stoffwechselkrankheiten e.V. (DGVS)', 'Deutsche Gesellschaft für Gefäßchirurgie und Gefäßmedizin - Gesellschaft für operative, endovaskuläre und präventive Gefäßmedizin e.V. (DGG)', 'Deutsche Gesellschaft für Gynäkologie und Geburtshilfe e.V. (DGGG)', 'Deutsche Gesellschaft für Handchirurgie e.V. (DGH)', 'Deutsche Gesellschaft für Hals-Nasen-Ohren-Heilkunde, Kopf- und Hals-Chirurgie e.V. (DGHNO-KHC)', 'Deutsche Gesellschaft für Infektiologie e.V. (DGI)', 'Deutsche Gesellscha
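
The truncated output above shows that each entry carries at least the fields `awmf_register_number`, `awmf_class`, `title`, `leading_publishing_organizations`, and `other_contributing_organizations`, and that the first three are still empty at this stage. The following sketch uses a stand-in dataclass, `GuidelineMetadataSketch`, limited to those visible fields; the real `GuidelineMetadata` class may define more fields and may not be a dataclass at all. The hypothetical helper `empty_fields` lists which fields of an entry have not been filled yet.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class GuidelineMetadataSketch:
    """Stand-in holding only the fields visible in the truncated output."""
    awmf_register_number: str = ""
    awmf_class: str = ""
    title: str = ""
    leading_publishing_organizations: List[str] = field(default_factory=list)
    other_contributing_organizations: List[str] = field(default_factory=list)


def empty_fields(entry):
    """Return the names of all fields that hold an empty string or empty list."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Values taken from the beginning of the output shown above.
example = GuidelineMetadataSketch(
    leading_publishing_organizations=[
        "Deutsche Gesellschaft für Hygiene und Mikrobiologie e.V. (DGHM)"
    ],
    other_contributing_organizations=[
        "Deutsche Dermatologische Gesellschaft e.V. (DDG)"
    ],
)

print(empty_fields(example))
# ['awmf_register_number', 'awmf_class', 'title']
```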

## Download guidelines
Based on the list file written in the previous step, the guidelines can now be downloaded, their metadata extracted, and the results inserted into MongoDB.

In [3]:
run_guideline_downloader(
    file=awmf_guideline_list_file,
    pdf_output_folder=pdf_output_dir, text_output_folder=text_output_dir,
    guideline_collection=dbi.get_collection(CollectionName.GUIDELINES)
)

2025-04-07 18:35:36 [INFO] Downloading pdfs listed in output/guideline/guideline_list.txt ...
2025-04-07 18:35:36 [INFO] Downloading pdfs listed in output/guideline/guideline_list.txt ...


TypeError: expected str, bytes or os.PathLike object, not TextIOWrapper
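
A `TypeError` of this kind is what Python raises when an already opened file object is passed where a path is expected, for example to `open()`. Whether that is the actual cause inside `run_guideline_downloader` is an assumption, since its source is not shown in this notebook. The sketch below reproduces the error with a temporary file and shows a hypothetical helper, `read_lines`, that accepts either a path or an open handle.

```python
import os
import tempfile


def read_lines(path_or_handle):
    """Hypothetical helper: read stripped lines from a path or an open text file."""
    if hasattr(path_or_handle, "read"):
        # Already an open file object: iterate over it directly.
        return [line.strip() for line in path_or_handle]
    with open(path_or_handle, encoding="utf-8") as handle:
        return [line.strip() for line in handle]


with tempfile.TemporaryDirectory() as tmp_dir:
    list_file = os.path.join(tmp_dir, "guideline_list.txt")
    with open(list_file, "w", encoding="utf-8") as handle:
        handle.write("first entry\nsecond entry\n")

    with open(list_file, encoding="utf-8") as handle:
        try:
            open(handle)  # passing the handle instead of the path fails
        except TypeError as error:
            message = str(error)
        lines_from_handle = read_lines(handle)

    lines_from_path = read_lines(list_file)

print(message)
print(lines_from_handle == lines_from_path)
```

The exact wording of the message differs slightly between Python versions, but it names `TextIOWrapper` as the offending type in each case.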

## Analyze and visualize guidelines
This section looks at several statistics for evaluation, starting with a general overview:

In [2]:
number_of_guidelines = get_number_of_documents(dbi.get_collection(CollectionName.GUIDELINES))

number_of_outdated = get_number_of_outdated_documents(dbi.get_collection(CollectionName.GUIDELINES))
outdated_percentage = (number_of_outdated / number_of_guidelines) * 100

oms_spec_guidelines = get_number_of_oms_specific_guidelines(dbi.get_collection(CollectionName.GUIDELINES))

total_pages = get_total_page_count(dbi.get_collection(CollectionName.GUIDELINES))
average_pages = total_pages / number_of_guidelines

avg_update_diff = get_average_publication_interval_in_days(dbi.get_collection(CollectionName.GUIDELINES))

print(f"Total number of guidelines: {number_of_guidelines}")
print(f"Number of outdated guidelines: {number_of_outdated} (-> {outdated_percentage:.2f}%)")
print(f"Number of OMS-specific guidelines: {oms_spec_guidelines:.0f}")
print(f"Total page count: {total_pages:.0f} (-> on average {average_pages:.0f} pages per guideline)")
print(f"Avg time between updates: {avg_update_diff:.2f} days ({avg_update_diff/7:.0f} weeks)")

Total number of guidelines: 93
Number of outdated guidelines: 26 (-> 27.96%)
Number of OMS-specific guidelines: 22
Total page count: 11702 (-> on average 126 pages per guideline)
Avg time between updates: 37.72 days (5 weeks)
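
The derived figures in this output follow directly from the raw counts. As a quick sanity check, the sketch below recomputes them from the printed values. The hypothetical helper `safe_ratio` additionally guards against an empty collection, where a plain division by the number of guidelines would likely fail with a `ZeroDivisionError`.

```python
def safe_ratio(numerator, denominator):
    """Hypothetical helper: numerator / denominator, or 0.0 if the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Values copied from the printed output above.
number_of_guidelines = 93
number_of_outdated = 26
total_pages = 11702
avg_update_diff = 37.72

outdated_percentage = safe_ratio(number_of_outdated, number_of_guidelines) * 100
average_pages = safe_ratio(total_pages, number_of_guidelines)

print(f"{outdated_percentage:.2f}%")       # 27.96%
print(f"{average_pages:.0f} pages")        # 126 pages
print(f"{avg_update_diff / 7:.0f} weeks")  # 5 weeks
```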


Next, these statistics can be visualized, starting with the page count distribution.

In [3]:
img__page_count = visualize_page_count(collection=dbi.get_collection(CollectionName.GUIDELINES), bin_size=10)

img__page_count.update_layout(width=screen_width, height=screen_height)
img__page_count.show()
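
With `bin_size=10`, page counts are presumably grouped into bins that are 10 pages wide. How `visualize_page_count` bins internally is not shown in this notebook; the sketch below assumes fixed-width bins starting at 0 and uses made-up page counts. The helper name `bin_page_counts` is hypothetical.

```python
from collections import Counter


def bin_page_counts(page_counts, bin_size=10):
    """Count documents per fixed-width bin; keys are the lower bin edges."""
    bins = Counter((pages // bin_size) * bin_size for pages in page_counts)
    return dict(sorted(bins.items()))


# Made-up page counts, purely for illustration.
sample_page_counts = [8, 12, 17, 25, 126, 129]
bin_size = 10

binned = bin_page_counts(sample_page_counts, bin_size=bin_size)
for lower_edge, count in binned.items():
    print(f"{lower_edge}-{lower_edge + bin_size - 1} pages: {count}")
```

A smaller `bin_size` gives a finer histogram at the cost of more sparsely filled bins.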

Finally, we can visualize the publication intervals together with the validity of the respective guidelines.

In [4]:
img__dates_scattered, img__dates_valid = visualize_publication_dates(collection=dbi.get_collection(CollectionName.GUIDELINES))

img__dates_scattered.update_layout(width=screen_width, height=screen_height)
img__dates_valid.update_layout(width=screen_width, height=screen_height)

img__dates_scattered.show()
img__dates_valid.show()

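
The two plots relate publication dates to validity. As a rough sketch of the underlying quantities, assume that each guideline has a publication date and a valid-until date. The field names, the sample dates, and the rule that a guideline counts as outdated once its valid-until date lies before a reference date are all assumptions made for illustration; the analysis functions used in this notebook may define these differently.

```python
from datetime import date


def is_outdated(valid_until, reference_date):
    """Assumed rule: a guideline is outdated once its valid-until date has passed."""
    return valid_until < reference_date


def average_interval_in_days(publication_dates):
    """Mean gap in days between consecutive publication dates (0.0 if fewer than two)."""
    ordered = sorted(publication_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical records, purely for illustration.
guidelines = [
    {"publication_date": date(2020, 1, 1), "valid_until": date(2024, 12, 31)},
    {"publication_date": date(2020, 1, 31), "valid_until": date(2025, 12, 31)},
    {"publication_date": date(2020, 3, 1), "valid_until": date(2026, 6, 30)},
]
reference_date = date(2025, 4, 7)

outdated = [g for g in guidelines if is_outdated(g["valid_until"], reference_date)]
interval = average_interval_in_days([g["publication_date"] for g in guidelines])

print(len(outdated), interval)  # 1 30.0
```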

Alternatively, the images and numbers can be saved to disk.

In [None]:
for path in [statistics_doc, page_count_doc, dates_scatter_doc, dates_valid_doc]:
    os.makedirs(os.path.dirname(path), exist_ok=True)

with open(statistics_doc, "w", encoding="utf-8") as statistics_file:
    statistics_file.write(
        f"Total number of guidelines: {number_of_guidelines}\n"
        f"Number of outdated guidelines: {number_of_outdated} (-> {outdated_percentage:.2f}%)\n"
        f"Number of OMS-specific guidelines: {oms_spec_guidelines:.0f}\n"
        f"Total page count: {total_pages:.0f} (-> on average {average_pages:.0f} pages per guideline)\n"
        f"Avg time between updates: {avg_update_diff:.2f} days ({avg_update_diff/7:.0f} weeks)\n"
    )

img__page_count.write_image(page_count_doc, width=width, height=height)
img__dates_scattered.write_image(dates_scatter_doc, width=width, height=height)
img__dates_valid.write_image(dates_valid_doc, width=width, height=height)

## Shutdown

In [5]:
dbi.close()
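
If the workflow is later moved from notebook cells into a script, the connection can be closed reliably even when a step fails. A minimal sketch using `contextlib.closing`, which only assumes that the object offers a `close()` method. `FakeInterface` is a hypothetical stand-in for `MongoDBInterface`, so the example runs without a database.

```python
from contextlib import closing


class FakeInterface:
    """Hypothetical stand-in for MongoDBInterface; only close() is modelled."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


dbi = FakeInterface()
try:
    with closing(dbi):
        raise RuntimeError("simulated failure in a workflow step")
except RuntimeError:
    pass

print(dbi.closed)  # True: close() ran despite the exception
```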