## 📚 Prerequisites

Ensure that your Azure Services are properly set up, your Conda environment is created, and your environment variables are configured as per the instructions in the [README.md](README.md) file.

## 📋 Table of Contents

This notebook assists in creating an Azure AI Search Index, covering the following sections:

1. [**Define Field Types**](#define-field-types): Outlines the process of defining the structure and behavior of an index using various field types.

2. [**Configuring Vector Search**](#configuring-vector-search): Discusses the setup of algorithms and profiles for handling vector-based queries.

3. [**Configuring Semantic Search**](#configuring-semantic-search): Explores how to enhance search capabilities by leveraging advanced AI models.

4. [**Create or Update Index**](#create-or-update-index): Details the steps to create a new index or update an existing one.

For additional information, refer to the following resources:
- [Azure AI Search Documentation](https://learn.microsoft.com/en-us/azure/search/)

In [20]:
import os

# Path to your local clone of the repository (replace with your own)
target_directory = r"C:\Users\pablosal\Desktop\gbb-ai-hls-factory-prior-auth"

# Check that the path exists and is a directory (os.chdir fails on a file path)
if os.path.isdir(target_directory):
    # Change the current working directory
    os.chdir(target_directory)
    print(f"Directory changed to {os.getcwd()}")
else:
    print(f"Directory {target_directory} does not exist.")

Directory changed to C:\Users\pablosal\Desktop\gbb-ai-hls-factory-prior-auth
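
The cell above depends on a hard-coded, machine-specific path. As an alternative, here is a minimal sketch that finds the project root by walking up from the current working directory. It assumes the repository root is the nearest ancestor containing a `README.md`. The helper name `find_project_root` and its `marker` parameter are hypothetical, not part of the repository.

```python
from pathlib import Path
from typing import Optional

def find_project_root(start: Path, marker: str = "README.md") -> Optional[Path]:
    """Return `start` or its closest ancestor that contains `marker`, else None."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return None

# Assumption: the notebook is launched from somewhere inside the repository.
project_root = find_project_root(Path.cwd())
print(f"Detected project root: {project_root}")
```

If a root is found, passing it to `os.chdir` has the same effect as the hard-coded path.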


In [21]:
import json
from pathlib import Path
import pandas as pd

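def save_dictionary_to_file(dictionary, file_path):
    """Write a dictionary to `file_path` as indented JSON."""
    with open(file_path, 'w', encoding='utf-8') as file:
        json.dump(dictionary, file, indent=4)

def load_dictionary_from_file(file_path):
    """Read a JSON file and return its contents as a dictionary."""
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)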

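def build_ground_truth_dataset(data_folder, categories):
    """Collect file paths and evaluation results for every case under `data_folder`."""
    file_info_dict = {}
    # Sort so the resulting dictionary has a stable order across platforms
    for case_folder in sorted(data_folder.iterdir()):
        if case_folder.is_dir():
            process_case_folder(case_folder, file_info_dict, categories)
    return file_info_dict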

def initialize_file_info_dict(categories):
    return {cat: [] for cat in categories}

def process_category_folder(category_folder, file_info_dict, case_eval_id):
    # glob() returns files in arbitrary order; sort for reproducible output
    for pdf_file in sorted(category_folder.glob("*.pdf")):
        file_info_dict[case_eval_id][category_folder.name].append(str(pdf_file))

def extract_results_data(results_file, file_info_dict, case_eval_id):
    # Copy the evaluation metadata from results.json, defaulting to 'N/A' for missing keys
    with open(results_file, 'r', encoding='utf-8') as f:
        results_data = json.load(f)
    for key in ('evaluation_time', 'decision', 'notes'):
        file_info_dict[case_eval_id][key] = results_data.get(key, 'N/A')

def process_evaluation_folder(eval_folder, file_info_dict, case_eval_id, categories):
    file_info_dict[case_eval_id] = initialize_file_info_dict(categories)
    file_info_dict[case_eval_id]['evaluation_time'] = 'N/A'
    file_info_dict[case_eval_id]['decision'] = 'N/A'
    file_info_dict[case_eval_id]['notes'] = 'N/A'

    for category_folder in eval_folder.iterdir():
        if category_folder.is_dir() and category_folder.name in categories:
            process_category_folder(category_folder, file_info_dict, case_eval_id)

    results_file = eval_folder / 'results.json'
    if results_file.exists():
        extract_results_data(results_file, file_info_dict, case_eval_id)

def process_case_folder(case_folder, file_info_dict, categories):
    case_id = case_folder.name
    for eval_folder in case_folder.iterdir():
        # Evaluation folders have purely alphabetic names (e.g. 'a', 'b'); skip anything else
        if eval_folder.is_dir() and eval_folder.name.isalpha():
            eval_id = eval_folder.name
            # Records are keyed as '<case_id>_<eval_id>', e.g. '001_a'
            case_eval_id = f"{case_id}_{eval_id}"
            process_evaluation_folder(eval_folder, file_info_dict, case_eval_id, categories)
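
The helpers above expect a `<case_id>/<eval_id>/<category>/*.pdf` layout, with an optional `results.json` in each evaluation folder. The sketch below illustrates that layout in a temporary directory and walks it with a condensed restatement of the same traversal. The names `make_sample_tree` and `walk_cases`, the reduced category list, and the sample file contents are all hypothetical and for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a reduced category list, just to keep the example small.
CATEGORIES = ["doctor_notes", "labs"]

def make_sample_tree(root: Path) -> None:
    """Create case '001' with evaluation 'a' and one empty placeholder PDF per category."""
    eval_folder = root / "001" / "a"
    for category in CATEGORIES:
        category_folder = eval_folder / category
        category_folder.mkdir(parents=True)
        (category_folder / "01_a.pdf").touch()
    # Only 'decision' is provided, so the other fields should fall back to 'N/A'.
    (eval_folder / "results.json").write_text(
        json.dumps({"decision": "approved"}), encoding="utf-8"
    )

def walk_cases(root: Path) -> dict:
    """Condensed version of the traversal, keyed by '<case_id>_<eval_id>'."""
    records = {}
    for case_folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for eval_folder in sorted(p for p in case_folder.iterdir() if p.is_dir()):
            if not eval_folder.name.isalpha():
                continue
            record = {
                c: sorted(str(f) for f in (eval_folder / c).glob("*.pdf"))
                for c in CATEGORIES
            }
            results_file = eval_folder / "results.json"
            results = {}
            if results_file.exists():
                results = json.loads(results_file.read_text(encoding="utf-8"))
            for key in ("evaluation_time", "decision", "notes"):
                record[key] = results.get(key, "N/A")
            records[f"{case_folder.name}_{eval_folder.name}"] = record
    return records

with tempfile.TemporaryDirectory() as tmp:
    make_sample_tree(Path(tmp))
    sample = walk_cases(Path(tmp))

print(list(sample))
print(sample["001_a"]["decision"], sample["001_a"]["notes"])
```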

In [22]:
data_folder = Path('utils/data/cases')
categories = ['doctor_notes', 'imaging', 'labs', 'pa_form', 'policies']
file_info_dict = build_ground_truth_dataset(data_folder, categories)

save_path = data_folder / 'ground_truth.json'
save_dictionary_to_file(file_info_dict, save_path)

loaded_dict = load_dictionary_from_file(save_path)

# Create DataFrame from the loaded dictionary
df = pd.DataFrame.from_dict(loaded_dict, orient='index')
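
Before relying on the ground-truth table, it may help to check it for gaps. The sketch below counts the PDFs found per category and flags rows with an empty category or no recorded decision. It uses a small hand-written dictionary shaped like the one built above. The paths and values are made up, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical records shaped like the ground-truth dictionary (paths are invented).
records = {
    "001_a": {"doctor_notes": ["a/notes.pdf"], "labs": [], "decision": "approved"},
    "001_b": {"doctor_notes": ["b/notes.pdf"], "labs": ["b/labs.pdf"], "decision": "N/A"},
}
frame = pd.DataFrame.from_dict(records, orient="index")

category_columns = ["doctor_notes", "labs"]

# Number of PDFs found per category for each case/evaluation pair.
file_counts = frame[category_columns].apply(lambda col: col.map(len))

# Rows where at least one category has no document.
missing_docs = file_counts.index[(file_counts == 0).any(axis=1)].tolist()

# Rows where results.json was absent or did not contain a decision.
undecided = frame.index[frame["decision"] == "N/A"].tolist()

print("Missing documents:", missing_docs)
print("No decision:", undecided)
```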

In [23]:
df

| | doctor_notes | imaging | labs | pa_form | policies | evaluation_time | decision | notes |
|---|---|---|---|---|---|---|---|---|
| 001_a | `[utils\data\cases\001\a\doctor_notes\01_a.pdf]` | `[utils\data\cases\001\a\imaging\01_a.pdf]` | `[utils\data\cases\001\a\labs\01_a.pdf]` | `[utils\data\cases\001\a\pa_form\01_a.pdf]` | `[utils\data\cases\001\a\policies\001_inflammat...` | 2023-10-01T12:00:00Z | approved | Manually evaluated by MD based on the policies. |
| 001_b | `[utils\data\cases\001\b\doctor_notes\01_a.pdf]` | `[utils\data\cases\001\b\imaging\01_a.pdf]` | `[utils\data\cases\001\b\labs\01_a.pdf]` | `[utils\data\cases\001\b\pa_form\01_a.pdf]` | `[utils\data\cases\001\b\policies\001_inflammat...` | 2023-10-01T12:00:00Z | approved | Manually evaluated by MD based on the policies. |
| 002_a | `[utils\data\cases\002\a\doctor_notes\01_a.pdf]` | `[utils\data\cases\002\a\imaging\01_a.pdf]` | `[utils\data\cases\002\a\labs\01_a.pdf]` | `[utils\data\cases\002\a\pa_form\01_a.pdf]` | `[utils\data\cases\002\a\policies\001_inflammat...` | 2023-10-01T12:00:00Z | approved | Manually evaluated by MD based on the policies. |
| 002_b | `[utils\data\cases\002\b\doctor_notes\01_a.pdf]` | `[utils\data\cases\002\b\imaging\01_a.pdf]` | `[utils\data\cases\002\b\labs\01_a.pdf]` | `[utils\data\cases\002\b\pa_form\01_a.pdf]` | `[utils\data\cases\002\b\policies\001_inflammat...` | 2023-10-01T12:00:00Z | approved | Manually evaluated by MD based on the policies. |
