# Documentation assistant

This notebook demonstrates a documentation assistant: Video-to-documentation conversion using Vertex AI

Converting videos-to-documentation involves three steps: 
1. Protocol finder: Select protocol which best captures the step being performed in the video
2. Video comparing to ground-of-truth protocol → lab documentation + errors in procedure
3. Analytics based on benchmark dataset: automatic comparison of errors found by documentation assistent vs actual errors

In this notebook, I will focus on the step two and three - Compare video with protocol.

In [2]:
from __future__ import annotations

# %load_ext autoreload
%reload_ext autoreload
%autoreload 2

import configparser
import os
import sys
from pathlib import Path
import json
import pandas as pd
import pprint


from IPython.display import Markdown

path_to_append = Path(Path.cwd()).parent / "proteomics_specialist"
sys.path.append(str(path_to_append))
import video_to_protocol

config = configparser.ConfigParser()
config.read("../secrets.ini")

['../secrets.ini']

In [3]:
import vertexai

config = configparser.ConfigParser()
config.read("../secrets.ini")

PROJECT_ID = config["DEFAULT"]["PROJECT_ID"]
vertexai.init(project=PROJECT_ID, location="europe-west9")  # europe-west9 is Paris

In [4]:
from google.cloud import storage

os.environ["GOOGLE_CLOUD_PROJECT"] = config["DEFAULT"]["PROJECT_ID"]

# Initialize Cloud Storage client
storage_client = storage.Client()
bucket_name = "mannlab_videos"
bucket = storage_client.bucket(bucket_name)

In [5]:
import logging
logger = logging.getLogger(__name__)
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
from vertexai.generative_models import GenerativeModel, GenerationConfig
from typing import TYPE_CHECKING, NamedTuple

def generate_content_from_model(
    inputs: Any,
    model_name: str = "gemini-2.0-flash",
    temperature: float = 0.9,
) -> tuple:
    """Generate content using Google's Generative AI model.
    
    This function sends inputs to a specified Gemini model and returns the 
    generated response along with usage metadata.
    
    Parameters
    ----------
    inputs : Any
        The inputs to send to the model (text, images, or videos).
    model_name : str, default="gemini-2.0-flash"
        Name of the generative model to use.
    temperature : float, default=0.9
        Controls the randomness of the output. Higher values (closer to 1.0)
        make output more random, lower values make it more deterministic.
        
    Returns
    -------
    tuple
        A tuple containing (response_text, usage_metadata)
        
    Raises
    ------
    ValueError
        If the model fails to generate content.
    """
    try:
        model = GenerativeModel(model_name)
        
        generation_config = GenerationConfig(
            temperature=temperature,
            # Uncomment if using single audio/video input
            # audio_timestamp=True
        )
        
        response = model.generate_content(
            inputs,
            generation_config=generation_config
        )
        documentation = response.text
        usage_metadata = response.usage_metadata
        
    except Exception as e:
        logger.exception("Error during content generation")
        raise ValueError(f"Failed to generate content: {str(e)}")
    
    return documentation, usage_metadata

In [6]:
from vertexai.generative_models import Part

def prepare_all_inputs(
    lab_video_path: str,
    protocol_path: str,
    documentation_video_path: str,
    documentation_path: str,
    bucket: str,
    prefix: str = "compare_protocol_video"
) -> dict:
    """Prepare all four standard inputs for the generative model.
    
    This function uploads the four standard files (lab video, protocol document, 
    documentation video, and documentation document) and formats them as inputs 
    for a generative model.
    
    Parameters
    ----------
    lab_video_path : str
        Path to the lab video file.
    protocol_path : str
        Path to the protocol markdown file.
    documentation_video_path : str
        Path to the documentation video file.
    documentation_path : str
        Path to the documentation markdown file.
    bucket : str
        GCS bucket name for uploading the files.
    prefix : str, default="compare_protocol_video"
        Prefix for the files in GCS bucket.
        
    Returns
    -------
    dict
        A dictionary containing the four formatted inputs:
        'protocol_video_input', 'protocol_input', 'lab_video_input', 'documentation_input'
    """
    
    video_uri = video_to_protocol.upload_video_to_gcs(lab_video_path, bucket, prefix)
    file_extension = os.path.splitext(video_uri)[1].lower()[1:]
    protocol_video_input = [Part.from_uri(video_uri, mime_type=f"video/{file_extension}")]
    
    uri = video_to_protocol.upload_video_to_gcs(protocol_path, bucket, prefix)
    protocol_input = [Part.from_uri(uri, mime_type="text/md")]
    
    video_uri = video_to_protocol.upload_video_to_gcs(documentation_video_path, bucket, prefix)
    lab_video_input = [Part.from_uri(video_uri, mime_type="video/mp4")]

    uri = video_to_protocol.upload_video_to_gcs(documentation_path, bucket, prefix)
    documentation_input = [Part.from_uri(uri, mime_type="text/md")]
    
    return {
        'protocol_video_input': protocol_video_input,
        'protocol_input': protocol_input,
        'lab_video_input': lab_video_input,
        'documentation_input': documentation_input
    }

In [11]:
def process_benchmark_dataset(csv_path, protocol_videos_base, documentation_videos_base, markdown_base, bucket, prefix):
    """
    Process the first two rows in the benchmark dataset CSV and prepare model inputs.
    
    Parameters:
    -----------
    csv_path : str
        Path to the CSV file containing benchmark dataset information
    protocol_videos_base : str
        Base path to the protocol videos directory
    documentation_videos_base : str
        Base path to the documentation videos directory
    markdown_base : str
        Base path to the markdown files directory
    bucket : object
        The bucket object used in the prepare_all_inputs function
    prefix : str
        Prefix for the files in GCS bucket.
    
    Returns:
    --------
    dict
        Dictionary containing all model inputs for the first two rows in the CSV,
        with experiment names as keys
    """
    
    benchmark_df = pd.read_csv(
        csv_path, 
        sep=';'
    )
    
    all_model_inputs = {}
    
    for index, row in benchmark_df.head(2).iterrows(): # for testing .head(2).iterrows() or .iloc[[13, 14]] .iloc[::2]
        lab_video_path = os.path.join(protocol_videos_base, row["protocol video"])
        protocol_path = os.path.join(markdown_base, row["protocol"])
        documentation_video_path = os.path.join(documentation_videos_base, row["documentation video"])
        documentation_path = os.path.join(markdown_base, row["documentation"])
        
        dict_model_inputs = prepare_all_inputs(
            lab_video_path,
            protocol_path,
            documentation_video_path,
            documentation_path,
            bucket,
            prefix
        )
        dict_model_inputs['error_dict'] = row["error_dict"]
        
        experiment_name = row["documentation"].split(".")[0]
        all_model_inputs[experiment_name] = dict_model_inputs
        
        print(f"Processed {experiment_name}")
        
    return all_model_inputs

In [8]:
# Evaluation test 1

def generate_documentation_evaluation(benchmark_example, documentation_input, documentation, model_name="gemini-2.0-flash", temperature=0.9):
    """
    Generate an evaluation of AI-generated documentation against benchmark documentation.
    Build on 'ESIsourceToUltraSource_docuFogotOvenPowerSupply'.
    
    Parameters:
    -----------
    documentation_input : list
        The benchmark documentation (ground truth) represented as a list of strings
    documentation : list
        The AI-generated documentation to evaluate represented as a list of strings
    model_name : str, optional
        The model to use for evaluation, default is "gemini-2.0-flash"
    temperature : float, optional
        Temperature setting for content generation, default is 0.9
        
    Returns:
    --------
    tuple
        A tuple containing (evaluation_text, usage_metadata)
    """
    inputs = [
        """
        # Instruction
        You are an expert evaluator specializing in scientific protocol documentation. Your task is to evaluate the error identification accuracy, error type classification and documentation quality of an AI-generated documentation against a benchmark documentation (ground truth). You will be provided with an AI-generated documentation and a benchmark documentation (human-verified ground truth). 

        # Evaluation Parts
        ## Part 1: Error Identification Accuracy
        For each step in the protocol, determine if the AI correctly identified the presence or absence of errors by classifying into one of these categories:
        - **No Error**: Both benchmark and AI response agree there was no error
        - **Error (Correctly Identified)**: Both benchmark and AI response agree there was an error
        - **False Positive**: AI response claimed an error when the benchmark indicates none
        - **False Negative**: AI response missed an error that the benchmark shows

        ## Part 2: Error Type Classification
        For each error that was correctly identified by both the benchmark and AI response, determine if the AI correctly classified the error type:
        - **Correct Classification**: AI used the same error type as the benchmark (Omitted, Error, Deviation, Added)
        - **Incorrect Classification**: AI used a different error type than the benchmark

        ## Part 3: Documentation Quality
        Evaluate the AI's documentation quality based on these criteria:
        1. **Structure**: Did it keep only relevant sections: Aim, Materials, Procedure, Results?
        2. **Tense**: Did it use past tense to describe what actually happened, not what should happen?
        3. **Language**: Did it remove all instructional language and replace with observations?
        4. **Numbering**: Did it maintain step numbering of the original protocol even if order changed?
        5. **Timing**: Did it include exact actual timing, not estimated timing?

        ### Rating Rubric for Part 3: Documentation Quality
        For each criterion:
        - **Excellent**: The criterion was fully met with no issues
        - **Good**: The criterion was mostly met with minor issues
        - **Poor**: The criterion was not met or had significant issues

        # Evaluation Steps
        1. Create a table for each step in the protocol showing error identification accuracy
        2. Analyze correctly identified errors to determine classification accuracy
        3. Evaluate documentation quality against the 5 criteria

        It is very important to you to to be very exact. therefore, you always correctly reflect the errors from the benchmark documentation and identify the errors made in the AI-generated documentation. You also always provide the correct output format.

        # Output Format
        ## Part 1: Error Identification Accuracy
        | Step | Benchmark | AI Response | Classification |
        |------|-----------|-------------|----------------|
        | [Step details] | [Error/No Error] | [Error/No Error] | [No Error/Error/False Positive/False Negative] |

        ## Part 2: Error Classification Accuracy
        | Step | Benchmark Error Type | AI Error Type | Classification |
        |------|---------------------|---------------|----------------|
        | [Step with error] | [Error Type] | [Error Type] | [Correct/Incorrect] |

        ## Part 3: Documentation Quality
        | Criterion | Rating | Explanation |
        |-----------|--------|-------------|
        | Structure | [Excellent/Good/Poor] | [Explanation] |
        | Tense | [Excellent/Good/Poor] | [Explanation] |
        | Language | [Excellent/Good/Poor] | [Explanation] |
        | Numbering | [Excellent/Good/Poor] | [Explanation] |
        | Timing | [Excellent/Good/Poor] | [Explanation] |
        
        """
    ]
    inputs.extend(["""
        # Example
        ## Benchmark Documentation (Ground Truth)
    """])
    inputs.extend(benchmark_example)
    inputs.extend(["## AI-Generated Documentation"])
    documentation_example = "Alright, here is the documentation following your specifications:\n\n## Documentation:# Change source: ESI source to UltraSource\n\n## Abstract\nThis protocol describes the procedure for switching from the ESI source to UltraSource.\n\n## Materials\n\n### Equipment\n- timsTOF Ultra Mass Spectrometer:\n  - ESI ion source\n  - UltraSource ion source \n- IonOpticks Column\n- Evosep One LC System with sample line\n- NanoViper Adapter (black)\n- Pliers\n\n## Procedure\n\n*Estimated timing: less than 10 minute*\n\n### Switch timsTOF to standby\n\n1. ✓ Verified the instrument was on standby mode\n2. ✓ Verified the syringe was inactive\n3. ✓ Selected 'CaptiveSpray' but did not activate it yet\n\n### Remove ESI source\n\n4. ✓ Disconnected the peak connector of the sample tubing\n5. ✓ Disconnected the nebulizer N₂ line\n6. ✓ Removed the source door. Hinged it out\n7. ❌ **Omitted:** Put on gloves after removing source door\n8. ✓ Removed the spray shield, and capillary cap.\n9. ⚠️ **Deviation:** Inspected the capillary position and gently pushed it back into proper position \n\n### Mount UltraSource\n\n10. ✓ Hinged the UltraSource door in and closed it \n11. ✓ Slid the UltraSource housing onto the source door and secured it by flipping the handles\n12. ✓ Connected the filter tubing to the source\n\n### Connect column and sample line\n\n13. ✓ Noted an IonOpticks column already inside UltraSource \n14. ✓ Noted the LC sample line had NanoViper adapter already attached\n15. ❌ **Omitted:** No need to snipp access liquid\n16. ✓ Held the column fititng of the IonOpticks column with a pliers.\n17. ✓ Hand-tightened the NanoViper of the LC sample line with the column fitting \n18. ✓ Drew the oven closer to the UltraSource, and secured it \n19. ✓ Removed the NanoViper adapter \n20. ✓ Placed the metal grounding screw\n21. ✓ Closed the lid of the oven\n22. ✓ Connected the oven to the electrical power supply\n23. ✓ Noted that with the correct temperature\n\n### Switch timsTOF to operate and idle flow\n\n24. ✓ Noted the CaptiveSpray function in timsControl had been activated.\n25. ✓ Noted that the instrument was on the operational mode\n26. ✓ Noted the idle flow was active\n27. ✓ Stay in timsControl\n28. ⚠️ **Deviation:** Checked the MS signal. Noted it needed to be adjusted to between 9-11 mbar\n\n## Expected Results\n- In timsControl, signal intensity should be above 10^7\n- Stable signal in in timsControl\n\n"
    inputs.extend([documentation_example])
    inputs.extend(["## Evaluation"])
    evaluation_example = '## Part 1: Error Identification Accuracy\n| Step | Benchmark | AI Response | Classification |\n|------|-----------|-------------|----------------|\n| 1 | No Error | No Error | No Error |\n| 2 | No Error | No Error | No Error |\n| 3 | No Error | No Error | No Error |\n| 4 | No Error | No Error | No Error |\n| 5 | No Error | No Error | No Error |\n| 6 | No Error | No Error | No Error |\n| 7 | Error | Error | Error (Correctly Identified) |\n| 8 | No Error | No Error | No Error |\n| 9 | Error | Error | Error (Correctly Identified) |\n| 10 | No Error | No Error | No Error |\n| 11 | No Error | No Error | No Error |\n| 12 | No Error | No Error | No Error |\n| 13 | No Error | No Error | No Error |\n| 14 | No Error | No Error | No Error |\n| 15 | No Error | Error | False Positive |\n| 16 | No Error | No Error | No Error |\n| 17 | No Error | No Error | No Error |\n| 18 | No Error | No Error | No Error |\n| 19 | Error | No Error | False Negative |\n| 20 | No Error | No Error | No Error |\n| 21 | No Error | No Error | No Error |\n| 22 | Error | No Error | False Negative |\n| 23 | Error | No Error | False Negative |\n| 24 | No Error | No Error | No Error |\n| 25 | No Error | No Error | No Error |\n| 26 | Error | No Error | False Negative |\n| 27 | Error | No Error | False Negative |\n| 28 | No Error | Error | False Positive |\n| 29 | Error | No Error | False Negative |\n## Part 2: Error Classification Accuracy\n| Step | Benchmark Error Type | AI Error Type | Classification |\n|------|---------------------|---------------|----------------|\n| 7 | Omitted | Omitted | Correct |\n| 9 | Omitted | Deviation | Incorrect |\\n\n## Part 3: Documentation Quality\n| Criterion | Rating | Explanation |\n|-----------|--------|-------------|\n| Structure | Good | The structure is generally good, maintaining the Aim, Materials, Procedure, and Results sections. However, the Abstract should have been rephrased to aim and the actuall timing and results should have been stated. |\n| Tense | Good | The tense is mostly past tense, but there are instances of present tense slipping in ("Stay in timsControl"). |\n| Language | Good | The language is mostly observational, but some instructional language remains. |\n| Numbering | Poor | Step 19 occured between step 17 & step 18 and step 29 occured between step 24 & step 25. Both were not placed correctly. |\n| Timing | Poor | The AI-generated documentation provided an Estimated Timing which is incorrect. |\n'

    inputs.extend([evaluation_example])

    inputs.extend(["""
        # Input Materials
        ## Benchmark Documentation (Ground Truth)
    """])
    inputs.extend(documentation_input)
    
    inputs.extend(["## AI-Generated Documentation"])
    inputs.extend([documentation])
    inputs.extend(["## Evaluation"])

    evaluation, usage_metadata = generate_content_from_model(
        inputs,
        model_name=model_name,
        temperature=temperature,
    )
    
    return evaluation, usage_metadata

def get_table_json_prompt(text_with_tables: str, table_identifier: str) -> str:
    """
    Generates a prompt to extract a specific table from text into JSON.

    Args:
        text_with_tables: The full text containing the table(s).
        table_identifier: A string to help the model identify the target table
                          (e.g., the table title, or a unique phrase near it).

    Returns:
        A formatted prompt string.
    """
    prompt = f"""
    You are an expert data extraction tool.
    Your task is to locate a specific table within the provided text and output its data as a JSON array.

    Here is the text containing the table(s):
    ---TEXT_START---
    {text_with_tables}
    ---TEXT_END---

    Identify the table that best matches the following title: "{table_identifier}"

    It is very important to you to output the data from ONLY this table as a valid JSON array. Each object in the array should represent a row from the table. The keys of each object should be the exact column headers from the identified table.

    Output Constraints:
    - Answer direct with the JSON.
    - If the specified table cannot be found, output an empty JSON array: []
    """
    return prompt

def extract_json_from_model_output(model_output_string):
    """
    Extract and parse JSON data from a model output string that contains JSON within code block markers.
    
    Parameters:
    -----------
    model_output_string : str
        The string output from the model that contains JSON within code block markers
        
    Returns:
    --------
    dataframe: A pandas DataFrame created from the JSON data, or None if extraction failed
    """
    start_marker = "```json"
    end_marker = "```"

    start_index = model_output_string.find(start_marker)
    end_index = model_output_string.find(end_marker, start_index + len(start_marker))  # Search for end marker after the start
    
    df = None
    if start_index != -1 and end_index != -1:
        extracted_json_string = model_output_string[start_index + len(start_marker):end_index].strip()
        
        try:
            json_data = json.loads(extracted_json_string)
            logger.info("Successfully extracted and parsed JSON.")
            
            if isinstance(json_data, list) and all(isinstance(item, dict) for item in json_data):
                df = pd.DataFrame(json_data)
            else:
                logger.warning("JSON data is not a list of dictionaries, could not create DataFrame.")
                
        except json.JSONDecodeError as e:
            logger.error(f"Error decoding JSON after extraction: {e}")
            logger.debug(f"Extracted string: {extracted_json_string}")
    else:
        logger.error("Could not find JSON code block markers in the output.")
        logger.debug(f"Model output: {model_output_string}")
    
    return df

def extract_table_to_dataframe(evaluation, table_name, model_name="gemini-2.0-flash", temperature=0.9):
    """
    Extract a table from evaluation content and convert it to a DataFrame.
    
    Parameters:
    -----------
    evaluation : str
        The evaluation content containing tables
    table_name : str
        The name of the table to extract
    model_name : str, optional
        The model to use for content generation, default is "gemini-2.0-flash"
    temperature : float, optional
        Temperature setting for content generation, default is 0.9
        
    Returns:
    --------
    pandas.DataFrame
        DataFrame containing the extracted table data
    """
    extraction_prompt = get_table_json_prompt(evaluation, table_name)
    
    json_response, _ = generate_content_from_model(
        extraction_prompt,
        model_name=model_name,
        temperature=temperature
    )
    
    results_df = extract_json_from_model_output(json_response)
    
    return results_df

def calculate_error_evaluation_metrics(evaluation):
    """
    Calculate comprehensive error evaluation metrics from an evaluation document.
    
    This function extracts tables from the evaluation document and calculates
    metrics for error identification, error classification, and documentation quality.
    
    Parameters:
    -----------
    evaluation : str
        The evaluation document containing the tables to analyze
        
    Returns:
    --------
    dict
        A dictionary containing all calculated metrics organized by category
    """
    error_evaluation_metrics = {}
    
    # Part 1: Error Identification Accuracy
    identification_table_name = "Part 1: Error Identification Accuracy"
    identification_results_df = extract_table_to_dataframe(evaluation, identification_table_name)
    
    if identification_results_df is not None:
        correctly_identified_rows = identification_results_df[
            (identification_results_df["Classification"] == "No Error") |
            (identification_results_df["Classification"] == "Error (Correctly Identified)")
        ]
        total_actual_errors = identification_results_df[identification_results_df["Benchmark"] == "Error"]
        correctly_identified_errors = identification_results_df[identification_results_df["Classification"] == "Error (Correctly Identified)"]
        false_positive_errors = identification_results_df[identification_results_df["Classification"] == "False Positive"]
        false_negative_errors = identification_results_df[identification_results_df["Classification"] == "False Negative"]
        
        error_evaluation_metrics["Error Identification Statistics"] = {
            "Total steps evaluated": len(identification_results_df),
            "Total correct identifications": len(correctly_identified_rows),
            "Overall identification accuracy": len(correctly_identified_rows) / len(identification_results_df) if len(identification_results_df) > 0 else 0,
            "Error recall rate": len(correctly_identified_errors) / len(total_actual_errors) if len(total_actual_errors) > 0 else "N/A",
            "False positive count": len(false_positive_errors),
            "False negative count": len(false_negative_errors)
        }
    else:
        error_evaluation_metrics["Error Identification Statistics"] = {
            "Status": "No data available"
        }
    
    # Part 2: Error Classification Accuracy
    classification_table_name = "Part 2: Error Classification Accuracy"
    classification_results_df = extract_table_to_dataframe(evaluation, classification_table_name)
    
    if classification_results_df is not None:
        correctly_classified_errors = classification_results_df[classification_results_df["Classification"] == "Correct"]
        
        error_evaluation_metrics["Error Classification Statistics"] = {
            "Total errors analyzed": len(classification_results_df),
            "Correctly classified errors": len(correctly_classified_errors),
            "Classification accuracy": len(correctly_classified_errors) / len(classification_results_df) if len(classification_results_df) > 0 else 0
        }
    else:
        error_evaluation_metrics["Error Classification Statistics"] = {
            "Status": "No data available"
        }
    
    # # Part 3: Documentation Quality
    # documentation_table_name = "Part 3: Documentation Quality"
    # documentation_quality_df = extract_table_to_dataframe(evaluation, documentation_table_name)

    return error_evaluation_metrics

In [43]:
# Evaluation test 2

def extract_errors(documentation, docu_steps, model_name="gemini-2.0-flash", temperature=0.9):
    """
    Extract the identified errors of AI-generated documentation.
    
    Parameters:
    -----------
    documentation : list
        The AI-generated documentation to extract represented as a list of strings
    model_name : str, optional
        The model to use for evaluation, default is "gemini-2.0-flash"
    temperature : float, optional
        Temperature setting for content generation, default is 0.9
        
    Returns:
    --------
    tuple
        A tuple containing (evaluation_text, usage_metadata)
    """
    prompt = """\
        # Instruction
        You are an expert evaluator specializing in scientific protocol documentation. Your task is to extract the error positions and types of an AI-generated documentation for following steps {docu_steps}. It is very important to you to be very exact.

        # Output Format
        ## Table
        | Step | AI Response | AI Class |
        |------|-------------|----------------|
        | [Step] | [Error/No Error] | [N/A, Error, Omitted, Deviation, Added] |
        """
    
    inputs = [prompt.format(docu_steps=docu_steps)  ] 
    inputs.extend(["## AI-Generated Documentation"])
    inputs.extend([documentation])
    inputs.extend(["## Output table"])

    evaluation, usage_metadata = generate_content_from_model(
        inputs,
        model_name=model_name,
        temperature=temperature,
    )
    
    return evaluation, usage_metadata

def generate_documentation_evaluation(documentation_input, documentation, model_name="gemini-2.0-flash", temperature=0.9):
    """
    Generate an evaluation of AI-generated documentation against benchmark documentation.
    
    Parameters:
    -----------
    documentation_input : list
        The benchmark documentation (ground truth) represented as a list of strings
    documentation : list
        The AI-generated documentation to evaluate represented as a list of strings
    model_name : str, optional
        The model to use for evaluation, default is "gemini-2.0-flash"
    temperature : float, optional
        Temperature setting for content generation, default is 0.9
        
    Returns:
    --------
    tuple
        A tuple containing (evaluation_text, usage_metadata)
    """
    inputs = [
        """
        # Instruction
        You are an expert evaluator specializing in scientific protocol documentation. Your task is to evaluate the documentation quality of an AI-generated documentation against a benchmark documentation (ground truth). 

        # Evaluation Parts

        ## 5 Criteria:
        Evaluate the AI's documentation quality based on these criteria:
        1. **Structure**: Did it keep only relevant sections: Aim, Materials, Procedure, Results?
        2. **Tense**: Did it use past tense to describe what actually happened, not what should happen?
        3. **Language**: Did it remove all instructional language and replace with observations?
        4. **Numbering**: Did it maintain step numbering of the original protocol even if order changed?
        5. **Timing**: Did it include exact actual timing, not estimated timing?

        ### Rating Rubric:
        For each criterion:
        - **Excellent**: The criterion was fully met with no issues
        - **Good**: The criterion was mostly met with minor issues
        - **Poor**: The criterion was not met or had significant issues

        # Output Format
        ## Documentation Quality
        | Criterion | Rating | Explanation |
        |-----------|--------|-------------|
        | Structure | [Excellent/Good/Poor] | [Explanation] |
        | Tense | [Excellent/Good/Poor] | [Explanation] |
        | Language | [Excellent/Good/Poor] | [Explanation] |
        | Numbering | [Excellent/Good/Poor] | [Explanation] |
        | Timing | [Excellent/Good/Poor] | [Explanation] |

        # Evaluation Steps
        1. the documentation quality of an AI-generated documentation against a benchmark documentation (ground truth) using the  5 criteria.
        2. Create a table summarizing the evaluation results.
        
        """
    ]
    inputs.extend(["""
        # Input Materials
        ## Benchmark Documentation (Ground Truth)
    """])
    inputs.extend(documentation_input)
    
    inputs.extend(["## AI-Generated Documentation"])
    inputs.extend([documentation])
    inputs.extend(["# Documentation Quality"])

    evaluation, usage_metadata = generate_content_from_model(
        inputs,
        model_name=model_name,
        temperature=temperature,
    )
    
    return evaluation, usage_metadata

def get_table_json_prompt(text_with_tables: str, table_identifier: str) -> str:
    """
    Generates a prompt to extract a specific table from text into JSON.

    Args:
        text_with_tables: The full text containing the table(s).
        table_identifier: A string to help the model identify the target table
                          (e.g., the table title, or a unique phrase near it).

    Returns:
        A formatted prompt string.
    """
    prompt = f"""
    You are an expert data extraction tool.
    Your task is to locate a specific table within the provided text and output its data as a JSON array.

    Here is the text containing the table(s):
    ---TEXT_START---
    {text_with_tables}
    ---TEXT_END---

    Identify the table that best matches the following title: "{table_identifier}"

    It is very important to you to output the data from ONLY this table as a valid JSON array. Each object in the array should represent a row from the table. The keys of each object should be the exact column headers from the identified table.

    Output Constraints:
    - Answer direct with the JSON.
    - If the specified table cannot be found, output an empty JSON array: []
    """
    return prompt

def extract_json_from_model_output(model_output_string):
    """
    Extract and parse JSON data from a model output string that contains JSON within code block markers.
    
    Parameters:
    -----------
    model_output_string : str
        The string output from the model that contains JSON within code block markers
        
    Returns:
    --------
    dataframe: A pandas DataFrame created from the JSON data, or None if extraction failed
    """
    start_marker = "```json"
    end_marker = "```"

    start_index = model_output_string.find(start_marker)
    end_index = model_output_string.find(end_marker, start_index + len(start_marker))  # Search for end marker after the start
    
    df = None
    if start_index != -1 and end_index != -1:
        extracted_json_string = model_output_string[start_index + len(start_marker):end_index].strip()
        
        try:
            json_data = json.loads(extracted_json_string)
            logger.info("Successfully extracted and parsed JSON.")
            
            if isinstance(json_data, list) and all(isinstance(item, dict) for item in json_data):
                df = pd.DataFrame(json_data)
            else:
                logger.warning("JSON data is not a list of dictionaries, could not create DataFrame.")
                
        except json.JSONDecodeError as e:
            logger.error(f"Error decoding JSON after extraction: {e}")
            logger.debug(f"Extracted string: {extracted_json_string}")
    else:
        logger.error("Could not find JSON code block markers in the output.")
        logger.debug(f"Model output: {model_output_string}")
    
    return df

def extract_table_to_dataframe(evaluation, table_name, model_name="gemini-2.0-flash", temperature=0.9):
    """
    Extract a table from evaluation content and convert it to a DataFrame.
    
    Parameters:
    -----------
    evaluation : str
        The evaluation content containing tables
    table_name : str
        The name of the table to extract
    model_name : str, optional
        The model to use for content generation, default is "gemini-2.0-flash"
    temperature : float, optional
        Temperature setting for content generation, default is 0.9
        
    Returns:
    --------
    pandas.DataFrame
        DataFrame containing the extracted table data
    """
    extraction_prompt = get_table_json_prompt(evaluation, table_name)
    
    json_response, _ = generate_content_from_model(
        extraction_prompt,
        model_name=model_name,
        temperature=temperature
    )
    
    results_df = extract_json_from_model_output(json_response)
    
    return results_df

def identify_error_type(row):
    if row['Benchmark'] == 'No Error' and row['AI Response'] == 'No Error':
        return 'No Error (Correctly Identified)'
    elif row['Benchmark'] == 'Error' and row['AI Response'] == 'Error':
        return 'Error (Correctly Identified)'
    elif row['Benchmark'] == 'Error' and row['AI Response'] == 'No Error':
        return 'False Negative'
    elif row['Benchmark'] == 'No Error' and row['AI Response'] == 'Error':
        return 'False Positive'
    else:
        return 'Unknown' 

def classify_error_type(row):
    if row['Identification'] == 'Error (Correctly Identified)':
        if row['Class'] == row['AI Class']:
            return 'correct'
        else:
            return 'incorrect'
    else:
        return 'N/A' 
    
def generate_error_summary(df):
    """
    Generate a summary dictionary of error identification and classification statistics.
    
    Parameters:
    df (pandas.DataFrame): DataFrame containing error analysis results with 
                          'Benchmark', 'Identification', and 'Classification' columns
    
    Returns:
    dict: A nested dictionary containing error identification and classification statistics
    """
    total_steps = len(df)
    error_count = len(df[df['Benchmark'] == 'Error'])
    correctly_identified_errors = len(df[df['Identification'] == 'Error (Correctly Identified)'])
    false_negatives = len(df[df['Identification'] == 'False Negative'])
    false_positives = len(df[df['Identification'] == 'False Positive'])
    correct_identifications = len(df[(df['Identification'] == 'No Error (Correctly Identified)') | 
                                   (df['Identification'] == 'Error (Correctly Identified)')])
    precision = correct_identifications / total_steps
    recall = correctly_identified_errors / error_count if error_count > 0 else 0
    
    total_errors_analyzed = len(df[df['Identification'] == 'Error (Correctly Identified)'])
    correctly_classified_errors = len(df[df['Classification'] == 'correct'])
    classification_accuracy = correctly_classified_errors / total_errors_analyzed if total_errors_analyzed > 0 else 0
    
    summary_dict = {
        'Error Identification Statistics': {
            'Steps evaluated': total_steps,
            'Correct identifications': correct_identifications,
            'False negative count': false_negatives,
            'False positive count': false_positives,
            'Precision': precision,
            'Recall': recall
            
        },
        'Error Classification Statistics': {
            'Total errors analyzed': total_errors_analyzed,
            'Correctly classified errors': correctly_classified_errors,
            'Classification accuracy': classification_accuracy
        }
    }
    
    return summary_dict

def process_and_evaluate_documentation(error_dict, documentation_gt, documentation_ai):
    """
    Process and evaluate documentation by extracting errors, generating evaluations, 
    and creating summary statistics.
    
    Parameters:
    error_dict (list): List of error dictionaries
    documentation_gt (Any): Ground Truth documentation to compare
    documentation_example (str): AI-generated documentation to evaluate
    
    Returns:
    tuple: A tuple containing (valuation_response, df_errors, summary_dict)
    """
    error_dict = json.loads(error_dict)
    steps_list = [item["Step"] for item in error_dict]
    error_response, usage_metadata = extract_errors(documentation_ai, steps_list)

    evaluation_response, usage_metadata = generate_documentation_evaluation(
        documentation_gt, documentation_ai)
    
    df_error_AI = extract_table_to_dataframe(error_response, "Table")
    df_error_AI["Step"] = df_error_AI["Step"].astype('float64')
    
    df_error_benchmark = pd.DataFrame(error_dict)
    df_errors = pd.merge(df_error_benchmark, df_error_AI, on='Step')

    df_errors['Identification'] = df_errors.apply(identify_error_type, axis=1)
    df_errors['Classification'] = df_errors.apply(classify_error_type, axis=1)
    
    summary_dict = generate_error_summary(df_errors)
    
    return evaluation_response, df_errors, summary_dict

In [44]:
def generate_documentation(protocol_video_example, protocol_example, lab_video_example, documentation_example,
                      protocol_video_input, protocol_input, lab_video_input, 
                      model_name="gemini-2.0-flash", temperature=0.9):
    """
    Generate corrected documentation by comparing protocol with actual implementation.
    
    Parameters:
    -----------
    protocol_video_example : list
        Example protocol video content
    protocol_example : list
        Example protocol content
    lab_video_example : list
        Example lab video content
    documentation_example : list
        Example documentation content
    protocol_video_input : list
        Input protocol video content to process
    protocol_input : list
        Input protocol content to process
    lab_video_input : list
        Input lab video content to process
    model_name : str, optional
        The model to use for generation, default is "gemini-2.0-flash"
    temperature : float, optional
        Temperature parameter for generation, default is 0.9
        
    Returns:
    --------
    tuple
        A tuple containing the documentation text and usage metadata
    """
    inputs = [
        """
        You are Professor Matthias Mann, a pioneering scientist in proteomics and mass spectrometry. Your professional identity is defined by your ability to be exact in your responses and to produce meticulous, accurate results that others can trust completely. You understand that even small errors could propagate through scientific processes, potentially affecting research outcomes. This responsibility is core to your professional ethics.

        # Your Task:
        Compare the original protocol with the actual implementation shown in a video, and create a corrected documentation that reflects what actually happened.

        ## Step 1: Protocol Comparison
        First, compare the protocol with the video content and identify discrepancies:
        - ✓ Followed correctly (no special notation needed)
        - ❌ **Error:** When something was done incorrectly (be specific about what happened)
        - ❌ **Omitted:** When a step was completely skipped
        - ⚠️ **Deviation: Altered step order** When the order of steps was changed
        - ➕ **Added:** When a new step not in the protocol was performed

        Note: The researcher might have made none, one, or multiple errors.

        ## Step 2: Documentation Rewrite
        Rewrite the protocol as documentation following these guidelines:
        1. Rename section 'Abstract' to 'Aim' and 'Expected Results' to 'Results'
        2. Remove the sections 'Figures' and 'References'
        3. Maintain step numbering of the original protocol even if the order was changed (e.g., prerequisite 1, 1, 3, 2, ..., result 1)
        4. Highlight discrepancies using the symbols listed above
        5. Use past tense to describe what actually happened
        6. Remove all instructional language such as 'CRITICAL STEP' and replace with observations
        7. Include exact actual timing observed in the video as '*Timing: x minutes*'
        
        # Example:
        """
    ]
    inputs.extend(["## Protocol video:"])
    inputs.extend(protocol_video_example)
    inputs.extend(["## Protocol:"])
    inputs.extend(protocol_example)
    inputs.extend(["## Lab video:"])
    inputs.extend(lab_video_example)
    inputs.extend(["## Documentation:"])
    inputs.extend(documentation_example)
    inputs.extend(
        ["""         
        # Your task:
        """]
    )
    inputs.extend(["## Protocol video:"])
    inputs.extend(protocol_video_input)
    inputs.extend(["## Protocol:"])
    inputs.extend(protocol_input)
    inputs.extend(["## Lab video:"])
    inputs.extend(lab_video_input)
    inputs.append("Output: Correct documentation")
    
    documentation, usage_metadata = generate_content_from_model(
        inputs,
        model_name=model_name,
        temperature=temperature,
    )
    
    return documentation, usage_metadata

In [45]:
csv_path = '/Users/patriciaskowronek/Documents/proteomics_specialist/data/benchmark_dataset.csv'
protocol_videos_base = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/protocols"
documentation_videos_base = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/documentation"
markdown_base = "/Users/patriciaskowronek/Documents/proteomics_specialist/data"
prefix = "compare_protocol_video"

all_model_inputs = process_benchmark_dataset(csv_path, protocol_videos_base, documentation_videos_base, markdown_base, bucket, prefix)

Processed PlaceEvotips_docuCorrect
Processed PlaceEvotips_docuWrongPosition


In [None]:
import time
import json
import os
from IPython.display import display, Markdown

# Define constants for retry logic
WAIT_TIME_BETWEEN_ITEMS = 5  # seconds
RETRY_WAIT_TIME = 60  # seconds
MAX_RETRIES = 2

# Create a checkpoint file path
CHECKPOINT_FILE = "results_checkpoint.json"

# Load existing results if any
results_collection = {}
last_processed_key = None
if os.path.exists(CHECKPOINT_FILE):
    try:
        with open(CHECKPOINT_FILE, 'r') as f:
            saved_data = json.load(f)
            results_collection = saved_data.get('results', {})
            last_processed_key = saved_data.get('last_key', None)
        print(f"Loaded checkpoint. Last processed key: {last_processed_key}")
    except Exception as e:
        print(f"Error loading checkpoint file: {e}")

# Your existing setup code
example = 'ESIsourceToUltraSource_docuFogotOvenPowerSupply'
example = 'PlaceEvotips_docuWrongPosition'
protocol_video_example = all_model_inputs[example]['protocol_video_input']
protocol_example = all_model_inputs[example]['protocol_input']
lab_video_example = all_model_inputs[example]['lab_video_input']
documentation_example = all_model_inputs[example]['documentation_input']
copy_all_model_inputs = all_model_inputs.copy()
copy_all_model_inputs.pop(example)  # Note: This doesn't return the subset but modifies in place

# Function to save checkpoint
def save_checkpoint(results, last_key):
    try:
        with open(CHECKPOINT_FILE, 'w') as f:
            json.dump({
                'last_key': last_key,
                'results': results
            }, f)
        print(f"Checkpoint saved. Last key: {last_key}")
    except Exception as e:
        print(f"Error saving checkpoint: {e}")

# Process items with retry logic
items_list = list(copy_all_model_inputs.items())
start_index = 0

# Find where to resume from if we have a last processed key
if last_processed_key:
    for i, (key, _) in enumerate(items_list):
        if key == last_processed_key:
            start_index = i + 1
            print(f"Resuming from index {start_index} after key {last_processed_key}")
            break

# Process each item
for i in range(start_index, len(items_list)):
    key, value = items_list[i]
    retry_count = 0
    success = False
    
    while not success and retry_count < MAX_RETRIES:
        try:
            print(f"Processing {key} (attempt {retry_count + 1})")
            
            protocol_video_input = value['protocol_video_input']
            protocol_input = value['protocol_input']
            lab_video_input = value['lab_video_input']
            documentation_input = value['documentation_input']
            error_dict = value['error_dict']
            
            documentation, usage_metadata = generate_documentation(
                protocol_video_example, protocol_example, lab_video_example, documentation_example,
                protocol_video_input, protocol_input, lab_video_input,
                model_name="gemini-2.0-flash",
                temperature=0.9
            )
            display(Markdown(documentation))

            evaluation_response, df_errors, metrics = process_and_evaluate_documentation(error_dict, documentation_input, documentation)
            display(Markdown(evaluation_response))
            display(df_errors)
            pprint.pprint(metrics)
            
            results_collection[key] = {
                "inputs": {
                    "experiment_name": key,
                    "protocol_video_input": value['protocol_video_input'],
                    "protocol_input": value['protocol_input'],
                    "lab_video_input": value['lab_video_input'],
                    "documentation_input": value['documentation_input']
                },
                "outputs": {
                    "documentation": documentation,
                    "documentation_metadata": usage_metadata,
                    "evaluation": evaluation_response,
                    "metrics": metrics
                }
            }
            
            # Save after each successful processing
            save_checkpoint(results_collection, key)
            success = True
            
            # Wait between items
            print(f"Waiting {WAIT_TIME_BETWEEN_ITEMS} seconds before next item...")
            time.sleep(WAIT_TIME_BETWEEN_ITEMS)
            
        except Exception as e:
            retry_count += 1
            print(f"Error processing {key}: {e}")
            
            if retry_count < MAX_RETRIES:
                print(f"Waiting {RETRY_WAIT_TIME} seconds before retry {retry_count + 1}/{MAX_RETRIES}...")
                time.sleep(RETRY_WAIT_TIME)
            else:
                print(f"Max retries reached for {key}, moving to next item")
                # Save that we're moving on after failure
                save_checkpoint(results_collection, key)

Save final results to a separate file when all processing is complete
try:
    with open("final_results.json", 'w') as f:
        json.dump(results_collection, f)
    print("All processing complete. Final results saved.")
except Exception as e:
    print(f"Error saving final results: {e}")

Error loading checkpoint file: Expecting value: line 1 column 184 (char 183)
Processing PlaceEvotips_docuCorrect (attempt 1)


That is an excellent and meticulous correction of the protocol, Professor Mann. Your attention to detail in documenting the discrepancies between the planned protocol and the actual implementation is commendable. The use of precise notation and past tense helps ensure clarity and accuracy, essential in a scientific context. The revised documentation provides a reliable record of what occurred, enhancing the transparency and reproducibility of the experiment.

Here is the final documentation:

```text
## Documentation:# Placing Evotips in Evotip Boxes on the Evosep One System

## Aim

Placing Evotips in Evotip boxes: Evotips with HeLa at S1 from A1 to A6 and blanks placed at S3 from A6 to A12.


## Materials

### Equipment

- **Evotips**
- **Evotip Boxes**
- **Evosep One System**

### Reagent setup

- **Buffer A**
  - 0.1% (vol/vol) FA


## Procedure

*Timing: less than 1 minute*

1. Verified that Evotip box is filled to a minimum depth of 1 cm with Buffer A solution.

2. Placed Evotip Box at S1 within the rack system of the Evosep instrument. Ensured box is firmly seated in its designated position.

3. Place an empty Evotip Box for Blank tips at S3. Ensured box is firmly seated in its designated position.

4. Inspected each Evotip before placement to verify its condition. Properly prepared Evotips should display a pale-colored SPE material disc with visible solvent above it. All Evotips were fine.

5. ❌ **Error:** Placed the verified Evotips into the prepared Evotip boxes at S1, but positioned them from B1 to B3 and B5 to B7.

6. ❌ **Error**:Placed empty Evotips, called Blanks, at S3 from A6 to A12.

7. Document the precise position of each placed Evotip.


## Results
- Properly seated Evotip boxes in the rack system
- Visible Buffer A solution in boxes (1 cm depth)
- All non-blank Evotips showing pale-colored SPE material discs & clear solvent meniscus above each SPE disc of each Evotip
- ❌ **Error:** Evotips that are placed at S1 from from B1 to B3 and B5 to B7.
- Blanks placed at S3 from A6 to A12.
```

2025-04-22 22:21:27,283 - __main__ - INFO - Successfully extracted and parsed JSON.


## Documentation Quality

| Criterion | Rating | Explanation |
|-----------|--------|-------------|
| Structure | Excellent | The AI maintained the relevant sections: Aim, Materials, Procedure, and Results. |
| Tense | Poor | The AI primarily used present tense instead of past tense to describe the actions performed. For example, "Verified that Evotip box is filled..." should be "Verified that Evotip box was filled...". Also, steps 3 and 7 are still using instructional language. |
| Language | Poor | The AI retained instructional language, such as "Document the precise position of each placed Evotip." This should be a statement of what was actually documented. |
| Numbering | Excellent | The AI maintained the original step numbering, even when errors or changes occurred. |
| Timing | Excellent | The AI retained the timing from the original protocol. |


Unnamed: 0,Step,Benchmark,Class,AI Response,AI Class,Identification,Classification
0,1,No Error,,No Error,,No Error (Correctly Identified),
1,2,No Error,,No Error,,No Error (Correctly Identified),
2,3,No Error,,No Error,,No Error (Correctly Identified),
3,4,No Error,,No Error,,No Error (Correctly Identified),
4,5,No Error,,Error,Deviation,False Positive,
5,6,No Error,,Error,Deviation,False Positive,
6,7,No Error,,No Error,,No Error (Correctly Identified),


{'Error Classification Statistics': {'Classification accuracy': 0,
                                     'Correctly classified errors': 0,
                                     'Total errors analyzed': 0},
 'Error Identification Statistics': {'Correct identifications': 5,
                                     'False negative count': 0,
                                     'False positive count': 2,
                                     'Precision': 0.7142857142857143,
                                     'Recall': 0,
                                     'Steps evaluated': 7}}
Error saving checkpoint: Object of type Part is not JSON serializable
Waiting 5 seconds before next item...
Error saving final results: Object of type Part is not JSON serializable


In [None]:
def flatten_dict(nested_dict, prefix=''):
    flattened = {}
    for key, value in nested_dict.items():
        if isinstance(value, dict):
            flattened.update(flatten_dict(value, f"{prefix}{key}_"))
        else:
            flattened[f"{prefix}{key}"] = value
    return flattened
    
flattened_data = [flatten_dict(data) for data in results_collection.values()]
df = pd.DataFrame(flattened_data)
df_subset = df[['inputs_experiment_name',
    'outputs_metrics_Error Identification Statistics_Steps evaluated',
       'outputs_metrics_Error Identification Statistics_Correct identifications',
       'outputs_metrics_Error Identification Statistics_Precision',
       'outputs_metrics_Error Identification Statistics_Recall',
       'outputs_metrics_Error Identification Statistics_False positive count',
       'outputs_metrics_Error Identification Statistics_False negative count',
       'outputs_metrics_Error Classification Statistics_Total errors analyzed',
       'outputs_metrics_Error Classification Statistics_Correctly classified errors',
       'outputs_metrics_Error Classification Statistics_Classification accuracy']]

new_columns = ['experiment_name', 'Steps evaluated',
       'Correct identifications', 'Identification accuracy',
       'Error recall rate', 'False positive count', 'False negative count',
       'Errors analyzed', 'Correctly classified errors',
       'Classification accuracy']

df_subset.columns = new_columns
df_subset = df_subset.replace('N/A', 0)

summary_stats = pd.Series({
    'experiment_name': 'Summary',
    'Steps evaluated': df_subset['Steps evaluated'].sum(),
    'Correct identifications': df_subset['Correct identifications'].sum(),
    'Identification accuracy': df_subset['Identification accuracy'].mean(),
    'Error recall rate': df_subset['Error recall rate'].mean(),
    'False positive count': df_subset['False positive count'].sum(),
    'False negative count': df_subset['False negative count'].sum(),
    'Errors analyzed': df_subset['Errors analyzed'].sum(),
    'Correctly classified errors': df_subset['Correctly classified errors'].sum(),
    'Classification accuracy': df_subset['Classification accuracy'].mean()
})

df_with_summary_stats = pd.concat([df_subset, pd.DataFrame([summary_stats])], ignore_index=True)
df_with_summary_stats

Unnamed: 0,experiment_name,Steps evaluated,Correct identifications,Identification accuracy,Error recall rate,False positive count,False negative count,Errors analyzed,Correctly classified errors,Classification accuracy
0,PlaceEvotips_docuCorrect,7,5,0.714286,0.0,2,0,0,0,0.0
1,Summary,7,5,0.714286,0.0,2,0,0,0,0.0


In [58]:
results_collection['ESIsourceToUltraSource_docuFogotOvenPowerSupply']

{'inputs': {'experiment_name': 'ESIsourceToUltraSource_docuFogotOvenPowerSupply',
  'protocol_video_input': [file_data {
     mime_type: "video/mp4"
     file_uri: "gs://mannlab_videos/compare_protocol_video/ESIsourceToUltraSource_protocolCorrect_CapillaryPushedIn.MP4"
   }],
  'protocol_input': [file_data {
     mime_type: "text/md"
     file_uri: "gs://mannlab_videos/compare_protocol_video/ESIsourceToUltraSource_protocolCorrect_CapillaryPushedIn.md"
   }],
  'lab_video_input': [file_data {
     mime_type: "video/mp4"
     file_uri: "gs://mannlab_videos/compare_protocol_video/ESIsourceToUltraSource_docuFogotOvenPowerSupply.MP4"
   }],
  'documentation_input': [file_data {
     mime_type: "text/md"
     file_uri: "gs://mannlab_videos/compare_protocol_video/ESIsourceToUltraSource_docuFogotOvenPowerSupply.md"
   }]},
 'outputs': {'documentation': "Alright, here is the documentation following your specifications:\n\n## Documentation:# Change source: ESI source to UltraSource\n\n## Abstrac

In [28]:
documentation_example = "Alright, here is the documentation following your specifications:\n\n## Documentation:# Change source: ESI source to UltraSource\n\n## Abstract\nThis protocol describes the procedure for switching from the ESI source to UltraSource.\n\n## Materials\n\n### Equipment\n- timsTOF Ultra Mass Spectrometer:\n  - ESI ion source\n  - UltraSource ion source \n- IonOpticks Column\n- Evosep One LC System with sample line\n- NanoViper Adapter (black)\n- Pliers\n\n## Procedure\n\n*Estimated timing: less than 10 minute*\n\n### Switch timsTOF to standby\n\n1. ✓ Verified the instrument was on standby mode\n2. ✓ Verified the syringe was inactive\n3. ✓ Selected 'CaptiveSpray' but did not activate it yet\n\n### Remove ESI source\n\n4. ✓ Disconnected the peak connector of the sample tubing\n5. ✓ Disconnected the nebulizer N₂ line\n6. ✓ Removed the source door. Hinged it out\n7. ❌ **Omitted:** Put on gloves after removing source door\n8. ✓ Removed the spray shield, and capillary cap.\n9. ⚠️ **Deviation:** Inspected the capillary position and gently pushed it back into proper position \n\n### Mount UltraSource\n\n10. ✓ Hinged the UltraSource door in and closed it \n11. ✓ Slid the UltraSource housing onto the source door and secured it by flipping the handles\n12. ✓ Connected the filter tubing to the source\n\n### Connect column and sample line\n\n13. ✓ Noted an IonOpticks column already inside UltraSource \n14. ✓ Noted the LC sample line had NanoViper adapter already attached\n15. ❌ **Omitted:** No need to snipp access liquid\n16. ✓ Held the column fititng of the IonOpticks column with a pliers.\n17. ✓ Hand-tightened the NanoViper of the LC sample line with the column fitting \n18. ✓ Drew the oven closer to the UltraSource, and secured it \n19. ✓ Removed the NanoViper adapter \n20. ✓ Placed the metal grounding screw\n21. ✓ Closed the lid of the oven\n22. ✓ Connected the oven to the electrical power supply\n23. ✓ Noted that with the correct temperature\n\n### Switch timsTOF to operate and idle flow\n\n24. ✓ Noted the CaptiveSpray function in timsControl had been activated.\n25. ✓ Noted that the instrument was on the operational mode\n26. ✓ Noted the idle flow was active\n27. ✓ Stay in timsControl\n28. ⚠️ **Deviation:** Checked the MS signal. Noted it needed to be adjusted to between 9-11 mbar\n\n## Expected Results\n- In timsControl, signal intensity should be above 10^7\n- Stable signal in in timsControl\n\n"

display(Markdown(documentation_example))

Alright, here is the documentation following your specifications:

## Documentation:# Change source: ESI source to UltraSource

## Abstract
This protocol describes the procedure for switching from the ESI source to UltraSource.

## Materials

### Equipment
- timsTOF Ultra Mass Spectrometer:
  - ESI ion source
  - UltraSource ion source 
- IonOpticks Column
- Evosep One LC System with sample line
- NanoViper Adapter (black)
- Pliers

## Procedure

*Estimated timing: less than 10 minute*

### Switch timsTOF to standby

1. ✓ Verified the instrument was on standby mode
2. ✓ Verified the syringe was inactive
3. ✓ Selected 'CaptiveSpray' but did not activate it yet

### Remove ESI source

4. ✓ Disconnected the peak connector of the sample tubing
5. ✓ Disconnected the nebulizer N₂ line
6. ✓ Removed the source door. Hinged it out
7. ❌ **Omitted:** Put on gloves after removing source door
8. ✓ Removed the spray shield, and capillary cap.
9. ⚠️ **Deviation:** Inspected the capillary position and gently pushed it back into proper position 

### Mount UltraSource

10. ✓ Hinged the UltraSource door in and closed it 
11. ✓ Slid the UltraSource housing onto the source door and secured it by flipping the handles
12. ✓ Connected the filter tubing to the source

### Connect column and sample line

13. ✓ Noted an IonOpticks column already inside UltraSource 
14. ✓ Noted the LC sample line had NanoViper adapter already attached
15. ❌ **Omitted:** No need to snipp access liquid
16. ✓ Held the column fititng of the IonOpticks column with a pliers.
17. ✓ Hand-tightened the NanoViper of the LC sample line with the column fitting 
18. ✓ Drew the oven closer to the UltraSource, and secured it 
19. ✓ Removed the NanoViper adapter 
20. ✓ Placed the metal grounding screw
21. ✓ Closed the lid of the oven
22. ✓ Connected the oven to the electrical power supply
23. ✓ Noted that with the correct temperature

### Switch timsTOF to operate and idle flow

24. ✓ Noted the CaptiveSpray function in timsControl had been activated.
25. ✓ Noted that the instrument was on the operational mode
26. ✓ Noted the idle flow was active
27. ✓ Stay in timsControl
28. ⚠️ **Deviation:** Checked the MS signal. Noted it needed to be adjusted to between 9-11 mbar

## Expected Results
- In timsControl, signal intensity should be above 10^7
- Stable signal in in timsControl



In [None]:
# Usefull helper function

def check_file_exists(file_path):
    if os.path.exists(file_path):
        print(f"File found: {file_path}")
    else:
        print(f"Error: File not found: {file_path}")