# Documentation assistant

This notebook demonstrates a documentation assistant that converts lab videos into documentation using Vertex AI.

Converting videos to documentation involves three steps:
1. Protocol finder: select the protocol that best captures the procedure being performed in the video
2. Video comparison against the ground-truth protocol → lab documentation + errors in the procedure
3. Analytics based on a benchmark dataset: automatic comparison of the errors found by the documentation assistant with the actual errors

In this notebook, I will focus on the first step - the protocol finder.

In [1]:
# %pip install google-cloud-storage
# %pip install --upgrade --user --quiet google-cloud-aiplatform

In [2]:
from __future__ import annotations

# %load_ext autoreload
%reload_ext autoreload
%autoreload 2

import configparser
import enum
import os
import sys
from pathlib import Path
from typing import Any

import pandas as pd
import vertexai
from google.cloud import storage
from IPython.display import Markdown
from vertexai.generative_models import GenerationConfig, GenerativeModel, Part

MODEL_ID = "gemini-2.5-pro-preview-03-25"

path_to_append = Path(Path.cwd()).parent / "proteomics_specialist"
sys.path.append(str(path_to_append))
import video_to_protocol

config = configparser.ConfigParser()
config.read("../secrets.ini")

['../secrets.ini']

In [3]:
config = configparser.ConfigParser()
config.read("../secrets.ini")

PROJECT_ID = config["DEFAULT"]["PROJECT_ID"]
vertexai.init(project=PROJECT_ID, location="us-central1")  # europe-west9 is Paris

In [None]:
os.environ["GOOGLE_CLOUD_PROJECT"] = config["DEFAULT"]["PROJECT_ID"]

storage_client = storage.Client()
bucket_name = "mannlab_videos"
bucket = storage_client.bucket(bucket_name)

In [8]:
def identify_protocol_from_video(
    model_input: list[Any],
    model_name: str = "gemini-2.5-pro-preview-03-25",
    temperature: float = 0.9,
) -> dict[str, Any]:
    """Identifies the protocol number from a lab video using Gemini model with customizable prompts.

    Parameters
    ----------
    model_input : list[Any]
        Input to be processed by the model
    model_name : str, optional
        Name of the Gemini model to use
    temperature : float, optional
        Temperature for model generation

    Returns
    -------
    dict
        Dictionary containing the model response and metadata including:
        - observation: The text response from the model
        - usage_metadata: Usage statistics from the model
        - video_path: Path of the analysed video (read from the notebook-level
          ``video_path`` variable, which must be set before calling)
        - markdown_observation: Markdown formatted observation

    """
    model = GenerativeModel(model_name)
    response = model.generate_content(
        model_input, generation_config={"temperature": temperature}
    )
    verbose_eval = response.text

    return {
        "observation": verbose_eval,
        "usage_metadata": response.usage_metadata,
        # NOTE: video_path is not a parameter; it is the global set in the calling loop.
        "video_path": video_path,
        "markdown_observation": Markdown(verbose_eval),
    }

In [9]:
def score_protocol_from_video(
    verbose_eval: str,
    protocol_enum: type[enum.Enum],
    model_name: str = "gemini-2.5-pro-preview-03-25",
    temperature: float = 0.9,
) -> dict[str, Any]:
    """Scores and classifies a protocol number from verbose evaluation using Gemini model.

    Parameters
    ----------
    verbose_eval : str
        The verbose evaluation text from a previous model run
    protocol_enum : type[enum.Enum]
        Enum class whose member values are the valid protocol numbers
    model_name : str, optional
        Name of the Gemini model to use
    temperature : float, optional
        Temperature for model generation

    Returns
    -------
    dict
        Dictionary containing the model response and metadata including:
        - score_str: The raw text response from the model
        - structured_eval: The protocol number as enum value
        - usage_metadata: Usage statistics from the model

    """
    model = GenerativeModel(model_name)

    score_prompt = f"{verbose_eval}\n\nWhich protocol number is this? Please respond with only one of the following: {[e.value for e in protocol_enum]}."

    response_score = model.generate_content(
        score_prompt,
        generation_config=GenerationConfig(
            temperature=temperature
        ),  # , max_output_tokens=2)
    )

    score_str = response_score.text.strip()
    structured_eval = protocol_enum(score_str)

    return {
        "score_str": score_str,
        "structured_eval": structured_eval,
        "usage_metadata": response_score.usage_metadata,
    }
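
The prompt used later in this notebook asks the model to end its answer with "Protocol number: X". Under that assumption, the number could also be read directly from the verbose text, for example as a cross-check on the second model call. The sketch below is not part of the original workflow; `extract_protocol_number` and `PROTOCOL_LINE` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

import re

# Assumption: the verbose answer contains a line such as "Protocol number: 7",
# as requested by the notebook's output-format prompt.
PROTOCOL_LINE = re.compile(r"Protocol number:\s*(\d+)", re.IGNORECASE)


def extract_protocol_number(verbose_eval: str, valid_numbers: set[str]) -> str | None:
    """Return the last stated protocol number if it is valid, else None."""
    matches = PROTOCOL_LINE.findall(verbose_eval)
    if not matches:
        return None
    candidate = matches[-1]  # the final statement is taken as the decision
    return candidate if candidate in valid_numbers else None


example = "The steps match the source change.\n\nProtocol number: 7"
print(extract_protocol_number(example, {str(i) for i in range(1, 11)}))
```

A `None` result signals that the text did not follow the requested format, in which case falling back to the model-based scoring above remains an option.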

In [None]:
def validate_protocol_number(
    video_path: str | Path,
    csv_path: str | Path,
    verbose_eval: str,
    protocol_num: type[enum.Enum] | None = None,
) -> dict[str, bool | str | list[str]]:
    """Validates if the protocol number in the observation matches the expected protocol number from the benchmark dataset (csv).

    Parameters
    ----------
    video_path : str or Path
        Path to the video file
    csv_path : str or Path
        Path to the benchmark dataset as csv file containing protocol information
    verbose_eval : str
        Text containing protocol number to be extracted
    protocol_num : type[enum.Enum], optional
        Enum class whose member values are the valid protocol numbers, default is None

    Returns
    -------
    dict
        Result of the validation containing match status and relevant details including:
        - matches: Boolean indicating if protocol numbers match
        - video: Basename of the video file
        - expected_protocol: Protocol number from CSV
        - found_protocol: Protocol number extracted from observation
        - message: Optional explanation message (only present on failure)

    """
    video_basename = Path(video_path).name

    df_link = pd.read_csv(csv_path, delimiter=";")
    df_link = df_link.dropna()

    if video_basename not in list(df_link["documentation video"]):
        return {
            "matches": False,
            "video": video_basename,
            "expected_protocol": [],
            "found_protocol": [],
            "message": f"Video not part of benchmark dataset: {video_basename}",
        }

    expected_protocol_number = str(
        df_link["Number"][df_link["documentation video"] == video_basename].item()
    )

    try:
        result_score = score_protocol_from_video(
            verbose_eval, protocol_enum=protocol_num
        )
        found_protocol_number = result_score["score_str"]

    except Exception as e:  # broad on purpose: covers API errors and invalid enum values
        return {
            "matches": False,
            "video": video_basename,
            "expected_protocol": expected_protocol_number,
            "found_protocol": f"Error extracting protocol number: {e!s}",
            "verbose_eval": verbose_eval,
        }
    else:
        return {
            "matches": expected_protocol_number == found_protocol_number,
            "video": video_basename,
            "expected_protocol": expected_protocol_number,
            "found_protocol": found_protocol_number,
            "verbose_eval": verbose_eval,
        }
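
The lookup step of the validation can be illustrated in isolation. The sketch below assumes, as the function above does, a semicolon-delimited file with the columns `documentation video` and `Number`. It uses the standard-library `csv` module instead of pandas, the sample rows are made up, and `expected_protocol` is a hypothetical helper name.

```python
from __future__ import annotations

import csv
import io

# Illustrative stand-in for the benchmark CSV; the rows are made up.
SAMPLE_CSV = """documentation video;Number
TimsCalibration_docuCorrect_camera.MP4;5
example_video.mp4;1
"""


def expected_protocol(csv_text: str, video_basename: str) -> str | None:
    """Look up the expected protocol number for a video, or None if absent."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    for row in reader:
        if row["documentation video"] == video_basename:
            return row["Number"].strip()
    return None


print(expected_protocol(SAMPLE_CSV, "TimsCalibration_docuCorrect_camera.MP4"))
print(expected_protocol(SAMPLE_CSV, "unknown.mp4"))
```

Reading the values as strings sidesteps one pitfall of the pandas route: if the `Number` column contains missing values, pandas parses it as float, and `str()` then yields values such as `"5.0"` that never equal `"5"`.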

In [11]:
# First upload videos to Google Cloud Storage and convert to model input (reusable info, time consuming step)

video_directory = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/documentation"

video_parts = {}
all_files = os.listdir(video_directory)

for video in all_files[:3]:
    print(video)
    video_path = Path(video_directory) / video

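    if video.lower().endswith((".mp4", ".mov")):
        # MIME subtype: ".mov" files are "video/quicktime", not "video/mov"
        suffix = Path(video_path).suffix.lower()
        file_extension = "quicktime" if suffix == ".mov" else suffix[1:]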

        video_uri = video_to_protocol.upload_video_to_gcs(
            video_path, bucket, "protocol_finder_2"
        )

        video_parts[video] = {
            "path": str(video_path),
            "gcs_uri": video_uri,
            "part": [
                "Lab video:",
                Part.from_uri(video_uri, mime_type=f"video/{file_extension}"),
            ],
        }

UltraSourceToESIsource_docuForgotN2Line.MP4
.DS_Store
TimsCalibration_docuCorrect_camera.MP4
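
The listing above includes `.DS_Store`, which the suffix check in the loop skips. A small helper can combine the filtering and the MIME type selection in one place. This is a sketch under the assumption that only MP4 and QuickTime files occur; `VIDEO_MIME_TYPES` and `video_mime_type` are hypothetical names, and `clip.mov` is a made-up file name.

```python
from __future__ import annotations

from pathlib import Path

# Assumed mapping; extend it if the benchmark contains other containers.
VIDEO_MIME_TYPES = {".mp4": "video/mp4", ".mov": "video/quicktime"}


def video_mime_type(filename: str) -> str | None:
    """Return the MIME type for a supported video file, else None."""
    return VIDEO_MIME_TYPES.get(Path(filename).suffix.lower())


files = ["UltraSourceToESIsource_docuForgotN2Line.MP4", ".DS_Store", "clip.mov"]
# Keep only entries with a known video MIME type.
videos = {name: video_mime_type(name) for name in files if video_mime_type(name)}
print(videos)
```

Hidden files such as `.DS_Store` have an empty suffix in `pathlib`, so they drop out without a special case.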


In [12]:
# Upload protocols to GCS

directory = "/Users/patriciaskowronek/Documents/proteomics_specialist/data/"

list_protocol_gcs = [
    video_to_protocol.upload_video_to_gcs(
        Path(directory) / file, bucket, "protocol_finder"
    )
    for file in os.listdir(directory)
    if "protocol" in file.lower()
]

print(list_protocol_gcs)

protocol_members = {}

for index, file_path in enumerate(list_protocol_gcs):
    member_name = Path(file_path).name
    protocol_members[member_name] = str(index + 1)

protocol_num = enum.Enum("ProtocolNum", protocol_members)

# Prepare protocols as model input

protocol_input = []
for count, protocol in enumerate(list_protocol_gcs, start=1):
    protocol_input.append("number: " + str(count))
    protocol_input.append("file name: " + Path(protocol).name)
    protocol_input.append(Part.from_uri(protocol, mime_type="text/md"))

['gs://mannlab_videos/protocol_finder/QueueSamples_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/PlaceEvotips_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/DisconnectColumn_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/Diluting_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/TimsCalibration_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/ConnectingColumnSampleLine_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/UltraSourceToESIsource_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/Pipette_protocolCorrect.md', 'gs://mannlab_videos/protocol_finder/ESIsourceToUltraSource_protocolCorrect_CapillaryPushedIn.md', 'gs://mannlab_videos/protocol_finder/Evotip_protocolCorrect.md']
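
The enum built in this cell is what later turns the model's short answer into a validated value. The following sketch reproduces the construction with the first five file names from the output above and shows how a lookup by value behaves. It uses only the standard `enum` functional API; `protocol_files` is a name introduced for this example.

```python
import enum

# File names copied from the upload output above; numbering follows list order.
protocol_files = [
    "QueueSamples_protocolCorrect.md",
    "PlaceEvotips_protocolCorrect.md",
    "DisconnectColumn_protocolCorrect.md",
    "Diluting_protocolCorrect.md",
    "TimsCalibration_protocolCorrect.md",
]
ProtocolNum = enum.Enum(
    "ProtocolNum", {name: str(i) for i, name in enumerate(protocol_files, start=1)}
)

# Lookup by value returns the member; its name is the protocol file.
print(ProtocolNum("5").name)

# An answer outside the enum raises ValueError, which callers can catch.
try:
    ProtocolNum("Protocol 5")
except ValueError as err:
    print(f"rejected: {err}")
```

Because the numbers are assigned by position, they depend on the order in which `os.listdir` returns the files, and that order is not guaranteed. Sorting the file list before numbering would likely make the mapping to the benchmark CSV more reproducible across machines.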


In [13]:
# indices_to_select = [0, 4, 7, 26]

# # Get keys at specific positions
# all_keys = list(video_parts.keys())
# selected_keys = [all_keys[i] for i in indices_to_select if i < len(all_keys)]

# # Create dictionary with selected entries
# selected_videos = {k: video_parts[k] for k in selected_keys}

system_prompt = [
    """
    You are Professor Matthias Mann, a pioneering scientist in proteomics and mass spectrometry with extensive laboratory experience.
    ## Your Background Knowledge:
    [These documents are for building your proteomics background knowledge and are not part of today's task.]
    """
]

system_prompt_1 = [
    "## Today's Task:",
    "You need to analyze a laboratory video and identify which of the numbered protocols best matches the procedure being performed in the video.",
    """Your analysis must include these verification steps:
    1. Identify the starting state (describe visible features)
    2. List the specific actions taken in sequence while naming the involved equipment
    3. Identify the ending state (describe visible features)
    4. ONLY THEN match to a protocol number
    Your scientific reputation was built on exactitude - you cannot help but insist on proper technical terminology and chronological precision in all laboratory documentation.""",
]

protocol_intro = ["## Available Protocols (You must select ONE):"]
output_format = [
    """## Your response should be: Your explanation followed by "Protocol number: X"
""",
    f"Important: Choose exactly one protocol number. It is very important to you to focus only on matching the video to the protocols provided: {[e.value for e in protocol_num]}.",
]

# Upload knowledge files to Google Cloud Storage
folder_path = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/knowledge_base_selected"
subfolder_in_bucket = "knowledge"

knowledge_uris = []
for filename in os.listdir(folder_path):
    if filename.lower().endswith(
        (".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".tif", ".pdf")
    ):
        path = Path(folder_path) / filename
        try:
            file_uri = video_to_protocol.upload_video_to_gcs(
                path, bucket, subfolder_in_bucket
            )
            knowledge_uris.append(file_uri)
        except OSError as e:
            print(f"Error processing {filename}: {e}")

# Map file suffixes to MIME types; uploaded files with other suffixes are skipped below
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}
contents = []
for file_path in knowledge_uris:
    path_obj = Path(file_path)
    file_ext = path_obj.suffix.lower()

    if file_ext in MIME_TYPES:
        mime_type = MIME_TYPES[file_ext]
        contents.append(Part.from_uri(file_path, mime_type=mime_type))

csv_path = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/Link_protocol_lab_video.csv"

validation_list = []
for video in video_parts:
    print(f"Processing video: {video}")
    video_path = video_parts[video]["path"]
    order = [
        system_prompt,
        contents,
        system_prompt_1,
        video_parts[video]["part"],
        protocol_intro,
        protocol_input,
        output_format,
    ]
    model_input = []
    for prompt_input in order:
        model_input.extend(prompt_input)

    result = identify_protocol_from_video(
        model_input, model_name="gemini-2.5-pro-preview-03-25", temperature=0.9
    )

    # print(result["usage_metadata"])
    print(result["video_path"])
    display(result["markdown_observation"])

    result_validation = validate_protocol_number(
        video_path, csv_path, result["observation"], protocol_num
    )
    print(result_validation)
    validation_list.append(result_validation["matches"])

success_ratio = (
    sum(validation_list) / len(validation_list) if len(validation_list) > 0 else 0
)
print(f"Success rate: {success_ratio:.2%}")

Processing video: UltraSourceToESIsource_docuForgotN2Line.MP4
/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/documentation/UltraSourceToESIsource_docuForgotN2Line.MP4


Okay, let's analyze this procedure with the precision it deserves.

**1. Identify the starting state:**
The video commences with a Bruker timsTOF mass spectrometer equipped with an **UltraSource** ion source. An IonOpticks chromatography column is installed within the UltraSource's column oven and is visibly connected to a sample line, presumably originating from an adjacent Evosep One liquid chromatography system. The UltraSource also has its white corrugated air filter tube and the column oven power supply connected. The **TimsControl** software, which governs the mass spectrometer, is initially in "Operate" mode, indicated by a green status panel.

**2. List the specific actions taken in sequence while naming the involved equipment:**
*   The operator navigates to the **TimsControl** software. The instrument status is "Operate" (green). The operator clicks the power icon, causing a "Change Source" dialog to appear. This dialog is initially cancelled. The instrument status then transitions to "Standby" (red/orange status panel).
*   At the **UltraSource**, the operator disconnects the electrical power supply cable for the column oven.
*   The white corrugated air filter tube is detached from the **UltraSource**.
*   Two retaining handles on the **UltraSource housing** (characterized by its glossy black, oblate spheroid shape) are rotated. The housing is then slid off the timsTOF instrument, exposing the internal source door and glass capillary.
*   The detached **UltraSource housing** is placed on a laboratory bench.
*   The internal **source door** (previously part of the UltraSource assembly) is opened, unhinged from the timsTOF, and also placed on the bench.
*   The operator dons black laboratory gloves.
*   A metal **capillary cap** is retrieved and carefully placed onto the exposed glass capillary of the timsTOF.
*   A metal **spray shield** is attached to the desolvation stage housing of the timsTOF.
*   An **ESI source housing** (characterized by its half-sphere shape and white warning triangles) is taken. It is hinged onto the timsTOF mass spectrometer and secured using its lever mechanism.
*   A red PEEK (polyetheretherketone) sample inlet tubing is connected to the newly installed **ESI source** by screwing in a fitting.
*   A clear nebulizer gas inlet line is connected to the **ESI source**.
*   The operator takes a **syringe**, proceeds to a solvent cabinet, opens a bottle (presumably containing Tuning Mix), expels any residual liquid from the syringe, and then carefully withdraws new Tuning Mix solution, ensuring no air bubbles are entrained.
*   Returning to the timsTOF, the operator connects the Tuning Mix-filled **syringe** to the red PEEK sample inlet tubing already attached to the ESI source.
*   The **syringe** is then mounted into an external syringe pump assembly situated on top of the timsTOF.
*   The operator returns to the **TimsControl** software. In the "Change Source" dialog (or a newly opened one), "ESI" is selected as the source type, and the "Activate Source" button is clicked.
*   The **TimsControl** software status changes from "Standby" back to "Operating" (green status panel).
*   Within the "Source" tab of **TimsControl**, under the "Syringe Pump" section, the operator confirms syringe parameters (e.g., Hamilton 500 µL) and sets a flow rate (3 µL/min). The "Start" button is pressed to initiate infusion of the Tuning Mix.
*   The operator observes the signal trace in **TimsControl** and audibly notes an anomaly ("Hmm, something is strange").

**3. Identify the ending state:**
The timsTOF mass spectrometer is now configured with the **ESI source**. The previously installed **UltraSource housing** and its associated source door have been removed and are resting on the lab bench. A **syringe** containing Tuning Mix is connected to the ESI source's sample inlet and is actively infusing solution via the external syringe pump. The nebulizer gas line is also connected to the ESI source. The **TimsControl** software is in "Operating" mode, and the system is actively acquiring data, though an unexpected signal characteristic has been observed by the operator.

**4. Match to a protocol number:**
The observed sequence of disassembling the UltraSource and installing the ESI source, including the software steps for changing the source type and initiating infusion with a syringe pump, aligns precisely with the procedure for changing from an UltraSource to an ESI source.

Protocol number: 7

Processing video: TimsCalibration_docuCorrect_camera.MP4
/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/documentation/TimsCalibration_docuCorrect_camera.MP4


Ah, excellent. Let us meticulously examine this video recording of a procedure performed on the TimsControl software. Precision is paramount in our field.

**1. Identify the starting state (describe visible features)**
The TimsControl software interface for a Bruker timsTOF mass spectrometer is displayed.
*   In the top-left "Instrument" status panel:
    *   "HyStar" and "System" show "Operating" status (green indicators).
    *   "Calibration" and "Mobility" show "n/a" status (yellow indicators).
    *   "Vacuum" shows "OK" status (green indicator).
*   A dia-PASEF method is active, as indicated by "Scan Mode: dia-PASEF" in the "MS Settings" panel.
*   The main display area is focused on the "Calibration" -> "Mobility" section.
*   A "Reference List" titled "ESI, Tuning Mix ES-TOF CCS compendium (ESI)" is visible, with several calibrant m/z values listed.
*   The "Calibration Mode" is set to "Linear", "Detection Range" is "5.00%", and "Width" is "+/- 0.1 Da".
*   A "Score" of "100.00%" is displayed, and a message "Mobility calibration is valid." is shown.
*   In the "TIMS Settings" panel, "MS Averaging" is set to "1".
*   In the "MS Settings" panel, "Scan Begin" is 100 m/z, "Scan End" is 1700 m/z, and "Ion Polarity" is "Positive".

**2. List the specific actions taken in sequence while naming the involved equipment**
All actions are performed within the TimsControl software interface:
*   0:08: The "Scan Mode" in the "MS Settings" panel is changed from "dia-PASEF" to "MS". The "dia-PASEF Windows" display disappears.
*   0:16: The "MS Averaging" value in the "TIMS Settings" panel is changed from "1" to "30".
*   0:22 - 0:30: The system appears to be in the calibration verification stage; the "Calibrate" button is implicitly active or has been previously pressed, as the software is displaying calibrant information.
*   0:31: A specific calibrant (likely C60H99N3O6P3 at m/z 1221.9987, based on subsequent highlighting) is selected from the "Reference List".
*   0:32 - 0:38: The software switches to the "TIMS View" (mobilogram) for the selected calibrant, allowing visual verification of the peak picking.
*   0:39: The "Accept" button in the "Calibration" -> "Mobility" section is clicked, confirming the displayed calibration.
*   0:40: The "Method" option is selected from the top menu bar.
*   0:41: "Load Recent" is selected from the "Method" dropdown menu.
*   0:42: The previously active method (named, for example, "20240703_DIA_maintenance_10min_TIMS_100_1700_MS_1600V_timsControl.m") is selected from the list of recent methods.
*   0:43: A "Load method" pop-up dialog appears, asking, "The current method has been modified. Do you want to save the method before loading another method?".
*   0:45: The "Discard changes" button is clicked in the pop-up dialog.
*   0:48: The "MS Averaging" value in the "TIMS Settings" panel is changed back from "30" to "1". The "Scan Mode" in "MS Settings" reverts to "dia-PASEF" as part of the method reload.

**3. Identify the ending state (describe visible features)**
*   The TimsControl software interface is displayed.
*   The instrument status indicators in the top-left panel remain largely the same ("HyStar: Operating", "System: Operating", "Calibration: n/a", "Mobility: n/a", "Vacuum: OK").
*   The dia-PASEF method that was selected from "Load Recent" is now active.
*   "Scan Mode" in "MS Settings" is "dia-PASEF".
*   "MS Averaging" in "TIMS Settings" is "1".
*   The "Calibration" -> "Mobility" section is still displayed, with the ESI tuning mix reference list and calibration parameters visible. The score likely remains 100%.

**4. ONLY THEN match to a protocol number**
The observed sequence of operations—modifying scan mode and averaging for calibration, verifying calibrant peaks, accepting the calibration, reloading the original analytical method, and resetting averaging—aligns precisely with the procedure for calibrating the TIMS device.

Protocol number: 5

{'matches': True, 'video': 'TimsCalibration_docuCorrect_camera.MP4', 'expected_protocol': '5', 'found_protocol': '5', 'verbos_eval': 'Ah, excellent. Let us meticulously examine this video recording of a procedure performed on the TimsControl software. Precision is paramount in our field.\n\n**1. Identify the starting state (describe visible features)**\nThe TimsControl software interface for a Bruker timsTOF mass spectrometer is displayed.\n*   In the top-left "Instrument" status panel:\n    *   "HyStar" and "System" show "Operating" status (green indicators).\n    *   "Calibration" and "Mobility" show "n/a" status (yellow indicators).\n    *   "Vacuum" shows "OK" status (green indicator).\n*   A dia-PASEF method is active, as indicated by "Scan Mode: dia-PASEF" in the "MS Settings" panel.\n*   The main display area is focused on the "Calibration" -> "Mobility" section.\n*   A "Reference List" titled "ESI, Tuning Mix ES-TOF CCS compendium (ESI)" is visible, with several calibrant m/z v
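
Beyond the overall success rate, it can help to see which videos were misclassified. The sketch below summarises a list of validation dictionaries with the keys returned by `validate_protocol_number`. The `summarize` helper and the demo records are hypothetical and not taken from the benchmark.

```python
from __future__ import annotations

from typing import Any


def summarize(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Compute the success rate and collect the mismatched videos."""
    total = len(results)
    hits = sum(1 for r in results if r["matches"])
    misses = [
        (r["video"], r["expected_protocol"], r["found_protocol"])
        for r in results
        if not r["matches"]
    ]
    return {
        "success_rate": hits / total if total else 0.0,
        "mismatches": misses,
    }


# Illustrative records; only the keys mirror the validation dictionaries.
demo = [
    {"matches": True, "video": "a.mp4", "expected_protocol": "5", "found_protocol": "5"},
    {"matches": False, "video": "b.mp4", "expected_protocol": "7", "found_protocol": "9"},
]
summary = summarize(demo)
print(f"Success rate: {summary['success_rate']:.2%}")
print(summary["mismatches"])
```

Using this in the loop above would mean appending the full `result_validation` dictionary to the list instead of only its `matches` flag.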

In [14]:
indices_to_select = [0, 4, 7, 8, 18, 26]

# Get keys at specific positions
all_keys = list(video_parts.keys())
selected_keys = [all_keys[i] for i in indices_to_select if i < len(all_keys)]
selected_keys

['UltraSourceToESIsource_docuForgotN2Line.MP4']