# Documentation Agent

This notebook demonstrates a documentation agent that:
1. Converts videos into protocols using Vertex AI
2. Draws on knowledge from documents and pictures that are loaded into a context cache

In [16]:
# %pip install google-cloud-storage
# %pip install --upgrade --user --quiet google-cloud-aiplatform

In [17]:
# %load_ext autoreload
%reload_ext autoreload
%autoreload 2

import os
import sys
from pathlib import Path

import configparser
from IPython.display import Markdown

path_to_append = Path(Path.cwd()).parent / "proteomics_specialist"
sys.path.append(str(path_to_append))
import videoToProtocol as video_to_protocol

config = configparser.ConfigParser()
config.read("../secrets.ini")

['../secrets.ini']

In [18]:
import configparser
import vertexai

config = configparser.ConfigParser()
config.read("../secrets.ini")

PROJECT_ID = config["DEFAULT"]["PROJECT_ID"]
vertexai.init(project=PROJECT_ID, location="europe-west9") # europe-west9 is Paris
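
The cells above read `PROJECT_ID` from the `[DEFAULT]` section of `../secrets.ini`. As a minimal sketch of that lookup, assuming the file contains nothing more than this one key (the project value below is a placeholder and `load_project_id` is a hypothetical helper, not part of the notebook), the parsing can be tried without touching the real file:

```python
import configparser

# Hypothetical contents of ../secrets.ini. Only the PROJECT_ID key in the
# DEFAULT section is confirmed by the notebook; the value is a placeholder.
EXAMPLE_SECRETS = """
[DEFAULT]
PROJECT_ID = my-gcp-project
"""


def load_project_id(ini_text: str) -> str:
    """Return PROJECT_ID from the DEFAULT section, failing loudly if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    project_id = parser["DEFAULT"].get("PROJECT_ID", "").strip()
    if not project_id:
        raise KeyError("PROJECT_ID is missing from the [DEFAULT] section")
    return project_id


print(load_project_id(EXAMPLE_SECRETS))
```

Failing early on a missing key gives a clearer error than letting `vertexai.init` receive an empty project.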

In [19]:
import os
from google.cloud import storage

os.environ["GOOGLE_CLOUD_PROJECT"] = config["DEFAULT"]["PROJECT_ID"]

# Initialize Cloud Storage client
storage_client = storage.Client()
bucket_name = "mannlab_videos"
bucket = storage_client.bucket(bucket_name)

In [20]:
import datetime

from vertexai.generative_models import Part
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel

MODEL_ID = "gemini-1.5-pro-001" 

# Following: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/context-caching/intro_context_caching_vertex_ai_sdk.ipynb

In [21]:
# Upload knowledge files to Google Cloud Storage
folder_path = '/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/knowledge_base'
subfolder_in_bucket = "knowledge"

knowledge_uris=[]
for filename in os.listdir(folder_path):
    if filename.lower().endswith(('.jpg', '.jpeg', '.gif', '.bmp', '.tiff', '.tif', '.pdf')):
        path = os.path.join(folder_path, filename)
        try:
            file_uri = video_to_protocol.upload_video_to_gcs(path, bucket, subfolder_in_bucket)
            knowledge_uris.append(file_uri) 
        except Exception as e:
            print(f"Error processing {filename}: {e}")

Uploaded to: gs://mannlab_videos/knowledge/07a_Getting Started with NanoElute_Revision_F.pdf
Uploaded to: gs://mannlab_videos/knowledge/Evosep-One-User-Guide-v18.pdf
Uploaded to: gs://mannlab_videos/knowledge/01b(i)_Intro Trapped Ion Mobility_Pro2, fleX, HT_Revision_G.pdf
Uploaded to: gs://mannlab_videos/knowledge/02a_Introduction to Electrospray Ionization_Revision_F.pdf
Uploaded to: gs://mannlab_videos/knowledge/Evosep_rack_box_layout.pdf
Uploaded to: gs://mannlab_videos/knowledge/03a_TIMS-TOF Calibration with ESI Sources_Revision_F.pdf
Uploaded to: gs://mannlab_videos/knowledge/07h_diagonal-PASEF_User_Guide_timsControl_Revision_A.pdf
Uploaded to: gs://mannlab_videos/knowledge/04a_DataAnalysis Getting Started_Revision_F.pdf
Uploaded to: gs://mannlab_videos/knowledge/07b_Shotgun Proteomics using timsTOF Pro and nanoElute_Revision_F.pdf
Uploaded to: gs://mannlab_videos/knowledge/01b(iii)_Intro to Trapped Ion Mobility_timsTOF_Revision_B.pdf
Uploaded to: gs://mannlab_videos/knowledge/05b

In [22]:
# Create cache with Vertex AI
import os
from collections import defaultdict

# Define supported file types with corresponding MIME types
MIME_TYPES = {
    '.pdf': 'application/pdf',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png' 
}

def create_cached_content(knowledge_uris, bucket_name, subfolder_in_bucket, model_id):
    contents = []
    file_counts = defaultdict(int)
    
    for file_path in knowledge_uris:
        filename = os.path.basename(file_path)
        file_ext = os.path.splitext(filename)[1].lower()
        
        if file_ext in MIME_TYPES:
            mime_type = MIME_TYPES[file_ext]
            
            try:
                contents.append(Part.from_uri(file_path, mime_type=mime_type))
                file_counts[file_ext] += 1
            except Exception as e:
                print(f"Error creating Part from {file_path}: {e}")
        else:
            print(f"Skipping unsupported file: {filename}")
    
    print(f"Total files processed: {len(contents)}")
    for ext, count in file_counts.items():
        print(f"  {ext[1:].upper()}: {count}")
    
    if contents:
        cached_content = caching.CachedContent.create(
            model_name=model_id,
            contents=contents,
            ttl=datetime.timedelta(minutes=60),
        )
        print("Cached content created successfully!")
        return cached_content
    else:
        print("No matching files found. Cached content not created.")
        return None

cached_content = create_cached_content(
    knowledge_uris,
    bucket_name,
    subfolder_in_bucket,
    model_id=MODEL_ID
)

Total files processed: 40
  PDF: 40
Cached content created successfully!
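
The upload cell and the caching cell filter on different extension lists: the upload filter accepts `.gif`, `.bmp`, `.tiff` and `.tif` but not `.png`, while `MIME_TYPES` covers `.png` but none of those four. The sketch below, using hypothetical file names and a hypothetical `classify` helper, shows how a file would fare under both filters. It is an illustration of the two lists as written, not a change to the notebook's behavior.

```python
import os

# Extension filter copied from the upload cell.
UPLOAD_EXTENSIONS = ('.jpg', '.jpeg', '.gif', '.bmp', '.tiff', '.tif', '.pdf')

# MIME lookup copied from the caching cell.
MIME_TYPES = {
    '.pdf': 'application/pdf',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
}


def classify(filenames):
    """Sort file names by how the two filters would treat them."""
    result = {"uploaded_and_cached": [], "uploaded_but_skipped": [], "not_uploaded": []}
    for name in filenames:
        ext = os.path.splitext(name)[1].lower()
        if not name.lower().endswith(UPLOAD_EXTENSIONS):
            result["not_uploaded"].append(name)
        elif ext in MIME_TYPES:
            result["uploaded_and_cached"].append(name)
        else:
            result["uploaded_but_skipped"].append(name)
    return result


# Hypothetical file names, for illustration only.
example = classify(["guide.pdf", "layout.PNG", "scan.tiff", "notes.txt"])
for outcome, names in example.items():
    print(f"{outcome}: {names}")
```

In the run above only PDFs were present, so the mismatch had no effect; it would matter once images are added to the knowledge folder.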


In [23]:
# cached_content.delete()

In [24]:
print(cached_content.name)
# print(cached_content.resource_name)
# print(cached_content.model_name)
print(cached_content.create_time)
print(cached_content.expire_time)

5299927520857030656
2025-03-22 09:53:36.470278+00:00
2025-03-22 10:53:36.454694+00:00
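
Because the cache is created with a 60-minute TTL, it can be useful to know how much lifetime is left before sending a long request. A minimal sketch using only the standard library and the two timestamps printed above (`remaining_ttl` is a hypothetical helper, not part of the Vertex AI SDK):

```python
import datetime


def remaining_ttl(expire_time, now=None):
    """Return the time left until expire_time, clamped at zero."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return max(expire_time - now, datetime.timedelta(0))


# Timestamps copied from the printed cache metadata.
create_time = datetime.datetime.fromisoformat("2025-03-22 09:53:36.470278+00:00")
expire_time = datetime.datetime.fromisoformat("2025-03-22 10:53:36.454694+00:00")

# Lifetime as seen at creation time, and the (expired) lifetime as of now.
print(remaining_ttl(expire_time, now=create_time))
print(remaining_ttl(expire_time))
```

With a live cache object, `cached_content.expire_time` could be passed in directly, assuming it is a timezone-aware `datetime` as the printed value suggests.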


In [25]:
from vertexai.preview.generative_models import GenerativeModel
model = GenerativeModel.from_cached_content(cached_content=cached_content)

In [26]:
system_prompt_1 = """
You are Professor Matthias Mann, a pioneering scientist in proteomics and mass spectrometry. 
Your task is to analyze the provided video and to convert it into a Nature-style protocol. The goal is a clear, concise, unambiguous protocol reproducible by someone with no prior knowledge.
Take a deep breath and think step by step with me. Answer directly.
"""
example_documentation_1 = """
1. Describe what you can hear with timestamps:
    - 0:00 - 0:08 "Okay, I want to disconnect um the ion optics column which is here" 
    - 0:09 - 0:17 "from the sample line of the Evosep. So for this we first have to go to the tims control software and check if it is" 
    - 0:18 - 0:24 "in standby mode. At the moment you can see here it's in operating mode so we just click on the on-off signal"
    - 0:25 - 0:29 "and it's here in standby. Good." 
    - 0:30 - 0:39 "Let's move back to the instrument. As a next step we have to open or I open the lid of the column oven which is also called"
    - 0:40 - 0:48 "column toaster. Then there is this screw which is um to that's the securing screw which does" 
    - 0:49 - 0:59 "ground the ion optics column so that we have a um stable spray so I just lift uh lifted it up and" 
    - 1:00 - 1:11 "yeah loosen it so that is not anymore above the connection. Then I attach um a nano viper adapter here to the sample line" 
    - 1:12 - 1:19 "So it's just like pushing it on top of it. I find it easier to hold like the column fitting with pillars" 
    - 1:20 - 1:31 "and then to open the column counterclockwise. So I find this a little bit easier for handling. I unscrew here the nano connector" 
    - 1:32 - 1:43 "from the column fitting um and then so that the sample line at the Evosep and the ion optics column are detached. And now I put position here the sample line" 
    - 1:44 - 1:53 "on top of the bumper of the Evosep just to have it out of the way and very can um yeah where it can" 
    - 1:54 - 2:05 "then uh the idle flow can flow. So now ion optics column and sample line are disconnected."

2. Describe what you can see with timestamps:
    - 0:00 - 0:07 The video starts with a view of the UltraSource of a timsTOF Ultra instrument. The camera pans down as the researcher shows a box containing an IonOpticks column.
    - 0:08 - 0:12 The researcher points to the sample line connected to the Evosep One LC system.
    - 0:13 - 0:21 The researcher walks to the computer monitor displaying the timsControl software.
    - 0:22 - 0:22 The researcher points to the "operating" software status.
    - 0:23 - 0:27 The researcher clicks on the "on/off Instrument" button, which changes the software status to "Standing by".
    - 0:28 - 0:32 The researcher walks back to the timsTOF Ultra.
    - 0:33 - 0:40 The researcher opens the lid of the column oven, which is also called the column toaster.
    - 0:41 - 1:03 The researcher points to a securing screw that grounds the ion optics column. They lift the screw and move it away so that it cannot make a connection with the column fitting anymore.
    - 1:04 - 1:13 The researcher grabs a NanoViper adapter and attaches it to the sample line by gently pushing it onto the NanoViper connector.
    - 1:14 - 1:20 The researcher grabs pliers and holds the column fitting with them.
    - 1:21 - 1:32 The researcher uses their other hand to unscrew the nano connector from the column fitting counter-clockwise.
    - 1:33 - 1:40 The researcher explains and shows the detached sample line connector and column fitting.
    - 1:41 - 1:43 The researcher places the sample line on top of the bumper of the Evosep One LC system.
    - 1:44 - 1:47 The researcher places the pliers back on the Evosep carrier.
    - 1:48 - 2:05 The researcher confirms that the ion optics column and sample line are now disconnected.

3. Describe the used equipment:
    - timsTOF Ultra instrument: Black and silver, with a round, black source housing on the right side. The instrument is labeled "timsTOF SCP".
    - IonOpticks column: Small, cylindrical column packaged in a blue and white box.
    - Evosep One LC system: Orange and white, with a transparent bumper. The system is labeled "EVOSEP ONE".
    - timsControl software: Displayed on a computer monitor. The software interface shows various tabs, including "Instrument", "Method", "Acquisition", "Home", "Maintenance", and "Monitoring".
    - Column oven: Small, rectangular oven attached to the timsTOF Ultra instrument. The oven has a black lid and is also called "Column Toaster".
    - Sample line: Thin, blue tubing connected to the IonOpticks column.
    - NanoViper adapter: Small, black fitting attached to the sample line.
    - Pliers: Red and yellow, used to hold the column fitting of the IonOpticks column.
"""

example_documentation_2 = """
1. Describe what you can hear with timestamps:

0:00 - 0:07 "I want to place six Evotips from position A1 to A6."
0:08 - 0:21 "So first I need an Evotip box and check that there's some solvent inside, so it should be like 1 centimeter, which is fine."
0:21 - 0:31 "Then I place this Evotip box on top here of the rack and make sure that it's securely seated within the rack." 
0:32 - 0:39 "I do the same with an empty box where I will fill in blanks."
0:40 - 0:46 "Next, I will add some Evotips. For this I have here some prepared." 
0:47 - 1:11 "I look at them visually if they look fine. They have like some solvent on top, on the bottom, and like the um SPE material, this white material, it has like a pale color, so I believe they are fine. So I check all of them by hand."
1:12 - 1:20 "Then I do the same with the blanks."
1:21 - 1:30 "Here like they are completely dry, so these are just unused tips, not cleaned, nothing, so whatever. Uh, which you just did here."
1:31 - 1:44 "Good. And now as a last step, um I remember or note down that they are from A1 to A6 on the rack one and here on rack three. They are also from A1 to A6."

2. Describe what you can see with timestamps:

0:00 - 0:08 The video starts with a view of the Evosep One instrument. The researcher is wearing black gloves.
0:09 - 0:22 The researcher picks up a yellow Evotip box and checks the solvent level. 
0:23 - 0:31 They then place the box in position S1 on the Evosep One instrument and ensure that it is well seated. 
0:32 - 0:40 They repeat this with an empty Evotip box, placing it in position S3.
0:41 - 0:56 The researcher picks up a box of Evotips and visually inspects them, checking for solvent and the color of the SPE material. 
0:57 - 1:13 They then place the Evotips in S1 from A1 to A6 on the Evosep One instrument. 
1:14 - 1:31 The researcher picks up a box of empty Evotips and places them in S3 from A1 to A6.
1:32 - 1:44 The researcher points to the Evotip boxes and verbally confirms their positions.

3. Describe the used equipment:

Evosep One instrument: White and orange, with a transparent bumper and a sample tray with six positions.
Evotip boxes: Transparent box. Each box holds 96 Evotips.
Evotips: transparent pipette tips with a white material at the bottom.
"""
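
The example documentation strings are written by hand, so a quick consistency check on their timestamp ranges can catch slips before they are used as few-shot examples. The following is a minimal sketch, assuming every timestamp line starts (optionally after a `-` bullet) with a `M:SS - M:SS` range; `check_timestamp_lines` and the sample lines are hypothetical and not part of the notebook.

```python
import re

# Loose pattern: anything that looks like a timestamp range, even with "." typos.
LOOKS_LIKE_RANGE = re.compile(r"\d+[:.]\d{2}\s*-\s*\d+[:.]\d{2}")
# Strict pattern: the expected M:SS - M:SS form.
STRICT_RANGE = re.compile(r"(\d+):(\d{2})\s*-\s*(\d+):(\d{2})(?!\d)")


def check_timestamp_lines(text):
    """Return (line_number, problem) tuples for suspicious timestamp lines."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip().lstrip("-").strip()
        if not LOOKS_LIKE_RANGE.match(stripped):
            continue  # not a timestamp line
        match = STRICT_RANGE.match(stripped)
        if match is None:
            problems.append((number, "malformed timestamp"))
            continue
        start_min, start_sec, end_min, end_sec = map(int, match.groups())
        if end_min * 60 + end_sec < start_min * 60 + start_sec:
            problems.append((number, "end before start"))
    return problems


# Hypothetical sample lines, for illustration only.
sample = "\n".join([
    "- 0:00 - 0:08 well-formed range",
    "- 0:30 - 0.39 dot instead of colon",
    "1:32 - 1:03 end earlier than start",
    "A line without any timestamp",
])
print(check_timestamp_lines(sample))
```

Running `check_timestamp_lines(example_documentation_1)` and `check_timestamp_lines(example_documentation_2)` would apply the same check to the strings defined above.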

In [27]:
video_path = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/DisconnectColumn_protocolCorrect.MP4"
video_uri_example_1 = video_to_protocol.upload_video_to_gcs(video_path, bucket)

path = "/Users/patriciaskowronek/Documents/proteomics_specialist/data/DisconnectColumn_protocolCorrect.md"
protocol_uri_example_1 = video_to_protocol.upload_video_to_gcs(path, bucket)

video_path = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/PlaceEvotips_protocolCorrect.MP4"
video_uri_example_2 = video_to_protocol.upload_video_to_gcs(video_path, bucket)

path = "/Users/patriciaskowronek/Documents/proteomics_specialist/data/PlaceEvotips_protocolCorrect.md"
protocol_uri_example_2 = video_to_protocol.upload_video_to_gcs(path, bucket)

video_path = "/Users/patriciaskowronek/Documents/documentation_agent_few_shot_examples/benchmark_dataset/ConnectingColumnSampleLine_protocolCorrect.MP4"
video_uri_input = video_to_protocol.upload_video_to_gcs(video_path, bucket)

Uploaded to: gs://mannlab_videos/DisconnectColumn_protocolCorrect.MP4
Uploaded to: gs://mannlab_videos/DisconnectColumn_protocolCorrect.md
Uploaded to: gs://mannlab_videos/PlaceEvotips_protocolCorrect.MP4
Uploaded to: gs://mannlab_videos/PlaceEvotips_protocolCorrect.md
Uploaded to: gs://mannlab_videos/ConnectingColumnSampleLine_protocolCorrect.MP4


In [28]:
inputs = [
    system_prompt_1,
    "EXAMPLE SECTION: The following is an EXAMPLE of the video, documentation and protocol process.",
    # "Example Video 1:",
    # Part.from_uri(
    #     video_uri_example_1, mime_type="video/mp4"
    # ),
    # "Example documentation 1:",
    # example_documentation_1,
    # "Example protocol 1:",
    # Part.from_uri(
    #     protocol_uri_example_1,
    #     mime_type="text/md",
    # ),
    "Example Video 2:",
    Part.from_uri(
        video_uri_example_2, mime_type="video/mp4"
    ),
    "Example documentation 2:",
    example_documentation_2,
    "Example protocol 2:",
    Part.from_uri(
        protocol_uri_example_2,
        mime_type="text/md",
    ),
    "END OF EXAMPLE SECTION",
    "FORMATTING INSTRUCTIONS:",
    """
    It is of highest importance for you to exactly follow the format of the example protocol with these sections:
    1. Title (format: **# Title**)
    2. Abstract (format: **## Abstract** followed by a paragraph)
    3. Materials section with Equipment and Reagents subsections
        (format: **## Materials**
                **### Equipment**
                - **Item 1**
                - **Item 2**)
    4. Procedure with estimated timing
        (format: **## Procedure**
                *Estimated timing: X minutes*
                1. Step one
                2. Step two)
    5. Expected Results section (format: **## Expected Results**)
    6. Figures section (format: **## Figures**)
    7. References section (format: **## References**)
    """,
    "YOUR TASK SECTION:",
    "Analyze the following NEW video and create a protocol for it. Video:",
    Part.from_uri(
        video_uri_input, mime_type="video/mp4"
    ),
    "Provide your thought process and protocol:"
]

response = model.generate_content(
    inputs,
    generation_config={"temperature": 0} 
)
observation = response.text
print(response.usage_metadata)
Markdown(observation)

prompt_token_count: 339205
candidates_token_count: 1516
total_token_count: 340721
cached_content_token_count: 261613
prompt_tokens_details {
  modality: DOCUMENT
  token_count: 261612
}
prompt_tokens_details {
  modality: VIDEO
  token_count: 75810
}
prompt_tokens_details {
  modality: TEXT
  token_count: 1783
}
candidates_tokens_details {
  modality: TEXT
  token_count: 1516
}
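
The usage metadata above can be sanity-checked with a little arithmetic: the per-modality counts should add up to the prompt total, and the cached portion can be expressed as a share of the prompt. A minimal sketch with the numbers copied from the output (`summarize_usage` is a hypothetical helper):

```python
# Token counts copied from the usage metadata printed above.
prompt_token_count = 339205
cached_content_token_count = 261613
tokens_by_modality = {"DOCUMENT": 261612, "VIDEO": 75810, "TEXT": 1783}


def summarize_usage(prompt_tokens, cached_tokens, by_modality):
    """Derive the uncached token count and cached share from raw counts."""
    return {
        "uncached_tokens": prompt_tokens - cached_tokens,
        "cached_share": cached_tokens / prompt_tokens,
        "modalities_sum_to_prompt": sum(by_modality.values()) == prompt_tokens,
    }


summary = summarize_usage(
    prompt_token_count, cached_content_token_count, tokens_by_modality
)
print(f"Uncached prompt tokens: {summary['uncached_tokens']}")
print(f"Cached share of prompt: {summary['cached_share']:.1%}")
print(f"Modalities sum to prompt total: {summary['modalities_sum_to_prompt']}")
```

For this run, roughly three quarters of the prompt comes from the cached knowledge base, and most of the remainder is video.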



## 1. Describe what you can hear with timestamps:

0:00 - 0:08 "I want to connect here from the column the fitting with the sample line connector."
0:09 - 0:13 "First let's check the software starters in timsControl."
0:16 - 0:25 "It's currently in standby. If it would have been in operate mode like now, we would have had to click here and switch it to standby."
0:26 - 0:35 "For IonOpticks columns it's important that they're not left for an extended period of time in operate mode or in standby as soon as there is some flow on the column."
0:41 - 0:58 "So the protocol assumes that the IonOpticks column is already inserted into the Ultrasource which is the case. Then let's go to the sample line. Here we connect um the adapter for the nano connector. Just push it on it. Then we check if there's some liquid on top. If I just snip it away."
1:05 - 1:22 "Then I hold column fitting with pillars and I hand tightened the sample line into the column connector. Here it's important to have it finger tight but not over tighten it. Then I usually already removed the adapter."
1:25 - 1:40 "As a next step um you would have to check if um here the column oven is at the right position. Here is like a um screw which can be loosened to move the oven and it should be as close as possible to the source. Then here is a screw which should be on top of the connection. It makes sure that um the column is grounded. If it's a longer column, it also could be grounded with this screw or with that screw."
1:55 - 2:12 "Okay. That's all fine. So we will close the column oven. And then check that it's at 50 degrees which is indicated by three lights. It's blinking so this just means that it has to heat up to have the full temperature. Okay."
2:15 - 2:25 "Next we go to timsControl, switching to operate mode. And then go to High Star and check if idle flow is on. It is. Otherwise we could have right-clicked and say here idle flow run, but it's already on. No reason to do this by hand. So it's operating and we have signal. Wonderful."

## 2. Describe what you can see with timestamps:

0:00 - 0:09 The video starts with a view of the Bruker timsTOF SCP instrument. The researcher is wearing black gloves.
0:09 - 0:16 The researcher switches to the timsControl software on the computer screen. The software is in standby mode.
0:17 - 0:25 The researcher explains that the IonOpticks column should not be left connected for an extended period of time when the instrument is in operate or standby mode.
0:26 - 0:58 The researcher connects the sample line to the IonOpticks column using an adapter and a pair of pliers. They check for leaks and remove the adapter.
0:59 - 1:24 The researcher adjusts the column oven, ensuring it is close to the source and that the column is grounded.
1:25 - 2:14 The researcher closes the column oven and checks the temperature, which is indicated by three blinking lights.
2:15 - 2:25 The researcher switches to the timsControl software and sets the instrument to operate mode.
2:26 - 2:43 The researcher switches to the High Star software and confirms that the idle flow is on. They point out that the instrument is now operating and acquiring signal.

## 3. Describe the used equipment:

Bruker timsTOF SCP instrument: Black and grey, with a round source compartment and a front panel with a display and control knobs.
IonOpticks column: A small, thin, silver-colored column.
Sample line: A thin, transparent tube with a metal connector at one end.
Adapter: A small, metal piece used to connect the sample line to the column.
Pliers: Red and yellow pliers used to tighten the connection between the sample line and the column.
Column oven: A black box with a lid, used to heat the column.
Computer with timsControl and High Star software: Two computer screens displaying the software interfaces.

# Connecting an IonOpticks Column to a Bruker timsTOF SCP

## Abstract

This protocol describes the procedure for connecting an IonOpticks column to a Bruker timsTOF SCP mass spectrometer. It includes steps for preparing the instrument, connecting the column, checking for leaks, adjusting the column oven, and confirming the instrument is operating correctly.

## Materials

### Equipment

- Bruker timsTOF SCP mass spectrometer
- IonOpticks column
- Sample line
- Adapter for nano connector
- Pliers
- Column oven
- Computer with timsControl and High Star software

### Reagents

- None

## Procedure

*Estimated timing: 5 minutes*

1. Ensure the timsTOF SCP instrument is in standby mode. If the instrument is in operate mode, switch it to standby mode in the timsControl software.
2. Verify that the IonOpticks column is already inserted into the Ultrasource.
3. Connect the sample line to the IonOpticks column using the adapter and a pair of pliers.
    a. Hold the column fitting with the pliers and carefully connect the sample line to the column connector.
    b. Hand-tighten the connection, ensuring it is finger tight but not over-tightened.
    c. Remove the adapter.
4. Check for leaks by visually inspecting the connection for any liquid droplets. If a leak is present, carefully tighten the connection.
5. Adjust the column oven.
    a. Loosen the screw on the column oven to move it closer to the source.
    b. Ensure the column is grounded by tightening the screw on top of the connection.
6. Close the column oven lid.
7. Check the temperature of the column oven. The oven should be set to 50°C, which is indicated by three blinking lights on the front panel. Wait until the lights stop blinking, indicating the oven has reached the desired temperature.
8. Switch the timsTOF SCP instrument to operate mode in the timsControl software.
9. In the High Star software, confirm that the idle flow is on. The instrument is now operating and acquiring signal.

## Expected Results

After following this protocol, the IonOpticks column should be successfully connected to the Bruker timsTOF SCP instrument. The instrument should be in operate mode, the idle flow should be on, and the column oven should be at the correct temperature. The instrument should be acquiring signal, indicating a successful connection.

## Figures

None

## References

None
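
The formatting instructions in the prompt ask for a fixed set of sections, so a generated protocol can be checked mechanically before it is reviewed by hand. The sketch below is one possible check, assuming headings appear on their own lines and may or may not be wrapped in `**`; `missing_sections` and the sample text are hypothetical and not part of the notebook.

```python
import re

# Headings requested in the FORMATTING INSTRUCTIONS part of the prompt.
REQUIRED_HEADINGS = [
    "## Abstract",
    "## Materials",
    "### Equipment",
    "## Procedure",
    "## Expected Results",
    "## Figures",
    "## References",
]


def missing_sections(markdown_text):
    """Return the required headings (and title) absent from markdown_text."""
    # Normalize each line: trim whitespace and any surrounding bold markers.
    lines = [line.strip().strip("*").strip() for line in markdown_text.splitlines()]
    missing = [heading for heading in REQUIRED_HEADINGS if heading not in lines]
    # A title is any level-1 heading: a single "#" followed by text.
    if not any(re.match(r"#\s+\S", line) for line in lines):
        missing.insert(0, "# Title")
    return missing


# Hypothetical, deliberately incomplete protocol for illustration.
incomplete = "\n".join([
    "## Abstract",
    "Some text.",
    "## Materials",
    "### Equipment",
    "## Procedure",
    "## Expected Results",
    "## Figures",
])
print(missing_sections(incomplete))
```

Applied to the notebook, `missing_sections(observation)` would report whether the response above follows the requested structure.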
