## Extract and Prep Data

In [1]:
from PIL import Image
import zipfile

import base64
import os
import requests

from mistralai import Mistral

In [2]:
# Helper: re-encode a TIFF image and save it in PNG format
def convert_tiff_to_png(tiff_path, png_path):
    with Image.open(tiff_path) as img:
        img.save(png_path, 'PNG')

In [None]:
zip_file_path = 'data/ecSeg_dataset.zip'

# Folder inside the ZIP file that contains the images
folder_name = 'test_im/'

# Folder to extract converted images
extract_folder = 'data/jpgs/'

# Open the ZIP file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    all_files = zip_ref.namelist()
    
    # Keep only the entries located inside the specified folder
    image_files = [f for f in all_files if folder_name in f]
    
    for image_file in image_files:
        # Extract the file to the extraction folder
        zip_ref.extract(image_file, extract_folder)
        
        # Check if the file is a TIFF image (ends with .tif)
        if image_file.lower().endswith('.tif'):
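            # Construct the local path for the TIFF file and the destination path.
            # Note: the destination uses a .jpg extension, but convert_tiff_to_png
            # writes PNG-encoded data, so the extension and the actual format differ.
            extracted_tiff_path = os.path.join(extract_folder, image_file)
            png_path = os.path.join(extract_folder, f"{os.path.splitext(image_file)[0]}.jpg")
            
            # Convert the TIFF (the output is PNG-encoded)
            convert_tiff_to_png(extracted_tiff_path, png_path)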
            
            # Optionally, remove the original TIFF file after conversion
            os.remove(extracted_tiff_path)
            print(f"Converted {image_file} to PNG.")

    print(f"Images extracted and converted to PNG in: {extract_folder}")
    
# The `with` block above closes the archive automatically, so no explicit close() is needed.

Converted ecSeg_dataset/test_im/0.tif to PNG.
Converted ecSeg_dataset/test_im/1.tif to PNG.
Converted ecSeg_dataset/test_im/10.tif to PNG.
Converted ecSeg_dataset/test_im/1000.tif to PNG.
Converted ecSeg_dataset/test_im/1001.tif to PNG.
Converted ecSeg_dataset/test_im/1004.tif to PNG.
Converted ecSeg_dataset/test_im/1006.tif to PNG.
Converted ecSeg_dataset/test_im/1008.tif to PNG.
Converted ecSeg_dataset/test_im/1009.tif to PNG.
Converted ecSeg_dataset/test_im/1013.tif to PNG.
Converted ecSeg_dataset/test_im/102.tif to PNG.
Converted ecSeg_dataset/test_im/1025.tif to PNG.
Converted ecSeg_dataset/test_im/1026.tif to PNG.
Converted ecSeg_dataset/test_im/1028.tif to PNG.
Converted ecSeg_dataset/test_im/1029.tif to PNG.
Converted ecSeg_dataset/test_im/103.tif to PNG.
Converted ecSeg_dataset/test_im/1030.tif to PNG.
Converted ecSeg_dataset/test_im/1031.tif to PNG.
Converted ecSeg_dataset/test_im/1033.tif to PNG.
Converted ecSeg_dataset/test_im/1034.tif to PNG.
Converted ecSeg_dataset/test_i
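
The extraction cell above selects archive members with a substring match on the folder name and then checks the `.tif` suffix. As a minimal sketch of a slightly stricter filter, the helper below matches the folder as a whole path component and also accepts `.tiff`. The name `list_tiff_members` and the in-memory test archive are hypothetical, introduced only for illustration; they are not part of the original notebook.

```python
import io
import zipfile
from pathlib import PurePosixPath


def list_tiff_members(zip_source, folder_name):
    """Return archive members under `folder_name` that end in .tif or .tiff.

    `zip_source` may be a path or a file-like object. ZIP member names use
    forward slashes, so PurePosixPath is used to split them into components.
    """
    folder = folder_name.strip("/")
    with zipfile.ZipFile(zip_source, "r") as zip_ref:
        return [
            name
            for name in zip_ref.namelist()
            if folder in PurePosixPath(name).parts
            and PurePosixPath(name).suffix.lower() in {".tif", ".tiff"}
        ]


# Build a tiny in-memory archive with dummy contents to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ecSeg_dataset/test_im/0.tif", b"dummy")
    zf.writestr("ecSeg_dataset/test_im/notes.txt", b"dummy")
    zf.writestr("ecSeg_dataset/train_im/5.tif", b"dummy")

members = list_tiff_members(buffer, "test_im/")
print(members)
```

Matching on path components avoids picking up a folder whose name merely contains `test_im/` as a substring, at the cost of being less permissive than the original filter.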

## Demo Run

We will prompt a Pixtral model (the code below requests `pixtral-large-latest`) with the following metaphase image of a cell:

![Metaphase image of a cell](data/converted/19.jpg "Metaphase Image of Cell")

And the following prompt:

"For this task, act as a biologist who is studying extrachromosomal DNA (ecDNA). Consider this metaphase image containing ecDNA. Analyze it for possible ecDNA."

In [3]:
def encode_image(image_path):
    """Encode the image to base64."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    except FileNotFoundError:
        print(f"Error: The file {image_path} was not found.")
        return None
    except Exception as e:  # Any other error while reading or encoding the file
        print(f"Error: {e}")
        return None
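
The base64 string returned by `encode_image` is later embedded in a data URL that declares a MIME type. Since the conversion step writes PNG-encoded data into files with a `.jpg` extension, the declared type may not match the actual bytes. The following is a minimal sketch, assuming only PNG and JPEG inputs, of how the MIME type could be derived from the file's leading bytes instead of being hard-coded. The names `sniff_image_mime` and `to_data_url` are hypothetical and do not come from the notebook or from the Mistral client.

```python
import base64

# Leading bytes ("magic numbers") that identify two common image formats.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_mime(data):
    """Guess the MIME type from the first bytes; return None if unrecognized."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    return None


def to_data_url(data):
    """Build a base64 data URL whose MIME type matches the actual bytes."""
    mime = sniff_image_mime(data)
    if mime is None:
        raise ValueError("Unrecognized image format")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Dummy payloads: only the signature bytes matter for the MIME check.
png_url = to_data_url(PNG_SIGNATURE + b"payload")
jpeg_url = to_data_url(JPEG_SIGNATURE + b"payload")
print(png_url[:22], jpeg_url[:23])
```

In the notebook, the raw bytes read from `image_path` could be passed to a helper like this so that the declared type follows the file contents rather than the file extension.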

In [None]:
# Read the API key from the environment instead of hard-coding it in the notebook
api_key = os.environ["MISTRAL_API_KEY"]

# Specify model
model = "pixtral-large-latest"

# Path to your image
image_path = "data/converted/19.jpg"

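# Get the base64 string; stop early if the image could not be read
base64_image = encode_image(image_path)
if base64_image is None:
    raise RuntimeError(f"Could not encode image at {image_path}")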

client = Mistral(api_key=api_key)

# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "For this task, act as a biologist who is studying extrachromosomal DNA (ecDNA). Consider this metaphase image containing ecDNA. Analyze it for possible ecDNA."
            },
            {
                "type": "image_url",
                "image_url": f"data:image/jpeg;base64,{base64_image}" 
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)

To analyze this metaphase image for possible extrachromosomal DNA (ecDNA), we need to focus on the structures that do not appear to be part of the regular chromosomes. Here are the steps involved in identifying potential ecDNA:

1. **Identify Chromosomal DNA**: The chromosomal DNA in this image is stained blue, which is characteristic of DAPI (4',6-diamidino-2-phenylindole) staining, a common method used to stain DNA in cells.

2. **Observe Structures Outside the Chromosomes**: ecDNA typically appears as small circular or irregularly shaped structures that are separate from the main chromosomes. Look for areas of blue staining that are distinct from the larger, more condensed chromosomes.

3. **Consider Additional Stains**: If the image includes other stains (e.g., red staining that might indicate specific proteins or DNA sequences associated with ecDNA), these can help confirm the presence of ecDNA. In this image, there are some red spots, which might indicate regions of ecDNA if they

In [7]:
# Save output
with open("output.txt", "w", encoding="utf-8") as file:
    file.write(chat_response.choices[0].message.content)
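
Saving only the response text makes it hard to tell later which model and prompt produced it. As one possible alternative, the sketch below stores the response together with the model name and prompt in a JSON file. The helper `save_run`, its field names, and the placeholder strings in the usage lines are assumptions made for illustration, not part of the original notebook.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def save_run(path, model, prompt, response_text):
    """Write the response and the settings that produced it to a JSON file."""
    record = {
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "response": response_text,
    }
    with open(path, "w", encoding="utf-8") as file:
        json.dump(record, file, ensure_ascii=False, indent=2)
    return record


# Usage with placeholder values, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "output.json")
    saved = save_run(out_path, "example-model", "example prompt", "example response")
    with open(out_path, encoding="utf-8") as file:
        loaded = json.load(file)
```

In the notebook, `model`, the prompt text, and `chat_response.choices[0].message.content` would take the place of the placeholder strings.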

## Conclusions
As the output above shows, the Pixtral model gives a fairly detailed and mostly coherent description of the image. It offers reasons for why parts of the image are colored as they are, for example by identifying the blue staining as characteristic of DAPI, a common DNA stain.