# Exploring Data Tutorial

This tutorial demonstrates how to explore your data in Datamint. You'll learn how to:

- Connect to the Datamint API
- List and select projects
- Fetch resources from a project
- Download and display resource images
- Fetch annotations and their segmentation files

**Note:** This tutorial focuses on read-only operations.

## 1. Setup and API Connection

First, let's import the necessary libraries and connect to the Datamint API.

In [None]:
from datamint import Api
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import os

# Initialize the API client
# Make sure you have set your DATAMINT_API_KEY environment variable
# or pass it directly: api = Api(api_key="your_api_key")
api = Api()

print("✓ Successfully connected to Datamint API")
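
If the client fails to authenticate, a quick first check is whether the `DATAMINT_API_KEY` environment variable mentioned above is set. The sketch below uses only the standard library; `api_key_is_configured` is a hypothetical helper, not part of Datamint, and the client may also read a key from other configuration sources, so a missing variable is a hint rather than proof of a problem.

```python
import os


def api_key_is_configured(env=None, var_name="DATAMINT_API_KEY"):
    """Return True if the variable is set to a non-empty, non-blank value.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return bool(env.get(var_name, "").strip())


if not api_key_is_configured():
    print("DATAMINT_API_KEY is not set; export it or pass api_key=... to Api().")
```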

## 2. Exploring Projects

Let's start by listing all available projects and selecting one to work with.

In [None]:
# Get all projects
projects = api.projects.get_all()

print(f"Found {len(projects)} projects:\n")
for i, project in enumerate(projects, 1):
    print(f"{i}. {project.name}")
    print(f"   Description: {project.description}")
    print()

In [None]:
# Select a project to work with (by name or index)
# Option 1: By name
project_name = "Your Project Name"  # Replace with your project name
selected_project = api.projects.get_by_name(project_name)

# Option 2: By index (uncomment to use)
# project_index = 0  # First project
# selected_project = projects[project_index]

if selected_project:
    print(f"Selected project: {selected_project.name}")
else:
    print(f"Project '{project_name}' not found. Please check the project name.")
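
A lookup by exact name fails if the capitalization or spacing differs. As a minimal sketch, the hypothetical helper `find_project` below searches an already fetched list for a case-insensitive match. It assumes only that each project exposes a `name` attribute, as in the listing above, and uses stand-in objects so it runs without an API connection.

```python
from types import SimpleNamespace


def find_project(projects, name):
    """Return the first project whose name matches, ignoring case and outer spaces."""
    wanted = name.strip().casefold()
    for project in projects:
        if project.name.strip().casefold() == wanted:
            return project
    return None


# Stand-in objects; in the notebook, pass the `projects` list fetched earlier.
demo_projects = [SimpleNamespace(name="Chest X-Ray"), SimpleNamespace(name="Brain MRI")]
match = find_project(demo_projects, "  brain mri ")
print(match.name if match else "not found")  # prints "Brain MRI"
```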

## 3. Getting Resources from a Project

Now let's fetch all resources associated with the selected project.

In [None]:
# Get all resources from the selected project
resources = api.projects.get_project_resources(selected_project)

print(f"Found {len(resources)} resources in project '{selected_project.name}':\n")

# Display the first 3 resources
for i, resource in enumerate(resources[:3], 1):
    print(f"{i}. Filename: {resource.filename}")
    print(f"   Created at: {resource.created_at}")
    print(f"   Created by: {resource.created_by}")
    print(f"   MIME type: {resource.mimetype}")
    print(f"   Status: {resource.status}")
    print()

if len(resources) > 3:
    print(f"... and {len(resources) - 3} more resources")
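
For larger projects, a summary by file type can be more useful than a listing. The sketch below counts resources per MIME type with `collections.Counter`. It relies only on the `mimetype` attribute printed above; `count_by_mimetype` is a hypothetical helper, and the demo objects are stand-ins for real resources.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_mimetype(resources):
    """Count how many resources share each MIME type."""
    return Counter(resource.mimetype for resource in resources)


# Stand-in objects; in the notebook, pass the `resources` list fetched above.
demo_resources = [
    SimpleNamespace(mimetype="application/dicom"),
    SimpleNamespace(mimetype="image/png"),
    SimpleNamespace(mimetype="application/dicom"),
]
for mimetype, count in count_by_mimetype(demo_resources).most_common():
    print(f"{mimetype}: {count}")
```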

## 4. Downloading and Displaying Resource Images

Let's download and display the image data from a resource using the `download_resource_file` method.

In [None]:
# Select a resource to work with
selected_resource = resources[0]  # First resource

print(f"Selected resource: {selected_resource.filename}")

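# Download the resource file with auto-conversion.
# With auto_convert=True the file is converted to a Python object based on its type;
# the checks below handle NumPy arrays and DICOM datasets (objects with `pixel_array`).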
image_data = api.resources.download_resource_file(
    selected_resource,
    auto_convert=True
)

print(f"\nImage data type: {type(image_data)}")
if isinstance(image_data, np.ndarray):
    print(f"Image shape: {image_data.shape}")
elif hasattr(image_data, 'pixel_array'):  # DICOM
    print(f"DICOM pixel array shape: {image_data.pixel_array.shape}")

In [None]:
# Display the image
fig, ax = plt.subplots(1, 1, figsize=(10, 10))

if isinstance(image_data, np.ndarray):
    # NumPy array
    if len(image_data.shape) == 3:
        # RGB or multi-frame
        if image_data.shape[0] == 3:  # RGB format (C, H, W)
            ax.imshow(image_data.transpose(1, 2, 0))
        else:
            # Multi-frame, show first frame
            ax.imshow(image_data[0], cmap='gray')
    else:
        # Grayscale
        ax.imshow(image_data, cmap='gray')
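elif hasattr(image_data, 'pixel_array'):
    # DICOM: show the first frame if multi-frame, otherwise the 2D image
    pixels = image_data.pixel_array
    ax.imshow(pixels[0] if pixels.ndim == 3 else pixels, cmap='gray')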
elif isinstance(image_data, Image.Image):
    # PIL Image
    ax.imshow(image_data)
else:
    print(f"Unsupported image format: {type(image_data)}")

ax.set_title(f"Resource: {selected_resource.filename}")
ax.axis('off')
plt.tight_layout()
plt.show()
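
The shape checks in the cell above can be collected into one function so the same logic is reusable for other plots. This is a sketch that mirrors the tutorial's heuristic, not a Datamint utility: `to_displayable` is a hypothetical name, and like the cell above it treats any 3D array whose first axis has length 3 as channel-first RGB, which would misread a stack of exactly three grayscale frames.

```python
import numpy as np


def to_displayable(image):
    """Return a 2D (H, W) or channel-last (H, W, 3) array suitable for imshow."""
    image = np.asarray(image)
    if image.ndim == 2:
        return image  # already a single grayscale frame
    if image.ndim == 3:
        if image.shape[0] == 3:
            return image.transpose(1, 2, 0)  # (C, H, W) -> (H, W, C)
        return image[0]  # multi-frame: keep the first frame
    raise ValueError(f"Unsupported image shape: {image.shape}")


print(to_displayable(np.zeros((3, 4, 5))).shape)  # prints (4, 5, 3)
```

With this helper, the NumPy branch reduces to `ax.imshow(to_displayable(image_data), cmap='gray')`; matplotlib ignores `cmap` for RGB input.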

## 5. Getting Annotations for Resources

Now let's fetch annotations for our resources, including segmentation files.

In [None]:
# Get annotations for a single resource
annotations = api.annotations.get_list(
    resource=selected_resource,
    load_ai_segmentations=True
)

print(f"Found {len(annotations)} annotations for resource '{selected_resource.filename}':\n")

for i, annotation in enumerate(annotations, 1):
    print(f"{i}. Annotation ID: {annotation.id}")
    print(f"   Type: {annotation.type}")
    print(f"   Identifier: {annotation.identifier}")
    print(f"   Scope: {annotation.scope}")
    if hasattr(annotation, 'frame_index') and annotation.frame_index is not None:
        print(f"   Frame index: {annotation.frame_index}")
    print()

In [None]:
# Get annotations for multiple resources
# This is more efficient than calling get_list for each resource individually
multiple_resources = resources[:3]  # First 3 resources

# Option 1: Get all annotations in a flat list
all_annotations = api.annotations.get_list(
    resource=multiple_resources,
    load_ai_segmentations=True
)

print(f"Found {len(all_annotations)} total annotations across {len(multiple_resources)} resources\n")

# Option 2: Group annotations by resource
grouped_annotations = api.annotations.get_list(
    resource=multiple_resources,
    load_ai_segmentations=True,
    group_by_resource=True
)

print("Annotations grouped by resource:")
for i, (resource, resource_annotations) in enumerate(zip(multiple_resources, grouped_annotations), 1):
    print(f"\n{i}. Resource: {resource.filename}")
    print(f"   Number of annotations: {len(resource_annotations)}")
    for j, ann in enumerate(resource_annotations, 1):
        print(f"   {j}. {ann.identifier} ({ann.type})")
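
Once annotations are in a flat list, a tally by type and identifier gives a quick overview of what has been labeled. The sketch below assumes only the `type` and `identifier` attributes printed above; `tally_annotations` is a hypothetical helper, and the demo objects are stand-ins for real annotations.

```python
from collections import Counter
from types import SimpleNamespace


def tally_annotations(annotations):
    """Count annotations per (type, identifier) pair."""
    return Counter((ann.type, ann.identifier) for ann in annotations)


# Stand-in objects; in the notebook, pass `all_annotations` from the cell above.
demo_annotations = [
    SimpleNamespace(type="segmentation", identifier="tumor"),
    SimpleNamespace(type="category", identifier="quality"),
    SimpleNamespace(type="segmentation", identifier="tumor"),
]
for (ann_type, identifier), count in sorted(tally_annotations(demo_annotations).items()):
    print(f"{ann_type}: {identifier} x{count}")
```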

## 6. Downloading and Displaying Segmentation Files

Let's download segmentation files for annotations that have them.

In [None]:
# Filter for segmentation annotations
segmentation_annotations = [ann for ann in annotations if ann.type == 'segmentation']

if segmentation_annotations:
    print(f"Found {len(segmentation_annotations)} segmentation annotation(s)")
    
    # Download the first segmentation
    seg_annotation = segmentation_annotations[0]
    print(f"\nDownloading segmentation: {seg_annotation.identifier}")
    
    # Download segmentation file
    seg_data = seg_annotation.fetch_file_data(use_cache=True) # cache at "~/.datamint/" for faster access next time
    # seg_data is a numpy array

    print(f"Segmentation shape: {seg_data.shape}")
    print(f"Unique values: {np.unique(seg_data)}")
else:
    print("No segmentation annotations found for this resource")
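
Beyond listing the unique values, it is often useful to know how many pixels each label covers. The sketch below uses `np.unique` with `return_counts=True`. It assumes the segmentation holds integer labels; `label_pixel_counts` is a hypothetical helper, and the demo array stands in for `seg_data`.

```python
import numpy as np


def label_pixel_counts(seg):
    """Map each integer label in the segmentation to its pixel count."""
    labels, counts = np.unique(np.asarray(seg), return_counts=True)
    return {int(label): int(count) for label, count in zip(labels, counts)}


# Stand-in mask: a 2x2 square of label 1 on a 4x4 background of zeros.
demo_seg = np.zeros((4, 4), dtype=np.uint8)
demo_seg[1:3, 1:3] = 1
print(label_pixel_counts(demo_seg))  # prints {0: 12, 1: 4}
```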

In [None]:
# Display the original image and segmentation side by side
if segmentation_annotations:
    fig, axes = plt.subplots(1, 2, figsize=(15, 7))
    
    # Original image
    if isinstance(image_data, np.ndarray):
        if len(image_data.shape) == 3 and image_data.shape[0] == 3:
            axes[0].imshow(image_data.transpose(1, 2, 0))
        else:
            axes[0].imshow(image_data[0] if len(image_data.shape) == 3 else image_data, cmap='gray')
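    elif hasattr(image_data, 'pixel_array'):
        # DICOM: show the first frame if multi-frame, otherwise the 2D image
        pixels = image_data.pixel_array
        axes[0].imshow(pixels[0] if pixels.ndim == 3 else pixels, cmap='gray')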
    elif isinstance(image_data, Image.Image):
        axes[0].imshow(image_data)
    
    axes[0].set_title(f"Original: {selected_resource.filename}")
    axes[0].axis('off')
    
    # Segmentation
    axes[1].imshow(seg_data, cmap='jet', alpha=0.7)
    axes[1].set_title(f"Segmentation: {seg_annotation.identifier}")
    axes[1].axis('off')
    
    plt.tight_layout()
    plt.show()
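
An alternative to the side-by-side view is to draw the segmentation on top of the image. A common approach is to mask the background label so that only labeled pixels are colored. The sketch below assumes the background label is 0, which may not hold for every dataset; `mask_background` is a hypothetical helper.

```python
import numpy as np


def mask_background(seg, background=0):
    """Return a masked array that hides pixels equal to the background label."""
    seg = np.asarray(seg)
    return np.ma.masked_where(seg == background, seg)


# Stand-in mask with two labeled pixels.
demo_seg = np.array([[0, 1], [2, 0]])
overlay = mask_background(demo_seg)
print(overlay.count())  # number of unmasked (labeled) pixels: 2
```

To overlay, draw the image first and then call `imshow(mask_background(seg_data), cmap='jet', alpha=0.5)` on the same axes. This assumes `seg_data` is 2D and has the same height and width as the displayed image; matplotlib leaves masked pixels transparent by default.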

## 7. Bulk Download Multiple Segmentations

For efficiency, you can download multiple segmentation files at once.

In [None]:
# Create a directory to save segmentations
output_dir = "./downloaded_segmentations"
os.makedirs(output_dir, exist_ok=True)

# Download multiple segmentation files
if len(segmentation_annotations) > 1:
    # Prepare save paths
    save_paths = [
        os.path.join(output_dir, f"{ann.identifier}_{ann.id}.png")
        for ann in segmentation_annotations
    ]
    
    # Bulk download (faster than downloading one by one)
    results = api.annotations.download_multiple_files(
        segmentation_annotations,
        save_paths
    )
    
    # Check results
    successful = [r for r in results if r['success']]
    failed = [r for r in results if not r['success']]
    
    print(f"Downloaded {len(successful)} segmentation files successfully")
    if failed:
        print(f"Failed to download {len(failed)} files:")
        for f in failed:
            print(f"  - {f['annotation_id']}: {f.get('error', 'Unknown error')}")
else:
    print("Not enough segmentations for bulk download demo")
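
The save paths above embed `ann.identifier` directly, which can produce invalid or unexpected paths if an identifier contains slashes or spaces. As a minimal sketch, the hypothetical helper `safe_filename` below replaces anything outside a conservative character set before the path is built. The allowed set is an assumption, not a Datamint rule, and distinct identifiers can collapse to the same name, so keeping the annotation ID in the stem helps avoid collisions.

```python
import re


def safe_filename(identifier, annotation_id, ext=".png"):
    """Build a filename from an identifier and ID using only safe characters."""
    raw = f"{identifier}_{annotation_id}"
    # Replace each run of characters outside [A-Za-z0-9._-] with one underscore.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("._")
    return f"{stem or 'annotation'}{ext}"


print(safe_filename("left/right lung", "a1b2"))  # prints left_right_lung_a1b2.png
```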

## 8. Filtering Annotations

You can filter annotations by annotation type, date range, and status.

In [None]:
from datetime import date, timedelta

# Filter by annotation type
category_annotations = api.annotations.get_list(
    resource=selected_resource,
    annotation_type='category'
)
print(f"Category annotations: {len(category_annotations)}")

# Filter by date range
date_to = date.today()
date_from = date_to - timedelta(days=30)  # Last 30 days

recent_annotations = api.annotations.get_list(
    resource=selected_resource,
    date_from=date_from,
    date_to=date_to
)
print(f"Annotations from last 30 days: {len(recent_annotations)}")

# Filter by status
published_annotations = api.annotations.get_list(
    resource=selected_resource,
    status='published'
)
print(f"Published annotations: {len(published_annotations)}")
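
The date window above can be wrapped in a small function so that other ranges, such as the last 7 or 90 days, are one call away. This sketch uses only the standard library; `last_n_days` is a hypothetical helper, and the optional `today` argument exists so the result is reproducible.

```python
from datetime import date, timedelta


def last_n_days(n, today=None):
    """Return (date_from, date_to) covering the n days up to and including today."""
    if n < 0:
        raise ValueError("n must be non-negative")
    date_to = today or date.today()
    return date_to - timedelta(days=n), date_to


date_from, date_to = last_n_days(30, today=date(2024, 3, 31))
print(date_from, date_to)  # prints 2024-03-01 2024-03-31
```

The two values can then be passed as `date_from` and `date_to` to `get_list`, as in the cell above.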

## Summary

In this tutorial, you learned how to:

1. ✓ Connect to the Datamint API
2. ✓ List and select projects
3. ✓ Fetch resources from a project using `get_project_resources()`
4. ✓ Download and display resource images using `download_resource_file()`
5. ✓ Fetch annotations for resources (single or multiple)
6. ✓ Download segmentation files using `fetch_file_data()` or `download_multiple_files()`
7. ✓ Filter annotations by type, date, and status

All operations were read-only with respect to Datamint: nothing on the platform was modified, and the only files written were local downloads.