# Crucible Python Client Tutorial

This notebook demonstrates how to use the Crucible Python client to manage datasets, samples, projects, and their relationships.

**Prerequisites:**
- Crucible API credentials configured (run `crucible config init` in a terminal)
- Access to project `crucible-demo`
- Example data files in the `data/` directory

## Table of Contents
1. [Setup and Configuration](#setup)
2. [Creating a Sample](#create-sample)
3. [Creating a Dataset](#create-dataset)
4. [Listing Datasets in a Project](#list-datasets)
5. [Getting a Dataset with Metadata](#get-dataset)
6. [Linking a Sample to a Dataset](#link-sample-dataset)
7. [Linking Two Datasets (Parent-Child)](#link-datasets)
8. [Linking Two Samples (Parent-Child)](#link-samples)
9. [Adding a Thumbnail to a Dataset](#add-thumbnail)

<a id='setup'></a>
## 1. Setup and Configuration

First, you need to configure your Crucible API credentials. Run this command in your terminal (only needed once):

```bash
crucible config init
```

This will prompt you for:
- **API Key** - Get it from https://crucible.lbl.gov/api/v1/user_apikey
- **API URL** - Defaults to https://crucible.lbl.gov/api/v1
- **Default Project** (optional) - You can set `crucible-demo` as default

Once configured, you can import and use the Crucible client:

In [1]:
from pathlib import Path
from crucible.client import CrucibleClient

# Initialize the client (automatically loads configuration)
client = CrucibleClient()

# Data files are expected in ./data, relative to the current working directory
EXAMPLES_DIR = Path.cwd()
DATA_DIR = EXAMPLES_DIR / "data"

print("✓ Crucible client initialized successfully!")
print(f"Data directory: {DATA_DIR}")

✓ Crucible client initialized successfully!
Data directory: /home/roncofaber/software/nano-crucible/examples/data


Set the project ID for this tutorial and confirm that the project exists:

In [2]:
# Project ID for this tutorial
PROJECT_ID = "crucible-demo"

# Verify project exists
project = client.projects.get(PROJECT_ID)
if project:
    print(f"✓ Working with project: {project['project_id']}")
    print(f"  Organization: {project.get('organization', 'N/A')}")
    print(f"  Lead: {project.get('project_lead_email', 'N/A')}")
else:
    print(f"⚠ Project '{PROJECT_ID}' not found. You may need to create it first.")

✓ Working with project: crucible-demo
  Organization: Molecular Foundry
  Lead: roncoroni@lbl.gov


<a id='create-sample'></a>
## 2. Creating a Sample

Samples represent physical materials or specimens. Let's create a sample:

In [3]:
# Create a sample
sample = client.samples.create(
    sample_name="Silicon Wafer A - Tutorial Example",
    project_id=PROJECT_ID,
    description="Silicon wafer for thermal conductivity measurements (tutorial example)"
)

sample_id = sample['unique_id']
print("✓ Sample created successfully!")
print(f"  Sample ID: {sample_id}")
print(f"  Sample Name: {sample['sample_name']}")
print(f"  Project: {sample['project_id']}")

✓ Sample created successfully!
  Sample ID: 0tcy5e1n1hs53000kqzr2peykr
  Sample Name: Silicon Wafer A - Tutorial Example
  Project: crucible-demo


<a id='create-dataset'></a>
## 3. Creating a Dataset

Datasets contain data files and metadata. You can create a dataset with or without files.

### 3.1 Create Dataset with Metadata Only

In [4]:
from crucible.models import BaseDataset

# Define dataset metadata
dataset_metadata = BaseDataset(
    project_id=PROJECT_ID,
    measurement="thermal_conductivity",
    dataset_name="Thermal Conductivity Measurement - Sample A (Tutorial)",
    public=False
)

# Create dataset without files
result = client.datasets.create(
    dataset=dataset_metadata,
    scientific_metadata={
        "temperature_range": "273-363 K",
        "measurement_method": "3-omega method",
        "equipment": "Lakeshore 336 + SR830 lock-in",
        "sample_type": "silicon wafer"
    },
    keywords=["thermal", "conductivity", "silicon", "tutorial"]
)

dataset_id = result['dsid']
print("✓ Dataset created successfully!")
print(f"  Dataset ID: {dataset_id}")
print(f"  Dataset Name: {result['created_record']['dataset_name']}")

✓ Dataset created successfully!
  Dataset ID: 0tcy5tt115xs7000n802q2fxsr
  Dataset Name: Thermal Conductivity Measurement - Sample A (Tutorial)


### 3.2 Create Dataset with Files

Now let's create a dataset with actual data files:

In [5]:
# Create dataset with file upload
dataset_with_files = BaseDataset(
    project_id=PROJECT_ID,
    measurement="thermal_conductivity",
    dataset_name="Thermal Conductivity Data with Files (Tutorial)",
    public=False
)

# Files to upload
files_to_upload = [
    str(DATA_DIR / "thermal_conductivity_data.csv"),
    str(DATA_DIR / "measurement_notes.txt")
]

# Verify files exist
print("Files to upload:")
for f in files_to_upload:
    exists = "✓" if Path(f).exists() else "✗"
    print(f"  {exists} {Path(f).name}")

result_with_files = client.datasets.create(
    dataset=dataset_with_files,
    files_to_upload=files_to_upload,
    scientific_metadata={
        "temperature_range": "273-363 K",
        "data_points": 10,
        "measurement_method": "3-omega method",
        "sample_material": "silicon"
    },
    keywords=["thermal", "conductivity", "data", "tutorial"],
    ingestor="ApiUploadIngestor",
    wait_for_ingestion_response=True
)

dataset_with_files_id = result_with_files['dsid']
print("\n✓ Dataset with files created successfully!")
print(f"  Dataset ID: {dataset_with_files_id}")
print(f"  Files uploaded: {len(result_with_files.get('uploaded_files', []))}")
print(f"  Ingestion status: {result_with_files.get('ingestion_request', {}).get('status', 'N/A')}")

Files to upload:
  ✓ thermal_conductivity_data.csv
  ✓ measurement_notes.txt

✓ Dataset with files created successfully!
  Dataset ID: 0tcy5tvd65rjz00039fj0k8nyr
  Files uploaded: 2
  Ingestion status: complete
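
The check in the cell above only prints a ✓ or ✗ for each file. If you would rather stop before anything is uploaded, a small guard like the one below may help. This is a minimal sketch using only the standard library: `require_files` is a hypothetical helper, not part of the Crucible client, and the temporary directory and placeholder file content exist only so the example runs on its own.

```python
from pathlib import Path
import tempfile


def require_files(paths):
    """Return the paths as strings, or raise if any of them is missing."""
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing upload files: {', '.join(missing)}")
    return [str(p) for p in paths]


# Demonstration in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "thermal_conductivity_data.csv"
    present.write_text("placeholder\n")  # stand-in content, not real data
    absent = Path(tmp) / "measurement_notes.txt"  # deliberately never created

    checked = require_files([present])  # passes: the file exists

    try:
        require_files([present, absent])
        error_message = None
    except FileNotFoundError as exc:
        error_message = str(exc)  # names the missing file
```

In the notebook, you could pass `require_files(files_to_upload)` as the `files_to_upload` argument so that a missing file stops the cell before the request is sent.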


<a id='list-datasets'></a>
## 4. Listing Datasets in a Project

Retrieve all datasets associated with a project:

In [6]:
# List all datasets in the project
datasets = client.datasets.list(project_id=PROJECT_ID, limit=50)

print(f"Found {len(datasets)} dataset(s) in project {PROJECT_ID}\n")

# Display first 5 datasets
for i, ds in enumerate(datasets[:5], 1):
    print(f"{i}. {ds.get('unique_id', 'N/A')}")
    print(f"   Name: {ds.get('dataset_name', 'Unnamed')}")
    print(f"   Measurement: {ds.get('measurement', 'N/A')}")
    print(f"   Public: {ds.get('public', False)}")
    if ds.get('creation_time'):
        print(f"   Created: {ds['creation_time'][:10]}")
    print()

Found 5 dataset(s) in project crucible-demo

1. 0tcy5mfwcxyxs000fs84n0vacw
   Name: Thermal Conductivity Measurement - Sample A (Tutorial)
   Measurement: thermal_conductivity
   Public: False
   Created: 2026-02-24

2. 0tcy5n3hpnxhn0001scz149deg
   Name: Thermal Conductivity Data with Files (Tutorial)
   Measurement: thermal_conductivity
   Public: False
   Created: 2026-02-24

3. 0tcy5q5ytsvf1000h9jaz9kfc8
   Name: Processed Thermal Conductivity Data (Tutorial)
   Measurement: thermal_conductivity_analysis
   Public: False
   Created: 2026-02-24

4. 0tcy5tt115xs7000n802q2fxsr
   Name: Thermal Conductivity Measurement - Sample A (Tutorial)
   Measurement: thermal_conductivity
   Public: False
   Created: 2026-02-24

5. 0tcy5tvd65rjz00039fj0k8nyr
   Name: Thermal Conductivity Data with Files (Tutorial)
   Measurement: thermal_conductivity
   Public: False
   Created: 2026-02-24
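
Once you have the list, ordinary Python is enough to summarize it. The sketch below counts datasets per measurement type. The records are copied from the listing output above and keep only two fields; in the notebook you would use the `datasets` variable returned by `client.datasets.list()` instead.

```python
from collections import Counter

# Records shaped like the listing output above (only the fields used here).
datasets = [
    {"unique_id": "0tcy5mfwcxyxs000fs84n0vacw", "measurement": "thermal_conductivity"},
    {"unique_id": "0tcy5n3hpnxhn0001scz149deg", "measurement": "thermal_conductivity"},
    {"unique_id": "0tcy5q5ytsvf1000h9jaz9kfc8", "measurement": "thermal_conductivity_analysis"},
    {"unique_id": "0tcy5tt115xs7000n802q2fxsr", "measurement": "thermal_conductivity"},
    {"unique_id": "0tcy5tvd65rjz00039fj0k8nyr", "measurement": "thermal_conductivity"},
]

# Tally datasets by measurement, falling back to "N/A" when the field is absent.
counts = Counter(ds.get("measurement", "N/A") for ds in datasets)

for measurement, n in counts.most_common():
    print(f"{measurement}: {n}")
```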



<a id='get-dataset'></a>
## 5. Getting a Dataset with Metadata

Retrieve detailed information about a specific dataset, including scientific metadata:

In [7]:
# Get dataset with metadata
dataset_details = client.datasets.get(
    dsid=dataset_id,
    include_metadata=True
)

print(f"Dataset: {dataset_details['unique_id']}")
print(f"Name: {dataset_details.get('dataset_name', 'N/A')}")
print(f"Measurement: {dataset_details.get('measurement', 'N/A')}")
print(f"Public: {dataset_details.get('public', False)}")
print(f"Project: {dataset_details.get('project_id', 'N/A')}")

print("\nScientific Metadata:")
if 'scientific_metadata' in dataset_details and dataset_details['scientific_metadata']:
    for key, value in dataset_details['scientific_metadata'].items():
        print(f"  {key}: {value}")
else:
    print("  No metadata available")

Dataset: 0tcy5tt115xs7000n802q2fxsr
Name: Thermal Conductivity Measurement - Sample A (Tutorial)
Measurement: thermal_conductivity
Public: False
Project: crucible-demo

Scientific Metadata:
  id: 107978
  dataset_unique_id: 0tcy5tt115xs7000n802q2fxsr
  scientific_metadata: {'temperature_range': '273-363 K', 'measurement_method': '3-omega method', 'equipment': 'Lakeshore 336 + SR830 lock-in', 'sample_type': 'silicon wafer'}
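
In the output above, the value stored under `scientific_metadata` is itself a record with `id`, `dataset_unique_id`, and a nested `scientific_metadata` dictionary holding the fields supplied at creation time. If you only want those fields, you can unwrap the record first. The helper below is a hypothetical convenience, not part of the Crucible client, and it assumes the record shape seen in this one output; other responses may differ.

```python
def unwrap_scientific_metadata(record):
    """Return the user-supplied fields from a possibly nested metadata record."""
    if not record:
        return {}
    inner = record.get("scientific_metadata")
    # Use the nested payload when present; otherwise treat the record as already flat.
    return inner if isinstance(inner, dict) else record


# Record copied from the output above.
record = {
    "id": 107978,
    "dataset_unique_id": "0tcy5tt115xs7000n802q2fxsr",
    "scientific_metadata": {
        "temperature_range": "273-363 K",
        "measurement_method": "3-omega method",
        "equipment": "Lakeshore 336 + SR830 lock-in",
        "sample_type": "silicon wafer",
    },
}

fields = unwrap_scientific_metadata(record)
for key, value in fields.items():
    print(f"  {key}: {value}")
```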


You can also retrieve a dataset's keywords and thumbnails:

In [8]:
# Get keywords
keywords = client.datasets.get_keywords(dataset_id)
if keywords:
    keyword_list = [kw.get('keyword', '') for kw in keywords]
    print(f"Keywords: {', '.join(keyword_list)}")
else:
    print("Keywords: None")

# Get thumbnails
thumbnails = client.datasets.get_thumbnails(dataset_id)
print(f"Number of thumbnails: {len(thumbnails)}")

Keywords: thermal, silicon, conductivity, tutorial
Number of thumbnails: 0


<a id='link-sample-dataset'></a>
## 6. Linking a Sample to a Dataset

Associate a dataset with a sample to indicate which sample the data comes from:

In [9]:
# Link sample to dataset
result = client.samples.add_to_dataset(
    sample_id=sample_id,
    dataset_id=dataset_id
)

print(f"✓ Sample {sample_id} linked to dataset {dataset_id}")

# Verify the link
datasets_for_sample = client.datasets.list(sample_id=sample_id)
print(f"\nDatasets linked to sample {sample_id}: {len(datasets_for_sample)}")
for ds in datasets_for_sample:
    print(f"  - {ds['unique_id']}: {ds.get('dataset_name', 'N/A')}")

✓ Sample 0tcy5e1n1hs53000kqzr2peykr linked to dataset 0tcy5tt115xs7000n802q2fxsr

Datasets linked to sample 0tcy5e1n1hs53000kqzr2peykr: 2
  - 0tcy5mfwcxyxs000fs84n0vacw: Thermal Conductivity Measurement - Sample A (Tutorial)
  - 0tcy5tt115xs7000n802q2fxsr: Thermal Conductivity Measurement - Sample A (Tutorial)


<a id='link-datasets'></a>
## 7. Linking Two Datasets (Parent-Child)

Create hierarchical relationships between datasets. For example, link a processed dataset to its raw data:

In [10]:
# Create a second dataset (processed data)
processed_dataset = BaseDataset(
    project_id=PROJECT_ID,
    measurement="thermal_conductivity_analysis",
    dataset_name="Processed Thermal Conductivity Data (Tutorial)",
    public=False
)

result_processed = client.datasets.create(
    dataset=processed_dataset,
    scientific_metadata={
        "thermal_conductivity_300K": 148.5,
        "thermal_conductivity_unit": "W/m·K",
        "processing_method": "polynomial curve fitting (order 2)",
        "uncertainty": 1.8,
        "parent_dataset": dataset_with_files_id
    },
    keywords=["processed", "thermal", "conductivity", "analysis", "tutorial"]
)

processed_dataset_id = result_processed['dsid']
print(f"✓ Processed dataset created: {processed_dataset_id}")

# Link datasets: raw data (parent) -> processed data (child)
link_result = client.datasets.link_parent_child(
    parent_dataset_id=dataset_with_files_id,
    child_dataset_id=processed_dataset_id
)

print("\n✓ Datasets linked successfully!")
print(f"  Parent (raw data): {dataset_with_files_id}")
print(f"  Child (processed): {processed_dataset_id}")

# List children of parent dataset
children = client.datasets.list_children(dataset_with_files_id)
print(f"\nChild datasets of {dataset_with_files_id}: {len(children)}")
for child in children:
    print(f"  - {child['unique_id']}: {child.get('dataset_name', 'N/A')}")

✓ Processed dataset created: 0tcy5v7sknwks000dxx1rbvpm8

✓ Datasets linked successfully!
  Parent (raw data): 0tcy5tvd65rjz00039fj0k8nyr
  Child (processed): 0tcy5v7sknwks000dxx1rbvpm8

Child datasets of 0tcy5tvd65rjz00039fj0k8nyr: 1
  - 0tcy5v7sknwks000dxx1rbvpm8: Processed Thermal Conductivity Data (Tutorial)
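
Parent-child links form a provenance chain that you can follow from a processed dataset back to its raw source. The sketch below models that idea locally with a plain dictionary, using the two IDs from the output above. It is an illustration of the concept rather than the client API: `lineage` is a hypothetical helper, and the single-parent-per-dataset assumption is a simplification made for this example.

```python
# Child -> parent links as a plain dict (a local model, not the client API).
parent_of = {
    "0tcy5v7sknwks000dxx1rbvpm8": "0tcy5tvd65rjz00039fj0k8nyr",  # processed -> raw
}


def lineage(dataset_id, parent_of):
    """Return [dataset_id, parent, grandparent, ...], stopping at a root or a cycle."""
    chain = [dataset_id]
    seen = {dataset_id}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against accidental cycles
            break
        chain.append(parent)
        seen.add(parent)
    return chain


chain = lineage("0tcy5v7sknwks000dxx1rbvpm8", parent_of)
print(" <- ".join(chain))
```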


<a id='link-samples'></a>
## 8. Linking Two Samples (Parent-Child)

Create hierarchical relationships between samples. For example, link a subsample to its parent sample:

In [11]:
# Create a subsample
subsample = client.samples.create(
    sample_name="Silicon Wafer A - Region 1 (Tutorial)",
    project_id=PROJECT_ID,
    description="Sub-region of wafer A for localized measurements (tutorial example)"
)

subsample_id = subsample['unique_id']
print(f"✓ Subsample created: {subsample_id}")

# Link samples: parent sample -> subsample
link_result = client.samples.link(
    parent_id=sample_id,
    child_id=subsample_id
)

print("\n✓ Samples linked successfully!")
print(f"  Parent: {sample_id}")
print(f"  Child: {subsample_id}")

# List children of parent sample
children = client.samples.list_children(sample_id)
print(f"\nChild samples of {sample_id}: {len(children)}")
for child in children:
    print(f"  - {child['unique_id']}: {child.get('sample_name', 'N/A')}")

✓ Subsample created: 0tcy5qrjehx0n0006p5a0y8ttg

✓ Samples linked successfully!
  Parent: 0tcy5e1n1hs53000kqzr2peykr
  Child: 0tcy5qrjehx0n0006p5a0y8ttg

Child samples of 0tcy5e1n1hs53000kqzr2peykr: 1
  - 0tcy5qrjehx0n0006p5a0y8ttg: Silicon Wafer A - Region 1 (Tutorial)


<a id='add-thumbnail'></a>
## 9. Adding a Thumbnail to a Dataset

Upload a thumbnail image to provide a visual preview of your dataset:

In [12]:
# Path to thumbnail image
thumbnail_path = str(DATA_DIR / "thermal_measurement_preview.png")

# Verify file exists
if Path(thumbnail_path).exists():
    print(f"✓ Thumbnail file found: {Path(thumbnail_path).name}")
    
    # Add thumbnail to dataset
    result = client.datasets.add_thumbnail(
        dsid=dataset_id,
        file_path=thumbnail_path,
        thumbnail_name="thermal_measurement_preview"
    )
    
    print(f"\n✓ Thumbnail added to dataset {dataset_id}")
    
    # List all thumbnails for the dataset
    thumbnails = client.datasets.get_thumbnails(dataset_id)
    print("\nThumbnails for dataset:")
    for thumb in thumbnails:
        print(f"  - {thumb.get('name', 'unnamed')}")
else:
    print(f"✗ Thumbnail file not found: {thumbnail_path}")

✓ Thumbnail file found: thermal_measurement_preview.png

✓ Thumbnail added to dataset 0tcy5tt115xs7000n802q2fxsr

Thumbnails for dataset:
  - unnamed
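
The listing prints `unnamed` because `thumb.get('name', 'unnamed')` falls back to its default, which means the returned record has no `name` key. One way to find the right field is to print the record's keys and then try a few candidates. The sketch below shows that approach on a made-up record: the record shape and the `thumbnail_name` key are assumptions for illustration only, and `label_for` is a hypothetical helper, not part of the Crucible client.

```python
def label_for(record, candidate_keys, default="unnamed"):
    """Return the first non-empty value found under any of the candidate keys."""
    for key in candidate_keys:
        value = record.get(key)
        if value:
            return value
    return default


# Hypothetical record; inspect a real one with sorted(thumb) to see its actual keys.
thumb = {"thumbnail_name": "thermal_measurement_preview"}
print("Available keys:", sorted(thumb))

label = label_for(thumb, ["name", "thumbnail_name"])
print(f"  - {label}")
```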


## Summary

This notebook demonstrated the core Crucible operations:

1. ✓ **Configuration** - Set up API credentials with `crucible config init`
2. ✓ **Create Sample** - `client.samples.create()`
3. ✓ **Create Dataset** - `client.datasets.create()` with optional files
4. ✓ **List Datasets** - `client.datasets.list(project_id=...)`
5. ✓ **Get Dataset Details** - `client.datasets.get(dsid=..., include_metadata=True)`
6. ✓ **Link Sample to Dataset** - `client.samples.add_to_dataset()`
7. ✓ **Link Datasets** - `client.datasets.link_parent_child()`
8. ✓ **Link Samples** - `client.samples.link()`
9. ✓ **Add Thumbnail** - `client.datasets.add_thumbnail()`

### Resource IDs Created in This Tutorial

The following variables contain IDs of resources created in this notebook:

In [None]:
# Display all IDs created
print("Resource IDs created in this tutorial:")
print("\nSamples:")
print(f"  sample_id = {sample_id}")
print(f"  subsample_id = {subsample_id}")
print("\nDatasets:")
print(f"  dataset_id = {dataset_id}")
print(f"  dataset_with_files_id = {dataset_with_files_id}")
print(f"  processed_dataset_id = {processed_dataset_id}")

print("\n" + "="*60)
print("You can open any of these resources in your browser using:")
print("="*60)
print(f"\n  crucible open {sample_id}")
print(f"  crucible open {dataset_id}")
print(f"  crucible open {dataset_with_files_id}")
print("\nThe 'crucible open' command works with any dataset, sample, or project ID.")
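
These variables disappear when the kernel restarts, so you may want to save the IDs for a later session. The sketch below writes them to a JSON file and reads them back using only the standard library. The ID values are copied from the outputs earlier in this notebook, the file name `tutorial_ids.json` is a hypothetical choice, and the temporary directory is there only so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

# IDs copied from the cell outputs above; in the notebook, use the variables directly.
resource_ids = {
    "sample_id": "0tcy5e1n1hs53000kqzr2peykr",
    "subsample_id": "0tcy5qrjehx0n0006p5a0y8ttg",
    "dataset_id": "0tcy5tt115xs7000n802q2fxsr",
    "dataset_with_files_id": "0tcy5tvd65rjz00039fj0k8nyr",
    "processed_dataset_id": "0tcy5v7sknwks000dxx1rbvpm8",
}

with tempfile.TemporaryDirectory() as tmp:
    ids_file = Path(tmp) / "tutorial_ids.json"  # hypothetical file name
    ids_file.write_text(json.dumps(resource_ids, indent=2))

    # A later session would start from this read.
    restored = json.loads(ids_file.read_text())

print(f"Restored {len(restored)} IDs")
```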

### Additional Common Operations

Here are some other useful operations you can perform:

In [15]:
# Update dataset metadata
# client.datasets.update_scientific_metadata(
#     dataset_id, 
#     {"new_field": "value", "updated_field": "new_value"}
# )

# Add additional keywords
# client.datasets.add_keyword(dataset_id, "new-keyword")

# Upload additional files to existing dataset
# client.datasets.upload_file(dataset_id, "/path/to/file.txt")

# Request SciCat upload
# client.datasets.request_scicat_upload(dataset_id)

# List all samples in a project
samples = client.samples.list(project_id=PROJECT_ID, limit=999999)
print(f"Total samples in project: {len(samples)}")

# List all projects
projects = client.projects.list(limit=9999)
print(f"Total accessible projects: {len(projects)}")

Total samples in project: 2
Total accessible projects: 187


### Additional Resources

- **Documentation**: See `crucible/cli/README.md` for CLI usage
- **API Reference**: Check docstrings in `crucible/resources/` for all available methods
- **Examples**: More examples in `crucible/parsers/` for specialized data formats (LAMMPS, MatEnsemble)
- **Data Files**: Example data files used in this tutorial are in `examples/data/`