## Crucible Python Client Tutorial

This notebook demonstrates how to use the Crucible Python Client to interact with the Crucible Data Platform.
<br>
Run `help(client.{function_name})` to see details about any client function, replacing `{function_name}` with the name of the function you want to inspect.

### Setup

First, import the client and initialize it with your API credentials.

In [None]:
import os
import mfid
from dotenv import load_dotenv
from pycrucible import CrucibleClient
from pycrucible.models import BaseDataset

# Load environment variables
load_dotenv()

# Initialize the client
api_url = 'https://crucible.lbl.gov/testapi'
api_key = os.environ.get("admin_apikey")  # name of the environment variable that holds your API key

client = CrucibleClient(api_url, api_key)
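
`os.environ.get` returns `None` when the variable is missing, so a forgotten `.env` entry would only surface later as an authentication error. A minimal sketch of an explicit check follows; the helper name `require_env` is hypothetical and not part of the client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Usage in this notebook (commented out so the sketch runs without credentials):
# api_key = require_env("admin_apikey")
```

Failing early keeps the error message close to its cause.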

### 1. Searching for Datasets

Use `list_datasets()` to search for datasets with optional filters.

In [None]:
# List datasets 
datasets = client.list_datasets(limit=1000)
print(f"Found {len(datasets)} datasets")
print(f"\nFirst dataset: {datasets[0]['dataset_name']}")

In [None]:
datasets[0]

In [None]:
help(client.list_datasets)
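
Indexing `datasets[0]` raises an `IndexError` when a search returns nothing. The sketch below guards against that case. It uses stand-in records rather than a live query, and the helper name `first_dataset_name` is hypothetical.

```python
def first_dataset_name(datasets, default="<no datasets returned>"):
    """Return the name of the first dataset, or `default` if the list is empty."""
    if not datasets:
        return default
    return datasets[0].get("dataset_name", default)


# Stand-in records using field names that appear in this notebook.
sample_results = [{"dataset_name": "demo", "unique_id": "abc"}]

print(first_dataset_name(sample_results))
print(first_dataset_name([]))
```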

#### Available Filters

You can filter datasets using various parameters:

In [None]:
# Filter by keyword
keyword_datasets = client.list_datasets(keyword='tem', limit=5)
print(f"Datasets with keyword 'tem': {len(keyword_datasets)}")

# Filter by instrument
instrument_datasets = client.list_datasets(instrument_name='titanx', limit=5)
print(f"Datasets from 'titanx' instrument: {len(instrument_datasets)}")

# Filter by owner ORCID
owner_datasets = client.list_datasets(owner_orcid='0009-0001-9493-2006', limit=5)
print(f"Datasets by owner: {len(owner_datasets)}")

# Combine multiple filters
filtered = client.list_datasets(keyword='tem', instrument_name='titanx', limit=5)
print(f"Datasets matching multiple filters: {len(filtered)}")

In [None]:
filtered[0]
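
A mistyped ORCID in `owner_orcid` would simply return no matches. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, so the format can be verified locally before querying. This is a sketch with hypothetical helper names; it does not confirm that the iD is registered.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X layout and the trailing check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


print(is_valid_orcid("0009-0001-9493-2006"))
```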

In [None]:
# Search datasets by sample ID
sample_id = '0t3q9zq7enrhf0004dvevszkmm'  # Example sample ID
sample_datasets = client.list_datasets(sample_id=sample_id)
print(f"Datasets for sample {sample_id}: {len(sample_datasets)}")

In [None]:
# Search datasets by sample ID, restricted to a data format
sample_id = '0t3q9zq7enrhf0004dvevszkmm'  # Example sample ID
sample_datasets = client.list_datasets(sample_id=sample_id, data_format='h5')
print(f"Datasets for sample {sample_id}: {len(sample_datasets)}")
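
When filters come from user input, some of them may be unset. One possible pattern is to collect them in a dictionary, drop the `None` entries, and unpack the rest into `list_datasets()`. The helper `build_filters` is a hypothetical name; the filter names are the ones used in the cells above.

```python
def build_filters(**candidates):
    """Keep only the filters that were actually given a value."""
    return {name: value for name, value in candidates.items() if value is not None}


filters = build_filters(keyword="tem", instrument_name=None, data_format="h5")
print(filters)

# With a live client, the remaining filters can be unpacked into the call:
# client.list_datasets(limit=5, **filters)
```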

### 2. Adding Datasets

There are two main ways to add datasets: from JSON metadata only, or with a file upload.

#### Option A: Add Dataset from JSON (metadata only)

In [None]:
my_dataset = BaseDataset(dataset_name='TEST3 - My New Dataset',
                         owner_orcid='0009-0001-9493-2006',
                         project_id='MFP08540',
                         instrument_name='titanx',
                         measurement='haadf',
                         public=False
                        )

example_scientific_metadata = {'voltage': '200kV', 'magnification': '50000x'}
example_keywords = ['tem', 'nanoparticles']

result = client.create_new_dataset(dataset = my_dataset,
                                   scientific_metadata=example_scientific_metadata,
                                   keywords=example_keywords
                                   )

dsid = result['created_record']['unique_id']
print(f"Created dataset: {dsid}")

In [None]:
# Check how the new record looks in the database
new_ds = client.get_dataset(dsid = dsid)
new_ds

#### Option B: Add Dataset with File Upload

In [None]:
help(client.create_new_dataset_from_files)

In [None]:
file_path = './test-data/0sdazahr0nxh300075jj73j2kg_240119_144139_hyperspec_picam_mcl.h5'
my_file_dataset = BaseDataset(dataset_name='TEST4 - Dataset with File',
                              unique_id='0sdazahr0nxh300075jj73j2kg', 
                              owner_orcid='0009-0001-9493-2006',
                              project_id='MFP08540')

result = client.create_new_dataset_from_files(
    dataset = my_file_dataset,
    files_to_upload=[file_path],
    scientific_metadata={'notes': 'this is a test dataset we keep reusing'},
    keywords=['test'],
    ingestor='HyperspecScopeFoundryH5Ingestor',  # Optional: specify ingestion class
    wait_for_ingestion_response=True
)

dsid = result['created_record']['unique_id']
print(f"Created dataset with file: {dsid}")
print(f"Ingestion status: {result['ingestion_request']['status']}")
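
The upload above assumes that every path in `files_to_upload` exists. A quick local check can catch a wrong path before any request is sent. This sketch uses only the standard library and a temporary directory as stand-in data; `missing_files` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def missing_files(paths):
    """Return the paths (as strings) that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Demonstrate with one file that exists and one that does not.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.h5"
    present.write_bytes(b"")
    absent = Path(tmp) / "absent.h5"
    report = missing_files([present, absent])

print(report)
```

In the notebook, `missing_files([file_path])` should return an empty list before `create_new_dataset_from_files` is called.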

In [None]:
# Check how the new record looks in the database
dsid = result['created_record']['unique_id']
new_ds = client.get_dataset(dsid = dsid)
new_ds
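
Both creation calls above return a dictionary that is read with nested keys (`created_record`, `ingestion_request`). The sketch below reads those keys defensively, on the assumption that `ingestion_request` may be absent for a metadata-only dataset. The stand-in dictionaries are illustrative, not real responses, and `summarize_result` is a hypothetical helper.

```python
def summarize_result(result: dict) -> dict:
    """Extract the dataset ID and ingestion status, tolerating missing keys."""
    created = result.get("created_record") or {}
    ingestion = result.get("ingestion_request") or {}
    return {
        "dsid": created.get("unique_id"),
        "ingestion_status": ingestion.get("status"),
    }


# Illustrative stand-ins shaped like the keys accessed in this notebook.
metadata_only = {"created_record": {"unique_id": "example-id"}}
with_file = {
    "created_record": {"unique_id": "example-id"},
    "ingestion_request": {"status": "complete"},
}

print(summarize_result(metadata_only))
print(summarize_result(with_file))
```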

### 3. Updating Datasets

Update existing dataset fields or scientific metadata.

In [None]:
# Update basic dataset fields
dsid = result['created_record']['unique_id']
updated = client.update_dataset(
    dsid,
    dataset_name='Updated Dataset Name',
    public=True,
    measurement='Hyperspectral Raman'
)

print(f"Updated dataset: {updated['dataset_name']}")
print(f"Now public: {updated['public']}")
client.get_dataset(dsid = dsid)

In [None]:
# Update scientific metadata (merge with existing)
dsid = result['created_record']['unique_id']
new_metadata = {
    'new_parameter': 'new_value',
    'analysis_date': '2024-01-15'
}

client.update_scientific_metadata(dsid, new_metadata, overwrite=False)
print("Scientific metadata updated (merged)")
client.get_dataset(dsid = dsid, include_metadata = True)

In [None]:
# Or overwrite all scientific metadata
complete_metadata = {
    'voltage': '300kV',
    'magnification': '100000x'
}

client.update_scientific_metadata(dsid, complete_metadata, overwrite=True)
print("Scientific metadata replaced")
client.get_dataset(dsid = dsid, include_metadata = True)
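
The difference between `overwrite=False` and `overwrite=True` can be pictured with plain dictionaries. This sketch assumes the merge is a shallow, key-by-key update; how the server treats nested values is not shown in this notebook, so treat it as an illustration only.

```python
def apply_update(existing: dict, new: dict, overwrite: bool) -> dict:
    """Return the metadata that would result under the assumed semantics."""
    if overwrite:
        return dict(new)          # existing keys are discarded
    return {**existing, **new}    # existing keys are kept, new keys win on conflict


existing = {"voltage": "200kV", "magnification": "50000x"}

merged = apply_update(existing, {"analysis_date": "2024-01-15"}, overwrite=False)
replaced = apply_update(existing, {"voltage": "300kV"}, overwrite=True)

print(merged)
print(replaced)
```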

### 4. Working with Samples

Create samples and link them to datasets.

#### Create a New Sample

In [None]:
# Create a new sample
sample = client.add_sample(
    unique_id=mfid.mfid()[0],
    sample_name='TEST00000000',
    description='Au Nanoparticles Batch 42',
    owner_orcid='0009-0001-9493-2006',
    creation_date='2024-01-15'
)

sample_id = sample['unique_id']
print(f"Created sample: {sample_id}")
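
The `creation_date` above is a hard-coded `YYYY-MM-DD` string. If you want the current date instead, the standard library can produce the same layout. This sketch assumes the API accepts any date in that format, as in the example; `iso_date` is a hypothetical helper.

```python
from datetime import date


def iso_date(d=None) -> str:
    """Format a date as YYYY-MM-DD, defaulting to today."""
    return (d or date.today()).isoformat()


print(iso_date(date(2024, 1, 15)))  # 2024-01-15
print(iso_date())                   # today's date in the same layout
```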

In [None]:
client.get_sample(sample_id)

#### Link Sample to Dataset

In [None]:
# Link the sample created above to an existing dataset
dataset_id = '0sdazahr0nxh300075jj73j2kg'
# sample_id still holds the ID of the sample created above

link = client.add_sample_to_dataset(dataset_id, sample_id)
print(f"Linked sample {sample_id} to dataset {dataset_id}")

In [None]:
client.get_sample(sample_id)

#### Get Datasets for a Sample

In [None]:
# Find all datasets associated with a sample
sample_datasets = client.list_datasets(sample_id=sample_id)
print(f"Found {len(sample_datasets)} datasets for sample {sample_id}")

for ds in sample_datasets[:3]:  # Show first 3
    print(f"  - {ds['dataset_name']} ({ds['unique_id']})")
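
For samples with many datasets, a count per field is often easier to read than a full listing. The sketch below tallies stand-in records by `instrument_name`. It assumes returned records carry that field, which this notebook only shows as an input parameter. `count_by` is a hypothetical helper.

```python
from collections import Counter


def count_by(datasets, field):
    """Count datasets per value of `field`, grouping missing values as 'unknown'."""
    return Counter(ds.get(field) or "unknown" for ds in datasets)


# Stand-in records; real ones would come from client.list_datasets(sample_id=...).
records = [
    {"dataset_name": "a", "instrument_name": "titanx"},
    {"dataset_name": "b", "instrument_name": "titanx"},
    {"dataset_name": "c"},
]

counts = count_by(records, "instrument_name")
for value, n in counts.most_common():
    print(f"{value}: {n}")
```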

#### Get Sample Information

In [None]:
# Retrieve sample details
sample = client.get_sample(sample_id)
print(f"Sample name: {sample['sample_name']}")
print(f"Description: {sample.get('description', 'N/A')}")
print(f"Owner ORCID: {sample.get('owner_orcid', 'N/A')}")

#### List Parents of a Sample

In [None]:
sample_id = '0ta8e4hgfnz710006jdac3cttw'
client.list_parents_of_sample(sample_id)

#### List Children of a Sample

In [None]:
client.list_children_of_sample(sample_id)
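
The two calls above return one level of the sample hierarchy. To collect all descendants, the lookup can be repeated breadth-first. This is a sketch that assumes the lookup returns a list of dictionaries with a `unique_id` key, which this notebook does not confirm. It runs against a toy in-memory hierarchy, and `collect_descendants` is a hypothetical helper.

```python
from collections import deque


def collect_descendants(root_id, list_children):
    """Breadth-first walk over children, returning each descendant ID once."""
    seen = set()
    order = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in list_children(current):
            child_id = child["unique_id"]  # assumed response shape
            if child_id not in seen and child_id != root_id:
                seen.add(child_id)
                order.append(child_id)
                queue.append(child_id)
    return order


# Toy hierarchy standing in for the platform: "d" has two parents.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def toy_children(sample_id):
    return [{"unique_id": child} for child in toy.get(sample_id, [])]


descendants = collect_descendants("a", toy_children)
print(descendants)
```

With a live client, `collect_descendants(sample_id, client.list_children_of_sample)` would be the analogous call if the response shape matches the assumption.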