# Pycrucible Tutorial

This tutorial demonstrates how to use the `pycrucible` client to manage data through the Crucible Platform:
- Retrieve your Crucible user API key
- Upload datasets to Crucible with automated metadata parsing
- Upload datasets to Crucible with manually curated metadata
- Associate datasets with batches
- Query datasets by batch or sample
- Query samples by batch or dataset
- Upload sample synthesis metadata
- Download data
- Generate an AutoBot batch report

In [None]:
import os
import json
from datetime import datetime
from pycrucible import CrucibleClient
import uuid
from typing import List, Dict
import pprint

#### Step 1: Set up the Crucible Python Client

In a web browser, navigate to https://crucible.lbl.gov/testapi/user_apikey. You will be prompted to log in with your ORCID. After logging in, copy the resulting API key into an environment variable named `CRUCIBLE_APIKEY`, which the next cell reads.

In [None]:
# Configuration - Update these with your credentials
API_URL = "https://crucible.lbl.gov/testapi"  # Replace with your API URL
API_KEY = os.getenv('CRUCIBLE_APIKEY')

# Initialize the client
client = CrucibleClient(API_URL, API_KEY)
print("Crucible client initialized successfully!")
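
`os.getenv` returns `None` when the variable is unset, so a missing key would only surface later as a failed request. A minimal sketch of a fail-fast check follows; the helper name `require_api_key` is hypothetical and not part of pycrucible.

```python
import os

def require_api_key(var_name="CRUCIBLE_APIKEY"):
    """Return the API key stored in the environment, or raise if it is missing."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "copy your API key from the user_apikey page first."
        )
    return key
```

With this in place, `API_KEY = require_api_key()` could replace the bare `os.getenv` call before the client is constructed.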

#### Step 2: Use the Crucible Python client to upload and ingest a batch of SpecRun datasets

In [None]:
data_folder = "tutorial_data/inline_characterization"
# Collect every SpecRun HDF5 file in the data folder
h5_files = [f for f in os.listdir(data_folder) if f.endswith('spec_run.h5')]

# Only the first file is uploaded here; remove the [0:1] slice to upload the whole batch
for h5file in h5_files[0:1]:
    h5 = os.path.join(data_folder, h5file)
    print(h5)
    ds = client.build_new_dataset_from_file(files_to_upload = [h5],
                                            dataset_name = f"mkw-{os.path.basename(h5)}",
                                            ingestor = "SpinbotSpecRunIngestor",
                                            verbose = False)
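
`os.listdir` returns names in arbitrary order, so a slice such as `h5_files[0:1]` may pick a different file on different machines. The sketch below shows one way to make the file search deterministic, assuming the same `spec_run.h5` suffix; `find_spec_run_files` is a hypothetical helper name.

```python
from pathlib import Path

def find_spec_run_files(folder, suffix="spec_run.h5"):
    """Return full paths of files in `folder` ending with `suffix`, sorted by name."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
```

The returned paths can be passed directly in `files_to_upload`, which removes the need for a separate `os.path.join` call.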

##### Check out the data you just uploaded

In [None]:
found_ds = client.get_dataset('0t4pt8d375vt70006zph06dfh0', include_metadata=True)
pprint.pprint(found_ds)

In [None]:
# TODO: add a client function for ingesting from a dataset ID (dsid)
client.list_datasets(file_to_upload = 'api-uploads/tutorial_data/inline_characterization/yrliu98_S-pMeMBAI-pre-2_1_5_run1_spec_run.h5')

In [None]:
# query by dataset
client.list_samples(dataset_id = '0t4ps2dvydsfq000ar7vs9nrr8')

In [None]:
batch_id = '0t3h7ymbm5s27000z6tt82zvx4'

In [None]:
# query by batch id
client.list_samples(parent_id = batch_id)

In [None]:
# see all datasets for a batch
client.list_datasets(sample_id = batch_id)
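
The download step later in this tutorial reads a `unique_id` key from each record returned by `list_datasets`. Assuming the records are dictionaries of that shape, here is a small sketch for collecting IDs and checking whether a given dataset is already linked to a batch; both helper names are hypothetical.

```python
def dataset_ids(records):
    """Collect the 'unique_id' of every dataset record, skipping records without one."""
    return [r["unique_id"] for r in records if "unique_id" in r]

def is_linked(records, dataset_id):
    """Return True if `dataset_id` appears among the listed dataset records."""
    return dataset_id in dataset_ids(records)
```

For example, `is_linked(client.list_datasets(sample_id = batch_id), some_dataset_id)` would report whether `some_dataset_id` shows up under the batch.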

#### Step 3: Send the dataset information to the data catalog (SciCat)

In [None]:
client.send_to_scicat(dsid = '0t4qtjskj9sjv000n56t1x66j0', wait_for_scicat_response = True)

Go to https://mf-scicat.lbl.gov to get a quick look at your data.

##### Add a project to associate with your data

In [None]:
# Show the documentation for add_project without calling it
help(client.add_project)

In [None]:
client.add_project(project_info = {"project_id":"AUM_DEMO",
                                   "organization":"Summer School",
                                   "project_lead_email":"mkwall@lbl.gov"})

#### Step 4: Use the Crucible Python client to upload and ingest a photo of the batch as a dataset

In [None]:
metadata_to_add = {'comments': 'this is a fake dataset', 
                   'weather': 'sunny',
                   'iphone_version': 11
                  }
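
Hand-written metadata dictionaries are easy to break with values such as `datetime` objects or sets. The sketch below is a pre-upload check that assumes the metadata must be JSON-serializable, which the source does not confirm; `check_json_serializable` is a hypothetical helper.

```python
import json

def check_json_serializable(metadata):
    """Return the metadata after a JSON round trip; raise ValueError if it cannot be encoded."""
    try:
        return json.loads(json.dumps(metadata))
    except (TypeError, ValueError) as exc:
        raise ValueError(f"metadata is not JSON-serializable: {exc}") from exc
```

Calling `check_json_serializable(metadata_to_add)` before the upload would surface an unsupported value locally rather than during ingestion.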

In [None]:
batch_name = 'S-pMeMBAI-pre-2'
data_folder = "tutorial_data/photo_captures"
p1 = os.path.join(data_folder, 'DSC_0001.jpg')
p2 = os.path.join(data_folder, 'DSC_0002.jpg')
ds = client.build_new_dataset_from_file(files_to_upload = [p1, p2],
                                        dataset_name = 'S-pMeMBAI-pre-photo-capture',
                                        project_id = "AUM_DEMO",
                                        owner_orcid = None,
                                        instrument_name = "PhotoBox",
                                        measurement = "iphone_capture",
                                        session_name = 'S-pMeMBAI-pre-2', 
                                        creation_time = None,
                                        source_folder = data_folder,
                                        scientific_metadata = metadata_to_add,
                                        keywords = [batch_name], 
                                        ingestor = 'ImageIngestor',
                                        verbose = False, 
                                        wait_for_ingestion_response = True)

#### Step 5: Link this new dataset to the batch it is associated with

In [None]:
client.add_dataset_to_sample(dataset_id = '0t4pvp3s1dvc30006g6t6fq7qc', sample_id = batch_id)

In [None]:
client.list_datasets(sample_id = batch_id)

#### Step 6: Add Additional Metadata to Samples

This step shows how to attach custom metadata to individual samples.

In [None]:
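from pydantic import BaseModel

# Schema for Spinbot sample synthesis metadata.
# The original omitted the field types; float is an assumption for every numeric field.
class SpinbotSampleMetadata(BaseModel):
    sample_id: str
    spin_duration_s: float
    spin_velocity_rpm: float
    dispense_delay_s: float
    pipette_height_mm: float
    dispense_speed_ul_s: float
    precursor_b_volume_ul: float
    annealing_duration_s: float
    molar_ratio_fai_macl: float

# spinbot_batch_md is not defined in this tutorial; it is assumed to be a dict
# whose keys match the fields above and must be created before this call.
client.add_sample_metadata(sample_id = batch_id, metadata = {**spinbot_batch_md})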
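
The `spinbot_batch_md` dictionary is not defined in this tutorial. One way to build such a dictionary is to declare the fields once and convert an instance to a plain dict. The sketch below uses a standard-library dataclass as a stand-in for the pydantic model and covers only three of the fields; the class name, the `float` types, and the numeric values are placeholders, not real synthesis parameters.

```python
from dataclasses import asdict, dataclass, fields

@dataclass
class SpinStepMetadata:
    """Hypothetical subset of the spin-coating fields; float types are assumed."""
    spin_duration_s: float
    spin_velocity_rpm: float
    annealing_duration_s: float

    def __post_init__(self):
        # Coerce every field to float so that bad input fails here, not at upload time.
        for f in fields(self):
            setattr(self, f.name, float(getattr(self, f.name)))

# Placeholder values for illustration only.
example_md = asdict(
    SpinStepMetadata(spin_duration_s="30", spin_velocity_rpm=4000, annealing_duration_s=600)
)
```

A dict built this way could then be passed as the `metadata` argument of `client.add_sample_metadata`.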

#### Step 7: Download the data associated with a batch

Download all datasets associated with a batch.

In [None]:
datasets_in_batch = client.list_datasets(sample_id = batch_id)
for ds in datasets_in_batch:
    # Print the ID of each dataset before downloading it
    print(ds['unique_id'])
    client.download_dataset(dsid = ds['unique_id'])
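
A single failed download would stop the loop above partway through the batch. The sketch below keeps going and reports which datasets failed. It takes the download function as an argument so that it makes no assumptions about pycrucible beyond the `unique_id` key used above; `download_all` is a hypothetical helper.

```python
def download_all(records, download_fn):
    """Call download_fn(dataset_id) for each record; return (succeeded, failed) ID lists."""
    succeeded, failed = [], []
    for record in records:
        dsid = record["unique_id"]
        try:
            download_fn(dsid)
            succeeded.append(dsid)
        except Exception as exc:  # keep going so one bad dataset does not stop the batch
            print(f"download failed for {dsid}: {exc}")
            failed.append(dsid)
    return succeeded, failed
```

In this notebook it could be called as `download_all(datasets_in_batch, lambda dsid: client.download_dataset(dsid = dsid))`.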

#### Step 8: Generate a Batch Report Card