# Assemble subset .h5ad files

For QC and inspection of our cell type labels, we'll assemble samples based on subject metadata to get some reusable units that will be helpful for downstream analysis. We'll group subjects by their cohort (BR1 or BR2), biological sex (Female or Male), and CMV status (Negative or Positive). This should break up our very large dataset into chunks of a couple of million cells apiece.

To each of these subsets, we'll add our CMV and BMI metadata, as well as cell type label predictions and scrublet scores and calls.

We'll carry these subsets forward into QC filtering and cell type-based doublet and mislabeling clean-up in later notebooks.

## Load Packages

`anndata`: Data structures for scRNA-seq  
`datetime`: date and time functions  
`h5py`: HDF5 file I/O  
`hisepy`: The HISE SDK for Python  
`os`: operating system calls  
`pandas`: DataFrame data structures  
`re`: Regular expressions  
`scanpy`: scRNA-seq analysis  
`scipy.sparse`: Sparse matrix data structures  

In [1]:
import anndata
from datetime import date
import h5py
import hisepy
import os
import pandas as pd
from pandas.api.types import is_object_dtype
import re
import scanpy as sc
import scipy.sparse as scs

## Helper functions

In [2]:
def cache_uuid_path(uuid):
    cache_path = '/home/jupyter/cache/{u}'.format(u = uuid)
    if not os.path.isdir(cache_path):
        hise_res = hisepy.reader.cache_files([uuid])
    filename = os.listdir(cache_path)[0]
    cache_file = '{p}/{f}'.format(p = cache_path, f = filename)
    return cache_file

In [3]:
def read_csv_uuid(uuid):
    cache_file = cache_uuid_path(uuid)
    res = pd.read_csv(cache_file)
    return res

In [4]:
def read_parquet_uuid(uuid):
    cache_file = cache_uuid_path(uuid)
    res = pd.read_parquet(cache_file)
    return res
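The caching pattern these helpers share (fetch only when the cache directory is missing, then take the first file in that directory) can be tried without HISE access. The sketch below is a minimal stand-in under stated assumptions: `cached_file_path`, `fake_fetch`, and `counting_fetch` are hypothetical names, and `fake_fetch` only simulates what `hisepy.reader.cache_files()` is assumed to do (create one directory per UUID holding a single file).

```python
import os
import tempfile

def cached_file_path(cache_root, uuid, fetch):
    """Return the first file under cache_root/uuid, calling fetch() only if it is not cached."""
    cache_path = os.path.join(cache_root, uuid)
    if not os.path.isdir(cache_path):
        fetch(cache_root, uuid)
    files = sorted(os.listdir(cache_path))
    if not files:
        raise FileNotFoundError('No cached file found for {u}'.format(u = uuid))
    return os.path.join(cache_path, files[0])

def fake_fetch(cache_root, uuid):
    # Hypothetical download: write a small CSV into a per-UUID directory
    os.makedirs(os.path.join(cache_root, uuid))
    with open(os.path.join(cache_root, uuid, 'meta.csv'), 'w') as f:
        f.write('a,b\n1,2\n')

fetch_calls = []
def counting_fetch(cache_root, uuid):
    # Record each simulated download so we can confirm the cache is reused
    fetch_calls.append(uuid)
    fake_fetch(cache_root, uuid)

cache_root = tempfile.mkdtemp()
first = cached_file_path(cache_root, 'example-uuid', counting_fetch)
second = cached_file_path(cache_root, 'example-uuid', counting_fetch)
print(first == second, len(fetch_calls))
```

The second call finds the directory already present and skips the fetch, which is why re-running the notebook does not download every file again.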

Functions to read pipeline .h5 files as anndata

In [5]:
# define a function to read count data
def read_mat(h5_con):
    mat = scs.csc_matrix(
        (h5_con['matrix']['data'][:], # Count values
         h5_con['matrix']['indices'][:], # Row indices
         h5_con['matrix']['indptr'][:]), # Pointers for column positions
        shape = tuple(h5_con['matrix']['shape'][:]) # Matrix dimensions
    )
    return mat

# define a function to read observation metadata (i.e. cell metadata)
def read_obs(h5con):
    bc = h5con['matrix']['barcodes'][:]
    bc = [x.decode('UTF-8') for x in bc]

    # Initialize the DataFrame with cell barcodes
    obs_df = pd.DataFrame({ 'barcodes' : bc })

    # Get the list of available metadata columns
    obs_columns = h5con['matrix']['observations'].keys()
    
    # For each column
    for col in obs_columns:
        # Read the values
        values = h5con['matrix']['observations'][col][:]
        # Check for byte storage
        if(isinstance(values[0], (bytes, bytearray))):
            # Decode byte strings
            values = [x.decode('UTF-8') for x in values]
        # Add column to the DataFrame
        obs_df[col] = values

    obs_df = obs_df.set_index('barcodes', drop = False)
    
    return obs_df

# define a function to construct an anndata object from an .h5 file
def read_h5_anndata(h5_file):
    # open read-only; the context manager closes the file handle when done
    with h5py.File(h5_file, mode = 'r') as h5_con:
        # extract the expression matrix
        mat = read_mat(h5_con)
        # extract gene names
        genes = h5_con['matrix']['features']['name'][:]
        genes = [x.decode('UTF-8') for x in genes]
        # extract metadata
        obs_df = read_obs(h5_con)
    # construct anndata (transpose so that cells are rows)
    adata = anndata.AnnData(mat.T,
                             obs = obs_df)
    # assign gene names to the variables
    adata.var_names = genes

    # de-duplicate repeated gene names
    adata.var_names_make_unique()
    return adata
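To make the three arrays used by `read_mat()` concrete, here is a small dependency-free sketch that expands a compressed sparse column (CSC) triplet into a dense matrix and then transposes it, mirroring the `mat.T` step in `read_h5_anndata()`. The toy values and the helper name `csc_to_dense` are illustrative assumptions; the genes-by-cells orientation is inferred from the transpose and from gene names being assigned to `var_names`.

```python
def csc_to_dense(data, indices, indptr, shape):
    """Expand CSC arrays into a dense list of rows."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # Entries for this column sit in data[indptr[col]:indptr[col + 1]]
        for k in range(indptr[col], indptr[col + 1]):
            dense[indices[k]][col] = data[k]
    return dense

# Toy matrix: 3 genes (rows) x 2 cells (columns)
data = [5, 1, 2]      # non-zero count values
indices = [0, 2, 1]   # row (gene) index of each value
indptr = [0, 2, 3]    # column (cell) boundaries within data/indices

genes_by_cells = csc_to_dense(data, indices, indptr, (3, 2))

# Transpose so that each row is a cell, as AnnData expects
cells_by_genes = [list(row) for row in zip(*genes_by_cells)]
print(cells_by_genes)
```

In the notebook itself, `scipy.sparse.csc_matrix` keeps the data sparse throughout; the dense expansion here is only for inspection of a tiny example.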

In [40]:
def element_id(n = 3):
    import periodictable
    from random import randrange
    rand_el = []
    for i in range(n):
        el = randrange(0,118)
        rand_el.append(periodictable.elements[el].name)
    rand_str = '-'.join(rand_el)
    return rand_str

## Read sample metadata from HISE

In [6]:
in_uuids = {}

In [7]:
sample_meta_uuid = 'd82c5c42-ae5f-4e67-956e-cd3b7bf88105'
sample_meta = read_csv_uuid(sample_meta_uuid)
in_uuids['sample_meta'] = sample_meta_uuid

downloading fileID: d82c5c42-ae5f-4e67-956e-cd3b7bf88105
Files have been successfully downloaded!


We only need to keep some of the metadata columns that pertain to cohort, subject, and sample. We'll also keep the originating File GUID to help us keep track of provenance. Let's select just these columns:

In [8]:
keep_meta = [
    'cohort.cohortGuid',
    'subject.subjectGuid', 'subject.biologicalSex', 
    'subject.race', 'subject.ethnicity', 'subject.birthYear',
    'sample.sampleKitGuid', 'sample.visitName', 'sample.drawDate',
    'file.id'
]

In [9]:
sample_meta = sample_meta[keep_meta]

In [10]:
cmv_meta_uuid = '9469f67c-b09a-454d-9fb9-f50ff3494d69'
cmv_meta = read_csv_uuid(cmv_meta_uuid)
in_uuids['cmv_meta'] = cmv_meta_uuid

downloading fileID: 9469f67c-b09a-454d-9fb9-f50ff3494d69
Files have been successfully downloaded!


In [11]:
bmi_meta_uuid = 'e507258c-d175-4d8e-a455-5229870dc991'
bmi_meta = read_csv_uuid(bmi_meta_uuid)
in_uuids['bmi_meta'] = bmi_meta_uuid

downloading fileID: e507258c-d175-4d8e-a455-5229870dc991
Files have been successfully downloaded!


## Combine sample-level metadata

In [12]:
cmv_meta = cmv_meta[['subject.subjectGuid', 'subject.cmv']].drop_duplicates()

In [13]:
combined_sample_meta = sample_meta.merge(cmv_meta, on = 'subject.subjectGuid', how = 'left')

In [14]:
bmi_meta = bmi_meta[['sample.sampleKitGuid', 'subject.bmi']]

In [15]:
combined_sample_meta = combined_sample_meta.merge(bmi_meta, on = 'sample.sampleKitGuid', how = 'left')

In [16]:
combined_sample_meta.shape

(868, 12)
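One thing worth checking after left joins like these is that the row count has not grown, which would happen if the right-hand table had more than one row per key. The sketch below uses made-up subject and kit IDs to show why `drop_duplicates()` matters for the CMV table, and how the `validate` argument of `DataFrame.merge()` can enforce the expectation. It illustrates the idea and is not part of the original workflow.

```python
import pandas as pd

# Made-up metadata: subject S1 has two samples, and two identical CMV rows
sample_meta = pd.DataFrame({
    'subject.subjectGuid': ['S1', 'S1', 'S2'],
    'sample.sampleKitGuid': ['K1', 'K2', 'K3'],
})
cmv_meta = pd.DataFrame({
    'subject.subjectGuid': ['S1', 'S1', 'S2'],
    'subject.cmv': ['Negative', 'Negative', 'Positive'],
})

# Without de-duplication, each S1 sample matches two CMV rows
naive = sample_meta.merge(cmv_meta, on = 'subject.subjectGuid', how = 'left')

# De-duplicate, then ask pandas to verify one CMV row per subject
cmv_unique = cmv_meta.drop_duplicates()
checked = sample_meta.merge(
    cmv_unique, on = 'subject.subjectGuid', how = 'left',
    validate = 'many_to_one'
)
print(naive.shape[0], checked.shape[0])
```

With `validate = 'many_to_one'`, pandas raises a `MergeError` if the right-hand keys are not unique, so a silent row duplication becomes an explicit failure.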

In [17]:
combined_sample_meta.head()

| | cohort.cohortGuid | subject.subjectGuid | subject.biologicalSex | subject.race | subject.ethnicity | subject.birthYear | sample.sampleKitGuid | sample.visitName | sample.drawDate | file.id | subject.cmv | subject.bmi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BR1 | BR1001 | Female | Caucasian | Non-Hispanic origin | 1987 | KT00001 | Flu Year 1 Day 0 | 2019-10-01T00:00:00Z | fec489f9-9a74-4635-aa91-d2bf09d1faec | Negative | 22.5924 |
| 1 | BR1 | BR1002 | Male | Caucasian | Non-Hispanic origin | 1991 | KT00002 | Flu Year 1 Day 0 | 2019-10-01T00:00:00Z | 7c0c7979-eebd-4aba-b5b2-6e76b4643623 | Negative | 22.332902 |
| 2 | BR1 | BR1003 | Female | Caucasian | Non-Hispanic origin | 1989 | KT00003 | Flu Year 1 Day 0 | 2019-10-01T00:00:00Z | 40efd03a-cb2f-4677-af42-a056cbfe5a17 | Negative | 20.956658 |
| 3 | BR1 | BR1004 | Male | Caucasian | Non-Hispanic origin | 1989 | KT00004 | Flu Year 1 Day 0 | 2019-10-01T00:00:00Z | 68fbcd34-1d63-461d-8195-df5b8dc61b31 | Negative | 21.50234 |
| 4 | BR1 | BR1005 | Female | Caucasian | Non-Hispanic origin | 1992 | KT00006 | Flu Year 1 Day 0 | 2019-10-01T00:00:00Z | ea8d98e9-e99e-4dc6-9e78-9866e0deac68 | Negative | 20.484429 |


## Read labels and doublet calls from HISE

In [18]:
l1_uuids = ['65c30bb1-111d-4137-9648-5db3f7d88386',
            'd7212d9f-49bc-4f23-94b2-cce164503dcb',
            '7534fb06-9aa6-4aa6-9813-2c3bbb92f9c1',
            '3935e870-cd4d-4daf-81dc-2e94bff46843']
l1_list = []
for l1_uuid in l1_uuids:
    l1_list.append(read_parquet_uuid(l1_uuid))
l1_labels = pd.concat(l1_list)
in_uuids['l1_labels'] = l1_uuids

downloading fileID: 65c30bb1-111d-4137-9648-5db3f7d88386
Files have been successfully downloaded!
downloading fileID: d7212d9f-49bc-4f23-94b2-cce164503dcb
Files have been successfully downloaded!
downloading fileID: 7534fb06-9aa6-4aa6-9813-2c3bbb92f9c1
Files have been successfully downloaded!
downloading fileID: 3935e870-cd4d-4daf-81dc-2e94bff46843
Files have been successfully downloaded!


In [19]:
l1_labels = l1_labels[['barcodes', 'AIFI_L1', 'AIFI_L1_score']]

In [20]:
l2_uuids = ['d2d1d6bf-ea44-44b2-bfa5-09a5321a8ef0',
            '520156ca-bc37-4c7e-895d-fb2259d6ed1e',
            '3c45f3b2-2628-4e8d-8629-28ab8b1045dd',
            '55c53d17-1a31-4a37-8d67-06d8e4218df7']
l2_list = []
for l2_uuid in l2_uuids:
    l2_list.append(read_parquet_uuid(l2_uuid))
l2_labels = pd.concat(l2_list)
in_uuids['l2_labels'] = l2_uuids

downloading fileID: d2d1d6bf-ea44-44b2-bfa5-09a5321a8ef0
Files have been successfully downloaded!
downloading fileID: 520156ca-bc37-4c7e-895d-fb2259d6ed1e
Files have been successfully downloaded!
downloading fileID: 3c45f3b2-2628-4e8d-8629-28ab8b1045dd
Files have been successfully downloaded!
downloading fileID: 55c53d17-1a31-4a37-8d67-06d8e4218df7
Files have been successfully downloaded!


In [21]:
l2_labels = l2_labels[['barcodes', 'AIFI_L2', 'AIFI_L2_score']]

In [22]:
doublet_uuid = 'b2bea832-a41e-4699-9d1d-a05a3ea1cf88'
doublet_labels = read_parquet_uuid(doublet_uuid)
in_uuids['doublet_labels'] = doublet_uuid

downloading fileID: b2bea832-a41e-4699-9d1d-a05a3ea1cf88
Files have been successfully downloaded!


## Combine label sets to simplify merges

In [23]:
all_labels = l1_labels.merge(l2_labels, on = 'barcodes', how = 'left')
all_labels = all_labels.merge(doublet_labels, on = 'barcodes', how = 'left')

In [24]:
all_labels.head()

| | barcodes | AIFI_L1 | AIFI_L1_score | AIFI_L2 | AIFI_L2_score | predicted_doublet | doublet_score |
|---|---|---|---|---|---|---|---|
| 0 | a2d395708e7011ecb207eeccf5f10377 | B cell | 1.0 | Transitional B cell | 0.998853 | False | 0.012551 |
| 1 | a2d3982c8e7011ecb207eeccf5f10377 | T cell | 0.999465 | Memory CD8 T cell | 0.703823 | False | 0.038824 |
| 2 | a2d399d08e7011ecb207eeccf5f10377 | T cell | 0.999995 | MAIT | 0.991137 | False | 0.028543 |
| 3 | a2d39f668e7011ecb207eeccf5f10377 | T cell | 0.99999 | gdT | 0.999946 | False | 0.050659 |
| 4 | a2d3a3948e7011ecb207eeccf5f10377 | T cell | 0.99999 | Treg | 0.999947 | False | 0.034787 |


## Define subsets

In [25]:
subset_columns = ['cohort.cohortGuid', 'subject.biologicalSex', 'subject.cmv']

In [26]:
subset_counts = combined_sample_meta[subset_columns].value_counts()
subset_counts

cohort.cohortGuid  subject.biologicalSex  subject.cmv
BR2                Female                 Positive       163
BR1                Female                 Negative       162
BR2                Male                   Negative       116
BR1                Male                   Negative       107
BR2                Female                 Negative        91
                   Male                   Positive        80
BR1                Female                 Positive        76
                   Male                   Positive        73
Name: count, dtype: int64

In [27]:
subset_sample_meta = combined_sample_meta.groupby(subset_columns)
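Iterating over a multi-column `groupby` yields a tuple of key values alongside each sub-DataFrame, which is what lets the group names be built with `'_'.join(group)`. The sketch below shows this on made-up rows (the `file.id` values are placeholders). It also includes one row with a missing CMV status to show that pandas drops groups with missing keys by default, so a sample without CMV metadata would not land in any subset.

```python
import pandas as pd

# Made-up sample metadata; the last row has no CMV status
meta = pd.DataFrame({
    'cohort.cohortGuid': ['BR1', 'BR1', 'BR2', 'BR2', 'BR2'],
    'subject.biologicalSex': ['Female', 'Female', 'Male', 'Male', 'Male'],
    'subject.cmv': ['Negative', 'Positive', 'Negative', 'Negative', None],
    'file.id': ['f1', 'f2', 'f3', 'f4', 'f5'],
})
subset_columns = ['cohort.cohortGuid', 'subject.biologicalSex', 'subject.cmv']

group_files = {}
for group, df in meta.groupby(subset_columns):
    # `group` is a tuple such as ('BR1', 'Female', 'Negative')
    group_files['_'.join(group)] = df['file.id'].tolist()

print(group_files)
```

In this notebook the eight subset counts sum to 868, matching the number of rows in `combined_sample_meta`, so no samples are lost this way.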

## Read and assemble anndata objects for each subset

In [28]:
group_input_dict = {}
group_adata_dict = {}
for group, df in subset_sample_meta:
    group_name = '_'.join(group)
    #df = df.iloc[0:5] # Remove for full run
    group_uuids = df['file.id'].tolist()
    group_input_dict[group_name] = group_uuids

    # Cache files
    cache_dir = '/home/jupyter/cache'
    cache_paths = []
    for uuid in group_uuids:
        cache_path = '{d}/{u}'.format(d = cache_dir, u = uuid)
        if not os.path.isdir(cache_path):
            hise_res = hisepy.reader.cache_files([uuid])
        cache_paths.append(cache_path)

    # Get cached file paths
    cache_files = []
    for cache_path in cache_paths:
        fn = os.listdir(cache_path)[0]
        cache_files.append('{d}/{f}'.format(d = cache_path, f = fn))

    # Read cached files as anndata
    adata_list = []
    for cache_file in cache_files:
        adata = read_h5_anndata(cache_file)
        adata_list.append(adata)
    group_adata = sc.concat(adata_list)
    
    group_adata_dict[group_name] = group_adata

downloading fileID: fec489f9-9a74-4635-aa91-d2bf09d1faec
Files have been successfully downloaded!
downloading fileID: 1faf2b5f-66e4-4787-8a8b-487621fc4c08
Files have been successfully downloaded!
downloading fileID: cda87fcc-a50e-4c0f-b26c-482a6a88ef41
Files have been successfully downloaded!
downloading fileID: cd86a3b7-4955-4d76-9b2c-f076024a04eb
Files have been successfully downloaded!
downloading fileID: 9d5d8b77-6fb9-4f6c-8f0f-a24d87968962
Files have been successfully downloaded!
downloading fileID: 07528ecf-0d7d-4935-9244-b263883e69ca
Files have been successfully downloaded!
downloading fileID: 99c4ef81-a7b5-4828-ac71-fea1aa8b7580
Files have been successfully downloaded!
downloading fileID: dffa2241-a366-44ff-8f0a-894dd7cbbe6c
Files have been successfully downloaded!
downloading fileID: 2022f329-08b6-4b26-b71b-162bc30b19c9
Files have been successfully downloaded!
downloading fileID: 8513bba7-658f-4158-a73c-98845071abcf
Files have been successfully downloaded!
downloading fileID: 

In [29]:
group_adata_dict.keys()

dict_keys(['BR1_Female_Negative', 'BR1_Female_Positive', 'BR1_Male_Negative', 'BR1_Male_Positive', 'BR2_Female_Negative', 'BR2_Female_Positive', 'BR2_Male_Negative', 'BR2_Male_Positive'])

In [30]:
group_adata_dict['BR1_Female_Negative']

AnnData object with n_obs × n_vars = 2970934 × 33538
    obs: 'barcodes', 'batch_id', 'cell_name', 'cell_uuid', 'chip_id', 'hto_barcode', 'hto_category', 'n_genes', 'n_mito_umis', 'n_reads', 'n_umis', 'original_barcodes', 'pbmc_sample_id', 'pool_id', 'seurat_pbmc_type', 'seurat_pbmc_type_score', 'umap_1', 'umap_2', 'well_id'

## Update Observations with additional metadata

Now, we'll add the sample metadata, CMV status, and BMI data to our scRNA-seq data.

First, we'll convert `pbmc_sample_id` to `sample.sampleKitGuid` using a regular expression. PBMC samples are derived from kits in our LIMS, so both IDs share the same numerical core. Because multiple PBMC samples can be collected at the same time, PBMC sample IDs are prefixed with `PB` to indicate their sample type and suffixed with `-XX` to indicate an aliquot number; kit IDs carry the `KT` prefix and no suffix.

In [31]:
def sample_to_kit(sample):
    # Keep the numeric core, swap the PB prefix for KT, and drop the aliquot suffix
    kit = re.sub(r'PB([0-9]+)-.+', r'KT\1', sample)
    return kit
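As a quick check of the conversion, the sketch below applies the same regular expression to a few sample IDs. The IDs are hypothetical examples that follow the `PB<digits>-<aliquot>` pattern described above, and `pbmc_to_kit` is an illustrative name, not a function from the pipeline.

```python
import re

def pbmc_to_kit(sample_id):
    # Capture the digits after 'PB', discard the '-XX' aliquot suffix,
    # and re-prefix the digits with 'KT'
    return re.sub(r'PB([0-9]+)-.+', r'KT\1', sample_id)

# Hypothetical IDs: two aliquots of one kit, another kit, and a non-matching value
examples = ['PB00001-01', 'PB00001-02', 'PB00123-01', 'KT00001']
kits = [pbmc_to_kit(x) for x in examples]
print(kits)
```

Note that `re.sub()` returns the input unchanged when the pattern does not match, so an unexpected ID format passes through silently rather than raising an error. Such rows would then fail to match in the later left join and end up with missing sample metadata.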

To keep things tidy, we'll also drop the `seurat_pbmc_type`, `seurat_pbmc_type_score`, and UMAP coordinates generated by our processing pipeline. These cell type assignments are from a now-outdated reference dataset, and the UMAP coordinates are generated for viewing individual samples - not helpful for our full dataset.

In [32]:
drop_columns = [
    'seurat_pbmc_type','seurat_pbmc_type_score',
    'umap_1', 'umap_2'
]

Then, we'll add our new sample data with a left join on the `sample.sampleKitGuid` values, and add our labels and doublet calls with a left join on cell `barcodes`.

Next, we'll convert all of our text columns to categorical. This is used to make storage of text data more efficient when we write our output file, as we'll only need to store a single instance of our strings.

We do this for all columns except barcodes, which we need to retain as a string type for use as an index.

Finally, we'll assign the updated observations back to our anndata object.

In [33]:
for group_name, adata in group_adata_dict.items():
    print(group_name)
    obs = adata.obs
    
    # Convert sample.sampleKitGuid
    print('converting ids')
    obs['sample.sampleKitGuid'] = [sample_to_kit(sample) for sample in obs['pbmc_sample_id']]
    
    # Drop old columns
    obs = obs.drop(drop_columns, axis = 1)
    
    print('merging sample metadata')
    # Add new metadata
    obs = obs.merge(
        combined_sample_meta,
        how = 'left',
        on = 'sample.sampleKitGuid'
    )
    
    print('merging labels')
    # Add labels
    obs = obs.merge(all_labels, how = 'left', on = 'barcodes')
    
    print('converting to categorical')
    # Convert text columns to categorical, keeping barcodes as plain strings
    cat_obs = obs
    for col_name in cat_obs.columns:
        if col_name == 'barcodes':
            cat_obs[col_name] = cat_obs[col_name].astype(str)
        elif is_object_dtype(cat_obs[col_name].dtype):
            cat_obs[col_name] = cat_obs[col_name].astype('category')
    # merge() resets the index, so restore barcodes as the index
    cat_obs = cat_obs.set_index('barcodes', drop = False)
    
    # Assign final observations back to anndata
    adata.obs = cat_obs
    
    group_adata_dict[group_name] = adata

BR1_Female_Negative
converting ids
merging sample metadata
merging labels
converting to categorical
BR1_Female_Positive
converting ids
merging sample metadata
merging labels
converting to categorical
BR1_Male_Negative
converting ids
merging sample metadata
merging labels
converting to categorical
BR1_Male_Positive
converting ids
merging sample metadata
merging labels
converting to categorical
BR2_Female_Negative
converting ids
merging sample metadata
merging labels
converting to categorical
BR2_Female_Positive
converting ids
merging sample metadata
merging labels
converting to categorical
BR2_Male_Negative
converting ids
merging sample metadata
merging labels
converting to categorical
BR2_Male_Positive
converting ids
merging sample metadata
merging labels
converting to categorical
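The benefit of the categorical conversion is easiest to see on a small column. The sketch below compares the in-memory footprint of a repetitive text column stored as Python objects with the same column stored as a categorical. The notebook's stated motivation is on-disk storage, so treat this as an illustration of the same principle (each distinct string is stored once, plus a compact integer code per row) rather than a measurement of the output files.

```python
import pandas as pd

# A repetitive text column, stored explicitly as Python objects
text = pd.Series(['Negative', 'Positive'] * 500, dtype = object)

# The same values as a categorical: two stored strings plus integer codes
cat = text.astype('category')

text_bytes = text.memory_usage(deep = True)
cat_bytes = cat.memory_usage(deep = True)
print(text_bytes, cat_bytes)

# The conversion is lossless: the original strings can be recovered
recovered = cat.astype(str).tolist()
```

Columns with mostly unique values, such as `barcodes`, gain nothing from this encoding, which is consistent with leaving that column as plain strings.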


In [34]:
group_adata_dict['BR1_Female_Negative']

AnnData object with n_obs × n_vars = 2970934 × 33538
    obs: 'barcodes', 'batch_id', 'cell_name', 'cell_uuid', 'chip_id', 'hto_barcode', 'hto_category', 'n_genes', 'n_mito_umis', 'n_reads', 'n_umis', 'original_barcodes', 'pbmc_sample_id', 'pool_id', 'well_id', 'sample.sampleKitGuid', 'cohort.cohortGuid', 'subject.subjectGuid', 'subject.biologicalSex', 'subject.race', 'subject.ethnicity', 'subject.birthYear', 'sample.visitName', 'sample.drawDate', 'file.id', 'subject.cmv', 'subject.bmi', 'AIFI_L1', 'AIFI_L1_score', 'AIFI_L2', 'AIFI_L2_score', 'predicted_doublet', 'doublet_score'

## Write assembled data to disk

In [35]:
out_dir = 'output'
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)

In [None]:
out_h5ads = {}
for group_name, adata in group_adata_dict.items():
    out_h5ad = 'output/diha_PBMC_{g}_raw_labeled_{d}.h5ad'.format(g = group_name, d = date.today())
    adata.write_h5ad(out_h5ad)
    out_h5ads[group_name] = out_h5ad

In [None]:
out_csvs = {}
out_parquets = {}
for group_name, adata in group_adata_dict.items():
    obs = adata.obs
    
    out_csv = 'output/diha_PBMC_{g}_raw_labeled_meta_{d}.csv'.format(g = group_name, d = date.today())
    obs.to_csv(out_csv)
    out_csvs[group_name] = out_csv

    out_parquet = 'output/diha_PBMC_{g}_raw_labeled_meta_{d}.parquet'.format(g = group_name, d = date.today())
    obs.to_parquet(out_parquet)
    out_parquets[group_name] = out_parquet    

## Upload assembled data to HISE

Finally, we'll use `hisepy.upload.upload_files()` to send a copy of our output to HISE to use for downstream analysis steps.

In [41]:
study_space_uuid = 'de025812-5e73-4b3c-9c3b-6d0eac412f2a'
title = 'DIHA PBMC Labeled Raw scRNA-seq Assembly {d}'.format(d = date.today())

In [42]:
search_id = element_id()
search_id

'cobalt-neptunium-cadmium'

In [44]:
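# Note: l1_uuid and l2_uuid are the loop variables left over from the label-loading
# cells, so only the last L1 and last L2 label file is recorded as an input here.
# Use the l1_uuids and l2_uuids lists to record all eight label files.
in_files = [sample_meta_uuid, cmv_meta_uuid, bmi_meta_uuid,
            l1_uuid, l2_uuid, doublet_uuid]
in_files = in_files + sample_meta['file.id'].tolist()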

In [45]:
len(in_files)

874

In [46]:
in_files[0:10]

['d82c5c42-ae5f-4e67-956e-cd3b7bf88105',
 '9469f67c-b09a-454d-9fb9-f50ff3494d69',
 'e507258c-d175-4d8e-a455-5229870dc991',
 '3935e870-cd4d-4daf-81dc-2e94bff46843',
 '55c53d17-1a31-4a37-8d67-06d8e4218df7',
 'b2bea832-a41e-4699-9d1d-a05a3ea1cf88',
 'fec489f9-9a74-4635-aa91-d2bf09d1faec',
 '7c0c7979-eebd-4aba-b5b2-6e76b4643623',
 '40efd03a-cb2f-4677-af42-a056cbfe5a17',
 '68fbcd34-1d63-461d-8195-df5b8dc61b31']
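The `in_uuids` dictionary filled in during the loading cells mixes single UUID strings (for example `sample_meta`) with lists of UUIDs (`l1_labels`, `l2_labels`). One possible way to turn such a dictionary into a flat provenance list is sketched below. The helper name `flatten_uuids` and the placeholder IDs are assumptions for illustration and are not part of the original notebook.

```python
def flatten_uuids(uuid_dict):
    """Flatten a dict of UUID strings and UUID lists into one ordered, de-duplicated list."""
    flat = []
    for value in uuid_dict.values():
        # Wrap single strings so that both cases can be handled as lists
        items = [value] if isinstance(value, str) else list(value)
        for item in items:
            if item not in flat:
                flat.append(item)
    return flat

# Placeholder IDs standing in for real HISE file UUIDs
example_uuids = {
    'sample_meta': 'meta-1',
    'l1_labels': ['l1-a', 'l1-b'],
    'l2_labels': ['l2-a', 'l2-b'],
    'doublet_labels': 'dbl-1',
}
flat = flatten_uuids(example_uuids)
print(flat)
```

Building the input list from the dictionary keeps every label file in the provenance record without having to name each variable by hand.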

In [47]:
out_files = list(out_h5ads.values()) + list(out_csvs.values()) + list(out_parquets.values())

In [48]:
out_files

['output/diha_PBMC_BR1_Female_Negative_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR1_Female_Positive_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR1_Male_Negative_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR1_Male_Positive_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR2_Female_Negative_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR2_Female_Positive_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR2_Male_Negative_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR2_Male_Positive_raw_labeled_2024-04-19.h5ad',
 'output/diha_PBMC_BR1_Female_Negative_raw_labeled_meta_2024-04-19.csv',
 'output/diha_PBMC_BR1_Female_Positive_raw_labeled_meta_2024-04-19.csv',
 'output/diha_PBMC_BR1_Male_Negative_raw_labeled_meta_2024-04-19.csv',
 'output/diha_PBMC_BR1_Male_Positive_raw_labeled_meta_2024-04-19.csv',
 'output/diha_PBMC_BR2_Female_Negative_raw_labeled_meta_2024-04-19.csv',
 'output/diha_PBMC_BR2_Female_Positive_raw_labeled_meta_2024-04-19.csv',
 'output/diha_PBMC_BR2

In [50]:
hisepy.upload.upload_files(
    files = out_files,
    study_space_id = study_space_uuid,
    title = title,
    input_file_ids = in_files,
    destination = search_id
)

Cannot determine the current notebook.
1) /home/jupyter/IH-A-Aging-Analysis-Notebooks/scrna-seq_analysis/02-reference_labeling/06d-Python_partition_large_cell_classes_BR2_Male.ipynb
2) /home/jupyter/IH-A-Aging-Analysis-Notebooks/scrna-seq_analysis/02-reference_labeling/06c-Python_partition_large_cell_classes_BR2_Female.ipynb
3) /home/jupyter/IH-A-Aging-Analysis-Notebooks/scrna-seq_analysis/02-reference_labeling/04-Python_assembly_of_subsets.ipynb
Please select (1-3) 


 3


you are trying to upload file_ids... ['output/diha_PBMC_BR1_Female_Negative_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR1_Female_Positive_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR1_Male_Negative_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR1_Male_Positive_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR2_Female_Negative_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR2_Female_Positive_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR2_Male_Negative_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR2_Male_Positive_raw_labeled_2024-04-19.h5ad', 'output/diha_PBMC_BR1_Female_Negative_raw_labeled_meta_2024-04-19.csv', 'output/diha_PBMC_BR1_Female_Positive_raw_labeled_meta_2024-04-19.csv', 'output/diha_PBMC_BR1_Male_Negative_raw_labeled_meta_2024-04-19.csv', 'output/diha_PBMC_BR1_Male_Positive_raw_labeled_meta_2024-04-19.csv', 'output/diha_PBMC_BR2_Female_Negative_raw_labeled_meta_2024-04-19.csv', 'output/diha_PBMC_BR2_Female_Positive_raw_labeled_meta_2024-04-19.csv'

(y/n) y


{'trace_id': '5467d987-3ce8-41b1-8f67-4efdcfc1edaf',
 'files': ['output/diha_PBMC_BR1_Female_Negative_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR1_Female_Positive_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR1_Male_Negative_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR1_Male_Positive_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR2_Female_Negative_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR2_Female_Positive_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR2_Male_Negative_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR2_Male_Positive_raw_labeled_2024-04-19.h5ad',
  'output/diha_PBMC_BR1_Female_Negative_raw_labeled_meta_2024-04-19.csv',
  'output/diha_PBMC_BR1_Female_Positive_raw_labeled_meta_2024-04-19.csv',
  'output/diha_PBMC_BR1_Male_Negative_raw_labeled_meta_2024-04-19.csv',
  'output/diha_PBMC_BR1_Male_Positive_raw_labeled_meta_2024-04-19.csv',
  'output/diha_PBMC_BR2_Female_Negative_raw_labeled_meta_2024-04-19.csv',
  'output/diha_PBMC_

In [51]:
import session_info
session_info.show()