# Filter cell classes for doublets and low gene abundance

In this notebook, we'll read the AIFI_L2 cell class partition subsets and filter out Leiden clusters that express marker genes from off-target cell types (these look like doublets that were missed by scrublet), as well as clusters with abnormally low average gene detection.

We'll also save review data for plotting these results, so that we can manually check the filtering and confirm which cells were removed.

## Load Packages

In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)

from datetime import date
import h5py
import hisepy
import os
import pandas as pd
import re
import scanpy as sc
import tarfile

In [2]:
out_dir = 'output'
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)

In [3]:
review_dir = 'output/review'
if not os.path.isdir(review_dir):
    os.makedirs(review_dir)

## Helper functions

These functions make it easy to read our files from HISE by UUID.

In [4]:
def cache_uuid_path(uuid):
    cache_path = '/home/jupyter/cache/{u}'.format(u = uuid)
    if not os.path.isdir(cache_path):
        hise_res = hisepy.reader.cache_files([uuid])
    filename = os.listdir(cache_path)[0]
    cache_file = '{p}/{f}'.format(p = cache_path, f = filename)
    return cache_file

In [5]:
def read_adata_uuid(uuid):
    cache_file = cache_uuid_path(uuid)
    res = sc.read_h5ad(cache_file)
    return res

These functions use scanpy's dotplot function to identify clusters to filter.

The dotplot function assembles the fraction of cells expressing each of a set of genes (or features), as well as the average expression per cluster. Both are useful for applying thresholds to filter clusters.

In [6]:
def marker_frac_df(adata, markers, clusters = 'leiden_2'):
    gene_cl_frac = sc.pl.dotplot(
        adata, 
        groupby = clusters,
        var_names = markers,
        return_fig = True
    ).dot_size_df
    return gene_cl_frac

def marker_mean_df(adata, markers, log = False, clusters = 'leiden_2'):
    gene_cl_mean = sc.pl.dotplot(
        adata, 
        groupby = clusters,
        var_names = markers,
        return_fig = True,
        log = log
    ).dot_color_df
    
    return gene_cl_mean
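
To make the two tables concrete, here is a minimal pure-Python sketch of what they contain. It assumes the default dotplot behavior, where a cell counts as expressing a gene when its value is above 0 and the mean is taken over all cells in the cluster. The toy values and the helper `cluster_frac_and_mean` are hypothetical and are not part of this notebook.

```python
from collections import defaultdict

def cluster_frac_and_mean(cluster_labels, expression):
    """Return {cluster: (fraction of cells with expression > 0, mean expression)}."""
    by_cluster = defaultdict(list)
    for label, value in zip(cluster_labels, expression):
        by_cluster[label].append(value)

    summary = {}
    for label, values in by_cluster.items():
        frac = sum(v > 0 for v in values) / len(values)
        mean = sum(values) / len(values)
        summary[label] = (frac, mean)
    return summary

# Toy data (hypothetical): one gene measured in six cells from two clusters
labels = ['0', '0', '0', '0', '1', '1']
cd3e = [0.0, 0.0, 0.0, 2.0, 1.0, 3.0]

summary = cluster_frac_and_mean(labels, cd3e)
print(summary)
```

With a fraction cutoff of 0.4, only cluster `'1'` in this toy example would be flagged.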

In [7]:
def select_clusters_above_gene_frac(adata, gene, cutoff, clusters = 'leiden_2'):
    gene_cl_frac = marker_frac_df(adata, gene, clusters)
    select_cl = gene_cl_frac.index[gene_cl_frac[gene] > cutoff].tolist()

    return select_cl

def select_clusters_above_gene_mean(adata, gene, cutoff, clusters = 'leiden_2'):
    gene_cl_mean = marker_mean_df(adata, gene, log = True, clusters = clusters)
    select_cl = gene_cl_mean.index[gene_cl_mean[gene] > cutoff].tolist()

    return select_cl

def select_clusters_by_low_gene_frac(adata, n_cutoff, frac_cutoff, clusters = 'leiden_2'):

    obs = adata.obs
    n_cells = obs.groupby(clusters)['barcodes'].count()

    low_obs = obs['n_genes'] < n_cutoff
    n_low = obs[low_obs].groupby(clusters)['barcodes'].count()

    frac_low = n_low / n_cells
    low_cl = frac_low[frac_low > frac_cutoff]
    low_cl = low_cl.index.tolist()

    return low_cl

In [8]:
def tidy_marker_df(adata, markers, clusters = 'leiden_2'):
    gene_cl_frac = marker_frac_df(adata, markers, clusters)
    gene_cl_frac = gene_cl_frac.reset_index(drop = False)
    gene_cl_frac = pd.melt(gene_cl_frac, id_vars = clusters, var_name = 'gene', value_name = 'gene_frac')
    
    # Pass clusters by keyword; positionally it would bind to the log parameter
    gene_cl_mean = marker_mean_df(adata, markers, clusters = clusters)
    gene_cl_mean = gene_cl_mean.reset_index(drop = False)
    gene_cl_mean = pd.melt(gene_cl_mean, id_vars = clusters, var_name = 'gene', value_name = 'gene_mean')

    marker_df = gene_cl_frac.merge(gene_cl_mean, on = [clusters, 'gene'], how = 'left')
    return marker_df

In [9]:
def element_id(n = 3):
    import periodictable
    from random import randrange
    rand_el = []
    for i in range(n):
        # Start at 1: index 0 in periodictable.elements is the neutron, not an element
        el = randrange(1,118)
        rand_el.append(periodictable.elements[el].name)
    rand_str = '-'.join(rand_el)
    return rand_str

## Set expression cutoffs

After generating the AIFI_L2 partitioned data, we interactively examined the expression of cell class-specific marker genes to identify suitable cutoffs for the fraction of cells expressing a gene, for mean gene expression, and for gene detection. Here, we'll encode these cutoffs in dictionaries so that we can apply them to our datasets.

### Fraction filters

Most filters work well using the frequency of gene detection. The dictionary below defines these cutoffs for each cell type, with this nested structure:  
`dict[cell_type][reason][marker] = cutoff`

If the fraction of cells in a cluster that express a marker is greater than the specified cutoff, the cluster is flagged for removal and labeled with the corresponding reason.

In [10]:
frac_filter_dict = {
    'ASDC' : {
        'T cell doublet':      {'CD3E': 0.4}
    },
    'CD14 monocyte': {
        # Both T cell markers share one key; a repeated key would overwrite the first
        'T cell doublet':      {'CD3E':  0.1, 'IL7R': 0.2},
        'B cell doublet':      {'MS4A1': 0.2},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'CD16 monocyte': {
        'T cell doublet':      {'CD3E':  0.2},
        'Erythrocyte doublet': {'HBB':   0.2},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'CD56bright NK cell': {
        'T cell doublet':      {'CD3D':  0.4},
        'Myeloid doublet':     {'FCN1':  0.4},
        'Erythrocyte doublet': {'HBB':   0.2},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'CD56dim NK cell': {
        'T cell doublet':      {'IL7R':  0.4},
        'Myeloid doublet':     {'FCN1':  0.4},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'CD8aa': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'cDC1': {
        # Both T cell markers share one key; a repeated key would overwrite the first
        'T cell doublet':      {'CD3D':  0.2, 'IL7R': 0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'cDC2': {
        'T cell doublet':      {'CD3D':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'DN T cell': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'Effector B cell': {
        # Both T cell markers share one key; a repeated key would overwrite the first
        'T cell doublet':      {'CD3D':  0.2, 'IL7R': 0.2},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'Erythrocyte': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'gdT': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.2},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.4}
    },
    'ILC': {
        'T cell doublet':      {'CD3D':  0.4},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.2},
        'Myeloid doublet':     {'FCN1':  0.2}
    },
    'Intermediate monocyte': {
        'T cell doublet':      {'CD3D':  0.4},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'MAIT': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'Memory B cell': {
        'T cell doublet':      {'CD3D':  0.4},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'Memory CD4 T cell': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.2}
    },
    'Memory CD8 T cell': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'Naive B cell': {
        'T cell doublet':      {'CD3D':  0.4},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.2}        
    },
    'Naive CD4 T cell': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.2},
        'B cell doublet':      {'MS4A1': 0.2},
    },
    'Naive CD8 T cell': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.2},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'pDC' : {
        'T cell doublet':      {'CD3D':  0.4},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.2},
        'Platelet doublet':    {'PPBP':  0.2}
    },
    'Plasma cell': {
        'T cell doublet':      {'CD3D':  0.4},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.4}     
    },
    'Platelet' : {
        'T cell doublet':      {'CD3D':  0.4},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.4}        
    },
    'Progenitor cell': {
        'T cell doublet':      {'CD3E':  0.2},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.2},
        'Platelet doublet':    {'PPBP':  0.2}       
    },
    'Proliferating NK cell': {
        'T cell doublet':      {'CD3D':  0.4},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.2}       
    },
    'Proliferating T cell': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'B cell doublet':      {'MS4A1': 0.4},
        'Platelet doublet':    {'PPBP':  0.2}       
    },
    'Transitional B cell': {
        'T cell doublet':      {'CD3D':  0.4},
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'Platelet doublet':    {'PPBP':  0.2}               
    },
    'Treg': {
        'Myeloid doublet':     {'FCN1':  0.2},
        'Erythrocyte doublet': {'HBB':   0.4},
        'B cell doublet':      {'MS4A1': 0.4}
    }
}
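
Because Python keeps only the last value when a key is repeated inside a dictionary literal, each reason should appear once per cell type, with all of its markers in a single inner dictionary. The sketch below illustrates that behavior and flattens a small excerpt of the nested structure into one row per cutoff, which is a convenient way to check that every intended marker survived. The helper name `flatten_filters` is an assumption made for illustration.

```python
def flatten_filters(filter_dict):
    """Flatten dict[cell_type][reason][marker] = cutoff into a list of tuples."""
    rows = []
    for cell_type, reasons in filter_dict.items():
        for reason, marker_cutoffs in reasons.items():
            for marker, cutoff in marker_cutoffs.items():
                rows.append((cell_type, reason, marker, cutoff))
    return rows

# A repeated key silently overwrites the earlier entry: CD3E is lost here.
overwritten = {
    'T cell doublet': {'CD3E': 0.1},
    'T cell doublet': {'IL7R': 0.2},
}

# Listing both markers under one reason keeps both cutoffs.
example = {
    'CD14 monocyte': {
        'T cell doublet': {'CD3E': 0.1, 'IL7R': 0.2},
        'B cell doublet': {'MS4A1': 0.2},
    }
}

print(overwritten)
for row in flatten_filters(example):
    print(row)
```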

### Mean cutoffs

In some cases, the mean of gene expression rather than the fraction of cells works as a better cutoff for cleanup. We'll specify these cutoffs here in normalized, log-transformed units:

In [11]:
mean_filter_dict = {
    'CD14 monocyte': {
        'Erythrocyte doublet':    {'HBB':  1}
    },
    'CD56dim NK cell': {
        'Erythrocyte doublet':    {'HBB':  0.7}
    },
    'Memory CD8 T cell': {
        'Erythrocyte doublet':    {'HBB':  0.7}
    },
    'Proliferating T cell': {
        'Erythrocyte doublet':    {'HBB':  0.7}
    }
}

## Assemble markers for review plots

In [12]:
all_filter_markers = []
for filter_dict in [frac_filter_dict, mean_filter_dict]:
    for cell_type, filters in filter_dict.items():
        # marker_cutoffs avoids shadowing the built-in filter()
        for reason, marker_cutoffs in filters.items():
            for gene in marker_cutoffs.keys():
                if gene not in all_filter_markers:
                    all_filter_markers.append(gene)

In [13]:
all_filter_markers.sort()
all_filter_markers

['CD3D', 'CD3E', 'FCN1', 'HBB', 'IL7R', 'MS4A1', 'PPBP']

Additional markers for review

In [14]:
additional_markers = ['NCAM1','ISG15', 'MKI67','CD4','CD8A']
all_filter_markers = all_filter_markers + additional_markers

## Set Gene Detection cutoffs

We'll use these cutoffs to filter clusters with low gene detection. If more than the `min_gene_frac` proportion of cells in a cluster have fewer than `min_n_genes` detected genes, we'll flag that cluster for removal.

Because Erythrocytes and Platelets normally have low gene detection, we'll add exceptions for these two classes for this filter.

In [15]:
min_n_genes = 750
min_gene_frac = 0.3
min_gene_exceptions = ['Erythrocyte', 'Platelet']
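
As a worked illustration of this rule, the sketch below applies the same two cutoffs to made-up per-cell gene counts. It uses plain Python rather than the `adata.obs` table, and the cluster labels and counts are hypothetical.

```python
from collections import Counter

min_n_genes = 750
min_gene_frac = 0.3

# Hypothetical (cluster, n_genes) pairs, one per cell
cells = [
    ('0', 1200), ('0', 900), ('0', 700), ('0', 1500),
    ('1', 400), ('1', 600), ('1', 1100),
]

n_cells = Counter(cluster for cluster, _ in cells)
n_low = Counter(cluster for cluster, n_genes in cells if n_genes < min_n_genes)

# Fraction of low-detection cells per cluster
frac_low = {cl: n_low[cl] / n_cells[cl] for cl in n_cells}

# Flag clusters where that fraction exceeds the cutoff
low_clusters = sorted(cl for cl, frac in frac_low.items() if frac > min_gene_frac)
print(frac_low)
print(low_clusters)
```

Here one of four cells in cluster `'0'` falls below 750 genes, so the cluster is kept, while two of three cells in cluster `'1'` do, so it is flagged.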

## Identify files for use in HISE

In [16]:
search_ids = [
    'arsenic-iodine-polonium',
    'fermium-oxygen-cadmium',
    'iodine-titanium-livermorium',
    'mercury-protactinium-lead',
    'uranium-sodium-cesium'
]
search_id = '|'.join(search_ids)

Retrieve files stored in our HISE project store

In [17]:
ps_df = hisepy.list_files_in_project_store('cohorts')
ps_df = ps_df[['id', 'name']]

Filter for files from the previous notebook using our search_id

In [18]:
search_df = ps_df[ps_df['name'].str.contains(search_id)]
search_df = search_df.sort_values('name')

In [19]:
search_df.shape

(64, 2)

In [20]:
h5ad_uuids = {}
for file_id, file_name in zip(search_df['id'], search_df['name']):
    # Keep the text between 'qc_' and the date suffix as the group name
    group_name = re.sub(r'.+qc_', '', file_name)
    group_name = re.sub(r'_20.+', '', group_name)
    h5ad_uuids[group_name] = file_id
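
The two substitutions strip everything up to and including `qc_`, then everything from the first `_20` onward, which leaves the group name. Here is a minimal sketch on a hypothetical file name; the real names in the project store may differ, and `group_name_from_file` is a helper introduced only for this illustration.

```python
import re

def group_name_from_file(name):
    """Extract the group label between 'qc_' and the '_20..' date suffix."""
    group_name = re.sub(r'.+qc_', '', name)
    group_name = re.sub(r'_20.+', '', group_name)
    return group_name

# Hypothetical file name following the pattern the regular expressions assume
example_name = 'diha_qc_BR1_Female_Negative_CD14_monocyte_2024-04-20.h5ad'
print(group_name_from_file(example_name))
```

Note that a group label that itself contained `_20` would be cut short by the second pattern.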

In [21]:
h5ad_uuids

{'BR1_Female_Negative_CD14_monocyte': '9e0af64b-2173-4327-bee3-11944487d350',
 'BR1_Female_Negative_CD56dim_NK_cell': '64b85675-4329-41a7-901d-9b738939f9e5',
 'BR1_Female_Negative_Memory_CD4_T_cell': '84869ac0-5a4d-4377-9f0b-c81f90bca9b5',
 'BR1_Female_Negative_Memory_CD8_T_cell': '4261fa08-3169-48a9-b2c4-a7cc767a1d0f',
 'BR1_Female_Negative_Naive_CD4_T_cell': '710513ff-75c0-4ea0-bb50-2ab192a5a0c7',
 'BR1_Female_Positive_CD14_monocyte': 'b458f394-8d94-40f7-a4fa-3aebbfd8f31b',
 'BR1_Female_Positive_CD56dim_NK_cell': '6f0a57e9-f0ce-425d-a1d7-aad26e4147af',
 'BR1_Female_Positive_Memory_CD4_T_cell': '5a9e0cd4-feb1-4e4e-84ca-4b16afb2f8d3',
 'BR1_Female_Positive_Memory_CD8_T_cell': 'b3096a16-ff88-42d8-abf4-5f6e9a8367eb',
 'BR1_Female_Positive_Naive_CD4_T_cell': '1c80095a-4184-4975-9875-963b3d2e5e30',
 'BR1_Male_Negative_CD14_monocyte': '9e7dfd0b-608d-45de-b0d7-93a1c58ab030',
 'BR1_Male_Negative_CD56dim_NK_cell': 'c1e99a0f-deb6-46b1-9a3b-5163f8a12e0d',
 'BR1_Male_Negative_Memory_CD4_T_cell': 

## Apply filters to datasets

In [22]:
out_files = []
for group_name, uuid in h5ad_uuids.items():
    print(group_name)
    out_file = 'output/diha_{g}_filtered_{d}.h5ad'.format(g = group_name, d = date.today())
    if os.path.isfile(out_file):
        print('Previously filtered {g}; Skipping.'.format(g = group_name))
        out_files.append(out_file)
    else:        
        adata = read_adata_uuid(uuid)
    
        cell_type = adata.obs['AIFI_L2'].iloc[0]

        # Track filter results
        filter_list = []
        filter_cl_list = []
        
        # Filter for low gene detection
        if not cell_type in min_gene_exceptions:
            filter_cl = select_clusters_by_low_gene_frac(
                adata,
                n_cutoff = min_n_genes, 
                frac_cutoff = min_gene_frac, 
                clusters = 'leiden_2'
            )
            reason = 'Low gene detection'
            check_cl = []
            for cl in filter_cl:
                if not cl in filter_cl_list:
                    check_cl.append(cl)
                    filter_cl_list.append(cl)
            
            filter_df = pd.DataFrame({'leiden_2': check_cl, 'remove_reason': [reason]*len(check_cl)})
            filter_list.append(filter_df)
        
        # Filter by Fractional gene expression
        marker_filters = frac_filter_dict[cell_type]
    
        for reason, filter in marker_filters.items():
            for marker, cutoff in filter.items():
                filter_cl = select_clusters_above_gene_frac(
                    adata,
                    gene = marker,
                    cutoff = cutoff,
                    clusters = 'leiden_2'
                )

                check_cl = []
                for cl in filter_cl:
                    if not cl in filter_cl_list:
                        check_cl.append(cl)
                        filter_cl_list.append(cl)
                
                filter_df = pd.DataFrame({'leiden_2': check_cl, 'remove_reason': [reason]*len(check_cl)})
                filter_list.append(filter_df)
    
        # Filter by Mean gene expression
        if cell_type in mean_filter_dict.keys():
            marker_filters = mean_filter_dict[cell_type]
        
            for reason, filter in marker_filters.items():
                for marker, cutoff in filter.items():
                    filter_cl = select_clusters_above_gene_mean(
                        adata,
                        gene = marker,
                        cutoff = cutoff,
                        clusters = 'leiden_2'
                    )
    
                    check_cl = []
                    for cl in filter_cl:
                        if not cl in filter_cl_list:
                            check_cl.append(cl)
                            filter_cl_list.append(cl)
                    
                    filter_df = pd.DataFrame({'leiden_2': check_cl, 'remove_reason': [reason]*len(check_cl)})
                    filter_list.append(filter_df)

        # Assemble all filtering results
        filter_df = pd.concat(filter_list)

        # Save filtered clusters for review
        rev_file = '{r}/diha_qc_{g}_filter_df_{d}.csv'.format(
            r = review_dir,
            g = group_name,
            d = date.today()
        )
        filter_df.to_csv(rev_file)
    
        # Add filters to cells
        obs = adata.obs.copy()
        obs = obs.merge(filter_df, on = 'leiden_2', how = 'left')
        obs['remove_reason'] = obs['remove_reason'].fillna('Not removed')
        
        # Save observations and UMAP coordinates for review
        # merge() reset the obs index to 0..n-1, so the UMAP rows align by position
        review_obs = obs
        umap_mat = adata.obsm['X_umap']
        umap_df = pd.DataFrame(umap_mat, columns = ['umap_1', 'umap_2'])
        review_obs['umap_1'] = umap_df['umap_1']
        review_obs['umap_2'] = umap_df['umap_2']
        
        rev_file = '{r}/diha_qc_{g}_obs_df_{d}.csv'.format(
            r = review_dir,
            g = group_name,
            d = date.today()
        )
        review_obs.to_csv(rev_file)
        
        # Save expression of marker features for review
        marker_df = tidy_marker_df(adata, all_filter_markers, clusters = 'leiden_2')

        rev_file = '{r}/diha_qc_{g}_marker_df_{d}.csv'.format(
            r = review_dir,
            g = group_name,
            d = date.today()
        )
        marker_df.to_csv(rev_file)
    
        # Apply filters to data
        print(adata.shape)
        # Use a positional boolean mask, since obs is no longer indexed by cell barcode
        keep_cells = (obs['remove_reason'] == 'Not removed').to_numpy()
        adata = adata[keep_cells]
        print(adata.shape)
        
        # Save filtered data
        adata.write_h5ad(out_file)
    
        out_files.append(out_file)

        # Clean up cache so we don't run out of disk space
        h5ad_path = cache_uuid_path(uuid)
        os.remove(h5ad_path)
        cache_path = '/home/jupyter/cache/{u}'.format(u = uuid)
        os.rmdir(cache_path)

BR1_Female_Negative_CD14_monocyte
(405610, 773)
(369171, 773)
BR1_Female_Negative_CD56dim_NK_cell
downloading fileID: 64b85675-4329-41a7-901d-9b738939f9e5
Files have been successfully downloaded!
(199092, 913)
(180974, 913)
BR1_Female_Negative_Memory_CD4_T_cell
downloading fileID: 84869ac0-5a4d-4377-9f0b-c81f90bca9b5
Files have been successfully downloaded!
(446761, 1175)
(444161, 1175)
BR1_Female_Negative_Memory_CD8_T_cell
downloading fileID: 4261fa08-3169-48a9-b2c4-a7cc767a1d0f
Files have been successfully downloaded!
(163921, 997)
(157251, 997)
BR1_Female_Negative_Naive_CD4_T_cell
downloading fileID: 710513ff-75c0-4ea0-bb50-2ab192a5a0c7
Files have been successfully downloaded!
(659739, 504)
(628955, 504)
BR1_Female_Positive_CD14_monocyte
downloading fileID: b458f394-8d94-40f7-a4fa-3aebbfd8f31b
Files have been successfully downloaded!
(194032, 858)
(178275, 858)
BR1_Female_Positive_CD56dim_NK_cell
downloading fileID: 6f0a57e9-f0ce-425d-a1d7-aad26e4147af
Files have been successfully d
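
Each filter in the loop above only records clusters that an earlier filter has not already flagged, so every cluster ends up with a single removal reason and the later merge on `leiden_2` cannot duplicate cells. Here is a minimal sketch of that first-reason-wins bookkeeping, using hypothetical filter results in place of the scanpy-based helpers; `assign_first_reason` is a name introduced for illustration.

```python
def assign_first_reason(filter_results):
    """Map each cluster to the first reason that flagged it.

    filter_results is an ordered list of (reason, clusters) pairs.
    """
    reason_by_cluster = {}
    for reason, clusters in filter_results:
        for cl in clusters:
            # setdefault keeps the earliest reason and ignores later ones
            reason_by_cluster.setdefault(cl, reason)
    return reason_by_cluster

# Hypothetical results, in the order the notebook applies its filters
filter_results = [
    ('Low gene detection', ['7']),
    ('T cell doublet', ['3', '7']),
    ('Platelet doublet', ['3', '12']),
]
reason_by_cluster = assign_first_reason(filter_results)

# Cells in unflagged clusters are labeled 'Not removed'
cell_clusters = ['0', '3', '7', '12', '0']
cell_reasons = [reason_by_cluster.get(cl, 'Not removed') for cl in cell_clusters]
print(cell_reasons)
```

The order of the filters therefore determines which reason is reported when a cluster exceeds more than one cutoff.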

## Bundle Review data

We saved review data, including cell metadata and UMAP coordinates, filtered clusters, and marker gene expression, to enable us to assemble figures to double-check our filtering process.

To help with file transfer, we'll use `tarfile` to bundle our review files.

In [23]:
review_files = os.listdir(review_dir)
review_files = ['{p}/{f}'.format(p = review_dir, f = fn) for fn in review_files]

review_tar = 'output/diha_qc_AIFI_L2_filter_review_{d}.tar.gz'.format(d = date.today())
# The context manager closes the archive even if adding a file fails
with tarfile.open(review_tar, 'w:gz') as tar:
    for review_file in review_files:
        tar.add(review_file)
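
Before transferring the bundle, it can be worth confirming what went into it. The sketch below builds a small archive from throwaway files in a temporary directory and lists its members. The directory and file names are hypothetical stand-ins for the review CSVs.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two placeholder review files
    demo_dir = os.path.join(tmp_dir, 'review')
    os.makedirs(demo_dir)
    for fn in ['a_filter_df.csv', 'b_obs_df.csv']:
        with open(os.path.join(demo_dir, fn), 'w') as handle:
            handle.write('leiden_2,remove_reason\n')

    # Bundle them under a short archive path
    demo_tar = os.path.join(tmp_dir, 'review.tar.gz')
    with tarfile.open(demo_tar, 'w:gz') as tar:
        for fn in sorted(os.listdir(demo_dir)):
            tar.add(os.path.join(demo_dir, fn), arcname = 'review/' + fn)

    # Read the archive back and list its members
    with tarfile.open(demo_tar, 'r:gz') as tar:
        members = tar.getnames()

print(members)
```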

## Upload Cell Type data to HISE

Finally, we'll use `hisepy.upload.upload_files()` to send a copy of our output to HISE to use for downstream analysis steps.

In [24]:
study_space_uuid = 'de025812-5e73-4b3c-9c3b-6d0eac412f2a'
title = 'DIHA PBMC AIFI_L2 Filter Cleanup .h5ad {d}'.format(d = date.today())

In [25]:
search_id = element_id()
search_id

'molybdenum-germanium-copper'

In [26]:
in_files = list(h5ad_uuids.values())
in_files

['9e0af64b-2173-4327-bee3-11944487d350',
 '64b85675-4329-41a7-901d-9b738939f9e5',
 '84869ac0-5a4d-4377-9f0b-c81f90bca9b5',
 '4261fa08-3169-48a9-b2c4-a7cc767a1d0f',
 '710513ff-75c0-4ea0-bb50-2ab192a5a0c7',
 'b458f394-8d94-40f7-a4fa-3aebbfd8f31b',
 '6f0a57e9-f0ce-425d-a1d7-aad26e4147af',
 '5a9e0cd4-feb1-4e4e-84ca-4b16afb2f8d3',
 'b3096a16-ff88-42d8-abf4-5f6e9a8367eb',
 '1c80095a-4184-4975-9875-963b3d2e5e30',
 '9e7dfd0b-608d-45de-b0d7-93a1c58ab030',
 'c1e99a0f-deb6-46b1-9a3b-5163f8a12e0d',
 '0318482c-b61c-437d-8f48-efa7b9a8ab18',
 '9aa54376-330c-4729-8b48-0ea5f5eca5fd',
 'cf42563f-d819-492c-bb95-beff17ac4383',
 'f34e55f1-5ddc-4373-830d-35f29b99d98d',
 '61b54268-c6ed-4256-9b71-cfbef7c03851',
 '55cc7b51-d85c-4036-9c72-9f0d0fe7923b',
 '23178cdc-f7e9-4e31-b4f2-0f3d54a277e0',
 'f2229916-c643-445b-ba98-f7362ac5650f',
 '3a93abec-a596-436b-a23e-057e60145997',
 'f7b5bd47-1bb5-4879-9972-a1d999e0a49a',
 'a801a680-1213-4fed-aebe-0cd284b9a018',
 'bf9dcb65-06b2-4b19-ad64-0a9b5f8acab6',
 'deca727f-4a5e-

In [27]:
out_files = out_files + [review_tar]

In [28]:
out_files

['output/diha_BR1_Female_Negative_CD14_monocyte_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Negative_CD56dim_NK_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Negative_Memory_CD4_T_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Negative_Memory_CD8_T_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Negative_Naive_CD4_T_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Positive_CD14_monocyte_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Positive_CD56dim_NK_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Positive_Memory_CD4_T_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Positive_Memory_CD8_T_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Female_Positive_Naive_CD4_T_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Male_Negative_CD14_monocyte_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Male_Negative_CD56dim_NK_cell_filtered_2024-04-21.h5ad',
 'output/diha_BR1_Male_Negative_Memory_CD4_T_cell_filtered_2024-04-21.h5ad',

In [29]:
hisepy.upload.upload_files(
    files = out_files,
    study_space_id = study_space_uuid,
    title = title,
    input_file_ids = in_files,
    destination = search_id
)

you are trying to upload file_ids... ['output/diha_BR1_Female_Negative_CD14_monocyte_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Negative_CD56dim_NK_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Negative_Memory_CD4_T_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Negative_Memory_CD8_T_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Negative_Naive_CD4_T_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Positive_CD14_monocyte_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Positive_CD56dim_NK_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Positive_Memory_CD4_T_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Positive_Memory_CD8_T_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Female_Positive_Naive_CD4_T_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Male_Negative_CD14_monocyte_filtered_2024-04-21.h5ad', 'output/diha_BR1_Male_Negative_CD56dim_NK_cell_filtered_2024-04-21.h5ad', 'output/diha_BR1_Male_Negative_Memory_CD4_T_cell_f

(y/n) y


{'trace_id': '22d3380c-9448-4152-9c48-6e0ca52acbb4',
 'files': ['output/diha_BR1_Female_Negative_CD14_monocyte_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Negative_CD56dim_NK_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Negative_Memory_CD4_T_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Negative_Memory_CD8_T_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Negative_Naive_CD4_T_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Positive_CD14_monocyte_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Positive_CD56dim_NK_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Positive_Memory_CD4_T_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Positive_Memory_CD8_T_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Female_Positive_Naive_CD4_T_cell_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Male_Negative_CD14_monocyte_filtered_2024-04-21.h5ad',
  'output/diha_BR1_Male_Negative_CD56dim_NK_cell_filtered_2024-04-21.h5ad',
  

In [30]:
import session_info
session_info.show()