# Normalize all genes for additional visualization

In the previous notebook, we generated updated projection coordinates for our atlas dataset. Here, we retrieve the .h5ad object with those coordinates and generate a version in which expression is normalized for all genes, rather than only for the subset of highly variable genes. This lets our UMAP plot viewer display the expression of any gene.

We'll also generate a stripped-down, counts-only version for use in cases when we need to save space.

## Load packages

In [1]:
from datetime import date
import hisepy
import os
import scanpy as sc
import sys

In [2]:
out_dir = 'output'
os.makedirs(out_dir, exist_ok = True)

## Helper functions

In [3]:
def read_adata_uuid(h5ad_uuid):
    # Cached files are stored in a directory named after the file UUID
    h5ad_path = '/home/jupyter/cache/{u}'.format(u = h5ad_uuid)
    # Download from HISE only if the file is not already cached
    if not os.path.isdir(h5ad_path):
        hisepy.reader.cache_files([h5ad_uuid])
    # Assumes the cache directory holds a single .h5ad file
    h5ad_filename = os.listdir(h5ad_path)[0]
    h5ad_file = os.path.join(h5ad_path, h5ad_filename)
    adata = sc.read_h5ad(h5ad_file)
    return adata
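
The helper above follows a cache-or-fetch pattern: look for a local directory named after the UUID, download only if it is missing, then read the first file found. A minimal, self-contained sketch of that pattern is shown here. The `fetch` argument and `fake_fetch` are hypothetical stand-ins for the HISE download step, not part of `hisepy`.

```python
import os
import tempfile

def cached_file(cache_root, uuid, fetch):
    """Return the path of the first file cached under cache_root/uuid.

    `fetch` is a hypothetical callable standing in for a download step;
    it is only called when the cache directory does not exist yet.
    """
    cache_dir = os.path.join(cache_root, uuid)
    if not os.path.isdir(cache_dir):
        fetch(cache_dir)
    files = sorted(os.listdir(cache_dir))
    if not files:
        raise FileNotFoundError(f'no files cached under {cache_dir}')
    return os.path.join(cache_dir, files[0])

calls = []

def fake_fetch(cache_dir):
    # Pretend to download: create the directory and a placeholder file
    calls.append(cache_dir)
    os.makedirs(cache_dir)
    with open(os.path.join(cache_dir, 'example.h5ad'), 'w') as handle:
        handle.write('placeholder')

with tempfile.TemporaryDirectory() as root:
    first = cached_file(root, 'demo-uuid', fake_fetch)
    second = cached_file(root, 'demo-uuid', fake_fetch)
    print(os.path.basename(first), len(calls))
```

The second call finds the directory already present, so the fetch step runs only once.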

In [4]:
def element_id(n = 3):
    import periodictable
    from random import randrange
    rand_el = []
    for i in range(n):
        # Atomic numbers 1-118; index 0 in periodictable.elements is the neutron
        el = randrange(1,119)
        rand_el.append(periodictable.elements[el].name)
    rand_str = '-'.join(rand_el)
    return rand_str

## Retrieve previous results from HISE

In [5]:
h5ad_uuid = '0625f92b-cce2-4f70-bc65-707840496818'

In [6]:
adata = read_adata_uuid(h5ad_uuid)

## Raw counts only

In [8]:
adata = adata.raw.to_adata()

In [9]:
adata.shape

(1823666, 33538)

## Save raw count .h5ad file

In [11]:
out_raw = 'output/ref_clean_pbmc_raw_{d}.h5ad'.format(d = date.today())
adata.write_h5ad(out_raw)
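
The output filename embeds the run date, because formatting a `date` object with `str.format` yields its ISO form (`YYYY-MM-DD`). A small sketch of the same naming scheme as a reusable function follows. The name `dated_path` and its parameters are assumptions for illustration and do not appear elsewhere in this notebook.

```python
from datetime import date
import os

def dated_path(out_dir, stem, day=None, ext='.h5ad'):
    """Build '<out_dir>/<stem>_<YYYY-MM-DD><ext>' (hypothetical helper)."""
    # Default to today's date, but allow a fixed date for reproducible names
    day = date.today() if day is None else day
    return os.path.join(out_dir, f'{stem}_{day.isoformat()}{ext}')

example = dated_path('output', 'ref_clean_pbmc_raw', day=date(2024, 4, 11))
print(example)
```

Passing an explicit `day` keeps the filename stable if a notebook is re-run on a later date.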

## Normalize all genes

In [12]:
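# Scale each cell to the same total count (by default, the median total across cells)
sc.pp.normalize_total(adata)
# Natural-log transform: log(1 + x)
sc.pp.log1p(adata)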

In [13]:
adata.shape

(1823666, 33538)
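
To make the normalization step concrete, here is a small worked example in plain Python. It assumes scanpy's documented defaults: `normalize_total` with no `target_sum` rescales every cell to the median of the per-cell totals, and `log1p` applies the natural log of one plus the value. The toy count matrix is made up for illustration and is not taken from the atlas.

```python
import math
from statistics import median

# Toy count matrix: rows are cells, columns are genes
counts = [
    [4, 0, 6],    # cell A: 10 total counts
    [10, 5, 5],   # cell B: 20 total counts
    [0, 30, 10],  # cell C: 40 total counts
]

totals = [sum(row) for row in counts]
target = median(totals)  # median of the per-cell totals

# Rescale each cell so its counts sum to the target
normalized = [
    [value * target / total for value in row]
    for row, total in zip(counts, totals)
]

# log(1 + x) keeps zeros at zero and compresses large values
logged = [[math.log1p(value) for value in row] for row in normalized]

for row in logged:
    print([round(value, 3) for value in row])
```

After rescaling, every cell has the same total, so differences in sequencing depth no longer dominate comparisons between cells. The matrix shape is unchanged, which matches the shape shown above.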

## Save normalized .h5ad file

In [14]:
out_norm = 'output/ref_clean_pbmc_all_genes_norm_{d}.h5ad'.format(d = date.today())
adata.write_h5ad(out_norm)

## Deposit updated results in HISE

In [15]:
study_space_uuid = '64097865-486d-43b3-8f94-74994e0a72e0'
title = '10x 3-prime PBMC Ref with UMAP and All Genes {d}'.format(d = date.today())

In [16]:
search_id = element_id()
search_id

'moscovium-plutonium-roentgenium'

In [17]:
in_files = [h5ad_uuid]

In [18]:
in_files

['0625f92b-cce2-4f70-bc65-707840496818']

In [20]:
out_files = [out_raw, out_norm]
out_files

['output/ref_clean_pbmc_raw_2024-04-11.h5ad',
 'output/ref_clean_pbmc_all_genes_norm_2024-04-11.h5ad']
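
Before uploading, it can be worth confirming that each output file exists and is not empty. The sketch below shows one way to do that with the standard library. `check_outputs` is a hypothetical helper, not part of `hisepy`, and the demo runs against a temporary file rather than the real outputs.

```python
import os
import tempfile

def check_outputs(paths):
    """Return {path: size_in_bytes}; raise if any path is missing or empty."""
    sizes = {}
    for path in paths:
        if not os.path.isfile(path):
            raise FileNotFoundError(f'missing output file: {path}')
        size = os.path.getsize(path)
        if size == 0:
            raise ValueError(f'empty output file: {path}')
        sizes[path] = size
    return sizes

with tempfile.TemporaryDirectory() as tmp:
    # Write a small placeholder file to stand in for an .h5ad output
    demo = os.path.join(tmp, 'demo.h5ad')
    with open(demo, 'wb') as handle:
        handle.write(b'12345')
    sizes = check_outputs([demo])

    # A path that was never written should be reported as missing
    try:
        check_outputs([os.path.join(tmp, 'absent.h5ad')])
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```

In this notebook, the equivalent call would be `check_outputs(out_files)` just before the upload cell.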

In [21]:
hisepy.upload.upload_files(
    files = out_files,
    study_space_id = study_space_uuid,
    title = title,
    input_file_ids = in_files,
    destination = search_id
)

output/ref_clean_pbmc_raw_2024-04-11.h5ad
output/ref_clean_pbmc_all_genes_norm_2024-04-11.h5ad
you are trying to upload file_ids... ['output/ref_clean_pbmc_raw_2024-04-11.h5ad', 'output/ref_clean_pbmc_all_genes_norm_2024-04-11.h5ad']. Do you truly want to proceed?


(y/n) y


{'trace_id': '4fe77990-4f2b-499e-8647-488ce04488ae',
 'files': ['output/ref_clean_pbmc_raw_2024-04-11.h5ad',
  'output/ref_clean_pbmc_all_genes_norm_2024-04-11.h5ad']}

In [22]:
import session_info
session_info.show()