## 1.7. L2-normalize and scale the dataset

This notebook creates L2-normalized and scaled count matrices, saved as AnnData and Seurat objects, from the complete time series or from the control subset. Some downstream analyses, such as calculating the pseudotemporal ordering in R script 3.1, require a normalized and scaled version of the data. If you want to run those scripts, run this notebook first.

In [1]:
### loading python libraries

# standard libraries
import numpy as np
import pandas as pd

# single cell libraries
import anndata as ad
import scanpy as sc
sc.settings.verbosity = 0 

# other 
import warnings
warnings.filterwarnings('ignore') #ignore filter warnings
import sklearn as sk #used for L2 normalization
import sklearn.preprocessing #used for L2 normalization
import os #to get absolute path of file

In [2]:
### load R into Notebook 
%load_ext rpy2.ipython 

### Load data

Here, we load the filtered complete dataset and the control-only HSPC subset (as filtered in notebooks 1.1 and 1.3, respectively).

In [3]:
### load Anndata object with the control subset 
adata_ctrl_path = "../data/count_matrices/control_filtered.h5ad"
adata_ctrl = sc.read_h5ad(adata_ctrl_path)

In [4]:
### load AnnData object with the complete HSPC dataset (all four subsets included)
adata_path = "../data/count_matrices/all_filtered.h5ad"
adata = sc.read_h5ad(adata_path)

In [5]:
### L2-normalize both datasets (each cell, i.e. each row of X, is scaled to unit Euclidean length)
adata_ctrl.X = sk.preprocessing.normalize(adata_ctrl.X, norm='l2')
adata.X = sk.preprocessing.normalize(adata.X, norm='l2')
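To make the effect of the cell above concrete, here is a minimal NumPy sketch of row-wise L2 normalization on a toy matrix. The helper name `l2_normalize_rows` and the toy values are illustrative assumptions and not part of the notebook; the sketch only mirrors what `sklearn.preprocessing.normalize(..., norm='l2')` does with its default row-wise axis.

```python
import numpy as np

def l2_normalize_rows(X):
    """Scale each row (cell) of X to unit Euclidean length; all-zero rows stay zero."""
    X = np.asarray(X, dtype=float)
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    norms[norms == 0] = 1.0  # avoid division by zero for cells without counts
    return X / norms

# hypothetical toy matrix: 3 cells (rows) x 3 genes (columns)
toy_counts = np.array([[3.0, 4.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])

toy_l2 = l2_normalize_rows(toy_counts)
row_norms = np.linalg.norm(toy_l2, axis=1)  # unit length for every non-empty cell
```

After this step, cells differ only in the relative composition of their expression vector, not in its overall magnitude.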

### Save as AnnData objects

Save the L2-normalized versions of the control subset and the complete HSPC dataset as AnnData objects.

In [6]:
### save Anndata object
name_file = "control_filtered_L2-normalized.h5ad"
adata_ctrl_path  = "../data/count_matrices/" + name_file
adata_ctrl.write_h5ad(adata_ctrl_path)

In [7]:
### save Anndata object
name_file = "all_filtered_L2-normalized.h5ad"
adata_path  = "../data/count_matrices/" + name_file
adata.write_h5ad(adata_path)

### Save as Seurat objects

Convert the L2-normalized versions of the control subset and the complete HSPC dataset from AnnData to Seurat (`.h5seurat`) files.

In [8]:
### get absolute path of anndata object
adata_ctrl_path = os.path.abspath(adata_ctrl_path)
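The conversion cells in this notebook write the `.h5seurat` file next to the input by replacing the file extension (`tools::file_path_sans_ext` plus `".h5seurat"` on the R side). As a small sketch, the same output path can be derived in Python, for example to check afterwards that the file exists. The helper name `h5seurat_path_for` is a hypothetical illustration, not part of the notebook or of SeuratDisk.

```python
from pathlib import Path

def h5seurat_path_for(h5ad_path):
    """Return the sibling path with the final extension replaced by .h5seurat."""
    return str(Path(h5ad_path).with_suffix(".h5seurat"))

# path of the file saved earlier in this notebook
example_out = h5seurat_path_for(
    "../data/count_matrices/control_filtered_L2-normalized.h5ad"
)
```

`Path(example_out).exists()` can then serve as a quick check that the conversion produced a file.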

In [9]:
%%R -i adata_ctrl_path 

### load packages
library(Seurat) 
library(SeuratDisk)

### load dataset
dataset_dir <- adata_ctrl_path

### convert h5ad to h5seurat 
Convert(dataset_dir, paste0(tools::file_path_sans_ext(dataset_dir), ".h5seurat"), overwrite=TRUE)

R[write to console]: Attaching SeuratObject

R[write to console]: Registered S3 method overwritten by 'SeuratDisk':
  method            from  
  as.sparse.H5Group Seurat

R[write to console]: Warning:
R[write to console]:  Unknown file type: h5ad

R[write to console]: Warning:
R[write to console]:  'assay' not set, setting to 'RNA'

R[write to console]: Creating h5Seurat file for version 3.1.5.9900

R[write to console]: Adding X as data

R[write to console]: Adding X as counts

R[write to console]: Adding meta.features from var

R[write to console]: Adding X_umap as cell embeddings for umap

R[write to console]: Adding clusters_colors to miscellaneous data



In [10]:
### get absolute path of anndata object
adata_path = os.path.abspath(adata_path)

In [11]:
%%R -i adata_path 

### load packages
library(Seurat) 
library(SeuratDisk)

### load dataset
dataset_dir <- adata_path

### convert h5ad to h5seurat 
Convert(dataset_dir, paste0(tools::file_path_sans_ext(dataset_dir), ".h5seurat"), overwrite=TRUE)

R[write to console]: Warning:
R[write to console]:  Unknown file type: h5ad

R[write to console]: Warning:
R[write to console]:  'assay' not set, setting to 'RNA'

R[write to console]: Creating h5Seurat file for version 3.1.5.9900

R[write to console]: Adding X as data

R[write to console]: Adding X as counts

R[write to console]: Adding meta.features from var

R[write to console]: Adding X_umap as cell embeddings for umap

R[write to console]: Adding clusters_colors to miscellaneous data



### Scale data

In [12]:
### scale data
sc.pp.scale(adata_ctrl)
sc.pp.scale(adata)
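As a rough illustration of the scaling step, the following NumPy sketch standardizes each gene (column) to zero mean and unit variance on a toy matrix. The helper name `standardize_genes`, the toy values, the use of the sample standard deviation (`ddof=1`), and the treatment of constant genes are assumptions made for this sketch; `sc.pp.scale` may differ in these details.

```python
import numpy as np

def standardize_genes(X):
    """Center each column (gene) to mean 0 and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # assumption: sample standard deviation
    std[std == 0] = 1.0          # constant genes end up as all zeros after centering
    return (X - mean) / std

# hypothetical toy matrix: 3 cells (rows) x 3 genes (columns)
toy = np.array([[1.0, 2.0, 5.0],
                [3.0, 2.0, 7.0],
                [5.0, 2.0, 9.0]])

toy_scaled = standardize_genes(toy)
gene_means = toy_scaled.mean(axis=0)  # approximately zero for every gene
```

Because centering turns most zero entries into nonzero values, a scaled matrix is generally dense even when the input was sparse, which is worth keeping in mind for memory use on the complete dataset.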

### Save as AnnData objects

Save the L2-normalized and scaled versions of the control subset and the complete HSPC dataset as AnnData objects.

In [13]:
### save Anndata object
name_file = "control_filtered_L2-normalized_scaled.h5ad"
adata_ctrl_path  = "../data/count_matrices/" + name_file
adata_ctrl.write_h5ad(adata_ctrl_path)

In [14]:
### save Anndata object
name_file = "all_filtered_L2-normalized_scaled.h5ad"
adata_path  = "../data/count_matrices/" + name_file
adata.write_h5ad(adata_path)

### Save as Seurat objects

Convert the L2-normalized and scaled versions of the control subset and the complete HSPC dataset from AnnData to Seurat (`.h5seurat`) files.

In [15]:
### get absolute path of anndata object
adata_ctrl_path = os.path.abspath(adata_ctrl_path)

In [16]:
%%R -i adata_ctrl_path 

### load packages
library(Seurat) 
library(SeuratDisk)

### load dataset
dataset_dir <- adata_ctrl_path

### convert h5ad to h5seurat 
Convert(dataset_dir, paste0(tools::file_path_sans_ext(dataset_dir), ".h5seurat"), overwrite=TRUE)

R[write to console]: Warning:
R[write to console]:  Unknown file type: h5ad

R[write to console]: Warning:
R[write to console]:  'assay' not set, setting to 'RNA'

R[write to console]: Creating h5Seurat file for version 3.1.5.9900

R[write to console]: Adding X as scale.data

R[write to console]: Adding X as data

R[write to console]: Adding X as counts

R[write to console]: Adding meta.features from var

R[write to console]: Adding X_umap as cell embeddings for umap

R[write to console]: Adding clusters_colors to miscellaneous data



In [17]:
### get absolute path of anndata object
adata_path = os.path.abspath(adata_path)

In [18]:
%%R -i adata_path 

### load packages
library(Seurat) 
library(SeuratDisk)

### load dataset
dataset_dir <- adata_path

### convert h5ad to h5seurat 
Convert(dataset_dir, paste0(tools::file_path_sans_ext(dataset_dir), ".h5seurat"), overwrite=TRUE)

R[write to console]: Warning:
R[write to console]:  Unknown file type: h5ad

R[write to console]: Warning:
R[write to console]:  'assay' not set, setting to 'RNA'

R[write to console]: Creating h5Seurat file for version 3.1.5.9900

R[write to console]: Adding X as scale.data

R[write to console]: Adding X as data

R[write to console]: Adding X as counts

R[write to console]: Adding meta.features from var

R[write to console]: Adding X_umap as cell embeddings for umap

R[write to console]: Adding clusters_colors to miscellaneous data

