# 00 Precourse

- Paulo Czarnewski  
- Spatial Omics Data Analysis 2022

## Introduction

In this tutorial, we will download all datasets necessary for the course and test whether we can load and use the main functions.

***



## Loading required libraries

First, we can start by loading some libraries.

In [1]:
import warnings
import os
import scanpy as sc
import anndata as an
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import scanorama

from urllib import request

We can also set up some default configurations for warnings and plotting.

In [8]:
# sc.logging.print_versions()  # currently raises an error
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
warnings.simplefilter(action='ignore', category=FutureWarning)
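
The `warnings.simplefilter` call above hides every `FutureWarning` for the rest of the session. The following is a minimal sketch of that effect using only the standard library. The function `noisy` is hypothetical and stands in for any library call that emits such a warning.

```python
import warnings

def noisy():
    """Hypothetical function that emits a FutureWarning, as some library calls do."""
    warnings.warn("this behaviour will change", FutureWarning)
    return 42

# Record warnings inside a local context so the global filters stay untouched
with warnings.catch_warnings(record=True) as caught_default:
    warnings.simplefilter("always")  # report every warning
    noisy()

with warnings.catch_warnings(record=True) as caught_ignored:
    warnings.simplefilter("always")
    # Same filter as in the cell above; the most recent filter takes precedence
    warnings.simplefilter(action="ignore", category=FutureWarning)
    noisy()

print(len(caught_default), len(caught_ignored))  # prints: 1 0
```

Because the filter is global, it may also hide deprecation notices that matter when packages are upgraded, so it can be worth removing it temporarily when debugging.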

In [6]:
# Create directories for the downloaded data and for the results
for i in ['results','data']:
    for j in ['single_cell','spatial_transcriptomics','in_situ_sequencing']:
        os.makedirs( '../'+i+'/'+j , exist_ok= True )
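
The same folder layout can be sketched with `pathlib`, which avoids manual string concatenation. The helper name `make_course_dirs` is an assumption made for illustration. The demonstration runs in a temporary directory so that it has no side effects, whereas in the notebook the root would be `'..'`.

```python
import tempfile
from itertools import product
from pathlib import Path

def make_course_dirs(root):
    """Create <root>/{results,data}/<modality> and return the created paths."""
    top_level = ["results", "data"]
    modalities = ["single_cell", "spatial_transcriptomics", "in_situ_sequencing"]
    created = []
    for i, j in product(top_level, modalities):
        path = Path(root) / i / j
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created

# Demonstrated in a throwaway directory
with tempfile.TemporaryDirectory() as tmp_root:
    dirs = make_course_dirs(tmp_root)
    n_created = len(dirs)
    all_exist = all(p.is_dir() for p in dirs)
    print(n_created, all_exist)  # prints: 6 True
```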

## Single cell

In [7]:
# Define base URL path to the data
base_path = 'https://export.uppmax.uu.se/snic2022-23-113/courses/spatial_omics_2022/single_cell/'

# Download each dataset
for i in ['10X151w1','10X180w1','10x289w3','10x303w3']:
    tmp = 'Sountoulidis2022_'+i+'_counts.h5'
    request.urlretrieve( base_path+tmp , '../data/single_cell/'+tmp )
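
Re-running the cell above downloads every file again. A minimal sketch of a helper that skips files already present is shown below. The name `download_if_missing` is hypothetical, and the demonstration uses a local `file://` URL with placeholder bytes so that it runs without network access.

```python
import tempfile
from pathlib import Path
from urllib import request

def download_if_missing(url, dest):
    """Download url to dest unless a non-empty dest already exists.

    Returns True if a download happened, False if it was skipped.
    """
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    request.urlretrieve(url, str(dest))
    return True

# Demonstration with a local file:// URL, so no network is needed
with tempfile.TemporaryDirectory() as tmp_root:
    source = Path(tmp_root) / "remote" / "example_counts.h5"
    source.parent.mkdir()
    source.write_bytes(b"placeholder bytes, not a real HDF5 file")
    target = Path(tmp_root) / "data" / "single_cell" / source.name

    first = download_if_missing(source.as_uri(), target)   # downloads
    second = download_if_missing(source.as_uri(), target)  # skipped
    same_content = target.read_bytes() == source.read_bytes()
    print(first, second, same_content)  # prints: True False True
```

In the notebook, the call inside the loop would become `download_if_missing(base_path+tmp, '../data/single_cell/'+tmp)`. Note that a size check alone does not detect a truncated download.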

Check whether the files were downloaded correctly:

In [9]:
%%bash
tree ../data/single_cell

../data/single_cell
├── Sountoulidis2022_10X151w1_counts.h5
├── Sountoulidis2022_10X180w1_counts.h5
├── Sountoulidis2022_10x289w3_counts.h5
├── Sountoulidis2022_10x303w3_counts.h5
└── scRNAseq.h5ad

0 directories, 5 files
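
The `tree` command is not installed on every system. As a fallback, a similar listing can be sketched in Python with `os.walk`. The helper name `list_tree` is an assumption, and the demonstration uses dummy files in a temporary folder.

```python
import os
import tempfile
from pathlib import Path

def list_tree(root):
    """Return the sorted paths of all files under root, relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            found.append(full.relative_to(root).as_posix())
    return sorted(found)

# Demonstration on a temporary folder with empty dummy files
with tempfile.TemporaryDirectory() as tmp_root:
    for name in ["a_counts.h5", "b_counts.h5"]:
        (Path(tmp_root) / name).touch()
    (Path(tmp_root) / "sub").mkdir()
    (Path(tmp_root) / "sub" / "c.json").touch()
    listing = list_tree(tmp_root)
    print(listing)  # prints: ['a_counts.h5', 'b_counts.h5', 'sub/c.json']
```

In the notebook this would be called as `list_tree('../data/single_cell')`.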


We can now read all count tables into a list and then merge them into a single AnnData object.

In [13]:
# Define paths and get sample_ids (keep only the downloaded count matrices)
obj_list = list()
file_list = [x for x in os.listdir('../data/single_cell') if x.endswith('_counts.h5')]
sample_ids = [x.replace('Sountoulidis2022_','').replace('_counts.h5','') for x in file_list]

# Read each table and append to our list
for i in range(0,len(file_list)):
    print(i)
    tmp = sc.read_10x_h5('../data/single_cell/'+file_list[i])
    tmp.var_names_make_unique()
    tmp.obs['sample_id'] = sample_ids[i]
    obj_list.append(tmp)

0
reading ../data/single_cell/Sountoulidis2022_10X180w1_counts.h5
 (0:00:02)
1
reading ../data/single_cell/Sountoulidis2022_10x303w3_counts.h5
 (0:00:00)
2
reading ../data/single_cell/Sountoulidis2022_10x289w3_counts.h5
 (0:00:01)
3
reading ../data/single_cell/Sountoulidis2022_10X151w1_counts.h5
 (0:00:01)
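
The sample IDs are obtained by stripping a fixed prefix and suffix from each file name. The sketch below does the same with a regular expression, which also skips files that do not follow the pattern (such as the `scRNAseq.h5ad` file shown in the `tree` output) and returns the samples in a reproducible order, since `os.listdir` makes no ordering guarantee. The helper name `parse_sample_ids` is hypothetical, and the pattern is inferred from the file names used in the download step.

```python
import re

# Naming pattern assumed from the download step: Sountoulidis2022_<sample>_counts.h5
PATTERN = re.compile(r"^Sountoulidis2022_(?P<sample>.+)_counts\.h5$")

def parse_sample_ids(file_names):
    """Map sample ID -> file name for count files, ignoring anything else."""
    matches = {}
    for name in sorted(file_names):  # sorted for a reproducible order
        m = PATTERN.match(name)
        if m:
            matches[m.group("sample")] = name
    return matches

file_names = [
    "Sountoulidis2022_10X180w1_counts.h5",
    "Sountoulidis2022_10x303w3_counts.h5",
    "scRNAseq.h5ad",  # not a count matrix, should be skipped
    "Sountoulidis2022_10x289w3_counts.h5",
    "Sountoulidis2022_10X151w1_counts.h5",
]
samples = parse_sample_ids(file_names)
print(list(samples))  # prints: ['10X151w1', '10X180w1', '10x289w3', '10x303w3']
```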


In [14]:
# Merge all samples into a single AnnData Object 
adata = an.concat( obj_list , merge='same' )
adata

AnnData object with n_obs × n_vars = 25601 × 33538
    obs: 'sample_id'
    var: 'gene_ids', 'feature_types', 'genome'

Finally, we can save the AnnData object to a file.

In [15]:
os.makedirs('../results/single_cell', exist_ok=True)
adata.write_h5ad('../results/single_cell/scRNAseq.h5ad' )

... storing 'sample_id' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical


With the dataset saved, we can now safely remove these objects from memory.

In [16]:
# Remove other objects from memory
del(obj_list,file_list,sample_ids,adata)

## Spatial Transcriptomics

In [17]:
# Define base URL path to the data
base_path = 'https://export.uppmax.uu.se/snic2022-23-113/courses/spatial_omics_2022/spatial_transcriptomics/'    

# For each dataset do:
for j in ['154441','154442']:
    
    # Create directory for the dataset
    os.makedirs( '../data/spatial_transcriptomics/'+j+'/spatial' , exist_ok= True )
    ss='/spatial/'
    
    # Download necessary files in their respective folder
    for i in [j+'/filtered_feature_bc_matrix.h5',
              j+ss+'tissue_lowres_image.png',
              j+ss+'tissue_hires_image.png',
              j+ss+'tissue_positions_list.txt',
              j+ss+'scalefactors_json.json']:
        request.urlretrieve( base_path+i , '../data/spatial_transcriptomics/'+i.replace('.txt','.csv')  )
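
Note that the cell above saves `tissue_positions_list.txt` locally with a `.csv` extension, presumably because `sc.read_visium` looks for the spot positions under that name (this reason is an assumption and is not stated in the course material). The mapping from remote URL to local path can be sketched separately from the download itself, which makes it easy to inspect. The helper name `download_plan` is hypothetical.

```python
from pathlib import PurePosixPath

BASE_URL = ('https://export.uppmax.uu.se/snic2022-23-113/courses/'
            'spatial_omics_2022/spatial_transcriptomics/')

# Files fetched for each Visium sample, relative to the sample folder
REMOTE_FILES = [
    "filtered_feature_bc_matrix.h5",
    "spatial/tissue_lowres_image.png",
    "spatial/tissue_hires_image.png",
    "spatial/tissue_positions_list.txt",
    "spatial/scalefactors_json.json",
]

def download_plan(sample, local_root="../data/spatial_transcriptomics"):
    """Return (url, local_path) pairs for one sample, renaming .txt to .csv locally."""
    plan = []
    for rel in REMOTE_FILES:
        local = PurePosixPath(local_root) / sample / rel
        if local.suffix == ".txt":
            local = local.with_suffix(".csv")
        plan.append((BASE_URL + sample + "/" + rel, str(local)))
    return plan

plan = download_plan("154441")
for url, local in plan:
    print(local)
```

Each pair can then be passed to `request.urlretrieve(url, local)`, as in the loop above.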

Check whether the files were downloaded correctly:

In [18]:
%%bash
tree ../data/spatial_transcriptomics

../data/spatial_transcriptomics
├── 154441
│   ├── filtered_feature_bc_matrix.h5
│   └── spatial
│       ├── scalefactors_json.json
│       ├── tissue_hires_image.png
│       ├── tissue_lowres_image.png
│       └── tissue_positions_list.csv
└── 154442
    ├── filtered_feature_bc_matrix.h5
    └── spatial
        ├── scalefactors_json.json
        ├── tissue_hires_image.png
        ├── tissue_lowres_image.png
        └── tissue_positions_list.csv

4 directories, 10 files
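
Instead of inspecting the listing by eye, the check can be automated. The sketch below reports which of the five expected files are absent or empty for a given sample folder. The helper name `missing_files` is hypothetical, and the demonstration deliberately omits one file in a temporary folder.

```python
import tempfile
from pathlib import Path

# Files expected in each sample folder, as in the listing above
EXPECTED = [
    "filtered_feature_bc_matrix.h5",
    "spatial/scalefactors_json.json",
    "spatial/tissue_hires_image.png",
    "spatial/tissue_lowres_image.png",
    "spatial/tissue_positions_list.csv",
]

def missing_files(sample_dir):
    """Return the expected files that are absent or empty under sample_dir."""
    sample_dir = Path(sample_dir)
    missing = []
    for rel in EXPECTED:
        f = sample_dir / rel
        if not f.is_file() or f.stat().st_size == 0:
            missing.append(rel)
    return missing

# Demonstration: create every file except the spot positions
with tempfile.TemporaryDirectory() as tmp_root:
    sample_dir = Path(tmp_root) / "154441"
    for rel in EXPECTED[:-1]:
        f = sample_dir / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_bytes(b"x")  # placeholder content
    report = missing_files(sample_dir)
    print(report)  # prints: ['spatial/tissue_positions_list.csv']
```

In the notebook, an empty list from `missing_files('../data/spatial_transcriptomics/154441')` would indicate that the download is complete.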


We can now read each Visium section into a list and then merge them into a single AnnData object.

In [24]:
# Define paths and get sample_ids
obj_list = list()
file_list = os.listdir('../data/spatial_transcriptomics')
file_list

['154442', '154441']

In [25]:
# Read each table and append to our list
for i in range(0,len(file_list)):
    print(i)
    tmp = sc.read_visium('../data/spatial_transcriptomics/'+file_list[i])
    tmp.var_names_make_unique()
    tmp.obs['sample_id'] = file_list[i]
    obj_list.append(tmp)

0
reading ../data/spatial_transcriptomics/154442/filtered_feature_bc_matrix.h5


Variable names are not unique. To make them unique, call `.var_names_make_unique`.


 (0:00:00)


Variable names are not unique. To make them unique, call `.var_names_make_unique`.


1
reading ../data/spatial_transcriptomics/154441/filtered_feature_bc_matrix.h5


Variable names are not unique. To make them unique, call `.var_names_make_unique`.


 (0:00:00)


Variable names are not unique. To make them unique, call `.var_names_make_unique`.


In [26]:
# Merge all samples into a single AnnData Object 
adata = an.concat( obj_list , merge='same',uns_merge="unique" )
adata

Observation names are not unique. To make them unique, call `.obs_names_make_unique`.


AnnData object with n_obs × n_vars = 716 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'sample_id'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'
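
The warning about non-unique observation names most likely arises because both sections share spot barcodes, so the same name appears once per sample after merging. The plain-Python sketch below illustrates the collision and one common remedy, appending the sample ID to each barcode. The barcodes are made up and shortened for illustration. In `anndata`, the `index_unique` argument of `concat` (optionally combined with `keys`) serves a similar purpose, as does calling `.obs_names_make_unique()` as the warning suggests.

```python
from collections import Counter

# Hypothetical, shortened barcodes; real Visium barcodes are longer
barcodes_by_sample = {
    "154441": ["AAAC-1", "AAAG-1", "AACT-1"],
    "154442": ["AAAC-1", "AACT-1", "AAGG-1"],
}

def merged_names(barcodes_by_sample, sep=None):
    """Concatenate barcodes across samples, optionally suffixing the sample ID."""
    names = []
    for sample, barcodes in barcodes_by_sample.items():
        for bc in barcodes:
            names.append(bc if sep is None else f"{bc}{sep}{sample}")
    return names

def duplicated(names):
    """Return the sorted names that occur more than once."""
    return sorted(n for n, c in Counter(names).items() if c > 1)

plain = merged_names(barcodes_by_sample)
suffixed = merged_names(barcodes_by_sample, sep="-")
print(duplicated(plain))     # prints: ['AAAC-1', 'AACT-1']
print(duplicated(suffixed))  # prints: []
```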

Finally, we can save the AnnData object to a file.

In [27]:
os.makedirs('../results/spatial_transcriptomics', exist_ok=True)
adata.write_h5ad('../results/spatial_transcriptomics/visium.h5ad' )

... storing 'sample_id' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical


With the dataset saved, we can now safely remove these objects from memory.

In [23]:
# Remove other objects from memory
del(obj_list,file_list,adata)

## _In situ_ sequencing