<a href="https://colab.research.google.com/github/cellatlas/cellatlas/blob/main/docs/PREPROCESS_MAT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
title: Preprocess Matrix
date: 2024-07-07
authors:
  - name: A. Sina Booeshaghi
---

This notebook preprocesses a cell-by-gene count matrix in three steps:

1. Filter matrix
2. Normalize matrix
3. Assign cell types or cell categories

In [None]:
!pip install --quiet git+https://github.com/cellatlas/ec.git
!pip install --quiet git+https://github.com/cellatlas/mx.git

In [23]:
# https://www.10xgenomics.com/datasets/human-pbmc-from-a-healthy-donor-1-k-cells-v-2-2-standard-4-0-0
!wget --quiet --show-progress https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_1k/sc5p_v2_hs_PBMC_1k_raw_feature_bc_matrix.tar.gz
!tar -xzf sc5p_v2_hs_PBMC_1k_raw_feature_bc_matrix.tar.gz
!gunzip raw_feature_bc_matrix/*



In [33]:
!cut -f2 raw_feature_bc_matrix/features.tsv > raw_feature_bc_matrix/genes.txt
!cut -f1 -d'-' raw_feature_bc_matrix/barcodes.tsv > raw_feature_bc_matrix/barcodes.txt

In [37]:
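from scipy.io import mmread, mmwrite

# The 10x matrix is stored as features x barcodes; transpose it so that
# rows are barcodes and columns are genes, then write it as matrix.mtx.
mmwrite("matrix.mtx", mmread("raw_feature_bc_matrix/matrix.mtx").T.tocsr())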

In [43]:
!ln -s raw_feature_bc_matrix/genes.txt .
!ln -s raw_feature_bc_matrix/barcodes.txt .
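
Before filtering, it can help to confirm that the transposed matrix lines up with the barcode and gene lists. The sketch below is not part of `mx`; `check_dims` is a hypothetical helper, and the toy matrix stands in for `matrix.mtx`, `barcodes.txt`, and `genes.txt`.

```python
import numpy as np
from scipy import sparse


def check_dims(mtx, barcodes, genes):
    """Return True when rows match barcodes and columns match genes."""
    n_rows, n_cols = mtx.shape
    return n_rows == len(barcodes) and n_cols == len(genes)


# Toy data (hypothetical): 3 barcodes x 2 genes
toy = sparse.csr_matrix(np.array([[1, 0], [0, 2], [3, 4]]))
toy_barcodes = ["AAAC", "AAAG", "AAAT"]
toy_genes = ["GENE1", "GENE2"]

print(check_dims(toy, toy_barcodes, toy_genes))            # barcodes x genes
print(check_dims(toy.T.tocsr(), toy_barcodes, toy_genes))  # genes x barcodes
```

With the real files, the same check would take the result of `mmread("matrix.mtx")` and the lines of `barcodes.txt` and `genes.txt`.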

# Filter matrix

## Command line

In [45]:
!mx filter -c 2 2 -bi barcodes.txt -bo barcodes.filt.txt -o matrix.filt.mtx matrix.mtx

Filtered to 1,034 cells with at least 367 UMIs.


## Python

In [50]:
from mx.mx_filter import mx_filter
from scipy.io import mmread
import pandas as pd

mtx = mmread("matrix.mtx").tocsr()
bcs = pd.read_csv("barcodes.txt", index_col=0, header=None)
fbcs, fmtx = mx_filter(mtx.copy(), bcs.index.values, sum_axis=1, comps=[2,2], select_axis=None)

Filtered to 1,034 cells with at least 367 UMIs.
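
The output above reports a per-cell UMI threshold. How `mx filter` chooses that threshold is not shown in this notebook, so the sketch below only illustrates the final step, assuming the threshold is already known: keep the rows (cells) whose total count meets it. `filter_by_umi` and the toy data are hypothetical, not part of `mx`.

```python
import numpy as np
from scipy import sparse


def filter_by_umi(mtx, barcodes, min_umis):
    """Keep cells (rows) whose total UMI count is at least min_umis."""
    mtx = sparse.csr_matrix(mtx)
    umis = np.asarray(mtx.sum(axis=1)).ravel()  # per-cell totals
    keep = np.flatnonzero(umis >= min_umis)     # row indices to retain
    return np.asarray(barcodes)[keep], mtx[keep]


# Toy data (hypothetical): per-cell totals are 10, 1, and 11
toy = sparse.csr_matrix(np.array([[5, 5], [0, 1], [2, 9]]))
kept_bcs, kept = filter_by_umi(toy, ["A", "B", "C"], min_umis=10)
print(list(kept_bcs), kept.shape)
```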


# Normalize counts

## Command line

In [52]:
!mx normalize -m log1pPF -o matrix.norm.mtx matrix.filt.mtx

## Python

In [53]:
from scipy.io import mmread, mmwrite
from mx.mx_normalize import mx_normalize

mtx = mmread("matrix.filt.mtx").tocsr()
nmtx = mx_normalize(mtx.copy(), "log1pPF")
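
For intuition, here is a minimal sketch of what a `log1pPF` transform could look like, assuming it means proportional fitting (scaling every cell to the mean sequencing depth) followed by `log1p`. This is an illustration under that assumption, not the `mx_normalize` implementation; `log1p_pf` is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def log1p_pf(mtx):
    """Scale each cell to the mean depth, then apply log(1 + x)."""
    mtx = sparse.csr_matrix(mtx, dtype=float)
    depth = np.asarray(mtx.sum(axis=1)).ravel()  # per-cell totals
    target = depth.mean()                        # common depth after scaling
    # Cells with zero depth keep a scale of 0 to avoid division by zero.
    scale = np.divide(target, depth, out=np.zeros_like(depth), where=depth > 0)
    pf = (sparse.diags(scale) @ mtx).tocsr()     # proportional fitting
    return pf.log1p()


# Toy data (hypothetical): depths are 2 and 8, so the target depth is 5
toy = sparse.csr_matrix(np.array([[1.0, 1.0], [2.0, 6.0]]))
toy_norm = log1p_pf(toy)
print(np.expm1(toy_norm.toarray()).sum(axis=1))  # depths after undoing log1p
```

Undoing the `log1p` in the last line is a quick way to check that every cell ends up with the same depth.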

# Assign cell types

In [55]:
!wget --quiet --show-progress https://raw.githubusercontent.com/cellatlas/human/main/markers/blood/markers.txt



## Command line

In [77]:
!ec clean -o markers.txt markers.txt

In [86]:
# these genes are not in the index, so we remove them
!ec filter -bt <(printf "TM4SF19-TCTEX1D2\nFCGR2C\nCORO7-PAM16") -o clean.markers.txt markers.txt

In [92]:
# verify they are not in the file
!grep "TM4SF19-TCTEX1D2" clean.markers.txt
!grep "FCGR2C" clean.markers.txt
!grep "CORO7-PAM16" clean.markers.txt

In [88]:
!ec index -g groups.txt -t targets.txt -e markers.ec.txt clean.markers.txt

In [89]:
!mx extract -t targets.txt -gi genes.txt -go extract_genes.txt -o extract.mtx matrix.norm.mtx

In [90]:
!mx clean -gi extract_genes.txt -go clean_genes.txt -bi barcodes.filt.txt -bo clean_barcodes.txt -o clean.mtx --bad extract.mtx

Dropping 7 cells
Dropping 70 genes


In [93]:
!mx normalize -m rank -o rank.mtx clean.mtx
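
As a rough illustration of rank normalization, the sketch below replaces each cell's values with their within-cell ranks. It assumes average ranks for ties and ranks zeros like any other value; `mx normalize -m rank` may handle both differently. `rank_rows` is a hypothetical helper that operates on a small dense array.

```python
import numpy as np
from scipy.stats import rankdata


def rank_rows(dense):
    """Replace each row's values with their ranks (ties get the average rank)."""
    dense = np.asarray(dense, dtype=float)
    return np.vstack([rankdata(row, method="average") for row in dense])


# Toy data (hypothetical): 2 cells x 3 genes
toy = np.array([[0.0, 2.0, 1.0], [3.0, 3.0, 0.0]])
toy_ranks = rank_rows(toy)
print(toy_ranks)
```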

In [95]:
!mx assign -g groups.txt -gi clean_genes.txt -bi clean_barcodes.txt -e markers.ec.txt -o assignments.txt rank.mtx

Traceback (most recent call last):
  File "/usr/local/bin/mx", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mx/main.py", line 77, in main
    COMMAND_TO_FUNCTION[sys.argv[1]](parser, args)
  File "/usr/local/lib/python3.10/dist-packages/mx/mx_assign.py", line 1564, in validate_mx_assign_args
    run_mx_assign(
  File "/usr/local/lib/python3.10/dist-packages/mx/mx_assign.py", line 1699, in run_mx_assign
    df, means = mx_assign(G, barcodes, genes, groups, markers_ec)
  File "/usr/local/lib/python3.10/dist-packages/mx/mx_assign.py", line 1599, in mx_assign
    centroids_init = get_marker_centroids(X_init, markers_ec, "max")
  File "/usr/local/lib/python3.10/dist-packages/mx/mx_assign.py", line 1581, in get_marker_centroids
    submx = X[:, v]
IndexError: index 466 is out of bounds for axis 1 with size 466
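
The `IndexError` says that column index 466 was requested from a matrix with 466 columns, whose valid indices run from 0 to 465. One possible cause, not confirmed here, is that the marker index was built before `mx clean` dropped 70 genes and so still refers to columns that no longer exist. The sketch below shows one way to look for such indices. The dictionary layout (group to list of gene indices) and `out_of_range` are assumptions for illustration, not the actual `markers.ec.txt` format or an `mx` function.

```python
def out_of_range(markers_ec, n_genes):
    """Map each group to the gene indices that fall outside [0, n_genes)."""
    bad = {}
    for group, idxs in markers_ec.items():
        missing = [i for i in idxs if not 0 <= i < n_genes]
        if missing:
            bad[group] = missing
    return bad


# Toy data (hypothetical): group 1 points at column 466 of a 466-column matrix
toy_ec = {0: [0, 12, 465], 1: [3, 466]}
toy_bad = out_of_range(toy_ec, n_genes=466)
print(toy_bad)
```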


## Python