# SPLiT-seq processing pipeline using splitcode and kallisto-bus

This notebook provides an end-to-end example of processing SPLiT-seq data using [splitcode](https://splitcode.readthedocs.io/en/latest/tutorials_splitseq.html) for barcode extraction and [kallisto | bustools](https://www.kallistobus.tools/) for alignment and quantification.

## Setup

Install `splitcode` and `kb-python` (which contains `kallisto` and `bustools`).

In [None]:
!pip install -q kb-python splitcode

## Download example data

Replace the link below with your own SPLiT-seq FASTQ files. The example data from the [Pachter lab](https://github.com/pachterlab/LSRRSRLFKOTWMWMP_2024) was uploaded to Zenodo.

In [None]:
!wget https://zenodo.org/records/14146317/files/lr-kallisto_example.tar.gz
!tar -xvf lr-kallisto_example.tar.gz

The archive contains example FASTQ files and configuration files for `splitcode`.

In [None]:
!ls lr-kallisto_example

## Build transcriptome index

`kb ref` builds the transcriptome index for kallisto. Here we use it to fetch a pre-built mouse index and transcript-to-gene mapping rather than building one from scratch.

In [None]:
!kb ref -i mouse_k63.idx -g mouse.t2g -f1 mouse.fa --download=mouse
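As a quick sanity check on the gene mapping, the sketch below reads a t2g file into a dictionary. It assumes the file is tab-separated with the transcript ID in the first column and the gene ID in the second; `load_t2g` is a hypothetical helper written for this notebook, not part of `kb-python`.

```python
import csv


def load_t2g(path):
    """Return a dict mapping transcript ID -> gene ID.

    Assumes a tab-separated file whose first two columns are
    transcript ID and gene ID; any further columns are ignored.
    """
    mapping = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            mapping[row[0]] = row[1]
    return mapping
```

Calling `load_t2g("mouse.t2g")` after the cell above and inspecting `len(set(mapping.values()))` gives the number of distinct genes the index can report.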

## Extract barcodes with splitcode

Use the provided configuration file describing the SPLiT-seq layout. This step produces read files with corrected barcodes and UMIs.

In [None]:
!splitcode -c lr-kallisto_example/config-correct.txt --nFastqs 2 --select 0 --gzip -o cDNA.fastq.gz _cDNA.fastq _barcode.fastq -t 2

In [None]:
!splitcode -c lr-kallisto_example/config-correct.txt --nFastqs 2 --select 0 --gzip -o umi.fastq.gz _umi.fastq _barcode.fastq -t 2

In [None]:
!splitcode -c lr-kallisto_example/config.mergeRT -o barcode.fastq barcode.fastq.gz -t 2
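Before counting, it can help to confirm that the cDNA, UMI, and barcode files produced above hold the same number of reads, since the counting step is expected to consume them in parallel. The sketch below is one way to do that, assuming standard four-line FASTQ records; `count_fastq_records` and `check_same_read_count` are hypothetical helpers, not part of `splitcode`.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file (plain or gzip-compressed).

    Assumes each record spans exactly four lines.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_same_read_count(paths):
    """Return the shared record count, or raise if the files disagree."""
    counts = {str(p): count_fastq_records(p) for p in paths}
    if not counts:
        raise ValueError("no FASTQ paths given")
    if len(set(counts.values())) > 1:
        raise ValueError(f"Read counts differ: {counts}")
    return next(iter(counts.values()))
```

For this notebook the call would be `check_same_read_count(["cDNA.fastq.gz", "umi.fastq.gz", "barcode.fastq"])`.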

## Pseudoalignment and counting

We use `kb count` with the `--long` flag because this dataset contains long reads. Replace `-k 63` with the k-mer length of your index.

In [None]:
!kb count -k 63 --long -i mouse_k63.idx -g mouse.t2g -o output -x '2,0,24:1,0,10:0,0,0' cDNA.fastq.gz umi.fastq.gz barcode.fastq
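The `-x` argument in the cell above is a custom technology string of the form `barcode:UMI:cDNA`, where each part is a list of `file,start,stop` triplets and a stop of `0` means "to the end of the read". The sketch below decodes such a string to make the layout explicit; `Segment` and `parse_technology` are hypothetical names used for illustration and are not part of `kb-python`.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    file_index: int  # 0-based position of the FASTQ file on the command line
    start: int       # 0-based start position within the read
    stop: int        # end position; 0 means "to the end of the read"


def parse_technology(spec):
    """Split a 'barcode:umi:cdna' string into lists of Segment triplets."""
    groups = spec.split(":")
    if len(groups) != 3:
        raise ValueError("expected three ':'-separated parts")
    parsed = {}
    for name, group in zip(("barcode", "umi", "cdna"), groups):
        fields = [int(x) for x in group.split(",")]
        if len(fields) % 3 != 0:
            raise ValueError(f"{name}: fields must come in triplets")
        parsed[name] = [
            Segment(*fields[i:i + 3]) for i in range(0, len(fields), 3)
        ]
    return parsed


layout = parse_technology("2,0,24:1,0,10:0,0,0")
print(layout["barcode"])  # [Segment(file_index=2, start=0, stop=24)]
```

Read this way, the string says the barcode is the first 24 bases of the third file (`barcode.fastq`), the UMI is the first 10 bases of the second file (`umi.fastq.gz`), and the cDNA is the whole read in the first file (`cDNA.fastq.gz`).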

After `kb count` finishes, the resulting count matrix and its metadata are located in the `output` folder.