# SCORE Introduction

This tutorial outlines the basic functionality of the `SCORE` package. We start from the raw binned interaction files included in the `examples/data` directory of the repository and walk through the full analysis pipeline using `SCORE`.

## Creating a .scool file (`score cooler`)

`SCORE` works best when the input data is in [.scool](https://academic.oup.com/bioinformatics/article/37/14/2053/5948994) format. You can use the `score cooler` command to create a .scool file given the following inputs:

#### 1. Cell reference file

This is a tab-delimited file that must contain at least a `cell` column with the cell names/IDs. All other columns are optional and are used for filtering, plotting, etc.

In [1]:
! head data/oocyte_zygote_ref

cell	depth	batch	cluster
anchor_loop.55_oocyte_NSN	130771	1	oocyte
anchor_loop.65_pronucleus-w-o_nucl_extr-female	155306	1	ZygM
anchor_loop.71_pronucleus-w-o_nucl_extr-female	134485	1	ZygM
anchor_loop.230_pronucleus-male	151376	1	ZygP
anchor_loop.58_pronucleus-female	169110	1	ZygM
anchor_loop.168_pronucleus-w-o-inh-male	3647	1	ZygP
anchor_loop.93_pronucleus-male	72157	1	ZygP
anchor_loop.34_oocyte_SN	233878	1	oocyte
anchor_loop.118_oocyte_SN	35386	1	oocyte

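As a quick sanity check before handing a reference file to `SCORE`, the sketch below reads one with Python's standard `csv` module and confirms the `cell` column is present. The helper name `load_reference` is hypothetical (it is not part of `SCORE`), the sample rows are copied from the output above, and the uniqueness check is an assumption about what a well-formed reference file looks like rather than a documented requirement.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the reference file shown above (tab-delimited).
SAMPLE_REF = (
    "cell\tdepth\tbatch\tcluster\n"
    "anchor_loop.55_oocyte_NSN\t130771\t1\toocyte\n"
    "anchor_loop.65_pronucleus-w-o_nucl_extr-female\t155306\t1\tZygM\n"
    "anchor_loop.230_pronucleus-male\t151376\t1\tZygP\n"
)


def load_reference(handle):
    """Hypothetical helper: read a tab-delimited reference file.

    Raises ValueError if the mandatory `cell` column is missing.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or "cell" not in reader.fieldnames:
        raise ValueError("reference file must contain a 'cell' column")
    rows = list(reader)
    # Assumption: cell names should be unique within one reference file.
    names = Counter(row["cell"] for row in rows)
    duplicates = [name for name, n in names.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate cell names: {duplicates}")
    return rows


rows = load_reference(io.StringIO(SAMPLE_REF))
# Optional columns such as `cluster` can be summarised if present.
cluster_counts = Counter(row.get("cluster", "NA") for row in rows)
print(len(rows), dict(cluster_counts))
```

To check a real file, replace the `io.StringIO` object with `open("data/oocyte_zygote_ref", newline="")`.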

#### 2. Cell interaction files

These are tab-delimited files containing rows of `bin1`, `bin2`, `count` values indicating interactions in each cell.

In [2]:
! head data/oocyte_zygote_mm10/1M/anchor_loop.55_oocyte_NSN.1M

3	5	3
3	90	2
3	4	7
3	6	1
3	3	44
4	5	10
4	90	2
4	4	28
4	6	1
5	5	28

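To make the three-column layout concrete, here is a minimal sketch that parses one interaction file into a dictionary keyed by bin pair. It is illustrative only: `load_interactions` is a hypothetical helper rather than a `SCORE` function, and it assumes integer bin indices and counts (as in the sample above) and treats `(bin1, bin2)` as an unordered pair.

```python
import io

# First rows of the example interaction file shown above.
SAMPLE_INTERACTIONS = "3\t5\t3\n3\t90\t2\n3\t4\t7\n3\t6\t1\n3\t3\t44\n4\t5\t10\n"


def load_interactions(handle):
    """Hypothetical helper: map (bin1, bin2) -> count for one cell."""
    contacts = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        bin1, bin2, count = (int(field) for field in fields)
        # Assumption: interactions are symmetric, so store the pair sorted.
        key = (min(bin1, bin2), max(bin1, bin2))
        contacts[key] = contacts.get(key, 0) + count
    return contacts


contacts = load_interactions(io.StringIO(SAMPLE_INTERACTIONS))
total = sum(contacts.values())
diagonal = sum(c for (a, b), c in contacts.items() if a == b)
print(f"{len(contacts)} bin pairs, {total} contacts, {diagonal} on the diagonal")
```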

#### 3. Bin/Anchor reference file

This is a tab-delimited file listing the genomic coordinates and index of every bin/anchor that can participate in an interaction. It should have the columns `chrom`, `start`, `end`, `ID`.

In [3]:
! head data/mm10.genome_split_1M

chr1	1	1000001	0
chr1	1000001	2000001	1
chr1	2000001	3000001	2
chr1	3000001	4000001	3
chr1	4000001	5000001	4
chr1	5000001	6000001	5
chr1	6000001	7000001	6
chr1	7000001	8000001	7
chr1	8000001	9000001	8
chr1	9000001	10000001	9

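The bin indices in the interaction files refer to the `ID` column of this file. The sketch below shows one way to resolve an index back to genomic coordinates and to check that all bins share one size. `load_anchors` is a hypothetical helper, not part of `SCORE`, and it assumes a header-less file with integer IDs, as in the sample above.

```python
import io

# First rows of the example anchor file shown above (no header row).
SAMPLE_ANCHORS = (
    "chr1\t1\t1000001\t0\n"
    "chr1\t1000001\t2000001\t1\n"
    "chr1\t2000001\t3000001\t2\n"
    "chr1\t3000001\t4000001\t3\n"
    "chr1\t4000001\t5000001\t4\n"
    "chr1\t5000001\t6000001\t5\n"
)


def load_anchors(handle):
    """Hypothetical helper: map anchor ID -> (chrom, start, end)."""
    anchors = {}
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, anchor_id = line.rstrip("\n").split("\t")
        anchors[int(anchor_id)] = (chrom, int(start), int(end))
    return anchors


anchors = load_anchors(io.StringIO(SAMPLE_ANCHORS))

# Every bin in the sample spans the same number of bases.
sizes = {end - start for _, start, end in anchors.values()}
print("bin sizes:", sizes)

# Resolve the interaction `3  5  3` from the cell file to coordinates.
for bin_id in (3, 5):
    chrom, start, end = anchors[bin_id]
    print(f"bin {bin_id}: {chrom}:{start}-{end}")
```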

Using these files as input to `score cooler` produces a more efficient, compressed representation of the dataset. By default, the file is saved at `data/scools/<dataset_name>_<resolution>.scool`, but you can pass the `--out` argument to save it elsewhere.

In [4]:
! score cooler --dset oocyte_zygote \
               --data_dir data/oocyte_zygote_mm10/1M \
               --anchor_file data/mm10.genome_split_1M \
               --reference data/oocyte_zygote_ref \
               --resolution 1M \
               --out oocyte_zygote_1M.scool

Dataset: oocyte_zygote
Total Cells: 169
ZygP: 35
oocyte: 98
ZygM: 36
100%|█████████████████████████████████████████| 169/169 [00:47<00:00,  3.54it/s]
0

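When running `score cooler` on your own data, it can help to first confirm that every cell in the reference file has a matching interaction file. The sketch below assumes files are named `<cell>.<resolution>` inside the data directory, which is inferred from the example file `anchor_loop.55_oocyte_NSN.1M` and not from documented `SCORE` behavior. `missing_cell_files` is a hypothetical helper, and the temporary directory only stands in for a real `--data_dir`.

```python
import tempfile
from pathlib import Path


def missing_cell_files(cell_names, data_dir, resolution):
    """Hypothetical helper: list cells with no `<cell>.<resolution>` file."""
    data_dir = Path(data_dir)
    return [
        name
        for name in cell_names
        if not (data_dir / f"{name}.{resolution}").is_file()
    ]


# Demonstration on a throwaway directory holding a single cell file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "anchor_loop.55_oocyte_NSN.1M").write_text("3\t5\t3\n")
    cells = ["anchor_loop.55_oocyte_NSN", "anchor_loop.34_oocyte_SN"]
    missing = missing_cell_files(cells, tmp, "1M")

print("cells without an interaction file:", missing)
```
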
## Embedding the data

You can now pass the .scool file to the `score embed` command with the `--scool` argument. Most (but not all) embedding functionality also works on the raw data via the `--data_dir` and `--anchor_file` arguments, but loading each cell is less efficient that way.

In [5]:
! score embed --dset oocyte_zygote \
              --scool oocyte_zygote_1M.scool \
              --reference data/oocyte_zygote_ref \
              --embedding_algs scHiCluster InnerProduct cisTopic

Dataset: oocyte_zygote
Resolution not provided, inferring from bins, make sure this is right...
Inferred 1M resolution...
Total Cells: 169
Cells before filtering: 169
Cells before filtering by chr: 152
not filtered : 150
reference depth < min_depth : 17
chr_reads < chr_length : 2
Reference file with filtering criteria saved to data/oocyte_zygote_filtered_ref
ZygP: 28
oocyte: 95
ZygM: 27
Embedding data using:
['scHiCluster', 'InnerProduct', 'cisTopic']

oocyte_zygote - schicluster:vc_sqrt_norm,convolution,random_walk
Writing scHiCTools data (this only needs to be done once for each resolution or simulated replicate)...
 89%|████████████████████████████████████▋    | 134/150 [00:40<00:06,  2.60it/s]