scNanoHi-C

Introduction

scNanoHi-C is a single-cell long-read concatemer sequencing method which could be used to investigate the higher-order 3D genomic structures in individual cells. This repository contains snakemeke workflows to preprocess scNanoHi-C data and other codes for analyses in our paper.

Dependencies

scNanoHi-C is based on Snakemake framework and conda enrivonments. Install dependencies and environments as follows:

conda install -c bioconda snakemake
# create conda environment
conda env create -f r_env.yaml
conda env create -f hic_env.yaml
conda env create -f py2_env.yaml

Then, dip-c and hickit should be installed under the py2_env (for 3D modelling).

Usage

Demultiplex and remove adapter sequences

scNanoHi-C sequencing data was first demultiplexed to single cells by Nanoplexer using known cell barcodes with default parameters. Adapter sequences were trimmed by Cutadapt and reads shorter than 500bp were also removed.

Input files:
- pass.fastq.gz the output sequencing data from Nanopore platform;
- bc_index a tab-separated file containing PCR barcodes (1-96) in the first column, the start and end of TN5 barcodes (1-24) in the second and third column and an optional library name in the fourth column as follows:
```
$ head bc_index
25      1       8       A
25      17      24      A
26      9       16      B
27      17      24      C
```
  The resulted name of cells in this example will be:
```
A25B1
A25B2
...
C27B24
```
  And the library name will assign with ‘P’, if the library name was not supplied. In default, if the file contains only the first column, all 1-24 TN5 barcodes will be used for each PCR barcode.
Output files:
- raw_data/ containing raw fastq files for each single cell
- trim/ containing trimmed fastq files for each single cell, can be used as the input files for Pore-C-Snakemake
- raw_fastq.stats statistics of raw fastq files
- trim_fastq.stats statistics of trimmed fastq files
usage:

put pass.fastq.gz and bc_index together with data_processing.smk into the working directory and run snakemake via:
```
snakemake -s data_processing.smk -j 10
```

Run Pore-C-Snakemake to generate concatemers

Trimmed fastq files for single cell were then used to run default Pore-C-Snakemake workflow, see details in Pore-C-Snakemake.

Run single-cell preprocessing pipeline of scNanoHi-C

The main script of this step is filter_contacts.smk , which could be run with the shell script run_smk.sh . This step is consist of following sections

remove artifact contacts in single cells

To remove artifact contacts, concatemers from each single cell were firstly decomposed into VPCs. Five types of artifacts were removed sequentially:
1. Adjacent contacts: contacts assigned to adjacent restriction fragments;
2. Close contacts: contacts with two alignments separating from less than 1000 bp in genomic distance;
3. Duplicates: we designated contacts from the same pair of restriction fragments as duplicates. In order to remove PCR duplicates and preserve the high-order structure of concatemers as much as possible, only one contact from the concatemer with the highest cardinality in each duplicates set was selected to remain.
4. Promiscuous contacts: contacts from restriction fragments which involved in more than 10 interactions.
5. Isolated contacts: contacts which no other contact within 1Mb distance.
More information about the evaluation of these filters could be found in the Methods of our paper.
reconstruct single-cell 3D models

The haplotype-tagged virtual pair-wise contacts from scNanoHi-C were used for haplotype imputation and single-cell 3D genome structure model reconstruction by hickit and dip-C packages.The high-depth data (24 cells per run) of scNanoHi-C were recommended for 3D genome construction.
generate quality control metric for downstream analysis
calculated the single-cell A/B compartment values (scA/B)

The scA/B values were calculated through the dip-C package in both 2D and 3D mode, which could be used for dimensionality reduction and cell clustering. The recommended resolution is 1 Mb.
Input files:
- run_smk.sh : the main script.
- results from Pore-C pipeline: including the .parquet files and concatemer summary file of each cell.
- config.yaml : the configure files containing the parameters and location of files used in the pipeline.
- sample : a file containing the name of cells in each row.
main output files:
- *.contacts.clean.parquet : the files containing filtered contacts for each cell.
- *_stats_summary.csv: files containing the quality control information of all cells.
- *.n.cif : results of single-cell 3D models.
- RMSD_summary.txt : summary of RMSD of all models.
- cpg_2d_*_b1m_summary.csv : matrix of 2D scA/B values in all cells.
These outputs could be optional selected by adjusting the rule all in filter_contacts.smk.

usage:

put the run_smk.sh , config.yaml and sample files into the merged_contacts directory of Pore-C output, and modify these files accordingly then run as following:

$ sh run_smk.sh
Run scNanoHi-C snakemake pipeline? [y/dry/n]:
# type 'y' to run the pipeline directly
# type 'dry' to display what would be done
# type 'n' to cancel the program

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

analysis

analysis

preprocess

preprocess

LICENSE

LICENSE

README.md

README.md

hic_env.yaml

hic_env.yaml

py2_env.yaml

py2_env.yaml

r_env.yaml

r_env.yaml

Repository files navigation

scNanoHi-C

Introduction

Dependencies

Usage

Demultiplex and remove adapter sequences

Run Pore-C-Snakemake to generate concatemers

Run single-cell preprocessing pipeline of scNanoHi-C

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
analysis		analysis
preprocess		preprocess
LICENSE		LICENSE
README.md		README.md
hic_env.yaml		hic_env.yaml
py2_env.yaml		py2_env.yaml
r_env.yaml		r_env.yaml

License

LuJiansen/scNanoHi-C

Folders and files

Latest commit

History

Repository files navigation

scNanoHi-C

Introduction

Dependencies

Usage

Demultiplex and remove adapter sequences

Run Pore-C-Snakemake to generate concatemers

Run single-cell preprocessing pipeline of scNanoHi-C

About

Resources

License

Stars

Watchers

Forks

Languages