Panaln

version 1.0 (20230920)

What is it?

Indexing Pan-genome with Applications in Read Mapping and Alignment.

How to use it?

Panaln consists of three components: data preprocessing, pan-index building, and read mapping. After preprocessing the data, first build the pan-index from the VCF file (e.g., snp144common.txt from the dbSNP dataset) and the reference genome (e.g., GRCh38.fasta), then run the mapping step. Because the data sets are large, the download links are listed below; you can also request the data by email.

Available data:

Reference genome: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.4
VCF file from dbSNP: https://ftp.ncbi.nih.gov/snp/
Sequencing reads: https://www.ebi.ac.uk/ena/browser/view/ERR194146

Step I. Install

Download (or clone) the source code from https://github.com/Hongweihuo-Lab/Panaln
Compile the source code. (Note that you need to compile FM first.)

Step II. Data preprocessing

Convert the VCF file to the custom five-column SNP format by running: "./vcf2snp snp144common.txt snp144common.snp". (If your reference genome is a single whole-genome sequence file, you can split it into separate chromosomes by running: "genom2chr".)
Embed the SNP information into the reference genome by running: "./snp_embed". (The default output path is "/home/lab/gll/formatsnp"; you can change it as needed.)
Output the SNP and indel information by running: "./snp_indel". (The default output path is "/home/lab/gll/formatsnp/panVcf"; you can change it as needed.)
Generate the sequence of the linear serialization model by running: "./comb -w 124 genome.fa snp144Comm.fasta snp144Comm_indel.fasta snp144Comm.data".
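
The Step II commands can be chained in a small wrapper script. The following is a minimal sketch, not part of Panaln: the "run" helper, the "preprocess" function, and the DRY_RUN variable are hypothetical names introduced here, the file names are the examples from this section, and the tools are assumed to sit in the current directory. The optional genom2chr split is left out because its arguments are not documented here. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of Step II (data preprocessing) as one script.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Run the four preprocessing commands in order; stop at the first failure.
preprocess() {
    run ./vcf2snp snp144common.txt snp144common.snp || return 1
    run ./snp_embed || return 1
    run ./snp_indel || return 1
    run ./comb -w 124 genome.fa snp144Comm.fasta snp144Comm_indel.fasta snp144Comm.data
}

preprocess
```

Review the printed commands first, then rerun with DRY_RUN=0 once the input files are in place.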

Step III. Build pan-index

Run the shell command: "./panaln index snp144Comm_indel.fasta".
This produces the index files: "sal.idx.256, newD.idx.256, lroot.idx.256, rroot.idx.256".
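
Before mapping, it can help to confirm that all four index files exist. The following is a minimal sketch, not part of Panaln: "check_index" is a hypothetical helper introduced here, and it assumes the index files are written with exactly the names listed in this step into a single directory (the current directory if none is given).

```shell
#!/bin/sh
# Hypothetical helper: verify that the four index files named in Step III
# are present in the given directory (default: current directory).
# Prints each missing file and returns non-zero if any is absent.
check_index() {
    dir="${1:-.}"
    missing=0
    for f in sal.idx.256 newD.idx.256 lroot.idx.256 rroot.idx.256; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example use (commented out so that defining the helper has no side effects):
# check_index . || exit 1
```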

Step IV. Mapping processing

Run the shell command: "./panaln align -n 3 <processed_fasta_Name> <fastq_Name> <aln_Name>", where processed_fasta_Name is the processed genome file, fastq_Name is the sequencing reads file, and aln_Name is the intermediate alignment file.
Run the shell command: "./panaln aln2sam <processed_fasta_Name> <aln_Name> <sam_Name>", where processed_fasta_Name is the processed genome file, aln_Name is the intermediate alignment file, and sam_Name is the mapping result file.
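
The two Step IV commands can be wrapped so that the intermediate alignment file is passed from one to the other consistently. The following is a minimal sketch, not part of Panaln: the "run" helper, the "map_reads" function, and the DRY_RUN variable are hypothetical names introduced here, and reads.fastq, reads.aln, and reads.sam are placeholder file names. Using snp144Comm_indel.fasta as the processed genome is an assumption based on the file indexed in Step III. The "-n 3" option is copied unchanged from the command above. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of Step IV (mapping), assuming the pan-index was already built.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# map_reads <processed_fasta> <fastq> <aln> <sam>
# Aligns the reads, then converts the intermediate file to SAM.
map_reads() {
    ref=$1
    reads=$2
    aln=$3
    sam=$4
    run ./panaln align -n 3 "$ref" "$reads" "$aln" || return 1
    run ./panaln aln2sam "$ref" "$aln" "$sam"
}

# Placeholder file names; replace them with your own.
map_reads snp144Comm_indel.fasta reads.fastq reads.aln reads.sam
```

Because the conversion step runs only if the alignment step succeeds, a failed alignment does not leave a misleading SAM file behind.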

Feedback

Please report bugs, questions, or suggestions by email to guolilu@stu.xidian.edu.cn. Feedback and test results are welcome.
