PMalleleFinder

General info

This pipeline extracts allele sequences and population information for T-cell receptor (TR) genes.

Requirements

python=3.7.1
biopython=1.76
BCBio.gff=0.6.6
pandas=1.0.1
numpy=1.18.1
R=3.6.1
samtools=1.6
tabix=1.6
dplyr=0.8.5
tidyr

Setup

Clone the repository and run main_pipe.py.

Example usage:

$ python3 main_pipe.py --gff=gencode.v32.chr_patch_hapl_scaff.annotation.gff3 --vcf="ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz;ALL.chr14.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz" --popfile=pop-info.tsv --ref=GRCh38_full_analysis_set_plus_decoy_hla.fa
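
The call above can also be assembled from Python, which avoids shell-quoting problems with the semicolon that separates multiple VCF files. The following is a minimal sketch, not part of the pipeline: `build_command` is a hypothetical helper, and only the four required options shown in the example are used.

```python
import shlex


def build_command(gff, vcfs, popfile, ref, script="main_pipe.py"):
    """Assemble the argument list for main_pipe.py (hypothetical helper)."""
    # Multiple VCF files are passed as one semicolon-separated value,
    # as in the example usage above.
    vcf_arg = ";".join(vcfs)
    return [
        "python3",
        script,
        f"--gff={gff}",
        f"--vcf={vcf_arg}",
        f"--popfile={popfile}",
        f"--ref={ref}",
    ]


cmd = build_command(
    "gencode.v32.chr_patch_hapl_scaff.annotation.gff3",
    [
        "ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz",
        "ALL.chr14.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz",
    ],
    "pop-info.tsv",
    "GRCh38_full_analysis_set_plus_decoy_hla.fa",
)

# Print a copy-pasteable shell command; shlex.quote protects the ';'.
print(" ".join(shlex.quote(part) for part in cmd))
```

To actually start the run, the list can be handed to `subprocess.run(cmd, check=True)`; because it is passed as a list, no shell is involved and the semicolon needs no quoting.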

Command line options

--gff=gencodefile (required): A .gff or .gff3 input file from which gene locations are read.
--vcf=vcffile.vcf (required): A .vcf.gz file. A matching tabix index (.tbi) file must be present in the same location so that variation can be extracted. Several files can be given as one quoted, semicolon-separated string.
--popfile=populationfile (required): A tab-separated population file with the columns sample, pop, and super_pop.
--ref=reference.fasta (required): The reference genome FASTA. An index file (.fai) for this FASTA must be present in the same folder.
--filter=filterfile: A file listing the genes to filter on, one filter per line. If no filter is specified, the pipeline assumes TCR genes. Chromosomes can be filtered by adding #CHR int,int on the first line of the filter file.
--threshold=integer: The threshold used to discard sequences. If not specified, the threshold is set to 4.
--rss=TRUE|FALSE: If set to TRUE, the pipeline also generates sequences containing the recombination signal sequence (RSS) for V, D, and J genes. A 60 bp sequence is cut.
--outdir=dir: The output directory, e.g. output/test. If no output directory is specified, the start time of the task is used as the directory name.
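
The population file and filter file formats described above can be illustrated with a short sketch. It is not taken from the pipeline's code and rests on several assumptions: that the population file has a header row named sample, pop, super_pop; that chromosome numbers go on a first line starting with `#CHR`; and that the gene names and sample IDs below are made up for illustration. How the pipeline matches filter entries against gene names is not specified here.

```python
import csv
import io


def read_popfile(handle):
    """Return {sample: (pop, super_pop)} from a tab-separated population file.

    Assumes a header row with the column names sample, pop, super_pop.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: (row["pop"], row["super_pop"]) for row in reader}


def make_filter_text(genes, chromosomes=None):
    """Build filter-file content: optional '#CHR a,b' line, then one gene per line."""
    lines = []
    if chromosomes:
        lines.append("#CHR " + ",".join(str(c) for c in chromosomes))
    lines.extend(genes)
    return "\n".join(lines) + "\n"


# Made-up population data, held in memory instead of a real pop-info.tsv.
example_pop = io.StringIO(
    "sample\tpop\tsuper_pop\n"
    "sample_1\tGBR\tEUR\n"
    "sample_2\tCHB\tEAS\n"
)
samples = read_popfile(example_pop)

# Illustrative filter: restrict to chromosomes 7 and 14 and two gene names.
filter_text = make_filter_text(["TRBV", "TRAV"], chromosomes=[7, 14])
print(filter_text, end="")
```

Writing `filter_text` to a file and passing that file with `--filter=` would then follow the format described for that option.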

JulianDekker/PMalleleFinder
