Using SOAPnuke as an example, execute the following program to preprocess the data.
Data_Preprocessing.sh -name Sample_NameGangSTR and ExpansionHunter were applied to genotype TRs in parallel. After conducting independent quality control on the outputs from each tool, the results were integrated using ensembleTR. Execute the following program to perform TR genotyping.
TR_Genotyping.sh -name Sample_Name -sex Sample_SexBased on the TR allele frequency data from TR-Atlas, execute the following program to perform TR allele frequency fitering. Regarding the acquisition of TR allele frequency data, please reach out to the authors of the TR-Atlas.
python TR_AF_Filtering.py --file1 TR_AlleleFreq.txt --file2 Sample_Name_ensembletr.vcf --output Sample_Name_ensembletr_AF.vcf
bcftools view Sample_Name_ensembletr_AF.vcf -Oz -o Sample_Name_ensembletr_AF.vcf.gz
bcftools index -t Sample_Name_ensembletr_AF.vcf.gzMerge VCF files of samples in the same pedigree into one VCF file, remove lines with GT="./."
bcftools merge -o Pedigree_No_orig.vcf.gz Sample_Name1_ensembletr_AF.vcf.gz Sample_Name2_ensembletr_AF.vcf.gz
bcftools view -i 'COUNT(GT="./.") == 0' Pedigree_No_orig.vcf.gz -o Pedigree_No.vcf.gzFind TRs that have been annotated in TRAD, TRAD_anno.txt was obtained from TRAD database
python TRAD_anno.py Pedigree_No.vcf.gz TRAD_anno.txt Pedigree_No_TRAD.txtExtract key columns from annotation results
python Extract_TRAD_anno.py Pedigree_No Pedigree_No_TRAD.txt Pedigree_No_TRAD_extracted.txtFiltering extracted TRAD annotation. Only retain TRs that have the same rare allele (frequency ≤ 0.005) present in all pedigree members.
python Filtering_extracted_TRAD_anno.py Pedigree_No_TRAD_extracted.txt Pedigree_No_TRAD_extracted_filtered.txt Pedigree_No_TRAD_extracted_filtered.xlsx