How to prepare a VCF before importing it into a SQLite database. See https://github.com/tkoomar/VCFdbR for further explanation.
MYVCF=Perry_truncated.vcf.gz OUTVCF=Perry_truncated_ready.vcf.gz
MYVCF=SeqOne_truncated.vcf.gz OUTVCF=SeqOne_truncated_ready.vcf.gz
The VCF needs to have all multiallelic sites split. All fields that had one value per alternate allele (Number=A) also need to be converted to a single value (Number=1). You can do that with bcftools:
bcftools norm -c ws -f /home/ptngs/genome_references/GRCh37.fa -m - ${MYVCF} | sed -e 's/Number=A/Number=1/g' | sed -e 's/Number=./Number=1/g' | bgzip -c > TEMP.vcf.gz
bcftools view TEMP.vcf.gz | sed -e 's/Number=1,Type=Float/Number=1,Type=String/g' | sed -e 's/Number=1,Type=Integer/Number=1,Type=String/g' | bgzip -c > TEMP2.vcf.gz
zcat TEMP2.vcf.gz| sed '/^#/! s/;;/;/g' | bgzip -c >
tabix ${OUTVCF}
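To see what the sed substitutions above do to the header, here is a minimal sketch that runs them on a few header lines. The header lines are invented for illustration and do not come from the real files. Note that in `s/Number=./Number=1/g` the dot is a regular-expression wildcard, so it rewrites any one-character Number value (for example Number=0 on a Flag field), not only a literal `Number=.`. Whether that matters depends on the importer; the `strict` variant is an assumption about what may be intended, not part of the original procedure.

```shell
# Hypothetical header lines, invented for illustration only.
header='##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">
##INFO=<ID=CSQ,Number=.,Type=String,Description="Annotation">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">'

# Same substitutions as in the pipeline above, applied in the same order.
rewritten=$(printf '%s\n' "$header" \
  | sed -e 's/Number=A/Number=1/g' \
  | sed -e 's/Number=./Number=1/g' \
  | sed -e 's/Number=1,Type=Float/Number=1,Type=String/g' \
  | sed -e 's/Number=1,Type=Integer/Number=1,Type=String/g')
printf '%s\n' "$rewritten"

# Stricter variant (assumption): escape the dot so only a literal
# "Number=." is rewritten and Number=0 on Flag fields is left alone.
strict=$(printf '%s\n' "$header" \
  | sed -e 's/Number=A/Number=1/g' \
  | sed -e 's/Number=\./Number=1/g')
printf '%s\n' "$strict"
```

In the first output the DB flag ends up as Number=1,Type=Flag; in the strict output it keeps Number=0.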
file=/home/ptngs/ClinicalResultsBrowser_vcfs/SEQONE_DATABASE_EXPORTS/27_02_2023/merge_grenoble_onco.vcf.gz
tmpdir=/home/ptngs/ClinicalResultsBrowser_vcfs/SEQONE_DATABASE_EXPORTS/27_02_2023/splited_onco_vcf_cosm
mkdir -p
https://cancer.sanger.ac.uk/cosmic/file_download/GRCh37/cosmic/v${VERSION}/VCF/CosmicCodingMuts.vcf.gz >
#tabix
If your VCF contains too many samples, processing it could require too much RAM for your computer. In that case, split it by sample like this:
THREADS=2
outdir=/home/ptngs/ClinicalResultsBrowser_vcfs/SEQONE_DATABASE_EXPORTS/27_02_2023/splited_onco_vcf
outdir=/home/ptngs/ClinicalResultsBrowser_vcfs/SEQONE_DATABASE_EXPORTS/27_02_2023/splited_onco_twofiles/
file=/home/ptngs/ClinicalResultsBrowser_vcfs/SEQONE_DATABASE_EXPORTS/27_02_2023/merge_grenoble_onco.vcf.gz
for sample in $(bcftools query -l $file); do
echo "bcftools view --threads ${THREADS} -Oz -s
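As a rough sketch of the per-sample split, the loop below prints one `bcftools view` command per sample without running it. It assumes the goal is one compressed VCF per sample; the output naming (`${outdir}/${sample}.vcf.gz`), the short paths, and the sample names are hypothetical placeholders, not taken from the original notes. In practice the sample list would come from `bcftools query -l "$file"`.

```shell
# Hypothetical values for illustration; replace with the real paths.
file=merge.vcf.gz
outdir=split_by_sample
THREADS=2

# Placeholder sample names; normally: samples=$(bcftools query -l "$file")
samples="sampleA sampleB"

# Build one command line per sample (printed, not executed).
cmds=$(for sample in $samples; do
  echo "bcftools view --threads ${THREADS} -Oz -s ${sample} -o ${outdir}/${sample}.vcf.gz ${file}"
done)
printf '%s\n' "$cmds"
```

Printing the commands first makes it easy to inspect them, and the list can then be piped to `sh` or to a job runner so that only a few samples are processed at a time.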
cd /home/ptngs/Documents/haplotypecaller
bcftools annotate -x ID 23A1424_S3/23A1424_S3.haplotypecaller.filtered_VEP.ann.vcf.gz -Oz -o 23A1424_S3/23A1424_S3.haplotypecaller.filtered_VEP.ann.NoID.vcf.gz
bcftools norm -c ws -f /home/ptngs/genome_references/GRCh37.fa -m - 23A1424_S3/23A1424_S3.haplotypecaller.filtered_VEP.ann.NoID.vcf.gz | sed -e 's/Number=A/Number=1/g' | sed -e 's/Number=./Number=1/g' | bgzip -c > TEMP.vcf.gz
bcftools view TEMP.vcf.gz | sed -e 's/Number=1,Type=Float/Number=1,Type=String/g' | sed -e 's/Number=1,Type=Integer/Number=1,Type=String/g' | bgzip -c > TEMP2.vcf.gz
zcat TEMP2.vcf.gz | sed '/^#/! s/;;/;/g' | bgzip -c > /home/ptngs/Documents/haplotypecaller/23A1424_S3/23A1424_S3.haplotypecaller.filtered_VEP.ann.NoID_splitted.vcf.gz
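The final sed step only touches record lines (the `/^#/!` address skips header lines) and replaces each `;;` with a single `;`. A small sketch on invented lines shows this, and also that a run of three semicolons is reduced to two, not one, because sed does not rescan its own replacement. The `fixed_runs` variant, which collapses any run of semicolons, is an assumption about what may be wanted, not part of the original procedure.

```shell
# Hypothetical lines, invented for illustration (tab-separated VCF columns).
records=$(
  printf '##INFO=<ID=DP,Number=1,Type=String,Description="depth;;raw">\n'
  printf '1\t100\t.\tA\tG\t50\tPASS\tDP=10;;AF=0.5\n'
  printf '1\t200\t.\tC\tT\t50\tPASS\tDP=8;;;AF=0.1\n'
)

# Same command as above: skip lines starting with '#', replace ';;' by ';'.
fixed=$(printf '%s\n' "$records" | sed '/^#/! s/;;/;/g')
printf '%s\n' "$fixed"

# Variant (assumption): collapse any run of one or more ';' into a single ';'.
fixed_runs=$(printf '%s\n' "$records" | sed '/^#/! s/;;*/;/g')
printf '%s\n' "$fixed_runs"
```

With the original command the second record still contains `DP=8;;AF=0.1`; with the variant it becomes `DP=8;AF=0.1`. The header line is unchanged in both cases.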