Nmasked human genome #55

Closed
fanghe0720 opened this issue Oct 11, 2021 · 1 comment

fanghe0720 commented Oct 11, 2021

Update:
I think the problem was the format of my SNP files. I took an SNP file from the mouse genome as a template and replaced its content, and everything works now. Sorry for the inconvenience, and thank you very much!

Hi,

I'm trying to prepare an N-masked human genome with SNPsplit. I read issue #22, which covers a similar topic, and decided to follow your suggestion to use the --skip_filtering option. I used the strain name 'SPRET_EiJ' for convenience and put my SNP files in a folder named 'SNPs_SPRET_EiJ'. However, the run still reports that 0 positions were changed to N on every chromosome. Could you help me find where the problem is?

My command is ../SNPsplit-0.3.4/SNPsplit_genome_preparation --nmasking --skip_filtering --vcf_file sample.vcf.gz --reference_genome ../hg38_genome/ --strain SPRET_EiJ --genome_build hg38_Nmasked

My vcf file is downloaded from https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-common_all.vcf.gz

My reference genome is downloaded from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/GRCh38.primary_assembly.genome.fa.gz. I replaced every 'chr*' sequence name with '*' (for example, 'chr1' became '1').
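
For readers reproducing the renaming step, here is a minimal sketch of stripping the 'chr' prefix from FASTA header lines. The function names `strip_chr_prefix` and `rename_fasta` are hypothetical and not part of SNPsplit; the sketch only assumes that header lines start with '>chr'.

```python
def strip_chr_prefix(header):
    """Turn a FASTA header such as '>chr1 1' into '>1 1'."""
    if header.startswith(">chr"):
        return ">" + header[len(">chr"):]
    return header


def rename_fasta(lines):
    """Yield FASTA lines, rewriting header lines only.

    Sequence lines never start with '>', so they pass through unchanged.
    """
    for line in lines:
        yield strip_chr_prefix(line) if line.startswith(">") else line


if __name__ == "__main__":
    demo = [">chr1 1\n", "ACGTNNNN\n", ">chrM M\n", "GATC\n"]
    print("".join(rename_fasta(demo)), end="")
```

Note that this turns 'chrM' into 'M', whereas the chromosome list printed in the log below names the mitochondrial sequence 'MT'.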

My SNP files are in the following format:
rs544419019 1 11012 C/G
rs561109771 1 11063 T/G
rs540538026 1 13110 G/A
rs62635286 1 13116 T/G
rs62028691 1 13118 A/G
rs531730856 1 13273 G/C
rs548333521 1 13284 G/A
rs571093408 1 13380 C/G
rs568927457 1 13453 T/C
rs546169444 1 14464 A/T
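
The update at the top of this post attributes the problem to the SNP file format without saying exactly what changed. One plausible cause, assuming SNPsplit expects five tab-separated columns per line (SNP-ID, chromosome, position, strand, Ref/SNP) as its documentation describes, is that the lines above carry only four fields. The sketch below checks a line against that assumed layout; `EXPECTED_FIELDS` and `check_snp_line` are hypothetical names, not part of SNPsplit.

```python
# Assumption (not confirmed in this thread): one SNP per line, tab-separated,
# with the columns ID, chromosome, position, strand, ref/alt.
EXPECTED_FIELDS = 5


def check_snp_line(line):
    """Return True if the line matches the assumed five-column layout."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        return False
    _snp_id, _chrom, pos, _strand, alleles = fields
    ref, sep, alt = alleles.partition("/")
    # Position must be an integer and alleles must look like 'C/G'.
    return pos.isdigit() and sep == "/" and len(ref) == 1 and len(alt) == 1


if __name__ == "__main__":
    # Four space-separated fields, as in the listing above: rejected.
    print(check_snp_line("rs544419019 1 11012 C/G"))
    # Hypothetical five-column, tab-separated version of the same record.
    print(check_snp_line("rs544419019\t1\t11012\t1\tC/G"))
```

Running a check like this over each chr*.txt file before calling SNPsplit_genome_preparation may show quickly whether the files match the layout the tool can parse.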

My log file is

Reading/filtering VCF file: No (skipped by user)
Reference genome: ../hg38_genome/
N-masking: Yes
Full SNP genome: No
SNP strain: SPRET_EiJ

Using the following chromosomes (HARCODED IN!!!):
1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22
X Y MT

Skipped reading the VCF file and filtering SNPs again (specified by user)

Now reading in and storing sequence information of the genome specified in: ../hg38_genome/
chr 1 (248956422 bp)
chr 2 (242193529 bp)
chr 3 (198295559 bp)
chr 4 (190214555 bp)
chr 5 (181538259 bp)
chr 6 (170805979 bp)
chr 7 (159345973 bp)
chr 8 (145138636 bp)
chr 9 (138394717 bp)
chr 10 (133797422 bp)
chr 11 (135086622 bp)
chr 12 (133275309 bp)
chr 13 (114364328 bp)
chr 14 (107043718 bp)
chr 15 (101991189 bp)
chr 16 (90338345 bp)
chr 17 (83257441 bp)
chr 18 (80373285 bp)
chr 19 (58617616 bp)
chr 20 (64444167 bp)
chr X (156040895 bp)
chr Y (57227415 bp)
chr M (16569 bp)
Processing chromosome 1 (for strain SPRET_EiJ)
Reading SNPs from file /net/noble/vol8/hefang2/hg38.Nmasked.common/SNPs_SPRET_EiJ/chr1.txt
Clearing SNP array...
Writing modified chromosome (N-masking)
Writing N-masked output to: /net/noble/vol8/hefang2/hg38.Nmasked.common/SPRET_EiJ_N-masked/chr1.N-masked.fa
0 SNPs total for chromosome 1
0 positions on chromosome 1 were changed to 'N'

Processing chromosome 2 (for strain SPRET_EiJ)
Reading SNPs from file /net/noble/vol8/hefang2/hg38.Nmasked.common/SNPs_SPRET_EiJ/chr2.txt
Clearing SNP array...
Writing modified chromosome (N-masking)
Writing N-masked output to: /net/noble/vol8/hefang2/hg38.Nmasked.common/SPRET_EiJ_N-masked/chr2.N-masked.fa
0 SNPs total for chromosome 2
0 positions on chromosome 2 were changed to 'N'
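
The symptom in this log is easy to miss on a long run, so here is a small sketch that scans a saved log for chromosomes reporting zero SNPs. It relies only on the "N SNPs total for chromosome X" line format shown above; the function name `chromosomes_without_snps` is hypothetical.

```python
import re

# Matches log lines such as "0 SNPs total for chromosome 1".
SNP_TOTAL = re.compile(r"^(\d+) SNPs total for chromosome (\S+)")


def chromosomes_without_snps(log_text):
    """Return the chromosomes whose SNP total in the log is zero."""
    empty = []
    for line in log_text.splitlines():
        match = SNP_TOTAL.match(line.strip())
        if match and int(match.group(1)) == 0:
            empty.append(match.group(2))
    return empty


if __name__ == "__main__":
    sample = (
        "0 SNPs total for chromosome 1\n"
        "0 positions on chromosome 1 were changed to 'N'\n"
        "0 SNPs total for chromosome 2\n"
    )
    print(chromosomes_without_snps(sample))
```

If every chromosome appears in the result, the SNP files were most likely not parsed at all, which points to a file format or naming problem rather than a lack of variants.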

FelixKrueger (Owner) commented

Excellent, I am glad it worked now - especially since it didn't require anything from my side :P Good luck!
