Read Simulation: reference length limitation? #30

Closed
ampodu opened this issue Jul 9, 2018 · 4 comments
Comments

ampodu commented Jul 9, 2018

Hi,
I was hoping to use this tool to simulate some reads for a set of amplicons. However, it gives me an error when I try it:

src/simulator.py linear -r ~/R/projects/umi.sim/nanosim.input.fasta -c training.juplasmid -o sim.pcrprod
Traceback (most recent call last):
  File "src/simulator.py", line 716, in <module>
    main()
  File "src/simulator.py", line 710, in main
    simulation(ref, out, dna_type, perfect, kmer_bias, max_readlength, min_readlength)
  File "src/simulator.py", line 284, in simulation
    read_mutated = mutate_read(new_read, new_read_name, out_error, error_dict, kmer_bias, False)
  File "src/simulator.py", line 577, in mutate_read
    tmp_bases.remove(read[key + i])
IndexError: string index out of range

The reference file contains 10,000 entries with a size of 1.3 kb. So I thought maybe it cannot deal with so many entries. That is why I tried the simulation with a plasmid of 7.4 kb, but that outputs the same error.

When I try it with a 49 kb lambda genome it works, which is why I assume there is a limitation on the reference size.

Is it possible to adjust NanoSim for shorter references or am I missing something?

Collaborator

cheny19 commented Jul 9, 2018

NanoSim is able to deal with multiple entries in a fasta file, and it should work fine for small reference genomes; it just takes longer to generate reads of a proper length. Could you send me your reference plasmid fasta file so I can test it? Thanks!

Author

ampodu commented Jul 10, 2018

I sent the file to your bcgsc.ca email address.

I did some more testing, and it seems that every reference over ~40 kb works, but with ~20 kb or less it always crashes for me. However, if I use --perfect, it works just fine. The problem probably lies within mutate_read.

Author

ampodu commented Jul 10, 2018

I found a solution: specifying the "max_len" as the length of the reference contigs.
Seems like it was trying to create reads longer than the reference?
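
For anyone hitting the same error, here is a minimal sketch of how one might pick a value for "max_len" from the reference itself. It is not part of NanoSim; the helper names `contig_lengths` and `suggested_max_len` are hypothetical, and choosing the shortest contig as the cap is an assumption (a conservative one, so that no simulated read can exceed any contig).

```python
def contig_lengths(fasta_lines):
    """Return {contig_name: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def suggested_max_len(fasta_lines):
    """Length of the shortest contig (hypothetical choice of cap)."""
    return min(contig_lengths(fasta_lines).values())


# Toy input; for a real file, pass an open file handle instead of a list.
example = [">amp1 test", "ACGTACGT", "ACGT", ">amp2", "ACGTAC"]
print(contig_lengths(example))
print(suggested_max_len(example))
```

With a real reference, `suggested_max_len(open("nanosim.input.fasta"))` would give a number to pass as the maximum read length.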

Collaborator

cheny19 commented Jul 10, 2018

Oh, I see. NanoSim tries to mimic the length distribution of the training profile. So if your reference genome is smaller than the lengths in the empirical distribution, it may not be able to find the right length for each read. Specifying max_len is the solution, and you probably don't mind that the simulated read lengths don't match the profile, given the reference size.
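
To illustrate the point about the length distribution, here is a toy sketch. It is not NanoSim's actual code; `draw_read_length`, the sample lengths, and the retry limit are all assumptions made for illustration. It draws read lengths from an empirical sample and only accepts those that fit on the reference, optionally capped by a maximum length.

```python
import random


def draw_read_length(empirical_lengths, ref_len, max_len=None,
                     rng=random, max_tries=1000):
    """Draw a length from the empirical sample that fits on the reference.

    Returns None if no suitable length turns up within max_tries draws.
    """
    limit = ref_len if max_len is None else min(ref_len, max_len)
    for _ in range(max_tries):
        length = rng.choice(empirical_lengths)
        if length <= limit:
            return length
    return None


# Hypothetical profile containing both short and long reads.
profile = [500, 1200, 8000, 15000, 30000]
rng = random.Random(0)

# A 1.3 kb amplicon: only the 500 and 1200 entries can ever be accepted.
print(draw_read_length(profile, ref_len=1300, rng=rng))

# A profile with nothing shorter than the reference never yields a length.
print(draw_read_length([8000, 15000], ref_len=1300, rng=rng))  # None
```

The second call shows the failure mode described above: when every length in the profile exceeds the reference, no draw can succeed, and the accepted lengths necessarily deviate from the profile once a cap is applied.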

ampodu closed this as completed Jul 13, 2018