Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write temporary bam files #8

Closed
dcopetti opened this issue Feb 27, 2018 · 16 comments
Closed

write temporary bam files #8

dcopetti opened this issue Feb 27, 2018 · 16 comments
Assignees
Labels

Comments

@dcopetti
Copy link

dcopetti commented Feb 27, 2018

Hello,

I was running tigmint-span and the sever crashed (I am pretty sure it is not due to tigmint, unless it required a huge amount of memory all of a sudden).
I have now a bunch of *.bam.tmp.0628.bam files - I guess it died during the sorting of this command:
bwa mem -t$t -pC $(draft).fa $< | samtools view -h -F4 | samtools sort -@$t -tBX -o $@
I am thinking of two alternative options:

  • to split the above command in separate steps and have a file for each (at least until the following step is completed with success);
  • to feed tigmint with a bam file produced outside of its pipeline, and run just the actual tigmint algorithm.
    Would it be possible to implement any of the options, or any other that will help deal with long computation time?
    Thanks,
    Dario
@lcoombe
Copy link
Member

lcoombe commented Feb 27, 2018

Hi Dario,

I agree that it is probably not tigmint that caused the crash, since it was just bwa mem running at that stage. Using additional threads would speed up the bwa mem alignment step.
Splitting the alignment stage into multiple parallel jobs is certainly an option, and one that I have done before when dealing with a very large dataset.

You could:

  • split your chromium read file into partitions
  • run the above bwa mem command for each partition
  • run samtools merge -tBX to merge together the separate BAM files
  • Then, continue running the tigmint pipeline from after the bwa mem alignment step. You just have to ensure that the BX-sorted, merged BAM file is named correctly for tigmint to recognize that this file exists and can be used for subsequent steps: Ex. myassembly.myreads.sortbx.bam, where myassembly.fa is the draft assembly file, and myreads.fq.gz is the chromium reads file. You can use the -n option with the tigmint makefile to do a 'dry run' of the commands to double check that the pipeline starts up again after the alignment step as desired.

Hope that helps!
Lauren

@dcopetti
Copy link
Author

Hi Lauren,

Thanks for the suggestion, I will go for it.
I am new to using 10X data, and I am not sure about the structure of the fq.gz file in the input (I produced it with longranger basic).
If it is a normal fastq, I am thinking just to split it in blocks, or if the barcoding information is creating issues, is there a specific way to do it, like with Longranger?
Thanks,

Dario

@lcoombe
Copy link
Member

lcoombe commented Feb 28, 2018

Hi Dario,

Yes, you can partition the fq.gz (post-Longranger basic) file the same as you would any other interleaved fastq file. The only special thing about the longranger basic output as opposed to another fastq file is the chromium barcode information encoded in the read header.

Ex:

[lcoombe@lcoombe01 outs]$ gunzip -c barcoded.fastq.gz |head -n 10000 |tail -n 8
@E00247:267:HMVT3CCXX:6:1103:27265:17922 BX:Z:AAACACCAGACAATAC-1
TTATGTGGCAAAACCCAGAAAGATCCATCATGAATCCAAGATACTTTCAGCAAAAAGTTATACCAAAATAAATAAAATAAAATTGAAATAATGCTTAGCTGATCCCAAGTCAAGATTTACGTTTGTAT
+
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFKKKKKKKKKKKKKKKKKKK
@E00247:267:HMVT3CCXX:6:1103:27265:17922 BX:Z:AAACACCAGACAATAC-1
GCCACGGTGCCCAGCTCACCTACTGGTTTTAAAGAGGGAACTCTGGGGGCAGTGTGGAGGGAGGGACCTACTGCCATAGAATACAAACGTAAATCTTGACTTGGGATCAGCTAAGCATTATTTCAATTTTATTTTATTTATTTTGGTATAA
+
AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAKKFAFK

So as long as your partitioning approach keeps the R1/R2 pairs together, and keeps the "BX:Z" comment in the read header, you will be fine! Running bwa mem with the -pC option as you have in your above command indicates an interleaved reads file, and will put the barcode information in the BX tag of the resulting SAM file, which tigmint will then use in downstream steps.

Hope that helps -- let me know if you have any other questions!
Lauren

@lcoombe lcoombe self-assigned this Feb 28, 2018
@lcoombe
Copy link
Member

lcoombe commented Mar 6, 2018

@dcopetti - I'll close this for now, but feel free to re-open if you have further questions!

@dcopetti
Copy link
Author

dcopetti commented Mar 28, 2018

Hello,
I have the large bam file (with the sorting/tag/index issues of case #811 above) and I tried to run tigmint (span branch).
After commenting the commands in lines 142-154 in a copy of the script, I get this:

~/bin/tigmint/tigmint-make3 tigmint draft=genome reads=reads
make: Nothing to be done for 'tigmint'.

~/bin/tigmint/tigmint-make3 --dry-run
echo 'Tigmint: Correct misassemblies using linked reads'
echo 'Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...'
echo 'Example: tigmint-make tigmint draft=myassembly reads=myreads'
echo 'For more information see https://bcgsc.github.io/tigmint/'

I am not sure how else I should compose the command to use the bam as an input instead.
Thanks,
Dario

@lcoombe
Copy link
Member

lcoombe commented Mar 28, 2018

Hi Dario,

First off, you can just use the master branch -- the logic in the tigmint-span branch used to look for spanning molecules to identify misassemblies has been incorporated into the master branch.

Also, no need to comment out commands to ensure that the Makefile starts at the step that you want. You just need to ensure that your BAM file is named myassembly.myreads.sortbx.bam (using notation from the help page above), as expected by tigmint. You can use the --dry-run option with the full tigmint command to ensure that the pipeline is starting where you want it to.

It appears that tigmint detects that the target files of the pipeline are already present in your directory. Could you note the other files in your working directory, as well as what you named the BAM file?

@dcopetti
Copy link
Author

Hi,

I installed the newest version, but it is giving me an error, same as the older installation (both attempts are here below:

[copettid@kp141-242 tigmint]$ tigmint-make
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/
[copettid@kp141-242 tigmint]$ tigmint-make tigmint
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'.  Stop.
[copettid@kp141-242 tigmint]$ ~/bin/old_tigmint/tigmint-make
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/
[copettid@kp141-242 tigmint]$ ~/bin/old_tigmint/tigmint-make tigmint
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'.  Stop.
[copettid@kp141-242 tigmint]$ tigmint-make tigmint raft=Rabiosa_genome reads=reads
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'.  Stop.

Also adding --dry-run gives me the same error.
The folder looks like this:

-rwx------ 1 copettid mpb 4.3G Feb 20 08:44 Rabiosa_genome.fa
-rw-r--r-- 1 copettid mpb 4.3G Feb 21 10:00 Rabiosa_genome.fa.bwt
-rw-r--r-- 1 copettid mpb 1.1G Feb 21 10:01 Rabiosa_genome.fa.pac
-rw-r--r-- 1 copettid mpb 8.9M Feb 21 10:01 Rabiosa_genome.fa.ann
-rw-r--r-- 1 copettid mpb 2.6M Feb 21 10:01 Rabiosa_genome.fa.amb
-rw-r--r-- 1 copettid mpb 2.2G Feb 21 10:22 Rabiosa_genome.fa.sa
lrwxrwxrwx 1 copettid mpb  103 Mar  2 08:53 input_fastqs -> '/home/copettid/public/Molecular Plant Breeding Group/Dario/Lolium/rabiosa_raw_data/10x_raw_data/fastqs/'
-rw-r--r-- 1 copettid mpb  344 Mar  5 10:08 transfer
lrwxrwxrwx 1 copettid mpb  144 Mar  5 10:11 reads.fq.gz -> '/home/copettid/public/Molecular Plant Breeding Group/Dario/Lolium/rabiosa_raw_data/10x_raw_data/longranger_basic_output/4606/outs/barcoded.fq.gz'
-rwx------ 1 copettid mpb 180G Mar 17 10:09 Rabiosa_genome.reads.sortbx.bam
-rw-r--r-- 1 copettid mpb 8.4M Mar 28 09:39 Rabiosa_genome.fa.fai

I am not sure what else I should do.
Thanks,

Dario

@lcoombe
Copy link
Member

lcoombe commented Mar 28, 2018

Looks like you misspelled draft here?
[copettid@kp141-242 tigmint]$ tigmint-make tigmint raft=Rabiosa_genome reads=reads

@dcopetti
Copy link
Author

Dang!
That is stupid.

With the --dry-run looks like this:

tigmint-make tigmint draft=Rabiosa_genome reads=reads --dry-run
/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
awk 'NR>1 { print $1"\t"$2-1"\t"$3-1"\tReads="$7",Size="$4",Mapq="$8",AS="$9",NM="$10",BX="$5",MI="$6"\t"$7 }' Rabiosa_genome.reads.as0.65.nm5.molecule.tsv \
        | sort -k1,1 -k2,2n -k3,3n >Rabiosa_genome.reads.as0.65.nm5.molecule.bed
awk '$3 - $2 >= 2000' Rabiosa_genome.reads.as0.65.nm5.molecule.bed >Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.bed
/home/copettid/bin/tigmint/bin/tigmint-cut -p8 -w1000 -n20 -t0 -o Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa Rabiosa_genome.fa Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.bed
ln -sf Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa Rabiosa_genome.tigmint.fa

Then I run it and I get this error:

tigmint-make tigmint draft=Rabiosa_genome reads=reads
/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
Traceback (most recent call last):
  File "/home/copettid/bin/tigmint/bin/tigmint-molecule", line 16, in <module>
    import pysam
ImportError: No module named 'pysam'
/home/copettid/bin/tigmint/bin/tigmint-make:174: recipe for target 'Rabiosa_genome.reads.as0.65.nm5.molecule.tsv' failed
make: *** [Rabiosa_genome.reads.as0.65.nm5.molecule.tsv] Error 1

so I updated pysam:

conda install pysam
Fetching package metadata .................
Solving package specifications: .
Package plan for installation in environment /home/copettid/miniconda2:
The following NEW packages will be INSTALLED:
    bcftools: 1.7-0        bioconda
The following packages will be UPDATED:
    pysam:    0.8.4-py27_0 bioconda --> 0.14.1-py27_htslib1.7_0 bioconda
[...]

move to a new terminal and I get this:

tigmint-make tigmint draft=Rabiosa_genome reads=reads
make: *** No rule to make target 'reads.fq.gz', needed by 'Rabiosa_genome.reads.sortbx.bam'.  Stop.

as if it is not recognizing the bam file?

@lcoombe
Copy link
Member

lcoombe commented Mar 28, 2018

Just checking -- are you in the same working directory as before? It looks like it can't see the file named reads.fq.gz?

@dcopetti
Copy link
Author

dcopetti commented Mar 28, 2018

You are right, the links were not activated.
But the error with pysam persists:

/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
Traceback (most recent call last):
  File "/home/copettid/bin/tigmint/bin/tigmint-molecule", line 16, in <module>
    import pysam
ImportError: No module named 'pysam'
/home/copettid/bin/tigmint/bin/tigmint-make:174: recipe for target 'Rabiosa_genome.reads.as0.65.nm5.molecule.tsv' failed

should I maybe redownlaod tigmint after having updated pysam?

@dcopetti
Copy link
Author

I just downloaded again tigmint, same error

@lcoombe
Copy link
Member

lcoombe commented Mar 28, 2018

Can you double check that the python3 that you want to use is being found (ie. in your PATH)?
Ex: when you use which python3, you see the path to your desired miniconda installation?

@dcopetti
Copy link
Author

Nope, it is the one in /usr/bin/python3.
I will install it tomorrow with my colleague,
thanks for the help!

@lcoombe
Copy link
Member

lcoombe commented Mar 28, 2018

No problem! Let me know if you encounter any other issues.

@dcopetti
Copy link
Author

Hi,
After installing python3.5 and updating the dependencies, I am now able to run tigmint as
tigmint tigmint draft=Rabiosa_genome reads=reads span=4 nm=4 dist=20000 t=8
Regarding the parameters, are the default already stringent? My goal is to break scaffolds with low (or questionable) 10x support.
In parallel, I am now running longranger wgs to visualize the linked reads around my putative junctions. Will ~12x coverage be enough? Or else I can use all the reads (~25x), but it will take longer.
Thanks for the support, very nice of you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants