No subdirectories generated after assemble step #21

Linda-Lan · 2018-08-13T16:27:37Z

Hi BraCeR team,

Thank you for the previous solutions, which worked!

I am using FASTQ files containing the #1 and #2 mates from paired-end sequencing generated on the 10x Genomics system. According to 10x's software (cellranger vdj), this sample should contain 500+ cells and 400+ paired VDJ sequences.

I got the following output, which is unlike the test data that contains 3 cells. I am unsure whether I should set cell_name for each of the 500+ cells, which I won't be able to know until BCR construction. Do you have any solution for this? Please let me know if you need more information!
Mac-Pro:outs patrickwilson$ cd ..
Mac-Pro:319vdj patrickwilson$ ls
319-VDJ_S11_L006_I1_001.fastq 319-VDJ_S11_L006_R1_001.fastq.gz outs
319-VDJ_S11_L006_I1_001.fastq.gz 319-VDJ_S11_L006_R2_001.fastq
319-VDJ_S11_L006_R1_001.fastq 319-VDJ_S11_L006_R2_001.fastq.gz
Mac-Pro:319vdj patrickwilson$ cd outs
Mac-Pro:outs patrickwilson$ pwd
/Users/patrickwilson/Desktop/linda/319vdj/outs
Mac-Pro:outs patrickwilson$ ls
319vdj
Mac-Pro:outs patrickwilson$ cd 319vdj/
Mac-Pro:319vdj patrickwilson$ ls
BLAST_output aligned_reads trimmed_reads
IgBLAST_output expression_quantification unfiltered_BCR_seqs
Trinity_output filtered_BCR_seqs
Mac-Pro:319vdj patrickwilson$

For your reference, here is the last part of the output I see from the assemble step.

##Running Kallisto##
##Making Kallisto indices##

[build] loading fasta file /scratch/outs/319vdj/expression_quantification/kallisto_index/319vdj_transcriptome.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 1549 target sequences
[build] warning: replaced 4 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 1285321 contigs and contains 126172909 k-mers

##Quantifying with Kallisto##

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 200,371
[index] number of k-mers: 126,172,909
[index] number of equivalence classes: 825,477
[quant] running in paired-end mode
[quant] will process pair 1: /scratch/outs/319vdj/trimmed_reads/319-VDJ_S11_L006_R1_001_val_1.fq
/scratch/outs/319vdj/trimmed_reads/319-VDJ_S11_L006_R2_001_val_2.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 6,016,781 reads, 2,008,887 reads pseudoaligned
[quant] estimated average fragment length: 182.116
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 648 rounds

Thank you,
Linda

mstubb · 2018-08-13T21:22:37Z

Hi Linda,

BraCeR is not intended for use with data generated using the 10x Genomics Chromium. BraCeR expects sequencing reads to be demultiplexed into separate fastq files for each cell so that it can run on one cell at a time. If you’re not doing that it’s unlikely to work. Even if you are doing that, I don’t know what performance to expect if you use 10x data and, if you’re using the 10x VDJ assay, I recommend you use Cell Ranger to assemble your BCR sequences. If you wish to use BraCeR’s clonality inference and lineage tree construction methods you should then be able to use the Cell Ranger VDJ sequences as input.

Best,
Mike


JohnMCMa · 2018-08-15T13:55:59Z

Hi Mike,
I'm joining since we also plan to use outputs from 10X Chromium. When you say "If you wish to use BraCeR’s clonality inference and lineage tree construction methods you should then be able to use the Cell Ranger VDJ sequences as input", do you mean one of the fastq files given out by cellranger? If so, which one?
Cheers,
John

mstubb · 2018-08-15T14:48:34Z

Hi @JohnMCMa,

I’m on vacation. Perhaps @idalind can help with input files. If not, I’ll get back to you on my return.

Mike

Linda-Lan · 2018-08-15T16:52:47Z

Hi Mike and @idalind,

Thank you for the reply. I do intend to use BraCeR’s clonality inference and lineage tree construction methods because the visualization figure is clear and informative!

Here is my output from cellranger vdj. Which of these files should be used as input?

Best,
Linda

[lindalan@midway-login2 vdj_out]$ ls
301VDJ_s 317VDJ_s 324VDJ_s 331VDJ_s 337VDJ_s 347VDJ_s HDVDJ_s
308VDJ_s 319VDJ 326VDJ_s 333VDJ_s 342VDJ_s 349VDJ_s
310VDJ_s 319-VDJ_5_prime_s 327VDJ_s 334VDJ_s 343VDJ_s 350VDJ_s
311VDJ_s 322VDJ_s 328VDJ_s 336VDJ_s 346VDJ_s 351VDJ_s
[lindalan@midway-login2 vdj_out]$ cd 319VDJ
[lindalan@midway-login2 319VDJ]$ ls
319VDJ.mri.tgz _invocation outs _tags _versions
_cmdline _jobmode _perf _timestamp
_filelist _log SC_VDJ_ASSEMBLER_CS _uuid
_finalstate _mrosource _sitecheck _vdrkill
[lindalan@midway-login2 319VDJ]$ cd outs
[lindalan@midway-login2 outs]$ ls
all_contig_annotations.bed consensus_annotations.csv
all_contig_annotations.csv consensus_annotations.json
all_contig_annotations.json consensus.bam
all_contig.bam consensus.bam.bai
all_contig.bam.bai consensus.fasta
all_contig.fasta consensus.fasta.fai
all_contig.fasta.fai consensus.fastq
all_contig.fastq filtered_contig_annotations.csv
clonotypes.csv filtered_contig.fasta
concat_ref.bam filtered_contig.fastq
concat_ref.bam.bai metrics_summary.csv
concat_ref.fasta vloupe.vloupe
concat_ref.fasta.fai web_summary.html

JohnMCMa · 2018-08-29T15:12:34Z

Hi @mstubb , any news on this?

idalind · 2018-08-30T11:01:16Z

Hi @Linda-Lan and @JohnMCMa . Apologies for the late reply. I will have a look at this in the next few days and will get back to you soon.

Linda-Lan · 2018-08-30T14:24:59Z

Thank you!


idalind · 2018-08-31T16:04:41Z

Hi again. I would try parsing the 10x output file "filtered_contig.fasta" into separate fasta files for each cell (according to the cell barcodes given by 10x), for example in python like this:

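# Split the Cell Ranger contigs into one FASTA file per cell barcode.
cell = None
with open("filtered_contig.fasta", "r") as contigs:
    for line in contigs:
        if line.startswith(">"):
            # Header looks like ">CACACAACAGTCGTGC-1_contig_1"; keep the barcode only.
            cell = line.split("-1_contig")[0][1:]
        # Append every line (header and sequence) to the current cell's file.
        with open("{}.fasta".format(cell), "a") as output:
            output.write(line)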

This will give a fasta file for each cell (with the barcode as cell name, but you could change this if you like). Each of these files needs to be provided to bracer assemble. Each run is very quick, but it has to be repeated for every fasta file. Example for one fasta file:

bracer assemble CACATAGAGAAGGACA output_dir --assembled_file CACATAGAGAAGGACA.fasta

I hope these instructions help, and please let me know if you have any further issues.
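
A minimal sketch of how the per-cell runs could be scripted, assuming the per-cell fasta files sit in one folder and that bracer is available on the PATH (with Docker, the paths would need to be the ones visible inside the container). `build_assemble_commands` and `run_all` are hypothetical helper names, not part of BraCeR; the command layout only mirrors the single-cell example above.

```python
import glob
import os
import subprocess


def build_assemble_commands(fasta_dir, output_dir):
    """Build one `bracer assemble` command per per-cell FASTA file in fasta_dir."""
    commands = []
    for path in sorted(glob.glob(os.path.join(fasta_dir, "*.fasta"))):
        # The file name without extension is used as the cell name.
        cell = os.path.splitext(os.path.basename(path))[0]
        if cell == "filtered_contig":
            continue  # skip the unsplit Cell Ranger file if it is in the same folder
        commands.append(
            ["bracer", "assemble", cell, output_dir, "--assembled_file", path]
        )
    return commands


def run_all(fasta_dir, output_dir, dry_run=True):
    """Print the commands (dry_run=True) or execute them one cell at a time."""
    for cmd in build_assemble_commands(fasta_dir, output_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing cell
```

Calling `run_all("test", "output_dir")` only prints the commands, which makes it easy to check them before running with `dry_run=False`.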

Linda-Lan · 2018-09-03T23:13:19Z

Hi idalind,

Thank you for the python script. It successfully generated a fasta file for each cell from "filtered_contig.fasta". Should I use the assemble function, or can I use summarise directly? I ask because the sequences in the fasta files look like they have already been assembled by Cell Ranger and contain the VDJ genes.

Lindas-MacBook-Pro:~ lindalan$ cd 327vdj/
Lindas-MacBook-Pro:327vdj lindalan$ ls
test
Lindas-MacBook-Pro:327vdj lindalan$ cd test/
Lindas-MacBook-Pro:test lindalan$ ls
AAACCTGAGAATCTCC.fasta AGTGTCACAGTAAGCG.fasta CCGGGATTCATATCGG.fasta CTTTGCGTCACTATTC.fasta GTAGTCATCGAGGTAG.fasta TCTTTCCCAAGCGAGT.fasta
AAGGCAGTCAGAGCTT.fasta AGTGTCAGTTAAGATG.fasta CCTAGCTAGCTACCGC.fasta GAAACTCGTCTAGAGG.fasta GTCGTAAGTACAGTTC.fasta TGACGGCCACGGCTAC.fasta
ACACCGGCAAACCTAC.fasta ATCATCTAGCCAACAG.fasta CGACTTCAGAACAATC.fasta GAAATGATCCCTCAGT.fasta GTCTCGTGTACCGCTG.fasta TGACGGCTCTGCTGTC.fasta
ACACCGGCACGCTTTC.fasta ATCATGGCATGCTGGC.fasta CGTAGGCCACAGAGGT.fasta GAACGGATCGAATGGG.fasta GTGAAGGTCGGTGTTA.fasta TGACTAGCACGCTTTC.fasta
ACAGCTAAGCTCCTTC.fasta ATCCGAACACTGCCAG.fasta CGTCCATGTAGAGTGC.fasta GAATGAATCCGAGCCA.fasta GTGCTTCAGCAGCCTC.fasta TGAGCATTCAGTTTGG.fasta
ACATACGCACATGACT.fasta ATCTGCCGTTAAAGTG.fasta CGTGAGCAGGCGACAT.fasta GACGTTATCGAATCCA.fasta GTGGGTCAGGAATTAC.fasta TGCACCTTCAGCATGT.fasta
ACCCACTCATCCGTGG.fasta ATTGGACTCAACGAAA.fasta CGTTAGAGTTGCGCAC.fasta GATGAAAGTCGTTGTA.fasta GTGTTAGTCATGTCCC.fasta TGCCCATTCAGAGCTT.fasta
ACGATACCAGCTCGAC.fasta CAAGAAAGTTCCTCCA.fasta CTAACTTCAACTGCTA.fasta GCAATCAAGCTCCTCT.fasta GTTAAGCCAGGGCATA.fasta TGCGCAGCACCAGTTA.fasta
ACGATGTAGTCTCGGC.fasta CAAGTTGCAGGTGGAT.fasta CTACATTAGCTATGCT.fasta GCACTCTAGACTTTCG.fasta TAAACCGCATTCCTGC.fasta TGGACGCGTTAAGACA.fasta
ACGCAGCCAGGGCATA.fasta CACAAACAGCTGCAAG.fasta CTACCCAGTTACCGAT.fasta GCAGCCATCATAAAGG.fasta TACGGTAAGCAACGGT.fasta TGGGAAGAGGATCGCA.fasta
ACGCAGCGTAGCTCCG.fasta CACACAACAGTCGTGC.fasta CTAGAGTGTTTGACAC.fasta GGACATTCACGCATCG.fasta TACTTACCATCCGTGG.fasta TTCTCAACAGATTGCT.fasta
ACGGAGAAGCATCATC.fasta CACATAGTCCACGTGG.fasta CTAGCCTTCATTGCCC.fasta GGATGTTCACCGGAAA.fasta TACTTGTTCTCGGACG.fasta TTCTTAGAGAGCAATT.fasta
ACGGGTCCATGCATGT.fasta CAGATCACAGTAAGCG.fasta CTCCTAGGTCTCATCC.fasta GGCTGGTTCCTTGGTC.fasta TAGAGCTAGTAAGTAC.fasta TTGGAACGTAAGTGGC.fasta
ACTGAACAGAATAGGG.fasta CATCAGATCAGAGCTT.fasta CTCGAAAAGAGTGACC.fasta GGGAATGTCACCCTCA.fasta TATGCCCCAGAGTGTG.fasta TTGGAACTCCGTAGTA.fasta
ACTGCTCTCATGTCTT.fasta CCAATCCAGCGTTGCC.fasta CTCGAAAGTTATGTGC.fasta GGGACCTCAGATCGGA.fasta TCATTACTCACAAACC.fasta TTTACTGGTTAGGGTG.fasta
AGAGTGGTCTTGTCAT.fasta CCAATCCCAAGAGGCT.fasta CTGCCTAGTGCCTGGT.fasta GGGCACTAGGGAACGG.fasta TCATTTGCAGTATAAG.fasta filtered_contig.fasta
AGCGTATGTCCCTTGT.fasta CCACGGACACGAAAGC.fasta CTTAGGATCATAGCAC.fasta GTACTTTTCCAGTAGT.fasta TCGCGAGAGGAGTTTA.fasta test.py
AGTAGTCGTACTTAGC.fasta CCCAATCTCCCGACTT.fasta CTTGGCTTCTATCCCG.fasta GTAGTCACAGAAGCAC.fasta TCGGGACCATTTCAGG.fasta

Lindas-MacBook-Pro:327vdj lindalan$ cat test/CACACAACAGTCGTGC.fasta

>CACACAACAGTCGTGC-1_contig_1
TGGGGGGAGTCAGTCTCAGTCAGGACACAGCATGGACATGAGGGTCCCCGCTCAGCTCCTGGGGCTCCTGCTACTCTGGCTCCGAGGTGCCAGATGTGACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAGGAGACAGAGTCACCATCACTTGCCGGGCAAGTCAGAGCATTAGCAGCTATTTAAATTGGTATCAGCAGAAACCAGGGAAAGCCCCTAAGCTCCTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGACAGATTTCACTCTCACCATCAGCAGTCTGCAACCTGAAGATTTTGCAACTTACTACTGTCAACAGAGTTACAGTACCCCCCCCCCGGAGGGACCAAGGTGGAGATCAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGCCCTCCAATCGGGTAACTCCCAGGAGAGTGTCACAGAGCAGGACAGCAAGGACAGCACCTACAGCCTCAGCAGCACCCTGACGCTGAGCAAAGCAGACTACGAGAA

I then ran this command:
Lindas-MacBook-Pro:test lindalan$ docker run -it -v $PWD:/scratch teichlab/bracer summarise /scratch/test

Traceback (most recent call last):
  File "/usr/local/bin/bracer", line 11, in <module>
    load_entry_point('bracer==0.1', 'console_scripts', 'bracer')()
  File "/usr/local/lib/python3.5/dist-packages/bracer-0.1-py3.5.egg/bracerlib/launcher.py", line 43, in launch
    Task().run()
  File "/usr/local/lib/python3.5/dist-packages/bracer-0.1-py3.5.egg/bracerlib/tasks.py", line 712, in run
    subdirectories = next(os.walk(self.root_dir))[1]
StopIteration

Do you have any thoughts on this issue? Thank you!

idalind · 2018-09-04T11:21:13Z

Hi @Linda-Lan,
You will have to run the assemble step for each cell (as described in my previous post) in order to analyse the sequences and create the internal data structures BraCeR expects as input for the summarise step.
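
As a side note on the traceback itself, the StopIteration seems to come from `next(os.walk(...))` being given a path that cannot be walked. `os.walk` silently yields nothing for a missing path, whereas an existing directory without subdirectories would return an empty list instead. The sketch below only illustrates that Python behaviour; `list_subdirectories` is a hypothetical helper, not BraCeR code, and the directory names are made up.

```python
import os
import tempfile


def list_subdirectories(root_dir):
    """Return the immediate subdirectories of root_dir, or None if it cannot be walked."""
    try:
        # Same expression as the failing line in the traceback.
        return next(os.walk(root_dir))[1]
    except StopIteration:
        # os.walk yields nothing when root_dir is missing or is not a directory.
        return None


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "CELL_A"))  # hypothetical per-cell directory
    existing = list_subdirectories(tmp)
    missing = list_subdirectories(os.path.join(tmp, "test"))  # path that does not exist
```

If that reading is right, it may also be worth checking the mount: with `-v $PWD:/scratch` run from inside the `test` folder, that folder becomes `/scratch` itself, so `/scratch/test` might not exist inside the container.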

This was referenced Sep 6, 2018:
Adding assembled_file option assembly process Teichlab/tracer#77 (Open)
Memory requirements in Summarise #25 (Open)

idalind closed this as completed Dec 31, 2018

arutik mentioned this issue Oct 15, 2019:
TraCeR on 10X cellranegr vdj output filtered_contig.fasta demultiplexed files Teichlab/tracer#95 (Open)
