No subdirectories generated after assemble step #21

Closed
Linda-Lan opened this issue Aug 13, 2018 · 10 comments

@Linda-Lan

Linda-Lan commented Aug 13, 2018

Hi BraCeR team,

Thank you for the previous solutions, which worked!

I am using FASTQ files containing the #1 and #2 mates from paired-end sequencing generated with the 10x Genomics system. According to 10x's Cell Ranger software (cellranger vdj), this sample should contain 500+ cells and 400+ paired VDJ sequences.

I got the following output, which looks different from the test data containing 3 cells. I am unsure whether I should set cell_name for the 500+ cells, which I won't be able to identify until BCR reconstruction. Do you have a solution for this? Please let me know if you need more information!
Mac-Pro:outs patrickwilson$ cd ..
Mac-Pro:319vdj patrickwilson$ ls
319-VDJ_S11_L006_I1_001.fastq 319-VDJ_S11_L006_R1_001.fastq.gz outs
319-VDJ_S11_L006_I1_001.fastq.gz 319-VDJ_S11_L006_R2_001.fastq
319-VDJ_S11_L006_R1_001.fastq 319-VDJ_S11_L006_R2_001.fastq.gz
Mac-Pro:319vdj patrickwilson$ cd outs
Mac-Pro:outs patrickwilson$ pwd
/Users/patrickwilson/Desktop/linda/319vdj/outs
Mac-Pro:outs patrickwilson$ ls
319vdj
Mac-Pro:outs patrickwilson$ cd 319vdj/
Mac-Pro:319vdj patrickwilson$ ls
BLAST_output aligned_reads trimmed_reads
IgBLAST_output expression_quantification unfiltered_BCR_seqs
Trinity_output filtered_BCR_seqs
Mac-Pro:319vdj patrickwilson$ 

For your reference, here is the last part of the output from the assemble step on my end.

##Running Kallisto##
##Making Kallisto indices##

[build] loading fasta file /scratch/outs/319vdj/expression_quantification/kallisto_index/319vdj_transcriptome.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 1549 target sequences
[build] warning: replaced 4 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 1285321 contigs and contains 126172909 k-mers

##Quantifying with Kallisto##

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 200,371
[index] number of k-mers: 126,172,909
[index] number of equivalence classes: 825,477
[quant] running in paired-end mode
[quant] will process pair 1: /scratch/outs/319vdj/trimmed_reads/319-VDJ_S11_L006_R1_001_val_1.fq
/scratch/outs/319vdj/trimmed_reads/319-VDJ_S11_L006_R2_001_val_2.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 6,016,781 reads, 2,008,887 reads pseudoaligned
[quant] estimated average fragment length: 182.116
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 648 rounds

Thank you,
Linda

@mstubb
Member

mstubb commented Aug 13, 2018 via email

@JohnMCMa

Hi Mike,
I'm joining since we also plan to use outputs from 10X Chromium. When you say "If you wish to use BraCeR’s clonality inference and lineage tree construction methods you should then be able to use the Cell Ranger VDJ sequences as input", do you mean one of the fastq files given out by cellranger? If so, which one?
Cheers,
John

@mstubb
Member

mstubb commented Aug 15, 2018

Hi @JohnMCMa,

I’m on vacation. Perhaps @idalind can help with input files. If not, I’ll get back to you on my return.

Mike

@Linda-Lan
Author

Hi Mike and @idalind,

Thank you for the reply. I do intend to use BraCeR’s clonality inference and lineage tree construction methods because the visualization figure is clear and informative!

Here is my output from cellranger vdj. Which of these would be the input files?

Best,
Linda

[lindalan@midway-login2 vdj_out]$ ls
301VDJ_s 317VDJ_s 324VDJ_s 331VDJ_s 337VDJ_s 347VDJ_s HDVDJ_s
308VDJ_s 319VDJ 326VDJ_s 333VDJ_s 342VDJ_s 349VDJ_s
310VDJ_s 319-VDJ_5_prime_s 327VDJ_s 334VDJ_s 343VDJ_s 350VDJ_s
311VDJ_s 322VDJ_s 328VDJ_s 336VDJ_s 346VDJ_s 351VDJ_s
[lindalan@midway-login2 vdj_out]$ cd 319VDJ
[lindalan@midway-login2 319VDJ]$ ls
319VDJ.mri.tgz _invocation outs _tags _versions
_cmdline _jobmode _perf _timestamp
_filelist _log SC_VDJ_ASSEMBLER_CS _uuid
_finalstate _mrosource _sitecheck _vdrkill
[lindalan@midway-login2 319VDJ]$ cd outs
[lindalan@midway-login2 outs]$ ls
all_contig_annotations.bed consensus_annotations.csv
all_contig_annotations.csv consensus_annotations.json
all_contig_annotations.json consensus.bam
all_contig.bam consensus.bam.bai
all_contig.bam.bai consensus.fasta
all_contig.fasta consensus.fasta.fai
all_contig.fasta.fai consensus.fastq
all_contig.fastq filtered_contig_annotations.csv
clonotypes.csv filtered_contig.fasta
concat_ref.bam filtered_contig.fastq
concat_ref.bam.bai metrics_summary.csv
concat_ref.fasta vloupe.vloupe
concat_ref.fasta.fai web_summary.html

@JohnMCMa

Hi @mstubb , any news on this?

@idalind
Collaborator

idalind commented Aug 30, 2018

Hi @Linda-Lan and @JohnMCMa . Apologies for the late reply. I will have a look at this in the next few days and will get back to you soon.

@idalind
Collaborator

idalind commented Aug 31, 2018

Hi again. I would try parsing the 10x output file "filtered_contig.fasta" into separate fasta files for each cell (according to the cell barcodes given by 10x), for example in python like this:

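# Split the 10x contig FASTA into one FASTA file per cell barcode.
with open("filtered_contig.fasta", "r") as infile:
    cell = None
    for line in infile:
        if line.startswith(">"):
            # Headers look like ">BARCODE-1_contig_N"; keep only BARCODE.
            cell = line[1:].split("-1_contig")[0]
        if cell is None:
            continue  # skip anything before the first header
        with open("{}.fasta".format(cell), "a") as outfile:
            outfile.write(line)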

This will give one fasta file per cell (with the barcode as the cell name, but you could change this if you like). Each of these files then needs to be provided to bracer assemble. The assemble step is very quick, but it has to be run separately for each fasta file. Example for one fasta file:

bracer assemble CACATAGAGAAGGACA output_dir --assembled_file CACATAGAGAAGGACA.fasta

I hope these instructions help, and please let me know if you have any further issues.
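
Since the assemble step has to be repeated for every per-cell file, it may help to generate the commands in a loop. The following is only a sketch: it assumes the per-cell files are named `BARCODE.fasta` as produced by the script above, it uses the command form shown in the example (`bracer assemble <cell_name> <output_dir> --assembled_file <file>`), and the helper names `build_assemble_commands` and `run_all` are made up for illustration and are not part of BraCeR.

```python
import glob
import os
import subprocess


def build_assemble_commands(fasta_dir, output_dir):
    """Return one `bracer assemble` argument list per per-cell FASTA file."""
    commands = []
    for path in sorted(glob.glob(os.path.join(fasta_dir, "*.fasta"))):
        cell = os.path.splitext(os.path.basename(path))[0]
        if cell == "filtered_contig":
            continue  # the unsplit 10x file is not a single cell
        commands.append(
            ["bracer", "assemble", cell, output_dir, "--assembled_file", path]
        )
    return commands


def run_all(fasta_dir, output_dir, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in build_assemble_commands(fasta_dir, output_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing cell
            subprocess.run(cmd, check=True)
```

Calling `run_all("test", "output_dir")` only prints the commands, so they can be checked first; `dry_run=False` executes them. If BraCeR is run through Docker, each command would need to be wrapped in the corresponding `docker run` call, with paths given as they appear inside the container.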

@Linda-Lan
Author

Hi idalind,

Thank you for the Python script. It successfully generated a fasta file for each cell from "filtered_contig.fasta". Should I use the assemble function, or can I use summarise directly? The sequences in the fasta files look like they were already assembled by Cell Ranger and contain the VDJ genes.

Lindas-MacBook-Pro:~ lindalan$ cd 327vdj/
Lindas-MacBook-Pro:327vdj lindalan$ ls
test
Lindas-MacBook-Pro:327vdj lindalan$ cd test/
Lindas-MacBook-Pro:test lindalan$ ls
AAACCTGAGAATCTCC.fasta AGTGTCACAGTAAGCG.fasta CCGGGATTCATATCGG.fasta CTTTGCGTCACTATTC.fasta GTAGTCATCGAGGTAG.fasta TCTTTCCCAAGCGAGT.fasta
AAGGCAGTCAGAGCTT.fasta AGTGTCAGTTAAGATG.fasta CCTAGCTAGCTACCGC.fasta GAAACTCGTCTAGAGG.fasta GTCGTAAGTACAGTTC.fasta TGACGGCCACGGCTAC.fasta
ACACCGGCAAACCTAC.fasta ATCATCTAGCCAACAG.fasta CGACTTCAGAACAATC.fasta GAAATGATCCCTCAGT.fasta GTCTCGTGTACCGCTG.fasta TGACGGCTCTGCTGTC.fasta
ACACCGGCACGCTTTC.fasta ATCATGGCATGCTGGC.fasta CGTAGGCCACAGAGGT.fasta GAACGGATCGAATGGG.fasta GTGAAGGTCGGTGTTA.fasta TGACTAGCACGCTTTC.fasta
ACAGCTAAGCTCCTTC.fasta ATCCGAACACTGCCAG.fasta CGTCCATGTAGAGTGC.fasta GAATGAATCCGAGCCA.fasta GTGCTTCAGCAGCCTC.fasta TGAGCATTCAGTTTGG.fasta
ACATACGCACATGACT.fasta ATCTGCCGTTAAAGTG.fasta CGTGAGCAGGCGACAT.fasta GACGTTATCGAATCCA.fasta GTGGGTCAGGAATTAC.fasta TGCACCTTCAGCATGT.fasta
ACCCACTCATCCGTGG.fasta ATTGGACTCAACGAAA.fasta CGTTAGAGTTGCGCAC.fasta GATGAAAGTCGTTGTA.fasta GTGTTAGTCATGTCCC.fasta TGCCCATTCAGAGCTT.fasta
ACGATACCAGCTCGAC.fasta CAAGAAAGTTCCTCCA.fasta CTAACTTCAACTGCTA.fasta GCAATCAAGCTCCTCT.fasta GTTAAGCCAGGGCATA.fasta TGCGCAGCACCAGTTA.fasta
ACGATGTAGTCTCGGC.fasta CAAGTTGCAGGTGGAT.fasta CTACATTAGCTATGCT.fasta GCACTCTAGACTTTCG.fasta TAAACCGCATTCCTGC.fasta TGGACGCGTTAAGACA.fasta
ACGCAGCCAGGGCATA.fasta CACAAACAGCTGCAAG.fasta CTACCCAGTTACCGAT.fasta GCAGCCATCATAAAGG.fasta TACGGTAAGCAACGGT.fasta TGGGAAGAGGATCGCA.fasta
ACGCAGCGTAGCTCCG.fasta CACACAACAGTCGTGC.fasta CTAGAGTGTTTGACAC.fasta GGACATTCACGCATCG.fasta TACTTACCATCCGTGG.fasta TTCTCAACAGATTGCT.fasta
ACGGAGAAGCATCATC.fasta CACATAGTCCACGTGG.fasta CTAGCCTTCATTGCCC.fasta GGATGTTCACCGGAAA.fasta TACTTGTTCTCGGACG.fasta TTCTTAGAGAGCAATT.fasta
ACGGGTCCATGCATGT.fasta CAGATCACAGTAAGCG.fasta CTCCTAGGTCTCATCC.fasta GGCTGGTTCCTTGGTC.fasta TAGAGCTAGTAAGTAC.fasta TTGGAACGTAAGTGGC.fasta
ACTGAACAGAATAGGG.fasta CATCAGATCAGAGCTT.fasta CTCGAAAAGAGTGACC.fasta GGGAATGTCACCCTCA.fasta TATGCCCCAGAGTGTG.fasta TTGGAACTCCGTAGTA.fasta
ACTGCTCTCATGTCTT.fasta CCAATCCAGCGTTGCC.fasta CTCGAAAGTTATGTGC.fasta GGGACCTCAGATCGGA.fasta TCATTACTCACAAACC.fasta TTTACTGGTTAGGGTG.fasta
AGAGTGGTCTTGTCAT.fasta CCAATCCCAAGAGGCT.fasta CTGCCTAGTGCCTGGT.fasta GGGCACTAGGGAACGG.fasta TCATTTGCAGTATAAG.fasta filtered_contig.fasta
AGCGTATGTCCCTTGT.fasta CCACGGACACGAAAGC.fasta CTTAGGATCATAGCAC.fasta GTACTTTTCCAGTAGT.fasta TCGCGAGAGGAGTTTA.fasta test.py
AGTAGTCGTACTTAGC.fasta CCCAATCTCCCGACTT.fasta CTTGGCTTCTATCCCG.fasta GTAGTCACAGAAGCAC.fasta TCGGGACCATTTCAGG.fasta

Lindas-MacBook-Pro:327vdj lindalan$ cat test/CACACAACAGTCGTGC.fasta

>CACACAACAGTCGTGC-1_contig_1
TGGGGGGAGTCAGTCTCAGTCAGGACACAGCATGGACATGAGGGTCCCCGCTCAGCTCCTGGGGCTCCTGCTACTCTGGCTCCGAGGTGCCAGATGTGACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAGGAGACAGAGTCACCATCACTTGCCGGGCAAGTCAGAGCATTAGCAGCTATTTAAATTGGTATCAGCAGAAACCAGGGAAAGCCCCTAAGCTCCTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGACAGATTTCACTCTCACCATCAGCAGTCTGCAACCTGAAGATTTTGCAACTTACTACTGTCAACAGAGTTACAGTACCCCCCCCCCGGAGGGACCAAGGTGGAGATCAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGCCCTCCAATCGGGTAACTCCCAGGAGAGTGTCACAGAGCAGGACAGCAAGGACAGCACCTACAGCCTCAGCAGCACCCTGACGCTGAGCAAAGCAGACTACGAGAA
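
A quick sanity check on the split is to count barcodes and contigs per barcode directly from the FASTA headers and compare the number of barcodes with the number of per-cell files. This is a minimal sketch that assumes headers of the form `>BARCODE-1_contig_N`, as in the file shown above; `contigs_per_barcode` is a hypothetical helper name and the sequences in `example` are toy data.

```python
from collections import Counter


def contigs_per_barcode(fasta_lines):
    """Count contigs per cell barcode from 10x-style FASTA header lines."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            # ">BARCODE-1_contig_N" -> "BARCODE"
            barcode = line[1:].split("-1_contig")[0].strip()
            counts[barcode] += 1
    return counts


# Toy input: two barcodes, one of them with two contigs.
example = [
    ">CACACAACAGTCGTGC-1_contig_1\n",
    "TGGGGGGAGTCAGTC\n",
    ">CACACAACAGTCGTGC-1_contig_2\n",
    "ACGTACGT\n",
    ">AAACCTGAGAATCTCC-1_contig_1\n",
    "GGGTTTAA\n",
]
counts = contigs_per_barcode(example)
print(len(counts), "barcodes")
```

On real data the function can be given the open file handle, for example `with open("filtered_contig.fasta") as handle: counts = contigs_per_barcode(handle)`.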

I then ran this command:
Lindas-MacBook-Pro:test lindalan$ docker run -it -v $PWD:/scratch teichlab/bracer summarise /scratch/test

Traceback (most recent call last):
  File "/usr/local/bin/bracer", line 11, in <module>
    load_entry_point('bracer==0.1', 'console_scripts', 'bracer')()
  File "/usr/local/lib/python3.5/dist-packages/bracer-0.1-py3.5.egg/bracerlib/launcher.py", line 43, in launch
    Task().run()
  File "/usr/local/lib/python3.5/dist-packages/bracer-0.1-py3.5.egg/bracerlib/tasks.py", line 712, in run
    subdirectories = next(os.walk(self.root_dir))[1]
StopIteration

Do you have any thoughts on this issue? Thank you!

@idalind
Collaborator

idalind commented Sep 4, 2018

Hi @Linda-Lan,
You will have to run the assemble step for each cell (as described in my previous post) in order to analyse the sequences and create the internal data structures BraCeR expects as input for the summarise step.
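
As background on the `StopIteration` itself: the failing line in the traceback is `next(os.walk(self.root_dir))[1]`. In plain Python, `os.walk` yields nothing when the directory does not exist or cannot be read, so `next` raises `StopIteration`. One possible contributing factor, which is an assumption and not confirmed in this thread, is that the command was run from inside `test`, so `$PWD` is mounted at `/scratch` and `/scratch/test` may not exist inside the container. The sketch below only demonstrates the Python behaviour; `list_subdirectories` is a hypothetical helper, not BraCeR code.

```python
import os
import tempfile


def list_subdirectories(root_dir):
    """Return the immediate subdirectory names of root_dir, or raise a clear error."""
    if not os.path.isdir(root_dir):
        raise FileNotFoundError(
            "{} is not a directory visible to this process".format(root_dir)
        )
    # os.walk yields (dirpath, dirnames, filenames); the first item describes root_dir
    return sorted(next(os.walk(root_dir))[1])


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "test")  # never created
    try:
        next(os.walk(missing))
    except StopIteration:
        print("os.walk yielded nothing for a missing directory")

    # A per-cell subdirectory, as the summarise step expects to find
    os.mkdir(os.path.join(tmp, "CACACAACAGTCGTGC"))
    print(list_subdirectories(tmp))
```

Even with a correct path, the directory given to summarise still needs the per-cell subdirectories created by the assemble step, as described above.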
