doing bcl->qseq->fastq->analysis->galaxy in one machine #20

tanglingfung · 2011-04-11T17:03:00Z

Hi,

We have a different setup here: the drive with the bcl files is mounted on the analysis machine, and we would do everything there. Do you recommend we keep the messaging system in the pipeline? I am just looking for some advice.

Thanks,
Paul

chapmanb · 2011-04-12T01:37:13Z

Paul;
The messaging system is nice because it gives you the flexibility to separate the servers later if necessary. If you wanted to cut out this step of the process, it would require reimplementing 'finished_message' in 'illumina_finished_msg.py' to call 'analyze_finished_sqn.py' directly. Let me know if you take it in this direction; I'd be happy to merge in some generalized code for this case.

tanglingfung · 2011-04-12T01:44:28Z

Brad:

Thanks. Yes, I think it's nice too. But it would also be nice if it came as an option. We were thinking of amending the code a bit, but we wonder what the best way to generalize it would be.

chapmanb · 2011-04-12T11:43:51Z

Paul;
Sounds good, let me know how the modifications go. My approach would be to have a configuration option that specifies "local" or "messaging" for calling analyze_finished_sqn.py, and then use these to adjust as I described above.
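
A minimal sketch of the configuration switch described above. The `analysis_mode` key and both handler functions are hypothetical names used only for illustration; they are not taken from the bcbb code, and a real handler would start the analysis or publish to the message queue rather than return a string.

```python
def run_local(flowcell_dir):
    """Hypothetical stand-in for running the analysis directly on this machine."""
    return "local:" + flowcell_dir


def send_message(flowcell_dir):
    """Hypothetical stand-in for publishing a 'finished' message to a queue."""
    return "messaging:" + flowcell_dir


# Map each allowed configuration value to its handler.
HANDLERS = {"local": run_local, "messaging": send_message}


def notify_finished(config, flowcell_dir):
    """Dispatch on the assumed 'analysis_mode' option, defaulting to messaging."""
    mode = config.get("analysis_mode", "messaging")
    try:
        handler = HANDLERS[mode]
    except KeyError:
        raise ValueError("analysis_mode must be 'local' or 'messaging', got %r" % mode)
    return handler(flowcell_dir)
```

Defaulting to "messaging" keeps existing setups unchanged, so only single-machine installs would need to set the option.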

tanglingfung · 2011-04-12T14:41:29Z

Thanks. We are taking a similar approach.


tanglingfung · 2011-04-15T20:27:09Z

Hi Brad,

One question: where can I find the details of get_flowcell_info() and get_fastq_dir()? (Sorry, I am fairly new to Python.) We have changed illumina_finished_msg.py to transfer the data from the sequencing machine to the analysis machine in the bcl->qseq step, and extracted the copy_and_analysis in analyze_finished_sqn.py to initiate automated_initial_analysis.py. However, I wasn't sure which directory I should put for the

tanglingfung · 2011-04-16T01:53:42Z

Sorry, I found the files.

But then I got an error in nosetests, and I am not sure where the problem is:
ERROR 2011-04-15 15:02:04 ProcessExecutor Rscript execution error: No such file or directory
[Fri Apr 15 15:02:04 PDT 2011] net.sf.picard.analysis.CollectGcBiasMetrics done.

Do you have any idea?

chapmanb · 2011-04-16T12:18:01Z

Paul;
Great -- glad the fastq files made sense.

For your CollectGcBiasMetrics error, it sounds like Picard is having trouble finding the Rscript executable, which should come as part of R. Is R installed and on your path? Picard uses Rscript to generate plots.
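
One way to check this from Python is sketched below, using the standard library's `shutil.which` (Python 3.3 or later). The function name `find_executable` is only illustrative. The check is most useful when run as the same user and in the same environment that launches the tests, since PATH can differ between shells.

```python
import shutil


def find_executable(name="Rscript"):
    """Return the full path to `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("Rscript not found on PATH; Picard cannot generate its plots")
    else:
        print("Rscript found at", path)
```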

tanglingfung · 2011-04-16T16:55:15Z

Brad,

It would be good to be able to specify the fastq and qseq files in the configuration file. In the BCL conversion step, I am not sure whether it is a problem with my system, but I can't trigger OLB right away: I need to append 'Data/Intensities/Basecalls' to the flow cell dir to get to the config.xml that OLB requires.

Yes, I can call Rscript from anywhere, so it should be on my path too. Could it be looking for the bait and target files? So far, I didn't know what to put there.

Thanks,
Paul
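
The path handling described above can be sketched as follows. This is an illustration only: `find_olb_config` is a hypothetical helper rather than a function from the pipeline, and the subdirectory is copied from the message above, so it may need adjusting for a given run folder.

```python
import os

# Relative location quoted in this thread; adjust if your run folder differs.
BASECALLS_SUBDIR = os.path.join("Data", "Intensities", "Basecalls")


def find_olb_config(flowcell_dir):
    """Return the path to config.xml under the base calls directory, or None."""
    config_path = os.path.join(flowcell_dir, BASECALLS_SUBDIR, "config.xml")
    return config_path if os.path.isfile(config_path) else None
```

Returning None when the file is missing lets the caller report a clear error before OLB is launched.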


chapmanb · 2011-04-17T16:13:16Z

Paul;
It would be great to generalize this and I'm happy to look at that. Do you have your code available somewhere as a fork so I can get a sense of how you approached it? This would also help me to look at your OLB/BCL problem; it's not clear to me from your message how you are calling it.

For your CollectGcBias error, can you run this by hand from the command line? That's the best way to debug it.

The bait and target files are only necessary if you are doing targeted re-sequencing. That wouldn't have any effect on CollectGcBias.

Thanks,
Brad

tanglingfung · 2011-04-17T16:56:13Z

Brad,

That's a good idea. Let me try to put it up soon.

As for the OLB problem, it turned out to be a problem on our side. We overlooked part of your code and left out get_qseq_dir(). Your code should be fine. It was a little confusing at first, though, since I thought it was a bc_dir, and there isn't a qseqs_dir at that point.

Our approach now is to keep the qseq and fastq files in a different place by having OLB output them to a different directory, and then transfer the fastq files to the analysis folder before the demultiplexing step, to keep the structure of your code. How does that sound?

In the future, we want to generalize it to handle the following cases:

1. use fastq as an input
2. have an option to process a subset of the fastq files

I saw you had plans for case 1, right?


chapmanb · 2011-04-19T14:03:10Z

Paul;
Great, that'll really help to be able to look at the code.

For the fastq files, I would suggest transferring them somewhere outside of the Illumina dump tree. You'll probably have different backup strategies for these compared to the rest of the Illumina output, so this helps facilitate that.

For your last question, I'm not sure what step you are referring to. The analysis works with fastq files now and if you only want to process certain lanes, you can pass in a custom run_info.yaml file specifying what to process to automated_initial_analysis.py:

https://github.com/chapmanb/bcbb/blob/master/nextgen/config/run_info.yaml
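
As a rough sketch of building such a subset, the snippet below filters a list of per-lane entries down to the lanes of interest. It assumes each entry carries a `lane` key; the other field names and values are made up for illustration, so check the linked example for the actual layout. The filtered list would then be written out with a YAML library and passed to the analysis script.

```python
def select_lanes(run_info, lanes):
    """Keep only the entries whose 'lane' value is in `lanes`."""
    # Compare as strings so that lane 1 and lane "1" are treated alike.
    wanted = {str(lane) for lane in lanes}
    return [item for item in run_info if str(item.get("lane")) in wanted]


# Illustrative entries only; the field names are assumed, not copied from bcbb.
run_info = [
    {"lane": 1, "description": "sample A", "analysis": "Standard"},
    {"lane": 2, "description": "sample B", "analysis": "Standard"},
    {"lane": 3, "description": "sample C", "analysis": "Standard"},
]

subset = select_lanes(run_info, [1, 3])
```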

Hope this helps.

tanglingfung · 2011-04-20T07:21:38Z

Hi Brad,

I'm not used to GitHub yet, so I posted our code as a gist. Basically, we took out the messaging part of illumina_finished_msg.py, transfer the qseq files to the analysis machine, generate fastq files directly in the analysis dir, and start the analysis there. The code is at the following URL:
https://gist.github.com/1769766276fc6b18494c
In order to specify the output directory for the fastq files, we modified solexa_qseq_to_fastq.py a bit:
https://gist.github.com/9b8a821cabc30711b454

I hope it can be generalized and made compatible with your scripts.

We seem to have the system set up on our server. The analysis pipeline is still running.

Thanks,
Paul

chapmanb · 2011-04-22T13:50:10Z

Paul;
Thanks for posting these. I checked in updates to solexa_qseq_to_fastq and illumina_finished_msg to allow you to specify a local directory for writing the fastq and analysis files. If you pass in a post-processing configuration file and specify postprocess_dir in your transfer_info.yaml, this will automatically process on disk instead of using RabbitMQ.

The only thing I did not add is changing where qseqs are dumped. I'd rather not mess with OLB directories and processes, to keep this forward compatible as Illumina practices and software change.

Hopefully this works for what you wanted to do. Let me know if you run into any problems.


chapmanb closed this as completed Apr 12, 2011

b97pla referenced this issue in b97pla/bcbb Nov 30, 2011

Merge pull request SciLifeLab#20 from b97pla/master

19bea0e

Demux code and paths
