doing bcl->qseq->fastq->analysis->galaxy in one machine #20

Closed
tanglingfung opened this issue Apr 11, 2011 · 13 comments

Comments

@tanglingfung

Hi,

We have a different setup here: the drive with the bcl files is mounted on the analysis machine, and we would do everything there. Do you recommend we keep the messaging system in the pipeline? Just looking for some advice.

Thanks,
Paul

@chapmanb
Owner

Paul;
The messaging system is nice because it gives you the flexibility to separate the servers later if necessary. If you wanted to cut out this step of the process, it would require reimplementing 'finished_message' in 'illumina_finished_msg.py' to call 'analyze_finished_sqn.py' directly. Let me know if you take it in this direction; I'd be happy to merge in some generalized code for this case.

@tanglingfung
Author

Brad:

Thanks. Yes, I think it's nice too, but it would also be nice if it came as an option. We were thinking of amending the code a bit, and just wonder what the best way to generalize it would be.

@chapmanb
Owner

Paul;
Sounds good, let me know how the modifications go. My approach would be to have a configuration option that specifies "local" or "messaging" for calling analyze_finished_sqn.py, and then use these to adjust as I described above.
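A minimal sketch of that configuration switch, written for Python 3 and not taken from the bcbb code. The option name `analysis_mode` and the two handler callables are hypothetical stand-ins: in practice `run_local` would wrap a direct call to analyze_finished_sqn.py and `send_message` would wrap the existing messaging code.

```python
def make_dispatcher(config, run_local, send_message):
    """Pick the callable that starts analysis for a finished run.

    `config["analysis_mode"]` is a hypothetical option name; `run_local`
    and `send_message` are placeholders supplied by the caller.
    """
    # Default to messaging so existing setups keep their current behavior.
    mode = config.get("analysis_mode", "messaging")
    handlers = {"local": run_local, "messaging": send_message}
    if mode not in handlers:
        raise ValueError(
            "analysis_mode must be 'local' or 'messaging', got %r" % mode)
    return handlers[mode]


# Example with placeholder handlers that only record what they were asked to do.
calls = []
dispatch = make_dispatcher(
    {"analysis_mode": "local"},
    run_local=lambda fc_dir: calls.append(("local", fc_dir)),
    send_message=lambda fc_dir: calls.append(("messaging", fc_dir)),
)
dispatch("/path/to/flowcell_dir")
```

Keeping the choice in one place means the rest of the script does not need to know which mode is active.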

@tanglingfung
Author

Thanks. We are taking a similar approach.


@tanglingfung
Author

Hi Brad,

One question: where can I find the details of get_flowcell_info() and get_fastq_dir()? (Sorry, I am kind of new to Python.) We have changed illumina_finished_msg.py to transfer the data from the sequencing machine to the analysis machine in the bcl->qseq step, and extracted the copy_and_analysis in analyze_finished_sqn.py to initiate automated_initial_analysis.py. However, I wasn't sure which directory I should put for the

@tanglingfung
Author

Sorry, I found the files.

But then I got an error in nosetests, and I'm not sure where the problem is:
ERROR 2011-04-15 15:02:04 ProcessExecutor Rscript execution error: No such file or directory
[Fri Apr 15 15:02:04 PDT 2011] net.sf.picard.analysis.CollectGcBiasMetrics done.

Do you have any idea?

@chapmanb
Owner

Paul;
Great -- glad the fastq files made sense.

For your CollectGcBiasMetrics error, it sounds like Picard is having trouble finding the Rscript executable, which should come as part of R. Is R installed and on your path? Picard uses Rscript to generate plots.
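One quick way to check this, sketched here for Python 3 using only the standard library. The helper name `find_executable` is made up for illustration. It is most useful when run in the same environment that launches Picard, since that process may see a different PATH than an interactive shell.

```python
import shutil


def find_executable(name):
    """Return the full path to `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


rscript = find_executable("Rscript")
if rscript is None:
    print("Rscript was not found on PATH")
else:
    print("Rscript found at", rscript)
```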

@tanglingfung
Author

Brad,

Well, it would be good to be able to specify the fastq and qseq files in the configuration file. In the BCL conversion step, I am not sure whether it's a problem with my system, but I can't trigger OLB right away: I need to append 'Data/Intensities/Basecalls' to the flow cell dir to get to the config.xml that OLB requires.
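The path handling described above can be sketched as follows (Python 3). The function name `find_olb_config` is hypothetical and is not part of bcbb; the subdirectory is the one quoted in this message, and the sketch only checks that the file exists.

```python
import os


def find_olb_config(flowcell_dir):
    """Return the path to config.xml under the flow cell dir, or None."""
    # Subdirectory as quoted in the message above.
    basecalls_dir = os.path.join(
        flowcell_dir, "Data", "Intensities", "Basecalls")
    config_file = os.path.join(basecalls_dir, "config.xml")
    return config_file if os.path.exists(config_file) else None
```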

Yes, I can call Rscript from anywhere, so it should be on my path. Could it be looking for the bait and target files? So far, I haven't known what to put there.

Thanks,
Paul


@chapmanb
Owner

Paul;
It would be great to generalize this and I'm happy to look at that. Do you have your code available somewhere as a fork so I can get a sense of how you approached it? This would also help me to look at your OLB/BCL problem; it's not clear to me from your message how you are calling it.

For your CollectGcBias error, can you run this by hand from the command line? That's the best way to debug it.

The bait and target files are only necessary if you are doing targeted re-sequencing. That wouldn't have any effect on CollectGcBias.

Thanks,
Brad

@tanglingfung
Author

Brad,

That's a good idea. Let me try to put it up soon.

As for the OLB problem, it turned out to be a problem on our side. We overlooked part of your code and left out get_qseq_dir(); your code should be fine. It was a little confusing at first, though, since I thought it was a bc_dir, and there isn't a qseqs_dir at that point.

Our approach now is to put the qseq and fastq files in a different place by writing them to a different directory in the OLB step, and then transfer the fastq files to the analysis folder before the demultiplexing step, to keep the structure of your code. How does that sound?

In the future, we want to generalize it to handle the following cases:

  1. use fastq as input
  2. have an option to process a subset of the fastq files

I saw you had plans for case 1, right?


@chapmanb
Owner

Paul;
Great, that'll really help to be able to look at the code.

For the fastq files, I would suggest transferring them somewhere outside of the Illumina dump tree. You'll probably have different backup strategies for these compared to the rest of the Illumina output, so this helps facilitate that.

For your last question, I'm not sure what step you are referring to. The analysis works with fastq files now and if you only want to process certain lanes, you can pass in a custom run_info.yaml file specifying what to process to automated_initial_analysis.py:

https://github.com/chapmanb/bcbb/blob/master/nextgen/config/run_info.yaml

Hope this helps.

@tanglingfung
Author

Hi Brad,

I'm not used to GitHub yet, so I posted our code as a gist. Basically, we took out the messaging part of illumina_finished_msg.py, and now transfer the qseq files to the analysis machine, generate the fastq files directly in the analysis dir, and start the analysis there. The code is at the following URL:
https://gist.github.com/1769766276fc6b18494c
In order to specify the output directory for the fastq files, we modified solexa_qseq_to_fastq.py a bit:
https://gist.github.com/9b8a821cabc30711b454

I hope it can be generalized and made compatible with your scripts.

We seem to have the system set up on our server. The analysis pipeline is still running.

Thanks,
Paul

@chapmanb
Owner

Paul;
Thanks for posting these. I checked in updates to solexa_qseq_to_fastq and illumina_finished_msg to allow you to specify a local directory for writing the fastq and analysis files. If you pass in a post-processing configuration file and specify postprocess_dir in your transfer_info.yaml, this will automatically process on disk instead of using RabbitMQ.

The only thing I did not add is changing where qseqs are dumped. I'd rather not mess with OLB directories and processes, to keep this forward compatible as Illumina practices and software change.

Hopefully this works for what you wanted to do. Let me know if you run into any problems.

b97pla referenced this issue in b97pla/bcbb Nov 30, 2011