FastQC vs SolexaQA #25
Comments
and our system appears to be overloaded after recalibration using GATK; it slows down a lot afterwards. Do you have any suggestions on this? Thanks a lot! |
Paul; For your GATK issues, how much memory does your machine have? Perhaps our current memory settings (6GB per process) are too high and are causing swapping. I could make that configurable if it would be helpful. Thanks, |
Brad, I actually like the change. I am just concerned about consistency for some of our projects. But I agree with you that FastQC has more useful details. Our machine currently has 48G RAM with 2x12 cores. I guess it would be more reasonable to add more RAM in our case. May I know the configuration of your system, and how much time it takes to go through the whole pipeline (e.g. in a SNP calling analysis)? Thanks, By the way, we are working on a script that creates symbolic links to fastq files in different flow cell directories and puts them in a virtual one to be processed by automated_initial_analysis.py. I think that would be useful for multiple samples run on multiple flowcells. |
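For readers wondering what such a linking script might look like, here is a minimal sketch. It is not Paul's script: the function name `link_fastqs`, the `*.fastq` glob pattern, and the choice to prefix each link with its flowcell directory name are all assumptions made for illustration. The real pipeline may expect a different file naming convention.

```python
from pathlib import Path


def link_fastqs(flowcell_dirs, virtual_dir, pattern="*.fastq"):
    """Symlink fastq files from several flowcell directories into one directory.

    Hypothetical helper: names, pattern, and link naming are illustrative only.
    Returns the list of links that were newly created.
    """
    virtual = Path(virtual_dir)
    virtual.mkdir(parents=True, exist_ok=True)
    created = []
    for fc_dir in map(Path, flowcell_dirs):
        for fastq in sorted(fc_dir.glob(pattern)):
            # Prefix with the flowcell directory name so that files with the
            # same name in different flowcells do not collide (an assumption).
            link = virtual / f"{fc_dir.name}_{fastq.name}"
            if link.is_symlink() or link.exists():
                continue  # leave existing links or files untouched
            link.symlink_to(fastq.resolve())
            created.append(link)
    return created
```

Calling `link_fastqs(["/data/FC1", "/data/FC2"], "/data/virtual")` (hypothetical paths) would leave the original files in place and populate the virtual directory with links only, so no sequence data is copied.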
Paul; We have 48G of RAM but only 8 cores; we need more processors to deal with the new HiSeq. If you can run up to 24 processes currently, you would expect some memory swapping on barcoded HiSeq lanes. I'll work on making that a configurable parameter; it might slow down GATK but would at least save tons of swapping slowness. Full barcoded SNP calling analyses can take a couple of days to process on that machine; it can be even longer if you have lots of barcodes as you need to wait for cores. Let me know when you have your script finished. I'd be very happy to link to it from the documentation or include it as a utility for others. Thanks again, |
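The swapping concern above comes down to simple arithmetic, sketched here using the figures from this thread (48G of RAM, 6GB per process, up to 24 processes). The function names and the optional `reserved_gb` parameter are illustrative and not part of the pipeline.

```python
def max_processes_without_swap(total_ram_gb, per_process_gb, reserved_gb=0):
    """Number of processes that fit in RAM if each may use per_process_gb."""
    usable = total_ram_gb - reserved_gb
    return max(0, int(usable // per_process_gb))


def worst_case_demand_gb(num_processes, per_process_gb):
    """Total memory requested if every process reaches its limit at once."""
    return num_processes * per_process_gb


if __name__ == "__main__":
    # Figures taken from the discussion above.
    print(max_processes_without_swap(48, 6))  # processes that fit in 48G
    print(worst_case_demand_gb(24, 6))        # demand with 24 processes
```

With 6GB per process, 48G of RAM covers 8 processes, whereas 24 processes could request up to 144GB in the worst case, which is why swapping would be expected on a 24-core machine.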
Thanks. I may have made a mistake about the number of cores; ours may also be 8. By the way, what's the best practice for using git to stay up to date? Best, On Sun, May 1, 2011 at 9:52 AM, chapmanb
|
Hello Paul, I'm also running Brad's pipeline in production. I use a rather naïve approach to launch the automatic initial analysis, but so far it has worked acceptably well (~2 days of processing on average per run). It consists of putting a wrapper in place in post_process.yaml: (...) instead of the default "automated_initial_analysis.py". Illumina_run_batch.sh will queue the job on a cluster and launch the analysis on a single machine, using all 8 cores. All this assumes that you have a "beowulf"-type cluster in place, together with a batch queueing system (perhaps you can ask your IT staff?): http://en.wikipedia.org/wiki/Beowulf_(computing) As I said, this is just a hack; better ways to parallelize and optimize this still need to be worked out. Regarding the best practice for using git, I would recommend that you "fork" Brad's repository by following this guide: http://help.github.com/fork-a-repo/ Once you're happy with your changes, you can send "pull requests" to Brad: http://help.github.com/pull-requests/ Hope it all helps! ;) |
Paul; Roman is spot on with his GitHub suggestions. Once you make a fork you can keep a repository of your own scripts in utils or wherever, and we can merge ones back into the main trunk. While you are developing you can keep pulling in changes from the main repository and git will help with merging differences. Thanks guys. |
Thanks Roman and Brad. Yes, I think the script is doing very well for a single flowcell with 8 cores and 48G RAM. It can be done in 2-3 days. No problem with that. I am also looking into the "beowulf"-type cluster. It makes configuration easier by syncing the OS of the servers. Thanks again for all the advice and help here! Best, |
Brad, I have just tried the current version of the pipeline. However, the text from FastQC is still weird and the subtitle is missing. Paul |
Paul; |
Brad, Sorry for being unclear. I found that it's a problem from FastQC. The Thanks, On Thu, May 5, 2011 at 3:00 PM, chapmanb
|
we're getting stable with the pipeline now and have plans to move the analysis part to the cluster. I have also started a fork of the repository, and hopefully I can start to contribute back to the pipeline development. Thanks again for all the help over the past few months. |
Paul; |
fastq_screen fixes & basecalling parameters
Brad,
It's not really an issue, but I would like to know, from your experience, how much time you save by switching from SolexaQA to FastQC?
Thanks,
Paul