This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

Good's coverage estimate #255

Closed
jansuategui opened this issue Oct 15, 2012 · 25 comments

Comments

@jansuategui

Good's Coverage estimate using Qiime

@gregcaporaso
Contributor

Would this generally be useful functionality? Respond with +1 if you would use this (and ideally describe a specific use case).

@antgonza
Contributor

+1. This is a metric that a few users have asked for (examples below).
The calculation is pretty simple; the algorithm is described in the first
link.

https://groups.google.com/forum/?fromgroups#!topic/qiime-forum/0S_WyC5q79s
https://groups.google.com/forum/#!msg/qiime-forum/husZ_TVFGOM/nmOspuCZ4DEJ
https://groups.google.com/forum/?fromgroups=#!topic/qiime-forum/4E_jr-34G7k
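For reference, the standard Good's coverage estimate is C = 1 - F1/N, where F1 is the number of OTUs observed exactly once (singletons) and N is the total number of sequences in the sample. Below is a minimal Python sketch of that formula; the function name `goods_coverage` and the input format (a plain list of per-OTU counts for one sample) are assumptions for illustration, not an existing QIIME or PyCogent API.

```python
def goods_coverage(counts):
    """Good's coverage estimate for a single sample: C = 1 - F1 / N.

    counts: per-OTU sequence counts for one sample (non-negative integers).
    F1 is the number of singleton OTUs (count == 1); N is the total count.
    Hypothetical helper for illustration, not a QIIME function.
    """
    counts = [int(c) for c in counts]
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    n = sum(counts)
    if n == 0:
        raise ValueError("cannot estimate coverage for an empty sample")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n


# Six OTUs, 20 sequences, two singletons -> 1 - 2/20
print(goods_coverage([10, 5, 1, 1, 0, 3]))  # prints 0.9
```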

@gregcaporaso
Contributor

OK, any takers on this one? @wdwvt1 or @justin212k seem like likely candidates.

@rob-knight

+1. This should integrate with Manuel's diversity estimation stuff (not sure where that ended up). It is a very common use case: justifying sampling effort to reviewers.

Rob

@justin212k
Contributor

Hmm, we could add another file or two to qiime/pycogent_backports, but that adds a decent amount of complexity to the qiime codebase. What do you all think of adding Good's coverage to alpha_diversity.py directly?

@antgonza
Copy link
Contributor

antgonza commented Dec 6, 2012

I think that should work ...

@rob-knight

Agree. Probably want to merge in whatever module has Jens's implementation of Manuel's coverage estimators.

@justin212k
Contributor

Sounds good. @jens_the_kraut, do you know which module that is?

@jensreeder
Contributor

That would be conditional_uncovered_probability.py.
It might be useful to rename this script to coverage.py, as this seems to be the term that people look for.
This requires taking 1 - x of the probabilities from Manuel's estimators.

In the long term, I suggest putting Good's into pycogent, as that is where, e.g., the Robbins estimator lives as well.
For the upcoming release, stashing it in qiime is probably the best solution.

Practically, we could also combine the metrics in this module with alpha_diversity.py.
I am already using AlphaDiversityCalcs(), so merging would be a no-brainer.
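Taking 1 - x of an uncovered-probability estimate can be done generically by wrapping the estimator. The sketch below assumes each estimator takes a list of per-OTU counts and returns a probability; `as_coverage` and `goods_uncovered_probability` are hypothetical names, and Manuel's actual estimators in conditional_uncovered_probability.py are not reproduced here. Good's F1/N estimate of the unseen probability mass stands in as the example estimator.

```python
def as_coverage(uncovered_probability):
    """Wrap an uncovered-probability estimator so it returns coverage (1 - x).

    Hypothetical helper: assumes the estimator takes per-OTU counts
    (plus optional extra arguments) and returns a probability in [0, 1].
    """
    def coverage(counts, *args, **kwargs):
        return 1.0 - uncovered_probability(counts, *args, **kwargs)
    # Derive a readable name, e.g. goods_uncovered_probability -> goods_coverage
    coverage.__name__ = uncovered_probability.__name__.replace(
        "uncovered_probability", "coverage")
    coverage.__doc__ = "1 - %s(counts)" % uncovered_probability.__name__
    return coverage


def goods_uncovered_probability(counts):
    """Hypothetical stand-in estimator: Good's F1 / N estimate of the unseen mass."""
    n = sum(counts)
    return sum(1 for c in counts if c == 1) / n


goods_coverage = as_coverage(goods_uncovered_probability)
print(goods_coverage.__name__)              # prints goods_coverage
print(goods_coverage([10, 5, 1, 1, 0, 3]))  # prints 0.9
```

Wrapping rather than rewriting keeps a single implementation per estimator, so the uncovered-probability and coverage versions cannot drift apart.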

@rob-knight

All this stuff is related to alpha_diversity, but having a separate coverage module that encompasses all of it (and is imported from alpha_diversity) might make sense. I agree that pycogent is a more logical home than qiime for all the general-purpose stuff like metrics.

@gregcaporaso
Contributor

We currently don't have anyone assigned to this issue. @justin212k-the-moustache-enthusiast, do you want to take this one?

@justin212k
Contributor

Sign me up (I don't think I can do that myself).

@ghost assigned justin212k Dec 7, 2012
@gregcaporaso
Contributor

Done - you should be able to sign yourself up.

@rob-knight

Anyone know the answer to this? It looked like from earlier emails that this might be in progress?

Rob

Begin forwarded message:

From: marzia@berkeley.edu
Subject: QIIME and Roche v2.8 software
Date: December 8, 2012 12:28:08 PM MST
To: Rob Knight <rob.knight@colorado.edu>

Dear Rob,

I am the postdoc working with Steven Lindow at UC Berkeley on the
Sloan-funded indoor air microbial ecology project (BIMERC). We talked in
Boulder in October, when I attended the workshop you and Mitch Sogin organized
for QIIME and VAMPS (I enjoyed it very much, and thanks for posting the
videos of the presentations online; they are really useful).

I have a question for you about QIIME. I recently sent some samples for
sequencing and have been told that Roche has released new software
(Roche v2.8 software with flow Pattern B) that apparently increases
quality and quantity from amplicon sequencing runs. I have also heard that it
does not currently work with the QIIME denoising tool, but that
you are working with Roche to fix this problem. I was asked how
I want my samples processed: with the original software or with the new
software.

I would go for the new Roche software that apparently improves the
quality/quantity of data. But I also do want to use QIIME for my analyses.
So I was wondering if you could kindly give me an update on your work with
Roche, and a suggestion on how to proceed.

Thank you and I wish you a nice weekend!

Marzia

@jensreeder
Contributor

454 runs with the randomized flow pattern B cannot be denoised with Qiime
at this point.
I briefly looked into the code and figured that it will take me some time
to fix it.
My previous suggestion in the other thread was to ask the sequencing
center to keep the regular flow order.

Up to now, I haven't seen any official documentation of this new feature,
so I am hesitant to jump at it without more information. I will bug the
sequencing folks here at work and see if they know anything about it.

In any case, I think we have to caution people against blindly denoising FLX+ data
using the Titanium or FLX error profiles.
As I have no idea how much the profiles differ for these extremely long
reads, I can't really say anything about the effectiveness. Maybe someone
should run a mock community on FLX+ and have a look at the denoising outcome.

Jens

@rob-knight

Thanks, Jens. Is it just denoising that fails, i.e. they can do the rest of the analysis? Can they use e.g. Acacia or ampliconnoise for denoising?

Rob

@jensreeder
Contributor

It's just denoising that fails; the rest of qiime will be fine.
Not sure how ampliconnoise or Acacia behave, but a simple grep over
ampliconnoise's code base showed several hardcoded occurrences of the regular
flow order TACG, so I assume that it might have some issues as well.

Jens
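To illustrate why a hardcoded TACG is a problem for a different flow pattern: base calling from a flowgram walks the flow values in the order the nucleotides were flowed, so decoding the same values under a different order yields a different sequence. The sketch below assumes naive rounding of each flow value to a homopolymer length (real denoisers use error models rather than plain rounding) and a flow order that repeats cyclically; `flowgram_to_seq` is a hypothetical helper, and the alternative order `TGCA` is invented for illustration and is not Roche's flow pattern B.

```python
from itertools import cycle


def flowgram_to_seq(flow_values, flow_order="TACG"):
    """Naively call bases from 454 flow values under a given flow order.

    Hypothetical helper: each flow value is rounded to the nearest
    homopolymer length and that flow's nucleotide is repeated that many
    times. The flow order is a parameter and is cycled, instead of being
    hardcoded as TACG.
    """
    bases = []
    for nucleotide, value in zip(cycle(flow_order), flow_values):
        # Negative or near-zero signals contribute no bases.
        bases.append(nucleotide * max(0, round(value)))
    return "".join(bases)


flows = [1.02, 0.05, 2.1, 0.96]
print(flowgram_to_seq(flows))                     # prints TCCG
print(flowgram_to_seq(flows, flow_order="TGCA"))  # prints TCCA (made-up order)
```

The same four flow values decode to different reads depending only on the assumed order, which is why any tool that bakes in TACG will silently mis-call reads produced with another flow pattern.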

@rob-knight

OK thanks. Can anyone confirm whether acacia is wrapped in qiime yet as an alternative denoising procedure?

@gregcaporaso
Contributor

I do not think that it is, and a search for acacia in the full code base doesn't return any hits.

@justin212k
Contributor

Hey folks, note that all these emails are being posted on github under Issue #255.

@justin212k
Contributor

And returning to Issue #255: what does everyone think of merging the coverage stuff with alpha_diversity.py? Strictly speaking, Good's coverage doesn't estimate the diversity of the community but rather the extent to which it has been adequately sampled. But rarefaction curves with, e.g., Good's coverage seem informative, and it'd be nice to have all the workflow scripts that interact with alpha_diversity.py work with it. I'd like to 1-x the metrics in conditional_uncovered_probability.py, delete that file, and put the new coverage estimators in alpha_diversity.py, then add a little documentation noting how we've blurred the boundaries of what alpha_diversity.py does.

I'm tepid about it myself; does anyone dislike this idea?
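A Good's coverage rarefaction curve could be computed by subsampling a sample without replacement at increasing depths and averaging the estimate over several iterations. This is a standalone sketch under those assumptions, not QIIME's rarefaction workflow; `rarefy` and `coverage_rarefaction` are hypothetical names, and each sample is assumed to be a plain list of per-OTU sequence counts.

```python
import random
from collections import Counter


def goods_coverage(counts):
    """Good's coverage C = 1 - F1 / N for one sample's per-OTU counts."""
    n = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n


def rarefy(counts, depth, rng):
    """Hypothetical helper: subsample `depth` sequences without replacement."""
    # Expand counts into one OTU index per sequence, then draw without replacement.
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    if not 0 < depth <= len(pool):
        raise ValueError("depth must be between 1 and the sample size")
    picked = Counter(rng.sample(pool, depth))
    return [picked[otu] for otu in range(len(counts))]


def coverage_rarefaction(counts, depths, iterations=10, seed=0):
    """Hypothetical helper: mean Good's coverage over subsamples at each depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        values = [goods_coverage(rarefy(counts, depth, rng))
                  for _ in range(iterations)]
        curve.append((depth, sum(values) / len(values)))
    return curve


for depth, cov in coverage_rarefaction([10, 5, 1, 1, 0, 3], depths=[5, 10, 20]):
    print(depth, round(cov, 3))
```

At the full sample depth every subsample is the whole sample, so the curve ends at the plain Good's estimate (0.9 for this example); a curve that is still climbing at the chosen depth suggests the sample is under-sampled.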

@gregcaporaso
Contributor

I don't think this is a bad idea, but note that we could also modify the alpha_rarefaction.py workflow to use either script if you think that will be clearer/easier to document.

@justin212k
Contributor

Sorry folks, I don't think I can get this in by Dec 13th at 7am. I've made a few changes, but nothing near ready for a pull request.

@gregcaporaso
Contributor

Would an extra day help?

@wdwvt1
Contributor

wdwvt1 commented Dec 13, 2012

I may be able to help; I have finished the Gini index stuff. How far along are you?

Projects
None yet
Development

No branches or pull requests

7 participants