Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: adds API for comparing sourcetracker results #64

Merged
merged 2 commits into from
Sep 6, 2016

Conversation

gregcaporaso
Copy link
Member

@gregcaporaso gregcaporaso commented Aug 30, 2016

This partially addresses #57 and #31.

@wdwvt1, @johnchase, @lkursell this is an API to compare sourcetracker results. This could be used for the current benchmark and optimization projects as a way to determine how similar a pair of sourcetracker results are. I'm looking for input on the API before I spend too much time more time on it.

This defines two functions compare_sinks and compare_sink_metrics that become part of the public API as of the merge of this pull request (we'll commit to alpha-level stability of these in the next release - we're free to change these until then).

compare_sinks would take two pd.DataFrame objects containing observed and expected mixing proportions and a metric to use for comparing the mixing proportions. It would return a pd.DataFrame containing metric-specific data on the similarity/difference of the mixing proportions.

compare_sink_metrics is a simple helper function that returns a list of the available metrics. This would be necessary for a QIIME 2 plugin, so interfaces can determine what the available choices are for the metric parameter of compare_sinks.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.9%) to 89.862% when pulling 7a3b194 on gregcaporaso:issue-57 into 08112ca on biota:master.

@gregcaporaso
Copy link
Member Author

@wdwvt1, @johnchase, @lkursell this is ready for review/merge whenever you guys are ready (I know you're working toward a deadline, so no rush).

@coveralls
Copy link

Coverage Status

Coverage increased (+2.2%) to 90.135% when pulling 1f91b8d on gregcaporaso:issue-57 into 08112ca on biota:master.

@wdwvt1
Copy link
Contributor

wdwvt1 commented Sep 3, 2016

@gregcaporaso - this looks very useful!

Two real comments
I think the _absolute_difference function should sum the differences in each index. So rather than returning a per-source difference, it returns a total difference. In the test you have, I think it should be:

obs_m = [.5, .25, .25]
exp_m = [.1, .8, .1]
abs_diff = .4 + .65 + .15 = 1.2

There is no test for _validate_dataframe.

One style question
Do we need to document the private functions? Although users won't be accessing them directly, developers will (or we'll want to know in the future).

@coveralls
Copy link

Coverage Status

Coverage increased (+2.1%) to 90.351% when pulling 1a948ee on gregcaporaso:issue-57 into 33f5bff on biota:master.

@wdwvt1 wdwvt1 merged commit cef980f into caporaso-lab:master Sep 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants