ENH: adds API for comparing sourcetracker results #64

gregcaporaso · 2016-08-30T00:04:53Z

This partially addresses #57 and #31.

@wdwvt1, @johnchase, @lkursell this is an API to compare sourcetracker results. This could be used for the current benchmark and optimization projects as a way to determine how similar a pair of sourcetracker results are. I'm looking for input on the API before I spend too much time more time on it.

This defines two functions compare_sinks and compare_sink_metrics that become part of the public API as of the merge of this pull request (we'll commit to alpha-level stability of these in the next release - we're free to change these until then).

compare_sinks would take two pd.DataFrame objects containing observed and expected mixing proportions and a metric to use for comparing the mixing proportions. It would return a pd.DataFrame containing metric-specific data on the similarity/difference of the mixing proportions.

compare_sink_metrics is a simple helper function that returns a list of the available metrics. This would be necessary for a QIIME 2 plugin, so interfaces can determine what the available choices are for the metric parameter of compare_sinks.

coveralls · 2016-08-30T00:29:36Z

Coverage increased (+1.9%) to 89.862% when pulling 7a3b194 on gregcaporaso:issue-57 into 08112ca on biota:master.

fixes caporaso-lab#57

gregcaporaso · 2016-08-30T19:35:15Z

@wdwvt1, @johnchase, @lkursell this is ready for review/merge whenever you guys are ready (I know you're working toward a deadline, so no rush).

coveralls · 2016-08-30T19:47:41Z

Coverage increased (+2.2%) to 90.135% when pulling 1f91b8d on gregcaporaso:issue-57 into 08112ca on biota:master.

wdwvt1 · 2016-09-03T19:09:12Z

@gregcaporaso - this looks very useful!

Two real comments
I think the _absolute_difference function should sum the differences in each index. So rather than returning a per-source difference, it returns a total difference. In the test you have, I think it should be:

obs_m = [.5, .25, .25]
exp_m = [.1, .8, .1]
abs_diff = .4 + .65 + .15 = 1.2

There is no test for _validate_dataframe.

One style question
Do we need to document the private functions? Although users won't be accessing them directly, developers will (or we'll want to know in the future).

coveralls · 2016-09-06T23:49:17Z

Coverage increased (+2.1%) to 90.351% when pulling 1a948ee on gregcaporaso:issue-57 into 33f5bff on biota:master.

ENH: adds API for comparing sourcetracker results

54a1426

fixes caporaso-lab#57

gregcaporaso force-pushed the issue-57 branch from 7a3b194 to 1f91b8d Compare August 30, 2016 19:32

gregcaporaso mentioned this pull request Aug 30, 2016

add functions for comparing sourcetracker results #57

Closed

3 tasks

API: adds compare functionality to public API

1a948ee

gregcaporaso force-pushed the issue-57 branch from 1f91b8d to 1a948ee Compare September 6, 2016 23:31

wdwvt1 merged commit cef980f into caporaso-lab:master Sep 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: adds API for comparing sourcetracker results #64

ENH: adds API for comparing sourcetracker results #64

gregcaporaso commented Aug 30, 2016 •

edited

Loading

coveralls commented Aug 30, 2016

gregcaporaso commented Aug 30, 2016

coveralls commented Aug 30, 2016

wdwvt1 commented Sep 3, 2016

coveralls commented Sep 6, 2016

ENH: adds API for comparing sourcetracker results #64

ENH: adds API for comparing sourcetracker results #64

Conversation

gregcaporaso commented Aug 30, 2016 • edited Loading

coveralls commented Aug 30, 2016

gregcaporaso commented Aug 30, 2016

coveralls commented Aug 30, 2016

wdwvt1 commented Sep 3, 2016

coveralls commented Sep 6, 2016

gregcaporaso commented Aug 30, 2016 •

edited

Loading