
Sync series and dataframe in statistical_inefficiency() #199

Merged (10 commits, Jul 7, 2022)

Conversation

@ptmerz (Contributor) commented on Jun 22, 2022:

This adds a number of tests for slicing() and statistical_inefficiency():

  • Test that slicing() respects upper and lower time bounds (currently
    passes)
  • Test that statistical_inefficiency() respects upper and lower time bounds
    when it is used without a series to subsample (currently passes)
  • Test that statistical_inefficiency() respects upper and lower time bounds
    when it is used with a series to subsample (currently fails)
  • Test that first using slicing() on the data frame, then
    statistical_inefficiency() without time bounds yields the same results as
    a single call to statistical_inefficiency() with time bounds (currently
    fails)

It then makes sure that both the subsampling series and the dataframe are sliced
when calling statistical_inefficiency(). This ensures that the series and
the dataframe are in sync before the subsampled indices of the series
are applied to the dataframe. This fixes the above tests.

Fixes #198
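A minimal sketch of the synchronization idea described above (illustrative only, not the actual subsampling.py code; the helper name slice_in_sync is hypothetical, and a 'time' index level is assumed as in the tests):

    import numpy as np

    def slice_in_sync(df, series=None, lower=None, upper=None):
        # Slice the dataframe and, if given, the subsampling series with the
        # same time bounds, so that the row positions stay aligned before the
        # subsampled indices computed from the series are applied to the
        # dataframe.
        time = df.reset_index()['time'].to_numpy()
        keep = np.ones(len(df), dtype=bool)
        if lower is not None:
            keep &= time >= lower
        if upper is not None:
            keep &= time <= upper
        sliced_df = df[keep]
        sliced_series = series[keep] if series is not None else None
        return sliced_df, sliced_series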

@codecov (bot) commented on Jun 22, 2022:

Codecov Report

Merging #199 (cc122a8) into master (72622c9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #199   +/-   ##
=======================================
  Coverage   98.15%   98.15%           
=======================================
  Files          24       24           
  Lines        1352     1353    +1     
  Branches      295      295           
=======================================
+ Hits         1327     1328    +1     
  Misses          5        5           
  Partials       20       20           
Impacted Files Coverage Δ
src/alchemlyb/preprocessing/subsampling.py 100.00% <100.00%> (ø)


@xiki-tempula (Collaborator) left a comment:

Many thanks for the contribution. This fix is very important.
I think the test could be simplified a bit using @pytest.mark.parametrize.

You could do something like

@pytest.mark.parametrize('data', (gmx_benzene_dHdl(), gmx_benzene_u_nk()))
@pytest.mark.parametrize('test_series', (True, False))
def test_lower_and_upper_bound_slicer(self, data, test_series):
    """
    Test that the lower and upper time is respected when using statistical_inefficiency
    without a series. In this case, statistical_inefficiency should behave like slicing.
    """
    lower, upper = 1000, 34000
    original_length = len(data)
    # Check that the input data is appropriate for the test
    assert any(data.reset_index()['time'] < lower)
    assert any(data.reset_index()['time'] > upper)

    if test_series:
        series = data.sum(axis=1)
    else:
        series = None

    # Slice data, and check that we don't observe times outside
    # the prescribed range
    sliced = self.slicer(data,
                         series=series,
                         lower=lower,
                         upper=upper,
                         step=5)
    assert all(sliced.reset_index()['time'] >= lower)
    assert all(sliced.reset_index()['time'] <= upper)

    # Make sure we didn't change input data
    assert len(data) == original_length

Then for test_slicing_inefficiency_equivalence just strip away all the checking.

Inline comment on src/alchemlyb/tests/test_preprocessing.py (outdated):
# Slice data, and check that we don't observe times outside
# the prescribed range
sliced = self.slicer(data,
series=None,
@xiki-tempula (Collaborator):

Sorry, I don't quite see how this is different from test_lower_and_upper_bound, since series defaults to None?

@ptmerz (Contributor, author):

Well, this test class tests statistical_inefficiency(...), while the one above tests slicing(...).

So this test makes sure that statistical_inefficiency respects the upper and lower bounds when it's used without a series. The test above makes sure that slicing respects the upper and lower bounds.

This test could be rewritten as equivalence between statistical_inefficiency and slicing, if you prefer.

@xiki-tempula (Collaborator):

@ptmerz Why do you need to check slicing when you are changing the code in statistical_inefficiency?

@ptmerz (Contributor, author):

I don't need to. While investigating the bug I reported, I noticed that there was no test for ranges in slicing or in statistical_inefficiency, which made it possible for the bug in statistical_inefficiency to go unnoticed until now. I thought it would be helpful to broaden the test coverage for slicing as well, to avoid future bugs. Would you prefer me to remove them?

Inline comment on src/alchemlyb/tests/test_preprocessing.py:
assert all(sliced.reset_index()['time'] >= lower)
assert all(sliced.reset_index()['time'] <= upper)
# Make sure we didn't change input data
assert len(data) == original_length
@ptmerz (Contributor, author):

Note: We should treat this in the same way as the test above (see comment line 92).

@xiki-tempula (Collaborator):

@ptmerz It might be good to abstract the core of these three tests out as a fixture and use parametrize to go through them.

something like

@pytest.mark.parametrize(
    'output_data,original_length',
    [
        (lazy_fixture('inefficiency'), 100),
        (lazy_fixture('slicing'), 1000),
    ])
def test_original_length(output_data, original_length):
    assert len(output_data) == original_length

@ptmerz (Contributor, author) commented on Jun 29, 2022:

I think the test could be simplified a bit using @pytest.mark.parametrize

Agreed that this could unify the second and third added tests (TestStatisticalInefficiency.test_lower_and_upper_bound_slicer and TestStatisticalInefficiency.test_lower_and_upper_bound_inefficiency). Unless we decide to rewrite TestStatisticalInefficiency.test_lower_and_upper_bound_slicer as an equivalence between statistical_inefficiency and slicing, of course.

So I'd suggest waiting until we have agreed on how to proceed with the remaining points raised, and I will implement this simplification if it's still viable!

@orbeckst (Member) left a comment:

Overall, looks good to me — just small comments on the tests.

Inline comment on src/alchemlyb/tests/test_preprocessing.py, lines 73 to 74 (outdated):
(gmx_benzene_dHdl(), 1000, 34000),
(gmx_benzene_u_nk(), 1000, 34000),
@orbeckst (Member):

It's not great to call the data provider functions here because this adds to the time during which tests are gathered in serial, as opposed to during test execution (which can be parallelized). I know that we already have plenty of examples of doing exactly that elsewhere, so it's really not your fault.

I am just wondering if there are better ways to do this, e.g., passing the function itself as data_generator and then calling it as data = data_generator() might be easiest. It would be better if we could generate them as fixtures (but parametrized fixtures are still awkward IIRC).
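For concreteness, the deferred-loading idea could look roughly like this (an illustrative sketch; the test name, bounds, and loader functions mirror the tests discussed above, and data is only loaded once the test runs):

    import pytest

    @pytest.mark.parametrize('data_generator', [gmx_benzene_dHdl, gmx_benzene_u_nk])
    def test_lower_and_upper_bound(self, data_generator):
        data = data_generator()  # loaded during the test run, not during collection
        sliced = self.slicer(data, lower=1000, upper=34000, step=5)
        assert all(sliced.reset_index()['time'] >= 1000)
        assert all(sliced.reset_index()['time'] <= 34000)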

@xiki-tempula (Collaborator):

@orbeckst The parametrized fixtures could be made easy with pytest-lazy-fixture.

Below is the example from their website, https://pypi.org/project/pytest-lazy-fixture/. It is conda-installable as well.

@pytest.fixture(params=[
    pytest.lazy_fixture('one'),
    pytest.lazy_fixture('two')
])
def some(request):
    return request.param

@pytest.fixture
def one():
    return 1

@pytest.fixture
def two():
    return 2

@ptmerz (Contributor, author):

This makes a lot of sense @orbeckst, I wasn't aware of this issue. Unfortunately I haven't had time to look into this yet, so the current version still has the greedy evaluation.

@orbeckst (Member):

lazy_fixture looks like a convenient solution. I looked at the package and I am not sure how well maintained it is. There's a commit from Jan 2022 in https://github.com/tvorog/pytest-lazy-fixture, but the PyPI page https://pypi.org/project/pytest-lazy-fixture/ states that the highest supported Python version is 3.7. @xiki-tempula, do you have experience with it in production code? It would be annoying to rewrite quite a few of our tests, just to discover that it breaks with a new release of pytest...

@orbeckst (Member):

I had a look around in our tests and we do have examples of evaluating the loading functions later: see e.g.

class TestMBAR(FEPestimatorMixin):
    """Tests for MBAR.
    """
    cls = MBAR

    @pytest.fixture(scope="class",
                    params=[(gmx_benzene_coul_u_nk, 3.041, 0.02088),
                            (gmx_benzene_vdw_u_nk, -3.007, 0.04519),
                            (gmx_expanded_ensemble_case_1, 75.923, 0.14124),
                            (gmx_expanded_ensemble_case_2, 75.915, 0.14372),
                            (gmx_expanded_ensemble_case_3, 76.173, 0.11345),
                            (gmx_water_particle_with_total_energy, -11.680, 0.083655),
                            (gmx_water_particle_with_potential_energy, -11.675, 0.083589),
                            (gmx_water_particle_without_energy, -11.654, 0.083415),
                            (amber_bace_example_complex_vdw, 2.40200, 0.0618453),
                            (gomc_benzene_u_nk, -0.79994, 0.091579),
                            ])
    def X_delta_f(self, request):
        get_unk, E, dE = request.param
        return get_unk(), E, dE

    def test_mbar(self, X_delta_f):
        self.compare_delta_f(X_delta_f)

This is not too bad.

@orbeckst (Member):

I raised #206 for fixing the other instances of upfront data loading.

@xiki-tempula (Collaborator):

@orbeckst yes, I used lazy_fixture in our production code, but the reason I used it is simply that it was already there. It works fine for me, but I do share your concern about it not having been updated for years.

On the other hand, one could also use the official pytest solution

@pytest.fixture(params=['dir1_fixture', 'dir2_fixture'])
def dirname(request):
    return request.getfixturevalue(request.param)

@orbeckst (Member):

Then let's use the pytest-internal solution!

I'll update #206.

@orbeckst (Member):

This looks straightforward:

@pytest.mark.parametrize(('dataloader', 'size'), [('gmx_benzene_dHdl', 661),
                                                  ('gmx_benzene_u_nk', 661)])
def test_basic_slicing(self, dataloader, size, request):
    data = request.getfixturevalue(dataloader)
    assert len(self.slicer(data, lower=1000, upper=34000, step=5)) == size

@ptmerz (Contributor, author) commented on Jul 4, 2022:

@xiki-tempula @orbeckst

Thank you for your reviews. I did a first pass, which should address most of your comments. I'm copying the git log here as a summary:

  • Extract input sanity check in separate function
  • Extract test for unchanged data in separate test
  • Parametrize tests of statistical_inefficiency to merge cases which use series with the ones that don't
  • Parametrize all tests to use conservative True and False
  • Remove checks for unchanged data that are now unnecessary (in the last equivalence test, since we now know that none of the operations change the input data)

This doesn't yet address the issue with greedy parametrization; I unfortunately have not had time to look into it yet.
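For reference, the conservative parametrization mentioned above amounts to something along these lines (an illustrative sketch; the test name and bounds follow the tests discussed earlier, and self.slicer is assumed to map to statistical_inefficiency, which accepts a conservative keyword):

    import pytest

    @pytest.mark.parametrize('conservative', [True, False])
    def test_lower_and_upper_bound_inefficiency(self, data, conservative):
        series = data.sum(axis=1)
        sliced = self.slicer(data, series=series,
                             lower=1000, upper=34000, conservative=conservative)
        assert all(sliced.reset_index()['time'] >= 1000)
        assert all(sliced.reset_index()['time'] <= 34000)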

@xiki-tempula (Collaborator) left a comment:

LGTM. Thanks for the work and for addressing the comments.

@orbeckst (Member) left a comment:

Thanks a lot for all the work @ptmerz and explaining your choices to us — very much appreciated!

  1. After looking at the up-front loading, I'd ask you to change your code to load at run time rather than at setup. I posted an example from our code that should make it relatively straightforward to implement.

    I will raise an issue to eventually change the other parts of the tests in a similar fashion.

  2. Please add .. versionchanged:: 0.7.0 entries to the docs to summarize the change in behavior. We count it as a bug fix, but because people may have unwittingly relied on the old behavior and produced results with it, we want to be clear that numbers will change.

  3. Please also add yourself to AUTHORS and add an entry to CHANGES.

Thank you!!

@ptmerz (Contributor, author) commented on Jul 7, 2022:

@orbeckst, I uploaded new commits that should hopefully address your points. Note that the data loading functions currently had to be duplicated, as they can't be normal functions and fixtures at the same time. If, after resolving #206, you only use them as fixtures, you can remove this duplication.
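For reference, the duplication pattern is roughly the following (a sketch only; the exact loader body and the fixture name gmx_benzene_dHdl_fixture are assumptions based on the existing tests and the alchemtest benzene dataset):

    import pytest
    import alchemlyb
    from alchemtest.gmx import load_benzene
    from alchemlyb.parsing.gmx import extract_dHdl

    def gmx_benzene_dHdl():
        # plain function: can be called directly, e.g. from parametrize lists
        dataset = load_benzene()
        return alchemlyb.concat(
            [extract_dHdl(xvg, T=300) for xvg in dataset['data']['Coulomb']])

    @pytest.fixture()
    def gmx_benzene_dHdl_fixture():
        # thin fixture wrapper around the same loader, for tests that request
        # the data via request.getfixturevalue()
        return gmx_benzene_dHdl()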

@orbeckst (Member) left a comment:

I have some small changes that I will apply directly. Everything looks really good!

One source code line change, lots of other stuff around it ... but that's what makes the difference for a well-tested library. Thank you for your willingness to see the whole PR through to the end! We very much appreciate your contribution.

@orbeckst merged commit bc39ef3 into alchemistry:master on Jul 7, 2022.
Merging this pull request closes the linked issue #198: statistical_inefficiency does not slice data frame.