Error on processing `moto`-based files within scheduled, un-locked dependency tests #198

d33bs · 2024-04-24T16:45:56Z

An error occurred with two of the tests during a scheduled and un-locked dependency test earlier today. The only failing test in these cases is: tests/test_convert_threaded.py::test_convert_s3_path_csv. This seems to happen due to a missing file from the mocked moto bucket to a local file, resulting in a failed data query from DuckDB.

Reference to the job: https://github.com/cytomining/CytoTable/actions/runs/8812487779/job/24204265980

duckdb.duckdb.InvalidInputException: Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result
Error: IO Error: No files found that match the pattern "/tmp/tmpv8ak2gtt/example/2/cells.csv"
...
FAILED tests/test_convert_threaded.py::test_convert_s3_path_csv - parsl.dataflow.errors.DependencyError: <exception str() failed>

The text was updated successfully, but these errors were encountered:

d33bs · 2024-04-25T23:01:58Z

This turned out to be a cloudpathlib caching issue, which appears to have changed enough in recent versions enough that the garbage collector removes cached files before they can be referenced by CytoTable. I believe this was a bug present in the code prior to the updates, rather than being caused by the dependencies themselves (though cloudpathlib or related dependencies may have impacted the results).

On resolving the caching issue, there appears to be a credentialing issue from botocore.

referencing cytomining#198

d33bs · 2024-06-06T16:31:54Z

Looking further into this I feel that there's likely something happening which doesn't play nice between moto and cloudpathlib. There are alternatives such as localstack which we could explore, but generally I think we're following a path towards a possible anti-pattern with mocked resources which is consuming too much time for the focus of the project. A reference from Software Engineering at Google in Chapter 13 under the "Real Implementations" section somewhat backs this up:

Although test doubles can be invaluable testing tools, our first choice for tests is to use the real implementations of the system under test’s dependencies; that is, the same implementations that are used in production code. Tests have higher fidelity when they execute code as it will be executed in production, and using real implementations helps accomplish this.
...
Using real implementations for dependencies makes the system under test more realistic given that all code in these real implementations will be executed in the test.

I chatted with @gwaybio about this briefly, and based on our discussion feel that removing moto to rely on a real-world test would have better outcomes. This would allow us to avoid a mocked resource dependency for testing (and related issues), as well as provide us an avenue for a larger data test (data which we don't want to store in the repository)(possibly addressing #193). Likely it'd be good to follow the patterns and suggestions on a testing resource from cytomining/pycytominer#375 .

Rough steps:

Remove moto dependency
Remove pytest fixutres related to moto
Modify moto-related tests to read from cellpainting-gallery or similar S3 resource.
Limit these renewed tests to a certain time limit within reason (possibly leveraging something like https://github.com/bloomberg/pytest-memray or other built-in features of pytest if possible)
Use custom markers on these renewed tests to skip on local development and be used for periodic GH Actions-based scheduled testing
Modify GH Actions workflows to run these tests weekly.

d33bs added the bug Something isn't working label Apr 24, 2024

d33bs added a commit to d33bs/CytoTable that referenced this issue Apr 25, 2024

update tests for persistent caching

4714cf8

referencing cytomining#198

d33bs self-assigned this Jun 6, 2024

d33bs mentioned this issue Jun 11, 2024

Increase sorting scalability via CytoTable metadata columns #204

Merged

13 tasks

d33bs mentioned this issue Jun 25, 2024

Use real AWS S3 data tests and apply related fixes #212

Merged

13 tasks

d33bs closed this as completed in #212 Jul 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error on processing `moto`-based files within scheduled, un-locked dependency tests #198

Error on processing `moto`-based files within scheduled, un-locked dependency tests #198

d33bs commented Apr 24, 2024

d33bs commented Apr 25, 2024 •

edited

Loading

d33bs commented Jun 6, 2024 •

edited

Loading

Error on processing moto-based files within scheduled, un-locked dependency tests #198

Error on processing moto-based files within scheduled, un-locked dependency tests #198

Comments

d33bs commented Apr 24, 2024

d33bs commented Apr 25, 2024 • edited Loading

d33bs commented Jun 6, 2024 • edited Loading

Error on processing `moto`-based files within scheduled, un-locked dependency tests #198

Error on processing `moto`-based files within scheduled, un-locked dependency tests #198

d33bs commented Apr 25, 2024 •

edited

Loading

d33bs commented Jun 6, 2024 •

edited

Loading