Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on processing moto-based files within scheduled, un-locked dependency tests #198

Closed
d33bs opened this issue Apr 24, 2024 · 2 comments · Fixed by #212
Closed

Error on processing moto-based files within scheduled, un-locked dependency tests #198

d33bs opened this issue Apr 24, 2024 · 2 comments · Fixed by #212
Assignees
Labels
bug Something isn't working

Comments

@d33bs
Copy link
Member

d33bs commented Apr 24, 2024

An error occurred with two of the tests during a scheduled and un-locked dependency test earlier today. The only failing test in these cases is: tests/test_convert_threaded.py::test_convert_s3_path_csv. This seems to happen due to a missing file from the mocked moto bucket to a local file, resulting in a failed data query from DuckDB.

Reference to the job: https://github.com/cytomining/CytoTable/actions/runs/8812487779/job/24204265980

duckdb.duckdb.InvalidInputException: Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result
Error: IO Error: No files found that match the pattern "/tmp/tmpv8ak2gtt/example/2/cells.csv"
...
FAILED tests/test_convert_threaded.py::test_convert_s3_path_csv - parsl.dataflow.errors.DependencyError: <exception str() failed>
@d33bs d33bs added the bug Something isn't working label Apr 24, 2024
@d33bs
Copy link
Member Author

d33bs commented Apr 25, 2024

This turned out to be a cloudpathlib caching issue, which appears to have changed enough in recent versions enough that the garbage collector removes cached files before they can be referenced by CytoTable. I believe this was a bug present in the code prior to the updates, rather than being caused by the dependencies themselves (though cloudpathlib or related dependencies may have impacted the results).

On resolving the caching issue, there appears to be a credentialing issue from botocore.

d33bs added a commit to d33bs/CytoTable that referenced this issue Apr 25, 2024
@d33bs
Copy link
Member Author

d33bs commented Jun 6, 2024

Looking further into this I feel that there's likely something happening which doesn't play nice between moto and cloudpathlib. There are alternatives such as localstack which we could explore, but generally I think we're following a path towards a possible anti-pattern with mocked resources which is consuming too much time for the focus of the project. A reference from Software Engineering at Google in Chapter 13 under the "Real Implementations" section somewhat backs this up:

Although test doubles can be invaluable testing tools, our first choice for tests is to use the real implementations of the system under test’s dependencies; that is, the same implementations that are used in production code. Tests have higher fidelity when they execute code as it will be executed in production, and using real implementations helps accomplish this.
...
Using real implementations for dependencies makes the system under test more realistic given that all code in these real implementations will be executed in the test.

I chatted with @gwaybio about this briefly, and based on our discussion feel that removing moto to rely on a real-world test would have better outcomes. This would allow us to avoid a mocked resource dependency for testing (and related issues), as well as provide us an avenue for a larger data test (data which we don't want to store in the repository)(possibly addressing #193). Likely it'd be good to follow the patterns and suggestions on a testing resource from cytomining/pycytominer#375 .

Rough steps:

  • Remove moto dependency
  • Remove pytest fixutres related to moto
  • Modify moto-related tests to read from cellpainting-gallery or similar S3 resource.
  • Limit these renewed tests to a certain time limit within reason (possibly leveraging something like https://github.com/bloomberg/pytest-memray or other built-in features of pytest if possible)
  • Use custom markers on these renewed tests to skip on local development and be used for periodic GH Actions-based scheduled testing
  • Modify GH Actions workflows to run these tests weekly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant