New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Converting PickledObjectFilesystemIOManager to use UPathIOManager #10273
Converting PickledObjectFilesystemIOManager to use UPathIOManager #10273
Conversation
@danielgafni is attempting to deploy a commit to the Elementl Team on Vercel. A member of the Team first needs to authorize it. |
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
@sryza here is the new PR |
What should we do with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love to see lines of code deleted.
Overall, this change looks great.
I'm seeing a few test failures that I think might be related to these changes:
[2022-11-03T15:19:34Z] dagster_tests/core_tests/test_multiple_outputs.py::test_multiple_outputs_only_emit_one_multiproc FAILED [ 3%]
[2022-11-03T15:19:56Z] dagster_tests/core_tests/test_pipeline_execution.py::test_multiproc_reexecution_fs_storage_after_fail FAILED [ 6%]
[2022-11-03T15:32:10Z] dagster_tests/core_tests/partition_tests/test_partitioned_io_manager.py::test_partitioned_io_manager FAILED [ 66%]
Mappiong type annotation don't seem to work with Dagster out of the box (without DagsterType), so I'm changing the partitioned typing to Dict. |
Also, do you know anything about this TODO? I get an error when trying to log something for partition_key, path in paths.items():
context.log.debug(f"Loading partition from {path} using {self.__class__.__name__}")
try:
obj = self.load_from_path(context=context, path=path)
objs[partition_key] = obj
except FileNotFoundError as e:
if not allow_missing_partitions:
raise e
context.log.debug(
f"Couldn't load partition {path} and skipped it "
f"because the input metadata includes allow_missing_partitions=True"
)
# TODO: context.add_output_metadata fails in the partitioned context. this should be fixed?
return objs |
We're going to remove both I think we should ultimately handle the versioning logic in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing a few complaints from the build:
dagster_tests/core_tests/storage_tests/test_upath_io_manager.py:6: in <module>
| [2022-11-08T22:27:23Z] import pandas as pd
| [2022-11-08T22:27:23Z] E ModuleNotFoundError: No module named 'pandas'
It looks like Pandas isn't currently a dependency of the Dagster tests. I'm a tiny bit hesitant to add it because installing Pandas could increase build times. If you do think it makes a lot of sense to use Pandas here, you could add it as part of the testenv in python_modules/dagster/tox.ini.
[2022-11-08T22:23:41Z] +ENOENT: no such file or directory, open '/workdir/examples/docs_snippets/docs_snippets/concepts/io_management/upath_io_manager.py'
Did you possibly forget to commit a file?
btw, this is what I run locally to check for these kinds of errors:
cd $HOME/dagster/docs/
make mdx-format
grep -r --include \*.mdx ENOENT ./
cd ..
one more:
[2022-11-08T22:25:39Z] /workdir/python_modules/dagster/dagster/_core/storage/upath_io_manager.py:docstring of dagster._core.storage.upath_io_manager.UPathIOManager:6: WARNING: Bullet list ends without a blank line; unexpected unindent.
Thanks for all the revisions you're doing on this one!
I don't think Pandas is really needed. I just wanted to test a type different than |
This is very close! It looks like buildkite has a few outstanding complaints: This one is confusing to me because you listed universal_pathlib in Dagster's setup.py. Let me know if you can't figure this one out, and I can help.
Ran into the following error in the GraphQL tests. I think what's going on is that we're no longer doing
A couple mypy errors in the tests. Let me know if you want help figuring these out.
One of the toys is failing. This toy isn't super important to keep around, because it covers functionality that we've gotten rid of, so feel free to remove it, but I think it's worth seeing why it's failing to see if it might cause failures in some legitimate situation.
|
|
Sorry, still seeing a few issues. Building API docs. You can test this out with
Some GraphQL tests appear to still be failing for the reason above:
And a couple lint errors:
Let me know if you get burnt out on this back and forth. I'd be happy to patch up these final issues on my own if helpful. Btw, I'm pushing harder on our infra team to see if we can get buildkite exposed. |
Thanks! I'm doing fine, I was really trying to get everything ready for this release, but I can be more relaxed now since we are waiting for the next one :) Re: buildkite - it would be really nice! It's very annoying to wait for you to run the build (I'm sure it's equally annoying for you too), especially since I'm in a very different time zone. It's also easy to forget to run some checks locally. Perhaps a single make command for running all the checks could simplify the process? Another problem is with the source of errors - because the repo is already in a dirty state it's harder to identify which errors did your commits introduce. Ideally it would be nice to have all the errors around the repo be fixed and merging with failed pipelines be forbidden... but I understand it's a lot of work to do. |
Another problem is running tests - it's taking a really long time. So instead of running all the tests I usually only run them for some files which I know I'm affecting. The problem is, sometimes there are tests I don't know I'm affecting, and I can't know about it until running the CI. |
Latest build: The GraphQL error appears to be gone. Our infra folks are actively looking into exposing buildkite publicly. |
The apidoc-build errors are annoying - you need to scroll up to find them. It's this one:
|
The docs build is now running correctly! |
One last tiny issue!
Everything else looks great |
fixed |
The buildkite failures look unrelated. Going to merge this! |
Summary & Motivation
The default IO manager can now be built on top of the recently added UPathIOManager.
The
PickledObjectFilesystemIOManager
class can now be used with any filesystems.The
fs_io_manager
object, however, is still meant to be used with the local filesystemHow I Tested These Changes
Running existing tests