Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AsyncFileSystem monkeypatching breaking 3rd-party file systems #1241

Merged
merged 1 commit into from
Mar 8, 2024

Conversation

provinzkraut
Copy link
Contributor

The monkeypatching done for AsyncFileSystem currently breaks 3rd party AsyncFileSystem because it doesn't use cooperative inheritance, so not __init__s in a classes MRO are called when subclassing AsyncFileSystem, breaking filesystems like adlfs.

This PR fixes that by adding the missing super().__init__ call.

@provinzkraut
Copy link
Contributor Author

After reviewing the code, it's not clear to my why the AsyncFileSystem is patched in the first place, as it doesn't seem to be used here.

Another note, maybe the patching should be opt-in for users, as it might be unexpected that (transient) dependencies break other installed packages simply by importing something.

@gutzbenj
Copy link
Member

gutzbenj commented Mar 7, 2024

Dear @provinzkraut ,

the patching has a long story you could break down to the following problems:

  • historical 1_minute precipitation data [1] is split into year folders with dynamic filenames that contain the date range of the data
  • thus we have to create an index of all roughly thirty folders (one for each year) which takes a lot of time
  • to prevent that we wanted to create a caching system for the index
  • this doesn't exist in fsspec so we had to patch it
  • I also created a PR at [2] but will have to go back there first

What do you think about index caching? And would you like to work on that as well on fsspec side? The PR currently needs to be fixed.

[1] https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/
[2] fsspec/filesystem_spec#895

@gutzbenj gutzbenj merged commit 54e60a7 into earthobservations:main Mar 8, 2024
7 of 15 checks passed
@provinzkraut
Copy link
Contributor Author

Thanks for the detailed explanation @gutzbenj!

One thing I don't fully understand though is this part:

this doesn't exist in fsspec so we had to patch it

You're patching the HTTPFileSystem, but from briefly looking through the codebase, I don't see anything from preventing you from just using your own HTTPFileSystem subclass without monkeypatching the library. At a first glance, this seems like a convenient solution. Is there a reason why this can't be done?

@gutzbenj
Copy link
Member

gutzbenj commented Mar 8, 2024

The reason lays here: https://github.com/fsspec/filesystem_spec/blob/1dfed9d9c23aa30b19bad536a578cda6addb7de0/fsspec/spec.py#L149
The self.dircache component is set in the base class but can't be configured from subclasses. Anyhow as the caching doesn't seem to work atm, I will remove the monkeypatch and look into working on the PR at fsspec itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants