
Call `fs.invalidate_cache` in `to_parquet` #8994

Merged
jcrist merged 2 commits into dask:main from jcrist:invalidate-cache-to-parquet
Apr 28, 2022

Conversation

@jcrist (Member) commented Apr 28, 2022

Invalidate the client-side filesystem listings cache before returning
from `to_parquet`. This helps ensure that a write followed by an
immediate read works; without it, the filesystem might use outdated
cached listings when reading recently written files, resulting in
`FileNotFoundError`s.
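To make the failure mode concrete, here is a minimal, self-contained sketch in plain Python. `CachingFS` is a hypothetical toy class, not fsspec or dask code; it only mimics the relevant behavior of a client-side listings cache (a directory listing is cached on first use and is not refreshed by later writes). Its `invalidate_cache(path=None)` method mirrors the signature of fsspec's `AbstractFileSystem.invalidate_cache`, but the body is an assumption made for illustration.

```python
import posixpath


class CachingFS:
    """Hypothetical toy filesystem with a client-side listings cache.

    This is NOT fsspec's implementation; it only reproduces the
    stale-listing behavior described above.
    """

    def __init__(self):
        self._store = {}     # the "remote" store: path -> bytes
        self._listings = {}  # cached directory listings: directory -> [paths]

    def write(self, path, data):
        # Writes go straight to the store; the listings cache is NOT updated.
        self._store[path] = data

    def ls(self, directory):
        # Serve from the cache if present, otherwise list the store and cache it.
        if directory not in self._listings:
            self._listings[directory] = sorted(
                p for p in self._store if posixpath.dirname(p) == directory
            )
        return self._listings[directory]

    def invalidate_cache(self, path=None):
        # Drop one cached listing, or the whole cache when path is None.
        if path is None:
            self._listings.clear()
        else:
            self._listings.pop(path, None)

    def read(self, directory):
        # Readers discover files via ls(), so a stale listing hides new files.
        files = self.ls(directory)
        if not files:
            raise FileNotFoundError(directory)
        return [self._store[p] for p in files]


# Write-then-read with and without invalidating the cache.
fs = CachingFS()
fs.ls("out")                          # caches an empty listing for "out"
fs.write("out/part.0.parquet", b"x")
try:
    fs.read("out")                    # stale listing: the new file is invisible
except FileNotFoundError:
    print("stale listing -> FileNotFoundError")
fs.invalidate_cache("out")
print(fs.read("out"))                 # fresh listing now includes the new file
```

Dropping the cached listing for the output directory (or the whole cache) is what lets the second read see the newly written file.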

There's not really a good way to test this directly, so the test uses a mock for now.
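A mock-based test along these lines could look like the sketch below. `write_dataset` is a hypothetical stand-in for `to_parquet` (it is not dask's implementation), and the filesystem is a `unittest.mock.MagicMock`, so the test only checks that `invalidate_cache` is called on the output path after the writes; it does not exercise a real cache.

```python
from unittest import mock


def write_dataset(fs, path, parts):
    """Hypothetical writer standing in for to_parquet (not dask's code)."""
    for i, data in enumerate(parts):
        with fs.open(f"{path}/part.{i}.parquet", "wb") as f:
            f.write(data)
    # Clear any stale listings so an immediate read sees the new files.
    fs.invalidate_cache(path)


def test_write_dataset_invalidates_cache():
    fs = mock.MagicMock()
    write_dataset(fs, "bucket/out", [b"a", b"b"])
    # One open() per part, then a single invalidation of the output path.
    assert fs.open.call_count == 2
    fs.invalidate_cache.assert_called_once_with("bucket/out")
    # invalidate_cache should be the last filesystem method the writer calls.
    assert fs.method_calls[-1] == mock.call.invalidate_cache("bucket/out")


test_write_dataset_invalidates_cache()
```

Because a mock cannot tell whether a real remote listing was stale, this only pins down the contract (invalidate after writing), which matches the limitation noted above.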

Fixes #8028
Fixes #7965

  • Closes #xxxx
  • Tests added / passed
  • Passes `pre-commit run --all-files`

@bryanwweber (Contributor) left a comment


Thanks @jcrist, this looks like it will fix the issue. I agree about the test; it's hard to check otherwise.

@jcrist merged commit 6999390 into dask:main on Apr 28, 2022
@jcrist deleted the invalidate-cache-to-parquet branch on April 28, 2022 at 17:30


Development

Successfully merging this pull request may close these issues:

  • Parquet filepath parsing error
  • Dask Dataframe to_parquet S3 repeated append broken

2 participants