Fix stale dataset cache and converter updates for Arrow-backed datasets #625
Merged
cristian-tamblay merged 4 commits on May 18, 2026
…converter
- Switch pa.memory_map to pa.OSFile in all four read sites to release the Windows file lock (WinError 1224) so the converter job can write data.arrow
- Add an mtime check in _FilteredTableCache.get so the cache auto-invalidates when data.arrow is written by a converter job, preventing stale previews
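The mtime check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `MtimeCache` and its `loader` callback are hypothetical stand-ins for `_FilteredTableCache` and whatever actually reads `data.arrow`.

```python
import os


class MtimeCache:
    """Hypothetical stand-in for _FilteredTableCache (assumption, not the PR's class).

    Caches one loaded value per path and reloads it whenever the
    file's modification time differs from the one recorded at load time.
    """

    def __init__(self, loader):
        self._loader = loader   # callable(path) -> value; assumed to read the file
        self._entries = {}      # path -> (mtime_ns, value)

    def get(self, path):
        mtime_ns = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime_ns:
            return entry[1]     # file unchanged since it was cached
        value = self._loader(path)  # first access, or file was rewritten
        self._entries[path] = (mtime_ns, value)
        return value
```

Comparing for inequality rather than "newer than" also catches a file whose mtime moved backwards, although a restored file with exactly the same mtime as the cached one would still go unnoticed.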
…e cache
shutil.copytree uses copy2, which preserves the source file's mtime. When deleting the only converter (no previous ones to re-run), the restored data.arrow has an older mtime than the cache entry, so the mtime-based cache invalidation never fires and the table still shows the old transformed data. Touching data.arrow after the copy ensures a fresh mtime.
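A minimal sketch of the "touch after copy" step, assuming a backup directory is restored over the dataset directory. The function name and its parameters are hypothetical; only `shutil.copytree` and `os.utime` are real standard-library calls.

```python
import os
import shutil


def restore_with_fresh_mtime(backup_dir, target_dir, filename="data.arrow"):
    """Hypothetical helper: restore a backup, then bump the file's mtime.

    shutil.copytree copies files with copy2 by default, so the restored
    file keeps the backup's (older) modification time. os.utime with
    times=None resets atime and mtime to the current time.
    """
    shutil.copytree(backup_dir, target_dir, dirs_exist_ok=True)  # Python 3.8+
    restored = os.path.join(target_dir, filename)
    os.utime(restored, None)  # "touch": mtime becomes now
    return restored
```

Touching the file keeps the cache logic simple: any consumer that compares modification times sees the restore as a new write.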
- Return the actual job ID from delete_converter so the frontend polls for re-run completion before refreshing (previously job_ids was always empty due to a hasattr check before put())
- Refresh column types via handleStatusChange (ConverterBox path) and FormConverterSection onSuccess (job-polling path) instead of a redundant useEffect in DatasetPreviewNotebook
- DatasetPreviewNotebook syncs localColumnTypes from context only, eliminating the extra type fetch on initial notebook load (3→1)
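The job_ids ordering problem mentioned above can be illustrated with a small sketch. It assumes the job only receives its `id` attribute when it is enqueued with `put()`; `Job`, `JobQueue`, and both function bodies are hypothetical and not taken from the DashAI code.

```python
class Job:
    """Hypothetical job object; has no id until it is enqueued."""


class JobQueue:
    """Hypothetical queue that assigns an id on put() (assumption)."""

    def __init__(self):
        self._next_id = 0
        self.jobs = []

    def put(self, job):
        self._next_id += 1
        job.id = self._next_id
        self.jobs.append(job)


def delete_converter_before(queue):
    job = Job()
    job_ids = []
    if hasattr(job, "id"):   # checked before put(): never true
        job_ids.append(job.id)
    queue.put(job)
    return job_ids           # always empty, so the caller has nothing to poll


def delete_converter_after(queue):
    job = Job()
    queue.put(job)           # enqueue first so the id exists
    return [job.id]          # caller can poll this job until it finishes
```

With the second ordering, the caller gets a real ID to poll instead of refreshing immediately against data that the re-run job has not written yet.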
Base automatically changed from perf/reduce-notebook-fetches to perf/rows-columns-database on May 18, 2026 13:33
cristian-tamblay approved these changes on May 18, 2026
Merged commit 52eba7f into perf/rows-columns-database
33 checks passed
Summary
This pull request improves Arrow file handling in the dataset API endpoints by replacing pa.memory_map with pa.OSFile, fixes converter updates not being reflected in the frontend, and adds cache invalidation based on Arrow file modification times to avoid serving stale data.
Type of Change
Check all that apply like this [x]:
Changes (by file)
DashAI/back/api/datasets.py:
- Replaced pa.memory_map with pa.OSFile in:
  - _load_and_filter_table
  - get_dataset_file
  - export_dataset_as_csv
  - export_dataset_csv_by_id
- Added cache invalidation that checks the modification time of data.arrow files and refreshes cached entries if the file has changed.
- Imported the os module to support file modification time checks.
Testing (optional)
data.arrow file.