
Optimize dataset info retrieval #273

Merged
cristian-tamblay merged 1 commit into develop from feat/optimize-dataset-info on Aug 28, 2025


@Irozuku (Collaborator) commented Aug 25, 2025

This pull request changes how dataset metadata is stored and retrieved so that basic dataset information is directly available and no longer has to be recomputed each time it is requested. The changes center on how two pieces of metadata, the number of rows and the column names, are saved and used.

  • When a dataset is saved with save_dataset, its metadata file (splits.json) is now enriched with the total number of rows and the list of column names, so this information is available without reading the data file.

  • The get_dataset_info function now reads total_rows and column_names directly from the metadata file instead of computing them by opening the Arrow data file. This is cheaper, since only a small JSON file is read, and it keeps the reported values identical to what was recorded at save time.
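The two changes above follow a "compute at save time, read cheaply later" pattern: row and column metadata is derived once when the dataset is written and stored next to the data, and info queries are then answered from the small JSON file alone. The following is a minimal sketch of that pattern using only the standard library. It is not the project's code: the function signatures, the in-memory dataset layout (a dict mapping column name to values), the "splits" key, and the DATA_FILE name are all hypothetical, and a JSON file stands in for the Arrow data file so the example runs without pyarrow.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# Hypothetical file names. The real project stores its data in an Arrow file;
# a JSON file is used here only so the sketch needs no third-party packages.
DATA_FILE = "data.json"
METADATA_FILE = "splits.json"


def save_dataset(columns: dict[str, list[Any]], splits: dict[str, list[int]], path: Path) -> None:
    """Write the data file and a splits.json enriched with row/column metadata."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    total_rows = lengths.pop() if lengths else 0

    path.mkdir(parents=True, exist_ok=True)
    (path / DATA_FILE).write_text(json.dumps(columns))

    metadata = {
        "splits": splits,                      # assumed pre-existing content of splits.json
        "total_rows": total_rows,              # new: row count computed once, at save time
        "column_names": list(columns.keys()),  # new: column names in insertion order
    }
    (path / METADATA_FILE).write_text(json.dumps(metadata))


def get_dataset_info(path: Path) -> dict[str, Any]:
    """Return basic dataset info from splits.json only; the data file is never opened."""
    metadata = json.loads((path / METADATA_FILE).read_text())
    return {
        "total_rows": metadata["total_rows"],
        "column_names": metadata["column_names"],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "iris"
        save_dataset(
            {"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]},
            {"train": [0], "test": [1]},
            root,
        )
        print(get_dataset_info(root))
        # prints {'total_rows': 2, 'column_names': ['sepal_length', 'species']}
```

One consequence of this design is that splits.json becomes the source of truth for the dataset's shape, so any code path that rewrites the data file must also rewrite the metadata, or the recorded counts go stale.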

@cristian-tamblay merged commit 68b82e5 into develop on Aug 28, 2025
4 checks passed
@cristian-tamblay deleted the feat/optimize-dataset-info branch on August 28, 2025 at 15:25
