Optimize dataset info retrieval by Irozuku · Pull Request #273 · DashAISoftware/DashAI

Irozuku · 2025-08-25T14:47:15Z

This pull request enhances the way dataset metadata is stored and retrieved, making dataset information more accessible and reducing redundant computation. The main changes concern how metadata, such as the number of rows and the column names, is saved and used.

When saving a dataset using save_dataset, the metadata file (splits.json) is now enriched to include the total number of rows and the list of column names, making this information directly available without needing to read the data file.
The get_dataset_info function now reads total_rows and column_names directly from the metadata file instead of computing them by reading the Arrow data file, improving efficiency and consistency.
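
The two changes above can be illustrated with a minimal sketch. It is not the DashAI implementation: the helper names `save_metadata` and `read_dataset_info`, the `splits` key, and the in-memory `columns` dict are assumptions made for illustration. Only the file name `splits.json` and the `total_rows` and `column_names` fields come from the description.

```python
import json
from pathlib import Path


def save_metadata(dataset_dir, columns, splits):
    """Write splits.json enriched with total_rows and column_names.

    `columns` is a hypothetical in-memory table: a dict mapping each
    column name to a list of values, all of the same length.
    """
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    # Compute the row count once, at save time, while the data is in memory.
    total_rows = len(next(iter(columns.values()))) if columns else 0
    metadata = {
        "splits": splits,  # assumed key; the real layout may differ
        "total_rows": total_rows,
        "column_names": list(columns),
    }
    (dataset_dir / "splits.json").write_text(json.dumps(metadata, indent=2))
    return metadata


def read_dataset_info(dataset_dir):
    """Return dataset info from splits.json without opening the data file."""
    metadata = json.loads((Path(dataset_dir) / "splits.json").read_text())
    return {
        "total_rows": metadata["total_rows"],
        "total_columns": len(metadata["column_names"]),
        "column_names": metadata["column_names"],
    }
```

With this layout, a call such as `read_dataset_info("path/to/dataset")` only parses a small JSON file, so its cost does not grow with the size of the Arrow data file.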

feat: enhance dataset metadata in save_dataset and get_dataset_info functions

541209d

cristian-tamblay approved these changes Aug 28, 2025

cristian-tamblay merged commit 68b82e5 into develop Aug 28, 2025
4 checks passed

cristian-tamblay deleted the feat/optimize-dataset-info branch August 28, 2025 15:25
