fix: refactor get_dataset_file to obtain total rows from metadata #287

Merged
cristian-tamblay merged 1 commit into develop from fix/total-rows-table on Sep 8, 2025

Irozuku (Collaborator) commented Sep 5, 2025

This pull request refactors how the get_dataset_file endpoint determines the total number of rows. Instead of counting rows incrementally while reading batches from the Arrow file, it now retrieves the total row count from the get_dataset_info function, which simplifies the code and fixes the reported row count.

Dataset row count calculation:

  • Removed the on-the-fly calculation of total_rows during batch iteration, so rows are no longer counted while the file is being processed.
  • Added a call to get_dataset_info after batch processing to retrieve the total number of rows, which streamlines the logic and makes the code easier to maintain.
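
The change described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: the in-memory `BATCHES` list, `iter_batches`, the `DatasetInfo` class, and the signature and return shape of `get_dataset_info` are all hypothetical stand-ins. The sketch also assumes one plausible way an incremental count can go wrong, namely that iteration stops early once the requested page is full; the PR does not state the actual cause.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class DatasetInfo:
    """Hypothetical stand-in for whatever get_dataset_info returns."""
    num_rows: int


# Hypothetical in-memory "file": three batches holding 10 rows in total.
BATCHES: List[List[Dict[str, int]]] = [
    [{"id": i} for i in range(0, 4)],
    [{"id": i} for i in range(4, 8)],
    [{"id": i} for i in range(8, 10)],
]


def iter_batches() -> Iterator[List[Dict[str, int]]]:
    """Yield batches one at a time, as a batch reader would."""
    yield from BATCHES


def get_dataset_info() -> DatasetInfo:
    """Assumption: the real function reads the row count from metadata."""
    return DatasetInfo(num_rows=sum(len(batch) for batch in BATCHES))


def page_with_incremental_count(offset: int, limit: int) -> dict:
    """Old approach (sketch): count rows while iterating over batches."""
    rows, seen = [], 0
    for batch in iter_batches():
        for row in batch:
            if offset <= seen < offset + limit:
                rows.append(row)
            seen += 1
        if len(rows) >= limit:
            break  # early exit: `seen` stops short of the real total
    return {"rows": rows, "total_rows": seen}


def page_with_metadata_count(offset: int, limit: int) -> dict:
    """New approach (sketch): collect the page, then ask metadata for the total."""
    rows, seen = [], 0
    for batch in iter_batches():
        for row in batch:
            if offset <= seen < offset + limit:
                rows.append(row)
            seen += 1
        if len(rows) >= limit:
            break
    # The total no longer depends on how far the loop iterated.
    return {"rows": rows, "total_rows": get_dataset_info().num_rows}
```

Under these assumptions, both functions return the same page of rows, but only the metadata-based version reports a total that is independent of where the batch loop stopped.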

Before

image

After

image

@cristian-tamblay cristian-tamblay merged commit ace328b into develop Sep 8, 2025
5 checks passed
@cristian-tamblay cristian-tamblay deleted the fix/total-rows-table branch September 8, 2025 12:21