Handle large CSV files + async preprocessing by MatiasArriola · Pull Request #363 · EyeSeeTea/glass-dev

MatiasArriola · 2025-11-17T17:40:27Z

📌 References

Issue: Closes https://app.clickup.com/t/8699wradu #8699wradu

📝 Implementation

Create new script: yarn run async-preprocessing
- Replicated from async-uploads.
- Only triggered for non-CSV files that exceed the fileSizeLimit value stored in dataStore glass/general.
- This process validates headers, computes rows and specimen fields, updates the dataStore, and moves the upload to the async-uploads queue.
For validation, always read CSV files in chunks using papaparse instead of loading large files with the XLSX library
- The methods are included directly in CSVUtils instead of creating a custom repository object.
Refactor: extract types ValidationResult, ValidationResultWithSpecimens
Changed AsyncImportRISIndividualFungalFile to make async-uploads work for this file
- Chunking is now the first step. Loading a 500 MB file with XLSX made the process go idle and consume a lot of memory.
- First we make a validation pass in chunks, and then another pass in chunks to import the records.
- We need to evaluate the impact of not loading all the rows at once on the program rules validation, and make sure no rule depends on rows outside the chunk (I don't think so, but just in case).
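The two-pass flow described above (validate in chunks, then import in chunks) can be sketched roughly as follows. This is not the code in this PR: it uses a naive comma split instead of papaparse, and the names readRowsInChunks, validateInChunks, importInChunks, the SPECIMEN column, and the shape of ValidationResultWithSpecimens are all assumptions made for illustration.

```typescript
type Row = Record<string, string>;

// Hypothetical shape; the real ValidationResultWithSpecimens in the codebase may differ.
type ValidationResultWithSpecimens = {
    isValid: boolean;
    rows: number;
    specimens: string[];
    errors: string[];
};

// Naive splitter with no quoted-field support (papaparse handles that in the real code).
const splitLine = (line: string): string[] => line.split(",").map(cell => cell.trim());

// Yields arrays of at most `chunkSize` rows, so only one chunk is held at a time.
function* readRowsInChunks(lines: Iterable<string>, chunkSize: number): Generator<Row[]> {
    let headers: string[] | undefined;
    let chunk: Row[] = [];
    for (const line of lines) {
        if (line.trim() === "") continue;
        const cells = splitLine(line);
        if (!headers) {
            headers = cells; // first non-empty line is the header row
            continue;
        }
        const row: Row = {};
        headers.forEach((header, index) => {
            row[header] = cells[index] ?? "";
        });
        chunk.push(row);
        if (chunk.length >= chunkSize) {
            yield chunk;
            chunk = [];
        }
    }
    if (chunk.length > 0) yield chunk;
}

function readHeaders(lines: Iterable<string>): string[] {
    for (const line of lines) {
        if (line.trim() !== "") return splitLine(line);
    }
    return [];
}

// Pass 1: check headers, then count rows and collect distinct specimens chunk by chunk.
function validateInChunks(
    openLines: () => Iterable<string>,
    requiredHeaders: string[],
    chunkSize: number
): ValidationResultWithSpecimens {
    const headers = readHeaders(openLines());
    const errors = requiredHeaders
        .filter(header => !headers.includes(header))
        .map(header => `Missing column: ${header}`);
    if (errors.length > 0) return { isValid: false, rows: 0, specimens: [], errors };

    let rows = 0;
    const specimens = new Set<string>();
    for (const chunk of readRowsInChunks(openLines(), chunkSize)) {
        rows += chunk.length;
        for (const row of chunk) {
            const specimen = row["SPECIMEN"]; // assumed column name
            if (specimen) specimens.add(specimen);
        }
    }
    return { isValid: true, rows, specimens: Array.from(specimens), errors: [] };
}

// Pass 2: re-read the source and hand each chunk to the importer.
function importInChunks(
    openLines: () => Iterable<string>,
    chunkSize: number,
    importChunk: (chunk: Row[]) => void
): number {
    let imported = 0;
    for (const chunk of readRowsInChunks(openLines(), chunkSize)) {
        importChunk(chunk);
        imported += chunk.length;
    }
    return imported;
}
```

Here openLines is called once per pass, so the source is read twice rather than kept in memory. Any rule that needs rows from another chunk would not work with this layout, which is the risk noted in the last bullet.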

Requires dataStore changes; otherwise it falls back to defaults.
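A minimal sketch of the preprocessing decision with the fallback to defaults, under stated assumptions: the GeneralSettings type, the byte unit for fileSizeLimit, the default value, and the extension-based CSV check are all hypothetical and not taken from this PR.

```typescript
// Hypothetical subset of the settings stored in dataStore glass/general.
type GeneralSettings = { fileSizeLimit?: number };

type UploadedFile = { name: string; size: number };

// Assumed default, in bytes, used when the dataStore has no value. Not the real default.
const DEFAULT_FILE_SIZE_LIMIT = 50 * 1024 * 1024;

// Assumption: CSV files are recognized by extension only.
function isCsv(file: UploadedFile): boolean {
    return file.name.toLowerCase().endsWith(".csv");
}

// Only non-CSV files above the limit are sent to async-preprocessing;
// CSV files are validated in chunks instead.
function needsPreprocessing(file: UploadedFile, settings?: GeneralSettings): boolean {
    const limit = settings?.fileSizeLimit ?? DEFAULT_FILE_SIZE_LIMIT;
    return !isCsv(file) && file.size > limit;
}
```

With these assumed values, a 60 MiB .xlsx file would be queued for preprocessing when the dataStore has no fileSizeLimit, and would not be queued once fileSizeLimit is raised above its size.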

TODO:

For non-CSV files marked to be preprocessed, handle this in the UI (show a message, check that the status is displayed correctly in the files grid)
async-uploads and async-deletions: review performance for large CSV files and implement CSV reading in chunks if needed
async-uploads will fail when saving a considerable number of individual import reports. For example, for a file with 3M rows, trying to JSON.stringify an array of 10k validation reports will fail (not to mention the space required in the dataStore for that). We need a change here to save the summaries some other way.
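One possible direction for the last item, sketched here only as an illustration and not implemented in this PR: fold each chunk report into a running summary as it is produced, so the object that gets serialized stays small regardless of the number of chunks. The ChunkReport and ImportSummary shapes below are hypothetical and do not match the existing report types.

```typescript
// Hypothetical per-chunk report; the real import report types may differ.
type ChunkReport = {
    imported: number;
    ignored: number;
    errors: string[];
};

// Bounded summary: counters plus one entry per distinct error message.
type ImportSummary = {
    chunks: number;
    imported: number;
    ignored: number;
    errorCounts: Record<string, number>;
};

function emptySummary(): ImportSummary {
    return { chunks: 0, imported: 0, ignored: 0, errorCounts: {} };
}

// Merges one chunk report into the summary without keeping the report itself.
function addReport(summary: ImportSummary, report: ChunkReport): ImportSummary {
    const errorCounts = { ...summary.errorCounts };
    for (const message of report.errors) {
        errorCounts[message] = (errorCounts[message] ?? 0) + 1;
    }
    return {
        chunks: summary.chunks + 1,
        imported: summary.imported + report.imported,
        ignored: summary.ignored + report.ignored,
        errorCounts,
    };
}
```

The summary grows with the number of distinct error messages, not with the number of chunks, so this only helps if messages do not embed row-specific values such as ids.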

📹 Screenshots/Screen capture

🔥 Testing

In my local setup, I had to make the following changes to allow increasing the file size upload limit:
- dhis_2.conf: max.file_upload_size = 5120000000
- /usr/lib/python3.13/site-packages/d2_docker/config/nginx.conf (or check the proper path inspecting the d2-docker volume for nginx): client_max_body_size 1000m;


anagperal and others added 11 commits October 30, 2025 10:31

WiP async preprocessing feature

a05ebce

WIP implement manageAsyncPreprocess

bac9f7f

refactor RISIndividualFungalDataRepository.validate to accept a Blob, refactor SpreadsheetXlsxDataSource

WIP Implement needsPreprocessing validationResult

8b20457

refactor: reuse types for validation results make specimens and rows optional

Mark uploads to be preprocessed

edba37a

Fix handling of File in node. Handle error (increase attempts, delete…

c46f5f3

…). Set PREPROCESSING_FAILED

Chunked validation for CSV files (without async-preprocessing)

ccb6758

Remove unused imports

124bfc9

Remove commented code

a02311f

Handle async preprocessing scenario, add async preprocessing entries …

a945645

…to Uploads

Add utilities to parse a csv in chunks, add getFromBlobInChunks to RI…

415b123

…SIndiv repo

Process AsyncImportRISIndividualFungalFile in chunks

0e8361b

anagperal mentioned this pull request Dec 16, 2025

AMR indiv and fungal large files uploads management #367

Draft

1 task

Handle large CSV files + async preprocessing #363
MatiasArriola wants to merge 11 commits into development from feature/async-preprocessing
