Hello,

I have read through your code, but haven't run it yet. One question about the dataloader implementation: according to
https://github.com/dandelin/ViLT/blob/master/vilt/datasets/base_dataset.py#L43
you load all the Arrow files into memory. The pre-training data amounts to hundreds of gigabytes. Could this cause an out-of-memory (OOM) issue, or does this implementation assume a machine with very large memory?

Thanks,

Apache Arrow's read_all() actually performs lazy loading, so there will be no OOM issue. However, if you call the .to_pandas() method, Arrow will load the dataset eagerly and you will face an OOM issue.