train error with Google Drive data #32

Closed
MrDotOne opened this issue May 3, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@MrDotOne

MrDotOne commented May 3, 2023

I am trying to test this out before turning it over to the researchers, and I have been going over the various steps. I was able to successfully run

(medsam) [root@lri-uapps-1 MedSAM]# python utils/precompute_img_embed.py -i /data/train -o /data/Tr_emb

however, the actual training seems to be failing due to too many open files:

(medsam) [root@lri-uapps-1 MedSAM]# python train.py -i /data/Tr_emb --task_name SAM-ViT-B --num_epochs 1000 --batch_size 8 --lr 1e-5
Traceback (most recent call last):
  File "/usr/local/MedSAM/train.py", line 83, in <module>
    train_dataset = NpzDataset(args.npz_tr_path)
  File "/usr/local/MedSAM/train.py", line 24, in __init__
    self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
  File "/usr/local/MedSAM/train.py", line 24, in <listcomp>
    self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
  File "/usr/local/anaconda3/envs/medsam/lib/python3.10/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
OSError: [Errno 24] Too many open files: '/data/Tr_emb/Tr_000000990.npz'

(medsam) [root@lri-uapps-1 MedSAM]# ls /data/Tr_emb/ | wc -l
161857

Could this be a numpy error perhaps?
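For context on where this error comes from: np.load on an .npz archive returns a lazy NpzFile object that keeps its file handle open until it is closed, so the list comprehension in train.py ends up holding one descriptor per file, roughly 161k of them, far beyond the usual default soft limit of 1024. A minimal illustration, using the path from the traceback above:

import numpy as np

arrs = np.load("/data/Tr_emb/Tr_000000990.npz")  # returns an NpzFile; its file handle stays open
print(arrs.files)                                # arrays are read lazily from the still-open archive
arrs.close()                                     # only now is the file descriptor released

So this looks less like a numpy bug than the per-process open-file limit being exhausted by loading every archive eagerly.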

@JunMa11
Collaborator

JunMa11 commented May 3, 2023

Could you please try this?

https://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files

Loading all files at once may not be a suitable solution for low-RAM settings. I will try to change it to an npy dataloader this weekend.
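For reference, the workaround from that Stack Overflow thread can also be applied inside the training script by raising the soft open-file limit before the dataset is built. A sketch of the standard resource-module idiom, roughly equivalent to running ulimit -n in the shell:

import resource

# Query the current soft/hard limits for open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit up to the hard limit (the per-process ceiling);
# going beyond the hard limit requires root privileges.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))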

@MrDotOne
Author

MrDotOne commented May 3, 2023 via email

@JunMa11
Collaborator

JunMa11 commented May 3, 2023

This is weird. Our node has 1 TB of RAM and it works there. I will check it again.

@MrDotOne
Author

MrDotOne commented May 3, 2023

Having set ulimit above the number of files (i.e., >161k), I do get it to run; however, it eats all the memory available on the node. Hope you can see this screenshot.

[screenshot: memory usage on the node]

@JunMa11
Collaborator

JunMa11 commented May 4, 2023

Yes. Keeping everything in RAM brings faster training speed. There is always a trade-off.

I will provide a script for batch loading by the end of this week.

@JunMa11 added the enhancement (New feature or request) label May 5, 2023
@JunMa11
Collaborator

JunMa11 commented May 7, 2023

Hi @MrDotOne,

I have changed the dataloader to an npy dataloader. Data loading should cost less RAM now (at the cost of using more space on the hard drive).
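For anyone following the thread, the general idea behind an npy dataloader is to store each sample as its own small .npy file and load it on demand (optionally memory-mapped), so neither open file handles nor RAM scale with the dataset size. A rough sketch of that pattern, not the exact class committed to the repository; the imgs/gts folder layout here is an assumption:

import os
import numpy as np
import torch
from torch.utils.data import Dataset

class NpyDataset(Dataset):
    def __init__(self, data_root):
        # Assumed layout: data_root/imgs/<name>.npy and data_root/gts/<name>.npy,
        # one embedding/ground-truth pair per sample.
        self.img_dir = os.path.join(data_root, "imgs")
        self.gt_dir = os.path.join(data_root, "gts")
        self.names = sorted(os.listdir(self.img_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        # mmap_mode="r" leaves the data on disk until it is actually indexed.
        img_embed = np.load(os.path.join(self.img_dir, name), mmap_mode="r")
        gt = np.load(os.path.join(self.gt_dir, name), mmap_mode="r")
        # Copy into regular in-memory arrays before handing them to torch.
        return (torch.from_numpy(np.array(img_embed)).float(),
                torch.from_numpy(np.array(gt)).long())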
