train error with Google Drive data #32
Could you please try this? https://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files Loading all files at once may not be a suitable solution for low-RAM settings. I will try to change it to an npy dataloader this weekend.
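The root cause is that `NpzDataset` opens every `.npz` file up front and keeps the handles alive. A minimal sketch of a lazy alternative that only stores filenames and opens one file per `__getitem__` call (the key names `img_embeddings` and `gts` are assumptions about what the precompute script writes; adjust them to your data):

```python
import os

import numpy as np


class LazyNpzDataset:
    """Sketch of a dataset that opens each .npz on demand.

    Only filenames are kept in memory; each __getitem__ opens, reads,
    and closes one archive, so at most one descriptor is live at a time.
    """

    def __init__(self, data_root):
        self.data_root = data_root
        self.npz_files = sorted(
            f for f in os.listdir(data_root) if f.endswith(".npz")
        )

    def __len__(self):
        return len(self.npz_files)

    def __getitem__(self, idx):
        path = os.path.join(self.data_root, self.npz_files[idx])
        with np.load(path) as npz:
            # Hypothetical key names -- change to match your archives.
            return npz["img_embeddings"], npz["gts"]
```

This trades a little per-item I/O for a flat descriptor count, so neither `ulimit` nor total RAM limits how many training files you can have.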
Oh, I already tried that ('ulimit -n 166000') and it ran until it ran out of memory, and I have 2 TB of RAM on this node.
This is weird. Our node has 1 TB of RAM and it works. I will check it again.
Yes, this can bring faster training speed; there is always a trade-off. I will provide a script for batch loading by the end of this week.
Hi @MrDotOne, I have changed the dataloader to an npy dataloader. Data loading should now cost less RAM (at the cost of using more space on the hard drive).
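The npy approach trades hard-drive space for RAM because `.npy` files can be memory-mapped instead of read fully into memory. A hedged sketch of the idea (this is a hypothetical helper, not MedSAM's actual conversion script):

```python
import os

import numpy as np


def unpack_npz_to_npy(npz_path, out_dir):
    """Unpack each array in an .npz archive into its own .npy file.

    Hypothetical helper: once arrays live in flat .npy files, they can
    be opened with mmap_mode="r" so data stays on disk until accessed.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(npz_path))[0]
    with np.load(npz_path) as npz:
        for key in npz.files:
            np.save(os.path.join(out_dir, f"{stem}_{key}.npy"), npz[key])


# Later, memory-map instead of loading into RAM:
# arr = np.load("Tr_000000990_img_embeddings.npy", mmap_mode="r")
```

With `mmap_mode="r"` the OS pages data in on demand, so even a dataset much larger than RAM can be iterated over.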
I am trying to test this out before turning it over to the researchers, and I have been going over the various steps. I was able to successfully run
(medsam) [root@lri-uapps-1 MedSAM]# python utils/precompute_img_embed.py -i /data/train -o /data/Tr_emb
However, the actual training seems to be failing due to too many open files:
(medsam) [root@lri-uapps-1 MedSAM]# python train.py -i /data/Tr_emb --task_name SAM-ViT-B --num_epochs 1000 --batch_size 8 --lr 1e-5
Traceback (most recent call last):
  File "/usr/local/MedSAM/train.py", line 83, in <module>
    train_dataset = NpzDataset(args.npz_tr_path)
  File "/usr/local/MedSAM/train.py", line 24, in __init__
    self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
  File "/usr/local/MedSAM/train.py", line 24, in <listcomp>
    self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
  File "/usr/local/anaconda3/envs/medsam/lib/python3.10/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
OSError: [Errno 24] Too many open files: '/data/Tr_emb/Tr_000000990.npz'
(medsam) [root@lri-uapps-1 MedSAM]# ls /data/Tr_emb/ | wc -l
161857
Could this be a numpy error perhaps?
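For context: this is documented NumPy behaviour rather than a bug. `np.load` on an `.npz` archive returns a lazy `NpzFile` that keeps its file descriptor open until it is closed, so building a list over 161,857 archives exhausts the per-process descriptor limit regardless of RAM. A small demonstration:

```python
import os
import tempfile

import numpy as np

# Build a tiny .npz archive to illustrate the lazy behaviour.
path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(path, img=np.zeros((2, 2)))

npz = np.load(path)   # returns an NpzFile; the file stays open
img = npz["img"]      # arrays are only read from disk on access
npz.close()           # the descriptor is released only here
```

Keeping thousands of `NpzFile` objects in a list therefore pins one open file each, which is exactly what `[Errno 24]` reports.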