Issue training base model #19

Open
christegho opened this issue Nov 26, 2019 · 8 comments

Comments

@christegho

christegho commented Nov 26, 2019

I have been trying to train a base model for some time now.

I have had issues with the version of PyTorch the code was built on: 0.3.1 would not work with CUDA versions past 8.0, but my GeForce RTX 2080 would not work with CUDA versions below 9.0.

I managed to get the code base working with PyTorch 0.4.0 and 0.4.1, with CUDA 10.1.

I have two GPUs, each with 10986 MB of memory. I managed to get the base training to run for many epochs, but then my whole machine would shut down all of a sudden, partway through training. I suspect this is because of my RAM.

I did have to reduce the batch size and subdivisions, to get the training to start.

But this is all to say that I am not able to get a base model, and I am wondering if there is anyone who has a model to share?

I will commit my code for PyTorch >= 0.4.0 soon, on my fork, but it would be so nice to have weights I could use.

@XinyiYS

XinyiYS commented Dec 2, 2019

You could try my trained base model: https://drive.google.com/open?id=1CSVFhfOHmRlbUsMu_eyBCvBWn_06a9zH

@christegho
Author

christegho commented Dec 2, 2019 via email

@XinyiYS

XinyiYS commented Dec 2, 2019

> Thanks for sharing your trained base model. This is very helpful!

No problem, Chris. Give it a go. I didn't change any settings, so it should give decent results on the base classes.

@HuangLian126

@christegho Hi, I tried to train the base model with torch 1.2.0, torchvision 0.4.0, and CUDA 10.1. However, I get this error:

File "/home/hl/hl/Fewshot_Detection-master/region_loss.py", line 330, in forward
pred_boxes[0] = x.data + grid_x
RuntimeError: The size of tensor a (13) must match the size of tensor b (38870) at non-singleton dimension 3

The shape of x is torch[46,5,13,13], and the shape of grid_x is torch[38870]. How do you fix this error?
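
A note on the shapes reported above: 46 × 5 × 13 × 13 = 38870, so the two tensors hold the same number of elements and differ only in layout. The sketch below uses plain Python (no PyTorch) to check the two shapes against the usual trailing-dimension broadcasting rule. The helper `broadcastable` is hypothetical and written only for illustration, and treating the 38870-element tensor as `grid_x` is an assumption based on the traceback.

```python
from math import prod


def broadcastable(shape_a, shape_b):
    """Hypothetical helper: True if two shapes are compatible under the
    usual broadcasting rule (trailing sizes are equal, or one of them is 1)."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


x_shape = (46, 5, 13, 13)  # shape of x reported in the comment above
grid_x_shape = (38870,)    # flat shape reported above (assumed to be grid_x)

# Both tensors hold the same number of elements...
same_count = prod(x_shape) == prod(grid_x_shape)

# ...but the trailing dimensions (13 vs 38870) do not broadcast.
compatible = broadcastable(x_shape, grid_x_shape)

# Flattening x first would leave two 1-D operands of equal length.
flat_compatible = broadcastable((prod(x_shape),), grid_x_shape)

print(same_count, compatible, flat_compatible)
```

If this reading is right, one possible workaround is to flatten the left-hand side before the addition, for example `x.data.view(-1) + grid_x`. This is an untested guess, and whether it matches what the rest of region_loss.py expects is not confirmed anywhere in this thread.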

@Fly-dream12

Have you solved it? @HuangLian126

@li-yanling

@XinyiYS @christegho Could you please share your base model? The google drive link has expired. Many thanks!

@XinyiYS

XinyiYS commented Jun 30, 2021

Hi yanling, sorry, I have removed the model from my Google Drive due to the storage limit, and I don't have a local backup of it. Apologies. Perhaps see if Chris would be able to provide a copy.

@li-yanling

Hi Xinyi, thanks for your reply:)

5 participants