The training schedule #17

Closed
luckychay opened this issue Jul 30, 2022 · 8 comments

@luckychay

Dear Author,

In the paper, I see that every task is trained for 50 epochs and fine-tuned for 20 epochs, as shown in the screenshot below.

[screenshot: training schedule table from the paper]

However, in configs/OWOD_new_split.sh, I see that the training schedule follows a different setting, as highlighted by the red boxes in the screenshot below.

[screenshot: highlighted schedule settings in configs/OWOD_new_split.sh]

Is there anything I missed? Looking forward to your reply. Thanks.

@ghost

ghost commented Aug 3, 2022

I actually trained using their scripts but could not recreate their results.
For T1 (after 50 epochs):

Prev class AP50: tensor(43.3941)
Prev class Precisions50: 5.411373414254498
Prev class Recall50: 71.44982028560455
Current class AP50: tensor(22.2871)
Current class Precisions50: 1.8076766423054116
Current class Recall50: 57.50498649613736
Known AP50: tensor(32.3129)
Known Precisions50: 3.519432608981228
Known Recall50: 64.12878254613427
Unknown AP50: tensor(0.0863)
Unknown Precisions50: 0.9169071669071669
Unknown Recall50: 7.730652247380871

And one thing to note is that they continue numbering the epochs across tasks, so the second task resumes at epoch 50 and trains for an additional 50 epochs.
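
To make that concrete, here is a minimal sketch of resume-style cumulative epoch numbering; the function and argument names are hypothetical and not taken from this repo's code.

```python
# Illustrative sketch of cumulative epoch numbering across tasks
# (assumed resume logic in the style of Deformable-DETR-like training loops,
# NOT the actual OW-DETR code; all names here are hypothetical).

def run_task(model, optimizer, train_one_epoch, total_epochs, checkpoint=None):
    start_epoch = 0
    if checkpoint is not None:
        # Resuming from the previous task's checkpoint: the epoch counter
        # carries over, so `total_epochs` acts as a cumulative target.
        model.load_state_dict(checkpoint["model"])
        optimizer.load_state_dict(checkpoint["optimizer"])
        start_epoch = checkpoint["epoch"] + 1

    for epoch in range(start_epoch, total_epochs):
        train_one_epoch(model, optimizer, epoch)

# Task 1: epochs 0..49 (total_epochs=50). Task 2 resumes at epoch 50 and,
# with a cumulative target of 100, trains 50 more epochs (50..99).
```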

@luckychay
Author

Yeah, I did notice that they continue numbering the epochs across tasks. But even in that case, their scripts are apparently different from what is described in the paper.

@akshitac8
Owner

Hello @luckychay @orrzohar-stanford, the paper uses 2 open-world splits, and I have updated the repo with the configs for both splits. Can you please let me know which split config is causing the problem?

@ghost

ghost commented Aug 5, 2022

Dear authors,
Thank you for responding! I have been using the newly proposed data splits. After training the model for 40 epochs + 10 epochs of fine-tuning, I am getting results closer to what was reported, but still a little off. I am not sure why - I used the (unmodified) bash scripts provided and trained on a similar machine (8 V100 GPUs, etc.). Any reason you can think of?

Task 1              U-Recall   mAP
ORE-EBUI            1.5        61.4
Ours: OW-DETR       5.7        71.5
Original codebase   3.9        71.85
Amended (40+10)     5.05       71.9

And overall:

[screenshot: overall results]

@luckychay
Author

> Hello @luckychay @orrzohar-stanford, the paper uses 2 open-world splits, and I have updated the repo with the configs for both splits. Can you please let me know which split config is causing the problem?

Thanks for your reply. I am using the old splits from ORE; in fact, the config is not causing any problems for me. I am just confused about how many epochs I should train and fine-tune for in the incremental steps. I notice that your newly uploaded scripts train for about 5 epochs on tasks 2, 3, and 4 and fine-tune for 45, 30, and 20 epochs respectively. The number of training epochs is much less than 50. How could that happen?

I am not familiar with this part, so thank you for your patience.
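
(For illustration, here is the arithmetic I am assuming if the epoch targets in the scripts are cumulative; the specific end-epoch values below are hypothetical, not read from the configs.)

```python
# Illustration only: if the epoch targets in the scripts are cumulative
# (an assumption, not verified against configs/OWOD_new_split.sh), the
# per-task epoch counts are just differences of those targets.
task1_end       = 50   # Task 1 trains epochs 0..49
task2_train_end = 55   # hypothetical cumulative target -> 5 new training epochs
task2_ft_end    = 100  # hypothetical cumulative target -> 45 fine-tuning epochs

print("Task 2 training epochs: ", task2_train_end - task1_end)      # 5
print("Task 2 fine-tune epochs:", task2_ft_end - task2_train_end)   # 45
```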

@Went-Liang

> Dear authors, thank you for responding! I have been using the newly proposed data splits. After training the model for 40 epochs + 10 epochs of fine-tuning, I am getting results closer to what was reported, but still a little off. I am not sure why - I used the (unmodified) bash scripts provided and trained on a similar machine (8 V100 GPUs, etc.). Any reason you can think of? [...]

Hello, could you share your trained models?

@akshitac8
Owner

The weights are uploaded in the repository. According to the results you have shared, they look pretty close, so the gap might just be due to environment or machine differences, but you can always visualize and check how the unknown classes are responding in your code.
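
For example, a minimal visualization sketch might look like the following (assuming per-image boxes, labels, and scores in xyxy pixel coordinates and an `unknown_id` label index; these are assumptions, not the repo's actual output format):

```python
# Minimal sketch for eyeballing unknown-class detections (assumptions:
# boxes are xyxy in pixel coordinates and `unknown_id` is the label index
# used for "unknown"; neither detail is taken from the OW-DETR codebase).
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_unknowns(image, boxes, labels, scores, unknown_id, score_thr=0.5):
    fig, ax = plt.subplots(1)
    ax.imshow(image)
    for box, label, score in zip(boxes, labels, scores):
        if label == unknown_id and score >= score_thr:
            x1, y1, x2, y2 = box
            ax.add_patch(patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                           fill=False, edgecolor="red", linewidth=2))
            ax.text(x1, max(y1 - 2, 0), f"unknown {score:.2f}", color="red")
    plt.show()
```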

@zhongxiangzju

Dear @akshitac8 and @orrzohar-stanford,

May I ask how long it takes to train the model on OWOD_split_task1 using 8 V100 GPUs for 50 epochs? I only have 2 RTX 3090 GPUs, and I am trying to estimate whether it is feasible and how long it might take to train the models.

Thanks.
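
(For reference, this is the rough estimate I plan to do once I have timed a single epoch on my own setup; the per-epoch time below is a placeholder, not a measurement.)

```python
# Back-of-envelope estimate only: time one epoch on your 2x RTX 3090 setup
# and plug the measured value in; the number below is a placeholder,
# not a benchmark.
seconds_per_epoch = 1800.0   # placeholder: replace with your measured epoch time
epochs = 50                  # Task 1 schedule discussed above

total_hours = seconds_per_epoch * epochs / 3600.0
print(f"Estimated Task 1 training time: ~{total_hours:.1f} hours")
```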
