
Error with @ex.main and if __name__ == '__main__': #854

HanGuangXin opened this issue Dec 22, 2021 · 4 comments

@HanGuangXin

When I use @ex.main together with if __name__ == '__main__':, the MongoObserver collects no data.

Here is minimal code to reproduce the error:

from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    # ex.run_commandline()          # works: the observer records the run
    # ex.run()                      # works: the observer records the run
    my_main()                       # the observer records nothing

Looking forward to your reply!

@HanGuangXin
Author

There are reasons I can't use ex.run_commandline() or ex.run(). ex.run_commandline() doesn't work with an existing argparse parser, and ex.run() doesn't work with multi-GPU training (for example: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py).

@thequilo
Collaborator

thequilo commented Jan 6, 2022

Hi @HanGuangXin! Happy new year! Unfortunately, you have to use ex.run (or ex.run_commandline) for everything to work: ex.run contains the code that sets up the configuration and notifies the observers. @ex.main doesn't modify my_main; it just registers it as the default main function for ex.run.
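In other words, the snippet from your first post should call ex.run() inside the main guard instead of calling my_main() directly:

if __name__ == '__main__':
    # ex.run() builds the config, starts the run, and notifies the
    # observers before invoking the registered main function;
    # calling my_main() directly skips all of that.
    ex.run()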

For the multi-GPU training: what exactly is not working and do you know why?
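For the argparse conflict, one pattern that may help (a minimal, untested sketch; the --batch-size flag and the batch_size config entry are hypothetical names) is to parse your own flags with parse_known_args and pass the values to sacred through ex.run's config_updates parameter, sidestepping sacred's own command-line parsing:

import argparse

from sacred import Experiment

ex = Experiment('OBB_Swin')

@ex.main
def my_main(batch_size):
    # batch_size is injected from the run's config by sacred
    print(batch_size)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch-size', type=int, default=8)  # hypothetical flag
    args, _unknown = parser.parse_known_args()  # tolerate extra launcher args
    # ex.run accepts config_updates, so argparse values reach the config
    # without going through sacred's command line
    ex.run(config_updates={'batch_size': args.batch_size})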

@Guptajakala

+1
Multi-GPU training is used more and more often nowadays, but it does not work with sacred, because the launcher adds extra arguments to the python command line, just as @HanGuangXin mentioned: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py
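One workaround that may be worth trying (a minimal, untested sketch; it assumes the launcher exposes each worker's rank via the LOCAL_RANK environment variable, which torch.distributed.launch does with --use_env and torchrun does by default) is to attach the MongoObserver only on rank 0, so a single process records the run while the other workers stay silent:

import os

from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')

# Only the rank-0 worker reports to MongoDB; the other workers still
# execute the experiment but attach no observer.
if int(os.environ.get('LOCAL_RANK', '0')) == 0:
    ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    ex.run()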

@BDHU

BDHU commented Feb 22, 2023

+1
Making sacred work alongside torch multiprocessing is an absolute pain.
