[BUG] Configuration of paths to log parameters and save artifacts in an on premise installation #121

Closed
SebastianGergen opened this issue Apr 6, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@SebastianGergen

Platform & setup
Ubuntu 18.04 server with Atlas installed on it; Python jobs are sent from a remote laptop to the master.

Describe the bug
When I submit a job from my remote machine to the server and use Atlas to log parameters and metrics and to save artifacts, I get the following warnings in the job details:

Foundations WARNING: Foundations has been imported, but no default configuration file has been found. Refer to the documentation for more information. Without a default configuration file, no foundations code will be executed.

...

Foundations WARNING: Script not run with Foundations.
Foundations WARNING: Cannot save artifact outside of job.
Foundations WARNING: Cannot save artifact outside of job.

I found that when I change the paths in remote.config.yaml on the development machine from the local username (here: foo)
```yaml
cache_config:
  end_point: /cache_end_point
container_config_root: /home/foo/.foundations/config/local_docker_scheduler/worker_config
job_deployment_env: local_docker_scheduler_plugin
job_results_root: /home/foo/.foundations/job_data
scheduler_url: http://192.168.168.81:5558
job_store_dir_root: /home/foo/.foundations/local_docker_scheduler/work_dir
working_dir_root: /atlas_work_dir
```
to the username of the user on the master/server (here: bar):
```yaml
cache_config:
  end_point: /cache_end_point
container_config_root: /home/bar/.foundations/config/local_docker_scheduler/worker_config
job_deployment_env: local_docker_scheduler_plugin
job_results_root: /home/bar/.foundations/job_data
scheduler_url: http://192.168.168.81:5558
job_store_dir_root: /home/bar/.foundations/local_docker_scheduler/work_dir
working_dir_root: /atlas_work_dir
```
then the logs and artifacts are stored correctly and are available in the GUI.
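
To make the difference between the two configs easier to spot, here is a minimal sketch that lists which keys in a config point at a `/home/<user>/` directory belonging to someone other than the server user. It is not part of Atlas or Foundations; the helper names `users_in_config` and `mismatched_keys` are made up for illustration, and it assumes the flat `key: value` layout shown above rather than parsing YAML properly.

```python
import re

# Captures the user name in paths such as /home/foo/.foundations/job_data
HOME_USER = re.compile(r"/home/([^/\s]+)/")


def users_in_config(config_text):
    """Map each config key to the user name found in its /home/<user>/ path."""
    found = {}
    for line in config_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        match = HOME_USER.search(value)
        if match:
            found[key.strip()] = match.group(1)
    return found


def mismatched_keys(config_text, server_user):
    """Return the keys whose home-directory path belongs to another user."""
    users = users_in_config(config_text)
    return sorted(key for key, user in users.items() if user != server_user)


# The config from the development machine, as posted above.
SAMPLE = """\
cache_config:
  end_point: /cache_end_point
container_config_root: /home/foo/.foundations/config/local_docker_scheduler/worker_config
job_deployment_env: local_docker_scheduler_plugin
job_results_root: /home/foo/.foundations/job_data
scheduler_url: http://192.168.168.81:5558
job_store_dir_root: /home/foo/.foundations/local_docker_scheduler/work_dir
working_dir_root: /atlas_work_dir
"""

if __name__ == "__main__":
    # "bar" is the user on the master/server in this report.
    print(mismatched_keys(SAMPLE, "bar"))
```

With the sample above and server user `bar`, the three `/home/foo/...` entries are reported; `scheduler_url` and `working_dir_root` are ignored because they contain no home-directory path.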

Even if the ML developer knew the username of the account on the server, this configuration detail is not documented anywhere, and requiring it may not be desirable.
Or am I missing something? :)

SebastianGergen added the bug label Apr 6, 2020
SebastianGergen changed the title from "[BUG] Configuration" to "[BUG] Configuration of paths to log parameters and save artifacts in an on premise installation" Apr 6, 2020
@pippinlee

Thanks, @SebastianGergen. First, this reminds me that we need to update those warnings to be consistent! Specifically, "Script not run with Foundations" should be "Cannot save {log_param/log_metric} outside of job." I'll create a separate ticket for that.

Can you confirm the following: when you set up Atlas on the master/server, you should be able to find the relevant config at ~/.foundations/config/local_docker_scheduler/worker_config/submission/scheduler.config.yaml; see this reference.

This config can then be used by any user of the system: place it in ~/.foundations/config/submissions/ to send jobs to the master/server. Let me know if this helps clarify.
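
As a rough sketch of that last step, assuming the server-generated file has already been transferred to the submitting machine by some other means (for example `scp`), the copy into the submissions directory could look like this. The function name `install_submission_config` is hypothetical and not part of Atlas; only the default target directory comes from the path mentioned above.

```python
import shutil
from pathlib import Path


def install_submission_config(source, submissions_dir=None):
    """Copy a server-generated config file into the local submissions directory.

    `source` is a local copy of the file generated on the master/server.
    `submissions_dir` defaults to ~/.foundations/config/submissions.
    """
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(f"config file not found: {source}")

    if submissions_dir is None:
        submissions_dir = Path.home() / ".foundations" / "config" / "submissions"
    target_dir = Path(submissions_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Keep the original file name so the server's paths are used unchanged.
    return Path(shutil.copy2(source, target_dir / source.name))
```

The point of copying rather than editing is that the paths inside the file already match the user on the master/server, so nothing has to be adjusted by hand on the submitting machine.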

@SebastianGergen
Author

Hello @pippinlee, my bad, you are right. I first tried Atlas with a virtual machine as the server, where the username happened to be the same as on my laptop. When I then moved to the real server, where the username is different, I just modified the previously generated remote.config.yaml instead of copying the new one from the server. Thanks for helping so quickly! I think this can be closed, since it is not an issue after all.
