Cloning local (conda) venv #8

Closed
Zeko403 opened this issue Feb 27, 2020 · 8 comments

@Zeko403

Zeko403 commented Feb 27, 2020

Hi all, I have been trying out trains-agent for a while now, and I was wondering whether there is an option for it to clone the venv from a preexisting conda venv.
So for example, I have a few projects I have been working on, each with its own virtual environment made in conda. Could these projects' environments somehow be cloned to a trains-agent so that it doesn't re-download all the requirements, git folders, etc.?
Maybe this option already exists and I just missed it.

Thanks!

@bmartinn
Member

Hi @Zeko403,

When executing your code "manually" inside a conda environment, the trains package records all the environment dependencies (Python packages and versions).
Then you can clone the experiment in the UI (and, if needed, edit the packages; see Execution -> "Installed Packages").

If you want the trains-agent to use conda instead of the default pip package manager, simply edit the package manager line in ~/trains.conf on the agent's machine: change it from pip to conda and rerun the trains-agent.

The trains-agent will now use conda when setting the environment for all the experiments it is executing.

Now you can simply enqueue the experiment in the execution queue, and the trains-agent should replicate the entire conda environment for you.

@Zeko403
Author

Zeko403 commented Feb 28, 2020

Hi @bmartinn

I understand that it clones the environment dependencies, but I was wondering whether this cloning could be done from a local virtual environment, for example one from my old projects, instead of going online and downloading everything through git etc. The reason is that some of the machines do not have internet access, but they do have local environments on them. I'm guessing that when you clone a task in the experiments UI, the new clone will download everything (libs etc.) instead of using a local copy, right?

@bmartinn
Member

bmartinn commented Feb 28, 2020

Long story short: no, you cannot use the local copy :(

That said, trains-agent caches both git repositories and Python packages and libs. This means that you download the Python packages and the git repository only once; the next time you need the same git repository or Python package, the agent first updates the cache (if needed) and then installs from it.
As a result, re-executing is fairly quick in terms of setting up the environment, but everything does need to be in the cache first.

Regarding the machines without internet access, I'm assuming packages were somehow brought to those machines?! Usually when that is the case, companies use an on-prem artifactory and git server. You can quickly set conda to use the local artifactory instead of the default one: either configure it specifically in ~/trains.conf (see here), or configure it for the entire system by changing the conda global settings.

@Zeko403
Author

Zeko403 commented Feb 28, 2020

Thanks a lot I will look into that!

@Zeko403
Author

Zeko403 commented Mar 2, 2020

OK, I got things set up now, but I keep getting the error "Permission denied (publickey)." when I try to pull things from the company's git.
I tried using git bash on the same machine to clone repositories and it worked; however, when I run the trains-agent I get the publickey error.

I also added --git-pass with the passphrase for the ssh key, but it didn't help.

HTTPS is not an option, as the repo allows cloning only over ssh.
Is there a way to debug the problem, or to add the local public key that is already in ~/.ssh/id_rsa?

@bmartinn
Member

bmartinn commented Mar 2, 2020

@Zeko403 there are two options for passing the git user/pass to the trains-agent:

  • Configure the user/pass in ~/trains.conf (example here).
  • Make sure that on the instance running the trains-agent you have the git key-ring configured, which, as you pointed out, is in ~/.ssh/id_rsa. Make sure that in ~/trains.conf the user/pass are empty, otherwise the trains-agent will try to use them instead of the git key-ring.

I'm assuming that the second option is preferable in your case, and from what I understand there is still an issue when pulling the repository.

Debugging questions:

  1. What OS and version is the trains-agent running on?
  2. trains-agent version?
  3. What happens when you try to clone the same failing repository yourself from the same user that is running the trains-agent?
  4. Do you have the git password stored in the git key-ring, or are you getting a password prompt?
  5. Please attach logs, without any pass/users or any other sensitive information
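
Question 3 can also be checked without an interactive passphrase prompt, which is closer to how a background process would invoke git. The following is only a sketch under assumptions: it is not part of trains-agent, the helper names `build_check` and `can_reach` are hypothetical, and it assumes `git` and `ssh` are on the PATH of the user that runs the agent. It relies on git's `GIT_SSH_COMMAND` environment variable and ssh's `BatchMode` option, which makes ssh fail instead of asking for a passphrase.

```python
import os
import subprocess


def build_check(repo_url):
    """Return (argv, env) for a non-interactive 'git ls-remote' probe."""
    env = dict(os.environ)
    # BatchMode=yes: ssh fails immediately rather than prompting for a passphrase.
    env["GIT_SSH_COMMAND"] = "ssh -o BatchMode=yes"
    return ["git", "ls-remote", "--heads", repo_url], env


def can_reach(repo_url, timeout=60):
    """Return (ok, stderr_text) for the probe; ok is False on any failure."""
    argv, env = build_check(repo_url)
    try:
        result = subprocess.run(
            argv,
            env=env,
            stdin=subprocess.DEVNULL,  # never wait for keyboard input
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return result.returncode == 0, result.stderr.strip()
```

Calling `can_reach("git@repository.com:Project.git")` from the same shell and user that start the trains-agent should show whether the key is usable without a prompt. If it returns `False` with a "Permission denied (publickey)" message while an interactive clone works, the key is probably only available after the passphrase has been typed in that other session.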

@Zeko403
Author

Zeko403 commented Mar 3, 2020

Thanks for your reply @bmartinn

I checked that git_pass and git_user are empty in ~/trains.conf.

Regarding your questions:
1. Windows 10
2. trains-agent=0.13.2
3. I set up git bash so that when I open it, it asks me for the ssh passphrase and then starts the ssh-agent. After that, the same git clone git@repository.com command runs successfully.
4. I guess answer 3 covers this question: I have not set up a git key-ring. If that is what is missing, could you point me to how to set it up?
5. Log from git bash:

Enter passphrase for /c/Users/engineer/.ssh/id_rsa:
Identity added: /c/Users/engineer/.ssh/id_rsa (user@com.com)

engineer@computer MINGW64 ~
$ ps p
      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
      443     413     443      16224  pty0      197609 11:52:24 /usr/bin/ps
      438       1     438       2696  ?         197609 11:52:06 /usr/bin/ssh-agent
      412       1     412      12140  ?         197609 11:52:05 /usr/bin/mintty
      413     412     413       2008  pty0      197609 11:52:05 /usr/bin/bash

engineer@computer MINGW64 ~/PycharmProjects/tmp
$ git clone git@repository.com:Project.git
Cloning into 'Project'...
remote: Enumerating objects: 2515, done.
remote: Total 2515 (delta 0), reused 0 (delta 0), pack-reused 2515
Receiving objects: 100% (2515/2515), 421.24 MiB | 9.78 MiB/s, done.
Resolving deltas: 100% (1450/1450), done.
Updating files: 100% (326/326), done.

Log from the trains-agent that was started after running the git bash:

No tasks in queue 17a66df1d3dc43a5a5e53e4568343d69
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 17a66df1d3dc43a5a5e53e4568343d69
No tasks in Queues, sleeping for 5.0 seconds
task 091019897c424d579668e0ec722810c0 pulled from 17a66df1d3dc43a5a5e53e4568343d69 by worker computer:0
Running task '091019897c424d579668e0ec722810c0'
Storing stdout and stderr log to 'C:\Users\engineer\AppData\Local\Temp\.trains_agent_out.8vccfn1g.txt', 'C:\Users\engineer\AppData\Local\Temp\.trains_agent_out.8vccfn1g.txt'
Current configuration (trains_agent v0.13.2, location: C:/Users/engineer/AppData/Local/Temp/.trains_agent.0d9psid8.cfg):
----------------------
agent.worker_id = computer:0
agent.worker_name = computer
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.venvs_dir = C:/Users/engineer/PycharmProjects/untitled
agent.vcs_cache.enabled = false
agent.vcs_cache.path = C:/Users/engineer/.trains/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = C:/Users/engineer/.trains/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = C:/Users/engineer/.trains/pip-cache
agent.docker_apt_cache = C:/Users/engineer/.trains/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda
agent.git_user =
agent.default_python = 3.6
agent.cuda_version = 102
agent.cudnn_version = 0
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://127.0.0.1:8008
api.web_server = http://127.0.0.1:8080
api.files_server = http://127.0.0.1:8081
api.credentials.access_key = BJV11V4Y73EC7V7VSS9K
api.host = http://127.0.0.1:8008
sdk.storage.cache.default_base_dir = ~/.trains/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff_on_train = true
sdk.development.support_stopping = true
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.default_output_uri =

Executing task id [091019897c424d579668e0ec722810c0]:
repository = git@repository.com:Project.git
branch =
version_num = 2035a6c8915de576b94e6cd4bcee65edd47b8afa
tag =
entry_point = start.py
working_dir = main

Using base prefix 'c:\\users\\engineer\\.conda\\envs\\env'
  No LICENSE.txt / LICENSE found in source
New python executable in C:\Users\engineer\PycharmProjects\untitled\3.6\Scripts\python.exe
Installing setuptools, pip, wheel...
done.


cloning: git@repository.com:Project.git
git@repository.com: Permission denied (publickey).

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Repository cloning failed: Command '['clone', 'git@repository.com:Project.git:Project.git', 'c:%5CUsers%5Cengineer%5C.trains%5Cvcs-cache%5CProject.git.6860ab533828dc61becee3fdc8f6b0c8%5CProject.git', '--quiet', '--recursive']' returned non-zero exit status 128.

trains_agent: ERROR: Failed cloning repository.
1) Make sure you pushed the requested commit:
(repository='git@repository.com:Project.git', branch='', commit_id='2035a6c8915de576b94e6cd4bcee65edd47b8afa', tag='', entry_point='start.py', working_dir='main')
2) Check if remote-worker has valid credentials [see worker configuration file]
DONE: Running task '091019897c424d579668e0ec722810c0', exit status 1
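
When reading a dump like the one above, only a few settings matter for this thread: the package manager type, whether a git user is configured, and the ssh/cache flags. As a rough sketch (the helper names `parse_config_dump` and `summarize` are hypothetical and not part of trains-agent), the `key = value` lines can be parsed and filtered like this:

```python
KEYS_OF_INTEREST = (
    "agent.package_manager.type",
    "agent.git_user",
    "agent.translate_ssh",
    "agent.vcs_cache.enabled",
)


def parse_config_dump(text):
    """Parse dotted 'key = value' lines from the agent's printed configuration."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Skip separators, blank lines, and lines that are not dotted keys.
        if not sep or "." not in key or " " in key:
            continue
        settings[key] = value.strip()
    return settings


def summarize(settings):
    """Return only the settings relevant to the git/conda questions here."""
    return {key: settings.get(key) for key in KEYS_OF_INTEREST}


# A few lines copied from the dump above.
SAMPLE = """\
agent.package_manager.type = pip
agent.vcs_cache.enabled = false
agent.translate_ssh = true
agent.git_user =
"""

summary = summarize(parse_config_dump(SAMPLE))
```

Here `summary["agent.git_user"]` is an empty string, which matches the advice to leave the user/pass empty so that the key-ring is used, and `summary["agent.package_manager.type"]` is `pip`, meaning this agent was not switched to conda.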

@Zeko403
Author

Zeko403 commented Mar 4, 2020

@bmartinn

Hi again, I was playing around with the different solutions for the "keychain" on Windows, and after I installed Git Credential Manager for Windows, the git clone ran successfully!
I will double-check whether that was just an accident or whether it keeps working over a longer period, and then I will close the issue.

Thanks again for your help
