[Feature][Task Plugin] Abstract python related tasks #9942

Closed
2 of 3 tasks
EricGao888 opened this issue May 7, 2022 · 6 comments
Labels: backend, discussion, feature

@EricGao888
Member

EricGao888 commented May 7, 2022

Search before asking

  • I have searched the issues and found no similar feature request.

Description

  • Currently, the Python virtual environment used by the Python task plugin cannot be changed once DolphinScheduler (ds) has started. In practice, however, different Python tasks sometimes need to run in different virtual environments.
  • We could add a field such as env so that users can choose a specific virtual environment for each Python task. To implement this, the Python task plugin could run something like conda activate xxx before executing the Python script.
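
The proposal above could be sketched roughly as follows. This is only an illustration, not DolphinScheduler code: the class CondaActivateSketch, the method buildCommand, and the env parameter are hypothetical names, and the sketch assumes conda is already initialized in the task's shell so that conda activate works.

```java
// Hypothetical sketch: prefix a Python command with "conda activate <env>".
// None of these names come from DolphinScheduler.
final class CondaActivateSketch {

    // Builds the shell command for one Python task.
    // If env is null or blank, the script runs with the worker's default interpreter.
    static String buildCommand(String env, String scriptPath) {
        String run = "python " + scriptPath;
        if (env == null || env.trim().isEmpty()) {
            return run;
        }
        return "conda activate " + env.trim() + " && " + run;
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("xxx", "task.py"));
        System.out.println(buildCommand(null, "task.py"));
    }
}
```

Because the environment is chosen per command rather than once at startup, two tasks on the same worker could each name a different environment. The sketch does not validate the environment name, which a real implementation would need to do before passing it to a shell.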

Use case

  • Already described above.

Related issues

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@EricGao888 added the feature and Waiting for reply labels May 7, 2022
@github-actions

github-actions bot commented May 7, 2022

Thank you for your feedback. We have received your issue; please wait patiently for a reply.

  • To help us understand your request as quickly as possible, please provide detailed information, the version, or screenshots.
  • If you haven't received a reply for a long time, you can join our Slack and send your question to the #troubleshooting channel.

@EricGao888
Member Author

We already support several Python dependency management approaches in Jupyter Task, which is a specific type of Python task. Some related discussion: #10658 (comment)

@EricGao888
Member Author

There are some points to discuss here:

  • We could abstract a general Python task type. Other, more specific Python-related tasks could inherit from it, together with its dependency management approaches.
  • Which tool should we use to manage Python virtual environments: Anaconda or venv? Or we could provide both and let users choose.
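
One way to picture the two points above is a base task that delegates environment activation to a pluggable strategy. The following is a minimal sketch under assumptions, not the project's design: every type and method name here (PythonEnvStrategy, CondaEnv, VenvEnv, AbstractPythonTaskSketch, ScriptTaskSketch) is hypothetical.

```java
// Hypothetical sketch of the abstraction discussed above; not DolphinScheduler code.
interface PythonEnvStrategy {
    // Returns the shell command that activates the given environment.
    String activateCommand(String env);
}

final class CondaEnv implements PythonEnvStrategy {
    @Override
    public String activateCommand(String envName) {
        return "conda activate " + envName;
    }
}

final class VenvEnv implements PythonEnvStrategy {
    @Override
    public String activateCommand(String envDir) {
        // A venv is activated by sourcing the script inside its directory.
        return "source " + envDir + "/bin/activate";
    }
}

// Base class that specific Python-related tasks could extend.
abstract class AbstractPythonTaskSketch {
    private final PythonEnvStrategy envStrategy;
    private final String env;

    AbstractPythonTaskSketch(PythonEnvStrategy envStrategy, String env) {
        this.envStrategy = envStrategy;
        this.env = env;
    }

    // Subclasses only describe what to run inside the environment.
    protected abstract String runCommand();

    final String buildCommand() {
        return envStrategy.activateCommand(env) + " && " + runCommand();
    }
}

// Example subclass: runs a plain Python script.
final class ScriptTaskSketch extends AbstractPythonTaskSketch {
    private final String script;

    ScriptTaskSketch(PythonEnvStrategy envStrategy, String env, String script) {
        super(envStrategy, env);
        this.script = script;
    }

    @Override
    protected String runCommand() {
        return "python " + script;
    }
}
```

With this shape, offering both Anaconda and venv means adding a strategy, while a task such as a notebook runner would only override runCommand.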

@jieguangzhou
Member

jieguangzhou commented Aug 16, 2022

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.dolphinscheduler.plugin.task.pytorch;

import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

@Getter
@Setter
@ToString
public class PythonEnvManager {

    public static final String ENV_TOOL_VENV = "virtualenv";
    public static final String ENV_TOOL_CONDA = "conda";

    private static final String PATTERN_ENVIRONMENT_PYTHON = "python[\\d\\.]*$";
    private static final String PATTERN_ENVIRONMENT_REQUIREMENT = "\\.txt$";
    private static final String CREATE_ENV_NAME = "./venv";
    private static final String CONDA_SOURCE = "source activate %s";
    private static final String CONDA_BUILD = "conda create -y python=%s -p %s";
    private static final String VIRTUALENV_SOURCE = "source %s/bin/activate";
    private static final String VIRTUALENV_BUILD = "virtualenv -p ${PYTHON_HOME} %s";
    private static final String INSTALL_COMMAND = "python -m pip install -r %s";

    private String pythonEnvTool = "virtualenv";
    private String condaPythonVersion = "3.9";

    public String getBuildEnvCommand(String requirementPath) {
        String buildCommand = "";
        String sourceCommand = getSourceEnvCommand(CREATE_ENV_NAME);
        if (pythonEnvTool.equals(ENV_TOOL_VENV)) {
            buildCommand = String.format(VIRTUALENV_BUILD, CREATE_ENV_NAME);
        } else if (pythonEnvTool.equals(ENV_TOOL_CONDA)) {
            buildCommand = String.format(CONDA_BUILD, condaPythonVersion, CREATE_ENV_NAME);
        }
        String installCommand = String.format(INSTALL_COMMAND, requirementPath);
        String command = buildCommand + " && " + sourceCommand + " && " + installCommand;
        return command;
    }

    private String getSourceEnvCommand(String envName) {
        String command = "";
        if (pythonEnvTool.equals(ENV_TOOL_VENV)) {
            command = String.format(VIRTUALENV_SOURCE, envName);
        } else if (pythonEnvTool.equals(ENV_TOOL_CONDA)) {
            command = String.format(CONDA_SOURCE, envName);
        }
        return command;
    }

    public String getPythonCommand() {
        return String.format("%s/bin/python", CREATE_ENV_NAME);
    }
}

I wrote a PythonEnvManager in #11498. It includes the methods below:

  • getBuildEnvCommand: builds a Python environment.
  • getSourceEnvCommand: once the environment is built, returns the source command used to switch to it.
  • getPythonCommand: once the environment is built, returns the path of its python executable directly.

For now, it mainly provides environment creation.

@EricGao888 EricGao888 changed the title [Feature][Task Plugin] Enable environment switch for python task plugin [Feature][Task Plugin] Abstract python related tasks Aug 16, 2022
@fuchanghai
Member

  • Suppose I want to run two Python nodes at the same time on one WORKER node, and the two nodes require different Python versions. In that case source alone is not enough. Is it necessary to write the absolute path to meet this requirement?
  • Should we alias the interpreter path as an environment variable and configure it in the corresponding profile? The same interpreter may be installed under a different path on each WORKER node. For example, the python2 of worker A is under a/xxx/python2, and the python2 of worker B is under b/xx/cc/python2. We could configure it in every node's dolpscheduler.env file.
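
The aliasing idea in the second point could look roughly like the sketch below. It is an assumption-laden illustration only: the class InterpreterResolverSketch, the method resolve, and the alias name PYTHON2 are hypothetical, and the map stands in for whatever variables a worker would load from its own env file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve a logical interpreter alias to a worker-specific path.
final class InterpreterResolverSketch {

    // workerEnv stands in for the variables a worker loads from its own env file.
    static String resolve(Map<String, String> workerEnv, String alias) {
        String path = workerEnv.get(alias);
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("No interpreter configured for alias: " + alias);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> workerA = new HashMap<>();
        workerA.put("PYTHON2", "a/xxx/python2");
        Map<String, String> workerB = new HashMap<>();
        workerB.put("PYTHON2", "b/xx/cc/python2");

        // Same task definition, different absolute path on each worker.
        System.out.println(resolve(workerA, "PYTHON2") + " task.py");
        System.out.println(resolve(workerB, "PYTHON2") + " task.py");
    }
}
```

The task definition then refers only to the alias, and each worker supplies its own absolute path.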

@EricGao888
Member Author

  1. In the Jupyter Task Plugin, we source conda.sh, whose path is configurable, each time before running a task. Different Python task nodes can therefore use different Python virtual environments even on the same worker, and they will not affect one another.
  2. After sourcing conda.sh, we use conda activate xxx to activate the specific environment and run the Python script with the interpreter from that environment, so users do not need to worry about the Python path.

Assert.assertEquals(jupyterTask.buildCommand(),
        "set +e \n " +
                "source /opt/anaconda3/etc/profile.d/conda.sh && " +
                "conda create -n jupyter-tmp-env-123456789 -y && " +
                "conda activate jupyter-tmp-env-123456789 && " +
                "pip install -r requirements.txt && " +
                "papermill " +
                "/test/input_note.ipynb " +
                "/test/output_note.ipynb " +
                "--parameters city Shanghai " +
                "--parameters factor 0.01 " +
                "--kernel python3 " +
                "--engine default_engine " +
                "--execution-timeout 10 " +
                "--start-timeout 3 " +
                "--version " +
                "--inject-paths " +
                "--progress-bar \n " +
                "conda deactivate && conda remove --name jupyter-tmp-env-123456789 --all -y"
);
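
The asserted command follows a create, activate, install, run, clean-up pattern. A simplified sketch of how such a string might be assembled is shown here. It is not the Jupyter Task Plugin's implementation: the class TempCondaEnvSketch and the method wrap are hypothetical, and the leading set +e and the exact spacing of the original string are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: wrap a task command in a temporary conda environment.
final class TempCondaEnvSketch {

    static String wrap(String condaShPath, String envName, String requirements, String taskCommand) {
        List<String> setup = new ArrayList<>();
        setup.add("source " + condaShPath);              // make the conda shell function available
        setup.add("conda create -n " + envName + " -y"); // create the temporary environment
        setup.add("conda activate " + envName);
        setup.add("pip install -r " + requirements);
        setup.add(taskCommand);

        String cleanup = "conda deactivate && conda remove --name " + envName + " --all -y";

        // Setup steps are chained with && so that a failing step stops the chain.
        // Cleanup sits on its own line so it still runs after a failure,
        // assuming the shell is not in errexit mode.
        return String.join(" && ", setup) + "\n" + cleanup;
    }

    public static void main(String[] args) {
        System.out.println(wrap("/opt/anaconda3/etc/profile.d/conda.sh",
                "tmp-env", "requirements.txt", "papermill in.ipynb out.ipynb"));
    }
}
```

Putting the clean-up on a separate line rather than in the && chain is the design choice that lets the temporary environment be removed even when the task itself fails.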

@caishunfeng caishunfeng modified the milestones: 3.1.0, 3.2.0 Sep 27, 2022
@zhongjiajie zhongjiajie modified the milestones: 3.2.0, 3.3.0 Aug 30, 2023