Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default_qsub_arguments -V not working #781

Merged
merged 1 commit into from Sep 13, 2018

Conversation

@sakshamgarg
Copy link
Contributor

commented Aug 16, 2018

Bug/feature Description

  • qsub -V option is not working when it is set using default_qsub_arguments.
  • Bugs: Once the default_qsub_arguments is set for -V option, "qstat -f" doesn't display environment variable except those which are related to PBS or which are set using -v option.

Affected Platform(s)

  • ALL

Cause / Analysis / Design

  • Bugs: Environment variable list is not stored after checking that -V option is set or not in default_qsub_arguments.

Solution Description

  • After the options for default_qsub_arguments are processed, -V option is checked whether it is set or not.
  • If the -V option is set then using the variable 'environ', all the environment variable are stored as a list in a variable.

Testing logs/output

Checklist:

For further information please visit the Developer Guide Home.

@sakshamgarg sakshamgarg force-pushed the sakshamgarg:default_arg_V branch 2 times, most recently from e2bf2a7 to a7bc129 Aug 16, 2018


/* qsub_envlist might be needed if -V option is set in default flags.
* It will be unset if not needed then. */
qsub_envlist = env_array_to_varlist(envp);

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Aug 16, 2018

Contributor

We need to call the function env_array_to_varlist() only when desired i.e. when V_opt is set either while calling qsub or through default_qsub_arguments.
In other words before calling the function set_job_env() make sure that V_opt(This should be set only when qsub is called through -V option or default_qsub_arguments contains -V option) is set or not. If set then only call env_array_to_varlist to populate qsub_envlist and then pass it to set_job_env()

@suresh-thelkar

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2018

Also we need to write a few PTL test cases to verify the fix. Following are the test cases that I can think of right now.

  1. -V option is not set in qsub but in default_qsub_arguments it is set.
  2. -V option is not set in qsub but in default_qsub_arguments it is set. Submit a job script with and without -V option.
  3. Submit a test case by only setting default_qsub_arguments through job script i.e. default_qsub_arguments should not contain -V option.

If required all the above test cases can be clubbed into a single test case.

"""
os.environ["SET_IN_SUBMISSION"] = "true"
self.server.manager(MGR_CMD_SET, SERVER, {'default_qsub_arguments': '-V'})
j = Job(ROOT_USER)

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 21, 2018

Contributor

Why do need to submit job as root user?

This comment has been minimized.

Copy link
@sakshamgarg

sakshamgarg Aug 21, 2018

Author Contributor

I want to submit the job through a single user only, need not be a root user.
This is needed because the sample environment variable which is set will not be accessed by the job running on other users. The sample environment variable is set only for the current user.

"\nexit 0" +
"\nfi" )
jid = self.server.submit(j)
self.server.log_match(jid + ";Exit_status=0", id=jid)

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 21, 2018

Contributor

Missing new line.

j = Job(ROOT_USER)
j.create_script("#PBS -N default_qsub_arguments"
"\n#PBS -V" +
"\nif [ -z '$SET_IN_SUBMISSION' ]" +

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 21, 2018

Contributor

In if [ -z '$SET_IN_SUBMISSION' ], I doubt that this SET_IN_SUBMISSION will get expanded because it is in single quote

@sakshamgarg sakshamgarg force-pushed the sakshamgarg:default_arg_V branch from 5adfc52 to db6e3d5 Aug 22, 2018

@suresh-thelkar
Copy link
Contributor

left a comment

Thanks for making the suggested code changes. It looks much better now. Here are a few more comments on PTL.

This Test case exports an environment variable. It sets
-V option through a job script directive and then changes
the value of the environment variable. It expects another job
script to have the changed value of the environment varible.

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Aug 22, 2018

Contributor

Instead of saying another job script we can say that "The job under execution should get the updated value of the environment variable" since its(Job Script Directives) precedence is higher than the others"

script to have the changed value of the environment varible.
"""
os.environ["SET_IN_SUBMISSION"] = "true"
j = Job(ROOT_USER)

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Aug 22, 2018

Contributor

Instead of ROOT_USER you can actually use one of TEST_USER variables.

os.environ["SET_IN_SUBMISSION"] = "true"
self.server.manager(MGR_CMD_SET, SERVER,
{'default_qsub_arguments': '-V'})
j = Job(ROOT_USER)

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Aug 22, 2018

Contributor

Instead of ROOT_USER you can actually use one of TEST_USER variables.

'SET_IN_SUBMISSION=%s' % env_var)},
id=jid)

def test_option_V_with_script(self):

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Aug 22, 2018

Contributor

Better to rename the test case as test_qsub_opt_V_with_jobscript().

@sakshamgarg

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2018

Thanks for all the suggestions and comments. I have updated the PTL test and tested it on the test machine. I have attached the log file for the same.
LOG_default_qsub_argument_V.txt

@@ -4914,6 +4914,10 @@ do_submit(char *retmsg)
if (rc != 0)
return (rc);
}

/* Get required environment variables */
if (V_opt)

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 23, 2018

Contributor

Just curios to know: Can you do little qsub performance test with large environment, before and after this change? I am suspecting that by doing this it will slow down qsub. (you use below script to do test)

#!/bin/bash

qmgr -c "s s default_qsub_arguments='-V'"
for i in $(seq 1 1 100)
do
    export TEST_VAR${i}="$(printf "${i}%.0s" {1..1000})"
done
start=$(date +'%s')
for i in $(seq 1 1 100)
do
    qsub -- /bin/sleep 1000
done
end=$(date +'%s')
echo $((end - start))

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Aug 24, 2018

Contributor

As I said yesterday during our offline discussion it is not good to compare qsub performance with large environment before and after the current code fix. Prior to this code change the function env_array_to_varlist() was not getting called when we set default_qsub_arguments and hence the current bug.
As part of the fix this function call is reintroduced at the right place. So if we compare with and without code fix the performance of qsub after the fix is expected to slow down with -V option. So the ideal test case for performance measure would be to compare qsub with a older version where this bug was not present vs with the current code fix. @sakshamgarg is verifying this test case and will be posting the results and logs to the ticket.

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 24, 2018

Contributor

@suresh-thelkar Thanks for update, I was not aware that this issue was regression issue. So now I am fine if we don't want to do performance measure as with -V qsub will be slow is expected, but still up to you and @sakshamgarg for doing perf measurement.

This comment has been minimized.

Copy link
@sakshamgarg

sakshamgarg Aug 24, 2018

Author Contributor

I agree with what @suresh-thelkar has suggested in the second part. I did the qsub performance test, between the older version where this bug was not there and the current code fix which has been done, using the above python script. There is no significant change in the performance test.

@sakshamgarg

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

I have done manual testing for the test case which uses a job script to set -V option as I was facing user permission issue while trying to automate the test.

This test case exports an environment variable. It sets -V option through a job script directive and then changes the value of the environment variable. It expects the job under execution to get the updated value of the environment variable since its (Job Script Directive's) precedence is higher than the others.

Below is the log for manual testing and I have attached the script files.

script.sh.txt
sample.py.txt

[hadoop@centos2 ~]$ export EXAMPLE=true
[hadoop@centos2 ~]$ qsub script.sh
12558.centos2
[hadoop@centos2 ~]$ cat sample.e12558
[hadoop@centos2 ~]$ cat sample.o12558
Value of EXAMPLE variable before changing:
EXAMPLE=true
Value of EXAMPLE variable after changing
EXAMPLE=false
job script returned changed value

@suresh-thelkar

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2018

@sakshamgarg, Thanks for making all the suggested changes and PTL test cases. I sign off.

@sakshamgarg sakshamgarg force-pushed the sakshamgarg:default_arg_V branch from 917a4ee to 83b77eb Aug 24, 2018

@hirenvadalia
Copy link
Contributor

left a comment

On second thought, I don't think this is right fix as I did quick test as below:

#!/bin/bash

qmgr -c "s s default_qsub_arguments = '-V'"
export EXAMPLE=true
qsub -- /bin/sleep 1000
export EXMAPLE=false
qsub -- /bin/sleep 1000

Means for first job EXAMPLE should be true and for second it should be false
But with this fix, EXAMPLE is always (atleast for 60 sec) true

[root@testdev pbspro]# qstat -f | grep 'EXAMPLE'
	/pbs/bin:/opt/pbs/sbin,PWD=/pbspro,EXAMPLE=true,HOME=/root,SHLVL=2,
	/pbs/bin:/opt/pbs/sbin,PWD=/pbspro,EXAMPLE=true,HOME=/root,SHLVL=2,

I know this happens because of background qsub daemon has old environ when it was forked.
So we need to look for another (right one) fix.

@suresh-thelkar

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2018

Good catch @hirenvadalia. It looks like qsub's environment is not passed to qsub daemon that we launch internally. Whenever a request is made to qsub daemon we should pass the new environment to it if V_opt is set. If passing this environment turns out to be a costly operation with respect to performance then we can also think of not redirecting the request to qsub daemon when V_opt is set.
Also it would be better to check whether this behaviour was working earlier or not. This might help us in measuring the qsub performance with large environment correctly.

@sakshamgarg

This comment has been minimized.

Copy link
Contributor Author

commented Aug 31, 2018

Below is the log after fixing the issue as reported by @hirenvadalia.

[saksham@centos ~]$ qmgr -c "s s default_qsub_arguments='-V'"
[saksham@centos ~]$ export EXAMPLE=true
[saksham@centos ~]$ qsub -- /bin/sleep 1000
6.centos
[saksham@centos ~]$ export EXAMPLE=false
[saksham@centos ~]$ qsub -- /bin/sleep 1000
7.centos
[saksham@centos ~]$ qstat -f | grep EXAMPLE
/jre,LANG=en_US.UTF-8,GDM_LANG=en_US.UTF-8,EXAMPLE=true,
/jre,LANG=en_US.UTF-8,GDM_LANG=en_US.UTF-8,EXAMPLE=false,
[saksham@centos ~]$

I am also attaching the logs for the manual testing as described in previous comment.
log_manualTesting.txt

@suresh-thelkar
Copy link
Contributor

left a comment

Thanks for making all the code changes. I sign off.

Show resolved Hide resolved src/cmds/qsub.c Outdated
validate default_qsub_argument option.
"""

def test_option_V(self):

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 31, 2018

Contributor

Please add test for recently discovered use cases (EXAMPLE=true and false one)


class TestDefaultQsubArgument(TestFunctional):
"""
validate default_qsub_argument option.

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 31, 2018

Contributor

default_qsub_argument == default_qsub_arguments

Also I suggest this docstring to something like this: Validate whether qsub sends all environ when -V is enabled via default_qsub_arguments and or on command line.

env_var = os.environ.get("SET_IN_SUBMISSION")
self.server.expect(JOB, {'Variable_List': (MATCH_RE,
'SET_IN_SUBMISSION=%s' % env_var)},
id=jid)

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 31, 2018

Contributor

Please add test case for testing passing full environ via -V on qsub command line

Show resolved Hide resolved test/tests/functional/pbs_default_qsub_argument.py Outdated
os.environ["SET_IN_SUBMISSION"] = "true"
self.server.manager(MGR_CMD_SET, SERVER,
{'default_qsub_arguments': '-V'})
j = Job(PBSROOT_USER)

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 31, 2018

Contributor

You are setting SET_IN_SUBMISSION in current user's session and submitting job as pbsroot user, which internally PTL will do sudo qsub, and SET_IN_SUBMISSION might not get passed by sudo.
I suggest to submit job as current user's only, you can use self.du.get_current_user() to get current user name

Show resolved Hide resolved test/tests/functional/pbs_default_qsub_argument.py Outdated
Show resolved Hide resolved test/tests/functional/pbs_default_qsub_argument.py Outdated
from tests.functional import *


class TestDefaultQsubArgument(TestFunctional):

This comment has been minimized.

Copy link
@hirenvadalia
if (V_opt)
qsub_envlist = env_array_to_varlist(envp);

qsub_envlist = env_array_to_varlist(envp);

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Aug 31, 2018

Contributor

Calling env_array_to_varlist in every condition with definitely decrease performance of qsub even with out -V because env_array_to_varlist is little heavy function which traverse whole environ char by char.

This comment has been minimized.

Copy link
@sakshamgarg

sakshamgarg Sep 3, 2018

Author Contributor

I agree with your statement that it will affect performance of qsub.
The other alternative solution to this problem was to connect to the server at the very starting, before it was loading value to qsub_envlist in the "main" function, and fetch the default qsub arguments from the server. But this renders the use of a daemon pointless.
Although this result in performance degradation but it is the only (valid) solution for this bug.
If needed, we can enforce that -V option never be set in default arguments to ensure performance.

This comment has been minimized.

Copy link
@hirenvadalia

hirenvadalia Sep 3, 2018

Contributor

@sakshamgarg Another solution might be possible which is let qsub daemon ask foreground qsub to send envp is -V is enabled in default_qsub_arguments on server.
Flow will be something like this:

  1. foreground qsub send all data to daemon qsub except envp if no -V on command line, if -V is passed on qsub command line then pass envp along with another data
  2. daemon qsub will accept data
  3. if -V is enabled in default_qsub_arguments then ask foreground to send envp if not already send
  4. daemon qsub will accept envp from foreground qsub
  5. daemon qsub will submit job

This solution will only qsub performance only in case of -V only which is expected but it doesn't affect performance normal qsub and still we can have daemon qsub

@subhasisb

This comment has been minimized.

Copy link
Collaborator

commented Sep 3, 2018

As discussed, the easiest (and right) solution would be to drop backgrounding in case a -V is detected in the qsub default arguments (or qsub command line). Passing on large environments from froreground to background qsubs would be performing bad enough that we might as well do a implied "-f" or direct connect to server (a.k.a no backgrounding)

@sakshamgarg sakshamgarg force-pushed the sakshamgarg:default_arg_V branch from 2183114 to b496281 Sep 4, 2018

@@ -164,6 +164,7 @@
#define QSUB_DMN_TIMEOUT_LONG 60 /* timeout for qsub background process */
#define QSUB_DMN_TIMEOUT_SHORT 5

#define VOPT_ERROR 4321 /* Error code if -V option is detected in background qsub*/

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Sep 7, 2018

Contributor

Having option 'V' is not an error. May be we can rename it like VOPT_SET.

@@ -4877,6 +4878,28 @@ do_connect(char *server_out, char *retmsg)
return 0;
}

static int
parse_dfltqsubargs(char *retmsg)

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Sep 7, 2018

Contributor

Please write the doxygen header for this function.

*/
refresh_dfltqsubargs();
rc = parse_dfltqsubargs(retmsg);
if(rc || V_opt){

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Sep 7, 2018

Contributor

You can make it more readable by saying rc == 0 && V_opt.

*/
*do_regular_submit = 0;
if(rc != VOPT_ERROR)

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Sep 7, 2018

Contributor

Instead of checking whether VOPT_ERROR is not set, do a check for rc == VOPT_ERROR, if yes bypass the following code. Basically you need to optimise the code here.

@subhasisb

This comment has been minimized.

Copy link
Collaborator

commented Sep 7, 2018

@sakshamgarg the server forces the background daemons to update the default qsub args (When it changes via qmgr), this code is already there. We need to check it after do_dir() in do_submit() and return with a "Unique" error code. This should percolate all the way back to the foreground qsub as a submission return code (only that it is not), and we need to check that in the foreground and set do_forground_qsub=1

@@ -5982,6 +5961,8 @@ fork_and_stay(void)
/* set single threaded mode */
pbs_client_thread_set_single_threaded_mode();

/* set when background qsub is running */
is_background = 1;

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 11, 2018

Collaborator

indentation

@@ -236,6 +236,7 @@ static char *pbs_hostvar = NULL; /* buffer containing ",PBS_O_HOST=" and host na
static int pbs_o_hostsize = sizeof(",PBS_O_HOST=") + 1; /* size of prefix for hostvar */
static char *display; /* environment variable DISPLAY */

static int is_background = 0; /* flag to check if background qsub is active */

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 11, 2018

Collaborator

Change the comment to something like "am i the background process? This variable is set only once and is read-only afterwards"

/*
* get environment variable if -V option is set. Return the code
* VOPT_SET if -V option is detected in background qsub.
*/
if(V_opt) {

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 11, 2018

Collaborator

space before (

if(V_opt) {
if(is_background)

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 11, 2018

Collaborator

space before (

rc = -1;
sprintf(retmsg, "Failed to recv data from background qsub\n");
/* Error message will be printed in caller */
if(rc != VOPT_SET){

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 11, 2018

Collaborator

space before (

@@ -164,7 +164,7 @@
#define QSUB_DMN_TIMEOUT_LONG 60 /* timeout for qsub background process */
#define QSUB_DMN_TIMEOUT_SHORT 5

#define VOPT_ERROR 4321 /* Error code if -V option is detected in background qsub*/
#define VOPT_SET 4 /* return code if -V option is detected in background qsub*/

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 11, 2018

Collaborator

It is probably best to rename this return code to be more generic, so that it can be used later for other similar purposes, if necessary. Rename it to something like DMN_CANT_SERVE (or think something better)

if(V_opt) {
if(is_background)
return VOPT_SET;

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 11, 2018

Collaborator

Also, it would be better if we can check this in do_daemon_stuff and exit the daemon in this return case. This way future qsubs will not have to continuously consult with the background qsub just to know that it should not be going to the background

@@ -6246,9 +6246,8 @@ main(int argc, char **argv, char **envp) /* qsub */
basic_envlist = job_env_basic();
if (basic_envlist == NULL)
exit_qsub(3);
if(V_opt) {
if(V_opt)

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 12, 2018

Collaborator

still has an indentation problem


/*
* Something bad happened, either background submitted
* and failed to send us response, or it failed before
* submitting. If background qsub detects -V option, then
* submit the job through foreground.
*/
if(rc != VOPT_SET){
if (rc != DMN_REFUSE_EXIT){

This comment has been minimized.

Copy link
@subhasisb

subhasisb Sep 12, 2018

Collaborator

still has a indentation issue between ){

@suresh-thelkar
Copy link
Contributor

left a comment

Thanks for making the code changes. You can implement optional minor comment that I have given if you would like to. However I sign off.

@@ -4915,6 +4922,16 @@ do_submit(char *retmsg)
return (rc);
}

/*
* get environment variable if -V option is set. Return the code

This comment has been minimized.

Copy link
@suresh-thelkar

suresh-thelkar Sep 12, 2018

Contributor

Can you repharse it as "get the variable environ which points to an array of pointers to strings of the user environment if -V option is set....".

@sakshamgarg

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2018

Attaching the logs of PTL tests.
ptl_output_-V.txt

@hirenvadalia
Copy link
Contributor

left a comment

LGTM

@sakshamgarg sakshamgarg force-pushed the sakshamgarg:default_arg_V branch 2 times, most recently from c27c79c to 28b20e8 Sep 12, 2018

default_qsub_arguments -V not working
Not starting background qsub if -V option is set

@sakshamgarg sakshamgarg force-pushed the sakshamgarg:default_arg_V branch from 28b20e8 to 7847e71 Sep 12, 2018

@sakshamgarg

This comment has been minimized.

Copy link
Contributor Author

commented Sep 13, 2018

@subhasisb I have squashed all commits and rebased.

@subhasisb subhasisb merged commit 7847e71 into PBSPro:master Sep 13, 2018

4 checks passed

Codacy/PR Quality Review Up to standards. A positive pull request.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details

@sakshamgarg sakshamgarg deleted the sakshamgarg:default_arg_V branch Sep 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.