Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define API to set/get SCR params #40

Closed
adammoody opened this issue Mar 7, 2017 · 8 comments
Closed

Define API to set/get SCR params #40

adammoody opened this issue Mar 7, 2017 · 8 comments
Assignees
Milestone

Comments

@adammoody
Copy link
Contributor

No description provided.

@adammoody
Copy link
Contributor Author

@white238 , as the next task, I'd like to look at creating an API to set SCR parameters from within the code. I'd like to have one of our new SCR developers assist with that.

Do you have an idea of different parameters you'd like to set to get started?

@adammoody
Copy link
Contributor Author

adammoody commented Mar 20, 2019

Some of these params are used only in the scripts (like SCR_RUNS), some are only in the library (like SCR_SET_SIZE), and some are in both (like SCR_CACHE_BASE). The ones used in both must be consistent so that the scripts and the library see the same values. For the scripts, some portion of the scripts run before the library and some portion runs after the library.

As a first task, we can list all params and where they must be valid. If a script needs a value, but the library runs first, we could have the library record its set of params in a file that the scripts can read after the library has run. If a script needs a value before the library runs, we probably should not allow the application to set that param via the API, since that would lead to inconsistent values.

Most of the params, but probably not all, are listed here:
https://scr.readthedocs.io/en/latest/users/config.html#scr-parameters

@rhaas80
Copy link
Collaborator

rhaas80 commented Aug 5, 2019

I have a kind-of API implemented in https://github.com/rhaas80/scr/tree/rhaas/scr_param_set which does let you add parameters to be used by scr_get_params() (or any of the scr_param_get calls). This lets a code set all SCR parameters but right now, at least without some largish changes to SCR, will require that these calls are made before SCR_Init (since it calls scr_get_params()).
There's multiple things that are not nice about this:

  • it will not transfer these parameters to eg a daemon process
  • having to set them before SCR_Init is annoying
  • it is impossible to change parameters after SCR_Init (well one can change the parameters but b/c the data has been parsed into global variables changing parameters no longer does anything)

The current API follows: ECP-VeloC/AXL#38 ie there is a call

SCR_Config("<string>")

that basically takes a line of the configuration files. If there is an item in the string that does not have an equal sign then this is just a query and no value is set.
Since I am just getting started with adding to SCR I would rather get some feedback if this is at all the desired direction to go to or if there is a desire to have an API where the global scr variables are updated (which probably requires some sort of "listener" interface to update the globals when the parameter hashes are updated).
Right now there's no check for "valid" parameters, mostly b/c no such check exists inside of SCR anywhere right now.

@adammoody
Copy link
Contributor Author

@rhaas80 , I may not have time to review your code in detail, but your description here sounds good. @gonsie mentioned that you had your own parser logic in there. You could perhaps simplify things if you can use the existing functions that are used to parse the SCR config files instead. It would be nice for the end user to support the same syntax in both places if possible.

https://github.com/rhaas80/scr/blob/ddfa45939b92ca3997b51153bba37f2b23c73cc3/src/scr_config.c#L265

I wouldn't worry about setting properties after SCR_Init. Perhaps return an error if someone tries to do that? For that matter, it could be nice to return an error in the case someone tries to set a parameter that can't be set using SCR_Config even before SCR_Init. For example, any parameter that has to be set before the user's job runs cannot be set with SCR_Config. Having said that, I think returning an error is a nice to have thing, rather than a need.

We'll also want to document which parameters can be set when. We could create a table to show whether the parameter can be set: as an environment variable, in the user config file, the system config file, and SCR_Config.

@rhaas80
Copy link
Collaborator

rhaas80 commented Aug 8, 2019

@adammoody thank you. I was certainly not expecting a full review. Mostly I had the code written to make sure I had an understanding of what needed changing and to be able to point to an explicit implementation.
Since I am new to the SCR project I wanted to get a feeling about what kind of style of implementation is acceptable and a rough idea whether I was going in the right direction. This is both to avoid stepping on people's toe's as well as avoiding putting in a lot of work that needs removing.

I had thought about trying to re-use the file parsing code, but in the end ended up written my own parser which should (untested, so likely untrue) parse the same set of configuration lines (minus comments and minus expanding environment variables I think). The main reason I decided that I could not use the existing function (easily) is because that function is tied to read from a file eg in the whitespace skipping code https://github.com/rhaas80/scr/blob/ddfa45939b92ca3997b51153bba37f2b23c73cc3/src/scr_config.c#L277 .

There's also a difference in syntax because in the configuration files a line like

CKPT=1 STORE

is invalid but is is a valid query for the configuration function. And finally I did not want to make the existing config file reader use a (modified) version of my parser since such a change almost always ends up causing changes in corner case behaviour that someone relied on.

If indeed changing parameters after SCR_init is not required then this makes my life much easier. I will set up a hash similar to the exisiting no_user hash (or maybe that one is already what I need) to forbid some parameters from being set programmatically since eg they need to be available for scripts before / after the main code runs.

I will go ahead and collect information on which parameters can be set programmatically and will try and see what would be required to have a single function parse either lines from a configuration file or strings passed to the config functions.

@rhaas80 rhaas80 closed this as completed Aug 8, 2019
@rhaas80 rhaas80 reopened this Aug 10, 2019
@rhaas80
Copy link
Collaborator

rhaas80 commented Oct 16, 2019

Parameters used by non-main code installed into bin:

  • grep -- '->get' $(grep -rl 'use .*scr_param.*')
	bin/scr_scavenge:my $param_buf_size = $param->get("SCR_FILE_BUF_SIZE");
	bin/scr_scavenge:my $param_crc = $param->get("SCR_CRC_ON_FLUSH");
	bin/scr_list_dir:  my $cachedesc = $param->get_hash("CACHE");
	bin/scr_list_dir:  push @bases, $param->get("SCR_CNTL_BASE");
	bin/scr_watchdog:my $param_timeout = $param->get("SCR_WATCHDOG_TIMEOUT");
	bin/scr_watchdog:my $param_timeout_pfs = $param->get("SCR_WATCHDOG_TIMEOUT_PFS");
	bin/scr_list_down_nodes:my $exclude = $param->get("SCR_EXCLUDE_NODES");
	bin/scr_list_down_nodes:  my $cntldirs = $param->get_hash("CNTLDIR");
	bin/scr_list_down_nodes:  my $cachedirs = $param->get_hash("CACHEDIR");
  • binaries that match scr_param_get:
    • scr_log_event
    • scr_log_transfer
    • scr_retries_halt
    • scr_halt
  • scr_log_event.c:
    • SCR_LOG_ENABLE
    • calls scr_log_init(), scr_log_job(), scr_log_event()
  • scr_log.c:
    • SCR_DB_DEBUG
    • SCR_DB_ENABLE
    • SCR_DB_HOST
    • SCR_DB_NAME
    • SCR_DB_PASS
    • SCR_DB_USER
  • scr_log_transfer.c:
    • SCR_LOG_ENABLE
    • calls scr_log_init(), scr_log_job(), scr_log_transfer()
  • scr_retries_halt.c:
    • SCR_HALT_SECONDS
    • calls scr_file_exists, scr_halt_read
  • scr_halt.c:
    • calls scr_file_is_readable, scr_open, scr_file_lock_read, scr_close, scr_file_unlock
  • scr_io.c:
    • no calls outside of file itself

scr_globals.h defines a good number of global variables, all prefixed by scr_ but does not seem to be used by any of the helper tools (anymore, removed use from scr_index.c). Either directly or indirectly.

  • scr_param.pm uses its own get() function:
    • SCR_CACHE_BASE
    • SCR_CACHE_SIZE
  • scr_param.pm directly uses ENV variables for:
    • SCR_PREFIX
    • SCR_CONF_FILE

Scripts only in scr but not installed on my workstation contain some more perl code, grep -r -- '->get' $PWD:

	TLCC/scr_scavenge.in:my $param_buf_size = $param->get("SCR_FILE_BUF_SIZE");
	TLCC/scr_scavenge.in:my $param_crc = $param->get("SCR_CRC_ON_FLUSH");
	TLCC/scr_list_down_nodes.in:my $exclude = $param->get("SCR_EXCLUDE_NODES");
	TLCC/scr_list_down_nodes.in:  my $cntldirs = $param->get_hash("CNTLDIR");
	TLCC/scr_list_down_nodes.in:  my $cachedirs = $param->get_hash("CACHEDIR");
	common/scr_list_dir.in:  my $cachedesc = $param->get_hash("CACHE");
	common/scr_list_dir.in:  push @bases, $param->get("SCR_CNTL_BASE");
	common/scr_list_dir:  my $cachedesc = $param->get_hash("CACHE");
	common/scr_list_dir:  push @bases, $param->get("SCR_CNTL_BASE");
	common/scr_watchdog:my $param_timeout = $param->get("SCR_WATCHDOG_TIMEOUT");
	common/scr_watchdog:my $param_timeout_pfs = $param->get("SCR_WATCHDOG_TIMEOUT_PFS");
	common/scr_watchdog.in:my $param_timeout = $param->get("SCR_WATCHDOG_TIMEOUT");
	common/scr_watchdog.in:my $param_timeout_pfs = $param->get("SCR_WATCHDOG_TIMEOUT_PFS");
	cray_xt/scr_scavenge.in:my $param_buf_size = $param->get("SCR_FILE_BUF_SIZE");
	cray_xt/scr_scavenge.in:my $param_crc = $param->get("SCR_CRC_ON_FLUSH");
	cray_xt/scr_scavenge.in:my $param_container = $param->get("SCR_USE_CONTAINERS");
	cray_xt/scr_list_down_nodes.in:my $exclude = $param->get("SCR_EXCLUDE_NODES");
	cray_xt/scr_list_down_nodes.in:  my $cntldirs = $param->get_hash("CNTLDIR");
	cray_xt/scr_list_down_nodes.in:  my $cachedirs = $param->get_hash("CACHEDIR");
	param_test:$value = $param->get_param('SCR_FLUSH');
	param_test:$value = $param->get_param('HOMEY');
	param_test:$value = $param->get_subparam('CACHEDIR', '/tmp', 'BYTES');
	LSF/scr_scavenge.in:my $param_buf_size = $param->get("SCR_FILE_BUF_SIZE");
	LSF/scr_scavenge.in:my $param_crc = $param->get("SCR_CRC_ON_FLUSH");
	LSF/scr_list_down_nodes.in:my $exclude = $param->get("SCR_EXCLUDE_NODES");
	LSF/scr_list_down_nodes.in:#  my $cntldirs = $param->get_hash("CNTLDIR");
	LSF/scr_list_down_nodes.in:#  my $cachedirs = $param->get_hash("CACHEDIR");
	PMIX/scr_scavenge.in:my $param_buf_size = $param->get("SCR_FILE_BUF_SIZE");
	PMIX/scr_scavenge.in:my $param_crc = $param->get("SCR_CRC_ON_FLUSH");
	PMIX/scr_scavenge.in:my $param_container = $param->get("SCR_USE_CONTAINERS");
	PMIX/scr_list_down_nodes.in:my $exclude = $param->get("SCR_EXCLUDE_NODES");
	PMIX/scr_list_down_nodes.in:  my $cntldirs = $param->get_hash("CNTLDIR");
	PMIX/scr_list_down_nodes.in:  my $cachedirs = $param->get_hash("CACHEDIR");

Kathryn's grep list of parameters

  • Common/scr_param.pm: SCR_PREFIX, SCR_CONF_FILE
  • Common/scr_prefix.pm: SCR_PREFIX
  • Common/scr_list_dir: CACHE, SCR_CNTL_BASE
  • Common/scr_watchdog: SCR_WATCHDOG_TIMEOUT, SCR_WATCHDOG_TIMEOUT_PFS
  • TLCC/scr_list_down_nodes: SCR_EXCLUDE_NODES, CNTLDIR, CACHEDIR
  • TLCC/scr_scavenge: SCR_FILE_BUF_SIZE, SCR_CRC_ON_FLUSH, SCR_USE_CONTAINERS
  • Cray_xt: scr_list_down_Nodes and scr_scavenge same as above
  • Cray_xt: scr_env: PBS_NUM_NODES,USER,PBS_JOBID
  • Common/scr_env: USER, SLURM_JOBID, SLURM_NODELIST

Compile time:

  • SCR_CNTL_BASE

What is not used in src:

  • SCR_WATCHDOG*, SCR_EXCLUDE_NODES

@rhaas80 rhaas80 self-assigned this Jan 28, 2020
@adammoody
Copy link
Contributor Author

@rhaas80 , how are things coming along on this work?

I'm working several codes teams to update the SCR calls in their applications. It's a prime opportunity to add the new config API as part of that effort.

@rhaas80
Copy link
Collaborator

rhaas80 commented Jun 10, 2020

Addressed in merge request #184

@rhaas80 rhaas80 closed this as completed Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants