Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 [BUG]: Regression handling DATA_PERSIST_ROOT setting #270

Closed
fmigneault opened this issue Dec 8, 2022 · 2 comments · Fixed by #272
Closed

🐛 [BUG]: Regression handling DATA_PERSIST_ROOT setting #270

fmigneault opened this issue Dec 8, 2022 · 2 comments · Fixed by #272
Assignees
Labels
bug Something isn't working

Comments

@fmigneault
Copy link
Collaborator

fmigneault commented Dec 8, 2022

Summary

Regression in handling DATA_PERSIST_ROOT setting does not produce the intended effect if anything other than defaults are used.

Details

Regression introduced in d4f2f44 regarding the use of DATA_PERSIST_ROOT.

New variable MAGPIE_PERSIST_DIR (there might be others doing similar things in other commits [?] ) have been defined in default.env making use of the (up to that point) default value of DATA_PERSIST_ROOT=/data.

edit @fmigneault
I can confirm that all modified directories for jupyter images, geoserver data and magpie's postgres and misbehaving in the data direcotory paths using a quick local test. I personnally use DATA_PERSIST_ROOT=/data/daccs and they have all been generated as /data/magpie_persists, /data/geoserver and /data/jupyterhub_user_data instead of their usual location nested under /data/daccs.

If the user defined instead their own DATA_PERSIST_ROOT in the env.local for an alternate storage location, all those derived variables do not consider the new value because they are already set.

This produces inconsistent directory storage locations and does not respect the expected behavior defined by the user.

To Reproduce

Steps to reproduce the behavior:

  1. Define DATA_PERSIST_ROOT=/my-data
  2. pavics-compose up -d

The volumes that make direct reference to ${DATA_PERSIST_ROOT} in docker-compose,yml (and component overrides) will have the correct path.
When using derived directory variables, they will still be under /data/....

Environment

Information Value
Server/Platform URL all that define DATA_PERSIST_ROOT with another value
Version Tag/Commit since 1.19.1
Related issues/PR #249
Related components magpie, postgres-magpie
Custom configuration any with custom DATA_PERSIST_ROOT and no explicit MAGPIE_PERSIST_DIR

Concerned Organizations

@matprov @tlvu

@fmigneault fmigneault added the bug Something isn't working label Dec 8, 2022
@tlvu
Copy link
Collaborator

tlvu commented Dec 8, 2022

In this case, I can only see 2 solutions:

1- keep DATA_PERSIST_ROOT but then have to remove GEOSERVER_DATA_DIR, JUPYTERHUB_USER_DATA_DIR, MAGPIE_PERSIST_DIR and just manually repeat their values as $DATA_PERSIST_ROOT/magpie_persist and so on with the risk of copy/paste typo as we keep retyping the same string

2- remove DATA_PERSIST_ROOT and keep the other 3, which opens the door to other apps to also put their data elsewhere without copy/paste error

In all cases we can document that to "move" /data, put a symlink on disk to where you want, avoid using the variable to move /data because it is error prone.

I have a préférence for option 2, what do you guys think?

@fmigneault
Copy link
Collaborator Author

fmigneault commented Dec 8, 2022

From these propositions, I personally prefer option 1 over 2 (though not optimal) because it preserves the original behavior (before the bug was introduced). The current changes unexpectedly broke my setup, and I don't want this to happen every time a new variable is added. Users should not have to worry about always adding new variables to support new features, as it makes feature integration and migration harder for them. When adding new variables, omitting them should preserve the original behavior.

I do not like the symlink fix approach. The target /data directory in my case happens to have a lot of other projects. It cannot be hacked around just for the needs of birdhouse. For the same reason, option 2 is a no-go for me, because I must be able to provide an alternate location.

I have 2 possible alternative solutions to propose:

using a post.env

There could be a post.env file that simply defines variables that depend on another variable in this fashion:

# post.env
export MAGPIE_PERSIST_DIR=${MAGPIE_PERSIST_DIR:-${DATA_PERSIST_DIR}/magpie_persist}

Basically, they define similar items to defaults.env only if not predefined by some env.local override.
The default.env would be reserved only for the case of hard-coded values (like it previously was).

The drawback is that it makes "defaults" lookup harder because there are 2 files to look for.
It might not be intuitive, and we must remember to define the variables in the right location based on context.

using eval and specific quotes syntax

Defining the following files:

# defaults.env
export TEST1=1
export TEST2=2
export TEST3='test-$TEST1-$TEST2'
# local.env
export TEST2=222
# pseudo-pavics-compose.sh ; simulates operations defined in pavics-compose.sh
. /tmp/daccs-test/defaults.env

echo Before [local.env]
echo TEST1: [${TEST1}]
echo TEST2: [${TEST2}]
echo TEST3: [${TEST3}]

. /tmp/daccs-test/local.env

export TEST1=$(eval "echo $TEST1")
export TEST2=$(eval "echo $TEST2")
export TEST3=$(eval "echo $TEST3")

echo Loaded [local.env]
echo TEST1: [${TEST1}]
echo TEST2: [${TEST2}]
echo TEST3: [${TEST3}]

Produces:

❯ ./pseudo-pavics-compose.sh
Before [local.env]
TEST1: [1]
TEST2: [2]
TEST3: [test-$TEST1-$TEST2]
Loaded [local.env]
TEST1: [1]
TEST2: [222]
TEST3: [test-1-222]

Essentially, defaults.env "looks and feels" the same, though the use of single quotes must be explicit to avoid interpretation of the nested variables at that point. The values of defaults.env are evaluated only after loading.

I think that would be a fair approach because defaults.env reads more naturally, and the single quote notation is already used in this manner (https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/pavics-compose.sh#L10-L79) to load them only when needed (https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/pavics-compose.sh#L111-L112).

A "warning" note could be added to defaults.env to indicate the use of those single quotes to allow this mechanism to take effect.

@tlvu tlvu closed this as completed in #272 Dec 23, 2022
tlvu added a commit that referenced this issue Dec 23, 2022
…ot take effect and warn when a dir in `EXTRA_CONF_DIRS` does not exist (#272)

## Fixes:

- Overriding `DATA_PERSIST_ROOT` in `env.local` do not take effect for
`JUPYTERHUB_USER_DATA_DIR`, `MAGPIE_PERSIST_DIR`, and
`GEOSERVER_DATA_DIR`.

These 3 vars will have to be delayed evaluated for override in
`env.local` to
  take effect.

  For a variable to be delayed evaluated, it has to be defined using
single-quote and be added to the list of `DELAYED_EVAL` in
`default.env`.

  If those steps are forgotten in `env.local`, it will still work since
`env.local` is the last file to be read. However those steps should not
be
  forgotten in any `default.env` for all components.

So the impact or burden is on the developpers to write their
`default.env`
  file properly, not on the users that only modify the `env.local` file.

All `default.env` files header have been updated with notice about this
new
  delayed evaluation feature.

  Fixes #270

## Changes:

- Warn when a dir in `EXTRA_CONF_DIRS` does not exist.

Most likely a typo in a new dir. Just warn and not exit directly to
avoid
leaving the entire platform down during an unattended autodeploy since
no
  one is around to take immediate action.

  Fixes #266
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants