
Refactor WMAgent initialization #1410

Merged (3 commits) on Sep 26, 2023

Conversation


@todor-ivanov todor-ivanov commented Aug 23, 2023

fixes dmwm/WMCore#11627

READY

This PR introduces the following modifications:

  • Reads every parameter needed for the agent initialization process only from the WMAgent.secrets file and creates the config file based solely on those values (no parameters are read from the command line).

  • Removes the stage of initializing things in the container at runtime (like copying the WMAgent.secrets file from the host into the docker image on every start and working only with the copy).

  • Removes the HostAdmin mount point and leaves the container to interact directly with the WMAgent.secrets file provided at the host.

  • Improves the WMAgent.secrets validation, parsing, and loading mechanism.

  • Improves the container runtime wrapper to properly create current links to the actual wmagent tag to be used in the host mount area.

  • Adds a file with definitions of functions common between manage and run.sh.

  • Unifies the initialization steps validation between manage and run.sh into a single mechanism.

  • Splits actions between the run.sh and manage scripts such that:

    • run.sh takes care of creating and keeping track of all initialization and validation markers, while
    • manage acts solely as an executor of management commands and only checks whether a given management action is allowed.
  • Adds fine-grained initialization markers for every initialization step, rather than for full areas as before.

  • Removes most hard-coded paths and relies only on the relative paths created in the Dockerfile, stemming from a single $WMA_ROOT_DIR.

  • Exposes everything from the runtime on a proper (extended) set of mount points.

  • The above is done in parallel with a reorganization of the agent's directory structure, such as:

    • Removing extra nested levels. We no longer need
      * /data/srv/wmagent/current/install/{wmagent,mysql,couchdb} or
      * /data/srv/wmagent/current/config/{wmagent,mysql,couchdb}
      * /data/srv/wmagent/current/logs/{wmagent,mysql,couchdb}
      * /data/srv/wmagent/current/state/{wmagent,mysql,couchdb}
      because now every service is supposed to run in a separate container with its own mount point at the host, so we need the following structure instead:
      * /data/srv/wmagent/current/{install,config,logs,state} and
      * /data/srv/couchdb/current/{install,config,logs,state} and
      * /data/srv/mysql/current/{install,config,logs,state}
    • Adopting a common directory structure for all docker containers as described above, to be applied later to the wmagent-couch and wmagent-mysql containers as well.
  • Improves the variable naming convention by adding a WMA_ prefix to anything related to the agent container and environment.

  • Simplifies some manage functionalities and fixes others.

  • Adds minimal help to the manage script.

  • Adds automatic initialization of the whole environment at login time, which improves operations; e.g. the manage script usage is now as simple as shown in [1].

  • Removes management functions related to database manipulation from the manage script and prepares them to be adopted in the manage scripts of the relevant CouchDB and MariaDB docker containers. (I've already prepared a big portion of them, so half of the future work is done as well.)

  • Removes socket connections to the SQL database and connects only through 127.0.0.1.

  • Improves status checks for all databases, both relational and CouchDB.

  • Adds direct checks of the databases for both MySQL and Oracle, rather than, as of now, simply tracking the MySQL server process (its PID file is not reachable anymore).

    • Adds marking of the SQL database with the current WMAgent build id and hostname in order to prevent a container-to-database mismatch.
    • Adds a full schema validation at initialization time to protect database integrity.

[1]

$ docker exec -it wmagent bash
(WMAgent-2.2.3.2) [cmst1@:current]$ manage status 
----------------------------------------------------------------------
Status of services:
+ Status of CouchDB:
++ {"couchdb":"Welcome","version":"3.3.2","git_sha":"11a234070","uuid":"72e2936520d0a6104b7cfb64c986c6ab","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"]}

+ Status of MySQL
Uptime: 139112  Threads: 1  Questions: 5744  Slow queries: 0  Opens: 151  Flush tables: 2  Open tables: 90  Queries per second avg: 0.041
+ MySQL connection is OK!

----------------------------------------------------------------------
Status of WMAgent components:
ERROR: This agent is not fully initialized. Cannot use it.
----------------------------------------------------------------------
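The reorganized per-service host layout described in the bullets above could be sketched as follows. This is a minimal illustration only: `BASE` stands in for the host mount area, and the handling of the `current` link is an assumption about what the runtime wrapper does, not the PR's exact code.

```shell
#!/bin/bash
# Sketch: one flat {install,config,logs,state} tree per service
# container, with "current" always pointing at the tag in use.
BASE=/tmp/dockerMount/srv   # stand-in for /data/dockerMount/srv
WMA_TAG=2.2.3.2

for service in wmagent couchdb mysql; do
    for d in install config logs state; do
        mkdir -p "$BASE/$service/$WMA_TAG/$d"
    done
    # -sfn replaces any pre-existing link, so re-running with a new
    # tag atomically repoints "current"
    ln -sfn "$BASE/$service/$WMA_TAG" "$BASE/$service/current"
done

ls "$BASE/wmagent/current"
```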

@todor-ivanov todor-ivanov changed the title Refactor WMAgent initialization && Fix WMA_TAG && Rename WMA_ROOT_DIR to HOST_MOUNT_DIR && Remove HOSTADMIN mount point. Refactor WMAgent initialization Aug 24, 2023
@todor-ivanov (Contributor Author)

hi @amaltaro, as last time, I have no rights in this repository to ask for a review; I do not know why. But anyway, things are starting to take shape now. Feel free to look at the code at your convenience.

@amaltaro (Contributor)

@todor-ivanov please consider adding a clear description of this pull request.
Once we settle with this development, please also update this wiki accordingly:
https://github.com/dmwm/WMCore/wiki/WMAgent-in-Docker

@todor-ivanov (Contributor Author)

hi @amaltaro, I will add the description shortly.
Indeed, the documentation in this wiki is quite old and actually relates to the really old image we had, based on the RPM packages. I can actually start modifying it immediately.

@todor-ivanov (Contributor Author)

@amaltaro , Sorry it took a while but I am done with the update of this PR description now. Please read it again.

@todor-ivanov (Contributor Author)

@amaltaro, @vkuznet, @khurtado

The code is ready and tested.
Please feel free to start reviewing it at your convenience.



### List of all config/.init* files
* config/.initActive
Collaborator

Todor, why do you name all init files with a dot prefix, making them hidden? Is there any concern about their visibility? Second question: where are these files, and what is their structure? Which env and other setup parameters do they set? Finally, is there any order to their initialization?

Contributor Author

Well, it has been like this before. And mostly, having them hidden prevents them from getting mixed up with the rest of the config and stateful files mounted at the host.

Contributor Author

where are these files?

Previously those were scattered all over the WMAgent deployment area, depending on which step they referred to. Now they are all concentrated in the config directory, meaning one can find all of them there:

  • From inside the container at /data/srv/wmagent/<WMA_TAG>/config/.init*
  • From the host at /data/dockerMount/srv/wmagent/<WMA_TAG>/config/.init*

@todor-ivanov (Contributor Author) Sep 5, 2023

Which env and other setup parameters do they set?

They hold only one variable, $WMA_BUILD_ID, which is the WMAgent image unique identifier set at build time.

Finally, is there any order in their initialization?

Yes, there is. I think I have listed them in the README in the order of initialization; if I have not, I will fix that.

```
[cmst1@unit02:current]$ docker kill wmagent

[cmst1@unit02:current]$ rm /data/dockerMount/srv/wmagent/<WMA_TAG>/config/.init*
```
Collaborator

It is unclear to me why the .init files should be removed.

Contributor Author

The general mechanism is as follows:
On every container restart, the run.sh script makes a full check of all .init* files, verifying that they:

  • are present and
  • contain the same wma_build_id
    If one of them is either missing or holding a different wma_build_id, it means the step has never been properly completed and needs to be performed during this restart (along with any other step depending on it).

The same applies to forced initialization: if one would (for any reason) want to repeat an initialization step together with all its dependent steps, one just needs to delete the proper .init file.
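A minimal sketch of this marker mechanism, assuming hypothetical variable names and paths (the helper names `_init_valid`/`_init_done` are illustrative, not the PR's exact code):

```shell
#!/bin/bash
# Sketch of the per-step initialization markers: a step counts as done
# only if its marker file exists AND holds the current image build id.
WMA_BUILD_ID="build-1234"          # set at image build time in the real setup
WMA_CONFIG_DIR=/tmp/wma-config    # stand-in for the real config directory

# Returns 0 if the step marker exists and matches this build id.
_init_valid() {
    local marker="$WMA_CONFIG_DIR/.init$1"
    [[ -s $marker && $(cat "$marker") == "$WMA_BUILD_ID" ]]
}

# Marks a step as completed for this exact image.
_init_done() {
    echo "$WMA_BUILD_ID" > "$WMA_CONFIG_DIR/.init$1"
}

mkdir -p "$WMA_CONFIG_DIR"
if ! _init_valid Agent; then
    echo "Step 'Agent' missing or stale - re-running initialization"
    # ... perform the actual initialization step here ...
    _init_done Agent
fi
```

Deleting a marker file (or rebuilding the image, which changes the build id) makes `_init_valid` fail, which is exactly what forces the step to be repeated on the next restart.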

docker/pypi/wmagent/Dockerfile (resolved)
@todor-ivanov (Contributor Author)

Hi @vkuznet,
I added the requested information to the README. You may find it in the latest version of the file: https://github.com/dmwm/CMSKubernetes/blob/7f775b51fd0cf810f2ee9cb6ca692634afbf214e/docker/pypi/wmagent/README.md Please go ahead and take another look.

@todor-ivanov (Contributor Author) commented Sep 8, 2023

Hi @amaltaro, @vkuznet, @khurtado,

A major update on this:
I went a little bit further in testing this setup.
I was brave and decided to use this containerized setup for one of my next debugging sessions. So I deployed the docker image to vocms0260, using the basic database setup provided with the other two temporary PRs I created for CouchDB and MariaDB. Then I connected the agent to cmsweb-testbed.cern.ch, and this way I tested not only the initialization process and all the scripts from this PR, but also successfully started injecting work into the agent.
Here is one example:

https://cmsweb-testbed.cern.ch/reqmgr2/fetch?rid=tivanov_TaskChain_Prod_SiteBlockedList_v1_230908_100533_7777

P.S. I keep the JobCreator down, as I need this workflow blocked and not progressing any further than WMBS, so no condor jobs will be created out of it; but I have no doubt that once I release this workflow it will go through successfully.

@vkuznet (Collaborator) left a comment

Todor, I went through this PR multiple times and it is very difficult to follow due to the large number of changes. I mostly paid attention to the high-level logic, which seems fine to me. That said, I still do not understand why you removed the couch and maria DB configs from the pypi/wmagent/etc area, as they are needed to start the corresponding DBs. My feedback on this PR is the following: if you managed to run it, we should merge and iterate on missing or misbehaving parts, managing them in smaller chunks. But if @amaltaro insists on a detailed review of every single line, then I bet it will require a few hours of careful screening, and we may still easily miss many things due to the large impact/changes to lots of different places. The more productive approach, it seems to me, is to apply it to the WMA deployment and verify how it goes.

@todor-ivanov (Contributor Author)

Thanks @vkuznet for your additional look into this PR.

I agree this is a big one; yes, it does change a lot of things, and yes, it does work well. I am already using this setup for my constant debugging on vocms0260. People may log in there, connect to the container, and play around so they can get used to the environment.

As for:

I still do not understand why did you remove couch and maria DB configs from pypi/wmagent/etc area as they are needed to start corresponding DB

Those are not needed in the wmagent image any more. They are supposed to be moved into the respective CouchDB and MariaDB images, which are the containers driving those databases. Those two configs are mostly related to the respective server's configuration and access rights, and hardly to the databases themselves. The latter are to be created during the initialization process, and the pieces exercising it are part of the wmagent code itself.

This is true at the very least for CouchDB. For the SQL wmagent database, there are two different cases covered:

  • Oracle - predefined instances at the server, managed per user and identified during connection (no direct equivalent of the MySQL database per se; the closest one should be a tablespace, where the wmagent schema should be created),
  • MariaDB - the container driving the SQL server needs to create the database at startup.

So, long story short: other than knowing which scripts from WMCore/bin are to be called at init time (through the command manage execute-agent ...), nothing more is needed in this container/image.

@amaltaro (Contributor) left a comment

Todor, in addition to the comments made along the code, I wanted to leave more general questions/concerns here:

  • Please make a PR against the WMCore repo updating the 2 config/secrets files under: https://github.com/dmwm/WMCore/tree/master/deploy
  • Please create a new WMCore GH issue reporting which Oracle functionalities are pending implementation (please list any pending MariaDB functionalities as well)
  • As mentioned along the code, the new relational database table - for container management - should be defined in the WMCore repository, likely under: https://github.com/dmwm/WMCore/tree/master/src/python/WMCore/Agent/Database
    • We can address it in 2 ways: update this PR and create the relevant WMCore PR; or create yet another WMCore ticket to re-refactor this. I would prefer the former, but given the chances of injecting issues into this PR, maybe we should stick to the latter.
    • In addition, I don't think it is a good idea to use case-sensitive table/column names, as both Oracle and MariaDB are not case sensitive. The best is to use underscores in the variable names, e.g. init_param.
  • Concerning the manage-common.sh script: right now (RPM) we need to source the "init.sh" agent file in order to get sqlplus/mysqldump/etc. in the environment. If a similar step is needed for the container, then please update the top of that script with the needed information.
  • As previously discussed, if it is simple enough, please provide a data flow diagram (even in ASCII is fine) with the order of scripts executed in a standard container initialization.
  • Is there any functionality currently present in https://github.com/dmwm/WMCore/blob/master/deploy/deploy-wmagent.sh which is not yet in run.sh? I guess one of those is the addition of HPC resources, but there could be others. I would suggest making a GH issue for this.

docker/pypi/wmagent/bin/manage (resolved)
docker/pypi/wmagent/bin/manage-common.sh (4 resolved review threads)

echo "Start sleeping now ...zzz..."

while true; do sleep 10; done
Contributor

Is this line just so the container keeps running? If that is the case, is there any reason to keep the start-agent line below?

Contributor Author

Yes, this line is just for keeping the container alive. (Which, BTW, leads to another change in wmagent_docker_run.sh, but I will fix that one separately.)

Regarding the line below - we had better remove it indeed.

Contributor Author

@amaltaro, I've always been under the impression that the names of those two scripts should be swapped. And now, as I look at the ASCII diagram showing the script relations, this impression only solidifies. But I did not proceed with the renaming, because it would smear all the changes and the review would become practically impossible. I may suggest that once we are ready for merging, as a last step, I swap the names of those two scripts in a single commit separate from the rest. What do you think?

Contributor Author

@amaltaro , Sorry if it was not clear enough - I was talking about swapping the names of: init.sh <--> run.sh .
Such that run.sh gets back as a default entry point for Docker and just calls init.sh and goes into an endless loop, while init.sh drives the whole initialization process.
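A rough sketch of that swapped layout (the file contents and paths here are illustrative assumptions, not the PR's actual scripts):

```shell
#!/bin/bash
# Sketch: run.sh as the Docker entrypoint, delegating all initialization
# to init.sh and then idling so the container stays up.
mkdir -p /tmp/wma-sketch

cat > /tmp/wma-sketch/run.sh <<'EOF'
#!/bin/bash
# Docker entrypoint: drive the full initialization once, then idle so
# the container stays alive and can be managed interactively.
/tmp/wma-sketch/init.sh || exit $?
echo "Start sleeping now ...zzz..."
while true; do sleep 10; done
EOF

cat > /tmp/wma-sketch/init.sh <<'EOF'
#!/bin/bash
# Drives the whole initialization process (validation, markers, etc.).
echo "initialization done"
EOF

chmod +x /tmp/wma-sketch/run.sh /tmp/wma-sketch/init.sh
```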

Contributor

@todor-ivanov apologies for the delay. You have a better view of this than me, so please go ahead if you consider it better. If you do so though, please make sure the README file reflects it as well.

docker/pypi/wmagent/install.sh (resolved)
@@ -176,7 +189,7 @@ echo "Start $stepMsg"
chmod +x $WMA_DEPLOY_DIR/deploy/renew_proxy.sh $WMA_DEPLOY_DIR/deploy/restartComponent.sh

crontab -u $WMA_USER - <<EOF
55 */12 * * * $WMA_DEPLOY_DIR/deploy/renew_proxy.sh
55 */12 * * * $WMA_MANAGE_DIR/manage renew-proxy
Contributor

Oh, this is where the continuous proxy renewal process is defined.

Contributor Author

yep :)

Contributor Author

And the reason why it lives in install.sh instead of run.sh is that it is predefined logic, independent of any runtime parameter. So there is no need for it to be part of the init process at all; it is easily definable at build time. While you are looking at the rest of the PR, if you notice more logic that is independent of any runtime or initial configuration parameters, I suggest you point it out and we can make the effort to move it here as well.

docker/pypi/wmagent/run.sh (resolved)
@@ -52,15 +52,15 @@ done

dockerOpts=" --network=host --progress=plain --build-arg WMA_TAG=$WMA_TAG "

docker build $dockerOpts -t wmagent:$WMA_TAG -t wmagent:latest .
docker build $dockerOpts -t local/wmagent:$WMA_TAG -t local/wmagent:latest .
Contributor

Please educate me on why prefixing it with local?

Contributor Author

I changed this line based on experience with the couchdb container, because there is an upstream repository called couchdb, which is actually the official one.

Same here - during local builds and tests, it would be good to be able to distinguish whether one runs from cmsweb/wmagent or local/wmagent.

@todor-ivanov (Contributor Author) commented Sep 14, 2023

Hi @amaltaro,

Thanks for this Review. I think I have addressed all requested code changes, but let me try to answer your general comments one at a time:

Please make a PR against the WMCore repo updating the 2 config/secrets file under: https://github.com/dmwm/WMCore/tree/master/deploy

Here it is: dmwm/WMCore#11717

Please create a new WMCore GH issue reporting what are the Oracle functionalities that are pending implementation (list any MariaDB pending functionalities as well please)

Here it is: dmwm/WMCore#11720

As mentioned along the code, the new relational database table - for container management - should be defined in the WMCore repository, likely under: https://github.com/dmwm/WMCore/tree/master/src/python/WMCore/Agent/Database

Here I'd choose your second suggestion: to create a new issue just for moving the metadata table creation under the WMCore repository.

Here it is: dmwm/WMCore#11721

Concerning the manage-common.sh script. Right now (RPM) we need to source the "init.sh" agent file in order to get sqlplus/mysqldump/etc in the environment. If a similar case is needed for the container, then please update the top of that script with the needed information.

With this implementation, init.sh is completely obsolete. Now the manage and run.sh scripts load the common definitions from manage-common.sh on every execution, and the set of environment variables comes mostly from three different places, as explained in the README:

  • variables set at build time by the Dockerfile
  • variables set at runtime by sourcing $WMA_ENV_FILE
  • variables set at runtime, defined inside the manage-common.sh file itself

Everything is supposed to be already in the environment, so as long as one sticks to using the manage script for executing commands, it will all be fine. There could be a few cases where one would want to execute some of those common functions directly in the shell instead of through manage, but this is fairly easy as well - the WMAgent.secrets file just needs to be loaded like this:

(WMAgent-2.2.3.2) [cmst1@unit02:current]$ source $WMA_DEPLOY_DIR/bin/manage-common.sh
(WMAgent-2.2.3.2) [cmst1@unit02:current]$ _load_wmasecrets
(WMAgent-2.2.3.2) [cmst1@unit02:current]$ _renew_proxy

This is also explained in the README file.
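For illustration, here is a minimal sketch of how such a secrets file could be parsed and validated. The helper name `_load_wmasecrets_sketch`, the file path, and the required-parameter list are assumptions for illustration, not the PR's exact implementation.

```shell
#!/bin/bash
# Sketch: load KEY=VALUE pairs from a WMAgent.secrets-style file and
# fail loudly if a required parameter is missing.
WMA_SECRETS_FILE=/tmp/WMAgent.secrets

cat > "$WMA_SECRETS_FILE" <<'EOF'
# example secrets file (values are dummies)
MYSQL_USER=wmagent
COUCH_PORT=5984
EOF

_load_wmasecrets_sketch() {
    # export every KEY=VALUE pair, skipping comments and blank lines
    local line
    while IFS= read -r line; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        export "$line"
    done < "$WMA_SECRETS_FILE"
    # validation: every required parameter must be set and non-empty
    local req val
    for req in MYSQL_USER COUCH_PORT; do
        eval "val=\$$req"
        [ -n "$val" ] || { echo "Missing required parameter: $req" >&2; return 1; }
    done
}

_load_wmasecrets_sketch && echo "secrets loaded"
```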

As previously discussed, if it is simple enough, please provide a data flow diagram (even in ASCII will be fine) with the order of scripts executed in a standard container initialization.

Here it is:

Docker entrypoint: init.sh

+--------------------+     +--------+     +------------------+
|    run.sh          | --> |init.sh | <-- | manage-common.sh |
| (goes to inf loop) |     |        |     +------------------+
+--------------------+     |        |                     /
                           |        |     +--------+     /
                           |        | --> | manage | <--+
                           |        |     | <Step> |
                           |        |     +--------+
                           |        | --> .init<Step>
                              ....
                           |        |
                           |        |     +--------+
                           |        | --> | manage |
                           |        |     | <Step> |
                           |        |     +--------+
                           |        | --> .init<Step>
                           +--------+


And it was also added to the README file as well.

Is there any functionality currently present in https://github.com/dmwm/WMCore/blob/master/deploy/deploy-wmagent.sh which is not yet in run.sh? I guess one of those is the addition of HPC resources, but there could be others. I would suggest to make a GH issue for this.

Here it is: dmwm/WMCore#11722

Alan, please feel free to look into this PR again, while I am working on preparing the new set of issues.

@todor-ivanov
Copy link
Contributor Author

todor-ivanov commented Sep 14, 2023

@amaltaro
I think I have addressed all of your requests for changes, and I have updated my previous comment to reflect the newly created issues and the new info, so please read it again. And feel free to take another look at the end result at your convenience.
Thanks!

BTW: there was only one remark for which I could not find what it relates to:

In addition, I don't think it is a good idea to use case sensitive table/column names, as both Oracle and MariaDB are not case sensitive. The best is to use underscore in the variable names, e.g. init_param

Following the same reasoning, I intentionally used only table and column names with underscores instead of case-sensitive naming. If you have found any such names, please let me know so I can fix them, but I personally double-checked and found none.

  • The table name is wma_init
  • The column names are as follows:
    • initparam
    • initvalue
  • The values preserved are as follows:
    • wma_build_id
    • hostname
    • is_active

@amaltaro (Contributor)

@todor-ivanov I don't understand your statement above. Did you keep:

The column names are as follows:
initparam
initvalue

or not? My suggestion was to actually rename them to init_param and init_value.

@todor-ivanov (Contributor Author) commented Sep 15, 2023

My suggestion was to actually rename them to init_param and init_value.

That's easy enough. I thought you were left with the impression that those names were following some case-sensitive naming convention, e.g. initParam. Never mind - I'll make the change in a minute.
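For reference, a hypothetical sketch of the resulting bookkeeping table, using the underscored names settled on in this thread (wma_init, init_param, init_value). The column types, the placeholder values, and the client invocation mentioned in the comment are assumptions, not the PR's actual schema.

```shell
#!/bin/bash
# Sketch: build the DDL/seed statements for the wma_init table that
# marks the database with the agent's build id and hostname.
wmaInitSQL=$(cat <<'EOF'
CREATE TABLE wma_init (
    init_param VARCHAR(100) NOT NULL,
    init_value VARCHAR(100) NOT NULL,
    PRIMARY KEY (init_param)
);
INSERT INTO wma_init (init_param, init_value) VALUES ('wma_build_id', '<WMA_BUILD_ID>');
INSERT INTO wma_init (init_param, init_value) VALUES ('hostname', '<hostname>');
INSERT INTO wma_init (init_param, init_value) VALUES ('is_active', 'true');
EOF
)

# In the container this would be fed to the SQL client, e.g. through
# the manage db-prompt command (exact invocation is an assumption).
echo "$wmaInitSQL"
```

On restart, comparing the stored wma_build_id/hostname against the container's own values is what detects a container-to-database mismatch.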

fi
[[ $errVal -eq 0 ]] && echo $WMA_BUILD_ID > $WMA_CONFIG_DIR/$service/.dockerInit
done
# TODO: This is to be removed once we decide to run it directly from the deploy area
@todor-ivanov (Contributor Author) Sep 15, 2023

@amaltaro, I'd like to draw your attention to these two lines below. To me they are obsolete. I kept them just for legacy reasons, but the way things are currently organized, there is no need to copy the manage script from $WMA_DEPLOY_DIR to $WMA_MANAGE_DIR; both are listed in the agent's PATH, so the script can easily be run from either location. The only difference is that if we use it from $WMA_DEPLOY_DIR, it is run from the docker image and never shows up in the host mount area ($WMA_MANAGE_DIR is currently equivalent to $WMA_CONFIG_DIR). The current situation gives us a little bit of freedom, though: we have the ability to test new changes or fix bugs in the manage script without the need to build a new container every time. And this was the only reason I did not remove this step at once. I'd like to hear your opinion on this one as well.

@todor-ivanov (Contributor Author)

@amaltaro @vkuznet @khurtado, yet another report of tests here:

As you know, I am already using this setup at vocms0260 for debugging. So far I have been able to test only the correct population of database tables and major functionalities such as workflow creation and synchronization between the central and local work queues, etc.

But today I also released the two paused workflows from the local queue and let them propagate down to condor. And here is the result:

cmst1@vocms0260:/data/dockerMount/srv/wmagent/2.2.3.2/install $ condor_q 


-- Schedd: vocms0260.cern.ch : <188.185.29.211:4080?... @ 09/16/23 03:01:40
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI    SIZE CMD
8209.0   cmst1           9/16 01:40   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v1_230908_100533_7777-Sandbox.tar.bz2 1 0
8209.1   cmst1           9/16 01:40   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v1_230908_100533_7777-Sandbox.tar.bz2 2 0
8209.2   cmst1           9/16 01:40   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v1_230908_100533_7777-Sandbox.tar.bz2 3 0
8209.3   cmst1           9/16 01:40   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v1_230908_100533_7777-Sandbox.tar.bz2 4 0
8209.4   cmst1           9/16 01:40   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v1_230908_100533_7777-Sandbox.tar.bz2 5 0
8209.5   cmst1           9/16 01:40   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v1_230908_100533_7777-Sandbox.tar.bz2 6 0
8210.0   cmst1           9/16 03:01   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v3_230915_234534_345-Sandbox.tar.bz2 7 0
8210.1   cmst1           9/16 03:01   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v3_230915_234534_345-Sandbox.tar.bz2 8 0
8210.2   cmst1           9/16 03:01   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v3_230915_234534_345-Sandbox.tar.bz2 9 0
8210.3   cmst1           9/16 03:01   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v3_230915_234534_345-Sandbox.tar.bz2 10 0
8210.4   cmst1           9/16 03:01   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v3_230915_234534_345-Sandbox.tar.bz2 11 0
8210.5   cmst1           9/16 03:01   0+00:00:00 I  190000  0.0 submit.sh tivanov_TaskChain_Prod_SiteBlockedList_v3_230915_234534_345-Sandbox.tar.bz2 12 0

Total for query: 12 jobs; 0 completed, 0 removed, 12 idle, 0 running, 0 held, 0 suspended 
Total for all users: 12 jobs; 0 completed, 0 removed, 12 idle, 0 running, 0 held, 0 suspended

I couldn't be happier... Hope you like it as well.

@amaltaro (Contributor)

@todor-ivanov we have 38 commits in this PR, which will make our life miserable if we need to go through the history of these files. On the other hand, I'd rather not risk squashing commits and losing developments along the way. Please let me know how you would like to proceed with the merge of this PR.

Note that it is not ready yet and there is one change that you wanted to make (swap run.sh and init.sh). So let us wait for that one.

@todor-ivanov (Contributor Author) commented Sep 25, 2023

Thanks for taking yet another look @amaltaro

  • First bullet. About:

we have 38 commits in this PR, which will make our life miserable if we need to go through the history of these files. On the other hand, I'd rather not risk squashing commits and losing developments along the way

Well, I do not think there is any groundbreaking history in this development that would be lost, or any previous state we would be unable to restore if needed. So I went ahead and squashed those 38 commits into 2: one for code changes and one for README updates.

  • Second, I am about to swap the names of those two files, run.sh and init.sh, because this will make the startup process clearer and all the containers' entrypoints will be aligned.
  • Third, while working on the above I found why all Condor jobs were failing with error code 99303: it was due to a path misalignment between condor and the docker container. I fixed that with my latest commit. I will squash the rest of the code commits later.

@amaltaro (Contributor)

@todor-ivanov okay, please let me know when it's ready for a final look.

@todor-ivanov (Contributor Author)

hi @amaltaro
Everything is now ready and tested. I have left the 3 new commits separate, so you may take a final look. Once you approve them, I can squash everything again to only 2 commits before the merge.

@amaltaro (Contributor)

@todor-ivanov, regarding your last commit to renew the proxy once it is below 24h: please change it to 7 days.

@todor-ivanov (Contributor Author)

hi @amaltaro

for your last commit to renew proxy once it is below 24h. Please change it to 7 days.

Done

@amaltaro (Contributor)

Thanks Todor. Please squash the commits accordingly and we can proceed with merging it.

… to HOST_MOUNT_DIR && Remove HOSTADMIN mount point.

Typo

Add all files needed for the new initialisation logic.

Remaking the interaction between run and manage scripts

Move *-new files as defaults && Remove unneded files.

Move _renew_proxy to manage-common && Add manage renew-proxy command && Fix broken _parse_wmasecrets call && Fix broken aliases && Strip login variables and exports from WMA_ENV_FILE

Remove redundant parameter to db-prompt && Fix password typos

Fix missing return value from init-agent

Add database status functions

Fix forgotten return from bad WMAgent.secrets parsing

Start building the full database checks mechanisms before agent startup

Move _status_of_couch to manage-common && call it from run.sh

Add rough, and currently broken, mechanism for databse validation

Add check for empty database  for mysql && reduce wmagent-docker* paramters && improve all functions' printout messages

Implement clean-mysql && Reduce global definitions in manage script

Remove wmaInitSqlDB when cleaning the whole agent as well.

Reorder database checks and agent activation && Fix COUCH_HOST to be read from WMAgent.secrets && Improve printouts

Write WMA_BUILD_ID and hostname at the sql database.

Implement _sql_dbid_valid

Move wmagent database creation to init-agent

Stop cleaning databases on WMAgent.secrets file change && Fix RUCIO_HOME && Fix default wmaSchemaFile

Implement a single function _exec_mysql to use for all calls to mysql client

Fix _renew_proxy

Switch -v to -t option in install.sh.

Remove unneeded check_services function && Start the agent on a restart upon full intialization

Cleanup manage script from old comment lines

Fix sql schema dump options

Add better pager options to the mysql db-prompt

Addressing review comments

Fix renamed variables from MYSQL_ADDRESS to MYSQL_HOST

Rename wma_init columns

Fix broken tree at the host for condor jobs.

Fix broken tree at the host for condor jobs one level down - at /data/srv/wmagent instead of /data/srv .
Update README

Update README

Update README

Update README

Update README

Adding ASCII diagram to the README file && Cleaning comments && Removing a redundant base check.

Update README
Renew myProxy 7 days before expiration

Renew myProxy 7 days before expiration
@todor-ivanov (Contributor Author)

hi @amaltaro

Please squash the commits accordingly and we can proceed with merging it.

Done. I left the commit swapping run.sh with init.sh separate, though. It is a significant change which is worth having in its own commit. I actually failed one attempt to move it up over the Update README commits and decided not to push too hard at the very end of the whole process. So let them be as they are (the 3 commits, I mean).

@amaltaro (Contributor)

Thanks Todor.


Successfully merging this pull request may close these issues.

Refactor/reorganize agent initialization and manage actions
3 participants