
Maintain shared matflow version on CSF? #25

Open
gcapes opened this issue Jul 4, 2024 · 15 comments

gcapes commented Jul 4, 2024

Yuchen recently asked on slack for this to be updated.
I just wanted a discussion around what the best policy would be for which version people should use, and whether a central install is the best idea?

Adam has previously told me that you need to have matflow installed in the environment you're using for your task schema (if you want all the functionality). As such, you (sometimes) have to install it yourself anyway, even if you're running your workflow using a different installation (which has to be the same version number as in your task schema environment). It seems to me that if you have to install it anyway, there's not much advantage to having a central install. But maybe I've missed something?

> I don't think it's changed recently. It depends on the script_data_in and script_data_out action fields. If they are direct, this means the script will interact with matflow directly, so matflow must be installed. I will be adding a new format for script_data_in/out soon which will become the default, and this won't require matflow to be installed.

With that quoted text in mind, maybe it will make more sense to have a central/shared install, but perhaps they should have version names instead of matflow-dev or similar?


aplowman commented Jul 5, 2024

I think a shared installation/configuration is in general a good idea, and it should be kept and ideally updated semi-regularly. Updating the shared installation has two steps:

  1. Adding the new (single-file) matflow executable (e.g. the matflow-v0.3.0a128-linux artifact from the GH release https://github.com/hpcflow/matflow-new/releases/tag/v0.3.0a128) to the software/matflow_exes directory in our shared RDS space on CSF3; and updating the matflow-dev symbolic link in this directory to point to the new version.
  2. Triggering a new run of the matflow-environments action that generates an updated matflow_full_env packed conda environment; then uploading the resulting artifact matflow_full_env-linux.tar.gz to the software/matflow_conda_envs directory in our CSF3 shared RDS space, and following the instructions here for unpacking the conda environment.

With the shared installation, the users just need to have the software/matflow_exes directory on their path. They can also use previous shared versions by referencing the versioned files (e.g. matflow-v0.3.0a110-linux) since they also exist in that directory.
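The two update steps above might be sketched roughly as follows. This is only an illustration: the RDS mount point is a stand-in (here a temp directory so the sketch is self-contained), and the tarball-unpacking step is shown in comments because it depends on the downloaded artifact being present.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox stand-in for the shared RDS space on CSF3 (the real path differs).
RDS_ROOT="$(mktemp -d)"
EXE_DIR="$RDS_ROOT/software/matflow_exes"
ENV_DIR="$RDS_ROOT/software/matflow_conda_envs"
mkdir -p "$EXE_DIR" "$ENV_DIR"
VERSION="v0.3.0a128"

# Dummy stand-in for the single-file executable downloaded from the GH release.
printf 'placeholder\n' > "$EXE_DIR/matflow-$VERSION-linux"
chmod +x "$EXE_DIR/matflow-$VERSION-linux"

# Step 1: repoint the matflow-dev symlink at the new version (-n replaces an
# existing link; a relative target keeps the link valid if the dir is moved).
ln -sfn "matflow-$VERSION-linux" "$EXE_DIR/matflow-dev"

# Step 2 (sketch only): unpack the conda-pack tarball produced by the
# matflow-environments action, then run conda-unpack once inside it:
#   mkdir -p "$ENV_DIR/matflow_full_env"
#   tar -xzf matflow_full_env-linux.tar.gz -C "$ENV_DIR/matflow_full_env"
#   "$ENV_DIR/matflow_full_env/bin/conda-unpack"

echo "matflow-dev -> $(readlink "$EXE_DIR/matflow-dev")"
```

Because the versioned executables are kept alongside the `matflow-dev` symlink, repointing the link never removes access to older versions.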


gcapes commented Jul 23, 2024

Hi @aplowman
Do I need a greater level of access to this repo to manually trigger a run of the GH action? I don't see an option to manually run it:
(screenshot: the Actions page, with no option to trigger the workflow manually)

@aplowman

I've just given you write permission to matflow-environments. Does that help?


gcapes commented Jul 23, 2024

It sure does, thanks!
However, the action fails. It looks like the action needs updating to use more recent versions of checkout and download-artifact. I'm currently rerunning the workflow with newer versions of the actions.


gcapes commented Jul 23, 2024

Nope, that wasn't (only) it. Using both the modified and unmodified workflow files gives the same missing library error:
https://github.com/hpcflow/matflow-environments/actions/runs/10058928391/job/27803060520

I'm not sure what to do about this, but it looks like the action is forcing the use of Node 20 rather than Node 16 for both versions of the workflow file, and CentOS 7 (is that what the CSF still runs?) doesn't have the required library. I'm not surprised, because all the default software on CSF3 is ancient (indeed, CentOS 7 dates from 2014).

@aplowman

The issue is that we are using an old CentOS Docker image to ensure compatibility with the oldest possible version of GLIBC (as on the CSF). GitHub has recently updated its actions to use a more modern version of Node, which doesn't work with the old version of GLIBC. Adding the following to any jobs that use this Docker container (in this case, the conda-lock and conda-pack-linux jobs):

```yaml
env:
  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: "true"
  ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION: node16
```

seems to work for now. However, I believe this workaround will stop working at some (potentially soon) unspecified future date. For reference, here is how these changes look in one of the hpcflow-new workflows: https://github.com/hpcflow/hpcflow-new/blob/9ab185487773b3cb1e9708e49c45b4e8f0a0baff/.github/workflows/test.yml#L137. Note we may need to use version 3 of the checkout action.
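In context, those variables sit at the job level next to the container spec. A sketch follows; the job name and container image are taken from the comment above, but the other keys (`runs-on`, the checkout step) are illustrative, not copied from the real workflow:

```yaml
jobs:
  conda-lock:
    runs-on: ubuntu-latest
    container:
      image: centos:7   # old image, kept for GLIBC compatibility with the CSF
    env:
      ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: "true"
      ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION: node16
    steps:
      - uses: actions/checkout@v3   # v3 may be needed alongside node16
```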


gcapes commented Jul 23, 2024

Thanks - trying that now.


gcapes commented Jul 24, 2024

Success! The action ran and built the artifacts, which work on CSF3 for the demo workflow - see hpcflow/matflow-new#256


gcapes commented Aug 7, 2024

In discussion recently with @aplowman it seems the main use-case for keeping an up-to-date version of MatFlow and compatible environment is to lower the barrier to entry for people to run demo workflows.

However, it also came up that while old versions of MatFlow are kept, the shared environment is replaced with the latest version each time, which is only guaranteed to work with the latest version of MatFlow. So keeping these old versions of MatFlow may be of limited use.

For clarity to the user, I would propose one of a couple of approaches:

  1. Maintain a number of MatFlow versions with compatible shared environments, loaded via module files like other software on CSF3. This might be a lot of work.
  2. Have people install MatFlow into the Python virtual environment used for a given workflow, and activate that same venv when submitting workflows. This is what I have been doing, though admittedly only using Python and Abaqus, and it seems to work well. It requires more effort and understanding from the user, which might be a good thing in the long term.
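Approach 2 above might look something like the following sketch. The project path is a stand-in (a temp directory here, so the sketch runs anywhere), and the package names in the comments are assumptions to be adjusted per workflow:

```shell
#!/usr/bin/env bash
set -euo pipefail

# One venv per project, holding MatFlow plus the task schemas' dependencies.
PROJECT_DIR="$(mktemp -d)"          # stand-in for a real project dir on CSF3
python3 -m venv "$PROJECT_DIR/.venv"
source "$PROJECT_DIR/.venv/bin/activate"

# Install MatFlow and whatever Python packages the task schemas need into
# this one venv (run once per project; package names are assumptions):
#   pip install matflow-new damask-parse formable defdap

# Submit from the same activated venv, so the MatFlow version used for
# submission matches the one imported inside the task-schema environments
# (submit command shown as an assumption):
#   matflow go workflow.yaml

python -c "import sys; print(sys.prefix)"   # prints the venv path when active
```

The key point is that submission and the task-schema environments share one interpreter, so the "same version number" constraint is satisfied by construction.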


JQFonseca commented Aug 7, 2024 via email


JQFonseca commented Aug 7, 2024 via email


gcapes commented Aug 8, 2024

I also prefer option 2. I just tend to use venv because that's what I'm familiar with, but conda could be used if someone prefers it or needs to package other (non-Python) software too (I think that's conda's selling point?).
I'll write up how to do this.


gcapes commented Sep 25, 2024

I've been thinking about this, and we should probably schedule a meeting to talk it through, but here are my thoughts:

  • There is an envs_CSF3.yaml file in a shared folder on the CSF, containing some shared MatFlow environments. Many of them only contain Python packages, but some are for other software and don't have Python in the environment:

    • damask
    • matlab

    These work with any version of MatFlow, and as such are the most useful.

  • The other named environments are needed for the various task schemas to work, but could each activate the same Python virtual environment, if the packages are installed there:

    • damask_parse
    • formable
    • defdap
    • python

    These four MatFlow environments, which contain Python packages, also use MatFlow to pass variables around, so each needs to contain the MatFlow Python package, at the same version as was used to submit the workflow.

I propose having a template environments file, which people modify to use their own Python virtual environment. This could either be hard-coded to activate the environment, like I did here:

```yaml
- name: damask_parse_env
  setup: |
    module purge
    source /mnt/iusers01/support/mbexegc2/scratch/laura_gonzalez/laura_ve_loading/.venv/bin/activate
    ...
```

or, possibly better (certainly more robust), each of the Python-containing environments could work the same way as the python_env MatFlow environment, which doesn't activate a particular venv but just uses the currently active one:

```yaml
- name: python_env
  executables:
    - label: python_script
    ...
```

That way users wouldn't need to keep editing their config files for new projects.
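As a minimal sketch, a venv-agnostic entry for one of the Python-containing environments might look like this. The executables section follows the pattern of the existing python_env snippet, but the exact keys and command are assumptions to be checked against the real envs_CSF3.yaml:

```yaml
- name: damask_parse_env
  setup: |
    module purge
    # No hard-coded venv: whichever venv was active at submission is used.
  executables:
    - label: python_script
      instances:
        - command: python "<<script_name>>"
          num_cores: 1
          parallel_mode: null
```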


gcapes commented Sep 26, 2024

@JQFonseca @aplowman @cjfullerton Do you think we could set up a meeting in the next week or so to discuss this?

@cjfullerton

Hi @gcapes - sure.

My outlook calendar is up to date, so feel free to use that.

Tuesday 9-12 & 3-4, and Friday 9-12 are the best slots for me.

Can we do this in 25 mins or do we need 50 mins?
