@mart-r mart-r commented Jul 2, 2025

This PR aims to add install bundles to releases.

The install bundles it will (hopefully) include:

  • All target 64-bit x86 in a Linux environment
  • All include common optional extras
    • rel-cat
    • meta-cat
    • deid
    • spacy
  • There's 1 bundle for each supported Python version (originally planned as 2)
    • One that's CPU-only (torch-wise)
    • One that allows GPU as well (torch-wise): dropped, as it was too big
    • Supported versions are (currently) 3.9, 3.10, 3.11, and 3.12

So TLDR, there will be 4 total install bundles included (down from the 8 originally planned).
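
As a rough illustration of where the bundle count comes from, the matrix can be enumerated as below. This is only a sketch: the `bundle_name` naming scheme is hypothetical and not taken from the workflow; only the Python versions and the CPU/GPU split come from the description above.

```python
from itertools import product

# Supported Python versions and torch variants, as listed in the PR description.
PYTHON_VERSIONS = ["3.9", "3.10", "3.11", "3.12"]
TORCH_VARIANTS = ["cpu", "gpu"]


def bundle_name(version: str, py: str, variant: str) -> str:
    """Hypothetical naming scheme; the real workflow may name bundles differently."""
    return f"medcat-{version}-py{py}-{variant}"


def planned_bundles(version: str, variants=TORCH_VARIANTS) -> list:
    """One bundle per (Python version, torch variant) pair."""
    return [
        bundle_name(version, py, variant)
        for py, variant in product(PYTHON_VERSIONS, variants)
    ]


if __name__ == "__main__":
    # Original plan: 4 Python versions x 2 torch variants.
    print(len(planned_bundles("X.Y.Z")))
    # After dropping the GPU variant: 4 Python versions x 1 torch variant.
    print(len(planned_bundles("X.Y.Z", ["cpu"])))
```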

The README for the install bundles can be found here:
https://github.com/CogStack/cogstack-nlp/blob/CU-8699my5eg-add-release-bundles/medcat-v2/.release/install_bundle_readme.md

PS:
While testing, the workflow ran during a PR; this has now been removed / fixed.

EDIT:
Had to remove the GPU install bundle because it's too big. The limit is 2 GiB and the CUDA-enabled bundle is around 3 GB.
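
For reference, a minimal sketch of the size arithmetic behind this, assuming the limit is exactly 2 GiB and treating it as a strict upper bound. The helper names (`fits_release_limit`, `oversized_assets`) are hypothetical and not part of the workflow.

```python
from pathlib import Path

GIB = 1024 ** 3               # binary gibibyte
GB = 1000 ** 3                # decimal gigabyte
ASSET_LIMIT_BYTES = 2 * GIB   # limit quoted in the comment above (assumed strict)


def fits_release_limit(size_bytes: int, limit: int = ASSET_LIMIT_BYTES) -> bool:
    """Return True if an asset of this size stays under the limit."""
    return size_bytes < limit


def oversized_assets(directory: Path, limit: int = ASSET_LIMIT_BYTES) -> list:
    """List files in a directory that are too large to upload."""
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and not fits_release_limit(path.stat().st_size, limit)
    )


if __name__ == "__main__":
    # A roughly 3 GB bundle is well above 2 GiB (2,147,483,648 bytes).
    print(fits_release_limit(3 * GB))
```

A check like `oversized_assets(Path("dist"))` could run before the upload step so that an oversized bundle fails early; the `dist` directory here is only an example.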

@mart-r mart-r marked this pull request as draft July 2, 2025 16:32

mart-r commented Jul 3, 2025

Just a comment:

Had to remove the GPU install bundle because it's too big. The limit is 2 GiB and the CUDA-enabled bundle is around 3 GB.

@alhendrickson

Commenting this so I can remember it, following our Slack chat.

If we target the flow you've described, aimed just at the air-gapped user in a trust, I think it's a good, specific problem to solve:
"download it from GitHub, copy it to their OneDrive (which is available inside and outside), download it to their VPN machines, and install based on that."

I'm seeing two main features:

  • Make a zip containing everything, which can go into OneDrive
  • Make a script that can pip install from that zip, even if it's just as an example

Today, you've got the CPU zip in GitHub, which looks good to me. The install script is basically what you do for the smoke test, I think, but it would be good to document this somewhere. I'm totally on board with what you have for this now.
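
To make the install-script idea concrete, here is a minimal sketch of an offline install from an unpacked bundle. It assumes the bundle is a zip with the wheels at its top level and a requirements file named `requirements.txt`; the actual bundle layout and archive format may differ, and the function names are hypothetical. The pip options used (`--no-index`, `--find-links`, `-r`) are standard.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def build_install_command(wheel_dir: Path, requirements: Path) -> list:
    """Build a pip command that installs only from local files."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                     # never contact a package index
        "--find-links", str(wheel_dir),   # resolve packages from the unpacked bundle
        "-r", str(requirements),
    ]


def install_from_bundle(bundle_zip: Path, workdir: Path) -> None:
    """Unpack the bundle (assumed layout, see above) and install from it."""
    with zipfile.ZipFile(bundle_zip) as archive:
        archive.extractall(workdir)
    command = build_install_command(workdir, workdir / "requirements.txt")
    subprocess.run(command, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python install_bundle.py path/to/bundle.zip
    install_from_bundle(Path(sys.argv[1]), Path("medcat_bundle"))
```

Because `--no-index` stops pip from reaching any index, this should behave the same on an air-gapped machine as on a connected one, provided the bundle contains every wheel the requirements file asks for.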

And these assumptions:

  • There are some manual actions on the user's side which we can't automate, since we can't control their OneDrive
  • Site unblocking and hosting aren't really required; that's done by users
  • Whether the zip already exists, or users have to make it by running a script, doesn't really affect it
  • There can be some client-side scripts run by the user

So in future I'm thinking we could do stuff like this:

  • A script that lets the user make the zip, instead of making it for them in GitHub
  • Scripts with options for GPU, architectures, etc. It doesn't have to be just 10 complete zip files, but could still be
  • Add more requirements on the user side? E.g. put all this into Docker images, which they docker run locally instead; that would solve OS issues, file size, etc. Mount a local drive in Docker, run our image, and it magically creates a zip
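
A user-side script with options could look roughly like the sketch below. It only builds and prints a `pip download` command; it does not run it. The package name `medcat`, the extras list, the default platform tag, and the use of the PyTorch CPU wheel index are assumptions for illustration, not necessarily what the release workflow does.

```python
import argparse
import sys

# Extras named in the PR description; the package name is an assumption.
EXTRAS = ["spacy", "meta-cat", "rel-cat", "deid"]
TORCH_CPU_INDEX = "https://download.pytorch.org/whl/cpu"


def build_download_command(python_version, dest, cpu_only=True,
                           platform="manylinux2014_x86_64"):
    """Build a pip download command for one bundle variant."""
    command = [
        sys.executable, "-m", "pip", "download",
        f"medcat[{','.join(EXTRAS)}]",
        "--dest", dest,
        "--only-binary=:all:",            # wheels only, no source builds
        "--python-version", python_version,
        "--platform", platform,
    ]
    if cpu_only:
        # Assumption: pulling torch from the CPU index keeps the bundle small.
        command += ["--extra-index-url", TORCH_CPU_INDEX]
    return command


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a bundle builder")
    parser.add_argument("--python-version", default="3.10")
    parser.add_argument("--dest", default="bundle")
    parser.add_argument("--platform", default="manylinux2014_x86_64")
    parser.add_argument("--gpu", action="store_true",
                        help="skip the CPU-only torch index")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(" ".join(build_download_command(
        args.python_version, args.dest,
        cpu_only=not args.gpu, platform=args.platform)))
```

Keeping the command construction separate from argument parsing means the same function could be reused for a fixed matrix of zips or for a single user-chosen variant.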

@mart-r mart-r marked this pull request as ready for review July 3, 2025 14:56
@mart-r mart-r merged commit 43834a0 into main Jul 7, 2025
13 checks passed
@mart-r mart-r deleted the CU-8699my5eg-add-release-bundles branch July 7, 2025 18:23
mart-r added a commit that referenced this pull request Jul 7, 2025
* CU-8699my5eg: Add workflow jobs/steps to create release bundles

* CU-8699my5eg: Hopefully fix a path issue with release workflow

* CU-8699my5eg: Add sanity check integration tests to release bundling job

* CU-8699my5eg: Build wheel with lowest supported python version for backwards compatibility

* CU-8699my5eg: [TEMP/TEST/TO_REMOVE] Make workflow run on pull request

* CU-8699my5eg: [TEMP/TEST/TO_REMOVE] Fix/hardcode branch name

* CU-8699my5eg: Allow unsafe index strategy for python 3.9 and cpu-only torch bundle.

Otherwise the dependencies are not able to be resolved. See comment in code for some more details.

* CU-8699my5eg: Move to virtual environment when downloading wheels.

uv pip does not support a download command (at least not yet) so that cannot be used.
And uv python doesn't support using -m, so can't use that either.
So now just creating the env and using that instead

* CU-8699my5eg: Make sure there's a PIP to play with during bundling

* CU-8699my5eg: Clear venv after usage

* CU-8699my5eg: Fix typo regarding venv path

* CU-8699my5eg: Fix usage of wrong extra parts for GPU-enabled bundle

* CU-8699my5eg: Allow only binaries

* CU-8699my5eg: Allow only binaries during compilation time

* CU-8699my5eg: Hopefully fix wheel artifact upload

* CU-8699my5eg: Add .tar.gz to uploaded wheel artifact

* CU-8699my5eg: Add list of downloaded artifacts as a step

* CU-8699my5eg: Update debug / ls output

* CU-8699my5eg: Update download artifact paths

* Revert "CU-8699my5eg: Update debug / ls output"

This reverts commit 8c670ed.

* CU-8699my5eg: Make sure bundles get included in release.

Previously the wheels download probably overwrote the bundles that were copied there.

* CU-8699my5eg: Move .tar.gz to dist as well

* CU-8699my5eg: Add debug output after moving release bundles

* CU-8699my5eg: Add debug output regarding all artifacts before moving release bundles

* CU-8699my5eg: Fix bundle upload path

* CU-8699my5eg: Remove GPU install bundle

* CU-8699my5eg: Include release version in bundle names

* CU-8699my5eg: Fix extraction of version tag

* CU-8699my5eg: Fix version tag in install bundle name

* CU-8699my5eg: Add release bundle README

* CU-8699my5eg: Add install bundle README to install bundles

* CU-8699my5eg: Rename release bundle readme to install bundle readme

* CU-8699my5eg: Rename release bundle readme to install bundle readme

* CU-8699my5eg: Add requirements file to install bundle

* Revert "CU-8699my5eg: [TEMP/TEST/TO_REMOVE] Fix/hardcode branch name"

This reverts commit afb148f.

* Revert "CU-8699my5eg: [TEMP/TEST/TO_REMOVE] Make workflow run on pull request"

This reverts commit b9c76af.