[0.3.0] Abstracting Pytorch example #24

Syakyr · 2024-02-27T17:03:40Z

Closes #17.

This is a large merge ahead of the 0.3.0 code freeze. In addition to abstracting the Pytorch example, it creates a package-agnostic base template for implementing models of the user's choosing, along with a number of other changes.

Changes made:

Experimental ROCm support
Package-agnostic base template to build relevant problem templates from
Cosmetic changes for consistency and/or simplicity
More explicit methods of deployment depending on use case/availability of infrastructure
Refined checks on the cookiecutter input
Removed dockerfiles for JupyterLab and VSCode images (moved to kapitan-hull-admin)


Syakyr · 2024-03-10T22:04:04Z

There is a problem with the GPU (model training) Dockerfile: its dependencies force Pytorch to run on CPU only. Need to check whether the Nvidia base image is needed at all, by installing the GPU dependencies onto the current CPU image instead (to reduce image size). The concern is that CUDA components might be installed a second time in the Anaconda namespace, increasing the size of the GPU (model training) image.

Syakyr · 2024-03-10T22:26:44Z

So installing pytorch-cuda as a dependency on the ubuntu:20.04 image allows Pytorch to use CUDA, which removes the need for the Nvidia images. Now checking whether rebuilding the GPU image from the CPU Dockerfile, with the conda YAML file modified to install the GPU-enabled Pytorch packages, results in model training on GPU when one is available and on CPU when it is not.
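
The GPU-or-CPU fallback described above can be sketched as follows. This is only an illustration, not the training code from this PR: the helper name `select_device` is hypothetical, and the only Pytorch call used is `torch.cuda.is_available()`.

```python
import importlib.util


def select_device() -> str:
    """Pick 'cuda' when a usable GPU is present, otherwise fall back to 'cpu'."""
    # If Pytorch is not installed at all, CPU is the only sensible answer.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # True only when the installed build has CUDA support and a GPU is visible.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training would run on: {select_device()}")
```

With a single image built this way, the same container can be scheduled with or without a GPU, and the training script decides at runtime.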

deon and others added 30 commits February 22, 2024 14:28

[feat] abstract sample problem as a cookiecutter option

4126d74

Updates to restore same functionality as main branch minus outdated jlab/vscode dockerfiles

15a0fc9

Moved mlflow_test to common src folder

e240b48

Moved docker to common folder

f14ad27

Add explicit global marker for PROBLEM_TEMPLATE

22835db

Initialised files in common src folder

0330b55

Rename dockerfiles, update gpu base docker image for gpu dockerfile

0c73937

Refactor conf folder and its references

42a3013

Rename batch_infer.py

4ae43bb

Initialised files in common conf folder

ddb58a4

Add 'none' to match case to override overwriting, raise error if choice is not part of the problem template list

11bbf77

Rename default_project_template to project_template, added function to post gen hook to remove unneeded banners

5389dcc

More redundant items removed when template is generated

d428d5f

Update cuda version to latest supported version by Pytorch

eb8caf1

Removed push commands at the wrong section

fbc5ad6

Removed extra dollar signs, added extra GPU instructions, simplified model checkpoint location

7031fdd

Update documentation on adding problem_template option

0e1546b

Experimental RoCM support

5f5be25

Moved files to shared src folder

d8f1702

Populate base src folder relating to process_data

ab6af0b

Move logging.yaml to common conf folder

929ce44

Fixed param renaming for hydra multirun in train_model

25ab78b

Update pre_gen check

e9cd59f

Change print statement to logger.error

0f6d80d

Change method name

da533fa

init py files moved to general src folder

e2945e2

Moved main.py to general fastapi folder

6537ed5

Change dummy text content to be platform-agnostic

0315fe6

Remove common test folder

ef78da7

Changes some cosmetic code in CV problem template

7f99c59

Syakyr added the documentation and enhancement labels Feb 27, 2024

Syakyr added this to the 0.3.0 release milestone Feb 27, 2024

Syakyr self-assigned this Feb 27, 2024

Syakyr mentioned this pull request Feb 27, 2024

[Feature]: To remove Pytorch example and separate it as an example section in the documentation #17

Closed

5 tasks

Syakyr Surani added 9 commits February 28, 2024 10:16

Change registry path for gcp

3a63de7

Moved, update and modified the Pytorch MNIST example guide + runai yaml to problem-templates/cv

6098a68

Fixed some issues with src scripts + conf files

9dc5ee9

Change GCP test project name

517911f

Added aisg-context folder to the list of files to overwrite in, fixed os.path.join use

f482bfe

Added -rocm to the env name for differentiation

12b6f7b

Update guide site and runai yaml files to fit into as a base template

a4becdb

Fixed template issues

838069a

Fixed template issue

141be64

Syakyr marked this pull request as draft February 29, 2024 13:43

Syakyr Surani and others added 6 commits March 1, 2024 17:43

Update changes to ccutter replay files

34eda34

Fixes

2cce7ea

Fixes

746dd91

Changed artifact_path

7f37caa

Added back dev-wksp and data-storage-versioning pages for base template

e0f2b46

Updates

21ce563

Syakyr added the ready label and removed the bug, documentation, and enhancement labels Mar 11, 2024

Syakyr marked this pull request as ready for review March 11, 2024 04:40

Syakyr merged commit 672ac8b into main Mar 11, 2024

Syakyr deleted the 0.3.0-pytorch-abstract branch March 11, 2024 04:41
