
Add support for python 3.11 #3190

Merged
merged 70 commits into from Oct 11, 2023

Conversation

@ddelange (Contributor) commented May 2, 2023

Issue #, if available: closes #2687

Description of changes:

  • bump dependency versions for which cp311 wheels/support is available
  • consequently add support for torch 2.0, torchmetrics 1.0, pytorch-lightning 2.0
  • run CI on 3.11
  • switch from cuda/cpu plus-notation to using pip's --force-reinstall flag
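To illustrate the last bullet: "plus-notation" bakes the CPU/CUDA build variant into the version pin, while a plain range plus `--force-reinstall` and an index URL lets one command switch an existing environment between variants. A sketch with hypothetical version pins (the actual commands live in the PR's install docs):

```shell
# Old "plus-notation": the variant is baked into the pin, so docs needed a
# separate pinned command per variant, for example:
#   pip install torch==1.13.1+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
#
# New style: a plain version range plus an index URL; --force-reinstall swaps
# an already-installed CPU build for the CUDA build (or vice versa):
#   pip install "torch>=2.0,<2.1" --force-reinstall \
#       --extra-index-url https://download.pytorch.org/whl/cu118

# sanity check that this pip supports the flag used above
python3 -m pip install --help | grep -- --force-reinstall
```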

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ddelange (Contributor, Author) commented May 2, 2023

CI fails with `TypeError: __init__() got an unexpected keyword argument 'track_grad_norm'`; the `track_grad_norm` Trainer argument was removed in pytorch-lightning v2.0.0, ref Lightning-AI/pytorch-lightning#16745

@ddelange (Contributor, Author) commented May 2, 2023

the last commit is due to:

ValueError: You selected an invalid strategy name: `strategy=None`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai/
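In Lightning 2.x the `Trainer`'s `strategy` argument defaults to the string `"auto"` and no longer accepts `None`. A minimal stdlib sketch of that stricter validation; the function name and the abbreviated choices set are illustrative, not Lightning's actual code:

```python
# Hypothetical sketch of Lightning 2.x's stricter strategy validation.
VALID_STRATEGY_NAMES = {"auto", "ddp", "ddp_spawn", "deepspeed"}  # abbreviated list

def validate_strategy(strategy):
    """Reject None: callers migrating from 1.x should pass "auto" instead."""
    if not isinstance(strategy, str) or strategy not in VALID_STRATEGY_NAMES:
        raise ValueError(
            f"You selected an invalid strategy name: `strategy={strategy!r}`"
        )
    return strategy
```

So code that previously passed `strategy=None` should pass `strategy="auto"`, or simply omit the argument.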

@ddelange (Contributor, Author) commented May 2, 2023

next failure:

tests/unittests/test_shift.py:66: KeyError: 'shift_features'

the 'shift_features' key is only present if detection_status is True, per the docs 🤔

@gradientsky gradientsky self-requested a review May 2, 2023 18:11
@Innixma (Contributor) commented May 2, 2023

@ddelange Thanks for giving this a try, this is a great start!

I suspect there might be a non-trivial amount of changes needed since we need to upgrade to torch 2.0 and pytorch-lightning 2.0, but at least we should be able to start testing things now that CatBoost supports Python 3.11.

I'm putting a tentative timeline for v0.8 release, but it very much depends on how many changes need to happen.

@Innixma Innixma added this to the 0.8 Release milestone May 2, 2023
@Innixma Innixma added enhancement New feature or request install Related to installation dependency Related to dependency packages priority: 0 Maximum priority labels May 2, 2023
@gradientsky (Contributor)

This is still broken for CLIP models:

import os
import numpy as np
import warnings
warnings.filterwarnings('ignore')
np.random.seed(123)

download_dir = './ag_automm_tutorial'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)

import pandas as pd
dataset_path = download_dir + '/petfinder_processed'
train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
test_data = pd.read_csv(f'{dataset_path}/dev.csv', index_col=0)
label_col = 'AdoptionSpeed'

image_col = 'Images'
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0]) # Use the first image for a quick tutorial
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])


def path_expander(path, base_folder):
    path_l = path.split(';')
    return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])

train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

train_data = train_data.sample(500, random_state=0)
test_data = test_data.sample(100, random_state=0)

from autogluon.multimodal import AutoMMPredictor
predictor = AutoMMPredictor(label=label_col)
predictor.fit(
    train_data=train_data,
    hyperparameters={
        "model.names": ["clip", "timm_image", "hf_text", "numerical_mlp", "fusion_mlp"],
        "model.timm_image.checkpoint_name": "swin_small_patch4_window7_224",
        "model.hf_text.checkpoint_name": "google/electra-small-discriminator",
        "env.num_gpus": 1,
    },
    time_limit=20, # seconds
)

scores = predictor.evaluate(test_data, metrics=["accuracy"])
print(scores)
/home/ubuntu/Projects/autogluon/multimodal/src/autogluon/multimodal/models/fusion/fusion_mlp.py:126 in input_keys

  123     def input_keys(self):
  124         input_keys = []
  125         for m in self.model:
❱ 126             assert hasattr(m, "input_keys"), f"invalid model {type(m)}, which doesn't ha
  127             input_keys += m.input_keys
  128         return input_keys

AssertionError: invalid model <class 'autogluon.multimodal.models.clip.CLIPForImageText'>, which doesn't have a 'input_keys' attribute

@sxjscience (Collaborator)

Some failed tests point to utils/map.py. The version detection may not have taken effect:

if version.parse(torchmetrics.__version__) > version.parse("0.12.0"):
    from torchmetrics.detection.mean_ap import MeanAveragePrecision

I think it's safe to always use torchmetrics>0.12.0. @FANGAreNotGnu correct me if I'm wrong.
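For context, the gate above only takes the new import path when the installed torchmetrics is newer than 0.12.0. A self-contained sketch of the same comparison, using a naive tuple parser in place of `packaging.version` (both helper names are hypothetical; the naive parser only handles plain X.Y.Z strings, not pre-releases):

```python
def parse_version(v: str) -> tuple:
    # naive parser: "1.0.1" -> (1, 0, 1); sufficient for plain X.Y.Z strings,
    # unlike packaging.version it does NOT handle suffixes like "0.11.3rc1"
    return tuple(int(part) for part in v.split(".")[:3])

def uses_new_map(torchmetrics_version: str) -> bool:
    # mirrors the gate in utils/map.py: import the new MeanAveragePrecision
    # only for torchmetrics strictly newer than 0.12.0
    return parse_version(torchmetrics_version) > parse_version("0.12.0")
```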

@@ -29,8 +29,8 @@
     "pandas", # version range defined in `core/_setup_utils.py`
     "statsmodels>=0.13.0,<0.14",
     "gluonts>=0.12.4,<0.13",
-    "torch>=1.9,<1.14",
-    "pytorch-lightning>=1.7.4,<1.10.0",
+    "torch>=1.9,<2.1",
Collaborator

Due to security vulnerabilities of torch, we may need to force "torch>=1.13.1"

@ddelange (Contributor, Author) commented May 3, 2023

in that case, the minimum torchvision version would go to 0.14.0 and your 0.12 comment becomes obsolete 👍 sorry, that's torchmetrics

Contributor

Do we want to be supporting multiple major versions of torch? That sounds intimidating to maintain. Is it possible to drop support of torch 1.x and only support 2.x?

Contributor Author

e.g. pytorch-lightning supports torch 1.11-2.0, so I think it should be ok to support v1 a while longer:

torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition.

ref https://pytorch.org/blog/pytorch-2.0-release/

personally, as a user, I would be fine with >=2 as well

@ddelange (Contributor, Author) commented May 5, 2023

because of pytorch-lightning>=2.0.0,<2.1 (6a0387e), shall I consequently bump

- torch>=1.9,<2.1
+ torch>=1.11,<2.1
- torchvision>=0.10.0,<0.16
+ torchvision>=0.12.0,<0.16 

?

edit: 4c94b71

@sxjscience (Collaborator)

The lightning upgrade will require some verification.

@gradientsky (Contributor)

The lightning upgrade will require some verification.

I ran the smoke tests we usually run for the containers: CLIP is broken (see the code above), but the other scenarios look ok.

@@ -16,21 +16,23 @@ function setup_build_contrib_env {
 }

 function setup_torch_gpu {
-    # Security-patched torch.
-    python3 -m pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
+    PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cu118 reinstall_torch
Collaborator

Changes to this file actually won't be reflected in our CI, because we check the permissions of the PR submitter to prevent people from modifying our setup scripts.

Currently there's no easy solution to enable it to run. In the future I'm thinking of allowing the script to be used if we tag the PR with something like safe to run. For now, we might need to briefly give @ddelange write permission to our repo and revoke it once the PR is ready. @Innixma @gradientsky Ideas?

Contributor Author

sounds like good practice to keep that closed off 💪

fwiw, I don't mind if a maintainer opens a new PR in favour of this one

Collaborator

After discussion with the team, I'll create a clone of this PR once it's ready, to test the changes. Once those changes are verified, we'll merge this PR and close the clone PR.

@@ -45,7 +45,7 @@ jobs:
   fail-fast: false
   matrix:
     os: [macos-latest, windows-latest, ubuntu-latest]
-    python: ["3.8", "3.9", "3.10"]
+    python: ["3.8", "3.9", "3.10", "3.11"]
Collaborator

To trigger a platform test, you should be able to do so by commenting on this PR:

/platform_tests ref=ddelange:cp311

However, our platform tests are currently failing because of other issues. @gradientsky @Innixma We'll want to fix the platform tests to unblock this PR

@FANGAreNotGnu (Contributor)

Some failed tests point to utils/map.py. The version detection may not have taken effect:

if version.parse(torchmetrics.__version__) > version.parse("0.12.0"):
    from torchmetrics.detection.mean_ap import MeanAveragePrecision

I think it's safe to always use torchmetrics>0.12.0. @FANGAreNotGnu correct me if I'm wrong.

Just confirmed that torchmetrics>=0.11.3 works well. I can update the version requirement (also removing the backup code) if it also works well with the other problem types. @zhiqiangdon

@zhiqiangdon (Contributor)

@FANGAreNotGnu Please go ahead and clean up the old logic for object detection. The other problem types already use the newest torchmetrics.

@FANGAreNotGnu (Contributor)

Some failed tests point to utils/map.py. The version detection may not have taken effect:

if version.parse(torchmetrics.__version__) > version.parse("0.12.0"):
    from torchmetrics.detection.mean_ap import MeanAveragePrecision

I think it's safe to always use torchmetrics>0.12.0. @FANGAreNotGnu correct me if I'm wrong.

For torchmetrics >= 0.11.3 the mAP implementation is bug-free, but there's still the issue that it's much slower than 0.6.0 (the implementation copied here), ref Lightning-AI/torchmetrics#1024. There might also be a problem that the old implementation is no longer compatible with the latest pytorch/torchmetrics. We could check the pytorch version and only use the new but slow implementation when we have to. @zhiqiangdon @sxjscience

@ddelange (Contributor, Author) commented Oct 4, 2023

^ ray 2.7 contains some critical fixes for ray serve specific to python 3.11 ref kserve/kserve#3075 (comment)

@Innixma (Contributor) commented Oct 10, 2023

@ddelange and @yinweisu, awesome work! Everything appears to be passing now on the clone PR (#3204). We will move the changes over to this PR and merge soon. This is a huge feature, thanks so much!

@Innixma (Contributor) left a comment

Awaiting changes from #3204 to be merged into this PR by @yinweisu.

@yinweisu (Collaborator)

@Innixma Changes merged. I think the code is good to merge functionality-wise, though there are some installation instruction changes. Currently we are prompting for a pytorch 1.13 installation; I wonder whether we should bump it to pytorch 2.0 and its corresponding torchvision?

We can probably address this in a follow-up PR too given this PR has been around for a long time

@@ -4,7 +4,7 @@ pip install -U setuptools wheel

 # CPU version of pytorch has smaller footprint - see installation instructions in
 # pytorch documentation - https://pytorch.org/get-started/locally/
-pip install torch==1.13.1+cpu torchvision==0.14.1+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+pip install torchvision~=0.15.1 --force-reinstall --extra-index-url https://download.pytorch.org/whl/cpu
Contributor Author

I switched the docs to torch's new --extra-index-url notation.

torchvision etc. carry an exact pin on a torch version, so this command is sufficient (and will reinstall the cuda version if a user previously had a cpu version installed, and vice versa)

@yinweisu (Collaborator)

@zhiqiangdon Addressed your comments

@github-actions

Job PR-3190-bd6b173 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3190/bd6b173/index.html

@Innixma (Contributor) left a comment

LGTM!!

@zhiqiangdon (Contributor) left a comment

LGTM! Thanks for the contribution!

@yinweisu yinweisu merged commit 3345d3b into autogluon:master Oct 11, 2023
28 checks passed
@Innixma (Contributor) commented Oct 11, 2023

Merged! Thanks everyone for working to push this through!

@Innixma Innixma mentioned this pull request Oct 11, 2023
7 tasks
@yinweisu yinweisu mentioned this pull request Oct 12, 2023
@@ -64,7 +64,7 @@
 "onnx>=1.13.0,<1.14.0",
 "onnxruntime>=1.15.0,<1.16.0;platform_system=='Darwin'",
 "onnxruntime-gpu>=1.15.0,<1.16.0;platform_system!='Darwin'",
-"tensorrt>=8.5.3.1,<8.5.4;platform_system=='Linux'",
+"tensorrt>=8.5.3.1,<8.5.4;platform_system=='Linux' and python_version<'3.11'",  # tensorrt > 8.5.4 cause segfault
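The added environment marker makes pip install tensorrt only when both conditions hold for the target interpreter. A simplified stdlib sketch of how such a marker evaluates (the function name is hypothetical; real PEP 508 evaluation is done by pip, which compares `python_version` as a version, modelled here with a `(major, minor)` tuple):

```python
import platform
import sys

def tensorrt_marker_matches(system: str, python_version: tuple) -> bool:
    # Simplified evaluation of the PEP 508 marker:
    #   platform_system=='Linux' and python_version<'3.11'
    return system == "Linux" and python_version < (3, 11)

# evaluate against the current interpreter and platform
current_match = tensorrt_marker_matches(platform.system(), tuple(sys.version_info[:2]))
```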
Contributor Author

@yinweisu could you give tensorrt 8.6.1 another shot, to get some more info about the segfault?

If I find some time, I'll try to install it in our py 3.11 staging cluster and try out optimize_for_inference

Contributor

Hi @ddelange, I've been trying to work out that problem with tensorrt 8.6.1 by running the tests in test_deployment_onnx.py. It still gives me a segfault; I'll update with more info.

Contributor

Update: I ran some more tests on python 3.11, and tensorrt==8.6.1 worked with cuda 12.1, with autogluon built from source (current master) and torch==2.1.1. I was able to successfully run the tests in test_deployment_onnx.py with the above versions.

All in all, we can wait for the cuda version in the autogluon requirements to be upgraded to cuda 12.1 (#3663), which will solve the segfault issue seen with the latest tensorrt on cuda 11.8. Only then can we go ahead and update the tensorrt version in setup.py.

Contributor Author

Hi @AnirudhDagar 👋

#3982 was merged. Could you finalize support for optimize_for_inference on python 3.11 to finally close #2687?

Contributor

Yes, I'll update the issues as soon as we get rid of the py3.11 conditional, which is still present due to a few failing HPO tests (see comment). I need to run some tests locally, specifically for optimize_for_inference.

Contributor Author

@AnirudhDagar I saw you added "close #2687" to the release tracker #4027, but that issue tracks optimize_for_inference support on 3.11. Is your plan still to squeeze it in?

Contributor

Thanks for the reminder @ddelange, yes I can give the optimize_for_inference testing a shot after merging your ray PR.

@ddelange (Contributor, Author) commented Apr 4, 2024

fwiw, in a future version the tensorrt python package seems to be deprecated (inspecting the latest prerelease, it's now an empty meta package forwarding to https://pypi.org/project/tensorrt-cu12/10.0.0b6/).

So if you hit a snag, it might be worth giving tensorrt-cu12 a try. See also the install instructions on the pypi page, to avoid an ugly pip-in-pip pattern, ref NVIDIA/TensorRT#3080

Successfully merging this pull request may close these issues.

Support Python 3.11
8 participants