[ROCm] revert cat operator performance work-around #987

Merged
merged 1 commit into from
Apr 8, 2022

jeffdaily
Collaborator

Reverts d5ca53c (pytorch#46097), a work-around for a compiler performance issue that is no longer needed. The changes affect only ROCm.

`python -m pt.cat_test --tag_filter all --device cuda`


OLD Forward Execution Time (us) : 48.833
NEW Forward Execution Time (us) : 8.318

OLD Forward Execution Time (us) : 54.508
NEW Forward Execution Time (us) : 23.824

OLD Forward Execution Time (us) : 52.117
NEW Forward Execution Time (us) : 14.942

OLD Forward Execution Time (us) : 98.790
NEW Forward Execution Time (us) : 74.334

OLD Forward Execution Time (us) : 102.063
NEW Forward Execution Time (us) : 76.008

OLD Forward Execution Time (us) : 167.786
NEW Forward Execution Time (us) : 123.679

OLD Forward Execution Time (us) : 98.320
NEW Forward Execution Time (us) : 67.436

OLD Forward Execution Time (us) : 91.484
NEW Forward Execution Time (us) : 59.230

OLD Forward Execution Time (us) : 109.569
NEW Forward Execution Time (us) : 76.557

OLD Forward Execution Time (us) : 106.603
NEW Forward Execution Time (us) : 87.635

OLD Forward Execution Time (us) : 106.693
NEW Forward Execution Time (us) : 88.902

OLD Forward Execution Time (us) : 110.881
NEW Forward Execution Time (us) : 94.361

OLD Forward Execution Time (us) : 122.925
NEW Forward Execution Time (us) : 123.046

OLD Forward Execution Time (us) : 272.442
NEW Forward Execution Time (us) : 271.932

OLD Forward Execution Time (us) : 457.329
NEW Forward Execution Time (us) : 456.767

OLD Forward Execution Time (us) : 117.688
NEW Forward Execution Time (us) : 87.133

OLD Forward Execution Time (us) : 873.764
NEW Forward Execution Time (us) : 865.075

OLD Forward Execution Time (us) : 1746.831
NEW Forward Execution Time (us) : 1730.252

OLD Forward Execution Time (us) : 2619.303
NEW Forward Execution Time (us) : 2598.717

OLD Forward Execution Time (us) : 52.063
NEW Forward Execution Time (us) : 7.904

OLD Forward Execution Time (us) : 52.275
NEW Forward Execution Time (us) : 8.118

OLD Forward Execution Time (us) : 51.896
NEW Forward Execution Time (us) : 7.938

OLD Forward Execution Time (us) : 51.745
NEW Forward Execution Time (us) : 7.922

OLD Forward Execution Time (us) : 52.575
NEW Forward Execution Time (us) : 13.299

OLD Forward Execution Time (us) : 52.090
NEW Forward Execution Time (us) : 8.015
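
To make the OLD/NEW pairs above easier to compare, here is a minimal sketch that parses lines in this format and computes the per-case speedup. It is not part of the PR or of the benchmark harness; the names `SAMPLE`, `parse_pairs`, and `speedup` are hypothetical, and the sample holds only three pairs copied from the results above.

```python
import re

# Three OLD/NEW pairs copied verbatim from the results above (illustrative subset).
SAMPLE = """\
OLD Forward Execution Time (us) : 48.833
NEW Forward Execution Time (us) : 8.318

OLD Forward Execution Time (us) : 122.925
NEW Forward Execution Time (us) : 123.046

OLD Forward Execution Time (us) : 2619.303
NEW Forward Execution Time (us) : 2598.717
"""

# Matches one timing line and captures its label and the time in microseconds.
LINE_RE = re.compile(r"^(OLD|NEW) Forward Execution Time \(us\) : ([0-9.]+)$")


def parse_pairs(text):
    """Return a list of (old_us, new_us) tuples, pairing each OLD line with the next NEW line."""
    pairs = []
    pending_old = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrelated lines
        label, value = match.group(1), float(match.group(2))
        if label == "OLD":
            pending_old = value
        elif pending_old is not None:
            pairs.append((pending_old, value))
            pending_old = None
    return pairs


def speedup(old_us, new_us):
    """Ratio of old to new time; values above 1.0 mean the new build is faster."""
    return old_us / new_us


if __name__ == "__main__":
    for old_us, new_us in parse_pairs(SAMPLE):
        ratio = speedup(old_us, new_us)
        print(f"{old_us:10.3f} us -> {new_us:10.3f} us  ({ratio:.2f}x)")
```

On these three pairs, the small case improves by roughly 5.9x, while the two larger cases are essentially unchanged, which is consistent with the full list above: the gain is concentrated in the short-running cases.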

Pull Request resolved: pytorch#74129
Approved by: https://github.com/ngimel

@jeffdaily jeffdaily merged commit 539d476 into release/1.10 Apr 8, 2022
jithunnair-amd pushed a commit to jithunnair-amd/pytorch that referenced this pull request Sep 20, 2022
jithunnair-amd pushed a commit that referenced this pull request Sep 28, 2022