
[PyTorch][Winograd] Winograd kernel being selected has caused an issue with test_Conv2d_naive_groups_cuda_float16 #2492

Closed
junliume opened this issue Oct 30, 2023 · 18 comments · Fixed by #2695


@junliume
Collaborator

[Summary]

By design, Winograd kernels trade some numerical accuracy for performance.
However, in this very small and impractical case, selecting a Winograd kernel has caused test_Conv2d_naive_groups_cuda_float16 to fail.

Question:

  • test_Conv2d_naive_groups_cuda_float16 has the keyword naive in its name; does it expect naive implementations to begin with?
  • @Kirpich30000 should Winograd kernels have issues with such small cases, e.g. -H 6 -W 6 -k 2?
MIOpenDriver convfp16 -n 2 -c 2 -H 6 -W 6 -k 2 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -S 0
Forward Conv solutions available: 2
- id: 84 algo: 3, time: 10 ms, ws: 0, name: ConvBinWinogradRxSf2x3g1
- id: 107 algo: 5, time: 20 ms, ws: 1280, name: ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC
MIOpen Forward Conv. Algorithm: 3, Solution: 84/ConvBinWinogradRxSf2x3g1
GPU Kernel Time Forward Conv. Elapsed: 0.015378 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv3x3u1, 2, 2, 4, 4, 3, 3, 2,  2304, 360, 128, 0, 0, 0.015378
Forward Convolution Verifies OK on CPU reference (0.000340009)
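
The flopCnt column in these stats lines is consistent with the usual direct-convolution count, 2 * N * K * Ho * Wo * (C / G) * Y * X, and the GFLOPs column with flopCnt divided by the kernel time. A minimal sketch, assuming that formula (it is not taken from the MIOpenDriver source) and the standard output-size expression for padding, stride, and dilation:

```python
def conv_out_size(size: int, filt: int, pad: int, stride: int, dilation: int = 1) -> int:
    """Output extent of a convolution along one spatial dimension."""
    return (size + 2 * pad - dilation * (filt - 1) - 1) // stride + 1


def conv_flop_count(n: int, c: int, h: int, w: int, k: int, y: int, x: int,
                    p: int, q: int, u: int, v: int, g: int = 1) -> int:
    """Parameters mirror the MIOpenDriver flags -n -c -H -W -k -y -x -p -q -u -v -g."""
    ho = conv_out_size(h, y, p, u)
    wo = conv_out_size(w, x, q, v)
    # One multiply and one add per output element, input channel in its group, and filter tap.
    return 2 * n * k * ho * wo * (c // g) * y * x


def gflops(flop_count: int, time_ms: float) -> float:
    """Throughput in GFLOP/s for a kernel time given in milliseconds."""
    return flop_count / (time_ms * 1e-3) / 1e9


small = conv_flop_count(n=2, c=2, h=6, w=6, k=2, y=3, x=3, p=0, q=0, u=1, v=1)
grouped = conv_flop_count(n=32, c=48, h=56, w=56, k=48, y=3, x=3, p=1, q=1, u=1, v=1, g=48)
```

The same formula reproduces the later backward-weights stats lines for the grouped config, e.g. 86704128 FLOPs in 4.830571 ms is about 18 GFLOP/s.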

[Observation and Steps to reproduce]:

To Reproduce:

PYTORCH_TEST_WITH_ROCM=1 python3 nn/test_convolution.py --use-pytest --verbose -k test_Conv2d_naive_groups_cuda_float16
Docker Images:

ROCM 5.6: rocm/pytorch:rocm5.6_ubuntu20.04_py3.8_pytorch_2.0.1
PyTorch Installed at /var/lib/jenkins/pytorch/test
ROCM 5.7:  compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-5.7:86_ubuntu20.04_py3.9_pytorch_rocm5.7_internal_testing_55fbbdf
Original image: rocm/pytorch-private:86_ubuntu20.04_py3.9_pytorch_rocm5.7_internal_testing_55fbbdf
PyTorch Installed at /var/lib/jenkins/pytorch/test

NOTE: the tolerance has already been raised to 1e-1; you need to run git revert e9b273df57b240f14ead07b5fda97bdf2be6673a to see the error.
Expected Output:

nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float16 PASSED

Actual Output:

Mismatched elements: 47 / 128 (36.7%)
Greatest absolute difference: 0.0009765625 at index (0, 2, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 0.0999755859375 at index (0, 0, 2, 0) (up to 0.001 allowed)
@junliume
Collaborator Author

@JehandadKhan @atamazov F.Y.I.

@JehandadKhan
Collaborator

I don't think the practicality of the convolution operation is relevant here. However, if the client wishes to enforce certain numerical behavior, they can filter out numerically inferior algorithms from the results MIOpen returns, instead of defaulting to the first element in the results of the miopenFindConvolution* call.
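
A minimal client-side sketch of that filtering idea, for illustration only. `FindResult`, `pick_solution`, and `not_winograd` are hypothetical names, not MIOpen API; the sketch assumes the find results arrive sorted fastest-first, as in the driver listing above, and reuses the two solutions (and times) printed there.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class FindResult:
    """Hypothetical stand-in for one entry of a find call's result list."""
    solution: str
    time_ms: float
    workspace: int


def pick_solution(
    results: Sequence[FindResult],
    is_acceptable: Callable[[FindResult], bool],
) -> Optional[FindResult]:
    """Return the fastest acceptable result, assuming fastest-first order."""
    for result in results:
        if is_acceptable(result):
            return result
    return None  # nothing satisfied the client's numerical policy


def not_winograd(result: FindResult) -> bool:
    # Example policy: reject Winograd solvers by name.
    return "Winograd" not in result.solution


results = [
    FindResult("ConvBinWinogradRxSf2x3g1", 10.0, 0),
    FindResult("ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC", 20.0, 1280),
]
default_choice = results[0]
filtered_choice = pick_solution(results, not_winograd)
```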

@atamazov
Contributor

@JehandadKhan Absolutely. However, we do not know the actual intent of the test. One may assume that the client wants to validate the precision of the fastest convolution available in MIOpen (i.e. the one normally used for computations). If so, choosing some other (not the fastest, but more accurate) solution in the test does not look like a proper fix.

On the other hand, the problem config used in the test is very small and quite far from any convolution used in real networks. This means we could make Winograd inapplicable for this config: the test would pass without errors, and the performance of real networks would not be affected. However, I am far from suggesting a hack just to make a test pass ;)

In fact, we won't be able to figure out what the problem is until we see the actual values that lead to the large absolute/relative differences. Let me explain with an example. Suppose we consider a convolution that computes with an error of at most 1 ULP to be accurate enough.

Let's assume that some test expects a value near the FP16 maximum (65504); max - 1 ULP = 65472, which gives an absolute difference of 32, far greater than 1e-05. We can easily run into a similar situation with small values: the smallest positive denormal is 5.96e-8, and adding 1 ULP gives ~1.2e-7, so the relative difference (computed as (1.2e-7 - 5.96e-8)/(1.2e-7 + 5.96e-8)/2) is ~0.168, which is again far greater than 1e-3.
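
The FP16 figures above can be checked with Python's standard `struct` module, whose `'e'` format is IEEE 754 binary16. A minimal sketch: stepping the 16-bit pattern by one moves to the adjacent FP16 value (valid for the positive, finite values used here). It also computes the relative difference against the expected value, |a - b| / |b|; that choice of denominator is an assumption, and like the expression in the comment it lands far above 1e-3.

```python
import struct


def f16_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 binary16."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def f16_from_bits(bits: int) -> float:
    """Decode a binary16 bit pattern to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


FP16_MAX = 65504.0
below_max = f16_from_bits(f16_bits(FP16_MAX) - 1)  # adjacent smaller FP16 value
abs_diff_large = FP16_MAX - below_max

smallest_denorm = f16_from_bits(0x0001)  # 2**-24, about 5.96e-8
next_denorm = f16_from_bits(0x0002)      # smallest denormal + 1 ULP, about 1.19e-7

# The expression written in the comment, evaluated left to right.
rel_as_written = (next_denorm - smallest_denorm) / (next_denorm + smallest_denorm) / 2
# Relative difference measured against the expected (smallest denormal) value.
rel_vs_expected = (next_denorm - smallest_denorm) / smallest_denorm
```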

Proposal:

  • Ask @Kirpich30000 if precision of ConvBinWinogradRxSf2x3g1 may degrade for small configs like -n 2 -c 2 -H 6 -W 6 -k 2 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1
  • If the kernel is OK, then let's investigate the test in detail.


@Kirpich30000
Contributor

Hi, this geometry should be fine for Winograd. Since the test is rather small, we could analyze the actual data (reference input, filter, output, and Winograd output) if dumps are available.

@wenchenvincent

Quoting @atamazov's proposal above:

  • Ask @Kirpich30000 if precision of ConvBinWinogradRxSf2x3g1 may degrade for small configs like -n 2 -c 2 -H 6 -W 6 -k 2 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1
  • If the kernel is OK, then let's investigate the test in detail.

I think this is the way to go. We need to first conduct error analysis. If this is a config that is not proper for winograd, we should just disable winograd when running the algorithm finding heuristic in MIOpen.

@CAHEK7
Contributor

CAHEK7 commented Oct 31, 2023

I can hardly imagine a real use case for a precision-based heuristic.
There is no external API to specify precision limitations. Precision itself can be estimated theoretically, but the real error heavily depends on the input data (I already ran into this while working on the random generator for MIOpen tests).
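
A small illustration of that data dependence, as a hedged sketch rather than anything from MIOpen: it emulates FP16 accumulation in Python by rounding after every addition via `struct`'s `'e'` format. The sum of two FP16 values is exact in a Python float, so rounding once per step matches a correctly rounded FP16 add. The same two summation orders agree exactly on one input and differ by one ULP on another.

```python
import struct
from typing import Iterable


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values: Iterable[float]) -> float:
    """Accumulate left to right, rounding to FP16 after every addition."""
    acc = 0.0
    for v in values:
        # acc and v are both FP16 values, so acc + v is exact in float64;
        # rounding it once emulates a correctly rounded FP16 addition.
        acc = to_fp16(acc + v)
    return acc


benign = [1.0, 2.0, -3.0, 4.0]
sensitive = [1.0, 2.0 ** -11, 2.0 ** -11]

benign_fwd, benign_rev = fp16_sum(benign), fp16_sum(reversed(benign))
sens_fwd, sens_rev = fp16_sum(sensitive), fp16_sum(reversed(sensitive))
```

The one-ULP gap in the second case, 2^-10 = 0.0009765625, is the same size as the greatest absolute difference reported in the failing test output earlier in this issue.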

@JehandadKhan
Copy link
Collaborator

@atamazov Can you please create a PR to restrict the applicability of this solver in coordination with @Kirpich30000 ?

@atamazov
Contributor

@wenchenvincent

If this is a config that is not proper for winograd, we should just disable winograd when running the algorithm finding heuristic in MIOpen.

It is proper. However, as per @JehandadKhan's suggestion, maybe we'll temporarily restrict applicability of the Winograd solver, as a workaround, until the investigation of test failure is completed and a full-blown fix is done.

@Kirpich30000
Contributor

Kirpich30000 commented Nov 4, 2023

@wenchenvincent @JehandadKhan @junliume

I've dumped output values and analyzed the data.

  1. The Winograd kernel doesn't have any unexpected precision loss on that test.
  2. The test performs a convolution with G=2 using two different methods (a single grouped conv operation and 2 separate non-grouped convolutions) and checks whether the outputs of both methods are equal within some tolerance. It implicitly assumes that both methods have the same numerical properties, which is not true if different implementations are used.
  3. The test has a few weak points. For example:
    • It doesn't check whether the output is correct. The test would pass even if MIOpen always returned zeros as output.
    • If we replaced Winograd with a precise 64-bit implementation, the test would still fail, because the difference from implicit GEMM wouldn't fit in the allowed range (1e-5 abs and 1e-3 rel).
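
The comparison described in point 2 can be sketched in pure Python. This is a hypothetical reconstruction, not the PyTorch test itself: it assumes two groups of 2 channels, a 2x4x6x6 input, and a 3x3 filter (shapes guessed from the driver config at the top of the issue), with small random integer data. Because both paths run the same naive direct algorithm with the same summation order, the outputs match bit for bit; the test's tolerance only matters once the two paths use different algorithms.

```python
import random
from typing import List

Tensor4 = List[List[List[List[float]]]]  # nested lists in N x C x H x W order


def conv2d(x: Tensor4, w: Tensor4, groups: int = 1) -> Tensor4:
    """Naive direct convolution (cross-correlation), stride 1, no padding.

    x has shape N x C x H x W; w has shape K x (C // groups) x R x S.
    """
    n, c, h, wd = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    k, cpg, r, s = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    assert c == cpg * groups and k % groups == 0
    kpg = k // groups  # output channels per group
    ho, wo = h - r + 1, wd - s + 1
    out = [[[[0.0] * wo for _ in range(ho)] for _ in range(k)] for _ in range(n)]
    for b in range(n):
        for o in range(k):
            g = o // kpg  # group that output channel o belongs to
            for i in range(ho):
                for j in range(wo):
                    acc = 0.0
                    for ci in range(cpg):
                        xc = g * cpg + ci  # matching input channel
                        for u in range(r):
                            for v in range(s):
                                acc += x[b][xc][i + u][j + v] * w[o][ci][u][v]
                    out[b][o][i][j] = acc
    return out


def channels(t: Tensor4, start: int, stop: int) -> Tensor4:
    """Slice the C dimension of an N x C x H x W tensor."""
    return [sample[start:stop] for sample in t]


rng = random.Random(0)


def rand_tensor(*shape: int):
    """Nested lists of small random integers stored as floats."""
    if len(shape) == 1:
        return [float(rng.randint(-2, 2)) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]


x = rand_tensor(2, 4, 6, 6)  # input: two groups of 2 channels
w = rand_tensor(4, 2, 3, 3)  # weights: 4 output channels, 2 per group

grouped = conv2d(x, w, groups=2)
first = conv2d(channels(x, 0, 2), w[:2])
second = conv2d(channels(x, 2, 4), w[2:])
split = [a + b for a, b in zip(first, second)]  # concatenate along C
```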

One possible solution is to change the data initialization to patterns that guarantee bit-exact results (that is possible for direct convolution and for f2x3 and f3x2 Winograd convolutions). If such a change is not considered, then some workaround on the MIOpen side is needed to pick the same algorithm for the tested convolutions. It doesn't matter whether it is Winograd, igemm, FFT, or any other approach, but it should be the same one for both methods used in the test. Other workarounds might also be possible.
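
One way to sanity-check such a pattern for direct convolution is to bound the largest partial sum and confirm that every integer up to that bound is exact in FP16. A hedged sketch: the value range [-2, 2] and the 2-channel 3x3 case are illustrative assumptions, and it does not model the Winograd f2x3/f3x2 transforms, which would need their own analysis.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_abs_partial_sum(max_in: int, max_w: int, channels: int, r: int, s: int) -> int:
    """Worst-case |accumulator| for a direct conv with integer data in [-max, max]."""
    return max_in * max_w * channels * r * s


def integers_exact_in_fp16(limit: int) -> bool:
    """True if every integer in [-limit, limit] survives a round trip through FP16."""
    return all(to_fp16(float(v)) == v for v in range(-limit, limit + 1))


bound = max_abs_partial_sum(max_in=2, max_w=2, channels=2, r=3, s=3)
```

If every product and partial sum is an integer no larger than 2048 in magnitude, FP16 accumulation never rounds, so any summation order produces the same bits.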

@atamazov
Contributor

atamazov commented Nov 4, 2023

@wenchenvincent @JehandadKhan @junliume Right now the test has FP16 input data distribution that, in fact, requires output values to be bit exact in many cases. This is quite questionable (see #2492 (comment))

It is highly recommended to increase the max abs diff from 1e-5 to 1e-3 and the relative diff to 2e-2 in order to allow at least ~1 ULP of output deviation for FP16. That should reduce the probability of false failures in this test, provided that the input data distribution remains the same.
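
For scale, one FP16 ULP at magnitude 1.0 is 2^-10 = 0.0009765625, exactly the greatest absolute difference reported in the failing run, so an absolute tolerance of 1e-3 admits one rounding step at that magnitude. The sketch below assumes the element-wise rule documented for `torch.testing.assert_close`, |actual - expected| <= atol + rtol * |expected|, and applies it to a hypothetical element whose expected value is small (0.25) but which is off by one such step.

```python
import struct


def fp16_ulp(x: float) -> float:
    """Gap between |x| (rounded to FP16) and the next larger FP16 value."""
    bits = struct.unpack("<H", struct.pack("<e", abs(x)))[0]
    here = struct.unpack("<e", struct.pack("<H", bits))[0]
    above = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return above - here


def within_tolerance(actual: float, expected: float, atol: float, rtol: float) -> bool:
    """Element-wise closeness rule documented for torch.testing.assert_close."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


step = fp16_ulp(1.0)  # one rounding step at magnitude ~1

# Hypothetical element: the expected output is small after cancellation,
# but the other code path is off by one rounding step taken at magnitude ~1.
expected = 0.25
actual = expected + step

old_ok = within_tolerance(actual, expected, atol=1e-5, rtol=1e-3)
new_ok = within_tolerance(actual, expected, atol=1e-3, rtol=2e-2)
```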

@atamazov
Contributor

atamazov commented Nov 4, 2023

@junliume @JehandadKhan @wenchenvincent @CAHEK7 W/A is provided in PR #2507, you are welcome to review.

@wenchenvincent


@atamazov @Kirpich30000 Thanks for looking into it. Do you have any references on how to derive the error tolerance for Winograd? If we're confident that we have a sound analysis of the error tolerance, then maybe we should convince PyTorch upstream to raise the error tolerance for the unit test.

junliume pushed a commit that referenced this issue Nov 7, 2023
…loss is huge (#2507)

* workaround_issue_2492(01) Disable ConvBinWinoRxS when granularity loss is > 0.995 (performance drops 200 times)

* workaround_issue_2492(02) Allow disabling the W/A by setting MIOPEN_DEBUG_WORKAROUND_ISSUE_2493=0 in the env.
junliume pushed a commit that referenced this issue Nov 10, 2023
* workaround_issue_2492(01) Disable ConvBinWinoRxS when granularity loss is > 0.995 (performance drops 200 times)

* workaround_issue_2492(02) Allow disabling the W/A by setting MIOPEN_DEBUG_WORKAROUND_ISSUE_2493=0 in the env.

* workaround_issue_2492(03) [debug] Disable MIOPEN_DEBUG_WORKAROUND_ISSUE_2493 during driver warm-up.

* workaround_issue_2492(04) [quality] Make the computation of max granularity loss clearer.

* workaround_issue_2492_01(02) [debug] Log granularity loss when ConvBinWinogradRxSf2x3* solver is skipped.

* workaround_issue_2492_01(03) [tests] test_db_sync: Disable WORKAROUND_ISSUE_2493 via environment. Support reading legacy fdb (WORKAROUND_ISSUE_1987). Allow FDB testing on gfx1030 (SKIP_KDB_PDB_TESTING). Add W/A for ConvOclDirectFwdFused on gfx1030. Print number of failures per testing thread.

* workaround_issue_2492_01(04) Remove leftovers from gfx1030 testing

* workaround_issue_2492_01(05) More gfx1030 leftovers removed
github-actions bot pushed a commit that referenced this issue Dec 16, 2023
…loss is huge (#2507)

* workaround_issue_2492(01) Disable ConvBinWinoRxS when granularity loss is > 0.995 (performance drops 200 times)

* workaround_issue_2492(02) Allow disabling the W/A by setting MIOPEN_DEBUG_WORKAROUND_ISSUE_2493=0 in the env.
@junliume
Collaborator Author

junliume commented Jan 25, 2024

@atamazov #2507 is causing a pretty large perf regression.
For example, with the default settings:

root@banff-cyxtera-s83-2:/opt/rocm-6.1.0-13361/bin# MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -t 1
MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -t 1
PRNG seed: 12345678
MIOpen Backward Weights Conv. Algorithm: 0, Solution: 102/GemmWrwUniversal
GPU Kernel Time Backward Weights Conv. Elapsed: 4.830571 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdw-conv3x3u1, 32, 48, 56, 56, 3, 3, 48,  86704128, 0, 0, 18, 0, 4.830571
Backward Convolution Weights Verifies OK on GPU reference (0.000739006 < 0.0164)

If we force solution 53/ConvBinWinogradRxSf2x3:

root@banff-cyxtera-s83-2:/opt/rocm-6.1.0-13361/bin# MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -t 1 -S 53
MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -t 1 -S 53
PRNG seed: 12345678
Backward Weights Conv solutions available: 2
- id: 102 algo: 0, time: 5.1738 ms, ws: 2709504, name: GemmWrwUniversal
- id: 87 algo: 1, time: 13.1486 ms, ws: 0, name: ConvDirectNaiveConvWrw
Warning: Solution id (53) is not reported by the library. Trying it anyway...
MIOpen Backward Weights Conv. Algorithm: -1, Solution: 53/ConvBinWinogradRxSf2x3
GPU Kernel Time Backward Weights Conv. Elapsed: 1.158911 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdw-conv3x3u1, 32, 48, 56, 56, 3, 3, 48,  86704128, 0, 0, 75, 0, 1.158911
Backward Convolution Weights Verifies OK on GPU reference (5.28272e-05 < 0.0164)

And here is the reason the solver is not applicable:

MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinogradRxS: Not applicable
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999997
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinogradRxSf3x2: Not applicable
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999993
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinogradRxSf2x3: Not applicable
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999993
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinogradRxSf2x3g1: Not applicable
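
Going by the #2507 commit message (disable ConvBinWinoRxS when granularity loss is > 0.995, with MIOPEN_DEBUG_WORKAROUND_ISSUE_2493=0 turning the W/A off), every granularity_loss logged above is over the threshold, which is why the Winograd solvers drop out. A hypothetical Python model of that gate, not the actual MIOpen C++ code: the function names are made up, and treating only the string "0" as "off" is a simplifying assumption.

```python
import os
from typing import Mapping, Optional

GRANULARITY_LOSS_LIMIT = 0.995  # threshold quoted in the #2507 commit message
WA_ENV_VAR = "MIOPEN_DEBUG_WORKAROUND_ISSUE_2493"


def workaround_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumed semantics: the W/A is on by default and "0" turns it off."""
    env = os.environ if env is None else env
    return env.get(WA_ENV_VAR, "1") != "0"


def winograd_rxs_applicable(
    granularity_loss: float, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Models only the #2507 gate; real applicability has many other checks."""
    if workaround_enabled(env) and granularity_loss > GRANULARITY_LOSS_LIMIT:
        return False
    return True


logged_losses = [0.999997, 0.999993, 0.999993]  # values from the Info log above
```

The forced run above shows the cost of this gate for the grouped config: ConvBinWinogradRxSf2x3 took about 1.16 ms versus about 4.83 ms for GemmWrwUniversal.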

@atamazov
Contributor

atamazov commented Jan 25, 2024

@junliume

The root cause is that the System find-db does not contain the faulty configs.

Look at the following logs. The first one is the default. The second one is the same run in NORMAL find mode, which updates the find-db. The third one is the same as the first but with the updated find-db. The gain is about 20 times, which is expected because the WTI of both Winograd and GEMM is terribly bad.

We have enough spare solvers that can easily "beat" both GEMM and Winograd, but these are not used by TunaNet or the WTI fallback:

  • MIOpen(HIP): Info2 [IsProblemSupported] TunaNet Inapplicable: Group count not 1
  • MIOpen(HIP): Info2 [GetSolutionsFallback] Using WTI Fallback

The questions are:

  • Is updating find-db feasible provided that time is limited (we need the fix ASAP)?
  • Are the faulty configs popular enough to be included in the Tuna database?
    • [Note] I see that the number of groups is 48, which seems unusual
Navi21, Default Find mode (DYNAMIC_HYBRID), GemmWrwUniversal, 2.1ms
# ./bin/MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1
MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1030
MIOpen(HIP): Info [Handle] stream: 0x11eb4b0, device_id: 0
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = DYNAMIC_HYBRID(5)
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 6.0.23494, MIOpen version 3.1.0.852bd9205
MIOpen(HIP): Info [PrintVersion] HIPRTC v.9.0
MIOpen(HIP): Info [PrintVersionImpl] COMgr v.2.6.0, USE_HIP_PCH: 1
Warm-up: Wall-clock Total Time: 4680.08 ms, Find Algorithm: 5
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info [Measure] ReadonlyRamDb::Prefetch time: 45.9417 ms
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.006602 ms
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999986
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Warning [hip_mem_get_info_wrapper] hipMemGetInfo error, status: 1
MIOpen(HIP): Info [GetWorkSpaceSize] 2709504
PRNG seed: 12345678
MIOpen(HIP): Info [FindConvBwdWeightsAlgorithm] requestAlgoCount = 2, workspace = 2709504
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999986
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Info [TryLoad] Find-db regenerating.
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999986
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Info [FindSolutionImpl] ConvDirectNaiveConvWrw (not searchable)
MIOpen(HIP): Info [FindSolutionImpl] GemmWrwUniversal (not searchable)
MIOpen(HIP): Info [EvaluateInvokers] ConvDirectNaiveConvWrw: naive_conv_nonpacked_wrw_nchw_half_double_half: 9.52495 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] Selected: ConvDirectNaiveConvWrw: naive_conv_nonpacked_wrw_nchw_half_double_half: 9.52495, workspace_sz = 0
MIOpen(HIP): Info [EvaluateInvokers] GemmWrwUniversal: : 3.39456 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] Selected: GemmWrwUniversal: : 3.39456, workspace_sz = 2709504
MIOpen(HIP): Info [FindConvolution] miopenConvolutionBwdWeightsAlgoGEMM 3.39456 2709504
MIOpen(HIP): Info [FindConvolution] miopenConvolutionBwdWeightsAlgoDirect       9.52495 0
MIOpen(HIP): Info [FindConvBwdWeightsAlgorithm] BWrW Chosen Algorithm: GemmWrwUniversal , 2709504, 3.39456
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 0, workspace = 2709504
Wall-clock Time Backward Weights Conv. Elapsed: 14.6591 ms, Auxiliary API calls: 3267.59 ms (GWSS: 53.3403)
MIOpen Backward Weights Conv. Algorithm: 0, Solution: 102/GemmWrwUniversal
GPU Kernel Time Backward Weights Conv. Elapsed: 2.118989 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdw-conv3x3u1, 32, 48, 56, 56, 3, 3, 48,  86704128, 0, 0, 41, 0, 2.118989
Backward Convolution Weights Verifies OK on GPU reference (0.000739006 < 0.0164)
Navi21, NORMAL Find mode, ConvOclBwdWrW2<1>, 0.1ms
# MIOPEN_FIND_MODE=1 ./bin/MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1 -S -1
MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1 -S -1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1030
MIOpen(HIP): Info [Handle] stream: 0x17064b0, device_id: 0
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = NORMAL(1)
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 6.0.23494, MIOpen version 3.1.0.852bd9205
MIOpen(HIP): Info [PrintVersion] HIPRTC v.9.0
MIOpen(HIP): Info [PrintVersionImpl] COMgr v.2.6.0, USE_HIP_PCH: 1
Warm-up: Wall-clock Total Time: 3430.56 ms, Find Algorithm: 5
MIOpen(HIP): Warning [hip_mem_get_info_wrapper] hipMemGetInfo error, status: 1
MIOpen(HIP): Info [GetWorkSpaceSize] 2709504
PRNG seed: 12345678
MIOpen(HIP): Info [FindConvBwdWeightsAlgorithm] requestAlgoCount = 2, workspace = 2709504
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.010109 ms
MIOpen(HIP): Info [TryLoad] Find-db regenerating.
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999986
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Info [IsApplicableBase] granularity_loss =0.999972
MIOpen(HIP): Info [FindSolutionImpl] ConvOclBwdWrW2<1>
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclBwdWrW2<1>
MIOpen(HIP): Info [FindSolutionImpl] ConvOclBwdWrW2<2>
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclBwdWrW2<2>
MIOpen(HIP): Info [FindSolutionImpl] ConvOclBwdWrW2<4>
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclBwdWrW2<4>
MIOpen(HIP): Info [FindSolutionImpl] ConvOclBwdWrW2<8>
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclBwdWrW2<8>
MIOpen(HIP): Info [FindSolutionImpl] ConvOclBwdWrW2<16>
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclBwdWrW2<16>
MIOpen(HIP): Info [FindSolutionImpl] ConvOclBwdWrW53 (not searchable)
MIOpen(HIP): Info [FindSolutionImpl] ConvDirectNaiveConvWrw (not searchable)
MIOpen(HIP): Info [FindSolutionImpl] ConvMlirIgemmWrW
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvMlirIgemmWrW
MIOpen(HIP): Info [FindSolutionImpl] GemmWrwUniversal (not searchable)
MIOpen(HIP): Info [EvaluateInvokers] ConvOclBwdWrW2<1>: MIOpenCvBwdWrW/MIOpenCvBwdWrW_rdc: 0.114278 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] ConvOclBwdWrW2<2>: MIOpenCvBwdWrW/MIOpenCvBwdWrW_rdc: 0.12024 >= 0.114278
MIOpen(HIP): Info [EvaluateInvokers] ConvOclBwdWrW2<4>: MIOpenCvBwdWrW/MIOpenCvBwdWrW_rdc: 0.1268 >= 0.114278
MIOpen(HIP): Info [EvaluateInvokers] ConvOclBwdWrW2<8>: MIOpenCvBwdWrW/MIOpenCvBwdWrW_rdc: 0.15752 >= 0.114278
MIOpen(HIP): Info [EvaluateInvokers] ConvOclBwdWrW2<16>: MIOpenCvBwdWrW/MIOpenCvBwdWrW_rdc: 0.26864 >= 0.114278
MIOpen(HIP): Info [EvaluateInvokers] ConvOclBwdWrW53: MIOpenCvBwdWrW/MIOpenCvBwdWrW_rdc: 0.26316 >= 0.114278
MIOpen(HIP): Info [EvaluateInvokers] ConvDirectNaiveConvWrw: naive_conv_nonpacked_wrw_nchw_half_double_half: 9.54243 >= 0.114278
MIOpen(HIP): Info [EvaluateInvokers] Selected: ConvOclBwdWrW2<1>: MIOpenCvBwdWrW/MIOpenCvBwdWrW_rdc: 0.114278, workspace_sz = 27648
MIOpen(HIP): Info [EvaluateInvokers] GemmWrwUniversal: : 4.06272 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] Selected: GemmWrwUniversal: : 4.06272, workspace_sz = 2709504
MIOpen(HIP): Info [EvaluateInvokers] ConvMlirIgemmWrW: mlir_gen_igemm_conv2d_v4r4_wrw0: 5.09888 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] Selected: ConvMlirIgemmWrW: mlir_gen_igemm_conv2d_v4r4_wrw0: 5.09888, workspace_sz = 0
MIOpen(HIP): Info [FindConvolution] miopenConvolutionBwdWeightsAlgoDirect       0.114278        27648
MIOpen(HIP): Info [FindConvolution] miopenConvolutionBwdWeightsAlgoGEMM 4.06272 2709504
MIOpen(HIP): Info [FindConvolution] miopenConvolutionBwdWeightsAlgoImplicitGEMM 5.09888 0
MIOpen(HIP): Info [FindConvBwdWeightsAlgorithm] BWrW Chosen Algorithm: ConvOclBwdWrW2<1> , 27648, 0.114278
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
Wall-clock Time Backward Weights Conv. Elapsed: 0.167394 ms, Auxiliary API calls: 5359.83 ms (GWSS: 0.334961)
MIOpen Backward Weights Conv. Algorithm: 1, Solution: 18/ConvOclBwdWrW2<1>
GPU Kernel Time Backward Weights Conv. Elapsed: 0.100951 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdw-conv3x3u1, 32, 48, 56, 56, 3, 3, 48,  86704128, 0, 0, 859, 0, 0.100951
Backward Convolution Weights Verifies OK on GPU reference (0.000154016 < 0.0164)
Navi21, Default Find mode (DYNAMIC_HYBRID) with UPDATED FIND-DB, ConvOclBwdWrW2<1>, 0.1ms
# ./bin/MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1 -S -1
MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1 -S -1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1030
MIOpen(HIP): Info [Handle] stream: 0x1c034b0, device_id: 0
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = DYNAMIC_HYBRID(5)
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 6.0.23494, MIOpen version 3.1.0.852bd9205
MIOpen(HIP): Info [PrintVersion] HIPRTC v.9.0
MIOpen(HIP): Info [PrintVersionImpl] COMgr v.2.6.0, USE_HIP_PCH: 1
Warm-up: Wall-clock Total Time: 3435.95 ms, Find Algorithm: 5
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info [Measure] ReadonlyRamDb::Prefetch time: 47.6199 ms
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.008766 ms
MIOpen(HIP): Warning [hip_mem_get_info_wrapper] hipMemGetInfo error, status: 1
MIOpen(HIP): Info [GetWorkSpaceSize] 27648
PRNG seed: 12345678
MIOpen(HIP): Info [FindConvBwdWeightsAlgorithm] requestAlgoCount = 2, workspace = 27648
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info [FindSolutionImpl] ConvOclBwdWrW2<1>
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclBwdWrW2<1>
MIOpen(HIP): Info [FindConvolution] miopenConvolutionBwdWeightsAlgoDirect       0.114278        27648
MIOpen(HIP): Info [FindConvBwdWeightsAlgorithm] BWrW Chosen Algorithm: ConvOclBwdWrW2<1> , 27648, 0.114278
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
MIOpen(HIP): Info [ConvolutionBackwardWeights] algo = 1, workspace = 27648
Wall-clock Time Backward Weights Conv. Elapsed: 0.172434 ms, Auxiliary API calls: 359.023 ms (GWSS: 47.9888)
MIOpen Backward Weights Conv. Algorithm: 1, Solution: 18/ConvOclBwdWrW2<1>
GPU Kernel Time Backward Weights Conv. Elapsed: 0.108084 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdw-conv3x3u1, 32, 48, 56, 56, 3, 3, 48,  86704128, 0, 0, 802, 0, 0.108084
Backward Convolution Weights Verifies OK on GPU reference (0.000154016 < 0.0164)

@junliume
Collaborator Author

@atamazov I am getting an error when forcing ConvOclBwdWrW2<1>:

# MIOPEN_FIND_MODE=1  ./bin/MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1 -S 18
MIOpenDriver convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1 -S 18
Warm-up: Wall-clock Total Time: 155.942 ms, Find Algorithm: 1, Immediate Algorithm: miopenConvolutionAlgoDirect[85]
PRNG seed: 12345678
Backward Weights Conv solutions available: 4
- id: 53 algo: 3, time: 1.16909 ms, ws: 0, name: ConvBinWinogradRxSf2x3
- id: 37 algo: 3, time: 2.29568 ms, ws: 0, name: ConvBinWinogradRxSf3x2
- id: 102 algo: 0, time: 6.45261 ms, ws: 2709504, name: GemmWrwUniversal
- id: 87 algo: 1, time: 13.7496 ms, ws: 0, name: ConvDirectNaiveConvWrw
Warning: Solution id (18) is not reported by the library. Trying it anyway...
MIOpen Error: /data/driver/MIOpen/src/ocl/convolutionocl.cpp:1225: The supplied solution id: ConvOclBwdWrW2<1> is not applicable to the current problem
RunBackwardGPU() FAILED, rc = 0x30000
Backward Convolution Weights FAILED: 1.79769e+308 > 0.0164

@atamazov
Contributor

@junliume Maybe this solver is not applicable for your GPU for some other reason (e.g. ConvOclBwdWrW2 is deprecated for your GPU). Just run the same command with MIOPEN_FIND_MODE=1 and without -S option and you'll get the User find-db fully updated. Then run the driver in immediate mode and see what happens.
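
That two-step procedure can be scripted. A hedged Python sketch: `refresh_plan`, `strip_solution_flag`, and `run_plan` are hypothetical helpers; the driver path and arguments are copied from the commands in this thread, and treating "immediate mode" as a rerun of the same command without MIOPEN_FIND_MODE set is an assumption. Only `run_plan` actually launches anything.

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple

Step = Tuple[Dict[str, str], List[str]]  # (extra environment, argv)

DRIVER = "./bin/MIOpenDriver"
ARGS = (
    "convfp16 -n 32 -c 48 -H 56 -W 56 -k 48 -y 3 -x 3 -p 1 -q 1 "
    "-u 1 -v 1 -l 1 -j 1 -m conv -g 48 -F 4 -w 2 -t 1 -S 18"
).split()


def strip_solution_flag(args: Sequence[str]) -> List[str]:
    """Drop '-S <id>' so MIOpen is free to search all solutions."""
    out: List[str] = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False
        elif arg == "-S":
            skip_next = True
        else:
            out.append(arg)
    return out


def refresh_plan(driver: str, args: Sequence[str]) -> List[Step]:
    cleaned = [driver, *strip_solution_flag(args)]
    return [
        ({"MIOPEN_FIND_MODE": "1"}, cleaned),  # NORMAL find: updates the User find-db
        ({}, list(cleaned)),                   # rerun with the default find mode
    ]


def run_plan(plan: Sequence[Step]) -> None:
    for extra_env, argv in plan:
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
```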

junliume added a commit that referenced this issue Jan 26, 2024
…y tensor" based one. (#2695)

* disable 2492 granularity_loss workaround and enable tiny_tensor workaround

* workaround_issue_2492_02(01) Macros to uppercase. Add doc for WORKAROUND_ISSUE_2492_TINY_TENSOR. Add conditions N<=4 and C<=4 to the "tiny tensor" W/A. Disable it during warmup, make it controllable by MIOPEN_DEBUG_WORKAROUND_ISSUE_2492.

* Update src/solver/conv_winoRxS.cpp

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
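
#2695 replaces the granularity-loss gate with a "tiny tensor" one. The commit message names only two of its conditions, N<=4 and C<=4, so the hypothetical Python check below models just those two; the real W/A has other conditions not shown here. That is still enough to see why the change addresses both reports in this issue: the failing test config meets the named conditions, while the regressed config does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvConfig:
    """Subset of MIOpenDriver parameters used by this sketch."""
    n: int  # batch size (-n)
    c: int  # input channels (-c)
    h: int  # input height (-H)
    w: int  # input width (-W)
    k: int  # output channels (-k)
    g: int = 1  # groups (-g)


def meets_named_tiny_conditions(cfg: ConvConfig) -> bool:
    """Only the N<=4 and C<=4 conditions named in the #2695 commit message."""
    return cfg.n <= 4 and cfg.c <= 4


failing_test = ConvConfig(n=2, c=2, h=6, w=6, k=2)
regressed = ConvConfig(n=32, c=48, h=56, w=56, k=48, g=48)
```

Failing either named condition is enough to keep the W/A from triggering, so the regressed config is unaffected; meeting them is necessary but, on its own, not sufficient.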
junliume added a commit that referenced this issue Jan 27, 2024
junliume added a commit that referenced this issue Jan 27, 2024
@atamazov
Contributor

atamazov commented Jan 31, 2024

@junliume Can we finish triaging the problem you listed at #2492 (comment)? If you have time, of course. ConvOclBwdWrW2<*> is not disabled for GPUs up to MI200 (see #2080), so I would like to make sure that the following instructions yield the expected effect:

Just run the same command with MIOPEN_FIND_MODE=1 and without
-S option and you'll get the User find-db fully updated.
Then run the driver in immediate mode and see what happens.

@JehandadKhan [thought] Maybe it is worth enabling ConvOclBwdWrW2<*> for the new GPUs to avoid inadequate performance of WrW convolutions, especially when n_groups > 1 (we can even enable them only for group convolutions).

@atamazov
Contributor

@junliume If you use Navi3X, then please export MIOPEN_DEBUG_ENABLE_DEPRECATED_SOLVERS=1.

junliume reopened this Jan 31, 2024
cderb added a commit that referenced this issue Jun 28, 2024
* [Windows] roctracer: disable on Windows (not supported) (#2404)

Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* [MI200] Refresh kdb using db_sync (#2411)

* Removal of convolution context (#2402)

* [Jenkins][CI] clean workspace after each stage (#2412)

* [tests] convert test_conv_igemm_mlir_fwd to gTest (#2291)

* Revert "cmake: enable finding installed ZStd library (#2362)"

This reverts commit e608b43.

* Revert "Revert "cmake: enable finding installed ZStd library (#2362)""

This reverts commit 1e325a7.

* Bump cryptography from 41.0.3 to 41.0.4 in /docs/.sphinx (#2408)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.3 to 41.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.3...41.0.4)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [SWDEV-416089][Doc] convolution API in MIOpen is restricted to alpha = 1.0 and beta = 0.0 (#2419)

* [HotFix] zstd dependency on multiple Linux distributions (#2417)

* [CI][Jenkins] Enable rebooting in CI stages for CI stages with GPU use (#2420)

* conf_reboot

* configs_chg

* [Bug Fixes] miopen_rocblas_gemm_ex3 call - invoker cache extra elements - conv direct naive input cast (#2414)

* bugfixes
miopen_rocblas_gemm_ex3 call would always throw error
invoker cache adding extra elements
conv direct naive yielding incorrect input cast for kernel arg

* clear clang format issue

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [CI][Jenkins] Disabling smoke stages for CI branch runs (#2422)

* [Tests] disable solver ConvHipImplicitGemm3DGroupWrwXdlops on Vega10 (#2432)

* [Dockerfile] Upgrade cmake so that MIOpen docker can compile Composable Kernel (#2424)

* upgrade cmake so that MIOpen docker can compile Composable Kernel

* pin the cmake version to 3.27.5

* [Bug Fix] Compilation fix for -DMIOPEN_USE_ROCBLAS=Off (#2435)

* bg/lwpmiopen 193 : Integrate CK's batch norm backward training into non-tunable MIOpen solver (#2385)

* Reference kernel for 3D convolution for non-packed tensors (#2334)

* [Doc] Bump rocm-docs-core from 0.24.2 to 0.25.0 in /docs/.sphinx (#2434)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.24.2 to 0.25.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.24.2...v0.25.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix weight tensor initialization to replace old PR1950 (#2436)

* Add typecast to config key (#2413)

* add typecast value to config key, as optional arg to fdb_key

* fix clang-format issue

* Save space in db key and optimize code. Do not print casting value when casting is not actually necessary.

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* do not print casting to confkey when unnecessary, code cleanup, datatype rename

* move GetDataTypeName to problem_descrption_base.hpp, organize includes

* fix missing header

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* [Bugfix] Add cast swapping for swapped gemm inputs. (#2443)

* add swapping for cast types when swapping A+B for gemm

* [Bugfix] Kernel name fix, compilation err fix (#2446)

* Bump gitpython from 3.1.35 to 3.1.37 in /docs/.sphinx (#2445)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.35 to 3.1.37.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](gitpython-developers/GitPython@3.1.35...3.1.37)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add MIOPEN_BETA_API defines around f8 (#2430)

---------

Co-authored-by: JD <jahandad@gmail.com>

* Remove INT8x4 support (#2441)

* Test non-packed inputs with naive reference convolution kernels (#2394)

* 3D forward convolution solver with non-packed input tensors (#2418)

* Bump CK commit for ROCm 6.0 (#2439)

* [Jenkins][CI] Enabling Nightly Runs for Nightly Branch w/ build_smoke_(fp32 + aux1 + fp16_bf16_int8) (#2437)

* Remove ck solver's strides restriction (#2438)

* remove ck solver's strides restriction

* bn_cleanup: rename to in_strides

* [tests] remove std::rand usage (#2400)

* remove std::rand usage

* remove deprecated code

* Bump rocm-docs-core from 0.25.0 to 0.26.0 in /docs/.sphinx (#2451)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.25.0 to 0.26.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.25.0...v0.26.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [tests] test_tensor_api enhancement (#2450)

* [MI100][MI200] Kernel db updates (#2454)

* [tests] bg/ck_gfx_white_list: start using ck_utility::is_ck_whitelist to restrict tests to applicable platforms (#2458)

* bg/ck_gfx_white_list :  start using ck_utility::is_ck_whitelist function for all CK solvers

* bg/ck_gfx_white_list: fix review comments

* Rename function and return invalid solution instead of throwing an error (#2457)

* Fix the return code on the workspace API (#2460)

* regression: do not use file system symbolic/hard links (#2425)

Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: JD <jahandad@gmail.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* bg/fix_ck_guard_in_bn : fix CK guard around bn (#2464)

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Bugfix] Layernorm Test: add missing hip_runtime.h include (#2465)

* Remove redundancy. Replace test_layernorm_test with test_layernorm (#2467)

* add missing hip_runtime.h include

* rename the test to remove redundancy

* [Doc] fix: update guides locations (#2456)

* [MI300] add CI test stages (#2396)

* [Windows] comgr: fix compiling with HIP SDK 5.5+ on Windows (#2364)

Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Windows] cmake: add option for building shared libraries (#2361)

* [Windows] remove unused sys/time.h header file (#2360)

Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* Finally remove INT8x4 support. (#2452)

* [Windows] fix sequences for Windows (#2359)

* Fusion Find (#2388)

* [tests] tensor_holder enhancement (#2449)

* [Windows] cmake: bump up the minimum required version to 3.15 (#2356)

Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* [HotFix] Missing MIO_BN_GFX110X when building kernels (#2473)

* [Tuning][MI100][MI200] Gold19 (#2470)

* TunaNet Integration: MI250x (#2421)

* [tests] remove direct std::random_device usage. (#2397)

* Add a check for packed tensors for convolution solvers (#2471)

* Bump urllib3 from 1.26.15 to 1.26.18 in /docs/.sphinx (#2462)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.15 to 1.26.18.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.15...1.26.18)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Dependency][CK] Bump CK Commit Hash (#2479)

Regular promotion to newer CK commit hash

* [Windows] half: fix compiling with HIP SDK 5.7+ on Windows (#2363)

* Added an API function for find 2.0 activation problem creation. (#2448)


---------

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>
Co-authored-by: Evgenii Averin <86725875+averinevg@users.noreply.github.com>

* [Windows] cmake: generate export header function (#2348)

* cmake: generate export header function

* incorporate review feedback

---------

Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* [tests] Refactor cache test to gTest (#1652)

* Fix transposed convolutions (#2487)

* [Windows] cmake: enable testing on Windows (#2380)

* [tests] write 3d test that uses 2d gpu kernel (#2401)

* Update dependencies and Dockerfile ROCm versions (#2463)

* update sqlite3 and boost

* fix MIOPEN_USE_COMPOSABLEKERNEL issue; more updates

* sync googletest version with FIN

* fix merge conflict

* suppress float-equal warning in gtest

* update FIN to the latest commit of its develop branch

* update docker rocm to 5.7.1

* [Tests] Removed support for OCL backend. Do not print rocminfo output unless GPU detection failed. Some cleanup. (#2490)

* [Windows] cmake: fix cmake/googletest.cmake on Windows (#2350)

* cmake: fix cmake/googletest.cmake on Windows

* incorporate review feedback

---------

Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* [Windows] cmake: use imported target for threads library instead of variable (#2355)

* Remove a check that was missed for packed tensors (#2495)

* Fix builds with rocBLAS that does not support F8 (#2480)

* fix-build-old-rocblas-no-ck(01) Fix builds with rocBLAS that does not support F8

* fix-build-old-rocblas-no-ck(02) CK BN bugfixes. Fixes for builds without CK.

* fix-build-old-rocblas-no-ck(03) Update fin up to the most recent commit in develop

* fix-build-old-rocblas-no-ck(06) Resolve review comment

* [Workaround] Issue 2496 - disabling the unit test case in wrw solver (#2497)

* [NFC] Replace miopen::ProblemDescription with conv::ProblemDescription, part 4 (#2410)

* Bump rocm-docs-core from 0.26.0 to 0.27.0 in /docs/.sphinx (#2501)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update the SyncDB tests to use multi-threading (#2407)

* Workaround for issue #2492 - disable ConvBinWinoRxS when granularity loss is huge (#2507)

* workaround_issue_2492(01) Disable ConvBinWinoRxS when granularity loss is > 0.995 (performance drops 200 times)

* workaround_issue_2492(02) Allow disabling the W/A by setting MIOPEN_DEBUG_WORKAROUND_ISSUE_2493=0 in the env.

* Workaround for issue #2492 part 2 (improvement) (#2510)

* workaround_issue_2492(01) Disable ConvBinWinoRxS when granularity loss is > 0.995 (performance drops 200 times)

* workaround_issue_2492(02) Allow disabling the W/A by setting MIOPEN_DEBUG_WORKAROUND_ISSUE_2493=0 in the env.

* workaround_issue_2492(03) [debug] Disable MIOPEN_DEBUG_WORKAROUND_ISSUE_2493 during driver warm-up.

* workaround_issue_2492(04) [quality] Make the computation of max granularity loss more clear.

* workaround_issue_2492_01(02) [debug] Log granularity loss when ConvBinWinogradRxSf2x3* solver is skipped.

* workaround_issue_2492_01(03) [tests] test_db_sync: Disable WORKAROUND_ISSUE_2493 via environment. Support reading legacy fdb (WORKAROUND_ISSUE_1987). Allow FDB testing on gfx1030 (SKIP_KDB_PDB_TESTING). Add W/A for ConvOclDirectFwdFused on gfx1030. Print number of failures per testing thread.

* workaround_issue_2492_01(04) Remove leftovers from gfx1030 testing

* workaround_issue_2492_01(05) More gfx1030 leftovers removed

* Find 2.0 problem fusing (#2466)

* [Windows] fix for end of line issue on Windows (#2515)

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Windows] cmake: strip mingw32 support and cross-compilation out (#2352)

* [Windows] cmake: cleanup outdated code (#2349)

* [Windows] cmake: make building tests optional (#2351)

* [Tests] Relocate Googletest to dev requirements (#2512)

* Step 0: build passes but tests fail

* Make test build pass

* Adopt reviewer comments

* Integrate CK's layer norm into MIOpen solver (#2481)

* [Tests] Limit layernorm CK test applicability (#2528)

* Bump rocm-docs-core from 0.27.0 to 0.28.0 in /docs/.sphinx (#2534)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.27.0 to 0.28.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.27.0...v0.28.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Windows] cmake: a few fixes for multi-config generators (#2357)

* [clang-tidy] Use config file for clang-tidy configuration (#2489)

* [Tests] split teardown to runtest and verify in layernorm gtest (#2535)

split teardown to runtest and verify

* [Enhancement] Add checks on workspace params (#2498)

* added checks on workspace params

* addressed review comments

* fix release build warning

* Revert "[Enhancement] Add checks on workspace params (#2498)"

This reverts commit aa878a8.

* [HOTFIX] Fix offline HIP builds after PR #2357. (#2544)

* [conv] Remove clamping to MAX from CastTensor used in Bwd and WrW (#2538)

* conv-bwd-wrw-disable-clamping(01) [wip] Add clamping parameter to CastTensor() and set it to proper value.

* conv-bwd-wrw-disable-clamping(02) Clamping in SubTensorOpWithCastTensor1d.

* conv-bwd-wrw-disable-clamping(03) Clamping in SubTensorOpWithCastTensor2d/3d/4d/5d

* conv-bwd-wrw-disable-clamping(04) Removed WORKAROUND_ISSUE_2496

* [CK] Bump CK commit hash (#2540)

* [Windows] cmake: use built in operator to component-wise version comparison (#2353)

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Hotfix] when MLIR is not used in MIOpen (#2549)

* [Doc] Standardize documentation for ReadtheDocs (#2548)

Relates to ROCm/rocm-docs-core#330

* [quality] Fix: always define MIOPEN_LIBMLIR_SUPPORTS_GFX103X_DEFAULT (#2552)

* ConvOclDirectFwdGen: Fixed out-of-bounds memory access (#2546)

* Find 2.0 fusion (#2486)

* [Doc] Remove dated comments in test CmakeLists.txt (#2551)

* Bump cryptography from 41.0.4 to 41.0.6 in /docs/sphinx (#2561)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.4...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Test] fix gemm driver dataType initialization (#2558)

fix gemm dataType initialization and make gemm driver more dataType friendly

* Environment variables update (#2514)

* Use split CK libraries. (#2526)

* Sum enhancement in case of inner dim reduce (#2543)

* Initialize sum, modify layernorm

* FLOAT to FLOAT_ACCUM in kernel, fix kernel index and host test, split teardown into runtest and verify

* remove unused var, int64_t to size_t, add two kernel profiles, fix kernel index error, change reqd_work_item_cnt

* Use GetMaxComputeUnits, fix GetSumWorkspaceSize flow

* Add doxygen, add test case

* remove MIOPEN_BETA_API

* modify tolerance, add solver list

* alignment

* add IsImprovementOverROCm, reduce to sqrt(reduce), modify test case

* throw to return false in performance check, move duplicated code into a function, fix wrong allocated memory size

* add experimental caution in doc, add memory copy check in driver, add detail in verify result of driver

* modify tolerance

* modify get input in driver

* [Windows] cmake: replace UNIX with NATIVE command for separate_arguments() (#2555)

* [Find 2.0] Bias for Find 2.0 fusion (#2525)

* [HotFix] Env Var set conflicts between #2543 and #2514 (#2571)

* [Doc] Bump rocm-docs-core from 0.29.0 to 0.30.0 in /docs/sphinx (#2572)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>

* [Doc] document NHWC convolution support (#2575)

* Forward, backward data and backward weight convolution solver with fp8/bfp8 compute datatype. (#2531)

* [HotFix] disable the f8 test cases that failed the f8 reference kernel in gtest (#2576)

* [HotFix] Disable f8 gtest cases that might cause CI fails. (#2577)

* [CK] Bump CK commit hash for staging (#2581)

* Fix the f8 reference kernel issue that failed CI (#2586)

* Patch necessary to make FP8 convolution compile with hiprtc (#2584)

* [Doc] Bump rocm-docs-core from 0.30.0 to 0.30.1 in /docs/sphinx (#2589)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.0 to 0.30.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.30.0...v0.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [DOC] Doxygen change: enable warning as error msg and add missing API comments (#2585)

* AI Based Parameter Prediction Model for conv_hip_igemm_group_fwd_xdlops Solver (#2523)

* [HotFix] KDB Files should not be in the runtime package (#2591)

* [Doc] Adding issue template (#2590)

* [Doc] Add documentations for non-packed tensors convolution (#2537)

* edit document of convolution

* address comments

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Doc] Fix broken links in README.md (#2595)

* Add nightly update workflow (#2579)

* Tests for RNN seq API (#2493)

* [HotFix] Fix Windows build with disabled CK (after #2523) (#2598)

* Properly guard CK usage by MIOPEN_USE_COMPOSABLEKERNEL defines

* Update src/solver/conv_hip_implicit_gemm_grouped_fwd_xdlops.cpp

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* [MIOpenDriver] Enabled gemmfp16. [tests] Added smoke test for fp16 and fp32 gemm. (#2592)

* fix-gemmfp16(01) [MIOpenDriver] Enable gemmfp16 in the driver

* fix-gemmfp16(02) [tests] Add smoke test for fp16 gemm

* [Doc] Fix URLs (ROCmSoftwarePlatform -> ROCm) in the doc, comments, and code. + more (#2597)

* Update URLs (ROCmSoftwarePlatform -> ROCm) in the documentation and comments in the source code.

* (2) Update URLs (ROCmSoftwarePlatform -> ROCm) in the documentation and comments in the source code.

* Fix incorrect link

* Fix links

* [HotFix] Bump CK commit hash for F8 patch (#2603)

* [Doc] Fix broken links in CONTRIBUTING.md (#2601)

* Fix broken rocmsoftwareplatform.github.io links in CONTRIBUTING.md

* Use new organization name for repository links

* [Windows] use find_package() for Eigen and frugally-deep (#2574)

* [Windows] enable compilation on Windows (#2570)

* [HotFix] 3D Group Conv Backward data and weight update. Failure noticed when pads and strides are not 1 (#2560)

* [CMake] fix find_package(... GLOBAL) for CMake < 3.24 (#2610)

* [HotFix][atamazov] multiple undefined behavior discovered with -fsanitize=undefined in DEV builds (#2609)

* fix-issue-2602(01) Fix for smoke_miopendriver_gemm

* Do not print output parameters in MIOPEN_LOG_FUNCTION calls.

---------

Co-authored-by: atamazov <artem.tamazov@gmail.com>

* [hipRTC] resolve symbol issues by explicitly link with hipRTC (#2612)

* explicitly link with hipRTC

* Update formatting

* Consider MIOPEN_USE_HIPRTC=Off

* Clean up

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Standardize workspace abstraction (#2524)

* [gtest] conversion for code coverage tests (#2580)

* [HotFix] revert #2580 and re-enable smoke tests (#2616)

* Revert "[gtest] conversion for code coverage tests (#2580)"

This reverts commit c5a2384.

* re-enable smoke tests in CI

* remove problematic github action

* [Windows] use find_package() for SQLite3 (#2564)

* [Doc] Bump rocm-docs-core from 0.30.1 to 0.30.2 in /docs/sphinx (#2620)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.1 to 0.30.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.30.1...v0.30.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Windows] use official ZStd package from Facebook (#2565)

* Remove MIOpenGEMM and MIOpenTensile leftovers (#2499)

* Remove FIN_OLD_PROBLEM_DESCRIPTION_COMPAT (#2503)

* [Jenkins] Add NOMLIR stage. [Workaround] Limit usage of gfx908 nodes in non-nightly builds (#2622)

* Get rid of legacy 2GiB offset limits in CallGemm*() and transpose*() internal APIs and kernels. (#2613)

* [BugFix] Proper fix for backward passes bwd/wrw for CK group conv 3d (#2619)

* [BugFix] asm igemm fwd kernel will have computation error when c <=4 and dilation_y > 1, workaround (#2625)

* Fused solver for Fwd Convolution with Residual add, Bias add and then activation function (#2517)

* Bump MIOpen version to 3.1.0 and update CI docker (#2519)

* [HotFix] resolve unknown type issue after #2517 (#2629)

* [Doc] Bump rocm-docs-core from 0.30.2 to 0.30.3 in /docs/sphinx (#2628)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HOTFIX] Fix build with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517. (#2630)

* [Jenkins][Tests] Add stage with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517 #2630. (#2631)

* [HOTFIX] Fix build with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517.

* add -DMIOPEN_USE_COMPOSABLEKERNEL=Off stage

* make NOCK stage anyAPU and build ONLY

* Adopt recommended changes

* rename config_targets to make_targets

* Extend GTest DISCOVERY_TIMEOUT to 5 mins

* [Tests] add unit test for #2624 (#2632)

* [gtest] Combine gtests into single binary. (#2599)

* [Windows] rocblas: disable Beta API on Windows for HIP < 5.7 (#2405)

* [tests] Limit applicability of ConvFwdBiasActivAPI/ConvFwdBiasResAddActivTest.ConvFusedAPI (#2635)

* [Tests] helper for env variables update in gtests (#2605)

Co-authored-by: xinlipn <xinlipn@gmail.com>

* [Windows] fix compilation of math functions on Windows (#2568)

* [Windows] fix printf type incompatibility between type specifiers (#2569)

* Fix miopen package dependency roctracer etc (#2508)

* [Doc][NFC] added rocm v6, mi300, and default component (#2618)

* [Windows] add a class to allow os-agnostic process execution (#2567)

* [Windows] make BZip2 a required package (#2566)

* [Windows] add missing symbol export (#2556)

* add missing symbol export

* more missing exports

* fix format issues

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* [ROCm 6.1][hipRTC] Fix build failures. [quality] Reorg standard includes in HIP sources. (#2637)

* [WORKAROUND] Disable W/A for issue #1359 starting from ROCm 5.4.3. (#2225)

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Dep] Bump CK commit hash for staging (#2640)

* [Windows] default paths to user and system db files on Windows (#2365)

* Fix COMgr dependency in MIOpen package (#2645)

* [ROCm 6.0.1] Automatically activate the new HIPRTC PCH adaptations starting from the 6.0.24000 version. Fix some build errors. (#2644)

* Automatically activate the new HIPRTC PCH adaptations starting from the 6.0.24000 version. Fix some build errors (#2465 + more)

(cherry picked from commit 4f695d9)

* Remove duplicated includes.

* [HOTFIX] Adapt to changes in HIP Mainline 417 (possibly future 6.1 RC) (#2652)

* fix-rocm61rc417(01) Disable new kernel build warnings. [NFC] Sort headers properly.

* fix-rocm61rc417(02) [ROCm 6.1][HIPRTC] Use custom implementations instead of standard <limits>. This fixes build issues with ROCm 6.1.

* fix-rocm61rc417(03) [ROCm 6.1][HIPRTC][Bugfix] Fixed issue in miopen_limits.h that prevented the use of custom implementations.

* fix-rocm61rc417(04) [ROCm 6.1 RC][HIPRTC] Disable some of the custom implementations from <type_traits> (like `integral_constant`) for HIP mainline 417. This fixes some build issues.

* fix-rocm61rc417(05) [ROCm 6.1 RC][offline compiler] Removed "-mcpu" from build options. This resolves kernel build issues with HIP mainline 417 (offline compiler). Improved diagnostic messages output onto console after offline build failures.

* fix-rocm61rc417(06) [tests] Disable some testcase from handle_test as #2600 still persists in Hip Mainline 417.

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Correct parameter which prints unused flag in log fusion cmd (#2653)

* [MI300][Tuning] Tunings for SWDEV tickets (#2654)

* add initial tunings for mi300

* add test to db_sync

* [ROCm 6.0.1] Fix merge error in #2652 that affects #2644. (#2658)

* [CK] Bump CK commit hash for staging (#2659)

* Bump gitpython from 3.1.37 to 3.1.41 in /docs/sphinx (#2662)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.37 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](gitpython-developers/GitPython@3.1.37...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump jinja2 from 3.1.2 to 3.1.3 in /docs/sphinx (#2666)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Doc] Updated links to ROCm Repositories (#2667)

Changed <old-organization> to "ROCm".

* [SWDEV-433582] Search-proofed PrepareInvoker (#2661)

* [HotFix] fix clang format issue from #2661

* [FIN] update submodule (#2660)

* [Windows] replace [[gnu::noreturn]] with [[noreturn]] (#2656)

* [Windows] addkernels: fix operations on path for Windows (#2657)

* [Windows] clean up the setting of environment variables cross-platform (#2655)

* clean up the setting of environment variables cross-platform

* fix clang-tidy

* Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#2676)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Argmax enhancement in case of inner dim reduce (#2583)

* [Test] Convert conv_igemm_dynamic_dlops etc to gTest (#2553)

* [Bugfix] Restore Missing ctests (#2649)

* [Windows] fix compilation on Windows (#2677)

* [Windows] cmake: unpack kernels into a build directory (#2347)

* Remove FIN_OLD_HANDLE_COMPAT and FIN_OLD_BINARY_CACHE_COMPAT (#2627)

* Rename transpose* kernels (leftover of #2613) (#2673)

* [CK] Bump CK commit hash for staging (#2683)

Update CK to the latest staging

* [zlib] Update rocm-recipes for more reliable zlib link (#2686)

* [OCL] Use OpenCL 2.0 while compiling kernels (#2691)

* Fix compilation on SLES/RHEL after #2657 merged (#2690)

* [BF16][FP8][BF8] Fixed some specializations from `<limits>` and `<cmath>` (#2669)

* conv::ProblemDescription: remove underscores, change return data type (#2685)

* Add 2D Group Convolution Backward Data and Weights update solvers. Simplify and unify 3d group conv tests (#2663)

* [HOTFIX] Disable "granularity loss" W/A for #2492 and add a new, "tiny tensor" based one. (#2695)

* disable 2492 granularity_loss workaround and enable tiny_tensor workaround

* workaround_issue_2492_02(01) Macros to uppercase. Add doc for WORKAROUND_ISSUE_2492_TINY_TENSOR. Add conditions N<=4 and C<=4 to the "tiny tensor" W/A. Disable it during warmup, make it controllable by MIOPEN_DEBUG_WORKAROUND_ISSUE_2492.

* Update src/solver/conv_winoRxS.cpp

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Clang-Format] Fix format issue

* Bump rocm-docs-core from 0.31.0 to 0.32.0 in /docs/sphinx (#2699)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.32.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.31.0...v0.32.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [MI300][Tuning] gold 20 (#2697)

* add gfx942 superbench winograd tunings, update gold version to 20

* update with more superbench tunings

* Remove support for ROCm < 5.6.0 (#2665)

* Remove support for ROCm < 5.7.0

* deprecate-rocm-less-5.7(03) Leftover that fixes build error with "-Werror"

* deprecate-rocm-less-5.7(04) Resolve review comment

* Bump rocm-docs-core from 0.32.0 to 0.33.0 in /docs/sphinx (#2707)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.32.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.32.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [COMGR] Use OpenCL 2.0. [HIPRTC] Provide min/max limits for int. Fix build errors related to min/max limits for BF16. (#2705)

* fix-rocm-mainline-issues-01(01) Removed `constexpr` from numeric_limits<hip_bfloat16>::min()/max() as the BF16 ctor provided by HIP can't be used in constant expressions.

* fix-rocm-mainline-issues-01(02) [COMGR] Globally engage OpenCL 2.0

* fix-rocm-mainline-issues-01(03) [HIPRTC] Provide min/max limits for int

* [DOC] fix broken links in docs (#2696)

* lwpmiopen_521_correct_doc_issues: fix broken links in docs

* lwpmiopen_521_correct_doc_issues: remove citing

* [HotFix] Fix DB install after #2347 (#2702)


---------

Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* Add GroupNorm forward operation (#2623)

* fix not reporting LFS missing files (#2710)

* [HotFix][WHL] move the bfloat16 header to the proper guard (#2711)

* [HotFix] Update FindDB for finetuning (#2712)

* [CK] Update CK commit in requirements.txt for staging (#2713)

* [Tests] Fix Gtest single executable build issue (#2715) (#2717)

Add the missing build job to Jenkinsfile

Fix duplicate class name issue in Gtest

* [Windows] Do not use HIP runtime headers on Windows (#2719)

* don't use WORKAROUND_DONT_USE_CUSTOM_LIMITS on Windows

* don't use workaround SWDEV_413293 on Windows

* CI base docker updates to ROCm 6.0.2 (#2714)

* Softmax ocl refactoring (#2671)

* Add cat forward operation (#2562)

* [HotFix] Fix namespace conflict issue in gtest after #2562 (#2725)

* Bump cryptography from 41.0.6 to 42.0.0 in /docs/sphinx (#2729)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.6...42.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [DB Install] fix installation of *.fdb.txt and *.db files (#2728)

* Update CHANGELOG.md (#2720)

* bg/update_change_log_lwpmiopen_501: update changelog up to ROCm 6.1.0 (MIOpen-3.1.0)

* bg/update_change_log_lwpmiopen_501: remove typo

* bg/update_change_log_lwpmiopen_501: fix broken link

* bg/update_change_log_lwpmiopen_501: second attempt to fix hyperlink

* Create placeholder CODEOWNERS (#2718)

Add @JehandadKhan and @junliume as CODEOWNERS.

* [Solvers] Fix for #2663 ensure tensor dimensions are consumed by solvers correctly (#2716)

* [DOC] Add codeowners for documentation (#2692)

* Add codeowners for documentation

* Update CODEOWNERS

---------

Co-authored-by: samjwu <samjwu@users.noreply.github.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Bump rocm-docs-core from 0.33.0 to 0.33.2 in /docs/sphinx (#2733)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.0 to 0.33.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.33.0...v0.33.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix build after #2657 and #2690 (boost::filesystem) (#2732)

* [Improvements] Replace HasAtLeastOne64BitTensor() with AllTensorsDimsFitIntoInt() (#2731)

* Update CK-based 2d/3d convolution solvers to support nchw/ncdhw layout (#2429)

* Bump rocm-docs-core from 0.33.2 to 0.34.0 in /docs/sphinx (#2739)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.2 to 0.34.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.33.2...v0.34.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [BugFix] Set System KDB journal_mode to Off (#2724)

* [Tests] Converting test_conv3d_extra into GTest (#2554)

* [Tests] Convert test_rnn_vanilla, test_gru, test_rnn_extra and test_gru_extra to gTests (#2550)

* [Doc] Removing unmaintained release notes (#2745)

* [CK] Update CK commit in requirements.txt for staging (#2747)

* [Tests][gtest] conversion for LSTM (#2545)

* Fix for issue #2734: Detect if "-fno-offload-uniform-block" works in HIP compiler. (#2743)

* fix-issue-2734 (01) Use "-fno-offload-uniform-block" only if HIP compiler supports it. Resolves #2734.

(cherry picked from commit 458c833)

Partially changes code from PR #2719 "Do not use HIP runtime headers on Windows"

# RESOLVED Conflicts:
#	CMakeLists.txt

* fix-issue-2734(02) Removed W/A from PR #2719 as it is no longer needed.

* Enable softmax solver based on attention-softmax implementation (#2737)

* [Tests] Replace test_conv_igemm_dynamic_xdlops_bwd with gtest (#2409)

* [Tests] Convert ctest to gtest for test_conv_for_implicit_gemm (#2513)

* [Tuning][MI300] for m9 tickets (#2754)

* [hipRTC] add lowest() for float to MIOpen custom limits (#2753)

* [hipRTC] add lowest() to MIOpen custom limits

* the earliest trace can be found together with numeric_limits<int>

* [Linux] Enhance Compiler flags to avoid Hardcoded ROCm Path (Part 1) (#2694)

* Bump rocm-docs-core from 0.34.0 to 0.34.2 in /docs/sphinx (#2755)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.0 to 0.34.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.34.0...v0.34.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: JD <jahandad@gmail.com>
Co-authored-by: xinlipn <xinlipn@gmail.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: mentat <108366729+bghimireamd@users.noreply.github.com>
Co-authored-by: Reid Kawaja <74506315+reidkwja@users.noreply.github.com>
Co-authored-by: amberhassaan <amber_474@yahoo.com>
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>
Co-authored-by: Daming Feng <dmfeng8898@gmail.com>
Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>
Co-authored-by: Evgenii Averin <86725875+averinevg@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: M. Saud Ul Hassan <68208941+msaudulhassan@users.noreply.github.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: saeid-rostami <123997133+saeid-rostami@users.noreply.github.com>
Co-authored-by: Seungman Han <120356720+seungmanhan@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Dmantri98 <109552294+Dmantri98@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: xu-shawn <50402888+xu-shawn@users.noreply.github.com>
Co-authored-by: Kamil Nasyrov <shurale.nkn@gmail.com>
Co-authored-by: jasberc <146053952+jasberc@users.noreply.github.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Co-authored-by: Kyeonghwan Ryu <89056320+kyeonghwanryu@users.noreply.github.com>
Co-authored-by: scerzh <102019268+scerzh@users.noreply.github.com>
Co-authored-by: Vsevolod Golovko <vsevolod.golovko2@dxc.com>
Co-authored-by: Jungkeun Kim <et16kr@gmail.com>
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: M.Emin Ozturk <ozturk.27@osu.edu>
Co-authored-by: arvindcheru <90783369+arvindcheru@users.noreply.github.com>
cderb added a commit that referenced this issue Jun 28, 2024
* [quality] Fix: always define MIOPEN_LIBMLIR_SUPPORTS_GFX103X_DEFAULT (#2552)

* ConvOclDirectFwdGen: Fixed out-of-bounds memory access (#2546)

* Find 2.0 fusion (#2486)

* [Doc] Remove dated comments in test CMakeLists.txt (#2551)

* Bump cryptography from 41.0.4 to 41.0.6 in /docs/sphinx (#2561)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.4...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Test] fix gemm driver dataType initialization (#2558)

fix gemm dataType initialization and make gemm driver more dataType friendly

* Environment variables update (#2514)

* Use split CK libraries. (#2526)

* Sum enhancement in case of inner dim reduce (#2543)

* Initialize sum, modify layernorm

* FLOAT to FLOAT_ACCUM in kernel, fix kernel index and host test, and split teardown into runtest and verify

* remove unused var, int64_t to size_t, add two kernel profiles, fix kernel index error, change reqd_work_item_cnt

* Use GetMaxComputeUnits, fix GetSumWorkspaceSize flow

* Add doxygen, add test case

* remove MIOPEN_BETA_API

* modify tolerance, add solver list

* alignment

* add IsImprovementOverROCm, reduce to sqrt(reduce), modify test case

* throw to return false in performance check, move duplicate code into a function, fix wrong allocated memory size

* add experimental caution in doc, add memory copy check in driver, add detail in verify result of driver

* modify tolerance

* modify get input in driver

* [Windows] cmake: replace UNIX with NATIVE command for separate_arguments() (#2555)

* [Find 2.0] Bias for Find 2.0 fusion (#2525)

* [HotFix] Env Var set conflicts between #2543 and #2514 (#2571)

* [Doc] Bump rocm-docs-core from 0.29.0 to 0.30.0 in /docs/sphinx (#2572)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>

* [Doc] document NHWC convolution support (#2575)

* Forward, backward data and backward weight convolution solver with fp8/bfp8 compute datatype. (#2531)

* [HotFix] disable the f8 test cases that failed the f8 reference kernel in gtest (#2576)

* [HotFix] Disable f8 gtest cases that might cause CI fails. (#2577)

* [CK] Bump CK commit hash for staging (#2581)

* Fix the f8 reference kernel issue that failed CI (#2586)

* Patch necessary to make FP8 convolution compile with hiprtc (#2584)

* [Doc] Bump rocm-docs-core from 0.30.0 to 0.30.1 in /docs/sphinx (#2589)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.0 to 0.30.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.0...v0.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [DOC] Doxygen change: enable warning as error msg and add missing API comments (#2585)

* AI Based Parameter Prediction Model for conv_hip_igemm_group_fwd_xdlops Solver (#2523)

* [HotFix] KDB Files should not be in the runtime package (#2591)

* [Doc] Adding issue template (#2590)

* [Doc] Add documentations for non-packed tensors convolution (#2537)

* edit document of convolution

* address comments

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Doc] Fix broken links in README.md (#2595)

* Add nightly update workflow (#2579)

* Tests for RNN seq API (#2493)

* [HotFix] Fix Windows build with disabled CK (after #2523) (#2598)

* Properly guard CK usage by MIOPEN_USE_COMPOSABLEKERNEL defines

* Update src/solver/conv_hip_implicit_gemm_grouped_fwd_xdlops.cpp

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* [MIOpenDriver] Enabled gemmfp16. [tests] Added smoke test for fp16 and fp32 gemm. (#2592)

* fix-gemmfp16(01) [MIOpenDriver] Enable gemmfp16 in the driver

* fix-gemmfp16(02) [tests] Add smoke test for fp16 gemm

* [Doc] Fix URLs (ROCmSoftwarePlatform -> ROCm) in the doc, comments, and code. + more (#2597)

* Update URLs (ROCmSoftwarePlatform -> ROCm) in the documentation and comments in the source code.

* (2) Update URLs (ROCmSoftwarePlatform -> ROCm) in the documentation and comments in the source code.

* Fix incorrect link

* Fix links

* [HotFix] Bump CK commit hash for F8 patch (#2603)

* [Doc] Fix broken links in CONTRIBUTING.md (#2601)

* Fix broken rocmsoftwareplatform.github.io links in CONTRIBUTING.md

* Use new organization name for repository links

* [Windows] use find_package() for Eigen and frugally-deep (#2574)

* [Windows] enable compilation on Windows (#2570)

* [HotFix] 3D Group Conv Backward data and weight update. Failure noticed when pads and strides are not 1 (#2560)

* [CMake] fix find_package(... GLOBAL) for CMake < 3.24 (#2610)

* [HotFix][atamazov] multiple undefined behavior discovered with -fsanitize=undefined in DEV builds (#2609)

* fix-issue-2602(01) Fix for smoke_miopendriver_gemm

* Do not print output parameters in MIOPEN_LOG_FUNCTION calls.

---------

Co-authored-by: atamazov <artem.tamazov@gmail.com>

* [hipRTC] resolve symbol issues by explicitly linking with hipRTC (#2612)

* explicitly link with hipRTC

* Update formatting

* Consider MIOPEN_USE_HIPRTC=Off

* Clean up

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Standardize workspace abstraction (#2524)

* [gtest] conversion for code coverage tests (#2580)

* [HotFix] revert #2580 and re-enable smoke tests (#2616)

* Revert "[gtest] conversion for code coverage tests (#2580)"

This reverts commit c5a2384dc0f29682ed51aeccf9b981dbdf7e058f.

* re-enable smoke tests in CI

* remove problematic github action

* [Windows] use find_package() for SQLite3 (#2564)

* [Doc] Bump rocm-docs-core from 0.30.1 to 0.30.2 in /docs/sphinx (#2620)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.1 to 0.30.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.1...v0.30.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Windows] use official ZStd package from Facebook (#2565)

* Remove MIOpenGEMM and MIOpenTensile leftovers (#2499)

* Remove FIN_OLD_PROBLEM_DESCRIPTION_COMPAT (#2503)

* [Jenkins] Add NOMLIR stage. [Workaround] Limit usage of gfx908 nodes in non-nightly builds (#2622)

* Get rid of legacy 2GiB offset limits in CallGemm*() and transpose*() internal APIs and kernels. (#2613)

* [BugFix] Proper fix for backward passes bwd/wrw for CK group conv 3d (#2619)

* [BugFix] asm igemm fwd kernel has a computation error when c <= 4 and dilation_y > 1; add workaround (#2625)

* Fused solver for Fwd Convolution with Residual add, Bias add and then activation function (#2517)

* Bump MIOpen version to 3.1.0 and update CI docker (#2519)

* [HotFix] resolve unknown type issue after #2517 (#2629)

* [Doc] Bump rocm-docs-core from 0.30.2 to 0.30.3 in /docs/sphinx (#2628)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HOTFIX] Fix build with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517. (#2630)

* [Jenkins][Tests] Add stage with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517 #2630. (#2631)

* [HOTFIX] Fix build with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517.

* add -DMIOPEN_USE_COMPOSABLEKERNEL=Off stage

* make NOCK stage anyAPU and build ONLY

* Adopt recommended changes

* rename config_targets to make_targets

* Extend GTest DISCOVERY_TIMEOUT to 5 mins

* [Tests] add unit test for #2624 (#2632)

* [gtest] Combine gtests into single binary. (#2599)

* [Windows] rocblas: disable Beta API on Windows for HIP < 5.7 (#2405)

* [tests] Limit applicability of ConvFwdBiasActivAPI/ConvFwdBiasResAddActivTest.ConvFusedAPI (#2635)

* [Tests] helper for env variables update in gtests (#2605)

Co-authored-by: xinlipn <xinlipn@gmail.com>

* [Windows] fix compilation of math functions on Windows (#2568)

* [Windows] fix printf type incompatibility between type specifiers (#2569)

* Fix miopen package dependency roctracer etc (#2508)

* [Doc][NFC] added rocm v6, mi300, and default component (#2618)

* [Windows] add a class to allow os-agnostic process execution (#2567)

* [Windows] make BZip2 a required package (#2566)

* [Windows] add missing symbol export (#2556)

* add missing symbol export

* more missing exports

* fix format issues

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* [ROCm 6.1][hipRTC] Fix build failures. [quality] Reorg standard includes in HIP sources. (#2637)

* [WORKAROUND] Disable W/A for issue #1359 starting from ROCm 5.4.3. (#2225)

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Dep] Bump CK commit hash for staging (#2640)

* [Windows] default paths to user and system db files on Windows (#2365)

* Fix COMgr dependency in MIOpen package (#2645)

* [ROCm 6.0.1] Automatically activate the new HIPRTC PCH adaptations starting from the 6.0.24000 version. Fix some build errors. (#2644)

* Automatically activate the new HIPRTC PCH adaptations starting from the 6.0.24000 version. Fix some build errors (#2465 + more)

(cherry picked from commit 4f695d975a2a6de2f167fc2925f3bad79fbaaf98)

* Remove duplicated includes.

* [HOTFIX] Adapt to changes in HIP Mainline 417 (possibly future 6.1 RC) (#2652)

* fix-rocm61rc417(01) Disable new kernel build warnings. [NFC] Sort headers properly.

* fix-rocm61rc417(02) [ROCm 6.1][HIPRTC] Use custom implementations instead of standard <limits>. This fixes build issues with ROCm 6.1.

* fix-rocm61rc417(03) [ROCm 6.1][HIPRTC][Bugfix] Fixed issue in miopen_limits.h that prevented the use of custom implementations.

* fix-rocm61rc417(04) [ROCm 6.1 RC][HIPRTC] Disable some of the custom implementations from <type_traits> (like `integral_constant`) for HIP mainline 417. This fixes some build issues.

* fix-rocm61rc417(05) [ROCm 6.1 RC][offline compiler] Removed "-mcpu" from build options. This resolves kernel build issues with HIP mainline 417 (offline compiler). Improved diagnostic messages output onto console after offline build failures.

* fix-rocm61rc417(06) [tests] Disable some test cases in handle_test as #2600 still persists in HIP Mainline 417.

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Correct parameter which prints unused flag in log fusion cmd (#2653)

* [MI300][Tuning] Tunings for SWDEV tickets (#2654)

* add initial tunings for mi300

* add test to db_sync

* [ROCm 6.0.1] Fix merge error in #2652 that affects #2644. (#2658)

* [CK] Bump CK commit hash for staging (#2659)

* Bump gitpython from 3.1.37 to 3.1.41 in /docs/sphinx (#2662)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.37 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.37...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump jinja2 from 3.1.2 to 3.1.3 in /docs/sphinx (#2666)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Doc] Updated links to ROCm Repositories (#2667)

Changed <old-organization> to "ROCm".

* [SWDEV-433582] Search-proofed PrepareInvoker (#2661)

* [HotFix] fix clang format issue from #2661

* [FIN] update submodule (#2660)

* [Windows] replace [[gnu::noreturn]] with [[noreturn]] (#2656)

* [Windows] addkernels: fix operations on path for Windows (#2657)

* [Windows] clean up the setting of environment variables cross-platform (#2655)

* clean up the setting of environment variables cross-platform

* fix clang-tidy

* Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#2676)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Argmax enhancement in case of inner dim reduce (#2583)

* [Test] Convert conv_igemm_dynamic_dlops etc to gTest (#2553)

* [Bugfix] Restore Missing ctests (#2649)

* [Windows] fix compilation on Windows (#2677)

* [Windows] cmake: unpack kernels into a build directory (#2347)

* Remove FIN_OLD_HANDLE_COMPAT and FIN_OLD_BINARY_CACHE_COMPAT (#2627)

* Rename transpose* kernels (leftover of #2613) (#2673)

* [CK] Bump CK commit hash for staging (#2683)

Update CK to the latest staging

* [zlib] Update rocm-recipes for more reliable zlib link (#2686)

* [OCL] Use OpenCL 2.0 while compiling kernels (#2691)

* Fix compilation on SLES/RHEL after #2657 merged (#2690)

* [BF16][FP8][BF8] Fixed some specializations from `<limits>` and `<cmath>` (#2669)

* conv::ProblemDescription: remove underscores, change return data type (#2685)

* Add 2D Group Convolution Backward Data and Weights update solvers. Simplify and unify 3d group conv tests (#2663)

* [HOTFIX] Disable "granularity loss" W/A for #2492 and add a new, "tiny tensor" based one. (#2695)

* disable 2492 granularity_loss workaround and enable tiny_tensor workaround

* workaround_issue_2492_02(01) Macros to uppercase. Add doc for WORKAROUND_ISSUE_2492_TINY_TENSOR. Add conditions N<=4 and C<=4 to the "tiny tensor" W/A. Disable it during warmup, make it controllable by MIOPEN_DEBUG_WORKAROUND_ISSUE_2492.

* Update src/solver/conv_winoRxS.cpp

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Clang-Format] Fix format issue

* Bump rocm-docs-core from 0.31.0 to 0.32.0 in /docs/sphinx (#2699)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.32.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.31.0...v0.32.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [MI300][Tuning] gold 20 (#2697)

* add gfx942 superbench winograd tunings, update gold version to 20

* update with more superbench tunings

* Remove support for ROCm < 5.6.0 (#2665)

* Remove support for ROCm < 5.7.0

* deprecate-rocm-less-5.7(03) Leftover that fixes build error with "-Werror"

* deprecate-rocm-less-5.7(04) Resolve review comment

* Bump rocm-docs-core from 0.32.0 to 0.33.0 in /docs/sphinx (#2707)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.32.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.32.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [COMGR] Use OpenCL 2.0. [HIPRTC] Provide min/max limits for int. Fix build errors related to min/max limits for BF16. (#2705)

* fix-rocm-mainline-issues-01(01) Removed `constexpr` from numeric_limits<hip_bfloat16>::min()/max() as the BF16 ctor provided by HIP can't be used in constant expressions.

* fix-rocm-mainline-issues-01(02) [COMGR] Globally engage OpenCL 2.0

* fix-rocm-mainline-issues-01(03) [HIPRTC] Provide min/max limits for int

* [DOC] fix broken links in docs (#2696)

* lwpmiopen_521_correct_doc_issues: fix broken links in docs

* lwpmiopen_521_correct_doc_issues: remove citing

* [HotFix] Fix DB install after #2347 (#2702)


---------

Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* Add GroupNorm forward operation (#2623)

* fix not reporting LFS missing files (#2710)

* [HotFix][WHL] move the bfloat16 header to the proper guard (#2711)

* [HotFix] Update FindDB for finetuning (#2712)

* [CK] Update CK commit in requirements.txt for staging (#2713)

* [Tests] Fix Gtest single executable build issue (#2715) (#2717)

Add the missing build job to Jenkinsfile

Fix duplicate class name issue in Gtest

* [Windows] Do not use HIP runtime headers on Windows (#2719)

* don't use WORKAROUND_DONT_USE_CUSTOM_LIMITS on Windows

* don't use workaround SWDEV_413293 on Windows

* CI base docker updates to ROCm 6.0.2 (#2714)

* Softmax ocl refactoring (#2671)

* Add cat forward operation (#2562)

* [HotFix] Fix namespace conflict issue in gtest after #2562 (#2725)

* Bump cryptography from 41.0.6 to 42.0.0 in /docs/sphinx (#2729)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.6...42.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [DB Install] fix installation of *.fdb.txt and *.db files (#2728)

* Update CHANGELOG.md (#2720)

* bg/update_change_log_lwpmiopen_501: update changelog up to ROCm 6.1.0 (MIOpen-3.1.0)

* bg/update_change_log_lwpmiopen_501: remove typo

* bg/update_change_log_lwpmiopen_501: fix broken link

* bg/update_change_log_lwpmiopen_501: second attempt to fix hyperlink

* Create placeholder CODEOWNERS (#2718)

Add @JehandadKhan and @junliume as CODEOWNERS.

* [Solvers] Fix for #2663 ensure tensor dimensions are consumed by solvers correctly (#2716)

* [DOC] Add codeowners for documentation (#2692)

* Add codeowners for documentation

* Update CODEOWNERS

---------

Co-authored-by: samjwu <samjwu@users.noreply.github.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Bump rocm-docs-core from 0.33.0 to 0.33.2 in /docs/sphinx (#2733)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.0 to 0.33.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.0...v0.33.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix build after #2657 and #2690 (boost::filesystem) (#2732)

* [Improvements] Replace HasAtLeastOne64BitTensor() with AllTensorsDimsFitIntoInt() (#2731)

* Update CK-based 2d/3d convolution solvers to support nchw/ncdhw layout (#2429)

* Bump rocm-docs-core from 0.33.2 to 0.34.0 in /docs/sphinx (#2739)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.2 to 0.34.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.2...v0.34.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [BugFix] Set System KDB journal_mode to Off (#2724)

* [Tests] Converting test_conv3d_extra into GTest (#2554)

* [Tests] Convert test_rnn_vanilla , test_gru, test_rnn_extra and test_gru_extra gTests (#2550)

* [Doc] Removing unmaintained release notes (#2745)

* [CK] Update CK commit in requirements.txt for staging (#2747)

* [Tests][gtest] conversion for LSTM (#2545)

* Fix for issue #2734: Detect if "-fno-offload-uniform-block" works in HIP compiler. (#2743)

* fix-issue-2734 (01) Use "-fno-offload-uniform-block" only if HIP compiler supports it. Resolves #2734.

(cherry picked from commit 458c8338175383a95a5c3f30c726798828f15ea8)

Partially changes code from PR #2719 "Do not use HIP runtime headers on Windows"

# RESOLVED Conflicts:
#	CMakeLists.txt

* fix-issue-2734(02) Removed W/A from PR #2719 as it is no longer needed.

* Enable softmax solver based on attention-softmax implementation (#2737)

* [Tests] Replace test_conv_igemm_dynamic_xdlops_bwd with gtest (#2409)

* [Tests] Convert ctest to gtest for test_conv_for_implicit_gemm (#2513)

* [Tuning][MI300] for m9 tickets (#2754)

* [hipRTC] add lowest() for float to MIOpen custom limits (#2753)

* [hipRTC] add lowest() to MIOpen custom limits

* the earliest trace can be found together with numeric_limits<int>

* [Linux] Enhance Compiler flags to avoid Hardcoded ROCm Path (Part 1) (#2694)

* Bump rocm-docs-core from 0.34.0 to 0.34.2 in /docs/sphinx (#2755)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.0 to 0.34.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.0...v0.34.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump cryptography from 42.0.0 to 42.0.2 in /docs/sphinx (#2759)

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.0...42.0.2)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix] enable 2d grouped fwd convolution support on mi300 (#2761)

* enable support on mi300

* Fix missing include files

* Fix header needed even for non-ck build

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Bump cryptography from 42.0.2 to 42.0.4 in /docs/sphinx (#2765)

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.2...42.0.4)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Implemented preparsing sqlite db to text format (#2722)

* Bump rocm-docs-core from 0.34.2 to 0.35.0 in /docs/sphinx (#2768)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.2 to 0.35.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.2...v0.35.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix] Fixed incorrectly generated files (#2769)

* Adding Link Dependencies to resolve missing symbols from pthread and dl referenced by sqlite (#2773)

* Adding library dependency dl for dlopen

* Adding link dependency to pthread

* RNN Inference MS (#2727)

* [Tuning][MI300] Find db update - Superbench/Winograd (#2780)

* [HotFix] fix failed error bugs in conv backward weight solvers (#2770)

* fix failed error bugs in 2d/3d conv backward weight solvers

* fix time issue in NCHW layout invoker

* code refactoring: define hip event profiler to reduce code duplication

* delete comments

* fix tidy error

* address comments

* [CK] Bump CK commit hash for staging (#2784)

* Minor softmax improvements (#2782)

* using ostream instead of concatenation of strings

* Problem description slightly changed. Softmax driver patched

* Remove SetTensorLayout (#2787)

* Add heuristics tests for gfx90a architecture (#2772)

* Bump rocm-docs-core from 0.35.0 to 0.35.1 in /docs/sphinx (#2791)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.0 to 0.35.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.0...v0.35.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix][Format] Fix clang-format issue with tuna_net update

* [Tests] Convert test_conv_group, test_conv_extra and test_conv_3d to gTests (#2767)

* Convert test_conv_group to gTest

* Convert test_conv_extra and test_conv_3d to gTest

* Fix build

* [Windows] Fixing linking issue for sqlite2txt on Windows (#2793)

* MI300 TunaNet Integration (#2795)

* Dynamic workspace calculation (#2779)

* calculate workspace size for winning solution at runtime

* update GetWorkSpaceSize to use solver workspace query instead of reading db

* fix clang-format issues

---------

Co-authored-by: Christopher Erb <Christopher.Erb@amd>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* RNN back weights update (#2794)

* [CI] Enabling navi32 Testing Stages (#2796)

* [HotFix] Changed text perfdbs to be actually installed when enabled #2722 (#2800)

* Bump CK commit hash for staging and update CI docker (#2777)

* [Windows] Upgrade class TmpDir (#2762)

* Bump rocm-docs-core from 0.35.1 to 0.36.0 in /docs/sphinx (#2801)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.1 to 0.36.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.1...v0.36.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix] Fix unpackdb after merging #2800 (#2802)

* [HIPRTC] Provide option to add/remove include directories to/from compiler flags (#2764)

* Provide option to add/remove include directories to/from compiler flags

The HIP compiler flags are embedded in the MIOpen shared library, and the -isystem include directories in those flags are hard-coded paths.
For the ROCm use case, build scripts will set the option to OFF so that include directories are not added to the compiler flags. This helps remove the hard-coded paths from the library.
By default, the option is set to ON.

* Set the default value of the option MIOPEN_HIP_COMPILER_USE_SYSTEM_INCLUDE_DIRECTORIES based on HIPRTC compiler usage

* Check HIP version as well to enable/disable the use of system include directories in hip compiler flags

Use system include directories if hip version is less than 6.1.40091

* [CI][test-perf] MIOpenDriver to use rocrand to init buffers. Do not init output buffers. Use non-DEV build in perf test. (#2785)

* [Windows] Fix MIOpenDriver linking with rocRand (#2820)

* Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx (#2827)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Offline Compiler] Update Target Link Dependency (#2815)

* Softmax for find20 (#2776)

* Implement Tensor Descriptors for MIOpen Backend API (#2751)

* Doc cleanup (#2783)

* [CK] Update requirements.txt for next staging (#2824)

* [WORKAROUND] unblock compilation on Windows after merging #2751 (#2832)

* fix Windows compilation after #2751

* fix clang-format

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Tests] Add client component with test package and fix single test binary not start (#2806)

* add client component with test package

* Fix single test binary not start

---------

Co-authored-by: Jehandad Khan <jahandad@gmail.com>

* [Windows] Adapt logging functionality to Windows (#2804)

* fix logging on Windows

* fix clang-format

* display the correct MIOpenDriver executable name

* [OCL] patch Softmax issue (#2268)

* Implement MIOPEN_BACKEND_VARIANT_PACK_DESCRIPTOR builder (#2847)

* For CK solvers change PerfConfigBase to PerfConfigBaseCK (#2834)

* For ck solvers change PerfConfigBase to PerfConfigBaseCK

* remove Find() from structs derived from PerfConfigBaseCK

* Bump rocm-docs-core[api_reference] from 0.37.0 to 0.38.0 in /docs/sphinx (#2852)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.0 to 0.38.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Find 2.0 must not autoreset buffers (#2836)

* Remove legacy prng leftover (#2853)

* TunaNetv2.0 for MI300 (#2835)

* Add alignment to the workspace pointer passed to the reduction kernel (#2822)

* Add alignment to the workspace pointer passed to the reduction kernel

* Use cacheline size for pointer alignment and uintptr_t for portable integer/pointer conversion

* Reformat using clang-format-12

* Use std::align to align the workspace pointer

* Helping to resolve @atamazov comments as soon as possible

* missing check

---------

Co-authored-by: Shurale-nkn <Shurale.nkn@gmail.com>

* [MHA] Implement MIOPEN_BACKEND_RNG_DESCRIPTOR (#2861)

* Rename files removing unnecessary graphapi_ prefix

* Add missing enums for MIOPEN_BACKEND_RNG_DESCRIPTOR

* Add common header for Graph API tests

* Add GTest executer for Graph API

* Implement MIOPEN_BACKEND_RNG_DESCRIPTOR Builder

* Implement MIOPEN_BACKEND_RNG_DESCRIPTOR API class

* Fix missing pragma once

* [MI210][Tuning] UNet3D (#2859)

* [Tests] Convert three regression tests to gTests (#2810)

* [Windows] use standard C++ streams to access files (#2807)

* Forward MHA find2.0 interface and implementation (#2819)

* [MHA] Implement MIOPEN_BACKEND_POINTWISE_DESCRIPTOR (#2854)

* [Workaround][Issue #2867] Disable iGEMM kernels for corner configuration (#2869)

* Find 2.0 scalar run-time parameters (#2826)

* Find 2.0 scalar run-time parameters

* Update include/miopen/miopen.h

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* Added miopenTensorArgumentIsScalar value serialization

* Fixed build after renaming enum field

* format

* tidy fix

---------

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* [MHA] Implement convolution descriptors in graph (backend) API (#2792)

* [Windows] Use std::vector<char> for binary blobs (#2805)

* use std::vector<char> for binary blobs

* fix clang-format issues

* incorporate review feedback

* incorporate review feedback

* Update submodule FIN

* suppress warning in clang-tidy

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Skip fusions tests when xnack is enabled (#2870)

* [MHA] Implement MIOPEN_BACKEND_REDUCTION_DESCRIPTOR (#2862)

* [MHA] CPU multi head attention (#2563)

* lwpmiopen-230 : first attempt to cpu implementation of multi head attention fwd

* lwpmiopen-230 : fix indexing issue

* bg/lwpmiopen-230_cpu_multi_head_attention : fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention : output M and Z_inv

* bg/lwpmiopen-230_cpu_multi_head_attention : added gtest, used tensor

* bg/lwpmiopen-230_cpu_multi_head_attention: fix review comments and change function names

* bg/lwpmiopen-230_cpu_multi_head_attention : now able to match PyTorch results exactly

* bg/lwpmiopen-230_cpu_multi_head_attention: move helper functions to mha_helper.hpp

* create helper file for mha

* bg/lwpmiopen-230_cpu_multi_head_attention: f32 and fp8 mha computed

* bg/lwpmiopen-230_cpu_multi_head_attention: cleanup

* minor cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: comment cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: add softmax function

* bg/lwpmiopen-230_cpu_multi_head_attention: add attention json golden data

* bg/lwpmiopen-230_cpu_multi_head_attention: fix CI issue

* bg/lwpmiopen-230_cpu_multi_head_attention: test passing

* bg/lwpmiopen-230_cpu_multi_head_attention: fixed clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix path of attention_golden.json

* bg/lwpmiopen-230_cpu_multi_head_attention: moved test data from json to hpp

* Initial commit. solver infrastructure's classes are introduced

* some raw code added

* mha descriptor file added

* format clang run

* remove homegrown bitcast

* use fp32 functions explicitly

* add atomic final reduction step

* add dropout part

* add final scaling

* add mha solver (no dropout initialization)

* return scaling back

* clang run + some changes

* enum values change

* tidy check fixes

* format run

* cpp check fix, cmakelist fix

* comment fix

* properly use rocblas

* fix clang-tidy

* bg/lwpmiopen-230_cpu_multi_head_attention: increase tolerance

* scalars changed to tensors

* warning fix

* compilation fix after merge

* compilation (after merge) fix

* buffers removed from desks struct

* tidy fix

* fix descaling for softmax

* use miopen gemm

* use MultiBufferWorkspaceTraits

* cpu code refactoring

* fix format

* remove legacy prng leftover

* Find 2.0 must not autoreset buffers

* try to fix clang-tidy false-positive

* make cpu_mha more consistent with the docs and fix operation order

* fix format

* make cpu_mha 30% faster, remove unused headers

* use std::max for cpu mha instead of explicit conditions

* bg/lwpmiopen-230_cpu_multi_head_attention : remove typo

---------

Co-authored-by: Bibek Ghimire <gbibek@gmail.com>
Co-authored-by: Vsevolod Golovko <vsevolod.golovko2@dxc.com>
Co-authored-by: Aleksandr Eremin <CAHEK7@yandex.ru>

* MHA Forward Find 2.0 Wrapper Test (#2872)

* [MHA] Implement MIOPEN_BACKEND_VARIANT_PACK_DESCRIPTOR API Class (#2858)

* [Driver][NFC] Modular: Split MIOpenDriver to improve build time (#2856)

* [Windows] add filesystem utility functions (#2823)

Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* [Tests] Convert reduce tests to gTests (#2848)

* Convert reduce tests to gTests

* Refactor with initialization list to disable warnings

* [MHA] Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR Builder (#2879)

* Keep source attribute types for Pointwise without converting to double

* Add swish beta to pointwise attributes

* Apply naming rules

* Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR Builder

* Remove extra db path quotes introduced in #2823 (#2884)

* Bump rocm-docs-core[api_reference] from 0.38.0 to 0.38.1 in /docs/sphinx (#2887)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.0 to 0.38.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [MHA] Added forward numeric test. Fix some bugs in cpu and gpu implementations. Resolved some post-merge comments. (#2875)

* [MHA] Implement MIOPEN_BACKEND_OPERATION_RNG_DESCRIPTOR (#2873)

* Bump idna from 3.6 to 3.7 in /docs/sphinx (#2889)

Bumps [idna](https://github.com/kjd/idna) from 3.6 to 3.7.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.6...v3.7)

---
updated-dependencies:
- dependency-name: idna
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [conv][FP32] Extend applicability of GemmBwdRest and GemmFwdRest for big WS sizes (#2811)

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(01) Reorganized MaxMemAllocSz() code. Formalized WORKAROUND_MLOPEN_ISSUE_1430. Added MIOPEN_WORKAROUND_ISSUE_2808, MIOPEN_WORKAROUND_ISSUE_2809.

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(02) [conv][gemm] Common code from GEMM solvers moved to the solver/gemm_common module.

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(05) [driver][debugging] Add logging of hipMalloc/Free

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(06) [conv][gemm] Removed MIOPEN_WORKAROUND_ISSUE_2808/2809. Introduced MIOPEN_WORKAROUND_ISSUE_2789 that affects GemmFwd/BwdRest solvers only, and only for FP32.

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(08) Fix tidy issues

* Update Depends with correct HIP Runtime package name (#2871)

* [MHA] Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR (#2880)

* Apply member naming rule

* Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR Builder

* Introduce checkPtr common function

* Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR C API Class

* [CK] Update requirements.txt for next staging (#2877)

* [CK] Update requirements.txt for next staging

* update CK commit hash

* update CK commit hash

* [MHA] Implement Matmul descriptor for MIOpen Backend API (#2882)

* [MHA] Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR API Class (#2886)

* Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR C API Class

* Introduce checkPtr common function

* Use checkPtr common function

* Implement graph node signatures

* Graph API: Operation Graph creation and interface (#2818)

* [CI] removing gfx908 and vega builds node from smoke tests (#2876)

* Unify 'include half.hpp' between Windows and Linux (#2892)

* [MHA] backward pass (#2895)

* KernelTuningNet for MI300/200 ConvHipIGemmGrouped Solvers (#2898)

* [Windows] Unify 'include amd_comgr.h' between Windows and Linux (#2899)

* ConvProblemDescription: fix GetInSize(), GetOutSize() and GetWeightsSize() (#2896)

* [Windows] make rocMLIR required package on Windows (#2903)

* [NFC] Fix leftover of #2251 (Remove src/kernels/MIOpenCheckNumerics.cl) (#2901)

* Graph API: Operation Graph matching (#2855)

* WIP: graph creation and interface

* WIP: add a test for op graph

* WIP: tests for op graph

* initial test works

* Cleanup

* formatting fixes

* address comments

* fix build

* address comments

* combine duplication of OpNode and fix up Convolution Operation classes

* fix formatting

* use `copy_n` instead of `copy`.

Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* address comments

* fix copy_n

* Graph Matching algorithms and tests

Squash commits

WIP: implement matching tests for op graphs

rebase on parent branch

WIP: move helper functions out

WIP: fix build

initial tests for graph matching are passing. Some bug fixes to OpGraph class

* fix tidy warnings

* more matching tests and a dummy graph generator

* fix hip tidy warnings

* add throw for tensor names that exceed 8 chars

* add inline to avoid duplicate function warning

---------

Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* [MHA] [test] mha CPU backward test (#2829)

* lwpmiopen-230 : first attempt to cpu implementation of multi head attention fwd

* lwpmiopen-230 : fix indexing issue

* bg/lwpmiopen-230_cpu_multi_head_attention : fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention : output M and Z_inv

* bg/lwpmiopen-230_cpu_multi_head_attention : added gtest, used tensor

* bg/lwpmiopen-230_cpu_multi_head_attention: fix review comments and change function names

* bg/lwpmiopen-230_cpu_multi_head_attention : now able to match PyTorch results exactly

* bg/lwpmiopen-230_cpu_multi_head_attention: move helper functions to mha_helper.hpp

* create helper file for mha

* bg/lwpmiopen-230_cpu_multi_head_attention: f32 and fp8 mha computed

* bg/lwpmiopen-230_cpu_multi_head_attention: cleanup

* minor cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: comment cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: add softmax function

* bg/lwpmiopen-230_cpu_multi_head_attention: add attention json golden data

* bg/lwpmiopen-230_cpu_multi_head_attention: fix CI issue

* bg/lwpmiopen-230_cpu_multi_head_attention: test passing

* bg/lwpmiopen-230_cpu_multi_head_attention: fixed clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix path of attention_golden.json

* bg/lwpmiopen-230_cpu_multi_head_attention: moved test data from json to hpp

* bg/lwpmiopen-230_cpu_multi_head_attention: increase tolerance

* bg/mha_back_fp8_lwp-502: mha back

* bg/mha_back_fp8_lwp-502: create function check

* bg/mha_back_fp8_lwp-502 : fix indentation

* bg/mha_back_fp8_lwp-502: remove unwanted if check

* bg/mha_back_fp8_lwp-502: remove unused variable

* bg/mha_back_fp8_lwp-502: fix function name

* bg/mha_back_fp8_lwp-502: implement mha backward fp8

* bg/mha_back_fp8_lwp-502: minor fix on args

* bg/mha_back_fp8_lwp-502: fix CI issue

* match implementation with the graph

* fix typo in scaling tensor name

---------

Co-authored-by: Bibek Ghimire <gbibek@gmail.com>
Co-authored-by: Aleksandr Eremin <CAHEK7@yandex.ru>

* [NFC] Removed WORKAROUND_SWDEV_227826 macro and MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS envvar (#2816)

* remove-wa-swdev-227826(01) Removed WORKAROUND_SWDEV_227826 macro and MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS envvar

* remove-wa-swdev-227826(02) Removed leftover of MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Windows] fix test include_inliner on Windows (#2908)

* Fixes to support huge tensors. Enable huge tensors in ConvDirectNaive*. miopenSetTensorDescriptorV2 (BETA). (#2838)

* Consider workspace constraints when loading solutions from DB (#2888)

* [Tests] Make would fail with no device error without GPUs (#2909)

* Fix make failed with no device error without GPUs

* Add DISCOVERY_MODE PRE_TEST option in gtest_discover_tests so test binary will execute during runtime to discover the tests before actually running them

* Set DISCOVERY_MODE to PRE_TEST in gtest_discover_tests() so test binary will execute during runtime to discover the tests before actually running them

* Remove duplicated DISCOVERY_MODE option

* [MHA] Implement MIOPEN_BACKEND_OPERATION_MATMUL_DESCRIPTOR (#2902)

* Update CI docker and bump CK commit hash for staging (#2900)

* [Windows] fix execution of a HIP compiler on Windows (#2905)

* [MHA] Implement MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API Interface (#2894)

* [HOTFIX] Fix typo introduced by #2894 and #2902. (#2934)

* Adjustments for the latest assembler (e.g. latest changes in the upstream clang) (#2891)

* gcnasm-noxnack-etc(01) Remove -mxnack/mno-xnack from COMgr assembler

* gcnasm-noxnack-etc(02) Added WORKAROUND_ROCMCOMPILERSUPPORT_ISSUE_67 for the "-nogpulib" warning during assembly via COMgr

* gcnasm-noxnack-etc(03) Removed "-mno-xnack" from the offline (clang) amdgcn assembly path.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: xinlipn <xinlipn@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Seungman Han <120356720+seungmanhan@users.noreply.github.com>
Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: mentat <108366729+bghimireamd@users.noreply.github.com>
Co-authored-by: Daming Feng <dmfeng8898@gmail.com>
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Dmantri98 <109552294+Dmantri98@users.noreply.github.com>
Co-authored-by: JD <jahandad@gmail.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: xu-shawn <50402888+xu-shawn@users.noreply.github.com>
Co-authored-by: Kamil Nasyrov <shurale.nkn@gmail.com>
Co-authored-by: amberhassaan <amber_474@yahoo.com>
Co-authored-by: Evgenii Averin <86725875+averinevg@users.noreply.github.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: jasberc <146053952+jasberc@users.noreply.github.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Kyeonghwan Ryu <89056320+kyeonghwanryu@users.noreply.github.com>
Co-authored-by: scerzh <102019268+scerzh@users.noreply.github.com>
Co-authored-by: Vsevolod Golovko <vsevolod.golovko2@dxc.com>
Co-authored-by: Jungkeun Kim <et16kr@gmail.com>
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
Co-authored-by: Reid Kawaja <74506315+reidkwja@users.noreply.github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: M.Emin Ozturk <ozturk.27@osu.edu>
Co-authored-by: arvindcheru <90783369+arvindcheru@users.noreply.github.com>
Co-authored-by: Marek Grzegorek <grzegorek.marek@zoho.com>
Co-authored-by: urpetkov-amd <127323899+urpetkov-amd@users.noreply.github.com>
Co-authored-by: M. Saud Ul Hassan <68208941+msaudulhassan@users.noreply.github.com>
Co-authored-by: Christopher Erb <Christopher.Erb@amd>
Co-authored-by: raramakr <91213141+raramakr@users.noreply.github.com>
Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
Co-authored-by: Bibek Ghimire <gbibek@gmail.com>
Co-authored-by: Alexey Akimov <kikimych@gmail.com>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Co-authored-by: Seunghoon Lee <lshqqytiger@naver.com>
cderb added a commit that referenced this issue Jun 28, 2024
* [Doc] Bump rocm-docs-core from 0.30.0 to 0.30.1 in /docs/sphinx (#2589)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.0 to 0.30.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.0...v0.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [DOC] Doxygen change: enable warning as error msg and add missing API comments (#2585)

* AI Based Parameter Prediction Model for conv_hip_igemm_group_fwd_xdlops Solver (#2523)

* [HotFix] KDB Files should not be in the runtime package (#2591)

* [Doc] Adding issue template (#2590)

* [Doc] Add documentations for non-packed tensors convolution (#2537)

* edit document of convolution

* address comments

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Doc] Fix broken links in README.md (#2595)

* Add nightly update workflow (#2579)

* Tests for RNN seq API (#2493)

* [HotFix] Fix Windows build with disabled CK (after #2523) (#2598)

* Properly guard CK usage by MIOPEN_USE_COMPOSABLEKERNEL defines

* Update src/solver/conv_hip_implicit_gemm_grouped_fwd_xdlops.cpp

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* [MIOpenDriver] Enabled gemmfp16. [tests] Added smoke test for fp16 and fp32 gemm. (#2592)

* fix-gemmfp16(01) [MIOpenDriver] Enable gemmfp16 in the driver

* fix-gemmfp16(02) [tests] Add smoke test for fp16 gemm

* [Doc] Fix URLs (ROCmSoftwarePlatform -> ROCm) in the doc, comments, and code. + more (#2597)

* Update URLs (ROCmSoftwarePlatform -> ROCm) in the documentation and comments in the source code.

* (2) Update URLs (ROCmSoftwarePlatform -> ROCm) in the documentation and comments in the source code.

* Fix incorrect link

* Fix links

* [HotFix] Bump CK commit hash for F8 patch (#2603)

* [Doc] Fix broken links in CONTRIBUTING.md (#2601)

* Fix broken rocmsoftwareplatform.github.io links in CONTRIBUTING.md

* Use new organization name for repository links

* [Windows] use find_package() for Eigen and frugally-deep (#2574)

* [Windows] enable compilation on Windows (#2570)

* [HotFix] 3D Group Conv Backward data and weight update. Failure noticed when pads and strides are not 1 (#2560)

* [CMake] fix find_package(... GLOBAL) for CMake < 3.24 (#2610)

* [HotFix][atamazov] multiple undefined behavior discovered with -fsanitize=undefined in DEV builds (#2609)

* fix-issue-2602(01) Fix for smoke_miopendriver_gemm

* Do not print output parameters in MIOPEN_LOG_FUNCTION calls.

---------

Co-authored-by: atamazov <artem.tamazov@gmail.com>

* [hipRTC] resolve symbol issues by explicitly link with hipRTC (#2612)

* explicitly link with hipRTC

* Update formatting

* Consider MIOPEN_USE_HIPRTC=Off

* Clean up

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Standardize workspace abstraction (#2524)

* [gtest] conversion for code coverage tests (#2580)

* [HotFix] revert #2580 and re-enable smoke tests (#2616)

* Revert "[gtest] conversion for code coverage tests (#2580)"

This reverts commit c5a2384dc0f29682ed51aeccf9b981dbdf7e058f.

* re-enable smoke tests in CI

* remove problematic github action

* [Windows] use find_package() for SQLite3 (#2564)

* [Doc] Bump rocm-docs-core from 0.30.1 to 0.30.2 in /docs/sphinx (#2620)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.1 to 0.30.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.1...v0.30.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Windows] use official ZStd package from Facebook (#2565)

* Remove MIOpenGEMM and MIOpenTensile leftovers (#2499)

* Remove FIN_OLD_PROBLEM_DESCRIPTION_COMPAT (#2503)

* [Jenkins] Add NOMLIR stage. [Workaround] Limit usage of gfx908 nodes in non-nightly builds (#2622)

* Get rid of legacy 2GiB offset limits in CallGemm*() and transpose*() internal APIs and kernels. (#2613)

* [BugFix] Proper fix for backward passes bwd/wrw for CK group conv 3d (#2619)

* [BugFix] asm igemm fwd kernel has a computation error when c <= 4 and dilation_y > 1, workaround (#2625)

* Fused solver for Fwd Convolution with Residual add, Bias add and then activation function (#2517)

* Bump MIOpen version to 3.1.0 and update CI docker (#2519)

* [HotFix] resolve unknown type issue after #2517 (#2629)

* [Doc] Bump rocm-docs-core from 0.30.2 to 0.30.3 in /docs/sphinx (#2628)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HOTFIX] Fix build with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517. (#2630)

* [Jenkins][Tests] Add stage with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517 #2630. (#2631)

* [HOTFIX] Fix build with -DMIOPEN_USE_COMPOSABLEKERNEL=Off after #2517.

* add -DMIOPEN_USE_COMPOSABLEKERNEL=Off stage

* make NOCK stage anyAPU and build ONLY

* Adopt recommended changes

* rename config_targets to make_targets

* Extend GTest DISCOVERY_TIMEOUT to 5 mins

* [Tests] add unit test for #2624 (#2632)

* [gtest] Combine gtests into single binary. (#2599)

* [Windows] rocblas: disable Beta API on Windows for HIP < 5.7 (#2405)

* [tests] Limit applicability of ConvFwdBiasActivAPI/ConvFwdBiasResAddActivTest.ConvFusedAPI (#2635)

* [Tests] helper for env variables update in gtests (#2605)

Co-authored-by: xinlipn <xinlipn@gmail.com>

* [Windows] fix compilation of math functions on Windows (#2568)

* [Windows] fix printf type incompatibility between type specifiers (#2569)

* Fix miopen package dependency roctracer etc (#2508)

* [Doc][NFC] added rocm v6, mi300, and default component (#2618)

* [Windows] add a class to allow os-agnostic process execution (#2567)

* [Windows] make BZip2 a required package (#2566)

* [Windows] add missing symbol export (#2556)

* add missing symbol export

* more missing exports

* fix format issues

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* [ROCm 6.1][hipRTC] Fix build failures. [quality] Reorg standard includes in HIP sources. (#2637)

* [WORKAROUND] Disable W/A for issue #1359 starting from ROCm 5.4.3. (#2225)

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Dep] Bump CK commit hash for staging (#2640)

* [Windows] default paths to user and system db files on Windows (#2365)

* Fix COMgr dependency in MIOpen package (#2645)

* [ROCm 6.0.1] Automatically activate the new HIPRTC PCH adaptations starting from the 6.0.24000 version. Fix some build errors. (#2644)

* Automatically activate the new HIPRTC PCH adaptations starting from the 6.0.24000 version. Fix some build errors (#2465 + more)

(cherry picked from commit 4f695d975a2a6de2f167fc2925f3bad79fbaaf98)

* Remove duplicated includes.

* [HOTFIX] Adapt to changes in HIP Mainline 417 (possibly future 6.1 RC) (#2652)

* fix-rocm61rc417(01) Disable new kernel build warnings. [NFC] Sort headers properly.

* fix-rocm61rc417(02) [ROCm 6.1][HIPRTC] Use custom implementations instead of standard <limits>. This fixes build issues with ROCm 6.1.

* fix-rocm61rc417(03) [ROCm 6.1][HIPRTC][Bugfix] Fixed issue in miopen_limits.h that prevented the use of custom implementations.

* fix-rocm61rc417(04) [ROCm 6.1 RC][HIPRTC] Disable some of the custom implementations from <type_traits> (like `integral_constant`) for HIP mainline 417. This fixes some build issues.

* fix-rocm61rc417(05) [ROCm 6.1 RC][offline compiler] Removed "-mcpu" from build options. This resolves kernel build issues with HIP mainline 417 (offline compiler). Improved diagnostic messages output onto console after offline build failures.

* fix-rocm61rc417(06) [tests] Disable some test cases from handle_test as #2600 still persists in HIP Mainline 417.

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Correct parameter which prints unused flag in log fusion cmd (#2653)

* [MI300][Tuning] Tunings for SWDEV tickets (#2654)

* add initial tunings for mi300

* add test to db_sync

* [ROCm 6.0.1] Fix merge error in #2652 that affects #2644. (#2658)

* [CK] Bump CK commit hash for staging (#2659)

* Bump gitpython from 3.1.37 to 3.1.41 in /docs/sphinx (#2662)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.37 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.37...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump jinja2 from 3.1.2 to 3.1.3 in /docs/sphinx (#2666)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Doc] Updated links to ROCm Repositories (#2667)

Changed <old-organization> to "ROCm".

* [SWDEV-433582] Search-proofed PrepareInvoker (#2661)

* [HotFix] fix clang format issue from #2661

* [FIN] update submodule (#2660)

* [Windows] replace [[gnu::noreturn]] with [[noreturn]] (#2656)

* [Windows] addkernels: fix operations on path for Windows (#2657)

* [Windows] clean up the setting of environment variables cross-platform (#2655)

* clean up the setting of environment variables cross-platform

* fix clang-tidy

* Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#2676)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Argmax enhancement in case of inner dim reduce (#2583)

* [Test] Convert conv_igemm_dynamic_dlops etc to gTest (#2553)

* [Bugfix] Restore Missing ctests (#2649)

* [Windows] fix compilation on Windows (#2677)

* [Windows] cmake: unpack kernels into a build directory (#2347)

* Remove FIN_OLD_HANDLE_COMPAT and FIN_OLD_BINARY_CACHE_COMPAT (#2627)

* Rename transpose* kernels (leftover of #2613) (#2673)

* [CK] Bump CK commit hash for staging (#2683)

Update CK to the latest staging

* [zlib] Update rocm-recipes for more reliable zlib link (#2686)

* [OCL] Use OpenCL 2.0 while compiling kernels (#2691)

* Fix compilation on SLES/RHEL after #2657 merged (#2690)

* [BF16][FP8][BF8] Fixed some specializations from `<limits>` and `<cmath>` (#2669)

* conv::ProblemDescription: remove underscores, change return data type (#2685)

* Add 2D Group Convolution Backward Data and Weights update solvers. Simplify and unify 3d group conv tests (#2663)

* [HOTFIX] Disable "granularity loss" W/A for #2492 and add a new, "tiny tensor" based one. (#2695)

* disable 2492 granularity_loss workaround and enable tiny_tensor workaround

* workaround_issue_2492_02(01) Macros to uppercase. Add doc for WORKAROUND_ISSUE_2492_TINY_TENSOR. Add conditions N<=4 and C<=4 to the "tiny tensor" W/A. Disable it during warmup, make it controllable by MIOPEN_DEBUG_WORKAROUND_ISSUE_2492.

* Update src/solver/conv_winoRxS.cpp

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>
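The #2695 commit message above lists the conditions under which the new "tiny tensor" workaround for #2492 is applied: N <= 4 and C <= 4, never during warmup, and switchable through MIOPEN_DEBUG_WORKAROUND_ISSUE_2492. The following is a minimal C++ sketch of only those stated conditions. `ProblemSketch` and `TinyTensorWorkaroundApplies` are hypothetical names; the real check (the commit touches src/solver/conv_winoRxS.cpp) presumably also tests spatial sizes that the message does not spell out, and how the environment variable maps to the `workaround_enabled` flag is an assumption.

```cpp
#include <cassert>  // for assert() in usage checks
#include <cstddef>

// Hypothetical stand-in for the convolution problem description.
// Only the two dimensions named in the #2695 commit message are modeled.
struct ProblemSketch
{
    std::size_t n; // batch size (N)
    std::size_t c; // input channels (C)
};

// Returns true when the workaround would kick in, which presumably makes the
// Winograd RxS solver report itself as not applicable for this problem.
// `workaround_enabled` stands in for whatever MIOPEN_DEBUG_WORKAROUND_ISSUE_2492
// resolves to (assumption). Any additional "tiny tensor" conditions in the
// real code are intentionally not reproduced here.
bool TinyTensorWorkaroundApplies(const ProblemSketch& p, bool is_warmup, bool workaround_enabled)
{
    if(!workaround_enabled || is_warmup)
        return false; // switched off, or warmup run: never apply
    return p.n <= 4 && p.c <= 4;
}
```

With the reproducer from the issue (`-n 2 -c 2`), the sketch reports that the workaround applies outside of warmup and does not apply during warmup or when the switch is off.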

* [Clang-Format] Fix format issue

* Bump rocm-docs-core from 0.31.0 to 0.32.0 in /docs/sphinx (#2699)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.32.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.31.0...v0.32.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [MI300][Tuning] gold 20 (#2697)

* add gfx942 superbench winograd tunings, update gold version to 20

* update with more superbench tunings

* Remove support for ROCm < 5.6.0 (#2665)

* Remove support for ROCm < 5.7.0

* deprecate-rocm-less-5.7(03) Leftover that fixes build error with "-Werror"

* deprecate-rocm-less-5.7(04) Resolve review comment

* Bump rocm-docs-core from 0.32.0 to 0.33.0 in /docs/sphinx (#2707)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.32.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.32.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [COMGR] Use OpenCL 2.0. [HIPRTC] Provide min/max limits for int. Fix build errors related to min/max limits for BF16. (#2705)

* fix-rocm-mainline-issues-01(01) Removed `constexpr` from numeric_limits<hip_bfloat16>::min()/max() as BF16 ctor provided by HIP can't be used in const expressions.

* fix-rocm-mainline-issues-01(02) [COMGR] Globally engage OpenCL 2.0

* fix-rocm-mainline-issues-01(03) [HIPRTC] Provide min/max limits for int

* [DOC] fix broken links in docs (#2696)

* lwpmiopen_521_correct_doc_issues: fix broken links in docs

* lwpmiopen_521_correct_doc_issues: remove citing

* [HotFix] Fix DB install after #2347 (#2702)


---------

Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>

* Add GroupNorm forward operation (#2623)

* fix not reporting LFS missing files (#2710)

* [HotFix][WHL] move the bfloat16 header to the proper guard (#2711)

* [HotFix] Update FindDB for finetuning (#2712)

* [CK] Update CK commit in requirements.txt for staging (#2713)

* [Tests] Fix Gtest single executable build issue (#2715) (#2717)

Add the missing build job to Jenkinsfile

Fix duplicate class name issue in Gtest

* [Windows] Do not use HIP runtime headers on Windows (#2719)

* don't use WORKAROUND_DONT_USE_CUSTOM_LIMITS on Windows

* don't use workaround SWDEV_413293 on Windows

* CI base docker updates to ROCm 6.0.2 (#2714)

* Softmax ocl refactoring (#2671)

* Add cat forward operation (#2562)

* [HotFix] Fix namespace conflict issue in gtest after #2562 (#2725)

* Bump cryptography from 41.0.6 to 42.0.0 in /docs/sphinx (#2729)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.6...42.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [DB Install] fix installation of *.fdb.txt and *.db files (#2728)

* Update CHANGELOG.md (#2720)

* bg/update_change_log_lwpmiopen_501: update change log till ROCm 6.1.0 (MIOpen-3.1.0)

* bg/update_change_log_lwpmiopen_501: remove typo

* bg/update_change_log_lwpmiopen_501: fix broken link

* bg/update_change_log_lwpmiopen_501: second attempt to fix hyper link

* Create placeholder CODEOWNERS (#2718)

Add @JehandadKhan and @junliume as CODEOWNERS.

* [Solvers] Fix for #2663 ensure tensor dimensions are consumed by solvers correctly (#2716)

* [DOC] Add codeowners for documentation (#2692)

* Add codeowners for documentation

* Update CODEOWNERS

---------

Co-authored-by: samjwu <samjwu@users.noreply.github.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Bump rocm-docs-core from 0.33.0 to 0.33.2 in /docs/sphinx (#2733)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.0 to 0.33.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.0...v0.33.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix build after #2657 and #2690 (boost::filesystem) (#2732)

* [Improvements] Replace HasAtLeastOne64BitTensor() with AllTensorsDimsFitIntoInt() (#2731)

* Update CK-based 2d/3d convolution solvers to support nchw/ncdhw layout (#2429)

* Bump rocm-docs-core from 0.33.2 to 0.34.0 in /docs/sphinx (#2739)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.2 to 0.34.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.2...v0.34.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [BugFix] Set System KDB journal_mode to Off (#2724)

* [Tests] Converting test_conv3d_extra into GTest (#2554)

* [Tests] Convert test_rnn_vanilla , test_gru, test_rnn_extra and test_gru_extra gTests (#2550)

* [Doc] Removing unmaintained release notes (#2745)

* [CK] Update CK commit in requirements.txt for staging (#2747)

* [Tests][gtest] conversion for LSTM (#2545)

* Fix for issue #2734: Detect if "-fno-offload-uniform-block" works in HIP compiler. (#2743)

* fix-issue-2734 (01) Use "-fno-offload-uniform-block" only if HIP compiler supports it. Resolves #2734.

(cherry picked from commit 458c8338175383a95a5c3f30c726798828f15ea8)

Partially changes code from PR #2719 "Do not use HIP runtime headers on Windows"

# RESOLVED Conflicts:
#	CMakeLists.txt

* fix-issue-2734(02) Removed W/A from PR #2719 as it is no longer needed.

* Enable softmax solver based on attention-softmax implementation (#2737)

* [Tests] Replace test_conv_igemm_dynamic_xdlops_bwd with gtest (#2409)

* [Tests] Convert ctest to gtest for test_conv_for_implicit_gemm (#2513)

* [Tuning][MI300] for m9 tickets (#2754)

* [hipRTC] add lowest() for float to MIOpen custom limits (#2753)

* [hipRTC] add lowest() to MIOpen custom limits

* the earliest trace can be found together with numeric_limits<int>
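The #2705 and #2753 entries above add int min/max and float lowest() to MIOpen's custom limits, which HIPRTC builds use instead of the standard <limits>. The sketch below shows, under stated assumptions, what such hand-written specializations look like; the `sketch_limits` namespace is hypothetical and is not MIOpen's actual header. It assumes a 32-bit `int` and IEEE-754 single-precision `float`, and highlights the distinction that motivates a separate lowest(): for floating-point types min() is the smallest positive normal value, while lowest() is the most negative finite value.

```cpp
// Hypothetical namespace: a minimal custom numeric_limits for code compiled
// where <limits> is unavailable (e.g. HIPRTC). Assumes 32-bit int and
// IEEE-754 binary32 float.
namespace sketch_limits {

template <typename T>
struct numeric_limits;

template <>
struct numeric_limits<int>
{
    static constexpr int max() noexcept { return 2147483647; }
    // Written as -max() - 1 because the literal 2147483648 does not fit in int.
    static constexpr int min() noexcept { return -max() - 1; }
    // For integer types, lowest() and min() are the same value.
    static constexpr int lowest() noexcept { return min(); }
};

template <>
struct numeric_limits<float>
{
    // Largest finite value (FLT_MAX).
    static constexpr float max() noexcept { return 3.402823466e+38F; }
    // Smallest positive normalized value (FLT_MIN), NOT the most negative value.
    static constexpr float min() noexcept { return 1.175494351e-38F; }
    // Most negative finite value: for IEEE-754 float this is simply -max().
    static constexpr float lowest() noexcept { return -max(); }
};

} // namespace sketch_limits
```

Because every member is `constexpr`, the values can seed reductions (for example, a max-reduction starting from lowest()) without any runtime cost.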

* [Linux] Enhance Compiler flags to avoid Hardcoded ROCm Path (Part 1) (#2694)

* Bump rocm-docs-core from 0.34.0 to 0.34.2 in /docs/sphinx (#2755)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.0 to 0.34.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.0...v0.34.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump cryptography from 42.0.0 to 42.0.2 in /docs/sphinx (#2759)

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.0...42.0.2)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix] enable 2d grouped fwd convolution support on mi300 (#2761)

* enable support on mi300

* Fix missing include files

* Fix header needed even for non-ck build

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Bump cryptography from 42.0.2 to 42.0.4 in /docs/sphinx (#2765)

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.2...42.0.4)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Implemented preparsing sqlite db to text format (#2722)

* Bump rocm-docs-core from 0.34.2 to 0.35.0 in /docs/sphinx (#2768)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.2 to 0.35.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.2...v0.35.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix] Fixed incorrectly generated files (#2769)

* Adding Link Dependencies to resolve missing symbols from pthread and dl referenced by sqlite (#2773)

* Adding library dependency dl for dlopen

* Adding link dependency to pthread

* RNN Inference MS (#2727)

* [Tuning][MI300] Find db update - Superbench/Winograd (#2780)

* [HotFix] fix failed error bugs in conv backward weight solvers (#2770)

* fix failed error bugs in 2d/3d conv backward weight solvers

* fix time issue in NCHW layout invoker

* code refactoring: define hip event profiler to reduce code duplicate

* delete comments

* fix tidy error

* address comments

* [CK] Bump CK commit hash for staging (#2784)

* Minor softmax improvements (#2782)

* using ostream instead of concatenation of strings

* Problem description slightly changed. Softmax driver patched

* Remove SetTensorLayout (#2787)

* Add heuristics tests for gfx90a architecture (#2772)

* Bump rocm-docs-core from 0.35.0 to 0.35.1 in /docs/sphinx (#2791)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.0 to 0.35.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.0...v0.35.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix][Format] Fix clang-format issue with tuna_net update

* [Tests] Convert test_conv_group, test_conv_extra and test_conv_3d to gTests (#2767)

* Convert test_conv_group to gTest

* Convert test_conv_extra and test_conv_3d to gTest

* Fix build

* [Windows] Fixing linking issue for sqlite2txt on Windows (#2793)

* MI300 TunaNet Integration (#2795)

* Dynamic workspace calculation (#2779)

* calculate workspace size for winning solution at runtime

* update GetWorkSpaceSize to use solver workspace query instead of reading db

* fix clang-format issues

---------

Co-authored-by: Christopher Erb <Christopher.Erb@amd>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* RNN back weights update (#2794)

* [CI] Enabling navi32 Testing Stages (#2796)

* [HotFix] Changed text perfdbs to be actually installed when enabled #2722 (#2800)

* Bump CK commit hash for staging and update CI docker (#2777)

* [Windows] Upgrade class TmpDir (#2762)

* Bump rocm-docs-core from 0.35.1 to 0.36.0 in /docs/sphinx (#2801)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.1 to 0.36.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.1...v0.36.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [HotFix] Fix unpackdb after merging #2800 (#2802)

* [HIPRTC] Provide option to add/remove include directories to/from compiler flags (#2764)

* Provide option to add/remove include directories to/from compiler flags

The hip compiler flags are getting embedded in MIOpen shared library and the isystem include directories in the compiler flags are hard coded paths.
For the ROCm use case, build scripts will set the option to OFF, so that include directories will not be added to compiler flags. This will help in removing the hard coded path from the library
By default the option is set to ON.

* Set the default value of the option MIOPEN_HIP_COMPILER_USE_SYSTEM_INCLUDE_DIRECTORIES based on HIPRTC compiler usage

* Check HIP version as well to enable/disable the use of system include directories in HIP compiler flags

Use system include directories if hip version is less than 6.1.40091

* [CI][test-perf] MIOpenDriver to use rocrand to init buffers. Do not init output buffers. Use non-DEV build in perf test. (#2785)

* [Windows] Fix MIOpenDriver linking with rocRand (#2820)

* Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx (#2827)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Offline Compiler] Update Target Link Dependency (#2815)

* Softmax for find20 (#2776)

* Implement Tensor Descriptors for MIOpen Backend API (#2751)

* Doc cleanup (#2783)

* [CK] Update requirements.txt for next staging (#2824)

* [WORKAROUND] unblock compilation on Windows after merging #2751 (#2832)

* fix Windows compilation after #2751

* fix clang-format

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Tests] Add client component with test package and fix single test binary not starting (#2806)

* add client component with test package

* Fix single test binary not starting

---------

Co-authored-by: Jehandad Khan <jahandad@gmail.com>

* [Windows] Adapt logging functionality to Windows (#2804)

* fix logging on Windows

* fix clang-format

* display the correct MIOpenDriver executable name

* [OCL] patch Softmax issue (#2268)

* Implement MIOPEN_BACKEND_VARIANT_PACK_DESCRIPTOR builder (#2847)

* For CK solvers change PerfConfigBase to PerfConfigBaseCK (#2834)

* For ck solvers change PerfConfigBase to PerfConfigBaseCK

* remove Find() from structs derived from PerfConfigBaseCK

* Bump rocm-docs-core[api_reference] from 0.37.0 to 0.38.0 in /docs/sphinx (#2852)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.0 to 0.38.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Find 2.0 must not autoreset buffers (#2836)

* Remove legacy prng leftover (#2853)

* TunaNetv2.0 for MI300 (#2835)

* Add alignment to the workspace pointer passed to the reduction kernel (#2822)

* Add alignment to the workspace pointer passed to the reduction kernel

* Use cacheline size for pointer alignment and uintptr_t for portable integer/pointer conversion

* Reformat using clang-format-12

* Use std::align to align the workspace pointer

* Helping to resolve @atamazov comments as soon as possible

* missing check

---------

Co-authored-by: Shurale-nkn <Shurale.nkn@gmail.com>
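The #2822 entry above describes aligning the workspace pointer passed to the reduction kernel to the cacheline size, using `std::align` and `uintptr_t` for portable pointer/integer conversion. A minimal sketch of that technique follows. The 64-byte cacheline and the names `kCacheLine`, `AlignWorkspace` and `IsAligned` are assumptions for illustration; the commit message does not state the value or the helper names MIOpen uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Assumed cacheline size (hypothetical; not taken from the MIOpen sources).
constexpr std::size_t kCacheLine = 64;
static_assert((kCacheLine & (kCacheLine - 1)) == 0, "std::align needs a power-of-two alignment");

// Portable alignment check via uintptr_t, as the commit message suggests.
bool IsAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Returns a pointer inside [ws, ws + ws_size) aligned to kCacheLine such that
// `needed` bytes still fit after it, or nullptr if the workspace is too small.
void* AlignWorkspace(void* ws, std::size_t ws_size, std::size_t needed)
{
    assert(ws != nullptr);
    void* p           = ws;
    std::size_t space = ws_size;
    // std::align moves p forward to the next aligned address and shrinks
    // `space` by the skipped bytes; it returns nullptr (leaving p and space
    // unchanged) if `needed` bytes no longer fit.
    return std::align(kCacheLine, needed, p, space);
}
```

For example, a 255-byte workspace starting one byte past a cacheline boundary yields a pointer 63 bytes further on, leaving 192 usable bytes; a request for 250 bytes from the same workspace returns nullptr, so the caller must size the workspace with the alignment slack included.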

* [MHA] Implement MIOPEN_BACKEND_RNG_DESCRIPTOR (#2861)

* Rename files removing unnecessary graphapi_ prefix

* Add missing enums for MIOPEN_BACKEND_RNG_DESCRIPTOR

* Add common header for Graph API tests

* Add GTest executer for Graph API

* Implement MIOPEN_BACKEND_RNG_DESCRIPTOR Builder

* Implement MIOPEN_BACKEND_RNG_DESCRIPTOR API class

* Fix missing pragma once

* [MI210][Tuning] UNet3D (#2859)

* [Tests] Convert three regression tests to gTests (#2810)

* [Windows] use standard C++ streams to access files (#2807)

* Forward MHA find2.0 interface and implementation (#2819)

* [MHA] Implement MIOPEN_BACKEND_POINTWISE_DESCRIPTOR (#2854)

* [Workaround][Issue #2867] Disable iGEMM kernels for corner configuration (#2869)

* Find 2.0 scalar run-time parameters (#2826)

* Find 2.0 scalar run-time parameters

* Update include/miopen/miopen.h

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* Added miopenTensorArgumentIsScalar value serialization

* Fixed build after renaming enum field

* format

* tidy fix

---------

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* [MHA] Implement convolution descriptors in graph (backend) API (#2792)

* [Windows] Use std::vector<char> for binary blobs (#2805)

* use std::vector<char> for binary blobs

* fix clang-format issues

* incorporate review feedback

* incorporate review feedback

* Update submodule FIN

* suppress warning in clang-tidy

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* Skip fusions tests when xnack is enabled (#2870)

* [MHA] Implement MIOPEN_BACKEND_REDUCTION_DESCRIPTOR (#2862)

* [MHA] CPU multi head attention (#2563)

* lwpmiopen-230 : first attempt to cpu implementation of multi head attention fwd

* lwpmiopen-230 : fix indexing issue

* bg/lwpmiopen-230_cpu_multi_head_attention : fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention : output M and Z_inv

* bg/lwpmiopen-230_cpu_multi_head_attention : added gtest, used tensor

* bg/lwpmiopen-230_cpu_multi_head_attention: fix review comments and change function names

* bg/lwpmiopen-230_cpu_multi_head_attention : now able to have result exact as pytorch

* bg/lwpmiopen-230_cpu_multi_head_attention: move helper functions to mha_helper.hpp

* create helper file for mha

* bg/lwpmiopen-230_cpu_multi_head_attention: f32 and fp8 mha computed

* bg/lwpmiopen-230_cpu_multi_head_attention: cleanup

* minor cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: comment cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: add softmax function

* bg/lwpmiopen-230_cpu_multi_head_attention: add attention json golden data

* bg/lwpmiopen-230_cpu_multi_head_attention: fix CI issue

* bg/lwpmiopen-230_cpu_multi_head_attention: test passing

* bg/lwpmiopen-230_cpu_multi_head_attention: fixed clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix path of attention_golden.json

* bg/lwpmiopen-230_cpu_multi_head_attention: moved test data from json to hpp

* Initial commit. solver infrastructure's classes are introduced

* some raw code added

* mha descriptor file added

* format clang run

* remove homegrown bitcast

* use fp32 functions explicitly

* add atomic final reduction step

* add dropout part

* add final scaling

* add mha solver (no dropout initialization)

* return scaling back

* clang run + some changes

* enum values change

* tidy check fixes

* format run

* cpp check fix, cmakelist fix

* comment fix

* properly use rocblas

* fix clang-tidy

* bg/lwpmiopen-230_cpu_multi_head_attention: increase tolerance

* scalars changed to tensors

* warning fix

* compilation fix after merge

* compilation (after merge) fix

* buffers removed from descs struct

* tidy fix

* fix descaling for softmax

* use miopen gemm

* use MultiBufferWorkspaceTraits

* cpu code refactoring

* fix format

* remove legacy prng leftover

* Find 2.0 must not autoreset buffers

* try to fix clang-tidy false-positive

* make cpu_mha more consistent with the docs and fix operation order

* fix format

* make cpu_mha 30% faster, remove unused headers

* use std::max for cpu mha instead of explicit conditions

* bg/lwpmiopen-230_cpu_multi_head_attention : remove typo

---------

Co-authored-by: Bibek Ghimire <gbibek@gmail.com>
Co-authored-by: Vsevolod Golovko <vsevolod.golovko2@dxc.com>
Co-authored-by: Aleksandr Eremin <CAHEK7@yandex.ru>

* MHA Forward Find 2.0 Wrapper Test (#2872)

* [MHA] Implement MIOPEN_BACKEND_VARIANT_PACK_DESCRIPTOR API Class (#2858)

* [Driver][NFC] Modular: Split MIOpenDriver to improve build time (#2856)

* [Windows] add filesystem utility functions (#2823)

Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* [Tests] Convert reduce tests to gTests (#2848)

* Convert reduce tests to gTests

* Refactor with initialization list to disable warnings

* [MHA] Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR Builder (#2879)

* Keep source attribute types for Pointwise without converting to double

* Add swish beta to pointwise attributes

* Apply naming rules

* Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR Builder

* Remove extra db path quotes introduced in #2823 (#2884)

* Bump rocm-docs-core[api_reference] from 0.38.0 to 0.38.1 in /docs/sphinx (#2887)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.0 to 0.38.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [MHA] Added forward numeric test. Fix some bugs in cpu and gpu implementations. Resolved some post-merge comments. (#2875)

* [MHA] Implement MIOPEN_BACKEND_OPERATION_RNG_DESCRIPTOR (#2873)

* Bump idna from 3.6 to 3.7 in /docs/sphinx (#2889)

Bumps [idna](https://github.com/kjd/idna) from 3.6 to 3.7.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.6...v3.7)

---
updated-dependencies:
- dependency-name: idna
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [conv][FP32] Extend applicability of GemmBwdRest and GemmFwdRest for big WS sizes (#2811)

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(01) Reorganized MaxMemAllocSz() code. Formalized WORKAROUND_MLOPEN_ISSUE_1430. Added MIOPEN_WORKAROUND_ISSUE_2808, MIOPEN_WORKAROUND_ISSUE_2809.

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(02) [conv][gemm] Common code from GEMM solvers moved to the solver/gemm_common module.

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(05) [driver][debugging] Add logging of hipMalloc/Free

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(06) [conv][gemm] Removed MIOPEN_WORKAROUND_ISSUE_2808/2809. Introduced MIOPEN_WORKAROUND_ISSUE_2789 that affects GemmFwd/BwdRest solvers only, and only for FP32.

* gemm_fwd_bwd_rest_fp32_ws_size_limit_increase(08) Fix tidy issues

* Update Depends with correct HIP Runtime package name (#2871)

* [MHA] Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR (#2880)

* Apply member naming rule

* Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR Builder

* Introduce checkPtr common function

* Implement MIOPEN_BACKEND_OPERATION_REDUCTION_DESCRIPTOR C API Class

* [CK] Update requirements.txt for next staging (#2877)

* [CK] Update requirements.txt for next staging

* update CK commit hash

* update CK commit hash

* [MHA] Implement Matmul descriptor for MIOpen Backend API (#2882)

* [MHA] Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR API Class (#2886)

* Implement MIOPEN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR C API Class

* Introduce checkPtr common function

* Use checkPtr common function

* Implement graph node signatures

* Graph API: Operation Graph creation and interface (#2818)

* [CI] removing gfx908 and vega builds node from smoke tests (#2876)

* Unify 'include half.hpp' between Windows and Linux (#2892)

* [MHA] backward pass (#2895)

* KernelTuningNet for MI300/200 ConvHipIGemmGrouped Solvers (#2898)

* [Windows] Unify 'include amd_comgr.h' between Windows and Linux (#2899)

* ConvProblemDescription: fix GetInSize(), GetOutSize() and GetWeightsSize() (#2896)

* [Windows] make rocMLIR required package on Windows (#2903)

* [NFC] Fix leftover of #2251 (Remove src/kernels/MIOpenCheckNumerics.cl) (#2901)

* Graph API: Operation Graph matching (#2855)

* WIP: graph creation and interface

* WIP: add a test for op graph

* WIP: tests for op graph

* initial test works

* Cleanup

* formatting fixes

* address comments

* fix build

* address comments

* combine duplication of OpNode and fix up Convolution Operation classes

* fix formatting

* use `copy_n` instead of `copy`.

Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* address comments

* fix copy_n

* Graph Matching algorithms and tests

Squash commits

WIP: implement matching tests for op graphs

rebase on parent branch

WIP: move helper functions out

WiP: fix build

initial tests for graph matching are passing. Some bug fixes to OpGraph class

* fix tidy warnings

* more matching tests and a dummy graph generator

* fix hip tidy warnings

* add throw for tensor names that exceed 8 chars

* add inline to avoid duplicate function warning

---------

Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>

* [MHA] [test] mha CPU backward test (#2829)

* lwpmiopen-230 : first attempt at a CPU implementation of multi head attention fwd

* lwpmiopen-230 : fix indexing issue

* bg/lwpmiopen-230_cpu_multi_head_attention : fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention : output M and Z_inv

* bg/lwpmiopen-230_cpu_multi_head_attention : added gtest, used tensor

* bg/lwpmiopen-230_cpu_multi_head_attention: fix review comments and change function names

* bg/lwpmiopen-230_cpu_multi_head_attention : results now match PyTorch exactly

* bg/lwpmiopen-230_cpu_multi_head_attention: move helper functions to mha_helper.hpp

* create helper file for mha

* bg/lwpmiopen-230_cpu_multi_head_attention: f32 and fp8 mha computed

* bg/lwpmiopen-230_cpu_multi_head_attention: cleanup

* minor cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: comment cleanups

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: fix sanitizer

* bg/lwpmiopen-230_cpu_multi_head_attention: add softmax function

* bg/lwpmiopen-230_cpu_multi_head_attention: add attention json golden data

* bg/lwpmiopen-230_cpu_multi_head_attention: fix CI issue

* bg/lwpmiopen-230_cpu_multi_head_attention: test passing

* bg/lwpmiopen-230_cpu_multi_head_attention: fixed clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix clang format

* bg/lwpmiopen-230_cpu_multi_head_attention: fix path of attention_golden.json

* bg/lwpmiopen-230_cpu_multi_head_attention: moved test data from json to hpp

* bg/lwpmiopen-230_cpu_multi_head_attention: increase tolerance

* bg/mha_back_fp8_lwp-502: mha back

* bg/mha_back_fp8_lwp-502: create function check

* bg/mha_back_fp8_lwp-502 : fix indentation

* bg/mha_back_fp8_lwp-502: remove unwanted if check

* bg/mha_back_fp8_lwp-502: remove unused variable

* bg/mha_back_fp8_lwp-502: fix function name

* bg/mha_back_fp8_lwp-502: implement mha backward fp8

* bg/mha_back_fp8_lwp-502: minor fix on args

* bg/mha_back_fp8_lwp-502: fix CI issue

* match implementation with the graph

* fix typo in scaling tensor name

---------

Co-authored-by: Bibek Ghimire <gbibek@gmail.com>
Co-authored-by: Aleksandr Eremin <CAHEK7@yandex.ru>

* [NFC] Removed WORKAROUND_SWDEV_227826 macro and MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS envvar (#2816)

* remove-wa-swdev-227826(01) Removed WORKAROUND_SWDEV_227826 macro and MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS envvar

* remove-wa-swdev-227826(02) Removed leftover of MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Windows] fix test include_inliner on Windows (#2908)

* Fixes to support huge tensors. Enable huge tensors in ConvDirectNaive*. miopenSetTensorDescriptorV2 (BETA). (#2838)

* Consider workspace constraints when loading solutions from DB (#2888)

* [Tests] Fix make failing with a no-device error on machines without GPUs (#2909)

* Fix make failing with a no-device error on machines without GPUs

* Add DISCOVERY_MODE PRE_TEST option in gtest_discover_tests so the test binary executes at runtime to discover the tests before actually running them

* Set DISCOVERY_MODE to PRE_TEST in gtest_discover_tests() so the test binary executes at runtime to discover the tests before actually running them

* Remove duplicated DISCOVERY_MODE option

* [MHA] Implement MIOPEN_BACKEND_OPERATION_MATMUL_DESCRIPTOR (#2902)

* Update CI docker and bump CK commit hash for staging (#2900)

* [Windows] fix execution of a HIP compiler on Windows (#2905)

* [MHA] Implement MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API Interface (#2894)

* [HOTFIX] Fix typo introduced by #2894 and #2902. (#2934)

* Adjustments for the latest assembler (e.g. latest changes in the upstream clang) (#2891)

* gcnasm-noxnack-etc(01) Remove -mxnack/mno-xnack from COMgr assembler

* gcnasm-noxnack-etc(02) Added WORKAROUND_ROCMCOMPILERSUPPORT_ISSUE_67 for the "-nogpulib" warning during assembly via COMgr

* gcnasm-noxnack-etc(03) Removed "-mno-xnack" from the offline (clang) amdgcn assembly path.

* [Tests] Remove extra - in parameter to fix reduce tests (#2935)

* [Doc] Fix extra space in doc link (#2937)

* [Doc] Fix docs structure and broken link in Log & Debug and Argmax (#2944)

* Fix link to rocBLAS programmer guide

* Fix Argmax docs in doxygen. Update reference/index.rst and remove unused argmax.rst.

- Adds Argmax (experimental) to the list of all modules in
documentation.
- Gives Argmax documentation formatting consistent with other API docs.

* [gfx11][Solvers][Winograd] ConvWinoFuryRxS v2.4 (#2778)

* [MHA] add test for the backward pass (#2929)

* [Doc] Update LICENSE.txt to reflect all licenses used (#2758)

- Fixes #2757

 - Updates LICENSE.txt to reflect the following files which diverge, at least
   partially, from the repo's indicated MIT license

    BSD-2-Clause
        driver/mloSoftmaxHost.hpp

    BSD-2-Clause and MIT
        src/include/miopen/mlo_internal.hpp

    Apache-2.0 and MIT
        src/include/miopen/kernel_cache.hpp
        src/kernel_cache.cpp

    Public Domain (and MIT)
        src/md5.cpp

* Bump tqdm from 4.66.2 to 4.66.3 in /docs/sphinx (#2949)

Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.2 to 4.66.3.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](https://github.com/tqdm/tqdm/compare/v4.66.2...v4.66.3)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Tests] Fixed test_perfdb and test_sqlite_perfdb to properly use mutex (#2907)

* Bump jinja2 from 3.1.3 to 3.1.4 in /docs/sphinx (#2951)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [MHA] Implement several graph API descriptors (#2919)

* Implement MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API Interface without tests

* Fix empty-graph API violation

* Add tests for MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API Interface

* Fix tests for MIOPEN_BACKEND_OPERATIONGRAPH_DESCRIPTOR C API

* Revert "Fix empty-graph API violation"

This reverts commit 3e5092a6cdb823d5f9f3f555348f1f7b3aa77bb5.

* Resolve the resulted list's TODO

* Rename files enginefinder* to operationgraph_descriptor*

* Fix a memory leak

* Add builder for C++ MHA Forward end-to-end test

* Fix a typo

* Define ctors explicitly

* Implement MIOPEN_BACKEND_ENGINE_DESCRIPTOR without tests

* Implement MIOPEN_BACKEND_ENGINECFG_DESCRIPTOR without tests

* Combine OpGraph and OperationGraph

* Implement part of MIOPEN_BACKEND_EXECUTION_PLAN_DESCRIPTOR without tests

* Fix tests for opgraph

* Fix tidy issues

* [gTest] Reduce_custom_fp32 skips on MI200/gfx90a (#2948)

* Fix reduce_custom_fp32 skips MI200

* Fix test_reduce_custom_fp32 skipped for test all

* [Windows] Workaround conflicting definitions of std::min() MSVC and HIP Clang (#2952)

* opgraph: fix compilation on Windows

* Implement addlayernorm, T5layernorm (#2833)

* [CK] Bump CK commit hash by updating requirements.txt (#2940)

* [CK] Bump CK commit hash by updating requirements.txt

* update CK commit hash for staging

* [MHA] Implement MIOPEN_BACKEND_ENGINEHEUR_DESCRIPTOR (#2932)

* Add adam and amp adam optimizer (#2868)

* [Windows] fix NOGPU backend compilation (#2953)

* fix NOGPU backend compilation on Windows

* fix tidy format issue

* incorporate review feedback

---------

Co-authored-by: Jun Liu <Liu.Jun@amd.com>

* [Windows] unblock gtest tests discovering (#2904)

* Reduce extreme (argmin, argmax, min, max etc) enhancement in case of inner dim (#2766)

* Fix MIOpen throw message when MIOPEN_OFFLINE_COMPILER_PATHS_V2 is enabled (#2959)

* Fix MIOpen THROW message when nogpu exists and compilation fails

* Update src/hip/hip_build_utils.cpp

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

* fix clang-format issue

---------

Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: mentat <108366729+bghimireamd@users.noreply.github.com>
Co-authored-by: Dmantri98 <109552294+Dmantri98@users.noreply.github.com>
Co-authored-by: JD <jahandad@gmail.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: Daming Feng <dmfeng8898@gmail.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: xu-shawn <50402888+xu-shawn@users.noreply.github.com>
Co-authored-by: Kamil Nasyrov <shurale.nkn@gmail.com>
Co-authored-by: Alex Eremin <CAHEK7@yandex.ru>
Co-authored-by: Artem Tamazov <artem.tamazov@gmail.com>
Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: amberhassaan <amber_474@yahoo.com>
Co-authored-by: Evgenii Averin <86725875+averinevg@users.noreply.github.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: xinlipn <xinlipn@gmail.com>
Co-authored-by: jasberc <146053952+jasberc@users.noreply.github.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Co-authored-by: Vasilii Filippov <DrizztDoUrden@users.noreply.github.com>
Co-authored-by: Seungman Han <120356720+seungmanhan@users.noreply.github.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Kyeonghwan Ryu <89056320+kyeonghwanryu@users.noreply.github.com>
Co-authored-by: scerzh <102019268+scerzh@users.noreply.github.com>
Co-authored-by: Vsevolod Golovko <vsevolod.golovko2@dxc.com>
Co-authored-by: Jungkeun Kim <et16kr@gmail.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
Co-authored-by: Reid Kawaja <74506315+reidkwja@users.noreply.github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: M.Emin Ozturk <ozturk.27@osu.edu>
Co-authored-by: arvindcheru <90783369+arvindcheru@users.noreply.github.com>
Co-authored-by: Marek Grzegorek <grzegorek.marek@zoho.com>
Co-authored-by: urpetkov-amd <127323899+urpetkov-amd@users.noreply.github.com>
Co-authored-by: M. Saud Ul Hassan <68208941+msaudulhassan@users.noreply.github.com>
Co-authored-by: Christopher Erb <Christopher.Erb@amd>
Co-authored-by: raramakr <91213141+raramakr@users.noreply.github.com>
Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
Co-authored-by: Bibek Ghimire <gbibek@gmail.com>
Co-authored-by: Alexey Akimov <kikimych@gmail.com>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Co-authored-by: Seunghoon Lee <lshqqytiger@naver.com>
Co-authored-by: peter <peter.park@amd.com>
Co-authored-by: tflink <tflink@tirfa.com>
7 participants