MKL FP16 GEMM crash on MTL iGPU #524

@rnwang04

Description

Summary

I found that on an MTL (Meteor Lake) iGPU, calling oneMKL's FP16 GEMM crashes the program with either oneAPI 2024.0 or 2024.2. Calling it many times freezes the whole machine.
On Arc, however, the same program runs fine.

Version

oneAPI 2024.0 or oneAPI 2024.2.

Environment

  • Minimal C++ program
  • Windows 11
  • icx 2024.0.2 or 2024.2
  • Hardware: Intel Core Ultra iGPU

Steps to reproduce

#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <cstdlib>
#include <iostream>
using namespace sycl;
int main() {

   queue q{gpu_selector_v};
   std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;

   const int M = 1024;
   const int N = 11008;
   const int K = 4096;

   float* A_h = new float[M * K];
   float* B_h = new float[K * N];
   float* C_h = new float[M * N];
   // fill inputs with random values in [0, 1]
   for (int i = 0; i < M * K; ++i) {
       A_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
   }
   for (int i = 0; i < K * N; ++i) {
       B_h[i] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
   }
   // convert inputs to half
   sycl::half* A_h_half = new sycl::half[M * K];
   sycl::half* B_h_half = new sycl::half[K * N];
   for (int i = 0; i < M * K; ++i) {
       A_h_half[i] = sycl::half(A_h[i]);
   }
   for (int i = 0; i < K * N; ++i) {
       B_h_half[i] = sycl::half(B_h[i]);
   }

   {
       // Buffers are scoped so they are destroyed (and C is written back to C_h)
       // before the host arrays are freed below.
       buffer<sycl::half> A(A_h_half, M * K);
       buffer<sycl::half> B(B_h_half, K * N);
       buffer<float> C(C_h, M * N);

       // Use oneMKL's buffer API directly: it submits its own work to q, so it is
       // not called from inside a q.submit() command group, and the inputs A/B are
       // not opened with write_only/no_init accessors (which would discard their data).
       // Row-major: C (M x N) = A (M x K) * B^T, with B stored as N x K.
       oneapi::mkl::blas::row_major::gemm(
            q,
            oneapi::mkl::transpose::nontrans,
            oneapi::mkl::transpose::trans,
            M, N, K,
            1.0f, A, K,
            B, K,
            0.0f, C, N);
       q.wait_and_throw();
   }

   delete[] A_h;
   delete[] B_h;
   delete[] C_h;
   delete[] A_h_half;
   delete[] B_h_half;

   std::cout << "run success!" << std::endl;
   return 0;
}

Build and run the program above with the following commands:

# for linux
source /opt/intel/oneapi/setvars.sh
icpx -std=c++17 -fsycl -DMKL_ILP64 -o gemm_fp16 gemm_fp16.cpp -lmkl_sycl_blas -lmkl_intel_ilp64 -lmkl_tbb_thread -lmkl_core -ltbb -lpthread -lm -ldl

# for windows
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" --force
icx -std=c++17 -fsycl -fopenmp mkl_sycl_blas_dll.lib mkl_intel_ilp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib -o gemm_fp16 gemm_fp16.cpp

Observed behavior

Running the program on Linux with an Arc A770 works fine.

Running it on Windows with the MTL iGPU fails and can even cause a black screen.

Expected behavior

I hope the FP16 GEMM above can work on the MTL iGPU as it does on Arc. Thanks!
