limits file not found #2049

Closed
Epliz opened this issue Mar 26, 2023 · 20 comments
@Epliz

Epliz commented Mar 26, 2023

Hi,

When running the "AI Benchmark" from https://ai-benchmark.com/ranking_deeplearning.html for the first time, I got a crash with the following output:

MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-e64065/input/naive_conv.cpp:39:10: fatal error: 'limits' file not found
#include <limits> // std::numeric_limits
         ^~~~~~~~
1 error generated when compiling for gfx908.
terminate called after throwing an instance of 'miopen::Exception'
  what():  /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: naive_conv.cpp
Aborted (core dumped)

It seems I resolved it by installing libstdc++-12-dev on Ubuntu 22.04.2 LTS.
I have ROCm 5.4.3 installed through packages.
I guess one of your packages should declare that package, or whichever one is the right one, as a dependency.

Best regards,
Epliz
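As a rough illustration of the workaround above, the following sketch checks whether a libstdc++ `limits` header is present on disk. The helper name `find_cxx_header` is hypothetical, and the default root `/usr/include/c++` is an assumption about where Debian/Ubuntu-style packages place versioned libstdc++ headers. The thread does not confirm which directories HIPRTC actually searches, so treat a hit only as a hint.

```python
import glob
import os


def find_cxx_header(header, roots=("/usr/include/c++",)):
    """Return sorted paths matching <root>/<version>/<header> for each root.

    The default root is an assumption about the usual libstdc++ layout;
    pass other roots if your distribution installs headers elsewhere.
    """
    matches = []
    for root in roots:
        # One directory level (typically the GCC version) sits between
        # the root and the header itself.
        matches.extend(glob.glob(os.path.join(root, "*", header)))
    return sorted(matches)


# Example: an empty list suggests no libstdc++ development headers
# are installed under the assumed root.
print(find_cxx_header("limits"))
```

An empty result would be consistent with the "'limits' file not found" error in the log, although it does not prove that the missing package is the cause.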

@atamazov
Contributor

@Epliz Thanks for reporting. It seems like you do not have the binary kernel cache package installed.

This is a HIPRTC-specific issue. naive_conv.cpp needs to be fixed like this:

https://github.com/ROCmSoftwarePlatform/MIOpen/blob/d3ee8a87fa6e7b0a9db1b41f1d7c11acb529786a/src/kernels/static_composable_kernel/include/utility/static_kernel_reduction_operator.hpp#L29-L31

More details can be found at #627.

[Attribution] @junliume @johnny-keker https://github.com/ROCmSoftwarePlatform/MIOpen/labels/bug https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_normal, Proposed assignee: @carlushuang

@junliume
Collaborator

@Epliz @atamazov this is a duplicate of #1921, or at least has a similar root cause

@Epliz
Author

Epliz commented Mar 27, 2023

thank you for looking into this.
Regarding:

It seems like you do not have the binary kernel cache package installed.

I have miopenkernels-gfx908-120kdb installed as mentioned at https://docs.amd.com/bundle/MIOpen_gh-pages/page/install.html . Anything else I should make sure I have installed?
Any environment variable that I should make sure is set?

@atamazov
Contributor

@Epliz Do you use MI100 with 120 compute units?

@atamazov
Contributor

@junliume

@Epliz @atamazov this is a duplicate of #1921, or at least has a similar root cause

Well, this is actually related to #1926.

@Epliz
Author

Epliz commented Mar 27, 2023

@Epliz Do you use MI100 with 120 compute units?

Yes, as shown by the rocminfo output:

$ rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 1700 Eight-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 1700 Eight-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3000                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32792528(0x1f45fd0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32792528(0x1f45fd0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32792528(0x1f45fd0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx908                             
  Uuid:                    GPU-6e83bfb5727a9272               
  Marketing Name:          AMD Instinct MI100                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 29580(0x738c)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1502                               
  BDFID:                   3840                               
  Internal Node ID:        1                                  
  Compute Unit:            120                                
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx1031                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6700 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      3072(0xc00) KB                     
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29663(0x73df)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2855                               
  BDFID:                   4608                               
  Internal Node ID:        2                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    12566528(0xbfc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1031         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
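To pull the few fields that matter here (agent name, marketing name, device type, compute unit count) out of output like the above, a minimal parsing sketch could look as follows. The function name `parse_agents` is hypothetical, and the parser assumes the "Agent N" / "Key: value" layout shown in this particular rocminfo output, which may differ between ROCm versions.

```python
import re

# Fields worth keeping from each agent block.
WANTED_KEYS = ("Name", "Marketing Name", "Device Type", "Compute Unit")


def parse_agents(rocminfo_text):
    """Return one dict per 'Agent N' block with the first value of each wanted key."""
    agents = []
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"Agent \d+", stripped):
            # A new agent block starts here.
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        # Keep only the first occurrence so the later ISA "Name" line
        # does not overwrite the agent's own name.
        if key in WANTED_KEYS and key not in current:
            current[key] = value
    return agents
```

The text could be obtained with something like `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout`, and filtering on `"Device Type" == "GPU"` then lists the GPU agents in the order rocminfo reports them.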

I typically disable the RX 6700XT that I use for display output whenever using tensorflow by setting ROCR_VISIBLE_DEVICES accordingly.
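A minimal sketch of that approach from Python, assuming the MI100 is GPU index 0 as seen by the ROCm runtime (it is the first GPU agent in the rocminfo output above, but the index is an assumption and should be checked on the actual system). The helper name `env_with_visible_gpu` is hypothetical. The variable has to be in place before the ROCm runtime initializes, for example before importing tensorflow or when launching a child process.

```python
import os


def env_with_visible_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes a single GPU to ROCm.

    `gpu_index` is assumed to be the runtime's index for the wanted GPU.
    The original mapping is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCR_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Example: build an environment for a child process that should only see GPU 0.
child_env = env_with_visible_gpu(0)
print(child_env["ROCR_VISIBLE_DEVICES"])
```

Returning a copy instead of mutating `os.environ` keeps the restriction scoped to whichever process the environment is passed to.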

@atamazov
Contributor

@Epliz

I typically disable the RX 6700XT that I use for display output whenever using tensorflow by setting ROCR_VISIBLE_DEVICES accordingly.

This is good, because MIOpen currently requires all GPUs to be identical.

@atamazov
Contributor

@Epliz

I have miopenkernels-gfx908-120kdb installed as mentioned at https://docs.amd.com/bundle/MIOpen_gh-pages/page/install.html . Anything else I should make sure I have installed? Any environment variable that I should make sure is set?

No, you seem to have done everything right. But the library should read the naive kernel from the precompiled binary cache because you have it installed. I am wondering why it even tries to build it. Can you attach a text file with the log captured with export MIOPEN_LOG_LEVEL=6?
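One way to capture such a log without changing the shell session is to launch the workload from Python with the variable set. This is only a sketch: the helper name `run_with_miopen_log` is hypothetical, and it assumes the MIOpen log ends up on the child process's standard output or standard error, both of which are redirected into the file.

```python
import os
import subprocess


def run_with_miopen_log(cmd, log_path, level=6):
    """Run `cmd` with MIOPEN_LOG_LEVEL set and write stdout and stderr to `log_path`.

    Returns the exit code of the command.
    """
    env = dict(os.environ)
    env["MIOPEN_LOG_LEVEL"] = str(level)
    with open(log_path, "w") as log_file:
        # stderr is merged into stdout so the file holds everything in order.
        result = subprocess.run(
            cmd, env=env, stdout=log_file, stderr=subprocess.STDOUT
        )
    return result.returncode
```

A call such as `run_with_miopen_log(["python", "benchmark.py"], "miopen_log.txt")` would then produce a file that can be attached to the issue; `benchmark.py` is a placeholder for whatever script triggers the problem.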

CC @JehandadKhan

@Epliz
Author

Epliz commented Mar 27, 2023

I can't reproduce it anymore even after installing the libstdc++12 package, so I think it was most likely a mistake on my side. I probably had not yet installed the kernels properly.
I don't think it is worth wasting your time on this.

If you are interested in seeing the logs, I am still attaching them. But they seem to show that the kernel database was found properly.
aibenchmark_miopen_log.txt

@atamazov
Contributor

atamazov commented Mar 27, 2023

@Epliz Please remove the user kernel cache and try again (.ukdb files somewhere in ~/.cache/miopen).
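A cautious way to do this is to list the .ukdb files first and delete them only once the list looks right. The sketch below assumes the cache root is ~/.cache/miopen, as stated above. The helper names and the `dry_run` flag are hypothetical conveniences, not part of MIOpen.

```python
from pathlib import Path


def find_user_kernel_caches(cache_root=None):
    """Return all *.ukdb files below the cache root (default: ~/.cache/miopen)."""
    root = Path(cache_root) if cache_root is not None else Path.home() / ".cache" / "miopen"
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.ukdb"))


def remove_user_kernel_caches(cache_root=None, dry_run=True):
    """Delete the user kernel cache files; with dry_run=True only report them."""
    affected = []
    for path in find_user_kernel_caches(cache_root):
        if not dry_run:
            path.unlink()
        affected.append(path)
    return affected


# Example: show what would be removed without deleting anything.
for cache_file in remove_user_kernel_caches(dry_run=True):
    print(cache_file)
```

Calling `remove_user_kernel_caches(dry_run=False)` afterwards performs the actual deletion and leaves any other files in the directory alone.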

@atamazov
Contributor

@Epliz
This one is installed from the precompiled kernel package:

MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file /opt/rocm-5.4.3/share/miopen/db/gfx90878.kdb

This is the user kernel cache (the one I asked you to remove):

MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /home/me/.cache/miopen/2.19.0.d437a795f/gfx90878.ukdb
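To see at a glance which database files a captured log mentions, the two message shapes quoted above can be matched with a regular expression. This is a sketch that relies only on the "Initializing system/user database file" wording shown in those two lines. The function name `database_files` is hypothetical, and other MIOpen versions may word the message differently.

```python
import re

# Matches the two log messages quoted in the thread.
DB_LINE = re.compile(r"Initializing (system|user) database file (\S+)")


def database_files(log_text):
    """Return the unique system and user database paths mentioned in a log."""
    found = {"system": [], "user": []}
    for kind, path in DB_LINE.findall(log_text):
        if path not in found[kind]:
            found[kind].append(path)
    return found
```

Paths under "user" are the cache files that can be deleted safely for a retest, while paths under "system" come from the installed kernel package.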

@Epliz
Author

Epliz commented Mar 28, 2023

Here is the log after deleting the cache first:
aibenchmark_miopen_log2.txt

@atamazov
Contributor

@Epliz it does not try to load or build the naive kernel, so there is no error either ;) Anyway, we have PR #2050 with a fix, and this issue can be closed.

@Epliz
Author

Epliz commented Mar 28, 2023

Yes, if anything it was most likely due to not having the package of kernels installed.

For that matter, I think users would appreciate it if the amdgpu-install scripts either detected the GPU and installed the right kernel cache, or took an option for the GPU archs to specialize the installation for (installing the matching kernel cache packages). I would be happy to open a ticket about that wherever you tell me it should be opened.

@atamazov
Contributor

@Epliz Thanks, I think we already have this feature in our backlog and the ticket is not necessary.

CC @JehandadKhan

@Epliz
Author

Epliz commented Mar 29, 2023

The PR for the fix has been merged, so I am closing the ticket.

Thank you for your quick action; your help was much appreciated!

Epliz closed this as completed Mar 29, 2023

@Mershl

Mershl commented Jun 17, 2023

The latest torch nightly on Fedora 38 still seems to be affected.

pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.5

shows

MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-d809c6/input/naive_conv.cpp:39:10: fatal error: 'limits' file not found
#include <limits> // std::numeric_limits
         ^~~~~~~~

on use.

@junliume
Collaborator

junliume commented Jun 18, 2023

@Mershl could you provide a more detailed log? Could you provide system info (which GPU) and try if the above suggestions might help? i.e. install KDB, and/or install the libstdc++-12-dev on ubuntu 22.04.2 LTS.

CC: @jeffdaily since this is a PT wheel nightly related issue. PR #2050 was not included in ROCm 5.5, and maybe it should be.

@Mershl

Mershl commented Jun 18, 2023

try if the above suggestions might help? i.e. install KDB, and/or install the libstdc++-12-dev on ubuntu 22.04.2 LTS.

Providing the libstdc++-12-dev equivalent (libstdc++-devel) on Fedora alone did not fix the issue. Also installing the "Development Tools" and "Development Libraries" groups (similar to build-essential on Ubuntu) fixes the issue. Generation now completes successfully on the latest torch nightly.

Interestingly, though, I can now remove stable-diffusion-webui + its venv + libstdc++-12-dev + the dev tools + do a reboot,
and a freshly checked out stable-diffusion-webui will still generate successfully. Is the built kernel / MIOpen lib cached somewhere?

EDIT: it fails again when using previously never used features of stable-diffusion-webui (like enabling HiRes fix) and is again fixed when providing the dependencies listed above.

@junliume
Collaborator

Built kernels are cached at ~/.cache/miopen. #2050 was only recently merged into the PT wheel nightly and is expected to be in the next build at https://download.pytorch.org/whl/nightly/rocm5.5
