Build Tensorflow Docker image for Ivy Bridge processors #217

Closed
Pneumaticat opened this issue Oct 21, 2018 · 12 comments

@Pneumaticat
commented Oct 21, 2018

https://github.com/RadeonOpenCompute/ROCm#supported-cpus says that the CPU must support PCIe Gen 3 and PCIe atomics, and therefore requires a Haswell or newer architecture. However, my Ivy Bridge CPU (Intel i7-4820K) does support atomics (p. 11), as does the Xeon E5-26xx V2 according to this GitHub issue.

Just to demonstrate, amdkfd initializes correctly on my system:

Oct 18 07:44:27 rem kernel: kfd kfd: Initialized module
Oct 18 07:44:27 rem kernel: kfd kfd: Allocated 3969056 bytes on gart
Oct 18 07:44:27 rem kernel: kfd kfd: added device 1002:67df

Unfortunately, the TensorFlow Docker image doesn't run, because it was compiled with -march=haswell:

root@2a2a4a430b33:/root# python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
2018-10-21 14:20:43.079860: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine.
Aborted (core dumped)
root@2a2a4a430b33:/root#

Would it be possible to lower -march to ivybridge to support these older, but still functional, processors?
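The abort above is triggered because the binary expects AVX2 and the CPU does not report it. A minimal sketch for checking this before importing TensorFlow, assuming a Linux host where /proc/cpuinfo has a `flags` line; the helper names `parse_cpu_flags` and `missing_features` are illustrative and not part of TensorFlow or ROCm:

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("avx2",)):
    """List the required flags (lower-case, as the kernel prints them) that the CPU lacks."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_features(cpuinfo.read_text())
        print("missing:", ", ".join(missing) if missing else "none")
```

On an Ivy Bridge machine this would be expected to report `avx2` as missing. Other flags can be passed through `required` if a build targets additional instruction sets.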

Pneumaticat changed the title from "Build Tensorflow for Ivy Bridge processors" to "Build Tensorflow Docker image for Ivy Bridge processors" on Oct 21, 2018

@whchung

Collaborator

commented Oct 21, 2018

We have only tested on CPUs newer than Haswell, hence the default config. If possible, could you help by running the HIP directed tests on your machine?

Please clone HIP from https://github.com/ROCm-Developer-Tools/HIP.git, build and run its unit tests, and share the results with us. Thank you very much.

@Pneumaticat

Author

commented Oct 21, 2018

Sure! I'll give it a shot and report back.

@sunway513

commented Oct 21, 2018

CC @jlgreathouse for ROCm Ivy Bridge support.

@Pneumaticat

Author

commented Oct 21, 2018

@whchung All tests pass on HIP v1.9.1. Ran in the Ubuntu 18.04 docker image with ROCm 1.9.1 installed from repositories.

It was a bit of a struggle freeing enough memory for one of the memcpy tests ;)

Details:

root@rem:/HIP/build# nice -n 19 ctest
Test project /HIP/build
        Start   1: directed_tests/hipEnvVarDriver.tst
  1/105 Test   #1: directed_tests/hipEnvVarDriver.tst ........................................   Passed    1.40 sec
        Start   2: directed_tests/Functional/device/hipFuncDeviceSynchronize.tst
  2/105 Test   #2: directed_tests/Functional/device/hipFuncDeviceSynchronize.tst .............   Passed    2.63 sec
        Start   3: directed_tests/Functional/device/hipFuncGetDevice.tst
  3/105 Test   #3: directed_tests/Functional/device/hipFuncGetDevice.tst .....................   Passed    0.89 sec
        Start   4: directed_tests/Functional/device/hipFuncSetDevice.tst
  4/105 Test   #4: directed_tests/Functional/device/hipFuncSetDevice.tst .....................   Passed    0.16 sec
        Start   5: directed_tests/Functional/device/hipFuncSetDeviceFlags.tst
  5/105 Test   #5: directed_tests/Functional/device/hipFuncSetDeviceFlags.tst ................   Passed    0.23 sec
        Start   6: directed_tests/context/hipCtx_simple.tst
  6/105 Test   #6: directed_tests/context/hipCtx_simple.tst ..................................   Passed    0.18 sec
        Start   7: directed_tests/context/hipMemsetD8.tst
  7/105 Test   #7: directed_tests/context/hipMemsetD8.tst ....................................   Passed    0.41 sec
        Start   8: directed_tests/context/hipMemsetD8-N10--memsetval0x42.tst
  8/105 Test   #8: directed_tests/context/hipMemsetD8-N10--memsetval0x42.tst .................   Passed    0.29 sec
        Start   9: directed_tests/context/hipMemsetD8-N10013--memsetval0x5a.tst
  9/105 Test   #9: directed_tests/context/hipMemsetD8-N10013--memsetval0x5a.tst ..............   Passed    0.30 sec
        Start  10: directed_tests/context/hipMemsetD8-N256M--memsetval0xa6.tst
 10/105 Test  #10: directed_tests/context/hipMemsetD8-N256M--memsetval0xa6.tst ...............   Passed    1.31 sec
        Start  11: directed_tests/deviceLib/hipDeviceMemcpy.tst
 11/105 Test  #11: directed_tests/deviceLib/hipDeviceMemcpy.tst ..............................   Passed    2.39 sec
        Start  12: directed_tests/deviceLib/hipDoublePrecisionIntrinsics.tst
 12/105 Test  #12: directed_tests/deviceLib/hipDoublePrecisionIntrinsics.tst .................   Passed    0.27 sec
        Start  13: directed_tests/deviceLib/hipDoublePrecisionMathDevice.tst
 13/105 Test  #13: directed_tests/deviceLib/hipDoublePrecisionMathDevice.tst .................   Passed    0.22 sec
        Start  14: directed_tests/deviceLib/hipDoublePrecisionMathHost.tst
 14/105 Test  #14: directed_tests/deviceLib/hipDoublePrecisionMathHost.tst ...................   Passed    0.21 sec
        Start  15: directed_tests/deviceLib/hipFloatMath.tst
 15/105 Test  #15: directed_tests/deviceLib/hipFloatMath.tst .................................   Passed    0.77 sec
        Start  16: directed_tests/deviceLib/hipFloatMathPrecise.tst
 16/105 Test  #16: directed_tests/deviceLib/hipFloatMathPrecise.tst ..........................   Passed    0.80 sec
        Start  17: directed_tests/deviceLib/hipIntegerIntrinsics.tst
 17/105 Test  #17: directed_tests/deviceLib/hipIntegerIntrinsics.tst .........................   Passed    0.30 sec
        Start  18: directed_tests/deviceLib/hipMathFunctions.tst
 18/105 Test  #18: directed_tests/deviceLib/hipMathFunctions.tst .............................   Passed    0.24 sec
        Start  19: directed_tests/deviceLib/hipSimpleAtomicsTest.tst
 19/105 Test  #19: directed_tests/deviceLib/hipSimpleAtomicsTest.tst .........................   Passed    0.22 sec
        Start  20: directed_tests/deviceLib/hipSinglePrecisionIntrinsics.tst
 20/105 Test  #20: directed_tests/deviceLib/hipSinglePrecisionIntrinsics.tst .................   Passed    0.48 sec
        Start  21: directed_tests/deviceLib/hipSinglePrecisionMathDevice.tst
 21/105 Test  #21: directed_tests/deviceLib/hipSinglePrecisionMathDevice.tst .................   Passed    1.54 sec
        Start  22: directed_tests/deviceLib/hipSinglePrecisionMathHost.tst
 22/105 Test  #22: directed_tests/deviceLib/hipSinglePrecisionMathHost.tst ...................   Passed    0.25 sec
        Start  23: directed_tests/deviceLib/hipTestDevice.tst
 23/105 Test  #23: directed_tests/deviceLib/hipTestDevice.tst ................................   Passed    0.28 sec
        Start  24: directed_tests/deviceLib/hipTestDeviceSymbol.tst
 24/105 Test  #24: directed_tests/deviceLib/hipTestDeviceSymbol.tst ..........................   Passed    0.96 sec
        Start  25: directed_tests/deviceLib/hipTestHalf.tst
 25/105 Test  #25: directed_tests/deviceLib/hipTestHalf.tst ..................................   Passed    0.63 sec
        Start  26: directed_tests/deviceLib/hipTestNativeHalf.tst
 26/105 Test  #26: directed_tests/deviceLib/hipTestNativeHalf.tst ............................   Passed    0.24 sec
        Start  27: directed_tests/deviceLib/hipThreadFence.tst
 27/105 Test  #27: directed_tests/deviceLib/hipThreadFence.tst ...............................   Passed    0.22 sec
        Start  28: directed_tests/deviceLib/hipVectorTypes.tst
 28/105 Test  #28: directed_tests/deviceLib/hipVectorTypes.tst ...............................   Passed    0.20 sec
        Start  29: directed_tests/deviceLib/hipVectorTypesDevice.tst
 29/105 Test  #29: directed_tests/deviceLib/hipVectorTypesDevice.tst .........................   Passed    0.20 sec
        Start  30: directed_tests/deviceLib/hip_anyall.tst
 30/105 Test  #30: directed_tests/deviceLib/hip_anyall.tst ...................................   Passed    1.01 sec
        Start  31: directed_tests/deviceLib/hip_ballot.tst
 31/105 Test  #31: directed_tests/deviceLib/hip_ballot.tst ...................................   Passed    0.45 sec
        Start  32: directed_tests/deviceLib/hip_brev.tst
 32/105 Test  #32: directed_tests/deviceLib/hip_brev.tst .....................................   Passed    0.27 sec
        Start  33: directed_tests/deviceLib/hip_clz.tst
 33/105 Test  #33: directed_tests/deviceLib/hip_clz.tst ......................................   Passed    0.21 sec
        Start  34: directed_tests/deviceLib/hip_ffs.tst
 34/105 Test  #34: directed_tests/deviceLib/hip_ffs.tst ......................................   Passed    0.24 sec
        Start  35: directed_tests/deviceLib/hip_mbcnt.tst
 35/105 Test  #35: directed_tests/deviceLib/hip_mbcnt.tst ....................................   Passed    0.31 sec
        Start  36: directed_tests/deviceLib/hip_popc.tst
 36/105 Test  #36: directed_tests/deviceLib/hip_popc.tst .....................................   Passed    0.59 sec
        Start  37: directed_tests/deviceLib/hip_test_ldg.tst
 37/105 Test  #37: directed_tests/deviceLib/hip_test_ldg.tst .................................   Passed    0.76 sec
        Start  38: directed_tests/deviceLib/hip_threadfence_system.tst
 38/105 Test  #38: directed_tests/deviceLib/hip_threadfence_system.tst .......................   Passed    0.51 sec
        Start  39: directed_tests/deviceLib/hip_trig.tst
 39/105 Test  #39: directed_tests/deviceLib/hip_trig.tst .....................................   Passed    0.22 sec
        Start  40: directed_tests/kernel/hipEmptyKernel.tst
 40/105 Test  #40: directed_tests/kernel/hipEmptyKernel.tst ..................................   Passed    0.30 sec
        Start  41: directed_tests/kernel/hipGridLaunch.tst
 41/105 Test  #41: directed_tests/kernel/hipGridLaunch.tst ...................................   Passed    0.61 sec
        Start  42: directed_tests/kernel/hipLanguageExtensions.tst
 42/105 Test  #42: directed_tests/kernel/hipLanguageExtensions.tst ...........................   Passed    0.61 sec
        Start  43: directed_tests/kernel/hipLaunchParm.tst
 43/105 Test  #43: directed_tests/kernel/hipLaunchParm.tst ...................................   Passed    0.59 sec
        Start  44: directed_tests/kernel/hipPrintfKernel.tst
 44/105 Test  #44: directed_tests/kernel/hipPrintfKernel.tst .................................   Passed    0.27 sec
        Start  45: directed_tests/kernel/hipTestConstant.tst
 45/105 Test  #45: directed_tests/kernel/hipTestConstant.tst .................................   Passed    0.29 sec
        Start  46: directed_tests/kernel/hipTestMemKernel.tst
 46/105 Test  #46: directed_tests/kernel/hipTestMemKernel.tst ................................   Passed    0.24 sec
        Start  47: directed_tests/kernel/inline_asm_vadd.tst
 47/105 Test  #47: directed_tests/kernel/inline_asm_vadd.tst .................................   Passed    0.77 sec
        Start  48: directed_tests/runtimeApi/device/hipChooseDevice.tst
 48/105 Test  #48: directed_tests/runtimeApi/device/hipChooseDevice.tst ......................   Passed    0.30 sec
        Start  49: directed_tests/runtimeApi/device/hipDeviceComputeCapability.tst
 49/105 Test  #49: directed_tests/runtimeApi/device/hipDeviceComputeCapability.tst ...........   Passed    0.38 sec
        Start  50: directed_tests/runtimeApi/device/hipDeviceGetByPCIBusId.tst
 50/105 Test  #50: directed_tests/runtimeApi/device/hipDeviceGetByPCIBusId.tst ...............   Passed    0.23 sec
        Start  51: directed_tests/runtimeApi/device/hipDeviceGetName.tst
 51/105 Test  #51: directed_tests/runtimeApi/device/hipDeviceGetName.tst .....................   Passed    0.17 sec
        Start  52: directed_tests/runtimeApi/device/hipDeviceGetPCIBusId.tst
 52/105 Test  #52: directed_tests/runtimeApi/device/hipDeviceGetPCIBusId.tst .................   Passed    0.17 sec
        Start  53: directed_tests/runtimeApi/device/hipDeviceSynchronize.tst
 53/105 Test  #53: directed_tests/runtimeApi/device/hipDeviceSynchronize.tst .................   Passed    0.20 sec
        Start  54: directed_tests/runtimeApi/device/hipDeviceTotalMem.tst
 54/105 Test  #54: directed_tests/runtimeApi/device/hipDeviceTotalMem.tst ....................   Passed    0.14 sec
        Start  55: directed_tests/runtimeApi/device/hipGetDevice.tst
 55/105 Test  #55: directed_tests/runtimeApi/device/hipGetDevice.tst .........................   Passed    0.80 sec
        Start  56: directed_tests/runtimeApi/device/hipGetDeviceAttribute.tst
 56/105 Test  #56: directed_tests/runtimeApi/device/hipGetDeviceAttribute.tst ................   Passed    0.57 sec
        Start  57: directed_tests/runtimeApi/device/hipRuntimeGetVersion.tst
 57/105 Test  #57: directed_tests/runtimeApi/device/hipRuntimeGetVersion.tst .................   Passed    0.36 sec
        Start  58: directed_tests/runtimeApi/device/hipSetCachceConfig.tst
 58/105 Test  #58: directed_tests/runtimeApi/device/hipSetCachceConfig.tst ...................   Passed    0.17 sec
        Start  59: directed_tests/runtimeApi/device/hipSetDevice.tst
 59/105 Test  #59: directed_tests/runtimeApi/device/hipSetDevice.tst .........................   Passed    0.25 sec
        Start  60: directed_tests/runtimeApi/device/hipSetDeviceFlags.tst
 60/105 Test  #60: directed_tests/runtimeApi/device/hipSetDeviceFlags.tst ....................   Passed    0.24 sec
        Start  61: directed_tests/runtimeApi/error/hipPeekAtLastError.tst
 61/105 Test  #61: directed_tests/runtimeApi/error/hipPeekAtLastError.tst ....................   Passed    0.19 sec
        Start  62: directed_tests/runtimeApi/event/hipEventRecord--iterations10.tst
 62/105 Test  #62: directed_tests/runtimeApi/event/hipEventRecord--iterations10.tst ..........   Passed    0.62 sec
        Start  63: directed_tests/runtimeApi/event/record_event.tst
 63/105 Test  #63: directed_tests/runtimeApi/event/record_event.tst ..........................   Passed   96.50 sec
        Start  64: directed_tests/runtimeApi/memory/hipHostGetFlags.tst
 64/105 Test  #64: directed_tests/runtimeApi/memory/hipHostGetFlags.tst ......................   Passed    0.62 sec
        Start  65: directed_tests/runtimeApi/memory/hipHostMalloc.tst
 65/105 Test  #65: directed_tests/runtimeApi/memory/hipHostMalloc.tst ........................   Passed    1.85 sec
        Start  66: directed_tests/runtimeApi/memory/hipHostRegister.tst
 66/105 Test  #66: directed_tests/runtimeApi/memory/hipHostRegister.tst ......................   Passed   12.92 sec
        Start  67: directed_tests/runtimeApi/memory/hipMemPtrGetInfo.tst
 67/105 Test  #67: directed_tests/runtimeApi/memory/hipMemPtrGetInfo.tst .....................   Passed    0.49 sec
        Start  68: directed_tests/runtimeApi/memory/hipMemcpy-modes.tst
 68/105 Test  #68: directed_tests/runtimeApi/memory/hipMemcpy-modes.tst ......................   Passed    6.07 sec
        Start  69: directed_tests/runtimeApi/memory/hipMemcpy-size.tst
 69/105 Test  #69: directed_tests/runtimeApi/memory/hipMemcpy-size.tst .......................   Passed   40.07 sec
        Start  70: directed_tests/runtimeApi/memory/hipMemcpy-dev-offsets.tst
 70/105 Test  #70: directed_tests/runtimeApi/memory/hipMemcpy-dev-offsets.tst ................   Passed    7.13 sec
        Start  71: directed_tests/runtimeApi/memory/hipMemcpy-host-offsets.tst
 71/105 Test  #71: directed_tests/runtimeApi/memory/hipMemcpy-host-offsets.tst ...............   Passed    7.00 sec
        Start  72: directed_tests/runtimeApi/memory/hipMemcpy-multithreaded.tst
 72/105 Test  #72: directed_tests/runtimeApi/memory/hipMemcpy-multithreaded.tst ..............   Passed    2.93 sec
        Start  73: directed_tests/runtimeApi/memory/hipMemcpyDtoD.tst
 73/105 Test  #73: directed_tests/runtimeApi/memory/hipMemcpyDtoD.tst ........................   Passed    0.15 sec
        Start  74: directed_tests/runtimeApi/memory/hipMemcpyDtoDAsync.tst
 74/105 Test  #74: directed_tests/runtimeApi/memory/hipMemcpyDtoDAsync.tst ...................   Passed    0.22 sec
        Start  75: directed_tests/runtimeApi/memory/hipMemcpyPeer.tst
 75/105 Test  #75: directed_tests/runtimeApi/memory/hipMemcpyPeer.tst ........................   Passed    0.14 sec
        Start  76: directed_tests/runtimeApi/memory/hipMemcpyPeerAsync.tst
 76/105 Test  #76: directed_tests/runtimeApi/memory/hipMemcpyPeerAsync.tst ...................   Passed    0.15 sec
        Start  77: directed_tests/runtimeApi/memory/hipMemcpy_simple.tst
 77/105 Test  #77: directed_tests/runtimeApi/memory/hipMemcpy_simple.tst .....................   Passed    1.19 sec
        Start  78: directed_tests/runtimeApi/memory/hipMemcpyAsync-simple.tst
 78/105 Test  #78: directed_tests/runtimeApi/memory/hipMemcpyAsync-simple.tst ................   Passed    0.62 sec
        Start  79: directed_tests/runtimeApi/memory/hipMemoryAllocateCoherentDriver.tst
 79/105 Test  #79: directed_tests/runtimeApi/memory/hipMemoryAllocateCoherentDriver.tst ......   Passed    0.13 sec
        Start  80: directed_tests/runtimeApi/memory/hipMemset.tst
 80/105 Test  #80: directed_tests/runtimeApi/memory/hipMemset.tst ............................   Passed    0.19 sec
        Start  81: directed_tests/runtimeApi/memory/hipMemset-N10--memsetval0x42.tst
 81/105 Test  #81: directed_tests/runtimeApi/memory/hipMemset-N10--memsetval0x42.tst .........   Passed    0.18 sec
        Start  82: directed_tests/runtimeApi/memory/hipMemset-N10013--memsetval0x5a.tst
 82/105 Test  #82: directed_tests/runtimeApi/memory/hipMemset-N10013--memsetval0x5a.tst ......   Passed    0.19 sec
        Start  83: directed_tests/runtimeApi/memory/hipMemset-N256M--memsetval0xa6.tst
 83/105 Test  #83: directed_tests/runtimeApi/memory/hipMemset-N256M--memsetval0xa6.tst .......   Passed    1.77 sec
        Start  84: directed_tests/runtimeApi/memory/hipMemset2D.tst
 84/105 Test  #84: directed_tests/runtimeApi/memory/hipMemset2D.tst ..........................   Passed    0.24 sec
        Start  85: directed_tests/runtimeApi/memory/hipMemset3D-N10--memsetval0x42.tst
 85/105 Test  #85: directed_tests/runtimeApi/memory/hipMemset3D-N10--memsetval0x42.tst .......   Passed    0.26 sec
        Start  86: directed_tests/runtimeApi/memory/hipRandomMemcpyAsync.tst
 86/105 Test  #86: directed_tests/runtimeApi/memory/hipRandomMemcpyAsync.tst .................   Passed    0.21 sec
        Start  87: directed_tests/runtimeApi/memory/hipTestMemcpyPin.tst
 87/105 Test  #87: directed_tests/runtimeApi/memory/hipTestMemcpyPin.tst .....................   Passed    0.50 sec
        Start  88: directed_tests/runtimeApi/module/hipFuncGetAttributes.tst
 88/105 Test  #88: directed_tests/runtimeApi/module/hipFuncGetAttributes.tst .................   Passed    0.56 sec
        Start  89: directed_tests/runtimeApi/module/hipFuncSetCacheConfig.tst
 89/105 Test  #89: directed_tests/runtimeApi/module/hipFuncSetCacheConfig.tst ................   Passed    0.40 sec
        Start  90: directed_tests/runtimeApi/multiThread/hipMultiThreadDevice-serial.tst
 90/105 Test  #90: directed_tests/runtimeApi/multiThread/hipMultiThreadDevice-serial.tst .....   Passed    5.45 sec
        Start  91: directed_tests/runtimeApi/multiThread/hipMultiThreadDevice-pyramid.tst
 91/105 Test  #91: directed_tests/runtimeApi/multiThread/hipMultiThreadDevice-pyramid.tst ....   Passed    5.94 sec
        Start  92: directed_tests/runtimeApi/multiThread/hipMultiThreadDevice-nearzero.tst
 92/105 Test  #92: directed_tests/runtimeApi/multiThread/hipMultiThreadDevice-nearzero.tst ...   Passed    3.82 sec
        Start  93: directed_tests/runtimeApi/multiThread/hipMultiThreadStreams1.tst
 93/105 Test  #93: directed_tests/runtimeApi/multiThread/hipMultiThreadStreams1.tst ..........   Passed    7.76 sec
        Start  94: directed_tests/runtimeApi/multiThread/hipMultiThreadStreams2.tst
 94/105 Test  #94: directed_tests/runtimeApi/multiThread/hipMultiThreadStreams2.tst ..........   Passed    0.32 sec
        Start  95: directed_tests/runtimeApi/stream/hipNullStream.tst
 95/105 Test  #95: directed_tests/runtimeApi/stream/hipNullStream.tst ........................   Passed   57.47 sec
        Start  96: directed_tests/runtimeApi/stream/hipStreamAddCallback.tst
 96/105 Test  #96: directed_tests/runtimeApi/stream/hipStreamAddCallback.tst .................   Passed    0.43 sec
        Start  97: directed_tests/runtimeApi/stream/hipStreamGetFlags.tst
 97/105 Test  #97: directed_tests/runtimeApi/stream/hipStreamGetFlags.tst ....................   Passed    0.36 sec
        Start  98: directed_tests/runtimeApi/stream/hipStreamL5.tst
 98/105 Test  #98: directed_tests/runtimeApi/stream/hipStreamL5.tst ..........................   Passed   28.51 sec
        Start  99: directed_tests/runtimeApi/stream/hipStreamSync2.tst
 99/105 Test  #99: directed_tests/runtimeApi/stream/hipStreamSync2.tst .......................   Passed   21.74 sec
        Start 100: directed_tests/runtimeApi/synchronization/copy_coherency.tst
100/105 Test #100: directed_tests/runtimeApi/synchronization/copy_coherency.tst ..............   Passed    2.11 sec
        Start 101: directed_tests/surface/hipSurfaceObj2D.tst
101/105 Test #101: directed_tests/surface/hipSurfaceObj2D.tst ................................   Passed    0.34 sec
        Start 102: directed_tests/texture/hipBindTexRef1DFetch.tst
102/105 Test #102: directed_tests/texture/hipBindTexRef1DFetch.tst ...........................   Passed    0.19 sec
        Start 103: directed_tests/texture/hipTextureObj1DFetch.tst
103/105 Test #103: directed_tests/texture/hipTextureObj1DFetch.tst ...........................   Passed    1.24 sec
        Start 104: directed_tests/texture/hipTextureObj2D.tst
104/105 Test #104: directed_tests/texture/hipTextureObj2D.tst ................................   Passed    0.38 sec
        Start 105: directed_tests/texture/hipTextureRef2D.tst
105/105 Test #105: directed_tests/texture/hipTextureRef2D.tst ................................   Passed    0.54 sec

100% tests passed, 0 tests failed out of 105

Total Test time (real) = 350.74 sec
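With a log this long, it can help to pull out the slowest tests programmatically. A rough sketch, assuming the ctest result-line format shown above; `parse_results` and `slowest_tests` are hypothetical helper names, not part of HIP or CTest:

```python
import re

# Matches ctest result lines such as:
#   " 63/105 Test  #63: directed_tests/.../record_event.tst ....   Passed   96.50 sec"
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(?P<name>\S+)\s+\.*\s*"
    r"(?P<status>.+?)\s+(?P<secs>\d+\.\d+)\s+sec\s*$"
)


def parse_results(ctest_output):
    """Return (name, status, seconds) for every result line in the ctest output."""
    results = []
    for line in ctest_output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results.append(
                (match.group("name"), match.group("status"), float(match.group("secs")))
            )
    return results


def slowest_tests(ctest_output, top=3):
    """Return the `top` slowest tests as (name, seconds), slowest first."""
    timed = [(name, secs) for name, _status, secs in parse_results(ctest_output)]
    return sorted(timed, key=lambda item: item[1], reverse=True)[:top]
```

Applied to the log above, the longest-running entry is `record_event.tst` at 96.50 sec, followed by `hipNullStream.tst` at 57.47 sec.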

@jlgreathouse

commented Oct 24, 2018

@sunway513 if that particular model of IVB has support for PCIe gen 3 atomics, so much the better. It looks to be the case, based on the kfd finding and enumerating a Polaris 10 GPU (device ID 67df). I updated our documentation to indicate that some Ivy Bridge-E devices support atomics.

That's not to say that all Ivy Bridge processors will work in ROCm, though. And as may be obvious from the conversations here, we don't test Ivy Bridge for any of our releases inside AMD. :)

@ghostplant

commented Jan 7, 2019

@jlgreathouse Try using -march=x86-64 when running TensorFlow's ./configure
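One way to apply this suggestion without retyping the flag at the prompt is to preset it in the environment. A minimal sketch, assuming that the checkout's configure script reads its optimization flags from the `CC_OPT_FLAGS` environment variable (worth confirming in that checkout's configure.py); `configure_env` and `run_configure` are hypothetical helper names:

```python
import os
import subprocess


def configure_env(march="x86-64", base_env=None):
    """Return a copy of the environment with the optimization flag preset."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: ./configure picks up CC_OPT_FLAGS instead of prompting for it.
    env["CC_OPT_FLAGS"] = "-march={}".format(march)
    # TF_NEED_ROCM=1 mirrors the value visible in the build log in this thread.
    env["TF_NEED_ROCM"] = "1"
    return env


def run_configure(source_dir, march="x86-64"):
    """Run ./configure in source_dir; any remaining prompts stay interactive."""
    return subprocess.call(["./configure"], cwd=source_dir, env=configure_env(march))
```

For example, `run_configure("/root/tensorflow", march="ivybridge")` would target Ivy Bridge specifically, while the default `x86-64` targets the generic baseline.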

@sunway513

commented Feb 7, 2019

Hi @ghostplant, could you verify whether the issue can be reproduced using our dev Docker images, e.g.:
rocm/tensorflow:rocm2.1-tf1.12-python3-dev
The docker image includes all the dependencies to build tensorflow from source.

sunway513 self-assigned this on Feb 8, 2019

@sunway513

commented Mar 26, 2019

@Pneumaticat are there any updates?

@Pneumaticat

Author

commented Mar 27, 2019

@sunway513 Not much - I just tried @ghostplant's suggestion of using -march=x86-64 on the image tag you mentioned, but it appears that even the build tools are compiled with instructions not present on Ivy Bridge; protoc fails with illegal instruction:

ERROR: /root/tensorflow/tensorflow/core/BUILD:2408:1: ProtoCompile tensorflow/core/lib/core/error_codes.pb.cc failed (Illegal instruction): protoc failed: error executing command
  (cd /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow && \
  exec env - \
    PATH=/opt/rocm/opencl/bin:/opt/rocm/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_DOWNLOAD_CLANG=0 \
    TF_NEED_CUDA=0 \
    TF_NEED_OPENCL_SYCL=0 \
    TF_NEED_ROCM=1 \
  bazel-out/host/bin/external/protobuf_archive/protoc '--cpp_out=bazel-out/k8-opt/genfiles/' -I. -Iexternal/protobuf_archive/src -Ibazel-out/k8-opt/genfiles/external/protobuf_archive/src tensorflow/core/lib/core/error_codes.proto): protoc failed: error executing command
  (cd /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow && \                           
  exec env - \
    PATH=/opt/rocm/opencl/bin:/opt/rocm/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PYTHON_BIN_PATH=/usr/bin/python

I suspect the build would work if the build tools were compatible. Is there an easy way to compile the tools for the build environment myself?
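To find out which prebuilt tools are affected, each one can be probed for the same failure. A sketch, assuming Linux and Python 3.5+, where a child killed by a signal yields a negative return code from `subprocess`; `died_from_illegal_instruction` and `probe_binary` are hypothetical helper names:

```python
import signal
import subprocess


def died_from_illegal_instruction(returncode):
    """True if a subprocess return code means the child was killed by SIGILL."""
    return returncode == -signal.SIGILL


def probe_binary(argv):
    """Run argv with output discarded and report whether it hit an illegal instruction."""
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return died_from_illegal_instruction(result.returncode)
```

For instance, `probe_binary(["bazel-out/host/bin/external/protobuf_archive/protoc", "--version"])`, run from the execroot named in the log, would be expected to return `True` on a CPU that lacks the instructions the tool was built with.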

@sunway513

commented Mar 27, 2019

Hi @Pneumaticat, maybe you can revert the following commit and try again with the dev Docker image: 6616c92

@Pneumaticat

Author

commented Apr 20, 2019

@sunway513 To provide an update: I reverted that commit as you suggested, on the develop-upstream-rocm2.3-updates branch with manually installed ROCm dependencies in an Ubuntu 16.04 container, and TensorFlow built successfully and starts!

I have not yet tried out training on the GPU, but I will test that shortly.

Edit: initialization of a tf.Session shows:

Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-04-20 03:42:02.324100: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3700075000 Hz
2019-04-20 03:42:02.326542: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc2586d5bb0 executing computations on platform Host. Devices:
2019-04-20 03:42:02.326886: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-20 03:42:02.331872: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libhip_hcc.so
2019-04-20 03:42:02.344526: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc25875c950 executing computations on platform ROCM. Devices:
2019-04-20 03:42:02.344556: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Device 67df, AMDGPU ISA version: gfx803
2019-04-20 03:42:02.370318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1564] Found device 0 with properties: 
name: Device 67df
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.36
pciBusID 0000:03:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB
2019-04-20 03:42:02.371319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-04-20 03:42:02.373557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-20 03:42:02.373838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088]      0 
2019-04-20 03:42:02.374343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0:   N 
2019-04-20 03:42:02.375954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Device 67df, pci bus id: 0000:03:00.0)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Device 67df, pci bus id: 0000:03:00.0
2019-04-20 03:42:02.391370: I tensorflow/core/common_runtime/direct_session.cc:282] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Device 67df, pci bus id: 0000:03:00.0

A simple computation using the GPU:

>>> with tf.device('/gpu:0'):
...     a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
...     b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
...     c = tf.matmul(a, b)
... 
>>> with tf.Session() as sess:
...     print (sess.run(c))
... 
2019-04-20 03:43:15.885665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-04-20 03:43:15.885856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-20 03:43:15.885871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088]      0 
2019-04-20 03:43:15.885879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0:   N 
2019-04-20 03:43:15.885934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Device 67df, pci bus id: 0000:03:00.0)

[[22. 28.]
 [49. 64.]]

(taken from https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell)
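The printed result can be cross-checked on the CPU without TensorFlow. A small pure-Python sketch using the same input values; `matmul` here is an illustrative helper, not the TensorFlow op:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]
print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

The values agree with the GPU output above, which suggests the matmul on the gfx803 device computed the correct product.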

@Pneumaticat

Author

commented Apr 21, 2019

...Actually, it seems like the stock rocm/tensorflow:latest now works? No more crash on import.

docker run --rm -it --privileged rocm/tensorflow
root@81e6a4c40e32:/root# impor^C
root@81e6a4c40e32:/root# python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-04-21 02:24:30.496717: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2019-04-21 02:24:30.514573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties: 
name: Device 67df
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.36
pciBusID 0000:03:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB

I'm not sure what changed in the build configuration of the stock image, but in any case it seems like the issue has been resolved. Thanks!
