./rocprofiler/bin/rocprof -d traces --hsa-trace --hip-trace --kfd-trace python3 ./test.py
RPL: on '210319_074742' from '/home/yoann/rocprofiler' in '/home/yoann'
RPL: profiling '"python3" "./test.py"'
RPL: input file ''
RPL: output dir 'traces/rpl_data_210319_074742_860200'
RPL: result dir 'traces/rpl_data_210319_074742_860200/input_results_210319_074742'
2021-03-19 07:47:43.940137: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-19 07:47:43.940305: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libamdhip64.so
ROCProfiler: input from "traces/rpl_data_210319_074742_860200/input.xml"
  0 metrics
ROCTracer (pid=860222):
    KFD-trace()
    HSA-trace()
    HSA-activity-trace()
    HIP-trace()
2021-03-19 07:47:44.015203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1738] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: Vega 10 XL/XT [Radeon RX Vega 56/64] ROCm AMD GPU ISA: gfx900
coreClock: 1.59GHz coreCount: 56 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: 381.47GiB/s
2021-03-19 07:47:44.017385: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library librocblas.so
2021-03-19 07:47:44.018647: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libMIOpen.so
2021-03-19 07:47:44.031696: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library librocfft.so
2021-03-19 07:47:44.031948: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library librocrand.so
2021-03-19 07:47:44.032509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-03-19 07:47:44.032773: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-19 07:47:44.033137: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-19 07:47:44.033471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1738] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: Vega 10 XL/XT [Radeon RX Vega 56/64] ROCm AMD GPU ISA: gfx900
coreClock: 1.59GHz coreCount: 56 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: 381.47GiB/s
2021-03-19 07:47:44.033490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library librocblas.so
2021-03-19 07:47:44.033499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libMIOpen.so
2021-03-19 07:47:44.033516: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library librocfft.so
2021-03-19 07:47:44.033536: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library librocrand.so
2021-03-19 07:47:44.034059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-03-19 07:47:44.034216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-19 07:47:44.034225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-03-19 07:47:44.034228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-03-19 07:47:44.035025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7685 MB memory) -> physical GPU (device: 0, name: Vega 10 XL/XT [Radeon RX Vega 56/64], pci bus id: 0000:04:00.0)
2021-03-19 07:47:44.658930: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-03-19 07:47:44.659388: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3397895000 Hz
2021-03-19 07:47:44.932245: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library librocblas.so
2021-03-19 07:47:44.976109: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libMIOpen.so
2.4.0
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0
_________________________________________________________________
dense (Dense)                (None, 10)                16010
=================================================================
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
422/422 [==============================] - 9s 12ms/step - loss: 0.7884 - accuracy: 0.7541 - val_loss: 0.0821 - val_accuracy: 0.9777
Epoch 2/2
422/422 [==============================] - 5s 12ms/step - loss: 0.1180 - accuracy: 0.9631 - val_loss: 0.0548 - val_accuracy: 0.9863
Test loss: 0.05501381307840347
Test accuracy: 0.9824000000953674
ROCPRofiler: 92254 contexts collected, output directory traces/rpl_data_210319_074742_860200/input_results_210319_074742
malloc(): mismatching next->prev_size (unsorted)
./rocprofiler/bin/rocprof: line 271: 860222 Aborted (core dumped) "python3" "./test.py"
START timestamp found (0ns)
scan kfd API data 2803664:2803665
/home/yoann/rocprofiler/bin/tblextr.py: kfd bad record: ''
Profiling data corrupted: ' traces/rpl_data_210319_074742_860200/input_results_210319_074742/results.txt'