
Changing Odroid XU4 NEXT mainline u-boot with legacy for compatibility reasons - now it boots from SD and eMMC out of the box, kernel switching works.
igorpecovnik committed May 18, 2017
1 parent d2ac18d commit 9268b73289504d4bf095214dbe19410a0c0654c7
@@ -10,7 +10,7 @@ setenv fdt_high "0xffffffff"
# run copy_uboot_sd2emmc
# Mac address configuration
setenv macaddr "00:1e:06:61:7a:55
setenv macaddr "00:1e:06:61:7a:55"
#------------------------------------------------------------------------------------------------------
# Basic Ubuntu Setup. Don't touch unless you know what you are doing.
@@ -207,7 +207,10 @@ setenv hdmi_phy_control "hdmi_tx_amp_lvl=${hdmi_tx_amp_lvl} hdmi_tx_lvl_ch0=${hd
# Load kernel, initrd and dtb in that sequence
ext4load mmc 0:1 0x40008000 /boot/zImage || fatload mmc 0:1 0x40008000 zImage || ext4load mmc 0:1 0x40008000 zImage
ext4load mmc 0:1 0x42000000 /boot/uInitrd || fatload mmc 0:1 0x42000000 uInitrd || ext4load mmc 0:1 0x42000000 uInitrd
ext4load mmc 0:1 0x44000000 /boot/dtb/exynos5422-odroidxu3.dtb || fatload mmc 0:1 0x44000000 dtb/exynos5422-odroidxu3.dtb || ext4load mmc 0:1 0x44000000 dtb/exynos5422-odroidxu3.dtb
# If we load mainline kernel we need some other approach
if ext4load mmc 0:1 0x00000000 "/boot/.next"; then echo "Found mainline kernel configuration"; ext4load mmc 0:1 0x44000000 /boot/dtb/exynos5422-odroidxu4.dtb || fatload mmc 0:1 0x44000000 dtb/exynos5422-odroidxu4.dtb || ext4load mmc 0:1 0x44000000 dtb/exynos5422-odroidxu4.dtb; else echo "Found legacy kernel configuration"; ext4load mmc 0:1 0x44000000 /boot/dtb/exynos5422-odroidxu3.dtb || fatload mmc 0:1 0x44000000 dtb/exynos5422-odroidxu3.dtb || ext4load mmc 0:1 0x44000000 dtb/exynos5422-odroidxu3.dtb; fi
#failsafe
if test "${fdtloaded}" != "true"; then ext4load mmc 0:1 0x44000000 exynos5422-odroidxu4.dtb; fi
# set FDT address
fdt addr 0x44000000
@@ -36,6 +36,19 @@ case $BRANCH in
;;
esac
# USE LEGACY UBOOT for both kernels to secure proper eMMC booting, until mainline u-boot get this function
BOOTSOURCE='https://github.com/hardkernel/u-boot.git'
BOOTBRANCH='branch:odroidxu3-v2012.07'
BOOTSCRIPT='boot-odroid-xu4-default.ini:boot.ini'
BOOTDIR='u-boot-odroidxu'
BOOTPATCHDIR='u-boot-odroidxu4-default'
UBOOT_TARGET_MAP=';;sd_fuse/hardkernel/bl1.bin.hardkernel sd_fuse/hardkernel/bl2.bin.hardkernel u-boot.bin sd_fuse/hardkernel/tzsw.bin.hardkernel'
BOOTENV_FILE='odroidxu4-next.txt'
UBOOT_USE_GCC='< 4.9'
HAS_UUID_SUPPORT=yes
# USE LEGACY UBOOT for both kernels to secure proper eMMC booting, until mainline u-boot get this function
CPUMIN=600000
CPUMAX=2000000
GOVERNOR=ondemand
@@ -420,6 +420,12 @@ prepare_partitions()
-e "s/rootfstype \"ext4\"/rootfstype \"$ROOTFS_TYPE\"/" $CACHEDIR/$SDCARD/boot/$bootscript_dst
fi
# if we have boot.ini = remove armbianEnv.txt and add UUID there if enabled
if [[ -f $CACHEDIR/$SDCARD/boot/boot.ini ]]; then
[[ $HAS_UUID_SUPPORT == yes ]] && sed -i 's/^setenv rootdev .*/setenv rootdev "'$rootfs'"/' $CACHEDIR/$SDCARD/boot/boot.ini
[[ -f $CACHEDIR/$SDCARD/boot/armbianEnv.txt ]] && rm $CACHEDIR/$SDCARD/boot/armbianEnv.txt
fi
# recompile .cmd to .scr if boot.cmd exists
[[ -f $CACHEDIR/$SDCARD/boot/boot.cmd ]] && \
mkimage -C none -A arm -T script -d $CACHEDIR/$SDCARD/boot/boot.cmd $CACHEDIR/$SDCARD/boot/boot.scr > /dev/null 2>&1
@@ -217,7 +217,7 @@ index 3001ec5..dc76d3c
+sed -e "s/exit 0//g" -i $tmpdir/DEBIAN/postinst
+cat >> $tmpdir/DEBIAN/postinst <<EOT
+ln -sf $(basename $kernel_tmp_version) /boot/zImage > /dev/null 2>&1 || mv /$kernel_tmp_version /boot/zImage
+
+rm -f /boot/.next
+
+exit 0
+EOT

7 comments on commit 9268b73

@ThomasKaiser

Member

ThomasKaiser replied May 19, 2017

Doesn't work for me (Xenial build host):

[ o.k. ] Syncing clock [ host ]
[ o.k. ] Downloading sources 
[ o.k. ] Checking git sources [ u-boot-odroidxu odroidxu3-v2012.07 ]
[ .... ] Creating local copy 
[ .... ] Fetching updates 
remote: Counting objects: 8057, done.
remote: Compressing objects: 100% (7030/7030), done.
remote: Total 8057 (delta 2116), reused 4716 (delta 890), pack-reused 0
Receiving objects: 100% (8057/8057), 14.64 MiB | 7.31 MiB/s, done.
Resolving deltas: 100% (2116/2116), done.
From https://github.com/hardkernel/u-boot
 * branch            odroidxu3-v2012.07 -> FETCH_HEAD
 * [new branch]      odroidxu3-v2012.07 -> origin/odroidxu3-v2012.07
[ .... ] Checking out 
[ o.k. ] Checking git sources [ odroidxu-mainline-hardkernel odroidxu4-4.9.y ]
[ .... ] Checking out 
[ o.k. ] Checking git sources [ sunxi-tools master ]
[ .... ] Up to date 
[ o.k. ] Cleaning output/debs for [ odroidxu4 next ]
[ o.k. ] Cleaning [ u-boot-odroidxu/odroidxu3-v2012.07 ]
[ o.k. ] Compiling u-boot [ 2012.07 ]
[ o.k. ] Compiler version [ arm-linux-gnueabihf-gcc 4.8.3 ]
[ .... ] Checking out sources 
[ o.k. ] Cleaning [ u-boot-odroidxu/odroidxu3-v2012.07 ]
[ o.k. ] Started patching process for [ u-boot odroidxu4-odroidxu4-next ]
[ o.k. ] Looking for user patches in [ userpatches/u-boot/u-boot-odroidxu4-default ]
[ o.k. ] ... [l][c] cfgload-ext4-boot-improvements.patch [ succeeded ]
[ o.k. ] ... [l][c] ext4-fixes-pr-28.patch [ succeeded ]
make: *** No rule to make target 'odroid-xu3_defconfig'.  Stop.
System not configured - see README
Makefile:668: recipe for target 'all' failed
make: *** [all] Error 1
[ error ] ERROR in function compile_uboot [ common.sh:97 ]
[ error ] U-boot compilation failed 
[ o.k. ] Process terminated 
@igorpecovnik

Member

igorpecovnik replied May 19, 2017

It should work now.

@ThomasKaiser

Member

ThomasKaiser replied May 19, 2017

Yep, b2abecd fixed it. I ran some quick tests to check for u-boot/kernel dependency problems: iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 (Samsung EVO840 with ASM1153), tinymembench, and sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$(grep -c '^processor' /proc/cpuinfo).

Mainline u-boot (debug log):

                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4    17652    17506    16171    16525    16262    17901
      102400      16    60345    66141    55086    54723    51006    61896
      102400     512   175701   195524   185331   190530   195533   201955
      102400    1024   188503   208844   215534   217313   218125   208614
      102400   16384   211165   327497   325548   325781   325544   328529

And

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1175.3 MB/s
 C copy backwards (32 byte blocks)                    :   1163.4 MB/s (0.2%)
 C copy backwards (64 byte blocks)                    :   2327.0 MB/s (0.1%)
 C copy                                               :   1206.1 MB/s (0.9%)
 C copy prefetched (32 bytes step)                    :   1447.5 MB/s (0.4%)
 C copy prefetched (64 bytes step)                    :   1446.8 MB/s (0.3%)
 C 2-pass copy                                        :   1128.1 MB/s
 C 2-pass copy prefetched (32 bytes step)             :   1392.6 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   1389.8 MB/s (1.8%)
 C fill                                               :   4901.1 MB/s (0.6%)
 C fill (shuffle within 16 byte blocks)               :   1843.4 MB/s
 C fill (shuffle within 32 byte blocks)               :   1842.7 MB/s (0.3%)
 C fill (shuffle within 64 byte blocks)               :   1946.3 MB/s (0.3%)
 ---
 standard memcpy                                      :   2313.6 MB/s (1.9%)
 standard memset                                      :   4895.8 MB/s (0.8%)
 ---
 NEON read                                            :   3373.8 MB/s
 NEON read prefetched (32 bytes step)                 :   4279.1 MB/s (0.3%)
 NEON read prefetched (64 bytes step)                 :   4291.7 MB/s (3.2%)
 NEON read 2 data streams                             :   3455.0 MB/s
 NEON read 2 data streams prefetched (32 bytes step)  :   4437.8 MB/s (0.6%)
 NEON read 2 data streams prefetched (64 bytes step)  :   4445.9 MB/s
 NEON copy                                            :   2614.5 MB/s
 NEON copy prefetched (32 bytes step)                 :   2916.5 MB/s (2.2%)
 NEON copy prefetched (64 bytes step)                 :   2907.7 MB/s (0.5%)
 NEON unrolled copy                                   :   2255.6 MB/s (0.4%)
 NEON unrolled copy prefetched (32 bytes step)        :   3226.2 MB/s
 NEON unrolled copy prefetched (64 bytes step)        :   3251.9 MB/s (0.7%)
 NEON copy backwards                                  :   1224.0 MB/s (1.6%)
 NEON copy backwards prefetched (32 bytes step)       :   1433.1 MB/s
 NEON copy backwards prefetched (64 bytes step)       :   1432.9 MB/s (0.2%)
 NEON 2-pass copy                                     :   2094.2 MB/s
 NEON 2-pass copy prefetched (32 bytes step)          :   2276.6 MB/s (0.4%)
 NEON 2-pass copy prefetched (64 bytes step)          :   2278.4 MB/s (3.5%)
 NEON unrolled 2-pass copy                            :   1389.8 MB/s (0.3%)
 NEON unrolled 2-pass copy prefetched (32 bytes step) :   1715.7 MB/s
 NEON unrolled 2-pass copy prefetched (64 bytes step) :   1732.0 MB/s
 NEON fill                                            :   4882.2 MB/s (0.5%)
 NEON fill backwards                                  :   1841.5 MB/s (2.0%)
 VFP copy                                             :   2470.7 MB/s (0.8%)
 VFP 2-pass copy                                      :   1331.2 MB/s
 ARM fill (STRD)                                      :   4904.5 MB/s (0.5%)
 ARM fill (STM with 8 registers)                      :   4869.6 MB/s (0.9%)
 ARM fill (STM with 4 registers)                      :   4876.6 MB/s (0.3%)
 ARM copy prefetched (incr pld)                       :   2934.0 MB/s (0.6%)
 ARM copy prefetched (wrap pld)                       :   2767.9 MB/s (0.6%)
 ARM 2-pass copy prefetched (incr pld)                :   1662.0 MB/s
 ARM 2-pass copy prefetched (wrap pld)                :   1618.3 MB/s (2.3%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON read (from framebuffer)                         :  12212.7 MB/s
 NEON copy (from framebuffer)                         :   7047.3 MB/s (0.8%)
 NEON 2-pass copy (from framebuffer)                  :   4669.0 MB/s (0.2%)
 NEON unrolled copy (from framebuffer)                :   5687.9 MB/s
 NEON 2-pass unrolled copy (from framebuffer)         :   3799.7 MB/s (0.2%)
 VFP copy (from framebuffer)                          :   5734.4 MB/s (0.4%)
 VFP 2-pass copy (from framebuffer)                   :   3520.9 MB/s
 ARM copy (from framebuffer)                          :   7585.4 MB/s (0.3%)
 ARM 2-pass copy (from framebuffer)                   :   3786.0 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.1 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.1 ns 
     65536 :    4.4 ns          /     6.8 ns 
    131072 :    6.7 ns          /     9.1 ns 
    262144 :    9.6 ns          /    12.0 ns 
    524288 :   11.1 ns          /    13.7 ns 
   1048576 :   12.0 ns          /    14.6 ns 
   2097152 :   21.0 ns          /    29.4 ns 
   4194304 :   96.2 ns          /   144.5 ns 
   8388608 :  134.8 ns          /   183.0 ns 
  16777216 :  154.4 ns          /   198.4 ns 
  33554432 :  170.1 ns          /   218.5 ns 
  67108864 :  180.0 ns          /   231.4 ns 

And

total time:                          47.5736s
total number of events:              10000
total time taken by event execution: 380.3276
per-request statistics:
     min:                                 22.40ms
     avg:                                 38.03ms
     max:                                123.54ms
     approx.  95 percentile:              56.39ms
@igorpecovnik

Member

igorpecovnik replied May 19, 2017

Legacy u-boot with kernel 4.9.28:

Test execution summary:
    total time:                          44.0843s
    total number of events:              10000
    total time taken by event execution: 352.4561
    per-request statistics:
         min:                                 23.16ms
         avg:                                 35.25ms
         max:                                121.81ms
         approx.  95 percentile:              55.87ms

Threads fairness:
    events (avg/stddev):           1250.0000/142.72
    execution time (avg/stddev):   44.0570/0.01
@ThomasKaiser

Member

ThomasKaiser replied May 19, 2017

Legacy u-boot (debug log):

                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4    17429    18063    16711    16583    16662    18157
      102400      16    60515    66411    54269    53884    53125    62796
      102400     512   196927   222043   194967   201697   197068   227450
      102400    1024   209206   238126   216232   224110   222387   203806
      102400   16384   212471   319707   334500   333360   328053   344759

And

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1189.4 MB/s (3.8%)
 C copy backwards (32 byte blocks)                    :   1152.0 MB/s (2.8%)
 C copy backwards (64 byte blocks)                    :   2357.6 MB/s (0.3%)
 C copy                                               :   1205.5 MB/s
 C copy prefetched (32 bytes step)                    :   1446.0 MB/s
 C copy prefetched (64 bytes step)                    :   1446.1 MB/s
 C 2-pass copy                                        :   1127.8 MB/s (1.7%)
 C 2-pass copy prefetched (32 bytes step)             :   1392.9 MB/s (0.2%)
 C 2-pass copy prefetched (64 bytes step)             :   1390.6 MB/s
 C fill                                               :   4910.7 MB/s (0.8%)
 C fill (shuffle within 16 byte blocks)               :   1828.3 MB/s
 C fill (shuffle within 32 byte blocks)               :   1828.3 MB/s (1.9%)
 C fill (shuffle within 64 byte blocks)               :   1914.6 MB/s
 ---
 standard memcpy                                      :   2293.3 MB/s (0.5%)
 standard memset                                      :   4916.5 MB/s (0.7%)
 ---
 NEON read                                            :   3542.3 MB/s
 NEON read prefetched (32 bytes step)                 :   4448.8 MB/s (4.8%)
 NEON read prefetched (64 bytes step)                 :   4461.6 MB/s
 NEON read 2 data streams                             :   3642.1 MB/s (0.4%)
 NEON read 2 data streams prefetched (32 bytes step)  :   4595.0 MB/s (0.5%)
 NEON read 2 data streams prefetched (64 bytes step)  :   4604.1 MB/s (3.1%)
 NEON copy                                            :   2810.4 MB/s (0.4%)
 NEON copy prefetched (32 bytes step)                 :   3137.8 MB/s
 NEON copy prefetched (64 bytes step)                 :   3132.1 MB/s (0.4%)
 NEON unrolled copy                                   :   2374.8 MB/s (1.3%)
 NEON unrolled copy prefetched (32 bytes step)        :   3416.7 MB/s (2.4%)
 NEON unrolled copy prefetched (64 bytes step)        :   3439.7 MB/s (0.8%)
 NEON copy backwards                                  :   1256.9 MB/s (0.2%)
 NEON copy backwards prefetched (32 bytes step)       :   1465.3 MB/s (0.8%)
 NEON copy backwards prefetched (64 bytes step)       :   1465.2 MB/s (1.9%)
 NEON 2-pass copy                                     :   2086.0 MB/s
 NEON 2-pass copy prefetched (32 bytes step)          :   2319.5 MB/s (0.3%)
 NEON 2-pass copy prefetched (64 bytes step)          :   2322.0 MB/s
 NEON unrolled 2-pass copy                            :   1416.1 MB/s (0.3%)
 NEON unrolled 2-pass copy prefetched (32 bytes step) :   1745.3 MB/s
 NEON unrolled 2-pass copy prefetched (64 bytes step) :   1764.5 MB/s (1.8%)
 NEON fill                                            :   4872.1 MB/s (0.5%)
 NEON fill backwards                                  :   1861.4 MB/s (0.5%)
 VFP copy                                             :   2620.3 MB/s (1.2%)
 VFP 2-pass copy                                      :   1375.0 MB/s (1.9%)
 ARM fill (STRD)                                      :   4906.1 MB/s (0.6%)
 ARM fill (STM with 8 registers)                      :   4884.5 MB/s (0.3%)
 ARM fill (STM with 4 registers)                      :   4893.3 MB/s (0.9%)
 ARM copy prefetched (incr pld)                       :   3159.3 MB/s
 ARM copy prefetched (wrap pld)                       :   2988.2 MB/s
 ARM 2-pass copy prefetched (incr pld)                :   1678.1 MB/s
 ARM 2-pass copy prefetched (wrap pld)                :   1652.6 MB/s (0.3%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON read (from framebuffer)                         :  12361.5 MB/s (0.5%)
 NEON copy (from framebuffer)                         :   7046.6 MB/s (0.6%)
 NEON 2-pass copy (from framebuffer)                  :   4665.8 MB/s (1.6%)
 NEON unrolled copy (from framebuffer)                :   5725.9 MB/s
 NEON 2-pass unrolled copy (from framebuffer)         :   3771.6 MB/s
 VFP copy (from framebuffer)                          :   5738.4 MB/s (0.1%)
 VFP 2-pass copy (from framebuffer)                   :   3517.4 MB/s
 ARM copy (from framebuffer)                          :   7584.5 MB/s
 ARM 2-pass copy (from framebuffer)                   :   3788.3 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.1 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.1 ns 
      8192 :    0.0 ns          /     0.1 ns 
     16384 :    0.0 ns          /     0.1 ns 
     32768 :    0.0 ns          /     0.2 ns 
     65536 :    4.4 ns          /     6.9 ns 
    131072 :    6.7 ns          /     9.1 ns 
    262144 :    9.6 ns          /    12.0 ns 
    524288 :   11.5 ns          /    14.2 ns 
   1048576 :   12.0 ns          /    14.7 ns 
   2097152 :   21.0 ns          /    29.6 ns 
   4194304 :   96.7 ns          /   144.5 ns 
   8388608 :  134.1 ns          /   182.4 ns 
  16777216 :  153.8 ns          /   197.6 ns 
  33554432 :  169.4 ns          /   218.3 ns 
  67108864 :  179.0 ns          /   235.2 ns 

And

total time:                          44.1353s
total number of events:              10000
total time taken by event execution: 352.8364
per-request statistics:
     min:                                 22.41ms
     avg:                                 35.28ms
     max:                                125.10ms
     approx.  95 percentile:              52.81ms
@ThomasKaiser

Member

ThomasKaiser replied May 19, 2017

The sysbench numbers seem to be better with legacy u-boot (possibly an indication of different thermal behaviour), but I won't look into it since the results are too close. If anyone else wants to have a look, I would suggest switching to the performance cpufreq governor, using --cpu-max-prime=200000 (10 times more than before), watching the armbianmonitor -m output, and recording /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state before and after the sysbench run.
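
The measurement suggested here could be scripted roughly as follows. This is only a sketch under stated assumptions: the names `residency_delta`, `run_comparison` and `STATS` are made up for illustration, writing the governor needs root, and the cpu0 path may only cover one CPU cluster on this SoC, so it may need adjusting.

```shell
#!/bin/bash
# Hypothetical helper, not part of the Armbian build scripts.
STATS=${STATS:-/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state}

# Print "<freq_khz> <delta>" for every frequency listed in two
# time_in_state snapshots (each line: "<freq_khz> <time>", where the
# kernel's cpufreq-stats documentation gives the time unit as 10 ms).
residency_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         { print $1, $2 - before[$1] }' "$1" "$2"
}

# Switch to the performance governor, snapshot the residency counters,
# run the longer sysbench test, and report how long each frequency was used.
run_comparison() {
    local before after g
    before=$(mktemp)
    after=$(mktemp)
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"   # requires root
    done
    cat "$STATS" > "$before"
    sysbench --test=cpu --cpu-max-prime=200000 run \
        --num-threads="$(grep -c '^processor' /proc/cpuinfo)"
    cat "$STATS" > "$after"
    residency_delta "$before" "$after"
    rm -f "$before" "$after"
}
```

Calling `run_comparison` as root on each image (with `armbianmonitor -m` running in a second terminal) would give one residency table per u-boot variant; a run that spends more time at lower frequencies would point to throttling.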

@ThomasKaiser

Member

ThomasKaiser replied May 19, 2017

Booted again with the mainline u-boot image after the board had been powered down for ~120 seconds (so it cooled down a little bit more):

total time:                          43.9373s
total number of events:              10000
total time taken by event execution: 351.2477
per-request statistics:
     min:                                 22.42ms
     avg:                                 35.12ms
     max:                                 87.47ms
     approx.  95 percentile:              54.61ms

Apparently the thermal behaviour is the same, so provided there are no functional differences (I tested none), switching to the old u-boot version seems to be fine.
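
To put the sysbench totals from this thread side by side, the relative differences can be worked out with a small helper. This is a sketch only; `pct_diff` is a hypothetical name, and the inputs are the "total time" values quoted in the comments above.

```shell
#!/bin/bash
# Relative difference of a run against a baseline, in percent.
pct_diff() {
    awk -v base="$1" -v run="$2" \
        'BEGIN { printf "%.1f\n", (run - base) / base * 100 }'
}

pct_diff 44.1353 47.5736   # legacy vs. first mainline run      -> 7.8
pct_diff 44.1353 43.9373   # legacy vs. mainline after cooldown -> -0.4
```

The first mainline run was about 7.8% slower than the legacy run, while the run after the cool-down was within half a percent of it, which fits the conclusion that the earlier gap came from the board's temperature rather than from the u-boot version.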
