Add files via upload

ArnoldUK · Jul 16, 2023 · 8799219 · 8799219
1 parent feca724
commit 8799219
Showing 1 changed file with 120 additions and 0 deletions.
diff --git a/arm-rpi-notes.txt b/arm-rpi-notes.txt
@@ -0,0 +1,120 @@
+Circle SDK for Bare Metal Builds
+There are a number of configurable system options in the file include/circle/sysconfig.h
+
+For a binary distribution you should do seperate builds for each RPi model
+and distribute the files kernel.img, kernel7.img and kernel7l.img.
+Optionally you can do a build with RASPPI = 3 to create kernel8-32.img for
+an optimized version for the Raspberry Pi 3.
+
+The RASPPI value for each model set in the Makefile:
+RASPPI 	Target 	Models 	Optimized for
+1 	kernel.img 	A, B, A+, B+, Zero, (CM) 	ARM1176JZF-S
+2 	kernel7.img 	2, 3, Zero 2, (CM3) 	Cortex-A7
+3 	kernel8-32.img 	3, Zero 2, (CM3) 	Cortex-A53
+4 	kernel7l.img 	4B, 400, CM4 	Cortex-A72
+
+
+GCC compiler optimization for ARM-based systems
+
+The gcc compiler can optimize code by taking advantage of CPU specific features.
+Especially for ARM CPU's, this can have impact on application performance.
+ARM CPU's, even under the same architecture, could be implemented with different
+versions of floating point units (FPU). Utilizing full FPU potential improves performance
+of heavier operating systems such as full Linux distributions.
+
+-mcpu, -march: Defining the CPU type and architecture
+
+These flags can both be used to set the CPU type. Setting one or the other is sufficient.
+
+GCC supports all Cortex-A processors up to A15 (2010).
+Example are 'cortex-a5', 'cortex-a7', 'cortex-a8', 'cortex-a9', 'cortex-a15'.
+For specific values on popular boards see table below.
+-mfloat-abi, -mfpu: Defining the floating-point application binary interface type (ABI)
+and floating point unit type
+
+All ARM Cortex-A processors available today come with a floating-point unit called vector
+floating-point VFP. Most also have a SIMD co-processor called NEON.
+Confusingly, both are types of floating point units.
+VFP is older, IEEE-754 compliant and provides double precision (64bit).
+NEON is a multi-function coprocessor that comes with single precision (32bit) vector
+functionality in 32bit CPU's, and got double precision in 64bit ARM.
+
+As a result, we have a combination of -mfpu type options, depending on the actual CPU implementation.
+Some combinations have an alias: 'neon-vfpv3' is aliased to 'neon' and 'vfpv2' is aliased to 'vfp'.
+-mfpu and -mfloat-abi are important flags, because GCC will only use floating-point and NEON
+instructions if it is explicitly told to do so. It is also important to set -mfpu in combination
+with -mfloat-abi.
+
+-mfloat-abi sets the overall strategy for floating point code compilation.
+It has only three values: 'soft', 'softfp', or 'hard'. ‘soft’ generates library calls
+for floating-point operations. ‘softfp’ allows the generation of code using hardware
+floating-point instructions, but still uses the soft-float calling conventions.
+'hard' generates floating-point instructions with FPU-specific calling conventions.
+Other options that "can" improve performance in certain circumstances
+
+-mtune This flag can further improve performance by setting the CPU type to a specific CPU model.
+For example, on a Raspberry Pi 1, it could be set to -mtune=arm1176jzf-s.
+
+-mneon-for-64bits lets Neon handle scalar 64-bits operations. This flag has no arguments.
+It is disabled by default due to high cost of moving data from core registers to Neon on 32bit CPU's.
+
+-funsafe-math-optimizations can improve floating point performance by letting it run through NEON.
+Use of NEON instructions may lead to a loss of precision (depending on the version of NEON).
+
+-munaligned-access, -mno-unaligned-access
+Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not
+16- or 32- bit aligned. The CPU type defines the correct default, and changing it may
+negatively impact performance.
+
+Pre-ARMv6, ARMv6-M, and ARMv8-M default to -mno-unaligned-access.
+-munaligned-access is default for all other architectures.
+If unaligned access is not enabled, then words in packed data structures are accessed one byte at a time.
+Flags that don't need to be specified, because they are correct on default:
+
+-mlittle-endian generates code for little-endian CPUs. This is the default.
+CPU information for popular ARM-based boards:
+Board			CPU			Architecture 	ARM Core 		FPU
+Raspberry Pi 1 		Broadcom BCM2835 	ARMv6 		ARM11 (ARM1176JZFS) 	VFPv2 (VFP only, no NEON)
+Raspberry Pi 2 		Broadcom BCM2836 	ARMv7-A 	Cortex-A7 MPcore 	VFPv4-D32 (VFP and NEON)
+BeagleBone Black 	TI Sitara AM3358/9 	ARMv7-A 	Cortex-A8 		VFPv3-D32 (VFP and NEON)
+Altera Cyclone V5 	5CSEMA4U23C6N A9    	ARMv7-A       	Cortex-A9 MPcore   	VFPv3-D32 (VFP and NEON)
+NanoPi NEO 2 		Allwinner H5 		ARMv8-A 	Cortex-A53 		VFPv4 (VFP and NEON)
+Raspberry Pi 3 		Broadcom BCM2837 	ARMv8-A 	Cortex-A53 		VFPv4 (VFP and NEON)
+Amazon EC2 A1 		Graviton 		ARMv8-A 	Cortex-A72 		VFPv4 (VFP and NEON)
+Raspberry Pi 4 		Broadcom BCM2711 	ARMv8-A 	Cortex-A72 		VFPv4 (VFP and NEON)
+
+GCC compiler options for popular ARM-based boards:
+Board 			GCC optimisation flags
+Raspberry Pi 1 		-mcpu=arm1176jzf-s -mfloat-abi=hard -mfpu=vfp (alias for vfpv2)
+Raspberry Pi 2   	-mcpu=cortex-a7 -mfloat-abi=hard -mfpu=neon-vfpv4                      
+BeagleBone Black  	-mcpu=cortex-a8 -mfloat-abi=hard -mfpu=neon (alias for neon-vfpv3)
+Altera Cyclone V5 	-mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon (alias for neon-vfpv3)
+Raspberry Pi 3 		-mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits
+Amazon EC2 A1 		-mcpu=cortex-a72 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits
+Raspberry Pi 4 		-mcpu=cortex-a72 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits
+
+GCC compiler tuning options for popular ARM-based boards:
+Board 			GCC extra tuning flags
+Raspberry Pi 1 		-mtune=arm1176jzf-s
+Raspberry Pi 2   	-mtune=cortex-a7      
+BeagleBone Black  	-mtune=cortex-a8
+Altera Cyclone V5 	-mtune=cortex-a9      
+Raspberry Pi 3 		-mtune=cortex-a53
+Raspberry Pi 4 		-mtune=cortex-a72
+
+Note that gcc's code optimisation creates code for a specific CPU/board.
+If that code is copied to another platform, it may perform different (e.g. poorly).
+How to set compiler options
+
+For most projects, setting the environment variable or adding the flags in the Makefile
+variables will do the trick.
+
+export CFLAGS="-mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits"
+
+Note to crypto miners
+
+Starting with the ARMv8 architecture, a crypto acceleration extension has been added.
+On a Raspberry Pi 3 (Cortex A53), it accelerates AES and SHA1/SHA2.
+To enable it, use -mcpu=cortex-a53+crypto on a RPi3, and -mcpu=cortex-a72+crypto for RPi4.
+
+Further crypto support (SHA3, SHA512) gets added from ARMv8.4-A onwards.