This repository has been archived by the owner. It is now read-only.
Permalink
Browse files

JustArchi's ArchiDroid Optimizations V4.1

     _             _     _ ____            _     _
    / \   _ __ ___| |__ (_)  _ \ _ __ ___ (_) __| |
   / _ \ | '__/ __| '_ \| | | | | '__/ _ \| |/ _` |
  / ___ \| | | (__| | | | | |_| | | | (_) | | (_| |
 /_/   \_\_|  \___|_| |_|_|____/|_|  \___/|_|\__,_|

Copyright (C) 2015 Łukasz "JustArchi" Domeradzki
----------------------------------------------------------------------------------------------------------------
This is the squashed commit of countless hours spent on trying to optimize Android to the maximum and push it to the limits.
Please give proper credit if you decide to cherry-pick this commit. It includes countless number of full builds and tests, almost 200 hours (KK), and additional 100 hours (LP) spent on compiling and testing
----------------------------------------------------------------------------------------------------------------
WARNING! These modifcations are pretty DANGEROUS and may cause problems during compiling or running. You may also need to implement some fixes on your own.
----------------------------------------------------------------------------------------------------------------
Pre-requirements:
- GCC 4.8+, both androideabi and armeabi. Fortunately for us, Lollipop already uses GCC 4.8, but it's in outdated versions and has some internal compiler errors with O3 flags this commit applies. Therefore, I suggest:
1. UBERTC 4.9 arm-linux-androideabi -> https://github.com/ArchiDroid/Toolchain/tree/uber-4.9-arm-linux-androideabi
2. ArchiToolchain 4.9 arm-eabi -> https://github.com/ArchiDroid/Toolchain/tree/architoolchain-4.9-arm-linux-gnueabi-generic
(Please notice that ArchiToolchain is available in many variants, you may want to select the one that suits you best, e.g. Cortex A9 onex)

- (Optional) Target user, instead of userdebug. You should build with user variant instead of userdebug, as user variant in general pre-optimizes some parts and disables additional debug modules, i.e. dalvik debugging, procmem/procrank binaries and so.
This can be done by multiple ways, e.g. using proper lunch command (i.e. lunching cm_i9300-user instead of cm_i9300-userdebug) or "brunch device user", such as "brunch i9300 user", if your android_build includes my CM commit -> CyanogenMod@73db70d
----------------------------------------------------------------------------------------------------------------
Important notes:
- This commit has been tested on up-to-date CM-12.1 (Android 5.1.1) and suggested toolchains mentioned above. Other ROMs and toolchains may, or may not, work out of the box with it.
- You can ask for help in such cases in the XDA thread linked on the bottom of the commit.
- You're applying these optimizations at your own risk, I highly suggest to understand what each of the flag does, and not trying to blindly "apply them all" and hope for the best. Also, due to various hacks and issues in various trees and devices, I can't assure you that what works for me, will also work for you.
- Fortunately for you, my test device is Samsung Galaxy S3 with the nightmare Exynos4 SoC, so it's likely that it should work for major of the (better done) devices.
- As pointed in pre-requirements, stock AOSP toolchains are outdated and have many internal compiler errors. This commit probably can work on stock AOSP toolchains, but you'll need to get rid of some flags that are causing those errors on AOSP toolchains (e.g. -O3 for thumb)
- Due to O3 optimization, code is significantly larger and may cause problems with oversized images especially for older devices. For example, I couldn't apply O3 because building TWRP recovery failed due to the fact that it didn't fit on recovery's block in my device. In such case you have two options. You can use my little hack that ignored recovery size, so compiler doesn't yell about that (but you obviously can't flash such images), or you need to go back either to O2 or Os (for thumb), up to you.
----------------------------------------------------------------------------------------------------------------
Important changes:
- Optimized for speed yet more all instructions - ARM and THUMB (-O3)
- Optimized for speed also parts which are compiled with Clang (-O3)
- Turned off all debugging code (lack of -g)
- Eliminated redundant loads that come after stores to the same memory location, both partial and full redundancies (-fgcse-las)
- Ran a store motion pass after global common subexpression elimination. This pass attempts to move stores out of loops (-fgcse-sm)
- Enabled the identity transformation for graphite. For every SCoP we generate the polyhedral representation and transform it back to gimple. We can then check the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations are also performed by the code generator ISL, like index splitting and dead code elimination in loops (-fgraphite -fgraphite-identity)
- Performed interprocedural pointer analysis and interprocedural modification and reference analysis (-fipa-pta)
- Performed induction variable optimizations (strength reduction, induction variable merging and induction variable elimination) on trees (-fivopts)
- Didn't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions (-fomit-frame-pointer)
- Attempted to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers (-frename-registers)
- Tried to reduce the number of symbolic address calculations by using shared “anchor” symbols to address nearby objects. This transformation can help to reduce the number of GOT entries and GOT accesses on some targets (-fsection-anchors)
- Performed tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do a better job (-ftracer)
- Performed loop invariant motion on trees. It also moved operands of conditions that are invariant out of the loop, so that we can use just trivial invariantness analysis in loop unswitching. The pass also includes store motion (-ftree-loop-im)
- Created a canonical counter for number of iterations in loops for which determining number of iterations requires complicated analysis. Later optimizations then may determine the number easily (-ftree-loop-ivcanon)
- Assumed that loop indices do not overflow, and that loops with nontrivial exit condition are not infinite. This enables a wider range of loop optimizations even if the loop optimizer itself cannot prove that these assumptions are valid (-funsafe-loop-optimizations)
- Moved branches with loop invariant conditions out of the loop (-funswitch-loops)
- Constructed webs as commonly used for register allocation purposes and assigned each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover (-fweb)
- Sorted the common symbols by alignment in descending order. This is to prevent gaps between symbols due to alignment constraints (-Wl,--sort-common)
----------------------------------------------------------------------------------------------------------------

For more information, support, troubleshooting and discussion about my optimizations, please refer to my thread on XDA -> http://forum.xda-developers.com/showthread.php?t=2754997
  • Loading branch information...
JustArchi committed Mar 24, 2015
1 parent f50e3fb commit f9b983e8e11624b48ae575da206f1baf6979772c
Showing with 153 additions and 6 deletions.
  1. +5 −0 core/Makefile
  2. +119 −0 core/archidroid.mk
  3. +7 −0 core/clang/config.mk
  4. +10 −4 core/combo/TARGET_linux-arm.mk
  5. +8 −1 core/combo/TARGET_linux-arm64.mk
  6. +4 −1 core/combo/select.mk
View
@@ -917,6 +917,9 @@ else
build/target/product/security/cm-devkey
endif
# ArchiDroid
include $(BUILD_SYSTEM)/archidroid.mk
# Generate a file containing the keys that will be read by the
# recovery binary.
RECOVERY_INSTALL_OTA_KEYS := \
@@ -977,7 +980,9 @@ $(INSTALLED_RECOVERYIMAGE_TARGET): $(MKBOOTIMG) $(recovery_ramdisk) \
ifeq (true,$(PRODUCTS.$(INTERNAL_PRODUCT).PRODUCT_SUPPORTS_VERITY))
$(BOOT_SIGNER) /recovery $@ $(PRODUCTS.$(INTERNAL_PRODUCT).PRODUCT_VERITY_SIGNING_KEY).pk8 $(PRODUCTS.$(INTERNAL_PRODUCT).PRODUCT_VERITY_SIGNING_KEY).x509.pem $@
endif
ifneq ($(ARCHIDROID_IGNORE_RECOVERY_SIZE),true)
$(hide) $(call assert-max-image-size,$@,$(BOARD_RECOVERYIMAGE_PARTITION_SIZE))
endif
@echo -e ${CL_CYN}"Made recovery image: $@"${CL_RST}
endif # BOARD_CUSTOM_BOOTIMG_MK
View
@@ -0,0 +1,119 @@
# _ _ _ ____ _ _
# / \ _ __ ___| |__ (_) _ \ _ __ ___ (_) __| |
# / _ \ | '__/ __| '_ \| | | | | '__/ _ \| |/ _` |
# / ___ \| | | (__| | | | | |_| | | | (_) | | (_| |
# /_/ \_\_| \___|_| |_|_|____/|_| \___/|_|\__,_|
#
# Copyright 2015 Łukasz "JustArchi" Domeradzki
# Contact: JustArchi@JustArchi.net
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#######################
### GENERAL SECTION ###
#######################
# General optimization level of target ARM compiled with GCC. Default: -O2
ARCHIDROID_GCC_CFLAGS_ARM := -O3
# General optimization level of target THUMB compiled with GCC. Default: -Os
ARCHIDROID_GCC_CFLAGS_THUMB := -O3
# Additional flags passed to all C targets compiled with GCC
ARCHIDROID_GCC_CFLAGS := -O3 -fgcse-las -fgcse-sm -fipa-pta -fivopts -fomit-frame-pointer -frename-registers -fsection-anchors -ftracer -ftree-loop-im -ftree-loop-ivcanon -funsafe-loop-optimizations -funswitch-loops -fweb -Wno-error=array-bounds -Wno-error=clobbered -Wno-error=maybe-uninitialized -Wno-error=strict-overflow
############################
### EXPERIMENTAL SECTION ###
############################
# Flags in this section are highly experimental
# Current setup is based on proposed androideabi toolchain
# Results with other toolchains may vary
# These flags work fine in suggested compiler, but may cause ICEs in other compilers, comment if needed
ARCHIDROID_GCC_CFLAGS += -fgraphite -fgraphite-identity
# The following flags (-floop) require that your GCC has been configured with --with-isl
# Additionally, applying any of them will most likely cause ICE in your compiler, so they're disabled
# ARCHIDROID_GCC_CFLAGS += -floop-block -floop-interchange -floop-nest-optimize -floop-parallelize-all -floop-strip-mine
# These flags have been disabled because of assembler errors
# ARCHIDROID_GCC_CFLAGS += -fmodulo-sched -fmodulo-sched-allow-regmoves
####################
### MISC SECTION ###
####################
# Flags passed to GCC preprocessor for C and C++
ARCHIDROID_GCC_CPPFLAGS := $(ARCHIDROID_GCC_CFLAGS)
# Flags passed to linker (ld) of all C and C++ targets compiled with GCC
ARCHIDROID_GCC_LDFLAGS := -Wl,--sort-common
#####################
### CLANG SECTION ###
#####################
# Flags passed to all C targets compiled with CLANG
ARCHIDROID_CLANG_CFLAGS := -O3 -Qunused-arguments -Wno-unknown-warning-option
# Flags passed to CLANG preprocessor for C and C++
ARCHIDROID_CLANG_CPPFLAGS := $(ARCHIDROID_CLANG_CFLAGS)
# Flags passed to linker (ld) of all C and C++ targets compiled with CLANG
ARCHIDROID_CLANG_LDFLAGS := -Wl,--sort-common
# Flags that are used by GCC, but are unknown to CLANG. If you get "argument unused during compilation" error, add the flag here
ARCHIDROID_CLANG_UNKNOWN_FLAGS := \
-mvectorize-with-neon-double \
-mvectorize-with-neon-quad \
-fgcse-after-reload \
-fgcse-las \
-fgcse-sm \
-fgraphite \
-fgraphite-identity \
-fipa-pta \
-floop-block \
-floop-interchange \
-floop-nest-optimize \
-floop-parallelize-all \
-ftree-parallelize-loops=2 \
-ftree-parallelize-loops=4 \
-ftree-parallelize-loops=8 \
-ftree-parallelize-loops=16 \
-floop-strip-mine \
-fmodulo-sched \
-fmodulo-sched-allow-regmoves \
-frerun-cse-after-loop \
-frename-registers \
-fsection-anchors \
-ftree-loop-im \
-ftree-loop-ivcanon \
-funsafe-loop-optimizations \
-fweb
#####################
### HACKS SECTION ###
#####################
# Most of the flags are increasing code size of the output binaries, especially O3 instead of Os for target THUMB
# This may become problematic for small blocks, especially for boot or recovery blocks (ramdisks)
# If you don't care about the size of recovery.img, e.g. you have no use of it, and you want to silence the
# error "image too large" for recovery.img, use this definition
#
# NOTICE: It's better to use device-based flag TARGET_NO_RECOVERY instead, but some devices may have
# boot + recovery combo (e.g. Sony Xperias), and we must build recovery for them, so we can't set TARGET_NO_RECOVERY globally
# Therefore, this seems like a safe approach (will only ignore check on recovery.img, without doing anything else)
# However, if you use compiled recovery.img for your device, please disable this flag (comment or set to false), and lower
# optimization levels instead
ARCHIDROID_IGNORE_RECOVERY_SIZE := true
View
@@ -35,6 +35,12 @@ CLANG_CONFIG_EXTRA_CFLAGS :=
CLANG_CONFIG_EXTRA_CPPFLAGS :=
CLANG_CONFIG_EXTRA_LDFLAGS :=
# ArchiDroid
include $(BUILD_SYSTEM)/archidroid.mk
CLANG_CONFIG_EXTRA_CFLAGS += $(ARCHIDROID_CLANG_CFLAGS)
CLANG_CONFIG_EXTRA_CPPFLAGS += $(ARCHIDROID_CLANG_CPPFLAGS)
CLANG_CONFIG_EXTRA_LDFLAGS += $(ARCHIDROID_CLANG_LDFLAGS)
CLANG_CONFIG_EXTRA_CFLAGS += \
-D__compiler_offsetof=__builtin_offsetof
@@ -48,6 +54,7 @@ CLANG_CONFIG_EXTRA_CFLAGS += \
-Wno-unused-command-line-argument
CLANG_CONFIG_UNKNOWN_CFLAGS := \
$(ARCHIDROID_CLANG_UNKNOWN_FLAGS) \
-funswitch-loops \
-fno-tree-sra \
-finline-limit=64 \
@@ -67,17 +67,24 @@ $(combo_2nd_arch_prefix)TARGET_STRIP := $($(combo_2nd_arch_prefix)TARGET_TOOLS_P
$(combo_2nd_arch_prefix)TARGET_NO_UNDEFINED_LDFLAGS := -Wl,--no-undefined
$(combo_2nd_arch_prefix)TARGET_arm_CFLAGS := -O2 \
# ArchiDroid
include $(BUILD_SYSTEM)/archidroid.mk
$(combo_2nd_arch_prefix)TARGET_arm_CFLAGS := $(ARCHIDROID_GCC_CFLAGS_ARM) \
-fomit-frame-pointer \
-fstrict-aliasing \
-funswitch-loops
# Modules can choose to compile some source as thumb.
$(combo_2nd_arch_prefix)TARGET_thumb_CFLAGS := -mthumb \
-Os \
$(ARCHIDROID_GCC_CFLAGS_THUMB) \
-fomit-frame-pointer \
-fno-strict-aliasing
$(combo_2nd_arch_prefix)TARGET_GLOBAL_CFLAGS += $(ARCHIDROID_GCC_CFLAGS)
$(combo_2nd_arch_prefix)TARGET_GLOBAL_CPPFLAGS += $(ARCHIDROID_GCC_CPPFLAGS)
$(combo_2nd_arch_prefix)TARGET_GLOBAL_LDFLAGS += $(ARCHIDROID_GCC_LDFLAGS)
# Set FORCE_ARM_DEBUGGING to "true" in your buildspec.mk
# or in your environment to force a full arm build, even for
# files that are normally built as thumb; this can make
@@ -114,7 +121,7 @@ $(combo_2nd_arch_prefix)TARGET_GLOBAL_CFLAGS += \
# "-Wall -Werror" due to a commom idiom "ALOGV(mesg)" where ALOGV is turned
# into no-op in some builds while mesg is defined earlier. So we explicitly
# disable "-Wunused-but-set-variable" here.
ifneq ($(filter 4.6 4.6.% 4.7 4.7.% 4.8, $($(combo_2nd_arch_prefix)TARGET_GCC_VERSION)),)
ifneq ($(filter 4.6 4.6.% 4.7 4.7.% 4.8 4.8.% 4.9 4.9.%, $($(combo_2nd_arch_prefix)TARGET_GCC_VERSION)),)
$(combo_2nd_arch_prefix)TARGET_GLOBAL_CFLAGS += -fno-builtin-sin \
-fno-strict-volatile-bitfields
endif
@@ -145,7 +152,6 @@ $(combo_2nd_arch_prefix)TARGET_GLOBAL_CPPFLAGS += -fvisibility-inlines-hidden
# More flags/options can be added here
$(combo_2nd_arch_prefix)TARGET_RELEASE_CFLAGS := \
-DNDEBUG \
-g \
-Wstrict-aliasing=2 \
-fgcse-after-reload \
-frerun-cse-after-loop \
@@ -115,10 +115,17 @@ TARGET_GLOBAL_LDFLAGS += \
TARGET_GLOBAL_CPPFLAGS += -fvisibility-inlines-hidden
# ArchiDroid
include $(BUILD_SYSTEM)/archidroid.mk
$(combo_2nd_arch_prefix)TARGET_GLOBAL_CFLAGS += $(ARCHIDROID_GCC_CFLAGS)
$(combo_2nd_arch_prefix)TARGET_GLOBAL_CPPFLAGS += $(ARCHIDROID_GCC_CPPFLAGS)
$(combo_2nd_arch_prefix)TARGET_GLOBAL_LDFLAGS += $(ARCHIDROID_GCC_LDFLAGS)
# More flags/options can be added here
TARGET_RELEASE_CFLAGS := \
-DNDEBUG \
-O2 -g \
$(ARCHIDROID_GCC_CFLAGS_ARM) \
-Wstrict-aliasing=2 \
-fgcse-after-reload \
-frerun-cse-after-loop \
View
@@ -26,6 +26,9 @@ combo_os_arch := $($(combo_target)OS)-$($(combo_target)$(combo_2nd_arch_prefix)A
combo_var_prefix := $(combo_2nd_arch_prefix)$(combo_target)
# ArchiDroid
include $(BUILD_SYSTEM)/archidroid.mk
# Set reasonable defaults for the various variables
$(combo_var_prefix)CC := $(CC)
@@ -50,7 +53,7 @@ $(combo_var_prefix)HAVE_STRLCAT := 0
$(combo_var_prefix)HAVE_KERNEL_MODULES := 0
$(combo_var_prefix)GLOBAL_CFLAGS := -fno-exceptions -Wno-multichar
$(combo_var_prefix)RELEASE_CFLAGS := -O2 -g -fno-strict-aliasing
$(combo_var_prefix)RELEASE_CFLAGS := $(ARCHIDROID_GCC_CFLAGS_ARM) -fno-strict-aliasing
$(combo_var_prefix)GLOBAL_CPPFLAGS :=
$(combo_var_prefix)GLOBAL_LDFLAGS :=
$(combo_var_prefix)GLOBAL_ARFLAGS := crsPD

0 comments on commit f9b983e

Please sign in to comment.