# Analysis of Android App Build Reproducibility

This notebook analyzes the reproducibility of Android app builds across different environments and build methods.

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import hashlib
import json
from collections import defaultdict

## Methodology

To test the reproducibility of the Android application, we followed [the guide in the official repository](https://github.com/signalapp/Signal-Android/tree/main/reproducible-builds). Initially, we ran each step manually; we then wrote [a custom script](https://github.com/TheTechZone/reproducible-tests/blob/main/build_signal.py) to automate the process and prevent human error.
Each build output was obtained using a clean[<sup>[clean]</sup>](#clean) Docker container.

All tests were carried out against version `v7.25.2` of the Signal application, which is the most recent version currently available on the Play Store (`region=CH`, `releaseChannel=production`). [According to public information](https://github.com/signalapp/Signal-Android/issues/13754#issuecomment-2450435519), this version should include the recent ProGuard fixes that ensure build determinism (TODO: link the commits).

The only variability in the environment is the host operating system (`fedora-40`, `fedora-41`, and `ubuntu-24.04`). Each machine was connected to a different physical Android device for retrieving the Play Store app.


<a name="clean"></a><sup>[clean]</sup> Before invoking Gradle, the Docker image was built from scratch with no caching, to prevent leftover build artifacts from affecting reproducibility; see the `./clean` script.

## Data Collection

First, let's collect information about all the builds in our outputs directory.

In [3]:
def get_file_hash(filepath):
    """Calculate SHA-256 hash of a file."""
    sha256_hash = hashlib.sha256()
    with open(filepath, "rb") as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()


def collect_build_data(output_dir):
    """Collect build data from the outputs directory."""
    builds_data = []

    def human_size(num_bytes, units=(" bytes", "KB", "MB", "GB", "TB", "PB", "EB")):
        """Return a human-readable size, rounded down to a whole unit."""
        # Each recursion divides by 1024 (>> 10) and moves to the next unit.
        return (
            str(num_bytes) + units[0]
            if num_bytes < 1024
            else human_size(num_bytes >> 10, units[1:])
        )

    for build_dir in Path(output_dir).iterdir():
        if not build_dir.is_dir():
            continue

        # Parse environment info from directory name
        env_info = build_dir.name.split("-")
        os_name = env_info[0]
        build_type = env_info[1]
        execution_count = 1 if len(env_info) == 2 else int(env_info[2])

        # Process APKs
        for apk_dir in build_dir.glob("**/apks*"):
            apk_type = apk_dir.name
            for apk_file in apk_dir.glob("**/*.apk"):
                size = apk_file.stat().st_size
                builds_data.append(
                    {
                        "os": os_name,
                        "build_type": build_type,
                        "exec_count": execution_count,
                        "apk_type": apk_type,
                        "local_apk": apk_type == "apks-i-built",
                        "filename": apk_file.name,
                        "filepath": str(apk_file),
                        "hash": get_file_hash(apk_file),
                        "size": size,
                        "human_size": human_size(size),
                    }
                )

    return pd.DataFrame(builds_data)


# Collect data
output_dir = "outputs"
df = collect_build_data(output_dir)
df.head(6)

index,os,build_type,exec_count,apk_type,local_apk,filename,filepath,hash,size,human_size
0,fedora40,scripted,1,apks-from-device,False,base-master.apk,outputs/fedora40-scripted-1/apks-from-device/b...,eb1902b3aa98e15a140e9de574dfa47b78bf8bb85fc556...,85037580,81MB
1,fedora40,scripted,1,apks-from-device,False,base-arm64_v8a.apk,outputs/fedora40-scripted-1/apks-from-device/b...,d296fff8691b3632f85725202045875af7e89dafedc9df...,23569519,22MB
2,fedora40,scripted,1,apks-from-device,False,base-xxhdpi.apk,outputs/fedora40-scripted-1/apks-from-device/b...,c65a2691cde28650f8934d492bff6c241b2e55eee4d222...,1666719,1MB
3,fedora40,scripted,1,apks-i-built,True,base-xxhdpi.apk,outputs/fedora40-scripted-1/apks-i-built/base-...,51119b048c0e03525cbb253ed7c19974738b58804908d3...,1658464,1MB
4,fedora40,scripted,1,apks-i-built,True,base-arm64_v8a.apk,outputs/fedora40-scripted-1/apks-i-built/base-...,a74d526fd13f4fe3bda7e066c575fa0c931d8da7333b8d...,23561264,22MB
5,fedora40,scripted,1,apks-i-built,True,base-master.apk,outputs/fedora40-scripted-1/apks-i-built/base-...,4768602685c6e5ceff078093c87e9aba5f19597b0d76c5...,85021050,81MB
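
The directory-name convention that `collect_build_data` relies on (`<os>-<build_type>[-<run>]`) can be made explicit with a small parser. This is only a sketch: `parse_build_dir_name` and `BuildDirInfo` are hypothetical names that do not exist in the notebook, and the second example name is illustrative rather than taken from the outputs directory.

```python
from typing import NamedTuple


class BuildDirInfo(NamedTuple):
    os_name: str
    build_type: str
    exec_count: int


def parse_build_dir_name(name: str) -> BuildDirInfo:
    """Split '<os>-<build_type>[-<run>]' into its parts (hypothetical helper)."""
    parts = name.split("-")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected build directory name: {name!r}")
    # A missing run counter is treated as run 1, as collect_build_data does.
    exec_count = int(parts[2]) if len(parts) == 3 else 1
    return BuildDirInfo(parts[0], parts[1], exec_count)


print(parse_build_dir_name("fedora40-scripted-1"))
print(parse_build_dir_name("fedora40-manual"))  # illustrative name without a counter
```

Raising on unexpected names makes a stray directory in `outputs` fail loudly instead of producing a row with misassigned fields.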


In [4]:
def print_tree_with_hashes(df):
    """Print directory tree with file hashes."""
    # Create a nested dictionary structure
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    # ANSI color codes
    BLUE = "\033[34m"
    PURPLE = "\033[35m"
    BOLD = "\033[1m"
    RESET = "\033[0m"

    # Populate the tree structure. The run counter is not part of the key,
    # so repeated runs of the same build type overwrite each other here.
    for _, row in df.iterrows():
        files = tree[row["os"]][row["build_type"]][row["apk_type"]]
        files[row["filename"]] = row["hash"][:8]  # first 8 chars of hash

    # Print the tree (os_name avoids shadowing the imported os module)
    for os_name in sorted(tree.keys()):
        print(f"{BLUE}{BOLD}{os_name}/{RESET}")
        for build_type in sorted(tree[os_name].keys()):
            print(f"├── {BLUE}{BOLD}{build_type}/{RESET}")
            for apk_type in sorted(tree[os_name][build_type].keys()):
                print(f"│   ├── {BLUE}{BOLD}{apk_type}/{RESET}")
                for filename, hash_prefix in sorted(
                    tree[os_name][build_type][apk_type].items()
                ):
                    print(
                        f"│   │   ├── {filename} [hash: {PURPLE}{BOLD}{hash_prefix}{RESET}]"
                    )
    print("..... Done")


print("Directory Tree with Hash Prefixes (first 8 chars):")
print_tree_with_hashes(df)

Directory Tree with Hash Prefixes (first 8 chars):
fedora40/
├── manual/
│   ├── apks-from-device/
│   │   ├── base-arm64_v8a.apk [hash: d296fff8]
│   │   ├── base-master.apk [hash: eb1902b3]
│   │   ├── base-xxhdpi.apk [hash: c65a2691]
│   ├── apks-i-built/
│   │   ├── base-arm64_v8a.apk [hash: a74d526f]
│   │   ├── base-master.apk [hash: 47686026]
│   │   ├── base-xxhdpi.apk [hash: 51119b04]
├── scripted/
│   ├── apks-from-device/
│   │   ├── base-arm64_v8a.apk [hash: d296fff8]
│   │   ├── base-master.apk [hash: eb1902b3]
│   │   ├── base-xxhdpi.apk [hash: c65a2691]
│   ├── apks-i-built/
│   │   ├── base-arm64_v8a.apk [hash: a74d526f]
│   │   ├── base-master.apk [hash: 47686026]
│   │   ├── base-xxhdpi.apk [hash: 51119b04]
fedora41/

## Build Reproducibility Analysis

Let's analyze the reproducibility of builds by comparing hashes across different environments.

In [26]:
df.groupby(["os", "build_type"])["exec_count"].nunique().reset_index(
    name="execution_times"
).rename_axis(None).set_index(["os", "build_type"])

os,build_type,execution_times
fedora40,manual,1
fedora40,scripted,2
fedora41,scripted,1
ubuntu24,scripted,1


> So there are indeed 5 runs in total, 4 of them scripted, as described in Methodology.

In [27]:
df[~df["local_apk"]].groupby("filename")["hash"].nunique().reset_index(
    name="distinct file hashes"
).set_index("filename")

filename,distinct file hashes
base-arm64_v8a.apk,1
base-master.apk,1
base-xxhdpi.apk,1


> So, there is only one variant of each APK file, which implies that every phone received the same APK bundle (splits) when retrieving the app from the Play Store.

Sanity check: do all the builds in the same environment produce the same apk?

In [24]:
df[df["local_apk"]].groupby(["os", "filename"])["hash"].nunique().reset_index(
    name="distinct file hashes"
).set_index(["os", "filename"])

os,filename,distinct file hashes
fedora40,base-arm64_v8a.apk,1
fedora40,base-master.apk,1
fedora40,base-xxhdpi.apk,1
fedora41,base-arm64_v8a.apk,1
fedora41,base-master.apk,1
fedora41,base-xxhdpi.apk,1
ubuntu24,base-arm64_v8a.apk,1
ubuntu24,base-master.apk,1
ubuntu24,base-xxhdpi.apk,1


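> So, within a given host operating system, repeated builds produce identical APK splits. Note that only fedora40 has more than one run, so the check is only meaningful there; it says nothing yet about differences between operating systems.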
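
The sanity check can also be phrased as a reusable helper that returns only the offending groups. The sketch below uses a hypothetical function name (`nondeterministic_groups`) and a toy frame whose hashes are the 8-character prefixes shown in this notebook; it is not part of the original analysis.

```python
import pandas as pd


def nondeterministic_groups(frame, keys=("os", "filename")):
    """Return the groups whose rows carry more than one distinct hash."""
    counts = frame.groupby(list(keys))["hash"].nunique()
    return counts[counts > 1]


# Toy data: two fedora40 runs with the same hash, one ubuntu24 run.
toy = pd.DataFrame(
    [
        ("fedora40", "base-master.apk", "47686026"),
        ("fedora40", "base-master.apk", "47686026"),
        ("ubuntu24", "base-master.apk", "fa1e7eec"),
    ],
    columns=["os", "filename", "hash"],
)

# Within each OS every file has a single hash, so nothing is flagged.
print(nondeterministic_groups(toy).empty)
# Grouping by filename alone exposes the difference between the two OSes.
print(nondeterministic_groups(toy, keys=("filename",)))
```

Changing `keys` switches between the per-environment check and the cross-environment comparison in the next cell.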

In [35]:
df[df["local_apk"]].groupby(["filename"])["hash"].nunique()

filename
base-arm64_v8a.apk    2
base-master.apk       3
base-xxhdpi.apk       2
Name: hash, dtype: int64

In [90]:
# Step 1: Count the distinct hashes per filename among locally built APKs
unique_hashes_count = df[df["local_apk"]].groupby("filename")["hash"].nunique()

# Step 2: Keep the filenames that have more than one distinct hash
diverging_filenames = unique_hashes_count[unique_hashes_count > 1].index

# Step 3: Select the locally built rows for those filenames
divergent_data = df[df["local_apk"] & df["filename"].isin(diverging_filenames)]

# Step 4: Keep one representative row per (filename, hash) pair.
# "first" keeps a single OS and filepath per pair, so when several OSes
# produce the same hash only one of them is listed.
result = (
    divergent_data.groupby(["filename", "hash"])[["os", "filepath"]]
    .agg({"os": "first", "filepath": "first"})
    .reset_index()
)

# Show the result
result.set_index(["filename", "os"])

filename,os,hash,filepath
base-arm64_v8a.apk,ubuntu24,4832f3a837a4059db7476c45d995045b0c561388234335...,outputs/ubuntu24-scripted-1/apks-i-built/base-...
base-arm64_v8a.apk,fedora40,a74d526fd13f4fe3bda7e066c575fa0c931d8da7333b8d...,outputs/fedora40-scripted-1/apks-i-built/base-...
base-master.apk,fedora40,4768602685c6e5ceff078093c87e9aba5f19597b0d76c5...,outputs/fedora40-scripted-1/apks-i-built/base-...
base-master.apk,fedora41,dbe76a3649f90f061170c6d3d2071bcddc3c32672bebc2...,outputs/fedora41-scripted-1/apks-i-built/base-...
base-master.apk,ubuntu24,fa1e7eec6dc3e634aae9ccba817d04db46b46c10ca4556...,outputs/ubuntu24-scripted-1/apks-i-built/base-...
base-xxhdpi.apk,ubuntu24,41b12629671ba0778364d671909add3a94380955c5c366...,outputs/ubuntu24-scripted-1/apks-i-built/base-...
base-xxhdpi.apk,fedora40,51119b048c0e03525cbb253ed7c19974738b58804908d3...,outputs/fedora40-scripted-1/apks-i-built/base-...


> What about fedora41? It only appears for `base-master.apk` above, because the aggregation keeps a single OS per hash.

In [114]:
df[(df["os"] == "fedora41") & (df["local_apk"])][["os", "filename", "filepath", "hash"]]

index,os,filename,filepath,hash
27,fedora41,base-arm64_v8a.apk,outputs/fedora41-scripted-1/apks-i-built/base-...,4832f3a837a4059db7476c45d995045b0c561388234335...
28,fedora41,base-xxhdpi.apk,outputs/fedora41-scripted-1/apks-i-built/base-...,41b12629671ba0778364d671909add3a94380955c5c366...
29,fedora41,base-master.apk,outputs/fedora41-scripted-1/apks-i-built/base-...,dbe76a3649f90f061170c6d3d2071bcddc3c32672bebc2...


> So, fedora41 produced the same `base-arm64_v8a.apk` and `base-xxhdpi.apk` as ubuntu24 (identical hashes), and both differ from fedora40's. Each OS has its own unique `base-master.apk`.
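
To see at a glance which operating systems share a build output, the OS names can be collected per `(filename, hash)` pair instead of keeping only the first one. This is a minimal sketch on a toy frame: the rows reuse the 8-character hash prefixes shown in this notebook, and `os_list` is a column name introduced here for illustration.

```python
import pandas as pd

# One row per OS and split, using the hash prefixes of the locally built APKs.
rows = [
    ("fedora40", "base-master.apk", "47686026"),
    ("fedora41", "base-master.apk", "dbe76a36"),
    ("ubuntu24", "base-master.apk", "fa1e7eec"),
    ("fedora40", "base-arm64_v8a.apk", "a74d526f"),
    ("fedora41", "base-arm64_v8a.apk", "4832f3a8"),
    ("ubuntu24", "base-arm64_v8a.apk", "4832f3a8"),
    ("fedora40", "base-xxhdpi.apk", "51119b04"),
    ("fedora41", "base-xxhdpi.apk", "41b12629"),
    ("ubuntu24", "base-xxhdpi.apk", "41b12629"),
]
toy = pd.DataFrame(rows, columns=["os", "filename", "hash"])

# Join every OS that produced a given hash into one comma-separated cell.
os_per_hash = (
    toy.groupby(["filename", "hash"])["os"]
    .agg(lambda s: ", ".join(sorted(set(s))))
    .reset_index(name="os_list")
)
print(os_per_hash)
```

On the real `df`, the same aggregation would be applied to `df[df["local_apk"]]`; deduplicating with `set` keeps repeated runs on one OS from showing up twice.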

## Investigation

TODO: work out what is going on; run apkdiff on the matrix of diverging APKs and check where they differ.
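
As a starting point for that investigation, two APKs can be compared entry by entry, since an APK is a ZIP archive. The sketch below is not Signal's apkdiff tool: `entry_digests`, `diff_apks` and `make_zip` are hypothetical helpers, and skipping everything under `META-INF/` is a simplifying assumption meant to ignore signature files.

```python
import hashlib
import io
import zipfile


def entry_digests(apk, ignore_prefixes=("META-INF/",)):
    """Map each archive entry name to the SHA-256 of its uncompressed bytes."""
    digests = {}
    with zipfile.ZipFile(apk) as zf:
        for info in zf.infolist():
            # Assumption: signature-related entries live under META-INF/.
            if info.is_dir() or info.filename.startswith(ignore_prefixes):
                continue
            digests[info.filename] = hashlib.sha256(zf.read(info)).hexdigest()
    return digests


def diff_apks(apk_a, apk_b):
    """List entries found in only one archive and entries whose content differs."""
    a, b = entry_digests(apk_a), entry_digests(apk_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }


def make_zip(entries):
    """Build an in-memory ZIP so the example runs without real APKs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


left = make_zip({"classes.dex": b"aaa", "res/a.xml": b"x", "META-INF/CERT.RSA": b"s1"})
right = make_zip({"classes.dex": b"bbb", "res/a.xml": b"x", "META-INF/CERT.RSA": b"s2"})
print(diff_apks(left, right))
```

With real files, the paths from the `filepath` column can be passed directly to `diff_apks`. The names of the changed entries then narrow down which part of the build diverges between operating systems.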

## Conclusions

Based on the analysis above, we can draw the following conclusions:

1. Build Reproducibility: [This will be filled based on actual results]
2. Size Consistency: [This will be filled based on actual results]
3. Environment Impact: [This will be filled based on actual results]

### Recommendations

Based on these findings, here are some recommendations for improving build reproducibility:

1. [Will be filled based on actual results]
2. [Will be filled based on actual results]
3. [Will be filled based on actual results]