# Peripheral Randomness Project — Analysis Notebook

This notebook analyzes the data collected by our experiment scripts.

**Goal**
We compare randomness quality of bitstreams derived from:

- mouse only
- keyboard only
- combined mouse + keyboard

under two scenarios:

- **idle** (low activity)
- **active** (high activity)

We will:

1. Load event logs (CSV)
2. Perform sanity checks
3. Analyze inter-event intervals
4. Inspect bitstreams
5. Compute entropy metrics (Shannon entropy & min-entropy)
6. Prepare data for external validation (NIST SP 800-22, Dieharder)

NIST SP 800-22 and Dieharder test properties such as:

- Are there roughly as many 0s as 1s?
- Are the 0↔1 transitions random?
- Is there any periodicity?


In [1]:
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random

## Generate a mock bitstream for testing

Call `binary_number_gen()` to write mock data to a file:
- `path`: the output file
- `count`: the number of 256-bit blocks to write, one block per line

Expected format of each line:
`0b10101101101001 (256 bit long)`

In [2]:
def binary_number_gen(path, count):
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for _ in range(count):
            number = random.getrandbits(256)
            bit = format(number, f"0{256}b")
            f.write("0b" + bit + "\n")


# binary_number_gen("test_mouse_active_rns.txt", count=1000)

## Load bitstream data from path

Read the mouse bitstream blocks from the text file, stripping the `0b` prefix from each line.

In [3]:
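def load_blocks(path):
    """Read one bit block per line, dropping blank lines and any "0b" prefix."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    blocks = []
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue  # skip blank lines
        if ln.startswith("0b"):
            ln = ln[2:]  # keep only the binary digits
        blocks.append(ln)
    return blocks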


mouse_blocks = load_blocks("test_mouse_active_rns.txt")
print(mouse_blocks[:2])
print(f"block count collected: {len(mouse_blocks)}")

['1111010011111110100101011011000101111011110110110010111010110010011100001001100011110011111110010100000110010000111110101011110001100111010010011111010101110001101101001011011001100110110101001111011110101010000000000110100011000001011111010111101010010111', '0101001000001100110011100110100110110001011011011100100000010001111110111110001010000101101011111011100010011000101011001111010010010101111010101111011110010111011101100001011111111101101111101110110100011001100001110000010100110010011100000100000000001011']
block count collected: 1000


In [4]:
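def normalize_256(blocks):
    """Force every block to exactly 256 characters."""
    out = []
    for b in blocks:
        if len(b) > 256:
            # safety: keep only the last 256 bits if a block is too long
            b = b[-256:]
        elif len(b) < 256:
            # left-pad short blocks with leading zeros
            b = b.zfill(256)
        out.append(b)
    return out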


mouse_blocks_256 = normalize_256(mouse_blocks)
set(map(len, mouse_blocks_256))

{256}
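`normalize_256` pads or truncates silently, so a malformed input file would go unnoticed. As part of the sanity checks, a small validator run on the raw blocks could report problems instead of hiding them. This is a minimal sketch. `find_invalid_blocks` is a hypothetical helper and is not part of the experiment scripts.

```python
def find_invalid_blocks(blocks, width=256):
    """Return (index, reason) pairs for blocks that are not `width` binary digits."""
    problems = []
    for i, b in enumerate(blocks):
        if set(b) - {"0", "1"}:
            problems.append((i, "non-binary character"))
        elif len(b) != width:
            problems.append((i, f"length {len(b)}"))
    return problems


# Example with two deliberately broken blocks
sample = ["01" * 128, "0" * 255, "2" + "0" * 255]
print(find_invalid_blocks(sample))
```

Calling it as `find_invalid_blocks(mouse_blocks)` before normalization should return an empty list for a well-formed file.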

### Simple bitcount in the bitstream

In [5]:
def blocks_to_bitstream(blocks_256):
    return "".join(blocks_256)


mouse_stream = blocks_to_bitstream(mouse_blocks_256)
len(mouse_stream)

256000

In [6]:
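def bitstream_stats(stream):
    """Return (bit count, fraction of ones, number of runs) for a '0'/'1' string."""
    bits = np.fromiter(stream, dtype=np.uint8)
    n_bits = len(bits)
    # guard the empty stream, where the mean is undefined
    frac_ones = bits.mean() if n_bits > 0 else float("nan")

    # a run is a maximal stretch of identical bits: count the transitions and add one
    runs = 1 + np.sum(bits[1:] != bits[:-1]) if n_bits > 0 else 0
    return n_bits, frac_ones, runs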


n_bits, frac_ones, runs = bitstream_stats(mouse_stream)
print("n_bits:", n_bits, ", frac_ones:", frac_ones, ", runs:", runs)

n_bits: 256000 , frac_ones: 0.4999609375 , runs: 128080
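The goals list Shannon entropy and min-entropy as metrics. A simple way to estimate both is to split the stream into non-overlapping k-bit symbols and use the observed symbol frequencies. The sketch below does this. The function name and the default `k=8` are our own choices. These are plug-in estimates, which tend to be optimistic for short streams, and they are not the estimators of NIST SP 800-90B.

```python
import math
from collections import Counter


def symbol_entropies(stream, k=8):
    """Shannon entropy and min-entropy, in bits per k-bit symbol, of a '0'/'1' string."""
    # non-overlapping k-bit symbols; a trailing partial symbol is dropped
    symbols = [stream[i : i + k] for i in range(0, len(stream) - k + 1, k)]
    total = len(symbols)
    probs = [c / total for c in Counter(symbols).values()]
    shannon = -sum(p * math.log2(p) for p in probs)
    # min-entropy depends only on the most likely symbol
    min_entropy = -math.log2(max(probs))
    return shannon, min_entropy


print(symbol_entropies("00011011", k=2))
```

For an ideal source both values approach `k`. Min-entropy is never larger than Shannon entropy, which makes it the more conservative figure. The stream must contain at least `k` bits.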


In [7]:
def save_stream(stream, out_path):
    Path(out_path).write_text(stream, encoding="utf-8")


save_stream(mouse_stream, "results/test_mouse_active_rns.bits.txt")

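# Prepare data for NIST testing

We split the stream into chunks of 100,000 bits for NIST and Dieharder testing; a trailing partial chunk is discarded.

100,000 bits is the sequence length we pass to the NIST suite. Some SP 800-22 tests recommend longer sequences (for example, Linear Complexity and Random Excursions recommend at least 10^6 bits), so their results on chunks of this size should be read with caution.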


In [8]:
def chunk_stream(stream, chunk_size=100_000):
    return [
        stream[i : i + chunk_size]
        for i in range(0, len(stream), chunk_size)
        if len(stream[i : i + chunk_size]) == chunk_size
    ]


chunks = chunk_stream(mouse_stream, 100_000)
print(len(chunks))

out_dir = Path("results/nist_inputs")
out_dir.mkdir(parents=True, exist_ok=True)

for i, c in enumerate(chunks, 1):
    (out_dir / f"active_mouse_chunk{i}.bits.txt").write_text(c)

2
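The chunks above are ASCII files of `0` and `1` characters. Tools that read raw bytes need packed data. One example is the NIST suite's binary input mode, where each byte holds 8 bits. The sketch below packs a bit string with `np.packbits`, assuming the most significant bit comes first. The helper name is our own, and the bit order the target tool expects should be checked before relying on it.

```python
import numpy as np


def bits_to_bytes(stream):
    """Pack a '0'/'1' string into bytes, most significant bit first."""
    usable = len(stream) - len(stream) % 8  # drop a trailing partial byte
    # ASCII codes 48/49 minus 48 give the bit values 0/1
    bits = np.frombuffer(stream[:usable].encode("ascii"), dtype=np.uint8) - ord("0")
    return np.packbits(bits).tobytes()


print(bits_to_bytes("0100000101"))
```

A 100,000-bit chunk packs into 12,500 bytes, which could then be written with `Path(...).write_bytes(...)`.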


## Copy chunk to NIST data folder


In [9]:
import shutil
from pathlib import Path


def copy_to_nist(bitfile, nist_path="nist-sts/data/"):
    Path(nist_path).mkdir(parents=True, exist_ok=True)
    dest = Path(nist_path) / "input_stream.txt"
    shutil.copy(bitfile, dest)
    print(f"Copied {bitfile} to {dest}")
    return dest


copy_to_nist("results/nist_inputs/active_mouse_chunk1.bits.txt")

Copied results/nist_inputs/active_mouse_chunk1.bits.txt to nist-sts/data/input_stream.txt


PosixPath('nist-sts/data/input_stream.txt')

## Run NIST from the terminal


Terminal session, with the answers entered at each prompt:

`sts-2.1.2 % ./assess 100000`

G E N E R A T O R S E L E C T I O N 
- [0] Input File 
- [1] Linear Congruential 
- [2] Quadratic Congruential I 
- [3] Quadratic Congruential II 
- [4] Cubic Congruential 
- [5] XOR 
- [6] Modular Exponentiation 
- [7] Blum-Blum-Shub 
- [8] Micali-Schnorr 
- [9] G Using SHA-1 

Enter Choice: `0` 

User Prescribed Input File: `data/input_stream.txt`

S T A T I S T I C A L T E S T S  
- [01] Frequency 
- [02] Block Frequency 
- [03] Cumulative Sums 
- [04] Runs 
- [05] Longest Run of Ones 
- [06] Rank 
- [07] Discrete Fourier Transform 
- [08] Nonperiodic Template Matchings 
- [09] Overlapping Template Matchings 
- [10] Universal Statistical 
- [11] Approximate Entropy 
- [12] Random Excursions 
- [13] Random Excursions Variant 
- [14] Serial 
- [15] Linear Complexity 

INSTRUCTIONS 
Enter 0 if you DO NOT want to apply all of the statistical tests to each sequence and 1 if you DO. 
Enter Choice: `1` 

P a r a m e t e r A d j u s t m e n t s 
- [1] Block Frequency Test - block length(M): 128 
- [2] NonOverlapping Template Test - block length(m): 9 
- [3] Overlapping Template Test - block length(m): 9 
- [4] Approximate Entropy Test - block length(m): 10 
- [5] Serial Test - block length(m): 16 
- [6] Linear Complexity Test - block length(M): 500 

Select Test (0 to continue): `0` 

How many bitstreams? `1` 

Input File Format: 
- [0] ASCII - A sequence of ASCII 0's and 1's 
- [1] Binary - Each byte in data file contains 8 bits of data 

Select input mode: `0`

Statistical Testing In Progress......... 

Statistical Testing Complete!!!!!!!!!!!!


In [10]:
from pathlib import Path

print("CWD:", Path().resolve())
print("Experiments exists?", Path("sts-2.1.2/experiments").exists())
print("Subdirs:", list(Path("sts-2.1.2/experiments").glob("*")))

CWD: /Users/jelteoldenhof/Projects/master/semester_3/sss/assignment2/sourcecode/S-SS_project
Experiments exists? True
Subdirs: [PosixPath('sts-2.1.2/experiments/XOR'), PosixPath('sts-2.1.2/experiments/MS'), PosixPath('sts-2.1.2/experiments/LCG'), PosixPath('sts-2.1.2/experiments/QCG1'), PosixPath('sts-2.1.2/experiments/MODEXP'), PosixPath('sts-2.1.2/experiments/AlgorithmTesting'), PosixPath('sts-2.1.2/experiments/G-SHA1'), PosixPath('sts-2.1.2/experiments/create-dir-script'), PosixPath('sts-2.1.2/experiments/QCG2'), PosixPath('sts-2.1.2/experiments/BBS'), PosixPath('sts-2.1.2/experiments/CCG')]


In [11]:
from pathlib import Path

root = Path("sts-2.1.2/experiments")

stats_files = list(root.rglob("stats.txt"))
print("Number of stats.txt files found:", len(stats_files))
for p in stats_files:
    print(p)

Number of stats.txt files found: 15
sts-2.1.2/experiments/AlgorithmTesting/NonOverlappingTemplate/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/OverlappingTemplate/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/Universal/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/Frequency/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/ApproximateEntropy/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/LinearComplexity/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/RandomExcursionsVariant/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/CumulativeSums/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/BlockFrequency/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/FFT/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/RandomExcursions/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/Rank/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/Runs/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/Serial/stats.txt
sts-2.1.2/experiments/AlgorithmTesting/LongestRun/stats.txt


In [12]:
import re
from pathlib import Path
import pandas as pd


def parse_nist_pvalues(experiments_root="sts-2.1.2/experiments/AlgorithmTesting"):
    """Collect the first reported p-value from each test's stats.txt."""
    root = Path(experiments_root)
    rows = []

    for stats_file in root.rglob("stats.txt"):
        text = stats_file.read_text(errors="ignore")
        # look for the first "p_value = ..." entry; files without one are skipped
        match = re.search(r"p_value\s*=\s*([0-9\.Ee\-]+)", text)
        if not match:
            continue

        try:
            pval = float(match.group(1))
        except ValueError:
            continue

        # the test name is the directory that contains stats.txt
        test_name = stats_file.parent.name

        rows.append(
            {
                "test_name": test_name,
                "p_value": pval,
            }
        )

    if not rows:
        # same columns as the non-empty result
        return pd.DataFrame(columns=["test_name", "p_value"])

    df = pd.DataFrame(rows).drop_duplicates(subset=["test_name"])
    return df.sort_values("test_name").reset_index(drop=True)


df_nist = parse_nist_pvalues()
df_nist

|   | test_name | p_value |
|---|---|---|
| 0 | ApproximateEntropy | 0.349965 |
| 1 | BlockFrequency | 0.605167 |
| 2 | CumulativeSums | 0.712297 |
| 3 | FFT | 0.504492 |
| 4 | Frequency | 0.919397 |
| 5 | LongestRun | 0.169091 |
| 6 | Rank | 0.793223 |
| 7 | Runs | 0.671728 |
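The table has 8 rows although 15 `stats.txt` files were found. The parser keeps only the first `p_value = ...` match per file and skips files with no match. One possible cause is that some tests label their p-values differently, for example with a numeric suffix. This is an assumption and should be checked against the actual files. The sketch below collects every p-value in a text. The regex and the sample text are illustrative and not taken from the suite.

```python
import re

# "p_value", an optional suffix such as "1" or "2", then "= <number>"
P_VALUE_RE = re.compile(r"p_value\w*\s*=\s*(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def all_p_values(text):
    """Return every p-value found in the text, in order of appearance."""
    return [float(m) for m in P_VALUE_RE.findall(text)]


# Hypothetical stats.txt content, for illustration only
sample_text = "p_value1 = 0.25\nsome other line\np_value2 = 1e-3\n"
print(all_p_values(sample_text))
```

Printing the raw text of one of the missing tests, for example `Serial/stats.txt`, is the quickest way to see which pattern it actually needs.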


In [13]:
ALPHA = 0.01

df_nist["result"] = df_nist["p_value"].apply(lambda p: "FAIL" if p < ALPHA else "PASS")
df_nist

|   | test_name | p_value | result |
|---|---|---|---|
| 0 | ApproximateEntropy | 0.349965 | PASS |
| 1 | BlockFrequency | 0.605167 | PASS |
| 2 | CumulativeSums | 0.712297 | PASS |
| 3 | FFT | 0.504492 | PASS |
| 4 | Frequency | 0.919397 | PASS |
| 5 | LongestRun | 0.169091 | PASS |
| 6 | Rank | 0.793223 | PASS |
| 7 | Runs | 0.671728 | PASS |
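A PASS on a single 100,000-bit sequence is weak evidence, and with several tests at `ALPHA = 0.01` an occasional FAIL is expected even for a good source. The sketch below quantifies both points. It assumes the tests are independent, which is only approximately true, and uses the pass-proportion interval that SP 800-22 describes for runs over many sequences. The function names are our own.

```python
import math


def family_wise_false_fail(alpha, n_tests):
    """Chance that at least one of n independent tests fails on truly random data."""
    return 1 - (1 - alpha) ** n_tests


def pass_proportion_bounds(alpha, n_sequences):
    """Acceptable range for the fraction of sequences passing one test."""
    p_hat = 1 - alpha
    half_width = 3 * math.sqrt(p_hat * alpha / n_sequences)
    return p_hat - half_width, min(1.0, p_hat + half_width)


print(family_wise_false_fail(0.01, 8))
print(pass_proportion_bounds(0.01, 1000))
```

The second function only becomes meaningful once many chunks per scenario are tested, so collecting more data per scenario would make the comparison between mouse, keyboard, and combined streams more robust.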
