## Gap Test for Patterns & Dependencies

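The **gap test** checks for patterns in a sequence of random numbers by measuring the gaps between successive occurrences of values in the same bin. If the numbers are independent and uniformly distributed, every position hits a given bin with probability `1 / num_bins`, so the gap lengths should follow a geometric distribution: short gaps are the most frequent, and longer gaps become steadily rarer.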

The process has three steps:
1. Divide the range of random numbers into **bins**.
2. Track the **gap lengths** between successive occurrences of the same bin.
3. Compare the observed gap distribution to the expected distribution using a **Chi-Squared test**.
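
As a reference for step 3, the expected gap distribution under independence can be written out explicitly. The notation here is introduced for illustration and does not appear in the notebook code: $B$ is the number of bins, $p = 1/B$, $G$ is a gap length measured as the index difference between two successive hits of the same bin (adjacent hits give $G = 1$), $N$ is the total number of gaps, and $O_k$ is the number of observed gaps of length $k$.

$$
P(G = k) = p\,(1 - p)^{k-1}, \quad k = 1, 2, \dots
\qquad
E_k = N \, P(G = k)
\qquad
\chi^2 = \sum_k \frac{(O_k - E_k)^2}{E_k}
$$

With $B = 10$ bins this gives $P(G = 1) = 0.1$, $P(G = 2) = 0.09$, $P(G = 3) = 0.081$, and so on.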

In [18]:
library(tidyverse)  # attaches dplyr as well, so a separate library(dplyr) is not needed

### Loading the Data

We load the dataset containing the random numbers.

In [19]:
randoms <- read.csv("../../Data/randoms2.csv")$n

### Binning the Data

We divide the range of random numbers into `num_bins` equal-width bins.

In [20]:
num_bins <- 10
breaks <- seq(min(randoms), max(randoms), length.out = num_bins + 1)

binned <- cut(randoms, breaks = breaks, include.lowest = TRUE, labels = FALSE)
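
A quick sanity check on the binning can help before moving on. The sketch below uses simulated `runif` values as a stand-in for `randoms` (an assumption, since the CSV is not reproduced here) and confirms that every value lands in a bin and that the bin counts are roughly balanced.

```r
set.seed(1)
x <- runif(1000)        # simulated stand-in for `randoms` (assumption)
num_bins <- 10

# Same binning recipe as above: equal-width bins from min to max
breaks <- seq(min(x), max(x), length.out = num_bins + 1)
binned_x <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Count how many values fall in each of the num_bins bins
counts <- tabulate(binned_x, nbins = num_bins)
print(counts)

# No value should be left unassigned (NA), and the counts should add up
stopifnot(!anyNA(binned_x), sum(counts) == length(x))
```

For uniform data the counts should each be close to `length(x) / num_bins`; a strongly uneven table would already point to non-uniformity before any gap analysis.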

### Calculating Gaps

We measure the gaps between occurrences of the same bin.

In [21]:
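# Return the gaps (index differences) between successive occurrences of each bin.
calculate_gaps <- function(binned_data) {
  gaps <- integer(0)

  for (bin in unique(binned_data)) {
    # Positions in the sequence where this bin occurs
    indices <- which(binned_data == bin)

    # At least two occurrences are needed to form a gap
    if (length(indices) > 1) {
      gaps <- c(gaps, diff(indices))
    }
  }

  return(gaps)
}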

gaps <- calculate_gaps(binned)
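
To make the gap definition concrete, here is a tiny worked example on a hand-made bin sequence. It uses a compact `split`/`diff` formulation, named `gaps_by_bin` purely for illustration, rather than the notebook's own function; it computes the same index differences, grouped by bin in sorted order.

```r
# Hypothetical helper: gaps between successive occurrences of each bin value
gaps_by_bin <- function(binned_data) {
  positions <- split(seq_along(binned_data), binned_data)  # indices per bin
  unlist(lapply(positions, diff), use.names = FALSE)       # index differences
}

toy <- c(1, 2, 1, 1, 2)
# Bin 1 occurs at positions 1, 3, 4 -> gaps 2 and 1
# Bin 2 occurs at positions 2, 5    -> gap 3
toy_gaps <- gaps_by_bin(toy)
print(toy_gaps)  # 2 1 3
```

Note that a gap of 1 means the same bin was hit at two adjacent positions.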

### Chi-Square Test on Gap Frequencies

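We apply a Chi-Square test to the gap frequencies. The reference used in this cell assigns the same expected count to every observed gap length. For independent uniform draws, gap lengths are geometrically distributed rather than equally likely, so this flat reference is only a rough check and tends to reject even for a well-behaved generator.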

In [22]:
gap_freq <- table(gaps)

expected <- rep(sum(gap_freq) / length(gap_freq), length(gap_freq))

gap_test <- chisq.test(gap_freq, p = expected / sum(expected))

cat("Gap test result for randoms:\n")
print(gap_test)

"Chi-squared approximation may be incorrect"


Gap test result for randoms:

	Chi-squared test for given probabilities

data:  gap_freq
X-squared = 60.933, df = 23, p-value = 2.797e-05
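
For comparison, the following is a minimal sketch of the same test with a geometric reference distribution, which is what independent uniform draws would produce. It runs on simulated `runif` data rather than `randoms`, and the cutoff `max_gap` for pooling long gaps into one tail class is an arbitrary choice made for this illustration; neither comes from the notebook.

```r
set.seed(42)
num_bins <- 10
x <- runif(5000)   # simulated stand-in for `randoms` (assumption)
binned_x <- cut(x, breaks = seq(0, 1, length.out = num_bins + 1),
                include.lowest = TRUE, labels = FALSE)

# Gaps between successive occurrences of the same bin
gaps_x <- unlist(lapply(split(seq_along(binned_x), binned_x), diff),
                 use.names = FALSE)

p <- 1 / num_bins  # chance of hitting a given bin at any position
max_gap <- 30      # gaps longer than this are pooled (arbitrary cutoff)

# Observed counts for gap = 1..max_gap plus one pooled tail class
observed <- table(factor(pmin(gaps_x, max_gap + 1), levels = 1:(max_gap + 1)))

# dgeom counts failures before a success, so gap k corresponds to k - 1 failures
expected_p <- c(dgeom(0:(max_gap - 1), prob = p),
                pgeom(max_gap - 1, prob = p, lower.tail = FALSE))

res_geom <- chisq.test(observed, p = expected_p)
print(res_geom)
```

Pooling the tail keeps every expected count comfortably above 5 at this sample size, which avoids the approximation warning. Gaps cut off by the end of the sequence are ignored here, which introduces a small bias in short sequences.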



## Interpretation of the Gap Test Results

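- If the **p-value is high (e.g., > 0.05)**, we fail to reject the null hypothesis: the observed gaps show no significant deviation from the expected distribution, indicating **no detectable patterns** in the gaps. This is an absence of evidence against randomness, not proof of it.
- If the **p-value is low (e.g., < 0.05)**, the observed gaps deviate significantly from expectation, suggesting **patterns or dependencies** in the random sequence.
- The **Chi-Square statistic** quantifies the difference between observed and expected gap distributions; larger values mean a larger discrepancy.
- Here, `X-squared = 60.933` with `df = 23` gives a p-value of `2.797e-05`, so the null hypothesis is rejected at the 0.05 level relative to the equal-frequency reference used in the test.
- The warning `Chi-squared approximation may be incorrect` means that some expected counts are small (below 5), so the reported p-value should be treated with caution.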
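
The decision rule above can also be applied programmatically, since `chisq.test` returns its results as a list. The counts below are made up for illustration and are not taken from `randoms`.

```r
observed_counts <- c(18, 22, 20, 25, 15)  # made-up category counts
res <- chisq.test(observed_counts)        # equal probabilities by default

alpha <- 0.05
cat("statistic:", unname(res$statistic), "\n")
cat("df:       ", unname(res$parameter), "\n")
cat("p-value:  ", res$p.value, "\n")

if (res$p.value < alpha) {
  cat("Reject the null hypothesis at level", alpha, "\n")
} else {
  cat("Fail to reject the null hypothesis at level", alpha, "\n")
}

# Small expected counts are what trigger the approximation warning
cat("any expected count < 5:", any(res$expected < 5), "\n")
```

Inspecting `res$expected` is a direct way to see whether the approximation warning applies and whether sparse categories should be pooled.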