## Permutation Test for Patterns & Dependencies 

A **permutation test** assesses whether the observed data are consistent with the null hypothesis (here, that the numbers are uniformly distributed). The method involves:
- Dividing the data into blocks of fixed size.
- Calculating a test statistic based on the block means.
- Permuting (shuffling) the data many times and recalculating the statistic each time.
- Comparing the observed statistic to the permuted statistics.

In [1]:
library(dplyr)
library(tidyverse)


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v forcats   1.0.0     v readr     2.1.5
v ggplot2   3.5.1     v stringr   1.5.1
v lubridate 1.9.3     v tibble    3.2.1
v purrr     1.0.2     v tidyr     1.3.1
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


### Loading the Data

We load the dataset containing the random numbers that we want to test.

In [2]:
randoms <- read.csv("../../Data/randoms2.csv")$n

### Block Size

We define the block size for our permutation test.

In [3]:
block_size <- 5
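
To make the role of the block size concrete, here is a minimal sketch of how a vector splits into complete blocks and how many trailing values are dropped. The vector `toy` and the names `toy_block_size`, `num_blocks`, `leftover`, and `toy_blocks` are hypothetical and are not part of the notebook's data.

```r
# Hypothetical toy vector, not the notebook's data
toy <- c(3, 8, 1, 9, 4, 7, 2, 6, 5, 0, 9, 3)
toy_block_size <- 5

num_blocks <- length(toy) %/% toy_block_size  # number of complete blocks
leftover   <- length(toy) %% toy_block_size   # trailing values that are dropped

# Label each retained value with the index of the block it belongs to
block_id   <- rep(seq_len(num_blocks), each = toy_block_size)
toy_blocks <- split(toy[seq_len(num_blocks * toy_block_size)], block_id)

print(toy_blocks)
cat("Complete blocks:", num_blocks, " Dropped values:", leftover, "\n")
```

With 12 values and a block size of 5, two complete blocks are kept and the last two values are discarded.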

### Test Statistic: Mean of Block Means

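We compute the **mean of the means of blocks**: the data are split into consecutive blocks of `block_size` values, each block is averaged, and the block means are averaged in turn. Values that do not fill a complete block are dropped. This statistic will be used to assess the dataset.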

In [4]:
calculate_statistic <- function(data, block_size) {
  # Keep only as many values as fill complete blocks
  num_blocks <- length(data) %/% block_size
  data_trimmed <- data[seq_len(num_blocks * block_size)]

  # Column-major fill: each column holds one block of consecutive values
  blocks <- matrix(data_trimmed, nrow = block_size)
  block_means <- colMeans(blocks)

  return(mean(block_means))
}

observed_stat <- calculate_statistic(randoms, block_size)
cat("Observed Statistic:", observed_stat, "\n")

Observed Statistic: 5.04 
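
One property of this statistic is worth checking on toy data: when all blocks have the same size, the mean of the block means equals the plain mean of the retained values, so it does not depend on the order of the data. The sketch below illustrates this under stated assumptions. The vector `toy`, the helper `block_mean_of_means`, and the seed are hypothetical and chosen only for the illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
toy <- sample(0:9, 100, replace = TRUE)  # hypothetical stand-in data
toy_block_size <- 5

# Mean of block means over complete blocks of size k
block_mean_of_means <- function(x, k) {
  n <- (length(x) %/% k) * k
  mean(colMeans(matrix(x[seq_len(n)], nrow = k)))
}

original <- block_mean_of_means(toy, toy_block_size)
shuffled <- block_mean_of_means(sample(toy), toy_block_size)

# Both comparisons hold up to floating-point rounding
print(all.equal(original, mean(toy)))
print(all.equal(original, shuffled))
```

If the data length is not a multiple of the block size, shuffling changes which values are trimmed, so small differences can appear in that case.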


### Running the Permutation Test

We run the **permutation test** by shuffling the data many times, computing the statistic for each permutation, and comparing it to the observed statistic.

In [8]:
permutation_test <- function(data, block_size, num_permutations = 1000) {
  observed_stat <- calculate_statistic(data, block_size)

  permuted_stats <- replicate(num_permutations, {
    permuted_data <- sample(data)
    calculate_statistic(permuted_data, block_size)
  })

  p_value <- mean(abs(permuted_stats) >= abs(observed_stat))

  return(list(observed_stat = observed_stat,
              p_value = p_value,
              permuted_stats = permuted_stats))
}

perm_test <- permutation_test(randoms, block_size)

cat("\nPermutation test result for randoms:\n")
cat("Observed Statistic:", perm_test$observed_stat, "\n")
cat("P-value:", perm_test$p_value, "\n")


Permutation test result for randoms:
Observed Statistic: 5.04 
P-value: 1 
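
A statistic that changes when the data are reordered can make a permutation test more informative about patterns and dependencies. The following is a sketch of one possible alternative, not the method used in this notebook: it uses the variance of the block means and measures the two-sided p-value from the centre of the permutation distribution. The functions `block_mean_variance` and `permutation_test_var`, the toy vector `trended`, and the seed are all hypothetical.

```r
# Hypothetical alternative statistic: spread of the block means
block_mean_variance <- function(x, k) {
  n <- (length(x) %/% k) * k
  var(colMeans(matrix(x[seq_len(n)], nrow = k)))
}

permutation_test_var <- function(x, k, num_permutations = 1000) {
  observed <- block_mean_variance(x, k)
  permuted <- replicate(num_permutations,
                        block_mean_variance(sample(x), k))

  # Two-sided p-value, measured from the centre of the permuted statistics
  centre  <- mean(permuted)
  p_value <- mean(abs(permuted - centre) >= abs(observed - centre))

  list(observed = observed, p_value = p_value, permuted = permuted)
}

set.seed(42)  # arbitrary seed for reproducibility
# Strongly ordered toy data: sorted values form an obvious trend
trended <- sort(sample(0:9, 100, replace = TRUE))
res <- permutation_test_var(trended, 5)

cat("Observed variance of block means:", res$observed, "\n")
cat("P-value:", res$p_value, "\n")
```

For sorted data like `trended`, neighbouring values are similar, so the block means spread far more than they do after shuffling and the p-value should be very small. The same call could be tried on `randoms` with the notebook's `block_size`.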


## Interpretation of the Permutation Test Results

- **Observed Statistic**: The mean of the block means for the observed data.
- **$\text{p-value} < 0.05$**: The difference is statistically significant, suggesting that the numbers are *not* uniformly distributed.
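- **$\text{p-value} \ge 0.05$**: There is no significant difference, meaning we *fail to reject* the hypothesis that the numbers are uniformly distributed. The p-value of 1 reported above falls in this case. Note that the mean of equal-sized block means is essentially unchanged by shuffling, so a p-value near 1 is expected for this statistic and says little about the ordering of the data.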
