## Chi-Squared ($\chi ^ 2$) Test for Uniformity

The **Chi-Squared test** compares the observed frequencies of events with the frequencies expected under a hypothesized distribution, here a uniform one. If the observed counts are close to the expected counts, the test gives no evidence against the numbers being uniformly distributed.
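
For reference, the statistic behind the test can be written as follows. The symbols are notation introduced here for illustration: $O_i$ and $E_i$ are the observed and expected counts in bin $i$, $k$ is the number of bins, and $n$ is the total number of values.

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = \frac{n}{k}
$$

Under the uniformity hypothesis, this statistic is compared against a chi-squared distribution with $k - 1$ degrees of freedom.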

In [1]:
#| message: FALSE
library(dplyr)
library(tidyverse)




### Loading the Data

We load the random numbers from a CSV file to analyze their distribution.

In [2]:
randoms <- read.csv("../../../Data/LCG-RN/randoms-1.csv")$n

max_val <- 10
min_val <- 1


### Creating Bins

The numbers are grouped into bins for comparison. We divide the range `[min_val, max_val]` into 10 equal-width intervals so that observed and expected frequencies can be compared bin by bin.

In [None]:
bins <- cut(randoms,
            breaks = seq(min_val, max_val,
                         length.out = 11),
            include.lowest = TRUE)
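
To make the binning concrete, here is a minimal sketch on toy input. The vector `toy` and the names `toy_breaks` and `toy_bins` are assumptions made for illustration and are not the values from the CSV file. It uses the same break points as the cell above and shows that, for integer data between 1 and 10, each integer falls into its own bin.

```r
# Toy input (assumption): the integers 1..10, not the values from the CSV file.
toy <- 1:10

# Same break points as above: 11 break points define 10 intervals of width 0.9.
toy_breaks <- seq(1, 10, length.out = 11)

# include.lowest = TRUE closes the first interval on the left, so 1 is kept.
toy_bins <- cut(toy, breaks = toy_breaks, include.lowest = TRUE)

print(levels(toy_bins))  # interval labels, one per bin
print(table(toy_bins))   # each integer lands in its own bin
```

Without `include.lowest = TRUE`, the value equal to `min_val` would fall outside the first interval and become `NA`.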

### Frequency Calculation

We count how many numbers fall into each bin. This gives us the observed frequencies for the test.

In [None]:
freq <- table(bins)
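
One detail worth checking is how empty bins are handled. The sketch below uses a small hypothetical vector (`toy`, an assumption for illustration) to show that `table()` on the factor returned by `cut()` reports every bin, including those with a count of zero, so the observed frequencies always have one entry per bin.

```r
# Toy input (assumption): only three distinct values, so most bins stay empty.
toy <- c(1, 1, 2, 10)
toy_bins <- cut(toy, breaks = seq(1, 10, length.out = 11), include.lowest = TRUE)

# table() on a factor reports every level, including those with zero count.
toy_freq <- table(toy_bins)
print(toy_freq)

length(toy_freq)    # 10 bins are reported
sum(toy_freq == 0)  # 7 of them are empty
```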

### Expected Frequency

Under a uniform distribution, we expect the numbers to be evenly distributed across the bins. The expected frequency for each bin is the total number of random numbers divided by the number of bins.

In [None]:
expected <- rep(length(randoms) / length(freq),
                length(freq))

### Running the Chi-Squared Test

We now apply the Chi-Squared test to check whether the observed frequencies significantly deviate from the expected frequencies for a uniform distribution.

In [None]:
chi_test <- chisq.test(freq,
                       p = expected / sum(expected))

print(chi_test)
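
To see what `chisq.test()` computes, the following sketch reproduces the statistic and p-value by hand. The counts in `observed` are hypothetical and chosen only for illustration; the variable names are assumptions as well.

```r
# Hypothetical observed counts for 10 bins (assumption, not from the CSV file).
observed <- c(12, 9, 11, 8, 10, 10, 13, 7, 9, 11)
k <- length(observed)

# Uniform hypothesis: every bin expects the same count.
expected_counts <- rep(sum(observed) / k, k)

# Pearson statistic, degrees of freedom, and upper-tail p-value.
x2 <- sum((observed - expected_counts)^2 / expected_counts)
df <- k - 1
p_manual <- pchisq(x2, df = df, lower.tail = FALSE)

# The built-in test should agree with the manual calculation.
ct <- chisq.test(observed, p = rep(1 / k, k))
all.equal(x2, unname(ct$statistic))
all.equal(p_manual, ct$p.value)
```

Because the expected counts are all equal, passing `p = rep(1 / k, k)` is equivalent to normalizing the expected counts as in the cell above.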

## Interpreting the Results

The Chi-Squared test returns a p-value that indicates whether the observed frequencies differ significantly from the expected frequencies. A low p-value (< 0.05) suggests that the random numbers are not uniformly distributed, while a higher p-value means the test found no significant difference, which is consistent with uniformity but does not prove it.

- **$\text{p-value} < 0.05$**: There is significant evidence that the random numbers are not uniformly distributed.
- **$\text{p-value} \ge 0.05$**: There is insufficient evidence to reject the hypothesis that the random numbers are uniformly distributed.
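
The decision rule can be sketched as a small helper. The function `interpret_p` and its `alpha` argument are hypothetical names introduced here, and the p-values passed to it are illustrative rather than results from the data.

```r
# Hypothetical helper (assumption): maps a p-value to a decision at level alpha.
interpret_p <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject uniformity"
  } else {
    "fail to reject uniformity"
  }
}

interpret_p(0.01)  # "reject uniformity"
interpret_p(0.80)  # "fail to reject uniformity"
```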

## Approach

1. Create equal-width bins between `min_val` and `max_val`. The function below uses `max_val - min_val + 1` bins, which is 11 for its defaults of 0 and 10.

2. Count how many data points fall into each bin.

3. Calculate the expected frequency for each bin (assuming uniform distribution).

4. Perform a chi-squared test comparing the observed frequencies to the expected frequencies.

In [2]:
randoms <- read.csv("../../../Data/Q-RN/randoms-1.csv")$n

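# Chi-squared test for uniformity; returns the p-value, statistic, and degrees of freedom.
chisqr_test <- function(data, min_val = 0, max_val = 10) {
  # max_val - min_val + 2 break points give max_val - min_val + 1 equal-width bins
  bins <- cut(data,
              breaks = seq(min_val, max_val, length.out = max_val - min_val + 2),
              include.lowest = TRUE)

  # Observed count per bin (empty bins are kept as zeros)
  freq <- table(bins)

  # Expected count per bin under uniformity
  expected <- rep(length(data) / length(freq), length(freq))

  # chisq.test() expects probabilities, so normalize the expected counts
  chi_test <- chisq.test(freq, p = expected / sum(expected))

  return(c(chisqr_p = chi_test$p.value,
           chisqr_X2 = chi_test$statistic,
           chisqr_df = chi_test$parameter))
}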

res <- chisqr_test(randoms)
print(res)

           chisqr_p chisqr_X2.X-squared        chisqr_df.df 
          0.8040539           6.1320000          10.0000000 
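
As a sanity check, the reported p-value can be recovered from the reported statistic and degrees of freedom. This is a minimal sketch; the names `x2_reported`, `df_reported`, and `p_check` are assumptions introduced here.

```r
# Values copied from the printed result above.
x2_reported <- 6.132
df_reported <- 10

# Upper-tail probability of a chi-squared distribution with 10 degrees of freedom.
p_check <- pchisq(x2_reported, df = df_reported, lower.tail = FALSE)
print(p_check)  # should match the reported p-value of about 0.804
```

With a p-value of about 0.80, well above 0.05, the test gives no evidence against uniformity for this file. The 10 degrees of freedom correspond to the 11 bins produced by the default arguments.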
