# 00 — Project Overview: Chaos vs Stochasticity Classification

**Author:** Hafeez  | **University of Cambridge**, Department of Applied Mathematics and Theoretical Physics | February 2026

---

This notebook lays out my thinking going into this project. It is meant to be read before anything else in the repository.


## 1. The Problem

Given a raw time series, can we tell whether it came from a **deterministic chaotic system** or a **stochastic process**?

Both can look random to the eye. But the distinction matters enormously:

- If the process is **deterministic**, there is hidden structure that can, in principle, be exploited for short-term forecasting.
- If the process is **stochastic**, the randomness is fundamental and we are limited to statistical descriptions.

Because the two mechanisms can produce time series that are very difficult to tell apart visually, the goal of this project is to train a neural network that learns to make this distinction automatically from raw data.


## 2. Why It Matters

**Quantitative Finance**

Markets alternate between regimes. Sometimes prices exhibit structured, exploitable patterns (deterministic-like); at other times behaviour is dominated by noise. A chaos classifier could act as a regime detector, signalling when systematic strategies are likely to work and when to pull back. Such a signal could inform position sizing, entry/exit timing, and risk management.

**Climate Science**

Climate data contains both deterministic components (ENSO, ocean circulation) and stochastic components (weather noise, volcanic forcing). Distinguishing the two is essential for model validation and for understanding how far into the future different phenomena can be predicted.

**Biomedical Signals**

EEG and ECG signals mix deterministic dynamics with noise. Detecting transitions between regimes (e.g., onset of an epileptic seizure) has direct clinical applications.

**Seismic Activity**

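Fault slip may combine deterministic stick-slip dynamics with stochastic fluctuations. Distinguishing the two could help in tracking how faults slip over time.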


## 3. The Key Idea: Permutation Entropy

The approach is rooted in the **Omega metric** introduced by Boaretto et al. (2021).

The core mathematical tool is **permutation entropy** (Bandt & Pompe, 2002). The idea is simple:

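1. Take a time series and look at short consecutive subsequences of length $m$ (the embedding dimension). Boaretto et al. use $m = 6$. My guess is that six points carry enough ordinal information to reveal temporal correlations, while being short enough not to wash out the local variability of the series. With $m = 6$ there are $6! = 720$ possible patterns.
2. For each subsequence, record only its **rank ordering** (the *ordinal pattern*): which value is smallest, which is next, and so on up to the largest, discarding the values themselves. How these patterns are distributed, and how erratically they change, hints at underlying structure, or the lack of it.
3. Count how often each of the $m!$ possible patterns appears.
4. Compute the Shannon entropy of this distribution, $H = -\sum_i p_i \ln p_i$, where $p_i$ is the relative frequency of pattern $i$. Dividing by $\ln m!$ normalises $H$ to $[0, 1]$: fully random series score close to 1, while deterministic dynamics, which forbid some patterns, score lower.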
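
The four steps above can be sketched in NumPy as follows. This is a minimal illustration under my own assumptions, not code from Bandt & Pompe or Boaretto et al.: the function name `permutation_entropy` and the delay parameter `tau` are hypothetical additions, and the entropy is normalised by $\ln m!$. The usage at the end contrasts white noise with the fully chaotic logistic map ($r = 4$); white noise should come out close to 1 and the logistic map noticeably lower, because its dynamics forbid many ordinal patterns.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, m=6, tau=1, normalize=True):
    """Bandt-Pompe permutation entropy of a 1D series.

    m   : embedding dimension (length of each ordinal pattern)
    tau : spacing between samples inside a pattern (1 = consecutive points)
    """
    x = np.asarray(x, dtype=float)
    n_patterns = len(x) - (m - 1) * tau
    if n_patterns <= 0:
        raise ValueError("series too short for this m and tau")
    # Step 1: one row per length-m subsequence
    windows = np.array([x[i : i + m * tau : tau] for i in range(n_patterns)])
    # Step 2: argsort keeps only the rank ordering (ordinal pattern) of each row
    patterns = np.argsort(windows, axis=1, kind="stable")
    # Step 3: relative frequency of each observed pattern
    counts = Counter(map(tuple, patterns))
    p = np.array(list(counts.values()), dtype=float) / n_patterns
    # Step 4: Shannon entropy (nats), optionally normalised by ln(m!)
    h = float(-np.sum(p * np.log(p)))
    return h / math.log(math.factorial(m)) if normalize else h


# Usage: white noise versus the fully chaotic logistic map (r = 4)
rng = np.random.default_rng(0)
noise = rng.standard_normal(10_000)

logistic = np.empty(10_000)
logistic[0] = 0.4
for t in range(1, len(logistic)):
    logistic[t] = 4.0 * logistic[t - 1] * (1.0 - logistic[t - 1])

pe_noise = permutation_entropy(noise)
pe_logistic = permutation_entropy(logistic)
print(f"white noise: {pe_noise:.3f}, logistic map: {pe_logistic:.3f}")
```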


The CNN we build is meant to learn comparable information directly from the raw data, with no manual feature engineering required.


## 4. Data Generation Plan

We generate all training data synthetically so that we have **perfect ground-truth labels**. Data generation is done in **Julia** for two reasons:

1. `DynamicalSystems.jl` provides high-accuracy ODE integrators purpose-built for chaotic systems (adaptive step sizes, tight error tolerances). This matters because chaotic systems amplify numerical errors exponentially.
2. Julia is significantly faster than pure Python for the tight numerical loops in discrete maps and delay equations.
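
To see why integration accuracy matters, the sketch below (plain Python/NumPy, with a hypothetical helper `logistic_orbit`) starts two logistic-map orbits ($r = 4$) that differ by $10^{-12}$, roughly the size of accumulated floating-point error. For $r = 4$ the Lyapunov exponent is $\ln 2$, so the gap roughly doubles per step on average and the two orbits become unrelated after about 40 iterations.

```python
import numpy as np


def logistic_orbit(x0, n, r=4.0):
    """Iterate x_{t+1} = r * x_t * (1 - x_t) for n steps starting from x0."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = r * x[t - 1] * (1.0 - x[t - 1])
    return x


a = logistic_orbit(0.4, 60)
b = logistic_orbit(0.4 + 1e-12, 60)   # perturbation comparable to round-off error
gap = np.abs(a - b)
for t in range(0, 60, 10):
    print(f"step {t:2d}: |difference| = {gap[t]:.2e}")
```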

### Deterministic Systems (Script `01_deterministic_systems.jl`)

| System | Type |
|--------|------|
| Lorenz attractor | 3D continuous ODE |
| Rössler attractor | 3D continuous ODE |
| Logistic map | 1D discrete map |
| Hénon map | 2D discrete map |
| Mackey-Glass | Delay differential equation |

Each system is generated with multiple parameter configurations, and every state variable is kept as a separate series where applicable (x, y, z for Lorenz and Rössler; x, y for Hénon), to maximise diversity.
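
For readers without Julia, here is a rough Python stand-in for two of the deterministic generators. It is not the `01_deterministic_systems.jl` script: the project relies on adaptive integrators from `DynamicalSystems.jl`, whereas this sketch uses a fixed-step fourth-order Runge-Kutta scheme (step `dt = 0.01`, an assumption) for Lorenz and direct iteration for Hénon. The parameter values are the classic chaotic ones (Lorenz $\sigma = 10, \rho = 28, \beta = 8/3$; Hénon $a = 1.4, b = 0.3$); the helper names and the transient length `discard` are my own assumptions.

```python
import numpy as np


def henon(n, a=1.4, b=0.3, x0=0.0, y0=0.0, discard=1000):
    """Iterate the Hénon map; returns an (n, 2) array of (x, y) after a transient."""
    x, y = x0, y0
    out = np.empty((n, 2))
    for t in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if t >= discard:
            out[t - discard] = (x, y)
    return out


def lorenz_rk4(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
               state=(1.0, 1.0, 1.0), discard=1000):
    """Fixed-step RK4 integration of the Lorenz system; returns an (n, 3) array."""

    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    s = np.array(state, dtype=float)
    out = np.empty((n, 3))
    for t in range(n + discard):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if t >= discard:  # drop the transient before the attractor is reached
            out[t - discard] = s
    return out


h = henon(5000)          # columns: x, y
xyz = lorenz_rk4(5000)   # columns: x, y, z
series = [h[:, 0], h[:, 1], xyz[:, 0], xyz[:, 1], xyz[:, 2]]
```

Each column is then treated as its own time series, which is one way to realise the "every state variable" idea above.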

### Stochastic Processes (Script `02_stochastic_processes.jl`)

| Process | Why include it? |
|---------|-----------------|
| White noise | Baseline — no temporal correlation at all |
| Coloured noise ($1/f^{\beta}$) | Introduces power-law correlations that could fool a naive classifier |
| Random walk | Non-stationary; tests robustness |
| ARMA(p,q) | Linear temporal dependence — the "hard" stochastic case |
| Ornstein-Uhlenbeck | Mean-reverting; widely used in finance |

Each process is generated with multiple parameter values and five independent realisations per configuration.
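
A comparable Python sketch of the stochastic side follows, again as an illustration rather than the contents of `02_stochastic_processes.jl`. The choices here are assumptions: ARMA is shown only in its ARMA(1,1) form, Ornstein-Uhlenbeck is discretised with Euler-Maruyama, and $1/f^{\beta}$ noise is produced by scaling the Fourier amplitudes of white noise by $f^{-\beta/2}$, so that the power spectrum falls off as $f^{-\beta}$. All function names are hypothetical.

```python
import numpy as np


def white_noise(n, rng):
    """Independent N(0, 1) samples: no temporal correlation."""
    return rng.standard_normal(n)


def random_walk(n, rng):
    """Cumulative sum of Gaussian steps: non-stationary."""
    return np.cumsum(rng.standard_normal(n))


def arma11(n, phi, theta, rng, burn=500):
    """ARMA(1,1): x_t = phi * x_{t-1} + e_t + theta * e_{t-1}; needs |phi| < 1."""
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + e[t] + theta * e[t - 1]
    return x[burn:]  # drop the burn-in so the start-up transient is gone


def ornstein_uhlenbeck(n, theta, mu, sigma, dt, rng, x0=0.0):
    """Euler-Maruyama steps for dX = theta * (mu - X) dt + sigma dW."""
    x = np.empty(n)
    x[0] = x0
    dw = rng.standard_normal(n - 1) * np.sqrt(dt)  # Brownian increments
    for t in range(1, n):
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + sigma * dw[t - 1]
    return x


def coloured_noise(n, beta, rng):
    """1/f^beta noise: scale white-noise Fourier amplitudes by f^(-beta/2)."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]  # avoid dividing by zero at the zero-frequency bin
    x = np.fft.irfft(spectrum * f ** (-beta / 2.0), n)
    return (x - x.mean()) / x.std()


rng = np.random.default_rng(42)
samples = {
    "white": white_noise(4096, rng),
    "pink": coloured_noise(4096, 1.0, rng),
    "random_walk": random_walk(4096, rng),
    "arma11": arma11(4096, 0.8, 0.3, rng),
    "ou": ornstein_uhlenbeck(4096, theta=2.0, mu=1.0, sigma=0.5, dt=0.01, rng=rng),
}
```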

### Augmentation

Every deterministic series is also duplicated with **additive Gaussian noise** at SNR levels of 30, 25, 20, 15, and 10 dB. This forces the CNN to detect deterministic structure even when partially obscured.
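
One way to hit a target SNR is to set the noise variance from the signal's power, using $\mathrm{SNR_{dB}} = 10 \log_{10}(P_\text{signal} / \sigma^2_\text{noise})$. The sketch below assumes "signal power" means the variance of the series (i.e. the mean is removed first); the helper name `add_noise_snr` and the sine-wave placeholder signal are hypothetical.

```python
import numpy as np


def add_noise_snr(x, snr_db, rng):
    """Return x plus white Gaussian noise at the requested SNR (in dB).

    Assumption: signal power is taken as var(x), so
    noise variance = var(x) / 10**(snr_db / 10).
    """
    x = np.asarray(x, dtype=float)
    noise_var = np.var(x) / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(noise_var), size=x.shape)


rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 200, 100_000))   # placeholder for a deterministic series
noisy_copies = {snr: add_noise_snr(clean, snr, rng) for snr in (30, 25, 20, 15, 10)}
```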


## 5. CNN Architecture Plan

We use a **1D Convolutional Neural Network** because:
- 1D convolutions naturally detect **temporal patterns** in sequential data
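- They are **translation equivariant**: the same filter is applied at every position, so a pattern is detected wherever it appears in the window (global pooling later makes the final output translation invariant)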
- They are computationally efficient compared to recurrent architectures
- They learn their own feature extractors end-to-end from raw data
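
The translation property can be checked numerically. This NumPy toy (not the project's code) applies one random 7-tap filter to a window and to a shifted copy of it: the feature map shifts with the input (equivariance), and global averaging then removes position entirely (invariance). Circular padding and the helper name `conv1d_circular` are assumptions made so the check is exact at the edges.

```python
import numpy as np


def conv1d_circular(x, kernel):
    """Cross-correlate x with kernel (what a Conv1d layer computes), circular padding."""
    padded = np.concatenate([x, x[: len(kernel) - 1]])
    return np.correlate(padded, kernel, mode="valid")


rng = np.random.default_rng(0)
kernel = rng.standard_normal(7)      # a single filter; random here, learned in a real CNN
signal = rng.standard_normal(512)    # one 512-point window
shift = 40
shifted = np.roll(signal, shift)     # the same window with its content moved 40 steps later

feat = np.maximum(conv1d_circular(signal, kernel), 0.0)            # conv + ReLU
feat_shifted = np.maximum(conv1d_circular(shifted, kernel), 0.0)

# Equivariance: shifting the input shifts the feature map by the same amount
print(np.allclose(feat_shifted, np.roll(feat, shift)))   # True
# Invariance: global average pooling discards position entirely
print(np.isclose(feat.mean(), feat_shifted.mean()))      # True
```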

### The Design

```
Input: raw time series window (512 points)

→ Multi-Scale Conv Block
    Three parallel 1D convolutions (kernel sizes 3, 7, 15)
    Captures local, medium, and long-range temporal patterns
    Output: 64 channels

→ Residual Block 1 (64 → 64, kernel size 7) → MaxPool
→ Residual Block 2 (64 → 128, kernel size 5) → MaxPool
→ Residual Block 3 (128 → 256, kernel size 3)

→ Global Average Pooling
→ Fully Connected: 256 → 128 → 1
→ Sigmoid → P(deterministic)
```

**Multi-scale convolutions** let the first layer capture patterns at three temporal resolutions simultaneously — analogous to computing permutation entropy at different embedding dimensions.

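**Residual connections** add the input of each block directly to its output. This eases gradient flow through the deeper layers and makes it easy for a block to fall back to an identity mapping where extra complexity would not help. Where the channel count changes (64 → 128 → 256), the shortcut needs a projection, typically a 1×1 convolution, so that the two tensors can be added.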

**Global average pooling** collapses the temporal dimension by averaging, giving a fixed-size representation regardless of where patterns appear.
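
The design above translates fairly directly into PyTorch. The sketch below is my reading of the diagram, not a final implementation, and several details are assumptions not fixed by the plan: each multi-scale branch outputs 32 channels and a 1×1 convolution fuses the concatenated 96 channels down to 64; residual blocks use Conv-BatchNorm-ReLU with a 1×1 projection on the shortcut when channels change; max-pooling halves the length; and the network returns a logit, with the sigmoid applied only at inference to obtain $P(\text{deterministic})$. The class names are hypothetical.

```python
import torch
import torch.nn as nn


class MultiScaleConv(nn.Module):
    """Parallel 1D convolutions (kernel sizes 3, 7, 15) fused into 64 channels."""

    def __init__(self, in_ch=1, branch_ch=32, out_ch=64):
        super().__init__()
        # padding = k // 2 keeps the sequence length unchanged for odd kernel sizes
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 7, 15)]
        )
        # Assumption: concatenate the branches, then a 1x1 conv projects to out_ch
        self.fuse = nn.Sequential(
            nn.Conv1d(3 * branch_ch, out_ch, kernel_size=1),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


class ResidualBlock(nn.Module):
    """Conv-BN-ReLU-Dropout-Conv-BN plus a shortcut, followed by ReLU."""

    def __init__(self, in_ch, out_ch, k, dropout=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Conv1d(out_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm1d(out_ch),
        )
        # Identity shortcut when shapes match; otherwise a 1x1 conv matches channels
        self.shortcut = (
            nn.Identity() if in_ch == out_ch else nn.Conv1d(in_ch, out_ch, kernel_size=1)
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.body(x) + self.shortcut(x))


class ChaosCNN(nn.Module):
    """Multi-scale stem, three residual blocks, global average pooling, FC head."""

    def __init__(self, dropout=0.2):
        super().__init__()
        self.features = nn.Sequential(
            MultiScaleConv(),                                     # (B, 64, L)
            ResidualBlock(64, 64, 7, dropout), nn.MaxPool1d(2),   # (B, 64, L/2)
            ResidualBlock(64, 128, 5, dropout), nn.MaxPool1d(2),  # (B, 128, L/4)
            ResidualBlock(128, 256, 3, dropout),                  # (B, 256, L/4)
            nn.AdaptiveAvgPool1d(1),                              # global average pooling
        )
        self.head = nn.Sequential(
            nn.Flatten(),          # (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1),     # raw logit
        )

    def forward(self, x):          # x: (batch, 1, length), e.g. length 512
        return self.head(self.features(x))


model = ChaosCNN().eval()
with torch.no_grad():
    logits = model(torch.randn(8, 1, 512))
    p_deterministic = torch.sigmoid(logits)   # shape (8, 1)
```

Because of the global average pooling, the same model also accepts windows whose length differs from 512.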

### Training Strategy

| Setting | Value | Why |
|---------|-------|-----|
| Loss | Binary cross-entropy with logits | Applies the sigmoid inside the loss, so the network outputs a raw logit during training; numerically stable |
| Optimiser | AdamW | Decoupled weight decay; solid default |
| Learning rate | 1e-3, halved on plateau | Standard starting point with adaptive reduction |
| Early stopping | Patience 15 epochs | Prevents overfitting; restores best weights |
| Dropout | 0.2 | Regularisation throughout the network |
| Gradient clipping | Max norm 1.0 | Prevents exploding gradients |
| Batch size | 64 | Balance between gradient noise and speed |
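
The training settings in the table map onto standard PyTorch components. The loop below is a sketch under stated assumptions: the scheduler's own patience (5 epochs) and the maximum epoch count are my choices, weight decay is left at the `AdamW` default, and the toy data and single-layer stand-in model in the usage section are placeholders, not the project's dataset or CNN. The function name `train` is hypothetical.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, train_ds, val_ds, max_epochs=200, patience=15):
    """Train with the settings in the table; returns the best validation loss."""
    train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)
    val_dl = DataLoader(val_ds, batch_size=64)
    loss_fn = nn.BCEWithLogitsLoss()                 # sigmoid is applied inside the loss
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # Halve the LR when validation loss plateaus (scheduler patience 5 is an assumption)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="min", factor=0.5, patience=5)

    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        for xb, yb in train_dl:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            opt.step()

        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(xb), yb).item() * len(xb) for xb, yb in val_dl)
            val_loss /= len(val_ds)
        sched.step(val_loss)

        if val_loss < best_loss:  # keep a copy of the best weights so far
            best_loss, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # early stopping
                break

    model.load_state_dict(best_state)  # restore the best weights
    return best_loss


# Usage with placeholder data and a placeholder model (not the project's dataset or CNN)
torch.manual_seed(0)
X = torch.randn(256, 1, 512)
y = (X.mean(dim=2) > 0).float()                        # toy labels, shape (256, 1)
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(512, 1))
best = train(stand_in, TensorDataset(X[:192], y[:192]), TensorDataset(X[192:], y[192:]), max_epochs=5)
print(f"best validation loss: {best:.3f}")
```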


## 6. Project Roadmap

| # | Notebook / Script | Language | What it does |
|---|-------------------|----------|--------------|
| **00** | `00_project_overview.ipynb` (this) | Python | Motivation, plan, and preliminary demonstration |
| **01** | `01_deterministic_systems.jl` | Julia | Generate chaotic time series (Lorenz, Rössler, logistic, Hénon, Mackey-Glass) |
| **02** | `02_stochastic_processes.jl` | Julia | Generate stochastic time series (white/coloured noise, random walk, ARMA, OU) |
| **03** | `03_dataset_assembly.ipynb` | Python | Load the Julia CSVs, window, add noise, split into train/val/test |
| **04** | `04_cnn_architecture.ipynb` | Python | Define the CNN layer by layer with full explanations |
| **05** | `05_training_pipeline.ipynb` | Python | Train with AdamW, LR scheduling, early stopping |
| **06** | `06_evaluation.ipynb` | Python | Test set metrics, confusion matrix, ROC curve, confidence analysis |
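
Step 03's windowing can be sketched with NumPy's `sliding_window_view`. The window length of 512 matches the CNN input above; the stride of 256 (50% overlap), the helper names `make_windows` and `build_dataset`, and the label convention (1 = deterministic, matching $P(\text{deterministic})$) are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, window=512, stride=256):
    """Cut one long series into fixed-length windows, one per row."""
    series = np.asarray(series, dtype=np.float32)
    if len(series) < window:
        return np.empty((0, window), dtype=np.float32)
    return sliding_window_view(series, window)[::stride].copy()


def build_dataset(deterministic, stochastic, window=512, stride=256):
    """Stack windows from both classes; label 1 = deterministic, 0 = stochastic."""
    X, y = [], []
    for label, group in ((1.0, deterministic), (0.0, stochastic)):
        for s in group:
            w = make_windows(s, window, stride)
            X.append(w)
            y.append(np.full(len(w), label, dtype=np.float32))
    # Add a channel axis so X has the (batch, channels, length) layout Conv1d expects
    return np.concatenate(X)[:, None, :], np.concatenate(y)


rng = np.random.default_rng(0)
X, y = build_dataset([np.sin(np.arange(2048) * 0.1)], [rng.standard_normal(2048)])
print(X.shape, y.shape)   # (14, 1, 512) (14,)
```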



---

## References

1. Boaretto, B.R.R. et al. (2021). *Evaluating temporal correlations in time series using permutation entropy, ordinal probabilities and machine learning.* Chaos, 31(6), 063124.
2. Bandt, C. & Pompe, B. (2002). *Permutation entropy: a natural complexity measure for time series.* Physical Review Letters, 88(17), 174102.
3. Lorenz, E.N. (1963). *Deterministic nonperiodic flow.* Journal of the Atmospheric Sciences, 20(2), 130–141.

---

*Proceed to `01_deterministic_systems.jl` to begin generating the training data.*
