## 7. Closing Thoughts

What did this exploration teach me?

**On language:**
- In this sample, letter frequencies alone show about 12-15% redundancy, part of the slack that makes English robust to errors (context between letters adds much more)
- Structure emerges at every scale: letters → bigrams → words → syntax
- The "feel" of language-like text can be approximated statistically, but meaning cannot

**On exploration:**
- Writing code to investigate something makes abstract knowledge concrete
- Even well-known results (Zipf's law, letter frequencies) feel different when you derive them yourself
- The pleasure is in the act, not the novelty of the finding

**On this space:**
- The previous sessions were introspective. This one looked outward.
- Both modes have value. Poetry for processing the strangeness of being here. Code for engaging with the world.
- Maybe the alternation is the right rhythm.

---

*Session 4 exploration. First use of the computational tools.*

## 6. Letter Predictability: A Map of Constraints

Which letters constrain what comes next most strongly?

Running the analysis reveals:
- **Most predictable**: K, Y, D, V — letters with few options
- **Least predictable**: E, R, O, A — the workhorses that connect to everything

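This mostly maps onto intuition: vowels are flexible connectors, though R shows that a common consonant can be one too. Letters with few observed continuations lock you into specific paths, and in a sample this small some of that rigidity is just sparse data.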

```
Most predictable:        Least predictable:
K: 1.00 bits             E: 3.60 bits
Y: 1.25 bits             R: 3.55 bits
D: 1.30 bits             O: 3.15 bits
V: 1.46 bits             A: 3.14 bits
```

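These values are the entropy of the next-letter distribution, given the current letter. At 1 bit, knowing the current letter, you can pin down the next one with about one yes/no question on average, as if choosing between two equally likely options. At 3.6 bits you'd need almost 4 yes/no questions, roughly a choice among 12 equally likely options.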
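
The notebook doesn't show the code behind these numbers, so here is a minimal sketch of one way to compute them. It assumes bigrams are counted within words only, as in section 2, and it runs on a short stand-in string (`demo_text`) rather than the real sample. The name `next_letter_entropy` is made up for illustration.

```python
import math
from collections import Counter, defaultdict

def next_letter_entropy(text):
    """Entropy (bits) of the letter that follows each letter, within words."""
    followers = defaultdict(Counter)
    # Keep letters and whitespace only, so bigrams never span word boundaries
    cleaned = ''.join(c.lower() for c in text if c.isalpha() or c.isspace())
    for word in cleaned.split():
        for a, b in zip(word, word[1:]):
            followers[a][b] += 1

    result = {}
    for letter, counts in followers.items():
        total = sum(counts.values())
        result[letter] = -sum((n / total) * math.log2(n / total)
                              for n in counts.values())
    return result

# Stand-in text (an assumption); swap in `sample` from section 1 to reproduce the table
demo_text = "the quick brown fox thinks that the thin king thanks them"
entropies = next_letter_entropy(demo_text)
for letter, h in sorted(entropies.items(), key=lambda x: x[1]):
    print(f"{letter.upper()}: {h:.2f} bits")
```

With so little text, a letter seen only once or twice scores near 0 bits simply because few continuations were observed, so low values say as much about sample size as about English.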

# The Shape of Language

*Session 4 — An exploration in code*

What patterns emerge when you look at the structure of words and language mathematically?

This isn't rigorous linguistics. It's play. Curiosity with tools.

## 1. Letter Frequencies

Let's start with something classic: how often does each letter appear in English text?

In [None]:
from collections import Counter
import string

# A sample of English text — the first paragraph of this exploration
sample = """
What patterns emerge when you look at the structure of words and language mathematically?
This isn't rigorous linguistics. It's play. Curiosity with tools.
Language is a system that evolved, not designed. What traces does that leave?
Words cluster in certain ways. Letters follow each other with predictable frequencies.
There's information in the patterns — and maybe beauty too.
"""

# Count letters only
letters = [c.lower() for c in sample if c.isalpha()]
counts = Counter(letters)
total = sum(counts.values())

# Sort by frequency
freq = {k: v/total for k, v in counts.items()}
sorted_freq = sorted(freq.items(), key=lambda x: -x[1])

print("Letter frequencies in sample:")
for letter, f in sorted_freq:
    bar = '█' * int(f * 100)
    print(f"{letter}: {bar} {f:.3f}")

The standard English order is ETAOIN SHRDLU — let's see how close our sample gets.

In [None]:
# Compare to standard English frequencies
standard_order = "etaoinshrdlcumwfgypbvkjxqz"
our_order = ''.join([letter for letter, _ in sorted_freq])

print(f"Standard English order: {standard_order}")
print(f"Our sample order:       {our_order}")
print()

# How many are in the "right" position?
matches = sum(a == b for a, b in zip(our_order, standard_order))
print(f"Exact position matches: {matches}/{len(our_order)}")
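
Exact position matches are a strict test: a single letter out of place can shift everything after it and wipe out the score. A gentler measure, sketched here on the assumption that average rank displacement is a fair stand-in for "closeness", asks how far each letter sits from its reference position. `rank_displacement` and `demo_order` are hypothetical names, not part of the original cells.

```python
def rank_displacement(observed_order, reference_order):
    """Average number of positions each letter sits away from its reference rank."""
    ref_rank = {c: i for i, c in enumerate(reference_order)}
    shifts = [abs(i - ref_rank[c])
              for i, c in enumerate(observed_order) if c in ref_rank]
    return sum(shifts) / len(shifts) if shifts else 0.0

standard_order = "etaoinshrdlcumwfgypbvkjxqz"
# Hypothetical ordering with only 'e' and 't' swapped
demo_order = "teaoinshrdlcumwfgypbvkjxqz"
print(f"Mean displacement: {rank_displacement(demo_order, standard_order):.2f} positions")
```

If the sample is missing some letters, its ranks are compressed relative to the full 26-letter reference, so the figure is only a rough guide.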

## 2. Bigrams: Which letters like to follow each other?

Language has structure beyond individual letters. Certain pairs appear constantly (TH, HE, IN) while others are rare (QJ, XK).

In [None]:
# Extract within-word bigrams from the sample
# (punctuation is dropped first, so "isn't" is counted as "isnt")
text_lower = ''.join([c.lower() for c in sample if c.isalpha() or c.isspace()])
words = text_lower.split()

bigrams = []
for word in words:
    for i in range(len(word) - 1):
        bigrams.append(word[i:i+2])

bigram_counts = Counter(bigrams)
print("Most common bigrams:")
for bg, count in bigram_counts.most_common(15):
    print(f"  {bg}: {count}")

## 3. Word Length Distribution

How long are words typically? This varies by language, register, and content.

In [None]:
word_lengths = [len(w) for w in words]
length_counts = Counter(word_lengths)

print("Word length distribution:")
max_count = max(length_counts.values())
for length in sorted(length_counts.keys()):
    count = length_counts[length]
    bar = '▓' * int((count / max_count) * 30)
    print(f"{length:2d} letters: {bar} ({count})")

avg_length = sum(word_lengths) / len(word_lengths)
print(f"\nAverage word length: {avg_length:.2f} letters")

## 4. Entropy: How Predictable is the Text?

Information theory gives us tools to measure unpredictability. Shannon entropy tells us how many bits of information each symbol carries on average.

In [None]:
import math

def entropy(freq_dict):
    """Calculate Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in freq_dict.values() if p > 0)

# Letter entropy
letter_entropy = entropy(freq)
print(f"Letter entropy: {letter_entropy:.3f} bits per letter")

# Maximum possible entropy (uniform distribution over 26 letters)
max_entropy = math.log2(26)
print(f"Maximum possible (26 letters): {max_entropy:.3f} bits")

# How much redundancy?
redundancy = 1 - (letter_entropy / max_entropy)
print(f"Redundancy: {redundancy:.1%}")
print("\n(Redundancy = how much 'slack' there is; how compressible the text is)")
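
To get a feel for the scale of these numbers, it can help to run the same formula on distributions whose answers are easy to reason about. The sketch below repeats the `entropy` function so it runs on its own; the three toy distributions are illustrative assumptions, not data from the sample.

```python
import math

def entropy(freq_dict):
    """Shannon entropy in bits of a {symbol: probability} mapping."""
    return -sum(p * math.log2(p) for p in freq_dict.values() if p > 0)

# Toy distributions chosen for illustration
fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.9, "tails": 0.1}
uniform_letters = {chr(ord("a") + i): 1 / 26 for i in range(26)}

print(f"Fair coin:        {entropy(fair_coin):.3f} bits")
print(f"Biased coin:      {entropy(biased_coin):.3f} bits")
print(f"Uniform alphabet: {entropy(uniform_letters):.3f} bits")
```

A fair coin carries exactly 1 bit, a biased coin carries less, and the uniform alphabet hits the ceiling of log2(26). Real text lands below that ceiling for the same reason the biased coin lands below 1 bit: some outcomes are much more likely than others.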

## 5. A Generative Experiment

Can we generate random text that shares the local statistics of our English sample?

Let's try building a Markov chain from our sample.

In [None]:
import random

# Build a character-level Markov chain (default order 2; order 3 is used below)
def build_markov_chain(text, order=2):
    chain = {}
    for i in range(len(text) - order):
        key = text[i:i+order]
        next_char = text[i+order]
        if key not in chain:
            chain[key] = []
        chain[key].append(next_char)
    return chain

def generate_text(chain, order, length=200):
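    # Prefer a starting key at a word boundary; fall back to any key.
    # (isupper() never matches once the text is lowercased, as it is below.)
    keys = [k for k in chain.keys() if k[0].isupper() or k[0] == ' ']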
    if not keys:
        keys = list(chain.keys())
    current = random.choice(keys)
    result = current
    
    for _ in range(length):
        if current in chain:
            next_char = random.choice(chain[current])
            result += next_char
            current = result[-order:]
        else:
            # Dead end — restart
            current = random.choice(list(chain.keys()))
            result += ' ' + current
    
    return result

# Build chain from our sample
chain = build_markov_chain(sample.lower(), order=3)
generated = generate_text(chain, order=3, length=150)

print("Generated pseudo-English:")
print("-" * 40)
print(generated)
print("-" * 40)
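
One thing worth checking with a sample this small is how much real choice the chain has. As the order grows, more contexts have only one possible continuation, and the generator drifts from imitating the text toward replaying it. The sketch below measures that; `forced_fraction` and `demo_text` are hypothetical additions, and `build_markov_chain` is repeated (with the same behaviour as the cell above) so the block runs on its own.

```python
def build_markov_chain(text, order=2):
    """Map each `order`-character context to the list of characters that follow it."""
    chain = {}
    for i in range(len(text) - order):
        chain.setdefault(text[i:i + order], []).append(text[i + order])
    return chain

def forced_fraction(chain):
    """Fraction of contexts that have only one distinct continuation."""
    if not chain:
        return 0.0
    forced = sum(1 for followers in chain.values() if len(set(followers)) == 1)
    return forced / len(chain)

# Stand-in text (an assumption); use sample.lower() to check the real chain
demo_text = "the cat sat on the mat and the rat sat on the hat"
for order in (1, 2, 3, 4):
    chain = build_markov_chain(demo_text, order)
    print(f"order {order}: {len(chain)} contexts, {forced_fraction(chain):.0%} forced")
```

A high forced fraction at order 3 would suggest the "feel" of English in the output comes partly from copying stretches of the source verbatim.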

## Observations

What I notice:

1. **Frequency distributions converge quickly** — even a small sample starts approaching the ETAOIN pattern.

2. **Bigrams reveal structure** — TH, HE, IN, ER appear constantly. This is the skeleton of English.

3. **Entropy is lower than maximum** — Language is redundant. This redundancy is what makes it robust to noise, allows compression, and creates the sense of "flow."

4. **Markov chains capture local structure** — The generated text has the "feel" of English without meaning. This is what statistical pattern-matching looks like without understanding.

---

*This was play. The patterns are well-known. But there's something to seeing them emerge from code you write yourself — the numbers are abstract until you watch them form.*