# 3) Exclamation Points: Rules vs. Reality

**Goal:** Count exclamation points per 10k words and compare.

# Setup: Load Texts

This notebook needs **Pet Sematary** and **The Shining** as input texts.



2. Place two text files in the "data" folder with names:
   - `PetSemetary.txt`  (Pet Sematary)
   - `TheShining.txt` (The Shining)

In [6]:
import re
from pathlib import Path

In [None]:

# ---------- 1. Load texts (non-Gutenberg) ----------

def load_texts(local_pet: str = '../data/PetSemetary.txt',
               local_shining: str = '../data/TheShining.txt'):
    """Load Pet Sematary and The Shining texts from disk."""
    p1, p2 = Path(local_pet), Path(local_shining)

    if not p1.exists():
        raise FileNotFoundError(
            f"Missing file: {p1}\n"
            "→ Please place 'PetSemetary.txt' at this path or update load_texts(...)."
        )
    if not p2.exists():
        raise FileNotFoundError(
            f"Missing file: {p2}\n"
            "→ Please place 'TheShining.txt' at this path or update load_texts(...)."
        )

    # IMPORTANT: keep order Pet → Shining
    pet    = p1.read_text(encoding='utf-8', errors='ignore')
    shining = p2.read_text(encoding='utf-8', errors='ignore')
    return pet, shining


def normalize(text: str) -> str:
    """Simple normalization for your own TXT files."""
    if not text:
        return ''
    # normalize curly quotes to ASCII '
    text = text.replace('’', "'").replace('‘', "'")
    # normalize Windows endings
    text = text.replace('\r\n', '\n')
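    # join words hyphenated across line breaks
    # (caveat: a real compound split at a line end, e.g. "well-\nknown", also loses its hyphen)
    text = re.sub(r"-\s*\n", "", text)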
    return text


# Load raw texts
pet_raw, shining_raw = load_texts()

# Normalize
pet     = normalize(pet_raw)
shining = normalize(shining_raw)

print(f"Pet Sematary chars: {len(pet):,} | The Shining chars: {len(shining):,}")


Pet Sematary chars: 812,353 | The Shining chars: 905,869
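
To make the effect of `normalize` concrete, here is a minimal sketch that runs the same three steps on a made-up sample string (the sample is illustrative and not taken from either novel). The function is copied from the cell above so the block runs on its own; the curly quotes are written as `\u2018` and `\u2019` escapes only to avoid encoding problems.

```python
import re

def normalize(text: str) -> str:
    """Same steps as the notebook's normalize(), repeated here for a standalone demo."""
    if not text:
        return ''
    # curly single quotes -> ASCII apostrophe
    text = text.replace('\u2019', "'").replace('\u2018', "'")
    # Windows line endings -> Unix
    text = text.replace('\r\n', '\n')
    # drop a hyphen that sits directly before a line break
    text = re.sub(r"-\s*\n", "", text)
    return text

# Hypothetical sample: one curly apostrophe, one word broken across lines,
# and one genuine compound ("well-known") that happens to break at the hyphen.
sample = "Don\u2019t look at the ceme-\r\ntery, said the well-\r\nknown doctor!"
cleaned = normalize(sample)
print(cleaned)
```

The broken word is rejoined as intended, but the compound comes out as `wellknown`. For a punctuation count this side effect is harmless, though it can nudge word counts slightly.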


In [8]:
# ---------- 2. Tokenization helpers ----------

# keep apostrophes inside words (don't), but no apostrophe-only tokens
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def words(text: str):
    """Simple word tokenizer (lowercased, ASCII letters + apostrophes)."""
    return WORD_RE.findall(text.lower())

def sentences(text: str):
    """Naive sentence splitter using punctuation boundaries."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

# Tokens and sentences
pet_words     = words(pet)
shining_words = words(shining)

pet_sentences     = sentences(pet)
shining_sentences = sentences(shining)

print(f"Pet Sematary words: {len(pet_words):,} | The Shining words: {len(shining_words):,}")
print(f"Pet Sematary sentences: {len(pet_sentences):,} | The Shining sentences: {len(shining_sentences):,}")


Pet Sematary words: 147,144 | The Shining words: 162,085
Pet Sematary sentences: 9,269 | The Shining sentences: 12,914
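
A quick check of both helpers on a made-up line of dialogue (not from either novel) shows what the counts above are built on. The definitions are copied from the cell above so the sketch is self-contained. Note in particular that the splitter only breaks where whitespace directly follows `.`, `!` or `?`, so a closing quote after the punctuation suppresses the split.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def words(text):
    # lowercased word tokens; contractions such as "don't" stay whole
    return WORD_RE.findall(text.lower())

def sentences(text):
    # split on whitespace that is immediately preceded by ., ! or ?
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

# Hypothetical sample with quoted dialogue
sample = "'Don't go!' The door slammed. He ran!"

sample_words = words(sample)
sample_sents = sentences(sample)
print(sample_words)
print(sample_sents)
```

Here the quoted exclamation and the narration that follows are returned as a single "sentence", because the character before the space is `'` rather than `!`. This is why some sentence-level results for dialogue-heavy passages span more than one actual sentence.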


### Count and Normalize

In [9]:
# ---------- 3. Count and normalize exclamation points ----------

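def exclamations_per_10k(text, tokens):
    """Return (raw count of '!', rate per 10,000 word tokens)."""
    count = text.count('!')
    # max(1, ...) guards against division by zero on an empty token list
    per_10k = (count / max(1, len(tokens))) * 10000
    return count, per_10k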

# Reuse the token lists we already computed
pet_count, pet_rate = exclamations_per_10k(pet, pet_words)
sh_count, sh_rate   = exclamations_per_10k(shining, shining_words)

print(f"Pet Sematary: {pet_count} total | {pet_rate:.2f} per 10k words")
print(f"The Shining : {sh_count} total | {sh_rate:.2f} per 10k words")


Pet Sematary: 365 total | 24.81 per 10k words
The Shining : 551 total | 33.99 per 10k words
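
The rates can be re-derived by hand from the totals printed in this notebook. The sketch below uses only those printed numbers (365 and 551 exclamation points, 147,144 and 162,085 words) and adds a ratio of the two rates as one possible way to compare them. The helper name `per_10k` is introduced here for illustration.

```python
# Totals copied from the outputs printed above
pet_count, pet_n_words = 365, 147_144
sh_count, sh_n_words = 551, 162_085

def per_10k(count, n_words):
    # occurrences per 10,000 words; max(1, ...) avoids division by zero
    return count / max(1, n_words) * 10_000

pet_rate = per_10k(pet_count, pet_n_words)
sh_rate = per_10k(sh_count, sh_n_words)
ratio = sh_rate / pet_rate  # how much denser The Shining is

print(f"Pet Sematary: {pet_rate:.2f} | The Shining: {sh_rate:.2f} | ratio: {ratio:.2f}")
```

On these numbers the ratio comes out at about 1.37, meaning The Shining uses roughly a third more exclamation points per word than Pet Sematary. The rate depends on the tokenizer, since the denominator is whatever `WORD_RE` counts as a word.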


**Question:** Sample passages with many exclamation points. How do they shape voice, pacing, or mood?

In [10]:
# ---------- 4. Hotspots and clusters ----------

def top_exclaim_sentences(sents, top_n=8, min_len=20):
    scored = [(s.count('!'), len(s), s) for s in sents if len(s) >= min_len]
    scored.sort(key=lambda x: (x[0], -x[1]), reverse=True)  # more !, then shorter
    return [(cnt, s) for cnt, _, s in scored[:top_n] if cnt > 0]

def exclaim_clusters(sents, window=6, min_total=3, top_k=5):
    out = []
    for i in range(max(0, len(sents) - window + 1)):
        chunk = " ".join(sents[i:i+window])
        c = chunk.count('!')
        if c >= min_total:
            out.append((c, i, chunk))
    out.sort(reverse=True, key=lambda x: x[0])
    return out[:top_k]

def preview(s, n=300):
    return s if len(s) <= n else s[:n].rstrip() + " …"

print("=== Pet Sematary: top sentences with ! ===")
for cnt, s in top_exclaim_sentences(pet_sentences):
    print(f"[! x{cnt}] {preview(s)}\n")

print("=== The Shining: top sentences with ! ===")
for cnt, s in top_exclaim_sentences(shining_sentences):
    print(f"[! x{cnt}] {preview(s)}\n")

print("=== Pet Sematary: exclamation clusters ===")
for c, i, chunk in exclaim_clusters(pet_sentences, window=6, min_total=3):
    print(f"[cluster ! x{c} | sentences {i}-{i+5}] {preview(chunk)}\n")

print("=== The Shining: exclamation clusters ===")
for c, i, chunk in exclaim_clusters(shining_sentences, window=6, min_total=3):
    print(f"[cluster ! x{c} | sentences {i}-{i+5}] {preview(chunk)}\n")


=== Pet Sematary: top sentences with ! ===
[! x2] 'Get it, Gage!'
   'Get it!' Gage yelled.

[! x2] Get-it-get-it-GET-IT!!' at the top of his lungs.

[! x2] A moment later a hoarse voice cried, 'Shut up, Fred!'
   Auggggh-ROOOOOO!

[! x2] She's goin' to beat shit!'
   'Beat-shit!' Gage cried, and laughed, high and joyously.

[! x2] 'It's not crazy!' It's not!'
   Goldman blinked and stepped back at this small but ferocious outburst.

[! x2] Rachel called upstairs: 'Better come down and get your snack and go out for the
bus, El!'
   'Okay!' The louder clack-clack of her feet.

[! x2] 'Is that true, daddy?'
   'I think so, hon.'
   'Yuck!' She looked back at the blowdown and yelled: 'You tore my pants, you
cruddy trees!'
   All three of the grown-ups laughed.

[! x2] That'll foozle 'em!'), and he was totally
engrossed, thinking only marginally that a cup of coffee would go down well, when
Masterton screamed from the direction of the foyer-waiting room: 'Louis!

=== The Shining: top sente