# 3) Exclamation Points: Rules vs. Reality

**Goal:** Count the exclamation points in each book, normalize the count per 10,000 words, and compare the two rates.
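Written out, the normalization is a simple rate. Here $c$ is the number of `!` characters in a book and $N$ is its number of word tokens. The Fellowship figures reported further down serve as a worked example:

$$
r = \frac{c}{N} \times 10\,000,
\qquad
r_{\text{Fellowship}} = \frac{1426}{179\,144} \times 10\,000 \approx 79.60
$$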

# Setup: Load Texts

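This notebook needs J.R.R. Tolkien's **The Fellowship of the Ring** and **The Return of the King** as input texts.

**How to provide the texts:**
1. Download both books as plain-text files from open sources.
2. Place the two files in the `data` folder (the loader's default paths point to `../data/`, i.e. a `data` folder one level above the notebook's working directory) with these names:
   - `Fellowship.txt` (The Fellowship of the Ring)
   - `TheKing.txt` (The Return of the King)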
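Before running the loader, a quick check can confirm that both files are where the default paths expect them. The helper `missing_inputs` below is a hypothetical convenience, not part of the notebook; it assumes the default `../data` folder and the two file names listed above.

```python
from pathlib import Path

# Hypothetical helper: the file names mirror the list above, and the default
# folder mirrors the loader's default paths (../data). pathlib accepts
# forward slashes on Windows as well.
EXPECTED_FILES = ("Fellowship.txt", "TheKing.txt")

def missing_inputs(data_dir: str = "../data") -> list[str]:
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

Calling `missing_inputs()` should return an empty list once both files are in place; otherwise it names the files that still need to be copied.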

```python
import re
from pathlib import Path
```

```python
def load_texts(local_fellowship: str = '../data/Fellowship.txt',
               local_theking: str = '../data/TheKing.txt'):
    """Load the Fellowship and TheKing texts from disk.

    Parameters
    ----------
    local_fellowship : str
        Path to the Fellowship text file. Defaults to '../data/Fellowship.txt'.
    local_theking : str
        Path to the TheKing text file. Defaults to '../data/TheKing.txt'.

    Returns
    -------
    tuple[str, str]
        (fellowship_text, theking_text).

    Raises
    ------
    FileNotFoundError
        If either file is missing.

    Notes
    -----
    - Reading as UTF-8 with `errors='ignore'` avoids codec exceptions on
      older or inconsistently encoded text dumps, at the cost of silently
      dropping any bytes that cannot be decoded.
    - Forward slashes in the default paths also work on Windows.
    """
    p1, p2 = Path(local_fellowship), Path(local_theking)

    # Fail fast with a clear message if a file is missing
    if not p1.exists():
        raise FileNotFoundError(
            f"Missing file: {p1}\n"
            "→ Please place 'Fellowship.txt' at this path or update load_texts(...)."
        )
    if not p2.exists():
        raise FileNotFoundError(
            f"Missing file: {p2}\n"
            "→ Please place 'TheKing.txt' at this path or update load_texts(...)."
        )

    # Read the files (UTF-8; ignore undecodable bytes to stay robust)
    fellowship = p1.read_text(encoding='utf-8', errors='ignore')
    theking = p2.read_text(encoding='utf-8', errors='ignore')
    return fellowship, theking


def normalize(text: str, is_fellowship: bool = False) -> str:
    """Normalize line endings and strip front matter before tokenization.

    Fellowship: drop everything before 'Chapter 1' / 'A Long-expected Party'
    (Foreword and Prologue). TheKing: drop everything before 'Book V' /
    'Chapter 1. Minas Tirith'. If a marker is not found, the text is kept whole.
    """
    if not text:
        return ''

    # Normalize Windows line endings first, so the '\n\n' markers below
    # also match files saved with CRLF
    text = text.replace('\r\n', '\n')

    if is_fellowship:
        # Skip the Foreword and Prologue
        start = text.find('Chapter 1\n\nA Long-expected Party')
    else:
        # Skip the front matter of The Return of the King
        start = text.find('Book V\n\nChapter 1. Minas Tirith')

    if start != -1:
        text = text[start:]
    return text


# Load raw texts
fellowship_raw, theking_raw = load_texts()

# Normalize for tokenization
fellowship = normalize(fellowship_raw, is_fellowship=True)
theking = normalize(theking_raw, is_fellowship=False)

print(f"Fellowship chars: {len(fellowship):,} | TheKing chars: {len(theking):,}")
```

```text
Fellowship chars: 948,198 | TheKing chars: 709,796
```
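`normalize` trims front matter by searching for exact strings such as `'Chapter 1\n\nA Long-expected Party'`, and it silently keeps the whole text when a marker is missing, so a file with different spacing would quietly include the Foreword and Prologue in every count. A hedged alternative is sketched below with a hypothetical `trim_before` helper: it uses a regex that tolerates any whitespace between heading and title, and it reports whether the marker was found. Loosening the pattern can also match a table-of-contents entry, so it is worth checking where the match lands.

```python
import re

def trim_before(text: str, pattern: str) -> tuple[str, bool]:
    """Drop everything before the first regex match of `pattern`.

    Returns (trimmed_text, found). If the pattern is absent, the text is
    returned unchanged and found is False, so the caller can warn instead
    of silently counting front matter.
    """
    m = re.search(pattern, text)
    if m is None:
        return text, False
    return text[m.start():], True

# Hypothetical patterns: \s+ tolerates '\n\n', '\r\n\r\n' or a single space
# between the chapter heading and its title
FELLOWSHIP_START = r"Chapter 1\s+A Long-expected Party"
THEKING_START = r"Book V\s+Chapter 1\.\s+Minas Tirith"

sample = "Foreword ...\r\n\r\nChapter 1\r\n\r\nA Long-expected Party\r\nWhen Mr. Bilbo"
trimmed, found = trim_before(sample, FELLOWSHIP_START)
print(found, repr(trimmed[:9]))  # True 'Chapter 1'
```

In the notebook, a `found` value of `False` is the signal to inspect the file before trusting the counts.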


```python
# Words start with an ASCII letter and may contain straight apostrophes
# ("don't", "frodo's"); leading and trailing quote marks are not included
WORD_RE = re.compile(r"\b[A-Za-z][A-Za-z']*\b")

def words(text: str):
    """Word tokenizer: lowercased, ASCII letters plus internal apostrophes."""
    return WORD_RE.findall(text.lower())


def sentences(text: str):
    """Naive sentence splitter.

    Splits on whitespace that directly follows '.', '!' or '?'. A terminator
    followed by a closing quote (e.g. !') does not start a new sentence.
    """
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]


# --- Run the tokenizers ---
fellowship_words = words(fellowship)
theking_words = words(theking)

fellowship_sentences = sentences(fellowship)
theking_sentences = sentences(theking)

# Save total word counts for the rate calculation
nF = len(fellowship_words)  # total words in Fellowship
nR = len(theking_words)     # total words in TheKing

print(f"Fellowship words: {nF:,} | TheKing words: {nR:,}")
print(f"Fellowship sentences: {len(fellowship_sentences):,} | TheKing sentences: {len(theking_sentences):,}")
```

```text
Fellowship words: 179,144 | TheKing words: 136,735
Fellowship sentences: 10,880 | TheKing sentences: 7,449
```
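The word counts `nF` and `nR` are the denominators for every rate below, so the tokenizer matters. `WORD_RE` only accepts ASCII letters and the straight apostrophe `'`. The TheKing file appears to use typographic quotes (its hotspot output below shows `‘Hullo Rosie!’`), so a contraction written `don’t` would become two tokens, and names with accented letters such as `Théoden` produce no token at all. Either effect shifts `nR` relative to `nF`, so it is worth checking before reading much into a small gap. The sketch below shows both effects and a hypothetical Unicode-aware alternative, `words_unicode`; it was not used for the counts above.

```python
import re

# Copy of the notebook's WORD_RE, for comparison
ASCII_WORD_RE = re.compile(r"\b[A-Za-z][A-Za-z']*\b")

# Hypothetical Unicode-aware pattern: [^\W\d_] means "any letter", and an
# internal apostrophe may be straight (') or typographic (’)
UNICODE_WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")

def words_unicode(text: str) -> list[str]:
    """Lowercase, tokenize with UNICODE_WORD_RE, and fold ’ to ' in each token."""
    return [t.replace("’", "'") for t in UNICODE_WORD_RE.findall(text.lower())]

sample = "Don’t go, Théoden!"
print(ASCII_WORD_RE.findall(sample.lower()))  # ['don', 't', 'go']
print(words_unicode(sample))                  # ["don't", 'go', 'théoden']
```

Switching tokenizers would change `nF`, `nR` and every rate derived from them, so the printed figures in this notebook reflect `WORD_RE` only.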


### Count and Normalize

```python
def exclamations_per_10k(text: str, total_words: int):
    """Return (number of '!' in text, rate per 10,000 words)."""
    count = text.count('!')
    per_10k = count / max(1, total_words) * 10_000  # max() guards against division by zero
    return count, per_10k

# Use the word totals nF and nR from the tokenizer cell above
f_count, f_rate = exclamations_per_10k(fellowship, nF)
r_count, r_rate = exclamations_per_10k(theking, nR)

print(f"Fellowship: {f_count} total | {f_rate:.2f} per 10k words")
print(f"TheKing: {r_count} total | {r_rate:.2f} per 10k words")
```

```text
Fellowship: 1426 total | 79.60 per 10k words
TheKing: 1010 total | 73.87 per 10k words
```
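The two rates differ by about 6 per 10k words. One rough way to judge whether that gap exceeds chance is a confidence interval for the ratio of the rates. The sketch below (the name `rate_ratio_ci` is hypothetical) uses the standard Wald interval on the log rate ratio and treats each `!` as an independent Poisson event. That assumption is optimistic: exclamation points cluster in bursts of dialogue, so the real uncertainty is wider than this interval suggests.

```python
import math

def rate_ratio_ci(count_a: int, words_a: int, count_b: int, words_b: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Ratio of per-word rates (A / B) with a Wald interval on the log scale.

    Assumes independent Poisson counts, which understates the spread of
    bursty events such as exclamation points.
    """
    rr = (count_a / words_a) / (count_b / words_b)
    se_log = math.sqrt(1 / count_a + 1 / count_b)  # SE of log(rate ratio)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Counts and word totals as printed by the cells above
rr, lo, hi = rate_ratio_ci(1426, 179_144, 1010, 136_735)
print(f"Fellowship / TheKing rate ratio: {rr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With the printed counts this gives a ratio of about 1.08 and an interval of roughly 0.99 to 1.17, so even under the optimistic Poisson assumption the difference is borderline. In the notebook, pass `f_count, nF, r_count, nR` instead of the literals.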


**Question:** Sample passages with many exclamation points. How do they shape voice, pacing, or mood?

```python
# A) Sentence-level hotspots
def top_exclaim_sentences(sents, top_n=8, min_len=20):
    """Return up to top_n (count, sentence) pairs with the most '!'."""
    scored = [(s.count('!'), len(s), s) for s in sents if len(s) >= min_len]
    # Most '!' first; among ties, shorter sentences come first
    scored.sort(key=lambda x: (x[0], -x[1]), reverse=True)
    return [(cnt, s) for cnt, _, s in scored[:top_n] if cnt > 0]

# B) Clusters over sliding windows of consecutive sentences (tempo spikes)
def exclaim_clusters(sents, window=6, min_total=3, top_k=5):
    """Return up to top_k (count, start_index, text) windows with >= min_total '!'.

    Windows overlap, so several results can come from the same burst.
    """
    out = []
    for i in range(max(0, len(sents) - window + 1)):
        chunk = " ".join(sents[i:i + window])
        c = chunk.count('!')
        if c >= min_total:
            out.append((c, i, chunk))
    out.sort(reverse=True, key=lambda x: x[0])
    return out[:top_k]

def preview(s, n=300):
    """Truncate s to n characters, marking the cut with an ellipsis."""
    return s if len(s) <= n else s[:n].rstrip() + " …"


WINDOW = 6

print("=== Fellowship: top sentences with ! ===")
for cnt, s in top_exclaim_sentences(fellowship_sentences):
    print(f"[! x{cnt}] {preview(s)}\n")

print("=== TheKing: top sentences with ! ===")
for cnt, s in top_exclaim_sentences(theking_sentences):
    print(f"[! x{cnt}] {preview(s)}\n")

print("=== Fellowship: exclamation clusters ===")
for c, i, chunk in exclaim_clusters(fellowship_sentences, window=WINDOW, min_total=3):
    print(f"[cluster ! x{c} | sentences {i}-{i + WINDOW - 1}] {preview(chunk)}\n")

print("=== TheKing: exclamation clusters ===")
for c, i, chunk in exclaim_clusters(theking_sentences, window=WINDOW, min_total=3):
    print(f"[cluster ! x{c} | sentences {i}-{i + WINDOW - 1}] {preview(chunk)}\n")
```

```text
=== Fellowship: top sentences with ! ===
[! x2] Nob!'

'Coming, sir!

[! x2] Lend me the Ring!'

'No!

[! x2] Come on!'

'Wait a moment!' cried Aragorn.

[! x2] Black Riders!'

'Black Riders!' cried Frodo.

[! x2] May it serve you well!'

'Come!' said Haldir.

[! x2] I want to think!'

'Good heavens!' said Pippin.

[! x2] Si vanwa na, Romello vanwa, Valimar!'

		Namarie!

[! x2] Alas for Gimli son of Gloin!'

'Nay!' said Legolas.

=== TheKing: top sentences with ! ===
[! x2] ‘Hullo Rosie!’
‘Hullo, Sam!’ said Rosie.

[! x2] Farewell!’
‘Farewell, lord!’ said Aragorn.

[! x2] Go, Saruman, by the speediest way!’
‘Worm!

[! x2] Go!’
The Black Rider flung back his hood, and behold!

[! x2] Théoden King!’
But Éomer said to them:
Mourn not overmuch!

[! x2] Use well the days!’
But Celeborn said: ‘Kinsman, farewell!

[! x2] I wish I could leave him!’
‘Then leave him!’ said Gandalf.

[! x2] He will be a hundred and thirty-one!’
‘So he will!’ said Sam.

=== Fellowship: exclamation clusters ===
[c
```
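Most "top sentences" above are really two utterances glued together (`Nob!'` followed by `'Coming, sir!`). This is a side effect of `sentences()`: it only splits where a terminator is directly followed by whitespace, and in dialogue the `!` is usually followed by a closing quote. The sketch below shows a hypothetical quote-aware pattern that also accepts one closing quote mark after the terminator. It is a trade-off rather than a strict fix, because dialogue tags such as `said Rosie.` become fragments of their own.

```python
import re

# The notebook's pattern: split only where ., ! or ? is directly followed by
# whitespace, so "Nob!'\n\n'Coming, sir!" stays one "sentence"
NAIVE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")

# Hypothetical quote-aware variant: also split after a terminator followed by
# one closing quote (straight or curly). Each lookbehind is fixed-width,
# as Python's re module requires.
QUOTE_AWARE_SPLIT_RE = re.compile(r"(?:(?<=[.!?])|(?<=[.!?]['\"’”]))\s+")

def split_sentences(text: str, pattern: re.Pattern = QUOTE_AWARE_SPLIT_RE) -> list[str]:
    """Split text on the given boundary pattern and drop empty pieces."""
    return [s.strip() for s in pattern.split(text) if s.strip()]

print(split_sentences("Nob!'\n\n'Coming, sir!", NAIVE_SPLIT_RE))  # one piece
print(split_sentences("Nob!'\n\n'Coming, sir!"))                  # two pieces
```

Swapping this pattern into `sentences()` would change the sentence counts and both hotspot lists, so those cells would need to be rerun after the change.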

```python
import pandas as pd
import altair as alt

# Uses f_rate and r_rate from the counting cell above
data = [
    {'Book': 'Fellowship', 'Rate (per 10k words)': f_rate},
    {'Book': 'TheKing', 'Rate (per 10k words)': r_rate},
]
chart_df = pd.DataFrame(data)

# Build the bar chart
chart = alt.Chart(chart_df).mark_bar().encode(
    x=alt.X('Book', sort=None),  # sort=None keeps the data order
    y=alt.Y('Rate (per 10k words)', title='Exclamation Point Rate (per 10k words)'),
    color='Book',
    tooltip=['Book', 'Rate (per 10k words)']
).properties(
    title='Pacing & Emotion: Exclamation Rate in Tolkien'
).interactive()

# Save the chart specification as a Vega-Lite JSON file
chart.save('exclamation_rate_chart.json')
print("Graph saved to 'exclamation_rate_chart.json'")

# Display the chart
chart
```

```text
Graph saved to 'exclamation_rate_chart.json'
```
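The bar chart reduces each book to a single number. To see whether the gap between the bars is large compared with each book's internal variation, one option is to compute the rate over consecutive fixed-size blocks of words. The helper `block_rates` below is a hypothetical sketch: it reuses the notebook's `WORD_RE` pattern so block word counts match `nF` and `nR`, counts each `!` toward the block in progress (the one holding the word just before it), and drops any trailing partial block so every rate shares the same denominator.

```python
import re

# Same pattern as the notebook's WORD_RE, plus '!' as its own token
TOKEN_RE = re.compile(r"\b[A-Za-z][A-Za-z']*\b|!")

def block_rates(text: str, block_words: int = 5000) -> list[float]:
    """Return '!' per 10k words for each full block of `block_words` words."""
    rates: list[float] = []
    words = bangs = 0
    for tok in TOKEN_RE.findall(text):
        if tok == "!":
            bangs += 1  # belongs to the block in progress
        elif words == block_words:
            # The current block is full: record it, then start the next
            # block with this word
            rates.append(bangs / block_words * 10_000)
            words, bangs = 1, 0
        else:
            words += 1
    if words == block_words:  # the final block is exactly full
        rates.append(bangs / block_words * 10_000)
    return rates  # a trailing partial block is dropped

print(block_rates("a b! c d", block_words=2))  # [5000.0, 0.0]
```

For example, `block_rates(fellowship)` and `block_rates(theking)` give two lists that could be drawn side by side as box or strip plots in Altair.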
