# Reading Federal Reserve (FOMC) Statements with Python
This notebook shows a practical, *Fed-watcher-friendly* workflow:

1. Load a public corpus of FOMC communications directly from a **raw GitHub URL**
2. Filter to **FOMC statements**
3. Read a chosen statement and compare it to the previous one
4. Run a few lightweight text analyses:
   - sentence-level diffs (Git-style)
   - added/removed sentences
   - an illustrative "hawk vs dove" dictionary score
   - keyword trends over time

## Why raw GitHub links?
A GitHub file link like:

`https://github.com/USER/REPO/blob/main/path/file.csv`

can usually be turned into a direct-download link by converting it to:

`https://raw.githubusercontent.com/USER/REPO/main/path/file.csv`

That means **no cloning** and no local files—great for teaching notebooks.
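
The conversion described above can be sketched as a small helper. This is a minimal illustration, assuming the standard `github.com/USER/REPO/blob/BRANCH/path` layout; the function name `github_blob_to_raw` is hypothetical and not part of any library.

```python
import re


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com 'blob' URL into a raw.githubusercontent.com URL.

    Hypothetical helper for illustration. URLs that do not match the
    expected pattern are returned unchanged.
    """
    pattern = r"^https://github\.com/([^/]+)/([^/]+)/blob/(.+)$"
    match = re.match(pattern, url)
    if match is None:
        return url
    user, repo, rest = match.groups()
    # 'rest' holds the branch name followed by the file path
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"


print(github_blob_to_raw("https://github.com/USER/REPO/blob/main/path/file.csv"))
```

The resulting string can be passed straight to `pd.read_csv`.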


## Data source
We’ll use the open dataset maintained by `vtasca/fed-statement-scraping`, which scrapes Federal Reserve communications and stores them in a CSV.

The code below assumes the CSV has `Date`, `Release Date`, `Type`, and `Text` columns. If the dataset schema changes over time, update those names in the normalization step.

The repository:

https://github.com/vtasca/fed-statement-scraping/tree/master

We can access the CSV directly via this raw GitHub URL:

https://raw.githubusercontent.com/vtasca/fed-statement-scraping/refs/heads/master/communications.csv




In [None]:
import re
import difflib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


In [None]:
# Load the corpus from a raw GitHub URL (no cloning needed)
FED_CORPUS_URL = "https://raw.githubusercontent.com/vtasca/fed-statement-scraping/refs/heads/master/communications.csv"

fed = pd.read_csv(FED_CORPUS_URL)
fed.head()


## Quick inspection
Before we do any analysis, we inspect the columns and a few random rows.

This makes it easy to adapt if the dataset’s column names differ from what we expect.


In [None]:
print("Shape:", fed.shape)
print("Columns:", list(fed.columns))

# Sample a few rows (columns shown here may vary by dataset version)
preview_cols = [c for c in ["Date", "Release Date", "Type"] if c in fed.columns]
sample = fed.sample(5, random_state=0)
sample[preview_cols] if preview_cols else sample


## Normalize key fields (Date, Type, Text)
We clean the three columns used in the rest of the notebook:

- `Date` → pandas datetime
- `Type` → lowercase type label
- `Text` → string

The logic is intentionally simple and transparent for students.
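
If a later version of the CSV renames its columns, a small chooser could map whatever names it finds onto the ones the code expects. The sketch below is one hypothetical way to do that; the helper `choose_column` and the candidate names are illustrative assumptions, not part of the dataset or of pandas.

```python
def choose_column(columns, candidates):
    """Return the first column whose lowercased name matches a candidate, else None."""
    lookup = {str(c).strip().lower(): c for c in columns}
    for name in candidates:
        if name.lower() in lookup:
            return lookup[name.lower()]
    return None


# Hypothetical candidate names; extend them to match whatever the CSV uses.
CANDIDATES = {
    "Date": ["date", "meeting_date"],
    "Type": ["type", "doc_type"],
    "Text": ["text", "content"],
}

# Made-up column list standing in for list(fed.columns)
example_columns = ["date", "Release Date", "doc_type", "Text"]
mapping = {std: choose_column(example_columns, names) for std, names in CANDIDATES.items()}
print(mapping)
```

With a real DataFrame, the mapping could be applied with `fed.rename(columns={found: std for std, found in mapping.items() if found})`.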


In [None]:
fed["Date"] = pd.to_datetime(fed["Date"])
fed["Type"] = fed["Type"].str.lower()
fed["Text"] = fed["Text"].astype(str)

## Filter to FOMC statements
Many corpora include other documents (minutes, transcripts, speeches, etc.).  
We filter to rows whose `Type` is exactly `"statement"` and then sort by date.


In [None]:
statements = (
    fed
    .query("Type == 'statement'")
    .sort_values("Date")
    .reset_index(drop=True)
)

statements[["Date", "Release Date", "Text"]].tail()

## Choose a statement to read
By default we take the two latest statements in the dataset.

These are December 2025 and October 2025 in the current data.

You can also set `i` manually to read a different meeting:
- `i = 0` is the earliest statement
- `i = len(statements)-1` is the latest statement


In [None]:
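# Index of the statement to read; the default is the latest one
i = len(statements) - 1

current = statements.iloc[i]
previous = statements.iloc[i - 1] if i > 0 else None  # the earliest statement has no predecessor

current["Date"]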


## Print statement text (truncated)
Statements can be long. This prints the first 1,500 characters.  
Feel free to increase the number if you want the full statement in the output.


In [None]:
print(current["Text"][:1500])


## Compare to previous meeting (sentence-level diff)
We split each statement into sentences, then compute a unified diff (like Git).
This is a simple but powerful way to see what changed.
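
To see what the diff output looks like before running it on real statements, here is a minimal sketch on two toy sentence lists. The sentences are invented for illustration and are not quoted from any FOMC statement.

```python
import difflib

# Invented toy sentences (not real FOMC text)
old = [
    "Inflation remains elevated.",
    "The Committee decided to maintain the target range.",
]
new = [
    "Inflation has eased somewhat.",
    "The Committee decided to maintain the target range.",
]

diff_lines = list(
    difflib.unified_diff(old, new, fromfile="prev", tofile="curr", lineterm="")
)
for line in diff_lines:
    print(line)
```

Lines starting with `-` appear only in the earlier list, lines starting with `+` appear only in the later one, and lines starting with a space are unchanged context.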


In [None]:
def sentence_split(text):
    # Simple sentence split (good enough for class)
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

cur_sents = sentence_split(current["Text"])
prev_sents = sentence_split(previous["Text"]) if previous is not None else []
prev_label = previous["Date"].date() if previous is not None else "none"

diff = difflib.unified_diff(
    prev_sents, cur_sents,
    fromfile=f"prev ({prev_label})",
    tofile=f"curr ({current['Date'].date()})",
    lineterm=""
)

# Show at most the first 200 diff lines
for _, line in zip(range(200), diff):
    print(line)


## Added and removed sentences
A more “digestible” view than the full diff:
- sentences present now but not previously (`added`)
- sentences present previously but not now (`removed`)
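
The set-based comparison can be tried on toy data first. This sketch uses placeholder sentences and a hypothetical helper `added_removed`; it also shows one limitation: a sentence that merely moves position is reported as neither added nor removed.

```python
def added_removed(old, new):
    """Return (added, removed) sentence lists, preserving original order."""
    old_set, new_set = set(old), set(new)
    added = [s for s in new if s not in old_set]
    removed = [s for s in old if s not in new_set]
    return added, removed


# Placeholder sentences for illustration
print(added_removed(["A.", "B.", "C."], ["A.", "C.", "D."]))

# Reordering alone produces no additions or removals
print(added_removed(["A.", "B."], ["B.", "A."]))
```

Because the comparison is exact, a sentence with a single changed word shows up once in each list.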


In [None]:
cur_set = set(cur_sents)
prev_set = set(prev_sents)

added = [s for s in cur_sents if s not in prev_set]
removed = [s for s in prev_sents if s not in cur_set]

print("ADDED sentences:\n")
for s in added[:10]:
    print("-", s)

print("\nREMOVED sentences:\n")
for s in removed[:10]:
    print("-", s)


## A simple "hawk vs dove" dictionary score (illustrative)
This is intentionally simple and transparent:
- count occurrences of a small list of "hawkish" terms
- subtract occurrences of a small list of "dovish" terms

First, let's define the lists of terms:

In [None]:
HAWK = [
    "inflation", "strong", "tight", "raise", "increases", "restrictive",
    "higher", "persistent", "upside risks"
]
DOVE = [
    "patient", "accommodative", "support", "lower", "cut", "easing",
    "downside risks", "slack", "moderation"
]

In [None]:
def hawk_count(text):
    text = text.lower()
    return sum(text.count(term) for term in HAWK)

current_hawk = hawk_count(current["Text"])
current_hawk


**Important:** This is not a validated research measure—just a class-friendly starting point.
Students can improve it by:
- adding/removing terms
- building phrase matching
- validating against, e.g., recession indicators or external labels
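
As one possible take on the phrase-matching idea, the sketch below counts whole words and whole phrases with regular expressions instead of raw substrings. Plain `str.count` also matches inside longer words (for example `"cut"` inside `"execute"`). The helper name `count_terms` and the sample sentence are invented for illustration.

```python
import re


def count_terms(text, terms):
    """Count whole-word (or whole-phrase) matches of each term, case-insensitively."""
    total = 0
    for term in terms:
        # \b anchors the match at word boundaries; \s+ lets a phrase span line breaks
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in term.split()) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


# Invented sample text (not from an FOMC statement)
sample = "The Committee is not impatient; a rate cut would execute slowly. Upside\nrisks remain."
terms = ["patient", "cut", "upside risks"]

substring_total = sum(sample.lower().count(t) for t in terms)
whole_word_total = count_terms(sample, terms)
print("substring:", substring_total)
print("whole-word:", whole_word_total)
```

The two totals differ because the substring count picks up `"patient"` in `"impatient"` and `"cut"` in `"execute"`, while missing `"upside risks"` when a line break separates the two words.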

### Now we can compute a Hawk-Dove score = Hawk mentions - Dove mentions
Is the statement more hawkish (positive score) or more dovish (negative score), and how does it compare with the previous one?


In [None]:
def term_count(text, terms):
    t = text.lower()
    return sum(t.count(term) for term in terms)

def hawk_dove_score(text):
    return term_count(text, HAWK) - term_count(text, DOVE)

current_score = hawk_dove_score(current["Text"])
prev_score = hawk_dove_score(previous["Text"]) if previous is not None else np.nan

print("Current score:", current_score)
print("Previous score:", prev_score)


In [None]:
def counts_breakdown(text):
    return {
        "hawk": term_count(text, HAWK),
        "dove": term_count(text, DOVE),
        "score": hawk_dove_score(text),
    }

print("CURRENT:", counts_breakdown(current["Text"]))
if previous is not None:
    print("PREV:   ", counts_breakdown(previous["Text"]))

## Plot the hawk–dove score over time
This creates a simple time series so students can see:
- long-run shifts in language
- spikes around specific macro periods

Again: interpret cautiously—this is a toy indicator meant for learning.


In [None]:
statements["hawk_dove"] = statements["Text"].map(hawk_dove_score)

plt.figure(figsize=(10,4))
plt.plot(statements["Date"], statements["hawk_dove"])
plt.axhline(0, linewidth=1)
plt.title("FOMC Statement Hawk–Dove Dictionary Score (Illustrative)")
plt.ylabel("hawk words − dove words")
plt.xlabel("")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


## Keyword trends over time
A second, very interpretable analysis: count keyword mentions.

You can add macro-relevant terms like:
- "inflation", "labor", "financial conditions"
- "uncertainty", "risks", "growth"
- "balance sheet", "quantitative tightening", etc.


In [None]:
KEYWORDS = ["inflation", "labor", "employment", "unemployment", "financial conditions"]

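for kw in KEYWORDS:
    # Word boundaries keep "employment" from also matching inside "unemployment"
    pattern = rf"\b{re.escape(kw)}\b"
    statements[kw] = statements["Text"].str.lower().str.count(pattern)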

plt.figure(figsize=(10,4))
plt.plot(statements["Date"], statements["inflation"], label="inflation")
plt.plot(statements["Date"], statements["labor"], label="labor")
plt.plot(statements["Date"], statements["financial conditions"], label="financial conditions")
plt.title("Keyword Mentions in FOMC Statements")
plt.ylabel("count")
plt.xlabel("")
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()


## Student assignment prompts
1) Pick a statement and identify **3 substantive changes** vs the previous statement. Quote the changed sentence(s).  
2) Does the dictionary score move in the direction you’d expect given the macro context? Explain.  
3) Choose 2 keywords and describe how attention to them changes across time windows.  
4) Optional: propose a better dictionary (5 hawk terms, 5 dove terms) and justify each term.


## Optional extensions (for a follow-on lab)
- TF‑IDF similarity between meetings
- bigram extraction (“financial conditions”, “ongoing increases”, …)
- paragraph-level diffs instead of sentence-level
- merge statement dates with market data (FRED / Yahoo Finance) for an event-study
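
As a possible starting point for the bigram idea, here is a minimal sketch using only the standard library. The helper `top_bigrams` and the sample text are invented for illustration; note that this simple version also pairs words across sentence boundaries.

```python
import re
from collections import Counter


def top_bigrams(text, n=5):
    """Return the n most common adjacent word pairs in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    pairs = zip(words, words[1:])
    return Counter(" ".join(pair) for pair in pairs).most_common(n)


# Invented sample text (not from an FOMC statement)
sample = "Financial conditions tightened. Tighter financial conditions weigh on activity."
print(top_bigrams(sample, n=3))
```

Applied to `statements["Text"]`, the same function could be mapped over each statement to track recurring phrases over time.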
