Utilities for turning Gmail API message responses into a convenient Python dataclass or a lightweight summary that agent-style tools can consume. The project includes:
- `email_parser` package with the low-level parser (`parse_gmail_message`) and a high-level helper (`read_message_summary`).
- CLI scripts to fetch Gmail messages and print human-readable summaries.
| Requirement | Why |
|---|---|
| Python 3.11 | Managed via `.python-version` so tooling stays consistent. |
| `uv` | Dependency management (`uv sync`, `uv run`, etc.). |
| Google Cloud project with Gmail API enabled | Needed to obtain OAuth credentials for your mailbox. |
Make sure `uv` is on your `PATH`. On macOS with Homebrew: `brew install uv`.

    uv sync --dev

- Creates/updates `.venv/` with everything defined in `pyproject.toml`.
- Installs testing tools (`pytest`), Gmail client libraries (`google-api-python-client`, `google-auth`, `google-auth-oauthlib`), and `html2text` for HTML→plain-text conversion.
If you run commands from outside the repo, prefix them with `UV_CACHE_DIR=/path/to/cache` to reuse an existing uv cache.
- Enable the Gmail API
  - Visit Google Cloud Console.
  - Select your project (or create one).
  - APIs & Services → Library → enable Gmail API.
- Set up the OAuth consent screen
  - APIs & Services → OAuth consent screen.
  - Fill in the required details; for personal testing, add your Google account as a test user.
- Create OAuth client credentials
  - APIs & Services → Credentials → “Create Credentials” → OAuth client ID.
  - Application type: Desktop app.
  - Download the JSON file and save it in the repo root as `client_secret.json`.
- First authorization
  - The first time you run any Gmail command below, a browser window opens so you can approve access.
  - A `token.json` file (already git-ignored) stores the refresh/access token for future runs.
All commands below should be executed from the repository root.

    uv run python scripts/fetch_message.py --list

`--max-results 10` and `--labels INBOX` are available if you want more control.

    uv run python scripts/fetch_message.py --message-id <ID> --output message.json

- Defaults to Gmail’s `format="full"`.
- Use `--format raw` if you want the raw RFC 822 payload (remember to pass `--prefer-raw` when parsing later).
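
For context on the raw option: a Gmail `format="raw"` response carries the whole RFC 822 message as a base64url string in its `raw` field. The following is a minimal standard-library sketch of decoding such a response; `decode_raw_message` is a hypothetical name used for illustration and is not the project's parser.

```python
import base64
from email import message_from_bytes, policy
from email.message import EmailMessage


def decode_raw_message(gmail_response: dict) -> EmailMessage:
    """Decode the base64url `raw` field of a Gmail response (hypothetical helper)."""
    raw = gmail_response["raw"]
    # Restore base64 padding in case it was stripped from the string.
    padded = raw + "=" * (-len(raw) % 4)
    message_bytes = base64.urlsafe_b64decode(padded)
    # policy.default yields the modern EmailMessage API (get_content, etc.).
    return message_from_bytes(message_bytes, policy=policy.default)
```

Headers are then available by name (for example `msg["Subject"]`), and `msg.get_content()` returns the text of a simple single-part message.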

    uv run python main.py message.json

Output includes:
- Subject
- From
- To / CC
- Body text (plain text preferred; falls back to HTML)
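
The "plain text preferred, HTML fallback" rule can be sketched against Gmail's `format="full"` payload, where each part has a `mimeType`, an optional base64url `body.data`, and optional nested `parts`. This is an illustrative sketch only: `find_body` and `pick_body` are hypothetical names, and it assumes UTF-8 bodies rather than reading each part's declared charset.

```python
import base64
from typing import Optional


def _decode(data: str) -> str:
    """Decode a base64url string such as Gmail's `body.data`."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def find_body(part: dict, mime_type: str) -> Optional[str]:
    """Depth-first search of the payload tree for the first part of `mime_type`."""
    data = part.get("body", {}).get("data")
    if part.get("mimeType") == mime_type and data:
        return _decode(data)
    for child in part.get("parts", []):
        found = find_body(child, mime_type)
        if found is not None:
            return found
    return None


def pick_body(payload: dict) -> tuple[Optional[str], Optional[str]]:
    """Return (text, mime_type), trying text/plain before text/html."""
    for mime_type in ("text/plain", "text/html"):
        text = find_body(payload, mime_type)
        if text is not None:
            return text, mime_type
    return None, None
```

Searching the whole tree for `text/plain` before looking at any `text/html` part means the order of the alternatives inside a `multipart/alternative` container does not matter.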

    uv run python scripts/fetch_message.py --message-id <ID> --output - \
      | uv run python main.py -

That pipeline fetches the message via Gmail, writes JSON to stdout, and pipes it into the parser CLI.
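
On the consuming side, the `-` convention usually amounts to reading JSON from stdin instead of a file. A minimal sketch of that pattern is below; `load_message` and its `stdin` parameter are hypothetical and only illustrate the idea, not how `main.py` is actually written.

```python
import json
import sys
from typing import TextIO


def load_message(path: str, stdin: TextIO = sys.stdin) -> dict:
    """Load a Gmail message JSON from `path`, or from stdin when `path` is "-"."""
    if path == "-":
        # The producer's stdout arrives here when the commands are piped.
        return json.load(stdin)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Accepting the stream as a parameter keeps the function easy to exercise with an in-memory `io.StringIO` instead of a real pipe.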

    from email_parser.tool import read_message_summary

    summary = read_message_summary(
        "19a724d70fc3fe08",
        token_path="token.json",
        client_secret_path="client_secret.json",
        # format="raw", prefer_raw=True  # if you fetched with format="raw"
    )
    print(summary["subject"])
    print(summary["from"])
    print(summary["body"][:200])

The returned dict contains only `subject`, `from`, `to`, `cc`, and `body`. `body` favors MIME `text/plain`, but when an email includes only HTML, the project runs it through `html2text` so you still get readable plain text without tags.
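
To make the fallback concrete without extra dependencies, here is a rough sketch that swaps in the standard library's `html.parser` for `html2text`. It is a stand-in, not the project's code: its output will differ from what `html2text` produces, and `body_text` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Optional


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div"):
            # Treat common block-level tags as line breaks.
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def body_text(plain: Optional[str], html: Optional[str]) -> str:
    """Prefer the plain-text part; otherwise strip tags from the HTML part."""
    if plain:
        return plain
    if not html:
        return ""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered trailing text
    return extractor.text()
```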

    from email_parser.tool import build_gmail_service, read_message_summary

    gmail = build_gmail_service(token_path="token.json", client_secret_path="client_secret.json")
    summary = read_message_summary("19a724d70fc3fe08", gmail_service=gmail)

Pass the `gmail_service` parameter to avoid repeated OAuth prompts and to batch multiple lookups efficiently.

    from email_parser.tool import build_gmail_service, fetch_message_json, summarize_message_json
    from email_parser import parse_gmail_message

    gmail = build_gmail_service()

    # Full Gmail payload
    msg = fetch_message_json(gmail, "19a724d70fc3fe08", format="full")

    # Dataclass (subject, from_, to, cc, date, text/html, attachments, etc.)
    parsed = parse_gmail_message(msg)

    # Minimal summary (what read_message_summary returns internally)
    summary = summarize_message_json(msg)

Additional helpers:
| Helper | Description |
|---|---|
| `build_gmail_service(...)` | Creates an authenticated Gmail API service (handles tokens). |
| `list_message_ids(...)` | Returns recent message IDs, optionally filtered by label. |
| `fetch_message_json(...)` | Downloads a message (`format="full"` or `"raw"`). |
| `summarize_message_json(...)` | Converts a Gmail payload into `{subject, from, to, cc, body}`. |
| `summarize_parsed_email(...)` | Same as above but works with the `ParsedEmail` dataclass. |
| `read_message_summary(...)` | High-level helper for agent tooling (fetch + summarize). |

    uv run pytest

Tests live in `tests/test_parser.py` and cover:
- Parsing Gmail “full” and “raw” payloads.
- Attachment handling.
- Summarizer behavior (plain text preference plus html2text-based fallback when only HTML is available).
- Python warnings: we pin Python 3.11 to avoid Google’s “unsupported Python” warnings clogging stdout during streaming.
- Missing credentials: If `client_secret.json` or `token.json` is missing, the scripts explain how to regenerate them. Both files are listed in `.gitignore`.
- No message data: When streaming (`--output -`), ensure the consumer (e.g., `main.py -`) reads stdin immediately. Any stray prints before the JSON (warnings, logs) can break the parser, so keep the environment clean or upgrade your Python version as we did here.
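
If stray output ahead of the JSON cannot be avoided, one defensive option is to skip ahead to the first parseable JSON object. The sketch below uses `json.JSONDecoder.raw_decode` for that. It is an assumption-laden workaround rather than something the project does, and `parse_json_after_noise` is a hypothetical name.

```python
import json


def parse_json_after_noise(stream_text: str) -> dict:
    """Return the first JSON object found in `stream_text`, ignoring text before it."""
    decoder = json.JSONDecoder()
    start = stream_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            value, _end = decoder.raw_decode(stream_text, start)
            return value
        except json.JSONDecodeError:
            # This "{" belonged to a log line, not the payload; try the next one.
            start = stream_text.find("{", start + 1)
    raise ValueError("no JSON object found in input")
```

A log line that happens to be valid JSON on its own (for example `{}`) would be returned in place of the message, so keeping stdout clean remains the more reliable fix.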
That’s it: install deps, drop in your `client_secret.json`, run `uv run python scripts/fetch_message.py --list` to grab IDs, and either pipe into the CLI or call `read_message_summary()` from your agent. When in doubt, re-run `uv sync --dev` to refresh the virtualenv.