# Purpose
* Take a folder of downloaded documents and produce a single Markdown "briefing pack": a short, structured summary of several documents that answers:

> “What do I need to know, quickly, so I can think or act intelligently?”

# Resources

* **Artificial intelligence**: https://en.wikipedia.org/wiki/Artificial_intelligence
* **Project Gutenberg**: https://www.gutenberg.org/
* **Global strategy on digital health 2020-2025**: https://iris.who.int/server/api/core/bitstreams/1f4d4a08-b20d-4c36-9148-a59429ac3477/content
* **AI Prompts** (restricted): https://chatgpt.com/g/g-p-69529e4bdbf881918d6e6b89073ab1c5-ai-python/project

# Packages
* `python-dotenv`: safely loads secret keys (like your OpenAI key) from a file
* `openai`: lets Python talk to an AI model
* `pypdf`: understands PDF structure; extracts selectable text from pages

In [1]:
%pip install python-dotenv openai pypdf


Collecting openai
  Downloading openai-2.14.0-py3-none-any.whl.metadata (29 kB)
Collecting pypdf
  Downloading pypdf-6.5.0-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.10.0 (from openai)
  Downloading jiter-0.12.0-cp312-cp312-win_amd64.whl.metadata (5.3 kB)
Downloading openai-2.14.0-py3-none-any.whl (1.1 MB)
Downloading pypdf-6.5.0-py3-none-any.whl (329 kB)
Downloading jiter-0.12.0-cp312-cp312-win_amd64.whl (205 kB)
Installing collected packages: pypdf, jiter, openai
Successfully installed jiter-0.12.0 openai-2.14.0 pypdf-6.5.0
Note: you may need to restart the kernel to use updated packages.


In [17]:
from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
import os

print("✅ Packages imported successfully")


✅ Packages imported successfully


# Load API Key

In [3]:
# Check that the .env file holding the API key exists

from pathlib import Path

print("Exists:", Path(".env").exists())


Exists: True


In [5]:
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

assert api_key is not None, "OPENAI_API_KEY not found; check your .env file"
print("✅ OpenAI API key loaded successfully")

✅ OpenAI API key loaded successfully
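
The `assert` above works in a notebook, but assertions are skipped when Python runs with the `-O` flag. A minimal sketch of a stricter check, assuming a hypothetical helper named `require_env` (not part of `python-dotenv` or `openai`):

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise a clear error.

    `require_env` is a hypothetical helper written for this notebook.
    """
    value = os.getenv(name)
    if not value:  # covers both a missing variable and an empty string
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=... to your .env file "
            "and call load_dotenv() again."
        )
    return value
```

After `load_dotenv()`, this could be used as `api_key = require_env("OPENAI_API_KEY")`.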


# Create the OpenAI client
* The client is my phone line to the OpenAI API :)

In [7]:
#from openai import OpenAI
client = OpenAI(api_key=api_key)

print("✅ OpenAI client created")

✅ OpenAI client created


In [9]:
# Tiny test call
response = client.responses.create(
    model="gpt-4.1-mini",
    input="In one sentence, explain what a briefing pack is."
)

print(response.output_text)

A briefing pack is a concise collection of essential information and documents prepared to inform and update individuals or teams on a specific topic or project.


# Read documents into Python

In [10]:
# Does the folder "docs" exist?
#from pathlib import Path
DOCS_DIR = Path("docs")
assert DOCS_DIR.exists(), "I can't find a 'docs' folder next to your notebook."

# Print out the files in "docs"
files = sorted([p.name for p in DOCS_DIR.glob("*")])
files


['Artificial intelligence_Wikipedia.txt',
 'The Art of War.txt',
 'WHO_Digital_Health.pdf']

## Read a `.txt` file

In [13]:
def read_txt(path):
    return path.read_text(encoding="utf-8", errors="ignore")

art_of_war_path = DOCS_DIR / "The Art of War.txt"
art_of_war_text = read_txt(art_of_war_path)

len(art_of_war_text)


312935

In [14]:
print(art_of_war_text[:300])


*** START OF THE PROJECT GUTENBERG EBOOK 132 ***
Sun Tzŭ
on
The Art of War

THE OLDEST MILITARY TREATISE IN THE WORLD
Translated from the Chinese with Introduction and Critical Notes

BY
LIONEL GILES, M.A.

Assistant in the Department of Oriental Printed Books and MSS.
in the British Museum




1910


## Read a `.pdf` file

In [18]:
#from pypdf import PdfReader

def read_pdf_text_based(path):
    reader = PdfReader(str(path))
    pages = []
    for page in reader.pages:
        pages.append(page.extract_text() or "")
    return "\n\n".join(pages).strip()

pdf_path = DOCS_DIR / "WHO_Digital_Health.pdf"
digital_health_text = read_pdf_text_based(pdf_path)

len(digital_health_text)


103104

In [19]:
print(digital_health_text[:300])

1
Global strategy 
on digital health
2020-2025


Global strategy on digital health 2020-2025
ISBN 978-92-4-002092-4 (electronic version)
ISBN 978-92-4-002093-1 (print version)
© World Health Organization 2021
Some rights reserved. This work is available under the Creative Commons Attribution-NonComm
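
`read_pdf_text_based` only recovers selectable text, so scanned pages come back as empty strings. A small sketch for spotting them, assuming the per-page strings have been kept in a list (for example `[page.extract_text() or "" for page in reader.pages]`); `find_empty_pages` and `sample_pages` are hypothetical names used for illustration:

```python
def find_empty_pages(page_texts):
    """Return 1-based page numbers whose extracted text is empty or whitespace."""
    return [i for i, text in enumerate(page_texts, start=1) if not text.strip()]


# Made-up page texts for illustration, not taken from the WHO PDF
sample_pages = ["Global strategy on digital health", "", "   \n", "Some rights reserved."]
print(find_empty_pages(sample_pages))  # pages 2 and 3 have no text
```

Many empty pages would suggest the PDF is image-based and needs OCR rather than `pypdf`.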


# Organize my documents
* In the previous section I read each document one by one. This is **not** scalable.
* I will collect the documents in a list so I can pass them to the AI one by one

In [20]:
documents = []

documents.append({
    "name": "The Art of War.txt",
    "type": "txt",
    "text": art_of_war_text
})

documents.append({
    "name": "WHO_Digital_Health.pdf",
    "type": "pdf",
    "text": digital_health_text
})

len(documents)


2

In [21]:
# Inspect one document
# This helps you see:
#   - what keys exist
#   - how to access values
#   - how big each document is

documents[0].keys(), documents[0]["name"], len(documents[0]["text"])


(dict_keys(['name', 'type', 'text']), 'The Art of War.txt', 312935)

In [22]:
# Loop over documents
for doc in documents:
    print(doc["name"], "→", len(doc["text"]))


The Art of War.txt → 312935
WHO_Digital_Health.pdf → 103104
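
The manual `append` calls above still name each file by hand. One possible sketch of building the same list straight from the folder, choosing a reader by file extension. `READERS` and `load_documents` are hypothetical names, and the two reader functions are repeated only to keep the block self-contained:

```python
from pathlib import Path


def read_txt(path):
    return path.read_text(encoding="utf-8", errors="ignore")


def read_pdf_text_based(path):
    # Imported here so folders with only .txt files work without pypdf
    from pypdf import PdfReader

    reader = PdfReader(str(path))
    return "\n\n".join(page.extract_text() or "" for page in reader.pages).strip()


# Map each supported extension to the function that reads it
READERS = {".txt": read_txt, ".pdf": read_pdf_text_based}


def load_documents(docs_dir, readers=READERS):
    """Return a list of {'name', 'type', 'text'} dicts for every supported file."""
    documents = []
    for path in sorted(Path(docs_dir).iterdir()):
        suffix = path.suffix.lower()
        reader = readers.get(suffix)
        if reader is None or not path.is_file():
            continue  # skip subfolders and unsupported file types
        documents.append({
            "name": path.name,
            "type": suffix.lstrip("."),
            "text": reader(path),
        })
    return documents
```

Calling `documents = load_documents(DOCS_DIR)` would pick up every `.txt` and `.pdf` file in `docs`, including the Wikipedia article that the manual list leaves out.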


# Use AI to summarize ONE document
* take one document
* send it to the AI
* get a high-level summary

In [24]:
def summarize_document(doc):
    """
    Takes a document dictionary and returns a high-level summary using AI.
    """
    # Only the first 4,000 characters are sent, so the summary reflects
    # just the opening of the document.
    text = doc["text"][:4000]

    prompt = f"""
You are an assistant helping create a briefing pack.

Document name: {doc['name']}

Task:
Provide a high-level summary (3–5 sentences) of the document.
Focus on the main ideas only.
Do not include unnecessary details.

Document text:
{text}
"""

    response = client.responses.create(
        model="gpt-4.1-mini",
        input=prompt
    )

    return response.output_text


In [25]:
summary = summarize_document(documents[0])
print(summary)


The Art of War, authored by Sun Tzŭ and translated by Lionel Giles, is the oldest military treatise in the world, offering timeless lessons on strategy and warfare. Giles’ 1910 translation aimed to provide a more accurate and scholarly version than previous flawed translations, establishing a foundational text for English readers. The book covers various aspects of warfare, including planning, tactics, energy, terrain, and the use of spies, emphasizing strategy and adaptability in conflict. Giles' edition remained the standard for decades, only gaining wider recognition in the English-speaking world during and after the Second World War.
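
Because `summarize_document` sends only the first 4,000 characters, the summary above is based on the translator's front matter rather than the whole book. One common workaround, sketched here under assumptions, is to split the text into chunks, summarize each chunk, and then summarize the summaries. `split_into_chunks` and `summarize_long_text` are hypothetical names, and `summarize_fn` stands for any function that takes a string and returns a summary string (for example a thin wrapper around `client.responses.create`):

```python
def split_into_chunks(text, max_chars=4000):
    """Split text into pieces of at most max_chars, breaking at blank lines where possible."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs left by runs of blank lines
        # A paragraph longer than max_chars is cut into fixed-size slices
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # the piece still fits in the current chunk
            else:
                chunks.append(current)  # close the chunk and start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


def summarize_long_text(text, summarize_fn, max_chars=4000):
    """Summarize each chunk, then summarize the combined chunk summaries."""
    partial = [summarize_fn(chunk) for chunk in split_into_chunks(text, max_chars)]
    if len(partial) <= 1:
        return partial[0] if partial else ""
    return summarize_fn("\n\n".join(partial))
```

This makes one call per chunk plus a final call, so a 312,935-character text at 4,000 characters per chunk means roughly 80 API calls. A larger `max_chars` would reduce that.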


# Summarize ALL documents (loop + list of dictionaries)

In [26]:
summaries = []

for doc in documents:
    summary_text = summarize_document(doc)
    summaries.append({
        "name": doc["name"],
        "summary": summary_text
    })

len(summaries)


2

In [27]:
# Preview what you collected
for item in summaries:
    print("—" * 60)
    print(item["name"])
    print(item["summary"])


————————————————————————————————————————————————————————————
The Art of War.txt
The Art of War by Sun Tzu, translated by Lionel Giles in 1910, is the oldest military treatise known and offers timeless lessons on strategy and warfare. Giles' translation sought to improve on earlier inadequate versions by providing a more accurate, scholarly edition enriched with critical notes and commentary. The book covers various aspects of military strategy, including planning, tactics, terrain, and the use of spies. Though initially overlooked in the English-speaking world, interest in Sun Tzu’s work grew significantly around World War II. Giles' edition remains a foundational and highly respected translation in the study of military strategy.
————————————————————————————————————————————————————————————
WHO_Digital_Health.pdf
The WHO Global Strategy on Digital Health 2020-2025 outlines the vision and framework for integrating digital technologies into national health systems worldwide. It emphasize

# Build the briefing pack

In [35]:
def build_briefing_pack(summaries):
    # Alternative compact version (commented out for clarity)
    # combined = "\n\n".join(
    #     f"DOCUMENT: {item['name']}\nSUMMARY:\n{item['summary']}"
    #     for item in summaries
    # )

    combined_parts = []

    for item in summaries:
        part = (
            f"DOCUMENT: {item['name']}\n"
            f"SUMMARY: {item['summary']}\n"
        )
        combined_parts.append(part)

    # Take all the strings in combined_parts and glue them together
    # with two newline characters between each one
    combined = "\n\n".join(combined_parts)

    prompt = f"""
You are creating a briefing pack from multiple document summaries.

Create a Markdown report with these sections:

## Executive summary
(6–10 sentences)

## Key takeaways
(5–8 bullets)

## Document notes
For each document:
- 2–4 bullets of what it covers

Here are the document summaries:
{combined}
"""

    response = client.responses.create(
        model="gpt-4.1-mini",
        input=prompt
    )

    return response.output_text


In [36]:
briefing_md = build_briefing_pack(summaries)
print(briefing_md[:1200])


# Briefing Report

## Executive summary
This briefing synthesizes insights from two significant documents covering very different domains: classical military strategy and modern global health innovation. *The Art of War*, translated by Lionel Giles, is a foundational military text offering enduring principles of strategy and tactics that continue to influence military and leadership thinking. Giles’ scholarly translation elevated understanding by providing rich commentary and accurate interpretation, contributing to its wider recognition especially during the 20th century.

On the other hand, the WHO Global Strategy on Digital Health 2020-2025 presents a contemporary roadmap for integrating digital technologies into health systems worldwide. It advocates for countries to develop cohesive, evidence-based digital health strategies that promote equitable access and privacy protection, while leveraging new technologies to overcome health service challenges, especially in resource-limited s

# Saving the briefing pack to a file

In [37]:
#from pathlib import Path
from datetime import datetime

# Make sure the outputs folder exists
OUT_DIR = Path("outputs")
OUT_DIR.mkdir(exist_ok=True)

# Create a date-stamped filename (re-running on the same day overwrites the file)
date_stamp = datetime.now().strftime("%Y-%m-%d")
output_path = OUT_DIR / f"briefing_pack_{date_stamp}.md"

# Write the briefing pack to the file
output_path.write_text(briefing_md, encoding="utf-8")

output_path


WindowsPath('outputs/briefing_pack_2025-12-29.md')
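
The filename above contains only the date, so a second run on the same day replaces the earlier file. A minimal sketch that adds the time of day to the stamp; `timestamped_path` and its parameters are hypothetical names chosen for illustration:

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(out_dir, stem="briefing_pack", suffix=".md", now=None):
    """Return out_dir/<stem>_<YYYY-MM-DD_HHMMSS><suffix>, creating out_dir if needed."""
    now = now or datetime.now()  # `now` can be passed in to make the result predictable
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y-%m-%d_%H%M%S")
    return out_dir / f"{stem}_{stamp}{suffix}"
```

It could replace the date-only name with `output_path = timestamped_path("outputs")`, followed by the same `write_text` call.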