# 🛸 Project: Viral Science Short Generator (v1.0)

### **Objective**

To transform dense scientific documentation (Wikipedia) into high-retention, 50-second YouTube Short scripts by utilizing a **multi-stage LLM research pipeline**.

### **Requirements**

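This notebook requires the `wikipedia-api` package; install it with `uv add wikipedia-api`. The code cells also import `openai` and `python-dotenv`, and expect `OPENAI_API_KEY` to be set in a `.env` file.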

---

## 🏗️ The Architecture

The project follows a "Director-Researcher-Editor" workflow to ensure both viral appeal and scientific accuracy.

### **Stage 1: The Creative Director (Idea & Source Selection)**

* **Input:** Primary Wikipedia article text + a list of all internal Wiki links.
* **Process:** The LLM identifies a "Viral Hook" and selects the top 3-5 sub-topics (links) that best support that specific narrative.
* **Output:** A **Viral Story Concept** and a **Target Research List** (JSON).
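
Because Stage 2 consumes the Director's JSON directly, it can help to validate the reply before moving on. The following is a minimal sketch, assuming the four keys used in the Director prompt of this notebook (`viral_idea`, `hook_preview`, `research_targets`, `reasoning`); the helper name `parse_director_output` is hypothetical and not part of the notebook.

```python
import json

# Keys the Director prompt in this notebook asks for
REQUIRED_KEYS = {"viral_idea", "hook_preview", "research_targets", "reasoning"}

def parse_director_output(raw: str, min_targets: int = 3, max_targets: int = 5) -> dict:
    """Parse the Director's JSON reply and check the fields Stage 2 relies on."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    targets = data["research_targets"]
    if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
        raise ValueError("research_targets must be a list of strings")
    if not (min_targets <= len(targets) <= max_targets):
        raise ValueError(
            f"expected {min_targets}-{max_targets} research targets, got {len(targets)}"
        )
    return data
```

Failing early here keeps a malformed reply from surfacing later as a confusing `KeyError` in the Researcher stage.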

### **Stage 2: The Fact Researcher (Pre-Summarization)**

* **Input:** Raw content from the selected 3-5 Wikipedia sub-pages.
* **Process:** To prevent context-window bloat, summarize each of these Wikipedia pages with respect to the viral idea and hook preview output by the Director.
* **Output:** A distilled **Fact Sheet** (15–25 high-impact bullet points).
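
One simple way to bound the Stage 2 input is a shared character budget split evenly across the selected pages. This is only a sketch: it assumes character count is an acceptable rough proxy for tokens, and both the helper name `trim_articles` and the 30,000-character default are illustrative choices rather than anything the notebook defines.

```python
def trim_articles(articles: dict[str, str], total_chars: int = 30000) -> dict[str, str]:
    """Give each article an equal share of a total character budget."""
    if not articles:
        return {}
    per_article = total_chars // len(articles)
    # Slicing never fails on short texts; they are simply kept whole
    return {title: text[:per_article] for title, text in articles.items()}

pages = {"Page A": "x" * 100, "Page B": "y" * 10}
trimmed = trim_articles(pages, total_chars=40)
print({title: len(text) for title, text in trimmed.items()})
```

A fixed per-page cut is crude (it can stop mid-sentence), but it makes the worst-case prompt size predictable.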

### **Stage 3: The Script Editor (Final Synthesis)**

* **Input:** Viral Story Concept + Distilled Fact Sheet.
* **Process:** Synthesizes the research into a 130–150 word script using the **Hook-Meat-Loop** framework.
* **Output:** A production-ready script with visual cues for 50-second vertical video.
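
The 130–150 word target corresponds to roughly 2.6 to 3 spoken words per second over 50 seconds, so it is worth checking the Editor's output mechanically. A minimal sketch follows, assuming the `[VISUAL CUE: ...]` and `NARRATOR:` markers used in this notebook's output format; the function names are hypothetical.

```python
import re

# Visual cues are stage directions, not narration, so they are excluded
CUE = re.compile(r"\[VISUAL CUE:.*?\]", re.DOTALL)
# Speaker labels at the start of a line are not spoken either
LABEL = re.compile(r"^\s*NARRATOR:\s*", re.MULTILINE)

def spoken_word_count(script: str) -> int:
    """Count whitespace-separated words after removing cues and speaker labels."""
    spoken = LABEL.sub("", CUE.sub(" ", script))
    return len(spoken.split())

def fits_short(script: str, min_words: int = 130, max_words: int = 150) -> bool:
    """Return True when the narration falls inside the target word range."""
    return min_words <= spoken_word_count(script) <= max_words

sample = '[VISUAL CUE: Two marbles on a table]\nNARRATOR: "Can you tie a knot between two shoes?"'
print(spoken_word_count(sample), fits_short(sample))
```

A script that fails the check could be sent back to the Editor with a request to trim or expand.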

---

## 🛠️ Technical Stack (2026)

* **Language:** Python 3.11+
* **Data Sourcing:** `wikipedia-api` (clean, structured extraction).
* **Environment:** Jupyter Notebook (Iterative development).
* **LLM Engine:** OpenAI / Anthropic (API) or Llama 3.2 (Local via Ollama).
* **Logic:** Multi-stage Prompt Chaining & JSON Output Enforcement.

---

## 📈 Learning Goals

* **Prompt Chaining:** Mastering state-passing between different LLM "roles."
* **Context Management:** Using summarization to fit massive data into limited token windows.
* **JSON Enforcement:** Forcing LLMs to output machine-readable data for automated research.
* **Agentic Behavior:** Designing a system where the AI "decides" which paths to research.
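
To make the state-passing idea concrete, here is a small sketch of a chain runner in which every stage reads a shared dictionary and returns the new keys it contributes. The three stage functions are stubs standing in for LLM calls, and all names here (`run_chain`, `director`, `researcher`, `editor`) are hypothetical.

```python
from typing import Callable

Stage = Callable[[dict], dict]

def run_chain(initial_state: dict, stages: list[Stage]) -> dict:
    """Run each stage in order, merging its output into the shared state."""
    state = dict(initial_state)  # copy so the caller's dict is not mutated
    for stage in stages:
        state.update(stage(state))
    return state

# Stub stages standing in for LLM calls (for illustration only)
def director(state: dict) -> dict:
    return {
        "viral_idea": f"A surprising angle on {state['topic']}",
        "research_targets": ["Page A", "Page B", "Page C"],
    }

def researcher(state: dict) -> dict:
    return {"fact_sheet": [f"Fact from {t}" for t in state["research_targets"]]}

def editor(state: dict) -> dict:
    return {"script": f"{state['viral_idea']} ({len(state['fact_sheet'])} facts)"}

final = run_chain({"topic": "spacetime"}, [director, researcher, editor])
print(final["script"])  # A surprising angle on spacetime (3 facts)
```

Keeping every intermediate result in one dictionary also makes it easy to inspect what each role produced when a script comes out wrong.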

---

### **Next Steps for the Notebook**

1. **Cell 1:** Setup `wikipedia-api` and fetch the primary article.
2. **Cell 2:** Execute **Stage 1 Prompt** to get the Story Idea and selected links.
3. **Cell 3:** Loop through selected links and execute **Stage 2 Prompt** (Summarization).
4. **Cell 4:** Execute **Stage 3 Prompt** to generate the final 50-second script.


In [3]:
import os
import json
from dotenv import load_dotenv


from IPython.display import Markdown, display, update_display
from openai import OpenAI


In [4]:
load_dotenv(override=True)

api_key = os.getenv("OPENAI_API_KEY")

if api_key and api_key.startswith("sk-") and len(api_key) > 40:
    print("API key loaded successfully.")
else:
    print("Failed to load API key. Please check your .env file.")

THINKING_MODEL = "gpt-5"
WRITING_MODEL = "gpt-4.1"

client = OpenAI()



API key loaded successfully.


### STEP 1: Director
* **Input:** Primary Wikipedia article text + a list of all internal Wiki links.
* **Process:** The LLM identifies a "Viral Hook" and selects the top 3-5 sub-topics (links) that best support that specific narrative.
* **Output:** A **Viral Story Concept** and a **Target Research List** (JSON).

**Requirement:** this step needs the `wikipedia-api` package (`uv add wikipedia-api`).

In [5]:
import wikipediaapi

def get_wikipedia_summary(topic: str, includeLinks: bool = False) -> str:
    """Return the first 10,000 characters of a Wikipedia page, optionally preceded by its link titles."""
    # Wikimedia's User-Agent policy asks for a descriptive user agent; generic ones may be blocked
    wiki = wikipediaapi.Wikipedia(
        user_agent='MyScienceScriptGenerator/1.0 (contact@example.com)',
        language='en',
        extract_format=wikipediaapi.ExtractFormat.WIKI
    )

    page = wiki.page(topic)

    # Extract the text (despite the function name, this is the page body, not only the lead summary)
    content = page.text[:10000]  # Take first 10k chars to save tokens

    result = "### Wikipedia Summary: " + "\n" + content

    if includeLinks:
        # Extract the links (titles only, to save space); only fetched when requested
        links = list(page.links.keys())
        result = "### Wikipedia Links: " + ", ".join(links) + "\n\n" + result

    return result
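
The link list can be long and may contain titles outside the article namespace, which are rarely useful research targets. The sketch below filters by title prefix before the list is sent to the Director. It is an assumption that such prefixed titles appear in the list for a given page, and the helper `filter_article_links`, the prefix tuple, and the 300-link cap are all hypothetical choices.

```python
# Common MediaWiki namespace prefixes that do not point at regular articles
NON_ARTICLE_PREFIXES = (
    "Category:", "Template:", "Template talk:", "Help:",
    "Wikipedia:", "Portal:", "File:", "Talk:",
)

def filter_article_links(titles: list[str], max_links: int = 300) -> list[str]:
    """Drop non-article titles and cap the list length to save tokens."""
    kept = [t for t in titles if not t.startswith(NON_ARTICLE_PREFIXES)]
    return sorted(kept)[:max_links]

titles = ["Bell test", "Template:Quantum mechanics", "Category:Physics", "Albert Einstein"]
print(filter_article_links(titles))  # ['Albert Einstein', 'Bell test']
```

In `get_wikipedia_summary`, this could be applied to `page.links.keys()` before the titles are joined.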

In [6]:
print(get_wikipedia_summary("quantum entanglement"))

### Wikipedia Summary: 
Quantum entanglement is the phenomenon wherein the quantum state of each particle in a group cannot be described independently of the state of the others, even when the particles are separated by a large distance. The topic of quantum entanglement is at the heart of the disparity between classical physics and quantum physics: entanglement is a primary feature of quantum mechanics not present in classical mechanics.
Measurements of physical properties such as position, momentum, spin, and polarization performed on entangled particles can, in some cases, be found to be perfectly correlated. For example, if a pair of entangled particles is generated such that their total spin is known to be zero, and one particle is found to have clockwise spin on a first axis, then the spin of the other particle, measured on the same axis, is found to be anticlockwise. This behavior gives rise to seemingly paradoxical effects: any measurement of a particle's properties results in 

In [7]:
director_system_prompt = """
### ROLE
You are a Viral Science "Creative Director" for a major YouTube Shorts channel (similar to Veritasium or Kurzgesagt). Your specialty is finding "spooky," "counter-intuitive," or "mind-blowing" angles in complex scientific data.

### TASK
1. Analyze the provided Wikipedia content.
2. Formulate a high-level idea for a high-retention "Viral Story Concept" for a 50-second video.
3. Review the list of provided internal Wikipedia links.
4. Select between 3 and 5 links that are the most critical for researching and proving the viral concept you just created.

### CONSTRAINTS
- You will be provided with two things: a) under "### Wikipedia Links:", a list of all internal Wikipedia links from that page, and b) under "### Wikipedia Summary:", the text content of the Wikipedia page (possibly truncated).
- You must ONLY output a valid JSON object. Do not include any conversational text before or after the JSON.

### OUTPUT FORMAT (JSON ONLY)
{
    "viral_idea": "A short, punchy sentence describing the angle of the video.",
    "hook_preview": "The first 5 seconds of the video script to grab attention.",
    "research_targets": ["Exact Title 1", "Exact Title 2", "Exact Title 3"],
    "reasoning": "A brief explanation of why these specific links were chosen."
}
"""

In [8]:
def get_director_user_prompt(wiki_article: str):
    director_user_prompt = """
    Here is the Wikipedia content you need to analyze:
    """
    director_user_prompt += "\n\n" + get_wikipedia_summary(wiki_article, includeLinks=True)
    return director_user_prompt



In [24]:
def get_director_output(wiki_article: str):
    print(f"Getting director output... {wiki_article}")
    response = client.chat.completions.create(
        model=THINKING_MODEL,
        messages=[
            {"role": "system", "content": director_system_prompt},
            {"role": "user", "content": get_director_user_prompt(wiki_article)}
        ],
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content

In [10]:
director_output = get_director_output("quantum entanglement")

In [11]:
director_output_json = json.loads(director_output)
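
The Director is asked for exact link titles, but nothing guarantees that it returns them verbatim. A small check against the titles that were actually supplied can catch invented or misspelled targets before Stage 2 fetches them. This is a sketch with a hypothetical helper, `split_known_targets`; matching case-insensitively is a design assumption.

```python
def split_known_targets(targets: list[str], known_links: list[str]) -> tuple[list[str], list[str]]:
    """Separate targets that match a supplied link title from those that do not."""
    # Map a case-folded title back to its canonical spelling
    canonical = {title.casefold(): title for title in known_links}
    matched, unknown = [], []
    for target in targets:
        key = target.strip().casefold()
        if key in canonical:
            matched.append(canonical[key])
        else:
            unknown.append(target)
    return matched, unknown

links = ["Entanglement swapping", "Bell test experiments", "Quantum teleportation"]
targets = ["entanglement swapping", "Quantum teleportation", "Made-up page"]
print(split_known_targets(targets, links))
```

Unknown targets could be dropped, or reported back to the Director in a follow-up request.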

In [12]:
print(director_output_json)

{'viral_idea': 'Entangle two particles that never met--apparently even after they’re measured--without breaking relativity.', 'hook_preview': 'We entangled two photons that never met--after the detectors clicked. Did we hack time or physics?', 'research_targets': ['Entanglement swapping', 'Bell test experiments', 'No-communication theorem', 'Quantum teleportation', "Wheeler's delayed-choice experiment"], 'reasoning': "Entanglement swapping is the core mechanism for entangling strangers; Bell test experiments verify the nonclassical correlations; the No-communication theorem rules out faster-than-light messaging; Quantum teleportation shares the same Bell-state measurement machinery that makes the trick work; Wheeler's delayed-choice experiment frames the ‘after-the-fact’ twist that hooks the story."}


In [13]:
print(get_wikipedia_summary("Delayed-choice quantum eraser"))

### Wikipedia Summary: 
A delayed-choice quantum eraser experiment is an elaboration on the quantum eraser experiment that incorporates concepts considered in John Archibald Wheeler's delayed-choice experiment. The experiment was designed to investigate peculiar consequences of the well-known double-slit experiment in quantum mechanics, as well as the consequences of quantum entanglement.
Delayed-choice quantum eraser experiments are designed to investigate the following apparent paradox arising from the traditional double-slit experiment: if, upon observing a photon, one can deduce that it arrived at a detector by following a particular path, then "common sense" (which Wheeler and others challenge) says that it must have entered the double-slit device as a particle, whereas if the photon's path cannot be deduced, then it must have entered the double-slit device as a wave. By this logic, a spontaneous change in the mode of observation while the photon is in transit may force it to retr

### **Stage 2: The Fact Researcher (Pre-Summarization)**

* **Input:** Raw content from the selected 3-5 Wikipedia sub-pages.
* **Process:** To prevent context-window bloat, summarize each of these Wikipedia pages with respect to the viral idea and hook preview output by the Director.
* **Output:** A distilled **Fact Sheet** (15–25 high-impact bullet points).


In [14]:
researcher_system_prompt = """
### ROLE
You are an "Elite Science Researcher." Your job is to extract high-impact, verifiable, and mind-blowing facts from complex scientific texts to support a specific creative vision.

### TASK
1. Review the "Viral Story Concept" and "Hook Preview" provided by the Director.
2. Analyze the provided raw content from 3-5 Wikipedia articles.
3. Extract 5-10 unique, "mind-blowing" facts that directly support the viral concept.
4. If an article contains information that contradicts or adds a surprising twist to the concept, prioritize that information.

### CONSTRAINTS
- Accuracy is paramount. Do not exaggerate the science, but do highlight the most extreme "edge cases."
- Avoid "fluff" or general introductory sentences. 
- Ensure each fact is self-contained and easy to understand.
- Output ONLY a structured list of bullet points.

### INPUT DATA
- VIRAL IDEA: A short, punchy sentence describing the angle of the video.
- HOOK PREVIEW: Rough, approximate script for the first 5 seconds of the video.
- RESEARCH MATERIAL: 3-5 Wikipedia articles in raw text format.

### OUTPUT FORMAT
Provide the facts as a simple list of bullet points.

"""

In [26]:
def get_researcher_user_prompt(viral_idea: str, hook_preview: str, wiki_articles: list[str]):
    researcher_user_prompt = f"""
    Here is the Viral Story Concept and Hook Preview provided by the Director:
    
    VIRAL IDEA: {viral_idea}
    
    HOOK PREVIEW: {hook_preview}
    
    Now, here are the raw contents of the Wikipedia articles you need to analyze:
    """
    for wiki_article in wiki_articles:
        researcher_user_prompt += f"\n\n### Article {wiki_article}\n" + "\n\n" + get_wikipedia_summary(wiki_article, includeLinks=False)
    
    return researcher_user_prompt

In [16]:
print(get_researcher_user_prompt(director_output_json["viral_idea"], director_output_json["hook_preview"], director_output_json["research_targets"]))


    Here is the Viral Story Concept and Hook Preview provided by the Director:

    VIRAL IDEA: Entangle two particles that never met--apparently even after they’re measured--without breaking relativity.

    HOOK PREVIEW: We entangled two photons that never met--after the detectors clicked. Did we hack time or physics?

    Now, here are the raw contents of the Wikipedia articles you need to analyze:
    

### Article Entanglement swapping


### Wikipedia Summary: 
In quantum mechanics, entanglement swapping is a protocol to transfer quantum entanglement from one pair of particles to another, even if the second pair of particles have never interacted. This process may have application in quantum communication networks and quantum computing.

Concept
Basic principles
Entanglement swapping has two pairs of entangled particles: (A, B) and (C, D). Pair of particles (A, B) is initially entangled, as is the pair (C, D). The pair (B, C) taken from the original pairs, is projected onto 

In [31]:
def get_researcher_output(director_output_json):
    print(f"Getting researcher output... {director_output_json['viral_idea']}")
    response = client.chat.completions.create(
        model=WRITING_MODEL,
        messages=[
            {"role": "system", "content": researcher_system_prompt},
            {"role": "user", "content": get_researcher_user_prompt(director_output_json["viral_idea"], director_output_json["hook_preview"], director_output_json["research_targets"])}
        ]
    )
    return response.choices[0].message.content

In [18]:
researcher_output = get_researcher_output(director_output_json)

In [19]:
print(researcher_output)

- Entanglement swapping allows two particles (e.g., photons) to become entangled despite never interacting or encountering each other directly; their entanglement is created by performing a joint measurement (Bell state measurement or BSM) on partner particles from two separate entangled pairs.

- In the original 1992 and subsequent 1998 experiments, physicists successfully demonstrated that two photons could end up entangled without ever sharing a joint history or physical interaction, purely due to measurements performed on their respective partners.

- Entanglement between particles that "never met" is confirmed after the relevant measurement is made; this can occur even after both particles have been detected, aligning with the idea that measurement events can entangle particles retroactively in time.

- Although quantum correlations can appear instantaneous and even "retroactive" (as if the future measurement defines the past entanglement), the no-communication theorem rigorously 

In [None]:
writer_system_prompt = """
### ROLE
You are a Lead Scriptwriter for a viral Science YouTube channel. You specialize in "The Narrative Bridge": taking dense physics facts and turning them into vivid, relatable, and high-retention 50-second scripts.

### THE TASK
Create a 130-150 word script for a vertical video based on the provided "Viral Concept" and "Fact Sheet." 

### CORE DIRECTIVES
1. THE NARRATIVE ARC:
- 0-5s (The Hook): Start with a "Knowledge Gap." Challenge the viewer’s intuition about reality immediately.
- 5-40s (The Meat): Use ONE primary analogy to explain the science. Anchor the abstract physics to a concrete, everyday object (like a coin, a mirror, or a pair of shoes).
- 40-50s (The Loop): End with a sentence that ties back to the opening hook, making the video satisfying to re-watch.

2. STYLE & LANGUAGE:
- Ruthless Prioritization: Only explain ONE core idea. Ignore secondary facts that don't serve the viral concept.
- Sensory Writing: Replace abstract terms (e.g., "non-locality") with sensory descriptions (e.g., "happening across the void," "tugging on a ghost string").
- Human Scale: Convert large numbers or distances into comparisons a human can visualize (e.g., "a grain of sand vs the Sahara").
- Emotional Stakes: Frame the science as a mystery or a "spooky" secret that changes how the viewer sees themselves.

### CONSTRAINTS
- Avoid jargon. If you must use a scientific term, define it instantly with a "schema" (an analogy).
- Keep the tone wonder-filled and slightly dramatic.
- Total word count must stay between 130 and 150 words.

### INPUT DATA
The user message provides:
- VIRAL CONCEPT: the Director's one-sentence angle for the video.
- FACT SHEET: the Researcher's bullet-point facts.

### OUTPUT FORMAT
[VISUAL CUE: Detailed scene description]
NARRATOR: "The spoken dialogue..."

[VISUAL CUE: ...]
NARRATOR: "..."
"""

In [28]:
def get_writer_output(viral_idea: str, fact_sheet: str):
    print(f"Getting writer output... {viral_idea}")
    response = client.chat.completions.create(
        model=WRITING_MODEL,
        messages=[
            {"role": "system", "content": writer_system_prompt},
            {"role": "user", "content": f"VIRAL CONCEPT: {viral_idea}\n\nFACT SHEET:\n{fact_sheet}"}
        ]
    )
    return response.choices[0].message.content

In [22]:
writer_output = get_writer_output(director_output_json["viral_idea"], researcher_output)

In [23]:
print(writer_output)

[VISUAL CUE: Two lonely marbles on opposite sides of a table, no bridge between them. Dramatic zoom-in.]

**HOOK (0-5s):**  
Can you tie a knot between two shoes… that never touched? In quantum physics, this isn’t just possible--it's reality.

[VISUAL CUE: Animation: Two colored light particles racing past each other, never crossing paths.]

**MEAT (5-40s):**  
Imagine two photons, each born on opposite ends of the universe. They've never met, never whispered secrets--but there’s a quantum trick called “entanglement swapping” that links them anyway.

Here’s the twist: By measuring their partners, scientists can fuse the fates of these strangers--even after both have been caught and measured. It’s as if a referee shouts the game’s outcome after the players already left the field!

In mind-bending experiments, physicists witnessed this knot form--not through any secret signal, but using measurements that ripple backward like a retroactive handshake. Yet, not even a pi

In [None]:
def write_short_science_script(wiki_article: str):
    director_output = get_director_output(wiki_article)
    director_output_json = json.loads(director_output)
    researcher_output = get_researcher_output(director_output_json)
    writer_output = get_writer_output(director_output_json["viral_idea"], researcher_output)
    return writer_output
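
The `json.loads` call in this pipeline raises if the model ever returns malformed JSON, which would abort the whole run. A retry wrapper is one common mitigation. The sketch below takes any zero-argument callable that returns raw text; the name `call_until_valid_json` and the default of three attempts are assumptions for illustration.

```python
import json
from typing import Callable

def call_until_valid_json(call: Callable[[], str], attempts: int = 3) -> dict:
    """Invoke `call` until its return value parses as JSON, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        raw = call()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"no valid JSON after {attempts} attempts") from last_error

# Simulated replies: the first is malformed, the second parses
replies = iter(["not json", '{"viral_idea": "x"}'])
print(call_until_valid_json(lambda: next(replies)))  # {'viral_idea': 'x'}
```

In the notebook this could wrap the Director call, for example `call_until_valid_json(lambda: get_director_output("spacetime"))`, at the cost of extra API requests when a retry is needed.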

In [33]:
display(Markdown(write_short_science_script("spacetime")))

Getting director output... spacetime
Getting researcher output... Motion tilts your slice of ‘now’--a single step can shift what’s happening ‘right now’ in Andromeda by days.
Getting writer output... Motion tilts your slice of ‘now’--a single step can shift what’s happening ‘right now’ in Andromeda by days.


[VISUAL CUE: Split-screen--on the left, a person lacing shoes; on the right, a cosmic shot of the Andromeda galaxy.]

**HOOK (0-5s):**  
Did you know that just *walking across your room* can change what’s happening right now--*millions of light-years away* in Andromeda, by days?

[VISUAL CUE: Animated timeline stretching from the person to Andromeda; the line labeled “your now” tilts as the person takes a step.]

**MEAT (5-40s):**  
Here’s the mind-bender: there’s no cosmic master clock ticking the same for everyone. As you move--even strolling--the universe reshuffles events just for you. When you step east, your version of “now” in Andromeda leaps forward; walk west, and it jumps back.  
How big is this effect? At walking speed, your “now” slice for Andromeda shifts over *10 days*--with every step! It's like the universe lets you flick through moments in a distant galaxy just by moving.  
That happens because space and time *mix* when you move. Einstein’s relativity says you don’t just travel through space--you tilt your window into time across the cosmos.

[VISUAL CUE: Person walking; a cosmic “slice” swivels like a window blade over Andromeda, days flick by on a distant calendar.]

**LOOP (40-50s):**  
So next time you step forward, remember: you’re not just going somewhere. *You’re rewriting “now” in the galaxies above--one step at a time.*  
Want more reality-bending facts? Hit that replay.

---

**Stickiness Analysis:**  
This script leverages a familiar act (walking) as anchoring schema, sets up a shocking knowledge gap (controlling the cosmic "now"), uses concrete human-scale shifts (10 days in Andromeda), vivid visual analogies (tilting slice), and emotional payoff (your steps reshuffle the universe), tightly following all elements of the SUCCESs framework for maximum memorability.