# InterviewAgent - From Foreword of Open Source AI
This notebook uses **CrewAI** agents to simulate an interview with a public figure—drawing exclusively from real quotes across public podcasts, articles, and transcripts. It includes modular programs for discovering sources, assembling quote-based responses, and compiling everything into a polished markdown interview suitable for publication. The design supports scalable, trustworthy dialogue generation for book chapters, blogs, or media projects involving open-source AI.

Before running any of the main program listings, be sure to first run the setup cells:
- `pip install` to bring in required libraries (`crewai[tools]`)
- Google Colab Secrets loading cell to securely access your API keys (used by tools like `Serper` and `OpenAI`)
- Global constants defining the default model, output paths, **interviewee name, and questions**



In [None]:
%%capture --no-stderr
%pip install -U --quiet 'crewai[tools]' aisuite databricks-sdk boto3

In [None]:
# Constants and API Key Configuration
import os
from google.colab import userdata

# === Load API keys securely from Google Colab Secrets ===
def load_api_keys():
    keys = {
        "HF_TOKEN": userdata.get("HF_TOKEN"),
        "SERPER_API_KEY": userdata.get("SERPER_API_KEY"),
        "OPENAI_API_KEY": userdata.get("OPENAI_API_KEY"),
        "GEMINI_API_KEY": userdata.get("GEMINI_API_KEY"),
    }
    for key, value in keys.items():
        if not value:
            raise ValueError(f"❌ Missing {key}. Please set this API key in Colab secrets.")
        os.environ[key] = value
    print("✅ All API keys loaded and configured successfully.")

# Execute API key loading upon running this cell
load_api_keys()
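
The cell above relies on `google.colab.userdata`, which is only importable inside Colab. A minimal sketch, assuming the same four key names are exported as ordinary environment variables when running elsewhere; `find_missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration and are not part of the notebook:

```python
import os

# Key names taken from the Colab loading cell; exporting them as plain
# environment variables outside Colab is an assumption of this sketch.
REQUIRED_KEYS = ("HF_TOKEN", "SERPER_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")

def find_missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = find_missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All API keys are present in the environment.")
```

Reporting every missing key at once, rather than raising on the first one, may save a few round trips when configuring a new environment.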

In [None]:
# === Config ===
DEFAULT_MODEL = "gpt-4o-mini"
#DEFAULT_MODEL = "huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct"

# === Directory and File Configuration ===
DIRECTORY = "interview_outputs"
RAW_OUTPUT = "compiled_interview_raw.txt"
FINAL_OUTPUT = "interview_final.md"


In [None]:
# === Interviewee Name (edit this for future interviews) ===
INTERVIEWEE = "Clement Delangue"

# === Interview Questions ===
INTERVIEW_QUESTIONS = [
    "Can you tell us about your background, education, and the events leading up to the founding of Hugging Face?",
    "How did the name Hugging Face come about?",
    "What is Hugging Face's mission?",
    "What does open source AI mean to you?",
    "Tell us about Hugging Face Partnerships",
    "What is the future of AI?"
]

### Listing F-1A: Media Source Discovery Agent
This listing shows a focused script that uses a single `CrewAI` agent to discover up to 12 verified public sources featuring the interviewee. The agent is equipped with search and website tools and is instructed to find interviews, blog posts, podcasts, company articles, and LinkedIn posts where the subject has shared meaningful commentary. Each source is saved with a short description and URL, creating a reusable foundation for future interview tasks. The results are stored in a single output file.


In [None]:
import os
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool, WebsiteSearchTool

# === Tools ===
search_tool = SerperDevTool()
web_rag_tool = WebsiteSearchTool()

# === Configuration ===
SOURCE_OUTPUT_FILE = os.path.join(DIRECTORY, "verified_sources.txt")
os.makedirs(DIRECTORY, exist_ok=True)

# === Create Discovery Agent ===
def create_discovery_agent():
    return Agent(
        role="Interview Source Discovery Agent",
        goal=f"Identify up to 12 verified public sources featuring interviews or direct quotes by {INTERVIEWEE}.",
        backstory=(
            f"You are a research assistant trained to discover interview content and direct public statements made by {INTERVIEWEE}. "
            "Your job is to find blog posts, company blogs, podcasts, video transcripts, media articles, and public LinkedIn posts where "
            f"{INTERVIEWEE} shares relevant commentary, especially on topics like AI, openness, innovation, and Hugging Face."
        ),
        tools=[search_tool, web_rag_tool],
        llm=DEFAULT_MODEL,
        verbose=True,
    )

# === Create Discovery Task ===
def create_discovery_task(agent):
    return Task(
        description=(
            f"Find up to 12 verified public sources (blogs, podcasts, company sites, media interviews, LinkedIn posts) "
            f"that feature direct quotes, interview segments, or substantial remarks made by {INTERVIEWEE}.\n\n"
            f"Requirements:\n"
            "- Do not include duplicate domains or sources.\n"
            "- Ensure each source includes a valid URL.\n"
            "- Briefly describe what kind of source it is and why it's relevant.\n"
            "- Use bullet format for output (up to 12 items max).\n\n"
            "If fewer than 12 strong matches are found, return what you can."
        ),
        expected_output=(
            "A bullet-point list of up to 12 verified sources, each with a short description and plain-text URL. Example:\n"
            "- Forbes Africa article about Clément’s early business ventures and education. https://www.forbesafrica.com/... \n"
            "- Hugging Face blog post about open-source partnerships. https://huggingface.co/blog/..."
        ),
        agent=agent,
        output_file=SOURCE_OUTPUT_FILE
    )

# === Run Discovery Crew ===
def run_discovery_crew():
    agent = create_discovery_agent()
    task = create_discovery_task(agent)

    crew = Crew(
        agents=[agent],
        tasks=[task],
        verbose=True
    )
    crew.kickoff()

    print(f"✅ Discovery complete. Verified sources saved to: {SOURCE_OUTPUT_FILE}")

# === Entry Point ===
if __name__ == "__main__":
    run_discovery_crew()
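
Because the discovery output is written by a language model, it may be worth checking before later steps reuse it. The following is a minimal sketch, assuming the file follows the `- description URL` bullet format requested in `expected_output`; `extract_unique_sources` is a hypothetical helper, and the sample text uses placeholder URLs rather than real results:

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")

def extract_unique_sources(text):
    """Return (description, url) pairs from bullet lines, keeping one URL per domain."""
    seen_domains = set()
    sources = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("-"):
            continue  # skip anything that is not a "-" bullet item
        match = URL_PATTERN.search(line)
        if not match:
            continue  # a bullet without a URL does not meet the task requirements
        url = match.group(0).rstrip(".,;)")  # drop trailing punctuation
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in seen_domains:
            continue  # enforce the "no duplicate domains" requirement
        seen_domains.add(domain)
        description = line[1:match.start()].strip()
        sources.append((description, url))
    return sources

if __name__ == "__main__":
    sample = (
        "- Blog post on partnerships. https://example.org/blog/a\n"
        "- Second post from the same site. https://www.example.org/blog/b\n"
        "- Podcast episode. https://example.com/ep1.\n"
    )
    for description, url in extract_unique_sources(sample):
        print(f"{url}  ({description})")
```

In the notebook, the same function could be applied to the contents of `SOURCE_OUTPUT_FILE` once the crew has finished.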


### Listing F-1: Agent-Guided Quote-Based Interview Builder
This script uses `CrewAI` to simulate interview responses using publicly available quotes from a specified interviewee. It creates one agent per question, each tasked with locating quotes that plausibly address that question using tools like `Serper` and `WebsiteSearchTool`. The output consists of literal, cited responses stored in individual files. This method supports scalable, citation-respecting interview assembly using publicly sourced material without paraphrasing or generative content.

In [None]:
import os
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool, WebsiteSearchTool

# === Tools ===
search_tool = SerperDevTool()
web_rag_tool = WebsiteSearchTool()

# === Create Output Directory ===
OUTPUT_DIR = DIRECTORY  # reuse the output directory defined in the config cell
os.makedirs(OUTPUT_DIR, exist_ok=True)

# === Create Agents ===
def create_question_agent(question, index):
    return Agent(
        role=f"Interview Question Agent {index + 1}",
        goal=f"Find a public source of text dialog where {INTERVIEWEE} answers a question similar to: '{question}'.",
        backstory=(
            "You are an AI researcher simulating interviews from quoted material in public sources. "
            "Your job is to find quotes from the interviewee's public statements that come close to addressing the question."
        ),
        tools=[search_tool, web_rag_tool],
        llm=DEFAULT_MODEL,
        verbose=True,
    )

# === Create Tasks ===
def create_question_task(question, index, agent):
    file_path = os.path.join(OUTPUT_DIR, f"Q{index + 1}-response.txt")
    return Task(
        description=(
            f"Your task is to simulate a response to the following interview question for {INTERVIEWEE}.\n\n"
            f"Question {index + 1}: {question}\n\n"
            f"- Search for quotes from {INTERVIEWEE} that plausibly answer this question or a closely related one.\n"
            "- Use only quotes from sourced interviews; do not paraphrase or summarize.\n"
            "- Keep the answer between 50–200 words of quoted material if possible.\n"
            "- End the answer with source URL(s).\n"
            "- If no close quote exists, output 'No matching quote found.'"
        ),
        expected_output=(
            f"The response must include:\n"
            f"1. **Question {index + 1}: {question}**\n"
            f"2. **Quoted Response from {INTERVIEWEE}** (only sourced quotes, no paraphrasing)\n"
            f"3. **Source URL(s)** (plain URLs only)"
        ),
        agent=agent,
        output_file=file_path
    )

# === Run Crew ===
def run_interview_crew():
    agents = [create_question_agent(q, i) for i, q in enumerate(INTERVIEW_QUESTIONS)]
    tasks = [create_question_task(q, i, agent) for i, (q, agent) in enumerate(zip(INTERVIEW_QUESTIONS, agents))]

    interview_crew = Crew(
        agents=agents,
        tasks=tasks,
        verbose=True
    )

    interview_crew.kickoff()

# === Entry Point ===
if __name__ == "__main__":
    run_interview_crew()
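
The task description asks each agent to write 'No matching quote found.' when no suitable quote exists, so a quick scan of the response files can show which questions still need manual sourcing. A minimal sketch, assuming the `Q<n>-response.txt` naming used above and that the model reproduces the fallback phrase verbatim (which is not guaranteed); `find_unanswered` is a hypothetical helper:

```python
import os
import tempfile

NO_MATCH_MARKER = "No matching quote found."

def find_unanswered(output_dir, marker=NO_MATCH_MARKER):
    """Return response filenames in `output_dir` whose text contains the fallback marker."""
    unanswered = []
    for name in sorted(os.listdir(output_dir)):
        if not (name.startswith("Q") and name.endswith("-response.txt")):
            continue  # ignore files that are not per-question responses
        with open(os.path.join(output_dir, name), encoding="utf-8") as handle:
            if marker in handle.read():
                unanswered.append(name)
    return unanswered

if __name__ == "__main__":
    # Demonstration on throwaway files rather than real crew output.
    with tempfile.TemporaryDirectory() as demo_dir:
        samples = {
            "Q1-response.txt": "**Question 1** ... some quoted text ...",
            "Q2-response.txt": "No matching quote found.",
        }
        for name, body in samples.items():
            with open(os.path.join(demo_dir, name), "w", encoding="utf-8") as handle:
                handle.write(body)
        print(find_unanswered(demo_dir))
```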


### Listing F-1B: Shared-Agent Interview with Context-Aware Tasks
This version **(still under development)** uses a single shared interview agent, rather than one agent per question, to generate quote-based responses to multiple interview questions. The process starts with a discovery phase in which a separate discovery agent locates public interviews or transcripts featuring the interviewee. Each follow-up task is given that shared context to locate relevant quotes. The result is a modular, efficient way to simulate an interview entirely from public content, one file per question.

In [None]:
import os
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool, WebsiteSearchTool, FileReadTool

# === Config ===
SOURCE_FILE = os.path.join(DIRECTORY, "interview_sources.txt")
os.makedirs(DIRECTORY, exist_ok=True)

# === Tools ===
search_tool = SerperDevTool()
web_rag_tool = WebsiteSearchTool()

# === Discovery Agent ===
def create_discovery_agent():
    return Agent(
        role="Interview Discovery Agent",
        goal=f"Find public, text-based interview sources where {INTERVIEWEE} is quoted or interviewed directly.",
        backstory="You specialize in sourcing interview transcripts, podcasts, conference talks, or Q&A blogs featuring the subject.",
        tools=[search_tool],
        llm=DEFAULT_MODEL,
        verbose=True
    )

def create_discovery_task(agent):
    return Task(
        description=(
            f"Find up to 10 credible public sources that feature {INTERVIEWEE} being interviewed or quoted at length. "
            "Prioritize interviews, podcasts, blog posts, YouTube transcripts, or conference talks. Do not include pages that simply mention them without direct speech."
        ),
        expected_output=(
            f"A list of up to 10 plain-text URLs, one per line, where {INTERVIEWEE} is quoted directly or being interviewed."
        ),
        agent=agent,
        output_file=SOURCE_FILE
    )

# === Interview Agent ===
def create_interview_agent():
    return Agent(
        role="Quote-Based Interview Agent",
        goal=f"Answer each question using literal quotes from public sources featuring {INTERVIEWEE}.",
        backstory="You are conducting a structured interview using only verifiable quotes from prior interviews and public appearances.",
        tools=[FileReadTool(file_path=SOURCE_FILE)],
        llm=DEFAULT_MODEL,
        verbose=True
    )

def create_question_task(question, index, agent, context_tasks=None):
    file_path = os.path.join(DIRECTORY, f"Q{index + 1}-response.txt")
    return Task(
        description=(
            f"Using only the provided public sources, generate a response to this interview question:\n\n"
            f"**Question {index + 1}: {question}**\n\n"
            "Use direct quotes that reasonably relate to the question (not necessarily exact matches). "
            "Do not summarize or paraphrase. Keep the answer between 50–200 words of quoted material if available. "
            "End with the URLs of the sources used. If no material fits, output 'No matching quote found.'"
        ),
        expected_output=(
            f"**Question {index + 1}: {question}**\n\n"
            f"**Quoted Response from {INTERVIEWEE}:**\n"
            "- One or more sourced quotes\n\n"
            "**Source URL(s):**\n"
            "- One per line"
        ),
        agent=agent,
        output_file=file_path,
        context=context_tasks if context_tasks else []
    )

# === Run Crew ===
def run_interview_crew():
    # Step 1: Discovery
    discovery_agent = create_discovery_agent()
    discovery_task = create_discovery_task(discovery_agent)
    Crew(agents=[discovery_agent], tasks=[discovery_task], verbose=True).kickoff()

    # Step 2: Interview responses with shared source context
    interview_agent = create_interview_agent()
    tasks = [
        create_question_task(q, i, interview_agent, context_tasks=[discovery_task])
        for i, q in enumerate(INTERVIEW_QUESTIONS)
    ]

    Crew(agents=[interview_agent], tasks=tasks, verbose=True).kickoff()

# === Entry Point ===
if __name__ == "__main__":
    run_interview_crew()


### Listing F-3: Markdown Interview Polisher from Q&A Files
This post-processing pipeline converts a set of structured interview responses into a clean, flowing markdown transcript. It first concatenates individual question files, then uses a single CrewAI agent to reformat the content as a readable, conversational interview. Questions are rephrased for natural flow, and literal quotes are preserved in the answers. Source links are retained to ensure transparency. This is the final step for publishing polished, AI-assisted interviews.

In [None]:
import os
from crewai import Agent, Task, Crew
from crewai_tools import FileReadTool

# === Step 1: Concatenate Interview Files ===
def concatenate_interview_files(input_dir, output_file):
    files = sorted([
        f for f in os.listdir(input_dir)
        if f.startswith("Q") and f.endswith("-response.txt")
    ])

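    # Use explicit UTF-8 so accented names and typographic quotes survive the round trip
    with open(output_file, "w", encoding="utf-8") as outfile:
        for filename in files:
            path = os.path.join(input_dir, filename)
            with open(path, "r", encoding="utf-8") as infile:
                content = infile.read().strip()
                outfile.write(content + "\n\n")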

    print(f"✅ Concatenated {len(files)} files into {output_file}")

# === Step 2: Create Agent to Polish Interview ===
def create_polishing_agent(input_file_path, interviewee_name):
    return Agent(
        role="Interview Editor Agent",
        goal=f"Transform raw literal responses into a polished markdown interview with {interviewee_name}.",
        backstory=(
            f"You are an experienced editor and conversational designer. Your job is to turn structured question-and-answer content "
            f"into a smooth, professional markdown interview. The subject of this interview is {interviewee_name}."
        ),
        tools=[FileReadTool(file_path=input_file_path)],
        llm=DEFAULT_MODEL,
        verbose=True,
    )

# === Step 3: Create Task ===
def create_polishing_task(agent, interviewee_name):
    return Task(
        description=(
            f"You are editing an interview featuring {interviewee_name} from structured Q&A content.\n\n"
            "Your job is to produce a smooth markdown-formatted interview transcript.\n\n"
            "Instructions:\n"
            "- Rewrite the questions in a natural voice, as if an interviewer were asking them live.\n"
            "- Do not paraphrase the answers. Use literal quotes from the source, placed in double quotes.\n"
            "- Each answer must include one or more source links from the original response file.\n"
            f"- Format as markdown with clear **Robo:** and **{interviewee_name}:** prefixes.\n"
            "- Maintain a flowing, intelligent, and conversational tone.\n"
            "- Include the following at the top of the file:\n"
            f"  - Markdown title (e.g., '# Interview with {interviewee_name}')\n"
            "  - Interviewee name\n"
            "  - Placeholder for date\n"
            "- Ensure that all interview questions (Q1 to QN) are represented.\n"
            "- Each answer must end with clearly marked source URL(s)."
        ),
        expected_output=(
            "A complete markdown-formatted interview saved to the output file. It must include:\n"
            "- Title, interviewee name, date placeholder\n"
            "- Robo/Interviewer and interviewee dialogue\n"
            "- Literal quotes (not paraphrased) from the subject\n"
            "- One or more source URLs following each response"
        ),
        agent=agent,
        output_file=FINAL_OUTPUT
    )

# === Step 4: Assemble and Run Crew ===
def run_interview_polishing_pipeline():
    # Step 1: Concatenate source Q&A files
    concatenate_interview_files(DIRECTORY, RAW_OUTPUT)

    # Step 2: Create the polishing agent
    agent = create_polishing_agent(RAW_OUTPUT, INTERVIEWEE)

    # Step 3: Define the polishing task
    task = create_polishing_task(agent, INTERVIEWEE)

    # Step 4: Create and run the crew
    crew = Crew(
        agents=[agent],
        tasks=[task],
        verbose=True
    )
    crew.kickoff()

    print(f"✅ Final interview output saved to '{FINAL_OUTPUT}'")

# === Entry Point ===
if __name__ == "__main__":
    run_interview_polishing_pipeline()
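
The concatenation step orders files with a plain `sorted()`, which compares names as strings, so `Q10-response.txt` would come before `Q2-response.txt` once an interview has ten or more questions. A minimal sketch of ordering by the numeric question index instead, assuming the `Q<n>-response.txt` naming used by the earlier listings; `sort_response_files` is a hypothetical helper, not part of the notebook:

```python
import re

RESPONSE_PATTERN = re.compile(r"^Q(\d+)-response\.txt$")

def sort_response_files(filenames):
    """Return response filenames ordered by their numeric question index."""
    numbered = []
    for name in filenames:
        match = RESPONSE_PATTERN.match(name)
        if match:  # silently skip files that do not follow the naming scheme
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]

if __name__ == "__main__":
    names = ["Q10-response.txt", "Q2-response.txt", "Q1-response.txt", "notes.txt"]
    print(sorted(names))               # string order: Q1, Q10, Q2, then notes.txt
    print(sort_response_files(names))  # numeric order: Q1, Q2, Q10
```

With the six questions defined in this notebook the two orderings agree, so this only matters for longer interviews.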

