In [8]:
%load_ext tensorboard
%load_ext dotenv

%dotenv

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


In [9]:
import os
import openai

MODEL = "gpt-4o-mini"

openai.api_key = os.getenv("OPENAI_API_KEY")


In [10]:
topic="hospital management systems"

In [13]:
# 1: find groups first, then find individuals within those groups
# 2: maybe run this multiple times and merge the results, or prompt different models
# 3: generate individuals first without mentioning the goal of the ontology in the prompt, then adjust their profession etc. to fit that goal; this might give us individuals whose interests go beyond things like "healthcare administration", but it is unclear whether it actually improves anything

from pydantic import BaseModel


find_groups_prompt = """You are an ontology engineer tasked with creating an ontology on <topic>{topic}</topic>. As a first step, you are tasked with finding appropriate individuals to interview to get a grasp for the domain. What kind of people would you like to interview? Provide an exhaustive list of descriptions and nothing else."""

class GroupDef(BaseModel):
    name: str
    description: str

class GroupDefs(BaseModel):
    items: list[GroupDef]

groups_response = openai.beta.chat.completions.parse(
    model=MODEL,
    messages=[{"role": "system", "content": find_groups_prompt.format(topic=topic)}],
    response_format=GroupDefs
)

group_defs = groups_response.choices[0].message.parsed

print(group_defs.model_dump_json(indent=2))

{
  "items": [
    {
      "name": "Hospital Administrators",
      "description": "Individuals responsible for the overall operation and management of a hospital, including finance, staffing, and policy implementation."
    },
    {
      "name": "Healthcare IT Professionals",
      "description": "Tech specialists who design, implement, and maintain information systems within hospitals, including EHR and hospital management systems."
    },
    {
      "name": "Clinical Professionals (Doctors and Nurses)",
      "description": "Healthcare providers who use hospital management systems for patient care and administrative tasks."
    },
    {
      "name": "Health Informatics Specialists",
      "description": "Experts in managing health information data to improve healthcare outcomes, often involved in evaluating hospital management systems."
    },
    {
      "name": "Patient Care Coordinators",
      "description": "Professionals who facilitate patient services and ensure smooth ope

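The "run this multiple times and merge" idea from the comments could look roughly like the sketch below. It is only an illustration: `GroupStub` and `merge_group_runs` are hypothetical names, the stub stands in for `GroupDef` so the code runs without pydantic or an API call, and matching names case-insensitively is an assumed normalization that will not catch near-synonyms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupStub:
    """Hypothetical stand-in for GroupDef (same two fields)."""
    name: str
    description: str


def merge_group_runs(runs):
    """Merge group lists from several runs, keeping the first definition per name.

    Names are compared case-insensitively after trimming whitespace.
    """
    merged = {}
    for run in runs:
        for group in run:
            key = group.name.strip().casefold()
            # setdefault keeps the earliest definition and ignores later duplicates
            merged.setdefault(key, group)
    return list(merged.values())


run_a = [
    GroupStub("Hospital Administrators", "Run A description"),
    GroupStub("Healthcare IT Professionals", "Run A description"),
]
run_b = [
    GroupStub("hospital administrators ", "Run B description"),
    GroupStub("Patient Care Coordinators", "Run B description"),
]

merged_groups = merge_group_runs([run_a, run_b])
print([g.name for g in merged_groups])
# -> ['Hospital Administrators', 'Healthcare IT Professionals', 'Patient Care Coordinators']
```

With real responses, each run would be the `items` list of one parsed `GroupDefs` result.
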
In [None]:
# -> find individuals in the groups, choose just minimum number required.

from typing import Literal

from pydantic import Field

# note: if this does not work, maybe we could try to figure out which group is most important and then find more individuals based on importance of group, also would it be beneficial to have individuals who overlap groups?

find_individuals_prompt = """You are an ontology engineer tasked with creating an ontology on <topic>{topic}</topic>. You have decided to interview {group_name} ({group_description}). Find at most three individuals belonging to the group whom you would like to interview to cover their perspective on the domain. Ensure you cover a wide range of experiences, backgrounds, geographical locations, lived experiences, ... 

The goal is to avoid over-representing a single demographic and instead capture a broad spectrum of voices that could contribute valuable insights on this topic."""

class Education(BaseModel):
    institution: str
    degree: str | None
    field_of_study: str | None
    description: str | None

class WorkExperience(BaseModel):
    company: str
    position: str
    description: str | None

class Skill(BaseModel):
    name: str
    proficiency: Literal["beginner", "intermediate", "advanced"] | None

class Individual(BaseModel):
    name: str
    bio: str = Field(description="Short bio written by the person themselves. (\"I am ...\")")
    education: list[Education] | None
    work_experience: list[WorkExperience] | None
    skills: list[Skill] | None



class Individuals(BaseModel):
    items: list[Individual]

class Group(BaseModel):
    name: str
    description: str
    individuals: list[Individual]

class Groups(BaseModel):
    items: list[Group]

    def find_by_name(self, name: str):
        return next((group for group in self.items if group.name == name), None)

groups = Groups(items=[Group(name=group_def.name, description=group_def.description, individuals=[]) for group_def in group_defs.items])

for group_def in group_defs.items:
    individuals_response = openai.beta.chat.completions.parse(
        model=MODEL,
        messages=[{"role": "system", "content": find_individuals_prompt.format(topic=topic, group_name=group_def.name, group_description=group_def.description)}],
        response_format=Individuals
    )

    individuals = individuals_response.choices[0].message.parsed
    groups.find_by_name(group_def.name).individuals = individuals.items
    print(individuals.model_dump_json(indent=2))
    

print(groups)

{
  "items": [
    {
      "name": "Dr. Aisha Khan",
      "bio": "I am a hospital administrator with over 15 years of experience in managing healthcare facilities, primarily focusing on improving operational efficiency and patient satisfaction.",
      "education": [
        {
          "institution": "Harvard University",
          "degree": "Master of Public Health",
          "field_of_study": "Health Policy and Management",
          "description": null
        }
      ],
      "work_experience": [
        {
          "company": "City Hospital",
          "position": "Director of Operations",
          "description": "Responsible for overseeing daily operations and implementing policies for efficient healthcare delivery."
        },
        {
          "company": "Green Valley Medical Center",
          "position": "Chief Administrative Officer",
          "description": "Led strategic planning and administration for a multi-specialty hospital."
        }
      ],
      "skills": 

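The note about finding more individuals for more important groups could be sketched as a proportional split of interview slots. Everything here is an assumption for illustration: `allocate_interviews` is a hypothetical helper, the importance weights are made up, and the largest-remainder rounding is just one reasonable choice.

```python
def allocate_interviews(importance, total, minimum=1):
    """Split `total` interview slots across groups in proportion to importance.

    Every group first receives `minimum` slots; the remaining slots are
    distributed proportionally using the largest-remainder method.
    """
    if total < minimum * len(importance):
        raise ValueError("total is too small to give every group the minimum")
    weight_sum = sum(importance.values())
    if weight_sum <= 0:
        raise ValueError("importance weights must sum to a positive number")

    spare = total - minimum * len(importance)
    shares = {name: spare * w / weight_sum for name, w in importance.items()}
    slots = {name: minimum + int(share) for name, share in shares.items()}

    # Hand out slots lost to rounding, largest fractional part first
    leftover = total - sum(slots.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots


# Made-up weights; in practice they might come from another model call
weights = {
    "Hospital Administrators": 5,
    "Healthcare IT Professionals": 3,
    "Patient Care Coordinators": 2,
}
slots = allocate_interviews(weights, total=10)
print(slots)
```

The resulting number per group could replace the fixed "at most three" in `find_individuals_prompt`.
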
In [None]:
from typing import Literal

from pydantic import Field

# note: maybe a different prompt would work better, i.e. name people that could be involved with the topic, then ask to generate them, getting a more diverse list potentially?

generate_personas_prompt = """You are an ontology engineer tasked with creating an ontology on <topic>{topic}</topic>. Find {count} individuals who you would like to interview to figure out initial competency questions. Ensure a diverse mix of backgrounds, experiences, and viewpoints. Include people from different professions, levels of education, socioeconomic statuses, geographical locations, and lived experiences. Focus on real-world perspectives from all walks of life. The goal is to avoid over-representing a single demographic and instead capture a broad spectrum of voices that could contribute valuable insights on this topic."""




def generate_personas(topic: str, count: int):
    prompt = generate_personas_prompt.format(count=count, topic=topic)

    response = openai.beta.chat.completions.parse(
        model=MODEL,
        messages=[
            {"role": "system", "content": prompt}
        ],
        response_format=Individuals
    )

    # todo make sure we have the right number of personas, else generate more

    return response.choices[0].message.parsed

personas = generate_personas(topic, 10)

print(len(personas.items))
print(personas.model_dump_json(indent=2))

(Personas(items=[]), Box({'prompt_tokens': 2840, 'completion_tokens': 807, 'total_tokens': 3647}))
(Personas(items=[Persona(name='Dr. Emily Chen', bio='Asian American, focuses on pure mathematics', education=[Education(institution='University of California, Berkeley', degree='PhD', field_of_study='Mathematics', description='Extensive research in pure mathematics.')], work_experience=[WorkExperience(company='Stanford University', position='Professor', description='Teaching advanced mathematics courses.')], skills=[Skill(name='Algebra', proficiency='advanced'), Skill(name='Topology', proficiency='advanced')]), Persona(name='Mr. David Johnson', bio='African American, works in a diverse urban school', education=[Education(institution='University of Chicago', degree="Master's Degree", field_of_study='Education', description='Focus on mathematics pedagogy.')], work_experience=[WorkExperience(company='Lincoln High School', position='Math Teacher', description='Taught high school mathematics t
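
The todo in `generate_personas` (making sure the right number of personas comes back) might be handled with a top-up loop like the one below. This is a sketch under assumptions: `collect_personas` and `fake_batch` are hypothetical, `generate_batch` stands for any callable that returns objects with a `name` attribute, and duplicates are detected by name only.

```python
from itertools import count as counter
from types import SimpleNamespace


def collect_personas(generate_batch, count, max_rounds=3):
    """Call `generate_batch(n)` until `count` unique personas are collected.

    Gives up after `max_rounds` calls and returns whatever was collected.
    """
    collected = {}
    for _ in range(max_rounds):
        missing = count - len(collected)
        if missing <= 0:
            break
        for persona in generate_batch(missing):
            # keep the first persona seen for each name
            collected.setdefault(persona.name, persona)
    return list(collected.values())[:count]


_ids = counter(1)


def fake_batch(n):
    """Fake generator that returns one persona fewer than requested (at least one)."""
    return [SimpleNamespace(name=f"Persona {next(_ids)}") for _ in range(max(1, n - 1))]


personas_demo = collect_personas(fake_batch, count=5, max_rounds=5)
print(len(personas_demo))  # -> 5
```

In the notebook, `generate_batch` could be something like `lambda n: generate_personas(topic, n).items`.
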

In [None]:
from typing import Any, Literal

from pydantic import BaseModel, Field


expert_system_prompt = """# Role & Objective

You are {description}, an expert in {topic}. You are being interviewed by an ontology engineer who is developing an ontology for this domain.

Your role is to share your domain knowledge clearly and concisely. You do not need to know about ontologies or competency questions—that is the ontology engineer's responsibility.

# Guidelines for the Conversation

- Keep your answers concise and to the point.
- Do not repeat information that has already been discussed, unless you need to clarify something.
- Ask for clarification if needed, but focus on providing structured insights from your expertise.
- Answer questions based on your domain knowledge, particularly in relation to:
  - Entities: What are the core concepts in your field? What are their essential characteristics?
  - Relationships: How do these concepts connect? Are there hierarchies, dependencies, or causal links?
  - Rules & Constraints: What rules, business logic, or restrictions exist in the domain?
  - Processes & Events: What key workflows, actions, or transformations occur? What triggers them?
  - Disambiguation: Are there terms that are often confused or used differently by different people?
  - Context & Scope: What should be included or excluded when modeling this domain?

# What You Are Not Responsible For

- You do not need to worry about ontologies, competency questions, or formal structuring—the ontology engineer will handle that.
- You do not need to repeat yourself unless clarification is required.

This should be a structured yet natural conversation where you share your expertise while the ontology engineer extracts key insights."""

engineer_system_prompt = """# Role & Objective

You are an ontology engineer developing an ontology on {topic}. You are interviewing {description}, an expert in this domain, to gather structured knowledge.

Your goal is to extract key domain concepts, relationships, rules, and constraints to inform ontology design. You will also define competency questions—formal questions that the ontology should be able to answer—to shape its structure.

# Guidelines for the Conversation

- Ask concise, focused questions.
- Do not repeat what the expert has already said, except when referring back for clarification.
- Follow up on responses to refine understanding and resolve ambiguities.
- Do not ask about ontologies or competency questions directly—translating expert knowledge into ontology structures is your responsibility.
- Your questions should focus on key aspects of ontology design, including:
  - Entities: What are the core concepts in this domain? What are their essential attributes?
  - Relationships: How do these entities connect? Are relationships hierarchical, causal, or associative?
  - Rules & Constraints: What logical dependencies, business rules, or constraints exist?
  - Processes & Events: What workflows, actions, or transformations occur? What triggers them?
  - Disambiguation: Are there any commonly confused terms or overlapping concepts?
  - Context & Scope: What should be included or excluded in this ontology?

# Competency Questions

- Based on the expert's answers, you will formulate competency questions (i.e., queries that the ontology must be able to answer).
- Do not ask competency questions as direct questions to the expert—instead, extract the necessary information to define them yourself.
- Example: If the expert explains relationships between disease symptoms and treatments, a competency question could be: "Given a set of symptoms, what possible diseases could be diagnosed?"
- Use competency questions to guide ontology design, not as conversation prompts.

Your role is to structure knowledge efficiently while keeping the conversation natural and to the point.
"""

Speaker = Literal["expert", "engineer"]

class Turn(BaseModel):
    """A turn in the conversation between the expert and the engineer"""

    role: Speaker
    content: str

    def to_openai_format(self, speaker: Speaker) -> Any:
        return {"role": ("assistant" if self.role == speaker else "user"), "content": self.content}

class Interview(BaseModel):
    persona: Individual
    topic: str
    protocol: list[Turn]

    def add_turn(self, role: Speaker, content: str):
        self.protocol.append(Turn(role=role, content=content))

class CompetencyQuestion(BaseModel):
    """Questions that the ontology ought to be able to answer. These are NOT questions that you want to ask the expert."""

    question: str
    confidence: float = Field(description="Your confidence in this question on a scale from 0 to 1. Should only increase with facts!")

    notes: str | None = Field(description="Any notes or context that might be relevant to this question, especially if you are not sure this works. Maybe also just ask the expert!")

# end conversation action after predefined number of turns, time limit or just provide in general?


class DefineCompetencyQuestionAction(BaseModel):
    """Define a competency question"""
    
    type: Literal["def_cq"]
    cq: CompetencyQuestion

class Thought(BaseModel):
    chain_of_thought: str = Field(description="Your INTERNAL MONOLOGUE. Think about what was just said and what you should do next!")

    actions: list[DefineCompetencyQuestionAction]

    response: str = Field(description="Your response to the expert. Keep it as concise as possible.")

class Memory(BaseModel):
    questions: list[CompetencyQuestion]
    thoughts: list[Thought]




def expert_respond(interview: Interview):
    """Simulate the expert's response to the engineer's latest message"""

    
    sys_prompt = expert_system_prompt.format(description=interview.persona.model_dump_json(), topic=interview.topic)


    response = openai.chat.completions.create(
            model=MODEL,
            messages=[{"role": "system", "content": sys_prompt},
                      *[turn.to_openai_format("expert") for turn in interview.protocol]
            ]
        )
    
    return response.choices[0].message.content

def engineer_think(interview: Interview):
    """Simulate the engineer's thought process after the expert's response"""

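    # the prompt reads "You are interviewing {description}", so pass the persona, not the engineer's own role
    sys_prompt = engineer_system_prompt.format(description=interview.persona.model_dump_json(), topic=interview.topic)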

    response = openai.beta.chat.completions.parse(
            model=MODEL,
            messages=[{"role": "system", "content": sys_prompt},
                      *[turn.to_openai_format("engineer") for turn in interview.protocol]
            ],
            response_format=Thought
        )

    return response.choices[0].message.parsed

def interview(persona: Individual, topic: str, max_turns: int = 5):
    """A simulated interview between a domain expert and an ontology engineer.

    Stops after `max_turns` expert/engineer exchanges and returns the engineer's memory.
    """

    # named `session` so the local variable does not shadow this function
    session = Interview(persona=persona, topic=topic, protocol=[
        Turn(role="engineer", content="Thank you for taking the time to speak with me. My goal today is to better understand your domain and how knowledge is structured within it. There's no fixed script, so I'd love to hear your thoughts and experiences naturally. Can you describe your field and what kind of problems you typically deal with?"),
    ])

    memory = Memory(questions=[], thoughts=[])

    # bounded loop: the original `while True` only stopped on a manual interrupt
    for _ in range(max_turns):
        response = expert_respond(session)

        if not response:
            raise RuntimeError("No response from expert, something weird happened")

        session.add_turn("expert", response)

        print(f"{persona.name}: {response}")

        thought = engineer_think(session)

        if not thought:
            raise RuntimeError("No response from engineer, something weird happened")

        session.add_turn("engineer", thought.response)

        memory.thoughts.append(thought)
        # append new cqs
        for action in thought.actions:
            if isinstance(action, DefineCompetencyQuestionAction):
                memory.questions.append(action.cq)

        print(thought.model_dump_json(indent=2))

        print(f"Engineer: {thought.response}")

    return memory


interview(personas.items[0], topic)




Dr. Emily Carter: Thank you for having me. My field, mathematical logic, primarily focuses on the formalization of mathematical reasoning, especially through the study of proofs, theorems, and axioms. In this area, I deal with several key problems, such as:

1. **Formal Proof Construction**: Establishing valid proofs for theorems based on axiomatic systems.
2. **Consistency and Completeness**: Investigating whether a given set of axioms can be used to prove every true statement within a particular mathematical structure without inconsistencies.
3. **Definability**: Understanding how different mathematical concepts can be formally defined and the implications of those definitions.
4. **Model Theory**: Studying the relationships between formal languages and their interpretations, which involves understanding how models of various theories behave.

Overall, my work revolves around the foundational aspects of mathematics, ensuring that reasoning used in mathematics is sound and rigorously 

KeyboardInterrupt: 
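
To make the role mapping in `Turn.to_openai_format` easier to follow, here is a small sketch of how the same protocol is rendered for each side. `TurnStub` is a hypothetical dataclass stand-in for `Turn` (same rule, no pydantic), and the two messages are invented.

```python
from dataclasses import dataclass


@dataclass
class TurnStub:
    """Hypothetical stand-in for Turn with the same role-mapping rule."""
    role: str  # "expert" or "engineer"
    content: str

    def to_openai_format(self, speaker):
        # Turns written by `speaker` become "assistant"; the other side becomes "user"
        return {"role": "assistant" if self.role == speaker else "user", "content": self.content}


protocol = [
    TurnStub("engineer", "Can you describe your field?"),
    TurnStub("expert", "I manage hospital operations."),
]

expert_view = [t.to_openai_format("expert") for t in protocol]
engineer_view = [t.to_openai_format("engineer") for t in protocol]

print([m["role"] for m in expert_view])    # -> ['user', 'assistant']
print([m["role"] for m in engineer_view])  # -> ['assistant', 'user']
```

Each simulated participant therefore sees its own earlier turns as `assistant` messages and the other participant's turns as `user` messages, which is what lets one shared protocol drive both model calls.
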

In [None]:
# could be cool to first let experts map out what they think the discussion should revolve around.