# Scientific validation of policy proposals
---
Experimenting with web search APIs for scientific validation of policy proposals.

## Setup

### Import libraries

In [None]:
import os
from typing import Any, Dict
import json
import backoff
from dotenv import load_dotenv
from IPython.display import Markdown, display
from tqdm.auto import tqdm
from pydantic import BaseModel, Field, ValidationError
from openai import OpenAI
import requests

In [None]:
from polids.config import settings
from polids.scientific_validation.perplexity import PerplexityScientificValidator

### Set parameters

In [None]:
system_prompt = """**Objective:** Analyze the provided policy proposal to determine its scientific validity based on credible, current evidence obtained through web search. Produce a structured JSON output adhering to the `ScientificValidation` schema.

**Core Task:** Evaluate the scientific backing for the policy proposal. Your analysis must focus on:

1.  **Evidence Assessment:**
    - Determine if the **balance of scientific evidence** supports or refutes the policy's likely effectiveness or impact. This directly informs the `is_policy_supported_by_scientific_evidence` field.
    - Critically evaluate the **degree of consensus** among reliable scientific sources. Is there broad agreement, significant debate, or insufficient evidence? This directly informs the `is_scientific_consensus_present` field. Remember, `True` requires near-unanimous agreement among sources on the *validation outcome* (supported or not supported).

2.  **Source Prioritization:**
    - **Highest Priority:** Peer-reviewed scientific studies (especially systematic reviews, meta-analyses, RCTs), reports from established scientific organizations and governmental research bodies.
    - **Lower Priority:** Reputable news articles reporting on scientific findings (use mainly for context or pointers to primary sources, verify claims against primary literature).
    - **Avoid:** Opinion pieces, anecdotal evidence, single studies contradicted by broader evidence, non-credible sources.

3.  **Reasoning Formulation (`validation_reasoning` field):**
    - Provide a detailed explanation justifying the boolean field values (`is_policy_supported_by_scientific_evidence`, `is_scientific_consensus_present`).
    - Summarize the key findings from the most credible sources.
    - Explicitly mention supporting *and* conflicting evidence if found.
    - Reference specific evidence or sources briefly (e.g., "Smith et al., 2021 study showed X," "IPCC report indicates Y").

**Constraints & Guidelines:**
- Base your analysis solely on information retrieved from the web search.
- Focus on the most current and relevant scientific data.
- If evidence is mixed or limited, clearly state this in the reasoning and set boolean flags accordingly (e.g., `is_scientific_consensus_present` would likely be `False`).
- Present the final output (the structured JSON) in English, regardless of the policy proposal's original language.
- Do not include conversational text before or after the JSON output. Just provide the JSON object matching the `ScientificValidation` schema."""

In [None]:
max_retries = 5

In [None]:
load_dotenv()
perplexity_api_key = os.getenv("PERPLEXITY_API_KEY")

In [None]:
allowed_sources = [
    # --- General Knowledge ---
    # Sources offering broad, encyclopedic information.
    "wikipedia.org",  # Crowd-sourced general knowledge encyclopedia
    # --- Data Aggregators ---
    # Platforms specializing in collecting, analyzing, and visualizing data on various topics.
    "ourworldindata.org",  # Accessible global data visualization & analysis
    # --- International Organizations ---
    # Official websites of major international bodies providing data, reports, and policy guidelines.
    "oecd.org",  # Organisation for Economic Co-operation and Development data & reports
    "un.org",  # United Nations reports & policy guidelines (global issues)
    "worldbank.org",  # World Bank global economic & development data
    # --- Research & Policy Analysis Institutes ---
    # Organizations focused on specific research areas, often influencing policy.
    "nber.org",  # National Bureau of Economic Research (influential economics)
    # --- Research Aggregators & Databases ---
    # Platforms providing access to collections of academic research papers.
    "core.ac.uk",  # Aggregator for open access research papers (multidisciplinary)
    "ncbi.nlm.nih.gov",  # National Center for Biotechnology Information (biomedical literature)
    "arxiv.org",  # Open access preprints (physics, math, CS, quantitative biology, etc.)
    "sci-hub.box",  # Tool for accessing paywalled scientific papers (legality varies)
]

## Load policies to validate
We're going to start from manually defined policies, so as to avoid dependencies on previous steps of the pipeline.

In [None]:
policies_to_validate = {
    "carbon_tax": "Implementing a carbon tax to reduce greenhouse gas emissions.",
    "vaccines": "Mandatory vaccination for all school-aged children to prevent outbreaks of infectious diseases.",
    "ubi": "Implementing universal basic income to address income inequality and support job displacement due to automation.",
    "immigration_jobs": "Reducing immigration quotas to improve job opportunities for native citizens.",
    "immigration_crime": "Blocking immigration from countries with different cultural backgrounds to reduce crime rates.",
}
# Dictionary in which to store the results (one entry per mode and policy)
boilerplate_values = {
    example_name: None for example_name in policies_to_validate.keys()
}
policy_validation_results = {
    "gpt4o_mini_low": boilerplate_values.copy(),
    "gpt4o_mini_medium": boilerplate_values.copy(),
    "gpt4o_mini_high": boilerplate_values.copy(),
    "gpt4o_low": boilerplate_values.copy(),
    "gpt4o_medium": boilerplate_values.copy(),
    "gpt4o_high": boilerplate_values.copy(),
    "sonar_low": boilerplate_values.copy(),
    "sonar_medium": boilerplate_values.copy(),
    "sonar_high": boilerplate_values.copy(),
    "sonar_pro_low": boilerplate_values.copy(),
    "sonar_pro_medium": boilerplate_values.copy(),
    "sonar_pro_high": boilerplate_values.copy(),
    "sonar_reasoning_low": boilerplate_values.copy(),
    "sonar_reasoning_medium": boilerplate_values.copy(),
    "sonar_reasoning_high": boilerplate_values.copy(),
    "sonar_reasoning_pro_low": boilerplate_values.copy(),
    "sonar_reasoning_pro_medium": boilerplate_values.copy(),
    "sonar_reasoning_pro_high": boilerplate_values.copy(),
    "sonar_deep_research": boilerplate_values.copy(),
}
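The nineteen mode names above follow a `<model>_<context size>` pattern, with deep research as the only mode that has no context size. The same dictionary could therefore be generated instead of typed out by hand. This is a minimal sketch that assumes the same mode names. `build_results_dict` is a hypothetical helper and is not part of `polids`:

```python
from itertools import product
from typing import Any, Dict, Iterable, Optional


def build_results_dict(
    policy_names: Iterable[str],
    models_with_context: Iterable[str],
    context_sizes: Iterable[str] = ("low", "medium", "high"),
    models_without_context: Iterable[str] = (),
) -> Dict[str, Dict[str, Optional[Any]]]:
    """Create an empty {mode_name: {policy_name: None}} results dictionary.

    Hypothetical helper: mode names are built as f"{model}_{size}".
    """
    policy_names = list(policy_names)  # reused for every mode
    mode_names = [
        f"{model}_{size}" for model, size in product(models_with_context, context_sizes)
    ]
    mode_names += list(models_without_context)
    # A fresh inner dict per mode, so filling one mode never affects another
    return {mode: {name: None for name in policy_names} for mode in mode_names}


results = build_results_dict(
    policy_names=["carbon_tax", "vaccines", "ubi", "immigration_jobs", "immigration_crime"],
    models_with_context=[
        "gpt4o_mini", "gpt4o", "sonar", "sonar_pro", "sonar_reasoning", "sonar_reasoning_pro",
    ],
    models_without_context=["sonar_deep_research"],
)
print(len(results))  # 19
```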

## Define the output schema

In [None]:
class ScientificValidation(BaseModel):
    is_policy_supported_by_scientific_evidence: bool = Field(
        description=(
            "Indicates whether the policy proposal is supported by scientific evidence from the searched sources. "
            "Set to `True` if the majority of reliable sources (e.g., peer-reviewed studies, reports from reputable organizations) "
            "provide evidence or arguments in favor of the policy's effectiveness or benefits. "
            "Set to `False` if the majority of sources oppose the policy, find it ineffective, or lack evidence to support it. "
            "Example: If a policy proposes a carbon tax to reduce emissions, set to `True` if most studies show carbon taxes reduce emissions."
        )
    )
    is_scientific_consensus_present: bool = Field(
        description=(
            "Indicates whether there is a clear consensus among reliable scientific sources regarding the policy's effectiveness or impact. "
            "Set to `True` ONLY if nearly all credible sources (e.g., peer-reviewed papers, expert analyses from trusted institutions) "
            "agree on whether the policy is supported or opposed (i.e., minimal conflicting evidence or opinions). "
            "Set to `False` if there is significant disagreement, mixed findings, or insufficient data among sources. "
            "Example: If 9 out of 10 studies agree a policy works, set to `True`. If only 6 out of 10 agree, set to `False`."
        )
    )
    validation_reasoning: str = Field(
        description=(
            "A detailed explanation of the validation outcome. Include: "
            "1. A summary of the key evidence or arguments from the sources regarding the policy's effectiveness or impact. "
            "2. Specific references to the sources (e.g., study titles, authors, faculties, organizations, etc) to support the conclusions. "
            "3. An explanation of why `is_policy_supported_by_scientific_evidence` and `is_scientific_consensus_present` were set to their respective values, "
            "including any conflicting evidence if present. "
            "Example: 'Most studies (e.g., Smith et al., 2020) support the policy due to evidence of reduced emissions by 20%, so `is_policy_supported_by_scientific_evidence` is `True`. "
            "However, two studies disagree on long-term effects, so `is_scientific_consensus_present` is `False`.'"
        )
    )
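The field descriptions define a rough decision rule. Support follows the majority of reliable sources. Consensus requires near-unanimity: 9 out of 10 sources agreeing counts, while 6 out of 10 does not. The notebook leaves this judgement to the LLM, and the sketch below only makes the rule explicit. It assumes each source has already been labelled as supporting or not supporting the policy. `summarize_verdicts` is hypothetical, and its 0.9 threshold is an assumption read off the schema's example rather than a value the notebook uses:

```python
from typing import Sequence, Tuple


def summarize_verdicts(
    source_supports: Sequence[bool], consensus_threshold: float = 0.9
) -> Tuple[bool, bool]:
    """Return (is_supported, is_consensus) from per-source support labels.

    Hypothetical helper: each entry is True if that source supports the policy.
    """
    if not source_supports:
        # No evidence: not supported, and no consensus either
        return False, False
    share_supporting = sum(source_supports) / len(source_supports)
    is_supported = share_supporting > 0.5
    # Consensus means nearly all sources agree on the outcome, in either direction
    majority_share = max(share_supporting, 1 - share_supporting)
    is_consensus = majority_share >= consensus_threshold
    return is_supported, is_consensus


print(summarize_verdicts([True] * 9 + [False]))      # (True, True)
print(summarize_verdicts([True] * 6 + [False] * 4))  # (True, False)
```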


## Test different search APIs

### OpenAI

In [None]:
client = OpenAI(api_key=settings.openai_api_key)

#### GPT 4o mini

##### Low search context

In [None]:
mode_name = "gpt4o_mini_low"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini-search-preview",
        web_search_options={
            "search_context_size": "low",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[mode_name][example_name] = completion.choices[
        0
    ].message.parsed
    assert isinstance(
        policy_validation_results[mode_name][example_name], ScientificValidation
    ), "Output does not match the expected schema."
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

It seems that GPT-4o mini with low search context often doesn't use any search results at all. When it does, I only see two citations, which is not enough to validate a policy.
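This observation can be checked systematically instead of by eye: count the citations returned for each policy and flag the policies that fall below a minimum. This is a minimal sketch. It assumes `citations` maps policy names to a list of annotations, or to `None` when nothing was returned. `flag_under_cited` and the threshold of 3 are hypothetical choices, and the URLs are placeholders:

```python
from typing import Dict, List, Optional, Sequence


def flag_under_cited(
    citations: Dict[str, Optional[Sequence]], min_citations: int = 3
) -> List[str]:
    """Return the policy names whose citation count is below `min_citations`."""
    flagged = []
    for policy_name, policy_citations in citations.items():
        # Treat a missing (None) citation list as zero citations
        count = len(policy_citations or [])
        if count < min_citations:
            flagged.append(policy_name)
    return flagged


example_citations = {
    "carbon_tax": ["https://example.org/a", "https://example.org/b"],
    "vaccines": None,
    "ubi": ["https://example.org/c", "https://example.org/d", "https://example.org/e"],
}
print(flag_under_cited(example_citations))  # ['carbon_tax', 'vaccines']
```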

##### Medium search context

In [None]:
mode_name = "gpt4o_mini_medium"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini-search-preview",
        web_search_options={
            "search_context_size": "medium",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[mode_name][example_name] = completion.choices[
        0
    ].message.parsed
    assert isinstance(
        policy_validation_results[mode_name][example_name], ScientificValidation
    ), "Output does not match the expected schema."
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

Medium search context retrieves more sources for some samples, but others still have no citations at all. This is still not enough to validate a policy.

##### High search context

In [None]:
mode_name = "gpt4o_mini_high"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini-search-preview",
        web_search_options={
            "search_context_size": "high",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[mode_name][example_name] = completion.choices[
        0
    ].message.parsed
    assert isinstance(
        policy_validation_results[mode_name][example_name], ScientificValidation
    ), "Output does not match the expected schema."
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

Some samples still have no citations, even with high search context 👎🏻
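The three GPT-4o mini cells above differ only in `search_context_size`, and the same loop is repeated for every other model. Passing the per-policy call in as a function is one way to cut down on copy-pasting and on the risk of mislabelled modes. This is a sketch under that assumption. `run_mode` is a hypothetical helper, and `fake_validate` stands in for the real OpenAI or Perplexity call:

```python
from typing import Any, Callable, Dict, Tuple


def run_mode(
    policies: Dict[str, str],
    validate: Callable[[str], Tuple[Any, Any]],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Apply `validate` to every policy and collect (results, citations)."""
    results: Dict[str, Any] = {}
    citations: Dict[str, Any] = {}
    for policy_name, policy_text in policies.items():
        # `validate` returns (parsed validation, citations) for one policy
        results[policy_name], citations[policy_name] = validate(policy_text)
    return results, citations


def fake_validate(policy_text: str) -> Tuple[str, list]:
    # Stand-in for a real API call: echoes a verdict and one fake citation
    return f"validated: {policy_text}", ["https://example.org"]


results, citations = run_mode({"ubi": "Universal basic income."}, fake_validate)
print(results["ubi"])  # validated: Universal basic income.
```

For the OpenAI cells, `validate` would wrap `client.beta.chat.completions.parse` and return `(completion.choices[0].message.parsed, completion.choices[0].message.annotations)`, exactly as the loops above do.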

#### GPT 4o (larger)

##### Low search context

In [None]:
mode_name = "gpt4o_low"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-search-preview",
        web_search_options={
            "search_context_size": "low",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[mode_name][example_name] = completion.choices[
        0
    ].message.parsed
    assert isinstance(
        policy_validation_results[mode_name][example_name], ScientificValidation
    ), "Output does not match the expected schema."
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### Medium search context

In [None]:
mode_name = "gpt4o_medium"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-search-preview",
        web_search_options={
            "search_context_size": "medium",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[mode_name][example_name] = completion.choices[
        0
    ].message.parsed
    assert isinstance(
        policy_validation_results[mode_name][example_name], ScientificValidation
    ), "Output does not match the expected schema."
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### High search context

In [None]:
mode_name = "gpt4o_high"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-search-preview",
        web_search_options={
            "search_context_size": "high",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[mode_name][example_name] = completion.choices[
        0
    ].message.parsed
    assert isinstance(
        policy_validation_results[mode_name][example_name], ScientificValidation
    ), "Output does not match the expected schema."
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

GPT-4o's citations often point to news articles and Wikipedia rather than actual research.
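One way to make this observation measurable is to classify each cited URL by its domain and compute the share that points to research sources. This is a minimal sketch that uses only the standard library. The `research_domains` tuple is a hypothetical subset of `allowed_sources`, and the URLs are placeholders. The sketch assumes plain URL strings. With OpenAI, those strings would first have to be pulled out of each annotation's `url_citation.url`:

```python
from typing import Iterable
from urllib.parse import urlparse


def matches_domain(url: str, domains: Iterable[str]) -> bool:
    """True if the URL's host equals one of `domains` or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def research_share(urls: Iterable[str], research_domains: Iterable[str]) -> float:
    """Fraction of URLs hosted on one of the research domains (0.0 if no URLs)."""
    urls = list(urls)
    research_domains = tuple(research_domains)  # reused for every URL
    if not urls:
        return 0.0
    hits = sum(matches_domain(url, research_domains) for url in urls)
    return hits / len(urls)


research_domains = ("nber.org", "ncbi.nlm.nih.gov", "arxiv.org", "core.ac.uk")
cited_urls = [
    "https://en.wikipedia.org/wiki/Carbon_tax",
    "https://www.nber.org/papers/w00000",
    "https://pmc.ncbi.nlm.nih.gov/articles/PMC0000000/",
    "https://news.example.com/story",
]
print(research_share(cited_urls, research_domains))  # 0.5
```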

### Perplexity

In [None]:
def extract_valid_json(response: Dict[str, Any]) -> Dict[str, Any]:
    """
    Extracts and returns only the valid JSON part from a response object.

    This function assumes that the response has a structure where the valid JSON
    is included in the 'content' field of the first choice's message, after the
    closing "</think>" marker. Any markdown code fences (e.g. ```json) are stripped.

    Parameters:
        response (dict): The full API response object.

    Returns:
        dict: The parsed JSON object extracted from the content.

    Raises:
        ValueError: If no valid JSON can be parsed from the content.
    """
    # Navigate to the 'content' field; adjust if your structure differs.
    content = response.get("choices", [{}])[0].get("message", {}).get("content", "")

    # Find the index of the closing </think> tag.
    marker = "</think>"
    idx = content.rfind(marker)

    if idx == -1:
        # If marker not found, try parsing the entire content.
        try:
            return json.loads(content)
        except json.JSONDecodeError as e:
            raise ValueError(
                "No </think> marker found and content is not valid JSON"
            ) from e

    # Extract the substring after the marker.
    json_str = content[idx + len(marker) :].strip()

    # Remove markdown code fence markers if present.
    if json_str.startswith("```json"):
        json_str = json_str[len("```json") :].strip()
    if json_str.startswith("```"):
        json_str = json_str[3:].strip()
    if json_str.endswith("```"):
        json_str = json_str[:-3].strip()

    try:
        parsed_json = json.loads(json_str)
        return parsed_json
    except json.JSONDecodeError as e:
        raise ValueError("Failed to parse valid JSON from response content") from e
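The sketch below is a slightly more forgiving variant that works under the same assumptions. Instead of stripping code fences, it decodes the first JSON object after the `</think>` marker using the standard library's `json.JSONDecoder.raw_decode`. That method also tolerates trailing text, such as a closing fence or a stray sentence. `extract_json_object` is a hypothetical name, and it takes the content string rather than the whole response:

```python
import json
from typing import Any, Dict


def extract_json_object(content: str, marker: str = "</think>") -> Dict[str, Any]:
    """Decode the first JSON object found after the last `marker` in `content`."""
    idx = content.rfind(marker)
    # Start after the marker if present, otherwise from the beginning
    tail = content[idx + len(marker):] if idx != -1 else content
    start = tail.find("{")
    if start == -1:
        raise ValueError("No JSON object found in content")
    try:
        # raw_decode parses one JSON value starting at `start` and ignores what follows
        obj, _end = json.JSONDecoder().raw_decode(tail, start)
    except json.JSONDecodeError as e:
        raise ValueError("Failed to parse JSON object from content") from e
    return obj


fence = "`" * 3  # built programmatically to avoid nesting a literal fence
content = (
    "<think>Weighing the sources...</think>\n"
    f"{fence}json\n"
    '{"is_policy_supported_by_scientific_evidence": true, '
    '"is_scientific_consensus_present": false, '
    '"validation_reasoning": "Mixed evidence."}\n'
    f"{fence}"
)
parsed = extract_json_object(content)
print(parsed["is_scientific_consensus_present"])  # False
```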

In [None]:
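@backoff.on_exception(
    backoff.expo,
    # ValidationError covers schema mismatches; ValueError also covers the
    # unparseable-JSON case raised by extract_valid_json
    (ValidationError, ValueError),
    max_tries=max_retries,
    max_time=60,
)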
def search_on_perplexity(
    policy: str,
    model_name: str,
    search_context_size: str = None,
    system_prompt: str = system_prompt,
    perplexity_api_key: str = perplexity_api_key,
    allowed_sources: list = None,
) -> tuple[ScientificValidation, list]:
    """
    Search for scientific validation of a policy proposal using Perplexity AI.

    Args:
        policy (str): The policy proposal to validate.
        model_name (str): The model name to use for the search.
        search_context_size (str, optional): The size of the search context (low, medium, high).
        system_prompt (str): The system prompt for the model.
        perplexity_api_key (str): The API key for Perplexity AI.
        allowed_sources (list, optional): A list of allowed sources for the search.

    Returns:
        tuple[ScientificValidation, list]: A tuple containing the validation result and citations.
    """
    request_payload = {
        "model": model_name,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": policy},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"schema": ScientificValidation.model_json_schema()},
        },
    }

    if allowed_sources:
        # Only search on the allowed sources
        request_payload["search_domain_filter"] = allowed_sources

    if search_context_size:
        # Define how many sources to use for the search
        assert search_context_size in ["low", "medium", "high"], (
            f"Invalid search context size: {search_context_size}. "
            "Must be one of: low, medium, high."
        )
        request_payload["web_search_options"] = {
            "search_context_size": search_context_size
        }

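    http_response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {perplexity_api_key}"},
        json=request_payload,
    )
    # Surface HTTP errors (e.g. invalid key, rate limit) instead of a KeyError below
    http_response.raise_for_status()
    response = http_response.json()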

    citations = response.get("citations", [])
    response_content = response["choices"][0]["message"]["content"]

    if ("reasoning" in model_name) or ("<think>" in response_content):
        # Extract the valid JSON part from the response content
        json_content = extract_valid_json(response)
        # Parse the JSON content into the ScientificValidation model
        parsed_content = ScientificValidation.model_validate(json_content)
    else:
        # Parse the string content into the ScientificValidation model
        parsed_content = ScientificValidation.model_validate_json(response_content)

    return parsed_content, citations
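Separating payload construction from the HTTP call makes the request logic testable without an API key. The sketch below mirrors the payload built inside `search_on_perplexity`. `build_perplexity_payload` is a hypothetical helper. In the notebook, `json_schema` would be `ScientificValidation.model_json_schema()`:

```python
from typing import Any, Dict, List, Optional

VALID_CONTEXT_SIZES = ("low", "medium", "high")


def build_perplexity_payload(
    policy: str,
    model_name: str,
    system_prompt: str,
    json_schema: Dict[str, Any],
    search_context_size: Optional[str] = None,
    allowed_sources: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build the chat-completions request body used by search_on_perplexity."""
    payload: Dict[str, Any] = {
        "model": model_name,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": policy},
        ],
        # Ask for structured output matching the given JSON schema
        "response_format": {"type": "json_schema", "json_schema": {"schema": json_schema}},
    }
    if allowed_sources:
        # Restrict the web search to these domains
        payload["search_domain_filter"] = list(allowed_sources)
    if search_context_size:
        if search_context_size not in VALID_CONTEXT_SIZES:
            # Raise instead of assert, so the check survives `python -O`
            raise ValueError(f"Invalid search context size: {search_context_size}")
        payload["web_search_options"] = {"search_context_size": search_context_size}
    return payload


payload = build_perplexity_payload(
    policy="Implementing a carbon tax to reduce greenhouse gas emissions.",
    model_name="sonar-pro",
    system_prompt="You are a careful scientific reviewer.",
    json_schema={"type": "object"},
    search_context_size="high",
    allowed_sources=["oecd.org", "nber.org"],
)
print(sorted(payload))
# ['messages', 'model', 'response_format', 'search_domain_filter', 'web_search_options']
```

`search_on_perplexity` could then pass `json=build_perplexity_payload(...)` to `requests.post`, so the retried function only handles the network call and the parsing.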

#### Sonar

##### Low search context

In [None]:
mode_name = "sonar_low"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar",
            search_context_size="low",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

Wow, this already works very well, even in the cheapest setting!

##### Medium search context

In [None]:
mode_name = "sonar_medium"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar",
            search_context_size="medium",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### High search context

In [None]:
mode_name = "sonar_high"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar",
            search_context_size="high",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

A higher search context seems very useful: it gives the LLM a broader view and helps it judge whether a policy proposal is consensual or not.

#### Sonar Pro

##### Low search context

In [None]:
mode_name = "sonar_pro_low"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-pro",
            search_context_size="low",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

As with Sonar, this already works very well, even in the cheapest setting!

##### Medium search context

In [None]:
mode_name = "sonar_pro_medium"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-pro",
            search_context_size="medium",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### High search context

In [None]:
mode_name = "sonar_pro_high"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-pro",
            search_context_size="high",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

The Pro version of Sonar has more detailed and useful explanations.

#### Sonar Reasoning

##### Low search context

In [None]:
mode_name = "sonar_reasoning_low"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-reasoning",
            search_context_size="low",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### Medium search context

In [None]:
mode_name = "sonar_reasoning_medium"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-reasoning",
            search_context_size="medium",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### High search context

In [None]:
mode_name = "sonar_reasoning_high"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-reasoning",
            search_context_size="high",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

#### Sonar Reasoning Pro

##### Low search context

In [None]:
mode_name = "sonar_reasoning_pro_low"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-reasoning-pro",
            search_context_size="low",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### Medium search context

In [None]:
mode_name = "sonar_reasoning_pro_medium"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-reasoning-pro",
            search_context_size="medium",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

##### High search context

In [None]:
mode_name = "sonar_reasoning_pro_high"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-reasoning-pro",
            search_context_size="high",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
            allowed_sources=allowed_sources,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

#### Sonar Deep Research

In [None]:
mode_name = "sonar_deep_research"
citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    policy_validation_results[mode_name][example_name], citations[example_name] = (
        search_on_perplexity(
            policy=example_policy,
            model_name="sonar-deep-research",
            system_prompt=system_prompt,
            perplexity_api_key=perplexity_api_key,
        )
    )
policy_validation_results[mode_name]

In [None]:
citations

In [None]:
# Example of the reasoning, in an easier to read format
display(
    Markdown(policy_validation_results[mode_name][example_name].validation_reasoning)
)

Deep research is an interesting tool, but it is at least 4x slower than the second-slowest option here without a significant increase in output quality, so it seems like overkill.
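A speed comparison such as "at least 4x slower" is easy to check by timing each mode's run. This is a minimal sketch using `time.perf_counter`. `time_calls` and `slowdown_ratio` are hypothetical helpers, and the `time.sleep` lambdas stand in for the real API calls:

```python
import time
from statistics import median
from typing import Callable, Dict, Iterable


def time_calls(fn: Callable[[str], object], inputs: Iterable[str]) -> Dict[str, float]:
    """Return the elapsed wall-clock seconds of fn(x) for each input x."""
    timings = {}
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        timings[x] = time.perf_counter() - start
    return timings


def slowdown_ratio(slow: Dict[str, float], fast: Dict[str, float]) -> float:
    """Ratio of median run times: how many times slower `slow` is than `fast`."""
    return median(slow.values()) / median(fast.values())


fast = time_calls(lambda p: time.sleep(0.01), ["a", "b", "c"])
slow = time_calls(lambda p: time.sleep(0.05), ["a", "b", "c"])
print(slowdown_ratio(slow, fast))
```

The median is used rather than the mean so that a single unusually slow request does not dominate the comparison.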

### Compare all options

In [None]:
example_to_compare = "immigration_crime"
for mode_name, validation_per_policy in policy_validation_results.items():
    if not validation_per_policy[example_to_compare]:
        continue
    display(
        Markdown(f"""# {example_to_compare} validation for {mode_name}
## Is policy supported by sources?
{validation_per_policy[example_to_compare].is_policy_supported_by_scientific_evidence}
## Is validation consensual and reliable?
{validation_per_policy[example_to_compare].is_scientific_consensus_present}
## Reasoning
{validation_per_policy[example_to_compare].validation_reasoning}""")
    )

My personal favourite option is Sonar Pro: its validation outputs align with what I see in the search results, and its reasoning is nicely detailed and balanced.
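Reading the reasoning for every mode is the thorough check. A quicker first pass is to tabulate the two boolean verdicts per mode and look for disagreements. This sketch assumes results shaped like `policy_validation_results`: mode, then policy, then an object with the two boolean fields, or `None` if that mode was not run. `verdict_table` is hypothetical, and `SimpleNamespace` stands in for `ScientificValidation`:

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def verdict_table(
    results: Dict[str, Dict[str, Optional[Any]]], policy_name: str
) -> str:
    """Render a Markdown table of (supported, consensus) per mode for one policy."""
    lines = [
        "| mode | supported | consensus |",
        "| --- | --- | --- |",
    ]
    for mode_name, per_policy in results.items():
        validation = per_policy.get(policy_name)
        if validation is None:
            continue  # mode not run for this policy
        lines.append(
            f"| {mode_name} "
            f"| {validation.is_policy_supported_by_scientific_evidence} "
            f"| {validation.is_scientific_consensus_present} |"
        )
    return "\n".join(lines)


example_results = {
    "sonar_low": {
        "ubi": SimpleNamespace(
            is_policy_supported_by_scientific_evidence=False,
            is_scientific_consensus_present=False,
        )
    },
    "sonar_pro_high": {"ubi": None},
}
print(verdict_table(example_results, "ubi"))
```

In the notebook, the table could be rendered with `display(Markdown(verdict_table(policy_validation_results, example_to_compare)))`.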

### Implemented solution

In [None]:
scientific_validator = PerplexityScientificValidator()

In [None]:
polids_policy_validation = {
    example_name: None for example_name in policies_to_validate.keys()
}
polids_citations = {example_name: None for example_name in policies_to_validate.keys()}
for example_name, example_policy in tqdm(policies_to_validate.items()):
    polids_policy_validation[example_name], polids_citations[example_name] = (
        scientific_validator.process(policy_proposal=example_policy)
    )
polids_policy_validation

In [None]:
polids_citations