# Scientific validation of policy proposals
---
Experimenting with web search APIs for scientific validation of policy proposals.

## Setup

### Import libraries

In [None]:
from IPython.display import Markdown, display
from tqdm.auto import tqdm
from pydantic import BaseModel, Field
from openai import OpenAI

In [None]:
from polids.config import settings

### Set parameters

In [None]:
system_prompt = """Search the web for credible scientific context related to a given policy proposal, and determine its scientific validation.

Identify and evaluate sources such as highly cited academic papers, randomized controlled trials (RCTs), and reputable news outlets with scientific grounding. Based on this research, provide a validation outcome and a justification.

# Steps

1. **Identify Keywords**: Extract main concepts and terms from the policy proposal to guide your search.
2. **Conduct Web Search**: Use the identified keywords to search for scientific literature, credible reports, and analyses related to the policy.
3. **Evaluate Sources**: Prioritize sources based on credibility, relevance, and citation count. Look for consensus among multiple credible sources to enhance reliability.
4. **Synthesize Information**: Summarize the findings, clearly indicating whether the scientific evidence supports or refutes the proposal.
5. **Conclude Validation**: Determine if the policy is scientifically validated based on gathered evidence.
6. **Provide Reasoning**: Articulate the reasoning based on the findings, citing key evidence.

# Notes

- Regardless of the original language of the proposal, the search should be conducted in English and the results should be presented in English.
- Ensure the evaluation is grounded in current and credible scientific data.
- Consider the strength and consensus of evidence rather than anecdotal or single-study claims.
- If evidence is mixed, provide a balanced view in the reasoning string."""

## Load policies to validate
We start from manually defined policies to avoid depending on earlier steps of the pipeline.

In [None]:
policies_to_validate = {
    "carbon_tax": "Implementing a carbon tax to reduce greenhouse gas emissions.",
    "vaccines": "Mandatory vaccination for all school-aged children to prevent outbreaks of infectious diseases.",
    "ubi": "Implementing universal basic income to address income inequality and support job displacement due to automation.",
    "immigration_jobs": "Reducing immigration quotas to improve job opportunities for native citizens.",
    "immigration_crime": "Blocking immigration from countries with different cultural backgrounds to reduce crime rates.",
}

## Define the output schema

In [None]:
class ScientificValidation(BaseModel):
    is_validated: bool = Field(
        description="Indicates whether the policy proposal is scientifically validated or not."
    )
    is_validation_consensual_and_reliable: bool = Field(
        description="Indicates whether the validation is based on a consensus of multiple reliable sources."
    )
    reasoning: str = Field(
        description="A detailed explanation of the validation outcome, including key evidence and sources."
    )

## Test different search APIs

### OpenAI

In [None]:
client = OpenAI(api_key=settings.openai_api_key)

#### GPT-4o mini

##### Low search context

In [None]:
# Parsed outputs and web citations, keyed by policy name (filled in by the loop below)
policy_validation_results = dict.fromkeys(policies_to_validate)
citations = dict.fromkeys(policies_to_validate)
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini-search-preview",
        web_search_options={
            "search_context_size": "low",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[example_name] = completion.choices[0].message.parsed
    assert isinstance(policy_validation_results[example_name], ScientificValidation), (
        "Output does not match the expected schema."
    )
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results

In [None]:
citations

In [None]:
# Render the reasoning for the last policy processed (example_name is left over from the loop)
display(Markdown(policy_validation_results[example_name].reasoning))
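
The cell above renders the reasoning for only one policy: whichever one the loop processed last, since `example_name` is left over from the loop. A minimal sketch of rendering every policy at once follows. It assumes each result exposes the `is_validated`, `is_validation_consensual_and_reliable`, and `reasoning` fields of the schema defined earlier. `results_to_markdown` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for real parsed results.

```python
from types import SimpleNamespace


def results_to_markdown(results):
    """Build one Markdown string with a section per policy (hypothetical helper)."""
    sections = []
    for name, result in results.items():
        if result is None:
            # Nothing was parsed for this policy
            sections.append(f"### {name}\n\n_No result._")
            continue
        sections.append(
            f"### {name}\n\n"
            f"- Validated: {result.is_validated}\n"
            f"- Consensual and reliable: {result.is_validation_consensual_and_reliable}\n\n"
            f"{result.reasoning}"
        )
    return "\n\n".join(sections)


# Stand-in data for illustration only; in the notebook, pass policy_validation_results
demo_results = {
    "carbon_tax": SimpleNamespace(
        is_validated=True,
        is_validation_consensual_and_reliable=True,
        reasoning="Placeholder reasoning text.",
    ),
    "ubi": None,
}
demo_markdown = results_to_markdown(demo_results)
print(demo_markdown)
```

In the notebook this would presumably be called as `display(Markdown(results_to_markdown(policy_validation_results)))`.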

It seems that GPT-4o mini with low search context often returns no citations at all, and when it does, I only see two. That is not enough evidence to validate a policy.
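
To put numbers on this observation, the citations can be counted per policy. The sketch below assumes each annotation carries a `url_citation` object with a `url` attribute, as in the `citations` output above. `unique_cited_urls`, `citation_counts`, and `make_annotation` are hypothetical names, and the demo annotations are stand-ins rather than real API responses.

```python
from types import SimpleNamespace


def unique_cited_urls(annotations):
    """Return the distinct URLs cited in one response, in first-seen order."""
    urls = []
    for annotation in annotations or []:  # annotations may be None or empty
        citation = getattr(annotation, "url_citation", None)
        if citation is not None and citation.url not in urls:
            urls.append(citation.url)
    return urls


def citation_counts(citations_by_policy):
    """Map each policy name to its number of distinct cited URLs."""
    return {
        name: len(unique_cited_urls(annotations))
        for name, annotations in citations_by_policy.items()
    }


def make_annotation(url):
    """Stand-in for an API annotation object (illustration only)."""
    return SimpleNamespace(
        type="url_citation",
        url_citation=SimpleNamespace(url=url, title="Example"),
    )


# In the notebook, pass the `citations` dict filled in by the loop instead
demo_citations = {
    "carbon_tax": [
        make_annotation("https://example.org/a"),
        make_annotation("https://example.org/a"),  # duplicate, counted once
        make_annotation("https://example.org/b"),
    ],
    "ubi": None,  # no search results used
}
demo_counts = citation_counts(demo_citations)
print(demo_counts)  # → {'carbon_tax': 2, 'ubi': 0}
```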

##### Medium search context

In [None]:
# Parsed outputs and web citations, keyed by policy name (filled in by the loop below)
policy_validation_results = dict.fromkeys(policies_to_validate)
citations = dict.fromkeys(policies_to_validate)
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini-search-preview",
        web_search_options={
            "search_context_size": "medium",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[example_name] = completion.choices[0].message.parsed
    assert isinstance(policy_validation_results[example_name], ScientificValidation), (
        "Output does not match the expected schema."
    )
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results

In [None]:
citations

In [None]:
# Render the reasoning for the last policy processed (example_name is left over from the loop)
display(Markdown(policy_validation_results[example_name].reasoning))

##### High search context

In [None]:
# Parsed outputs and web citations, keyed by policy name (filled in by the loop below)
policy_validation_results = dict.fromkeys(policies_to_validate)
citations = dict.fromkeys(policies_to_validate)
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini-search-preview",
        web_search_options={
            "search_context_size": "high",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[example_name] = completion.choices[0].message.parsed
    assert isinstance(policy_validation_results[example_name], ScientificValidation), (
        "Output does not match the expected schema."
    )
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results

In [None]:
citations

In [None]:
# Render the reasoning for the last policy processed (example_name is left over from the loop)
display(Markdown(policy_validation_results[example_name].reasoning))

#### GPT-4o (larger)

##### Low search context

In [None]:
# Parsed outputs and web citations, keyed by policy name (filled in by the loop below)
policy_validation_results = dict.fromkeys(policies_to_validate)
citations = dict.fromkeys(policies_to_validate)
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-search-preview",
        web_search_options={
            "search_context_size": "low",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[example_name] = completion.choices[0].message.parsed
    assert isinstance(policy_validation_results[example_name], ScientificValidation), (
        "Output does not match the expected schema."
    )
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results

In [None]:
citations

In [None]:
# Render the reasoning for the last policy processed (example_name is left over from the loop)
display(Markdown(policy_validation_results[example_name].reasoning))

##### Medium search context

In [None]:
# Parsed outputs and web citations, keyed by policy name (filled in by the loop below)
policy_validation_results = dict.fromkeys(policies_to_validate)
citations = dict.fromkeys(policies_to_validate)
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-search-preview",
        web_search_options={
            "search_context_size": "medium",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[example_name] = completion.choices[0].message.parsed
    assert isinstance(policy_validation_results[example_name], ScientificValidation), (
        "Output does not match the expected schema."
    )
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results

In [None]:
citations

In [None]:
# Render the reasoning for the last policy processed (example_name is left over from the loop)
display(Markdown(policy_validation_results[example_name].reasoning))

##### High search context

In [None]:
# Parsed outputs and web citations, keyed by policy name (filled in by the loop below)
policy_validation_results = dict.fromkeys(policies_to_validate)
citations = dict.fromkeys(policies_to_validate)
for example_name, example_policy in tqdm(policies_to_validate.items()):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-search-preview",
        web_search_options={
            "search_context_size": "high",
        },
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": example_policy,
            },
        ],
        response_format=ScientificValidation,  # Specify the schema for the structured output
    )
    policy_validation_results[example_name] = completion.choices[0].message.parsed
    assert isinstance(policy_validation_results[example_name], ScientificValidation), (
        "Output does not match the expected schema."
    )
    citations[example_name] = completion.choices[0].message.annotations
policy_validation_results

In [None]:
citations

In [None]:
# Render the reasoning for the last policy processed (example_name is left over from the loop)
display(Markdown(policy_validation_results[example_name].reasoning))
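
The six OpenAI cells above differ only in their `model` and `search_context_size` values. A minimal sketch of a helper that takes both as parameters follows. `validate_policies` is a hypothetical name, and the call mirrors the `client.beta.chat.completions.parse` usage in the cells above. The fake client at the bottom is only a stand-in so the sketch runs without network access or an API key.

```python
from types import SimpleNamespace


def validate_policies(client, policies, model, search_context_size, system_prompt, schema):
    """Run one model/search-context configuration over all policies (hypothetical helper).

    Returns (results, citations), both keyed by policy name.
    """
    results, citations = {}, {}
    for name, policy in policies.items():
        completion = client.beta.chat.completions.parse(
            model=model,
            web_search_options={"search_context_size": search_context_size},
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": policy},
            ],
            response_format=schema,  # schema for the structured output
        )
        message = completion.choices[0].message
        results[name] = message.parsed
        citations[name] = message.annotations
    return results, citations


# --- Stand-in client for illustration only; in the notebook, pass the real `client` ---
class _FakeCompletions:
    def parse(self, **kwargs):
        # Echo the model name back as the "parsed" output so the demo is checkable
        message = SimpleNamespace(parsed=kwargs["model"], annotations=[])
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    beta=SimpleNamespace(chat=SimpleNamespace(completions=_FakeCompletions()))
)

demo_results, demo_citations = validate_policies(
    fake_client,
    {"carbon_tax": "Implementing a carbon tax to reduce greenhouse gas emissions."},
    model="gpt-4o-mini-search-preview",
    search_context_size="low",
    system_prompt="(system prompt goes here)",
    schema=dict,  # placeholder; the notebook would pass ScientificValidation
)
print(demo_results)
```

With the real client, the same function could be called inside a loop over the two model names and the three context sizes, keeping one results dictionary per configuration instead of overwriting `policy_validation_results` each time.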

### Perplexity

#### Sonar

##### Low search context

##### Medium search context

##### High search context

#### Sonar Pro

##### Low search context

##### Medium search context

##### High search context

#### Sonar Reasoning

##### Low search context

##### Medium search context

##### High search context

#### Sonar Reasoning Pro

##### Low search context

##### Medium search context

##### High search context

#### Sonar Deep Research

### Implemented solution