# Automating the Modelling of Transformative Artificial Intelligence Risks

An Epistemic Framework for Leveraging Frontier AI Systems to Upscale Conditional Policy Assessments in Bayesian Networks on a Narrow Path towards Existential Safety

Valentin Jakob Meyer ([ORCID](https://orcid.org/0009-0006-0889-5269)) (University of Bayreuth; MCMP, LMU Munich)  
Dr. Timo Speith ([ORCID](https://orcid.org/0000-0002-6675-154X)) (University of Bayreuth)  
May 26, 2025

This thesis addresses coordination problems in AI safety by creating tools that automatically extract and formalize the assumptions and models underlying different approaches to AI governance. By transforming complex arguments into interactive visualizations showing relationships and probabilities, AMTAIR helps bridge communication gaps between technical researchers, policy specialists, and other stakeholders working to address risks from advanced AI.

# 1. Table of Contents

# Abstract

The coordination crisis in AI governance presents a paradoxical challenge: unprecedented investment in AI safety coexists with fundamental coordination failures across technical, policy, and ethical domains. These divisions systematically increase existential risk by creating safety gaps, misallocating resources, and fostering inconsistent approaches to interdependent problems. This thesis introduces AMTAIR (Automating Transformative AI Risk Modeling), a computational approach that addresses this coordination failure by automating the extraction of probabilistic world models from AI safety literature using frontier language models.

The AMTAIR system implements an end-to-end pipeline that transforms unstructured text into interactive Bayesian networks through a novel two-stage extraction process: first capturing argument structure in ArgDown format, then enhancing it with probability information in BayesDown. This approach bridges communication gaps between stakeholders by making implicit models explicit, enabling comparison across different worldviews, providing a common language for discussing probabilistic relationships, and supporting policy evaluation across diverse scenarios.

## 1.1 Figure Testing

<!-- TODO: Configure the Quarto Manuscript options: https://quarto.org/docs/manuscripts/components.html#including-notebooks -->

## 1.2 Grading

### 1.2.1 Research (10%)

-   demonstrates knowledge of the subject area as drawn from appropriate sources
-   incorporates insights from in-class discussions
-   draws on appropriate additional materials beyond those covered in class (primary as well as secondary sources)
-   covers relevant material at appropriate level of detail

## 1.3 Structure (10%)

-   outlines project clearly in the introduction

-   discussion follows a logical order

-   order of sections corresponds to outline

-   uses appropriate topic and transition sentences

-   employs proper signposting and cross-referencing throughout paper

-   sections are appropriately numbered and headlined

## 1.4 Callout Test — Language & Style (10%)

-   employs appropriate tone and academic language

-   uses effective and sophisticated sentence variety, diction, and vocabulary

-   employs correct English spelling and grammar

-   is clearly written and uses appropriate sentence complexity

-   communicates main points effectively / is easy to follow

-   formats citations and references correctly and consistently (e.g. (AUTHOR, YEAR) citation style)

-   names all primary and secondary sources

-   includes a complete list of references with full bibliographic details


# 2. Introduction

## 2.1 Introduction

10% of Grade:

-   introduces and motivates the core question or problem
-   provides context for discussion (places issue within a larger debate or sphere of relevance)
-   states precise thesis or position the author will argue for
-   provides roadmap indicating structure and key content points of the essay

~ 14% of text ~ 4200 words

-   introduces and motivates the core question or problem

<!-- [![Caption/Title 2](/images/cover.png){#fig-testgraphic2 fig-scap="Short 2 caption" fig-alt="2nd Alt Text / Description." fig-align="left" width="30%"}](https://vjmeyer.com) -->

<figure id="fig-testgraphic2">
<img src="attachment:thesis_files/figure-ipynb/467cfe0e-9efe-4272-9978-f71e8dbe2b7f-1-../images/cover.png" />
<figcaption>Figure 1: Caption/Title 2</figcaption>
</figure>

Testing cross-referencing graphics <a href="#fig-testgraphic2" class="quarto-xref">Figure 1</a>.

<!-- \listoffigures -->

## 2.2 Motivation: Problem Statement

## 2.3 Motivation: Research Question

-   provides context for discussion (places issue within a larger debate or sphere of relevance)

## 2.4 Scope: Aim & Context of the Research

## 2.5 Significance of the Research: Theory of Change

-   states precise thesis or position the author will argue for

## 2.6 Thesis Statement & Position: (Aim of the Paper)

-   provides roadmap indicating structure and key content points of the essay

## 2.7 Overview: Structure & Approach of the Paper (Roadmap — Theory of Change)

## 2.8 Table of Contents

## 2.9 Problem Statement — Motivation

Continued AI Progress:

-   Rapid advancements in AI technology increase both potential benefits and risks.

Existential Risks (AI X-Risk):

-   Advanced AI systems could pose significant threats if misaligned with human values.

Complexity Challenges:

-   The intricate nature of AI systems complicates policy formulation and understanding.

Limitations of Current Approaches:

-   MTAIR’s Reliance on Human Labor:
    -   Modeling Transformative AI Risks (MTAIR) is constrained by manual cognitive efforts.  
-   Need for Automation:
    -   Scaling and automating risk modeling is essential to keep pace with AI developments.

Opportunity:

-   Leveraging new technologies to enhance our ability to model and mitigate AI risks.

## 2.10 Aim of the Paper

### 2.10.1 Research Question & Scope

#### 2.10.1.1 Can frontier AI technologies be utilized to automate the modeling of transformative AI risks, so as to allow for the prediction of policy impacts?

Frontier AI Technology: Today’s most capable AI systems (e.g. GPT-4-level LLMs)  
Scaling Up: Automating the previously “manual” cognitive labor  
Modeling: Formalizing the world views underlying arguments  
Transformative AI: Level of AI capabilities defined by severe impact on the world  
Safety & Governance Literature: Publications, reports etc. concerned with risks from AI

Automated Estimation: Non-manual (AI systems + scaffolding), quantified evaluations  
Probability Distributions: Formal expressions of the expectations over future worlds  
Conditional Trees of Possible Worlds: “If … then…” reasoning over ways things may play out  
Forecasting Policy Impacts: Qualitative & quantitative evaluation of expected outcomes

### 2.10.2 Significance of the Research

### 2.10.3 

## 2.11 Theory of Change — Approach & Structure of the Paper

Multiplicative Benefits:

-   Automation × Live Prediction Market Integrations × Policy Impact Evaluations

Explanation:

Automation:

-   Increases efficiency and scalability of risk modeling.

Live Prediction Markets:

-   Provide up-to-date collective intelligence to inform models.

Policy Impact Evaluations:

-   Improve the accuracy and relevance of policy assessments.

Outcome:

-   Enhanced ability to develop effective policies that mitigate AI risks.

Visual Aid:

-   A diagram illustrating how each component amplifies the others, leading to greater overall impact.

## 2.12 

## 2.13 Overview / Table of Contents

# 3. Context

### 3.0.1 20% of Grade:

-   demonstrates understanding of all relevant core concepts

-   explains why the question/thesis/problem is relevant in student’s own words (supported by quotations)

-   situates it within the debate/course material

-   reconstructs selected arguments and identifies relevant assumptions

-   describes additional relevant material that has been consulted and integrates it with the course material as well as the research question/thesis/problem

~ 29% of text ~ 8700 words

1.  successively (chunk by chunk) introduce concepts/ideas, and 2. ground each in the existing literature

<!-- ---
title: "Background"
# Control if this file starts numbering
numbering:
  start-at: 2      # Start at Section 1
  level: 2         # Chapter level
--- -->

## 3.1 Theoretical Background Considerations

### 3.1.1 DAG / BayesNets

### 3.1.2 State of the art (MTAIR) — Explanation

#### 3.1.2.1 Carlsmith Model (Analytica)

### 3.1.3 (Intro) Example — Rain/Sprinkler/Lawn

/ Rain/Sprinkler/Lawn DAG / BayesNet — Extended Example

    …

Own Position/Argument: AMTAIR … Own Rain/Sprinkler/Lawn DAG / BayesNet Implementation
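The Rain/Sprinkler/Lawn network can be illustrated with a minimal sketch in plain Python. The conditional probability values below are commonly used textbook-style numbers and are assumptions for illustration only; they are not taken from this thesis or from the AMTAIR implementation, and the function names are hypothetical. The sketch shows how the DAG factorizes the joint distribution and how a posterior is obtained by enumeration.

```python
from itertools import product

# Illustrative parameters (assumptions, not values from this thesis).
P_RAIN = 0.2
# P(Sprinkler = on | Rain)
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
# P(Lawn = wet | Sprinkler, Rain), keyed by (sprinkler, rain)
P_WET_GIVEN = {
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def bernoulli(p_true, value):
    """Probability of a binary outcome, given P(outcome = True)."""
    return p_true if value else 1.0 - p_true


def joint(rain, sprinkler, wet):
    """Joint probability from the DAG factorization P(R) * P(S|R) * P(W|S,R)."""
    return (
        bernoulli(P_RAIN, rain)
        * bernoulli(P_SPRINKLER_GIVEN_RAIN[rain], sprinkler)
        * bernoulli(P_WET_GIVEN[(sprinkler, rain)], wet)
    )


def posterior_rain_given_wet():
    """P(Rain = True | Lawn = wet), summing out the sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(
        joint(r, s, True) for r, s in product((True, False), repeat=2)
    )
    return numerator / evidence


if __name__ == "__main__":
    print(round(posterior_rain_given_wet(), 4))
```

Under these assumed parameters, observing a wet lawn raises the probability of rain from the prior of 0.2 to roughly 0.36. Exact enumeration is used here only because the network is tiny; larger networks would call for a dedicated inference library.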

<!-- ---
title: "Methodology"
# Control if this file starts numbering
numbering:
  start-at: 2      # Start at Section 1
  level: 2         # Chapter level
--- -->

## 3.2 Methodology

MTAIR / Carlsmith Model (Analytica) — Explanation (— is motivation: should come first)

### 3.2.1 Kialo

### 3.2.2 Rain/Sprinkler/Lawn DAG

### 3.2.3 Bayes Server

### 3.2.4 BayesNet — Extended Example

### 3.2.5 Code + documentation

# 4. AMTAIR

### 4.0.1 20% of Grade: ~ 29% of text ~ 8700 words

-   provides critical or constructive evaluation of positions introduced
-   develops strong (plausible) argument in support of author’s own position/thesis
-   argument draws on relevant course material
-   demonstrates understanding of the course materials incl. key arguments and core concepts within the debate
-   claim/argument is original or insightful, possibly even presents an original contribution to the debate

## 4.1 Own Carlsmith Model Implementation — Explanation

## 4.2 Own Implementation: Good example from a published paper

## 4.3 Implementation


## 4.4 Results


<!-- No Headings after .md inclusion (creates a fatal bug with the ToC) -->

# 5. Discussion

## 5.1 Discussion

10% of Grade: ~ 14% of text ~ 4200 words

-   discusses a specific objection to student’s own argument
-   provides a convincing reply that bolsters or refines the main argument
-   relates to or extends beyond materials/arguments covered in class

# 6. Discussion — Exchange, Controversy & Influence

## 6.1 Challenges & Problems — Red Teaming Problems, Failures & Downsides

Potential Failures:

-   Data Issues: Inaccurate or biased inputs.

-   Model Limitations: Oversimplifications.

-   Tech Risks: AI misinterpretations.

Red Teaming:

-   Stress-testing models to find weaknesses.

Impact on Theory of Change:

-   Identifying points of failure strengthens the approach.

## 6.2 Implications & Impact: Uptake, Feedback Loops & Success (Green Teaming)

Potential Outcomes:

-   First-Order: Reduced AI risks through better policies.

-   Second-Order: Enhanced collaboration.

-   Third-Order: Framework applied to other global risks.

Feedback Loops:

-   Continuous model improvement.

-   Adaptive policy-making.

Green Teaming:

-   Strategies to maximize positive impacts.

## 6.3 Known Unknowns & Unknown Unknowns — Input Data Example: Modeling Author Worldviews from Bibliographies Instead of Individual Papers


# 7. Conclusion

## 7.1 The Current State of Things & How to Continue

10% of Grade: ~ 14% of text ~ 4200 words

-   summarizes thesis and line of argument
-   outlines possible implications
-   notes outstanding issues / limitations of discussion
-   points to avenues for further research
-   overall conclusion is in line with introduction

## 7.2 Summary — Key Takeaways & Findings

### 7.2.1 Assessing Policy Effects:

Evaluating how different policies alter P(Doom).

### 7.2.2 Conditional Probability:

Calculating P(Doom \| Policy Alpha).

### 7.2.3 Methodology:

Update model parameters based on policy implementation.

Recompute probabilities accordingly.
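The update-and-recompute step can be sketched as follows. This is a minimal illustration under stated assumptions: the model is reduced to a hypothetical chain of three conditional premises, every probability is a placeholder rather than an estimate from this thesis or the literature, and "Policy Alpha" is assumed to act by lowering one conditional probability. The names `BASELINE`, `POLICY_ALPHA`, `p_doom`, and `apply_policy` are invented for this example.

```python
from math import prod

# Hypothetical chain of conditional premises; all numbers are placeholders.
BASELINE = {
    "advanced_ai_built": 0.8,
    "misaligned_given_built": 0.4,
    "catastrophe_given_misaligned": 0.3,
}

# Hypothetical "Policy Alpha": assumed to halve one conditional probability.
POLICY_ALPHA = {"misaligned_given_built": 0.2}


def p_doom(params):
    """P(Doom) as the product of the chained conditional probabilities."""
    return prod(params.values())


def apply_policy(params, policy):
    """Return a copy of the parameters with the policy's overrides applied."""
    unknown = set(policy) - set(params)
    if unknown:
        raise KeyError(f"policy refers to unknown parameters: {sorted(unknown)}")
    return {**params, **policy}


baseline = p_doom(BASELINE)
with_policy = p_doom(apply_policy(BASELINE, POLICY_ALPHA))

if __name__ == "__main__":
    print(f"P(Doom)                = {baseline:.3f}")
    print(f"P(Doom | Policy Alpha) = {with_policy:.3f}")
    print(f"Relative reduction     = {1 - with_policy / baseline:.0%}")
```

A simple product applies only to a single chain of premises. In a full Bayesian network, the same two steps (override the affected conditional probability tables, then rerun inference) would replace the call to `p_doom`.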

### 7.2.4 Purpose:

Inform policymakers of potential policy effectiveness.

Prioritize interventions that significantly reduce risks.

## 7.3 Outlook: Next Steps & Further Research

### 7.3.1 Scaling Up:

-   Include more variables and data sources.

### 7.3.2 Collaboration:

-   Partner with policymakers and researchers.

### 7.3.3 Technological Enhancements:

-   Employ advanced AI techniques.

### 7.3.4 Potential Impact:

-   Influence global AI governance.

### 7.3.5 Limitations of the Analysis

### 7.3.6 Policy Implications & Recommendations

### 7.3.7 Areas for Future Research

### 7.3.8 Open Questions — Central/Remaining Questions & Feedback

#### 7.3.8.1 Questions:

-   How can we improve automation accuracy?  
-   What challenges exist in policy implementation?  
-   How do we mitigate AI model biases?  
-   How can interdisciplinary efforts enhance outcomes?

#### 7.3.8.2 Feedback:

-   Invite thoughts, critiques, and suggestions.

<!-- ## List of Figures {.lof} -->


## Bibliography/References

# Prefatory Apparatus: Illustrations and Terminology — Quick References

## MANUAL List of Tables

Table 1: Table name

Table 2: Table name

Table 3: Table name

## MANUAL List of Abbreviations

esp. especially

f., ff. following

incl. including

p., pp. page(s)

MAD Mutually Assured Destruction

## Glossary

term  
Definition of term

Another term  
Description of second term


# 8. Appendices

## 8.1 Appendices

## 8.2 Appendix A

## 8.3 Appendix B

## 8.4 Appendix C

## 8.5 Appendix D


<!-- ## Affidavit -->

# Notebooks