# Prompt engineering on worst econ questions: answer verification, temperature, red/blue, context

In [1]:
%matplotlib inline

In [2]:
import pandas as pd
pd.set_option('display.max_columns', None)
from ollama_models import ollama_models
models = ollama_models()

In [3]:
models

['mistral-small3.2:24b-instruct-2506-q4_K_M']

In [4]:
model = models[0]

In [5]:
from load_saved_questions import load_saved_questions
from call_local_llm import call_local_llm

In [13]:
question = load_saved_questions([20683])[0]

In [14]:
from datetime import datetime
import time

from tqdm import tqdm

from community_forecast import community_forecast
from flatten_dict import flatten_dict
from gather_research_and_set_prompt import gather_research_and_set_prompt
from generate_forecasts_and_update_rag import generate_forecasts_and_update_rag

In [15]:
# Assign the question id before printing; printing first would show a stale
# value left over from an earlier run (or the builtin `id` in a fresh kernel).
question_id = question.id_of_question
print('START model', model, 'id', question_id)
start_time = time.time()
questions = [question]
id_to_forecast = {q.id_of_question: community_forecast(q) for q in questions}
df = pd.DataFrame([flatten_dict(q.api_json, sep='_') for q in questions])
df['id_of_post'] = [q.id_of_post for q in questions]
df['id_of_question'] = [q.id_of_question for q in questions]
df['question_options'] = df['question_options'].apply(repr)  # stored as a string; parsed back later
df['today'] = datetime.now().strftime("%Y-%m-%d")
df['crowd'] = df.apply(lambda row: id_to_forecast[row.id_of_question], axis=1)

START model mistral-small3.2:24b-instruct-2506-q4_K_M id 8509
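
The column names used later (`question_type`, `question_options`, and so on) suggest that `flatten_dict(..., sep='_')` joins nested keys with an underscore. The following is a minimal sketch of that idea, not the actual `flatten_dict` module; the function name `flatten` and the example record are hypothetical.

```python
def flatten(nested, sep="_", parent=""):
    """Flatten a nested dict into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in nested.items():
        name = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts; an empty sub-dict contributes no keys.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


# Hypothetical record shaped loosely like a question payload.
example = {"title": "T", "question": {"type": "multiple_choice", "options": ["A", "B"]}}
flat_example = flatten(example)
```

Lists are kept as single values here, which would match the later need to `repr` the `question_options` column.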


In [16]:
df1 = df[['id_of_question', 'id_of_post', 'today', 'open_time', 'scheduled_resolve_time', 'title',
          'question_resolution_criteria', 'question_fine_print', 'question_type',
          'question_description',
          'question_options', 'question_group_variable', 'question_question_weight',
          'question_unit', 'question_open_upper_bound', 'question_open_lower_bound',
          'question_scaling_range_max', 'question_scaling_range_min', 'question_scaling_zero_point',
          'crowd']].copy()

In [17]:
df2, rag = gather_research_and_set_prompt(df1, True, model)

Loaded existing index from forecast_index.faiss
Index contains 5839 vectors at initialization


In [23]:
df3 = df2[['id_of_question', 'title',
           'question_resolution_criteria', 'question_type', 'question_options',
           'question_description', 'crowd',
           'research', 'asknews']].copy()

In [24]:
row = df3.iloc[0]

In [25]:
row

id_of_question                                                              20683
title                           Which of Scott Aaronson's five AI worlds will ...
question_resolution_criteria    One option will resolve as **Yes**. The rest w...
question_type                                                     multiple_choice
question_options                ['AI-Fizzle', 'Futurama', 'AI-Dystopia', 'Sing...
question_description            Recent times have seen rapid progress in artif...
crowd                           {'AI-Fizzle': 0.1271267605633803, 'Futurama': ...
research                        ### 1. Key Historical Trends and Current Statu...
asknews                         Here are the relevant news articles:\n\n**5 Wa...
Name: 0, dtype: object

In [None]:
for item, value in row.items():
    print(item.upper())
    print()
    print(str(value)[0:200])
    print('===============================')
    print()

In [33]:
prompt2 = f"""
You are the intelligence community's best geopolitical, economic and overall news trivia forecaster.  
You will analyse and make a prediction about this question:

```title
{row.title}
```

You take into consideration this news:

```news
{row.asknews}
```

You take into consideration also this research:

```research
{row.research}
```

This is a multiple choice forecast.  The choices are mutually exclusive events, and exactly one of them will occur.
For each event, you supply the percentage likelihood that it occurs to the exclusion of the other events.
The events are:

```choices
{row.question_options}
```

We discriminate between the events as follows:

```resolution_criteria
{row.question_resolution_criteria}
```

The last thing you write is your final probabilities for the possible events in order as:

Choice_A: Probability_A
Choice_B: Probability_B
...
Choice_N: Probability_N

where you must assign your probabilities so they add up to 100: Probability_A + Probability_B + ... + Probability_N = 100.
"""
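
A prompt assembled from several fenced sections is easy to leave unbalanced, and a model may then read one section as part of another. A minimal sketch of a sanity check follows, assuming every section opens and closes with a line starting with three backticks; the helper name `has_unbalanced_fences` is hypothetical. It is best run on the template or on short test values, since interpolated news or research text can contain fence lines of its own.

```python
FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence line


def has_unbalanced_fences(prompt, marker=FENCE):
    """Return True when the prompt contains an odd number of fence lines."""
    fence_lines = [line for line in prompt.splitlines() if line.strip().startswith(marker)]
    return len(fence_lines) % 2 == 1


# Two toy prompts: one with a closed section, one with a section left open.
balanced = f"{FENCE}title\nSome question?\n{FENCE}\n"
unbalanced = f"{FENCE}news\nsome article\n\n{FENCE}research\nsome notes\n{FENCE}\n"
```

Calling `has_unbalanced_fences(prompt2)` before sending the prompt would flag a section that was opened but never closed.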

In [34]:
print(prompt2)


You are the intelligence community's best geopolitical, economic and overall news trivia forecaster.  
You will analyse and make a prediction about this question:

```title
Which of Scott Aaronson's five AI worlds will come to pass?
```

You take into consideration this news:

```news
Here are the relevant news articles:

**5 Ways To Use AI To Earn and Save an Extra $100,000 for Retirement**
Artificial intelligence (AI) can be a valuable tool in saving an extra $100,000 for retirement. By using AI for budgeting, career advancement, and side hustles, individuals can better manage their finances, increase their productivity, and earn more. According to Aaron Razon, a budgeting expert, AI can make expense tracking less tedious and more efficient, allowing consumers to seal financial leaks. Additionally, AI can help users brainstorm job and career ideas, create bespoke AI art, and manage side hustles, such as creating a media site that can earn over $100,000 in a few years with Google Ads

In [35]:
answer_mistral = call_local_llm(prompt2, model)
answer_mistral2 = call_local_llm(prompt2, model)
answer_mistral3 = call_local_llm(prompt2, model)

START model mistral-small3.2:24b-instruct-2506-q4_K_M
model mistral-small3.2:24b-instruct-2506-q4_K_M minutes 3.761407514413198
START model mistral-small3.2:24b-instruct-2506-q4_K_M
model mistral-small3.2:24b-instruct-2506-q4_K_M minutes 3.7275726596514382
START model mistral-small3.2:24b-instruct-2506-q4_K_M
model mistral-small3.2:24b-instruct-2506-q4_K_M minutes 3.794920758406321

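Repeating the same call only helps a later median step if the samples differ, which depends on the sampling settings. How `call_local_llm` sets them is not shown here. As a hedged sketch, Ollama's `/api/generate` endpoint accepts `temperature` and `seed` under `options`, so distinct seeds can be requested explicitly; the helper names `build_payload` and `generate`, the default temperature, and the localhost URL are assumptions for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_payload(prompt, model, temperature=0.7, seed=None):
    """Build a non-streaming generate request with explicit sampling options."""
    options = {"temperature": temperature}
    if seed is not None:
        options["seed"] = seed
    return {"model": model, "prompt": prompt, "stream": False, "options": options}


def generate(prompt, model, **sampling):
    """Send the request to a running Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model, **sampling)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# One payload per seed; nothing is sent until generate() is called.
payloads = [build_payload("Example prompt", "example-model", seed=s) for s in (1, 2, 3)]
```

With a running server, `[generate(prompt2, model, seed=s) for s in (1, 2, 3)]` would request three independently seeded answers.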

In [38]:
print(answer_mistral)

Based on the provided research and recent developments in AI, here's my forecast for the likelihood of each AI world scenario:

1. **AI-Fizzle (15%)**: Progress in AI has been rapid, with significant investments and advancements. A complete fizzle seems unlikely unless there's a major technical or social backlash halting AI development.

2. **Futurama (50%)**: This scenario aligns most closely with current trends. AI is transforming society as an advanced tool, with broad adoption and transformative impact, but without superintelligence or loss of human control.

3. **AI-Dystopia (25%)**: There are growing concerns about negative societal outcomes from AI, such as surveillance, inequality, and job loss. While not yet dominant, these risks are increasing.

4. **Singularia (5%)**: There's no evidence of self-improving, superintelligent, agentic AI. Technical hurdles remain significant, making this scenario unlikely in the near term.

5. **Paperclipalypse (5%)**: Similar to Singularia, th

In [42]:
from extract_forecast import (
    extract_option_probabilities_from_response,
    generate_multiple_choice_forecast,
)

In [43]:
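import ast

# question_options was stored with repr(); literal_eval parses it without executing arbitrary code.
options = ast.literal_eval(row.question_options)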
options

['AI-Fizzle', 'Futurama', 'AI-Dystopia', 'Singularia', 'Paperclipalypse']

In [46]:
forecasts = [generate_multiple_choice_forecast(options, extract_option_probabilities_from_response(x, options))
             for x in [answer_mistral, answer_mistral2, answer_mistral3]]
forecasts

[{'AI-Fizzle': 0.15,
  'Futurama': 0.5,
  'AI-Dystopia': 0.25,
  'Singularia': 0.05,
  'Paperclipalypse': 0.05},
 {'AI-Fizzle': 0.15,
  'Futurama': 0.5,
  'AI-Dystopia': 0.25,
  'Singularia': 0.05,
  'Paperclipalypse': 0.05},
 {'AI-Fizzle': 0.15,
  'Futurama': 0.5,
  'AI-Dystopia': 0.25,
  'Singularia': 0.05,
  'Paperclipalypse': 0.05}]
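
The extraction step turns free text into a probability per option. The internals of `extract_option_probabilities_from_response` and `generate_multiple_choice_forecast` are not shown, so the following is only a sketch of one way to do it: take the last number that follows each option name and rescale so the values sum to 1. The function name `parse_probabilities` and the sample text are hypothetical.

```python
import re


def parse_probabilities(text, options):
    """Read the last number following each option name and normalise to sum to 1."""
    raw = {}
    for option in options:
        # Allow a few non-digit characters (": ", " (", "** ") between name and number.
        pattern = re.escape(option) + r"[^0-9\n]{0,10}?(\d+(?:\.\d+)?)"
        matches = re.findall(pattern, text)
        if not matches:
            raise ValueError(f"no probability found for {option!r}")
        raw[option] = float(matches[-1])  # the final statement wins over earlier mentions
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("probabilities sum to zero")
    return {option: value / total for option, value in raw.items()}


sample_options = ["AI-Fizzle", "Futurama", "AI-Dystopia", "Singularia", "Paperclipalypse"]
sample_text = (
    "Some reasoning that mentions Futurama (40%) early on.\n"
    "AI-Fizzle: 15\nFuturama: 50\nAI-Dystopia: 25\nSingularia: 5\nPaperclipalypse: 5\n"
)
parsed = parse_probabilities(sample_text, sample_options)
```

Using the last match means the closing "Choice: Probability" list overrides any figure quoted earlier in the reasoning.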

In [51]:
from median_dictionaries import median_dictionaries

In [52]:
forecast = median_dictionaries(forecasts)

In [53]:
forecast

{'AI-Fizzle': 0.15,
 'Futurama': 0.5,
 'AI-Dystopia': 0.25,
 'Singularia': 0.05,
 'Paperclipalypse': 0.05}
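
When the individual forecasts differ, the per-option medians need not sum to 1. A minimal sketch of a median aggregator that renormalises afterwards is given below; whether `median_dictionaries` renormalises is not shown in this notebook, so that step is an assumption, and the name `median_forecast` and the toy inputs are hypothetical.

```python
from statistics import median


def median_forecast(forecasts):
    """Take the per-option median across forecasts, then rescale to sum to 1."""
    options = list(forecasts[0].keys())
    medians = {option: median(f[option] for f in forecasts) for option in options}
    total = sum(medians.values())
    return {option: value / total for option, value in medians.items()}


# Three toy forecasts whose per-option medians are all 0.3 (sum 0.9 before rescaling).
toy = [
    {"A": 0.5, "B": 0.3, "C": 0.2},
    {"A": 0.2, "B": 0.5, "C": 0.3},
    {"A": 0.3, "B": 0.2, "C": 0.5},
]
combined = median_forecast(toy)
```

In this toy case every option ends up at one third after rescaling. With three identical inputs, as above, the median simply returns the shared forecast.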

In [54]:
prompt = f"""
Summarize the gist of the rationale or thinking of the following answers from 3 different forecasters to a single problem.

```forecast 1
{answer_mistral}
```

```forecast 2
{answer_mistral2}
```

```forecast 3
{answer_mistral3}
```

DO NOT REFER TO THE 3 FORECASTERS.  PRESENT THIS AS YOUR OWN THINKING, YOUR OWN RATIONALE.  Use as your final forecast the median forecast, which is

```median forecast
{forecast}
```
"""

In [55]:
rationale = call_local_llm(prompt, model)

START model mistral-small3.2:24b-instruct-2506-q4_K_M
model mistral-small3.2:24b-instruct-2506-q4_K_M minutes 1.9387137691179912


In [56]:
print(rationale)

Based on the provided research and recent developments in AI, here's my forecast for the likelihood of each AI world scenario:

1. **AI-Fizzle (15%)**: While there is significant progress in AI, a complete fizzle seems unlikely given the continued investment and rapid advancements. However, if technical or social barriers emerge, this scenario could become more probable.

2. **Futurama (50%)**: This scenario aligns most closely with current trends, where AI is transformative but remains under human control. It is widely adopted as an advanced tool without achieving superintelligence or autonomy.

3. **AI-Dystopia (25%)**: The risks of negative societal outcomes, such as surveillance, job loss, and inequality, are rising. While not yet dominant, these concerns are significant and could lead to a dystopian outcome if not properly managed.

4. **Singularia (5%)**: There is no evidence of self-improving, superintelligent AI, and technical hurdles remain substantial. This scenario is highly

In [57]:
row.forecast = rationale

In [58]:
row.prediction = forecast

In [62]:
row.prediction

{'AI-Fizzle': 0.15,
 'Futurama': 0.5,
 'AI-Dystopia': 0.25,
 'Singularia': 0.05,
 'Paperclipalypse': 0.05}

In [59]:
row.crowd

{'AI-Fizzle': 0.1271267605633803,
 'Futurama': 0.29274775928297053,
 'AI-Dystopia': 0.28000768245838664,
 'Singularia': 0.1729910371318822,
 'Paperclipalypse': 0.1271267605633803}

In [60]:
from error import error

In [61]:
error(row)

0.28765685019206144
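
The definition of `error` is not shown. As a hedged point of comparison, the sketch below computes two standard distances between the prediction and the crowd forecast: the L1 distance (sum of absolute differences) and the total variation distance (half of L1). The helper name `l1_distance` is hypothetical; the two dictionaries are copied from the outputs above.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two forecasts over the same options."""
    return sum(abs(p[option] - q[option]) for option in p)


prediction = {"AI-Fizzle": 0.15, "Futurama": 0.5, "AI-Dystopia": 0.25,
              "Singularia": 0.05, "Paperclipalypse": 0.05}
crowd = {"AI-Fizzle": 0.1271267605633803, "Futurama": 0.29274775928297053,
         "AI-Dystopia": 0.28000768245838664, "Singularia": 0.1729910371318822,
         "Paperclipalypse": 0.1271267605633803}

l1 = l1_distance(prediction, crowd)   # roughly 0.4603
total_variation = l1 / 2              # roughly 0.2301
```

For these numbers the reported 0.2877 coincides with the L1 distance divided by 1.6, which suggests a normalised L1 measure. That reading is inferred from the arithmetic alone, not from the source of `error`.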