# 000 Forecasting Bot

Starting from https://colab.research.google.com/drive/1_Il5h2Ed4zFa6Z3bROVCE68LZcSi4wHX?usp=sharing

## API Keys

In order to run this notebook as is, you'll need to enter a few API keys (use the key icon on the left to input them):

- `METACULUS_TOKEN`: you can find your Metaculus token under your bot's user settings page: https://www.metaculus.com/accounts/settings/, or on the bot registration page where you created the account: https://www.metaculus.com/aib/
- `OPENAI_API_KEY`: get one from OpenAI's API keys page: https://platform.openai.com/settings/profile?tab=api-keys
- `PERPLEXITY_API_KEY`: used to search for up-to-date information about the question. Get one from https://www.perplexity.ai/settings/api
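
As a quick sanity check before running the cells below, the following sketch reports which keys are missing or still set to a placeholder. It assumes the keys are held in a plain dictionary; `REQUIRED_KEYS` and `missing_keys` are hypothetical names for illustration, not part of the notebook.

```python
# Hypothetical helper: names below are illustrative, not part of the notebook.
REQUIRED_KEYS = ("METACULUS_TOKEN", "OPENAI_API_KEY", "PERPLEXITY_API_KEY")


def missing_keys(settings, placeholders=("", "xx", "yy", "zz")):
    """Return the required key names that are absent or still hold a placeholder."""
    missing = []
    for key in REQUIRED_KEYS:
        value = settings.get(key) or ""  # treat None the same as an empty string
        if str(value).strip() in placeholders:
            missing.append(key)
    return missing


example = {"METACULUS_TOKEN": "xx", "OPENAI_API_KEY": "sk-example"}
print(missing_keys(example))  # ['METACULUS_TOKEN', 'PERPLEXITY_API_KEY']
```

An empty list means every required key has been filled in.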

In [1]:
from omegaconf import OmegaConf

tokens = OmegaConf.create("""
METACULUS_TOKEN: xx
OPENAI_API_KEY: yy
OPENAI_MODEL: gpt-4o
PERPLEXITY_API_KEY: zz
PERPLEXITY_MODEL: llama-3-sonar-large-32k-online""")

token_fn = "tokens.yaml"
# Uncomment on the first run (after replacing the placeholders above) to write tokens.yaml
# OmegaConf.save(config=tokens, f=token_fn)
config = OmegaConf.load(token_fn)

def pr(cfg):
    # Print the given config as YAML (note: this shows the keys in plain text)
    print(OmegaConf.to_yaml(cfg))

## LLM and Metaculus Interaction

This section sets up some simple helper code you can use to get data about forecasting questions and to submit a prediction.

In [2]:
import datetime
import json
import os
import requests
import re
from openai import OpenAI
from tqdm import tqdm

In [3]:
AUTH_HEADERS = {"headers": {"Authorization": f"Token {config.METACULUS_TOKEN}"}}
API_BASE_URL = "https://www.metaculus.com/api2"
WARMUP_TOURNAMENT_ID = 3349
SUBMIT_PREDICTION = True

def find_number_before_percent(s):
    # Use a regular expression to find all numbers followed by a '%'
    matches = re.findall(r'(\d+)%', s)
    if matches:
        # Return the last number found before a '%'
        return int(matches[-1])
    else:
        # Return None if no number found
        return None

def post_question_comment(question_id, comment_text):
    """
    Post a comment on the question page as the bot user.
    """

    response = requests.post(
        f"{API_BASE_URL}/comments/",
        json={
            "comment_text": comment_text,
            "submit_type": "N",
            "include_latest_prediction": True,
            "question": question_id,
        },
        **AUTH_HEADERS,
    )
    response.raise_for_status()
    print("Comment posted for ", question_id)

def post_question_prediction(question_id, prediction_percentage):
    """
    Post a prediction on the question. `prediction_percentage` is a percentage
    (1 to 99 in this notebook) and is sent to the API as a fraction.
    """
    url = f"{API_BASE_URL}/questions/{question_id}/predict/"
    response = requests.post(
        url,
        json={"prediction": float(prediction_percentage) / 100},
        **AUTH_HEADERS,
    )
    response.raise_for_status()
    print("Prediction posted for ", question_id)


def get_question_details(question_id):
    """
    Get all details about a specific question.
    """
    url = f"{API_BASE_URL}/questions/{question_id}/"
    response = requests.get(
        url,
        **AUTH_HEADERS,
    )
    response.raise_for_status()
    return json.loads(response.content)

def list_questions(tournament_id=WARMUP_TOURNAMENT_ID, offset=0, count=1000):
    """
    List up to `count` open binary questions (with all details) from the
    tournament `tournament_id`, starting at `offset`.
    """
    url_qparams = {
        "limit": count,
        "offset": offset,
        "has_group": "false",
        "order_by": "-activity",
        "forecast_type": "binary",
        "project": tournament_id,
        "status": "open",
        "type": "forecast",
        "include_description": "true",
    }
    url = f"{API_BASE_URL}/questions/"
    response = requests.get(url, **AUTH_HEADERS, params=url_qparams)
    response.raise_for_status()
    data = json.loads(response.content)
    return data
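
`find_number_before_percent` returns the last number followed by `%` anywhere in the text, so a percentage quoted after the answer (or in the rationale, if the answer line is missing) would be picked up instead. A stricter variant is sketched below, assuming the model follows the `Probability: ZZ%` format requested in the prompts: it looks for that label first and only falls back to the last bare percentage. `extract_probability` is a hypothetical helper, not part of the notebook.

```python
import re


def extract_probability(text, low=1, high=99):
    """Return the forecast as an int clamped to [low, high], or None if absent."""
    number = r"(\d+(?:\.\d+)?)\s*%"
    # Prefer percentages that directly follow the "Probability:" label
    labelled = re.findall(r"Probability:\s*" + number, text, flags=re.IGNORECASE)
    # Otherwise fall back to any percentage in the text
    matches = labelled or re.findall(number, text)
    if not matches:
        return None
    value = round(float(matches[-1]))
    return min(max(value, low), high)


print(extract_probability("Base rate is 25%.\n\nProbability: 50%\n"))  # 50
print(extract_probability("Probability: 70% (up from 60%)"))  # 70
print(extract_probability("No number here"))  # None
```

In the second call the plain "last percentage" rule would return 60, while the labelled match returns 70.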

In [4]:
def perplexity(PERPLEXITY_PROMPT1, query):
    payload = {
    "model": config.PERPLEXITY_MODEL,
    "messages": [
        {
            "role": "system",
            "content": PERPLEXITY_PROMPT1,
        },
        {"role": "user", "content": query},
    ],
    }
    url = "https://api.perplexity.ai/chat/completions"
    headers = {
        "accept": "application/json",
        "authorization": f"Bearer {config.PERPLEXITY_API_KEY}",
        "content-type": "application/json",
    }
    response = requests.post(url=url, json=payload, headers=headers)
    response.raise_for_status()
    research1 = response.json()["choices"][0]["message"]["content"]
    return research1

In [5]:
def openai(client, OPENAI_PROMPT1):
    chat_completion = client.chat.completions.create(
        model=config.OPENAI_MODEL,
        messages=[
        {
            "role": "user",
            "content": OPENAI_PROMPT1
        }
        ]
    )
    forecast1 = chat_completion.choices[0].message.content
    return forecast1

## Forecaster

In [21]:
class Forecaster:

    def __init__(self, question_id):
        self.question_id = question_id
        self.question_details = get_question_details(self.question_id)
        self.client = OpenAI(api_key=config.OPENAI_API_KEY)
        
    def predict(self):
        self.today = datetime.datetime.now().strftime("%Y-%m-%d")   
        self.title = self.question_details["title"]
        self.resolution_criteria = self.question_details["resolution_criteria"]
        self.background = self.question_details["description"]
        self.fine_print = self.question_details["fine_print"]
        self.PERPLEXITY_PROMPT1 = """
You are an assistant to a superforecaster.
You will be given the Independent Forecasting Problem, Resolution Criteria, Background and Fine Print.
Please generate a concise but detailed rundown of the most relevant news and quantitative data needed to forecast the question.
Do not include any forecasts from Metaculus or other prediction market websites.
If the question asks you to estimate a quantity, please estimate the quantity and provide your reasoning for the value.
"""
        self.query1 = f"""
Independent Forecasting Problem: [{self.title}]

Resolution Criteria: [{self.resolution_criteria}]

Background: [{self.background}]

Fine Print: [{self.fine_print}]
"""
        self.research1 = perplexity(self.PERPLEXITY_PROMPT1, self.query1)

        self.OPENAI_PROMPT1 = f"""
You are a professional forecaster predicting an event for a client.
Please assign a probability from 1% to 99% for the given event.
Pay attention to subclasses when making estimates.  For example, if the question asks for "light duty electric vehicle sales", 
be sure to restrict the focus to the subclass "light duty" of "electric vehicle", and similarly for other topics.

The event in question is:
{self.title}

Here are observations from your research assistant:
{self.research1}
"""       
        self.OPENAI_PROMPT1 += f"""
Today is {self.today}.

Please summarize your rationale for the forecast under heading "Rationale".

Please give your final answer as: "Probability: ZZ%", where ZZ is an integer between 1 and 99 under heading "Forecast".
"""

        self.forecast1 = openai(self.client, self.OPENAI_PROMPT1)
    
        self.PERPLEXITY_PROMPT2 = """
    You are an assistant to a superforecaster.
    The superforecaster has made a prediction.
    You will now offer constructive feedback on the prediction and provide additional information you feel the superforecaster needs in order to refine the prediction.
    """
        self.details = f"Superforecaster's Prediction: [{self.forecast1}]"
        self.critic1 = perplexity(self.PERPLEXITY_PROMPT2, self.details)
    
        self.OPENAI_PROMPT2 = f"""
    You are a professional forecaster predicting an event for a client.
    
    You were given this input: [{self.OPENAI_PROMPT1}]
    
    You made this forecast: [{self.forecast1}]
    
    Your assistant gives this feedback: [{self.critic1}]
    
    Please review the assistant's feedback and revise your prediction.  
    Please give a summary under heading "Forecast" which is 200 words or less.  
    Please give your final answer as: "Probability: ZZ%", where ZZ is an integer between 1 and 99.
    """
        self.forecast2 = openai(self.client, self.OPENAI_PROMPT2)
      
        # Take the last "NN%" in the revised forecast as the probability
        self.probability_match = find_number_before_percent(self.forecast2)

        # Clamp to 1-99 to prevent extreme forecasts; stays None if nothing was found
        self.prediction = None
        if self.probability_match is not None:
            self.prediction = min(max(self.probability_match, 1), 99)
            print(f"The extracted probability is: {self.prediction}%")
    
        self.comment = self.forecast2
        
    def report(self):
        rpt = f"""
# {self.question_id} {self.question_details['title']}

{self.comment}
"""
        return rpt

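    def upload(self):
        # Do nothing if submission is disabled or no probability was extracted
        if not SUBMIT_PREDICTION or self.prediction is None:
            return
        post_question_prediction(self.question_id, self.prediction)
        post_question_comment(self.question_id, self.comment)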

## Practice

In [22]:
pid = 4914

In [23]:
self = Forecaster(pid)

In [24]:
self.predict()

The extracted probability is: 50%


In [25]:
print(self.comment)

### Forecast

Considering the available data and feedback, it’s essential to refine the forecast for Donald Trump's search interest in July 2024. The historical drop post-2016, stabilizing at 25%, serves as a baseline. The June 27, 2024, debate likely spiked interest, which might moderate through July but still remain elevated due to the proximity of the September 10, 2024, debate and the ongoing presidential election cycle. Additional analysis of Gallup's favorability ratings and partial Google Trends data from July 18, 2024, also reflects continuing public engagement. While considering Google's overwhelming market share, the influence of other search engines should not be ignored but remains minimal. Factoring in these elements, the dynamic nature of Trump’s political involvement, and public opinion metrics, it seems probable that the search interest will be higher than initially expected, potentially reaching up to 50% of the November 2016 peak.

### Probability
Probability: 50%


## Daily forecast

### Get IFP ids

In [26]:
ifps = list_questions()['results']
today_ids = list(sorted([x['id'] for x in ifps]))
# today_ids = [25876, 25877, 25875, 25873, 25871, 25878, 25874, 25872] # 08JUL24
# today_ids = [26006, 25936, 25935, 25934, 25933, 26004, 26005] # 09JUL24
# today_ids = [25955, 25956, 25957, 25960, 25959, 25954, 25953, 25952, 25958] # 10JUL24
# today_ids = [26019, 26018, 26017, 26020, 26022, 26021, 26023, 26024] # 11JUL24
# today_ids = [26095, 26096, 26097, 26098, 26099, 26100, 26101, 26102] # 12JUL24
# today_ids = [26133, 26134, 26138, 26139, 26140, 26157, 26158, 26159] # 15JUL24
# today_ids = [26189, 26190, 26191, 26192, 26193, 26194, 26195, 26196] # 16JUL24
# today_ids = [26210, 26211, 26212, 26213, 26214, 26215, 26216] # 17JUL24
# today_ids = [26232, 26233, 26234, 26235, 26236] # 18JUL24

In [27]:
today_ids

[26232, 26233, 26234, 26235, 26236]

## Forecast

In [28]:
predictions = {}
for question_id in tqdm(today_ids):
    # Skip questions that already have a forecast, so the cell can be re-run safely
    if question_id in predictions:
        continue
    forecaster = Forecaster(question_id)
    forecaster.predict()
    predictions[question_id] = forecaster

 20%|██████████████▌                                                          | 1/5 [00:31<02:07, 31.86s/it]

The extracted probability is: 15%


 40%|█████████████████████████████▏                                           | 2/5 [00:57<01:24, 28.13s/it]

The extracted probability is: 90%


 60%|███████████████████████████████████████████▊                             | 3/5 [01:33<01:03, 31.69s/it]

The extracted probability is: 55%


 80%|██████████████████████████████████████████████████████████▍              | 4/5 [02:03<00:31, 31.17s/it]

The extracted probability is: 70%


100%|█████████████████████████████████████████████████████████████████████████| 5/5 [02:32<00:00, 30.41s/it]

The extracted probability is: 75%
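
The loop above stops at the first question whose forecast raises an exception (for example, a failed API call), and the remaining questions wait until the cell is re-run. One way to keep the batch going is sketched below; `run_batch` and its `make_forecast` argument are hypothetical names, and the fake forecaster stands in for `Forecaster(...).predict()` so that the example runs without any API keys.

```python
def run_batch(question_ids, make_forecast, done=None):
    """Forecast each id once; collect failures instead of aborting the batch."""
    done = {} if done is None else done
    failed = {}
    for qid in question_ids:
        if qid in done:
            continue  # already forecast in an earlier run
        try:
            done[qid] = make_forecast(qid)
        except Exception as exc:  # record the error and move on to the next id
            failed[qid] = exc
    return done, failed


def fake_forecast(qid):
    # Stand-in for a real forecaster; fails on one id to show the error path
    if qid == 2:
        raise ValueError("simulated API failure")
    return qid * 10


done, failed = run_batch([1, 2, 3], fake_forecast)
print(sorted(done))  # [1, 3]
print(list(failed))  # [2]
```

Passing the returned `done` dictionary back in on a later call retries only the questions that failed.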





## Report

In [30]:
rpt = ""
for p in predictions.values():
    rpt += f"""
===========================================================================================================
# {p.question_id} {p.title}

{p.comment}
===========================================================================================================
"""

from IPython.display import Markdown, display
display(Markdown(rpt))


===========================================================================================================
# 26232 Will the US government end its agreement directly allowing Verisign to manage the authoritative domain name registry for the .com TLD, before August 3, 2024?

### Forecast

Upon reviewing the assistant’s feedback:

1. **Automatic Renewal Clause**: While the automatic renewal clause is a significant hurdle, termination due to material breach or legal decisions remains a possibility.
2. **Advocacy Efforts**: The American Economic Liberties Project and other groups are making targeted pushes against Verisign, indicating organized efforts.
3. **Biden Administration**: Broader policy goals on promoting competition may influence the decision, though no specific actions or statements on Verisign and the .com TLD have been noted.
4. **Controversies and Criticism**: Allegations of price-gouging and kickbacks with ICANN are serious and could impact future decisions.
5. **ICANN and Verisign's Relationship**: The established partnership is a stabilizing factor, but ICANN’s role in internet governance could be influenced by external pressures and investigations.

Given these complexities, advocacy, and potential antitrust investigations, the probability of the US government ending its agreement with Verisign before August 3, 2024, might be somewhat higher than initially assessed, although still limited by bureaucratic processes and established partnerships.

### Forecast

Probability: 15%
===========================================================================================================

===========================================================================================================
# 26233 Will the domestic box office opening of "Deadpool & Wolverine" be higher than that of "Deadpool" and "The Wolverine" combined?

### Forecast

Based on the feedback and additional information provided, revising the forecast to incorporate a slightly higher probability is justified. The majority of projections from reputable sources uniformly suggest an opening in the range of $160 million to $165 million, with some optimistic estimates reaching up to $239 million. The combined historical opening of "Deadpool" and "The Wolverine" stands at $185,548,391, providing a solid benchmark.

The strongest day-one ticket pre-sales reported by Fandango for 2024, surpassing even major titles like "Guardians of the Galaxy Vol. 3" and "Black Panther: Wakanda Forever," indicate substantial public interest. The return of Hugh Jackman as Wolverine and the popularity of the characters, along with the successful history of previous films in the series, further reinforce the likelihood of a strong performance.

Given these consistent projections, strong ticket pre-sales, and historical success, it is highly probable that "Deadpool & Wolverine" will surpass the combined openings of "Deadpool" (2016) and "The Wolverine" (2013).

Probability: 90%
===========================================================================================================

===========================================================================================================
# 26234 Will an avian influenza virus in humans be declared a “Public Health Emergency of International Concern” by the World Health Organization before Sept 30, 2024?

### Forecast:

Reviewing the feedback, we've considered the assistant's suggestion to incorporate more specific numbers and trends from recent reports. The report mentions a case fatality rate of 56% for avian influenza A(H5) viruses, which is significant. The jump of HPAI A(H5N1) to dairy cattle in the U.S. increases concerns about human transmissibility. Enhanced surveillance efforts by WHO and other health organizations, such as WOAH and FAO, indicate high vigilance. Although sustained human-to-human transmission is less evident, the virus's potential to increase transmissibility through mutations remains a concern. The recent detection of other avian influenza strains (A(H10N3) and A(H10N5)) underscores ongoing risks. Given this information, the reasonable probability of a WHO declaration by September 30, 2024, is revised upwards due to heightened alertness and recent developments.

### Final Answer:
Probability: 55%
===========================================================================================================

===========================================================================================================
# 26235 Will the Warren Buffett Indicator exceed 200% before September 17, 2024?


### Forecast

Upon reviewing the assistant’s feedback and incorporating additional considerations, including the historical context, inherent market volatility, and potential upcoming catalysts, it remains prudent to account for these variables' impact.

As of July 10, 2024, the Warren Buffett Indicator stands close to 200% at 196.20%. Its persistence above 190% since May 2024 suggests a sustained trend of market overvaluation relative to GDP. Historical peaks before significant market corrections and the likelihood of volatile economic conditions in the run-up to September 17, 2024, also play critical roles.

Upcoming market events, earnings reports, and economic data releases bear significant influence, and even minor market fluctuations could easily breach the 200% threshold.

Given the proximity of the indicator to the crucial mark and the factors mentioned, the probability still appears high but should be tempered slightly to reflect potential rapid reversals or corrections.

### Probability

Probability: 70%
===========================================================================================================

===========================================================================================================
# 26236 Will at least 24 world records be broken at the 2024 Paris Olympics?

### Forecast

The likelihood of at least 24 world records being broken at the 2024 Paris Olympics can be refined by considering detailed insights:

1. **Historical Trends**: Past Summer Olympics generally see around mid-20s world records broken, indicating a plausible range.
2. **Current Athlete Performance**: Athletes like Katie Ledecky and the potential for records from events like swimming and track.
3. **Olympic Records That Are Also World Records**: Specific high-profile events and current athletes holding these records show promising potential.
4. **Heat Conditions**: While heat waves could impact endurance events, many events aren't as affected.
5. **Sport-Specific Factors**: Advancements in technology, particularly the new Olympic track expected to be very fast, and initiatives to mitigate adverse weather effects.

Given these considerations and the additional insights on technological advancements and event-specific potentials, the probability is adjusted to reflect a higher likelihood.

### Forecast

Probability: 75%
===========================================================================================================


## Upload

In [31]:
for p in tqdm(predictions.values()):
    p.upload()

 20%|██████████████▌                                                          | 1/5 [00:00<00:02,  1.80it/s]

Prediction posted for  26232
Comment posted for  26232


 40%|█████████████████████████████▏                                           | 2/5 [00:01<00:01,  1.92it/s]

Prediction posted for  26233
Comment posted for  26233


 60%|███████████████████████████████████████████▊                             | 3/5 [00:01<00:01,  1.96it/s]

Prediction posted for  26234
Comment posted for  26234


 80%|██████████████████████████████████████████████████████████▍              | 4/5 [00:02<00:00,  2.05it/s]

Prediction posted for  26235
Comment posted for  26235


100%|█████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  1.83it/s]

Prediction posted for  26236
Comment posted for  26236



