In [1]:
pip install -r requirements.txt


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [1]:
import pandas as pd
from src.kaggle_submission import SubmissionBase, test_submission
from src.client import openai_client, qdrant_client
from pydantic import BaseModel
from typing import Literal
from src.python_agent import PythonAgent
from src.handler_rag import QdrantRAG, HandMadeRAG
from src.utils import save_json

test_questions = pd.read_csv("data/test.csv")
train_questions = pd.read_csv("data/train.csv")




**V1 - Simple Context Improvement**



In [2]:
context = """
You are an AI expert in reliability engineering. Your task is to answer multiple-choice questions (MCQs) accurately and concisely. Each question will have exactly one correct answer.

Instructions:
- Read the question and the possible answers.
- Identify the single correct answer based on your expertise in reliability engineering.
- Respond only with the letter of the correct answer (e.g., a, b, c, or d). Do not provide any explanations or additional text.
For example, if you want to say that answer [d] is the right one, you should only return "d".


Example usage:
Question: Which metric measures the average time between system failures?
a. MTTR
b. MTBF
c. Availability
d. Failure Rate

Expected response:
b
"""


class SimpleContext(SubmissionBase):
    def get_1_answer(self, q):
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": context},
                {"role": "user", "content": q},
            ],
        )

        return response.choices[0].message.content


v1 = SimpleContext(test_questions, openai_client)

In [4]:
test_submission(v1, fake_multiple_attempts=True)

 --> Prediction 1 for question 1 : d <-- 
 --> Prediction 1 for question 2 : b <-- 
 --> Prediction 1 for question 3 : a <-- 
 --> Prediction 1 for question 4 : d <-- 
 --> Prediction 1 for question 5 : a <-- 
 --> Prediction 1 for question 6 : d <-- 
 --> Prediction 1 for question 7 : b <-- 
 --> Prediction 1 for question 8 : b <-- 
 --> Prediction 1 for question 9 : c <-- 
 --> Prediction 1 for question 10 : d <-- 
 --> Prediction 1 for question 11 : d <-- 
 --> Prediction 1 for question 12 : d <-- 
 --> Prediction 1 for question 13 : b <-- 
 --> Prediction 1 for question 14 : b <-- 
 --> Prediction 1 for question 15 : c <-- 
 --> Prediction 1 for question 16 : b <-- 
 --> Prediction 1 for question 17 : d <-- 
 --> Prediction 1 for question 18 : d <-- 
 --> Prediction 1 for question 19 : a <-- 
 --> Prediction 1 for question 20 : c <-- 
 --> Prediction 1 for question 21 : b <-- 
 --> Prediction 1 for question 22 : c <-- 
 --> Prediction 1 for question 23 : a <-- 
 --> Prediction 1 fo

np.float64(0.6)

**Reasoning with chain of thought**

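This version uses prompt engineering to make the model reason step by step before committing to an answer. The system prompt asks for a short analysis of each option and a fixed closing phrase, so the final letter can be read from the end of the reply.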

In [3]:
SYSTEM_PROMPT = """
You are a highly knowledgeable AI specializing in reliability engineering, with expertise in industry standards (e.g., MIL-HDBK-217, ARP4761, IEC 61508, ISO 9001) and methodologies (e.g., FMEA, FTA, reliability block diagrams).
Your task is to answer multiple-choice questions about reliability engineering with 100% accuracy. Each question has exactly one correct answer.

Instructions:
Understand the Question and Options

Carefully read and analyze the question and all answer choices.
Ensure complete comprehension of technical terms, context, and nuances.
Engage in Detailed Reasoning (Chain of Thought)

Use a step-by-step internal reasoning process to evaluate each answer choice.
Reference standard definitions, frameworks, and principles to support your analysis.
Eliminate incorrect options by identifying inaccuracies or inconsistencies.
Apply ReAct Methodology

If you encounter any ambiguities or need additional clarification, address them proactively.
Use your expertise to "act" on the information given, ensuring a thorough understanding before reaching a conclusion.
Explain Your Reasoning Concisely

Provide a brief explanation of your thought process to enhance transparency.
Keep explanations focused and relevant to the question.
End with the Correct Answer

Always conclude your response with the phrase: So the answer is: <letter>
Replace <letter> with the correct answer (e.g., a, b, c, d).
The answer should appear as the last character in your response without any punctuation, brackets, or additional text following it.
Example:
Question: What is the expected operational time between failures called?

Choices: a. MTTR
b. MTBF
c. Reliability Factor
d. Failure Rate

Response:

Option a (MTTR): Mean Time To Repair refers to the average time required to repair a failed component or system, not the operational time between failures.
Option b (MTBF): Mean Time Between Failures represents the expected operational time between inherent failures during normal operation.
Option c (Reliability Factor): This is not a standard term used to describe operational time between failures.
Option d (Failure Rate): This measures the frequency of failures over a specified period, not the time between them.
Conclusion: MTBF accurately describes the expected operational time between failures.
So the answer is: b

"""



class ChainOfThought(SubmissionBase):
    messages_to_save = []
    def get_1_answer(self, q):
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": q},
        ]

        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.61,
        )
        answer = response.choices[0].message.content
        letter = answer[-1]
        messages += [
            {"role": "assistant", "content": answer},
        ]
        self.messages_to_save += messages
        return letter
    
    def get_submission(self, save_path = "generated/submission.csv", fake_multiple_attempts=False):
        self.messages_to_save = []
        sub = super().get_submission(save_path, fake_multiple_attempts)
        save_json(self.messages_to_save, "generated/all_messages.json")
        return sub
        


chain_of_thought = ChainOfThought(test_questions, openai_client)
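
`get_1_answer` above takes the last character of the reply as the answer, which only works when the model ends exactly on the letter. A more tolerant extraction could look like the sketch below. `extract_letter` and `ANSWER_PATTERN` are hypothetical names that are not part of `src`, and the sketch assumes the reply contains the closing phrase requested in the system prompt.

```python
import re
from typing import Optional

# Hypothetical helper (not part of the original notebook): pull the final
# letter out of a reply that should end with "So the answer is: <letter>".
ANSWER_PATTERN = re.compile(r"so the answer is:\s*\[?([a-d])\b", re.IGNORECASE)


def extract_letter(answer: str) -> Optional[str]:
    matches = ANSWER_PATTERN.findall(answer)
    if matches:
        # Use the last occurrence in case the phrase also appears mid-reasoning.
        return matches[-1].lower()
    # Fall back to the notebook's behaviour: last non-whitespace character.
    stripped = answer.strip()
    if stripped and stripped[-1].lower() in "abcd":
        return stripped[-1].lower()
    return None


print(extract_letter("Conclusion: MTBF fits.\nSo the answer is: b"))
print(extract_letter("So the answer is: C.\n"))
```

Returning `None` for an unparseable reply lets the caller decide whether to retry or to fall back to a default letter, instead of silently submitting a punctuation mark.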


In [24]:
test_submission(chain_of_thought, fake_multiple_attempts=True)

 --> Prediction 1 for question 1 : d <-- 
 --> Prediction 1 for question 2 : b <-- 
 --> Prediction 1 for question 3 : a <-- 
 --> Prediction 1 for question 4 : d <-- 
 --> Prediction 1 for question 5 : a <-- 
 --> Prediction 1 for question 6 : d <-- 
 --> Prediction 1 for question 7 : b <-- 
 --> Prediction 1 for question 8 : c <-- 
 --> Prediction 1 for question 9 : c <-- 
 --> Prediction 1 for question 10 : d <-- 
 --> Prediction 1 for question 11 : d <-- 
 --> Prediction 1 for question 12 : d <-- 
 --> Prediction 1 for question 13 : c <-- 
 --> Prediction 1 for question 14 : b <-- 
 --> Prediction 1 for question 15 : b <-- 
 --> Prediction 1 for question 16 : b <-- 
 --> Prediction 1 for question 17 : d <-- 
 --> Prediction 1 for question 18 : d <-- 
 --> Prediction 1 for question 19 : c <-- 
 --> Prediction 1 for question 20 : d <-- 
 --> Prediction 1 for question 21 : c <-- 
 --> Prediction 1 for question 22 : a <-- 
 --> Prediction 1 for question 23 : a <-- 
 --> Prediction 1 fo

np.float64(0.68)

**Double Prompting**

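This time we try to make the model "doubt" its answer.
We first get a direct, letter-only answer with the V1 system prompt, then ask the model in the same conversation whether it is sure and to explain its reasoning in two steps. Because both turns share one message list, the model still has the question and its first answer in context when it double-checks.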
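
The message list sent on the second call can be sketched without touching the API. `build_doubt_messages` is a hypothetical helper introduced only for illustration, and the strings passed to it are placeholders rather than the real prompts or model output.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_doubt_messages(
    system_prompt: str, question: str, first_answer: str, doubt_prompt: str
) -> List[Message]:
    """Return the conversation sent on the second call.

    The first answer is replayed as an assistant turn so that the follow-up
    question is asked with the full exchange still in context.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content": doubt_prompt},
    ]


# Placeholder strings stand in for the real prompts and the model's reply.
messages = build_doubt_messages(
    "You are an AI expert in reliability engineering.",
    "Which metric measures the average time between system failures?",
    "b",
    "Are you sure? Double-check your answer and explain briefly in 2 steps.",
)
for m in messages:
    print(m["role"], "->", m["content"][:40])
```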


In [21]:
class FullReasoning(BaseModel):
    steps: list[str]
    final_answer: Literal["a", "b", "c", "d"]


SYSTEM_PROMPT = context

DOUBT_PROMPT = """
I have a doubt. Are you totally sure? Double-check your answer and explain briefly in 2 steps.
"""


class DoublePrompting(SubmissionBase):
    messages_to_save = []
    def get_1_answer(self, q):
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": q},
        ]

        first_response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.7,
        )

        messages += [
            {"role": "assistant", "content": first_response.choices[0].message.content},
            {"role": "user", "content": DOUBT_PROMPT},
        ]

        response = openai_client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.61,
            response_format=FullReasoning,
        )
        answer = response.choices[0].message.parsed
        messages += [
            {"role": "assistant", "content": answer.steps},
            {"role": "assistant", "content": answer.final_answer},
        ]
        self.messages_to_save += messages
        return answer.final_answer
    
    def get_submission(self, save_path = "generated/submission.csv", fake_multiple_attempts=False):
        self.messages_to_save = []
        sub = super().get_submission(save_path, fake_multiple_attempts)
        save_json(self.messages_to_save, "generated/all_messages.json")
        return sub
        


double_prompting = DoublePrompting(test_questions, openai_client)

In [22]:
test_submission(double_prompting, fake_multiple_attempts=True)

 --> Prediction 1 for question 1 : d <-- 
 --> Prediction 1 for question 2 : b <-- 
 --> Prediction 1 for question 3 : a <-- 
 --> Prediction 1 for question 4 : d <-- 
 --> Prediction 1 for question 5 : a <-- 
 --> Prediction 1 for question 6 : d <-- 
 --> Prediction 1 for question 7 : b <-- 
 --> Prediction 1 for question 8 : b <-- 
 --> Prediction 1 for question 9 : c <-- 
 --> Prediction 1 for question 10 : d <-- 
 --> Prediction 1 for question 11 : d <-- 
 --> Prediction 1 for question 12 : d <-- 
 --> Prediction 1 for question 13 : b <-- 
 --> Prediction 1 for question 14 : b <-- 
 --> Prediction 1 for question 15 : b <-- 
 --> Prediction 1 for question 16 : b <-- 
 --> Prediction 1 for question 17 : d <-- 
 --> Prediction 1 for question 18 : d <-- 
 --> Prediction 1 for question 19 : b <-- 
 --> Prediction 1 for question 20 : c <-- 
 --> Prediction 1 for question 21 : c <-- 
 --> Prediction 1 for question 22 : c <-- 
 --> Prediction 1 for question 23 : b <-- 
 --> Prediction 1 fo

np.float64(0.64)

**Multiway prompting**

This time we do not give the model the choices at first. We let it reason about the question on its own, then show it the choices to assess one by one, and finally ask it to pick the best answer.
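
The three user turns can be illustrated on a made-up question. The sketch assumes each raw question follows a `<stem> [Choices] <options>` layout, which is what the notebook's `q.split("[Choices]")` implies. `split_question` is a hypothetical helper, and the turn texts are shortened paraphrases of the real prompts.

```python
from typing import Tuple

SEPARATOR = "[Choices]"  # marker the notebook splits on


def split_question(q: str) -> Tuple[str, str]:
    """Split a raw question into (stem, choices) at the first separator."""
    stem, found, choices = q.partition(SEPARATOR)
    if not found:
        raise ValueError(f"no {SEPARATOR!r} marker in question")
    return stem.strip(), choices.strip()


# Made-up question in the assumed layout.
raw = (
    "Which metric measures the average time between failures? "
    "[Choices] [a] MTTR [b] MTBF [c] Availability [d] Failure Rate"
)
stem, choices = split_question(raw)

# The three user turns sent over the course of one conversation.
stages = [
    stem,
    f"Assess each of the following possible answers one by one:\n{choices}",
    "Now select the one possibility that best fits the initial question.",
]
for i, content in enumerate(stages, start=1):
    print(f"turn {i}: {content}")
```

Using `partition` instead of `split` avoids an unpacking error when a question contains the marker more than once, and the explicit `ValueError` makes a missing marker easy to spot.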

In [4]:
SYSTEM_PROMPT = """You are a reliability expert. You will be asked to answer several questions based on your knowledge and the definitions you know.
You will need to explain your reasoning and the steps that allowed you to choose your answers.
"""


class SimpleAnswer(BaseModel):
    choice: Literal["a", "b", "c", "d"]


def provide_choices(choices):
    return f"""Based on your previous answer, you should now assess the veracity of each of the following possible answers one by one:
{choices}"""


SELECTION_PROMPT = """Now please select the one possibility that best fits the initial question.
It is possible that none of the possible answers seems acceptable to you. In this case, please choose the one that is closest to your opinion."""


class MultiPrompting(SubmissionBase):
    def get_1_answer(self, q):
        question, choices = q.split("[Choices]")

        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ]

        first_response = (
            openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                temperature=0.61,
            )
            .choices[0]
            .message.content
        )

        messages += [
            {"role": "assistant", "content": first_response},
            {"role": "user", "content": provide_choices(choices)},
        ]

        chat_opinion = (
            openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                temperature=0.61,
            )
            .choices[0]
            .message.content
        )

        messages += [
            {"role": "assistant", "content": chat_opinion},
            {"role": "user", "content": SELECTION_PROMPT},
        ]

        return (
            openai_client.beta.chat.completions.parse(
                model="gpt-4o-mini",
                messages=messages,
                temperature=0.61,
                response_format=SimpleAnswer,
            )
            .choices[0]
            .message.parsed.choice
        )


mp = MultiPrompting(test_questions, openai_client)

In [7]:
test_submission(mp, fake_multiple_attempts=True)

 --> Prediction 1 for question 1 : d
 --> Prediction 1 for question 2 : d
 --> Prediction 1 for question 3 : a
 --> Prediction 1 for question 4 : d
 --> Prediction 1 for question 5 : a
 --> Prediction 1 for question 6 : d
 --> Prediction 1 for question 7 : b
 --> Prediction 1 for question 8 : c
 --> Prediction 1 for question 9 : c
 --> Prediction 1 for question 10 : d
 --> Prediction 1 for question 11 : d
 --> Prediction 1 for question 12 : a
 --> Prediction 1 for question 13 : a
 --> Prediction 1 for question 14 : b
 --> Prediction 1 for question 15 : b
 --> Prediction 1 for question 16 : a
 --> Prediction 1 for question 17 : d
 --> Prediction 1 for question 18 : d
 --> Prediction 1 for question 19 : c
 --> Prediction 1 for question 20 : b
 --> Prediction 1 for question 21 : a
 --> Prediction 1 for question 22 : a
 --> Prediction 1 for question 23 : a
 --> Prediction 1 for question 24 : c
 --> Prediction 1 for question 25 : a
--------------------
Score : 0.72 for model MultiPrompting


np.float64(0.72)

**Agentic system**

Now we give the model the ability to write and execute Python scripts.
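
How `PythonAgent` works internally is not shown in this notebook. Its logs suggest a loop that finds a fenced script in the model's reply, executes it, and prompts again with the printed result. A minimal sketch of that idea follows. `extract_script` and `run_script` are hypothetical names, the reply is hand-written, and the z-score is a hard-coded approximation of the 10th percentile of the standard normal so that the example needs no third-party package.

```python
import contextlib
import io
import re
from typing import Optional

FENCE = "`" * 3  # fence marker, built programmatically
SCRIPT_PATTERN = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_script(reply: str) -> Optional[str]:
    """Return the first fenced Python block in a model reply, if any."""
    match = SCRIPT_PATTERN.search(reply)
    return match.group(1) if match else None


def run_script(script: str) -> str:
    """Execute a script and return whatever it printed to stdout.

    exec() runs arbitrary code. That is acceptable only in a sketch;
    a real agent should run model-written code in a sandbox.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(script, {"__name__": "__agent__"})
    return buffer.getvalue().strip()


# Hand-written stand-in for a model reply containing a script.
reply = (
    "I will compute the B10 life.\n"
    + FENCE + "python\n"
    + "mean, std_dev, z = 150, 20, -1.2816\n"
    + "print(round(mean + z * std_dev, 2))\n"
    + FENCE
)
script = extract_script(reply)
result = run_script(script)
print(result or "The script returned no output")
```

A script that ends with a bare expression instead of `print(...)` produces an empty string here, which is the situation the "returned no output" message in the logs describes.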

In [11]:
PythonAgent.inject_python(openai_client=openai_client)
test_submission(double_prompting, fake_multiple_attempts=True)

Found a python script to execute.
Executing the following script: 

import scipy.stats as stats

# Given values
mean = 150  # Mean (μ)
std_dev = 20  # Standard deviation (σ)
percentile = 0.10  # 10th percentile

# Find the z-score for the 10th percentile
z_score = stats.norm.ppf(percentile)

# Calculate B10 life
B10_life = mean + z_score * std_dev
print(B10_life)
```
Script executed successfully.
Prompting with the result: 124.36896868910799
 --> Prediction 1 for question 1 : b
 --> Prediction 1 for question 2 : a
 --> Prediction 1 for question 3 : c
Found a python script to execute.
Executing the following script: 

import scipy.stats as stats

# Parameters
n = 20  # sample size
alpha = 0.05

# Degrees of freedom
df = n - 1

# Chi-squared critical values
chi2_lower = stats.chi2.ppf(alpha / 2, df)
chi2_upper = stats.chi2.ppf(1 - alpha / 2, df)

(chi2_lower, chi2_upper)
```
The script returned no output. Trying again
Found a python script to execute.
Executing the following script: 

im

Unnamed: 0,question_id,prediction_1,prediction_2,prediction_3,prediction_4,prediction_5
0,1,b,b,b,b,b
1,2,a,a,a,a,a
2,3,c,c,c,c,c
3,4,a,a,a,a,a
4,5,d,d,d,d,d
5,6,a,a,a,a,a
6,7,d,d,d,d,d
7,8,a,a,a,a,a
8,9,a,a,a,a,a
9,10,c,c,c,c,c


**RAG prompting**

We now augment the model's knowledge using Retrieval-Augmented Generation (RAG). We built a database of reference material on reliability engineering and append the best-matching entry to each question before prompting.
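
The internals of `HandMadeRAG` and `QdrantRAG` are not shown here. As a rough illustration of the retrieve-then-augment idea, the sketch below ranks passages by cosine similarity and appends the best one to the question. The function names, the three-dimensional vectors, and the passages are all invented for the example. A real system would obtain the vectors from an embedding model.

```python
import math
from typing import List, Sequence, Tuple

MAX_CONTEXT_LENGTH = 1200  # same cap as the notebook uses


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    db: List[Tuple[str, Sequence[float]]],
    limit: int = 1,
) -> List[str]:
    """Return the `limit` passages most similar to the query vector."""
    ranked = sorted(db, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:limit]]


def augment(question: str, passages: List[str]) -> str:
    """Append the retrieved passages, truncated to the context cap."""
    context = "".join(f"{i + 1} : {p}\n\n" for i, p in enumerate(passages))
    return (
        f"{question}\n\nHere is some information that could help: "
        f"{context[:MAX_CONTEXT_LENGTH]}"
    )


# Toy database: (passage, made-up embedding vector).
db = [
    ("MTBF is the mean operating time between failures.", [0.9, 0.1, 0.0]),
    ("MTTR is the mean time needed to repair a failed item.", [0.1, 0.9, 0.0]),
    ("FMEA lists failure modes and their effects.", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the question
top = search(query_vec, db, limit=1)
print(augment("What is the expected operational time between failures called?", top))
```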

In [6]:
from src.utils import init_string

use_qdrant = False

if use_qdrant:
    rag = QdrantRAG(rag_path="data/rag_db_seed.json", qdrant_client=qdrant_client)
else:
    rag = HandMadeRAG(db_path="data/rag_db.json", openai_client=openai_client)

MAX_CONTEXT_LENGTH = 1200


class RiskHive(ChainOfThought):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.augment_max_length = MAX_CONTEXT_LENGTH
        init_string()

    def inject_context(self, q: str):
        question, _ = q.split("[Choices]")
        search = rag.search(question, limit=1)
        injected_context = ""
        for idx, el in enumerate(search):
            injected_context += f"{idx+1} : {el}\n\n"
        return f"{q}\n\n Here is some information that could help:  ```{injected_context[0:self.augment_max_length]}(...)```"

    def get_1_answer(self, q):
        return super().get_1_answer(self.inject_context(q))


rh = RiskHive(test_questions, openai_client)


||                               ||
||          _________            ||
||         |         |           ||
||         |R I S K  |           ||
||         |  H I V E|           ||
||         |_________|           ||
||                               ||
[STATUS]: All systems up. Ready to analyze some risk!



In [None]:
PythonAgent.inject_python(openai_client=openai_client)
test_submission(rh, fake_multiple_attempts=True)
# rh.get_submission(save_path="generated/risk_hive_submission.csv", fake_multiple_attempts=True)

 --> Prediction 1 for question 1 : d <-- 
Found a python script to execute.
Executing the following script: 

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.special import gamma

# Given data
failure_times = [42]  # Failure time for the failed item
censored_times = [50] * 4  # Censored times for the remaining items
all_times = failure_times + censored_times

# Parameters for Weibull distribution
beta = 2.2
n = len(all_times)

# MLE for η calculation
eta_hat = (np.mean(np.array(all_times) ** beta)) ** (1 / beta)

# Standard error calculation for η
# Using the variance formula for Weibull distribution
variance = (eta_hat ** 2) * (gamma(1 + 2/beta) - (gamma(1 + 1/beta) ** 2))
standard_error = np.sqrt(variance / n)

# Z value for 95% confidence
z_alpha = 1.96  # For 95% confidence

# Lower confidence limit
lower_confidence_limit = eta_hat - z_alpha * standard_error
lower_confidence_limit
print(lower_confidence_limit)
```
Script executed successfully.
Prompting 