## A motivating example

In this notebook, we show how to use the OpenAI embedding API to build a simple retrieval-augmented generation (RAG) system. As an example, we take one question from a data challenge:

```markdown
[Question]: 3. You are asked to construct a Zero failure test for a redesigned ball bearing(   $ \\beta $ =2.5) that the design folks believe should have an    $ \\eta $ =1000hrs.  Program Mgmnt wants you to use only 5 tests. How long  should you test these five samples to be 90% confident that the ball  bearing design is better than 1000hrs? \n\n[Choices]: [a] 733hrs,  | [b] 851hrs | [c] 975hrs | [d] 1500.hrs
```

The correct answer to this question is [a]. However, if we pass the question to the LLM directly, we get the wrong answer most of the time.

In [2]:
from openai import OpenAI
import re, json, os
import pandas as pd
import numpy as np
from datetime import datetime


# Define a supporting function that calls OpenAI API to get an answer.
def get_answer(sys_prompt, question, correct_answer):
    client = OpenAI()

    # Clean up the prompt and answer
    clean_question = question.strip()

    # Query the OpenAI API using ChatCompletion
    input_msg = [
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": clean_question}
        ]
    completion = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=input_msg,
        temperature=.4
        )

    # Extract the model's response and the line that follows the [Answer] tag
    response = completion.choices[0].message.content
    generated_answer = response.split("[Answer]")[-1].strip().split('\n')[0]

    # Regular expression to extract the letter
    letters  = re.findall(r'\[([a-zA-Z])\]', generated_answer)
    extracted_answer = ', '.join(letters)

    # Log the result
    print(f"Question: {clean_question}\n")
    print(f'Response: {response}\n')
    print(f'Extracted answer: {extracted_answer}\n')
    print(f"Correct answer: {correct_answer}\n")
    print(f'Correct/Wrong: {extracted_answer==correct_answer}\n')

# Define the question, system prompt, and correct answer.
sys_prompt = '''You will be given a multiple choice question about reliability engineering. Choose the correct answer. At the end of your response, start a new line and use the following format to output your answer: [Answer] [The letters you choose]. For example, if you think the answer [a] is correct, output [Answer] [a]. If you think there are multiple correct answers, use a comma to separate them, e.g., [Answer] [a], [b]. Limit your output to 400 words maximum.
'''
question = "[Question]: 3. You are asked to construct a Zero failure test for a redesigned ball bearing(   $ \\beta $ =2.5) that the design folks believe should have an    $ \\eta $ =1000hrs.  Program Mgmnt wants you to use only 5 tests. How long  should you test these five samples to be 90% confident that the ball  bearing design is better than 1000hrs? \n\n[Choices]: [a] 733hrs,   | [b] 851hrs | [c] 975hrs | [d] 1500.hrs"
correct_answer = 'a'

# Run the query.
get_answer(sys_prompt, question, correct_answer)

Question: [Question]: 3. You are asked to construct a Zero failure test for a redesigned ball bearing(   $ \beta $ =2.5) that the design folks believe should have an    $ \eta $ =1000hrs.  Program Mgmnt wants you to use only 5 tests. How long  should you test these five samples to be 90% confident that the ball  bearing design is better than 1000hrs? 

[Choices]: [a] 733hrs,   | [b] 851hrs | [c] 975hrs | [d] 1500.hrs

Response: To determine the testing duration required to achieve 90% confidence that the redesigned ball bearing has a mean life greater than 1000 hours, we can use the concept of reliability testing and the statistical properties of life data analysis.

Given:
- The shape parameter (β) is 2.5.
- The target mean life (η) is 1000 hours.
- The number of tests (n) is 5.
- We need to find the test duration (t) such that we can be 90% confident that the mean life exceeds 1000 hours.

The required test duration can be calculated using the following formula derived from the Weibu

This question is a classic problem: computing the confidence level of a reliability test. However, the response shows that the LLM lacks knowledge of this type of problem.
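
For reference, the expected calculation can be sketched in a few lines. This is a minimal sketch, not part of the original notebook: it assumes the two-parameter Weibull reliability function $ R(t) = \exp(-(t/\eta)^\beta) $ and the zero-failure relation $ C = 1 - R(t)^n $, which together give $ t = \eta \left( -\ln(1-C)/n \right)^{1/\beta} $. The function name `zero_failure_test_time` is hypothetical and used for illustration only.

```python
import math


def zero_failure_test_time(eta, beta, n, confidence):
    """Test time per sample for a zero-failure test (illustrative helper).

    Assumes a two-parameter Weibull life distribution, so that
    R(t) = exp(-(t / eta) ** beta) and C = 1 - R(t) ** n.
    """
    # Solve 1 - exp(-n * (t / eta) ** beta) = confidence for t.
    return eta * (-math.log(1.0 - confidence) / n) ** (1.0 / beta)


# Values taken from the question: beta = 2.5, eta = 1000 hrs, 5 samples, 90% confidence.
t_required = zero_failure_test_time(eta=1000.0, beta=2.5, n=5, confidence=0.90)
print(f"Required test time per sample: {t_required:.0f} hrs")
```

Under these assumptions the result is about 733 hrs, which agrees with choice [a].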

## Retrieval module

Now, let's develop a simple retrieval module that takes a query and a list of documents and returns the most relevant document using cosine similarity. As an example, we define a small document collection with three short passages: one on the confidence level of a reliability test, and two on confidence intervals for maximum likelihood estimators (MLEs).

In [3]:
doc_db = [
    r'''Confidence level of reliability tests \n Suppose we test $ n $ samples for $ t $ units of time and observe $ f $ failures. Let $ R(t) $ represent the reliability function at $ t $. The confidence level of the test is defined as one minus the probability of observing no more than $ f $ failures: \n $$ C = 1 - \sum_{k=0}^f \frac{n!}{k! (n-k)!} (1-R(t))^k R(t)^{(n-k)} $$, where $ n $ is the sample size, $ f $ is the number of failures, and $ R(t) $ is the reliability function at $ t $. When $ f=0 $, the test is called a zero-failure test, and the confidence level becomes $ C = 1 - R(t)^n $.''',

    r'''# Confidence Interval for Mean Time to Failure (Exponential Distribution) \n If the lifetime of a system follows an exponential distribution, the mean time to failure (MTTF) is given by \( \theta = \frac{1}{\lambda} \), where \( \lambda \) is the failure rate. The confidence interval (CI) for \( \theta \) is based on the chi-square distribution. \n ## Two-Sided Confidence Interval
\[
\theta \in \left[ \frac{2n \bar{x}}{\chi^2_{1-\alpha/2, \, 2n}}, \; \frac{2n \bar{x}}{\chi^2_{\alpha/2, \, 2n}} \right]
\]
## One-Sided Confidence Intervals
- Upper bound (upper confidence limit):
  \[
  \theta \leq \frac{2n \bar{x}}{\chi^2_{\alpha, \, 2n}}
  \]
- Lower bound (lower confidence limit):
  \[
  \theta \geq \frac{2n \bar{x}}{\chi^2_{1-\alpha, \, 2n}}
  \]
**Symbols:**
- \( \theta \): Mean time to failure (MTTF).
- \( \lambda \): Failure rate (inverse of \( \theta \)).
- \( \bar{x} \): Sample mean of the observed lifetimes.
- \( n \): Sample size.
- \( \chi^2_{\alpha, \, \text{df}} \): Critical value from the chi-square distribution with \( \text{df} \) degrees of freedom.
- \( \alpha \): Significance level (\( 1-\alpha \) is the confidence level).
The choice between one-sided and two-sided intervals depends on the specific application and whether bounds in a particular direction are of interest.''',

    r'''# Confidence Interval for Population Mean \n To calculate the confidence interval (CI) for the population mean \( \mu \), we need to consider whether the population standard deviation \( \sigma \) is known or unknown. \n ## Case 1: \( \sigma \) is Known
If the population standard deviation \( \sigma \) is known, the CI is calculated using the Z-distribution: \n
\[
\mu \in \left[ \bar{x} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \; \bar{x} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right]
\] \n
**Symbols:**
- \( \mu \): Population mean (unknown, estimated).
- \( \bar{x} \): Sample mean (calculated from the data).
- \( Z_{\alpha/2} \): Critical value from the standard normal distribution for a confidence level \( 1-\alpha \).
- \( \sigma \): Population standard deviation (known).
- \( n \): Sample size. \n
## Case 2: \( \sigma \) is Unknown \n
If the population standard deviation \( \sigma \) is unknown, the CI is calculated using the t-distribution:\n
\[
\mu \in \left[ \bar{x} - t_{\alpha/2, \; n-1} \frac{s}{\sqrt{n}}, \; \bar{x} + t_{\alpha/2, \; n-1} \frac{s}{\sqrt{n}} \right]
\]\n
**Symbols:**
- \( t_{\alpha/2, \; n-1} \): Critical value from the t-distribution with \( n-1 \) degrees of freedom for a confidence level \( 1-\alpha \).
- \( s \): Sample standard deviation (calculated from the data).
- Other symbols are the same as defined in Case 1.\n
The choice of formula depends on the availability of \( \sigma \). If \( \sigma \) is unknown, \( s \) is used as an estimate in conjunction with the t-distribution.
    '''
]


We now use the OpenAI [API](https://platform.openai.com/docs/guides/embeddings) to create embeddings for each of the documents in our corpus. Then, we use cosine similarity to find the most similar document to a given query.
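
The ranking step can be tried offline before calling the API. The sketch below uses made-up three-dimensional vectors in place of real embeddings (the names `toy_doc_vectors` and `toy_query_vector` and their values are hypothetical, chosen only for illustration) and ranks the documents by cosine similarity to the query.

```python
import numpy as np


def cosine_similarity(v1, v2):
    """Cosine of the angle between two vectors."""
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))


# Hypothetical stand-ins for embeddings; real ones come from the embedding API.
toy_doc_vectors = [
    np.array([0.9, 0.1, 0.0]),
    np.array([0.1, 0.8, 0.3]),
    np.array([0.0, 0.2, 0.9]),
]
toy_query_vector = np.array([0.8, 0.2, 0.1])

# Score every document against the query.
toy_scores = np.array(
    [cosine_similarity(toy_query_vector, vec) for vec in toy_doc_vectors]
)

# Sort indices from most to least similar.
toy_ranking = np.argsort(toy_scores)[::-1]
print("Scores:", np.round(toy_scores, 3))
print("Ranking (best first):", toy_ranking.tolist())
```

Sorting with `np.argsort` instead of taking a single `np.argmax` makes it easy to keep the top few documents rather than only the best one, should a larger collection call for it.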

In [4]:
client = OpenAI()

# Get embeddings for all documents
doc_embeddings = []
for doc in doc_db:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=doc
    )
    doc_embeddings.append(response.data[0].embedding)

# Get embedding for the query
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=question
)
query_vector = query_embedding.data[0].embedding

# Calculate cosine similarity
def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Find most relevant document
similarities = [cosine_similarity(query_vector, doc_vec) for doc_vec in doc_embeddings]
most_relevant_idx = np.argmax(similarities)
context = doc_db[most_relevant_idx]

## RAG

Now, let's use the retrieved chunks to improve the generation. The idea is simple: add the retrieved chunks to the prompt as context and let the LLM generate the answer.
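
As a rough illustration of this prompt assembly, the sketch below appends one or more retrieved chunks to a system prompt under a `[Context]` tag. The helper `build_rag_prompt` and the demo strings are hypothetical and not part of the notebook; no API call is made.

```python
def build_rag_prompt(sys_prompt, chunks):
    """Append retrieved chunks to the system prompt under a [Context] tag.

    Illustrative helper: `chunks` is a list of retrieved text passages.
    """
    # Separate multiple chunks with a blank line so they stay distinguishable.
    context_block = "\n\n".join(chunk.strip() for chunk in chunks)
    return f"{sys_prompt.rstrip()}\n[Context]: {context_block}\n"


# Made-up inputs, used only to show the resulting prompt layout.
demo_prompt = build_rag_prompt(
    "Answer the multiple choice question.",
    ["For a zero-failure test, the confidence level is C = 1 - R(t)^n."],
)
print(demo_prompt)
```

Keeping the context in the system prompt, rather than in the user message, leaves the question text unchanged, so the same question can be sent with and without retrieval for comparison.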

In [6]:
# Define the question, system prompt, and correct answer.
sys_prompt = '''You will be given a multiple choice question about reliability engineering. Choose the correct answer. At the end of your response, start a new line and use the following format to output your answer: [Answer] [The letters you choose]. For example, if you think the answer [a] is correct, output [Answer] [a]. If you think there are multiple correct answers, use a comma to separate them, e.g., [Answer] [a], [b]. You will be given some context following the [Context] tag. Use it to reason step-by-step. If you need to calculate anything, generate a python script to do the calculation. Limit your output to 400 words maximum.
'''
question = "[Question]: 3. You are asked to construct a Zero failure test for a redesigned ball bearing(   $ \\beta $ =2.5) that the design folks believe should have an    $ \\eta $ =1000hrs.  Program Mgmnt wants you to use only 5 tests. How long  should you test these five samples to be 90% confident that the ball  bearing design is better than 1000hrs? \n\n[Choices]: [a] 733hrs,   | [b] 851hrs | [c] 975hrs | [d] 1500.hrs"
correct_answer = 'a'



In [7]:
# Create new prompt with context
rag_prompt = sys_prompt + f"\n[Context]: {context}\n"

# Get answer with context
get_answer(rag_prompt, question, correct_answer)

Question: [Question]: 3. You are asked to construct a Zero failure test for a redesigned ball bearing(   $ \beta $ =2.5) that the design folks believe should have an    $ \eta $ =1000hrs.  Program Mgmnt wants you to use only 5 tests. How long  should you test these five samples to be 90% confident that the ball  bearing design is better than 1000hrs? 

[Choices]: [a] 733hrs,   | [b] 851hrs | [c] 975hrs | [d] 1500.hrs

Response: To determine how long to test the five samples to achieve a 90% confidence level that the ball bearing design is better than 1000 hours, we can use the zero-failure test formula provided in the context. 

The confidence level for a zero-failure test is given by:

\[ C = 1 - R(t)^n \]

Where:
- \( C \) is the confidence level (0.90 for 90% confidence),
- \( R(t) \) is the reliability function at time \( t \),
- \( n \) is the sample size (5 in this case).

Since we want to find the time \( t \) such that \( C = 0.90 \), we can rearrange the formula:

\[ R(t)^n = 