# Example of generating QAs for an ML book using Azure OpenAI

### Before running the code

Make sure the root directory of this project contains a `.env` file with the following parameters:
```
AZURE_API_KEY="YOUR_API_KEY"
AZURE_ENDPOINT="YOUR_ENDPOINT"
AZURE_DEPLOYMENT_NAME="YOUR_DEPLOYMENT_NAME"
AZURE_API_VERSION="YOUR_API_VERSION"
```
You can find `AZURE_API_KEY`, `AZURE_ENDPOINT`, and `AZURE_DEPLOYMENT_NAME` in your Azure OpenAI portal. The available `AZURE_API_VERSION` values are listed [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions).
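It can help to fail fast when one of these settings is missing before any API call is made. The sketch below uses a hypothetical helper (`missing_azure_settings` is not part of uniflow). It only checks that the four variables listed above are present and non-empty in the environment, for example after `load_dotenv()` has run. It does not check that the values themselves are valid.

```python
import os

# The settings this notebook expects in .env (taken from the list above).
REQUIRED_AZURE_SETTINGS = (
    "AZURE_API_KEY",
    "AZURE_ENDPOINT",
    "AZURE_DEPLOYMENT_NAME",
    "AZURE_API_VERSION",
)


def missing_azure_settings(env=None):
    """Return the required settings that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_SETTINGS if not env.get(name)]


# Hypothetical usage: run after load_dotenv() and report anything missing.
missing = missing_azure_settings()
if missing:
    print(f"Missing Azure OpenAI settings: {', '.join(missing)}")
```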

### Load packages

```python
%reload_ext autoreload
%autoreload 2

import sys

# Add the current and parent directories to the import path
sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")
```

```python
import os
import pandas as pd
from dotenv import load_dotenv
from uniflow.flow.client import ExtractClient, TransformClient
from uniflow.flow.config import ExtractHTMLConfig, TransformAzureOpenAIConfig
from uniflow.flow.flow_factory import FlowFactory
from uniflow.op.model.model_config import AzureOpenAIModelConfig
from uniflow.op.prompt import Context, PromptTemplate

# Load the settings from .env into environment variables
load_dotenv()
```



True

```python
# List the registered extract, transform, and rater flows
FlowFactory.list()
```

{'extract': ['ExtractHTMLFlow',
  'ExtractImageFlow',
  'ExtractIpynbFlow',
  'ExtractMarkdownFlow',
  'ExtractPDFFlow',
  'ExtractTxtFlow'],
 'transform': ['TransformAzureOpenAIFlow',
  'TransformCopyFlow',
  'TransformGoogleFlow',
  'TransformGoogleMultiModalModelFlow',
  'TransformHuggingFaceFlow',
  'TransformLMQGFlow',
  'TransformOpenAIFlow'],
 'rater': ['RaterFlow']}

### Prepare the input data

Set the file name

```python
html_file = "22.11_information-theory.html"
```

Set the current directory and build the input file path

```python
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", html_file)
```
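A slightly more defensive variant of the path setup, offered as a sketch: `resolve_input_file` is a hypothetical helper that builds the same `data/raw_input` path with `pathlib` and raises `FileNotFoundError` immediately if the HTML file has not been downloaded yet, rather than letting the problem surface later during extraction.

```python
from pathlib import Path


def resolve_input_file(base_dir, file_name, subdir=("data", "raw_input")):
    """Build <base_dir>/data/raw_input/<file_name> and fail early if the file does not exist."""
    path = Path(base_dir).joinpath(*subdir, file_name)
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    return str(path)


# Hypothetical usage in this notebook:
# input_file = resolve_input_file(os.getcwd(), html_file)
```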

Load the HTML file with `ExtractClient`

```python
# One input record per file to extract
input_data = [{"filename": input_file}]

# Extract the text of the HTML page
extract_client = ExtractClient(ExtractHTMLConfig())
extract_output = extract_client.run(input_data)
```

100%|██████████| 1/1 [00:00<00:00,  3.21it/s]


### Define the prompt and prepare the input dataset

```python
# Instruction plus one worked (few-shot) example showing the expected context/question/answer format
guided_prompt = PromptTemplate(
    instruction="Generate one question and its corresponding answer based on context. Following the format of the examples below to include the same context, question, and answer in the response.",
    few_shot_prompt=[
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="Who published A Mathematical Theory of Communication in 1948?",
            answer="Claude E. Shannon.",
        )
    ],
)
```
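Few-shot prompting works by showing the model a complete context/question/answer example before the new context it should answer about. The sketch below is a generic illustration of that idea, not uniflow's internal serialization: the function name and the `examples`/`context` keys are assumptions. The API-call log further down only shows that uniflow's request payload is JSON beginning with an `instruction` field.

```python
import json


def build_few_shot_payload(instruction, examples, context):
    """Serialize an instruction, worked examples, and a new context into one JSON string.

    Hypothetical format for illustration only; uniflow builds its own payload internally.
    """
    payload = {
        "instruction": instruction,
        # Each example is a complete context/question/answer triple for the model to imitate.
        "examples": [dict(example) for example in examples],
        # The new context comes without a question or answer; the model supplies both.
        "context": context,
    }
    return json.dumps(payload, ensure_ascii=False)


shannon_example = {
    "context": "In 1948, Claude E. Shannon published A Mathematical Theory of Communication.",
    "question": "Who published A Mathematical Theory of Communication in 1948?",
    "answer": "Claude E. Shannon.",
}
print(build_few_shot_payload(
    "Generate one question and its corresponding answer based on context.",
    [shannon_example],
    "Entropy measures how much information is presented in different signals.",
))
```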

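```python
# Keep only paragraphs longer than 200 characters and wrap each one in a Context
paragraphs = extract_output[0]["output"][0]["text"]
data = [Context(context=p) for p in paragraphs if len(p) > 200]
```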
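The filter above reads the paragraphs of the first (and only) extracted file and keeps those longer than 200 characters, presumably to skip headings and other short fragments. Here is a sketch of the same filter on made-up mock data, using a hypothetical `select_paragraphs` helper and plain strings instead of `Context` objects. Length alone is a coarse filter: the last row of the results table below comes from the page's table of contents, which is long enough to pass.

```python
def select_paragraphs(extract_output, min_chars=200):
    """Return the paragraphs of the first extracted file that are longer than min_chars.

    Mirrors the indexing used above: extract_output[0]["output"][0]["text"] is a list of strings.
    """
    paragraphs = extract_output[0]["output"][0]["text"]
    return [p for p in paragraphs if len(p) > min_chars]


# Made-up mock with the same nesting as the real extract output
mock_output = [{"output": [{"text": ["22.11.2. Entropy", "x" * 250]}]}]
print(len(select_paragraphs(mock_output)))  # 1
```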

### Run ModelFlow


```python
# Ask the Azure OpenAI model to reply with a JSON object
config = TransformAzureOpenAIConfig(
    prompt_template=guided_prompt,
    model_config=AzureOpenAIModelConfig(response_format={"type": "json_object"}),
)
```
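`response_format={"type": "json_object"}` turns on JSON mode, which makes the model return syntactically valid JSON but does not guarantee that the object contains the keys you expect. Here is a minimal sketch of a post-check on one reply's `content` string. `parse_qa` is a hypothetical helper, and the sample reply is shortened from one in the log below.

```python
import json

REQUIRED_KEYS = ("context", "question", "answer")


def parse_qa(content):
    """Parse one model reply (a JSON string) and check that it has context, question, and answer."""
    record = json.loads(content)  # raises json.JSONDecodeError on malformed JSON
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"Response is missing keys: {missing}")
    return {key: record[key] for key in REQUIRED_KEYS}


reply = (
    '{"context": "It is an extension of the Bernoulli distribution from binary class to multi-class.", '
    '"question": "What is a k-class multinoulli distribution an extension of?", '
    '"answer": "The Bernoulli distribution."}'
)
print(parse_qa(reply)["answer"])  # The Bernoulli distribution.
```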

```python
client = TransformClient(config)

# Only send the last 10 paragraphs to keep the demo run short
data = data[-10:]

output = client.run(data)
```

Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


  0%|          | 0/10 [00:00<?, ?it/s]

 10%|█         | 1/10 [00:05<00:50,  5.66s/it]

Received response: {'id': 'chatcmpl-8xTyGzjjpGzIVJztqgDYCweZrRBSR', 'object': 'chat.completion', 'created': 1709187252, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{"context": "If we dive deep into the classification objective function with cross-entropy loss \\\\(\\\\textrm{CE}\\\\), we will find minimizing \\\\(\\\\textrm{CE}\\\\) is equivalent to maximizing the log-likelihood function \\\\(L\\\\).", "question": "What is minimizing cross-entropy loss equivalent to in the context of the classification objective function?", "answer": "Maximizing the log-likelihood function \\\\(L\\\\)."}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 203, 'completion_tokens': 97, 'total_tokens': 300}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 20%|██        | 2/10 [00:15<01:03,  7.96s/it]

Received response: {'id': 'chatcmpl-8xTyMOsN0JKeECTk9SYcDn6dV4HVa', 'object': 'chat.completion', 'created': 1709187258, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "To begin with, suppose that we are given a dataset with \\\\(n\\\\) examples, and it can be classified into \\\\(k\\\\) -classes. For each data example \\\\(i\\\\), we represent any \\\\(k\\\\)-class label \\\\(\\\\mathbf{y}_i = (y_{i1}, \\\\ldots, y_{ik})\\\\) by one-hot encoding. To be specific, if the example \\\\(i\\\\) belongs to class \\\\(j\\\\), then we set the \\\\(j\\\\)-th entry to \\\\(1\\\\), and all other components to \\\\(0\\\\), i.e.,",\n  "question": "What encoding method is used to represent class labels in the given dataset?",\n  "answer": "One-hot encoding."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 289, 'completion_tokens': 169, 'total_tokens': 458}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call wit

 30%|███       | 3/10 [00:22<00:52,  7.44s/it]

Received response: {'id': 'chatcmpl-8xTyVbwemV6KX0K663uduEruxFXu4', 'object': 'chat.completion', 'created': 1709187267, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "For instance, if a multi-class classification problem contains three classes \\\\(A\\\\) , \\\\(B\\\\) , and \\\\(C\\\\) , then the labels \\\\(\\\\mathbf{y}_i\\\\) can be encoded in { \\\\(A: (1, 0, 0); B: (0, 1, 0); C: (0, 0, 1)\\\\) }.",\n  "question": "How are the labels for classes A, B, and C encoded in a multi-class classification problem?",\n  "answer": "A: (1, 0, 0); B: (0, 1, 0); C: (0, 0, 1)"\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 238, 'completion_tokens': 159, 'total_tokens': 397}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 40%|████      | 4/10 [00:30<00:46,  7.74s/it]

Received response: {'id': 'chatcmpl-8xTycGyAWWpI4j8fB3msuYsXywiFT', 'object': 'chat.completion', 'created': 1709187274, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{"context": "On the other side, we can also approach the problem through maximum likelihood estimation. To begin with, let\'s quickly introduce a k-class multinoulli distribution. It is an extension of the Bernoulli distribution from binary class to multi-class. If a random variable z = (z_{1}, \\\\ldots, z_{k}) follows a k-class multinoulli distribution with probabilities p = (p_{1}, \\\\ldots, p_{k}), i.e.,", "question": "What is a k-class multinoulli distribution an extension of?", "answer": "The Bernoulli distribution."}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 281, 'completion_tokens': 129, 'total_tokens': 410}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer

 50%|█████     | 5/10 [00:34<00:31,  6.34s/it]

Received response: {'id': 'chatcmpl-8xTykATLKrOQuQgpkCDlPMnGacCDc', 'object': 'chat.completion', 'created': 1709187282, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "In 1948, Claude E. Shannon published A Mathematical Theory of\\nCommunication (Shannon, 1948) establishing the theory of\\ninformation. In his article, Shannon introduced the concept of\\ninformation entropy for the first time. We will begin our journey here.",\n  "question": "What concept did Claude E. Shannon introduce for the first time in his 1948 article?",\n  "answer": "The concept of information entropy."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 294, 'completion_tokens': 97, 'total_tokens': 391}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 60%|██████    | 6/10 [00:43<00:29,  7.27s/it]

Received response: {'id': 'chatcmpl-8xTyoPxWaXI751uEM0ViMteVK0TDz', 'object': 'chat.completion', 'created': 1709187286, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "Since in maximum likelihood estimation, we maximizing the objective function \\\\(l(\\\\theta)\\\\) by having \\\\(\\\\pi_{j} = p_{\\\\theta} (y_{ij} \\\\mid \\\\mathbf{x}_i)\\\\) . Therefore, for any multi-class classification, maximizing the above log-likelihood function \\\\(l(\\\\theta)\\\\) is equivalent to minimizing the CE loss \\\\(\\\\textrm{CE}(y, \\\\hat{y})\\\\) .",\n  "question": "What is equivalent to minimizing the CE loss in multi-class classification?",\n  "answer": "Maximizing the log-likelihood function \\\\(l(\\\\theta)\\\\)."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 251, 'completion_tokens': 151, 'total_tokens': 402}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate

 70%|███████   | 7/10 [00:47<00:19,  6.42s/it]

Received response: {'id': 'chatcmpl-8xTyx3cAqB8B0oAd54louszkSm5b0', 'object': 'chat.completion', 'created': 1709187295, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "To test the above proof, let’s apply the built-in measure NegativeLogLikelihood. Using the same labels and preds as in the earlier example, we will get the same numerical loss as the previous example up to the 5 decimal place.",\n  "question": "What measure is applied to test the proof in the context?",\n  "answer": "NegativeLogLikelihood."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 201, 'completion_tokens': 83, 'total_tokens': 284}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 80%|████████  | 8/10 [00:56<00:14,  7.28s/it]

Received response: {'id': 'chatcmpl-8xTz2STxqVx7wAkuhcIXJKzxxby3Q', 'object': 'chat.completion', 'created': 1709187300, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "Information theory is a field of study about encoding, decoding, transmitting, and manipulating information. Entropy is the unit to measure how much information is presented in different signals. KL divergence can also measure the divergence between two distributions. Cross-entropy can be viewed as an objective function of multi-class classification. Minimizing cross-entropy loss is equivalent to maximizing the log-likelihood function.",\n  "question": "What is the objective of minimizing cross-entropy loss in multi-class classification?",\n  "answer": "Minimizing cross-entropy loss is equivalent to maximizing the log-likelihood function."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 241, 'completion_tokens': 126, 'total_tokens': 36

 90%|█████████ | 9/10 [01:02<00:06,  6.59s/it]

Received response: {'id': 'chatcmpl-8xTzBskRd8aDDh6qUzLQeZmYDNtDs', 'object': 'chat.completion', 'created': 1709187309, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "In 1948, Claude E. Shannon published A Mathematical Theory of Communication (Shannon, 1948) establishing the theory of information. In his article, Shannon introduced the concept of information entropy for the first time. We will begin our journey here.",\n  "question": "What concept did Claude E. Shannon introduce for the first time in his 1948 article?",\n  "answer": "Information entropy."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 756, 'completion_tokens': 91, 'total_tokens': 847}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


100%|██████████| 10/10 [01:19<00:00,  7.93s/it]

Received response: {'id': 'chatcmpl-8xTzGwv9aG41kI51Aaw5zBkz5JUrt', 'object': 'chat.completion', 'created': 1709187314, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "22.11. Information Theory\\n22.11.1. Information\\n22.11.1.1. Self-information\\n22.11.2. Entropy\\n22.11.2.1. Motivating Entropy\\n22.11.2.2. Definition\\n22.11.2.3. Interpretations\\n22.11.2.4. Properties of Entropy\\n22.11.3. Mutual Information\\n22.11.3.1. Joint Entropy\\n22.11.3.2. Conditional Entropy\\n22.11.3.3. Mutual Information\\n22.11.3.4. Properties of Mutual Information\\n22.11.3.5. Pointwise Mutual Information\\n22.11.3.6. Applications of Mutual Information\\n22.11.4. Kullback–Leibler Divergence\\n22.11.4.1. Definition\\n22.11.4.2. KL Divergence Properties\\n22.11.4.3. Example\\n22.11.5. Cross-Entropy\\n22.11.5.1. Formal Definition\\n22.11.5.2. Properties\\n22.11.5.3. Cross-Entropy as An Objective Function of Multi-class Class
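Each logged response includes a `usage` block with token counts, which gives a rough way to estimate the cost of a run. The sketch below sums those counters. It assumes you have the raw response dictionaries in the shape printed above, for example copied from the log; this notebook does not show whether uniflow's `output` exposes them. `sum_usage` is a hypothetical helper.

```python
def sum_usage(responses):
    """Sum the token counters in the 'usage' field of chat-completion response dicts."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        usage = response.get("usage") or {}  # tolerate a missing or None usage field
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


# Usage values copied from the first two responses in the log above
logged = [
    {"usage": {"prompt_tokens": 203, "completion_tokens": 97, "total_tokens": 300}},
    {"usage": {"prompt_tokens": 289, "completion_tokens": 169, "total_tokens": 458}},
]
print(sum_usage(logged))
# {'prompt_tokens': 492, 'completion_tokens': 266, 'total_tokens': 758}
```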




### Format the results as a pandas table

```python
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

for item in output:
    for i in item['output']:
        for response in i['response']:
            contexts.append(response['context'])
            questions.append(response['question'])
            answers.append(response['answer'])

df = pd.DataFrame({
    'context': contexts,
    'question': questions,
    'answer': answers
})

# Set display options
pd.set_option('display.max_colwidth', None)  # or use a specific width like 50
pd.set_option('display.width', 1000)

styled_df = df.style.set_properties(**{'text-align': 'left'}).set_table_styles([{
    'selector': 'th',
    'props': [('text-align', 'left')]
}])
styled_df
```

| | context | question | answer |
|---|---|---|---|
| 0 | If we dive deep into the classification objective function with cross-entropy loss \(\textrm{CE}\), we will find minimizing \(\textrm{CE}\) is equivalent to maximizing the log-likelihood function \(L\). | What is minimizing cross-entropy loss equivalent to in the context of the classification objective function? | Maximizing the log-likelihood function \(L\). |
| 1 | To begin with, suppose that we are given a dataset with \(n\) examples, and it can be classified into \(k\) -classes. For each data example \(i\), we represent any \(k\)-class label \(\mathbf{y}_i = (y_{i1}, \ldots, y_{ik})\) by one-hot encoding. To be specific, if the example \(i\) belongs to class \(j\), then we set the \(j\)-th entry to \(1\), and all other components to \(0\), i.e., | What encoding method is used to represent class labels in the given dataset? | One-hot encoding. |
| 2 | For instance, if a multi-class classification problem contains three classes \(A\) , \(B\) , and \(C\) , then the labels \(\mathbf{y}_i\) can be encoded in { \(A: (1, 0, 0); B: (0, 1, 0); C: (0, 0, 1)\) }. | How are the labels for classes A, B, and C encoded in a multi-class classification problem? | A: (1, 0, 0); B: (0, 1, 0); C: (0, 0, 1) |
| 3 | On the other side, we can also approach the problem through maximum likelihood estimation. To begin with, let's quickly introduce a k-class multinoulli distribution. It is an extension of the Bernoulli distribution from binary class to multi-class. If a random variable z = (z_{1}, \ldots, z_{k}) follows a k-class multinoulli distribution with probabilities p = (p_{1}, \ldots, p_{k}), i.e., | What is a k-class multinoulli distribution an extension of? | The Bernoulli distribution. |
| 4 | In 1948, Claude E. Shannon published A Mathematical Theory of Communication (Shannon, 1948) establishing the theory of information. In his article, Shannon introduced the concept of information entropy for the first time. We will begin our journey here. | What concept did Claude E. Shannon introduce for the first time in his 1948 article? | The concept of information entropy. |
| 5 | Since in maximum likelihood estimation, we maximizing the objective function \(l(\theta)\) by having \(\pi_{j} = p_{\theta} (y_{ij} \mid \mathbf{x}_i)\) . Therefore, for any multi-class classification, maximizing the above log-likelihood function \(l(\theta)\) is equivalent to minimizing the CE loss \(\textrm{CE}(y, \hat{y})\) . | What is equivalent to minimizing the CE loss in multi-class classification? | Maximizing the log-likelihood function \(l(\theta)\). |
| 6 | To test the above proof, let’s apply the built-in measure NegativeLogLikelihood. Using the same labels and preds as in the earlier example, we will get the same numerical loss as the previous example up to the 5 decimal place. | What measure is applied to test the proof in the context? | NegativeLogLikelihood. |
| 7 | Information theory is a field of study about encoding, decoding, transmitting, and manipulating information. Entropy is the unit to measure how much information is presented in different signals. KL divergence can also measure the divergence between two distributions. Cross-entropy can be viewed as an objective function of multi-class classification. Minimizing cross-entropy loss is equivalent to maximizing the log-likelihood function. | What is the objective of minimizing cross-entropy loss in multi-class classification? | Minimizing cross-entropy loss is equivalent to maximizing the log-likelihood function. |
| 8 | In 1948, Claude E. Shannon published A Mathematical Theory of Communication (Shannon, 1948) establishing the theory of information. In his article, Shannon introduced the concept of information entropy for the first time. We will begin our journey here. | What concept did Claude E. Shannon introduce for the first time in his 1948 article? | Information entropy. |
| 9 | 22.11. Information Theory 22.11.1. Information 22.11.1.1. Self-information 22.11.2. Entropy 22.11.2.1. Motivating Entropy 22.11.2.2. Definition 22.11.2.3. Interpretations 22.11.2.4. Properties of Entropy 22.11.3. Mutual Information 22.11.3.1. Joint Entropy 22.11.3.2. Conditional Entropy 22.11.3.3. Mutual Information 22.11.3.4. Properties of Mutual Information 22.11.3.5. Pointwise Mutual Information 22.11.3.6. Applications of Mutual Information 22.11.4. Kullback–Leibler Divergence 22.11.4.1. Definition 22.11.4.2. KL Divergence Properties 22.11.4.3. Example 22.11.5. Cross-Entropy 22.11.5.1. Formal Definition 22.11.5.2. Properties 22.11.5.3. Cross-Entropy as An Objective Function of Multi-class Classification 22.11.6. Summary 22.11.7. Exercises | What concept serves as an objective function for multi-class classification according to the context? | Cross-Entropy. |
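In the table above, rows 4 and 8 both return the Shannon paragraph from the few-shot example as their context. Because the input was only the last ten paragraphs of the chapter, the model may have echoed the example instead of the passage it was given. The sketch below is a pure-Python alternative to the nested loops in the DataFrame cell that also flags such rows. `flatten_qa` and the `copied_few_shot_context` field are hypothetical names. Whitespace is normalized before comparing because the few-shot context contains line breaks, while the logged contexts do not always keep them.

```python
def _normalize(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def flatten_qa(output, few_shot_contexts=()):
    """Flatten client output into row dicts and flag rows that reuse a few-shot context.

    Assumes the nesting used in the DataFrame cell above:
    item["output"][i]["response"] is a list of dicts with context/question/answer keys.
    """
    examples = {_normalize(c) for c in few_shot_contexts}
    rows = []
    for item in output:
        for part in item["output"]:
            for response in part["response"]:
                rows.append({
                    "context": response["context"],
                    "question": response["question"],
                    "answer": response["answer"],
                    # True when the context matches a few-shot context after whitespace normalization
                    "copied_few_shot_context": _normalize(response["context"]) in examples,
                })
    return rows
```

In the notebook, the rows could be passed to `pd.DataFrame(flatten_qa(output, [few_shot_context]))` and filtered with `df[~df["copied_few_shot_context"]]`. Here `few_shot_context` is a hypothetical variable you would define to hold the Shannon paragraph used in `guided_prompt`.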
