# Example of generating QAs for an ML book using Azure OpenAI

### Before running the code

You will need to have the following packages installed:
```
    pip install langchain pandas unstructured python-dotenv
```

Also, make sure the root directory of this project contains a `.env` file with the following parameter values:
```
    api_key="YOUR_API_KEY"
    endpoint="YOUR_END_POINT"
    deployment_id="YOUR_DEPLOYMENT_ID"
    model_version="YOUR_MODEL_VERSION"
```
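It can help to confirm that all four values are actually set before any API call is made. The following is a minimal sketch using only the standard library. The helper name `missing_env_keys` is hypothetical and not part of uniflow, and the sketch assumes the values from `.env` have already been loaded into the process environment (for example by the `load_dotenv()` call used later in this notebook).

```python
import os

# Key names taken from the .env example above.
REQUIRED_KEYS = ("api_key", "endpoint", "deployment_id", "model_version")


def missing_env_keys(required=REQUIRED_KEYS, env=None):
    """Return the required keys that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        print("Missing .env values:", ", ".join(missing))
    else:
        print("All required values are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.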

### Load packages

In [1]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

In [2]:
import os
import pandas as pd
from dotenv import load_dotenv
from uniflow.flow.client import ExtractClient, TransformClient
from uniflow.flow.config import ExtractHTMLConfig, TransformAzureOpenAIConfig
from uniflow.flow.flow_factory import FlowFactory
from uniflow.op.model.model_config import AzureOpenAIModelConfig
from uniflow.op.prompt import Context, PromptTemplate

load_dotenv()

True

In [3]:
FlowFactory.list()

{'extract': ['ExtractHTMLFlow',
  'ExtractImageFlow',
  'ExtractIpynbFlow',
  'ExtractMarkdownFlow',
  'ExtractPDFFlow',
  'ExtractTxtFlow'],
 'transform': ['TransformAzureOpenAIFlow',
  'TransformCopyFlow',
  'TransformHuggingFaceFlow',
  'TransformLMQGFlow',
  'TransformOpenAIFlow'],
 'rater': ['RaterFlow']}

### Prepare the input data

Set file name

In [4]:
html_file = "22.11_information-theory.html"

Set current directory and input data directory

In [5]:
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", html_file)
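A wrong working directory is an easy mistake here, since the path is built from `os.getcwd()`. As a minimal sketch, the path could be built and checked in one step so the problem surfaces before the extraction runs. The function name `resolve_input_file` is an assumption made for illustration, not something uniflow provides.

```python
from pathlib import Path


def resolve_input_file(base_dir, file_name):
    """Build <base_dir>/data/raw_input/<file_name> and fail early if it is missing."""
    path = Path(base_dir) / "data" / "raw_input" / file_name
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    return str(path)
```

In this notebook it would be called as `resolve_input_file(os.getcwd(), html_file)`.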

Load the html file via ExtractClient

In [6]:
input_data = [{"filename": input_file}]

In [7]:
extract_client = ExtractClient(ExtractHTMLConfig())

In [8]:
extract_output = extract_client.run(input_data)

100%|██████████| 1/1 [00:00<00:00,  1.71it/s]


### Prepare input dataset

In [9]:
guided_prompt = PromptTemplate(
        instruction="Generate one question and its corresponding answer based on context. Following the format of the examples below to include the same context, question, and answer in the response.",
        few_shot_prompt=[
            Context(
                context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
                question="Who published A Mathematical Theory of Communication in 1948?",
                answer="Claude E. Shannon.",
            )
        ]
)

In [10]:
data = [Context(context=p) for p in extract_output[0]["output"][0]["text"] if len(p) > 200]
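The list comprehension above keeps only paragraphs longer than 200 characters. The same filter can be tried on plain strings, as in the sketch below. It assumes the extracted text is a list of strings, and `filter_paragraphs` and the sample data are stand-ins invented for illustration.

```python
def filter_paragraphs(paragraphs, min_chars=200):
    """Keep only paragraphs strictly longer than `min_chars` characters."""
    return [p for p in paragraphs if len(p) > min_chars]


# Stand-in data; in the notebook this would be extract_output[0]["output"][0]["text"].
sample = ["short heading", "x" * 250, "y" * 200]
kept = filter_paragraphs(sample)
print(len(kept))  # → 1, since only the 250-character string exceeds 200
```

Note that the comparison is strict, so a paragraph of exactly 200 characters is dropped.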

In [11]:
# data = data[-2:]

### Run ModelFlow


In [12]:
config = TransformAzureOpenAIConfig(
    prompt_template=guided_prompt,
    model_config=AzureOpenAIModelConfig(response_format={"type": "json_object"}),
)

In [13]:
client = TransformClient(config)

In [14]:
output = client.run(data)

Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


  0%|          | 0/15 [00:00<?, ?it/s]

  7%|▋         | 1/15 [00:04<00:58,  4.19s/it]

Received response: {'id': 'chatcmpl-8x1Xq6TdHaVCelyTOG8qM2mXErSqE', 'object': 'chat.completion', 'created': 1709077982, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{"context": "In 1948, Claude E. Shannon published A Mathematical Theory of\\nCommunication (Shannon, 1948) establishing the theory of\\ninformation. In his article, Shannon introduced the concept of\\ninformation entropy for the first time. We will begin our journey here.", "question": "What concept did Claude E. Shannon introduce for the first time in his 1948 article?", "answer": "The concept of information entropy."}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 1975, 'completion_tokens': 93, 'total_tokens': 2068}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 13%|█▎        | 2/15 [00:11<01:19,  6.13s/it]

Received response: {'id': 'chatcmpl-8x1XuC7LlLg62S63JXm76LhXzi7Zz', 'object': 'chat.completion', 'created': 1709077986, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "In 1948, Claude E. Shannon published A Mathematical Theory of Communication (Shannon, 1948) establishing the theory of information. In his article, Shannon introduced the concept of information entropy for the first time. We will begin our journey here.",\n  "question": "What concept did Claude E. Shannon introduce for the first time in his 1948 article?",\n  "answer": "Information entropy."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 1975, 'completion_tokens': 91, 'total_tokens': 2066}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 20%|██        | 3/15 [00:17<01:12,  6.03s/it]

Received response: {'id': 'chatcmpl-8x1Y2bww7TI1kkQMt3vz7SdUJdlef', 'object': 'chat.completion', 'created': 1709077994, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "The universe is overflowing with information. Information provides a common language across disciplinary rifts: from Shakespeare’s Sonnet to researchers’ paper on Cornell ArXiv, from Van Gogh’s printing Starry Night to Beethoven’s music Symphony No. 5, from the first programming language Plankalkül to the state-of-the-art machine learning algorithms. Everything must follow the rules of information theory, no matter the format. With information theory, we can measure and compare how much information is present in different signals. In this section, we will investigate the fundamental concepts of information theory and applications of information theory in machine learning.",\n  "question": "What is the common language that connects different

 27%|██▋       | 4/15 [00:20<00:54,  4.98s/it]

Received response: {'id': 'chatcmpl-8x1Y8EukJtl9AmS2DUS91LGEncB4b', 'object': 'chat.completion', 'created': 1709078000, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "Consider the following thought experiment. We have a friend with a deck of cards. They will shuffle the deck, flip over some cards, and tell us statements about the cards. We will try to assess the information content of each statement.",\n  "question": "What is the purpose of the thought experiment with a deck of cards?",\n  "answer": "To assess the information content of statements about the cards."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 199, 'completion_tokens': 89, 'total_tokens': 288}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 33%|███▎      | 5/15 [00:27<00:55,  5.53s/it]

Received response: {'id': 'chatcmpl-8x1YB5LrGbUzFkR95syPjPg11Z1y0', 'object': 'chat.completion', 'created': 1709078003, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "If we read through these thought experiments, we see a natural idea. As a starting point, rather than caring about the knowledge, we may build off the idea that information represents the degree of surprise or the abstract possibility of the event. For example, if we want to describe an unusual event, we need a lot information. For a common event, we may not need much information.",\n  "question": "What does information represent according to the thought experiments?",\n  "answer": "Information represents the degree of surprise or the abstract possibility of the event."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 231, 'completion_tokens': 118, 'total_tokens': 349}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {

 40%|████      | 6/15 [00:32<00:48,  5.41s/it]

Received response: {'id': 'chatcmpl-8x1YIF6EzEJuutFt2pdBuRVGgxQY2', 'object': 'chat.completion', 'created': 1709078010, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "The information we gain by observing a random variable does not depend on what we call the elements, or the presence of additional elements which have probability zero.\\nThe information we gain by observing two random variables is no more than the sum of the information we gain by observing them separately.\\nIf they are independent, then it is exactly the sum.\\nThe information gained when observing (nearly) certain events is (nearly) zero.",\n  "question": "Does the information gained from observing two independent random variables equal the sum of the information gained from observing each one separately?",\n  "answer": "Yes."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 240, 'completion_tokens': 128, 'total_tokens': 368}, 'syst

 47%|████▋     | 7/15 [00:36<00:40,  5.02s/it]

Received response: {'id': 'chatcmpl-8x1YNGp0ikGdZPDhVCICjRWqgrEdk', 'object': 'chat.completion', 'created': 1709078015, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "While proving this fact is beyond the scope of our text, it is important to know that this uniquely determines the form that entropy must take. The only ambiguity that these allow is in the choice of fundamental units, which is most often normalized by making the choice we saw before that the information provided by a single fair coin flip is one bit.",\n  "question": "What unit is often used as the fundamental unit of information?",\n  "answer": "One bit."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 221, 'completion_tokens': 100, 'total_tokens': 321}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


 53%|█████▎    | 8/15 [00:41<00:34,  4.94s/it]

Received response: {'id': 'chatcmpl-8x1YRvZ1esSVRqhl6o1QwQorCBXJB', 'object': 'chat.completion', 'created': 1709078019, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "If P is a continuous random variable, then the story becomes much more complicated. However, if we additionally impose that P is supported on a finite interval (with all values between 0 and 1), then P has the highest entropy if it is the uniform distribution on that interval.",\n  "question": "Under what condition does a continuous random variable have the highest entropy?",\n  "answer": "A continuous random variable has the highest entropy if it is supported on a finite interval (with all values between 0 and 1) and is the uniform distribution on that interval."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 604, 'completion_tokens': 123, 'total_tokens': 727}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instr

 60%|██████    | 9/15 [00:52<00:40,  6.81s/it]

Received response: {'id': 'chatcmpl-8x1YW0FW81m8qiO2BVxupgYygNPgk', 'object': 'chat.completion', 'created': 1709078024, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "Mutual information is symmetric, i.e.,\\n\\\\(I(X, Y) = I(Y, X)\\\\)\\n.\\nMutual information is non-negative, i.e.,\\n\\\\(I(X, Y) \\\\geq 0\\\\)\\n.\\n\\\\(I(X, Y) = 0\\\\)\\nif and only if\\n\\\\(X\\\\)\\nand\\n\\\\(Y\\\\)\\nare\\nindependent. For example, if\\n\\\\(X\\\\)\\nand\\n\\\\(Y\\\\)\\nare independent,\\nthen knowing\\n\\\\(Y\\\\)\\ndoes not give any information about\\n\\\\(X\\\\)\\nand vice versa, so their mutual information is zero.\\nAlternatively, if\\n\\\\(X\\\\)\\nis an invertible function of\\n\\\\(Y\\\\)\\n,\\nthen\\n\\\\(Y\\\\)\\nand\\n\\\\(X\\\\)\\nshare all information and\\n\\\\[I(X, Y) = H(Y) = H(X).\\\\]",\n  "question": "What does it mean when the mutual information \\\\(I(X, Y)\\\\) between two variables X and Y

INFO [abs_llm_processor]: Attempt 1 failed, retrying...


Received response: {'id': 'chatcmpl-8x1YhWZsPxIbRWH5qKWRsy5Y0rZQ5', 'object': 'chat.completion', 'created': 1709078035, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "In this case, mutual information can help us resolve this ambiguity. We first find the group of words that each has a relatively large mutual information with the company Amazon, such as e-commerce, technology, and online. Second, we find another group of words that each has a relatively large mutual information with the Amazon rain forest, such as rain, forest, and tropical. When we need to disambiguate \\\\"\n                                                                                                                                                                                                                                                                                                                                               

 67%|██████▋   | 10/15 [04:00<05:13, 62.64s/it]

Received response: {'id': 'chatcmpl-8x1bcEteVVLOHopLtHDsZKbMwOdPA', 'object': 'chat.completion', 'created': 1709078216, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "In this case, mutual information can help us resolve this ambiguity. We first find the group of words that each has a relatively large mutual information with the company Amazon, such as e-commerce, technology, and online. Second, we find another group of words that each has a relatively large mutual information with the Amazon rain forest, such as rain, forest, and tropical. When we need to disambiguate “Amazon”, we can compare which group has more occurrence in the context of the word Amazon. In this case the article would go on to describe the forest, and make the context clear.",\n  "question": "What method is used to disambiguate the word \'Amazon\'?",\n  "answer": "Mutual information is used to disambiguate the word \'Amazon\'."\n}'},

 73%|███████▎  | 11/15 [04:30<03:31, 52.84s/it]

Received response: {'id': 'chatcmpl-8x1bi2S5dtZap6GLHAZecDlu35B6D', 'object': 'chat.completion', 'created': 1709078222, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "KL divergence is non-symmetric, i.e., there are\\n\\\\(P,Q\\\\)\\nsuch that\\n(22.11.22)\\n\\n\\\\[D_{\\\\textrm{KL}}(P\\\\|Q) \\\\neq D_{\\\\textrm{KL}}(Q\\\\|P).\\\\]\\nKL divergence is non-negative, i.e.,\\n(22.11.23)\\n\\n\\\\[D_{\\\\textrm{KL}}(P\\\\|Q) \\\\geq 0.\\\\]\\nNote that the equality holds only when\\n\\\\(P = Q\\\\)\\n.\\nIf there exists an\\n\\\\(x\\\\)\\nsuch that\\n\\\\(p(x) > 0\\\\)\\nand\\n\\\\(q(x) = 0\\\\)\\n, then\\n\\\\(D_{\\\\textrm{KL}}(P\\\\|Q) = \\\\infty\\\\)\\n.\\nThere is a close relationship between KL divergence and mutual\\ninformation. Besides the relationship shown in\\nFig. 22.11.1\\n,\\n\\\\(I(X, Y)\\\\)\\nis also\\nnumerically equivalent with the following terms:\\n\\\\(D_{\\\\textrm{KL}}(P(X, Y) \\\\

 80%|████████  | 12/15 [04:38<01:56, 38.99s/it]

Received response: {'id': 'chatcmpl-8x1cDLNZ83sv7P3S4LJfKYaHugUmH', 'object': 'chat.completion', 'created': 1709078253, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "Maximizing predictive probability of\\n\\\\(Q\\\\)\\nfor distribution\\n\\\\(P\\\\)\\n, (i.e.,\\n\\\\(E_{x \\\\sim P} [\\\\log (q(x))]\\\\)\\n);\\nMinimizing cross-entropy\\n\\\\(\\\\textrm{CE} (P, Q)\\\\)\\n;\\nMinimizing the KL divergence\\n\\\\(D_{\\\\textrm{KL}}(P\\\\|Q)\\\\)\\n.",\n  "question": "What are the three objectives mentioned in the context?",\n  "answer": "Maximizing predictive probability of Q for distribution P, minimizing cross-entropy CE (P, Q), and minimizing the KL divergence D_{\\\\textrm{KL}}(P\\\\|Q)."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 253, 'completion_tokens': 173, 'total_tokens': 426}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and it

 87%|████████▋ | 13/15 [04:41<00:56, 28.31s/it]

Received response: {'id': 'chatcmpl-8x1cKFkDrkjpo6fsTP5WL6DQ3vNWc', 'object': 'chat.completion', 'created': 1709078260, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "Information theory is a field of study about encoding, decoding, transmitting, and manipulating information.\\nEntropy is the unit to measure how much information is presented in different signals.\\nKL divergence can also measure the divergence between two distributions.\\nCross-entropy can be viewed as an objective function of multi-class classification. Minimizing cross-entropy loss is equivalent to maximizing the log-likelihood function.",\n  "question": "What is cross-entropy commonly used as in multi-class classification?",\n  "answer": "An objective function."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 241, 'completion_tokens': 113, 'total_tokens': 354}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"in

 93%|█████████▎| 14/15 [04:45<00:20, 20.96s/it]

Received response: {'id': 'chatcmpl-8x1cOfMNWwwhyuUf4Ow2icOFvaN0T', 'object': 'chat.completion', 'created': 1709078264, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "In 1948, Claude E. Shannon published A Mathematical Theory of\\nCommunication (Shannon, 1948) establishing the theory of\\ninformation. In his article, Shannon introduced the concept of\\ninformation entropy for the first time. We will begin our journey here.",\n  "question": "What concept did Claude E. Shannon introduce for the first time in his 1948 publication?",\n  "answer": "Information entropy."\n}'}, 'logprobs': None}], 'usage': {'prompt_tokens': 756, 'completion_tokens': 94, 'total_tokens': 850}, 'system_fingerprint': 'fp_8abb16fa4e'}
Making API call with data: {"instruction": "Generate one question and its corresponding answer based on context. Following the 


100%|██████████| 15/15 [05:08<00:00, 20.59s/it]

Received response: {'id': 'chatcmpl-8x1cSFnwh6h3Qqr03YAumFVDrKwuL', 'object': 'chat.completion', 'created': 1709078268, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': '{\n  "context": "22.11. Information Theory\\n22.11.1. Information\\n22.11.1.1. Self-information\\n22.11.2. Entropy\\n22.11.2.1. Motivating Entropy\\n22.11.2.2. Definition\\n22.11.2.3. Interpretations\\n22.11.2.4. Properties of Entropy\\n22.11.3. Mutual Information\\n22.11.3.1. Joint Entropy\\n22.11.3.2. Conditional Entropy\\n22.11.3.3. Mutual Information\\n22.11.3.4. Properties of Mutual Information\\n22.11.3.5. Pointwise Mutual Information\\n22.11.3.6. Applications of Mutual Information\\n22.11.4. Kullback–Leibler Divergence\\n22.11.4.1. Definition\\n22.11.4.2. KL Divergence Properties\\n22.11.4.3. Example\\n22.11.5. Cross-Entropy\\n22.11.5.1. Formal Definition\\n22.11.5.2. Properties\\n22.11.5.3. Cross-Entropy as An Objective Function of Multi-class Class
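In the log above, each response carries its question and answer as a JSON string in the message `content` field, and the response logged after the retry notice appears to be cut off mid-string. The sketch below shows one way such content could be validated with the standard `json` module. It is not how uniflow handles responses internally, and `parse_qa` is a hypothetical helper.

```python
import json


def parse_qa(content):
    """Parse a JSON string into a QA dict, or return None if it is unusable."""
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    # Require the three fields that the prompt asks the model to return.
    if not all(key in parsed for key in ("context", "question", "answer")):
        return None
    return parsed


good = '{"context": "c", "question": "q", "answer": "a"}'
truncated = '{"context": "When we need to disambiguate \\"'
print(parse_qa(good))       # → {'context': 'c', 'question': 'q', 'answer': 'a'}
print(parse_qa(truncated))  # → None
```

Returning `None` instead of raising lets the caller decide whether to retry or skip the item.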




In [15]:
# output

### Format result into pandas table

In [16]:
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

for item in output:
    for i in item['output']:
        for response in i['response']:
            contexts.append(response['context'])
            questions.append(response['question'])
            answers.append(response['answer'])

df = pd.DataFrame({
    'context': contexts,
    'question': questions,
    'answer': answers
})

# Set display options
pd.set_option('display.max_colwidth', None)  # or use a specific width like 50
pd.set_option('display.width', 1000)

styled_df = df.style.set_properties(**{'text-align': 'left'}).set_table_styles([{
    'selector': 'th',
    'props': [('text-align', 'left')]
}])
styled_df

index,context,question,answer
0,"In 1948, Claude E. Shannon published A Mathematical Theory of Communication (Shannon, 1948) establishing the theory of information. In his article, Shannon introduced the concept of information entropy for the first time. We will begin our journey here.",What concept did Claude E. Shannon introduce for the first time in his 1948 article?,The concept of information entropy.
1,"In 1948, Claude E. Shannon published A Mathematical Theory of Communication (Shannon, 1948) establishing the theory of information. In his article, Shannon introduced the concept of information entropy for the first time. We will begin our journey here.",What concept did Claude E. Shannon introduce for the first time in his 1948 article?,Information entropy.
2,"The universe is overflowing with information. Information provides a common language across disciplinary rifts: from Shakespeare’s Sonnet to researchers’ paper on Cornell ArXiv, from Van Gogh’s printing Starry Night to Beethoven’s music Symphony No. 5, from the first programming language Plankalkül to the state-of-the-art machine learning algorithms. Everything must follow the rules of information theory, no matter the format. With information theory, we can measure and compare how much information is present in different signals. In this section, we will investigate the fundamental concepts of information theory and applications of information theory in machine learning.",What is the common language that connects different disciplines according to the context?,Information.
3,"Consider the following thought experiment. We have a friend with a deck of cards. They will shuffle the deck, flip over some cards, and tell us statements about the cards. We will try to assess the information content of each statement.",What is the purpose of the thought experiment with a deck of cards?,To assess the information content of statements about the cards.
4,"If we read through these thought experiments, we see a natural idea. As a starting point, rather than caring about the knowledge, we may build off the idea that information represents the degree of surprise or the abstract possibility of the event. For example, if we want to describe an unusual event, we need a lot information. For a common event, we may not need much information.",What does information represent according to the thought experiments?,Information represents the degree of surprise or the abstract possibility of the event.
5,"The information we gain by observing a random variable does not depend on what we call the elements, or the presence of additional elements which have probability zero. The information we gain by observing two random variables is no more than the sum of the information we gain by observing them separately. If they are independent, then it is exactly the sum. The information gained when observing (nearly) certain events is (nearly) zero.",Does the information gained from observing two independent random variables equal the sum of the information gained from observing each one separately?,Yes.
6,"While proving this fact is beyond the scope of our text, it is important to know that this uniquely determines the form that entropy must take. The only ambiguity that these allow is in the choice of fundamental units, which is most often normalized by making the choice we saw before that the information provided by a single fair coin flip is one bit.",What unit is often used as the fundamental unit of information?,One bit.
7,"If P is a continuous random variable, then the story becomes much more complicated. However, if we additionally impose that P is supported on a finite interval (with all values between 0 and 1), then P has the highest entropy if it is the uniform distribution on that interval.",Under what condition does a continuous random variable have the highest entropy?,A continuous random variable has the highest entropy if it is supported on a finite interval (with all values between 0 and 1) and is the uniform distribution on that interval.
8,"Mutual information is symmetric, i.e., \(I(X, Y) = I(Y, X)\) . Mutual information is non-negative, i.e., \(I(X, Y) \geq 0\) . \(I(X, Y) = 0\) if and only if \(X\) and \(Y\) are independent. For example, if \(X\) and \(Y\) are independent, then knowing \(Y\) does not give any information about \(X\) and vice versa, so their mutual information is zero. Alternatively, if \(X\) is an invertible function of \(Y\) , then \(Y\) and \(X\) share all information and \[I(X, Y) = H(Y) = H(X).\]","What does it mean when the mutual information \(I(X, Y)\) between two variables X and Y is zero?","It means that X and Y are independent, implying that knowing Y does not give any information about X and vice versa."
9,"In this case, mutual information can help us resolve this ambiguity. We first find the group of words that each has a relatively large mutual information with the company Amazon, such as e-commerce, technology, and online. Second, we find another group of words that each has a relatively large mutual information with the Amazon rain forest, such as rain, forest, and tropical. When we need to disambiguate “Amazon”, we can compare which group has more occurrence in the context of the word Amazon. In this case the article would go on to describe the forest, and make the context clear.",What method is used to disambiguate the word 'Amazon'?,Mutual information is used to disambiguate the word 'Amazon'.
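Rows 0 and 1 of the table share the same context and question, and the extraction loop raises a `KeyError` if any response lacks one of the three fields. A more defensive variant is sketched below. It assumes the same nested `output` structure the loop above iterates over, and `flatten_qa` is a hypothetical name.

```python
def flatten_qa(output):
    """Collect QA rows from the nested output.

    Skips incomplete responses and drops rows whose (context, question)
    pair has already been seen.
    """
    rows, seen = [], set()
    for item in output:
        for entry in item.get("output", []):
            for response in entry.get("response", []):
                if not isinstance(response, dict):
                    continue
                if not all(k in response for k in ("context", "question", "answer")):
                    continue
                # Collapse whitespace so "\n" vs " " differences do not hide duplicates.
                context = " ".join(response["context"].split())
                key = (context, response["question"])
                if key in seen:
                    continue
                seen.add(key)
                rows.append(
                    {
                        "context": context,
                        "question": response["question"],
                        "answer": response["answer"],
                    }
                )
    return rows
```

The result is a list of dictionaries, so `pd.DataFrame(flatten_qa(output))` would give a table with the same three columns. Deduplicating on context and question keeps the first answer only, which may or may not be the desired policy.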
