***Original Link:*** https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-subdoc-summary/examples/subdoc-summary.ipynb

### **Sub-Document Summary Metadata Pack**

This LlamaPack injects "sub-document" metadata into each chunk. This context augmentation technique helps both with retrieving relevant context and with synthesizing correct answers.

It goes a step beyond simply attaching a single document-level summary to every chunk. A long document can contain multiple distinct themes, and each chunk should be grounded in context that is global but still relevant to it.
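
To make the idea concrete, here is a minimal, library-free sketch of the technique: split a document into large parent chunks, summarize each parent once, then split each parent into small child chunks that all carry their parent's summary as metadata. This is an illustration, not the pack's actual implementation. The names `Chunk`, `split_words`, `toy_summarize`, and `build_subdoc_chunks` are hypothetical, chunk sizes are counted in words rather than tokens, and the "summarizer" simply keeps the first few words where the real pack would call an LLM. Only the `context_summary` metadata key is taken from the pack's output shown later in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of text plus the metadata attached to it."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_words(text: str, size: int) -> list[str]:
    """Split text into pieces of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


def toy_summarize(text: str, max_words: int = 2) -> str:
    """Hypothetical stand-in for an LLM summarizer: keep the first few words."""
    return " ".join(text.split()[:max_words])


def build_subdoc_chunks(document: str, parent_size: int, child_size: int) -> list[Chunk]:
    """Attach each parent chunk's summary to every child chunk cut from it."""
    chunks = []
    for parent in split_words(document, parent_size):
        summary = toy_summarize(parent)  # one summary per parent chunk
        for child in split_words(parent, child_size):
            chunks.append(Chunk(text=child, metadata={"context_summary": summary}))
    return chunks


# A toy "document" with two themes: six words of "alpha", then six of "beta".
document = "alpha " * 6 + "beta " * 6
chunks = build_subdoc_chunks(document, parent_size=6, child_size=3)
for chunk in chunks:
    print(chunk.metadata["context_summary"], "|", chunk.text)
```

Each child chunk ends up with the summary of its own section rather than one summary of the whole document, which is the property the pack relies on for documents that cover several themes.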

In [1]:
%pip install -q llama-index-packs-subdoc-summary llama-index-llms-openai llama-index-embeddings-openai

In [8]:
!pip install -q llama-index openai

### **Load & Setup Data**

In [3]:
!mkdir -p 'data/'
!curl 'https://arxiv.org/pdf/1406.2661.pdf' -o 'data/Generative Adversarial Nets.pdf'

In [4]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()

### **Run the Sub-Document Summary Metadata Pack**

In [9]:
from llama_index.packs.subdoc_summary import SubDocSummaryPack
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
import os
import openai

# Read the API key from the environment rather than hard-coding a secret in the notebook.
openai.api_key = os.environ["OPENAI_API_KEY"]

subdoc_summary_pack = SubDocSummaryPack(
    documents,
    parent_chunk_size=8192,  # default
    child_chunk_size=512,  # default
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=OpenAIEmbedding(),
)

In [10]:
from IPython.display import Markdown, display
from llama_index.core.response.notebook_utils import display_source_node

#### **Example - 1**

In [12]:
response = subdoc_summary_pack.run("Summarize the GAN paper.")
display(Markdown(str(response)))

The paper discusses Generative Adversarial Networks (GANs), which involve a two-player minimax game between a generator and a discriminator. The training process alternates between optimizing the discriminator and the generator iteratively to prevent overfitting and ensure effective learning. GANs aim to recover the data generating distribution without the need for Markov chains, relying on backpropagation for training gradients. The framework of GANs allows for straightforward extensions such as conditional generative models and learned approximate inference. The advantages of GANs include computational efficiency, statistical advantages from indirect updates through gradients, and the ability to represent sharp distributions compared to methods relying on Markov chains.

In [13]:
for n in response.source_nodes:
    display_source_node(n, source_length=10000, metadata_mode="all")

**Node ID:** b97c78fc-c41b-45e4-b992-785e48f7c61b<br>**Similarity:** 0.8636541927459149<br>**Text:** page_label: 3
file_name: Generative Adversarial Nets.pdf
file_path: data/Generative Adversarial Nets.pdf
file_type: application/pdf
file_size: 530482
creation_date: 2024-02-26
last_modified_date: 2024-02-26
last_accessed_date: 2024-02-26
context_summary: Generative Adversarial Networks (GANs) involve a two-player minimax game between a generator and a discriminator, aiming to recover the data generating distribution. The training process alternates between optimizing the discriminator and the generator iteratively to prevent overfitting and ensure effective learning.

Figure 1: Generative adversarial nets are trained by simultaneously updating the discriminative distribution ($D$, blue, dashed line) so that it discriminates between samples from the data generating distribution (black, dotted line) $p_x$ from those of the generative distribution $p_g$ (G) (green, solid line). The lower horizontal line is the domain from which $z$ is sampled, in this case uniformly. The horizontal line above is part of the domain of $x$. The upward arrows show how the mapping $x = G(z)$ imposes the non-uniform distribution $p_g$ on transformed samples. $G$ contracts in regions of high density and expands in regions of low density of $p_g$. (a) Consider an adversarial pair near convergence: $p_g$ is similar to $p_{\text{data}}$ and $D$ is a partially accurate classifier. (b) In the inner loop of the algorithm $D$ is trained to discriminate samples from data, converging to $D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$. (c) After an update to $G$, gradient of $D$ has guided $G(z)$ to flow to regions that are more likely to be classified as data. (d) After several steps of training, if $G$ and $D$ have enough capacity, they will reach a point at which both cannot improve because $p_g = p_{\text{data}}$. The discriminator is unable to differentiate between the two distributions, i.e. $D(x) = \frac{1}{2}$.

4 Theoretical Results

The generator $G$ implicitly defines a probability distribution $p_g$ as the distribution of the samples $G(z)$ obtained when $z \sim p_z$. Therefore, we would like Algorithm 1 to converge to a good estimator of $p_{\text{data}}$, if given enough capacity and training time. The results of this section are done in a non-parametric setting, e.g. we represent a model with infinite capacity by studying convergence in the space of probability density functions.

We will show in section 4.1 that this minimax game has a global optimum for $p_g = p_{\text{data}}$. We will then show in section 4.2 that Algorithm 1 optimizes Eq 1, thus obtaining the desired result.
<br>

**Node ID:** 12ac3e22-0e62-4f9a-a154-4c504401733e<br>**Similarity:** 0.855367213245717<br>**Text:** page_label: 7
file_name: Generative Adversarial Nets.pdf
file_path: data/Generative Adversarial Nets.pdf
file_type: application/pdf
file_size: 530482
creation_date: 2024-02-26
last_modified_date: 2024-02-26
last_accessed_date: 2024-02-26
context_summary: The context discusses generative adversarial networks (GANs) and their advantages and disadvantages compared to other generative modeling approaches. GANs use a generator and a discriminator to learn the data distribution without needing Markov chains, relying on backpropagation for training gradients.

Table 2 summarizes the comparison of generative adversarial nets with other generative modeling approaches.

The aforementioned advantages are primarily computational. Adversarial models may also gain some statistical advantage from the generator network not being updated directly with data examples, but only with gradients flowing through the discriminator. This means that components of the input are not copied directly into the generator's parameters. Another advantage of adversarial networks is that they can represent very sharp, even degenerate distributions, while methods based on Markov chains require that the distribution be somewhat blurry in order for the chains to be able to mix between modes.

7 Conclusions and future work

This framework admits many straightforward extensions:
1. A conditional generative model $p(x \mid c)$ can be obtained by adding $c$ as input to both $G$ and $D$.
2. Learned approximate inference can be performed by training an auxiliary network to predict $z$ given $x$. This is similar to the inference net trained by the wake-sleep algorithm [15] but with the advantage that the inference net may be trained for a fixed generator net after the generator net has finished training.
<br>
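
The two source nodes above come from different pages and carry different `context_summary` values, which is the sub-document grounding at work. When the full node text is too noisy to scan, it can help to list only that metadata. The sketch below assumes each retrieved item exposes a `.node.metadata` dictionary, as the objects in `response.source_nodes` do in LlamaIndex; `summaries_by_page` is a hypothetical helper, and the stand-in objects and their summary strings are invented so the example runs without an API call.

```python
from types import SimpleNamespace


def summaries_by_page(source_nodes):
    """Map each retrieved node's page_label to its context_summary metadata."""
    result = {}
    for item in source_nodes:
        metadata = item.node.metadata
        result[metadata.get("page_label")] = metadata.get("context_summary")
    return result


# Stand-ins for retrieved nodes (invented values, for illustration only).
fake_nodes = [
    SimpleNamespace(
        node=SimpleNamespace(
            metadata={"page_label": "3", "context_summary": "Minimax game between G and D."}
        ),
        score=0.86,
    ),
    SimpleNamespace(
        node=SimpleNamespace(
            metadata={"page_label": "7", "context_summary": "Advantages and extensions of GANs."}
        ),
        score=0.85,
    ),
]

for page, summary in summaries_by_page(fake_nodes).items():
    print(f"page {page}: {summary}")
```

In the notebook itself, the call would be `summaries_by_page(response.source_nodes)`.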

#### **Example - 2**

In [14]:
response = subdoc_summary_pack.run("What is the discriminator loss and generator loss?")
display(Markdown(str(response)))

The discriminator loss is defined as Ex∼pdata[logD∗G(x)] + Ex∼pg[log(1−D∗G(x))], while the generator loss is Ex∼pdata[log(1−D∗G(x))].

In [15]:
for n in response.source_nodes:
    display_source_node(n, source_length=10000, metadata_mode="all")

**Node ID:** b97c78fc-c41b-45e4-b992-785e48f7c61b<br>**Similarity:** 0.836854163445305<br>**Text:** page_label: 3
file_name: Generative Adversarial Nets.pdf
file_path: data/Generative Adversarial Nets.pdf
file_type: application/pdf
file_size: 530482
creation_date: 2024-02-26
last_modified_date: 2024-02-26
last_accessed_date: 2024-02-26
context_summary: Generative Adversarial Networks (GANs) involve a two-player minimax game between a generator and a discriminator, aiming to recover the data generating distribution. The training process alternates between optimizing the discriminator and the generator iteratively to prevent overfitting and ensure effective learning.

[Node text identical to the first source node in Example - 1: the Figure 1 caption and the opening of Section 4, "Theoretical Results".]
<br>

**Node ID:** 7dc8ad80-2e98-4102-a79e-828e765b3225<br>**Similarity:** 0.8289804466795243<br>**Text:** page_label: 4
file_name: Generative Adversarial Nets.pdf
file_path: data/Generative Adversarial Nets.pdf
file_type: application/pdf
file_size: 530482
creation_date: 2024-02-26
last_modified_date: 2024-02-26
last_accessed_date: 2024-02-26
context_summary: The context discusses the training process of generative adversarial nets using minibatch stochastic gradient descent, where a generator and discriminator are trained simultaneously in a minimax game. The optimal discriminator for a fixed generator is derived, showing that it should output the ratio of the data generating distribution to the sum of the data generating and noise distributions.

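The discriminator does not need to be defined outside of $\text{Supp}(p_{\text{data}}) \cup \text{Supp}(p_g)$, concluding the proof.

Note that the training objective for $D$ can be interpreted as maximizing the log-likelihood for estimating the conditional probability $P(Y = y \mid x)$, where $Y$ indicates whether $x$ comes from $p_{\text{data}}$ (with $y = 1$) or from $p_g$ (with $y = 0$). The minimax game in Eq. 1 can now be reformulated as:

$$
\begin{aligned}
C(G) &= \max_D V(G, D) \\
&= \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D^*_G(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D^*_G(G(z))\right)\right] \qquad (4) \\
&= \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D^*_G(x)\right] + \mathbb{E}_{x \sim p_g}\left[\log\left(1 - D^*_G(x)\right)\right] \\
&= \mathbb{E}_{x \sim p_{\text{data}}}\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right]
\end{aligned}
$$
<br>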
