# Sub-Document Summary Metadata Pack

This LlamaPack provides an advanced technique for injecting “sub-document” metadata into each chunk. This form of context augmentation helps both with retrieving relevant context and with synthesizing correct answers.

It goes a step beyond attaching a single whole-document summary to every chunk as metadata. A long document can cover several distinct themes, so we want each chunk to be grounded in context that is broader than the chunk itself but still relevant to it.
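To make the idea concrete, here is a minimal, library-free sketch of sub-document summary metadata. It is not the pack's implementation: the word-based splitter, the `toy_summarize` stand-in for an LLM call, and the `context_summary` metadata key are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def split_words(text, size):
    """Split text into pieces of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_summarize(text, max_words=8):
    """Hypothetical stand-in for an LLM summary: just the first few words."""
    return " ".join(text.split()[:max_words])


def build_chunks(document, parent_size, child_size, summarize=toy_summarize):
    """Attach each parent section's summary to every child chunk cut from it."""
    chunks = []
    for parent in split_words(document, parent_size):
        summary = summarize(parent)  # one summary per parent section
        for child in split_words(parent, child_size):
            # "context_summary" is an illustrative key, not a confirmed pack field
            chunks.append(Chunk(child, {"context_summary": summary}))
    return chunks


# A toy "document" with two themes of 12 words each.
document = ("pretraining " * 12 + "safety " * 12).strip()
chunks = build_chunks(document, parent_size=12, child_size=4)

for c in chunks:
    print(c.metadata["context_summary"].split()[0], "|", c.text)
```

Chunks cut from the same parent section share one summary, while chunks from a different section carry a different one. That is the sense in which the context is "global but relevant".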

Source: https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-subdoc-summary/examples/subdoc-summary.ipynb

Video: https://www.youtube.com/watch?v=m6P1Rp91AzM&t=1s

## Setup Data

In [None]:
!mkdir -p 'data/'
!curl 'https://arxiv.org/pdf/2307.09288.pdf' -o 'data/llama2.pdf'

In [None]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()

## Run the Sub-Document Summary Metadata Pack

In [None]:
%pip install llama-index-packs-subdoc-summary llama-index-llms-openai llama-index-embeddings-openai

In [None]:
from llama_index.packs.subdoc_summary import SubDocSummaryPack
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

subdoc_summary_pack = SubDocSummaryPack(
    documents,
    parent_chunk_size=8192,  # default
    child_chunk_size=512,  # default
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=OpenAIEmbedding(),
)
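The `parent_chunk_size` and `child_chunk_size` arguments above set how many child chunks share one summary. The rough arithmetic below is a sketch under stated assumptions: it ignores chunk overlap and sentence boundaries, assumes each parent chunk is summarized exactly once, and uses a made-up document length of 100,000 tokens. The helper `estimate_counts` is hypothetical and not part of the pack.

```python
import math


def estimate_counts(total_tokens, parent_chunk_size=8192, child_chunk_size=512):
    """Estimate (parent chunks, child chunks) for a document of `total_tokens`.

    Assumes fixed-size splits with no overlap, which a real splitter
    will only approximate.
    """
    full_parents, remainder = divmod(total_tokens, parent_chunk_size)
    parents = full_parents + (1 if remainder else 0)
    children_per_full_parent = math.ceil(parent_chunk_size / child_chunk_size)
    children = full_parents * children_per_full_parent
    children += math.ceil(remainder / child_chunk_size)  # last, partial parent
    return parents, children


# With the defaults, one summary is shared by about 16 child chunks.
print(math.ceil(8192 / 512))     # 16
print(estimate_counts(100_000))  # (13, 196)
```

Under these assumptions, a smaller `parent_chunk_size` gives more specific summaries at the cost of more summarization calls.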

In [None]:
from IPython.display import Markdown, display
from llama_index.core.response.notebook_utils import display_source_node

response = subdoc_summary_pack.run("How was Llama2 pretrained?")
display(Markdown(str(response)))
for n in response.source_nodes:
    display_source_node(n, source_length=10000, metadata_mode="all")

In [None]:
from IPython.display import Markdown, display
from llama_index.core.response.notebook_utils import display_source_node

response = subdoc_summary_pack.run(
    "What is the functionality of the latest ChatGPT memory?"
)
display(Markdown(str(response)))

for n in response.source_nodes:
    display_source_node(n, source_length=10000, metadata_mode="all")