# Summarisation

In [1]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.
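
The warning above appears to be emitted when the loader module is imported without a `USER_AGENT` environment variable. A minimal sketch of one way to silence it, assuming the cell runs before `WebBaseLoader` is imported; the value is a placeholder, not something taken from this notebook:

```python
import os

# Placeholder value (an assumption): replace it with a string that identifies your project.
# setdefault leaves any USER_AGENT already present in the environment untouched.
os.environ.setdefault("USER_AGENT", "summarisation-notebook/0.1")
```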


In [3]:
len(docs)

1

In [4]:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)
print(f"Generated {len(split_docs)} documents.")

Created a chunk of size 1003, which is longer than the specified 1000


Generated 14 documents.


In [6]:
from typing import TypedDict

class Prompt(TypedDict):
    """A single chat message in the shape the OpenAI chat API expects."""
    role: str
    content: str

In [None]:
def summary_prompt(text: str) -> Prompt:
    return {"role": "system", "content": f"Generate a concise summary of the following text:\n{text}"}

In [35]:
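def reduce_prompt(summaries: list[str]) -> Prompt:
    # Join first: backslashes inside f-string expressions are a SyntaxError before Python 3.12.
    joined = "\n".join(summaries)
    return {"role": "system", "content": f"Reduce the following list of summaries into a single cohesive summary:\n\n{joined}"}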

In [11]:
from openai import OpenAI

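class LLM:
    def __init__(self):
        # OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
        self.client = OpenAI()

    def __call__(self, prompt: Prompt) -> tuple[str, int]:
        """Send one message and return the reply text with its completion token count."""
        completion = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[prompt],
        )
        message = completion.choices[0].message.content
        # Only the generated tokens are counted; prompt tokens are ignored.
        n_tokens = completion.usage.completion_tokens
        return message, n_tokens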
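
Every call to `LLM` costs an API request, so it can help to dry-run the rest of the notebook against an offline stand-in first. The sketch below is hypothetical: `FakeLLM` and `keep_ratio` are illustrative names, word counts are only a crude proxy for tokens, and the "summary" is just the first quarter of the input.

```python
class FakeLLM:
    """Offline stand-in with the same call shape: message dict in, (text, token count) out."""

    def __init__(self, keep_ratio: float = 0.25):
        self.keep_ratio = keep_ratio  # fraction of the input words to keep
        self.calls = 0                # number of simulated requests

    def __call__(self, prompt: dict[str, str]) -> tuple[str, int]:
        self.calls += 1
        # Drop the instruction line, then keep the first fraction of the remaining words.
        _, _, body = prompt["content"].partition("\n")
        words = body.split()
        kept = words[: max(1, int(len(words) * self.keep_ratio))]
        return " ".join(kept), len(kept)


fake = FakeLLM()
message = {
    "role": "system",
    "content": "Generate a concise summary of the following text:\n" + "alpha beta gamma delta " * 10,
}
text, n_tokens = fake(message)
print(text, n_tokens)
```

Swapping `llm = FakeLLM()` in for `llm = LLM()` lets the control flow be checked without network access.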

In [15]:
split_docs[0].page_content

"LLM Powered Autonomous Agents | Lil'Log\n\nLil'Log\n\n|\n\n\nPosts\n\n\nArchive\n\n\nSearch\n\n\nTags\n\n\nFAQ\n\n\nemojisearch.app\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)\n\n\nComponent Three: Tool Use\n\nCase Studies\n\nScientific Discovery Agent\n\nGenerative Agents Simulation\n\nProof-of-Concept Examples\n\n\nChallenges\n\nCitation\n\nReferences\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\

In [None]:
LLM()(summary_prompt(split_docs[0].page_content))

('The article "LLM Powered Autonomous Agents" by Lilian Weng discusses the concept of building autonomous agents that utilize large language models (LLMs) as their core controllers. It highlights various proof-of-concept demonstrations such as AutoGPT, GPT-Engineer, and BabyAGI, showcasing the potential of LLMs as effective problem solvers beyond mere text generation.\n\nThe article is structured into three main components of LLM-powered agent systems:\n\n1. **Planning**: Agents can break down complex tasks into manageable subgoals and engage in self-reflection to learn from past actions and enhance future performance.\n\n2. **Memory**: The agents have short-term memory for in-context learning and long-term memory to retain and recall information over extended periods, typically using an external vector store for efficient retrieval.\n\n3. **Tool Use**: Agents adapt to call external APIs to acquire missing information from their models, thus extending their capabilities with current da

In [29]:
import math
from tqdm import tqdm

llm = LLM()
token_max = 1000  # target size, in completion tokens, for the combined summaries

def make_summaries(docs: list[str]) -> tuple[list[str], list[int]]:
    """Summarise each text and return the summaries with their token counts."""
    summaries = []
    lengths = []
    for doc in tqdm(docs):
        summary, length = llm(summary_prompt(doc))
        summaries.append(summary)
        lengths.append(length)
    return summaries, lengths

length = math.inf
summaries = [d.page_content for d in split_docs]
count = 0
# Re-summarise until the summaries fit in token_max, for at most 3 rounds.
while length > token_max and count < 3:
    new_summaries, lengths = make_summaries(summaries)
    length = sum(lengths)
    if length <= token_max:
        summaries = new_summaries
    else:
        # Still too long: split each summary into roughly half-size pieces
        # (with some overlap) and summarise those pieces in the next round.
        summaries = []
        for n_tokens, summary in zip(lengths, new_summaries):
            splitter = CharacterTextSplitter.from_tiktoken_encoder(
                chunk_size=n_tokens // 2, chunk_overlap=n_tokens // 8
            )
            summaries += splitter.split_text(summary)
    print(f"{count}: {len(summaries)} summaries, total length: {length}")
    count += 1

100%|██████████| 14/14 [00:51<00:00,  3.68s/it]
Created a chunk of size 92, which is longer than the specified 90


0: 23 summaries, total length: 2740


100%|██████████| 23/23 [01:02<00:00,  2.73s/it]
Created a chunk of size 94, which is longer than the specified 56
Created a chunk of size 51, which is longer than the specified 42


1: 27 summaries, total length: 2553


100%|██████████| 27/27 [00:46<00:00,  1.74s/it]


2: 27 summaries, total length: 2423


 37%|███▋      | 10/27 [00:14<00:24,  1.45s/it]


KeyboardInterrupt: 
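
The total length above drops only from 2740 to 2553 to 2423 tokens over three rounds, presumably because splitting each summary produces more pieces to summarise rather than fewer. A different strategy, not the one used in this notebook, is to merge neighbouring summaries into batches that fit the token budget and summarise each batch. The sketch below is a minimal offline version of that idea: `collapse`, `fake_count` and `fake_summarise` are hypothetical names, and the stand-in functions would need to be replaced by a real token counter and a real model call.

```python
from typing import Callable


def collapse(
    texts: list[str],
    summarise: Callable[[str], str],
    count_tokens: Callable[[str], int],
    token_max: int = 1000,
    max_rounds: int = 5,
) -> list[str]:
    """Batch neighbouring texts up to token_max, summarise each batch, and repeat until all fit."""
    for _ in range(max_rounds):
        if sum(count_tokens(t) for t in texts) <= token_max:
            break
        batches, current, used = [], [], 0
        for t in texts:
            n = count_tokens(t)
            # Start a new batch when adding this text would exceed the budget.
            if current and used + n > token_max:
                batches.append("\n\n".join(current))
                current, used = [], 0
            current.append(t)
            used += n
        if current:
            batches.append("\n\n".join(current))
        texts = [summarise(b) for b in batches]
    return texts


# Stand-ins (assumptions): word count as a token proxy, truncation as a "summary".
def fake_count(text: str) -> int:
    return len(text.split())


def fake_summarise(text: str) -> str:
    words = text.split()
    return " ".join(words[: max(1, len(words) // 4)])


chunks = [" ".join(["word"] * 300) for _ in range(14)]
result = collapse(chunks, fake_summarise, fake_count, token_max=1000)
print(len(result), sum(fake_count(t) for t in result))
```

Because each round produces fewer batches than inputs, the number of texts shrinks as long as the summariser actually shortens its input; `max_rounds` caps the work if it does not.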

In [30]:
summaries

['The text discusses the development of autonomous agents that function with large language models (LLMs) as their primary control system. It highlights the capabilities of LLMs, including planning, memory utilization, and tool use, and details the crucial components that constitute a system driven by these models, emphasizing their importance in the operation of autonomous agents.',
 'The text discusses three main functions of agents, highlighting their roles and responsibilities. However, specific details about these functions are not provided in the given summary.',
 'Agents enhance their performance through planning by dividing large tasks into manageable goals and reflecting on past actions. They utilize both short-term and long-term memory for learning and information retrieval, often relying on external storage to improve recall. Additionally, agents increase their capabilities by accessing external APIs for current information and executing code, which supplements their existin

In [32]:
lengths, sum(lengths)

([68,
  31,
  67,
  23,
  51,
  42,
  123,
  100,
  79,
  85,
  189,
  143,
  164,
  61,
  98,
  92,
  153,
  89,
  101,
  84,
  101,
  24,
  55,
  22,
  106,
  144,
  128],
 2423)

In [38]:
llm(reduce_prompt(summaries))

("The text explores the development and functionalities of autonomous agents powered by large language models (LLMs), outlining their capabilities in reasoning, planning, and integration with memory and actions. Key components of these agents include planning techniques such as Chain of Thought and Tree of Thoughts for task decomposition, and self-reflection methods like ReAct and Reflexion to enhance performance through learning from past actions. The text discusses various algorithms to improve LLM capabilities, including Chain of Hindsight and Algorithm Distillation, as well as the importance of external tools and APIs to enhance their functionality.\n\nSeveral proof-of-concept demonstrations, such as AutoGPT and BabyAGI, showcase LLMs' abilities to address complex problems, particularly in fields like organic synthesis and cancer drug discovery. The text further analyzes memory types and their relevance in AI learning processes, emphasizing the trade-offs between speed and accuracy