**Note:** This notebook uses OpenAI's models (for chat completion and embeddings), which require a paid OpenAI account. You can replace these models with open-source alternatives. For instance, the [Instructor package](https://python.useinstructor.com/) supports various models. Similarly, [LangChain](https://python.langchain.com/v0.1/docs/integrations/llms/) also supports other models. If you're open to spending a bit on OpenAI's models, here’s the [pricing](https://openai.com/api/pricing/). Learning their API can also be a valuable experience.

In [6]:
import json

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Structured output

## Regular output

In [9]:
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Describe Spider man"},
    ],
)

print(response.choices[0].message.content)

Spider-Man is a fictional superhero created by writer Stan Lee and artist Steve Ditko for Marvel Comics. His real name is Peter Parker, a high school student who gains superhuman abilities after being bitten by a radioactive spider. 

Spider-Man is known for his red and blue costume with a web design, as well as his ability to crawl on walls, shoot webs from his wrists, and have a "spider-sense" that warns him of danger. He is also known for his wit and sense of humor, often cracking jokes during battles with villains.

Peter Parker juggles his responsibilities as Spider-Man with his personal life, often struggling with the balance between his superhero duties and his relationships. He is often seen as a symbol of empowerment and responsibility, as he uses his powers to protect New York City from various threats while also dealing with his own internal struggles and insecurities.


It's impossible to have a reliable automation based on LLM's output if it's unstructured. Pydantic data classes to the rescue.

For more information, [OpenAI's API reference for chat completion](https://platform.openai.com/docs/api-reference/chat/create).

## Pydantic data model

In [10]:
class Character(BaseModel):
    """Details of a character"""

    name: str
    real_name: str
    age: int


schema = Character.model_json_schema()
print(json.dumps(schema, indent=4))

{
    "description": "Details of a character",
    "properties": {
        "name": {
            "title": "Name",
            "type": "string"
        },
        "real_name": {
            "title": "Real Name",
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        }
    },
    "required": [
        "name",
        "real_name",
        "age"
    ],
    "title": "Character",
    "type": "object"
}


About [Pydantic data model](https://docs.pydantic.dev/latest/concepts/models/).

## Structured output

In [11]:
client = instructor.patch(OpenAI())

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Describe Spider man"},
    ],
    response_model=Character,
)

print(response.model_dump_json(indent=4))

{
    "name": "Spider Man",
    "real_name": "Peter Parker",
    "age": 24
}


Learn more about structured output with LLMs: https://github.com/wandb/edu/tree/main/llm-structured-extraction

# RAG

Based on https://aiplanet.com/learn/rag-agents-bootcamp/module-3/2439/notebook-simple-rag-using-langchain

## Load and split the PDF document

In [None]:
!pip -q install langchain langchain_community pypdf fastembed chromadb

In [12]:
from langchain_community.document_loaders.pdf import PyPDFLoader

# Book: https://www.manning.com/books/build-a-large-language-model-from-scratch
loader = PyPDFLoader("uu/Build_a_Large_Language_Model_(From_Scrat.pdf")
data = loader.load()
len(data)

370

I.e., 370 pages of a PDF were read. Let's check one of the page 17:

In [13]:
print(data[16].page_content)

xvabout this book
Build a Large Language Model (From Scratch)  was written to help you understand and
c r e at e  y o ur  o w n G P T- l ik e  l ar g e  la ng u ag e  m o d e ls  ( L L M s )  f r o m  t h e  gr o u nd  u p.  I tbegins by focusing on the fu ndamentals of working with text data and coding atten-
tion mechanisms and then guides you through implementing a complete GPT
model from scratch. The book then cove rs the pretraining mechanism as well as
fine-tuning for specific tasks such as text classification and following instructions. By
the end of this book, you’ll have a deep understanding of how LLMs work and the
skills to build your own models. While the models you’ll create are smaller in scalecompared to the large founda tional models, they use the same concepts and serve
as powerful educational tools to grasp th e core mechanisms and techniques used in
building state-of-the-art LLMs.
Who should read this book
Build a Large Language  Model (From Scratch)  is for machine 

PDF pages are too large and need to be split into chunks:

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
chunk_size = 512,
chunk_overlap = 50,
)

chunks = text_splitter.split_documents(data)
len(chunks)

1717

In [15]:
print(chunks[100].page_content)

put text. In a translation task, for example, the encoder would encode the text fromthe source language into vectors, and the decoder would decode these vectors to gen-
erate text in the target language. Both th e encoder and decoder consist of many layers
connected by a so-called self-attention mechanism. You may have many questionsregarding how the inputs are preprocessed an d encoded. These will be addressed in a
step-by-step implementation in subsequent chapters.


## Embeddings and a vector database

The next step is to embed all of these chunks and add them to a vector database

In [16]:
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()

Text run for a single chunk:

In [17]:
embedding = embedding_model.embed_documents(chunks[37].page_content)
len(embedding[0])

1536

This is the size of the embedding. Read more [here](https://openai.com/index/new-embedding-models-and-api-updates/). Let's see the first 10 values:

In [18]:
embedding[0][:10]

[-0.013922779820859432,
 -0.014787135645747185,
 0.002779236761853099,
 -0.030584903433918953,
 -0.03484019264578819,
 0.010199400596320629,
 -0.0164626557379961,
 -0.01055844034999609,
 -0.004241993185132742,
 -0.03172851353883743]

Create a vector database, embed all chunks and add them to the vector database:

In [19]:
from langchain.vectorstores import Chroma
db = Chroma.from_documents(chunks, embedding_model, persist_directory="./uu/chroma_db")

## Test run: retrieve relevant documents

Define User Query:



In [20]:
query = "who should read the LLM from scratch book?"

Let's retrieve 4 most similar documents:

In [21]:
docs = db.similarity_search(query, k=4)
for i, doc in enumerate(docs):
    print(f"Document {i+1}:")
    print(f"Page {doc.metadata['page']}")
    print(doc.page_content)
    print()

Document 1:
Page 16
Who should read this book
Build a Large Language  Model (From Scratch)  is for machine learning enthusiasts, engi-
neers, researchers, students , and practitioners who want to gain a deep understand-
ing of how LLMs work and learn to buil d their own models from scratch. Both
beginners and experienced developers will be able to use their existing skills and
knowledge to grasp the co ncepts and techniques used in creating LLMs.

Document 2:
Page 16
Who should read this book
Build a Large Language  Model (From Scratch)  is for machine learning enthusiasts, engi-
neers, researchers, students , and practitioners who want to gain a deep understand-
ing of how LLMs work and learn to buil d their own models from scratch. Both
beginners and experienced developers will be able to use their existing skills and
knowledge to grasp the co ncepts and techniques used in creating LLMs.

Document 3:
Page 16
Who should read this book
Build a Large Language  Model (From Scratch)  is f

## Create a simple RAG solution

Initialize the retriever

In [22]:
retriever = db.as_retriever(
    search_type="mmr", # Maximum Marginal Relevance, e.g., https://docs.llamaindex.ai/en/stable/examples/vector_stores/SimpleIndexDemoMMR/
    search_kwargs={'k': 4}
)

Define LLM and Chain Components:



In [23]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [25]:
template = """
<s>
[INST]
You are an AI Assistant that follows instructions extremely well.
Please be truthful and give direct answers. Please tell 'I don't know' if user query is not in CONTEXT
[/INST]

CONTEXT: {context}
</s>

[INST]
{query}
[/INST]
"""

In [26]:
llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = ChatPromptTemplate.from_template(template)
output_parser = StrOutputParser()

In [27]:
chain = (
{"context": retriever, "query": RunnablePassthrough()}
| prompt
| llm
| output_parser
)

In [28]:
response = chain.invoke(query)
print(response)

The book "Build a Large Language Model (From Scratch)" is intended for machine learning enthusiasts, engineers, researchers, students, and practitioners who want to gain a deep understanding of how LLMs work and learn to build their own models from scratch. Both beginners and experienced developers will be able to use their existing skills and knowledge to grasp the concepts and techniques used in creating LLMs.


In [33]:
print(chain.invoke("Which page talks about the architecture of GPT and what does it say?"))

The architecture of GPT is discussed on page 30 of the document. It states that the GPT architecture is designed for tasks that require generating texts, including machine translation, text summarization, fiction writing, and writing computer code. It also mentions that GPT models are primarily designed and trained to perform text completion tasks and show remarkable versatility in their capabilities, being adept at executing both zero-shot and few-shot learning tasks.
