### Notebook On Understanding How A Large Language Model Works

#### 1. Text is converted into tokens

In [1]:
text = 'Data Engineering is about preparing data for analytical workload.'

In [4]:
tokens = text.split(' ')

In [5]:
tokens

['Data',
 'Engineering',
 'is',
 'about',
 'preparing',
 'data',
 'for',
 'analytical',
 'workload.']
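
Splitting on spaces is a simplification: production models typically use subword tokenizers, and each token is then mapped to an integer ID before anything else happens. The sketch below shows only that ID-mapping idea on the same sentence. The `vocab` dictionary and the "order of first appearance" rule are assumptions made for illustration, not how any particular tokenizer assigns IDs.

```python
text = "Data Engineering is about preparing data for analytical workload."
tokens = text.split(" ")

# Toy vocabulary: each distinct token gets the next free integer ID,
# in order of first appearance (an illustrative rule, not a real tokenizer's).
vocab = {}
for token in tokens:
    if token not in vocab:
        vocab[token] = len(vocab)

# The model never sees strings, only these integer IDs.
token_ids = [vocab[token] for token in tokens]
print(token_ids)
```

Note that `'Data'` and `'data'` receive different IDs here because the split is case-sensitive.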

#### 2. Tokens are mapped to embeddings (vectors)

In [7]:
import math

car = [0.9, 0.1]
truck = [0.85, 0.15]
banana = [-0.8, 0.2]

def cosine_similarity(v1, v2):
    dot = sum(a*b for a,b in zip(v1,v2))
    mag1 = math.sqrt(sum(a*a for a in v1))
    mag2 = math.sqrt(sum(b*b for b in v2))
    return dot / (mag1 * mag2)

print("car vs truck:", cosine_similarity(car, truck))
print("car vs banana:", cosine_similarity(car, banana))


car vs truck: 0.9979517409161514
car vs banana: -0.9374252720097651
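
As a check on the numbers above, the cosine similarity formula can be worked by hand for the two pairs, using the same toy vectors:

```latex
\cos\theta = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}

% car vs truck
\frac{(0.9)(0.85) + (0.1)(0.15)}{\sqrt{0.9^2 + 0.1^2}\,\sqrt{0.85^2 + 0.15^2}}
  = \frac{0.78}{\sqrt{0.82}\,\sqrt{0.745}} \approx 0.998

% car vs banana
\frac{(0.9)(-0.8) + (0.1)(0.2)}{\sqrt{0.9^2 + 0.1^2}\,\sqrt{(-0.8)^2 + 0.2^2}}
  = \frac{-0.70}{\sqrt{0.82}\,\sqrt{0.68}} \approx -0.937
```

A value near 1 means the vectors point the same way (car and truck); a value near -1 means they point in nearly opposite directions (car and banana).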


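#### 3. The transformer processes context using attention

The same word can mean different things; attention lets the model use the surrounding words to decide which meaning applies.

* "I deposited money in the bank." -> bank = financial institution (context: deposited, money)
* "I sat near the river bank." -> bank = natural landscape (context: river)

#### Demo On Attention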

In [9]:
%pip install numpy

import numpy as np

# word embeddings (fake)
words = ["I", "love", "data"]
embeddings = np.array([
    [1, 0, 1],   # I
    [1, 1, 0],   # love
    [0, 1, 1]    # data
])

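# Query, Key, Value = same (self-attention); a real transformer derives each
# from the embeddings through its own learned projection matrix
Q = embeddings
K = embeddings
V = embeddings

# Attention scores (raw dot products)
scores = Q @ K.T

# Softmax, applied row by row; subtracting each row's max keeps exp() stable
def softmax(x):
    e = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)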

weights = softmax(scores)

# Output
output = weights @ V

print("Attention Weights:\n", weights)
print("\nFinal Representations:\n", output)


Note: you may need to restart the kernel to use updated packages.
Attention Weights:
 [[0.57611688 0.21194156 0.21194156]
 [0.21194156 0.57611688 0.21194156]
 [0.21194156 0.21194156 0.57611688]]

Final Representations:
 [[0.78805844 0.42388312 0.78805844]
 [0.78805844 0.78805844 0.42388312]
 [0.42388312 0.78805844 0.78805844]]
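
The demo uses raw dot products as scores. The original Transformer formulation divides the scores by the square root of the key dimension before the softmax, which keeps the weights from becoming too peaked as vectors get longer. Here is a minimal pure-Python sketch of that scaled variant on the same fake embeddings; the function name `scaled_dot_product_attention` is chosen for illustration, and the learned projection matrices of a real model are still left out.

```python
import math


def softmax(row):
    """Softmax over one list of scores (max subtracted for stability)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (weights, output) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is the weighted average of the value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return weights, output


embeddings = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]  # I, love, data (fake)
weights, output = scaled_dot_product_attention(embeddings, embeddings, embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

With scaling, each word still attends most to itself, but the weight on the diagonal drops from about 0.58 to about 0.47, so more of each output comes from the other words.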


#### 4. The model predicts the next token

In [10]:
tokens

['Data',
 'Engineering',
 'is',
 'about',
 'preparing',
 'data',
 'for',
 'analytical',
 'workload.']

In [12]:
'Data engineering is'

'Data engineering is'

In [13]:
import random

next_token_probs = {
    "Data engineering is": [
        ("important", 0.45),
        ("challenging", 0.30),
        ("fun", 0.15),
        ("boring", 0.10)
    ]
}

def predict_next(text):
    words, probs = zip(*next_token_probs[text])
    return random.choices(words, probs)[0]

for _ in range(5):
    print("Data engineering is", predict_next("Data engineering is"))


Data engineering is fun
Data engineering is important
Data engineering is boring
Data engineering is important
Data engineering is important
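
How varied the sampled words are is usually controlled by a temperature setting (the Bedrock call later in this notebook passes `temperature=0.5`). One common way to describe it: each probability is raised to the power `1/T` and the result is renormalized, which is equivalent to dividing the logits by `T` before the softmax. The helper `apply_temperature` below is a hypothetical name for a sketch of that idea on the toy distribution above; it does not reproduce any specific model's sampler.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a distribution: p_i ** (1 / T), then renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


probs = [0.45, 0.30, 0.15, 0.10]  # important, challenging, fun, boring

sharp = apply_temperature(probs, 0.5)  # low T: favors the top word more
flat = apply_temperature(probs, 2.0)   # high T: spreads probability out

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

Lower temperatures make "important" win more often; higher temperatures give "fun" and "boring" a better chance.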


#### Amazon Bedrock Architecture

Client (App / Notebook / Lambda)
        |
        v
AWS SDK / API (InvokeModel)
        |
        v
----------------------------
| Amazon Bedrock Service   |
|--------------------------|
| Control Plane            |
| Runtime Plane            |
----------------------------
        |
        v
Foundation Models
(Claude, Llama, Titan, etc.)
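
In boto3 the two planes in the diagram show up as two separate clients: `bedrock` for control-plane operations such as listing models, and `bedrock-runtime` for inference calls such as `InvokeModel`. Below is a small sketch of a control-plane call. The helper name `list_model_ids` is hypothetical, and the client is passed in as a parameter so the function itself needs no credentials to define.

```python
def list_model_ids(bedrock_client, provider=None):
    """Return the model IDs visible to a boto3 'bedrock' (control plane) client."""
    # byProvider is an optional filter on ListFoundationModels.
    kwargs = {"byProvider": provider} if provider else {}
    response = bedrock_client.list_foundation_models(**kwargs)
    return [summary["modelId"] for summary in response["modelSummaries"]]


# Usage (requires AWS credentials and Bedrock access in the chosen Region):
#   import boto3
#   control = boto3.client("bedrock", region_name="us-west-2")
#   print(list_model_ids(control, provider="Meta"))
```

The model IDs returned here are the values passed as `modelId` to the runtime client.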


#### Basics Of Prompt Engineering

Strategies

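* Act like someone -> assign a persona
* Better or more detailed context -> tell the model the audience, the background, and the goal
* Automated incorporation of context -> fetch relevant text programmatically and insert it into the prompt
* Giving examples -> few-shot prompting
* Tone -> state the voice you want (for example, formal or friendly)
* Evaluate the result before returning it -> ask the model to check its answer first, which gives it more time to think before it responds (helps reduce hallucination)
* Chain-of-thought prompting -> ask the model to reason step by step
* Constraints -> limit the length, format, or scope of the answer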
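
To make the few-shot strategy concrete, here is a minimal sketch that places worked examples ahead of the real question. The `Input:` / `Output:` labels and the helper name `build_few_shot_prompt` are assumptions for illustration; any consistent layout can serve the same purpose.

```python
def build_few_shot_prompt(examples, question):
    """Format (input, output) example pairs followed by the real question."""
    lines = []
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")  # blank line between examples
    # End with an open "Output:" so the model continues the pattern.
    lines.append(f"Input: {question}")
    lines.append("Output:")
    return "\n".join(lines)


examples = [
    ("ETL", "Extract, Transform, Load"),
    ("ELT", "Extract, Load, Transform"),
]
few_shot_prompt = build_few_shot_prompt(examples, "CDC")
print(few_shot_prompt)
```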

What makes a good prompt?


Instruction + Context + Constraints

In [None]:
prompt = 'Explain ETL to a beginner' # Instruction + context

In [None]:
prompt = 'Explain ETL to a beginner in fewer than 30 words' # Instruction + context + constraints
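
The three parts can also be kept separate and assembled in code, which makes it easier to change one part without touching the others. This is only a sketch: `build_prompt` and the `Context:` / `Constraints:` labels are hypothetical choices, not a required format.

```python
def build_prompt(instruction, context="", constraints=()):
    """Join instruction, context, and constraints into one prompt string."""
    parts = [instruction.strip()]
    if context:
        parts.append(f"Context: {context.strip()}")
    if constraints:
        parts.append("Constraints: " + "; ".join(constraints))
    return "\n".join(parts)


structured_prompt = build_prompt(
    "Explain ETL.",
    context="The reader is a beginner.",
    constraints=["Use fewer than 30 words"],
)
print(structured_prompt)
```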

In [7]:

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-west-2")


def invoke_llama_model(prompt, temperature=0.5, max_tokens=512):
    """Send a prompt to Llama 3 70B Instruct on Bedrock and return the generated text."""

    # Set the model ID, e.g., Llama 3 70b Instruct.
    model_id = "meta.llama3-70b-instruct-v1:0"

    # Embed the prompt in Llama 3's instruction format.
    # Built line by line so the function's indentation does not leak into the prompt.
    formatted_prompt = (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
        f"{prompt}\n"
        "<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n"
    )

    # Format the request payload using the model's native structure.
    native_request = {
        "prompt": formatted_prompt,
        "max_gen_len": max_tokens,
        "temperature": temperature,
    }

    # Convert the native request to JSON.
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
    except ClientError as e:
        # Report the failure and re-raise; calling exit() would stop the notebook kernel.
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        raise

    # Decode the response body.
    model_response = json.loads(response["body"].read())

    # Extract and return the response text.
    response_text = model_response["generation"]

    return response_text
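
The Bedrock Runtime client also exposes a Converse API, which takes a list of messages instead of a model-specific prompt template. The sketch below shows what an equivalent call might look like. The helper name `invoke_with_converse` is hypothetical, the client is passed in as a parameter, and whether a given model is available through Converse in your Region is something to confirm in the AWS documentation.

```python
def invoke_with_converse(client, model_id, prompt, temperature=0.5, max_tokens=512):
    """Send one user message through the Converse API and return the reply text."""
    response = client.converse(
        modelId=model_id,
        # One user turn; content is a list of blocks, here a single text block.
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": temperature},
    )
    # The reply is the first text block of the assistant message.
    return response["output"]["message"]["content"][0]["text"]


# Usage (requires AWS credentials and model access):
#   import boto3
#   runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
#   print(invoke_with_converse(runtime, "meta.llama3-70b-instruct-v1:0", "Explain ETL."))
```

Because the message structure is the same across models, switching models becomes a matter of changing `model_id` rather than rewriting the prompt template.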


In [2]:

# Define the prompt for the model.
prompt = "Explain ETL in simple terms for a junior data engineer with Python knowledge."

In [3]:
output_text = invoke_llama_model(prompt)

In [5]:
print(output_text)

 As a junior data engineer with Python knowledge, you're already halfway there! ETL is a fundamental concept in data engineering, and I'm happy to explain it in simple terms.

**What is ETL?**

ETL stands for Extract, Transform, Load. It's a process used to move data from multiple sources to a centralized location, such as a database or data warehouse, in a standardized format. Think of it like a pipeline that helps you collect, clean, and prepare data for analysis or other uses.

**The Three Steps of ETL:**

1. **Extract**: This is the first step, where you "extract" or collect data from various sources, such as:
	* Databases (e.g., MySQL, PostgreSQL)
	* Files (e.g., CSV, JSON, Excel)
	* APIs (e.g., Twitter, Facebook)
	* Other systems (e.g., CRM, ERP)

You'll use Python libraries like `pandas`, `sqlalchemy`, or `requests` to connect to these sources and retrieve the data.

2. **Transform**: In this step, you "transform" the extracted data into a standardized format, making it consiste

#### Introduction To Embeddings

In [1]:
import boto3
import json
import numpy as np
from numpy.linalg import norm


client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)


def generate_embedding(input_text):
    
    # Set the model ID, e.g., Titan Text Embeddings V2.
    model_id = "amazon.titan-embed-text-v2:0"

    # Create the request for the model.
    native_request = {"inputText": input_text}

    # Convert the native request to JSON.
    request = json.dumps(native_request)

    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)

    # Decode the model's native response body.
    model_response = json.loads(response["body"].read())

    # Extract the generated embedding (a plain list of floats).
    # model_response["inputTextTokenCount"] holds the input token count if needed.
    embedding = model_response["embedding"]

    return embedding

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (norm(vec1) * norm(vec2))




In [2]:
documents = [
    "AWS Bedrock provides foundation models for GenAI",
    "Amazon Bedrock is used to build AI-powered applications",
    "S3 is an object storage service",
    "EC2 provides scalable virtual servers",
    "I love playing football on weekends"
]


In [3]:
query = "How can I build generative AI apps on AWS?"

In [4]:
embeddings_1st_document = generate_embedding(input_text=documents[0])

In [6]:
len(embeddings_1st_document)

1024

In [7]:
embeddings_1st_document

[-0.07949460297822952,
 0.030731264501810074,
 -0.02727186307311058,
 -0.02470552548766136,
 -0.045918285846710205,
 0.03651558235287666,
 -0.0007422379567287862,
 -0.06103047356009483,
 -0.004920692648738623,
 -0.03857846558094025,
 -0.014252245426177979,
 0.022503286600112915,
 -0.02800152823328972,
 0.004751893226057291,
 -0.03129654750227928,
 -0.023151861503720284,
 0.011659901589155197,
 -0.03913196921348572,
 0.051816198974847794,
 0.06175302341580391,
 0.014440671540796757,
 0.054469142109155655,
 0.017102694138884544,
 0.011674132198095322,
 0.006063031032681465,
 -0.02420109324157238,
 -0.036562688648700714,
 -0.043070290237665176,
 -0.0053103044629096985,
 -0.006639107596129179,
 0.008288824930787086,
 0.03486979007720947,
 -0.049595557153224945,
 0.01225364301353693,
 0.05425912141799927,
 -0.06444203108549118,
 0.14650984108448029,
 -0.02944527566432953,
 -0.06131630390882492,
 0.03967762365937233,
 0.021629849448800087,
 0.054367076605558395,
 -0.0013997697969898582,
 0.0

In [8]:
doc_embeddings = [generate_embedding(doc) for doc in documents]
query_embedding = generate_embedding(query)


In [10]:
len(doc_embeddings)

5

In [11]:
results = []

for doc, emb in zip(documents, doc_embeddings):
    score = cosine_similarity(query_embedding, emb)
    results.append((doc, score))

results = sorted(results, key=lambda x: x[1], reverse=True)

for doc, score in results:
    print(f"{score:.4f} → {doc}")


0.5258 → Amazon Bedrock is used to build AI-powered applications
0.5071 → AWS Bedrock provides foundation models for GenAI
0.2210 → EC2 provides scalable virtual servers
0.1815 → S3 is an object storage service
0.0373 → I love playing football on weekends
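
This ranking is the retrieval half of the "automated incorporation of context" strategy listed earlier: the best-scoring documents can be pasted into the prompt before it is sent to the model. Here is a minimal sketch with toy 2-D vectors standing in for the Titan embeddings. The names `top_k` and `build_rag_prompt`, and the prompt wording, are assumptions for illustration.

```python
import math


def cosine_similarity(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def top_k(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    scored = sorted(
        zip(docs, doc_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:k]]


def build_rag_prompt(question, context_docs):
    """Place the retrieved documents ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy stand-ins for real embeddings.
toy_docs = ["doc about Bedrock", "doc about S3", "doc about football"]
toy_vecs = [[0.9, 0.1], [0.5, 0.5], [-0.8, 0.2]]
toy_query = [1.0, 0.0]

selected = top_k(toy_query, toy_vecs, toy_docs, k=2)
rag_prompt = build_rag_prompt("How can I build generative AI apps on AWS?", selected)
print(rag_prompt)
```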


In [12]:
query

'How can I build generative AI apps on AWS?'

In [2]:
import psycopg2
import json



conn = psycopg2.connect(
    host="knowledgebase-abccorp-instance-1.cedfajhbgrc5.us-east-1.rds.amazonaws.com",
    database="postgres",
    user="postgres",
    password="test1234",
    port=5432
)

cur = conn.cursor()

docs = [
    "AWS RDS PostgreSQL supports pgvector extension",
    "Vector search enables semantic similarity search",
    "Amazon Bedrock provides foundation models",
    "pgvector allows storing embeddings in PostgreSQL"
]

for d in docs:
    emb = generate_embedding(d)
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        (d, emb)
    )

conn.commit()
cur.close()
conn.close()
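
The cell above assumes a `documents` table already exists; the notebook does not show how it was created. One plausible setup is sketched below. The column names match the `INSERT` statement, but the `id` column, the `BIGSERIAL` type, and the helper names are assumptions, and the dimension 1024 is taken from the embedding length seen earlier. The `to_vector_literal` helper formats a list in pgvector's text form (`[x1,x2,...]`) for cases where an explicit `::vector` cast is preferred.

```python
# Assumed schema: not shown in the notebook, written to match the INSERT above.
CREATE_EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS vector;"
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding VECTOR(1024)
);
"""


def to_vector_literal(embedding):
    """Format a list of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def create_schema(cur):
    """Run the setup statements on an open database cursor."""
    cur.execute(CREATE_EXTENSION_SQL)
    cur.execute(CREATE_TABLE_SQL)


# Usage with an open psycopg2 cursor `cur`:
#   create_schema(cur)
#   cur.execute(
#       "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
#       (d, to_vector_literal(emb)),
#   )
```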


In [6]:
query = "How do I do semantic search in PostgreSQL?"
query_embedding = generate_embedding(query)

In [7]:
conn = psycopg2.connect(
    host="knowledgebase-abccorp-instance-1.cedfajhbgrc5.us-east-1.rds.amazonaws.com",
    database="postgres",
    user="postgres",
    password="test1234",
    port=5432
)


cur = conn.cursor()

cur.execute("""
    SELECT content,
           embedding <-> %s::vector AS distance
    FROM documents
    ORDER BY distance
    LIMIT 3;
""", (query_embedding,))


results = cur.fetchall()

for r in results:
    print(r)

cur.close()
conn.close()

('Vector search enables semantic similarity search', 1.035783685510718)
('pgvector allows storing embeddings in PostgreSQL', 1.151150981608623)
('AWS RDS PostgreSQL supports pgvector extension', 1.2771864861320277)
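
Unlike the cosine scores earlier, these numbers are distances: in pgvector, `<->` is Euclidean (L2) distance, so smaller means closer, while `<=>` is the cosine distance operator. When both vectors have unit length, the two measures are linked by d² = 2 − 2·cos. The sketch below checks that relation on toy unit vectors; whether the stored embeddings are actually unit-length is an assumption worth verifying (for example with `norm(embedding)`).

```python
import math


def l2_distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def cosine_similarity(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def cosine_from_l2(distance):
    """For unit-length vectors only: d^2 = 2 - 2*cos, so cos = 1 - d^2 / 2."""
    return 1 - distance ** 2 / 2


# Two toy unit vectors.
a = [0.6, 0.8]
b = [1.0, 0.0]

d = l2_distance(a, b)
print(round(d, 4), round(cosine_from_l2(d), 4), round(cosine_similarity(a, b), 4))
```

Under that unit-length assumption, the top result's distance of about 1.036 would correspond to a cosine similarity of roughly 0.46.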


In [5]:
query

'How do I do semantic search in PostgreSQL?'