***Full RAG pipeline that runs locally on an Apple Silicon device (such as an M1 Max)***

Steps:
- Read the PDF and split it into a list of chunks, each with fewer tokens than the maximum accepted by the sentence embedding model
- Embed each of the *n* chunks and stack the results into an *n* by *e* mx array, where *e* is the embedding dimension of the sentence embedding model (e.g. 768 for bge-base)
- Create a query string and embed it with the same model
- Run a vector (semantic) search between the query vector and the embedded chunks
- Build the prompt from the query and the retrieved chunks
- Pass the augmented prompt to the local LLM (loaded onto the GPU)

In [62]:
import os
import requests

# pdf_path = 'human-nutrition-text.pdf'
pdf_path = 'POH-Cessna-172S.pdf'
# Download PDF
if not os.path.exists(pdf_path):
    print("[INFO] File doesn't exist, downloading...")

    # Enter the URL of the PDF
    url = "path/to/downloadable/pdf.pdf"

    # The local filename to save the downloaded file
    filename = pdf_path

    # Send a GET request to the URL
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        # Open the file and save it
        with open(filename, "wb") as file:
            file.write(response.content)
        print(f"[INFO] The file has been downloaded and saved as {filename}")
    else:
        print(f"[INFO] Failed to download the file. Status code: {response.status_code}")

else:
    print(f"File {pdf_path} exists.")

File POH-Cessna-172S.pdf exists.


In [63]:
import fitz # requires: !pip install PyMuPDF, see: https://github.com/pymupdf/PyMuPDF
from tqdm.auto import tqdm # pip install tqdm

def text_formatter(text: str) -> str:
    """Performs minor formatting on text."""
    cleaned_text = text.replace("\n", " ").strip()

    # Potentially more text formatting functions can go here
    return cleaned_text

def open_and_read_pdf(pdf_path: str) -> list[dict]:
    doc = fitz.open(pdf_path)
    pages_and_texts = []
    for page_number, page in tqdm(enumerate(doc)):
        text = page.get_text()
        text = text_formatter(text=text)
        pages_and_texts.append({"page_number": page_number - 41,
                                "page_char_count": len(text),
                                "page_word_count": len(text.split(" ")),
                                # "page_setence_count_raw": len(text.split(". ")),
                                "page_token_count": len(text) / 4, # 1 token = ~4 characters
                                "text": text})
    return pages_and_texts
    

pages_and_texts = open_and_read_pdf(pdf_path=pdf_path)
pages_and_texts[:2]

499it [00:01, 276.74it/s]


[{'page_number': -41,
  'page_char_count': 678,
  'page_word_count': 148,
  'page_token_count': 169.5,
  'text': "CESSNA  MODEL 172S  NOTICE  INTRODUCTION  AT THE TIME OF ISSUANCE, THIS INFORMATION  MANUAL WAS AN EXACT DUPLICATE OF THE  OFFICIAL PILOT'S OPERATING HANDBOOK AND  FAA APPROVED AIRPLANE FLIGHT MANUAL AND  IS TO BE USED FOR GENERAL PURPOSES ONLY.  IT  WILL  NOT  BE  KEPT  CURRENT  AND,  THEREFORE,  CANNOT  BE  USED  AS  A  SUBSTITUTE  FOR  THE  OFFICIAL  PILOT'S  OPERATING HANDBOOK AND FAA APPROVED  AIRPLANE  FLIGHT  MANUAL  INTENDED  FOR  OPERATION OF THE AIRPLANE.  THE PILOT'S OPERATING HANDBOOK MUST BE  CARRIED IN THE AIRPLANE AND AVAILABLE TO  THE PILOT AT ALL TIMES.  I Revision 5  Cessna Aircraft Company  Original Issue - 8 July 1998  Revision 5 - 19 July 2004  U.S."},
 {'page_number': -40,
  'page_char_count': 1516,
  'page_word_count': 230,
  'page_token_count': 379.0,
  'text': 'INTRODUCTION  CESSNA  MODEL 172S  PERFORMANCE - SPECIFICATIONS  *SPEED:  Maximum at Sea L

In [64]:
# Use 'poetry run pip3 install spacy'

# Split each page's text into sentences with spaCy (they are grouped into ~10-sentence chunks in a later cell)
from spacy.lang.en import English

nlp = English()

#add a sentencizer pipeline
nlp.add_pipe('sentencizer')

for item in tqdm(pages_and_texts):
  #make sure all sentences are strings instead of spacy datatypes
  item['sentences'] = [str(sentence) for sentence in list(nlp(item['text']).sents)]
  #count the sentences
  # item['page_sentence_count_spacy'] = len(item['sentences'])

100%|██████████| 499/499 [00:01<00:00, 396.32it/s]


In [65]:
# Define split size to turn groups of sentences into chunks
num_sentence_chunk_size = 10

# Create a function to split a list of sentences into consecutive slices of at most slice_size items
# e.g. [20] -> [10, 10] or [25] -> [10, 10, 5]
def split_list(input_list: list[str],
               slice_size: int=num_sentence_chunk_size) -> list[list[str]]:
    return [input_list[i:i+slice_size] for i in range(0, len(input_list), slice_size)]

for item in tqdm(pages_and_texts):
  item['sentence_chunks'] = split_list(item['sentences'], slice_size = num_sentence_chunk_size)
  item['num_chunks'] = len(item['sentence_chunks'])
  # del item['page_sentence_count_spacy']
  # del item['page_setence_count_raw']



100%|██████████| 499/499 [00:00<00:00, 430472.58it/s]
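
As a quick check of the slicing behaviour described in the comment above, here is a minimal standalone sketch. It redefines `split_list` so it runs on its own, and the 25 placeholder sentences are made up for illustration.

```python
def split_list(input_list: list[str], slice_size: int = 10) -> list[list[str]]:
    """Slice a list into consecutive groups of at most slice_size items."""
    return [input_list[i:i + slice_size] for i in range(0, len(input_list), slice_size)]

# 25 placeholder sentences stand in for the sentences of one PDF page.
sentences = [f"Sentence {i}." for i in range(25)]
chunks = split_list(sentences, slice_size=10)
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)  # [10, 10, 5]
```

The last chunk simply holds whatever is left over, so pages with fewer than 10 sentences produce a single short chunk.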


In [66]:
import re

#split each chunk into its own item
pages_and_chunks = []
for item in tqdm(pages_and_texts):
  for sentence_chunk in item['sentence_chunks']:
    chunk_dict = {}
    chunk_dict['page_number'] = item['page_number']
    #join sentences into one paragraph
    joined_sentence_chunk = ''.join(sentence_chunk).replace("  ", " ").strip()
    joined_sentence_chunk = re.sub(r'\.([A-Z])', r'.  \1', joined_sentence_chunk) # ".A" => ".  A" (inserts two spaces; works for any capital letter)

    chunk_dict['sentence_chunk'] = joined_sentence_chunk
    #get some stats
    # chunk_dict['chunk_char_count'] = len(joined_sentence_chunk)
    # chunk_dict["chunk_word_count"] = len([word for word in joined_sentence_chunk.split(" ")])
    chunk_dict["chunk_token_count"] = len(joined_sentence_chunk) / 4 # 1 token = ~4 chars

    pages_and_chunks.append(chunk_dict)

len(pages_and_chunks)


100%|██████████| 499/499 [00:00<00:00, 48078.60it/s]


849
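
The first step of the pipeline requires every chunk to fit within the embedding model's token limit. The sketch below flags chunks that exceed a budget using the same ~4 characters per token estimate as the cells above. The 512-token limit and the helper names are assumptions for illustration; check the model card of the embedding model you actually use.

```python
MAX_TOKENS = 512  # assumed limit for the embedding model; verify against the model card

def estimate_tokens(text: str) -> float:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return len(text) / 4

def chunks_over_budget(chunks: list[dict], max_tokens: int = MAX_TOKENS) -> list[dict]:
    """Return the chunks whose estimated token count exceeds max_tokens."""
    return [c for c in chunks if estimate_tokens(c["sentence_chunk"]) > max_tokens]

# Made-up sample data: one short chunk and one deliberately oversized chunk.
sample_chunks = [
    {"page_number": 1, "sentence_chunk": "Fuel Selector Valve -- BOTH."},
    {"page_number": 2, "sentence_chunk": "x" * 4000},
]
too_long = chunks_over_budget(sample_chunks)
print([c["page_number"] for c in too_long])  # [2]
```

The character-based estimate is only approximate, so chunks close to the limit are better checked with the model's real tokenizer.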

In [67]:
import pandas as pd
df = pd.DataFrame(pages_and_chunks)

In [68]:
min_token_len = 15
pages_and_chunks_over_min_len  = df[df['chunk_token_count']>min_token_len].to_dict(orient='records')
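
The filter above drops very short chunks, which in a manual like this are mostly page footers and revision marks. A pandas-free sketch of the same idea, with two made-up chunks, looks like this:

```python
min_token_len = 15

footer = "Revision 4 4-11"
body = ("Fuel system venting is essential to system operation. "
        "Blockage of the system will result in decreasing fuel flow.")

# Same ~4 characters per token estimate as used in the notebook.
chunks = [
    {"sentence_chunk": footer, "chunk_token_count": len(footer) / 4},
    {"sentence_chunk": body, "chunk_token_count": len(body) / 4},
]

# Keep only the chunks above the minimum estimated token count.
kept = [c for c in chunks if c["chunk_token_count"] > min_token_len]
print(len(kept))  # 1
```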

In [98]:
from mlx_embedding_models.embedding import EmbeddingModel
embedding_model = EmbeddingModel.from_registry("bge-base")


<class 'mlx_embedding_models.embedding.EmbeddingModel'>


In [70]:
# Embed all texts in batches
text_chunks = [item['sentence_chunk'] for item in pages_and_chunks_over_min_len]
# print(len(text_chunks),text_chunks[1])
text_chunk_embeddings = embedding_model.encode(text_chunks,
                                               batch_size=16, # you can use different batch sizes here for speed/performance, I found 16 works well for this use case
                                               convert_to_tensor=True) # requests tensor output; the result shown below is still a numpy.ndarray

# print(text_chunk_embeddings[1])
# text_chunk_embeddings.shape, type(text_chunk_embeddings)


809 INTRODUCTION CESSNA MODEL 172S PERFORMANCE - SPECIFICATIONS *SPEED: Maximum at Sea Level ......................... 126 KNOTS Cruise, 75% Power at 8500 Feet. .................124 KNOTS CRUISE: Recommended lean mixture with fuel allowance for engine start, taxi, takeoff, climb and 45 minutes reserve. 75% Power at 8500 Feet ..................... Range - 518 NM 53 Gallons Usable Fuel. .................... Time - 4.26 HRS Range at 10,000 Feet, 45% Power ............. Range - 638 NM 53 Gallons Usable Fuel. .................... Time - 6.72 HRS RATE-OF-CLIMB AT SEA LEVEL ...................... 730 FPM SERVICE CEILING ............................. 14,000 FEET TAKEOFF PERFORMANCE: Ground Roll .................................... 960 FEET Total Distance Over 50 Foot Obstacle ............... 1630 FEET LANDING PERFORMANCE: Ground Roll .................................... 575 FEET Total Distance Over 50 Foot Obstacle ............... 1335 FEET STALL SPEED: Flaps Up, Power Off ....................

100%|██████████| 809/809 [00:11<00:00, 72.37it/s, seq_len=16]  

[ 1.47708384e-02  3.08182836e-03  1.87444258e-02  7.65345804e-03
  5.92592657e-02  7.39933327e-02  5.23329712e-02 -3.14114406e-03
 -8.99336394e-03 -1.69833079e-02 -8.27064505e-04 -1.31342225e-02
 -3.97360958e-02  3.11353877e-02  9.77181946e-04  7.56912082e-02
  7.81385303e-02 -1.63551383e-02 -7.07022846e-03  9.60304122e-03
  6.71873422e-05 -1.73685085e-02  6.69338033e-02  3.93118011e-03
  1.70510467e-02  1.48436828e-02  4.46833577e-03  3.95420268e-02
 -6.40615746e-02  2.34335195e-03  2.03746632e-02  3.14671695e-02
 -1.94595587e-02  1.50003878e-03 -5.12044169e-02 -4.87474054e-02
 -2.41672471e-02 -6.71750773e-03 -2.12826785e-02  3.01649645e-02
 -6.40323833e-02  1.44656748e-02  1.96865574e-03 -2.62964144e-03
 -5.76697737e-02 -4.23812456e-02 -2.26162951e-02 -1.57966297e-02
 -2.91295610e-02 -3.79749015e-03 -9.22579616e-02  4.63917181e-02
 -2.84026377e-02 -9.04204790e-03 -3.49845141e-02  4.79005203e-02
 -2.18656436e-02 -2.92229820e-02  3.54803167e-02 -6.47780746e-02
  1.09233009e-02 -2.78477




((809, 768), numpy.ndarray)

In [71]:
# Save the chunk text and metadata to a CSV file (the embedding vectors are saved separately as .npy)
text_chunks_and_embeddings_df = pd.DataFrame(pages_and_chunks_over_min_len)
embeddings_df_save_path = 'c172_embeddings.csv'
text_chunks_and_embeddings_df.to_csv(embeddings_df_save_path, index=False)


In [72]:
#load df from csv
text_chunks_and_embeddings_df_load = pd.read_csv(embeddings_df_save_path)
text_chunks_and_embeddings_df_load.head()

Unnamed: 0,page_number,sentence_chunk,chunk_token_count
0,-41,CESSNA MODEL 172S NOTICE INTRODUCTION AT THE T...,159.75
1,-40,INTRODUCTION CESSNA MODEL 172S PERFORMANCE - S...,366.5
2,-39,CESSNA MODEL 172S INTRODUCTION PERFORMANCE - S...,278.25
3,-37,CESSNA MODEL 172S INTRODUCTION Information Man...,90.5
4,-35,CESSNA MODEL 172S INTRODUCTION TABLE OF CONTEN...,147.5


In [73]:
import mlx.core as mx
import mlx.nn as nn
import numpy as np

In [74]:
#alternatively store just the embedding array to npy
np.save('text_chunk_embeddings.npy', text_chunk_embeddings)


In [75]:
#load the embeddings to avoid creating again 
embeddings = mx.load('text_chunk_embeddings.npy')

In [77]:
# Save the chunk metadata to CSV, reload it, and attach the embeddings
import pandas as pd
text_chunks_and_embeddings_df = pd.DataFrame(pages_and_chunks_over_min_len)
embeddings_df_save_path = 'c172_embeddings.csv'
text_chunks_and_embeddings_df.to_csv(embeddings_df_save_path, index=False)
text_chunks_and_embeddings_df = pd.read_csv(embeddings_df_save_path)
# Store each embedding row as a plain Python list in a new column
text_chunks_and_embeddings_df['embedding'] = [emb.tolist() for emb in embeddings]
# text_chunks_and_embeddings_df['embedding'] = text_chunks_and_embeddings_df['embedding'].apply(lambda x: np.fromstring(x.strip('[]'), sep = ' ', dtype=float))
# Convert the dataframe to a list of dicts
pages_and_chunks = text_chunks_and_embeddings_df.to_dict('records')
# Alternative: rebuild the embedding array as an mx array from the dataframe column
# embeddings = mx.array(np.array(text_chunks_and_embeddings_df['embedding'].tolist())) # shape (809, 768) for this PDF

# embeddings.shape

In [78]:
def get_results(query: str, embeddings: mx.array, n: int) -> tuple[list, list]:
    """Return the top-n similarity scores and chunk indices, best match first."""
    query = [query] # convert to list for model compatibility
    # Embed the query with the same model as the passages/document
    query_embedding = mx.array(embedding_model.encode(query))
    # Despite its name, cosine_similarity_loss returns the cosine similarity of each row with the query
    cosine_scores = nn.losses.cosine_similarity_loss(query_embedding, embeddings)
    # topk and argpartition do not guarantee sorted output, so sort explicitly
    top_indices = mx.argsort(-cosine_scores)[:n] # largest to smallest
    top_results = cosine_scores[top_indices] # scores in the same order as the indices
    return top_results.tolist(), top_indices.tolist()
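
For reference, the ranking step can be reproduced without MLX. The following is a dependency-free sketch of cosine-similarity search over a handful of toy 2-dimensional vectors. The function names and the data are illustrative only and are not part of the notebook.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_n(query_vec: list[float], doc_vecs: list[list[float]], n: int):
    """Return (scores, indices) of the n most similar documents, best first."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return [scores[i] for i in order], order

# Toy "document" embeddings and a query that points mostly along the first axis.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]
top_scores, top_indices = top_n(query, docs, n=2)
print(top_indices)  # [0, 2]
```

Sorting the scores explicitly keeps each score paired with its index, which matters when the indices are later used to look up the matching text chunks.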

In [79]:
import textwrap
def print_wrapped(text, width = 80):
    print(textwrap.fill(text, width))

In [80]:
# Automatically Generated Questions related to Cessna 172S POH
gpt4_questions = [
    "What are the key performance specifications of the Cessna 172S, and how do they impact flight planning?",
    "How does the fuel system of the Cessna 172S function, and what are the critical safety considerations?",
    "Describe the normal operating procedures for the Cessna 172S during takeoff and landing.",
    "What are the limitations imposed by the Cessna 172S POH for weight and balance, and why are they important?",
    "Explain the emergency procedures outlined in the Cessna 172S POH for an engine failure during flight."
]

# Manually Created Questions related to Cessna 172S POH
manual_questions = [
    "What are the recommended airspeeds for emergency operations in the Cessna 172S?",
    "How does the Cessna 172S POH suggest handling an engine fire during flight?",
    "What are the flap limitations for the Cessna 172S, and how should they be used during different phases of flight?",
    "What are the standard procedures for operating the fuel selector valve on a Cessna 172S during flight?",
    "How does the Cessna 172S POH define and explain the aircraft's maneuvering speed (VA)?"
]

query_list = gpt4_questions + manual_questions

In [99]:
#local llm 
from mlx_lm import load, generate
model, tokenizer = load("mlx-community/quantized-gemma-7b-it")


Fetching 8 files: 100%|██████████| 8/8 [00:00<00:00, 30202.01it/s]


<class 'mlx_lm.tokenizer_utils.TokenizerWrapper'>


In [82]:
def prompt_formatter(query: str, 
                     context_items: list[dict]) -> str:
    """
    Augments query with text-based context from context_items.
    """
    # Join context items into a bulleted list, one chunk per bullet
    context = "- " + "\n- ".join([item["sentence_chunk"] for item in context_items])

    # Create a base prompt with examples to help the model
    # Note: this is very customizable, I've chosen to use 4 examples of the answer style we'd like.
    # We could also write this in a txt file and import it in if we wanted.
    base_prompt = """Based on the following context items, please answer the query.
Give yourself room to think by extracting relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Make sure your answers are as explanatory as possible.
Use the following examples as reference for the ideal answer style.
\nExample 1:
Query: What are the key performance specifications of the Cessna 172S, and how do they impact flight planning?
Answer: The key performance specifications of the Cessna 172S include a maximum speed of 126 knots at sea level, a service ceiling of 14,000 feet, and a rate of climb of 730 feet per minute at sea level. These specifications impact flight planning by determining the range, endurance, and optimal cruising altitude for flights. For instance, the aircraft's range is approximately 518 nautical miles at 75% power, making it suitable for medium-range flights without refueling.
\nExample 2:
Query: How does the Cessna 172S fuel system operate, and what are the critical safety considerations?
Answer: The Cessna 172S fuel system includes two tanks, one in each wing, with a total capacity of 56 gallons, of which 53 gallons are usable. The system is gravity-fed and includes a selector valve that allows the pilot to draw fuel from either tank or both tanks simultaneously. Critical safety considerations include ensuring the fuel selector is in the "BOTH" position during takeoff and landing, monitoring fuel levels carefully, and avoiding prolonged uncoordinated flight when fuel is low, as this could cause engine starvation.
\nExample 3:
Query: What are the emergency procedures for an engine failure in a Cessna 172S?
Answer: In the event of an engine failure in a Cessna 172S, the pilot should first establish the best glide speed of 68 knots. Then, identify a suitable landing site and prepare for a forced landing. Attempt to restart the engine by switching fuel tanks, setting the mixture to rich, and engaging the starter. If the restart fails, turn off the fuel, master switch, and magnetos, and prepare for landing with flaps as required. The POH outlines these steps in detail under emergency procedures.
\nExample 4:
Query: What is the maneuvering speed (VA) of the Cessna 172S, and why is it important?
Answer: The maneuvering speed (VA) of the Cessna 172S varies with weight but is typically around 105 knots at maximum gross weight. This speed is important because it represents the maximum speed at which the aircraft can be flown with full, abrupt control inputs without exceeding its structural limits. Flying at or below VA protects the aircraft from structural damage during turbulence or aggressive maneuvers.
\nNow use the following context items to answer the user query:
{context}
\nRelevant passages: <extract relevant passages from the context here>
User query: {query}
Answer:"""


    # Update base prompt with context items and query   
    base_prompt = base_prompt.format(context=context, query=query)

    # Create prompt template for instruction-tuned model
    dialogue_template = [
        {"role": "user",
        "content": base_prompt}
    ]

    # Apply the chat template
    prompt = tokenizer.apply_chat_template(conversation=dialogue_template,
                                          tokenize=False,
                                          add_generation_prompt=True)
    # prompt = base_prompt
    return prompt
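
To see what the context assembly produces without loading a tokenizer, here is a small sketch of the same join-and-format step. The shortened template, the helper name `build_context` and the two context items are assumptions made for illustration; the real prompt also includes the worked examples and the chat template.

```python
def build_context(context_items: list[dict]) -> str:
    """Turn retrieved chunks into a bulleted list, one chunk per line."""
    return "- " + "\n- ".join(item["sentence_chunk"] for item in context_items)

# Shortened stand-in for the full base prompt used in the notebook.
template = "Context:\n{context}\nUser query: {query}\nAnswer:"

items = [
    {"page_number": 45, "sentence_chunk": "Fuel Selector Valve -- BOTH."},
    {"page_number": 146, "sentence_chunk": "Both fuel filler caps are also vented."},
]
prompt = template.format(context=build_context(items),
                         query="Where should the fuel selector be set?")
print(prompt)
```

Because `str.format` only interprets braces in the template itself, curly braces that appear inside a retrieved chunk are passed through unchanged.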

In [96]:
import random
query = random.choice(query_list)
print(f"Query: {query}")

# Get relevant resources
scores, indices = get_results(query=query, embeddings=embeddings, n = 5)
# print(scores, indices)
# Create a list of context items
context_items = [pages_and_chunks[i] for i in indices]

# Format prompt with context items
prompt = prompt_formatter(query=query,
                          context_items=context_items)
# print(context_items)
print('\n'.join([f"{i['page_number']},{i['sentence_chunk']}" for i in context_items]))

Query: What are the standard procedures for operating the fuel selector valve on a Cessna 172S during flight?
query='What are the standard procedures for operating the fuel selector valve on a Cessna 172S during flight?', embeddings=array([[-0.0305906, 0.0159667, -0.0407652, ..., 0.00859571, 0.00298534, -0.0109268],
       [0.0147708, 0.00308183, 0.0187444, ..., 0.0214718, 0.0389165, -0.0126432],
       [0.0232945, 0.0323477, -0.00277491, ..., -0.0187079, 0.0429621, -0.0249111],
       ...,
       [-0.00185245, 0.0243362, -0.0221245, ..., -0.00377295, 0.0356885, -0.0181716],
       [0.0102832, 0.0152642, -0.0152659, ..., 0.00589541, 0.0484961, -0.0434573],
       [0.0047273, 0.0283591, -0.0471328, ..., 0.0211699, 0.0347227, -0.0170642]], dtype=float32)


100%|██████████| 1/1 [00:00<00:00, 19.99it/s, seq_len=16]

[0.7942339777946472, 0.792177140712738, 0.7763713002204895, 0.7750421166419983, 0.770821213722229] [156, 319, 78, 158, 321]
45,Fuel Selector Valve -- BOTH. 9.  Fuel Shutoff Valve -- ON (push full in). 10.  Avionics Circuit Breakers -- CHECK IN. Revision 4 4-11
146,SECTION 7 CESSNA MODEL 172S AIRPLANE & SYSTEMS DESCRIPTION FUEL VENTING Fuel system venting is essential to system operation.  Blockage of the system will result in decreasing fuel flow and eventual engine stoppage.  Venting is accomplished by an interconnecting line from the right fuel tank to the left tank.  The left fuel tank is vented overboard through a vent line, equipped with a check valve, which protrudes from the bottom surface of the left wing near the wing strut.  Both fuel filler caps are also vented. REDUCED TANK CAPACITY The airplane may be serviced to a reduced capacity to permit heavier cabin loadings.  This is accomplished by filling each tank to the bottom edge of the fuel filler tab, thus giving a reduced f




In [97]:
#local llm 
import time
t = time.perf_counter()
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens = 200)
# print(response)
print(f"time taken for generation = {time.perf_counter()-t}s")

Prompt: <start_of_turn>user
Based on the following context items, please answer the query.
Give yourself room to think by extracting relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Make sure your answers are as explanatory as possible.
Use the following examples as reference for the ideal answer style.

Example 1:
Query: What are the key performance specifications of the Cessna 172S, and how do they impact flight planning?
Answer: The key performance specifications of the Cessna 172S include a maximum speed of 126 knots at sea level, a service ceiling of 14,000 feet, and a rate of climb of 730 feet per minute at sea level. These specifications impact flight planning by determining the range, endurance, and optimal cruising altitude for flights. For instance, the aircraft's range is approximately 518 nautical miles at 75% power, making it suitable for medium-range flights without refueling.

Example 2:
Query: How does the

In [91]:
def ask(query:str)->str:
    '''Retrieve the 5 most relevant chunks for the query, build the prompt, and generate an answer.'''
    # Get relevant resources
    scores, indices = get_results(query=query, embeddings=embeddings, n = 5)
    # Create a list of context items
    context_items = [pages_and_chunks[i] for i in indices]
    # Format prompt with context items
    prompt = prompt_formatter(query=query,
                            context_items=context_items)
    response = generate(model, tokenizer,temp=0.7, prompt=prompt, verbose=True, max_tokens = 200)
    return response

In [106]:
ask("What are the standard procedures for operating the fuel selector valve on a Cessna 172S during flight")
# ask("What is the importance of hydration for physical performance?")

query='What are the standard procedures for operating the fuel selector valve on a Cessna 172S during flight', embeddings=array([[-0.0305906, 0.0159667, -0.0407652, ..., 0.00859571, 0.00298534, -0.0109268],
       [0.0147708, 0.00308183, 0.0187444, ..., 0.0214718, 0.0389165, -0.0126432],
       [0.0232945, 0.0323477, -0.00277491, ..., -0.0187079, 0.0429621, -0.0249111],
       ...,
       [-0.00185245, 0.0243362, -0.0221245, ..., -0.00377295, 0.0356885, -0.0181716],
       [0.0102832, 0.0152642, -0.0152659, ..., 0.00589541, 0.0484961, -0.0434573],
       [0.0047273, 0.0283591, -0.0471328, ..., 0.0211699, 0.0347227, -0.0170642]], dtype=float32)


100%|██████████| 1/1 [00:00<00:00,  3.32it/s, seq_len=16]


[156, 319, 78, 158, 321]=
Prompt: <start_of_turn>user
Based on the following context items, please answer the query.
Give yourself room to think by extracting relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Make sure your answers are as explanatory as possible.
Use the following examples as reference for the ideal answer style.

Example 1:
Query: What are the key performance specifications of the Cessna 172S, and how do they impact flight planning?
Answer: The key performance specifications of the Cessna 172S include a maximum speed of 126 knots at sea level, a service ceiling of 14,000 feet, and a rate of climb of 730 feet per minute at sea level. These specifications impact flight planning by determining the range, endurance, and optimal cruising altitude for flights. For instance, the aircraft's range is approximately 518 nautical miles at 75% power, making it suitable for medium-range flights without refueling.

Exam

'The standard procedures for operating the fuel selector valve on a Cessna 172S during flight are clearly outlined in the provided text. According to the manual, the fuel selector valve should be in the "BOTH" position for takeoff, climb, landing, and maneuvers that involve prolonged slips or skids of more than 30 seconds. Operation from either the left or right tank is reserved for cruising flight.\n\nIt is important to note that when the fuel selector valve handle is in the "BOTH" position in cruising flight, unequal fuel flow from each tank may occur if the wings are not maintained exactly level. Therefore, it is crucial to maintain level wings during cruising flight to ensure even fuel flow.'