# Engineering Notes – Contextual AI Search (Prototype)

This notebook demonstrates a **working prototype** of an ETL pipeline for contextual AI systems.

### What this prototype includes:
- Extracting technical text (sample engineering notes)
- Text preprocessing and chunking
- Embedding generation using sentence-transformers
- Vector similarity search using FAISS

This prototype covers the retrieval stage only. It is intended as the foundation for a full Retrieval-Augmented Generation (RAG) system; the generation stage is not implemented here.


In [1]:
!pip install sentence-transformers faiss-cpu


Collecting faiss-cpu
  Downloading faiss_cpu-1.13.1-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Downloading faiss_cpu-1.13.1-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.7 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.1


In [2]:
engineering_text = """
UMA (Uniform Memory Access) systems provide equal access time to memory for all processors.
ccNUMA (Cache-Coherent Non-Uniform Memory Access) systems have variable memory access times depending on memory location.
Intel VT-x introduces root and non-root modes to support hardware-assisted virtualization.
A hypervisor manages virtual machines and allocates hardware resources efficiently.
YARN separates resource management from application execution in Hadoop clusters.
"""


In [3]:
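# Flatten the text onto one line before splitting.
clean_text = engineering_text.replace("\n", " ").strip()

# Naive sentence chunking: split on periods and drop empty fragments.
# Note: this would also split inside decimals or version numbers.
chunks = [chunk.strip() for chunk in clean_text.split(".") if chunk.strip()]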

print("Total chunks created:", len(chunks))
for i, chunk in enumerate(chunks):
    print(f"{i+1}. {chunk}")


Total chunks created: 5
1. UMA (Uniform Memory Access) systems provide equal access time to memory for all processors
2. ccNUMA (Cache-Coherent Non-Uniform Memory Access) systems have variable memory access times depending on memory location
3. Intel VT-x introduces root and non-root modes to support hardware-assisted virtualization
4. A hypervisor manages virtual machines and allocates hardware resources efficiently
5. YARN separates resource management from application execution in Hadoop clusters
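
Splitting on every period works for the sample text above, but it would cut a sentence apart at a decimal or version number such as `1.13.1`. The following is a minimal sketch of one alternative, assuming sentences end with `.`, `!` or `?` followed by whitespace. `split_sentences` is a hypothetical helper, not part of the notebook, and unlike the cell above it keeps the closing punctuation on each chunk.

```python
import re

def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    # Collapse newlines and repeated spaces into single spaces first.
    flat = " ".join(text.split())
    # The lookbehind keeps the punctuation attached to the sentence it ends.
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return [p for p in parts if p]

sample = """
faiss-cpu 1.13.1 was installed in this session.
Intel VT-x introduces root and non-root modes.
"""
for sentence in split_sentences(sample):
    print(sentence)
```

This still splits after abbreviations such as "e.g." when they are followed by a space, so a dedicated sentence tokenizer may be a better fit for real engineering notes.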


In [4]:
from sentence_transformers import SentenceTransformer

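# Load a small general-purpose sentence embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")
# Encode each chunk into a dense vector: one row per chunk.
embeddings = model.encode(chunks)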

print("Embedding shape:", embeddings.shape)


Embedding shape: (5, 384)


In [5]:
import faiss
import numpy as np

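dimension = embeddings.shape[1]  # 384 for all-MiniLM-L6-v2
# Flat index: exact, brute-force search using squared L2 distance.
index = faiss.IndexFlatL2(dimension)
# FAISS expects a contiguous float32 array.
index.add(np.ascontiguousarray(embeddings, dtype="float32"))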

print("Total vectors in index:", index.ntotal)


Total vectors in index: 5
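
`IndexFlatL2` does no clustering or compression: it compares the query against every stored vector and reports squared Euclidean distances. The sketch below reproduces that computation in plain Python on toy 3-dimensional vectors, only to make the behavior concrete. `l2_search` and the toy vectors are illustrative assumptions, not part of FAISS or this notebook.

```python
def l2_search(vectors, query, k):
    """Return (distances, indices) of the k vectors closest to query.

    Distances are squared Euclidean distances, smallest first.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # ascending by distance, ties broken by index
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
distances, indices = l2_search(toy_vectors, [1.0, 0.0, 0.0], k=2)
print(indices)    # [0, 1]
print(distances)  # [0.0, 2.0]
```

An exhaustive scan like this is fine for five vectors; for much larger collections an approximate index type would usually be considered instead.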


In [6]:
query = "Explain ccNUMA memory architecture"
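query_embedding = model.encode([query])  # shape (1, 384)

# D holds squared L2 distances, I the matching row indices into `chunks`.
D, I = index.search(np.asarray(query_embedding, dtype="float32"), k=2)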

print("Query:", query)
print("\nTop relevant chunks:")
for idx in I[0]:
    print("-", chunks[idx])


Query: Explain ccNUMA memory architecture

Top relevant chunks:
- ccNUMA (Cache-Coherent Non-Uniform Memory Access) systems have variable memory access times depending on memory location
- UMA (Uniform Memory Access) systems provide equal access time to memory for all processors
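
To move from retrieval toward the RAG system mentioned in the introduction, the retrieved chunks are typically placed into a prompt ahead of the user's question. The following is a minimal sketch of that step under stated assumptions: `build_prompt` and the prompt wording are hypothetical, and no language model is called.

```python
def build_prompt(query, retrieved_chunks):
    """Assemble a prompt that grounds the question in retrieved chunks."""
    # Number each chunk so an answer can point back to its source.
    context = "\n".join(
        f"[{n}] {chunk}" for n, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# The two chunks returned by the search above.
top_chunks = [
    "ccNUMA (Cache-Coherent Non-Uniform Memory Access) systems have variable memory access times depending on memory location",
    "UMA (Uniform Memory Access) systems provide equal access time to memory for all processors",
]
prompt = build_prompt("Explain ccNUMA memory architecture", top_chunks)
print(prompt)
```

In the notebook itself, `top_chunks` could be built as `[chunks[idx] for idx in I[0]]` from the search results.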
