- Install dependencies

In [1]:
!pip install sentence-transformers faiss-cpu



- Load clean_parts.csv

In [2]:
import pandas as pd

df = pd.read_csv("clean_parts.csv")

print(df.shape)
df.head()

(663, 33)


Unnamed: 0,ID,DESCRIPTION,Attribut1,Additional Feature,Application,Characteristic,Temp,Height,Length in mm,Rating,...,Physical Dimension,Pre-arcing time-Min (ms),Product Diameter,Product Length,Rated Breaking Capacity (A),Rated Current (A),Rated Voltage (V),Rated Voltage(AC) (V),Rated Voltage(DC) (V),combined_text
0,A1,Indicator Red Fast Movement 1.6A 250V Holder P...,Fast,,Primary Protection In Equipment,VERY FAST,,20mm,5.2mm,1.6A,...,5.2mm x 20mm,3ms,5.2mm,20mm,1500A,1.6A,250V,250V,,indicator red fast movement 1 6a 250v holder p...
1,A2,"Non Resettable Indicators Electric Indicator, ...",,,Primary Protection In Equipment,VERY FAST,,20mm,5.2mm,,...,5.2mm x 20mm,3ms,5.2mm,20mm,1500A,6.3A,250V,250V,,non resettable indicators electric indicator v...
2,A3,Indicator Red Fast Movement 8A 250V Holder Pla...,Fast,,Primary Protection In Equipment,VERY FAST,,20mm,5.2mm,8A,...,5.2mm x 20mm,10ms,5.2mm,20mm,1500A,8A,250V,250V,,indicator red fast movement 8a 250v holder pla...
3,A4,"Non Resettable Indicators Electric Indicator, ...",,,Primary Protection In Equipment,VERY FAST,,20mm,5.2mm,,...,5.2mm x 20mm,10ms,5.2mm,20mm,1500A,10A,250V,250V,,non resettable indicators electric indicator v...
4,A5,Indicator Red Fast Movement 12.5A 250V Holder ...,Fast,,Primary Protection In Equipment,VERY FAST,,20mm,5.2mm,12.5A,...,5.2mm x 20mm,10ms,5.2mm,20mm,500A,12.5A,250V,250V,,indicator red fast movement 12 5a 250v holder ...


- Initialize Embedding Model

In [3]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

print("✅ Model loaded")

✅ Model loaded


- Generate Embeddings

In [4]:
texts = df["combined_text"].tolist()

embeddings = model.encode(
    texts,
    show_progress_bar=True
)

print("Embeddings shape:", embeddings.shape)

Embeddings shape: (663, 384)


- Create FAISS Index

In [5]:
import faiss
import numpy as np

# Convert embeddings to float32 (FAISS requirement)
embeddings = np.array(embeddings).astype("float32")

dimension = embeddings.shape[1]

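# Create a flat (exhaustive) FAISS index that ranks by squared L2 distance.
# The ranking matches cosine similarity only if the vectors are unit-normalized.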
index = faiss.IndexFlatL2(dimension)

# Add vectors
index.add(embeddings)

print("Total vectors in index:", index.ntotal)

Total vectors in index: 663
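
To make clear what a flat L2 index does, here is a minimal pure-Python sketch of the same idea: compare the query with every stored vector and keep the `top_k` smallest squared L2 distances. The names `squared_l2`, `flat_l2_search` and `toy_vectors` are made up for illustration; FAISS does the equivalent work in optimized native code.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, top_k=5):
    """Exhaustive search: score every vector, keep the top_k closest."""
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    nearest = heapq.nsmallest(top_k, scored)
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices


# Toy 2-dimensional data standing in for the 384-dimensional embeddings
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_l2_search(toy_vectors, [0.9, 0.1], top_k=2)
print(indices)    # positions of the two closest vectors
print(distances)  # their squared L2 distances, smallest first
```

With only 663 vectors an exhaustive scan is cheap, so an approximate index type is probably not needed at this scale.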


- Similarity Search Helper

In [6]:
# Reuses `model`, `index` and `df` from the cells above

def search_parts(query, top_k=5):
    # Encode query
    q_emb = model.encode([query]).astype("float32")

    # Search FAISS
    distances, indices = index.search(q_emb, top_k)

    results = df.iloc[indices[0]][["ID", "DESCRIPTION", "combined_text"]].copy()
    results["distance"] = distances[0]

    return results

- Example Query

In [7]:
query = "fast blow ceramic fuse 5x20mm 250v"

results = search_parts(query, top_k=5)
results

Unnamed: 0,ID,DESCRIPTION,combined_text,distance
204,A261,"Indicator 250V 5X20 6.3A Electric Indicator, F...",indicator 250v 5x20 6 3a electric indicator fa...,0.996591
207,A265,Indicator Red Fast Movement 0.1A 250V Axial 5 ...,indicator red fast movement 0 1a 250v axial 5 ...,1.008121
195,A252,Indicator Red Fast Movement 1.6A 250V Axial 5 ...,indicator red fast movement 1 6a 250v axial 5 ...,1.032024
184,A237,Indicator Red Slow Blow Movement 12A 250V Axia...,indicator red slow blow movement 12a 250v axia...,1.048557
206,A264,Indicator Red Fast Movement 0.1A 250V Axial 5 ...,indicator red fast movement 0 1a 250v axial 5 ...,1.051002
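
The `distance` column holds squared L2 distances, which are hard to read on their own. If the embeddings are unit-length (an assumption here; it can be checked with `np.linalg.norm(embeddings, axis=1)`), a squared L2 distance `d` converts to cosine similarity as `1 - d / 2`. A small sketch, with `cosine_from_squared_l2` as a made-up helper name:

```python
import math


def cosine_from_squared_l2(d2):
    """Cosine similarity implied by a squared L2 distance.

    Only valid when both vectors have length 1.
    """
    return 1.0 - d2 / 2.0


# Sanity check on two hand-made unit vectors
a = [0.6, 0.8]
b = [0.8, 0.6]
d2 = sum((x - y) ** 2 for x, y in zip(a, b))
dot = sum(x * y for x, y in zip(a, b))
assert math.isclose(cosine_from_squared_l2(d2), dot)

# Top hit from the query above (distance 0.996591)
print(round(cosine_from_squared_l2(0.996591), 4))  # 0.5017
```

Under that assumption, the best match for this query has a cosine similarity of only about 0.5, which suggests the wording of the query and of the catalogue descriptions differ quite a bit.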


In [8]:
# Try this query: Non Resettable Indicators Electric Indicator, Time Lag Blow, 4A, 250VAC, 150VDC, 1500A (IR), Inline/holder, 5x20mm
# This is the description of A15. A15 and A16 are nearly identical; only the current rating differs (4A vs 5A).
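
Because descriptions like these differ only in one number, embedding distance alone may not separate them reliably. One possible workaround, sketched below, is to pull the current rating out of the text and compare it explicitly. The helper `extract_current_amps` and its regex are assumptions for illustration, and the A16 text is simply the A15 text with 4A replaced by 5A, as the note above describes.

```python
import re

# A number directly followed by "A" (amps); "250VAC" does not match because
# the digits are followed by "V". The first match is taken as the rating.
AMP_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*A\b")


def extract_current_amps(text):
    """Return the first ampere value found in the text, or None."""
    match = AMP_RE.search(text)
    return float(match.group(1)) if match else None


a15 = ("Non Resettable Indicators Electric Indicator, Time Lag Blow, 4A, "
       "250VAC, 150VDC, 1500A (IR), Inline/holder, 5x20mm")
a16 = a15.replace(" 4A,", " 5A,")  # assumed A16 text: same, but 5A

print(extract_current_amps(a15))
print(extract_current_amps(a16))
```

The extracted value could then be compared with the `Rated Current (A)` column to break ties between near-duplicate matches.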

- Save FAISS Index to Disk

In [9]:
import faiss

# Save FAISS index
faiss.write_index(index, "parts_index.faiss")

# Save dataframe (ID mapping)
df.to_pickle("parts_metadata.pkl")

print("✅ FAISS index + metadata saved")

✅ FAISS index + metadata saved


- Reload Test

In [10]:
# Reload index
index = faiss.read_index("parts_index.faiss")

# Reload metadata
df = pd.read_pickle("parts_metadata.pkl")

print("Reloaded vectors:", index.ntotal)

Reloaded vectors: 663


Why MiniLM (all-MiniLM-L6-v2)?


1. Lightweight → runs on CPU (suits the target machine: Intel i5, 8GB RAM)

2. Produces 384-dim embeddings → fast FAISS search

3. Strong semantic performance on short technical texts

4. Widely used for retrieval tasks

5. Publicly benchmarked and available on Hugging Face

6. Much faster than larger BERT / MPNet models on laptop CPUs

- all-MiniLM-L6-v2 was selected due to its balance between semantic accuracy and computational efficiency, making it suitable for local deployment on limited hardware while still producing high-quality embeddings for technical product descriptions.
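
A quick back-of-the-envelope check of the "limited hardware" point: a flat index stores the raw float32 vectors, so its size is roughly vectors × dimensions × 4 bytes. The helper name `flat_index_bytes` is made up for this sketch, and small index overheads are ignored.

```python
def flat_index_bytes(n_vectors, dim, bytes_per_value=4):
    """Approximate memory used by raw float32 vectors in a flat index."""
    return n_vectors * dim * bytes_per_value


size_384 = flat_index_bytes(663, 384)  # this notebook's index
size_768 = flat_index_bytes(663, 768)  # same data with a 768-dim model

print(size_384, "bytes =", round(size_384 / 2**20, 2), "MiB")
print(size_768, "bytes =", round(size_768 / 2**20, 2), "MiB")
```

At roughly 1 MiB the index is negligible next to 8GB of RAM; on this dataset the embedding model and the LLM take up far more memory than the vectors do.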

- Create Ollama Helper

In [23]:
import subprocess

def ask_phi3(prompt):
    result = subprocess.run(
        ["ollama", "run", "phi3"],
        input=prompt.encode("utf-8"),   # <-- encode input
        capture_output=True
    )

    return result.stdout.decode("utf-8", errors="ignore")
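
The helper above ignores the exit code and stderr, so a missing `ollama` binary or a failed run is hard to diagnose. Below is a more defensive sketch. The name `ask_model`, the `command` and `timeout_s` parameters, and the 120 second default are assumptions for illustration; the `command` parameter exists only so the function can be tried without Ollama installed.

```python
import subprocess
import sys


def ask_model(prompt, command=("ollama", "run", "phi3"), timeout_s=120):
    """Send the prompt on stdin and return stdout, raising on failure."""
    try:
        result = subprocess.run(
            list(command),
            input=prompt,
            capture_output=True,
            text=True,            # pass and receive str instead of bytes
            encoding="utf-8",
            errors="ignore",
            timeout=timeout_s,
        )
    except FileNotFoundError as exc:
        raise RuntimeError(f"Command not found: {command[0]}") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"No response within {timeout_s} s") from exc

    if result.returncode != 0:
        raise RuntimeError(f"Command failed: {result.stderr.strip()}")
    return result.stdout.strip()


# Stand-in command for a dry run: a Python one-liner that upper-cases stdin
echo_cmd = (sys.executable, "-c",
            "import sys; print(sys.stdin.read().upper())")
print(ask_model("hello", command=echo_cmd))
```

Called without `command`, it runs `ollama run phi3` exactly as the original helper does.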

- RAG Recommendation Function

In [30]:
def recommend_parts(query, top_k=5):
    q_emb = model.encode([query]).astype("float32")
    distances, indices = index.search(q_emb, top_k)

    retrieved = df.iloc[indices[0]][["ID", "DESCRIPTION"]].reset_index(drop=True)

    # Number the retrieved parts so the model can refer to them by position
    context = ""
    for i, row in retrieved.iterrows():
        context += f"{i+1}. {row['ID']} | {row['DESCRIPTION']}\n"

    # One output-format line per retrieved part, so the prompt stays
    # consistent for any top_k (it was previously hard-coded to 5 lines)
    n_parts = len(retrieved)
    format_lines = "\n".join(
        f"{i}. ID – similarity explanation" for i in range(1, n_parts + 1)
    )

    prompt = f"""
You are an industrial text similarity system.

You are given EXACTLY {n_parts} parts.

Rules:
- Do NOT add or remove items.
- Do NOT invent IDs.
- Do NOT rank or reject.
- Do NOT speculate.
- For each numbered item, explain in ONE short sentence how it is similar to the user input.

User input:
{query}

Parts:
{context}

Output format (must match numbering):

{format_lines}
"""

    response = ask_phi3(prompt)

    return retrieved, response

- Test Full System

In [31]:
query = "fast blow ceramic fuse 5x20mm 250v"

matches, explanation = recommend_parts(query)

matches, explanation

(     ID                                        DESCRIPTION
 0  A261  Indicator 250V 5X20 6.3A Electric Indicator, F...
 1  A265  Indicator Red Fast Movement 0.1A 250V Axial 5 ...
 2  A252  Indicator Red Fast Movement 1.6A 250V Axial 5 ...
 3  A237  Indicator Red Slow Blow Movement 12A 250V Axia...
 4  A264  Indicator Red Fast Movement 0.1A 250V Axial 5 ...,
 '\n1. A261 - Similar to the user input due to its voltage rating of 250V, blow size indication with a fast response time (indicated by "Fast Blow"), and dimensions in 5x20mm which match exactly as specified in the user query.\n\n2. A265 - This part is similar because it also carries a voltage rating of 250V, features quick action with its red fast movement (indicated by "Fast Movement"), and has ceramic bulk construction suitable for industrial applications as implied in the user\'s description; both parts have dimensions that fit together perfectly at 5x20mm.\n\n3. A252 - It matches due to sharing a voltage rating of exactly 250V
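
The prompt asks the model not to add, remove or invent IDs, but nothing in the notebook checks that it complied. A minimal sketch of such a check follows; `check_explanations` and its regex are assumptions for illustration, and it accepts both a hyphen and an en dash after the ID because the output above uses a hyphen while the prompt shows an en dash.

```python
import re

# Matches lines such as "1. A261 - explanation" or "1. A261 – explanation"
ITEM_RE = re.compile(r"^\s*(\d+)\.\s*(\S+)\s*[-–]\s*(.+)$")


def check_explanations(response, expected_ids):
    """Return a list of problems; an empty list means the response is consistent."""
    found = {}
    for line in response.splitlines():
        match = ITEM_RE.match(line)
        if match:
            found[int(match.group(1))] = match.group(2)

    problems = []
    for pos, expected in enumerate(expected_ids, start=1):
        got = found.get(pos)
        if got is None:
            problems.append(f"item {pos}: missing")
        elif got != expected:
            problems.append(f"item {pos}: expected {expected}, got {got}")
    for pos in sorted(set(found) - set(range(1, len(expected_ids) + 1))):
        problems.append(f"item {pos}: unexpected extra item")
    return problems


sample = "\n1. A261 - matches the 250V rating.\n\n2. A265 - also rated 250V.\n"
print(check_explanations(sample, ["A261", "A265"]))
print(check_explanations(sample, ["A261", "A265", "A252"]))
```

In the notebook this could be called as `check_explanations(explanation, matches["ID"].tolist())`. It only verifies numbering and IDs, not whether the explanation text itself is accurate.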

# ✍️ Model Justification
- Phi-3 Mini

Phi-3 Mini was selected for its lightweight footprint, strong instruction-following ability, and efficient CPU inference, which make it suitable for deployment on resource-constrained systems (Intel i5, 8GB RAM). Larger models such as Llama-3 or Mistral were avoided because of their higher memory requirements and the limited accuracy gains expected for a structured recommendation task.

- MiniLM Embeddings

all-MiniLM-L6-v2 was chosen for semantic embedding because of its compact 384-dimensional vectors, fast inference speed, and strong performance on sentence similarity benchmarks, making it ideal for local vector search.