cwccie/netembeddings

netembeddings

Pre-computed and on-demand vector embeddings for networking concepts — protocol specs, CLI commands, RFC summaries, and vendor documentation snippets — designed for RAG pipelines, similarity search, and ML applications.

The Problem

General-purpose embedding models often treat "BGP" and "OSPF" as unrelated tokens. Network engineers building AI-powered tools need embeddings that capture the semantic relationships between networking concepts: that BGP and OSPF are both routing protocols, that VXLAN and MPLS are both tunneling/overlay technologies, and that the "show ip route" command relates to routing tables.

netembeddings provides:

  • A curated registry of 50+ networking concepts with metadata (category, related terms, RFCs)
  • Pre-built datasets of protocol and CLI command descriptions
  • A lightweight TF-IDF embedding generator (no API keys or GPU required)
  • Cosine similarity search over embedding vectors using pure numpy
  • A CLI for quick lookups and exploration

Installation

pip install netembeddings

Or install from source:

git clone https://github.com/cwccie/netembeddings.git
cd netembeddings
pip install -e ".[dev]"

Quick Start

from netembeddings import ConceptRegistry, EmbeddingStore, TFIDFGenerator
from netembeddings.registry import build_default_registry

# Load the built-in registry of networking concepts
registry = build_default_registry()

# Browse concepts
bgp = registry.get("BGP")
print(f"{bgp.name}: {bgp.description}")
print(f"RFC: {bgp.rfc}")
print(f"Related: {bgp.related_terms}")

# List all protocols
for concept in registry.by_category("protocol"):
    print(f"  {concept.name}")

Text Search

# Fuzzy search across names, descriptions, and related terms
results = registry.search("routing protocol")
for concept in results[:5]:
    print(f"{concept.name} [{concept.category}]")

Building Embeddings

# Generate TF-IDF embeddings (no API needed)
concepts = registry.list_all()
corpus = [c.description for c in concepts]

generator = TFIDFGenerator(output_dim=128)
generator.fit(corpus)

# Create an embedding store
store = EmbeddingStore(dimension=128)
for concept in concepts:
    vec = generator.generate(concept.description)
    store.add(concept.name, vec)

# Similarity search
query_vec = generator.generate("link-state routing with areas")
results = store.search(query_vec, top_k=5)
for name, score in results:
    print(f"  {name}: {score:.4f}")
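For readers curious what a TF-IDF generator with a fixed output_dim might do internally, here is a minimal numpy sketch. It is not the library's implementation: the TinyTfidf class, the tokenizer, the smoothed IDF formula, and the choice to keep the most frequent terms as vector columns are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; hyphens split words.
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTfidf:
    """Illustrative stand-in, not netembeddings' TFIDFGenerator."""

    def __init__(self, output_dim=128):
        self.output_dim = output_dim
        self.vocab = {}                      # term -> column index
        self.idf = np.zeros(output_dim)

    def fit(self, corpus):
        doc_freq = Counter()
        for doc in corpus:
            doc_freq.update(set(tokenize(doc)))
        # Keep the most frequent terms (ties broken alphabetically) so the
        # vector has a fixed width.
        terms = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))
        self.vocab = {t: i for i, t in enumerate(terms[: self.output_dim])}
        self.idf = np.zeros(self.output_dim)
        n_docs = len(corpus)
        for term, i in self.vocab.items():
            # Smoothed IDF: always positive, so no kept term is zeroed out.
            self.idf[i] = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        return self

    def generate(self, text):
        vec = np.zeros(self.output_dim)
        for term, count in Counter(tokenize(text)).items():
            idx = self.vocab.get(term)
            if idx is not None:
                vec[idx] = count * self.idf[idx]
        norm = np.linalg.norm(vec)
        # L2-normalize so a dot product equals cosine similarity.
        return vec / norm if norm > 0 else vec


corpus = [
    "BGP is a path-vector routing protocol between autonomous systems",
    "OSPF is a link-state routing protocol that uses areas",
    "VXLAN is an overlay encapsulation for layer 2 segments",
]
gen = TinyTfidf(output_dim=32).fit(corpus)
query = gen.generate("link-state routing with areas")
scores = [float(query @ gen.generate(doc)) for doc in corpus]
```

In this toy corpus the OSPF description shares the most terms with the query, so it receives the highest score, while the VXLAN description shares none and scores zero.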

Persistence

# Save embeddings
store.save("network_embeddings.npz")

# Load later
store = EmbeddingStore.load("network_embeddings.npz")
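The .npz extension suggests NumPy's archive format. As a rough sketch of how names and vectors could round-trip through such a file, the snippet below uses np.savez and np.load directly. The save_vectors and load_vectors helpers and the "names" and "vectors" keys are hypothetical; the README does not document the library's actual on-disk layout.

```python
import os
import tempfile

import numpy as np


def save_vectors(path, names, vectors):
    # Hypothetical layout: one array of names, one 2-D array of vectors.
    np.savez(path, names=np.array(names), vectors=np.asarray(vectors))


def load_vectors(path):
    # np.load returns a lazy archive; read both arrays before it closes.
    with np.load(path) as data:
        return [str(n) for n in data["names"]], data["vectors"]


names = ["BGP", "OSPF"]
vectors = np.array([[1.0, 0.0], [0.6, 0.8]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_embeddings.npz")
    save_vectors(path, names, vectors)
    loaded_names, loaded_vectors = load_vectors(path)
```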

Similarity Utilities

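from netembeddings import cosine_similarity, concept_similarity

# Between two vectors (here generated with the TF-IDF generator fitted above)
vec_a = generator.generate("path-vector routing between autonomous systems")
vec_b = generator.generate("link-state routing with areas")
sim = cosine_similarity(vec_a, vec_b)

# Between named concepts already added to the store
sim = concept_similarity("BGP", "OSPF", store)
print(f"BGP-OSPF similarity: {sim:.4f}")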
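Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal numpy sketch of that formula, plus a top-k search over a matrix of stored vectors, might look like the following. The cosine and top_k functions are hypothetical names, and returning 0.0 for a zero vector is an assumption, not documented library behavior.

```python
import numpy as np


def cosine(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Assumption: similarity with a zero vector is defined as 0.0.
    return float(a @ b / denom) if denom > 0 else 0.0


def top_k(query, names, matrix, k=5):
    matrix = np.asarray(matrix, dtype=float)
    query = np.asarray(query, dtype=float)
    # One norm product per stored row, then a single matrix-vector product.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    dots = matrix @ query
    scores = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    order = np.argsort(-scores)[:k]          # highest score first
    return [(names[i], float(scores[i])) for i in order]


names = ["BGP", "OSPF", "VXLAN"]
matrix = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
results = top_k(np.array([1.0, 0.0]), names, matrix, k=2)
```

With these toy vectors, the query is identical to the BGP row, so BGP ranks first, followed by OSPF with a score of 0.6.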

Using Pre-built Datasets

from netembeddings.datasets import get_protocol_concepts, get_command_concepts

# 30 protocol descriptions
protocols = get_protocol_concepts()
for p in protocols[:3]:
    print(f"{p['name']}: {p['description'][:60]}...")

# 20 CLI command descriptions
commands = get_command_concepts()

CLI

# Search for concepts
netembeddings search "routing protocol" --top-k 5

# List all concepts or filter by category
netembeddings list
netembeddings list --category protocol

# Find similar concepts
netembeddings similar BGP --top-k 5

Custom Embeddings

The TF-IDF generator is a lightweight fallback that needs no API keys or GPU. For production use, generate embeddings with your preferred model and store them:

store = EmbeddingStore(dimension=3072)

# Add vectors from any source (OpenAI, Gemini, sentence-transformers, etc.)
store.add("BGP", your_model.embed("Border Gateway Protocol..."))
store.add("OSPF", your_model.embed("Open Shortest Path First..."))

store.save("custom_embeddings.npz")
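External models return vectors in different shapes and scales, so it can help to clean each one before storing it. The sketch below shows one way to do that with numpy. The prepare_vector helper is hypothetical; the README does not say whether EmbeddingStore validates or normalizes vectors itself.

```python
import numpy as np


def prepare_vector(raw, dimension):
    """Hypothetical helper: coerce a model's output into a unit-length 1-D vector."""
    vec = np.asarray(raw, dtype=np.float32).reshape(-1)
    if vec.shape[0] != dimension:
        raise ValueError(f"expected {dimension} values, got {vec.shape[0]}")
    norm = np.linalg.norm(vec)
    if norm == 0:
        raise ValueError("embedding is all zeros")
    return vec / norm


# Some APIs return a nested list with one row per input text.
vec = prepare_vector([[3.0, 4.0]], dimension=2)
```

The cleaned vector could then be passed to store.add(name, vec) as in the snippet above, with dimension set to the value given to EmbeddingStore.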

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest -v --cov=netembeddings

# Lint
ruff check src/ tests/

License

MIT License. Copyright (c) 2026 Corey Wade.
