Learning project: building my own vector database from scratch, tracing every step, to understand how vector databases like Pinecone, Weaviate, and Chroma work.
Goal: Learn by doing, not just using.
✅ Vector embeddings and similarity search
✅ TF-IDF vectorization from scratch
✅ Word embeddings (Word2Vec concepts)
✅ PCA for dimensionality reduction
✅ Visualizing high-dimensional data in 2D
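A minimal sketch of what "TF-IDF from scratch" can look like (the function name and tokenized-input format here are illustrative assumptions, not the repo's actual code):

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Turn a list of tokenized documents into TF-IDF vectors."""
    vocab = sorted({word for doc in documents for word in doc})
    n_docs = len(documents)
    # Document frequency: how many documents each word appears in
    df = {w: sum(1 for doc in documents if w in doc) for w in vocab}
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        # TF (relative frequency in this doc) times IDF (rarity across docs)
        vec = [
            (counts[w] / len(doc)) * math.log(n_docs / df[w])
            for w in vocab
        ]
        vectors.append(vec)
    return vocab, vectors

docs = [["cat", "sat", "mat"], ["dog", "sat", "log"], ["cat", "dog"]]
vocab, vecs = tfidf_vectors(docs)
```

Words that appear in every document get an IDF of log(1) = 0, so they contribute nothing to similarity — that is the whole point of the IDF weighting.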
- Converts text to vector embeddings
- Stores vectors in memory
- Searches similar vectors using cosine similarity
- Visualizes embeddings with PCA
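The similarity step above boils down to one formula: cosine similarity, the dot product of two vectors divided by the product of their lengths. A sketch with NumPy (helper name is mine, not the repo's):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cosine_similarity([1, 0], [1, 0])  # → 1.0
cosine_similarity([1, 0], [0, 1])  # → 0.0
```

Because it measures angle rather than magnitude, two documents of very different lengths can still score as highly similar if their vectors point the same way.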
python wordEmbedding.py
# or
python tfidf.py

- `wordEmbedding.py` - Word embedding implementation
- `tfidf.py` - TF-IDF vectorization
- Both include PCA visualization
# 1. Convert text to vectors
vectors = create_embeddings(documents)
# 2. Store in "database" (Python dict)
db = VectorDatabase()
db.add(vectors)
# 3. Search similar vectors
results = db.search(query_vector, top_k=5)
# 4. Visualize with PCA
visualize_2d(vectors)

- Semantic search (search by meaning, not keywords)
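The four steps above can be fleshed out into a tiny runnable in-memory store. The class and method shapes below mirror the pseudocode but are my assumptions, not the repo's actual API:

```python
import numpy as np

class VectorDatabase:
    """Tiny in-memory vector store with cosine-similarity search."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(np.asarray(vector, dtype=float))

    def search(self, query_vector, top_k=5):
        """Return the top_k (doc_id, similarity) pairs for the query."""
        q = np.asarray(query_vector, dtype=float)
        sims = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in self.vectors
        ]
        # Sort by similarity, highest first, and keep the top_k hits
        order = np.argsort(sims)[::-1][:top_k]
        return [(self.ids[i], sims[i]) for i in order]

db = VectorDatabase()
db.add("doc1", [1.0, 0.0])
db.add("doc2", [0.0, 1.0])
db.add("doc3", [0.7, 0.7])
results = db.search([1.0, 0.1], top_k=2)  # doc1 is the closest match
```

This brute-force scan is O(n) per query — fine for learning, which is exactly why real vector DBs add indexes like HNSW (see the roadmap below).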
- Recommendation systems
- RAG (Retrieval Augmented Generation) for LLMs
- Image similarity search
- Duplicate detection
Embeddings: Text → Numbers (vectors)
Similarity: Cosine similarity between vectors
Indexing: Fast lookup structures
PCA: High dimensions → 2D for visualization
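The last concept — projecting high-dimensional vectors down to 2D for plotting — is a few lines with scikit-learn and Matplotlib. A sketch with placeholder random data standing in for real embeddings:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder: 20 "embeddings" of dimension 50 (real data would come from TF-IDF etc.)
rng = np.random.default_rng(0)
vectors = rng.normal(size=(20, 50))

pca = PCA(n_components=2)
points = pca.fit_transform(vectors)  # shape (20, 2): one 2D point per vector

plt.scatter(points[:, 0], points[:, 1])
plt.title("Embeddings projected to 2D with PCA")
plt.savefig("embeddings_2d.png")
```

PCA keeps the two directions of greatest variance, so nearby points in the plot are *often* (not always) similar in the original space — it's a lossy but useful view.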
- Python
- NumPy (vector operations)
- scikit-learn (PCA)
- Matplotlib (visualization)
- Add FAISS indexing for speed
- Implement HNSW algorithm
- Add persistence (save/load)
- Build REST API
- Compare with real vector DBs
Goutham N
GitHub: @GOUTHAM-2002
⭐ Star if you're also learning by building!
Note: This is a learning project. For production use, check out Pinecone, Weaviate, or Chroma.