BugLens is a production-ready Bug Similarity Search Engine designed to find duplicate or semantically similar bug reports in under 50ms. It combines TF-IDF vectorization with a scikit-learn NearestNeighbors index, served by an asynchronous FastAPI backend.
- Intelligent Semantic Search: Understands and processes titles and descriptions to find contextually identical software bugs.
- Microsecond Inferences: Evaluates similarities between items with O(1) SQLite row mapping for instantaneous metadata retrieval.
- Zero Cold-Start: Employs FastAPI lifespan events to eager-load Machine Learning Models locally on application start.
- Scalable Architecture: Designed with future migrations in mind (FAISS, SBERT, Redis LRU Cache).
- Hardened Resiliency: Built-in defenses against DDoS payloads, integrated comprehensive PyTest validation, and formatted structured logging.
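The row mapping mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual schema: the `bugs` table, its columns, and the `fetch_metadata` helper are all hypothetical. The assumption is that each bug's primary key equals its row position in the TF-IDF matrix, so neighbor indices can be used directly as lookup keys.

```python
import sqlite3

# Hypothetical schema: row_id is assumed to equal the bug's row position
# in the TF-IDF matrix, so neighbor indices double as primary keys.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bugs (row_id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO bugs VALUES (?, ?, ?)",
    [
        (0, "Crash on login", "App crashes when the login button is clicked"),
        (1, "Empty CSV export", "Exported file contains no rows"),
        (2, "Login crash", "Clicking login causes the app to crash"),
    ],
)


def fetch_metadata(conn, row_ids):
    """Return (row_id, title) pairs in the same order as row_ids."""
    if not row_ids:
        return []
    placeholders = ",".join("?" * len(row_ids))
    rows = conn.execute(
        f"SELECT row_id, title FROM bugs WHERE row_id IN ({placeholders})",
        list(row_ids),
    ).fetchall()
    by_id = dict(rows)
    # Re-order to match the ranking returned by the nearest-neighbor search.
    return [(i, by_id[i]) for i in row_ids if i in by_id]


print(fetch_metadata(conn, [2, 0]))
```

SQLite resolves an `INTEGER PRIMARY KEY` lookup through its rowid B-tree, so the cost per row is logarithmic in table size rather than strictly constant, though it is effectively flat at the scale of a bug tracker.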
graph TD;
Client-->|POST /search| FastAPI;
FastAPI-->|Load on startup| Models[TF-IDF & NN Index];
FastAPI-->|Transform & k-neighbors| Inquiry[Search Engine];
Inquiry-->|Query Metadata| SQLite[(Bug Metadata DB)];
SQLite-->FastAPI;
FastAPI-->Client;
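The "Transform & k-neighbors" step in the flow above might look roughly like the sketch below. It assumes scikit-learn is installed and uses a tiny made-up corpus; the `search` helper, the cosine metric, and the English stop-word list are illustrative choices, not confirmed details of the BugLens implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy corpus standing in for data/raw/bugs.csv (contents are made up).
corpus = [
    "App crashes when clicking the login button",
    "Login button click causes application crash",
    "Dark mode colors are wrong on settings page",
    "Export to CSV produces an empty file",
]

# Fit the vectorizer once; in a service this would happen at training time.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
matrix = vectorizer.fit_transform(corpus)

# Brute-force cosine search works directly on the sparse TF-IDF matrix.
index = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
index.fit(matrix)


def search(query, k=2):
    """Return (row_index, cosine_similarity) pairs for the k nearest bugs."""
    query_vec = vectorizer.transform([query])
    distances, indices = index.kneighbors(query_vec, n_neighbors=k)
    return [(int(i), 1.0 - float(d)) for i, d in zip(indices[0], distances[0])]


results = search("crash after pressing login")
print(results)
```

The returned row indices are what would then be passed to the metadata database to fetch titles and descriptions.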
Requires Python 3.11+.
# Clone the repository
git clone https://github.com/ShashankChinthirla/BugLens.git
cd BugLens/src
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # Or .\venv\Scripts\activate on Windows
# Install Dependencies
pip install -r requirements.txt
To initialize the models and populate the database from data/raw/bugs.csv:
python training/train.py
This script versions models in models/v1/ and populates data/processed/metadata.db.
Boot up the uvicorn ASGI server:
uvicorn app.main:app --reload
Navigate to http://127.0.0.1:8000/docs to use the auto-generated Swagger UI to test real-time search queries.
The repository includes unit and integration tests, plus latency stress tests.
pytest tests/
python tests/evaluate.py
python tests/stress_test.py
- Memory Footprint (TF-IDF CSR Matrix): Configured for max_features=5000. Memory peaks at ~100MB for 100k items.
- Precision Validation Structure: A static Precision@5 test checks core accuracy thresholds over standard datasets.
- Logging & System State: Structured per-request latency tracking using the extra dict of Python's logging module. Database connections are isolated per request and released in try...finally blocks.
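For reference, Precision@5 is the fraction of the top five returned results that are known to be relevant. A minimal sketch, assuming relevance labels are available as a set of bug IDs (the `precision_at_k` helper and the sample IDs are hypothetical, not taken from tests/evaluate.py):

```python
def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k retrieved IDs that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for bug_id in top_k if bug_id in relevant)
    # Dividing by k (not len(top_k)) penalizes returning fewer than k results.
    return hits / k


# Two of the top five results (IDs 2 and 5) are labeled as true duplicates.
score = precision_at_k([1, 2, 3, 4, 5], {2, 5, 9})
print(score)
```

Averaging this score over a set of labeled queries gives a single number that can be compared against an accuracy threshold.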
BugLens is container-ready out of the box.
docker build -t buglens-api .
docker run -p 8000:8000 buglens-api