A High-Performance Distributed Search Engine for CS Books built in Java. Features O(1) Inverted Indexing, Vector Space Ranking, and Cloud Streaming. No Lucene/ElasticSearch dependency.

Kas-sim/DevShelf

📚 DevShelf

Search Engine that Gets Smarter with Every Search


Try DevShelf locally — no cloud dependency required.
Offline-first • Fast • Built from first principles


Download DevShelf for Windows

Linux users: download the latest release from the same page.


📖 Overview

DevShelf is a high-performance vertical search engine for Computer Science textbooks.

Unlike traditional library software, DevShelf is built from first principles around a custom positional inverted index. This gives O(1) term lookups at query time without relying on Lucene, Elasticsearch, or other external IR frameworks.

The system is designed for:

  • Speed
  • Precision
  • Offline-first usage
  • Cloud-synced freshness

⚡ Engineering Philosophy

DevShelf addresses the Information Retrieval (IR) problem at a local scale with production-grade constraints.

Design Goals

  1. Fast
    Sub-millisecond query latency using optimized data structures.

  2. Smart
    Ranking goes beyond keyword matching by combining:

    • TF-IDF
    • Vector Space Models
    • Behavioral analytics

  3. Distributed by Design
    Index and metadata are fetched from a lightweight serverless source (GitHub Raw), allowing users to receive updated data without application updates.


🏗 System Architecture

DevShelf follows Domain-Driven Design (DDD) principles.

The system is divided into two major layers:

Offline Indexing Layer

  • Parses books.json
  • Builds the inverted index
  • Analyzes interaction logs
  • Produces popularity vectors
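
The index-building step above can be sketched as follows. This is a minimal illustration of a positional inverted index, not DevShelf's actual code: the class name `PositionalIndexSketch`, the tokenization rule, and the use of a `HashMap` for the term dictionary are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical positional inverted index: term -> (docId -> token positions). */
final class PositionalIndexSketch {
    // Assumption: a hash-based dictionary, which gives average-case O(1) term lookup.
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Lowercases the text, splits on non-alphanumeric runs, and records each token's position. */
    void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        int position = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position++);
        }
    }

    /** Returns docId -> positions for the term, or an empty map if the term was never indexed. */
    Map<Integer, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
    }
}
```

Finding a term's postings is a single hash lookup in this sketch; walking the postings still takes time proportional to the number of matching documents. The stored positions are what make phrase and proximity queries possible.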

Online Query Engine

  • Accepts user queries via CLI or JavaFX GUI
  • Processes queries (tokenization, fuzzy matching, autocomplete)
  • Ranks results using hybrid scoring
  • Returns sorted documents
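
As an illustration of the fuzzy-matching step, here is a minimal sketch of Levenshtein distance and a nearest-term lookup. The names `FuzzyMatchSketch` and `closestTerm`, and the idea of scanning the whole vocabulary, are assumptions made for the example; the project may prune candidates differently.

```java
import java.util.Collection;
import java.util.Optional;

/** Hypothetical typo-tolerance helper based on Levenshtein edit distance. */
final class FuzzyMatchSketch {
    /** Classic dynamic-programming edit distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Returns the vocabulary term nearest to the query, if one lies within maxDistance edits. */
    static Optional<String> closestTerm(String query, Collection<String> vocabulary, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String term : vocabulary) {
            int distance = levenshtein(query, term);
            if (distance < bestDistance) {
                best = term;
                bestDistance = distance;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With this sketch, a query token that is missing from the index can be replaced by its nearest indexed term before ranking, so a typo such as `algoritm` would still resolve to `algorithm`.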

🧠 Ranking Model

Search relevance is computed using a weighted hybrid score:

Score(d, q) = 0.6 × TF-IDF(d, q) + 0.2 × Popularity(d) + 0.2 × Rating(d)

Ranking Signals

| Signal | Description |
| --- | --- |
| TF-IDF | Statistical importance of query terms |
| Popularity | Derived from offline click and usage logs |
| Rating | Quality signals embedded in the dataset |
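
The weighted combination can be sketched in a few lines. The weights come from the formula above; everything else is an assumption for illustration, including the class name `HybridScoreSketch`, the specific TF-IDF variant (raw term frequency times `ln(N / df)`), and the expectation that each signal is normalized to the range [0, 1] before mixing.

```java
/** Hypothetical hybrid scorer using the 0.6 / 0.2 / 0.2 weights from the ranking model. */
final class HybridScoreSketch {
    static final double W_TFIDF = 0.6;
    static final double W_POPULARITY = 0.2;
    static final double W_RATING = 0.2;

    /**
     * One common TF-IDF variant (an assumption, not necessarily the project's):
     * raw term frequency multiplied by ln(totalDocs / docFreq).
     */
    static double tfIdf(int termFreq, int docFreq, int totalDocs) {
        if (termFreq == 0 || docFreq == 0) {
            return 0.0; // term absent from the document or from the whole corpus
        }
        return termFreq * Math.log((double) totalDocs / docFreq);
    }

    /** Mixes the three signals; inputs are assumed to be pre-normalized to [0, 1]. */
    static double score(double tfIdf, double popularity, double rating) {
        return W_TFIDF * tfIdf + W_POPULARITY * popularity + W_RATING * rating;
    }
}
```

Because the weights sum to 1.0, normalized inputs keep the final score in [0, 1] as well. That makes it easy to see how much each signal contributes to a given result.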

🚀 Key Features

Core Search Engine

  • Custom inverted index for constant-time term lookup
  • Trie-based autocomplete, with prefix lookup in time linear in the prefix length
  • Fuzzy matching using Levenshtein distance for typo tolerance
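
To make the autocomplete idea concrete, here is a minimal trie sketch. The class name `AutocompleteTrieSketch`, the `limit` parameter, and the alphabetical ordering of suggestions are illustrative assumptions rather than details taken from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical prefix tree for autocomplete suggestions. */
final class AutocompleteTrieSketch {
    private static final class Node {
        // TreeMap keeps children sorted, so suggestions come out in alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Inserts a word, creating one node per character as needed. */
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Walks the prefix (one step per character), then collects up to limit completions. */
    List<String> complete(String prefix, int limit) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return List.of(); // no indexed word starts with this prefix
            }
        }
        List<String> results = new ArrayList<>();
        collect(node, new StringBuilder(prefix), results, limit);
        return results;
    }

    /** Depth-first traversal that stops once enough suggestions are gathered. */
    private void collect(Node node, StringBuilder path, List<String> results, int limit) {
        if (results.size() >= limit) {
            return;
        }
        if (node.isWord) {
            results.add(path.toString());
        }
        for (Map.Entry<Character, Node> child : node.children.entrySet()) {
            char c = child.getKey();
            path.append(c);
            collect(child.getValue(), path, results, limit);
            path.setLength(path.length() - 1); // backtrack
        }
    }
}
```

Reaching the prefix node costs one step per prefix character regardless of vocabulary size. Collecting suggestions adds work proportional to the subtree visited, which the `limit` argument bounds.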

Intelligent Features

  • Recommendation graph based on category overlap and usage patterns
  • Dynamic filtering by relevance, popularity, year, and rating
  • Memory-mapped caching for frequently accessed index segments
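
One plausible way to score category overlap for recommendations is Jaccard similarity between the category sets of two books. The sketch below assumes that measure and ignores usage patterns entirely; the class name `CategoryOverlapSketch` and the `categoriesByBook` map are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical recommender that ranks other books by category-set overlap. */
final class CategoryOverlapSketch {
    /** Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 when both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    /** Returns up to k other book ids with non-zero overlap, best match first, ties broken by id. */
    static List<String> recommend(String bookId, Map<String, Set<String>> categoriesByBook, int k) {
        Set<String> source = categoriesByBook.getOrDefault(bookId, Set.of());
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : categoriesByBook.entrySet()) {
            if (entry.getKey().equals(bookId)) {
                continue; // never recommend the book itself
            }
            double similarity = jaccard(source, entry.getValue());
            if (similarity > 0.0) {
                scored.add(Map.entry(entry.getKey(), similarity));
            }
        }
        scored.sort((x, y) -> {
            int byScore = Double.compare(y.getValue(), x.getValue());
            return byScore != 0 ? byScore : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            result.add(scored.get(i).getKey());
        }
        return result;
    }
}
```

In a graph formulation, each non-zero similarity would become a weighted edge between two books. Usage signals such as co-clicks could then adjust those edge weights.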

Cloud Sync

  • Automatically fetches the latest index and metadata on startup
  • Feedback pipeline captures missing content requests
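
The startup sync combined with offline-first behavior suggests a "fetch, then fall back to cache" pattern, sketched below. The `RemoteSource` interface, the `loadIndex` method, and the single cache file are assumptions for illustration; the real remote URL and file layout are not specified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical offline-first loader: prefer fresh remote data, fall back to the local cache. */
final class IndexSyncSketch {
    /** Stand-in for the remote fetch, for example an HTTP GET against a raw file URL. */
    interface RemoteSource {
        String fetch() throws IOException;
    }

    /**
     * Tries the remote source first and refreshes the cache on success.
     * If the fetch fails and a cached copy exists, the cached copy is returned instead.
     */
    static String loadIndex(RemoteSource remote, Path cacheFile) throws IOException {
        String fresh;
        try {
            fresh = remote.fetch();
        } catch (IOException networkFailure) {
            if (Files.exists(cacheFile)) {
                return Files.readString(cacheFile, StandardCharsets.UTF_8);
            }
            throw networkFailure; // no network and no cache: nothing to serve
        }
        Files.writeString(cacheFile, fresh, StandardCharsets.UTF_8);
        return fresh;
    }
}
```

In practice the `RemoteSource` could wrap `java.net.http.HttpClient`. Keeping it behind an interface lets the fallback logic be tested without network access.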

📥 Installation

For Users (Windows)

  1. Open the Releases page
  2. Download DevShelf-Setup.exe
  3. Run the installer
  4. Launch the application

For Developers

DevShelf is a Maven-based Java project.

git clone https://github.com/Kas-sim/DevShelf.git
cd DevShelf
mvn clean install
mvn javafx:run

👥 Engineering Team

| Name | Role | Focus |
| --- | --- | --- |
| Muhammad Qasim | Lead Architect | Core search engine, system architecture, ranking algorithms |
| Nancy Chawla | Frontend Engineer | JavaFX UI, UX design, view controllers |
| Ritika Lund | Feature Engineer | Recommendations, filtering logic, data analysis |

Built with pure Java, mathematics, and first principles.
