
cypher-me/neural-search-java


neural-search-java

Semantic search engine for an e-commerce product catalog. Replaces traditional keyword search with vector embedding-based semantic search, allowing users to find products using natural language queries like "outfit for a rainy hike in Nairobi".

Technology Stack

  • Java 25
  • Spring Boot 3.2
  • LangChain4j — local AllMiniLmL6V2 quantized embedding model (384 dimensions, no external API required)
  • Milvus — vector database for storing and querying embeddings with cosine similarity
  • PostgreSQL — relational database for product catalog
  • JUnit 5 + Testcontainers — integration testing with real containers
  • Docker + Docker Compose — local infrastructure

Architecture

controller/         REST API endpoints
service/            Application logic, orchestration
embedding/          LangChain4j embedding generation
vector/             Milvus vector store operations
repository/         Spring Data JPA for PostgreSQL
entity/             JPA product entity
dto/                Request and response records
config/             Spring beans for Milvus and embedding model

API Endpoints

Ingest a product

POST /api/products

Request body:

{
  "name": "Waterproof Hiking Jacket",
  "description": "Durable jacket designed for rainy mountain conditions",
  "category": "Outdoor Apparel",
  "price": 89.99,
  "location": "Nairobi"
}

Response (201 Created):

{
  "id": 1,
  "name": "Waterproof Hiking Jacket",
  "description": "Durable jacket designed for rainy mountain conditions",
  "category": "Outdoor Apparel",
  "price": 89.99,
  "location": "Nairobi",
  "createdTimestamp": "2026-03-11T10:00:00"
}
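The project keeps its request and response types as records in dto/. Their definitions are not reproduced here, so the following is a minimal sketch that mirrors the JSON fields above. The field types (BigDecimal for price, LocalDateTime for createdTimestamp) and the `from` factory method are assumptions.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;

// Hypothetical request DTO mirroring the POST /api/products body.
// Field types are assumptions inferred from the JSON example.
record ProductRequest(
        String name,
        String description,
        String category,
        BigDecimal price,   // BigDecimal avoids binary rounding for currency values
        String location) {}

// Hypothetical response DTO: the request fields plus the server-assigned id and timestamp.
record ProductResponse(
        Long id,
        String name,
        String description,
        String category,
        BigDecimal price,
        String location,
        LocalDateTime createdTimestamp) {

    // Copies the request fields and attaches the values assigned when the row is persisted.
    static ProductResponse from(long id, ProductRequest request, LocalDateTime createdAt) {
        return new ProductResponse(id, request.name(), request.description(),
                request.category(), request.price(), request.location(), createdAt);
    }
}
```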

List all products (paginated)

GET /api/products?page=0&size=20
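The example uses page=0 for the first page, so pages appear to be zero-based. The fields of dto/PagedResponse.java are not documented. The record below is a hypothetical sketch of what a paged result might carry and how the page metadata follows from page, size, and the total count.

```java
import java.util.List;

// Hypothetical paged result; field and method names are assumptions.
record PagedResponse<T>(List<T> content, int page, int size, long totalElements) {

    PagedResponse {
        if (page < 0) throw new IllegalArgumentException("page must be >= 0");
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        content = List.copyOf(content); // defensive, immutable copy
    }

    // Slices an in-memory list the way an offset query would: skip page * size items.
    static <T> PagedResponse<T> of(List<T> all, int page, int size) {
        if (page < 0 || size < 1) throw new IllegalArgumentException("invalid page or size");
        int from = (int) Math.min((long) page * size, all.size());
        int to = (int) Math.min((long) from + size, all.size());
        return new PagedResponse<>(all.subList(from, to), page, size, all.size());
    }

    // Ceiling division: 15 items at size 10 need 2 pages.
    int totalPages() {
        return (int) ((totalElements + size - 1) / size);
    }

    boolean hasNext() {
        return page + 1 < totalPages();
    }
}
```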

Semantic search

GET /api/search?q=outfit+for+a+rainy+hike+in+Nairobi&topK=10

Response:

[
  {
    "product": {
      "id": 1,
      "name": "Waterproof Hiking Jacket",
      ...
    },
    "similarityScore": 0.94
  }
]

Results are ranked by cosine similarity score (highest first).
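In the running system Milvus computes these scores. Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows that formula and the highest-first ordering in plain Java. ScoredItem is a hypothetical stand-in for the project's SearchResult type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: Milvus produces the real scores; this shows the math and the ordering.
final class CosineRanking {

    // Hypothetical stand-in for SearchResult.
    record ScoredItem(long productId, double similarityScore) {}

    // cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        // Cosine is undefined for zero vectors; treat that case as no similarity.
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns a copy of the results sorted by similarity score, highest first.
    static List<ScoredItem> rank(List<ScoredItem> results) {
        List<ScoredItem> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble(ScoredItem::similarityScore).reversed());
        return sorted;
    }
}
```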

Running Locally

Prerequisites

  • Java 25
  • Maven 3.9+
  • Docker and Docker Compose

1. Start the infrastructure

cd docker
docker compose up -d

Wait for all services to be healthy (Milvus takes approximately 30-60 seconds to initialize):

docker compose ps

2. Build and run the application

./mvnw spring-boot:run

The application starts on port 8080 and automatically:

  • Connects to PostgreSQL and creates the products table
  • Connects to Milvus and creates the product_embeddings collection with a COSINE index
  • Loads the embedding collection into memory for search

3. Seed the product catalog

./scripts/seed-products.sh

This inserts 15 sample products across categories like outdoor apparel, footwear, and hiking equipment.

4. Run a semantic search

curl "http://localhost:8080/api/search?q=outfit+for+a+rainy+hike+in+Nairobi"
curl "http://localhost:8080/api/search?q=cheap+hiking+jacket"
curl "http://localhost:8080/api/search?q=lightweight+travel+backpack"
curl "http://localhost:8080/api/search?q=running+shoes+for+long+distance"
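The same searches can be issued from Java with the JDK's built-in HTTP client. This sketch assumes the application is listening on localhost:8080, and the class and method names are illustrative. URLEncoder applies form encoding, which turns spaces into `+` exactly as in the curl commands. Passing a null topK omits the parameter, as the examples above do.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds GET /api/search requests.
final class SearchRequests {

    static URI searchUri(String baseUrl, String query, Integer topK) {
        // Form encoding: spaces become '+', reserved characters are percent-encoded.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String uri = baseUrl + "/api/search?q=" + q;
        if (topK != null) {
            uri += "&topK=" + topK;
        }
        return URI.create(uri);
    }

    static HttpRequest searchRequest(String baseUrl, String query, Integer topK) {
        return HttpRequest.newBuilder(searchUri(baseUrl, query, topK))
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

To execute a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.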

Running as a Container

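Build the application, then run it on the Docker Compose network. Mount docker/application.yml as the Spring configuration so the application reaches PostgreSQL and Milvus by their Docker network hostnames: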

./mvnw package -DskipTests
docker run --rm \
  --network docker_neural-search \
  -p 8080:8080 \
  -v "$(pwd)/docker/application.yml:/workspace/config/application.yml" \
  -e SPRING_CONFIG_LOCATION=file:/workspace/config/ \
  neural-search-java:0.0.1-SNAPSHOT

Running Integration Tests

Integration tests use Testcontainers to spin up PostgreSQL and Milvus standalone containers automatically. No external infrastructure is required.

./mvnw verify

The test suite verifies:

  • Product ingestion stores records in PostgreSQL
  • Embedding generation stores vectors in Milvus
  • Semantic search returns relevant results for natural language queries
  • Search results are ranked by similarity score

Configuration Reference

Property                       Default                                          Description
spring.datasource.url          jdbc:postgresql://localhost:5432/neuralsearch    PostgreSQL connection URL
spring.datasource.username     neuralsearch                                     PostgreSQL username
spring.datasource.password     neuralsearch                                     PostgreSQL password
milvus.host                    localhost                                        Milvus host
milvus.port                    19530                                            Milvus gRPC port
milvus.collection-name         product_embeddings                               Milvus collection name
milvus.embedding-dimension     384                                              Vector dimension (matches AllMiniLmL6V2)

All properties can be overridden with environment variables through the ${ENV_VAR:default} placeholders defined in application.yml.
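A `${ENV_VAR:default}` placeholder works like this: if the named environment variable is set, its value wins; otherwise the text after the first colon is used. The code below is a simplified sketch, not Spring's resolver, which also handles nested placeholders and other property sources. The variable names in the usage are hypothetical because application.yml is not reproduced here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration of ${NAME:default} resolution (not Spring's implementation).
final class PlaceholderDemo {

    // Group 1: variable name (up to the first ':'); group 2: optional default (may contain ':').
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when no ':default' part is present
            String replacement = env.getOrDefault(name, fallback);
            if (replacement == null) {
                throw new IllegalStateException("Unresolved placeholder: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `PlaceholderDemo.resolve("${MILVUS_HOST:localhost}", System.getenv())` yields the value of a hypothetical MILVUS_HOST variable when it is set, and `localhost` otherwise.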

How Semantic Search Works

  1. On product ingestion, the system concatenates name + description + category and passes the text to the AllMiniLmL6V2 quantized model (bundled in the JAR, no API key required), producing a 384-dimensional float vector.
  2. The vector is stored in Milvus under the product_embeddings collection alongside the product ID.
  3. On search, the user query is embedded with the same model, producing a query vector.
  4. Milvus performs an approximate nearest neighbor search using COSINE similarity and returns the top-K most similar product IDs.
  5. Product details are fetched from PostgreSQL by ID and returned alongside their similarity scores, ranked highest first.
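The five steps above can be traced end to end in memory. This sketch rests on the following assumptions:

  • toyEmbed is a hypothetical hashed bag-of-words function standing in for AllMiniLmL6V2. It matches shared words, not meaning.
  • The two maps stand in for PostgreSQL and Milvus.
  • search scans every vector exactly instead of using Milvus's approximate nearest-neighbour index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// In-memory sketch of the ingest/search pipeline; not the project's implementation.
final class SemanticSearchSketch {

    record Product(long id, String name, String description, String category) {}
    record Hit(Product product, double similarityScore) {}
    private record ScoredId(long id, double score) {}

    static final int DIM = 384; // same dimension as AllMiniLmL6V2

    private final Map<Long, Product> catalog = new HashMap<>(); // stands in for PostgreSQL
    private final Map<Long, float[]> vectors = new HashMap<>(); // stands in for Milvus

    // Hypothetical toy embedding: each word increments one hashed bucket.
    static float[] toyEmbed(String text) {
        float[] v = new float[DIM];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) v[Math.floorMod(token.hashCode(), DIM)] += 1f;
        }
        return v;
    }

    // Steps 1-2: concatenate name + description + category, embed, store by product ID.
    void ingest(Product p) {
        catalog.put(p.id(), p);
        vectors.put(p.id(), toyEmbed(p.name() + " " + p.description() + " " + p.category()));
    }

    // Steps 3-5: embed the query, score all vectors, keep the top K IDs, then fetch details.
    List<Hit> search(String query, int topK) {
        if (topK < 1) throw new IllegalArgumentException("topK must be >= 1");
        float[] q = toyEmbed(query);
        List<ScoredId> scored = new ArrayList<>();
        for (Map.Entry<Long, float[]> e : vectors.entrySet()) {
            scored.add(new ScoredId(e.getKey(), cosine(q, e.getValue())));
        }
        scored.sort(Comparator.comparingDouble(ScoredId::score).reversed());
        List<Hit> hits = new ArrayList<>();
        for (ScoredId s : scored.subList(0, Math.min(topK, scored.size()))) {
            hits.add(new Hit(catalog.get(s.id()), s.score())); // lookup by ID, as in step 5
        }
        return hits;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

To approach the flow described above, replace toyEmbed with a real embedding model and the vector map with a Milvus collection. The brute-force scan costs one comparison per stored product per query, which is the cost an approximate nearest-neighbour index avoids as the catalog grows.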

Project Structure

neural-search-java/
├── docker/
│   ├── docker-compose.yml       Infrastructure: PostgreSQL, etcd, Milvus standalone
│   └── application.yml          Spring config for containerised deployment
├── scripts/
│   └── seed-products.sh         Seed script for 15 sample products
├── src/
│   ├── main/java/com/neuralsearch/
│   │   ├── NeuralSearchApplication.java
│   │   ├── config/
│   │   │   ├── EmbeddingModelConfig.java
│   │   │   ├── MilvusConfig.java
│   │   │   └── MilvusProperties.java
│   │   ├── controller/
│   │   │   ├── ProductController.java
│   │   │   └── SearchController.java
│   │   ├── dto/
│   │   │   ├── PagedResponse.java
│   │   │   ├── ProductRequest.java
│   │   │   ├── ProductResponse.java
│   │   │   └── SearchResult.java
│   │   ├── embedding/
│   │   │   └── ProductEmbeddingService.java
│   │   ├── entity/
│   │   │   └── Product.java
│   │   ├── repository/
│   │   │   └── ProductRepository.java
│   │   ├── service/
│   │   │   ├── ProductService.java
│   │   │   └── SearchService.java
│   │   └── vector/
│   │       ├── MilvusVectorStore.java
│   │       └── VectorSearchResult.java
│   └── main/resources/
│       └── application.yml
└── src/test/java/com/neuralsearch/integration/
    ├── AbstractIntegrationTest.java
    ├── ProductIngestionIntegrationTest.java
    └── SemanticSearchIntegrationTest.java
