A comprehensive microservices-based recommendation engine that uses image vectorization and textual similarity to provide accurate product recommendations. The system employs Change Data Capture (CDC), vector embeddings, and graph relationships to create a robust recommendation platform.
The recommendation system consists of several microservices working together:
- Entities Server: Manages product data in PostgreSQL
- Vectorization Server: Converts product images and text fields into vector embeddings
- Relation Builder: Creates and manages similarity relationships between products
- Recommendation Server: Handles recommendation requests and processing
- Infrastructure: Contains shared services like Kafka, Zookeeper, Redis, PostgreSQL, and Neo4j
The system uses Debezium to capture changes in the PostgreSQL database in real-time. When a product is added, updated, or deleted:
- Debezium captures the change and sends it to Kafka
- The Recommendation Server consumes these events from Kafka
- The data flows through the recommendation pipeline for processing
This CDC approach ensures recommendations remain current without requiring batch processing or manual triggers.
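The Go sketch below shows how a consumer might decode a Debezium change event after reading it from Kafka. It assumes the JSON converter with schemas disabled. In that mode the message value is the bare envelope with `before`, `after` and `op` fields (`c` create, `u` update, `d` delete, `r` snapshot read). With schemas enabled, the same envelope sits under a `payload` key. The `Product` fields and the `Action` mapping are hypothetical, and the Kafka client itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Product is a hypothetical row shape; the real table columns may differ.
type Product struct {
	ProductID   int    `json:"product_id"`
	Name        string `json:"name"`
	Description string `json:"description"`
	ImageURL    string `json:"image_url"`
}

// ChangeEvent mirrors the Debezium envelope (JSON converter, schemas disabled).
type ChangeEvent struct {
	Before *Product `json:"before"` // nil for inserts
	After  *Product `json:"after"`  // nil for deletes
	Op     string   `json:"op"`
}

// ParseChangeEvent decodes a raw Kafka message value into a ChangeEvent.
func ParseChangeEvent(raw []byte) (ChangeEvent, error) {
	var ev ChangeEvent
	err := json.Unmarshal(raw, &ev)
	return ev, err
}

// Action maps Debezium op codes to a hypothetical pipeline action.
func Action(ev ChangeEvent) string {
	switch ev.Op {
	case "c", "r", "u":
		return "vectorize" // new, snapshotted, or changed row: (re)compute embeddings
	case "d":
		return "delete" // assumed handling: remove the product from downstream stores
	default:
		return "ignore" // e.g. truncate events
	}
}

func main() {
	raw := []byte(`{"before":null,"after":{"product_id":1,"name":"Desk lamp","description":"LED lamp","image_url":"https://example.com/lamp.jpg"},"op":"c"}`)
	ev, err := ParseChangeEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(Action(ev), ev.After.Name) // vectorize Desk lamp
}
```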
The system recommends products based on two primary factors:
- Image Similarity: Uses ResNet50 to generate vector embeddings from product images
- Text Field Similarity: Uses Sentence Transformers to create vector embeddings from configurable text fields (e.g., name, description)
The fields used for similarity calculations are configurable in the `.env` file through the `FIELDS_LETTER` parameter. For example, `FIELDS_LETTER=name.description` configures the system to use both the name and description fields for text similarity.
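The Go sketch below shows one way to turn the dot-separated `FIELDS_LETTER` value into the text that gets embedded. The helper names `ParseFields` and `BuildText` are hypothetical. Joining the selected fields with a single space is also an assumption, and the project may combine the fields differently.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseFields splits a FIELDS_LETTER value such as "name.description" into field
// names, trimming whitespace and skipping empty segments.
func ParseFields(spec string) []string {
	var fields []string
	for _, f := range strings.Split(spec, ".") {
		if f = strings.TrimSpace(f); f != "" {
			fields = append(fields, f)
		}
	}
	return fields
}

// BuildText concatenates the configured fields of a product (given as a map) in
// the configured order; missing or empty fields are skipped.
func BuildText(product map[string]string, fields []string) string {
	parts := make([]string, 0, len(fields))
	for _, f := range fields {
		if v := product[f]; v != "" {
			parts = append(parts, v)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	fields := ParseFields("name.description")
	product := map[string]string{"name": "Desk lamp", "description": "Adjustable LED lamp", "sku": "L-42"}
	fmt.Println(fields)                     // [name description]
	fmt.Println(BuildText(product, fields)) // Desk lamp Adjustable LED lamp
}
```

Fields not listed in `FIELDS_LETTER` (here `sku`) never reach the text model.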
The system calculates similarity using cosine similarity between vector embeddings:
- Cosine Similarity: Measures the cosine of the angle between two vectors, giving a similarity score between -1 and 1.

  ```go
  import "math"

  // CosineSimilarity returns dot(vec1, vec2) / (|vec1| * |vec2|), a value in [-1, 1].
  // It returns 0 if the vectors differ in length or either one has zero magnitude.
  func CosineSimilarity(vec1, vec2 []float64) float64 {
  	if len(vec1) != len(vec2) {
  		return 0 // guard: avoids an out-of-range panic on mismatched inputs
  	}
  	var dot, normA, normB float64
  	for i := range vec1 {
  		dot += vec1[i] * vec2[i]   // dot product
  		normA += vec1[i] * vec1[i] // squared magnitude of vec1
  		normB += vec2[i] * vec2[i] // squared magnitude of vec2
  	}
  	if normA == 0 || normB == 0 {
  		return 0
  	}
  	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
  }
  ```

- Configurable Weights: The system allows setting thresholds for image and text similarity that determine when relationships should be created:
  - Image similarity threshold (`w_image`)
  - Text information similarity threshold (`w_info`)
The system uses the Neo4j graph database to store and query relationships between products:
- Relationship Types:
  - `SIMILAR_TO`: Represents image-based similarity
  - `INFO_SIMILAR_TO`: Represents text-based similarity
- Recommendation Query: The recommendation algorithm combines both similarity scores into a total score for each related product and returns the 10 highest-scoring products:

  ```cypher
  MATCH (p:Product {product_id: $product_id})-[:SIMILAR_TO|INFO_SIMILAR_TO]-(recommended:Product)
  // One row per (p, recommended) pair, so a product linked by both types is not counted twice
  WITH DISTINCT p, recommended
  OPTIONAL MATCH (p)-[r1:SIMILAR_TO]-(recommended)
  WITH p, recommended, COALESCE(SUM(r1.score), 0) AS similarity_score
  OPTIONAL MATCH (p)-[r2:INFO_SIMILAR_TO]-(recommended)
  WITH recommended, similarity_score, COALESCE(SUM(r2.score), 0) AS info_similarity_score
  RETURN recommended, similarity_score + info_similarity_score AS total_score
  ORDER BY total_score DESC
  LIMIT 10
  ```
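The Go sketch below reproduces in memory the scoring rule behind the recommendation query, to make it concrete. For each neighbouring product it sums the `SIMILAR_TO` and `INFO_SIMILAR_TO` scores, counting each relationship once. It then orders by the total and keeps the top N. The `Edge` and `Scored` types are hypothetical, and in the running system Neo4j does this work.

```go
package main

import (
	"fmt"
	"sort"
)

// Edge is a hypothetical in-memory stand-in for a relationship from the queried product.
type Edge struct {
	To    string  // recommended product_id
	Type  string  // "SIMILAR_TO" or "INFO_SIMILAR_TO"
	Score float64 // similarity stored on the relationship
}

// Scored pairs a recommended product with its combined score.
type Scored struct {
	ProductID string
	Total     float64
}

// Recommend sums image and text scores per neighbour and returns at most `limit`
// products ordered by total score (ties broken by product_id for a stable order).
func Recommend(edges []Edge, limit int) []Scored {
	totals := map[string]float64{}
	for _, e := range edges {
		if e.Type == "SIMILAR_TO" || e.Type == "INFO_SIMILAR_TO" {
			totals[e.To] += e.Score
		}
	}
	out := make([]Scored, 0, len(totals))
	for id, t := range totals {
		out = append(out, Scored{id, t})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Total != out[j].Total {
			return out[i].Total > out[j].Total
		}
		return out[i].ProductID < out[j].ProductID
	})
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	edges := []Edge{
		{"p2", "SIMILAR_TO", 0.9},
		{"p2", "INFO_SIMILAR_TO", 0.6},
		{"p3", "INFO_SIMILAR_TO", 0.95},
		{"p4", "SIMILAR_TO", 0.7},
	}
	// p2 ranks first because it has both an image and a text relationship.
	fmt.Println(Recommend(edges, 10))
}
```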
- Product Creation/Update:
  - Product data is stored in PostgreSQL
  - Debezium captures the change and publishes it to the Kafka topic `productdb_server.public.products`
- Vectorization:
  - The Recommendation Server consumes the product event and publishes it to the `kafka.create.vector` topic
  - The Vectorization Server processes the product data:
    - Generates image vectors using ResNet50
    - Generates text vectors for the configured fields (e.g., name, description) using Sentence Transformers
    - Publishes the vector data to the `kafka.vector.created` topic
- Relationship Building:
  - The Relation Builder consumes the vector data
  - Calculates similarity with existing products
  - Creates relationships in Neo4j when similarity exceeds the thresholds
  - Stores recommendations in Redis for quick access
- Recommendation Serving:
  - When a recommendation is requested for a product, the system queries Neo4j
  - Returns the products with the highest combined similarity scores
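The Go sketch below illustrates the hand-offs between services in the flow above. Channels stand in for the `kafka.create.vector` and `kafka.vector.created` topics. The message structs are hypothetical. The toy `embed` function, a letter-frequency vector, stands in for ResNet50 and Sentence Transformers, and image vectors are omitted. The Relation Builder here only forms the candidate pairs that a real implementation would score and write to Neo4j and Redis.

```go
package main

import (
	"fmt"
	"strings"
)

// CreateVectorMsg is a hypothetical payload for kafka.create.vector.
type CreateVectorMsg struct {
	ProductID string
	Text      string
}

// VectorCreatedMsg is a hypothetical payload for kafka.vector.created.
type VectorCreatedMsg struct {
	ProductID string
	TextVec   []float64
}

// embed is a toy stand-in for the real models: it counts ASCII letters a-z.
func embed(text string) []float64 {
	v := make([]float64, 26)
	for _, r := range strings.ToLower(text) {
		if r >= 'a' && r <= 'z' {
			v[r-'a']++
		}
	}
	return v
}

// vectorizer plays the Vectorization Server: it consumes create requests, publishes
// vectors, and closes its output once the input channel is drained.
func vectorizer(in <-chan CreateVectorMsg, out chan<- VectorCreatedMsg) {
	for msg := range in {
		out <- VectorCreatedMsg{ProductID: msg.ProductID, TextVec: embed(msg.Text)}
	}
	close(out)
}

// relationBuilder plays the Relation Builder: each new product is paired with every
// previously seen product; scoring and persistence are left out of this sketch.
func relationBuilder(in <-chan VectorCreatedMsg) [][2]string {
	var seen []string
	var pairs [][2]string
	for msg := range in {
		for _, other := range seen {
			pairs = append(pairs, [2]string{msg.ProductID, other})
		}
		seen = append(seen, msg.ProductID)
	}
	return pairs
}

func main() {
	createVector := make(chan CreateVectorMsg)   // stands in for kafka.create.vector
	vectorCreated := make(chan VectorCreatedMsg) // stands in for kafka.vector.created
	go vectorizer(createVector, vectorCreated)

	// The Recommendation Server would publish one message per CDC event.
	go func() {
		for _, p := range []CreateVectorMsg{{"p1", "Desk lamp"}, {"p2", "Floor lamp"}, {"p3", "Mug"}} {
			createVector <- p
		}
		close(createVector)
	}()

	fmt.Println(relationBuilder(vectorCreated)) // [[p2 p1] [p3 p1] [p3 p2]]
}
```

Because every new product is compared with all existing ones, the number of comparisons grows with catalogue size. This is worth keeping in mind when tuning the similarity thresholds.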
The system uses environment variables for configuration:
- Field Selection: `FIELDS_LETTER` determines which text fields are used for similarity calculation
- Vectorization Models:
  - Image vectors: ResNet50
  - Text vectors: Configurable (default: all-MiniLM-L6-v2)
- Vector Dimensions: Configurable vector sizes
- Similarity Thresholds: Determine when to create relationships between products
Run the system using the provided start script:
./start.sh
This will start all required services in Docker containers.
This project is currently in its core development phase, focusing on the fundamental recommendation engine functionality. Future enhancements may include:
- User preference integration
- A/B testing framework
- Performance optimization
- Additional recommendation algorithms
- User interface for system configuration