A Scalable, Hybrid, and AI-Powered Infrastructure for Modern E-Commerce
AetherMart has evolved from a niche garage startup selling custom gaming PCs to a leading e-commerce platform for cutting-edge technology and smart home devices.
It is an end-to-end hybrid data + AI platform that combines relational storage, NoSQL document models, vector search, semantic retrieval, and real-time synchronization to power intelligent e-commerce experiences.
Originally built as a scalable backend system, it evolved into an AI-ready platform capable of supporting semantic product search, embeddings, and agent-like memory retrieval: the same foundational components used in modern LLM-based agent systems.
This project details the engineering of AetherMart's backend data infrastructure. Over the course of six milestones, we transformed a basic relational database into a sophisticated Hybrid Data Platform capable of handling high-volume transactions, providing real-time analytics, and supporting AI-driven features.
- Vector Database (FAISS-style architecture)
- Semantic Search using Transformer embeddings
- Natural-language query support ("quiet mechanical keyboard under $100")
- Embedding pipeline for product descriptions, reviews, and metadata
- Vector memory layer for persistent context
- Hybrid structured + unstructured knowledge base
- Real-time world-state sync for agent reliability
- Gemini API integration through Python orchestrators
- MariaDB (Galera Cluster for HA + replication scaling)
- MongoDB (Document-oriented for flexible storage)
- Hybrid OLTP + NoSQL ecosystem
- AWS Ubuntu EC2 Instance
- Python ETL pipelines (`pymysql`, `pymongo`)
- Change Data Capture (CDC) triggers + sync queues
- Continuous sync worker (MariaDB → MongoDB)
- RBAC with granular permission levels
- PII Masking via secure SQL views
- Data lineage + data quality logging
- Environment-variable-based credential isolation
- AWS Security Groups
- Transformer Embeddings (Gemini / Sentence Transformers)
- Vector Search (FAISS-style VectorDB)
- Semantic Search Pipeline
- MariaDB
- Galera Cluster (Multi-master HA)
- Primary-Replica Replication
- MongoDB
- NoSQL Document Store
- Nested Data Models (profiles, reviews)
- `pymysql`: Relational DB automation
- `pymongo`: MongoDB connectivity
- `orchestrator.py`: Full environment automation
- `mongo_sync_worker.py`: Real-time CDC sync
- AWS EC2 (Ubuntu Linux)
- Security Groups (port-level restrictions)
- Environment Variables for credentials
- RBAC roles (`alex`, `sarah`, `maria`)
- PII-masked customer views
- Data lineage tracking tables
- Sync audit logs
Goal: Establish a robust relational schema in 3rd Normal Form (3NF).
- Designed the core Entity Relationship Diagram (ERD).
- Implemented tables for `Customers`, `Products`, `Orders`, `Order_Items`, and `Categories`.
- Enforced data integrity using Primary Keys (PK), Foreign Keys (FK), and constraints.
- Key Deliverable: `milestone1.sql` (Core Schema).
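As an illustration of the constraints described for this milestone, the sketch below builds a cut-down version of the five tables and shows a foreign key rejecting an orphaned order. It uses Python's built-in `sqlite3` in place of MariaDB so that it runs without a server. Every column name is an assumption, since only the table names appear in this README; the real definitions live in `milestone1.sql`.

```python
import sqlite3

# SQLite stands in for MariaDB so the sketch runs without a server.
# Column names are assumptions; only the table names come from the README.
SCHEMA = """
CREATE TABLE Categories (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE Products (
    product_id  INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES Categories(category_id),
    name        TEXT NOT NULL,
    price       REAL NOT NULL CHECK (price >= 0)
);
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE Orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customers(customer_id),
    order_date  TEXT NOT NULL
);
CREATE TABLE Order_Items (
    order_id    INTEGER NOT NULL REFERENCES Orders(order_id),
    product_id  INTEGER NOT NULL REFERENCES Products(product_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the tables in an in-memory database with FK checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def rejects_orphan_order(conn: sqlite3.Connection) -> bool:
    """Return True if an order for a non-existent customer is refused."""
    try:
        conn.execute(
            "INSERT INTO Orders (customer_id, order_date) VALUES (999, '2025-01-01')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

The composite primary key on `Order_Items` is one common way to keep an order from listing the same product twice; whether the project made the same choice is not stated here.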
Goal: Embed business logic directly into the database for performance and consistency.
- Stored Procedures: Created procedures for complex workflows (e.g., `sp_PlaceOrder` to handle transaction atomicity).
- Triggers: Implemented automation (e.g., auto-decrementing stock levels after an order).
- Views: Created virtual tables for reporting (e.g., `v_QuarterlySales`).
- Key Deliverable: `milestone2.sql` (Business Logic Layer).
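The two ideas in this milestone, an atomic order placement and a stock-decrementing trigger, can be sketched together. The example below is not the project's `sp_PlaceOrder`; it is a Python function over `sqlite3` (standing in for MariaDB) that mimics the same behaviour. The trigger name `trg_decrement_stock`, the `stock` column, and the helper names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified tables; SQLite stands in for MariaDB.
SETUP = """
CREATE TABLE Products (
    product_id INTEGER PRIMARY KEY,
    stock      INTEGER NOT NULL CHECK (stock >= 0)
);
CREATE TABLE Order_Items (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES Products(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0)
);
-- Decrement stock whenever an order line is inserted.
CREATE TRIGGER trg_decrement_stock AFTER INSERT ON Order_Items
BEGIN
    UPDATE Products SET stock = stock - NEW.quantity
    WHERE product_id = NEW.product_id;
END;
INSERT INTO Products (product_id, stock) VALUES (1, 5), (2, 1);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def place_order(conn, order_id, items) -> bool:
    """Insert every line of an order in one transaction.

    If any line fails (here: the CHECK on stock fires inside the trigger),
    the whole order is rolled back and False is returned.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            for product_id, quantity in items:
                conn.execute(
                    "INSERT INTO Order_Items (order_id, product_id, quantity) "
                    "VALUES (?, ?, ?)",
                    (order_id, product_id, quantity),
                )
        return True
    except sqlite3.IntegrityError:
        return False


def stock(conn, product_id) -> int:
    row = conn.execute(
        "SELECT stock FROM Products WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0]
```

Because the stock check runs inside the same transaction as the inserts, an order that would oversell one product leaves the stock of every other product in that order untouched.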
Goal: Eliminate single points of failure and ensure zero downtime.
- Standard Replication: Configured a Primary-Replica architecture for read-heavy scaling.
  - Config: `server_id`, `log_bin`, `read_only=1` on replicas.
- Galera Cluster: Implemented a multi-master synchronous cluster for high availability.
  - Config: `wsrep_on=ON`, `binlog_format=row`, `wsrep_cluster_address`.
- Network Security: Configured AWS Security Groups to allow internal communication on ports `3306`, `4567`, `4568`, `4444` strictly between nodes.
- Key Deliverable: `Technical README (Milestone 3).pdf`.
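To make the configuration options above concrete, here is a small sketch that renders them as config-file fragments and records what each of the four ports is conventionally used for. The function names and section layout are assumptions. A working Galera node also needs settings this README does not list (for example the provider library path), so they are left out rather than guessed.

```python
# Conventional roles of the ports opened between nodes.
CLUSTER_PORTS = {
    3306: "MariaDB client connections",
    4567: "Galera replication traffic",
    4568: "Incremental State Transfer (IST)",
    4444: "State Snapshot Transfer (SST)",
}


def render_replication_config(server_id: int, is_replica: bool) -> str:
    """Render the primary/replica options named in this README."""
    lines = ["[mysqld]", f"server_id={server_id}", "log_bin"]
    if is_replica:
        lines.append("read_only=1")  # replicas refuse ordinary writes
    return "\n".join(lines) + "\n"


def render_galera_config(node_ips: list[str]) -> str:
    """Render the Galera options named in this README for the given nodes."""
    if not node_ips:
        raise ValueError("at least one node address is required")
    lines = [
        "[galera]",
        "wsrep_on=ON",
        "binlog_format=row",
        # gcomm:// followed by a comma-separated list of cluster members
        f"wsrep_cluster_address=gcomm://{','.join(node_ips)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example addresses are placeholders, not the project's real nodes.
    print(render_replication_config(server_id=2, is_replica=True))
    print(render_galera_config(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```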
Goal: Adapt the schema for new business verticals and prepare for AI integration.
- Updated schema to support "Services" (Consultations) alongside physical products.
- Optimized indexing for complex queries.
- Laid the groundwork for vector embeddings by identifying descriptive fields for Semantic Search, which is implemented through the Gemini API.
- Key Deliverable: `milestone4.sql`.
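The retrieval step behind semantic search can be illustrated with a brute-force nearest-neighbour lookup. The sketch below is a stand-in, not the project's pipeline: the three-dimensional vectors are hand-written toy values, whereas real embeddings would come from a model such as Gemini or Sentence Transformers, and a FAISS-style index would replace the linear scan. The product IDs and the `top_k` helper are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k product IDs whose vectors are most similar to the query."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


# Toy "embeddings": axes loosely mean (quiet, mechanical keyboard, low price).
# These numbers are invented for the example.
PRODUCT_VECTORS = {
    "silent-mech-kb": [0.9, 0.9, 0.7],
    "loud-mech-kb": [0.1, 0.9, 0.6],
    "gaming-mouse": [0.3, 0.0, 0.8],
}

if __name__ == "__main__":
    # Pretend this vector encodes "quiet mechanical keyboard under $100".
    query = [1.0, 1.0, 0.8]
    print(top_k(query, PRODUCT_VECTORS))
```

A hard constraint such as "under $100" is usually better served by a structured price filter applied alongside the vector lookup, which is one motivation for a hybrid store.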
Goal: Introduce flexibility for unstructured data (Reviews & Profiles).
- Provisioned MongoDB v7.0 on AWS EC2.
- Designed NoSQL document schemas for `customer_profiles` (nested addresses) and `reviews` (media arrays).
- Established basic Python connectivity using `pymongo`.
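The shape of the two document types can be sketched by folding flat relational rows into nested Python dictionaries, which is the form `pymongo` accepts for inserts. All field names below are assumptions, since this README names only the collections and their nested parts. The database calls are omitted so the example runs without a MongoDB server.

```python
def build_customer_profile(customer_row: dict, address_rows: list[dict]) -> dict:
    """Nest a customer's addresses inside a single profile document."""
    customer_id = customer_row["customer_id"]
    return {
        "_id": customer_id,
        "name": customer_row["name"],
        # One-to-many rows become an embedded array.
        "addresses": [
            {"type": row["type"], "city": row["city"]}
            for row in address_rows
            if row["customer_id"] == customer_id
        ],
    }


def build_review(product_id: int, customer_id: int, rating: int,
                 text: str, media_urls: list[str]) -> dict:
    """Build a review document carrying an array of media attachments."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return {
        "product_id": product_id,
        "customer_id": customer_id,
        "rating": rating,
        "text": text,
        "media": [{"url": url} for url in media_urls],
    }


if __name__ == "__main__":
    profile = build_customer_profile(
        {"customer_id": 1, "name": "Ada"},
        [{"customer_id": 1, "type": "shipping", "city": "Pune"}],
    )
    print(profile)
```

Embedding addresses in the profile means one read returns the whole customer, at the cost of rewriting the document when an address changes.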
Goal: Create a unified, real-time ecosystem with end-to-end governance.
- Real-time Synchronization:
  - Implemented Change Data Capture (CDC) using MariaDB triggers (`sync.sql`).
  - Created persistent Sync Queues (`customer_sync_queue`, etc.) in MariaDB.
  - Developed a Python Sync Worker (`mongo_sync_worker.py`) to poll queues and update MongoDB in real time.
- Advanced Orchestration:
  - Built `orchestrator.py` to automate the entire lifecycle: Environment Teardown -> Schema Apply -> Data Gen -> Security Apply -> Initial Migration.
- Data Governance & Security:
  - RBAC: Defined granular roles (`alex`, `sarah`, `maria`).
  - PII Masking: Created `v_customers_masked` to hide sensitive data from analysts.
  - Lineage & Quality: Implemented `data_lineage_tracker` and `data_quality_logs` tables.
  - Credential Security: Migrated all scripts to use Environment Variables.
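The trigger-plus-queue pattern used for synchronization can be sketched end to end in a few lines. This is an illustration under stated assumptions, not the contents of `sync.sql` or `mongo_sync_worker.py`: `sqlite3` stands in for MariaDB, a plain dictionary stands in for the MongoDB collection, and the queue columns, trigger names, and `drain_queue` helper are hypothetical. Only the table name `customer_sync_queue` comes from this README.

```python
import sqlite3

SETUP = """
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL
);
CREATE TABLE customer_sync_queue (
    queue_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL,
    processed   INTEGER NOT NULL DEFAULT 0
);
-- Capture every insert and update as a queue entry.
CREATE TRIGGER trg_customers_insert AFTER INSERT ON Customers
BEGIN
    INSERT INTO customer_sync_queue (customer_id) VALUES (NEW.customer_id);
END;
CREATE TRIGGER trg_customers_update AFTER UPDATE ON Customers
BEGIN
    INSERT INTO customer_sync_queue (customer_id) VALUES (NEW.customer_id);
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def drain_queue(conn: sqlite3.Connection, target: dict) -> int:
    """Apply pending queue entries to `target` and mark them processed.

    `target` is a dict standing in for a MongoDB collection, keyed by _id.
    Returns the number of queue entries handled in this pass.
    """
    pending = conn.execute(
        "SELECT queue_id, customer_id FROM customer_sync_queue "
        "WHERE processed = 0 ORDER BY queue_id"
    ).fetchall()
    for queue_id, customer_id in pending:
        row = conn.execute(
            "SELECT customer_id, email FROM Customers WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        if row is not None:
            target[row[0]] = {"_id": row[0], "email": row[1]}  # upsert
        conn.execute(
            "UPDATE customer_sync_queue SET processed = 1 WHERE queue_id = ?",
            (queue_id,),
        )
    conn.commit()
    return len(pending)
```

A long-running worker would call a function like `drain_queue` in a loop with a short sleep, so the delay before a change appears in MongoDB is bounded by the polling interval. With `pymongo`, the dictionary assignment would typically become a `replace_one(..., upsert=True)` call on the collection.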
The entire platform can be spun up using the Master Orchestrator.
- MariaDB and MongoDB installed and running.
- Python 3.x installed with dependencies: `pip install pymysql pymongo`
- Environment Variables set (recommended) or updated in `orchestrator.py`: `export MARIA_DB_PASS="your_pass"` and `export MONGO_ADMIN_PASS="your_mongo_pass"`
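For the environment-variable approach, a script can read both passwords at startup and fail early when one is missing. The sketch below shows one way to do that. The variable names `MARIA_DB_PASS` and `MONGO_ADMIN_PASS` come from the prerequisites above, while the `load_credentials` helper and the returned key names are assumptions.

```python
import os

REQUIRED_VARS = ("MARIA_DB_PASS", "MONGO_ADMIN_PASS")


def load_credentials(env=None) -> dict:
    """Read database passwords from the environment, failing fast if absent.

    `env` defaults to os.environ; passing a dict makes the function testable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {
        "mariadb_password": env["MARIA_DB_PASS"],
        "mongo_password": env["MONGO_ADMIN_PASS"],
    }


if __name__ == "__main__":
    try:
        load_credentials()
        print("credentials loaded")  # never print the values themselves
    except RuntimeError as exc:
        print(exc)
```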
- Run the Orchestrator: This script wipes the DBs, applies all SQL schemas (M1-M6), sets up security, generates data, and performs the initial migration. Command: `python3 orchestrator.py`
- Start Real-time Sync: Open a new terminal to keep the sync worker running. Command: `python3 mongo_sync_worker.py`
- Verify: Make changes in MariaDB (e.g., `UPDATE Customers...`) and confirm that they appear in MongoDB once the sync worker polls the queue.
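The orchestrator's job of running setup stages in a fixed order and stopping at the first failure can be sketched as a small pipeline runner. This is not the code in `orchestrator.py`. The `run_pipeline` helper is hypothetical, and the stage names follow the lifecycle listed under Advanced Orchestration, with each stage reduced to a placeholder.

```python
from typing import Callable

# Stage names taken from the lifecycle described in this README.
STAGES = [
    "Environment Teardown",
    "Schema Apply",
    "Data Gen",
    "Security Apply",
    "Initial Migration",
]


def run_pipeline(steps: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run each (name, function) pair in order; abort on the first failure.

    Returns the names of the completed steps. A failing step raises
    RuntimeError naming the step, and later steps are never started.
    """
    completed: list[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            raise RuntimeError(
                f"step '{name}' failed; completed so far: {completed}"
            ) from exc
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder stages that only announce themselves.
    demo_steps = [(name, lambda n=name: print("running", n)) for name in STAGES]
    run_pipeline(demo_steps)
```

Stopping at the first failure matters here because later stages depend on earlier ones: a migration against a half-applied schema would be harder to diagnose than a clean abort.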
├── orchestrator.py          # MASTER SCRIPT: Sets up the entire environment
├── generator.py             # Generates dummy data for MariaDB
├── migrate2.py              # Performs initial bulk load from MariaDB -> MongoDB
├── mongo_sync_worker.py     # REAL-TIME WORKER: Polls queues and syncs data
│
├── SQL_Scripts/
│   ├── milestone1.sql       # Core Schema
│   ├── milestone2.sql       # Stored Procedures & Views
│   ├── milestone4.sql       # Schema Evolution
│   ├── security.sql         # RBAC, PII Views, Governance Tables
│   └── sync.sql             # Sync Triggers & Queue Definitions
│
└── Docs/
    ├── Project.docx         # Original Project Scope
    ├── Technical_M3.pdf     # Clustering Documentation
    ├── Technical_M4.pdf     # Vector Search, Gemini API, ETL Pipeline
    ├── Technical_M5.pdf     # MariaDB to MongoDB
    └── Test Scripts/
Below are the deliverables for each phase of the AetherMart project.
| Milestone | Focus Area | Presentation (PPT) | Video |
|---|---|---|---|
| Milestone 1 | Schema Design & Normalization | View PPT | Watch Video |
| Milestone 2 | Advanced SQL & Automation | View PPT | Watch Video |
| Milestone 3 | High Availability (Galera/Replication) | View PPT | Watch Video |
| Milestone 4 | Schema Evolution & Optimization | View PPT | Watch Video |
| Milestone 5 | NoSQL Introduction | View PPT | Watch Video |
| Milestone 6 | Final Mastery (Hybrid Real-time Sync) | View PPT | Watch Video |
This repository is actively expanding. Feel free to:
- Open issues
- Contribute enhancements via pull requests
© 2026 Yash Chetan Doshi. All rights reserved.
You may not copy, modify, distribute, or use any part of this repository or its contents without prior written permission from the author.