
candelatesla/AetherMart


AetherMart Unified Data & AI Platform 🛒🚀

A Scalable, Hybrid, and AI-Powered Infrastructure for Modern E-Commerce.

📖 Project Overview

AetherMart has evolved from a niche garage startup selling custom gaming PCs to a leading e-commerce platform for cutting-edge technology and smart home devices.

It is an end-to-end hybrid data + AI platform that combines relational storage, NoSQL document models, vector search, semantic retrieval, and real-time synchronization to power intelligent e-commerce experiences.

Originally built as a scalable backend system, it evolved into an AI-ready platform capable of supporting semantic product search, embeddings, and agent-like memory retrieval, the same foundational components used in modern LLM-based agent systems.

This project details the engineering of AetherMart's backend data infrastructure. Over the course of six milestones, we transformed a basic relational database into a sophisticated Hybrid Data Platform capable of handling high-volume transactions, providing real-time analytics, and supporting AI-driven features.


⭐ Key Capabilities

🤖 AI & Intelligent Retrieval

  • Vector Database (FAISS-style architecture)
  • Semantic Search using Transformer embeddings
  • Natural-language query support (“quiet mechanical keyboard under $100”)
  • Embedding pipeline for product descriptions, reviews, and metadata
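
The retrieval step behind semantic search can be sketched as a brute-force cosine-similarity ranking. This is a minimal illustration, not the project's implementation: the three-dimensional vectors stand in for Transformer embeddings, and the catalog entries and the `search` helper are hypothetical. A FAISS-style index would replace the linear scan at scale.

```python
import math

# Hypothetical catalog: product name -> toy embedding. Real embeddings would
# come from a Transformer model and have hundreds of dimensions.
CATALOG = {
    "silent mechanical keyboard": [0.9, 0.1, 0.0],
    "rgb gaming mouse": [0.1, 0.9, 0.1],
    "smart thermostat": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, k=2):
    """Return the k catalog entries most similar to query_vec (linear scan)."""
    scored = [(cosine(query_vec, vec), name) for name, vec in CATALOG.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# A query embedding that points mostly in the "keyboard" direction.
results = search([0.8, 0.2, 0.0])
```

Here `results` lists the keyboard first, because its vector has the smallest angle to the query vector.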

🧠 Agentic AI Foundations

  • Vector memory layer for persistent context
  • Hybrid structured + unstructured knowledge base
  • Real-time world-state sync for agent reliability
  • Gemini API integration through Python orchestrators

🗄️ Data & Storage

  • MariaDB (Galera Cluster for HA + replication scaling)
  • MongoDB (Document-oriented for flexible storage)
  • Hybrid OLTP + NoSQL ecosystem
  • AWS Ubuntu EC2 Instance

⚙️ Real-Time Orchestration

  • Python ETL pipelines (pymysql, pymongo)
  • Change Data Capture (CDC) triggers + sync queues
  • Continuous sync worker (MariaDB → MongoDB)
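
The trigger-plus-queue pattern can be sketched end to end with the standard library. This is an illustration under stated assumptions: an in-memory SQLite database stands in for MariaDB, a plain dict stands in for a MongoDB collection, and the column names, the trigger name, and the `drain_queue` helper are hypothetical. Only the `Customers` and `customer_sync_queue` table names come from this README.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB; trigger syntax differs slightly
# between the two, and the column names are illustrative assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE customer_sync_queue (
    queue_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
-- CDC: every update enqueues the changed row's key.
CREATE TRIGGER trg_customers_update AFTER UPDATE ON Customers
BEGIN
    INSERT INTO customer_sync_queue (customer_id) VALUES (NEW.customer_id);
END;
""")

mongo_customers = {}  # stand-in for a MongoDB collection keyed by _id

def drain_queue(conn, target):
    """Copy each queued row's current state to the target, then mark it done."""
    pending = conn.execute(
        "SELECT queue_id, customer_id FROM customer_sync_queue "
        "WHERE processed = 0 ORDER BY queue_id"
    ).fetchall()
    for queue_id, customer_id in pending:
        row = conn.execute(
            "SELECT customer_id, email FROM Customers WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        if row is not None:
            target[row[0]] = {"_id": row[0], "email": row[1]}  # upsert
        conn.execute(
            "UPDATE customer_sync_queue SET processed = 1 WHERE queue_id = ?",
            (queue_id,),
        )
    conn.commit()
    return len(pending)

db.execute("INSERT INTO Customers VALUES (1, 'old@example.com')")
db.execute("UPDATE Customers SET email = 'new@example.com' WHERE customer_id = 1")
synced = drain_queue(db, mongo_customers)
```

A long-running worker would call `drain_queue` in a loop with a short sleep between polls. With pymongo, the dict assignment would typically become a `replace_one(..., upsert=True)` call on the target collection.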

🔐 Governance & Security

  • RBAC with granular permission levels
  • PII Masking via secure SQL views
  • Data lineage + data quality logging
  • Environment-variable-based credential isolation
  • AWS Security Groups
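
PII masking through a view can be sketched as follows. SQLite stands in for MariaDB here, and the column names and the masking rule (first character plus `***`) are assumptions chosen for illustration. Only the view name `v_customers_masked` appears in this README.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    full_name TEXT,
    email TEXT
);
INSERT INTO Customers VALUES (1, 'Ada Lovelace', 'ada@example.com');

-- Expose only the first character of the name and of the email local part.
CREATE VIEW v_customers_masked AS
SELECT customer_id,
       substr(full_name, 1, 1) || '***' AS full_name,
       substr(email, 1, 1) || '***' || substr(email, instr(email, '@')) AS email
FROM Customers;
""")

masked = db.execute("SELECT full_name, email FROM v_customers_masked").fetchone()
```

In a role-based setup, analyst accounts would be granted `SELECT` on the view only, so the unmasked base table stays out of reach.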

🛠️ Tech Stack

AI Layer

  • Transformer Embeddings (Gemini / Sentence Transformers)
  • Vector Search (FAISS-style VectorDB)
  • Semantic Search Pipeline

Databases

  • MariaDB
    • Galera Cluster (Multi-master HA)
    • Primary–Replica Replication
  • MongoDB
    • NoSQL Document Store
    • Nested Data Models (profiles, reviews)

Python Orchestration

  • pymysql – Relational DB automation
  • pymongo – MongoDB connectivity
  • orchestrator.py – Full environment automation
  • mongo_sync_worker.py – Real-time CDC sync

Infrastructure

  • AWS EC2 (Ubuntu Linux)
  • Security Groups (port-level restrictions)
  • Environment Variables for credentials

Governance

  • RBAC roles (alex, sarah, maria)
  • PII-masked customer views
  • Data lineage tracking tables
  • Sync audit logs

🏗️ Architecture Evolution (Milestones 1-6)

🔹 Milestone 1: The Foundation (Schema Design)

Goal: Establish a robust relational schema in 3rd Normal Form (3NF).

  • Designed the core Entity Relationship Diagram (ERD).
  • Implemented tables for Customers, Products, Orders, Order_Items, and Categories.
  • Enforced data integrity using Primary Keys (PK), Foreign Keys (FK), and constraints.
  • Key Deliverable: milestone1.sql (Core Schema).
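
How primary and foreign keys protect integrity can be shown with a two-table sketch. SQLite stands in for MariaDB, and the columns are assumptions rather than the contents of milestone1.sql.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
db.executescript("""
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE Orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
);
""")

db.execute("INSERT INTO Customers VALUES (1, 'a@example.com')")
db.execute("INSERT INTO Orders VALUES (100, 1)")  # valid parent row

try:
    db.execute("INSERT INTO Orders VALUES (101, 999)")  # no such customer
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

The second insert is rejected because no customer 999 exists, which is the guarantee the foreign key provides.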

🔹 Milestone 2: Programmability & Automation

Goal: Embed business logic directly into the database for performance and consistency.

  • Stored Procedures: Created procedures for complex workflows (e.g., sp_PlaceOrder to handle transaction atomicity).
  • Triggers: Implemented automation (e.g., auto-decrementing stock levels after an order).
  • Views: Created virtual tables for reporting (e.g., v_QuarterlySales).
  • Key Deliverable: milestone2.sql (Business Logic Layer).
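
The interplay of a stock-decrementing trigger and an atomic order can be sketched as below. This is not the project's `sp_PlaceOrder`: SQLite stands in for MariaDB, the Python function `place_order` plays the role of the stored procedure, and the columns, the trigger name, and the `CHECK` constraint are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Products (
    product_id INTEGER PRIMARY KEY,
    stock INTEGER NOT NULL CHECK (stock >= 0)
);
CREATE TABLE Order_Items (
    order_id INTEGER, product_id INTEGER, quantity INTEGER
);
-- Automation: each inserted order line reduces the product's stock.
CREATE TRIGGER trg_decrement_stock AFTER INSERT ON Order_Items
BEGIN
    UPDATE Products SET stock = stock - NEW.quantity
    WHERE product_id = NEW.product_id;
END;
INSERT INTO Products VALUES (1, 5);
""")

def place_order(conn, order_id, items):
    """Insert all order lines atomically; roll back if any line fails."""
    try:
        with conn:  # commits on success, rolls back on exception
            for product_id, quantity in items:
                conn.execute(
                    "INSERT INTO Order_Items VALUES (?, ?, ?)",
                    (order_id, product_id, quantity),
                )
        return True
    except sqlite3.IntegrityError:
        return False

ok = place_order(db, 1, [(1, 2)])                # stock 5 -> 3
rejected = place_order(db, 2, [(1, 2), (1, 4)])  # second line would overdraw
stock = db.execute("SELECT stock FROM Products WHERE product_id = 1").fetchone()[0]
lines = db.execute("SELECT COUNT(*) FROM Order_Items").fetchone()[0]
```

The second order fails on its second line, so its first line is rolled back as well and the stock stays at 3.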

🔹 Milestone 3: High Availability & Scalability

Goal: Eliminate single points of failure and minimize downtime.

  • Standard Replication: Configured a Primary-Replica architecture for read-heavy scaling.
    • Config: server_id, log_bin, read_only=1 on replicas.
  • Galera Cluster: Implemented a multi-master synchronous cluster for high availability.
    • Config: wsrep_on=ON, binlog_format=row, wsrep_cluster_address.
  • Network Security: Configured AWS Security Groups to allow internal communication on ports 3306, 4567, 4568, 4444 strictly between nodes.
  • Key Deliverable: Technical README (Milestone 3).pdf.
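
As an illustration of the Galera settings listed above, the following sketch renders them into a config fragment. The `render_galera_config` helper, the `[galera]` section name, and the node IPs are assumptions. A working node needs further settings that are not listed here, such as `wsrep_provider`.

```python
import configparser
import io

def render_galera_config(node_ips):
    """Render a [galera] section using only the settings named above."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as written
    config["galera"] = {
        "wsrep_on": "ON",
        "binlog_format": "row",
        "wsrep_cluster_address": "gcomm://" + ",".join(node_ips),
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

# Hypothetical private IPs for a three-node cluster.
fragment = render_galera_config(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
```

Every node lists the same `gcomm://` address set, which is why the cluster ports must be open between all members.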

🔹 Milestone 4: Schema Evolution & AI Integration

Goal: Adapt the schema for new business verticals and prepare for AI integration.

  • Updated schema to support "Services" (Consultations) alongside physical products.
  • Optimized indexing for complex queries.
  • Laid the groundwork for vector embeddings by identifying descriptive fields for Semantic Search, implemented through the Gemini API.
  • Key Deliverable: milestone4.sql.

🔹 Milestone 5: NoSQL Integration

Goal: Introduce flexibility for unstructured data (Reviews & Profiles).

  • Provisioned MongoDB v7.0 on AWS EC2.
  • Designed NoSQL document schemas for customer_profiles (nested addresses) and reviews (media arrays).
  • Established basic Python connectivity using pymongo.
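
The document shapes described above might look like the following. All field names and values are hypothetical; only the ideas of nested addresses in `customer_profiles` and media arrays in `reviews` come from this README.

```python
import json

# Hypothetical document shapes; field names and values are illustrative.
customer_profile = {
    "_id": 1,
    "name": "Ada Lovelace",
    "addresses": [  # nested sub-documents instead of a separate table
        {"type": "shipping", "city": "London", "postal_code": "N1 9GU"},
        {"type": "billing", "city": "London", "postal_code": "N1 9GU"},
    ],
}

review = {
    "_id": 501,
    "product_id": 42,
    "customer_id": 1,
    "rating": 5,
    "text": "Quiet keys, solid build.",
    "media": ["img/review-501-1.jpg", "img/review-501-2.jpg"],  # array field
}

def shipping_city(profile):
    """Return the city of the first shipping address, or None."""
    for address in profile.get("addresses", []):
        if address.get("type") == "shipping":
            return address.get("city")
    return None

serialized = json.dumps(review)  # documents round-trip cleanly through JSON
```

With pymongo, each dict could be passed to `insert_one` on the matching collection without further mapping.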

🔹 Milestone 6: Operational Mastery (Real-time Hybrid Sync)

Goal: Create a unified, real-time ecosystem with end-to-end governance.

  • Real-time Synchronization:
    • Implemented Change Data Capture (CDC) using MariaDB triggers (sync.sql).
    • Created persistent Sync Queues (customer_sync_queue, etc.) in MariaDB.
    • Developed a Python Sync Worker (mongo_sync_worker.py) to poll queues and update MongoDB in real-time.
  • Advanced Orchestration:
    • Built orchestrator.py to automate the entire lifecycle: Environment Teardown -> Schema Apply -> Data Gen -> Security Apply -> Initial Migration.
  • Data Governance & Security:
    • RBAC: Defined granular roles (alex, sarah, maria).
    • PII Masking: Created v_customers_masked to hide sensitive data from analysts.
    • Lineage & Quality: Implemented data_lineage_tracker and data_quality_logs tables.
    • Credential Security: Migrated all scripts to use Environment Variables.
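
The orchestrator lifecycle listed above can be sketched as an ordered list of steps that stops at the first failure. The `run_pipeline` helper and the placeholder step bodies are hypothetical and do not reflect the actual contents of orchestrator.py.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order; stop at the first failure."""
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            return completed, f"{name} failed: {exc}"
        completed.append(name)
    return completed, None

log = []
# Placeholder steps; real ones would drop databases, run SQL files, and so on.
STEPS = [
    ("teardown", lambda: log.append("teardown")),
    ("schema_apply", lambda: log.append("schema_apply")),
    ("data_gen", lambda: log.append("data_gen")),
    ("security_apply", lambda: log.append("security_apply")),
    ("initial_migration", lambda: log.append("initial_migration")),
]

completed, error = run_pipeline(STEPS)
```

Stopping early matters here because later stages assume earlier ones succeeded; for example, the initial migration has nothing to copy if data generation failed.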

🚀 How to Run the Project (Final Version)

The entire platform can be spun up using the Master Orchestrator.

Prerequisites

  1. MariaDB and MongoDB installed and running.
  2. Python 3.x installed with dependencies:
    pip install pymysql pymongo
  3. Environment Variables set (recommended) or updated in orchestrator.py:
    export MARIA_DB_PASS="your_pass"
    export MONGO_ADMIN_PASS="your_mongo_pass"
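
A script can read these variables and fail fast when one is missing. The following is a minimal sketch; the `load_credentials` helper is hypothetical and not necessarily how the project's scripts read them.

```python
import os

REQUIRED_VARS = ("MARIA_DB_PASS", "MONGO_ADMIN_PASS")

def load_credentials(env=None):
    """Return the required secrets, failing fast if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_credentials()` at startup surfaces a missing variable before any database connection is attempted.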

Execution Steps

  1. Run the Orchestrator: This script wipes the DBs, applies all SQL schemas (M1-M6), sets up security, generates data, and performs the initial migration.

    python3 orchestrator.py
  2. Start Real-time Sync: Open a new terminal to keep the sync worker running.

    python3 mongo_sync_worker.py
  3. Verify: Make changes in MariaDB (e.g., UPDATE Customers...) and observe them appear in MongoDB on the sync worker's next poll.


📂 Project Structure

├── orchestrator.py        # MASTER SCRIPT: Sets up the entire environment
├── generator.py           # Generates dummy data for MariaDB
├── migrate2.py            # Performs initial bulk load from MariaDB -> MongoDB
├── mongo_sync_worker.py   # REAL-TIME WORKER: Polls queues and syncs data
│
├── SQL_Scripts/
│   ├── milestone1.sql     # Core Schema
│   ├── milestone2.sql     # Stored Procedures & Views
│   ├── milestone4.sql     # Schema Evolution
│   ├── security.sql       # RBAC, PII Views, Governance Tables
│   └── sync.sql           # Sync Triggers & Queue Definitions
│
├── Docs/
│   ├── Project.docx       # Original Project Scope
│   ├── Technical_M3.pdf   # Clustering Documentation
│   ├── Technical_M4.pdf   # Vector Search, Gemini API, ETL Pipeline
│   └── Technical_M5.pdf   # MariaDB to MongoDB
└── Test Scripts/


📽️ Project Artifacts (Presentations & Demos)

Below are the deliverables for each phase of the AetherMart project.

Watch Playlist

| Milestone | Focus Area | Presentation (PPT) | Video |
|---|---|---|---|
| Milestone 1 | Schema Design & Normalization | View PPT | Watch Video |
| Milestone 2 | Advanced SQL & Automation | View PPT | Watch Video |
| Milestone 3 | High Availability (Galera/Replication) | View PPT | Watch Video |
| Milestone 4 | Schema Evolution & Optimization | View PPT | Watch Video |
| Milestone 5 | NoSQL Introduction | View PPT | Watch Video |
| Milestone 6 | Final Mastery (Hybrid Real-time Sync) | View PPT | Watch Video |

🀝 Contributing

This repository is actively expanding. Feel free to:

  • Open issues
  • Contribute enhancements via pull requests

License / Copyright

© 2026 Yash Chetan Doshi. All rights reserved.

You may not copy, modify, distribute, or use any part of this repository or its contents without prior written permission from the author.

🔗 Connect

Yash Doshi Email
