This is a learning project by Nguyen Son: building a complete Internal Developer Platform with a production-ready architecture. Feedback, suggestions, and pull requests are all welcome. Feel free to open an issue or reach out directly if you have ideas for improvement!
Enterprise-grade platform engineering solution for modern DevOps teams
A production-ready Internal Developer Platform that lets engineering teams provision infrastructure, deploy applications, and manage environments themselves through a unified portal. Built on cloud-native principles, GitOps workflows, and enterprise security patterns.
The project includes:
- Self-service portal (React) for managing services, deployments, and environments
- API Gateway (Node.js/Express) handling authentication, authorization, and rate limiting
- 10 Terraform modules for AWS infrastructure provisioning (EKS, RDS, ElastiCache, VPC)
- 14 GitHub Actions workflows covering CI, CD, security scanning, and compliance
- Full Kubernetes stack with Istio service mesh, Flagger canary deployments, and LitmusChaos experiments for resilience testing
- Observability stack with Prometheus, Grafana, Jaeger, and Loki
- Event-driven architecture using NATS JetStream for real-time deployment notifications
- Comprehensive documentation including ADRs, runbooks, SLOs, and onboarding guides
- Monorepo with Turborepo for optimized builds with remote caching
Self-service UI for managing services, deployments, and environments.
Screenshots: Login, Dashboard, Service Catalog, Deployments, Health Monitoring, Environments, Full Portal View, Incidents, Audit Log, API Docs, API Health.
┌─────────────────────────────────────────────────────────────────────────┐
│                        Developer Portal (React)                         │
│                    Self-Service UI • Service Catalog                    │
└──────────────────────────────────┬──────────────────────────────────────┘
                                   │ HTTPS/WSS
┌──────────────────────────────────▼──────────────────────────────────────┐
│                          API Gateway (Node.js)                          │
│               Auth • RBAC • Rate Limiting • Audit Logging               │
└───────┬──────────────┬──────────────┬──────────────┬────────────────────┘
        │              │              │              │
   ┌────▼────┐    ┌────▼────┐    ┌────▼────┐    ┌────▼────┐
   │ Service │    │  Infra  │    │ Deploy  │    │ Config  │
   │ Catalog │    │Provision│    │ Engine  │    │  Mgmt   │
   └────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘
        │              │              │              │
┌───────▼──────────────▼──────────────▼──────────────▼────────────────────┐
│                          Infrastructure Layer                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │Kubernetes│  │Terraform │  │  ArgoCD  │  │PostgreSQL│  │  Redis   │   │
│  │  (EKS)   │  │  (IaC)   │  │ (GitOps) │  │   (DB)   │  │ (Cache)  │   │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│                           Observability Stack                           │
│           Prometheus • Grafana • Loki • AlertManager • Jaeger           │
└─────────────────────────────────────────────────────────────────────────┘
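The gateway box in the diagram lists four responsibilities: Auth, RBAC, Rate Limiting, and Audit Logging. The following is a minimal, dependency-free sketch of how such a chain might be ordered. It is not the project's Express code: the roles, actions, token values, `roleForToken`, and `createGateway` are all hypothetical names chosen for illustration, and the rate limiter is a simple per-client counter that never resets.

```typescript
// Hypothetical roles and actions for illustration; not taken from the project.
type PlatformRole = "viewer" | "developer" | "admin";
type PlatformAction = "service:read" | "deploy:create" | "env:delete";

interface GatewayCall {
  token: string | null; // bearer token, assumed already extracted from the header
  clientId: string;     // key used for rate limiting
  action: PlatformAction;
}

interface GatewayOutcome {
  status: 200 | 401 | 403 | 429;
  auditTrail: string[];
}

// Assumed role-to-permission mapping.
const rolePermissions: Record<PlatformRole, PlatformAction[]> = {
  viewer: ["service:read"],
  developer: ["service:read", "deploy:create"],
  admin: ["service:read", "deploy:create", "env:delete"],
};

// Stand-in for real token verification (for example a signed token check).
function roleForToken(token: string | null): PlatformRole | null {
  if (token === "dev-token") return "developer";
  if (token === "admin-token") return "admin";
  return null;
}

function createGateway(maxCallsPerClient: number) {
  const callCounts: Record<string, number> = Object.create(null);
  const auditTrail: string[] = [];

  function handle(call: GatewayCall): GatewayOutcome {
    // Audit logging records every decision, including rejections.
    const finish = (status: GatewayOutcome["status"]): GatewayOutcome => {
      auditTrail.push(call.clientId + " " + call.action + " -> " + status);
      return { status: status, auditTrail: auditTrail.slice() };
    };

    // 1. Auth: reject calls without a recognised token.
    const role = roleForToken(call.token);
    if (role === null) return finish(401);

    // 2. RBAC: the role must be allowed to perform the action.
    if (rolePermissions[role].indexOf(call.action) === -1) return finish(403);

    // 3. Rate limiting: count authorised calls per client.
    const used = (callCounts[call.clientId] || 0) + 1;
    callCounts[call.clientId] = used;
    if (used > maxCallsPerClient) return finish(429);

    return finish(200);
  }

  return { handle: handle };
}
```

Running authentication before authorization and rate limiting means rejected calls do not consume a client's quota in this sketch; a real gateway might choose to throttle unauthenticated traffic as well.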
| Category | Features |
|---|---|
| Self-Service Portal | Service catalog, one-click deployments, environment provisioning |
| Security & Compliance | RBAC, audit logging, secret management, vulnerability scanning |
| Infrastructure as Code | Terraform modules, Kubernetes manifests, GitOps with ArgoCD |
| CI/CD Pipelines | Multi-stage builds, automated testing, canary deployments |
| Observability | Prometheus metrics, Grafana dashboards, distributed tracing |
| Notifications | Real-time WebSocket updates, Slack integration, PagerDuty |
| Multi-Environment | Dev, staging, production with automated promotion workflows |
| Service Mesh | Istio for traffic management, mTLS, and observability |
| Chaos Engineering | LitmusChaos experiments for resilience testing |
| Performance | Turborepo caching, Docker layer optimization, CDN integration |
- Docker and Docker Compose v2.20+
- Node.js >= 20.0.0
- pnpm >= 8.0.0
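
The minimum versions above can be checked with a small numeric comparison. This is only a sketch: `parseVersion` and `meetsMinimum` are hypothetical helpers, not part of the repository, and pre-release suffixes are ignored.

```typescript
// Turn "v20.11.1" or "2.20" into numeric parts; non-numeric parts become 0.
function parseVersion(version: string): number[] {
  return version
    .replace(/^v/, "")
    .split(".")
    .map((part) => parseInt(part, 10) || 0);
}

// Compare major, minor, and patch numerically (so 2.9 is lower than 2.20).
function meetsMinimum(actual: string, minimum: string): boolean {
  const actualParts = parseVersion(actual);
  const minimumParts = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const left = actualParts[i] || 0;
    const right = minimumParts[i] || 0;
    if (left !== right) return left > right;
  }
  return true;
}
```

For example, `meetsMinimum("v20.11.1", "20.0.0")` is true, while `meetsMinimum("2.9.0", "2.20")` is false because the minor versions are compared as numbers rather than strings.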
# Clone the repository
git clone https://github.com/JasonTM17/Internal_Developer_Platform_DevOps.git
cd Internal_Developer_Platform_DevOps
# Copy environment configuration
cp .env.example .env
# Start all services
docker compose up -d
# Access the platform
# Portal: http://localhost:5173
# API: http://localhost:3000
# Grafana: http://localhost:3001

# Install dependencies
pnpm install
# Start development servers
pnpm dev
# Run tests
pnpm test
# Build all packages
pnpm build

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18, TypeScript, Vite | Developer portal UI |
| Backend | Node.js, Express, TypeScript | API server |
| Database | PostgreSQL 16 | Primary data store |
| Cache & Queues | Redis 7, BullMQ | Session management, job queues |
| Event Bus | NATS JetStream | Async event-driven messaging |
| Container Orchestration | Kubernetes (EKS) | Production workloads |
| Service Mesh | Istio | Traffic management, mTLS |
| Infrastructure as Code | Terraform | Cloud resource provisioning |
| GitOps | ArgoCD | Continuous deployment |
| CI/CD | GitHub Actions | Build, test, deploy pipelines |
| Monitoring | Prometheus, Grafana | Metrics and dashboards |
| Logging | Loki, Promtail | Centralized log aggregation |
| Tracing | Jaeger, OpenTelemetry | Distributed tracing |
| Security | Trivy, Snyk, CodeQL | Vulnerability scanning |
| Monorepo | Turborepo, pnpm | Build orchestration |
| Testing | Vitest, Playwright | Unit, integration, E2E |
| Code Quality | ESLint, Prettier, Husky | Linting and formatting |
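
The event bus row mentions async, event-driven messaging over NATS JetStream. As a rough illustration of subject-based routing, here is an in-memory stand-in that follows the NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). It does not use the NATS client and has no persistence; the subject names, `DeploymentNotice`, and `createInMemoryBus` are assumptions made for this example.

```typescript
// NATS-style subject matching: "*" = one token, ">" = one or more trailing tokens.
function subjectMatches(pattern: string, subject: string): boolean {
  const patternTokens = pattern.split(".");
  const subjectTokens = subject.split(".");
  for (let i = 0; i < patternTokens.length; i++) {
    if (patternTokens[i] === ">") return subjectTokens.length > i;
    if (i >= subjectTokens.length) return false;
    if (patternTokens[i] !== "*" && patternTokens[i] !== subjectTokens[i]) return false;
  }
  return patternTokens.length === subjectTokens.length;
}

// Hypothetical payload for a deployment notification.
interface DeploymentNotice {
  service: string;
  environment: string;
  version: string;
}

type NoticeHandler = (subject: string, notice: DeploymentNotice) => void;

// In-memory stand-in for a message bus; delivery is synchronous and not durable.
function createInMemoryBus() {
  const subscriptions: { pattern: string; handler: NoticeHandler }[] = [];
  return {
    subscribe(pattern: string, handler: NoticeHandler): void {
      subscriptions.push({ pattern: pattern, handler: handler });
    },
    // Returns how many subscribers received the notice.
    publish(subject: string, notice: DeploymentNotice): number {
      let delivered = 0;
      subscriptions.forEach((subscription) => {
        if (subjectMatches(subscription.pattern, subject)) {
          subscription.handler(subject, notice);
          delivered++;
        }
      });
      return delivered;
    },
  };
}
```

A subscriber on `deployments.>` would receive every deployment notice, while one on `deployments.*.succeeded` would only see successful deployments in any environment.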
├── apps/
│   ├── api/              # Backend API service (Express + TypeScript)
│   └── portal/           # Frontend developer portal (React + Vite)
├── packages/
│   ├── shared/           # Shared utilities and types
│   ├── ui/               # Shared UI component library
│   ├── cli/              # CLI tool for platform operations
│   └── config/           # Shared ESLint, TypeScript, Prettier configs
├── infra/
│   ├── terraform/        # 10 IaC modules (EKS, RDS, VPC, IAM, etc.)
│   ├── kubernetes/       # K8s manifests and Helm charts
│   ├── argocd/           # GitOps application definitions
│   ├── istio/            # Service mesh configuration
│   ├── flagger/          # Canary deployment automation
│   ├── chaos/            # LitmusChaos experiments
│   └── monitoring/       # Prometheus, Grafana, Loki, Jaeger
├── docs/
│   ├── adr/              # 10 Architecture Decision Records
│   ├── api/              # OpenAPI 3.1 spec + auth/pagination docs
│   ├── architecture/     # Technology radar, system diagrams
│   ├── runbooks/         # Disaster recovery, incident response
│   ├── slo/              # Service Level Objectives
│   └── onboarding/       # Developer onboarding guides
├── scripts/              # Automation and utility scripts
├── .github/
│   ├── workflows/        # 14 CI/CD pipeline definitions
│   └── ISSUE_TEMPLATE/   # Issue and PR templates
├── docker-compose.yaml   # Local development environment
├── turbo.json            # Turborepo pipeline configuration
└── pnpm-workspace.yaml   # Monorepo workspace definition
| Workflow | Trigger | Purpose |
|---|---|---|
| CI | Push, PR | Lint, test, build, type-check |
| Docker Build | Tags, Dockerfile edits | Multi-service container builds to GHCR |
| Security Scan | Weekly + manual | Trivy, Snyk, CodeQL, TruffleHog, Gitleaks |
| CD Dev | Push to develop | Auto-deploy to development environment |
| CD Staging | Push to release/* | Deploy to staging with integration tests |
| CD Production | Manual approval | Blue-green deploy with canary analysis |
| Terraform Plan | PR with infra changes | Preview infrastructure changes |
| Terraform Apply | Merge to main | Apply approved infrastructure changes |
| Release | Version tags | Semantic versioning and changelog |
| Compliance Audit | Weekly | License check, SBOM generation |
| Environment | Branch | URL | Strategy |
|---|---|---|---|
| Development | develop | dev.idp.internal | Auto on push |
| Staging | release/* | staging.idp.internal | Auto on push |
| Production | main | idp.internal | Canary + manual |
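
The branch-to-environment mapping in the table can be expressed as a small lookup. This is a sketch only; `resolveEnvironment` and its types are hypothetical and not part of the project's workflows.

```typescript
type TargetEnvironment = "development" | "staging" | "production";

interface EnvironmentTarget {
  environment: TargetEnvironment;
  url: string;
  strategy: string;
}

// Map a Git branch to its deployment target; unknown branches deploy nowhere.
function resolveEnvironment(branch: string): EnvironmentTarget | null {
  const releasePrefix = "release/";
  if (branch === "develop") {
    return { environment: "development", url: "dev.idp.internal", strategy: "Auto on push" };
  }
  // release/* requires something after the slash.
  if (branch.indexOf(releasePrefix) === 0 && branch.length > releasePrefix.length) {
    return { environment: "staging", url: "staging.idp.internal", strategy: "Auto on push" };
  }
  if (branch === "main") {
    return { environment: "production", url: "idp.internal", strategy: "Canary + manual" };
  }
  return null;
}
```

Feature branches such as `feature/amazing-feature` return `null`, which matches the table: only `develop`, `release/*`, and `main` map to an environment.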
Code Push → CI Pipeline → Build → Security Scan → Deploy to Dev
                                                        ↓
                                                Integration Tests
                                                        ↓
                                               Promote to Staging
                                                        ↓
                                           E2E Tests + Canary Analysis
                                                        ↓
                                       Deploy to Production (manual gate)
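
The flow above can be modelled as an ordered list of stages that stops at the first failure and pauses before production until someone approves. The sketch below is an illustration of that ordering, not the project's GitHub Actions or ArgoCD configuration; `runPipeline` and its result shape are assumed names.

```typescript
// Stage names follow the flow diagram; "Code Push" is the trigger, not a stage.
const pipelineStages = [
  "CI Pipeline",
  "Build",
  "Security Scan",
  "Deploy to Dev",
  "Integration Tests",
  "Promote to Staging",
  "E2E Tests + Canary Analysis",
  "Deploy to Production",
] as const;

type PipelineStage = typeof pipelineStages[number];

interface PipelineRun {
  completed: PipelineStage[];
  stoppedAt: PipelineStage | null;
  reason: "failed" | "awaiting approval" | null;
}

// stagePasses is a hypothetical callback reporting whether a stage succeeded.
function runPipeline(
  stagePasses: (stage: PipelineStage) => boolean,
  productionApproved: boolean
): PipelineRun {
  const completed: PipelineStage[] = [];
  for (const stage of pipelineStages) {
    // Manual gate: production waits for explicit approval.
    if (stage === "Deploy to Production" && !productionApproved) {
      return { completed: completed, stoppedAt: stage, reason: "awaiting approval" };
    }
    if (!stagePasses(stage)) {
      return { completed: completed, stoppedAt: stage, reason: "failed" };
    }
    completed.push(stage);
  }
  return { completed: completed, stoppedAt: null, reason: null };
}
```

With every stage passing and no approval, the run completes the first seven stages and stops at `Deploy to Production` with the reason `awaiting approval`.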
# Automatic rollback on failed health checks (via Flagger)
# Manual rollback via ArgoCD
argocd app rollback <app-name>
# Or via kubectl
kubectl rollout undo deployment/<deployment-name>

Published to GitHub Container Registry:
ghcr.io/jasontm17/internal_developer_platform_devops/idp-api:latest
ghcr.io/jasontm17/internal_developer_platform_devops/idp-portal:latest
Tagged releases:
ghcr.io/jasontm17/internal_developer_platform_devops/idp-api:v1.0.0
ghcr.io/jasontm17/internal_developer_platform_devops/idp-portal:v1.0.0
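
Note that the image paths are lowercase even though the repository name contains capitals, because container image repository names must be lowercase. A small helper along these lines could build the references; `imageReference` is a hypothetical function, not something the project ships.

```typescript
// Build a GHCR image reference; the path is lowercased, the tag is kept as given.
function imageReference(
  owner: string,
  repository: string,
  service: string,
  tag: string = "latest"
): string {
  const path = (owner + "/" + repository + "/" + service).toLowerCase();
  return "ghcr.io/" + path + ":" + tag;
}

// Example using the owner and repository from the clone URL.
const apiImage = imageReference(
  "JasonTM17",
  "Internal_Developer_Platform_DevOps",
  "idp-api",
  "v1.0.0"
);
// apiImage === "ghcr.io/jasontm17/internal_developer_platform_devops/idp-api:v1.0.0"
```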
| Document | Description |
|---|---|
| Architecture Overview | System design and technology radar |
| API Documentation | OpenAPI 3.1 spec and examples |
| ADR Records | 10 Architecture Decision Records |
| Operations Runbooks | Disaster recovery, incident response |
| SLO Definitions | Service Level Objectives |
| Onboarding Guide | New developer setup guide |
| Security Policy | Vulnerability reporting process |
| Contributing Guide | How to contribute |
| Changelog | Release history |
| Release Process | Versioning and release workflow |
| Branching Strategy | Git workflow documentation |
| Roadmap | Feature roadmap and milestones |
# Development
pnpm dev # Start all services in dev mode
pnpm build # Build all packages
pnpm clean # Clean build artifacts
# Code Quality
pnpm lint # Run ESLint across all packages
pnpm lint:fix # Auto-fix lint issues
pnpm format # Format code with Prettier
pnpm format:check # Check formatting
pnpm typecheck # TypeScript type checking
# Testing
pnpm test # Run all tests
pnpm test:unit # Run unit tests only
pnpm test:integration # Run integration tests
# Infrastructure
make terraform-plan # Preview infrastructure changes
make terraform-apply # Apply infrastructure changes
make k8s-deploy # Deploy to Kubernetes

Contributions are welcome, whether a small bug fix, a documentation improvement, or a new feature proposal: feel free to open an issue or pull request. This is a learning project, so every piece of feedback from the community is appreciated. See the Contributing Guide for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project uses Conventional Commits:
- `feat:` New features
- `fix:` Bug fixes
- `docs:` Documentation changes
- `chore:` Maintenance tasks
- `ci:` CI/CD changes
- `refactor:` Code refactoring
- `test:` Test additions/changes
- `perf:` Performance improvements
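
A commit header in this convention has the shape `type(scope)!: subject`, where the scope and the `!` breaking-change marker are optional. The sketch below validates a header against the types listed above; `parseCommitHeader` is a hypothetical helper and is not the project's Husky or commit-lint setup.

```typescript
// Commit types accepted by this project, as listed above.
const commitTypes = ["feat", "fix", "docs", "chore", "ci", "refactor", "test", "perf"] as const;
type CommitType = typeof commitTypes[number];

interface ParsedCommit {
  type: CommitType;
  scope: string | null;
  breaking: boolean;
  subject: string;
}

// Parse "type(scope)!: subject"; returns null when the header does not conform.
function parseCommitHeader(header: string): ParsedCommit | null {
  const match = /^([a-z]+)(?:\(([^)]+)\))?(!)?: (.+)$/.exec(header);
  if (match === null) return null;

  const type = match[1] || "";
  if ((commitTypes as readonly string[]).indexOf(type) === -1) return null;

  return {
    type: type as CommitType,
    scope: match[2] || null,
    breaking: match[3] === "!",
    subject: match[4] || "",
  };
}
```

For example, `parseCommitHeader("feat: add amazing feature")` returns a `feat` commit with no scope, while `parseCommitHeader("feature: add thing")` returns `null` because `feature` is not one of the listed types.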
This project is licensed under the MIT License. See the LICENSE file for details.










