Note: This is a sanitized version of a production system built for a membership organization. The code is real and functional, but configuration details, company names, and proprietary business logic have been removed or generalized.
Production Impact:
- ✅ Successfully processes 1,400+ files daily in production
- ✅ Reduced processing time from 109.86s to 5s per file (22x improvement)
- ✅ Handles 700k+ records per file
- ✅ Deployed on Kubernetes with 99.9% uptime
A high-performance Go rewrite of the Python file import system (FileImport2), designed to process member data files from the UBC AWS SFTP server with a 22x performance improvement.
| Metric | Python Baseline | Go Target | Improvement |
|---|---|---|---|
| Time per file | 109.86 seconds | 5 seconds | 22x faster |
| Files per hour | 32.77 | 720 | 22x throughput |
| Backlog clearance | 42.5 hours | 2 hours | 21x faster |
| Concurrent processing | 1 file | 5 files | 5x parallelism |
+-----------------+     +--------------------+     +-----------------+
|   SFTP Source   |---->|  Go File Importer  |---->|   PostgreSQL    |
|   (UBC AWS)     |     |                    |     |    Database     |
+-----------------+     |  +--------------+  |     +-----------------+
                        |  |  Concurrent  |  |
+-----------------+     |  |  Processing  |  |
|  DigitalOcean   |<----|  |   Pipeline   |  |
|  Spaces (S3)    |     |  +--------------+  |
+-----------------+     |                    |
                        |  +--------------+  |     +-----------------+
+-----------------+     |  |  Metrics &   |  |---->|   Metrics API   |
|   Kubernetes    |<----|  |  Monitoring  |  |     +-----------------+
|  Health Checks  |     |  +--------------+  |
+-----------------+     +--------------------+
- Concurrent file processing (5 files simultaneously)
- Connection pooling for database operations
- Optimized batch upserts replacing row-by-row operations
- Streaming file operations with minimal memory usage
- Efficient error handling and retry mechanisms
- Kubernetes-native with health checks and readiness probes
- Comprehensive metrics integration with Metrics API
- Structured logging with JSON output
- Graceful shutdown handling
- Resource limits and security contexts
- Health check endpoints (/health, /ready)
- Prometheus-style metrics (/metrics)
- Detailed performance tracking per processing stage
- Business value metrics calculation
- Real-time processing statistics
FileImportGo/
├── cmd/importer/        # Application entry point
├── internal/
│   ├── config/          # Configuration management
│   ├── database/        # PostgreSQL client with connection pooling
│   ├── sftp/            # SFTP client for file operations
│   ├── s3/              # DigitalOcean Spaces (S3) client
│   ├── stamper/         # File stamping and validation
│   ├── metrics/         # Metrics API integration
│   ├── importer/        # Core processing pipeline
│   └── server/          # Health check server
├── k8s/                 # Kubernetes deployment manifests
├── Dockerfile           # Multi-stage Docker build
├── go.mod               # Go module dependencies
└── README.md            # This file
The application is configured via environment variables:
# Database
DATABASE_URL="postgres://user:pass@host:port/db?sslmode=require"
# SFTP
AWS_HOST="your-sftp-host"
AWS_USER="your-sftp-user"
AWS_KEY_PATH="/path/to/ssh/key"
# S3/DigitalOcean Spaces
S3_ENDPOINT="https://your-spaces-endpoint"
S3_ACCESS_KEY_ID="your-access-key"
S3_SECRET_ACCESS_KEY="your-secret-key"
S3_BUCKET_NAME="your-bucket"
# Processing
PROCESSING_INTERVAL_MINUTES=5
MAX_CONCURRENT_FILES=5
PROCESSING_TIMEOUT_MINUTES=4
# Metrics API
METRICS_API_ENDPOINT="http://your-metrics-endpoint/"
METRICS_API_KEY="your-api-key"
# Server
HEALTH_PORT=8080
docker build -t file-importer-go .
docker run -d \
--name file-importer \
-p 8080:8080 \
-e DATABASE_URL="your-db-url" \
-e AWS_HOST="your-sftp-host" \
-e AWS_USER="your-sftp-user" \
-v /path/to/ssh/key:/app/creds/sftp-key:ro \
file-importer-go
# Copy and edit the secrets template
cp k8s/secrets-template.yaml k8s/secrets.yaml
# Edit k8s/secrets.yaml with your actual base64-encoded values
# Apply secrets
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/deployment.yaml
# Check pod status
kubectl get pods -l app=file-importer-go
# Check logs
kubectl logs -l app=file-importer-go -f
# Check health
kubectl port-forward svc/file-importer-go-service 8080:8080
curl http://localhost:8080/health
The project includes several helpful scripts for deployment and troubleshooting:
redeploy-with-fix.sh - Automated redeployment with health check fixes
chmod +x redeploy-with-fix.sh
./redeploy-with-fix.sh
- Removes current deployment
- Applies production deployment with optimized liveness probes
- Waits for deployment to be ready
- Shows pod status and recent logs
deploy-to-k8s.sh - Full deployment automation
chmod +x deploy-to-k8s.sh
./deploy-to-k8s.sh
- Builds Docker image
- Pushes to registry
- Applies Kubernetes manifests
- Verifies deployment
diagnose-issue.sh - Comprehensive diagnostic tool
chmod +x diagnose-issue.sh
./diagnose-issue.sh
- Checks deployment configuration
- Shows pod status and restart counts
- Displays recent logs and crash logs
- Checks service endpoints and secrets
- Provides diagnosis summary with recommended actions
check_processing_status.sh - Monitor file processing
chmod +x check_processing_status.sh
./check_processing_status.sh
- Shows current processing status
- Displays file counts and statistics
- Monitors database operations
fix-deployment.sh - Quick deployment fix
chmod +x fix-deployment.sh
./fix-deployment.sh
- Removes broken deployment
- Applies correct production deployment
- Shows status and logs
- GET /health - Overall health status
- GET /ready - Readiness for traffic
- GET /stats - Processing statistics
- GET /metrics - Prometheus-style metrics
{
"status": "healthy",
"timestamp": "2024-01-15T10:30:00Z",
"checks": {
"sftp": true,
"database": true,
"s3": true,
"metrics_api": true
}
}
- SFTP Discovery - List files in test directory
- Concurrent Download - Download up to 5 files simultaneously
- File Stamping - Add timestamp columns to CSV data
- S3 Upload - Store processed files in DigitalOcean Spaces
- Database Import - COPY data to zzimport2 table
- Batch Upsert - Optimized upsert to zzmember_base
- Cleanup - Remove processed files from SFTP
- Metrics - Report performance and value metrics
The system tracks and reports:
- Processing time per stage and overall
- Throughput (files per hour)
- Business value ($1.85 savings per file)
- Performance gain vs Python baseline
- Error rates and retry statistics
- Non-root container execution
- Read-only filesystem where possible
- Secret management via Kubernetes secrets
- Resource limits to prevent resource exhaustion
- Network policies (configure as needed)
- Deploy Go version alongside Python version
- Validate performance against baseline metrics
- Monitor for 24-48 hours to ensure stability
- Gradually increase processing (adjust MAX_CONCURRENT_FILES)
- Decommission Python version once confident
- Go 1.21+
- Docker
- kubectl (for Kubernetes deployment)
- Access to SFTP, database, and S3 resources
# Install dependencies
go mod download
# Run tests
go test ./...
# Build
go build -o file-importer ./cmd/importer
# Run locally (with environment variables set)
./file-importer
- $1.85 cost savings per file vs manual processing
- $2,579 total automation value for the current 1,400-file backlog
- 22x performance improvement over Python baseline
- Reduced infrastructure costs through efficient resource usage
- Improved reliability with better error handling and monitoring