aws-samples · ybezsonov · Oct 2, 2025 · Sep 30, 2025
diff --git a/apps/jvm-analysis-service/.gitattributes b/apps/jvm-analysis-service/.gitattributes
@@ -0,0 +1,2 @@
+/mvnw text eol=lf
+*.cmd text eol=crlf
diff --git a/apps/jvm-analysis-service/.gitignore b/apps/jvm-analysis-service/.gitignore
@@ -0,0 +1,33 @@
+HELP.md
+target/
+.mvn/wrapper/maven-wrapper.jar
+!**/src/main/**/target/
+!**/src/test/**/target/
+
+### STS ###
+.apt_generated
+.classpath
+.factorypath
+.project
+.settings
+.springBeans
+.sts4-cache
+
+### IntelliJ IDEA ###
+.idea
+*.iws
+*.iml
+*.ipr
+
+### NetBeans ###
+/nbproject/private/
+/nbbuild/
+/dist/
+/nbdist/
+/.nb-gradle/
+build/
+!**/src/main/**/build/
+!**/src/test/**/build/
+
+### VS Code ###
+.vscode/
diff --git a/apps/jvm-analysis-service/README.md b/apps/jvm-analysis-service/README.md
@@ -0,0 +1,260 @@
+# JVM Analysis Service
+
+A Spring Boot microservice that automates JVM performance analysis. It processes alert webhooks, retrieves thread dumps and profiling data, generates flame graphs, and uses AWS Bedrock to analyze the results and recommend fixes.
+
+## Features
+
+- **Automated JVM Analysis**: Processes performance alerts and generates comprehensive analysis reports
+- **AI-Powered Recommendations**: Uses AWS Bedrock (Claude 3.7 Sonnet) for intelligent performance insights
+- **Flame Graph Generation**: Converts profiling data to interactive HTML flame graphs
+- **S3 Integration**: Stores and retrieves profiling data, thread dumps, and analysis results
+- **Resilient Design**: Built-in retry mechanisms for external service calls
+
+## Architecture
+
+### Components
+
+- **JvmAnalysisController**: REST API endpoint for webhook processing
+- **JvmAnalysisService**: Core business logic orchestrating the analysis workflow
+- **AIRecommendation**: AWS Bedrock integration for AI-powered analysis
+- **FlameGraphConverter**: Converts collapsed profiling data to HTML flame graphs
+- **S3Connector**: Handles all S3 operations for data storage and retrieval
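
The `FlameGraphConverter` is described as consuming "collapsed" profiling data. In the common collapsed-stack text format, each line holds one semicolon-separated stack (root frame first) followed by a sample count. The sketch below folds such lines into a frame tree, which is the structure a flame graph draws. It is an illustration only: `CollapsedStacks` and `Frame` are hypothetical names, and the converter in this PR may parse or render differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the FlameGraphConverter from this PR.
public class CollapsedStacks {

    // One node per frame; `samples` counts this frame plus everything it called.
    public static final class Frame {
        public final String name;
        public long samples;
        public final Map<String, Frame> children = new LinkedHashMap<>();

        public Frame(String name) {
            this.name = name;
        }

        // Returns the child with the given name, creating it if it does not exist yet.
        public Frame child(String childName) {
            return children.computeIfAbsent(childName, Frame::new);
        }
    }

    // Parses lines such as "main;handle;parse 12" into a tree rooted at "all".
    public static Frame parse(List<String> lines) {
        Frame root = new Frame("all");
        for (String line : lines) {
            String trimmed = line.trim();
            int lastSpace = trimmed.lastIndexOf(' ');
            if (lastSpace <= 0) {
                continue; // blank line or no sample count
            }
            long count;
            try {
                count = Long.parseLong(trimmed.substring(lastSpace + 1));
            } catch (NumberFormatException e) {
                continue; // trailing token is not a number
            }
            root.samples += count;
            Frame current = root;
            for (String frameName : trimmed.substring(0, lastSpace).split(";")) {
                current = current.child(frameName);
                current.samples += count;
            }
        }
        return root;
    }

    public static void main(String[] args) {
        Frame root = parse(List.of(
                "main;handle;parse 12",
                "main;handle;render 5",
                "main;gc 3"));
        System.out.println(root.samples);                               // prints 20
        System.out.println(root.child("main").child("handle").samples); // prints 17
    }
}
```

The width of each flame graph box is proportional to `samples`, so a renderer only has to walk this tree depth-first.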
+
+### Workflow
+
+1. Receives alert webhook with pod information
+2. Retrieves thread dump from target pod
+3. Fetches latest profiling data from S3
+4. Converts profiling data to flame graph
+5. Analyzes performance using AI recommendations
+6. Stores results (thread dump, flame graph, analysis) in S3
+
+## API Reference
+
+### POST /webhook
+
+Processes performance alert webhooks and triggers JVM analysis.
+
+**Request Body:**
+```json
+{
+  "alerts": [
+    {
+      "labels": {
+        "pod": "my-app-pod-123",
+        "instance": "10.0.1.100:8080"
+      }
+    }
+  ]
+}
+```
+
+**Response:**
+```json
+{
+  "message": "Processed alerts",
+  "count": 1
+}
+```
+
+**Status Codes:**
+- `200 OK`: Successfully processed alerts
+- `400 Bad Request`: Invalid request format
+- `500 Internal Server Error`: Processing failed
+
+### Health Endpoints
+
+- `GET /actuator/health`: Application health status
+- `GET /health`: Custom health endpoint for readiness probe
+
+## Configuration
+
+### Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AWS_REGION` | AWS region for services | `us-east-1` |
+| `AWS_S3_BUCKET` | S3 bucket for data storage | `default_bucket_name` |
+| `AWS_S3_PREFIX_ANALYSIS` | S3 prefix for analysis results | `analysis/` |
+| `AWS_S3_PREFIX_PROFILING` | S3 prefix for profiling data | `profiling/` |
+| `AWS_BEDROCK_MODEL_ID` | Bedrock model identifier | `us.anthropic.claude-3-7-sonnet-20250219-v1:0` |
+| `AWS_BEDROCK_MAX_TOKENS` | Maximum tokens for AI analysis | `10000` |
+| `THREADDUMP_URL_TEMPLATE` | Thread dump endpoint template | `http://{podIp}:8080/actuator/threaddump` |
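
The default `THREADDUMP_URL_TEMPLATE` carries a `{podIp}` placeholder. The README does not say where the pod IP comes from; the sketch below assumes it is the host part of the alert's `instance` label (`10.0.1.100:8080` in the request example). `ThreadDumpUrl` and its methods are hypothetical names used for illustration, not code from this PR.

```java
// Hypothetical illustration of expanding THREADDUMP_URL_TEMPLATE.
public class ThreadDumpUrl {

    // Assumption: the pod IP is the host part of the "instance" label ("host:port").
    // IPv6 literals are not handled in this sketch.
    public static String podIpFromInstance(String instance) {
        int colon = instance.lastIndexOf(':');
        return colon < 0 ? instance : instance.substring(0, colon);
    }

    // Replaces the {podIp} placeholder in the template with the given address.
    public static String resolve(String template, String podIp) {
        return template.replace("{podIp}", podIp);
    }

    public static void main(String[] args) {
        String template = "http://{podIp}:8080/actuator/threaddump";
        String podIp = podIpFromInstance("10.0.1.100:8080");
        System.out.println(resolve(template, podIp));
        // prints http://10.0.1.100:8080/actuator/threaddump
    }
}
```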
+
+### Application Properties
+
+```properties
+# Resilience4J retry configuration
+resilience4j.retry.instances.threadDump.max-attempts=3
+resilience4j.retry.instances.threadDump.wait-duration=2s
+resilience4j.retry.instances.threadDump.exponential-backoff-multiplier=2
+```
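
With `max-attempts=3`, a failing thread dump call is tried three times, so there are at most two waits between attempts. If exponential backoff is active, those waits are 2s and then 4s. Resilience4j's Spring Boot properties appear to apply `exponential-backoff-multiplier` only when `enable-exponential-backoff=true` is also set for the instance, which the snippet above does not show, so it is worth checking the real `application.properties`; without it the wait would stay fixed at 2s. The sketch below reproduces only the arithmetic in plain Java. It does not use Resilience4j, and `RetrySchedule` is a hypothetical name.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an exponential backoff schedule.
public class RetrySchedule {

    // Waits between attempts: initial * multiplier^i for i = 0 .. maxAttempts - 2.
    public static List<Duration> waits(int maxAttempts, Duration initial, double multiplier) {
        List<Duration> result = new ArrayList<>();
        double millis = initial.toMillis();
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            result.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(waits(3, Duration.ofSeconds(2), 2.0)); // prints [PT2S, PT4S]
    }
}
```

In the worst case this adds about 6 seconds of waiting per alert before the thread dump step gives up.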
+
+## Prerequisites
+
+- Java 21+
+- Maven 3.6+
+- AWS Account with appropriate permissions
+- S3 bucket for data storage
+- AWS Bedrock access (Claude 3.7 Sonnet model)
+
+### Required AWS Permissions
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": [
+        "s3:GetObject",
+        "s3:PutObject",
+        "s3:ListBucket"
+      ],
+      "Resource": [
+        "arn:aws:s3:::your-bucket-name",
+        "arn:aws:s3:::your-bucket-name/*"
+      ]
+    },
+    {
+      "Effect": "Allow",
+      "Action": [
+        "bedrock:InvokeModel"
+      ],
+      "Resource": ["arn:aws:bedrock:*:*:foundation-model/anthropic.claude-3-7-sonnet-*", "arn:aws:bedrock:*:*:inference-profile/us.anthropic.claude-3-7-sonnet-*"]
+    }
+  ]
+}
+```
+
+## Development
+
+### Build
+
+```bash
+mvn clean compile
+```
+
+### Test
+
+```bash
+mvn test
+```
+
+### Package
+
+```bash
+mvn clean package
+```
+
+### Run Locally
+
+```bash
+mvn spring-boot:run
+```
+
+### Docker Build
+
+```bash
+mvn compile jib:dockerBuild
+```
+
+## Deployment
+
+### Kubernetes
+
+1. **Set up AWS permissions:**
+   ```bash
+   ./k8s/enable-s3-bedrock-access.sh
+   ```
+
+2. **Deploy to cluster:**
+   ```bash
+   ./k8s/deploy.sh
+   ```
+
+### Manual Deployment
+
+```bash
+# Apply Kubernetes manifests; envsubst fills ${ECR_URI} and ${S3_BUCKET}, which must be exported first
+envsubst < k8s/deloyment.yaml | kubectl apply -f -
+
+# Wait for deployment
+kubectl wait deployment jvm-analysis-service -n monitoring --for condition=Available=True --timeout=120s
+
+# Check logs
+kubectl logs -l app=jvm-analysis-service -n monitoring
+```
+
+## Monitoring
+
+### Health Checks
+
+- **Readiness Probe**: `GET /health` (30s initial delay, 10s interval)
+- **Liveness Probe**: `GET /actuator/health` (60s initial delay, 30s interval)
+
+### Resource Requirements
+
+- **CPU**: 1 core (request and limit)
+- **Memory**: 2Gi (request and limit)
+
+## Data Storage
+
+### S3 Structure
+
+```
+bucket/
+├── profiling/
+│   └── {pod-name}/
+│       └── {date}/
+│           └── {timestamp}.txt
+└── analysis/
+    ├── {timestamp}_profiling_{pod-name}.txt
+    ├── {timestamp}_profiling_{pod-name}.html
+    ├── {timestamp}_threaddump_{pod-name}.json
+    └── {timestamp}_analysis_{pod-name}.md
+```
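
The analysis objects follow a `{timestamp}_{kind}_{pod-name}.{ext}` pattern under the `analysis/` prefix. A minimal sketch of building such keys is shown below. The timestamp format (`yyyyMMdd-HHmmss` in UTC) is an assumption, since the README does not specify one, and `AnalysisKeys` is a hypothetical name rather than part of `S3Connector`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the analysis/ key layout.
public class AnalysisKeys {

    // Assumed timestamp format; the README does not say which one the service uses.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(ZoneOffset.UTC);

    // kind: "profiling", "threaddump" or "analysis"; extension: "txt", "html", "json" or "md".
    public static String key(String prefix, Instant when, String kind, String podName, String extension) {
        return prefix + TIMESTAMP.format(when) + "_" + kind + "_" + podName + "." + extension;
    }

    public static void main(String[] args) {
        Instant when = Instant.parse("2025-09-30T12:34:56Z");
        System.out.println(key("analysis/", when, "threaddump", "my-app-pod-123", "json"));
        // prints analysis/20250930-123456_threaddump_my-app-pod-123.json
    }
}
```

Putting the timestamp first means the four artifacts of one analysis run sort next to each other when the prefix is listed.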
+
+## AI Analysis Output
+
+The service generates comprehensive analysis reports including:
+
+- **Health Status**: Overall application health rating
+- **Thread Analysis**: Thread state distribution and patterns
+- **Top Issues**: Critical performance problems with root causes
+- **Performance Hotspots**: CPU consumers and bottlenecks from flame graphs
+- **Recommendations**: Immediate and short-term improvement suggestions
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Thread dump retrieval fails**
+   - Verify pod IP and port accessibility
+   - Check actuator endpoints are enabled on target pods
+
+2. **S3 access denied**
+   - Verify AWS credentials and permissions
+   - Check bucket name and region configuration
+
+3. **Bedrock model access**
+   - Ensure model is available in your region
+   - Verify Bedrock permissions and quotas
+
+### Logs
+
+Check application logs for detailed error information:
+```bash
+kubectl logs -l app=jvm-analysis-service -n monitoring -f
+```
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Make changes with appropriate tests
+4. Submit a pull request
+
+## License
+
+This project is licensed under the MIT License.
diff --git a/apps/jvm-analysis-service/k8s/deloyment.yaml b/apps/jvm-analysis-service/k8s/deloyment.yaml
@@ -0,0 +1,77 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: jvm-analysis-service
+  namespace: monitoring
+  labels:
+    app: jvm-analysis-service
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: jvm-analysis-service
+  template:
+    metadata:
+      labels:
+        app: jvm-analysis-service
+    spec:
+      serviceAccountName: jvm-analysis-service
+      containers:
+        - name: jvm-analysis-service
+          resources:
+            requests:
+              cpu: "1"
+              memory: "2Gi"
+            limits:
+              cpu: "1"
+              memory: "2Gi"
+          image: ${ECR_URI}:latest
+          ports:
+            - containerPort: 8080
+          env:
+            - name: AWS_REGION
+              value: "${AWS_REGION:-us-east-1}"
+            - name: AWS_S3_BUCKET
+              value: "${S3_BUCKET}"
+            - name: AWS_S3_PREFIX_ANALYSIS
+              value: "analysis/"
+            - name: AWS_S3_PREFIX_PROFILING
+              value: "profiling/"
+            - name: SPRING_AI_BEDROCK_CONVERSE_CHAT_OPTIONS_MODEL
+              value: "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
+            - name: SPRING_AI_BEDROCK_CONVERSE_CHAT_OPTIONS_MAX_TOKENS
+              value: "10000"
+          readinessProbe:
+            httpGet:
+              path: /health
+              port: 8080
+            initialDelaySeconds: 30
+            periodSeconds: 10
+          livenessProbe:
+            httpGet:
+              path: /actuator/health
+              port: 8080
+            initialDelaySeconds: 60
+            periodSeconds: 30
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: jvm-analysis-service
+  namespace: monitoring
+  labels:
+    app: jvm-analysis-service
+spec:
+  selector:
+    app: jvm-analysis-service
+  ports:
+    - port: 80
+      targetPort: 8080
+      protocol: TCP
+  type: ClusterIP
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: jvm-analysis-service
+  namespace: monitoring
diff --git a/apps/jvm-analysis-service/k8s/deploy.sh b/apps/jvm-analysis-service/k8s/deploy.sh
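@@ -0,0 +1,9 @@
+#!/bin/bash
+cd "$(dirname "$0")"  # run from the k8s/ directory, where the manifest lives
+# Assumption: the ${...} placeholders in the manifest are meant to be rendered with envsubst,
+# which only sees exported variables. ECR_URI must already be exported by the caller.
+export S3_BUCKET="$(aws ssm get-parameter --name unicornstore-lambda-bucket-name --query 'Parameter.Value' --output text)"
+envsubst < deloyment.yaml | kubectl apply -f -
+kubectl wait deployment jvm-analysis-service -n monitoring --for condition=Available=True --timeout=120s
+sleep 15
+kubectl logs $(kubectl get pods -n monitoring -l app=jvm-analysis-service --field-selector=status.phase=Running -o json | jq -r '.items[0].metadata.name') -n monitoring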
diff --git a/apps/jvm-analysis-service/k8s/enable-s3-bedrock-access.sh b/apps/jvm-analysis-service/k8s/enable-s3-bedrock-access.sh
@@ -0,0 +1,10 @@
+CLUSTER_NAME=$(kubectl config current-context | cut -d'/' -f2)
+
+if ! aws eks list-pod-identity-associations --cluster-name "$CLUSTER_NAME" --query "associations[?serviceAccount=='jvm-analysis-service' && namespace=='monitoring']" --output text | grep -q .; then
+    aws eks create-pod-identity-association \
+        --cluster-name "$CLUSTER_NAME" \
+        --namespace monitoring \
+        --service-account jvm-analysis-service \
+        --role-arn "$(aws iam get-role --role-name jvm-analysis-service-eks-pod-role --query 'Role.Arn' --output text)"
+fi
+sleep 15