
feat: consolidate Airavata into single JVM with multiplexed Thrift port#593

Merged
yasithdev merged 29 commits into master from feat/single-jvm on Mar 26, 2026

Conversation


@yasithdev yasithdev commented Mar 26, 2026

Summary

Consolidates Airavata from multiple separate Thrift servers into a single JVM: all services are multiplexed on one port, the databases are unified, and all background services start from a single bootstrap.

106 files changed, +2209/-5346

Architecture (before → after)

Before: 8 separate server processes (API, Registry, Sharing, CredentialStore, Profile, Orchestrator + Helix controller/participant/workflow managers as separate scripts), 7 databases, inter-service Thrift RPC via network.

After: Single AiravataServer JVM with 9 multiplexed Thrift services on port 8930, 12 background IServer workers, 1 unified database, zero inter-service network hops.

Changes

AiravataServer — single entry point replacing ServerMain, AiravataAPIServer, RegistryAPIServer, SharingRegistryServer, CredentialStoreServer, OrchestratorServer, ProfileServiceServer

  • 9 Thrift services on TMultiplexedProcessor (port 8930)
  • 12 background services with unified IServer lifecycle (run()/stop()/getName()/getStatus())
  • Helix cluster dependency detection via ZooKeeper polling (no sleep delays)
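The multiplexed Thrift setup described above presumably follows the standard libthrift pattern. A minimal sketch, assuming placeholder service names and no-arg handler constructors (the real AiravataServerHandler takes in-process handler references, per the commits below):

```java
import org.apache.thrift.TMultiplexedProcessor;
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TServerSocket;

// Hypothetical sketch of the multiplexed bootstrap; service names and
// handler wiring are placeholders, not the actual Airavata code.
public class MultiplexedBootstrapSketch {
    public static void main(String[] args) throws Exception {
        TMultiplexedProcessor processor = new TMultiplexedProcessor();
        // Register each generated Thrift processor under a service name;
        // clients must present the same name via TMultiplexedProtocol.
        processor.registerProcessor("Airavata",
                new Airavata.Processor<>(new AiravataServerHandler()));
        processor.registerProcessor("RegistryService",
                new RegistryService.Processor<>(new RegistryServerHandler()));
        // ... remaining services registered the same way ...

        TServerSocket socket = new TServerSocket(8930);
        TServer server = new TThreadPoolServer(
                new TThreadPoolServer.Args(socket).processor(processor));
        server.serve(); // one port, all services
    }
}
```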

AiravataServerHandler — 759 ThriftClientPool RPC calls → direct in-process handler calls

IServer — simplified to extend Runnable with run()/stop()/getName()/getStatus(). Thread management is handled by the caller. All services implement it directly (HelixController, GlobalParticipant, Pre/Post/ParserWorkflowManager, EmailBasedMonitor, RealtimeMonitor, DBEventManagerRunner, MonitoringServer, etc.)
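A rough pure-Java sketch of this lean contract and the caller-managed daemon threading; everything beyond the four interface methods (the bootstrap class, registerAndStart, the error handling) is an illustrative guess based on the description above, not the actual Airavata implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Lean lifecycle contract: blocking work goes in run() (from Runnable).
interface IServer extends Runnable {
    void stop() throws Exception;
    String getName();
    String getStatus();
}

// Hypothetical bootstrap: the caller owns thread management, so each
// service's blocking run() executes on its own daemon thread and a
// failure in one service is logged without affecting the others.
class BootstrapSketch {
    private final List<IServer> backgroundServices = new ArrayList<>();

    void registerAndStart(IServer service) {
        backgroundServices.add(service);
        Thread t = new Thread(() -> {
            try {
                service.run();
            } catch (RuntimeException e) {
                System.err.println(service.getName() + " failed: " + e);
            }
        }, service.getName());
        t.setDaemon(true);
        t.start();
    }

    List<IServer> services() { return backgroundServices; }
}
```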

AbstractMonitor — flattened from abstract class to interface (composition over inheritance)

WorkflowManager — flattened to a helper class (Pre/Post/Parser managers use it via composition)

Database — all 7 JDBC configs read airavata.jdbc.* properties. Single airavata database.

ThriftClientPool — added serviceName parameter for TMultiplexedProtocol. All internal callers (WorkflowManager, AbstractMonitor, RegistryServiceDBEventHandler, metascheduler Utils, SharingRegistryServiceClientFactory) updated.
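Client-side, the serviceName presumably ends up in a TMultiplexedProtocol wrapper around the base protocol. A sketch of that wiring, with "RegistryService" standing in for whatever name a given caller passes to the pool:

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TMultiplexedProtocol;
import org.apache.thrift.transport.TSocket;

// Illustrative client-side wiring, not the actual ThriftClientPool code.
public class MultiplexedClientSketch {
    public static RegistryService.Client connect() throws Exception {
        TSocket transport = new TSocket("localhost", 8930);
        transport.open();
        // The wrapped protocol prefixes each call with the service name,
        // which must match the name registered on the server side.
        TMultiplexedProtocol protocol = new TMultiplexedProtocol(
                new TBinaryProtocol(transport), "RegistryService");
        return new RegistryService.Client(protocol);
    }
}
```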

Python SDK — all 7 clients use TMultiplexedProtocol on single port. Removed separate host/port configs.

DevOps — ansible templates, deployment scripts, compose.yml updated. GFac references removed. All docs and READMEs rewritten with mermaid diagrams.

Tests — DatabaseTestCases migrated from Derby to Testcontainers MariaDB.

Startup sequence

docker compose up -d  →  MariaDB, RabbitMQ, ZooKeeper, Kafka, Keycloak
./scripts/setup.sh    →  generate keystores, build
./scripts/start.sh    →  AiravataServer bootstrap:
  1. DB init (7 catalogs, unified schema)
  2. Thrift handlers (9 services, in-process)
  3. TMultiplexedProcessor on :8930
  4. DBEventManager, MonitoringServer
  5. HelixController → waitForHelixCluster()
  6. HelixParticipant (19 task classes)
  7. Pre/Post/Parser WorkflowManagers
  8. Email/Realtime Monitors

Verified from clean teardown

Component Status
9 Thrift services (:8930) 0 errors
12 background services all healthy
Agent Service (:18880) running
File Server (:8050) running
Research Service (:18889) running
REST Proxy (:8082) running
Python SDK 7/7 connected
MariaDB (149 tables) connected
RabbitMQ (11 exchanges) connected
ZooKeeper (Helix cluster) connected
Kafka (realtime monitor) connected

Creates AiravataUnifiedServer.java hosting Airavata, RegistryService,
SharingRegistry, CredentialStore, and all four profile services on a
single TMultiplexedProcessor bound to port 8930 (apiserver.port).
Does not compile until Task 3 adds the AiravataServerHandler constructor
that accepts in-process handler references.
…method calls

AiravataServerHandler now takes RegistryServerHandler, SharingRegistryServerHandler,
and CredentialStoreServerHandler as constructor dependencies instead of communicating
via ThriftClientPool over the network. All ~759 pool references (getResource,
returnResource, returnBrokenResource) have been eliminated. Private helper methods
that previously accepted Client parameters now use the handler fields directly.
… port via TMultiplexedProtocol

All Thrift service clients (Airavata, SharingRegistry, CredentialStore, and
profile services) now connect to API_SERVER_HOSTNAME:API_SERVER_PORT with
service-specific names via TMultiplexedProtocol. Removed separate host/port
settings for PROFILE_SERVICE, SHARING_API, and CREDENTIAL_STORE.
Delete ServerMain, RegistryAPIServer, SharingRegistryServer,
CredentialStoreServer, AiravataAPIServer and the nine per-service shell
scripts (orchestrator, controller, participant, sharing-registry, etc.)
now that AiravataUnifiedServer consolidates all Thrift services in a
single JVM.

Add airavata.sh as the single entry-point startup script.

Fix SharingServiceDBEventHandler to inline the two string constants that
were imported from the deleted SharingRegistryServer class.
- docker-startup.sh: Start single AiravataUnifiedServer instead of separate services
- airavata-server.properties: Remove per-service host/port configs, keep unified DB config
- Ansible start_services.yml: Start unified-server.sh instead of orchestrator/controller/etc
- Ansible stop_services.yml: Stop unified-server and simplify port/log directory configs
- Ansible defaults/main.yml: Replace service class configs with unified_server_class
- services_up.sh: Start unified-server instead of separate services
- services_down.sh: Stop unified-server instead of separate services
- deploy_api_server.yml: Make unified-server.sh executable

All services now multiplexed on port 8930 via AiravataUnifiedServer.
- Add missing WorkflowCatalog and CredentialStore DB init configs
- Fix Python SDK _validate_transport for services without getAPIVersion
- Unify test and deployment airavata-server.properties to airavata.jdbc.*
…e properties

- Change apiserver.host from airavata.localhost to 0.0.0.0
- Change all airavata.localhost refs to localhost in properties
- Add null guards on experimentPublisher.publish() calls
- Update apiserver.class to AiravataUnifiedServer

Server now starts successfully with all 8 multiplexed Thrift services
on port 8930 (requires only MariaDB; RabbitMQ/Kafka/ZK are optional).
…ties

- Add missing CREDENTIALS and COMMUNITY_USER tables to V1 baseline
  (used by credential store via raw JDBC, not JPA)
- Change all airavata.localhost to localhost in properties
- Change apiserver.host to 0.0.0.0
- Fix Kafka advertised host in compose.yml
- Add null guards for experimentPublisher
- Update .gitignore for generated keystores

Server now starts cleanly with: docker compose up -d && mvn package && java -cp ... AiravataUnifiedServer
setup.sh: starts docker infra, generates keystores, builds
start.sh: launches AiravataUnifiedServer with correct classpath

Usage:
  ./scripts/setup.sh   # first time
  ./scripts/start.sh   # subsequent runs
- Add OrchestratorService to TMultiplexedProcessor (optional,
  graceful failure if ZooKeeper not available)
- Fix setup.sh to use mvn instead of ./mvnw
- 9 services now registered on port 8930

Note: module services (agent-service, file-server, research-service)
are separate Spring Boot apps that run independently.
- ThriftClientPool: add serviceName support for TMultiplexedProtocol
- file-server: use RegistryService via multiplexed protocol on port 8930
- agent-service: fix localhost, ddl-auto=update
- research-service: fix localhost, unified server port
- restproxy: fix localhost
- Register Orchestrator service on unified server
- All modules build, start, and run successfully

Verified services:
  Port 8930:  Unified Thrift (9 services multiplexed)
  Port 18880: Agent Service (Spring Boot)
  Port 8050:  File Server (Spring Boot)
  Port 18889: Research Service (Spring Boot)
  Port 8082:  REST Proxy (Spring Boot)
  Python SDK: connected, API version 0.18.0
Drop 'Unified' terminology everywhere — class, log messages,
scripts, ansible, deployment configs. The server is simply
AiravataServer.
Move conf/db/ (V1 baseline, create-database.sql, README) from
modules/distribution/src/main/resources/ to airavata-api/src/main/resources/.
Update compose.yml reference.
AiravataServer.start() now launches all services in sequence:
1. Database initialization (all catalogs)
2. Thrift handlers (in-process)
3. TMultiplexedProcessor (9 services on port 8930)
4. Background services (each with independent lifecycle):
   - db_event_manager (async DB event processing)
   - monitoring_server (Prometheus metrics on configurable port)
   - cluster_status_monitor (if enabled)
   - data_interpreter (if enabled)
   - process_rescheduler (if enabled)

Each background service starts independently — failure in one
does not affect others or the Thrift server.
- IServer: remove getVersion(), restart(), configure(); fix STOPING→STOPPING
- MonitoringServer: implement IServer with getName(), getStatus(), lifecycle
- ComputationalResourceMonitoringService: fix null getName()
- DBEventManagerRunner: implement stop() via DBEventManagerMessagingFactory.close()
- Delete dead OrchestratorServer and ProfileServiceServer
- All implementations now consistently implement the lean interface
- airavata_services role: unified airavata.jdbc.* config, AiravataServer class
- api-orch role: remove old server classes and per-catalog JDBC
- registry role: same cleanup
- pga role: profile_service_port → 8930
- dev/staging inventories: consolidated JDBC vars
- Remove stale per-service distribution scripts from modules

Zero remaining references to old server classes or per-catalog JDBC.
yasithdev added the enhancement label (Enhancement to code, styling, CI/CD, or distribution) on Mar 26, 2026
Module distribution scripts (agent-service.sh, file-service.sh,
research-service.sh, restproxy.sh) are still needed — these modules
run as separate Spring Boot apps.

Test files (SSHSummaryTest, CommunityUserDAOTest, CredentialsDAOTest,
JDBCUserStoreTest, SessionDBUserStoreTest) were broken by the DB
consolidation PR, not by this change — restored to master state.
Add HelixController, GlobalParticipant, PreWorkflowManager,
PostWorkflowManager, EmailBasedMonitor, and RealtimeMonitor to
startBackgroundServices() in AiravataServer. Each runs in a daemon
thread via a new startDaemon() helper; failures are non-fatal.
EmailBasedMonitor is gated on email.based.monitoring.enabled config.
Add public startServer() to RealtimeMonitor to expose runConsumer().
Remove enable.gfac property, gfac monitors= classpath entries, gfac
inventory variables, the gfac Ansible playbook, a stale TODO comment,
and update service comment in properties templates. Java GFac classes
are untouched.
WorkflowManager, AbstractMonitor, RegistryServiceDBEventHandler,
metascheduler Utils, and SharingRegistryServiceClientFactory now
use TMultiplexedProtocol when connecting to services on the
unified port. Added regserver/sharing/credential host:port
properties pointing to localhost:8930.

All execution services (helix controller, participant, pre/post
workflow managers, realtime monitor) now start and connect
successfully.
…l multiplexing

- ParserWorkflowManager: added startServer(), started as daemon
- Added parser.workflow.manager.name config property
- WorkflowManager, AbstractMonitor, RegistryServiceDBEventHandler,
  metascheduler Utils: pass service name to ThriftClientPool
- SharingRegistryServiceClientFactory: use TMultiplexedProtocol
- Added regserver/sharing/credential host:port properties for
  internal components that still use ThriftClientPool

All background services now start successfully:
  db_event_manager, monitoring_server, helix_controller,
  helix_participant, pre_workflow_manager, post_workflow_manager,
  parser_workflow_manager, email_monitor (config-gated),
  realtime_monitor
Add BackgroundServiceAdapter to wrap blocking startServer() calls as
IServer instances. Convert all 7 startDaemon() calls and the
MonitoringServer startup to registerAndStart(). Remove DaemonStarter
interface and startDaemon() method — every service is now tracked in
backgroundServices for proper shutdown.
Remove BackgroundServiceAdapter wrapper from AiravataServer. Each of the
7 background service classes (HelixController, GlobalParticipant,
PreWorkflowManager, PostWorkflowManager, ParserWorkflowManager,
EmailBasedMonitor, RealtimeMonitor) now implements IServer directly, with
start() running startServer() in a daemon thread.
- IServer now extends Runnable with no start(); blocking work goes in run()
- AiravataServer no longer implements IServer; manages services via registerAndStart() which creates daemon threads calling service.run()
- AbstractMonitor converted from abstract class to interface with submitJobStatus() contract
- EmailBasedMonitor and RealtimeMonitor implement AbstractMonitor via composition (own pool/producer fields), no longer extend base class
- WorkflowManager converted from inheritance target to helper: Pre/Post/ParserWorkflowManager hold wfManager field
- All IServer implementations (HelixController, GlobalParticipant, DBEventManagerRunner, MonitoringServer, ComputationalResourceMonitoringService, DataInterpreterService, ProcessReschedulingService) implement run() as blocking/interruptible work
- Quartz-based services park thread after scheduler.start() until interrupted
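The "park after scheduler.start() until interrupted" pattern for the Quartz-based services can be sketched in plain Java; the status field and class name are illustrative, and the real scheduler start is replaced by a comment:

```java
// Hypothetical sketch of the parking run() pattern, not the actual code.
class ParkingServiceSketch implements Runnable {
    private volatile String status = "STARTING";

    @Override
    public void run() {
        // scheduler.start() would go here in the real service
        status = "STARTED";
        try {
            Thread.currentThread().join(); // joining self parks forever...
        } catch (InterruptedException e) {
            // ...until an interrupt arrives, which is the shutdown signal
            Thread.currentThread().interrupt();
        }
        status = "STOPPED";
    }

    String getStatus() { return status; }
}
```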
…ties

- Replace Thread.sleep(5000) with waitForHelixCluster() that polls
  ZooKeeper for the AiravataCluster znode before starting participants
- Remove stale server class properties (CredentialStoreServer,
  OrchestratorServer, RegistryAPIServer, SharingRegistryServer,
  ProfileServiceServer) from airavata-server.properties
- Remove duplicate profile.service.server entries

All 12 background services now start cleanly with zero errors.
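The polling replacement for the fixed Thread.sleep(5000) could look roughly like this; the znode path "/AiravataCluster" and both timeouts are assumptions, not values confirmed by the PR:

```java
import org.apache.zookeeper.ZooKeeper;

// Sketch of the waitForHelixCluster() idea: poll ZooKeeper for the
// Helix cluster znode instead of sleeping a fixed 5 seconds.
public class HelixClusterWaitSketch {
    static void waitForHelixCluster(ZooKeeper zk) throws Exception {
        long deadline = System.currentTimeMillis() + 60_000;
        while (zk.exists("/AiravataCluster", false) == null) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException(
                        "Helix cluster znode did not appear within 60s");
            }
            Thread.sleep(500); // bounded poll, no arbitrary startup delay
        }
    }
}
```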
- README: replace multi-server orchestrator description with AiravataServer
  entry point, 9 multiplexed Thrift services on port 8930, and table of
  background IServer workers; replace old Getting Started section with
  docker compose + setup.sh + start.sh quick start
- INSTALL: rewrite for Java 17+/Docker prerequisites and three-step setup
- RELEASE_NOTES: add 0.21-SNAPSHOT section documenting the consolidation
- dev-tools/ansible/SETUP_FLOW.md: update database role to reflect unified
  airavata schema (Flyway) and airavata_services role to name AiravataServer
- dev-tools/ansible/README.md: note Flyway-managed unified database
- dev-tools/airavata-python-sdk/README.md: document single-port multiplexed
  connection model and TMultiplexedProtocol usage
- README: replace 3 static PNGs with mermaid diagrams (architecture,
  state transitions, startup sequence)
- README: update architecture section for single-JVM consolidation
- INSTALL: rewrite for docker compose + setup.sh workflow
- RELEASE_NOTES: add 0.21-SNAPSHOT section
- Ansible docs: update for unified database and single server
- Python SDK README: document TMultiplexedProtocol on single port
- Delete assets/airavata-{dataflow,state-transitions,components}.png
yasithdev changed the title from "feat: consolidate all services into single JVM with multiplexed Thrift" to "feat: consolidate Airavata into single JVM with multiplexed Thrift port" on Mar 26, 2026
@yasithdev yasithdev merged commit dea11f4 into master Mar 26, 2026
6 of 8 checks passed
@yasithdev yasithdev deleted the feat/single-jvm branch March 26, 2026 09:57