RIKEN's internal AI Inference Infrastructure for Scientific Computing and General-purpose Applications
RiVault is a security-first AI inference infrastructure designed for scientific computing and general-purpose applications at RIKEN ([1] Overview Slides). It gives users multiple ways to leverage powerful AI models and capabilities: a web-based interface, API endpoints, and support for custom agentic systems.
Note: RiVault is currently accessible only from within the RIKEN intranet. For the AI for Science Supercomputer (pre-)production phase, we plan to make it available more broadly.
The following diagram illustrates the RiVault setup and how users interact with its components:
```mermaid
flowchart TD
    U[Users]
    subgraph RiVault
        W[WebUI]
        R[RAG system, e.g. RAGFlow]
        D[API Endpoints]
        M[MCP Servers, e.g. Paper-Search]
        M2[Tools, e.g. search, compile, exec, OS usage, data retrieval]
        M3[Resources, e.g. Internet, Knowledge bases, Compute]
        I[Interfacing via liteLLM]
        I1[Inference Runtimes via vLLM, SGLang]
    end
    M1[MCP Package Manager]
    M11[Bring-your-own-MCP]
    I2[Model weights]
    I21[huggingface]
    I22[Bring-your-own-Model]
    A[User-facing agentic system]
    A1[Agentic Frameworks, e.g. AgentZero, LangGraph]
    A2[Agents/Skills]
    O[onDemand RiVault]
    S[Supercomputing Hardware]
    U --> A
    U --> M
    U --> D
    U --> W
    U --> R
    U --> O
    O --> D
    I22 --> O
    A --> A1
    A1 --> A2
    A2 --> D
    A2 --> M
    M --> M2
    M1 --> M11
    M11 --> W
    M2 --> M3
    D --> I
    I --> I1
    I2 --> I1
    I21 --> I2
    W --> D
    W --> M
    R --> D
    R --> M
    I1 --> S
    M3 --> S
```
Users can interact with RiVault through several pathways:
- WebUI: A graphical web interface for direct interaction [1]
- API Endpoints: Programmatic access for integration into workflows
- MCP Servers: Model Context Protocol servers for extended functionality
- RAG System: Retrieval-Augmented Generation capabilities, e.g., RAGFlow
- onDemand RiVault: Custom deployments with bring-your-own models
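As a sketch of the API pathway, the snippet below builds an OpenAI-compatible chat-completions request of the kind that a liteLLM-backed endpoint accepts. The endpoint URL, API key, and model name are placeholders for illustration, not actual RiVault values.

```python
import json
import urllib.request

# Hypothetical values -- substitute the real RiVault endpoint, token, and model name.
API_BASE = "https://rivault.example.riken.jp/v1"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "You are a scientific assistant."},
        {"role": "user", "content": "Summarize recent HPC trends on arXiv."},
    ],
    "temperature": 0.2,
}

request = urllib.request.Request(
    f"{API_BASE}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

# Sending the request requires intranet access, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, standard client libraries can usually be pointed at it by overriding their base URL instead of hand-building requests.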
The WebUI provides an intuitive control interface [1]:
- Left panel: Access to previous chats and new chat creation
- Middle: Dropdown menu to select model(s)
- Top-right: Detailed configuration options for chats
- Bottom of chat: Additional features including image generation, text-to-speech, rating, retry, and translations
RiVault supports MCP (Model Context Protocol) servers to extend functionality. Examples include [1]:
- papersearch: Retrieves live paper information from arXiv, bioRxiv, and other scientific repositories
- time: Provides time information
Users can also bring their own MCP servers through the MCP Package Manager.
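Under the hood, MCP communication is JSON-RPC 2.0. The sketch below constructs a `tools/call` request for a paper-search tool; the tool and argument names are invented for illustration and do not reflect the actual papersearch server's schema.

```python
import json

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical tool and argument names for a paper-search server.
msg = make_tool_call(1, "search_papers", {"query": "protein folding", "source": "arxiv"})
print(msg)
```

In practice an MCP client library handles this framing (plus initialization and transport) for you; the point here is only that tool calls are plain, inspectable JSON messages.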
The inference backend comprises three layers:
- Interfacing: Uses liteLLM for unified model access
- Inference Runtimes: Powered by vLLM and SGLang for efficient model serving
- Model Weights: Supports models from HuggingFace or custom bring-your-own models
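To illustrate how liteLLM ties these layers together, a proxy routing entry typically looks like the fragment below. The model names, provider prefix, and port are placeholders, not RiVault's actual configuration.

```yaml
# Hypothetical liteLLM proxy config: route a public model name to a vLLM backend.
model_list:
  - model_name: example-model                     # name users select via the API
    litellm_params:
      model: hosted_vllm/example-org/example-model
      api_base: http://localhost:8000/v1          # vLLM OpenAI-compatible server
```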
For building custom agentic systems, RiVault supports:
- Frameworks: AgentZero, LangGraph, and other agentic frameworks
- Agents/Skills: Custom agents that can access both API endpoints and MCP servers
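A custom agent's core loop can be as small as the dispatch sketch below, which routes model-requested tool calls to local handlers. The tool names and handlers are stand-ins invented for illustration; a real agent would forward these calls to the RiVault API and MCP servers.

```python
# Minimal agent dispatch sketch: map tool names requested by a model to handlers.
# The handlers below are offline stand-ins for real API/MCP-backed tools.

def search(query: str) -> str:
    return f"results for {query!r}"

def execute(code: str) -> str:
    return f"ran {len(code)} chars of code"

TOOLS = {"search": search, "execute": execute}

def run_agent(tool_requests: list[dict]) -> list[str]:
    """Dispatch each requested tool call and collect the observations."""
    observations = []
    for req in tool_requests:
        handler = TOOLS.get(req["name"])
        if handler is None:
            observations.append(f"unknown tool: {req['name']}")
        else:
            observations.append(handler(**req["arguments"]))
    return observations

results = run_agent([
    {"name": "search", "arguments": {"query": "alphafold"}},
    {"name": "execute", "arguments": {"code": "print(1+1)"}},
])
print(results)
```

Frameworks like LangGraph wrap this pattern with state management, retries, and multi-step planning, but the underlying contract (named tools plus structured arguments) is the same.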
Agents and MCP servers build on shared infrastructure layers:
- Tools Layer: Provides search, compilation, execution, OS usage, and data retrieval capabilities
- Resources: Connects to the internet, knowledge bases, and compute resources
- Supercomputing Hardware: All computation runs on RIKEN's supercomputing infrastructure
To get started:
- Access the WebUI: Navigate to the RiVault web interface to start chatting with models
- Select a Model: Use the dropdown in the middle of the interface to choose your preferred model [1]
- Configure Settings: Adjust parameters using the top-right configuration options [1]
- Try MCP Servers: Access extended functionality like paper search directly from the chat interface [1]
- API Access: Use the API endpoints for programmatic integration into your workflows
For additional assistance, or to deploy custom MCP servers and models, please refer to the documentation or contact the RiVault support team via RIKEN's internal Slack or the issue tracker in this repository.