ai.pipestream

Pipestream AI - a platform for search

Pipestream AI

Open Source Document Processing Platform for Intelligent Search and Indexing

🌟 What is Pipestream AI?

Pipestream AI is an open-source platform that transforms documents into searchable knowledge using AI-powered processing. It provides a flexible, network-based architecture for ingesting, parsing, chunking, and embedding documents for intelligent search and indexing.

🚀 Key Features

  • Network Graph Architecture - Not a linear pipeline, but a flexible network with fan-in and fan-out capabilities
  • Multiple Entry Points - Connectors, direct API calls, or Kafka events
  • Flexible Storage - S3 repository or in-memory processing
  • Multiple Chunking Strategies - Apply different chunking approaches to the same document
  • Multiple Embedding Models - Generate vector embeddings using multiple models simultaneously
  • OpenSearch Integration - Full-text, vector, and hybrid search capabilities
  • Transport Flexibility - gRPC for low latency, Kafka for high throughput
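
As a rough illustration of the "multiple chunking strategies" and "multiple embedding models" features, the sketch below fans one document out across every chunker and embedder combination. It is not taken from the Pipestream codebase: the class `FanOutSketch`, the two chunkers, and the toy "embedder" are hypothetical stand-ins, and the embedder only sums character codes rather than calling a real model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FanOutSketch {
    // Hypothetical chunker: fixed-size character windows.
    public static Function<String, List<String>> fixedSize(int size) {
        return text -> {
            List<String> chunks = new ArrayList<>();
            for (int i = 0; i < text.length(); i += size) {
                chunks.add(text.substring(i, Math.min(text.length(), i + size)));
            }
            return chunks;
        };
    }

    // Hypothetical chunker: naive split on periods.
    public static Function<String, List<String>> bySentence() {
        return text -> {
            List<String> chunks = new ArrayList<>();
            for (String s : text.split("\\.")) {
                if (!s.trim().isEmpty()) chunks.add(s.trim());
            }
            return chunks;
        };
    }

    // Toy stand-in for an embedding model (assumption: NOT a real embedding).
    public static Function<String, float[]> toyEmbedder(int dims) {
        return chunk -> {
            float[] v = new float[dims];
            for (int i = 0; i < chunk.length(); i++) {
                v[i % dims] += chunk.charAt(i);
            }
            return v;
        };
    }

    // Applies every chunker, then every embedder, keyed as "chunker/embedder".
    public static Map<String, List<float[]>> process(
            String text,
            Map<String, Function<String, List<String>>> chunkers,
            Map<String, Function<String, float[]>> embedders) {
        Map<String, List<float[]>> out = new LinkedHashMap<>();
        chunkers.forEach((chunkerName, chunker) -> {
            List<String> chunks = chunker.apply(text);
            embedders.forEach((embedderName, embedder) -> {
                List<float[]> vectors = new ArrayList<>();
                for (String c : chunks) vectors.add(embedder.apply(c));
                out.put(chunkerName + "/" + embedderName, vectors);
            });
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> chunkers = new LinkedHashMap<>();
        chunkers.put("fixed-10", fixedSize(10));
        chunkers.put("sentence", bySentence());
        Map<String, Function<String, float[]>> embedders = new LinkedHashMap<>();
        embedders.put("toy-4d", toyEmbedder(4));
        embedders.put("toy-8d", toyEmbedder(8));

        process("First sentence. Second sentence.", chunkers, embedders)
            .forEach((key, vectors) ->
                System.out.println(key + ": " + vectors.size() + " vectors"));
    }
}
```

With two chunkers and two embedders the result holds four independent vector sets, which is the kind of layout that would let a search index compare strategies side by side.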

📖 Documentation

🏗️ Architecture

The Pipestream Platform operates as a network graph, not a linear pipeline. The Pipeline Engine acts as the central routing hub, orchestrating data flow between processing nodes:

  1. Data Loading - Digital assets are ingested
  2. Data Transformation - Assets are transformed to text (parsing)
  3. Data Enhancement - Text is enhanced with chunking, embeddings, and AI processing
  4. Sink - Data is indexed to a search engine (OpenSearch)
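
The routing idea above can be sketched as a small directed graph in which one node may feed several downstream nodes (fan-out) and several nodes may feed the same one (fan-in). This is only a minimal illustration under stated assumptions: `RoutingGraphSketch`, its methods, and the node names are hypothetical and do not reflect the Pipeline Engine's actual API, and the traversal assumes the graph has no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoutingGraphSketch {
    // Node name -> downstream node names (hypothetical configuration).
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public void connect(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Lists every hop a document makes from the entry node, breadth-first.
    // A node reached over two incoming edges is visited once per message.
    // Assumes an acyclic graph; a cycle would loop forever in this sketch.
    public List<String> route(String entry) {
        List<String> hops = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(entry);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : edges.getOrDefault(node, Collections.<String>emptyList())) {
                hops.add(node + " -> " + next);
                queue.add(next);
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        RoutingGraphSketch graph = new RoutingGraphSketch();
        graph.connect("parser", "chunker-a");       // fan-out: one parser,
        graph.connect("parser", "chunker-b");       // two chunkers
        graph.connect("chunker-a", "embedder");     // fan-in: both chunkers
        graph.connect("chunker-b", "embedder");     // feed one embedder
        graph.connect("embedder", "opensearch-sink");

        graph.route("parser").forEach(System.out::println);
    }
}
```

Because each chunker emits its own message, the embedder and the sink are each reached twice here, once per chunking variant.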

🛠️ Core Services

  • Connectors - Discover, authenticate, and stream documents from various sources
  • Repository Service - Manages S3 storage and metadata, publishes events
  • Pipeline Engine - Orchestrates routing and transport between modules
  • Processing Modules - Parsers, chunkers, embedders, and specialized processors

📦 Repositories

This organization contains multiple repositories:

  • Core Services - Platform services and orchestration
  • Processing Modules - Specialized document processors
  • Connectors - Document ingestion from various sources
  • Frontend - Web interface and management tools

🤝 Contributing

Pipestream AI is open source under the MIT License. We welcome contributions!

  1. Check out our documentation
  2. Review the architecture
  3. Open issues or pull requests in the relevant repositories

📄 License

This project is licensed under the MIT License - see the LICENSE file in each repository for details.

Building the future of intelligent document processing. 🚀

Popular repositories

  1. module-echo (Public)

    Simple test/validation module

    Java

  2. module-parser (Public)

    Document parsing

    Java

  3. module-chunker (Public)

    Document chunking

    PureBasic

  4. module-embedder (Public)

    Vector embeddings

    Java

  5. module-opensearch-sink (Public)

    OpenSearch sink module

    Java

  6. platform-registration-service (Public)

    Service discovery & registration (Consul integration)

    Java

Repositories

Showing 10 of 21 repositories
  • platform-libraries Public

    Common Java libraries for the platform.

    Java 0 MIT 0 11 0 Updated Nov 26, 2025
  • .github Public

    GitHub repository for the organization homepage

    HTML 0 0 0 0 Updated Nov 26, 2025
  • repository-service Public

    Document storage (Redis + S3 refactor)

    Java 0 MIT 0 0 1 Updated Nov 24, 2025
  • opensearch-manager Public

    OpenSearch indexing & management

    Java 0 MIT 0 1 0 Updated Nov 22, 2025
  • account-service Public

    User/account management

    Java 0 MIT 0 0 0 Updated Nov 22, 2025
  • connector-admin Public

    Service to administer external connectors

    Java 0 MIT 0 0 0 Updated Nov 22, 2025
  • connector-intake-service Public

    Gateway for document intake (stateless refactor)

    Java 0 MIT 0 0 0 Updated Nov 21, 2025
  • dev-assets Public

    Development assets - scripts, dev environment setup, sample containers, tutorials, and documentation

    Shell 0 MIT 0 0 0 Updated Nov 21, 2025
  • platform-registration-service Public

    Service discovery & registration (Consul integration)

    Java 0 MIT 0 0 0 Updated Nov 21, 2025
  • sample-documents Public

    A set of sample documents that will be used by the parser and testing harness

    PureBasic 0 MIT 0 0 0 Updated Nov 19, 2025
