📖 Just launched Data Learning Center - Resources on data engineering and data infrastructure
The Customer Data Platform for Developers
As the leading open source Customer Data Platform (CDP), RudderStack provides data pipelines that make it easy to collect data from every application, website and SaaS platform, then activate it in your warehouse and business tools.
With RudderStack, you can build customer data pipelines that connect your whole customer data stack, then make them smarter by triggering enrichment and activation in customer tools based on analysis in your data warehouse. Its easy-to-use SDKs and event source integrations, Cloud Extract integrations, transformations, and expansive library of destination and warehouse integrations make it simple to build customer data pipelines for both event streaming and cloud-to-warehouse ELT.
| Try RudderStack Cloud Free, a free tier of RudderStack Cloud, and start building a smarter customer data pipeline today. |
|---|
- Warehouse-first: RudderStack treats your data warehouse as a first-class citizen among destinations, with advanced features and configurable, near real-time sync. Warehouse capabilities include configurable backfill with date-range support for historical data re-sync; selective sync with per-table and per-column filtering; warehouse replay from archived events for targeted re-processing; enhanced health monitoring with Prometheus metrics, per-upload tracking, and alerting thresholds; and idempotent sync validation across all 9 warehouse connectors (Snowflake, BigQuery, Redshift, ClickHouse, Delta Lake, PostgreSQL, MSSQL, Azure Synapse, and Datalake).
- Developer-focused: RudderStack is built API-first. It integrates seamlessly with the tools that developers already use and love.
- High Availability: RudderStack comes with at least 99.99% uptime. We have built a sophisticated error handling and retry system that ensures your data is delivered even in the event of network partitions or destination downtime.
- Privacy and Security: You can collect and store your customer data without sending everything to a third-party vendor. With RudderStack, you get fine-grained control over what data to forward to which analytical tool.
- Unlimited Events: The event-volume-based pricing of most commercial systems is broken. With RudderStack Open Source, you can collect as much data as you need without worrying about overrunning your event budget.
- Segment API-compatible: RudderStack is fully compatible with the Segment API and achieves 100% field-level parity with the Twilio Segment Event Specification across all six core event types (identify, track, page, screen, group, alias), including structured Client Hints pass-through (context.userAgentData) and semantic event category support. RudderStack has also validated drop-in SDK compatibility with Segment's JavaScript (analytics.js / Analytics 2.0), iOS (analytics-ios), Android (analytics-android), and server-side SDKs (Node.js, Python, Go, Java, Ruby). Existing Segment SDK users can migrate by swapping the endpoint URL and Write Key, with no code changes required. See the SDK Compatibility Migration Guides for per-SDK instructions.
- Production-ready: Mattermost, IFTTT, Torpedo, Grofers, 1mg, Nana, OnceHub, and dozens of other large companies use RudderStack to collect their events.
- Seamless Integration: RudderStack currently supports integration with over 90 popular tool and warehouse destinations.
- User-specified Transformation: RudderStack offers a powerful JavaScript-based event transformation framework which lets you enhance or transform your event data by combining it with your other internal data. Furthermore, as RudderStack runs inside your cloud or on-premise environment, you can easily access your production data to join with the event data.
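To make the transformation idea from the list above concrete, here is a minimal sketch of an enrichment step that joins an event with internal data. It is written in Python purely for illustration: the framework described above is JavaScript-based, and the function name `transform_event`, the `PLAN_BY_USER` lookup table, and the `plan` property are all hypothetical, not part of any RudderStack API.

```python
from copy import deepcopy

# Hypothetical internal data keyed by user ID. In a real deployment this
# might be a lookup against a production database (an assumption).
PLAN_BY_USER = {"user-1": "enterprise", "user-2": "free"}


def transform_event(event: dict) -> dict:
    """Return a copy of the event enriched with the user's plan, if known."""
    enriched = deepcopy(event)  # leave the caller's event untouched
    plan = PLAN_BY_USER.get(event.get("userId"))
    if plan is not None:
        # Create the properties object if the event did not carry one.
        enriched.setdefault("properties", {})["plan"] = plan
    return enriched


event = {
    "type": "track",
    "userId": "user-1",
    "event": "Order Completed",
    "properties": {"total": 42.0},
}
print(transform_event(event)["properties"])
```

Events from users that are missing from the lookup table pass through unchanged, so the step is safe to apply to every event.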
The easiest way to experience RudderStack is to sign up for RudderStack Cloud Free - a completely free tier of RudderStack Cloud.
You can also set up RudderStack on your platform of choice with these two easy steps:
Note: If you are planning to use RudderStack in production, we STRONGLY recommend using our Kubernetes Helm charts. We update our Docker images with bug fixes much more frequently than our GitHub repo.
Once you have installed RudderStack, send test events to verify the setup.
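One way to send a test event is to POST a single track call to the data plane over HTTP. The sketch below only builds the request so that it can be inspected first. The data plane URL `http://localhost:8080`, the placeholder Write Key, and the event contents are assumptions to replace with your own values; the `/v1/track` path and Basic authentication (Write Key as the username, empty password) follow the Segment-compatible HTTP API.

```python
import base64
import json
import urllib.request

# Assumptions: replace both values with those of your own deployment.
DATA_PLANE_URL = "http://localhost:8080"
WRITE_KEY = "YOUR_WRITE_KEY"


def build_track_request(data_plane_url: str, write_key: str, user_id: str,
                        event_name: str, properties: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single track event."""
    payload = {"userId": user_id, "event": event_name, "properties": properties}
    # HTTP Basic auth: the Write Key is the username, the password is empty.
    token = base64.b64encode(f"{write_key}:".encode()).decode()
    return urllib.request.Request(
        url=f"{data_plane_url.rstrip('/')}/v1/track",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_track_request(
    DATA_PLANE_URL, WRITE_KEY, "test-user", "Test Event", {"source": "readme"}
)
print(request.get_method(), request.full_url)

# To send it against a running instance, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```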
RudderStack is an independent, stand-alone system with a dependency only on the database (PostgreSQL). Its backend is written in Go with a rich UI written in React.js.
A high-level view of RudderStack’s architecture is shown below:
For more details on the various architectural components, see the Architecture Overview. See also: Data Flow | Pipeline Stages | Deployment Topologies | Warehouse State Machine
Comprehensive documentation is available in the docs/ directory, covering architecture, API references, integration guides, operational runbooks, and Segment parity analysis.
| Category | Description |
|---|---|
| Gap Report | Segment parity gap analysis and sprint roadmap |
| Architecture | System architecture, data flows, deployment topologies |
| API Reference | HTTP API, Event Spec, gRPC API, error codes |
| Getting Started | Installation, configuration, first events |
| Migration Guide | Segment-to-RudderStack migration |
| Source SDKs | JavaScript, iOS, Android, server-side SDK guides |
| Destinations | Stream, cloud, and warehouse destination guides |
| Transformations | Custom transforms and Functions |
| Governance | Tracking plans, consent, event filtering |
| Identity | Identity resolution and profiles |
| Operations | Warehouse sync, warehouse replay, backfill, capacity planning |
| Warehouse Connectors | Per-warehouse setup and configuration guides |
| Backfill API | Warehouse backfill with configurable date ranges |
| Health Monitoring | Warehouse sync health metrics, Prometheus integration, alerting |
| Selective Sync | Per-table and per-column warehouse sync filtering |
| Warehouse Replay | Replay archived events through the warehouse pipeline |
| Reference | Configuration, environment variables, glossary |
| Contributing | Development setup, destination onboarding, testing |
| SDK Compatibility | Segment SDK migration guides for JavaScript, iOS, Android, and server-side SDKs |
| Cloud Source Framework | Cloud source ingestion architecture design for polling/webhook-based SaaS integrations |
A comprehensive gap analysis comparing RudderStack capabilities against Twilio Segment features is available in the Gap Report. Highlights:
- Event Spec Parity: 100% field-level parity with the Twilio Segment Event Specification, covering all six core event types (identify, track, page, screen, group, alias), all 18 standard context fields, structured Client Hints (context.userAgentData), 17 reserved identify traits, 12 reserved group traits, and seven semantic event categories (E-Commerce v2, Video, Mobile, B2B SaaS, Email, Live Chat, A/B Testing).
- Source SDK Compatibility: validated across JavaScript, iOS, Android, and five server-side SDKs, raising the Source Catalog parity score from ~60% to ~85%.
- Cloud Source Framework: a design has been produced to close the gap of 140 cloud app sources through a polling/webhook-based ingestion architecture.
- RudderStack extensions beyond the Segment spec: /v1/replay, /internal/v1/retl, /beacon/v1/*, /pixel/v1/*, and the merge call type are documented in the Event Spec API Reference.
The analysis also covers destination catalog coverage, transformation/Functions, Protocols enforcement, identity resolution, and warehouse sync.
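As a rough illustration of what the six core event types require, the sketch below checks an incoming payload for its call type and a few required fields. The rules are a simplified reading of the Segment Event Specification, not the validation RudderStack itself performs, and `validate_event` and `REQUIRED_FIELDS` are hypothetical names. Extension call types such as merge are deliberately treated as unknown here.

```python
from typing import List

CORE_EVENT_TYPES = {"identify", "track", "page", "screen", "group", "alias"}

# Fields required in addition to a user identifier, per call type.
# Simplified for illustration; the full spec defines many more fields.
REQUIRED_FIELDS = {
    "track": ["event"],
    "group": ["groupId"],
    "alias": ["previousId", "userId"],
}


def validate_event(event: dict) -> List[str]:
    """Return a list of problems; an empty list means the event looks valid."""
    event_type = event.get("type")
    if event_type not in CORE_EVENT_TYPES:
        return [f"unknown event type: {event_type!r}"]
    problems = []
    # Every call needs some way to identify the user.
    if not (event.get("userId") or event.get("anonymousId")):
        problems.append("missing userId or anonymousId")
    for field in REQUIRED_FIELDS.get(event_type, []):
        if not event.get(field):
            problems.append(f"missing required field: {field}")
    return problems


print(validate_event({"type": "track", "userId": "u1", "event": "Signed Up"}))
print(validate_event({"type": "track", "userId": "u1"}))
```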
Warehouse sync parity has been improved from ~80% to ~95% through the Sprint 7–9 Warehouse Feature Enhancement, which delivered idempotent sync validation across all 9 warehouse connectors (Snowflake, BigQuery, Redshift, ClickHouse, Delta Lake, PostgreSQL, MSSQL, Azure Synapse, and Datalake), configurable backfill with date-range support, enhanced health monitoring with Prometheus metrics and alerting, selective sync with per-table and per-column filtering, and warehouse replay from archived events. See the Backfill API, Health Monitoring, Selective Sync, and Warehouse Replay documentation for details.
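The idea behind selective sync, per-table and per-column filtering, can be sketched as a filter applied to rows before they are loaded. This is a conceptual illustration only: the `SYNC_CONFIG` shape, the `select_rows` function, and the table and column names are hypothetical and do not reflect the actual configuration format, which is described in the Selective Sync documentation.

```python
from typing import Dict, List, Optional

# Hypothetical selective-sync configuration: table name -> allowed columns.
# None means "sync every column"; tables that are not listed are skipped.
SYNC_CONFIG: Dict[str, Optional[List[str]]] = {
    "tracks": None,
    "identifies": ["user_id", "email"],
}


def select_rows(table: str, rows: List[dict],
                config: Dict[str, Optional[List[str]]]) -> List[dict]:
    """Apply per-table and per-column filtering to rows staged for upload."""
    if table not in config:
        return []  # table excluded from sync
    columns = config[table]
    if columns is None:
        return [dict(row) for row in rows]  # keep all columns
    # Keep only the allowed columns that are present in each row.
    return [{c: row[c] for c in columns if c in row} for row in rows]


rows = [{"user_id": "u1", "email": "a@example.com", "phone": "555-0100"}]
print(select_rows("identifies", rows, SYNC_CONFIG))
print(select_rows("pages", rows, SYNC_CONFIG))
```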
Note: Segment Engage/Campaigns and Reverse ETL are planned for Phase 2.
RudderStack Gateway supports drop-in compatibility with Segment SDK client libraries. Existing Segment SDK users can migrate to RudderStack by replacing the endpoint URL (api.segment.io → <your-rudderstack-data-plane-url>) and substituting a RudderStack Write Key — no application code changes are required.
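The sketch below illustrates why the swap needs no application changes: the same batch payload and the same Basic authentication scheme are used in both cases, and only the host and the Write Key differ. The host `rudderstack.example.com` and both Write Key values are placeholders, and `build_batch_request` is a hypothetical helper rather than part of any SDK.

```python
import base64
import json
import urllib.request


def build_batch_request(base_url: str, write_key: str,
                        events: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the /v1/batch endpoint."""
    # HTTP Basic auth: the Write Key is the username, the password is empty.
    token = base64.b64encode(f"{write_key}:".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/batch",
        data=json.dumps({"batch": events}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


events = [
    {"type": "identify", "userId": "u1", "traits": {"plan": "free"}},
    {"type": "track", "userId": "u1", "event": "Signed Up"},
]

# Before migration: Segment endpoint and Write Key (placeholder value).
before = build_batch_request("https://api.segment.io", "SEGMENT_WRITE_KEY", events)
# After migration: your data plane URL and RudderStack Write Key (placeholders).
after = build_batch_request("https://rudderstack.example.com", "RUDDERSTACK_WRITE_KEY", events)

print(before.full_url)
print(after.full_url)
print(before.data == after.data)  # the request body is identical
```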
The following SDKs have been validated for full compatibility:
| SDK | Library | Validated Capabilities |
|---|---|---|
| JavaScript | analytics.js / Analytics 2.0 | All 6 event types, batch (/v1/batch), beacon (/beacon/v1/batch), pixel (/pixel/v1/track, /pixel/v1/page) |
| iOS | analytics-ios (Swift) | All event types, mobile context auto-collection (device, os, app, network, screen), lifecycle events |
| Android | analytics-android (Kotlin) | All event types, mobile context auto-collection (device, os, app, network, screen), lifecycle events |
| Node.js | analytics-node | Batch endpoint, retry behavior |
| Python | analytics-python | Batch endpoint, flush behavior |
| Go | analytics-go | Batch endpoint |
| Java | analytics-java | Batch endpoint |
| Ruby | analytics-ruby | Batch endpoint, retry behavior |
Migration guides:
- Segment SDK Migration Guide — Master migration reference
- Web SDK Guide — JavaScript / Analytics 2.0 compatibility and device-mode limitations
- Mobile SDK Guide — iOS and Android lifecycle event support and context auto-collection
- Server SDK Guide — Node.js, Python, Go, Java, and Ruby batch endpoint usage and retry behavior
We would love to see you contribute to RudderStack. Get more information on how to contribute here.
RudderStack server is released under the Elastic License 2.0.
