tipoca-stream

A near real-time, cloud-native data pipeline using Kafka, KafkaConnect, and RedshiftSink in AWS. RedshiftSink is a high-performance, low-overhead data loader for Redshift, open-sourced by Practo. It comes with rich data masking support, so you can provide universal data access in your organization while preserving your customers' privacy!

Release blog.

Tipoca Stream is the successor to an internal, non-real-time data warehousing project called Tipoca, which itself derives its name from Tipoca City, home of the Clones in the Star Wars universe.

Install

The pipeline is a combination of independently deployed services. This repo holds the code for RedshiftSink only.

RedshiftSink: Follow REDSHIFTSINK.md to install the RedshiftSink Kubernetes Operator. Creating the RedshiftSink resource installs Batcher and Loader pods in the cluster. These pods sink data from Kafka topics to Redshift and take care of database migration when required. RedshiftSink has rich masking support. It also supports table reloads in Redshift when masking configurations are modified in GitHub.

      kubectl get redshiftsink

Kafka: Install Kafka using Strimzi CRDs, or use a self-hosted or managed Kafka.

      kubectl get kafka

Producer: Install the producer using Strimzi CRDs and Debezium. Creating the kafkaconnect and kafkaconnector resources creates a kafkaconnect pod in the cluster, which starts streaming data from the source (MySQL, RDS, etc.) to Kafka.

      kubectl get kafkaconnect
      kubectl get kafkaconnector
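The individual `kubectl get` checks in this section can be combined into one small helper. The following is a minimal sketch, not part of this repo: the function name `check_pipeline` and the `KUBECTL` override variable are hypothetical, and it assumes the resources live in the current namespace of your kubectl context.

```shell
#!/bin/sh
# Resource kinds named in the Install section of this README.
PIPELINE_RESOURCES="redshiftsink kafka kafkaconnect kafkaconnector"

# check_pipeline runs "kubectl get <kind>" for each kind and reports
# which lookups failed. KUBECTL is a hypothetical override (not a
# kubectl feature) so the function can be tried without a cluster.
check_pipeline() {
    kubectl_bin="${KUBECTL:-kubectl}"
    failed=0
    for kind in $PIPELINE_RESOURCES; do
        if "$kubectl_bin" get "$kind"; then
            echo "ok: $kind"
        else
            echo "missing or unreachable: $kind" >&2
            failed=1
        fi
    done
    # Non-zero exit status if any lookup failed.
    return "$failed"
}
```

Source the file and run `check_pipeline` after installing all three components. If you use a self-hosted or managed Kafka instead of Strimzi, the `kafka` lookup is expected to fail, so drop it from `PIPELINE_RESOURCES` in that case.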

The project has pluggable libraries that can be composed to solve other data pipeline use cases.

Contribute

Please follow this to bring a change.

Thanks

Debezium.
Strimzi.io for the Kafka CRDs.
Yelp for publishing the blog on the Redshift connector.
LinkedIn for open-sourcing goavro.
LinkedIn for donating Kafka.
Shopify for open-sourcing sarama.
Thockin for open-sourcing go-build-template.
Clever for open-sourcing s3-to-redshift library.
danielqsj for kafka-exporter.

