From 8b9fb0e8c7ea74b3ec118b21e61c8a9e2bfb8f75 Mon Sep 17 00:00:00 2001
From: Kaushik Iska
Date: Mon, 6 Jan 2025 11:11:13 -0600
Subject: [PATCH 1/3] [clickpipes] Add more FAQs

- also move schema changes page here
---
 .../data-ingestion/clickpipes/postgres/faq.md | 31 +++++++++++++++++--
 .../clickpipes/postgres/schema-changes.md     | 12 +++++++
 2 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 docs/en/integrations/data-ingestion/clickpipes/postgres/schema-changes.md

diff --git a/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md b/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
index 80584e97147..4a4374d331d 100644
--- a/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
+++ b/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
@@ -13,13 +13,40 @@ If your ClickHouse Cloud service is idling, your Postgres CDC clickpipe will con
 
 As an example, if your sync interval is set to 30 mins and your service idle time is set to 10 mins, Your service will wake-up every 30 mins and be active for 10 mins, then go back to idling.
 
-
 ### How are TOAST columns handled in ClickPipes for Postgres?
 
 Please refer to the [Handling TOAST Columns](./toast) page for more information.
 
-
 ### How are generated columns handled in ClickPipes for Postgres?
 
 Please refer to the [Postgres Generated Columns: Gotchas and Best Practices](./generated_columns) page for more information.
 
+### Do tables need to have primary keys to be part of Postgres CDC?
+
+Yes, for CDC, tables must have either a primary key or a [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY). The REPLICA IDENTITY can be set to FULL or configured to use a unique index.
+
+### Do you support partitioned tables as part of Postgres CDC?
+
+Yes, partitioned tables are supported out of the box, as long as they have a PRIMARY KEY or REPLICA IDENTITY defined. The PRIMARY KEY and REPLICA IDENTITY must be present on both the parent table and its partitions. You can read more about it [here](https://blog.peerdb.io/real-time-change-data-capture-for-postgres-partitioned-tables).
+
+### Can I connect Postgres databases that don't have a public IP or are in private networks?
+
+ClickPipes for Postgres supports SSH tunneling (see the optional step [[here](https://clickhouse.com/docs/en/integrations/clickpipes/postgres#adding-your-source-postgres-database-connection)](https://clickhouse.com/docs/en/integrations/clickpipes/postgres#adding-your-source-postgres-database-connection)) to connect to Postgres sources with private IPs. SSH tunneling works in most cases, and if it doesn't, we also support [[AWS PrivateLink](https://clickhouse.com/docs/knowledgebase/aws-privatelink-setup-for-clickpipes)](https://clickhouse.com/docs/knowledgebase/aws-privatelink-setup-for-clickpipes).
+
+### How do you handle UPDATEs and DELETEs?
+
+ClickPipes for Postgres captures both INSERTs and UPDATEs from Postgres as new rows with different versions (using the _peerdb_version column) in ClickHouse. The ReplacingMergeTree table engine periodically performs deduplication in the background based on the ordering key (ORDER BY columns), retaining only the row with the latest _peerdb_version.
+
+DELETEs from Postgres are propagated as new rows marked as deleted (using the _peerdb_is_deleted column). Since the deduplication process is asynchronous, you might temporarily see duplicates. To address this, you need to handle deduplication at the query layer.
+
+For more details, refer to:
+- [ReplacingMergeTree table engine best practices](https://docs.peerdb.io/bestpractices/clickhouse_datamodeling#replacingmergetree-table-engine)
+- [Postgres-to-ClickHouse CDC internals blog](https://clickhouse.com/blog/postgres-to-clickhouse-data-modeling-tips)
+
+### Do you support schema changes?
+
+Please refer to the [ClickPipes for Postgres: Schema Changes Propagation Support](./schema-changes) page for more information.
+
+### What are the costs for ClickPipes for Postgres CDC?
+
+During the preview, ClickPipes is free of cost. Post-GA, pricing is still to be determined. The goal is to make the pricing reasonable and highly competitive compared to external ETL tools.
\ No newline at end of file
diff --git a/docs/en/integrations/data-ingestion/clickpipes/postgres/schema-changes.md b/docs/en/integrations/data-ingestion/clickpipes/postgres/schema-changes.md
new file mode 100644
index 00000000000..cbdf83d4bc4
--- /dev/null
+++ b/docs/en/integrations/data-ingestion/clickpipes/postgres/schema-changes.md
@@ -0,0 +1,12 @@
+---
+title: "ClickPipes for Postgres: Schema Changes Propagation Support"
+slug: /en/integrations/clickpipes/postgres/schema-changes
+---
+
+ClickPipes for Postgres can detect schema changes in the source tables. It can propagate some of these changes to the corresponding destination tables as well. The way each schema change is handled is documented below:
+
+| Schema Change Type | Behaviour |
+| ----------------------------------------------------------------------------------- | ------------------------------------- |
+| Adding a new column (`ALTER TABLE ADD COLUMN ...`) | Propagated automatically, all rows after the change will have all columns filled |
+| Adding a new column with a default value (`ALTER TABLE ADD COLUMN ... DEFAULT ...`) | Propagated automatically, all rows after the change will have all columns filled but existing rows will not show the DEFAULT value without a full table refresh |
+| Dropping an existing column (`ALTER TABLE DROP COLUMN ...`) | Detected, but not propagated. All rows after the change will have NULL for the dropped columns |
\ No newline at end of file

From 7adeedfee875860e20d9fbf6a731a75905efc05c Mon Sep 17 00:00:00 2001
From: Kaushik Iska
Date: Mon, 6 Jan 2025 11:34:09 -0600
Subject: [PATCH 2/3] fix escaping

---
 .../integrations/data-ingestion/clickpipes/postgres/faq.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md b/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
index 4a4374d331d..d4f98d3d1ce 100644
--- a/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
+++ b/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
@@ -31,7 +31,7 @@ Yes, partitioned tables are supported out of the box, as long as they have a PRI
 
 ### Can I connect Postgres databases that don't have a public IP or are in private networks?
 
-ClickPipes for Postgres supports SSH tunneling (see the optional step [[here](https://clickhouse.com/docs/en/integrations/clickpipes/postgres#adding-your-source-postgres-database-connection)](https://clickhouse.com/docs/en/integrations/clickpipes/postgres#adding-your-source-postgres-database-connection)) to connect to Postgres sources with private IPs. SSH tunneling works in most cases, and if it doesn't, we also support [[AWS PrivateLink](https://clickhouse.com/docs/knowledgebase/aws-privatelink-setup-for-clickpipes)](https://clickhouse.com/docs/knowledgebase/aws-privatelink-setup-for-clickpipes).
+ClickPipes for Postgres supports SSH tunneling (see the optional step [here](https://clickhouse.com/docs/en/integrations/clickpipes/postgres#adding-your-source-postgres-database-connection)) to connect to Postgres sources with private IPs. SSH tunneling works in most cases, and if it doesn't, we also support [AWS PrivateLink](https://clickhouse.com/docs/knowledgebase/aws-privatelink-setup-for-clickpipes).
 
 ### How do you handle UPDATEs and DELETEs?
 
@@ -40,8 +40,9 @@ ClickPipes for Postgres captures both INSERTs and UPDATEs from Postgres as new r
 DELETEs from Postgres are propagated as new rows marked as deleted (using the _peerdb_is_deleted column). Since the deduplication process is asynchronous, you might temporarily see duplicates. To address this, you need to handle deduplication at the query layer.
 
 For more details, refer to:
-- [ReplacingMergeTree table engine best practices](https://docs.peerdb.io/bestpractices/clickhouse_datamodeling#replacingmergetree-table-engine)
-- [Postgres-to-ClickHouse CDC internals blog](https://clickhouse.com/blog/postgres-to-clickhouse-data-modeling-tips)
+
+* [ReplacingMergeTree table engine best practices](https://docs.peerdb.io/bestpractices/clickhouse_datamodeling#replacingmergetree-table-engine)
+* [Postgres-to-ClickHouse CDC internals blog](https://clickhouse.com/blog/postgres-to-clickhouse-data-modeling-tips)
 
 ### Do you support schema changes?
 

From 9872404625837258033647db95d1dc84aba34a64 Mon Sep 17 00:00:00 2001
From: Kaushik Iska
Date: Mon, 6 Jan 2025 11:46:56 -0600
Subject: [PATCH 3/3] details for private link

---
 .../data-ingestion/clickpipes/postgres/faq.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md b/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
index d4f98d3d1ce..34db44bf3c8 100644
--- a/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
+++ b/docs/en/integrations/data-ingestion/clickpipes/postgres/faq.md
@@ -31,7 +31,20 @@ Yes, partitioned tables are supported out of the box, as long as they have a PRI
 
 ### Can I connect Postgres databases that don't have a public IP or are in private networks?
 
-ClickPipes for Postgres supports SSH tunneling (see the optional step [here](https://clickhouse.com/docs/en/integrations/clickpipes/postgres#adding-your-source-postgres-database-connection)) to connect to Postgres sources with private IPs. SSH tunneling works in most cases, and if it doesn't, we also support [AWS PrivateLink](https://clickhouse.com/docs/knowledgebase/aws-privatelink-setup-for-clickpipes).
+Yes! ClickPipes for Postgres offers two ways to connect to databases in private networks:
+
+1. **SSH Tunneling**
+   - Works well for most use cases
+   - See the setup instructions [here](https://clickhouse.com/docs/en/integrations/clickpipes/postgres#adding-your-source-postgres-database-connection)
+   - Works across all regions
+
+2. **AWS PrivateLink**
+   - Available in three AWS regions:
+     - us-east-1
+     - us-east-2
+     - eu-central-1
+   - For detailed setup instructions, see our [PrivateLink documentation](https://clickhouse.com/docs/knowledgebase/aws-privatelink-setup-for-clickpipes#requirements)
+   - For regions where PrivateLink is not available, please use SSH tunneling
 
 ### How do you handle UPDATEs and DELETEs?
 
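
Note on the "How do you handle UPDATEs and DELETEs?" answer added in this series: the "deduplication at the query layer" it asks readers to do could be illustrated with a small model. The sketch below is plain Python, not ClickPipes or ClickHouse code. Only the `_peerdb_version` and `_peerdb_is_deleted` column names come from the FAQ text; the `id` ordering key, the sample rows, and the `latest_rows` helper are hypothetical, and it assumes a higher `_peerdb_version` means a newer row.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def latest_rows(rows: Iterable[Row], key_columns: Tuple[str, ...]) -> List[Row]:
    """Keep the highest-_peerdb_version row per key, then drop delete markers."""
    latest: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        # The key plays the role of the ordering key (ORDER BY columns).
        key = tuple(row[col] for col in key_columns)
        current = latest.get(key)
        if current is None or row["_peerdb_version"] > current["_peerdb_version"]:
            latest[key] = row
    # A key whose newest row is a delete marker should not be visible at all.
    return [row for row in latest.values() if not row["_peerdb_is_deleted"]]


# Hypothetical rows as they might look before a background merge has run.
rows = [
    {"id": 1, "name": "a", "_peerdb_version": 1, "_peerdb_is_deleted": 0},
    {"id": 1, "name": "b", "_peerdb_version": 2, "_peerdb_is_deleted": 0},  # UPDATE
    {"id": 2, "name": "c", "_peerdb_version": 1, "_peerdb_is_deleted": 0},
    {"id": 2, "name": "c", "_peerdb_version": 3, "_peerdb_is_deleted": 1},  # DELETE
]

visible = latest_rows(rows, key_columns=("id",))
```

Here `visible` holds only the version-2 row for `id` 1: the older version is superseded and `id` 2 is hidden by its delete marker. In ClickHouse itself the same effect is usually expressed in SQL, for example by reading with the `FINAL` modifier and filtering on `_peerdb_is_deleted`; the linked best-practices page is the place to confirm the recommended form.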