Batch exports mega-issue #15997

Open
27 of 36 tasks
hazzadous opened this issue Jun 12, 2023 · 31 comments
Labels
enhancement New feature or request

Comments

@hazzadous
Contributor

hazzadous commented Jun 12, 2023

We're moving a lot of currently available apps to a new export system based on Temporal, which exports events in batches. It's proven to be a lot more reliable than our previous streaming system.
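For the curious, the shape of the new system is roughly "one Temporal workflow per batch export run, with the actual export as a retryable activity". Here is a minimal sketch using the temporalio Python SDK; the workflow, activity, and field names are illustrative assumptions, not PostHog's actual code:

```python
# A minimal, hypothetical sketch of a Temporal-based batch export using the
# temporalio Python SDK. All names and fields here are illustrative, not
# PostHog's actual code.
from dataclasses import dataclass
from datetime import timedelta

from temporalio import activity, workflow


@dataclass
class BatchExportInputs:
    team_id: int
    data_interval_start: str  # e.g. "2023-06-12T00:00:00+00:00"
    data_interval_end: str


@activity.defn
async def export_batch(inputs: BatchExportInputs) -> int:
    """Query events in [start, end) and write them to the destination."""
    # A real implementation would read from the event store and write to
    # S3/Snowflake/etc., returning the number of rows exported.
    return 0


@workflow.defn
class BatchExportWorkflow:
    @workflow.run
    async def run(self, inputs: BatchExportInputs) -> int:
        # Temporal retries the activity on failure, so a failed interval is
        # re-run as a unit instead of events being dropped mid-stream.
        return await workflow.execute_activity(
            export_batch,
            inputs,
            start_to_close_timeout=timedelta(hours=1),
        )
```

The reliability win comes from Temporal's retry and scheduling semantics: a failed interval is retried as a whole rather than silently losing events mid-stream.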

Done

In progress

Roadmap

  • Add support for more output formats (S3/blob storage).
  • RBAC for AWS.
  • Postgres SSL support.
  • Static IP range for exports.
  • Historical export metrics/UI.

Related tickets:

Add support for more output formats (S3/blob storage) / Static IP range for exports: https://posthoghelp.zendesk.com/agent/tickets/6490

@elijahbenizzy

Really excited about this! It's possible I've missed it, but is there a backfill capability in the works? I'd love to be able to export every event I have for ad-hoc analysis. I used to do this with the S3 app (which ended up getting kinda slow/painful).

@tomasfarias
Contributor

Hey @elijahbenizzy! Backfill is supported already, but we need to expose it in the UI (currently you can trigger a backfill by calling the `/batch_exports/{export_id}/backfill` endpoint).
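If it helps anyone in the meantime, here's a rough sketch of calling that endpoint with Python. The `/api/projects/...` prefix, the auth scheme, and the payload fields are my assumptions; only the `/batch_exports/{export_id}/backfill` path comes from the comment above, so double-check against the API docs once they land:

```python
# Hypothetical sketch of triggering a backfill over a time range. URL prefix,
# auth scheme, and payload fields are assumptions; only the
# /batch_exports/{export_id}/backfill path comes from the comment above.
import requests

POSTHOG_HOST = "https://app.posthog.com"  # or your self-hosted instance
PROJECT_ID = "1"        # your project id
EXPORT_ID = "..."       # the batch export's id
API_KEY = "phx_..."     # a personal API key

response = requests.post(
    f"{POSTHOG_HOST}/api/projects/{PROJECT_ID}/batch_exports/{EXPORT_ID}/backfill",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "start_at": "2023-01-01T00:00:00+00:00",
        "end_at": "2023-06-01T00:00:00+00:00",
    },
)
response.raise_for_status()
print(response.json())
```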

@elijahbenizzy

Woo! Great to hear. Any idea for an ETA for the UI? Happy to try the API.

@tomasfarias
Contributor

@elijahbenizzy We are focusing on ironing out the backend, so it's hard to say an exact ETA, but I imagine UI improvements will start shipping this week and continue through next week.

@Anakin100100

@hazzadous I see that the Postgres app is no longer available because of the redevelopment of the export system. When will it be available?

@levintrix

+1 for postgres support

@tomasfarias
Contributor

We will be adding support for Postgres exports in the next sprint.

@Mohanbalaji1998

Waiting for the S3 export so I can ingest my events into it. Kindly tell me when I can expect to see it.

@ClemensKoehlerTemedica

Great to see the exports getting some love!🥳 Any idea on when the BigQuery export might be available again?

@i0exception

^ Same. Any idea when BigQuery will be available?

@nstreicher

+1 on BigQuery export

@tomasfarias
Contributor

Postgres and BigQuery exports are our next two priorities. Postgres will likely come first (this sprint), while you can expect BigQuery by end of the month.

@tomasfarias
Contributor

Work on BigQuery exports has been scheduled for next sprint. Folks running a BigQuery export can expect to be migrated over to the new system around end of month.

@Mohanbalaji1998

Mohanbalaji1998 commented Aug 16, 2023 via email

@jedwhite

Any news or ETA on Redshift, please? It's challenging to take down the previous version before making the new version available when users already have it in production use.

@mrnadia

mrnadia commented Aug 30, 2023

Eagerly waiting for BQ. Any updates?

@mrnadia

mrnadia commented Sep 5, 2023

Come on guys, that's not a minor feature. The whole setup for product analytics is stuck because of it. I doubt there are many companies who only use the built-in PostHog analytics tools.

@timgl
Collaborator

timgl commented Sep 6, 2023

Hey @mrnadia, we're actively working on the BigQuery destination, and you can follow along here: #17170

@jedwhite We've got a Postgres destination live now that you should be able to use.

@tomasfarias
Contributor

BigQuery batch exports were released last week. Documentation for Postgres and BigQuery is still missing; that's coming next.

@jedwhite

> Hey @mrnadia, we're actively working on the BigQuery destination, and you can follow along here: #17170
>
> @jedwhite We've got a Postgres destination live now that you should be able to use.

Hey, thanks @timgl - is there an additional step needed to enable the Postgres destination? I had a look under Apps and settings but can't see the option to enable the export. Is this for the hosted version or local only? Thanks heaps :)

@jedwhite

Thanks @timgl - got it now. I had to refresh the Apps page to get the new Batch Exports menu. Thanks for the work on Postgres export, much appreciated!

@jedwhite

Quick follow-up while the docs are pending, @timgl - it looks like the export fails as soon as it tries writing to the database. Is a CREATE statement needed first for the events table, and if so, are any details on its schema available? I set it up with the defaults (`public` schema, `events` table) on the PostHog side, and the database itself is already created but empty. Thanks kindly!

@jedwhite

Just FYI @timgl, I tried pausing and archiving the failed run, creating a new one, and then Create Historic Export, but the screen just keeps showing the Create Historic Export button. If I can provide any more information to help debug the Postgres connection, please let me know. The table has not been created, which appears to be the expected behaviour according to the Snowflake docs for the batch exports connector. Thanks again

@jedwhite

Was anyone able to get Postgres working so far? It appears the table creation isn't compatible with Redshift, so I set up a new Postgres instance on RDS, but without logs in the UI or instructions it's hard to see where it's failing.

@timgl
Collaborator

timgl commented Sep 20, 2023

@tomasfarias Could you have a look?

@jedwhite

Thanks all. Just to let you know, I tried with Snowflake - set up an account and database, set up the connection, and it connected and started populating the historic export, but it stopped a few hours in, with no activity on the backfill for the last 12 hours. The hourly current exports are still running. Is there a way to re-kick off the failed backfill? Or is the best bet to delete the table on Snowflake and try again? If it's re-run without deleting the already exported content, will it de-duplicate? Sorry for the questions, and curious what other folks have got working.

@jedwhite

Just an update in case it's useful - after another 5-6 hours, the ongoing hourly batch export also stopped, in addition to the historic export. I'll wait overnight and see if it resumes.

@cstoneham

cstoneham commented Oct 2, 2023

Hey all, awesome feature. I'm having some issues getting Postgres to work, is there any way to inspect the error message so that I can debug?

edit: the error in the console was

```
latest_error: "OperationalError: invalid sslmode value: \"no-verify\"\n"
```

I get that regardless of what's checked for the SSL mode.
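For anyone else hitting this: `no-verify` is a node-postgres convention, not one of libpq's accepted `sslmode` values (`disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`), which is presumably why a psycopg-based exporter rejects it. Here's a minimal sketch of a connection with a value libpq actually accepts; the connection details are placeholders:

```python
# Sketch: connecting with an sslmode value libpq actually accepts.
# "require" encrypts the connection without verifying the server certificate,
# which is roughly what node-postgres means by "no-verify".
# Connection details below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-db.example.com",
    port=5432,
    dbname="posthog_exports",
    user="exporter",
    password="...",
    sslmode="require",
)
```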

@MarconLP
Member

+1 Static IP range for exports https://posthoghelp.zendesk.com/agent/tickets/7623

@MarconLP
Member

MarconLP commented Dec 7, 2023

+1 Static IP range for exports https://posthoghelp.zendesk.com/agent/tickets/7879

@d3rp3tt3

Adding ClickHouse as a request <3
