
More general types of Kafka source #652

Closed

umanwizard opened this issue Sep 29, 2019 · 4 comments

Labels
A-integration Area: external integrations C-feature Category: new feature or request

Comments

@umanwizard
Contributor

Discussed this offline with @frankmcsherry and @quodlibetor ...

Currently we have a hard-coded assumption that our Kafka sources will be in Debezium format, i.e., a log of database modifications whose `before` and `after` fields represent row deletions and additions.

It may also be worthwhile to support other models; for example, reading a topic that is an append-only event log seems broadly useful.
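
For concreteness, an abridged Debezium-formatted update message looks roughly like this (the row fields are made up for illustration; `before`, `after`, `op`, and `ts_ms` come from the Debezium envelope):

```json
{
  "before": { "id": 1, "quantity": 10 },
  "after":  { "id": 1, "quantity": 15 },
  "op": "u",
  "ts_ms": 1569715200000
}
```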

@umanwizard umanwizard added the C-musing Category: not-yet-actionable discussions label Sep 29, 2019
@rjnn rjnn added A-integration Area: external integrations C-feature Category: new feature or request and removed C-musing Category: not-yet-actionable discussions labels Oct 11, 2019
@rjnn
Contributor

rjnn commented Oct 11, 2019

This issue is now a higher priority than it was previously. Note that adding support for new formats can still be somewhat opinionated: you can require that the CREATE SOURCE command enumerate the list of fields that will be populated, and that a field not present in a message be set to NULL (as long as that field was marked NULLABLE at source creation time; otherwise, error out), etc.

This doesn't mean we need full support for arbitrarily gnarly nested JSON, but we should be able to ingest JSON messages that have just the bare columns as named fields, treating them as an append-only list of rows.

Finally, we want the ability to optionally designate one of those fields as the timestamp column.
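
Something like the following, purely as a sketch (this is hypothetical syntax; the FORMAT JSON column list and TIMESTAMP COLUMN clause are illustrative, not anything we support today):

```sql
-- Hypothetical syntax: enumerate the expected fields up front.
-- A message missing "note" yields NULL for that column; a message
-- missing "id" or "quantity" is an error, since those are NOT NULL.
CREATE SOURCE orders
FROM KAFKA BROKER 'kafka:9092' TOPIC 'orders'
FORMAT JSON (
    id         bigint    NOT NULL,
    quantity   int       NOT NULL,
    note       text,               -- nullable: absent in a message => NULL
    created_at timestamp NOT NULL
)
TIMESTAMP COLUMN created_at;       -- optionally drive timestamps from the data
```

Each message then lands as one row in an append-only collection.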

@cuongdo
Contributor

cuongdo commented Oct 11, 2019

@rjnn is this different from #207?

@rjnn
Contributor

rjnn commented Oct 11, 2019

@cuongdo Yes. One could support #207 by parsing and ingesting JSON but still requiring that the JSON payload have the prescribed `before: {}, after: {}, ts_ms: 1234` schema. To complete this issue we want to support people who have data in either Avro or JSON but without that prescriptive before/after schema.
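
In other words, after this issue a message could be just the bare row itself, with no envelope at all, e.g. (made-up fields):

```json
{ "id": 1, "quantity": 15, "created_at": "2019-09-29T00:00:00Z" }
```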

@rjnn rjnn added the Epic [Auto] Multi-stage issue label Oct 11, 2019
@frankmcsherry
Contributor

Related to #777

@cuongdo cuongdo added this to the Later milestone Jan 27, 2020
@cuongdo cuongdo removed the Epic [Auto] Multi-stage issue label Feb 17, 2020
@nmeagan11 nmeagan11 added this to Icebox in Storage (Old) Dec 13, 2021
@elindsey elindsey closed this as completed May 9, 2022
Storage (Old) automation moved this from Icebox to Landed May 9, 2022
@nmeagan11 nmeagan11 removed this from Landed in Storage (Old) Aug 12, 2022