Add support for Snowplow's self-describing schemas #541

Closed

bjornhenriksson commented Jul 20, 2023

Many companies use Snowplow Analytics to manage their events. Snowplow uses a specific JSON Schema structure (Iglu) built around the concept of self-describing schemas. These schemas use a self property to carry metadata about the event, such as its name and version. As a consequence, Snowplow schemas typically don't include $id or title fields, as you can see in this official example: https://github.com/snowplow/iglu-example-schema-registry/blob/master/schemas/com.example_company/example_event/jsonschema/1-0-0.
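For readers unfamiliar with Iglu, the shape described above can be sketched in TypeScript. This is an illustration modeled on the linked example, not a verbatim copy of it: the IgluSelf and SelfDescribingSchema type names and the exampleStringField property are assumptions made for this sketch.

```typescript
// Metadata block that an Iglu self-describing schema carries under `self`.
interface IgluSelf {
  vendor: string;
  name: string;
  format: string;
  version: string; // SchemaVer-style version such as "1-0-0"
}

// Hypothetical type for this sketch: a JSON Schema document with a `self` block.
interface SelfDescribingSchema {
  $schema?: string;
  $id?: string;
  title?: string;
  self: IgluSelf;
  [key: string]: unknown;
}

const exampleEvent: SelfDescribingSchema = {
  $schema:
    "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  description: "Schema for an example event",
  self: {
    vendor: "com.example_company",
    name: "example_event",
    format: "jsonschema",
    version: "1-0-0",
  },
  type: "object",
  properties: { exampleStringField: { type: "string" } }, // illustrative property
  additionalProperties: false,
};

// Neither of the fields a code generator would usually name the type after is present.
const hasNamingField = "$id" in exampleEvent || "title" in exampleEvent;
console.log(hasNamingField); // false
```

The only place the event name appears is self.name, which is what the rest of this PR builds on.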

In the file structure used by that example project, which many companies follow, we can't fall back on the file name either: the file is named after the schema version (1-0-0), which is not a valid interface name.

This PR adds support for copying the self.name field into the schema title via the normalizer, which should give better results when using Snowplow/Iglu schemas.
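The transformation can be pictured roughly as follows. This is a standalone sketch, not the code in this PR or the library's normalizer API: promoteSelfNameToTitle, JsonObject and isObject are hypothetical names, and the sketch is slightly stricter than the description in that it checks that self.name is a string and leaves an existing title untouched.

```typescript
type JsonObject = { [key: string]: unknown };

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: use self.name as the title, then drop the self block.
// Returns the input unchanged when there is no usable self.name.
function promoteSelfNameToTitle(schema: JsonObject): JsonObject {
  const selfBlock = schema["self"];
  if (!isObject(selfBlock)) {
    return schema;
  }
  const name = selfBlock["name"];
  if (typeof name !== "string") {
    return schema;
  }
  // Spreading the schema last means an explicit title, if present, wins.
  const result: JsonObject = { title: name, ...schema };
  delete result["self"];
  return result;
}

const normalized = promoteSelfNameToTitle({
  self: {
    vendor: "com.example_company",
    name: "example_event",
    format: "jsonschema",
    version: "1-0-0",
  },
  type: "object",
});

console.log(normalized["title"]); // "example_event"
console.log("self" in normalized); // false
```

Returning a copy rather than mutating the input keeps the sketch easy to test; an in-place rule would behave the same from the caller's point of view.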

For reviewers

  • In this "naive" implementation I only check whether the self property exists, assume it contains a name, and then delete the whole self property. I'm happy to revise this if you think it will collide with other people's use of this library, if it's too unsafe, or if you'd like to carry over further information from self. Open to suggestions :))
  • I've added a snapshot and a normalizer test
  • yarn.lock was updated by running npx yarn (with the Node version specified in .nvmrc)

bcherny (Owner) commented Aug 27, 2023

Thanks for the contribution, but I'm not sure we want to support this non-standard behavior. Here's what I would suggest:

  1. Fix the upstream Snowplow schema generation to emit a proper $id field. Without this, JSTT and many other tools will not work correctly.
  2. Feel free to open an issue (and submit a PR) for emitting valid interface names when interfaces are numbers. I think #539, "Allow unicode (Fixes #279)", might cover it, actually.
  3. Finally, feel free to open an issue in this repo to track support for self. If there's a lot of interest, we can resurrect this PR.
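Until any of the above lands, one possible reading of suggestion 1 is a pre-processing step on the user's side that derives an $id from the self block before the schema reaches any tooling. The sketch below is an assumption, not something proposed in this thread: withIgluId is a hypothetical helper, and using the Iglu URI form (iglu:vendor/name/format/version) as the $id is one choice among several.

```typescript
type Schema = { [key: string]: unknown };

// The four fields that make up an Iglu schema URI, in order.
const SELF_KEYS = ["vendor", "name", "format", "version"] as const;

// Hypothetical pre-processing step: add an $id built from the self block.
// Schemas that already have an $id, or whose self block is incomplete,
// are returned unchanged.
function withIgluId(schema: Schema): Schema {
  if ("$id" in schema) {
    return schema;
  }
  const selfBlock = schema["self"];
  if (typeof selfBlock !== "object" || selfBlock === null) {
    return schema;
  }
  const fields = selfBlock as { [key: string]: unknown };
  const parts: string[] = [];
  for (const key of SELF_KEYS) {
    const value = fields[key];
    if (typeof value !== "string") {
      return schema;
    }
    parts.push(value);
  }
  return { ...schema, $id: `iglu:${parts.join("/")}` };
}

const identified = withIgluId({
  self: {
    vendor: "com.example_company",
    name: "example_event",
    format: "jsonschema",
    version: "1-0-0",
  },
  type: "object",
});

console.log(identified["$id"]); // "iglu:com.example_company/example_event/jsonschema/1-0-0"
```

Whether a generator would turn such an $id into a usable interface name depends on the tool, so this is only a starting point for experimentation.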

bcherny closed this Aug 27, 2023