Improve sink/integration developer experience #182

Closed
fracek opened this issue Jul 25, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@fracek
Contributor

fracek commented Jul 25, 2023

Is your feature request related to a problem? Please describe.
At the moment, the developer experience of using sinks/integrations is not ideal:

  • The data filter is a JSON file, so we need external code generation to generate filters.
  • The JavaScript script is only used for transformation.
  • Options are set using CLI arguments or environment variables.

Describe the solution you'd like
We should unify everything to improve the developer experience. We do that by using the javascript file for both configuration and transformation.

import { Configuration, StarknetFilter, StarknetBlock, PostgresSink } from '@apibara/integration'

export const config: Configuration<StarknetFilter, PostgresSink> = {
  type: 'starknet',
  stream: {
    url: 'https://mainnet.starknet.a5a.ch',
    bearerToken: Deno.env.get('DNA_TOKEN'),
    // other options
  },
  startingCursor: 123_456,
  filter: Filter().withHeader({ weak: false }).toObject(),
  sink: {
    type: 'postgres',
    options: {
      connectionUrl: 'postgres://....',
    }
  }
}

export default function transform(batch: StarknetBlock[]) {
  // do something with data
}

Notice that the configuration needs to be generic over the filter and sink types.
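As a rough illustration of what "generic over the filter and sink types" could look like, here is a minimal sketch. All type shapes below (SinkSpec, the fields of StarknetFilter and PostgresSink) are assumptions made for this example, not the actual '@apibara/integration' definitions.

```typescript
// Hypothetical shapes, for illustration only.
type StarknetFilter = { header?: { weak: boolean } };

// A sink is described by its type tag plus the options specific to that sink.
interface SinkSpec<TType extends string, TOptions> {
  type: TType;
  options: TOptions;
}

type PostgresSink = SinkSpec<"postgres", { connectionUrl: string }>;

// The configuration is parameterised by the filter and the sink, so a
// mismatched filter or a wrong sink option fails at type-check time.
interface Configuration<TFilter, TSink extends SinkSpec<string, unknown>> {
  type: string;
  stream: { url: string; bearerToken?: string };
  startingCursor?: number;
  filter: TFilter;
  sink: TSink;
}

const exampleConfig: Configuration<StarknetFilter, PostgresSink> = {
  type: "starknet",
  stream: { url: "https://mainnet.starknet.a5a.ch" },
  startingCursor: 123_456,
  filter: { header: { weak: false } },
  sink: {
    type: "postgres",
    // Placeholder connection string, not a real database.
    options: { connectionUrl: "postgres://localhost/example" },
  },
};
```

With this shape, writing `sink: { type: 'postgres', options: { connectionUri: '...' } }` would be rejected by the compiler, which is the main benefit of making the configuration generic.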

One challenge is the hosted service: there we want users to connect to the streams over the internal network (to avoid paying egress charges). We therefore cannot let them freely choose the stream url and token, and instead we want to override that part of the config.

We achieve this by resolving the configuration from the following sources, listed from lowest to highest priority (a later source overrides an earlier one).

  1. defaults
  2. config from script
  3. environment variables
  4. command line arguments

This way, users can use any value in the script for testing, and when they deploy to the hosted service we override the problematic values.
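The layering can be sketched as a left-to-right merge where each later source overrides the earlier ones only for the keys it actually sets. This is a minimal sketch: the helper name mergeLayers, the flat key names, and the example URLs and tokens are assumptions, not part of the proposal.

```typescript
// One configuration source; undefined means "this source does not set the key".
type Layer = Record<string, string | undefined>;

// Merge sources from lowest to highest priority.
function mergeLayers(...layers: Layer[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined) {
        result[key] = value;
      }
    }
  }
  return result;
}

// Hypothetical values for each source.
const defaults: Layer = { url: "https://mainnet.starknet.a5a.ch" };
const fromScript: Layer = { url: "https://stream.example.invalid", bearerToken: "script-token" };
const fromEnv: Layer = { bearerToken: "env-token" };
const fromCli: Layer = { url: "http://stream.internal.invalid" };

// Order matters: defaults < script < environment variables < command line.
const resolved = mergeLayers(defaults, fromScript, fromEnv, fromCli);
```

Here the url ends up coming from the command line and the bearerToken from the environment, so a hosted deployment can replace both without touching the script.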

We will provide a new apibara cli tool that is the entrypoint for running Apibara scripts. For example:

  • apibara run script.ts: runs the script
  • apibara run script.ts --stream.bearer-token=xxx: overrides the stream bearer token

We want to keep the sink abstraction extensible to encourage developers to build their own to integrate with their favourite tools. We do that by delegating the execution of the script to another tool based on the value of sink.type.

The execution trace of apibara run is as follows:

  • reads and validates script.
  • gets value of sink.type.
  • forwards script and cli flags to apibara-sink-<sink.type> (e.g. apibara-sink-postgres) (the executable is expected to be in $PATH).

In the future, we can replace the third step with a more sophisticated approach where the sink and the runner communicate through a gRPC service, but for now that adds complexity for no clear benefit.
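The executable-based dispatch in the trace above could be sketched as follows. Only the apibara-sink-<sink.type> naming comes from the proposal; the helper names, the validation rule for sink.type, and the order in which the script and flags are forwarded are assumptions.

```typescript
// Map a sink type to the executable name the runner would look up in $PATH.
function sinkExecutable(sinkType: string): string {
  // Assumed rule: reject anything that could not be a plain executable suffix.
  if (!/^[a-z][a-z0-9-]*$/.test(sinkType)) {
    throw new Error(`invalid sink type: ${sinkType}`);
  }
  return `apibara-sink-${sinkType}`;
}

// Build the command line that forwards the script and the cli flags unchanged.
function buildSinkCommand(
  sinkType: string,
  scriptPath: string,
  cliFlags: string[],
): string[] {
  return [sinkExecutable(sinkType), scriptPath, ...cliFlags];
}

const command = buildSinkCommand("postgres", "script.ts", [
  "--stream.bearer-token=xxx",
]);
```

The runner itself never needs to know about individual sinks: anyone can add a new one by putting an executable with the right name on $PATH.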

By convention, sink options can be overridden as follows:

  • cli --<sink.type>.<option-name> (e.g. --postgres.connection-url)
  • env var <SINK_TYPE>_<OPTION_NAME> (e.g. POSTGRES_CONNECTION_URL)

Configuration through env variables is important for production since we can't hard-code secrets in the script.
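The naming convention can be derived mechanically from the option name used in the script. The sketch below assumes options are written in camelCase (as connectionUrl is in the example config) and that the sink type is a single lowercase word. The helper names are hypothetical.

```typescript
// Split a camelCase option name into lowercase words:
// "connectionUrl" becomes ["connection", "url"].
function optionWords(optionName: string): string[] {
  return optionName
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(" ");
}

// cli flag: --<sink.type>.<option-name>
function cliFlag(sinkType: string, optionName: string): string {
  return `--${sinkType}.${optionWords(optionName).join("-")}`;
}

// env var: <SINK_TYPE>_<OPTION_NAME>
function envVar(sinkType: string, optionName: string): string {
  return [sinkType, ...optionWords(optionName)].join("_").toUpperCase();
}

const flag = cliFlag("postgres", "connectionUrl");
const variable = envVar("postgres", "connectionUrl");
```

For the postgres sink's connectionUrl option this yields the two names used in the examples above, --postgres.connection-url and POSTGRES_CONNECTION_URL.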

Additional context
The configuration approach is similar to Grafana k6; the multi-binary approach is similar to Pulumi.

@fracek fracek added the enhancement New feature or request label Jul 25, 2023
@bigherc18
Collaborator

  • How do you want to read the JS/TS file? Deno?
  • I suggest we avoid nesting, I'd prefer the config to look like
export const config: Configuration<StarknetFilter, PostgresSink> = {
  type: 'starknet',
  streamUrl: 'https://mainnet.starknet.a5a.ch',
  bearerToken: Deno.env.get('DNA_TOKEN'),
  // other options
  startingCursor: 123_456,
  filter: Filter().withHeader({ weak: false }).toObject(),
  sinkType: 'postgres',
  sinkOptions: {
    connectionUrl: 'postgres://....',
  }
}
  • --stream.bearer-token=xxx doesn't look very standard to me, I'd prefer it to be just --bearer-token or --stream-bearer-token
  • Why do we need to prefix the sink options with --<sink.type>? It looks cumbersome to me

@fracek
Contributor Author

fracek commented Jul 26, 2023

  • Yes, the javascript runtime is Deno. The ts -> js transpilation is provided by a crate that leverages swc.
  • I agree it looks better without nesting.
  • The dot in the name is used by some tools (grafana, geth, etc.) to mimic the nesting in the config. If we remove nesting it's not needed anymore.
  • The prefix is needed to avoid clashes when using env variables. For example, if I need a bearer token to authenticate with my db, it goes in the DB_BEARER_TOKEN env variable and doesn't clash with the stream bearer token.

@fracek
Contributor Author

fracek commented Aug 1, 2023

This was done in #188

@fracek fracek closed this as completed Aug 1, 2023