Commit
Allow defining environment variables in config
TimDaub committed Apr 5, 2023
1 parent 207227d commit b290aa2
Showing 10 changed files with 255 additions and 44 deletions.
12 changes: 6 additions & 6 deletions config.mjs
@@ -1,4 +1,3 @@
import { env } from "process";
import { resolve } from "path";

import * as blockLogs from "@attestate/crawler-call-block-logs";
@@ -15,6 +14,7 @@ const topics = [
"0x0000000000000000000000000000000000000000000000000000000000000000",
];

const dataDir = resolve("./data");
export default {
path: [
{
@@ -23,26 +23,26 @@ export default {
module: blockLogs.extractor,
args: [range.start, range.end, address, topics, stepSize],
output: {
path: resolve(env.DATA_DIR, "call-block-logs-extraction"),
path: resolve(dataDir, "call-block-logs-extraction"),
},
},
transformer: {
module: blockLogs.transformer,
args: [],
input: {
path: resolve(env.DATA_DIR, "call-block-logs-extraction"),
path: resolve(dataDir, "call-block-logs-extraction"),
},
output: {
path: resolve(env.DATA_DIR, "call-block-logs-transformation"),
path: resolve(dataDir, "call-block-logs-transformation"),
},
},
loader: {
module: blockLogs.loader,
input: {
path: resolve(env.DATA_DIR, "call-block-logs-transformation"),
path: resolve(dataDir, "call-block-logs-transformation"),
},
output: {
path: resolve(env.DATA_DIR, "call-block-logs-load"),
path: resolve(dataDir, "call-block-logs-load"),
},
},
},
52 changes: 43 additions & 9 deletions docs/source/configuration.rst
@@ -29,15 +29,21 @@ variables from a ``.env`` file in the project root into Node.js's
already been set in, e.g., the ``.env`` file can be overwritten by passing them
before the invoking command.

The following environment variables are **required** for ``@attestate/crawler``
to run:

* ``RPC_HTTP_HOST`` describes the host that Ethereum JSON-RPC extraction request are made against. It must be set to an URL to an Ethereum full node's JSON-RPC endpoint that starts with ``https://``. ``ws://`` or ``wss://`` prefixes are currently not supported. We support URLs that include the API's bearer token as is the case with, e.g., Infura or Alchemy.
* ``RPC_API_KEY`` is the API key for the host extraction requests are made against. It must be set if an Ethereum full node was provisioned behind an HTTP proxy that requires a bearer token authorization via the HTTP ``Authorization`` header. In this case, the header is structurally set as follows: ``Authorization: Bearer ${RPC_API_KEY}``.
* ``DATA_DIR`` is the directory that stores all results from extraction and transformation of the crawler. It must be set to a file system path (relative or absolute).
* ``IPFS_HTTPS_GATEWAY`` describes the host that IPFS extraction requests are made against. A list of publicly accessible IPFS gateways can be found `here <https://ipfs.github.io/public-gateway-checker/>`_.
* ``IPFS_HTTPS_GATEWAY_KEY`` is the API key for the IPFS host extraction requests are made against. It must be set if an IPFS node was provisioned behind an HTTP proxy that requires a bearer token authorization via the HTTP ``Authorization`` header. In this case, the header is structurally set as follows: ``Authorization: Bearer ${IPFS_HTTPS_GATEWAY_KEY}``.
* ``ARWEAVE_HTTPS_GATEWAY`` describes the host that Arweave extraction requests are made against. A commonly-used Arweave gateway is ``https://arweave.net``.
``@attestate/crawler`` guarantees downstream plugins and strategies that all
environment variables are present and valid. To avoid having to define a
``.env`` file in applications that embed the crawler programmatically,
environment variables can instead be set in the configuration file; they are
then made available through ``process.env`` (a sketch follows the list below).

The environment variables marked with an asterisk \* are **required** for
``@attestate/crawler`` to run:

* ``RPC_HTTP_HOST``\* describes the host that Ethereum JSON-RPC extraction requests are made against. It must be set to the URL of an Ethereum full node's JSON-RPC endpoint and must start with ``https://``; ``ws://`` or ``wss://`` prefixes are currently not supported. We support URLs that include the API's bearer token, as is the case with, e.g., Infura or Alchemy.
* ``RPC_API_KEY`` is the API key for the host that extraction requests are made against. It must be set if the Ethereum full node is provisioned behind an HTTP proxy that requires bearer token authorization via the HTTP ``Authorization`` header. In this case, the header is set as follows: ``Authorization: Bearer ${RPC_API_KEY}``.
* ``DATA_DIR``\* is the directory that stores all results of the crawler's extraction and transformation phases. It must be set to a file system path (relative or absolute).
* ``IPFS_HTTPS_GATEWAY``\* describes the host that IPFS extraction requests are made against. A list of publicly accessible IPFS gateways can be found `here <https://ipfs.github.io/public-gateway-checker/>`_.
* ``IPFS_HTTPS_GATEWAY_KEY`` is the API key for the IPFS host that extraction requests are made against. It must be set if the IPFS node is provisioned behind an HTTP proxy that requires bearer token authorization via the HTTP ``Authorization`` header. In this case, the header is set as follows: ``Authorization: Bearer ${IPFS_HTTPS_GATEWAY_KEY}``.
* ``ARWEAVE_HTTPS_GATEWAY``\* describes the host that Arweave extraction requests are made against. A commonly used Arweave gateway is ``https://arweave.net``.
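
A minimal sketch of how downstream code might read these values once the
crawler has made them available through ``process.env`` (illustrative only; no
particular strategy API is assumed):

// Illustrative sketch, not part of the crawler's API.
import { env } from "process";

const { RPC_HTTP_HOST, DATA_DIR, IPFS_HTTPS_GATEWAY, ARWEAVE_HTTPS_GATEWAY } = env;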

.. note::
In some cases, you may only work with Ethereum, however, the crawler will
@@ -46,6 +52,27 @@ to run:
those cases it is sufficient to define those variables as an empty string:
``RPC_API_KEY=""``.

Defining environment variables in the configuration
------------------------------------------------------

When the crawler is used in downstream JavaScript applications, we want to
avoid leaking the ``dotenv`` requirement, which expects a ``.env`` file to be
present in the application folder.

Hence, applications don't need to define the environment variables in a
``.env`` file; they can instead pass them to ``boot(config)`` via
``config.environment``. Note that the variables' names are mapped to camel
case, such that ``RPC_HTTP_HOST`` becomes ``rpcHttpHost``.

Note that variables set in the process environment take precedence: if
``RPC_HTTP_HOST`` is defined, it overwrites ``config.environment.rpcHttpHost``.
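
For illustration, a programmatic invocation could look roughly like the sketch
below; the import path, the gateway URLs, and the crawl path contents are
placeholders rather than prescriptions:

import { boot } from "@attestate/crawler";

await boot({
  environment: {
    rpcHttpHost: "https://example.com",
    dataDir: "./data",
    ipfsHttpsGateway: "https://ipfs.io",
    arweaveHttpsGateway: "https://arweave.net",
  },
  path: [
    // crawl path strategies go here
  ],
  queue: { options: { concurrent: 10 } },
});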

.. note::
For all downstream applications, like strategies, the environment variables
will always be defined, regardless of whether the developer chooses to
write them in ``config.mjs`` or in the ``.env`` file.


.. _configuration-crawl-path:

Configuration File
@@ -69,6 +96,13 @@ Structurally, it is defined as follows:
path: {
"...": "..."
},
// All environment variables can alternatively be defined as properties in
// the configuration file. Note that they use camel case here, such that
// "RPC_HTTP_HOST" becomes "rpcHttpHost".
environment: {
"rpcHttpHost": "https://example.com",
"...": "..."
},
queue: {
options: {
// The queue's concurrency controls how many requests are sent to
11 changes: 11 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion package.json
@@ -37,7 +37,8 @@
"dotenv": "16.0.0",
"eth-fun": "0.9.1",
"lmdb": "2.7.9",
"yargs": "17.5.1"
"yargs": "17.5.1",
"lodash.invert": "4.3.0"
},
"devDependencies": {
"@attestate/crawler-call-block-logs": "0.2.2",
14 changes: 12 additions & 2 deletions src/boot.mjs
@@ -35,8 +35,7 @@ export function validateConfig(config) {
}

export async function createWorker(config) {
environment.validate(environment.requiredVars);
await disc.provisionDir(resolve(env.DATA_DIR));
await disc.provisionDir(resolve(config.environment.dataDir));

const worker = new Worker(workerPath, {
workerData: config,
@@ -45,8 +44,19 @@
return worker;
}

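// Merges environment variables present in process.env into
// config.environment; values from process.env take precedence over values
// already defined in the config file.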
export function augment(config) {
const { requiredVars, optionalVars } = environment;
const collected = environment.collect({ ...requiredVars, ...optionalVars });
const configEnv = config.environment ?? {};
const copy = { ...config };
copy.environment = { ...configEnv, ...collected };
return copy;
}

export async function boot(config) {
config = augment(config);
validateConfig(config);
environment.set(config.environment);
// NOTE: We still use @neume-network/extraction-worker, which implements an
// older version of the crawler configuration. But since in
// @attestate/crawler, we've merged the path and the config, we'll have to
38 changes: 25 additions & 13 deletions src/environment.mjs
@@ -1,21 +1,33 @@
// @format
import { env } from "process";

import invert from "lodash.invert";

import { NotFoundError } from "./errors.mjs";

export const requiredVars = [
"RPC_HTTP_HOST",
"DATA_DIR",
"IPFS_HTTPS_GATEWAY",
"ARWEAVE_HTTPS_GATEWAY",
];
export const requiredVars = {
RPC_HTTP_HOST: "rpcHttpHost",
DATA_DIR: "dataDir",
IPFS_HTTPS_GATEWAY: "ipfsHttpsGateway",
ARWEAVE_HTTPS_GATEWAY: "arweaveHttpsGateway",
};

export const optionalVars = {
RPC_API_KEY: "rpcApiKey",
IPFS_HTTPS_GATEWAY_KEY: "ipfsHttpsGatewayKey",
};

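// Maps the environment variables that are currently set in process.env to
// their camel-cased configuration aliases, e.g. RPC_HTTP_HOST becomes
// rpcHttpHost.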
export function collect(vars) {
const config = {};
for (const [envAlias, configAlias] of Object.entries(vars)) {
if (env[envAlias]) config[configAlias] = env[envAlias];
}
return config;
}

export function validate(required) {
for (const name of required) {
if (!env[name]) {
throw new NotFoundError(
`Didn't find required name "${name}" in environment`
);
}
export function set(configuration) {
const allVars = invert({ ...requiredVars, ...optionalVars });
for (const [configAlias, envAlias] of Object.entries(allVars)) {
env[envAlias] = configuration[configAlias];
}
}
2 changes: 0 additions & 2 deletions src/lifecycle.mjs
@@ -3,7 +3,6 @@ import path from "path";
import { createInterface } from "readline";
import { createReadStream, appendFileSync } from "fs";
import EventEmitter, { once } from "events";
import { env } from "process";

import Ajv from "ajv";
import addFormats from "ajv-formats";
@@ -23,7 +22,6 @@ export const EXTRACTOR_CODES = {
};

const log = logger("lifecycle");
const dataDir = path.resolve(env.DATA_DIR);
const ajv = new Ajv();
addFormats(ajv);

38 changes: 38 additions & 0 deletions src/schemata/configuration.mjs
@@ -105,10 +105,48 @@ const path = {
},
};

const environment = {
type: "object",
additionalProperties: false,
required: [
"rpcHttpHost",
"dataDir",
"ipfsHttpsGateway",
"arweaveHttpsGateway",
],
properties: {
rpcHttpHost: {
type: "string",
format: "uri",
pattern: "^https?://",
},
rpcApiKey: {
type: "string",
},
dataDir: {
type: "string",
},
ipfsHttpsGateway: {
type: "string",
format: "uri",
pattern: "^https?://",
},
ipfsHttpsGatewayKey: {
type: "string",
},
arweaveHttpsGateway: {
type: "string",
format: "uri",
pattern: "^https?://",
},
},
};
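
// For illustration only (not part of the schema): an object that satisfies
// the environment schema above could be
//   {
//     rpcHttpHost: "https://example.com",
//     dataDir: "./data",
//     ipfsHttpsGateway: "https://ipfs.io",
//     arweaveHttpsGateway: "https://arweave.net",
//   }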

const config = {
type: "object",
required: ["queue"],
properties: {
environment,
path: { ...path },
queue: {
type: "object",
51 changes: 49 additions & 2 deletions test/boot_test.mjs
@@ -4,10 +4,54 @@ import { env } from "process";
import { dirname, resolve } from "path";
import { fileURLToPath } from "url";

import { boot, createWorker, getConfig, validateConfig } from "../src/boot.mjs";
import {
boot,
createWorker,
getConfig,
validateConfig,
augment,
} from "../src/boot.mjs";

import configuration from "../src/schemata/configuration.mjs";
import { requiredVars } from "../src/environment.mjs";

const __dirname = dirname(fileURLToPath(import.meta.url));

test("if required environment vars in module and schema match", (t) => {
const configVars = configuration.properties.environment.required;
const envVars = Object.values(requiredVars);
t.deepEqual(configVars, envVars);
});

test.serial("if non-existent environment object is handled gracefully", (t) => {
env.RPC_API_KEY = "ABC";
const config = {};
const nextConfig = augment(config);
t.is(nextConfig.environment.rpcApiKey, env.RPC_API_KEY);
});

test.serial("falling back on config vars when env vars aren't present", (t) => {
delete env.RPC_API_KEY;
const config = {
environment: {
rpcApiKey: "key",
},
};
const nextConfig = augment(config);
t.is(nextConfig.environment.rpcApiKey, config.environment.rpcApiKey);
});

test.serial("overwriting config variables with the environment", (t) => {
env.RPC_API_KEY = "OVERWRITE";
const config = {
environment: {
rpcApiKey: "key",
},
};
const nextConfig = augment(config);
t.is(nextConfig.environment.rpcApiKey, env.RPC_API_KEY);
});

test.serial("if boot can be started programmatically", async (t) => {
let hitInit = false;
let hitUpdate = false;
@@ -76,7 +120,10 @@ test.serial("if boot can throw errors", async (t) => {

test.serial("should be able to create worker", (t) => {
return new Promise((resolve, reject) => {
createWorker({ queue: { options: { concurrent: 10 } } }).then((w) => {
createWorker({
environment: { dataDir: "data" },
queue: { options: { concurrent: 10 } },
}).then((w) => {
setTimeout(() => {
// NOTE: no error has occurred until now, safe to pass the test
t.pass();
