Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

133/prometheus #173

Merged
merged 11 commits into from Apr 28, 2021
55 changes: 41 additions & 14 deletions README.md
Expand Up @@ -14,7 +14,6 @@ the relayer works, but the Quick Start probably gives a better intro.
- RPC addresses of 2 full nodes on compatible, IBC-enabled chains
- See [Chain Requirements below](#Chain-Requirements) for details of what chains are supported


## Installation

### NPM
Expand Down Expand Up @@ -66,7 +65,7 @@ Reads the configuration and starts relaying packets.
- pulls default `registry.yaml` to relayer's home
- funds addresses on both sides so relayer can pay the fee while relaying packets

> **NOTE:** Test blockchains `relayer_test_1` and `relayer_test_2` are running in the public. You do not need to start any blockchain locally to complete the quick start guide.
> **NOTE:** Test blockchains `relayer_test_1` and `relayer_test_2` are running in the public. You do not need to start any blockchain locally to complete the quick start guide.

> **NOTE:** Run `ibc-setup balances` to see the amount of tokens on each address.

Expand All @@ -88,15 +87,16 @@ Reads the configuration and starts relaying packets.
### Send tokens between chains

1. Make sure `wasmd` binary is installed on your system

- you must be running Linux or OSX on amd64 (not arm64/Mac M1)
- [install Go 1.15+](https://golang.org/doc/install) and ensure that `$PATH` includes Go binaries (you may need to restart your terminal session)
- clone and install `wasmd`:
```sh
git clone https://github.com/CosmWasm/wasmd.git
cd wasmd
git checkout v0.15.1
make install
```
- clone and install `wasmd`:
```sh
git clone https://github.com/CosmWasm/wasmd.git
cd wasmd
git checkout v0.15.1
make install
```

2. Create a new account and fund it

Expand Down Expand Up @@ -125,6 +125,7 @@ Reads the configuration and starts relaying packets.
## Configuration overview

The relayer configuration is stored under relayer's home directory. By default, it's located at `$HOME/.ibc-setup`, however, can be customized with `home` option, e.g.:

```sh
# initialize the configuration at /home/user/relayer_custom_home
ibc-setup init --home /home/user/relayer_custom_home
Expand All @@ -136,19 +137,45 @@ ibc-relayer start --home /home/user/relayer_custom_home
There are 3 files that live in the relayer's home.

- **registry.yaml** (required)
Contains a list of available chains with corresponding information. The chains from the registry can be referenced by `ibc-setup` binary or within the `app.yaml` file. [View an example of registry.yaml file.](demo/registry.yaml)

Contains a list of available chains with corresponding information. The chains from the registry can be referenced by `ibc-setup` binary or within the `app.yaml` file. [View an example of registry.yaml file.](demo/registry.yaml)

- **app.yaml** (optional)

Holds the relayer-specific options such as source or destination chains. These options can be overridden with CLI flags or environment variables.

Holds the relayer-specific options such as source or destination chains. These options can be overridden with CLI flags or environment variables.

- **last-queried-heights.json** (optional)

Stores last queried heights for better performance on relayer startup. It's constantly overwritten with new heights when relayer is running. Simply delete this file to scan the events since forever.

[Learn more about configuration.](spec/config.md)

## Monitoring

The relayer collects various metrics that a [Prometheus](https://prometheus.io/docs/introduction/overview/) instance can consume.

To enable metrics collection, pass the `--enable-metrics` flag when starting the relayer:

```sh
ibc-relayer start --enable-metrics
```

> **NOTE:** Metrics can also be enabled via an environment variable `RELAYER_ENABLE_METRICS=true`, or with an `enableMetrics: true` entry in the `app.yaml` file, as explained in the [config specification](./spec/config.md#configuration).

The `GET /metrics` endpoint will be exposed by default on port `26660`, which you can override with `--metrics-port` flag, `RELAYER_METRICS_PORT` env variable, or `metricsPort` entry in `app.yaml`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice list of options to set the port. 🌷
Just a note: port 26660 is used by Tendermint as their default port for Prometheus and therefore for any blockchain based on cosmos-sdk like wasmd. This may cause some conflicts when both run on the same machine. 🤷
If you want to change this then I would go with 8080 where people expect conflicts anyway :-)


### Local setup

1. Start the relayer with metrics enabled
2. Spin up the Prometheus instance:
```sh
docker run -it -v $(pwd):/prometheus -p9090:9090 prom/prometheus --config.file=prometheus.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, a default prometheus config committed in repo. Better than any docs.

```
> **NOTE:** Ensure that `the --config.file=<path>` flag points at the existing configuration file. You can find an example here: [prometheus.yaml](prometheus.yaml).
3. Open the Prometheus dashboard in a browser at [http://localhost:9090](http://localhost:9090)

## Development

[Refer to the development page.](DEVELOPMENT.md)

## Chain Requirements
Expand Down
1 change: 1 addition & 0 deletions package.json
Expand Up @@ -43,6 +43,7 @@
"fast-safe-stringify": "2.0.4",
"js-yaml": "4.0.0",
"lodash": "4.17.21",
"prom-client": "13.1.0",
"protobufjs": "6.10.2",
"table": "6.0.7",
"triple-beam": "1.3.0",
Expand Down
10 changes: 10 additions & 0 deletions prometheus.yaml
@@ -0,0 +1,10 @@
global:
scrape_interval: 15s
evaluation_interval: 15s

scrape_configs:
- job_name: confio_relayer

scrape_interval: 5s
static_configs:
- targets: ['host.docker.internal:26660']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example configuration file. Is this the right place to store it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to commit it to this repo, so it is maintained.

You could put it in a samples or ops dir. We can add example systemd and other configs there as needed.
(Maybe we move it together with the registry.conf file?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that init command has registry.yaml hardcoded to main branch, so I cannot move registry under samples.

'https://raw.githubusercontent.com/confio/ts-relayer/main/demo/registry.yaml'

One way to "fix" this would be to use parametrized link with version:

`https://raw.githubusercontent.com/confio/ts-relayer/tree/v${packageJson.version}/demo/registry.yaml`

But it fixes the future versions only. Everything till 0.1.3 included will break.

👉 For now, I moved prometheus.yaml under demo. 92287b8

Copy link
Member

@ethanfrey ethanfrey Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And let's keep it in demo for now (marked deprecated), and make the main copy in another directory (final location)

I am not sure about fixing to a version. We can think about a better location, but nice to be able to roll out registry without requiring upgrades

90 changes: 62 additions & 28 deletions src/binary/ibc-relayer/commands/start.ts
Expand Up @@ -16,6 +16,7 @@ import { resolveHomeOption } from '../../utils/options/shared/resolve-home-optio
import { resolveKeyFileOption } from '../../utils/options/shared/resolve-key-file-option';
import { resolveMnemonicOption } from '../../utils/options/shared/resolve-mnemonic-option';
import { signingClient } from '../../utils/signing-client';
import { Metrics, setupPrometheus } from '../setup-prometheus';

type ResolveHeightsParams = {
scanFromSrc: number | null;
Expand Down Expand Up @@ -67,53 +68,57 @@ function resolveHeights(
}

type LoopFlags = {
poll?: string;
maxAgeSrc?: string;
maxAgeDest?: string;
once: boolean;
readonly poll?: string;
readonly maxAgeSrc?: string;
readonly maxAgeDest?: string;
readonly once: boolean;
};
type LoopOptions = {
// how many seconds we sleep between relaying batches
poll: number;
readonly poll: number;
// number of seconds old the client on chain A can be
maxAgeSrc: number;
readonly maxAgeSrc: number;
// number of seconds old the client on chain B can be
maxAgeDest: number;
readonly maxAgeDest: number;
// if set to 'true' quit after one pass
once: boolean;
readonly once: boolean;
};

type Flags = {
interactive: boolean;
home?: string;
src?: string;
dest?: string;
keyFile?: string;
mnemonic?: string;
srcConnection?: string;
destConnection?: string;
scanFromSrc?: string;
scanFromDest?: string;
readonly interactive: boolean;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will was here 😄

readonly home?: string;
readonly src?: string;
readonly dest?: string;
readonly keyFile?: string;
readonly mnemonic?: string;
readonly srcConnection?: string;
readonly destConnection?: string;
readonly scanFromSrc?: string;
readonly scanFromDest?: string;
readonly enableMetrics: boolean;
readonly metricsPort?: string;
} & LoggerFlags &
LoopFlags;

type Options = {
home: string;
src: string;
dest: string;
mnemonic: string;
srcConnection: string;
destConnection: string;
heights: RelayedHeights | null;
readonly home: string;
readonly src: string;
readonly dest: string;
readonly mnemonic: string;
readonly srcConnection: string;
readonly destConnection: string;
readonly heights: RelayedHeights | null;
readonly enableMetrics: boolean;
readonly metricsPort: number;
} & LoopOptions;

// some defaults for looping
export const defaults = {
// check once per minute
poll: 60,
// once per day: 86400s
maxAgeSrc: 86400,
maxAgeDest: 86400,
metricsPort: 26660,
};

export async function start(flags: Flags, logger: Logger) {
Expand Down Expand Up @@ -171,6 +176,20 @@ export async function start(flags: Flags, logger: Logger) {
flags.scanFromDest,
process.env.RELAYER_SCAN_FROM_DEST
);
const enableMetrics =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use resolveOption here like everywhere else?

Ah, there is no support for boolean, only string and integer?
If that is the case, leave it for this PR and make an issue to add boolean support, pointing to this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly!

#174

flags.enableMetrics ||
Boolean(process.env.RELAYER_ENABLE_METRICS) ||
app?.enableMetrics ||
false;
const metricsPort = resolveOption('metricsPort', {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

integer: true,
required: true,
})(
flags.metricsPort,
process.env.RELAYER_METRICS_PORT,
app?.metricsPort,
defaults.metricsPort
);

const heights = resolveHeights({ scanFromSrc, scanFromDest, home }, logger);

Expand All @@ -189,12 +208,20 @@ export async function start(flags: Flags, logger: Logger) {
maxAgeDest,
once,
heights,
enableMetrics,
metricsPort,
};

await run(options, logger);
}

async function run(options: Options, logger: Logger) {
const metrics = setupPrometheus({
enabled: options.enableMetrics,
port: options.metricsPort,
logger,
});

const registryFilePath = path.join(options.home, registryFile);
const { chains } = loadAndValidateRegistry(registryFilePath);
const srcChain = chains[options.src];
Expand Down Expand Up @@ -224,10 +251,15 @@ async function run(options: Options, logger: Logger) {
logger
);

await relayerLoop(link, options, logger);
await relayerLoop(link, options, logger, metrics);
}

async function relayerLoop(link: Link, options: Options, logger: Logger) {
async function relayerLoop(
link: Link,
options: Options,
logger: Logger,
metrics: Metrics
) {
let nextRelay = options.heights ?? {};
const lastQueriedHeightsFilePath = path.join(
options.home,
Expand Down Expand Up @@ -262,5 +294,7 @@ async function relayerLoop(link: Link, options: Options, logger: Logger) {
logger.info(`Sleeping ${options.poll} seconds...`);
await sleep(options.poll * 1000);
logger.info('... waking up and checking for packets!');

metrics?.pollCounter?.inc();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With nullish coalescing operator, we can collect metrics nicely w/o condition blocks (if metrics are disabled).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, but I find it less clear than an if statement.

}
}
7 changes: 7 additions & 0 deletions src/binary/ibc-relayer/index.ts
Expand Up @@ -38,6 +38,13 @@ const startCommand = program
.addOption(mnemonicOption)
.addOption(srcConnection)
.addOption(destConnection)
.option(
'--enable-metrics',
'Enable Prometheus metrics collection and GET /metrics endpoint'
)
.option(
`--metrics-port <port>', 'Specify port for GET /metrics http server (default: ${startDefaults.metricsPort})`
)
.option(
'--poll <frequency>',
`How many seconds we sleep between checking for packets (default: ${startDefaults.poll})`
Expand Down
60 changes: 60 additions & 0 deletions src/binary/ibc-relayer/setup-prometheus.ts
@@ -0,0 +1,60 @@
import http from 'http';

import client from 'prom-client';

import { Logger } from '../create-logger';

let initialized = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, do we need some kind of global state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just nice to have to make sure we don't initialize the HTTP server more than once: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR28-R32

I thought of implementing a singleton here but since it takes arguments like enabled I decided to just throw on second call.


function getMetrics() {
return {
pollCounter: new client.Counter({
name: 'poll_counter',
help: 'Poll counter',
}),
Comment on lines +11 to +14
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only an example of a custom metric that should be removed. Let's replace this with some real ones.

Copy link
Contributor

@alpe alpe Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good start! Counter makes sense here for this metric. 👍

Some notes for a future version:
I have noticed though that the naming does not fit the others, yet:

confio_relayer_nodejs_gc_duration_seconds_count{kind="minor"} 80
poll_counter 13

With a common prefix we can find all relayer metrics easily. Otherwise some context is missing as a prometheus instance is likely not exclusive to the relayer.

No idea if the relayer is always bidirectional but as this would involve 2 chains the semantics of poll is not fully clear to me. You can use labels to mark which chain you polled.

Naming is always hard. You don't need to carry the type inside the name. Maybe "_count" or "_total" is a better fit. I always found this page very helpful: https://prometheus.io/docs/practices/naming/

};
}

export type Metrics = {
pollCounter: client.Counter<string>;
} | null;
export function setupPrometheus({
enabled,
port,
logger,
}: {
enabled: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make these readonly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! 4d93ebd

port: number;
logger: Logger;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logger is not yet used. Do you plan to use it in this PR or a direct follow up?
If not, can we remove it until there is a clear use? We can always add it back then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used here: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR57

or do you think it's better to do console log? I would rather keep the logger to make sure the endpoint is mentioned in the logs.

}): Metrics {
if (initialized) {
throw new Error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to find bugs in our code fast.

I was going to suggest to only set initialized to true if enabled was true, but that means double-registry bugs only surface in production. It is good like this

`"setupPrometheus" func shouldn't be initialized more than once.`
);
}
initialized = true;

if (!enabled) {
return null;
}

client.collectDefaultMetrics({ prefix: 'confio_relayer_' });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const server = http.createServer(async (request, response) => {
if (request.method === 'GET' && request.url === '/metrics') {
const metrics = await client.register.metrics();
response.writeHead(200, {
'Content-Type': client.register.contentType,
});
response.end(metrics);

return;
}

response.writeHead(404);
response.end('404');
});
server.listen(port);
logger.info(`Prometheus GET /metrics exposed on port ${port}.`);

return getMetrics();
}
2 changes: 2 additions & 0 deletions src/binary/types.ts
Expand Up @@ -22,6 +22,8 @@ export type AppConfig = {
destConnection?: string;
mnemonic?: string;
keyFile?: string;
enableMetrics?: boolean;
metricsPort?: number;
};

export type LoggerFlags = {
Expand Down
2 changes: 2 additions & 0 deletions src/binary/utils/load-and-validate-app.ts
Expand Up @@ -41,6 +41,8 @@ export function loadAndValidateApp(home: string) {
destConnection: { type: 'string', nullable: true },
mnemonic: { type: 'string', nullable: true },
keyFile: { type: 'string', nullable: true },
enableMetrics: { type: 'boolean', nullable: true },
metricsPort: { type: 'number', nullable: true },
},
};
const validate = ajv.compile(schema);
Expand Down