Curated SQL Views and Metrics for the MetricaDAO Curation Pod.
What's dbt? Learn more here!
There are several ways to use dbt on you machine or in the cloud. We have put together a set of instructions on the pod documentation page, linked here.
/models
- this directory contains SQL files with Jinja templates. DBT will compile these templates and wrap them into create table statements. This means all you have to do is define SQL select statements, while DBT handles the rest. The snowflake table name will match the name of the sql model file.
/macros
- these are helper functions defined as Jinja that can be injected into your SQL models.
/tests
- custom SQL tests that can be attached to tables.
/seeds
- a directory for adding hard-coded data to the models via CSV upload. This is commonly used for known labels or sample data for unit testing.
This project utilizes a set of archived Harmony data for practice purposes. The original project repository is located here. The present form has been refactored to meet our bronze/silver/gold model standards.
The primary source of projects, thus far, has been Flipside Crypto's chainwalker database. The raw data is shared via direct account share in Snowflake for us to use and curate. The structure follows the below, but may change as their ingestion process evolves. The blocks and transactions data is generally the raw response from RPC endpoints.
CHAINWALKERS.PROD.<chain>_BLOCKS
CHAINWALKERS.PROD.<chain>_TXS
Blocks and transactions are fed into the above two tables utilizing the Chainwalkers Framework. Details on the data:
- This is near-real time. Blocks land in this table within 3-5 minutes of being minted.
- The table is a read-only data share in the Metrics DAO Snowflake account under the database
FLIPSIDE
. - The table is append-only, meaning that duplicates can exist if blocks are re-processed. The injested_at timestamp should be used to retrieve only the most recent block. Macros exist
macros/dedupe_utils.sql
to handle this. Seemodels/core/blocks.sql
or/models/core/txs.sql
for an example. - Tx logs are decoded where an ABI exists.
CHAINWALKERS.PROD.<chain>_BLOCKS
Column | Type | Description |
---|---|---|
record_id | VARCHAR | A unique id for the record generated by Chainwalkers |
offset_id | NUMBER(38,0) | Synonmous with block_id for Terra |
block_id | NUMBER(38,0) | The height of the chain this block corresponds with |
block_timestamp | TIMESTAMP | The time the block was minted |
network | VARCHAR | The blockchain network (i.e. mainnet, testnet, etc.) |
chain_id | VARCHAR | Synonmous with blockchain name for Terra |
tx_count | NUMBER(38,0) | The number of transactions in the block |
header | json variant | A json queryable column containing the blocks header information |
ingested_at | TIMESTAMP | The time this data was ingested into the table by Snowflake |
CHAINWALKERS.PROD.<chain>_TXS
Column | Type | Description |
---|---|---|
record_id | VARCHAR | A unique id for the record generated by Chainwalkers |
tx_id | VARCHAR | A unique on chain identifier for the transaction |
tx_block_index | NUMBER(38,0) | The index of the transaction within the block. Starts at 0. |
offset_id | NUMBER(38,0) | Synonmous with block_id for Terra |
block_id | NUMBER(38,0) | The height of the chain this block corresponds with |
block_timestamp | TIMESTAMP | The time the block was minted |
network | VARCHAR | The blockchain network (i.e. mainnet, testnet, etc.) |
chain_id | VARCHAR | Synonmous with blockchain name for Terra |
tx_count | NUMBER(38,0) | The number of transactions in the block |
header | json variant | A json queryable column containing the blocks header information |
tx | array | An array of json queryable objects containing each tx and decoded logs |
ingested_at | TIMESTAMP | The time this data was ingested into the table by Snowflake |
Data in this DBT project is written to the ONBOARDING
database in MetricsDAO.
There is a second ONBOARDING_DEV
database that is where curators will test their work against the live data. The production database should never be touched by an analyst. For this onboarding project, prod will remain a historical archive but for a live project, this would be regularly updated to keep up with chainhead.
The _DEV
database is reset on a daily basis via a drop and clone of prod. This is to ensure the data in dev is fresh within 24h while using minimal compute credits. Further, because an unknown number of curators may be working on a dev database at one time, it is important that changes made are wiped, to a certain extent. For example, if a curator tests a change to messages
and the dev table remains different from prod, anything built downstream might break when merged to prod.
This does mean that a curator working on a silver table will have to create their model each day it is being worked on, as it will not persist past 24h.
When conducting work please branch off of main with a description branch name and generate a pull request. At least one other individual must review the PR before it can be merged into main. Once merged into main DBT Cloud will run the new models and output the results into the PROD
schema.
When creating a PR please include the following details in the PR description:
- List of Tables Created or Modified
- Description of changes.
- Implication of changes (if any).
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Check out the blog for the latest news on dbt's development and best practices