
Onboarding DBT Project

Curated SQL Views and Metrics for the MetricsDAO Curation Pod.

What's dbt? Learn more here!

Setup

There are several ways to use dbt on your machine or in the cloud. We have put together a set of instructions on the pod documentation page, linked here.

Project Overview

/models - this directory contains SQL files with Jinja templates. dbt compiles these templates and wraps them in create table statements, so all you have to do is write SQL select statements while dbt handles the rest. The Snowflake table name will match the name of the SQL model file (see the sketch after this list).

/macros - helper functions, written in Jinja, that can be injected into your SQL models.

/tests - custom SQL tests that can be attached to tables.

/seeds - a directory for adding hard-coded data to the models via CSV upload. This is commonly used for known labels or sample data for unit testing.
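As an illustration, here is a minimal model sketch. The file path and metric are hypothetical, and the ref to the txs model assumes the core models described below; dbt compiles the select and materializes it as a table named after the file.

```sql
-- models/metrics/daily_tx_counts.sql (hypothetical path)
-- dbt wraps this select in a create table statement; the resulting
-- Snowflake table is named after the file: daily_tx_counts.
{{ config(materialized='table') }}

select
    block_timestamp::date as block_date,
    count(*)              as tx_count
from {{ ref('txs') }}  -- ref() resolves to the compiled core txs model
group by 1
```

Running dbt run --select daily_tx_counts would then build just this model.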

Background on Data

This project uses a set of archived Harmony data for practice purposes. The original project repository is located here. The present form has been refactored to meet our bronze/silver/gold model standards.

The primary source of projects, thus far, has been Flipside Crypto's Chainwalkers database. The raw data is shared via a direct account share in Snowflake for us to use and curate. The blocks and transactions data is generally the raw response from RPC endpoints. The structure follows the layout below, but may change as their ingestion process evolves:

CHAINWALKERS.PROD.<chain>_BLOCKS
CHAINWALKERS.PROD.<chain>_TXS

Blocks and transactions are fed into the above two tables using the Chainwalkers Framework. Details on the data:

  1. This is near-real time: blocks land in these tables within 3-5 minutes of being minted.
  2. The tables are read-only data shares in the MetricsDAO Snowflake account under the database FLIPSIDE.
  3. The tables are append-only, meaning that duplicates can exist if blocks are re-processed. The ingested_at timestamp should be used to retrieve only the most recent version of a block. Macros exist in macros/dedupe_utils.sql to handle this; see models/core/blocks.sql or models/core/txs.sql for an example, and the sketch after this list.
  4. Tx logs are decoded where an ABI exists.
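The deduplication pattern the macros implement looks roughly like the following sketch (HARMONY stands in for the <chain> placeholder; the macro internals may differ):

```sql
-- Keep only the most recently ingested copy of each block.
select *
from CHAINWALKERS.PROD.HARMONY_BLOCKS
qualify row_number() over (
    partition by block_id
    order by ingested_at desc
) = 1;
```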

Table Structures:

CHAINWALKERS.PROD.<chain>_BLOCKS

| Column | Type | Description |
| --- | --- | --- |
| record_id | VARCHAR | A unique id for the record, generated by Chainwalkers |
| offset_id | NUMBER(38,0) | Synonymous with block_id for Terra |
| block_id | NUMBER(38,0) | The height of the chain this block corresponds to |
| block_timestamp | TIMESTAMP | The time the block was minted |
| network | VARCHAR | The blockchain network (i.e. mainnet, testnet, etc.) |
| chain_id | VARCHAR | Synonymous with blockchain name for Terra |
| tx_count | NUMBER(38,0) | The number of transactions in the block |
| header | VARIANT | A JSON-queryable column containing the block's header information |
| ingested_at | TIMESTAMP | The time this data was ingested into the table by Snowflake |
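Fields inside the header column can be pulled out with Snowflake's variant syntax, for example (the key name hash is an assumption about the RPC payload; inspect the data for the actual keys):

```sql
-- Sketch: extract a field from the header variant column.
select
    block_id,
    header:hash::string as block_hash  -- assumed key in the header payload
from CHAINWALKERS.PROD.HARMONY_BLOCKS
limit 10;
```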

CHAINWALKERS.PROD.<chain>_TXS

| Column | Type | Description |
| --- | --- | --- |
| record_id | VARCHAR | A unique id for the record, generated by Chainwalkers |
| tx_id | VARCHAR | A unique on-chain identifier for the transaction |
| tx_block_index | NUMBER(38,0) | The index of the transaction within the block; starts at 0 |
| offset_id | NUMBER(38,0) | Synonymous with block_id for Terra |
| block_id | NUMBER(38,0) | The height of the chain this block corresponds to |
| block_timestamp | TIMESTAMP | The time the block was minted |
| network | VARCHAR | The blockchain network (i.e. mainnet, testnet, etc.) |
| chain_id | VARCHAR | Synonymous with blockchain name for Terra |
| tx_count | NUMBER(38,0) | The number of transactions in the block |
| header | VARIANT | A JSON-queryable column containing the block's header information |
| tx | ARRAY | An array of JSON-queryable objects containing each tx and decoded logs |
| ingested_at | TIMESTAMP | The time this data was ingested into the table by Snowflake |
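The tx array can be unnested with lateral flatten, roughly as follows (HARMONY again stands in for <chain>, and the inner key logs is an assumption about the payload shape):

```sql
-- Sketch: unnest the tx array into one row per transaction object.
select
    t.tx_id,
    t.block_timestamp,
    f.value      as tx_object,
    f.value:logs as decoded_logs  -- assumed key; check the actual payload
from CHAINWALKERS.PROD.HARMONY_TXS t,
     lateral flatten(input => t.tx) f;
```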

Target Database, Schemas and Tables

Data in this dbt project is written to the ONBOARDING database in the MetricsDAO Snowflake account.

There is a second database, ONBOARDING_DEV, where curators test their work against the live data. The production database should never be touched by an analyst. For this onboarding project, prod will remain a historical archive; for a live project, it would be updated regularly to keep up with the chainhead.

The _DEV database is reset on a daily basis via a drop and clone of prod. This ensures the data in dev is fresh within 24 hours while using minimal compute credits. Further, because an unknown number of curators may be working in the dev database at any one time, it is important that their changes are periodically wiped. For example, if a curator tests a change to messages and the dev table drifts from prod, anything built downstream might break when merged to prod.
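The reset amounts to a Snowflake zero-copy clone, along these lines (a sketch; the actual scheduled job may differ):

```sql
-- Daily reset sketch: replace dev with a zero-copy clone of prod.
create or replace database ONBOARDING_DEV clone ONBOARDING;
```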

This does mean that a curator working on a silver table will have to re-create their model each day they work on it, as it will not persist past 24 hours.

Branching / PRs

When conducting work, please branch off of main with a descriptive branch name and open a pull request. At least one other individual must review the PR before it can be merged into main. Once merged into main, dbt Cloud will run the new models and output the results into the PROD schema.

When creating a PR please include the following details in the PR description:

  1. List of tables created or modified.
  2. Description of changes.
  3. Implications of changes (if any).

More dbt Resources:

  • Learn more about dbt in the docs
  • Check out Discourse for commonly asked questions and answers
  • Check out the blog for the latest news on dbt's development and best practices
