
[CT-313] Package new adapters tests #4812

Closed
leahwicz opened this issue Mar 2, 2022 · 4 comments · Fixed by #4948
Labels: adapter_plugins (Issues relating to third-party adapter plugins), repo ci/cd (Testing and continuous integration for dbt-core + adapter plugins)


leahwicz commented Mar 2, 2022

We want to package the new adapter tests and have one internal adapter able to use them.

@github-actions github-actions bot changed the title Package new adapters tests [CT-313] Package new adapters tests Mar 2, 2022

jtcohen6 commented Mar 21, 2022

@gshank and I just had a chance to talk through this. Notes from the conversation:

Requirements / user stories

  • As a contributor to dbt-core, when I make a change in dbt-core that requires an associated change to the adapter tests (whether underlying framework, reused fixtures/methods, or specific test cases)...
    • Bad: I don’t find out the tests are broken until someone reports an issue — status quo with pytest-dbt-adapter
    • Good: I find out during CI.
    • Better: I can make requisite changes within the same PR (same codebase).
  • As a plugin developer/maintainer, I can install the adapter tests (framework + basic test cases) using pip...
  • As a plugin maintainer, I know which version of the adapter tests to install because...
    • Bad: The versions make no sense and are documented nowhere — status quo with pytest-dbt-adapter
    • Good: Version compatibility matrix is clearly documented.
    • Better: Version compatibility works the same as for all plugins—Major.Minor always match, then use latest patch for each plugin
      • Caveat: The adapter framework would be major-version 1 from the jump. Are we okay with that? (It's also true of brand-new adapter plugins that follow our versioning recommendations)
  • As a plugin maintainer, I need to make changes to tests when...
    • Bad: Anything changes in dbt-core
    • Good: In minor versions, to account for internal changes that are clearly documented / communicated
    • Best: When actual, end-user-facing dbt-core functionality changes
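The "Better" versioning rule above (Major.Minor always match, patch floats freely) can be sketched as a simple compatibility check. This is a toy sketch; the helper name is hypothetical, not part of any dbt package:

```python
# Hypothetical sketch of the proposed rule: an adapter-tests release is
# compatible with a dbt-core release when Major.Minor match exactly;
# the patch component may differ freely.
def is_compatible(core_version: str, tests_version: str) -> bool:
    core_major_minor = core_version.split(".")[:2]
    tests_major_minor = tests_version.split(".")[:2]
    return core_major_minor == tests_major_minor

print(is_compatible("1.1.2", "1.1.0"))  # True: patch may differ
print(is_compatible("1.1.0", "1.2.0"))  # False: minor mismatch
```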

My instincts

  • Codebase: same as dbt-core (this is still an open question)
  • Installation: separate
    • Utilities as extra: pip install dbt-core[tests] — people don’t need test dependencies (pytest) unless they want them
    • Test cases as totally separate python package that requires dbt-core
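The "utilities as extra" idea could look roughly like this in dbt-core's packaging metadata. This is a sketch with illustrative dependency names, not the actual config:

```python
# Sketch of an extras_require entry (hypothetical; pins are illustrative).
# With this, `pip install dbt-core` skips test dependencies, while
# `pip install "dbt-core[tests]"` also pulls in pytest.
extras_require = {
    "tests": [
        "pytest>=7.0",
    ],
}
```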

What's included in the package?

  1. “Baseline” functionality that every plugin must pass (or override/reimplement for a good reason) — parity with status quo pytest-dbt-adapter
  2. Optional test cases for additional functionality that is common across plugins. All of our plugins will need many of these (currently copy-pasted from dbt-core), and I anticipate external plugins may want to opt into some of them as well.
  3. Reusable fixtures + framework. These will enable each plugin to define net-new tests, for behavior that is specific/unique to that database.

@jtcohen6 jtcohen6 added repo ci/cd Testing and continuous integration for dbt-core + adapter plugins adapter_plugins Issues relating to third-party adapter plugins labels Mar 22, 2022

jtcohen6 commented Mar 25, 2022

Thinking through two of the open questions mentioned above:

Interface boundary: test utilities vs. test cases

We feel good about this interface boundary:

  1. Reusable test utilities (dbt.tests module, usable by anyone running pytest + dbt). These are pretty "close to the metal," in that they hook into dbt-core internals and then expose a (relatively) stable interface for testing higher-order functionality. This is theoretically useful to any user of dbt-core, since it allows anyone to test dbt functionality (including their own custom dbt code) via pytest. (Cool!) Even if we don't document it right away, it will form part of the eventual feature-set of dbt-core as a Python library. One thought we had was whether these test utilities want to be an extra requirement, not included by default: pip install dbt-core[tests]. That's just a judgment call on our part, and should be very simple to execute if desired. Let's just make a decision before releasing v1.1.0rc1.
  2. Specific test cases (top-level tests module in this repo) should not live within dbt-core (the Python package), but they are key to the process of developing and testing dbt-core (the Python package). This includes both (a) unit tests that can be run with only dbt-core installed, and (b) functional/integration tests that require an adapter/database.

For many, many tests of standard dbt-core behavior with a database, we use Postgres, and we don't need to replicate these tests on other databases. (Someday, we could aim to use another, even more lightweight database, such as SQLite, if we could mock things out to our satisfaction. That would enable us to move dbt-postgres out to a separate repository.)

A subset of the test cases described in (2) do need to be packaged up, because we want to test the same functionality against many (if not all) of the database adapters that live external to this repository. I've called this the "adapter zone" :) It's conceivable that we'd want to move tests, originally written for one adapter plugin, into the "adapter zone" if we realize that the underlying behavior is similar on another adapter, and we want to guarantee that it works the same, without needing to copy-paste-edit the code for it.

So, we want to bundle up those reusable "adapter zone" test cases, and make them available as a standalone Python package.
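As a sketch of how a plugin might consume those packaged "adapter zone" cases: the class names below are hypothetical, but the shape (inherit the base case, override only what differs on your database) is the pattern under discussion.

```python
# Hypothetical base test case, as it might ship in a packaged test suite.
class BaseIncrementalTest:
    def model_sql(self):
        # Default SQL, expected to work on a Postgres-like database.
        return "select 1 as id"

    def test_runs(self):
        # Stand-in for the real assertion logic (running dbt, checking rows).
        assert "select" in self.model_sql()

# In an adapter plugin's test suite: inherit everything, and override only
# the pieces where this database's SQL dialect differs.
class TestIncrementalMyAdapter(BaseIncrementalTest):
    def model_sql(self):
        return "select 1 as id from dual"  # e.g. an Oracle-style dialect
```

If the default works on a given database, the subclass body can be a bare `pass`; the test cases run unchanged.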

Where should the code for those reused test cases live?

There are two options:

  1. dbt-core repository
  2. Another repository

There are pros and cons to both approaches. We should do our best to list them all out and assess. I think we can also tackle this question from two angles: philosophical and practical.

Philosophically: Do these test cases represent a totally separate concern from dbt-core? Does there exist a solid interface between the two, that we want to formalize with separate repositories? A few thoughts:

  • The "adapter interface" lives within dbt-core (Python package) today — dbt.adapters module — it's tightly coupled code
  • dbt-core and dbt-tests-adapter don't represent separate pieces of software, as much as they represent code needing tests and tests for that code. That's true of all test types today (unit, functional-Postgres-only, and "adapter zone").
  • We do want these test cases to be fairly stable between minor versions, and non-breaking between patch releases, so that adapter plugin maintainers can treat the dbt-tests-adapter dependency much as they treat dbt-core. It can be helpful to enforce that non-breakiness by eating our own dog food. And we will: we'll be reliant on dbt-tests-adapter as "external maintainers" of dbt-redshift, dbt-snowflake, ...

Practically, here are the main pain points to consider:

  • Releasing a Python package that lives in a subdirectory is more difficult than releasing a top-level package in a standalone repository. More importantly, having two different release processes presents difficulty. dbt-postgres makes this a necessity today, but our ultimate aim is to split out dbt-postgres as a top-level package in a standalone repository.
    • Does this get any simpler if we separate out our release processes, so that we separately release dbt-core vs. dbt-postgres vs. dbt-tests-adapter? That way, we can patch each of these independently (separate workflows), and coordinate those workflows for minor version releases. We'd just need GitHub releases/tags to include the package name.
    • The dbt-tests-adapter release process really just needs to build a distribution and upload it to PyPI. No Homebrew, Docker, ...
  • Every time we add dbt-core functionality that warrants test cases on external (non-Postgres) adapters — which isn't every week, but also is common enough, especially for community contributors — it would require coordinating PRs across multiple repositories, rather than an "all-in-one" PR, just to get the initial thing merged. (It will always require a separate PR in each external adapter plugin, to extend the functionality and inherit/run the new test cases.)
    • Coordinating the dbt-core repo PR and the dbt-tests-adapter repo would require something like: point one PR branch to install from the other, get tests passing, merge one, point the other back to main, and merge the other. This is tricky, though not impossible.

For the time being, while we're assessing our options:

  • Let's keep these test cases in the dbt-core repository
  • Let's not push up to PyPI just yet, and ask folks to keep installing as pip install "git+https://github.com/dbt-labs/dbt-core.git#egg=dbt-tests-adapter&subdirectory=tests/adapter"


iknox-fa commented Mar 28, 2022

I largely agree with the characterization you've put forth @jtcohen6 but I'd like to present a slightly different way to think of it.

The current way to describe dbt's SQL support is something like "it works if there's an adapter for it". That's a fine abstraction for end users and compatibility questions, but when it comes to testing it leaves a lot un-DRYed, because each database has a very large amount of overlapping functionality.

The way to DRY these tests up is to determine the largest subset of SQL that is covered by all official adapters. This subset can then be used to create a test suite that any database adapter can use to ensure basic functionality. Functionality beyond this SQL subset would be the responsibility of the adapter itself to test. Most SQL subsets/dialects have names and years to provide versioning, like ANSI SQL 1999, so perhaps we could follow suit with dbt-sql-common-subset 2022?
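The "largest common subset" idea can be pictured as a set intersection over per-adapter capabilities. This is a toy sketch with made-up feature names, purely to illustrate the shape of the computation:

```python
# Toy illustration: the testable common subset is the intersection of
# what every official adapter supports (feature names are made up).
adapter_features = {
    "postgres": {"create_table_as", "merge", "comments"},
    "snowflake": {"create_table_as", "merge", "clustering"},
    "spark": {"create_table_as", "insert_overwrite"},
}

common_subset = set.intersection(*adapter_features.values())
print(sorted(common_subset))  # → ['create_table_as']
```

Everything in `common_subset` would belong in the shared test suite; everything outside it stays each adapter's own responsibility.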

If this approach is taken, I think the structure falls into place pretty easily. dbt-core, dbt-<db-name>, and dbt-tests-adapter can all live in their own repos. Any dbt-<db-name> adapter can import dbt-tests-adapter to test itself against dbt-sql-common-subset 2022, regardless of whether it's internal or external.

Over time the dbt-tests-adapter repo can become a place to house various versions of the tests and associated schemas that define the dbt-core/dbt-<db-name> interface.


jtcohen6 commented Mar 29, 2022

@iknox-fa Appreciate that perspective! I'm aligned with many parts of your vision. At the risk of waxing even more philosophical...

From my point of view, a lot of the "very large amount of overlap in functionality" is already captured and implemented within dbt-core:

  • Adapters interface, which defines constructs like schemas, relations, quoting behavior, etc
  • Global project, which defines the default__ versions of macros (create_schema, create_table_as, etc)

(We want to capture even more of that overlap, by moving some of the common reusable building blocks from dbt-utils. Those aren't strictly necessary for baseline dbt operation, but they're very very useful building blocks, for people writing packages with higher-level functionality: #4813)

We've run into issues before by assuming this common subset is larger than it actually is, and over the past few years we've needed to significantly expand the scope of where an adapter plugin can reasonably intervene / reimplement / override / disable that presumed subset of behavior. There is actually a tremendous amount of difference between the support offered by ANSI SQL, Hive SQL, T-SQL, and other "dialects" (are they really even dialects?). Some analytical data platforms have lacked full SQL support for accessing metadata, managing objects (DDL/DML), and operating permissions (DCL); each has required hitting other APIs.

What do we say if the mountains will not come to the Prophet? In all cases, the right answer has been: Let's find a way to get dbt working on this data platform, by defining a clear-yet-flexible interface for all the things that might differ across databases. It's the role of the adapter creator / data platform expert to match up the square-ish pegs and the round-ish holes as best as they can.

What we've said historically, and continue to say today: The more your database functions like the accepted standard bits of Postgres/PostgreSQL, the easier it will be to write your adapter, and the easier it will be for users of your adapter to get dbt ecosystem packages "for free" (no shimming/reimplementing of low-level macros required). The more your data platform differs from a Postgres-like database, the more your work is cut out for you—but it's still possible.

Counting lines of code is a highly imperfect proxy for complexity. Still, I think these differences can be instructive:

  • dbt-postgres contains 364 lines of Python + 336 lines of Jinja-SQL
  • dbt-redshift contains 298 lines of Python + 534 lines of Jinja-SQL, because it adopts almost all its functionality from the standard behavior (default + Postgres)
  • dbt-snowflake contains 709 lines of Python (mostly dealing with wonky connection options) + 571 lines of Jinja-SQL
  • dbt-spark (full of idiosyncrasies) contains 1306 lines of Python + 721 lines of Jinja-SQL
  • dbt-bigquery (arguably the most different) contains 1760 lines of Python + 780 lines of Jinja-SQL

(A lot of that Jinja-SQL is still repeated unnecessarily, and we should always be on the lookout for ways to move more of that logic into the defaults, and enable more modular overrides when small things differ. As a tiny example: #4488)

Here's where the rubber hits the road, for this issue: We make changes to the "default" interface / SQL support within dbt-core, often by changing macros in the global project. That's what we did in both #4618 + #4858. We need to be able to create new test cases for that functionality:

  • While developing the feature, and as a precondition for merging the dbt-core PR
  • In a way that we can easily extend, to run the same test case in other adapter plugins. If the test passes, we're good! If it fails, the test case may need adjusting, the default implementation could be more generic, or that database's SQL support differs from our "default" SQL, and we'll need to extend the feature accordingly.

To avoid adding stumbling blocks into our development/contribution process, I'm inclined to think that we should be defining the test cases in the same codebase where we define the default implementations/interface that they are testing. Today, that "default adapter" lives in the dbt-core Python package. Could I see us eventually splitting off the entire "default" adapter (pieces from dbt.adapters + include/global_project) into a separate Python package, splitting it off into a separate repository, and moving dbt.tests.adapter there too? Maybe... though even then, they'd still want to be separate packages, right?

4 participants