
[CT-2848] [Bug] dbt show adding decimal places to non-decimal values #8153

Closed
2 tasks done
dave-connors-3 opened this issue Jul 19, 2023 · 10 comments · Fixed by #8306 or #8561
Assignees
Labels
backport 1.6.latest bug Something isn't working help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Impact: Adapters
Milestone

Comments

@dave-connors-3
Contributor

dave-connors-3 commented Jul 19, 2023

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

dbt show is adding decimal places to numeric/integer values incorrectly. This is causing problems for users in the dbt Cloud IDE, as the IDE uses that command to render the preview tab.

Expected/Previous Behavior

Previously (we think!) this was not an issue, and decimal places were properly handled by dbt show.

Steps To Reproduce

  1. use dbt-snowflake>=1.5.0
  2. create a model in your project:
-- in my_model.sql
select
  cast(1 as numeric) as _numeric,
  cast(1 as integer) as _integer,
  cast(1 as decimal) as _decimal
  3. run dbt show -s my_model --output json
  4. get this!
❯ dbt show -s my_model --output json
20:33:15  Running with dbt=1.5.3
20:33:16  Registered adapter: snowflake=1.5.2
20:33:16  Unable to do partial parsing because of a version mismatch
20:33:17  Found 3 models, 4 tests, 0 snapshots, 0 analyses, 436 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
20:33:17  
20:33:18  Concurrency: 8 threads (target='dev')
20:33:18  
20:33:19  {
  "node": "my_model",
  "show": [
    {
      "_NUMERIC": 1.0,
      "_INTEGER": 1.0,
      "_DECIMAL": 1.0
    }
  ]
}

Relevant log output

No response

Environment

- OS: mac
- Python: Python 3.9.16
- dbt (working version): unknown
- dbt (regression version): 1.5.3

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@dave-connors-3 dave-connors-3 added bug Something isn't working triage regression labels Jul 19, 2023
@github-actions github-actions bot changed the title [Regression] dbt show adding decimal places to non-decimal values [CT-2848] [Regression] dbt show adding decimal places to non-decimal values Jul 19, 2023
@dave-connors-3
Contributor Author

Did some more local testing on other versions of 1.5 and observed the same behavior; this may not be a regression, but just a regular bug!

@dave-connors-3 dave-connors-3 changed the title [CT-2848] [Regression] dbt show adding decimal places to non-decimal values [CT-2848] [Bug] dbt show adding decimal places to non-decimal values Jul 19, 2023
@jtcohen6
Contributor

Copying from #8167 (great minds think alike @dave-connors-3):

What's going on?

We're just using the agate library's type system and JSON serialization:

table = result.agate_table
# Hack to get Agate table output as string
output = io.StringIO()
if self.args.output == "json":
    table.to_json(path=output)

This is being returned and saved as a decimal.Decimal type, rather than a Python int — and the decimal.Decimal type serializes to a JSON number (1.0) instead of a JSON integer (1).

ipdb> table
<agate.table.Table object at 0x1078d1360>
ipdb> [list(row) for row in table]
[['0005', 'false', Decimal('10'), False, '01T00000aabbccdd'], ['0006', 'true', Decimal('20'), True, '01T00000aabbccde']]
ipdb> table.print_json()
[{"stringnum": "0005", "stringbool": "false", "intfield": 10.0, "boolfield": false, "dtstring": "01T00000aabbccdd"}, {"stringnum": "0006", "stringbool": "true", "intfield": 20.0, "boolfield": true, "dtstring": "01T00000aabbccde"}]
ipdb> type(table[0][2])
<class 'decimal.Decimal'>
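
For illustration, here's a minimal standalone sketch (plain stdlib, not dbt code) of why those Decimal cells come out as 10.0 in the JSON, assuming agate's Number type ultimately serializes each cell via float(d), as in the jsonify fallback discussed below:

import json
from decimal import Decimal

cell = Decimal("10")            # what the adapter hands back for an integer column
print(json.dumps(float(cell)))  # "10.0" -- a float serializes as a JSON number with a decimal point
print(json.dumps(int(cell)))    # "10"   -- what we'd want for integer values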

@graciegoheen graciegoheen self-assigned this Jul 20, 2023
@graciegoheen
Contributor

Thanks to 2 great minds for opening this up!

It does look like this was an issue with prior versions of the IDE (before the dbt show command existed), so it's not a regression.

That being said, we definitely want users to be able to spot-check their data using dbt show / preview in the IDE.

@graciegoheen graciegoheen added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed triage labels Jul 20, 2023
@graciegoheen graciegoheen removed their assignment Jul 20, 2023
@booster-analytics-team

Experiencing the same issue here, running dbt version 1.5.0. Following for updates.

@jtcohen6
Contributor

jtcohen6 commented Aug 2, 2023

I think we might be able to fix this with a couple of additional overrides in our custom Number type, which reimplements agate's built-in one:

import agate


class Number(agate.data_types.Number):
    # undo the change in https://github.com/wireservice/agate/pull/733
    # i.e. do not cast True and False to numeric 1 and 0
    def cast(self, d):
        if type(d) == bool:
            raise agate.exceptions.CastError("Do not cast True to 1 or False to 0.")
        # preserve integers as native Python int
        elif type(d) == int:
            return d
        else:
            return super().cast(d)

    def jsonify(self, d):
        if d is None:
            return d
        # do not cast integers to floats
        elif type(d) == int:
            return d
        else:
            return float(d)
$ dbt show -s test --output json
20:08:09  Running with dbt=1.7.0-a1
20:08:09  Registered adapter: postgres=1.7.0-a1
20:08:09  Found 1 seed, 1 model, 0 sources, 0 exposures, 0 metrics, 352 macros, 0 groups, 0 semantic models
20:08:09
20:08:09  Concurrency: 5 threads (target='dev')
20:08:09
20:08:09  {
  "node": "test",
  "show": [
    {
      "_numeric": 1.0,
      "_integer": 1,
      "_decimal": 1.0
    }
  ]
}

@jtcohen6
Contributor

jtcohen6 commented Aug 6, 2023

@dave-connors-3 Okay, the failing CI in #8306 reminded me of what was missing from my suggestion above. While this fixes the JSON output, it now raises an error during normal pretty-printed output, because agate expects all of its Number data types to be the Python type decimal.Decimal here, where it calls max_precision, which then calls normalize.

Approach 1: During JSON serialization, if it can be an int, make it an int

In this approach, we allow the Number type to always return a Decimal, and never a true Python int, so that Decimal-specific helper methods can be called later on.

Only during JSON serialization, we convert all numbers with 0-scale (no digits after the decimal) to integers instead of floats. (Unfortunately, the agate library seems to call this "precision" instead of "scale"...). This yields a similar result to what agate does when pretty-printing markdown tables: 1.0 just becomes 1, regardless of type.

Downside: This would remove the distinction between the integer and decimal/numeric types in the case where someone selects 1.00::numeric and wants to keep seeing those zeroes.

from decimal import Decimal
from typing import Union

import agate


class Number(agate.data_types.Number):
    # undo the change in https://github.com/wireservice/agate/pull/733
    # i.e. do not cast True and False to numeric 1 and 0
    def cast(self, d) -> Decimal:
        if type(d) == bool:
            raise agate.exceptions.CastError("Do not cast True to 1 or False to 0.")
        else:
            return super().cast(d)

    def jsonify(self, d) -> Union[int, float]:
        if d is None:
            return d
        # do not cast integers to floats
        # (fancier way of doing the same check: if agate.utils.max_precision([d]) == 0)
        elif d == int(d):
            return int(d)
        else:
            return float(d)
$ dbt show -s test --output json
14:49:17  Running with dbt=1.7.0-a1
14:49:17  Registered adapter: postgres=1.7.0-a1
14:49:17  Found 1 model, 0 sources, 0 exposures, 0 metrics, 352 macros, 0 groups, 0 semantic models
14:49:17
14:49:17  Concurrency: 5 threads (target='dev')
14:49:17
14:49:17  {
  "node": "test",
  "show": [
    {
      "_numeric": 1,
      "_integer": 1,
      "_decimal": 1
    }
  ]
}

It still works for decimals that have nonzero digits after the decimal place, but we're dropping any trailing zeroes. (Remember "significant digits"? Remember high-school science class?)

$ dbt show --output json --inline 'select 1 as a, 1.0 as b, 1.1 as c, 1.10 as d'
15:08:34  Running with dbt=1.7.0-a1
15:08:34  Registered adapter: postgres=1.7.0-a1
15:08:34  Found 1 model, 0 sources, 0 exposures, 0 metrics, 352 macros, 0 groups, 0 semantic models
15:08:34
15:08:34  Concurrency: 5 threads (target='dev')
15:08:34
15:08:34  {
  "show": [
    {
      "a": 1,
      "b": 1,
      "c": 1.1,
      "d": 1.1
    }
  ]
}
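
For reference, a tiny stdlib sketch (not dbt code) of why the trailing zeroes disappear under this approach: Decimal preserves scale, but the int/float conversion in the jsonify above does not.

from decimal import Decimal

for d in (Decimal("1"), Decimal("1.0"), Decimal("1.1"), Decimal("1.10")):
    # same check as the jsonify above: zero fractional part -> int, otherwise float
    print(d, "->", int(d) if d == int(d) else float(d))

# 1    -> 1
# 1.0  -> 1
# 1.1  -> 1.1
# 1.10 -> 1.1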

Approach 2: Integer != Number

Rather than changing the Number type, we add a new data type Integer, which expects to return a Python int rather than a decimal.Decimal type:

import decimal

import agate


# we do not want integers being coerced to decimals in JSON output
class Integer(agate.data_types.DataType):
    def cast(self, d) -> int:
        # bool is a subclass of int, so check type() rather than isinstance()
        if type(d) == int:
            return d
        else:
            raise agate.exceptions.CastError('Can not parse value "%s" as Integer.' % d)

    def jsonify(self, d) -> int:
        return d


class Number(agate.data_types.Number):
    # undo the change in https://github.com/wireservice/agate/pull/733
    # i.e. do not cast True and False to numeric 1 and 0
    def cast(self, d) -> decimal.Decimal:
        if type(d) == bool:
            raise agate.exceptions.CastError("Do not cast True to 1 or False to 0.")
        else:
            return super().cast(d)

Then, within our type tester further down, we need to try casting to Integer before Number:

def build_type_tester(
    text_columns: Iterable[str], string_null_values: Optional[Iterable[str]] = ("null", "")
) -> agate.TypeTester:

    types = [
        Integer(null_values=("null", "")),
        Number(null_values=("null", "")),
...

Because Integer isn't an instance of agate.data_types.Number, agate won't try to call max_precision/normalize on it here while formatting the markdown table.
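
As a rough standalone sketch of that ordering (hypothetical in-memory data, not dbt code; it assumes agate.TypeTester accepts a custom types list and tries candidates in order):

import agate

class Integer(agate.data_types.DataType):
    def cast(self, d) -> int:
        if type(d) == int:  # bool is a subclass of int, so check type() here
            return d
        raise agate.exceptions.CastError('Can not parse value "%s" as Integer.' % d)

    def jsonify(self, d) -> int:
        return d

# Integer is listed before Number, so an all-integer column resolves to Integer
tester = agate.TypeTester(types=[Integer(), agate.data_types.Number()])
table = agate.Table([[1], [2]], column_names=["n"], column_types=tester)
print(type(table.rows[0]["n"]))  # expected: <class 'int'> rather than decimal.Decimal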

This does preserve the distinction between integers and decimals/numerics, although it doesn't preserve the exact scale of the original result set:

$ psql
psql (14.8 (Homebrew), server 13.11)
Type "help" for help.

jerco=# select
  cast(1 as numeric(38,3)) as _numeric,
  cast(1 as integer) as _integer,
  cast(1 as decimal(38,2)) as _decimal
;
 _numeric | _integer | _decimal
----------+----------+----------
    1.000 |        1 |     1.00
(1 row)
$ dbt show -s test --output json
14:58:28  Running with dbt=1.7.0-a1
14:58:28  Registered adapter: postgres=1.7.0-a1
14:58:28  Unable to do partial parsing because config vars, config profile, or config target have changed
14:58:28  Unable to do partial parsing because a project dependency has been added
14:58:28  Found 1 model, 0 sources, 0 exposures, 0 metrics, 352 macros, 0 groups, 0 semantic models
14:58:28
14:58:28  Concurrency: 5 threads (target='dev')
14:58:28
14:58:28  {
  "node": "test",
  "show": [
    {
      "_numeric": 1.0,
      "_integer": 1,
      "_decimal": 1.0
    }
  ]
}

On its face, this feels like the More Correct approach. HOWEVER!! There is a risk of a very subtle breaking change here. Because we expose agate types to end users (in the form of run_query results), if someone has been in the habit of returning integers and then calling Decimal-type methods on those returned values (methods such as normalize, or anything else from this list), they would start seeing errors like AttributeError: 'int' object has no attribute 'is_zero'.

-- models/test.sql
{% set my_query %}

select
  cast(1 as numeric(38,3)) as _numeric,
  cast(1 as integer) as _integer,
  cast(1 as decimal(38,2)) as _decimal

{% endset %}

{% set results = run_query(my_query) %}

{% for row in results %}

  {{ row[0].is_zero() }}
  {{ row[1].is_zero() }}
  {{ row[2].is_zero() }}
  
{% endfor %}
$ dbt show -s test
...
Runtime Error
  Compilation Error in model test (models/test.sql)
    'int object' has no attribute 'is_zero'. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".

We've never explicitly documented that you can do those things, but it would be a subtle breaking change nevertheless. We don't encounter this issue with Approach 1, because the values remain Decimal type all the way through, until the very moment of serializing to JSON.
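
A quick stdlib illustration of that difference (not dbt code): Decimal values expose methods like is_zero and normalize, while a plain Python int does not.

from decimal import Decimal

print(Decimal("1").is_zero())  # False -- works today, because agate Number cells are Decimals

try:
    (1).is_zero()              # a plain int has no Decimal methods
except AttributeError as exc:
    print(exc)                 # 'int' object has no attribute 'is_zero'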

@greg-mckeon

@jtcohen6 it seems like option 2 is better here, with us documenting the breaking part of the change and only backporting to 1.6. Does that seem reasonable?

@jtcohen6
Contributor

jtcohen6 commented Aug 7, 2023

Yup, that's what I'm thinking too!

@graciegoheen
Contributor

Re-opening: we had to revert this fix due to a number of failing tests and an error when running docs generate in the IDE.

@graciegoheen graciegoheen reopened this Aug 23, 2023
@dave-connors-3
Contributor Author

When running with the associated change in the IDE, I executed dbt docs generate and got the following error:

16:54:21 Running with dbt=1.6.1-rc1
16:54:22 Registered adapter: snowflake=1.6.1
16:54:22 Found 20 models, 32 tests, 14 sources, 0 exposures, 0 metrics, 486 macros, 0 groups, 0 semantic models
16:54:24 Concurrency: 4 threads (target='default')
16:54:27 Building catalog
16:54:33 Encountered an error:
Runtime Error
  Tables contain columns with the same names (stats:row_count:value), but different types (<dbt.clients.agate_helper.Number object at 0x7fa6e98b7040> vs <dbt.clients.agate_helper.Integer object at 0x7fa6e9a375e0>)
16:54:21 Runtime Error
  Tables contain columns with the same names (stats:row_count:value), but different types (<dbt.clients.agate_helper.Number object at 0x7fa6e98b7040> vs <dbt.clients.agate_helper.Integer object at 0x7fa6e9a375e0>)
Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6e2bfe640>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6e0798100>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6e0798370>]}

This error seems to stem from the agate_helper ColumnTypeBuilder class here.
