Defined dict with malformed result from fromyaml() passed into log() crashes dbt #3464

Closed
barberscott opened this issue Jun 16, 2021 · 2 comments · Fixed by #3687
Labels
bug Something isn't working rpc Issues related to dbt's RPC server


@barberscott

Describe the bug

dbt crashes while compiling when I inadvertently call log() with a non-string value. As a result, the RPC server never reports an error, and in dbt Cloud the compile status spins indefinitely.

[2021-06-16 18:27:42.179948] ERROR: werkzeug: Error on request:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/werkzeug/serving.py", line 323, in run_wsgi
    execute(self.server.app)
  File "/usr/local/lib/python3.8/dist-packages/werkzeug/serving.py", line 312, in execute
    application_iter = app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/werkzeug/middleware/dispatcher.py", line 66, in __call__
    return app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/werkzeug/wrappers/base_request.py", line 238, in application
    resp = f(*args[:-2] + (request,))
  File "/usr/local/lib/python3.8/dist-packages/dbt/task/rpc/server.py", line 156, in handle_jsonrpc_request
    json_data = json.dumps(
  File "/usr/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.8/dist-packages/dbt/utils.py", line 332, in default
    return super().default(obj)
  File "/usr/local/lib/python3.8/dist-packages/dbt/utils.py", line 320, in default
    if hasattr(obj, 'to_dict'):
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/jinja.py", line 441, in __getattr__
    return self.__class__(hint=self.hint, name=self.name)
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/jinja.py", line 418, in __init__
    super().__init__(hint=hint, name=name)
RecursionError: maximum recursion depth exceeded while calling a Python object

Steps To Reproduce

Case 1: Create a model file with (only):

{%- set yaml_metadata -%}
foo: 'foo_value'
bar:
    -this
        -bad 
            'one':'two'
        -notbad
            'three': 'four'
{%- endset -%}
{% set metadata_dict = fromyaml(yaml_metadata) %}
{{ log(metadata_dict['foo'],true) }}

in the dbt Cloud IDE and save.

Expected behavior

Expect dbt not to crash.

The output of dbt --version:

0.19.1
@barberscott barberscott added bug Something isn't working triage labels Jun 16, 2021
@barberscott barberscott changed the title Malformed YAML passed into log() crashes dbt Defined dict with malformed result from fromyaml() passed into log() crashes dbt Jun 16, 2021
@jtcohen6 jtcohen6 added rpc Issues related to dbt's RPC server and removed triage labels Jun 17, 2021
@jtcohen6
Contributor

Thanks for opening @barberscott!

I managed an even simpler reproduction case. Create and save a model file containing just:

{{ log(none['foo']) }}

In response to a status request, the server will return:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or
	there is an error in the application.</p>

With the same stacktrace:

{"timestamp": "2021-06-17T19:52:53.044580Z", "message": "Error on request:\nTraceback (most recent call last):\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/werkzeug/serving.py\", line 323, in run_wsgi\n    execute(self.server.app)\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/werkzeug/serving.py\", line 312, in execute\n    application_iter = app(environ, start_response)\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/werkzeug/middleware/dispatcher.py\", line 66, in __call__\n    return app(environ, start_response)\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/werkzeug/wrappers/base_request.py\", line 238, in application\n    resp = f(*args[:-2] + (request,))\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/dbt/task/rpc/server.py\", line 156, in handle_jsonrpc_request\n    json_data = json.dumps(\n  File \"/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py\", line 234, in dumps\n    return cls(\n  File \"/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py\", line 199, in encode\n    chunks = self.iterencode(o, _one_shot=True)\n  File \"/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py\", line 257, in iterencode\n    return _iterencode(o, 0)\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/dbt/utils.py\", line 332, in default\n    return super().default(obj)\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/dbt/utils.py\", line 320, in default\n    if hasattr(obj, 'to_dict'):\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/dbt/clients/jinja.py\", line 441, in __getattr__\n    return self.__class__(hint=self.hint, name=self.name)\n  File \"/usr/local/Cellar/dbt/0.19.1_1/libexec/lib/python3.8/site-packages/dbt/clients/jinja.py\", line 418, in __init__\n    super().__init__(hint=hint, name=name)\nRecursionError: maximum recursion depth exceeded while calling a Python object", "channel": "werkzeug", "level": 14, "levelname": "ERROR", "thread_name": "Thread-2", "process": 730, "extra": {"stack_info": null, "context": "server", "run_state": "internal"}}

Interestingly, this:

  • Doesn't raise an error in CLI parsing / compilation
  • Doesn't raise an error if the problematic text is passed to the compile_sql RPC method (base64: e3sgbG9nKG5vbmVbJ2ZvbyddKSB9fQ==), rather than being saved in the project
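
For reference, the base64 payload quoted above decodes to the same one-line template. A minimal Python sketch of the round trip follows. The JSON-RPC envelope is an assumption about the general shape of a compile_sql request, and the `build_compile_sql_request` helper plus the `timeout`, `name`, and `id` values are hypothetical. Nothing is sent to a server.

```python
import base64
import json

TEMPLATE = "{{ log(none['foo']) }}"
ENCODED = "e3sgbG9nKG5vbmVbJ2ZvbyddKSB9fQ=="  # payload quoted in the comment above


def build_compile_sql_request(sql: str, request_id: str = "example-id") -> str:
    """Build a JSON-RPC 2.0 body for compile_sql (field names are assumed)."""
    payload = {
        "jsonrpc": "2.0",
        "method": "compile_sql",
        "id": request_id,
        "params": {
            "timeout": 60,  # hypothetical value
            "name": "repro_query",  # hypothetical value
            # The SQL/Jinja text travels base64-encoded.
            "sql": base64.b64encode(sql.encode("utf-8")).decode("ascii"),
        },
    }
    return json.dumps(payload)


decoded = base64.b64decode(ENCODED).decode("utf-8")
request_body = build_compile_sql_request(TEMPLATE)
print(decoded)
```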

The error is cropping up when trying to create a Jinja2 Undefined object:
https://github.com/fishtown-analytics/dbt/blob/1edb8a60c8867c58e8d6078010aeceb0a6810a92/core/dbt/clients/jinja.py#L415-L425

That's as far as I've gotten. Worthy of further investigation for sure.

@drewbanin
Contributor

drewbanin commented Aug 3, 2021

We use a special JSON encoder for json-rpc requests.

The response data is serialized to a JSON string here:
https://github.com/dbt-labs/dbt/blob/159e79ee6ba86c852f4bf067dd0f1b6356135383/core/dbt/task/rpc/server.py#L156-L159

Which uses this custom JSON Encoder:
https://github.com/dbt-labs/dbt/blob/159e79ee6ba86c852f4bf067dd0f1b6356135383/core/dbt/utils.py#L307-L321

This JSON encoder calls hasattr to see whether the object being serialized has a to_dict method. In Python, hasattr calls getattr behind the scenes and returns False only if getattr raises an AttributeError. The implementation of __getattr__ for an instance of this Undefined class never raises; it just returns another Undefined object...

https://github.com/dbt-labs/dbt/blob/1edb8a60c8867c58e8d6078010aeceb0a6810a92/core/dbt/clients/jinja.py#L432-L441

So the JSON serializer just repeatedly calls to_dict on each resulting Undefined object until a recursion error arises.

Example:

>>> import dbt.clients.jinja
>>> UndefinedClass = dbt.clients.jinja.create_undefined()
>>> undefined_obj = UndefinedClass()
>>> hasattr(undefined_obj, 'to_dict')
True
>>> undefined_obj.to_dict()
Undefined
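
The same loop can be reproduced without dbt installed. The sketch below uses a hypothetical `ChainedUndefined` class as a stand-in for dbt's Undefined subclass, and a hypothetical `ToDictEncoder` that mirrors only the hasattr(obj, 'to_dict') check described above. Neither is dbt's actual code.

```python
import json


class ChainedUndefined:
    """Hypothetical stand-in: any attribute lookup yields another instance."""

    def __getattr__(self, name):
        # Never raises AttributeError, so hasattr() is always True.
        return ChainedUndefined()

    def __call__(self, *args, **kwargs):
        # Calling the object (e.g. obj.to_dict()) hands back an undefined value.
        return self


class ToDictEncoder(json.JSONEncoder):
    """Hypothetical encoder that trusts hasattr(obj, 'to_dict')."""

    def default(self, obj):
        if hasattr(obj, "to_dict"):
            # For ChainedUndefined this returns yet another ChainedUndefined,
            # which the encoder then passes straight back into default().
            return obj.to_dict()
        return super().default(obj)


def try_dump(value):
    """Return the JSON text, or the string 'RecursionError' if encoding loops."""
    try:
        return json.dumps(value, cls=ToDictEncoder)
    except RecursionError:
        return "RecursionError"


print(hasattr(ChainedUndefined(), "to_dict"))
print(try_dump(ChainedUndefined()))
```

Each call to default() produces a fresh object, so the encoder's circular-reference check never fires and the recursion limit is what finally stops it.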

So the fix here is to either:

  • Teach our JSON Encoder about Undefined variables (much like we're already doing for Date/Decimal values)
  • Teach the Undefined class how to respond to to_dict more intelligently? Maybe by raising an AttributeError in __getattr__, or similar?

Might be other options too - open to suggestions :)
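
The two options above could look roughly like this. It is a sketch only: `StubUndefined`, `UndefinedAwareEncoder`, and `GuardedUndefined` are hypothetical names, and serializing an undefined value as an empty string is an assumption rather than a confirmed design decision.

```python
import json


class StubUndefined:
    """Hypothetical stand-in: every attribute lookup yields another instance."""

    def __getattr__(self, name):
        return StubUndefined()

    def __call__(self, *args, **kwargs):
        return self


class UndefinedAwareEncoder(json.JSONEncoder):
    """Option 1: handle undefined values before probing for to_dict."""

    def default(self, obj):
        if isinstance(obj, StubUndefined):
            # Assumption: serialize as an empty string; null is another option.
            return ""
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        return super().default(obj)


class GuardedUndefined:
    """Option 2: refuse the to_dict lookup so hasattr() reports False."""

    def __getattr__(self, name):
        if name == "to_dict":
            raise AttributeError(name)
        return GuardedUndefined()


option_1 = json.dumps({"result": StubUndefined()}, cls=UndefinedAwareEncoder)
option_2 = hasattr(GuardedUndefined(), "to_dict")
print(option_1, option_2)
```

With option 2, an encoder that only checks hasattr falls through to json.JSONEncoder.default, which raises a TypeError instead of recursing. Option 1 keeps the Undefined class untouched and confines the change to serialization.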

jtcohen6 pushed a commit that referenced this issue Aug 10, 2021
* (#3464) Serialize Undefined values to JSON for rpc requests

* Update changelog, fix typo
jtcohen6 added a commit that referenced this issue Aug 10, 2021
* (#3464) Serialize Undefined values to JSON for rpc requests

* Update changelog, fix typo

Co-authored-by: Drew Banin <drew@fishtownanalytics.com>
kwigley pushed a commit that referenced this issue Sep 17, 2021
* (#3464) Serialize Undefined values to JSON for rpc requests

* Update changelog, fix typo

Co-authored-by: Drew Banin <drew@fishtownanalytics.com>