Update config parsing to optionally return issues instead of raising exceptions #159
Conversation
The method 'validate_config_structure' was duplicating almost all of the logic of 'parse_config_yaml'. This commit merges the "extra" work 'validate_config_structure' was doing (validating that the yaml conforms to the jsonschema spec) into 'parse_config_yaml'. This in turn means that we load the yaml documents one less time, we iterate over the yaml documents one less time, we reduce the number of duplicate errors being produced, and we reduce code complexity by keeping things more DRY.
3aa6595 to 55c3063
Any thoughts on stack trace preservation for ease of debugging? str(Exception) is just str(Exception.args).
@@ -43,7 +43,7 @@
@dataclass(frozen=True)
class ModelBuildResult:  # noqa: D
    model: Optional[UserConfiguredModel] = None
@@ -134,11 +138,11 @@ def parse_config_yaml(
    data_source_class: Type[DataSource] = DataSource,
    metric_class: Type[Metric] = Metric,
    materialization_class: Type[Materialization] = Materialization,
) -> List[Union[DataSource, Metric, Materialization]]:
) -> Tuple[List[Union[DataSource, Metric, Materialization]], List[ValidationIssueType]]:
Normally when you have a pair of lists in a return like this they line up, so I'd expect this to return a 1:1 mapping of results to issues, or possibly a result or an issue in a given index location.
But what this does is interleave them: in the 1:1 case, if there's an issue I generally won't get a result, but it can also spawn a bunch of issues. This means there's no way for me to map issues back to whatever I'm seeing in the return output.
Would we be better off making this return a List[Tuple[Union[DataSource etc.], List[ValidationIssueType]]]? Then every entry has one or both of an object and a list of validation issues.
Alternatively, maybe we're better off with a more descriptive data structure that has result and errors and is documented as having one or the other but not both.
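For illustration, the "descriptive data structure" option might look something like this. The names and fields here (ParseResult, ok) are hypothetical, not the actual MetricFlow types:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ParseResult:
    """Hypothetical outcome of parsing a single config document."""

    result: Optional[object] = None  # e.g. a DataSource / Metric / Materialization
    issues: List[str] = field(default_factory=list)  # e.g. ValidationIssueType entries

    @property
    def ok(self) -> bool:
        return self.result is not None and not self.issues


# A file parse would then return List[ParseResult], keeping each
# document's issues attached to the object (or absence of one) they describe.
good = ParseResult(result={"name": "revenue"})
bad = ParseResult(issues=["missing 'name' key"])
```

Each entry then carries an object, a list of issues, or both, so the caller never has to guess which issue belongs to which document.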
It was not intended that the two lists be related in that manner, i.e. I wasn't intending to ensure a one-to-one matching. Additionally, it's possible that an issue exists for an object that doesn't exist, because creation of the object from the config failed. I'll go the route of having a more descriptive data structure that has results and errors; it'll be somewhat similar to ModelBuildResults.
issues.append(
    ValidationError(
        context=FileContext(file_name=ctx.filename, line_number=ctx.start_line),
        message=f"Unsupported version {version} in config document.",
See, this doesn't continue or skip or anything, so we'll get results offset from errors by one for each document with VERSION_KEY set to the wrong thing.
message=f"YAML document did not conform to metric spec.\nError: {e}",
    )
)
except ScannerError as e:
    raise ParsingException(
        message=str(e),
        ctx=ctx,
        config_filepath=config_yaml.filepath,
    ) from e
except ParsingException:
    raise
    context = FileContext(file_name=ctx.filename, line_number=ctx.start_line) if ctx is not None else None
    issues.append(ValidationError(context=context, message=str(e)))
except ParsingException as e:
    context = FileContext(file_name=ctx.filename, line_number=ctx.start_line) if ctx is not None else None
    issues.append(ValidationError(context=context, message=str(e)))
except Exception as e:
    raise ParsingException(
        message=str(e),
        ctx=ctx,
        config_filepath=config_yaml.filepath,
    ) from e
    context = FileContext(file_name=ctx.filename, line_number=ctx.start_line) if ctx is not None else None
    issues.append(ValidationError(context=context, message=str(e)))
This obliterates the stack traces, which will make debugging really painful for us if we ever hit a model validation issue that resolves to an issue in the parsing/validation stack itself.
Let's either store the stack traces or else collect exceptions and re-throw in the end.
if raise_issues_as_exceptions and build_issues.has_blocking_issues:
    raise ModelValidationException(build_issues.all_issues)
So if we can store the original exceptions we can throw this from those, and we'd be able to collect all of the stack traces. Here it just has string pointers to things happening... somewhere.
I've had mixed experiences with this kind of global "collect issues and maybe throw later" implementation pattern - I think we might want to have cleaner separation between what constitutes an exception and what constitutes a user error, and throw exceptions for the former and collect issues for the latter. I'm not sure how to go about achieving this, honestly, because I don't know where our parsing error boundary ends and the user input error boundary begins, but it's something to think about as we tinker further here.
In the meantime, preserving stack trace information is probably sufficient.
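One way to preserve the stack traces while still collecting issues instead of raising, sketched with a hypothetical CollectedIssue stand-in (not the real ValidationError class): format the traceback at the catch site and store it alongside the message.

```python
import traceback
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectedIssue:
    """Hypothetical stand-in for a validation issue that keeps its traceback."""

    message: str
    extra_detail: Optional[str] = None  # full formatted traceback, for debugging


def parse_document(document: dict, issues: list) -> None:
    try:
        document["version"]  # stand-in for the real parsing work
    except Exception as e:
        issues.append(
            CollectedIssue(
                message=str(e),
                extra_detail=traceback.format_exc(),  # stack trace preserved as text
            )
        )


collected: list = []
parse_document({}, collected)  # missing 'version' key -> one issue with a traceback
```

Alternatively the caught exception objects themselves could be stored and re-raised (or chained with `raise ... from`) at the end, which keeps live tracebacks instead of strings.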
Agreed 🤔
    )
)
# catches exceptions from jsonschema validator
except exceptions.ValidationError as e:
Oh I see, this thing is from jsonschema. Got it.
Yea 🙃 I should have realized names like ValidationError would eventually have collisions like this. But I didn't, and so here we are 😬
context = FileContext(file_name=ctx.filename, line_number=ctx.start_line)
issues.append(ValidationError(context=context, message=str(e)))
# If a runtime error occurred, we still want this to break things
except RuntimeError:
Interesting. Is this establishing our boundary between user input issues and internal errors?
I don't know if our error hierarchy is this disciplined, though. Even in Python's builtins, ValueError and AssertionError are not RuntimeError subclasses. I suggest we lose the global Exception -> issue conversion and instead refine around what user input might throw.
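The claim about the builtin exception hierarchy is easy to verify:

```python
# ValueError and AssertionError sit directly under Exception,
# not under RuntimeError, so `except RuntimeError` won't catch them.
assert not issubclass(ValueError, RuntimeError)
assert not issubclass(AssertionError, RuntimeError)

# RecursionError, by contrast, really is a RuntimeError subclass.
assert issubclass(RecursionError, RuntimeError)
```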
Yes indeed. Basically internal errors aren't actually validation errors, or shouldn't be. If we swallow them, they might be harder to notice. Ideally, if there is an internal logic error, it should be raised as an exception and hopefully caught during unit tests or sanity testing of changes.
I've been debating adding more properties to ValidationIssue, specifically because some errors are rather large and much of it isn't actually needed. I.e. perhaps adding an
6a7b266 to 28e13c0
…el instead of raising exceptions
…eturned Instead of collecting and raising errors in 'parse_directory_of_yaml_files_to_model' and the methods it calls, we now collect 'ValidationIssues' and build a 'ModelValidationResults' object. The method also takes a new argument 'raise_issues_as_exceptions'. If 'raise_issues_as_exceptions' is 'True', then the default behavior of the method is the same, in that if there are any 'ERROR' level 'ValidationIssues' it raises an exception with the message being a concatenation of all the issues. However, if 'raise_issues_as_exceptions' is 'False' then even if there are 'ERROR' level issues, a ModelValidationResults object is returned and it is expected the caller will handle the results.
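The control flow described in this commit message could be sketched roughly like this. The classes and function here are simplified stand-ins for illustration, not the real MetricFlow implementations:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelValidationResults:
    """Simplified stand-in for the real results object."""

    errors: List[str] = field(default_factory=list)

    @property
    def has_blocking_issues(self) -> bool:
        return bool(self.errors)

    @property
    def all_issues(self) -> List[str]:
        return list(self.errors)


class ModelValidationException(Exception):
    pass


def parse_directory_of_yaml_files_to_model(
    raise_issues_as_exceptions: bool = True,
) -> ModelValidationResults:
    # pretend parsing collected one ERROR-level issue
    results = ModelValidationResults(errors=["metric 'foo' is missing a name"])
    if raise_issues_as_exceptions and results.has_blocking_issues:
        # default behavior: concatenate issues into a single exception
        raise ModelValidationException("\n".join(results.all_issues))
    # opt-out behavior: hand the issues back to the caller to handle
    return results
```

With the flag set to False the caller receives the results object even when there are ERROR-level issues, and decides for itself what to do with them.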
Over the past month it has become the case that ModelBuildResult is never instantiated without a UserConfiguredModel. We want to continue this pattern. To ensure this, we should make the model property of ModelBuildResult required, which this commit does. This also makes it so that we no longer have to assert that a ModelBuildResult has a model.
Previously we weren't exception handling on a document-by-document basis, which means that if there was an issue in one document in the config yaml, we'd cease validating the rest of the documents. This wasn't desirable, because if there are multiple issues, you'd only find out about the first one hit each time parsing is run. This in turn makes fixing issues in configs slow and tedious. Now, in most cases, if there is an exception with a document, we catch it early and continue validating/parsing the rest of the documents.
…ions Previously during parsing, if an exception happened, parsing broke and validation ceased. Now, however, we are able to collect all the issues for a config and return them all gracefully.
Previously we'd been using 'load_all_without_context', to load config yaml for jsonschema validation. However, now we use 'load_all_with_context' in all cases, and want to encourage this as a continued behavior. This is because 'load_all_with_context' adds a '__parsing_context__' to parsed yaml and if someone were to specify a '__parsing_context__' in their yaml, it will be clobbered by 'load_all_with_context', which is desired.
1b4956c to 694f0ff
Everything but the last commit (the custom jsonschema validator) looks good to me.
I think it's worth separating that one into its own PR, honestly, it'll be easier for people to find it and we can do a bit more discussion on that without blocking the rest of these improvements. Or maybe it's fine to put a pattern in each schema and just use the standard Draft7Validator - how many new schemas will we create?
I wish there was a way to share core properties in all schemas, haven't found one yet.
@@ -8,7 +8,6 @@
from datetime import date
from enum import Enum
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union
Typo in commit message: is desire -> is desired
@@ -155,6 +154,7 @@ class ValidationIssue(ABC, BaseModel):
    message: str
    context: Optional[ValidationContext] = None
    extra_detail: Optional[str]
Do we want a List[str] here, or maybe a Dict[str, str]? I guess it doesn't matter much right now.
List[str] works for now. We're gonna want a simple way to stringify whatever extra_detail is, but what that looks like doesn't seem important for this PR. If it was a dict, perhaps just str(extra_details) would be good enough 🤷 But again, I think List[str] is good enough for now.
issues.append(
    ValidationError(
        context=MetricContext(
            file_context=FileContext.from_metadata(metadata=metric.metadata),
            metric=MetricModelReference(metric_name=metric.name),
        ),
        message=traceback.format_exc(),
        message="".join(traceback.format_exception_only(etype=type(e), value=e)),
Did we want to do this format_exception_only treatment on the other errors? I don't have strong opinions one way or the other, just wondering if the difference between these calls and the dir_to_model exceptions is intentional, and if there's something we should prefer.
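For reference, a small example of the difference between the two calls. Note that newer Python versions (3.10+) dropped the etype= keyword form used in the diff, so positional arguments are the safer spelling:

```python
import traceback

try:
    raise ValueError("bad metric config")
except ValueError as e:
    # just the final "Type: message" line, no stack frames
    only = "".join(traceback.format_exception_only(type(e), e))
    # the whole stack trace, ending in that same line
    full = traceback.format_exc()
```

So format_exception_only gives a compact user-facing message, while format_exc keeps the frames that help with debugging; which one fits depends on who reads the resulting issue.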
        "type": _validators.type,
        "uniqueItems": _validators.uniqueItems,
    },
    type_checker=_types.draft7_type_checker,
Do we want to keep the format checker and whatever applicable_validators is doing in place here?
Yes, it looks like I either copied an old version of Draft7Validator or incompletely copied Draft7Validator somehow.
"exclusiveMaximum": _validators.exclusiveMaximum,
"exclusiveMinimum": _validators.exclusiveMinimum,
We've updated a number of "_legacy_validators" references here. These two plus a few others. Was that intentional?
I'm fine with it if it works, might as well make those more current, I'm just curious.
No, it was not intentional. Things were supposed to be on par with whatever Draft7Validator is doing currently.
for error in validator.descend(instance[extra], aP, path=extra):
    yield error
tl;dr: I think we should put yield from back where it was.
This is not exactly the same as the yield from in the original. The difference is subtle: a function using the for loop for x in iterator: yield x holds control over the iterator, while a function using yield from iterator delegates control flow to its caller. The distinction is most likely to manifest if anything in the iterator chain is using send() or if anything is dealing with exception forwarding.
https://peps.python.org/pep-0380/
Note I just learned all of this within the last half hour, so I might be missing something about why this is always going to be totally fine, but it seems like there are cases where this won't exactly do what it needs to do.
From poking around in the jsonschema validator code a bit, it looks like the implementation here is essentially a set of pipes (in the Unix sense) for sending errors from one process step to the next. So breaking that chain could cause weird edge cases to pop up, or maybe just malformed stack traces.
Simplified examples:
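A minimal illustration of the send() difference, with made-up generator names. With yield from, a value sent to the outer generator reaches the inner one; with a plain for-loop, the sent value is swallowed by the loop:

```python
def inner():
    received = yield "ready"
    yield f"inner got: {received}"


def delegating():
    yield from inner()  # send()/throw() are forwarded to inner()


def looping():
    for value in inner():  # send() only resumes this for-loop
        yield value


g1 = delegating()
next(g1)  # advances inner() to its first yield, "ready"
assert g1.send("hello") == "inner got: hello"  # inner() saw the sent value

g2 = looping()
next(g2)
assert g2.send("hello") == "inner got: None"  # sent value never reached inner()
```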
An excellent primer on coroutines and yield expressions, with some examples of unix-pipe-style implementations:
Also, if taking over control flow at this point is intentional, that should probably be documented.
The only difference for this method in comparison to the original was supposed to be the call to custom_find_additonal_properties, so yes I agree, and the control flow change is unintentional.
error = "%s %s not match any of the regexes: %s" % (
    ", ".join(map(repr, sorted(extras))),
    verb,
    ", ".join(map(repr, patterns)),
nit - revert to format strings please.
Based on what I've seen in Metricflow we generally prefer format string syntax (f"...{substitution}..."). In fact, these would be the first printf-style format strings in the repo. I don't think this change is all that helpful for readability over the local variable format string substitution in the original, either.
This was copied from the original, although somehow an out-of-date version. Happy to update to the current version which I somehow didn't copy.
Because this PR includes the merge of the old
All that said, I'm happy to drop the custom validator and instead manually add it as a
Oh I see, the jsonschema validator stack got updated. We're like a year out of date. And it looks like they did yield from because it's more concise but there's no way to be sure.
I think we should unblock this at this point and then open an issue to update the jsonschema package and clean everything up later. We can punt the decision about whether to keep the custom validator or replace it with a utility function to then. I'll document what I'm talking about in the issue.
One change you might want to make before merging is to use extend() instead of create() and pass our custom validator through to clobber the one Draft7 is using. That way we get whatever Draft7 is using but updated with our own validator. Then when we update the package we only need to clean up our custom validator implementation - the Draft7 inheritance should continue to work as-is.
yield ValidationError(error % extras_msg(extras))

SchemaValidator = create(
If you use extend() here you only need to specify validators = {"additionalProperties": customAdditionalProperties}. According to the docblock, that will override the Draft7 default for that one validator while keeping everything else as it is defined in the package, and will give you what amounts to a customized Draft7Validator instance.
Nice call, that makes things more straightforward!
…de in readable str Sometimes some extra detail about a validation issue is desired. The most common example of this is validation issues that are created from exceptions wherein it can be useful to include the stacktrace of the exception.
…yaml' returns The method 'parse_config_yaml' was returning a tuple of two lists. This was confusing because there exists a convention wherein, when a tuple of two lists is returned, it is assumed that the lists are a one-to-one mapping, i.e. that element x of the first list is related to element x of the second list. This convention of association was not intended for the return of 'parse_config_yaml', thus we added a new dataclass 'FileParsingResult' to clarify what is being returned and avoid this unintentional assumed association.
694f0ff to 8f4de4a
We want to be able to ignore all properties in config documents that match the regex '^__(.*)__$', e.g. '__parsing_context__', '__custom_property_name__', etc. To do this without actually adding a patternProperty to every schema, we had to override the 'additionalProperties' validator of the Draft7Validator. As we cannot actually change out the validators used by the Draft7Validator, we had to write our own custom validator, which we call 'SchemaValidator'
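A pure-stdlib sketch of the matching behavior this commit describes. The helper name and the set-based schema handling are invented for illustration; the real change lives inside the custom 'additionalProperties' validator:

```python
import re

# Properties matching this pattern are treated as internal and ignored.
IGNORED_PROPERTY_PATTERN = re.compile(r"^__(.*)__$")


def find_extra_properties(instance: dict, schema_properties: set) -> list:
    """Return unexpected property names, skipping '__dunder__'-style keys."""
    return [
        key
        for key in instance
        if key not in schema_properties and not IGNORED_PROPERTY_PATTERN.match(key)
    ]


document = {"name": "revenue", "__parsing_context__": "...", "typo_key": 1}
extras = find_extra_properties(document, schema_properties={"name"})
# '__parsing_context__' is skipped; only 'typo_key' is flagged as extra
```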
8f4de4a to 113fe93
The following are examples, prior to this PR and with this PR, for configs which have two issues: 1) a metric without a name, and 2) a time dimension missing type params. Note that prior to this PR only one of the issues gets caught (that's because it's the first one hit), whereas both are identified with the updates from this PR. I admit the JSON schema validations still aren't the best, but they are a lot easier to look at than before.
Previously:
Now: