Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add name validation for metrics #5841

Merged
merged 4 commits into from
Sep 21, 2022
Merged

Conversation

emmyoop
Copy link
Member

@emmyoop emmyoop commented Sep 14, 2022

resolves #5456

Description

Adds name validation rules:

  • Metric name characters must be alphanumeric or underscore
  • Metric name must begin with an letter (this allows us to get around the BQ naming prefixes which all begin with _ )
  • Metric names must be 126 characters or less (Redshift constraint. BQ and Snowflake are longer)

Adding exposure name validation in #5844

Checklist

@cla-bot cla-bot bot added the cla:yes label Sep 14, 2022
@emmyoop emmyoop changed the title add tests, add name validation Add name validation for exposures and metrics Sep 14, 2022
@emmyoop emmyoop force-pushed the er/ct-828-validate-metric-exp-names branch from b200b4c to a429cff Compare September 14, 2022 19:44
@emmyoop emmyoop changed the title Add name validation for exposures and metrics Add name validation for metrics Sep 14, 2022
@emmyoop emmyoop marked this pull request as ready for review September 14, 2022 20:07
@emmyoop emmyoop requested review from a team as code owners September 14, 2022 20:07
Comment on lines 488 to 502
if "name" in data:
errors = []
if " " in data["name"]:
errors.append("cannot contain spaces")
if len(data["name"]) > 126:
errors.append("cannot contain more than 126 characters")
if not (re.match(r"^[A-Za-z]", data["name"])):
errors.append("must begin with a letter")
if not (re.match(r"[\w-]+$", data["name"])):
errors.append("must contain only letters, numbers and underscores")

if errors:
raise ParsingException(
f"Metrics name '{data['name']}' is invalid. It {', '.join(e for e in errors)}"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@callum-mcdata would love some help with error messaging here! I went with concatenating all the errors together but open to other solutions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the framework here! I'd only tweak it slightly to The metric name as opposed to Metrics name. Apart from that I think it is does a great job of describing the error and I LOVE that it provides all the errors in one failed compile instead of one at a time like I might have novicely implemented it

@emmyoop emmyoop requested review from callum-mcdata and gshank and removed request for Fleid and nathaniel-may September 14, 2022 20:09
@callum-mcdata
Copy link
Contributor

Ah - I should have updated this issue after I did some more testing but I unfortunately put the information I found in dbt-labs/dbt_metrics#41. The TLDR is that it appears that Postgres/Redshift appear to be handling column name truncation under the hood and so failing queries due to too long metric names only appeared to occur in BigQuery and Snowflake.

Should we consider changing the length limit to 250 (lower of BQ/Snowflake) ?

@emmyoop
Copy link
Member Author

emmyoop commented Sep 15, 2022

Sounds reasonable to me @callum-mcdata. That's a simple change. I'll defer to @jtcohen6 for the final decision on length.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Sep 16, 2022

250 chars feels fine — If you can tweet it, you can (probably) make it into a metric

One question for you @callum-mcdata: Given that Postgres/Redshift truncate any column/CTE name that's too long, is there any risk for queries containing multiple metrics, where those metrics have the same first 126 characters? (You could also handle that in dbt-metrics by truncating + hashing yourself.)

@callum-mcdata
Copy link
Contributor

@jtcohen6 huh. Well .... I hadn't thought about that. Is it a fair answer to punt on that and say we'll solve that problem when we get to it? 😅 to me figuring out the pattern to account for that feels like we're optimizing for an edge case.

If that problem did arise then I think your idea around fixing it in the package makes sense - it's a usage pattern more than it is a definition pattern so allowing that kind of naming convention should be supported in core.

@emmyoop emmyoop merged commit a6c37c9 into main Sep 21, 2022
@emmyoop emmyoop deleted the er/ct-828-validate-metric-exp-names branch September 21, 2022 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants