Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agate fixes (#999) (#1639) #1920

Merged
merged 2 commits into from Dec 13, 2019
Merged

Conversation

beckjake
Copy link
Contributor

Fixes #999
Closes #1639 (sort of)

"Fix" agate by making its type inference more nuanced:

  • the only date time formats supported are %Y-%m-%d %H:%M:%S and ISO 8601 compatible.
  • the only date format supported is %Y-%m-%d.
  • removed timedelta support as it's pretty much always wrong.
  • When csvs are loaded, all columns that have a column type specified are now loaded as text before they're cast.

Things to think about:

  • should we do even less type inference when reading data from the database? Would it be possible/desirable to do introspection on the database types and use that to determine agate types?

The datetime/date changes mean that agate will no longer try to interpret locale-specific natural language names as dates. This means that tomorrow, Saturday, SAT, and a number of other unexpectedly special values will be treated as strings, which is what I like to imagine any reasonable person would want.

@cla-bot cla-bot bot added the cla:yes label Nov 13, 2019
@beckjake beckjake marked this pull request as ready for review November 20, 2019 20:42
Restrict agate date+datetime formats
remove timedelta
When column types are specified for a seed, parse those columns as Text unconditionally
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an incredibly positive change to how seeding works, and it's a change that I'm excited to make as soon as possible. I fear only that this is a "breaking" change, in that it does significantly modify the behavior present in dbt v0.15.0. That behavior is now much, much saner than it was before, but we do need to think through how we should release this change.

I think we can release this for 0.15.1 if we're able to fully think through the impacts of the breaking changes and determine that user-space failures will be narrow and easily identifiable / fixable. Part of the problem here is that the previous behavior was so bonkers, it's hard to reason about. Is it possible that users had seeded values like SUN which were translated to 0001-01-01, and they would be upset that dbt now represents this value correctly as SUN? I think that's unlikely. I think the more realistic impact of this change is that columns are loaded with different timezones, or as dates instead of timestamps, or something like that.

The alternative and very safe approach is to shelf this change until 0.16.0. That release might be a ways out, and we continue to get users who are confused about this behavior in dbt, so I hesitate to wait that long if we don't feel like we have to.

The third option is of course to make this opt-in for 0.15.1, then make it the default behavior in 0.16.0. It looks to me like that change could actually be tractable here, but it's unclear to me how beneficial that extra work (for both the implementers and the users) will actually be.

What do you think @beckjake?

@beckjake
Copy link
Contributor Author

Making most of this PR opt-in for 0.15.1+ seems pretty straightforward (it's a medium-sized task), but I'm fine with shelving this until 0.16.0.

We could also consider making our current 0.15.2 plans into 0.16.0, where the breaking change is "agate is more reasonable now". Version numbers are cheap.

@drewbanin
Copy link
Contributor

Let's set the target branch for this PR to be dev/barbara-gittings - it should ship in the release following 0.15.1, which will now be 0.16.0. No need to get too clever with opt-in behavior -- the existing behavior is excruciatingly bad, but it doesn't represent a runtime issue or a regression, so we should just wait for our next minor release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants