
[FLINK-8518][Table API & SQL] Support DOW for EXTRACT #6007

Conversation

snuyanzin
Contributor

dow extraction support implemented + tests

Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.

Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.

Contribution Checklist

  • Make sure that the pull request corresponds to a JIRA issue. Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.

  • Name the pull request in the form "[FLINK-XXXX] [component] Title of the pull request", where FLINK-XXXX should be replaced by the actual issue number. Skip component if you are unsure about which is the best component.
    Typo fixes that have no associated JIRA issue should be named following this pattern: [hotfix] [docs] Fix typo in event time introduction or [hotfix] [javadocs] Expand JavaDoc for PunctuatedWatermarkGenerator.

  • Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.

  • Make sure that the change passes the automated tests, i.e., mvn clean verify passes. You can set up Travis CI to do that following this guide.

  • Each pull request should address only one issue, not mix up code from multiple issues.

  • Each commit in the pull request has a meaningful commit message (including the JIRA id)

  • Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below.

(The sections below can be removed for hotfixes of typos)

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store on job creation time as a persistent artifact
  • Deployments RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache

Verifying this change

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (not documented)

@walterddr
Contributor

I wasn't sure whether this is the full PR or just a partial implementation, but I don't see EPOCH or DECADE in either the main code or the tests.

@snuyanzin
Contributor Author

@walterddr thank you for your comment.
It is just the DOW implementation, because to implement EPOCH we have to wait for the resolution of CALCITE-2303.
For DECADE there are two options: one is also to wait for CALCITE-2303;
the other is to make the same changes in org.apache.calcite.avatica.util.DateTimeUtils#julianExtract (Flink has its own copy of this class) as proposed in CALCITE-2303, and then do the remaining work in Flink.
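For context, the day-of-week arithmetic that a julianExtract-style method performs can be sketched as follows. This is an illustrative reimplementation, not Calcite's actual code; it relies only on the facts that Julian day 0 fell on a Monday and that 1970-01-01 corresponds to Julian day 2440588:

```java
import java.time.LocalDate;

public class JulianDowSketch {
    static final long EPOCH_JULIAN = 2440588L; // Julian day number of 1970-01-01

    // ISO 8601 day of week: Monday = 1 .. Sunday = 7 (Julian day 0 was a Monday)
    static int isoDow(long julianDay) {
        return (int) Math.floorMod(julianDay, 7L) + 1;
    }

    // DOW in the Sunday = 1 .. Saturday = 7 convention (Calcite's current behavior)
    static int dow(long julianDay) {
        return isoDow(julianDay) % 7 + 1;
    }

    public static void main(String[] args) {
        // 2018-05-20 was a Sunday
        long julian = EPOCH_JULIAN + LocalDate.of(2018, 5, 20).toEpochDay();
        System.out.println(isoDow(julian)); // 7
        System.out.println(dow(julian));    // 1
    }
}
```

Switching from the Sunday = 1 convention to PostgreSQL's Sunday = 0 is then a one-line change in the dow mapping, which is essentially the kind of adjustment discussed in CALCITE-2303.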

Contributor

@walterddr walterddr left a comment


Thanks @snuyanzin for the explanation and contribution!

Hmm ok. in this case can you change the PR title to only address DOW?

@snuyanzin snuyanzin changed the title [FLINK-8518][Table API & SQL] Support DOW, EPOCH, DECADE for EXTRACT [FLINK-8518][Table API & SQL] Support DOW for EXTRACT May 16, 2018
@snuyanzin
Copy link
Contributor Author

yes, done

@snuyanzin snuyanzin closed this May 17, 2018
@snuyanzin snuyanzin deleted the _FLINK-8518]_Table_API_SQL]_Support_DOW,_EPOCH,_DECADE_for_EXTRACT branch May 17, 2018 07:45
@snuyanzin snuyanzin restored the _FLINK-8518]_Table_API_SQL]_Support_DOW,_EPOCH,_DECADE_for_EXTRACT branch May 17, 2018 07:47
@snuyanzin
Copy link
Contributor Author

sorry, closed by mistake, restored

@snuyanzin snuyanzin reopened this May 17, 2018
@twalthr
Copy link
Contributor

twalthr commented May 23, 2018

Thanks for the PR @snuyanzin. I created the issue because it seemed that this is more than a one-line change. I tested your queries in PostgreSQL and it returns 0 instead of 1. We should have the same semantics as popular database vendors. How does Oracle define this feature?

@snuyanzin
Copy link
Contributor Author

@twalthr thank you for your comment.
I looked at the related ticket in Calcite/Avatica, on which the current one depends, and I pointed out this discrepancy in a comment there. To fix it, the class org.apache.calcite.avatica.util.DateTimeUtils has to be changed; it is present in both Avatica and Flink.
At the same time, different databases behave slightly differently (please have a look at the links below):
for day-of-week extraction, Oracle and MySQL use Sunday = 1, PostgreSQL uses Sunday = 0, and for Microsoft SQL Server it depends on a property set by the user.
On the other hand, the ISO 8601 standard also defines the weekday, the day of the year, etc. This is also part of CALCITE-2303.
So my suggestion is to provide, within this ticket, support for all operations that could come from CALCITE-2303: DOW, DECADE, EPOCH, ISODOW, ISOYEAR, MICROSECOND and MILLISECOND.
However, you are right that we have to choose an approach for dayOfWeek. IMHO the simplest way is to use whatever Calcite/Avatica provides.
At the same time, Julian Hyde says there that:

In short, yes, do whatever PostgreSQL does.

So they would like to align the behavior with PostgreSQL.

About the different databases' day of week:

  • PostgreSQL's behavior is described here with Sunday = 0; it also supports ISO 8601 via isodow and isoyear.
  • Oracle's behavior is described here with Sunday = 1; so far I have not found information about ISO 8601 support via extract, though it is available via to_char/to_date.
  • Microsoft SQL Server lets the user configure the first weekday; extraction is done via datepart rather than extract.
  • MySQL provides the weekday with Sunday = 1.
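The three conventions above can be summarized with a small java.time sketch (illustrative only; the helper names are mine, and 2018-05-20 is used because it was a Sunday):

```java
import java.time.LocalDate;

public class DowConventions {
    // PostgreSQL EXTRACT(DOW ...): Sunday = 0 .. Saturday = 6
    static int postgresDow(LocalDate d) {
        return d.getDayOfWeek().getValue() % 7; // java.time uses ISO Monday = 1 .. Sunday = 7
    }

    // Oracle/MySQL style (and current Calcite behavior): Sunday = 1 .. Saturday = 7
    static int oracleDow(LocalDate d) {
        return postgresDow(d) + 1;
    }

    // ISO 8601 (PostgreSQL's isodow): Monday = 1 .. Sunday = 7
    static int isoDow(LocalDate d) {
        return d.getDayOfWeek().getValue();
    }

    public static void main(String[] args) {
        LocalDate sunday = LocalDate.of(2018, 5, 20);
        System.out.println(postgresDow(sunday)); // 0
        System.out.println(oracleDow(sunday));   // 1
        System.out.println(isoDow(sunday));      // 7
    }
}
```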

Contributor

@twalthr twalthr left a comment


Thank you for the effort @snuyanzin. I was not aware of the amount of work you had put into this issue before opening the PR. I'm fine with sticking to Calcite's behavior. Can you add some documentation to sqlApi.md so that users know about the semantics?

testSqlApi(
  "EXTRACT(DOW FROM f16)",
  "1")
Contributor


Can you add a test (maybe in ScalarFunctionsValidationTest?) for checking the behavior on a TIME type?
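As an analogy for the behavior such a validation test would pin down (this is java.time, not Flink code): a time-only value carries no date part, so asking it for a day of week is rejected:

```java
import java.time.LocalTime;
import java.time.temporal.ChronoField;
import java.time.temporal.UnsupportedTemporalTypeException;

public class TimeDowCheck {
    // Returns true if extracting day-of-week from a time-only value is rejected.
    static boolean dowOnTimeRejected() {
        try {
            LocalTime.of(12, 30).get(ChronoField.DAY_OF_WEEK);
            return false;
        } catch (UnsupportedTemporalTypeException e) {
            return true; // TIME has no date part, so DOW is undefined
        }
    }

    public static void main(String[] args) {
        System.out.println(dowOnTimeRejected()); // true
    }
}
```

A validation test in Flink would likewise assert that EXTRACT(DOW FROM <TIME field>) fails during validation rather than returning a value.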

Contributor Author


Yes, sure. I have just done it for DOW and DOY as well.


add validation tests for DOY, DOW in ScalarFunctionsValidationTest
@snuyanzin
Contributor Author

snuyanzin commented May 24, 2018

@twalthr thank you for your comment, but could you please clarify this:

Can you add some documentation to sqlApi.md such that users know about the semantics?

  1. Am I right that you mean docs/dev/table/sql.md?
  2. Currently I see the general explanation of extract, and DOW is already specified among the reserved words.
    From this point of view, I do not see what could be updated right now. At the same time, I have a proposal to go the same way as Calcite does. Here is a link to their functions, including the date/time ones. Besides extract, they also have synonyms, e.g.:

MONTH(date) | Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.
WEEK(date) | Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.
DAYOFYEAR(date) | Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.

etc.
So I suggest introducing the same synonyms in Flink (simply by using what already exists in Calcite) and organizing the documentation for them in a similar way.
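For illustration, the synonym semantics above map directly onto java.time (a sketch of the expected return values, not the Flink or Calcite implementation; the date is arbitrary):

```java
import java.time.LocalDate;

public class SynonymSemantics {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2018, 5, 16);
        // MONTH(date) = EXTRACT(MONTH FROM date): 1..12
        System.out.println(d.getMonthValue()); // 5
        // DAYOFYEAR(date) = EXTRACT(DOY FROM date): 1..366
        System.out.println(d.getDayOfYear()); // 136
    }
}
```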

And if you agree, then one question: would it be better to create another issue for that, or use the same one?
I would suggest at least separating two things:

  1. do whatever is possible right now
  2. everything else that depends on issues, fixes and updates from Avatica/Calcite

@twalthr
Contributor

twalthr commented May 24, 2018

Hi @snuyanzin,
yes, I meant docs/dev/table/sql.md. Right now it is not documented which units are supported by EXTRACT. Just because DOW is in the reserved keywords section doesn't mean that we support it. I'm fine with having the same schema as Calcite. We can create one or more issues for the docs and the additional synonyms. Issues that need a newer Calcite version can be converted to subtasks of FLINK-9134.

@snuyanzin
Contributor Author

snuyanzin commented May 24, 2018

Just because DOW is in the reserved keywords section doesn't mean that we support it.

Definitely agree; however, my point was that the same situation holds for DOY, CENTURY and others...
Now I have added information about the date/time function synonyms that currently work in Flink. I was surprised, but they work without any extra effort, so I just added more tests for them along with documentation.

@twalthr could you please have a look?

btw: there is a separate ticket, FLINK-9432, for the functionality depending on CALCITE-2303

@twalthr
Contributor

twalthr commented Jun 4, 2018

Sorry for the delay @snuyanzin. I was on vacation. I will have a look at your update.

Contributor

@twalthr twalthr left a comment


Thank you @snuyanzin. LGTM, will merge...

@asfgit asfgit closed this in f591263 Jun 5, 2018
sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018
@snuyanzin snuyanzin deleted the _FLINK-8518]_Table_API_SQL]_Support_DOW,_EPOCH,_DECADE_for_EXTRACT branch May 5, 2022 07:02