Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) #9816

Merged
merged 18 commits into from
Dec 13, 2023

Conversation

winningsix
Copy link
Collaborator

@winningsix winningsix commented Nov 21, 2023

This fixes #9815, #9606 and #10006

TODOs:

  • Support ImprovedTimestamp path
  • Fix mis-matched CPU/GPU execution result

@winningsix winningsix changed the title [WIP ] Support Unix_timestamp with non-UTC timezones (non-DST) [WIP] Support Unix_timestamp with non-UTC timezones (non-DST) Nov 21, 2023
@winningsix winningsix changed the title [WIP] Support Unix_timestamp with non-UTC timezones (non-DST) [WIP] Support Unix_timestamp with non-UTC timezones (non-DST) [skip ci] Nov 21, 2023
@@ -262,6 +262,13 @@ def test_unsupported_fallback_to_unix_timestamp(data_gen):
"to_unix_timestamp(a, b)"),
"ToUnixTimestamp")

# TODO: has another test for this called test_unix_timestamp
@pytest.mark.parametrize('data_gen', date_n_time_gens, ids=idfn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test string gen.
Please refer to the Product plan doc, it requires String data type.

@winningsix winningsix changed the title [WIP] Support Unix_timestamp with non-UTC timezones (non-DST) [skip ci] [WIP] Support Unix_timestamp with non-UTC timezones (non-DST) Nov 22, 2023
@sameerz sameerz added the feature request New feature or request label Nov 28, 2023
@winningsix winningsix changed the base branch from branch-23.12 to branch-24.02 November 28, 2023 01:58
@NVnavkumar
Copy link
Collaborator

Also closes #9606

@winningsix winningsix marked this pull request as ready for review December 8, 2023 13:09
@winningsix winningsix changed the title [WIP] Support Unix_timestamp with non-UTC timezones (non-DST) Support Unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) Dec 8, 2023
@winningsix winningsix changed the title Support Unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) Support nix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) Dec 8, 2023
@winningsix winningsix changed the title Support nix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) Support unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) Dec 8, 2023
@winningsix
Copy link
Collaborator Author

build

@sperlingxx
Copy link
Collaborator

LGTM

failOnError)
}
case _: DateType =>
timeZoneId match {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use zoneId defined in TimeZoneAwareExpression

@transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still timeZoneId to match non case but use it later.

@@ -892,8 +909,20 @@ abstract class GpuToTimestampImproved extends GpuToTimestamp {
failOnError)
}
} else if (lhs.dataType() == DateType){
lhs.getBase.asTimestampSeconds()
} else { // Timestamp
timeZoneId match {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use zoneId defined in TimeZoneAwareExpression

@transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

def test_unix_timestamp(data_gen):
assert_gpu_and_cpu_are_equal_collect(
lambda spark : unary_op_df(spark, data_gen).select(f.unix_timestamp(f.col('a'))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add cases for not supported Time Zone.

e.g.:

@pytest.mark.skipif(not is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported")
def test_from_unixtime(data_gen, date_format):

@pytest.mark.skipif(is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported")
def test_from_unixtime_fall_back(data_gen, date_format):

And test TZs locally:
export TZ=Iran
export TZ= America/Los_Angeles

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@winningsix
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator

LGTM

res-life
res-life previously approved these changes Dec 12, 2023
@NVnavkumar NVnavkumar linked an issue Dec 13, 2023 that may be closed by this pull request
@winningsix
Copy link
Collaborator Author

build

Copy link
Collaborator

@NVnavkumar NVnavkumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Eventually, need to do cleanup based on NVIDIA/spark-rapids-jni#1643

@NVnavkumar NVnavkumar merged commit 798682a into NVIDIA:branch-24.02 Dec 13, 2023
37 of 38 checks passed
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this just got merged, but we need to go back and fix the tests. I am fine if it is a fast follow on, otherwise we should revert it and do it right. The ProjectExec allowed to not be on the GPU in all test cases is not acceptable.

@allow_non_gpu(*non_utc_allow)
def test_string_to_unix_timestamp(data_gen, date_form, ansi_enabled):
@allow_non_gpu('ProjectExec')
def test_string_to_unix_timestamp(data_gen, date_form, ansi_enabled, non_utc_timezone_enabled):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter is not used and nonUTC.enabled is going away.

@pytest.mark.parametrize('data_gen,date_form', str_date_and_format_gen, ids=idfn)
@allow_non_gpu(*non_utc_allow)
def test_string_to_unix_timestamp(data_gen, date_form, ansi_enabled):
@allow_non_gpu('ProjectExec')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to have a hard coded ProjectExec that does not know if we should fall back to the CPU or not!.

@pytest.mark.parametrize('data_gen,date_form', str_date_and_format_gen, ids=idfn)
@allow_non_gpu(*non_utc_allow)
def test_string_unix_timestamp(data_gen, date_form, ansi_enabled):
@allow_non_gpu('ProjectExec')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
6 participants