Skip to content

Conversation

@mihailomilosevic2001
Copy link
Contributor

@mihailomilosevic2001 mihailomilosevic2001 commented Oct 21, 2024

What changes were proposed in this pull request?

Addition of two new expression try_make_interval.

Why are the changes needed?

This is a split PR from #48499 so that we divide the reasonings for PRs.

Does this PR introduce any user-facing change?

Yes, new expressions added.

How was this patch tested?

Tests added.

Was this patch authored or co-authored using generative AI tooling?

No.

ceiling.__doc__ = pysparkfuncs.ceiling.__doc__


def try_conv(col: "ColumnOrName", fromBase: int, toBase: int) -> Column:
Copy link
Member

@HyukjinKwon HyukjinKwon Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a test at test_functions.py so the tests can be shared between Spark Classic and Spark Connect.

@mihailomilosevic2001 mihailomilosevic2001 changed the title [SPARK-50055][SQL] Add TryConv and TryMakeInterval alternatives [WIP][SPARK-50055][SQL] Add TryConv and TryMakeInterval alternatives Oct 23, 2024
@mihailomilosevic2001 mihailomilosevic2001 marked this pull request as draft October 23, 2024 08:01
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it is good idea to mix two new expressions in one PR especially when they are unrelated. I believe it is better to open separate PRs per expressions.

@mihailomilosevic2001 mihailomilosevic2001 changed the title [WIP][SPARK-50055][SQL] Add TryConv and TryMakeInterval alternatives [SPARK-50055][SQL] Add TryMakeInterval alternative Oct 28, 2024
…ryConv-TryMakeInterval

# Conflicts:
#	docs/sql-ref-ansi-compliance.md
@mihailomilosevic2001 mihailomilosevic2001 marked this pull request as ready for review October 28, 2024 09:01
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihailom-db Could you resolve conflicts and fix the test, it seems it is related to your changes:

ERROR [0.217s]: test_try_make_interval (pyspark.sql.tests.test_functions.FunctionsTests.test_try_make_interval)

Comment on lines 3716 to 3722
years: Optional["ColumnOrName"] = None,
months: Optional["ColumnOrName"] = None,
weeks: Optional["ColumnOrName"] = None,
days: Optional["ColumnOrName"] = None,
hours: Optional["ColumnOrName"] = None,
mins: Optional["ColumnOrName"] = None,
secs: Optional["ColumnOrName"] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, fix indentations here.

Comment on lines 21157 to 21163
years: Optional["ColumnOrName"] = None,
months: Optional["ColumnOrName"] = None,
weeks: Optional["ColumnOrName"] = None,
days: Optional["ColumnOrName"] = None,
hours: Optional["ColumnOrName"] = None,
mins: Optional["ColumnOrName"] = None,
secs: Optional["ColumnOrName"] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indentations

|100 years |
+-----------------------------------------+

Example 8: Try make interval.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an example which shows the difference between try and non-try versions.

* @group url_funcs
* @since 4.0.0
*/
def try_make_interval(): Column =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this one? In which cases does make_interval() fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we cannot get an error here, as this is the same as makeinterval(0) which is not an overflow. Will update the pr.

…eInterval

# Conflicts:
#	docs/sql-ref-ansi-compliance.md
#	python/docs/source/reference/pyspark.sql/functions.rst
#	sql/core/src/test/resources/sql-functions/sql-expression-schema.md
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for CI.

@mihailomilosevic2001
Copy link
Contributor Author

@MaxGekk @HyukjinKwon Do we want to not support try_make_interval in pyspark? This PR seems to block it #46975. I believe we should update that make_interval is not supported as well, as CalendarIntervalType is not supported.

@HyukjinKwon
Copy link
Member

It's fine to have that expression. They can still perform computation with the type. It's just that the ser/de to Python isn't supported.

@mihailomilosevic2001
Copy link
Contributor Author

Should I just then remove it from python API? As we cannot really call it there.

@HyukjinKwon
Copy link
Member

hm, we can call them, no? It's just that we don't support df.collect() on CalendarIntervalType instances.

@HyukjinKwon
Copy link
Member

e.g., you can call df.select(make_interval(...)).show()

@mihailomilosevic2001
Copy link
Contributor Author

Aha, ok, so I should only change the test to use some other type of collection of data that is not collect() and then verify the results.

@HyukjinKwon
Copy link
Member

Yeah, i think so.

@mihailomilosevic2001
Copy link
Contributor Author

mihailomilosevic2001 commented Nov 7, 2024

Ready for merge now @MaxGekk @HyukjinKwon

@MaxGekk
Copy link
Member

MaxGekk commented Nov 7, 2024

+1, LGTM. Merging to master.
Thank you, @mihailom-db and @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in 9858ab6 Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants