@sandeep-katta sandeep-katta commented Aug 14, 2019

What changes were proposed in this pull request?

`CREATE OR REPLACE FUNCTION X AS 'Y' USING JAR Z;` does not work if X already exists and was created with a different jar, say xyz.

In the current implementation, Spark calls Hive's alter function API. As of now, Hive alters only the name, owner, class name, and type, but not the resource URI. After the alter call, only X and Y are updated, not Z.

So when a SELECT is run on the UDF X, it throws a class-not-found exception.

Observation 1: Temporary functions do not have this problem, because they are handled by Spark's own logic.

Observation 2: For permanent functions, Spark calls Hive to alter the function, and as of now Hive alters only the name, owner, class name, and type, but not the resource URI.

[SparkCode]
image

[Hive Code]
image

According to the Hive documentation, Hive does not support a CREATE OR REPLACE command for functions.

How was this patch tested?

Added a unit test and also tested manually.

Before Fix:
image

After Fix:
image

Member

@dongjoon-hyun dongjoon-hyun left a comment

I left a few comments, @sandeep-katta . Thank you for your contribution.

""".stripMargin)
val cnt =
sql("SELECT customAdd(2, 'A', 10, date '2015-01-01', 'B', 20, date '2016-01-01')").count()
assert(cnt === 2)
Member

nit: How about checking the actual answers instead of the row count?

}
}
}
test("SPARK-28710: Replace permanent function ") {
Member

Should this test be in HiveUDFSuite?

// Handles `CREATE OR REPLACE FUNCTION AS ... USING ...`
if (replace && catalog.functionExists(func.identifier)) {
// alter the function in the metastore
catalog.alterFunction(func)
Member

Instead of changing CreateFunctionCommand, should we change the behavior at HiveExternalCatalog.alterFunction? If this is for Hive only?

Contributor Author

HiveExternalCatalog is for Hive only. Currently, Hive's alter function does not update the resource.

Contributor Author

@viirya, do I need to handle any other cases for this PR?

Member

Yeah, I think @viirya's point is valid. Isn't the root cause that HiveExternalCatalog.alterFunction does not handle this?

If this is for Hive only?

I think he meant the behaviours at InMemoryCatalog.alterFunction.

Contributor Author

Got it. Yes, this problem is specific to HiveExternalCatalog. I will update the code and push.

@sandeep-katta
Contributor Author

I left a few comments, @sandeep-katta . Thank you for your contribution.

I have addressed the comments.

@HyukjinKwon
Member

ok to test


SparkQA commented Sep 17, 2019

Test build #110697 has finished for PR 25452 at commit f9ba185.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sandeep-katta
Contributor Author

retest this please

@HyukjinKwon
Member

retest this please


SparkQA commented Sep 18, 2019

Test build #110861 has finished for PR 25452 at commit f9ba185.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Sep 18, 2019

retest this please


SparkQA commented Sep 18, 2019

Test build #110879 has finished for PR 25452 at commit f9ba185.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Remove jar and move testcase to HiveUDFSuite. Also added check for change in the resource

SparkQA commented Dec 16, 2019

Test build #115391 has finished for PR 25452 at commit ebf7868.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

client.listFunctions(db, pattern)
}

def isResourceChanged(db: String, newFunction: CatalogFunction): Boolean = {
Member

private?

Comment on lines 1289 to 1290
case true => client.dropFunction(db, functionName)
client.createFunction(db, newDefinition)
Member

This style looks uncommon in Spark. We usually write it as:

isResourceChanged(db, newDefinition) match {
case true =>
  client.dropFunction(db, functionName)
  client.createFunction(db, newDefinition)
...
}

Comment on lines +1326 to +1327
val changedResources = newFunction.resources.filter(
newResource => !oldFunction.resources.contains(newResource))
Member

Will case sensitivity be a problem in this comparison? Should we lowercase before comparing?

Contributor Author

I think it should be case-sensitive, because on Unix-based systems /opt/some.jar is different from /OPT/SOME.jar. Isn't it?
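
As a rough illustration of the comparison under discussion, here is a minimal, self-contained sketch. The case classes are simplified stand-ins invented for this example (they are not Spark's actual `CatalogFunction` or `FunctionResource` definitions), and the check mirrors the filter quoted above: a resource counts as changed if the new definition lists a URI that the old one does not, compared exactly and therefore case-sensitively.

```scala
// Hypothetical, simplified stand-ins for the catalog types (assumptions, not Spark's real classes).
final case class FunctionResource(resourceType: String, uri: String)
final case class CatalogFunction(
    name: String,
    className: String,
    resources: Seq[FunctionResource])

object ResourceDiff {
  // Returns true when the new definition references at least one resource
  // that is absent from the old definition. URIs are compared as-is, so
  // /opt/some.jar and /OPT/SOME.jar are treated as different resources.
  def isResourceChanged(oldFunction: CatalogFunction, newFunction: CatalogFunction): Boolean =
    newFunction.resources.exists(r => !oldFunction.resources.contains(r))
}
```

Note that a one-directional check like this reports no change when the new definition only drops a resource that the old one had. Whether that case needs handling is not settled in this thread.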

client.createFunction(db, newDefinition)
// replace the function in the metastore if there is no change in the resource
case _ =>
client.alterFunction(db, newDefinition)
Member

From the Hive DDL, it looks like CREATE FUNCTION does not have a "replace" option:

CREATE FUNCTION [db_name.]function_name AS class_name
  [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];

Maybe that is why this API does not do resource replacement?

Contributor Author

Yes, as of now Hive alters only the name, owner, class name, and type, but not the resource URI. There is no REPLACE DDL in Hive.

Contributor Author

Also, in the JIRA I initially proposed two solutions for this bug:

Solution 1: throw an unsupported-operation error for permanent functions
Solution 2: instead of altering the function, drop it and create it again
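
To make the second option concrete, here is a small sketch built on a toy in-memory metastore. Everything in it is an assumption made for illustration: `ToyMetastore`, `FunctionDef`, and `createOrReplace` are hypothetical names, and the toy `alterFunction` only imitates the behavior described in this thread (the class name is updated, the stored resources are left alone). It is not the Hive client API, nor the code in this PR.

```scala
import scala.collection.mutable

// Hypothetical function definition: name, implementing class, and resource URIs.
final case class FunctionDef(name: String, className: String, resourceUris: Seq[String])

// Toy metastore whose alterFunction deliberately keeps the old resources,
// imitating the limitation described in the discussion.
final class ToyMetastore {
  private val functions = mutable.Map.empty[String, FunctionDef]

  def functionExists(name: String): Boolean = functions.contains(name)
  def getFunction(name: String): FunctionDef = functions(name)

  def createFunction(f: FunctionDef): Unit = {
    require(!functions.contains(f.name), s"function ${f.name} already exists")
    functions(f.name) = f
  }

  def dropFunction(name: String): Unit = { functions -= name }

  // Updates the class name only; resource URIs are not touched.
  def alterFunction(f: FunctionDef): Unit = {
    val old = functions(f.name)
    functions(f.name) = old.copy(className = f.className)
  }
}

object ReplaceFunction {
  // Sketch of "solution 2": drop and recreate when the resources differ,
  // otherwise fall back to a plain alter.
  def createOrReplace(store: ToyMetastore, f: FunctionDef): Unit = {
    if (!store.functionExists(f.name)) {
      store.createFunction(f)
    } else if (store.getFunction(f.name).resourceUris != f.resourceUris) {
      store.dropFunction(f.name)
      store.createFunction(f)
    } else {
      store.alterFunction(f)
    }
  }
}
```

In this toy model, calling `alterFunction` with a new jar leaves the old URI in place, while `createOrReplace` ends up with the new one. One trade-off to keep in mind is that the drop and the create are two separate calls, so a failure between them would leave the function missing.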


SparkQA commented Dec 16, 2019

Test build #115390 has finished for PR 25452 at commit 05f3b7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Dec 17, 2019

Test build #115427 has finished for PR 25452 at commit 95d1e23.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sandeep-katta
Contributor Author

ping @viirya

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 21, 2020
@github-actions github-actions bot closed this Jun 22, 2020