Skip to content
This repository has been archived by the owner on Sep 20, 2022. It is now read-only.

[HIVEMALL-65] Update define-all.spark and import-packages.spark #44

Closed
wants to merge 4 commits into from
Closed

[HIVEMALL-65] Update define-all.spark and import-packages.spark #44

wants to merge 4 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Feb 12, 2017

What changes were proposed in this pull request?

Make define-all.spark keep correspondence with define-all.hive.

What type of PR is it?

Bug Fix

What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-65

How was this patch tested?

manual tests

@coveralls
Copy link

coveralls commented Feb 12, 2017

Coverage Status

Coverage remained the same at 35.814% when pulling 2f874f8 on wangyum:HIVEMALL-65 into 6b462ae on apache:master.

@maropu
Copy link
Member

maropu commented Feb 13, 2017

@amaya382 Could you review this?

@amaya382
Copy link
Member

@maropu Sure

@wangyum
Copy link
Member Author

wangyum commented Feb 13, 2017

I'm not sure whether these functions should add to define-all.spark:
https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all.hive#L467-L483

@maropu
Copy link
Member

maropu commented Feb 13, 2017

Aha, can you replace RowIdUDF with RowIdUDFWrapper?: https://github.com/apache/incubator-hivemall/blob/master/spark/spark-common/src/main/java/hivemall/tools/mapred/RowIdUDFWrapper.java
And then, you can remove all the entries there except for rowid.

@wangyum
Copy link
Member Author

wangyum commented Feb 13, 2017

OK,
hivemall.smile.regression.RandomForestRegressionUDTF create 2 functions. define-all.spark also add them?
https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all.hive#L607-L611

@maropu
Copy link
Member

maropu commented Feb 13, 2017

yea, please.

@myui
Copy link
Member

myui commented Feb 13, 2017

@maropu You can revise taskid generation scheme using taskAttemptId or something.
https://github.com/apache/spark/blob/17ce0b5b3f6a825fc77458bc8608cece1a6019c7/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L76

You need to use Java Reflaction to avoid Spark imports.

@wangyum
Copy link
Member Author

wangyum commented Feb 13, 2017

OK, all done

@maropu
Copy link
Member

maropu commented Feb 13, 2017

@myui No, you can't access TaskContext inside HiveUDF.

@myui
Copy link
Member

myui commented Feb 13, 2017

@maropu You can use Reflections if the values are stored in ThreadLocal.

@maropu
Copy link
Member

maropu commented Feb 13, 2017

Aha, I see. I'll make a JIRA

@maropu
Copy link
Member

maropu commented Feb 13, 2017

@coveralls
Copy link

coveralls commented Feb 13, 2017

Coverage Status

Coverage remained the same at 35.814% when pulling c805768 on wangyum:HIVEMALL-65 into 6b462ae on apache:master.

sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS popcnt")
sqlContext.sql("CREATE TEMPORARY FUNCTION popcnt AS 'hivemall.knn.distance.PopcountUDF'")

sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS kld")
sqlContext.sql("CREATE TEMPORARY FUNCTION kld AS 'hivemall.knn.distance.KLDivergenceUDF'")

sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS homming_distance")
Copy link
Member

@myui myui Feb 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it.

@wangyum
Copy link
Member Author

wangyum commented Feb 13, 2017

@maropu , @myui I have sent an email to dev@hivemall.incubator.apache.org. have you received it ?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 36.112% when pulling 827a1a8 on wangyum:HIVEMALL-65 into 6b462ae on apache:master.

@coveralls
Copy link

coveralls commented Feb 13, 2017

Coverage Status

Coverage remained the same at 35.814% when pulling 827a1a8 on wangyum:HIVEMALL-65 into 6b462ae on apache:master.

@myui
Copy link
Member

myui commented Feb 13, 2017

LGTM.

@maropu merge this one if @amaya382 says fine.

@maropu
Copy link
Member

maropu commented Feb 13, 2017

okay

@amaya382
Copy link
Member

LGTM
Notice: We also need revise import-packages.spark in [HIVEMALL-65]. @wangyum Can you do it? (or postpone to other PR)

@myui
Copy link
Member

myui commented Feb 13, 2017

@amaya382 what kind of imports are missing in import-packages.spark ? (cc: @maropu )

@amaya382
Copy link
Member

amaya382 commented Feb 13, 2017

@myui it seems some imports (HmLabeledPoint and HmFeature) should be removed and some ones should be revised to correct package name, but i haven't checked it well yet... maropu should know about it in detail

Change org.apache.spark.sql.hive.XGBoostOptions to hivemall.xgboost.XGBoostOptions.
@coveralls
Copy link

coveralls commented Feb 13, 2017

Coverage Status

Coverage remained the same at 35.814% when pulling 1a48d98 on wangyum:HIVEMALL-65 into 6b462ae on apache:master.

@myui
Copy link
Member

myui commented Feb 14, 2017

@maropu Could you merge this PR? import-packages.spark should be another ticket.

@wangyum
Copy link
Member Author

wangyum commented Feb 14, 2017

import-packages.spark already updated:
1a48d98

@wangyum wangyum changed the title [HIVEMALL-65] Update define-all.spark [HIVEMALL-65] Update define-all.spark and import-packages.spark Feb 14, 2017
@maropu
Copy link
Member

maropu commented Feb 14, 2017

LGTM. I'll merged later

myui pushed a commit to myui/incubator-hivemall that referenced this pull request Feb 15, 2017
@myui
Copy link
Member

myui commented Feb 15, 2017

@wangyum Merged. Thank you for your contribution!

@asfgit asfgit closed this in e158f58 Feb 15, 2017
@wangyum wangyum deleted the HIVEMALL-65 branch February 15, 2017 16:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants