This repository has been archived by the owner on Sep 20, 2022. It is now read-only.

[HIVEMALL-127] Added tree_predict_v1 UDF for RandomForest backward compatibility #102

Closed
myui wants to merge 6 commits into apache:master from myui:HIVEMALL-127


myui (Member) commented Jul 18, 2017

What changes were proposed in this pull request?

Added tree_predict_v1 UDF for RandomForest backward compatibility

What type of PR is it?

Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-127

How was this patch tested?

unit tests, manual tests

How to use this feature?

set hivevar:classification=true;

create table predicted
as
SELECT
  rowid,
  rf_ensemble(predicted) as predicted
FROM (
  SELECT
    rowid, 
    tree_predict_v1(p.model_id, p.model_type, p.pred_model, t.features, ${classification}) as predicted
  FROM
    model p
    LEFT OUTER JOIN -- CROSS JOIN
    training t
) t1
group by
  rowid;
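`tree_predict_v1` takes the old `model_type` column alongside `pred_model`, and the reproduction query later in this thread feeds it a literal `-3 as model_type`. As far as we understand (an assumption, not confirmed by this PR), pre-v0.5 Hivemall encoded the model format as a signed integer: 1, 2 and 3 for opscode, JavaScript and Java-serialized models, with the negated code marking the compressed variant. A minimal Java sketch of decoding such a code, using the hypothetical name `ModelTypeSketch`:

```java
/**
 * Hypothetical decoder for the integer model_type column passed to tree_predict_v1.
 * ASSUMPTION: |code| 1/2/3 = opscode/javascript/serialization; a negative code = compressed.
 */
enum ModelTypeSketch {
    OPSCODE(1), JAVASCRIPT(2), SERIALIZATION(3);

    private final int code;

    ModelTypeSketch(int code) {
        this.code = code;
    }

    /** Resolves the model format from the absolute value of the code. */
    static ModelTypeSketch fromCode(int modelType) {
        int abs = Math.abs(modelType);
        for (ModelTypeSketch t : values()) {
            if (t.code == abs) {
                return t;
            }
        }
        throw new IllegalArgumentException("unknown model_type: " + modelType);
    }

    /** Under the assumed encoding, the sign carries the compression flag. */
    static boolean isCompressed(int modelType) {
        return modelType < 0;
    }
}
```

Under this assumed encoding, `ModelTypeSketch.fromCode(-3)` yields `SERIALIZATION` and `isCompressed(-3)` is `true`, i.e. `-3` would denote a compressed, Java-serialized model.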

Remaining tasks

  • Did you apply the source code formatter (i.e., mvn formatter:format) to your commit?
  • Need to fix rf_ensemble() to accept the old-style argument
  • Is the feature documented?
  • Did you run manual tests?

myui changed the title from "[HIVEMALL-127][Improvement] Added tree_predict_v1 UDF for RandomForest backward compatibility" to "[HIVEMALL-127] Added tree_predict_v1 UDF for RandomForest backward compatibility" on Jul 18, 2017
coveralls commented Jul 18, 2017

Coverage increased (+0.2%) to 40.971% when pulling 7f3ca6b on myui:HIVEMALL-127 into 11bd1f8 on apache:master.

coveralls commented

Coverage increased (+0.4%) to 41.137% when pulling 39334d7 on myui:HIVEMALL-127 into 11bd1f8 on apache:master.

coveralls commented Jul 19, 2017

Coverage increased (+0.09%) to 40.832% when pulling 39334d7 on myui:HIVEMALL-127 into 11bd1f8 on apache:master.

coveralls commented Jul 19, 2017

Coverage increased (+0.09%) to 40.832% when pulling 1fbc6d1 on myui:HIVEMALL-127 into 11bd1f8 on apache:master.

myui (Member, Author) commented Jul 19, 2017

Need to fix the following error:

create table predicted_rf_old
as
SELECT 
  passengerid,
  predicted.label,
  predicted.probability,
  predicted.probabilities
FROM (
  SELECT
    passengerid,
    rf_ensemble(predicted) as predicted
    -- hivemall v0.5-rc.1 or later
    -- rf_ensemble(predicted.value, predicted.posteriori, model_weight) as predicted
    -- rf_ensemble(predicted.value, predicted.posteriori) as predicted -- avoid OOB accuracy (i.e., model_weight)
  FROM (
    SELECT
      t.passengerid, 
      -- hivemall v0.4.1-alpha.2 or before
      -- tree_predict(p.model, t.features, ${classification}) as predicted
      -- hivemall v0.4.1-alpha.3 or later
      -- tree_predict(p.model_id, p.model_type, p.pred_model, t.features, ${classification}) as predicted
      -- hivemall v0.5-rc.1 or later
      -- p.model_weight,
      -- tree_predict(p.model_id, p.model, t.features, ${classification}) as predicted
      tree_predict_v1(p.model_id, p.model_type, p.pred_model, t.features, ${classification}) as predicted -- to use the old model in v0.5-rc.1 or later
    FROM (
      SELECT 
        -- model_id, model
        -- hivemall v0.4.1-alpha.3 or later
        model_id, -3 as model_type, model as pred_model
        -- hivemall v0.5-rc.1 or later
        -- model_id, model_weight, model
      FROM 
        model_rf 
      DISTRIBUTE BY rand(1)
    ) p
    LEFT OUTER JOIN test_rf t
  ) t1
  group by
    passengerid
) t2
;
Caused by: java.lang.NullPointerException
        at hivemall.smile.tools.RandomForestEnsembleUDAF$RfEvaluatorV1.merge(RandomForestEnsembleUDAF.java:186)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:191)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:619)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:794)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:700)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:768)
        ... 28 more
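The trace shows the NullPointerException raised inside `RfEvaluatorV1.merge`, called from `GenericUDAFEvaluator.aggregate`. Hive's documentation for `GenericUDAFEvaluator.merge` notes that a null partial may be passed when there is no input data, so a defensive null check is the usual guard. The sketch below is not the change that closed this PR (0737e23), which may address the error differently; it uses a hypothetical `VoteBufferSketch` class (plain Java, no Hive dependency) that counts per-tree class votes, merges partial counts null-safely, and reports the majority label and its vote share.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical vote-counting buffer for per-tree class labels.
 * Not Hivemall's RfEvaluatorV1: it only illustrates a null-safe merge step.
 */
final class VoteBufferSketch {
    private final Map<Integer, Integer> counts = new HashMap<>();

    /** Adds one tree's predicted label; a null prediction is skipped. */
    void iterate(Integer label) {
        if (label != null) {
            counts.merge(label, 1, Integer::sum);
        }
    }

    /** Folds in another buffer's partial counts; a null partial is a no-op. */
    void merge(Map<Integer, Integer> partial) {
        if (partial == null) {
            return; // guard: dereferencing a null partial would throw NullPointerException
        }
        for (Map.Entry<Integer, Integer> e : partial.entrySet()) {
            if (e.getKey() != null && e.getValue() != null) {
                counts.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
    }

    /** Snapshot of the counts, handed to the next merge step. */
    Map<Integer, Integer> terminatePartial() {
        return new HashMap<>(counts);
    }

    /** Majority label (ties go to the smaller label), or null if no votes were counted. */
    Integer majorityLabel() {
        Integer best = null;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int label = e.getKey();
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && best != null && label < best)) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    /** Share of votes won by the majority label, or 0.0 if no votes were counted. */
    double majorityShare() {
        Integer best = majorityLabel();
        if (best == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return counts.get(best) / (double) total;
    }
}
```

Breaking ties toward the smaller label is an arbitrary choice of this sketch; it only makes the result independent of `HashMap` iteration order.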

coveralls commented Jul 19, 2017

Coverage increased (+0.09%) to 40.832% when pulling 2dd6d00 on myui:HIVEMALL-127 into 11bd1f8 on apache:master.

coveralls commented

Coverage increased (+0.08%) to 40.829% when pulling 2dd6d00 on myui:HIVEMALL-127 into 11bd1f8 on apache:master.

coveralls commented

Coverage increased (+0.08%) to 40.831% when pulling 2dd6d00 on myui:HIVEMALL-127 into 11bd1f8 on apache:master.


asfgit closed this in 0737e23 on Jul 20, 2017
2 participants