Skip to content
This repository has been archived by the owner on Sep 20, 2022. It is now read-only.

[HIVEMALL-109][HIVEMALL-112] Fix topic model and tokenize UDFs #83

Closed
wants to merge 9 commits into from

Conversation

takuti
Copy link
Member

@takuti takuti commented Jun 2, 2017

What changes were proposed in this pull request?

#82

  • Topic model: train_plsa and train_lda
    • Fix bugs caused by multi-byte input
    • Fix wrong recordBytes calculation for iteration utilizing file IO
    • Refactor and update unit tests accordingly
  • tokenize()
    • Support NULL input; the UDF simply returns NULL itself

What type of PR is it?

Bug Fix

What is the Jira issue?

How was this patch tested?

  • Unit tests
  • Manual tests on EMR

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 39.089% when pulling dda9724 on takuti:fix-topicmodel into 10e7d45 on apache:master.

@takuti
Copy link
Member Author

takuti commented Jun 2, 2017

^ CI failed due to inappropriate hyper-parameters in unit tests. Will fix sooner.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 39.096% when pulling 1d2bfc8 on takuti:fix-topicmodel into 10e7d45 on apache:master.

1 similar comment
@coveralls
Copy link

coveralls commented Jun 5, 2017

Coverage Status

Coverage increased (+0.4%) to 39.096% when pulling 1d2bfc8 on takuti:fix-topicmodel into 10e7d45 on apache:master.

@@ -554,6 +546,21 @@ protected void forwardModel() throws HiveException {
*/

@VisibleForTesting
public void closeWithoutModelReset() throws HiveException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be default scope for VisibleForTesting.

@@ -452,7 +447,7 @@ protected final void runIterativeTraining(@Nonnegative final int iterations)
}
while (remain >= SizeOf.INT) {
int pos = buf.position();
int recordBytes = buf.getInt();
int recordBytes = buf.getInt() - SizeOf.INT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- SizeOf.INT is required?

@asfgit asfgit closed this in 92e85ad Jun 6, 2017
asfgit pushed a commit that referenced this pull request Jun 6, 2017
@myui
Copy link
Member

myui commented Jun 6, 2017

Merged with some modifications. Thanks!

@takuti takuti deleted the fix-topicmodel branch June 6, 2017 07:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants