Skip to content
This repository has been archived by the owner on Sep 20, 2022. It is now read-only.

[HIVEMALL-109][HIVEMALL-112] Fix topic model and tokenize UDFs #83

Closed
wants to merge 9 commits into from

Conversation

takuti
Copy link
Member

@takuti takuti commented Jun 2, 2017

What changes were proposed in this pull request?

#82

  • Topic model: train_plsa and train_lda
    • Fix bugs caused by multi-byte input
    • Fix wrong recordBytes calculation for iteration utilizing file IO
    • Refactor and update unit tests accordingly
  • tokenize()
    • Support NULL input; the UDF simply returns NULL itself

What type of PR is it?

Bug Fix

What is the Jira issue?

How was this patch tested?

  • Unit tests
  • Manual tests on EMR

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 39.089% when pulling dda9724 on takuti:fix-topicmodel into 10e7d45 on apache:master.

@takuti
Copy link
Member Author

takuti commented Jun 2, 2017

^ CI failed due to inappropriate hyper-parameters in unit tests. Will fix sooner.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 39.096% when pulling 1d2bfc8 on takuti:fix-topicmodel into 10e7d45 on apache:master.

1 similar comment
@coveralls
Copy link

coveralls commented Jun 5, 2017

Coverage Status

Coverage increased (+0.4%) to 39.096% when pulling 1d2bfc8 on takuti:fix-topicmodel into 10e7d45 on apache:master.

@@ -554,6 +546,21 @@ protected void forwardModel() throws HiveException {
*/

@VisibleForTesting
public void closeWithoutModelReset() throws HiveException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be default scope for VisibleForTesting.

@@ -452,7 +447,7 @@ protected final void runIterativeTraining(@Nonnegative final int iterations)
}
while (remain >= SizeOf.INT) {
int pos = buf.position();
int recordBytes = buf.getInt();
int recordBytes = buf.getInt() - SizeOf.INT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- SizeOf.INT is required?

@asfgit asfgit closed this in 92e85ad Jun 6, 2017
asfgit pushed a commit that referenced this pull request Jun 6, 2017
@myui
Copy link
Member

myui commented Jun 6, 2017

Merged with some modifications. Thanks!

@takuti takuti deleted the fix-topicmodel branch June 6, 2017 07:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants