srowen commented Jun 17, 2019

What changes were proposed in this pull request?

The word2vec logic fails if a corpus contains a word with a count greater than 1e9. Using longs for the counts should let us handle very large counts more robustly.
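As a rough illustration of why the counter width matters, here is a minimal sketch (not the code from this patch). The object name, the helper names, and the sample counts are hypothetical. It only shows that summing large `Int` counts wraps around silently, while widening to `Long` first does not.

```scala
// Illustrative sketch only; names and data are hypothetical, not from the patch.
object CountOverflowSketch {
  // Summing as Int wraps silently once the total passes Int.MaxValue (2147483647).
  def sumAsInt(counts: Seq[Int]): Int = counts.sum

  // Widening each count to Long before summing avoids the wraparound.
  def sumAsLong(counts: Seq[Int]): Long = counts.map(_.toLong).sum

  def main(args: Array[String]): Unit = {
    // Three hypothetical words, each seen 1e9 times.
    val counts = Seq(1000000000, 1000000000, 1000000000)
    println(s"Int sum:  ${sumAsInt(counts)}")  // negative: the sum overflowed
    println(s"Long sum: ${sumAsLong(counts)}") // 3000000000
  }
}
```

The same reasoning would apply to any sentinel or partial sum that has to exceed every real count: once counts are longs, such values need to be longs as well.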

This takes over #24814

How was this patch tested?

Existing tests.

srowen self-assigned this Jun 17, 2019
srowen mentioned this pull request Jun 17, 2019
min2i = pos2
pos2 += 1
}
assert(count(min1i) < Long.MaxValue)

I may remove these asserts before we commit; they are just a double check.

SparkQA commented Jun 17, 2019

Test build #106590 has finished for PR 24893 at commit 8d74927.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen added a commit that referenced this pull request Jun 19, 2019
## What changes were proposed in this pull request?

The word2vec logic fails if a corpus contains a word with a count greater than 1e9. Using longs for the counts should let us handle very large counts more robustly.

This takes over #24814

## How was this patch tested?

Existing tests.

Closes #24893 from srowen/SPARK-28081.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit e96dd82)
Signed-off-by: Sean Owen <sean.owen@databricks.com>

srowen commented Jun 19, 2019

Merged to master/2.4/2.3

srowen closed this in e96dd82 Jun 19, 2019
srowen deleted the SPARK-28081 branch June 19, 2019 13:55
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
igreenfield pushed a commit to axiomsl/spark that referenced this pull request Nov 4, 2019