[SPARK-29914][ML][FOLLOWUP] CountVectorizer del big attribute array #26767

Closed
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:cv_del_attr

Conversation

@zhengruifeng
Contributor

What changes were proposed in this pull request?

Remove the large attribute array kept by CountVectorizer.

Why are the changes needed?

Previous discussion is here

The vocabulary size can be large, for example 1 << 18 by default, so we end up keeping a large attribute array here.
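
To give a rough sense of scale, here is a minimal sketch in plain Python. It is not Spark code: `build_attributes` and the dictionary layout are hypothetical stand-ins for per-term column metadata, and the only assumption is that one attribute entry is kept per vocabulary term.

```python
# Hypothetical illustration only; this does not use or mirror Spark's API.
VOCAB_SIZE = 1 << 18  # the default vocabulary size mentioned above


def build_attributes(vocabulary):
    """Build one metadata entry per vocabulary term (hypothetical layout)."""
    return [{"idx": i, "name": term} for i, term in enumerate(vocabulary)]


# Placeholder terms standing in for a fitted vocabulary.
vocabulary = [f"term_{i}" for i in range(VOCAB_SIZE)]
attributes = build_attributes(vocabulary)

# The attribute array grows linearly with the vocabulary size.
print(len(attributes))
```

With the default size, the array holds 262144 entries, one for every term.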

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test suites

@SparkQA

SparkQA commented Dec 5, 2019

Test build #114900 has finished for PR 26767 at commit a7876a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor Author

ping @viirya
I am going to remove the large attribute array in CountVectorizer, but that would revert #20313.
What do you think about it?

@viirya
Member

viirya commented Dec 6, 2019

Oh, I see. This was a long time ago.

Removing attributes from CountVectorizerModel will make some transformers fail. I think it would be a regression.

@zhengruifeng
Contributor Author

OK, I will close this PR

@zhengruifeng zhengruifeng deleted the cv_del_attr branch December 6, 2019 07:09