Skip to content

Conversation

@jkbradley
Copy link
Member

What changes were proposed in this pull request?

StringIndexerModel.transform sets the output column metadata to use name inputCol. It should not. Fixing this causes a problem with the metadata produced by RFormula.

Fix in RFormula: I added the StringIndexer columns to prefixesToRewrite, and I modified VectorAttributeRewriter to find and replace all "prefixes" since attributes collect multiple prefixes from StringIndexer + Interaction.

Note that "prefixes" is no longer accurate since internal strings may be replaced.

How was this patch tested?

Unit test which failed before this fix.


val metadata = NominalAttribute.defaultAttr
.withName($(inputCol)).withValues(labels).toMetadata()
.withName($(outputCol)).withValues(labels).toMetadata()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the bug

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #54210 has finished for PR 11965 at commit 36bf730.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Mar 25, 2016

LGTM. Merged into master. I left the JIRA open for 1.6.

@asfgit asfgit closed this in 54d13be Mar 25, 2016
jkbradley added a commit to jkbradley/spark that referenced this pull request Apr 22, 2016
## What changes were proposed in this pull request?

StringIndexerModel.transform sets the output column metadata to use name inputCol.  It should not.  Fixing this causes a problem with the metadata produced by RFormula.

Fix in RFormula: I added the StringIndexer columns to prefixesToRewrite, and I modified VectorAttributeRewriter to find and replace all "prefixes" since attributes collect multiple prefixes from StringIndexer + Interaction.

Note that "prefixes" is no longer accurate since internal strings may be replaced.

## How was this patch tested?

Unit test which failed before this fix.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#11965 from jkbradley/StringIndexer-fix.
@jkbradley jkbradley deleted the StringIndexer-fix branch April 22, 2016 02:21
asfgit pushed a commit that referenced this pull request Apr 26, 2016
…ula - 1.6 backport

Backport of [#11965] for branch-1.6.
There were no merge conflicts.

## What changes were proposed in this pull request?

StringIndexerModel.transform sets the output column metadata to use name inputCol.  It should not.  Fixing this causes a problem with the metadata produced by RFormula.

Fix in RFormula: I added the StringIndexer columns to prefixesToRewrite, and I modified VectorAttributeRewriter to find and replace all "prefixes" since attributes collect multiple prefixes from StringIndexer + Interaction.

Note that "prefixes" is no longer accurate since internal strings may be replaced.

## How was this patch tested?

Unit test which failed before this fix.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #12595 from jkbradley/StringIndexer-fix-1.6.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 27, 2016
…ula - 1.6 backport

Backport of [apache#11965] for branch-1.6.
There were no merge conflicts.

## What changes were proposed in this pull request?

StringIndexerModel.transform sets the output column metadata to use name inputCol.  It should not.  Fixing this causes a problem with the metadata produced by RFormula.

Fix in RFormula: I added the StringIndexer columns to prefixesToRewrite, and I modified VectorAttributeRewriter to find and replace all "prefixes" since attributes collect multiple prefixes from StringIndexer + Interaction.

Note that "prefixes" is no longer accurate since internal strings may be replaced.

## How was this patch tested?

Unit test which failed before this fix.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#12595 from jkbradley/StringIndexer-fix-1.6.

(cherry picked from commit 496496b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants