Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10050][SPARKR] Support collecting data of MapType in DataFrame. #8711

Closed
wants to merge 6 commits into from

Conversation

sun-rui
Copy link
Contributor

@sun-rui sun-rui commented Sep 11, 2015

  1. Support collecting data of MapType from DataFrame.
  2. Support data of MapType in createDataFrame.

@SparkQA
Copy link

SparkQA commented Sep 11, 2015

Test build #42309 has finished for PR 8711 at commit 09e2396.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 11, 2015

Test build #42310 has finished for PR 8711 at commit 74558eb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
    • class CoGroupedRDD[K: ClassTag](
    • class ShuffledRDD[K: ClassTag, V: ClassTag, C: ClassTag](

@sun-rui
Copy link
Contributor Author

sun-rui commented Sep 11, 2015

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Sep 11, 2015

Test build #42335 has finished for PR 8711 at commit 74558eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag](
    • class CoGroupedRDD[K: ClassTag](
    • class ShuffledRDD[K: ClassTag, V: ClassTag, C: ClassTag](

@sun-rui
Copy link
Contributor Author

sun-rui commented Sep 12, 2015

what's wrong with Jenkins? This PR does not add any public class.

@holdenk
Copy link
Contributor

holdenk commented Sep 14, 2015

as it says the public classes bit is experimental (it gets it wrong sometimes).

@sun-rui
Copy link
Contributor Author

sun-rui commented Sep 14, 2015

@holdenk, thanks!

@sun-rui
Copy link
Contributor Author

sun-rui commented Sep 14, 2015

Jenkins, retest this please

@sun-rui
Copy link
Contributor Author

sun-rui commented Sep 14, 2015

good luck this time?

if (length(matchedStrings[[1]]) >= 3) {
keyType <- matchedStrings[[1]][2]
valueType <- matchedStrings[[1]][3]
checkType(keyType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, on the write side the keyType has to be strings, but on the read side (collect) that restriction doesn't have to exist. Is this a missing check read side or I am just being overly cautious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, good catch. Should check if the key type is String. Also need to add check on Scala side.

It seems a little boring that doing type check on both R and Scala side. Maybe we can remove the type check on R side. Not sure for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool :)

@SparkQA
Copy link

SparkQA commented Sep 14, 2015

Test build #42415 has finished for PR 8711 at commit 74558eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 14, 2015

Test build #42417 has finished for PR 8711 at commit 748fb07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor Author

sun-rui commented Sep 15, 2015

Jenkins, retest this please

keyType = "string",
valueType = infer_type(get(key, x)),
valueContainsNull = TRUE)
paste0("map<string,", infer_type(get(key, x)), ">")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor thing -- in the previous list we had an entry for valueContainsNull that we dont have any more. I can see that this was always TRUE so this probably doesn't affect functionality right now, but I am just wondering if we had it for some other purpose

cc @daveis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way to infer it is nullable or not. So it is always TRUE. Removing it does not affect functionality.

@shivaram
Copy link
Contributor

Thanks @sun-rui - Change looks pretty good to me. I had some minor inline comments

cc @davies

@SparkQA
Copy link

SparkQA commented Sep 15, 2015

Test build #42462 has finished for PR 8711 at commit 748fb07.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 15, 2015

Test build #42468 has finished for PR 8711 at commit f0e52e0.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 15, 2015

Test build #42477 has finished for PR 8711 at commit 6daf126.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

LGTM

@shivaram
Copy link
Contributor

@davies let me know if you have any comments or I'll merge this later today

@davies
Copy link
Contributor

davies commented Sep 16, 2015

LGTM

@asfgit asfgit closed this in 896edb5 Sep 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants