Reading an array in pig #158
Comments
From my experience, Pig tends to be quite picky about tuples and bags when reading them directly from the source. I'm not sure why; maybe this is something we could improve in ES, but again, I'm not sure how. As for the initial error (the OutOfMemoryError), I'm not sure why it occurs, but based on the stack trace it seems to be caused by Pig; I wasn't able to reproduce it locally (using Pig in local mode). My advice going forward is to read the data as a basic tuple and then create your structures by hand. From what I've seen, this is the recommendation on the Pig mailing list as well. Hope this helps,
I tried
gives me the same OOM exception:
For some reason I can't reproduce this; maybe it's Hadoop 2 vs. Hadoop 1. However, I would imagine the crux of the problem is the use of an array inside a bag. As an alternative, you could load the array as individual items and then manually create the bag through
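A minimal sketch of that kind of workaround, assuming Pig's built-in TOBAG is used to assemble the bag (the thread does not say which function was meant). The input file, relation names, and field names below are hypothetical, and the data is loaded from a plain text file with the default PigStorage loader rather than from ES, so that the snippet stays self-contained:

```pig
-- Hypothetical tab-separated input, one record per line, with the array
-- arriving as a fixed-size tuple, e.g.:
--   1	(a,b,c)
raw = LOAD 'input.tsv'
      AS (id:int, my_array:(i0:chararray, i1:chararray, i2:chararray));

-- Build the bag by hand from the individual tuple fields.
-- TOBAG wraps each expression in its own single-field tuple.
with_bag = FOREACH raw GENERATE
               id,
               TOBAG(my_array.i0, my_array.i1, my_array.i2) AS items;

-- COUNT expects a bag, so it can now be applied to the hand-built field.
counted = FOREACH with_bag GENERATE id, COUNT(items) AS n;

DUMP counted;
```

This only works when the number of items is known up front, since each tuple field has to be named in the TOBAG call. Note also that COUNT skips tuples whose first field is null.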
Rescheduling this for 1.3 RC1, potentially by adding more documentation on this type of mapping.
Hi, This should be fixed in master. Although I was not able to reproduce your issue, I bumped into one with similar behaviour (it turned out to depend on the JSON being read). It would be great if you could try the latest dev builds and let us know whether they work for you. Cheers,
Hi!
I'm trying to work with Pig over an ES index. My ES index contains arrays of strings that I can't read in Pig without errors. A gist reproducing the problem is here: https://gist.github.com/jpparis-orange/9329308#file-espigarray (ES index creation and Pig commands).
Here is my configuration:
If I declare the Pig array as my_array:{ the_tuple: ( the_item: chararray ) }, I get (more detailed stack trace in the gist)
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
If I use the (incorrect) Pig syntax my_array:(), I can print the array, but if I try to COUNT the elements, I get this error:
<line 2, column 52> Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.
Thanks for any hints!
jp