ArrayIndexOutOfBoundsException during serialization #178
Comments
We have not seen this. Please provide more context for the errors: Which jar version? Are you running via java directly, hive, pig, druid, etc? Which serde are you using with these sketches? If you can get it to print the sketch preamble info, that'd also be useful. |
|
How do you serialize sketches in Spark? You must have added some wrapper
around sketches to make them Java or Kryo serializable.
On Thu, Dec 14, 2017 at 11:09 AM, Harsh Pandey wrote:
- Using sketches-core 0.10.3
- Running this in spark
- Using ArrayOfNumbersSerDe/ArrayOfStringsSerDe
|
Yeah we use Java serialization (we use toByteArray and just write the byte array to an ObjectOutputStream). |
Could you share this piece of code?
|
Looks something like this:
|
Unfortunately, we cannot reproduce this problem. Could you modify your
wrappers to print sketch.toString() if any exception is thrown during
serialization?
For instance:
    public byte[] serialize() {
      try {
        return sketch.toByteArray(new ArrayOfNumbersSerDe());
      } catch (Exception e) {
        // Print the sketch state before rethrowing so the failure can be diagnosed.
        System.out.println(sketch.toString(true, true));
        throw e;
      }
    }
During deserialization, perhaps, you could dump the memory object somehow.
Let us think about that.
On Mon, Dec 18, 2017 at 9:32 AM, Harsh Pandey wrote:
Looks something like this:
    public final class NumericDistribution {
      private static final Comparator<Number> COMPARATOR = Comparator.comparing(num -> new BigDecimal(num.toString()));
      private ItemsSketch<Number> sketch;
      public NumericDistribution(int distributionK) {
        this(ItemsSketch.getInstance(distributionK, COMPARATOR));
      }
      public NumericDistribution(byte[] data) {
        this(ItemsSketch.getInstance(Memory.wrap(data), COMPARATOR, new ArrayOfNumbersSerDe()));
      }
      private NumericDistribution(ItemsSketch<Number> sketch) {
        this.sketch = sketch;
      }
      public byte[] serialize() {
        return sketch.toByteArray(new ArrayOfNumbersSerDe());
      }
      private void writeObject(ObjectOutputStream outputStream) throws IOException {
        outputStream.writeObject(serialize());
      }
      private void readObject(ObjectInputStream inputStream) throws ClassNotFoundException, IOException {
        byte[] data = (byte[]) inputStream.readObject();
        sketch = new NumericDistribution(data).sketch;
      }
    }
|
During deserialization we suggest this:
    private void readObject(ObjectInputStream inputStream)
        throws ClassNotFoundException, IOException {
      byte[] data = (byte[]) inputStream.readObject();
      try {
        sketch = new NumericDistribution(data).sketch;
      } catch (Exception e) {
        // Dump the raw bytes so the corrupt image can be inspected, then rethrow.
        Memory mem = Memory.wrap(data);
        System.out.println(
            mem.toHexString("corrupt quantiles ItemsSketch<Number>", 0L, (int) mem.getCapacity()));
        throw e;
      }
    }
|
Appreciate the pointers. Will try this and report back. |
I'm a colleague of @hpx7. We added the above logging to our code and received the following dump:
    ### WritableMemoryImpl SUMMARY ###
    Header Comment :
    Call Params : .toHexString(..., 0, 4046), hashCode: 291048800
    NativeBaseOffset : 0
    UnsafeObj, hashCode : byte[], 212194479
    UnsafeObjHeader : 16
    ByteBuf, hashCode : null
    RegionOffset : 0
    Capacity : 4046
    CumBaseOffset : 16
    MemReq, hashCode : DefaultMemoryManager, 593466056
    Valid : true
    Resource Read Only : false
    Resource Endianness : LITTLE_ENDIAN
    JDK Version : 8
    Data, littleEndian : < data omitted>
And the exception includes the following message:
    "reqOffset: 4030, reqLength: , (reqOff + reqLen): 4031, allocSize: 4030"
This error occurs many times when building a Distribution of a given Spark column. Interestingly, for each of these errors, capacity - allocSize = 16 (capacity as described by the memory header, and allocSize as described by the exception message). This happens to coincide with both the UnsafeObjHeader size and the CumBaseOffset as described by the memory header. Any insight @AlexanderSaydakov? Thanks very much for the help thus far. |
Could you give the full stack trace please?
|
Guys, we are trying to help you, but we can't unless we get full stack traces. The message:
comes from
The fact that the error message (allocSize = 4030) is different from the Memory.toHex() output you printed (Capacity = 4046) means that they are unrelated. Either the error message comes from a different Memory resource, or it comes from a check on the allocation size of a primitive array being placed into Memory, or of an array allocated to receive data from Memory. We need the full stack trace of what leads up to the above error message, and I mean ALL traces, please. Otherwise, we will be going around this loop forever. |
Here is the stacktrace that corresponds to the above dump.
This stacktrace is printed along with the memory dump in the catch/rethrow block that we added as per @AlexanderSaydakov 's suggestion. |
The code this comes from:
    public NumericDistribution(byte[] data) {
      this(ItemsSketch.getInstance(Memory.wrap(data), COMPARATOR, new ArrayOfNumbersSerDe()));
    }
    private void readObject(ObjectInputStream inputStream) throws ClassNotFoundException, IOException {
      byte[] data = (byte[]) inputStream.readObject();
      sketch = new NumericDistribution(data).sketch;
    }
|
> Data, littleEndian : < data omitted>
Could you give us the data as well?
|
Let me clarify: we need to see the raw bytes output from the memory.toHex() method. It should be about 4K bytes. |
Sure thing, hex data below: Data
|
This is strange. The stream length is 1286693. So the sketch must have
retained 805 items plus min and max, plus a 16-byte preamble. I see integers
in the sketch, which means 5 bytes per item using ArrayOfNumbersSerDe. This
means the serialized size must be 16 + 807 * 5 = 4051 bytes. But we have
4046 instead. I wonder where the 5 bytes went.
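The size arithmetic in this comment can be reproduced with a few lines of plain Java. This is a minimal sketch, not library code: the class and method names are hypothetical, the retained-item formula (a base buffer of n mod 2k items plus k items for every set bit of n / 2k) is an assumption about how the classic quantiles sketch lays out its levels, and k = 128 is read from bytes 4-5 of the dump (80 00) on the assumption that they hold k as a little-endian short.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SketchSizeCheck {

    // Bytes 8..15 of the dump posted in this thread; assumed to hold n as a little-endian long.
    static final byte[] N_BYTES = {0x25, (byte) 0xa2, 0x13, 0x00, 0x00, 0x00, 0x00, 0x00};

    /** Decodes the stream length n from the eight header bytes. */
    static long decodeN(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    /** Items retained after n updates: base buffer plus k items per valid level (assumed formula). */
    static int retainedItems(int k, long n) {
        int baseBufferCount = (int) (n % (2L * k));
        int validLevels = Long.bitCount(n / (2L * k));
        return baseBufferCount + validLevels * k;
    }

    /** Expected size: preamble + (retained + min + max) * bytes per item. */
    static int expectedSerializedSize(int k, long n, int bytesPerItem, int preambleBytes) {
        return preambleBytes + (retainedItems(k, n) + 2) * bytesPerItem;
    }

    public static void main(String[] args) {
        long n = decodeN(N_BYTES);
        int k = 128; // assumption: read from bytes 4-5 of the dump
        System.out.println("n = " + n);
        System.out.println("retained = " + retainedItems(k, n));
        System.out.println("expected size = " + expectedSerializedSize(k, n, 5, 16));
    }
}
```

Under these assumptions the program reports n = 1286693, 805 retained items and an expected size of 4051 bytes, which matches the figures in the comment. The 4046 bytes actually dumped correspond to 806 five-byte items after the 16-byte preamble, one fewer than expected.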
On Wed, Jan 17, 2018 at 9:31 AM, szunami wrote:
Sure thing, hex data below:
Data
Data, littleendian : 0 1 2 3 4 5 6 7
0: 02 03 08 08 80 00 00 00
8: 25 a2 13 00 00 00 00 00
16: 09 01 00 00 00 09 01 00
24: 00 00 09 01 00 00 00 09
32: 01 00 00 00 09 01 00 00
40: 00 09 01 00 00 00 09 01
48: 00 00 00 09 01 00 00 00
56: 09 01 00 00 00 09 01 00
64: 00 00 09 01 00 00 00 09
72: 01 00 00 00 09 01 00 00
80: 00 09 01 00 00 00 09 01
88: 00 00 00 09 01 00 00 00
96: 09 01 00 00 00 09 01 00
104: 00 00 09 01 00 00 00 09
112: 01 00 00 00 09 01 00 00
120: 00 09 01 00 00 00 09 01
128: 00 00 00 09 01 00 00 00
136: 09 01 00 00 00 09 01 00
144: 00 00 09 01 00 00 00 09
152: 01 00 00 00 09 01 00 00
160: 00 09 01 00 00 00 09 01
168: 00 00 00 09 01 00 00 00
176: 09 01 00 00 00 09 01 00
184: 00 00 09 01 00 00 00 09
192: 01 00 00 00 09 01 00 00
200: 00 09 01 00 00 00 09 01
208: 00 00 00 09 01 00 00 00
216: 09 01 00 00 00 09 01 00
224: 00 00 09 01 00 00 00 09
232: 01 00 00 00 09 01 00 00
240: 00 09 01 00 00 00 09 01
248: 00 00 00 09 01 00 00 00
256: 09 01 00 00 00 09 01 00
264: 00 00 09 01 00 00 00 09
272: 01 00 00 00 09 01 00 00
280: 00 09 01 00 00 00 09 01
288: 00 00 00 09 01 00 00 00
296: 09 01 00 00 00 09 01 00
304: 00 00 09 01 00 00 00 09
312: 01 00 00 00 09 01 00 00
320: 00 09 01 00 00 00 09 01
328: 00 00 00 09 01 00 00 00
336: 09 01 00 00 00 09 01 00
344: 00 00 09 01 00 00 00 09
352: 01 00 00 00 09 01 00 00
360: 00 09 01 00 00 00 09 01
368: 00 00 00 09 01 00 00 00
376: 09 01 00 00 00 09 01 00
384: 00 00 09 01 00 00 00 09
392: 01 00 00 00 09 01 00 00
400: 00 09 01 00 00 00 09 01
408: 00 00 00 09 01 00 00 00
416: 09 01 00 00 00 09 01 00
424: 00 00 09 01 00 00 00 09
432: 01 00 00 00 09 01 00 00
440: 00 09 01 00 00 00 09 01
448: 00 00 00 09 01 00 00 00
456: 09 01 00 00 00 09 01 00
464: 00 00 09 01 00 00 00 09
472: 01 00 00 00 09 01 00 00
480: 00 09 01 00 00 00 09 01
488: 00 00 00 09 01 00 00 00
496: 09 01 00 00 00 09 01 00
504: 00 00 09 01 00 00 00 09
512: 01 00 00 00 09 01 00 00
520: 00 09 01 00 00 00 09 01
528: 00 00 00 09 01 00 00 00
536: 09 01 00 00 00 09 01 00
544: 00 00 09 01 00 00 00 09
552: 01 00 00 00 09 01 00 00
560: 00 09 01 00 00 00 09 01
568: 00 00 00 09 01 00 00 00
576: 09 01 00 00 00 09 01 00
584: 00 00 09 01 00 00 00 09
592: 01 00 00 00 09 01 00 00
600: 00 09 01 00 00 00 09 01
608: 00 00 00 09 01 00 00 00
616: 09 01 00 00 00 09 01 00
624: 00 00 09 01 00 00 00 09
632: 01 00 00 00 09 01 00 00
640: 00 09 01 00 00 00 09 01
648: 00 00 00 09 01 00 00 00
656: 09 01 00 00 00 09 01 00
664: 00 00 09 01 00 00 00 09
672: 01 00 00 00 09 01 00 00
680: 00 09 01 00 00 00 09 01
688: 00 00 00 09 01 00 00 00
696: 09 01 00 00 00 09 01 00
704: 00 00 09 01 00 00 00 09
712: 01 00 00 00 09 01 00 00
720: 00 09 01 00 00 00 09 01
728: 00 00 00 09 01 00 00 00
736: 09 01 00 00 00 09 01 00
744: 00 00 09 01 00 00 00 09
752: 01 00 00 00 09 01 00 00
760: 00 09 01 00 00 00 09 01
768: 00 00 00 09 01 00 00 00
776: 09 01 00 00 00 09 01 00
784: 00 00 09 01 00 00 00 09
792: 01 00 00 00 09 01 00 00
800: 00 09 01 00 00 00 09 01
808: 00 00 00 09 01 00 00 00
816: 09 01 00 00 00 09 01 00
824: 00 00 09 01 00 00 00 09
832: 01 00 00 00 09 01 00 00
840: 00 09 01 00 00 00 09 01
848: 00 00 00 09 01 00 00 00
856: 09 01 00 00 00 09 01 00
864: 00 00 09 01 00 00 00 09
872: 01 00 00 00 09 01 00 00
880: 00 09 01 00 00 00 09 01
888: 00 00 00 09 01 00 00 00
896: 09 01 00 00 00 09 01 00
904: 00 00 09 01 00 00 00 09
912: 01 00 00 00 09 01 00 00
920: 00 09 01 00 00 00 09 01
928: 00 00 00 09 01 00 00 00
936: 09 01 00 00 00 09 01 00
944: 00 00 09 01 00 00 00 09
952: 01 00 00 00 09 01 00 00
960: 00 09 01 00 00 00 09 01
968: 00 00 00 09 01 00 00 00
976: 09 01 00 00 00 09 01 00
984: 00 00 09 01 00 00 00 09
992: 01 00 00 00 09 01 00 00
1000: 00 09 01 00 00 00 09 01
1008: 00 00 00 09 01 00 00 00
1016: 09 01 00 00 00 09 01 00
1024: 00 00 09 01 00 00 00 09
1032: 01 00 00 00 09 01 00 00
1040: 00 09 01 00 00 00 09 01
1048: 00 00 00 09 01 00 00 00
1056: 09 01 00 00 00 09 01 00
1064: 00 00 09 01 00 00 00 09
1072: 01 00 00 00 09 01 00 00
1080: 00 09 01 00 00 00 09 01
1088: 00 00 00 09 01 00 00 00
1096: 09 01 00 00 00 09 01 00
1104: 00 00 09 01 00 00 00 09
1112: 01 00 00 00 09 01 00 00
1120: 00 09 01 00 00 00 09 01
1128: 00 00 00 09 01 00 00 00
1136: 09 01 00 00 00 09 01 00
1144: 00 00 09 01 00 00 00 09
1152: 01 00 00 00 09 01 00 00
1160: 00 09 01 00 00 00 09 01
1168: 00 00 00 09 01 00 00 00
1176: 09 01 00 00 00 09 01 00
1184: 00 00 09 01 00 00 00 09
1192: 01 00 00 00 09 01 00 00
1200: 00 09 01 00 00 00 09 01
1208: 00 00 00 09 01 00 00 00
1216: 09 01 00 00 00 09 01 00
1224: 00 00 09 01 00 00 00 09
1232: 01 00 00 00 09 01 00 00
1240: 00 09 01 00 00 00 09 01
1248: 00 00 00 09 01 00 00 00
1256: 09 01 00 00 00 09 01 00
1264: 00 00 09 01 00 00 00 09
1272: 01 00 00 00 09 01 00 00
1280: 00 09 01 00 00 00 09 01
1288: 00 00 00 09 01 00 00 00
1296: 09 01 00 00 00 09 01 00
1304: 00 00 09 01 00 00 00 09
1312: 01 00 00 00 09 01 00 00
1320: 00 09 01 00 00 00 09 01
1328: 00 00 00 09 01 00 00 00
1336: 09 01 00 00 00 09 01 00
1344: 00 00 09 01 00 00 00 09
1352: 01 00 00 00 09 01 00 00
1360: 00 09 01 00 00 00 09 01
1368: 00 00 00 09 01 00 00 00
1376: 09 01 00 00 00 09 01 00
1384: 00 00 09 01 00 00 00 09
1392: 01 00 00 00 09 01 00 00
1400: 00 09 01 00 00 00 09 01
1408: 00 00 00 09 01 00 00 00
1416: 09 01 00 00 00 09 01 00
1424: 00 00 09 01 00 00 00 09
1432: 01 00 00 00 09 01 00 00
1440: 00 09 01 00 00 00 09 01
1448: 00 00 00 09 01 00 00 00
1456: 09 01 00 00 00 09 01 00
1464: 00 00 09 01 00 00 00 09
1472: 01 00 00 00 09 01 00 00
1480: 00 09 01 00 00 00 09 01
1488: 00 00 00 09 01 00 00 00
1496: 09 01 00 00 00 09 01 00
1504: 00 00 09 01 00 00 00 09
1512: 01 00 00 00 09 01 00 00
1520: 00 09 01 00 00 00 09 01
1528: 00 00 00 09 01 00 00 00
1536: 09 01 00 00 00 09 01 00
1544: 00 00 09 01 00 00 00 09
1552: 01 00 00 00 09 01 00 00
1560: 00 09 01 00 00 00 09 01
1568: 00 00 00 09 01 00 00 00
1576: 09 01 00 00 00 09 01 00
1584: 00 00 09 01 00 00 00 09
1592: 01 00 00 00 09 01 00 00
1600: 00 09 01 00 00 00 09 01
1608: 00 00 00 09 01 00 00 00
1616: 09 01 00 00 00 09 01 00
1624: 00 00 09 01 00 00 00 09
1632: 01 00 00 00 09 01 00 00
1640: 00 09 01 00 00 00 09 01
1648: 00 00 00 09 01 00 00 00
1656: 09 01 00 00 00 09 01 00
1664: 00 00 09 01 00 00 00 09
1672: 01 00 00 00 09 01 00 00
1680: 00 09 01 00 00 00 09 01
1688: 00 00 00 09 01 00 00 00
1696: 09 01 00 00 00 09 01 00
1704: 00 00 09 01 00 00 00 09
1712: 01 00 00 00 09 01 00 00
1720: 00 09 01 00 00 00 09 01
1728: 00 00 00 09 01 00 00 00
1736: 09 01 00 00 00 09 01 00
1744: 00 00 09 01 00 00 00 09
1752: 01 00 00 00 09 01 00 00
1760: 00 09 01 00 00 00 09 01
1768: 00 00 00 09 01 00 00 00
1776: 09 01 00 00 00 09 01 00
1784: 00 00 09 01 00 00 00 09
1792: 01 00 00 00 09 01 00 00
1800: 00 09 01 00 00 00 09 01
1808: 00 00 00 09 01 00 00 00
1816: 09 01 00 00 00 09 01 00
1824: 00 00 09 01 00 00 00 09
1832: 01 00 00 00 09 01 00 00
1840: 00 09 01 00 00 00 09 01
1848: 00 00 00 09 01 00 00 00
1856: 09 01 00 00 00 09 01 00
1864: 00 00 09 01 00 00 00 09
1872: 01 00 00 00 09 01 00 00
1880: 00 09 01 00 00 00 09 01
1888: 00 00 00 09 01 00 00 00
1896: 09 01 00 00 00 09 01 00
1904: 00 00 09 01 00 00 00 09
1912: 01 00 00 00 09 01 00 00
1920: 00 09 01 00 00 00 09 01
1928: 00 00 00 09 01 00 00 00
1936: 09 01 00 00 00 09 01 00
1944: 00 00 09 01 00 00 00 09
1952: 01 00 00 00 09 01 00 00
1960: 00 09 01 00 00 00 09 01
1968: 00 00 00 09 01 00 00 00
1976: 09 01 00 00 00 09 01 00
1984: 00 00 09 01 00 00 00 09
1992: 01 00 00 00 09 01 00 00
2000: 00 09 01 00 00 00 09 01
2008: 00 00 00 09 01 00 00 00
2016: 09 01 00 00 00 09 01 00
2024: 00 00 09 01 00 00 00 09
2032: 01 00 00 00 09 01 00 00
2040: 00 09 01 00 00 00 09 01
2048: 00 00 00 09 01 00 00 00
2056: 09 01 00 00 00 09 01 00
2064: 00 00 09 01 00 00 00 09
2072: 01 00 00 00 09 01 00 00
2080: 00 09 01 00 00 00 09 01
2088: 00 00 00 09 01 00 00 00
2096: 09 01 00 00 00 09 01 00
2104: 00 00 09 01 00 00 00 09
2112: 01 00 00 00 09 01 00 00
2120: 00 09 01 00 00 00 09 01
2128: 00 00 00 09 01 00 00 00
2136: 09 01 00 00 00 09 01 00
2144: 00 00 09 01 00 00 00 09
2152: 01 00 00 00 09 01 00 00
2160: 00 09 01 00 00 00 09 01
2168: 00 00 00 09 01 00 00 00
2176: 09 01 00 00 00 09 01 00
2184: 00 00 09 01 00 00 00 09
2192: 01 00 00 00 09 01 00 00
2200: 00 09 01 00 00 00 09 01
2208: 00 00 00 09 01 00 00 00
2216: 09 01 00 00 00 09 01 00
2224: 00 00 09 01 00 00 00 09
2232: 01 00 00 00 09 01 00 00
2240: 00 09 01 00 00 00 09 01
2248: 00 00 00 09 01 00 00 00
2256: 09 01 00 00 00 09 01 00
2264: 00 00 09 01 00 00 00 09
2272: 01 00 00 00 09 01 00 00
2280: 00 09 01 00 00 00 09 01
2288: 00 00 00 09 01 00 00 00
2296: 09 01 00 00 00 09 01 00
2304: 00 00 09 01 00 00 00 09
2312: 01 00 00 00 09 01 00 00
2320: 00 09 01 00 00 00 09 01
2328: 00 00 00 09 01 00 00 00
2336: 09 01 00 00 00 09 01 00
2344: 00 00 09 01 00 00 00 09
2352: 01 00 00 00 09 01 00 00
2360: 00 09 01 00 00 00 09 01
2368: 00 00 00 09 01 00 00 00
2376: 09 01 00 00 00 09 01 00
2384: 00 00 09 01 00 00 00 09
2392: 01 00 00 00 09 01 00 00
2400: 00 09 01 00 00 00 09 01
2408: 00 00 00 09 01 00 00 00
2416: 09 01 00 00 00 09 01 00
2424: 00 00 09 01 00 00 00 09
2432: 01 00 00 00 09 01 00 00
2440: 00 09 01 00 00 00 09 01
2448: 00 00 00 09 01 00 00 00
2456: 09 01 00 00 00 09 01 00
2464: 00 00 09 01 00 00 00 09
2472: 01 00 00 00 09 01 00 00
2480: 00 09 01 00 00 00 09 01
2488: 00 00 00 09 01 00 00 00
2496: 09 01 00 00 00 09 01 00
2504: 00 00 09 01 00 00 00 09
2512: 01 00 00 00 09 01 00 00
2520: 00 09 01 00 00 00 09 01
2528: 00 00 00 09 01 00 00 00
2536: 09 01 00 00 00 09 01 00
2544: 00 00 09 01 00 00 00 09
2552: 01 00 00 00 09 01 00 00
2560: 00 09 01 00 00 00 09 01
2568: 00 00 00 09 01 00 00 00
2576: 09 01 00 00 00 09 01 00
2584: 00 00 09 01 00 00 00 09
2592: 01 00 00 00 09 01 00 00
2600: 00 09 01 00 00 00 09 01
2608: 00 00 00 09 01 00 00 00
2616: 09 01 00 00 00 09 01 00
2624: 00 00 09 01 00 00 00 09
2632: 01 00 00 00 09 01 00 00
2640: 00 09 01 00 00 00 09 01
2648: 00 00 00 09 01 00 00 00
2656: 09 01 00 00 00 09 01 00
2664: 00 00 09 01 00 00 00 09
2672: 01 00 00 00 09 01 00 00
2680: 00 09 01 00 00 00 09 01
2688: 00 00 00 09 01 00 00 00
2696: 09 01 00 00 00 09 01 00
2704: 00 00 09 01 00 00 00 09
2712: 01 00 00 00 09 01 00 00
2720: 00 09 01 00 00 00 09 01
2728: 00 00 00 09 01 00 00 00
2736: 09 01 00 00 00 09 01 00
2744: 00 00 09 01 00 00 00 09
2752: 01 00 00 00 09 01 00 00
2760: 00 09 01 00 00 00 09 01
2768: 00 00 00 09 01 00 00 00
2776: 09 01 00 00 00 09 01 00
2784: 00 00 09 01 00 00 00 09
2792: 01 00 00 00 09 01 00 00
2800: 00 09 01 00 00 00 09 01
2808: 00 00 00 09 01 00 00 00
2816: 09 01 00 00 00 09 01 00
2824: 00 00 09 01 00 00 00 09
2832: 01 00 00 00 09 01 00 00
2840: 00 09 01 00 00 00 09 01
2848: 00 00 00 09 01 00 00 00
2856: 09 01 00 00 00 09 01 00
2864: 00 00 09 01 00 00 00 09
2872: 01 00 00 00 09 01 00 00
2880: 00 09 01 00 00 00 09 01
2888: 00 00 00 09 01 00 00 00
2896: 09 01 00 00 00 09 01 00
2904: 00 00 09 01 00 00 00 09
2912: 01 00 00 00 09 01 00 00
2920: 00 09 01 00 00 00 09 01
2928: 00 00 00 09 01 00 00 00
2936: 09 01 00 00 00 09 01 00
2944: 00 00 09 01 00 00 00 09
2952: 01 00 00 00 09 01 00 00
2960: 00 09 01 00 00 00 09 01
2968: 00 00 00 09 01 00 00 00
2976: 09 01 00 00 00 09 01 00
2984: 00 00 09 01 00 00 00 09
2992: 01 00 00 00 09 01 00 00
3000: 00 09 01 00 00 00 09 01
3008: 00 00 00 09 01 00 00 00
3016: 09 01 00 00 00 09 01 00
3024: 00 00 09 01 00 00 00 09
3032: 01 00 00 00 09 01 00 00
3040: 00 09 01 00 00 00 09 01
3048: 00 00 00 09 01 00 00 00
3056: 09 01 00 00 00 09 01 00
3064: 00 00 09 01 00 00 00 09
3072: 01 00 00 00 09 01 00 00
3080: 00 09 01 00 00 00 09 01
3088: 00 00 00 09 01 00 00 00
3096: 09 01 00 00 00 09 01 00
3104: 00 00 09 01 00 00 00 09
3112: 01 00 00 00 09 01 00 00
3120: 00 09 01 00 00 00 09 01
3128: 00 00 00 09 01 00 00 00
3136: 09 01 00 00 00 09 01 00
3144: 00 00 09 01 00 00 00 09
3152: 01 00 00 00 09 01 00 00
3160: 00 09 01 00 00 00 09 01
3168: 00 00 00 09 01 00 00 00
3176: 09 01 00 00 00 09 01 00
3184: 00 00 09 01 00 00 00 09
3192: 01 00 00 00 09 01 00 00
3200: 00 09 01 00 00 00 09 01
3208: 00 00 00 09 01 00 00 00
3216: 09 01 00 00 00 09 01 00
3224: 00 00 09 01 00 00 00 09
3232: 01 00 00 00 09 01 00 00
3240: 00 09 01 00 00 00 09 01
3248: 00 00 00 09 01 00 00 00
3256: 09 01 00 00 00 09 01 00
3264: 00 00 09 01 00 00 00 09
3272: 01 00 00 00 09 01 00 00
3280: 00 09 01 00 00 00 09 01
3288: 00 00 00 09 01 00 00 00
3296: 09 01 00 00 00 09 01 00
3304: 00 00 09 01 00 00 00 09
3312: 01 00 00 00 09 01 00 00
3320: 00 09 01 00 00 00 09 01
3328: 00 00 00 09 01 00 00 00
3336: 09 01 00 00 00 09 01 00
3344: 00 00 09 01 00 00 00 09
3352: 01 00 00 00 09 01 00 00
3360: 00 09 01 00 00 00 09 01
3368: 00 00 00 09 01 00 00 00
3376: 09 01 00 00 00 09 01 00
3384: 00 00 09 01 00 00 00 09
3392: 01 00 00 00 09 01 00 00
3400: 00 09 01 00 00 00 09 01
3408: 00 00 00 09 01 00 00 00
3416: 09 01 00 00 00 09 01 00
3424: 00 00 09 01 00 00 00 09
3432: 01 00 00 00 09 01 00 00
3440: 00 09 01 00 00 00 09 01
3448: 00 00 00 09 01 00 00 00
3456: 09 01 00 00 00 09 01 00
3464: 00 00 09 01 00 00 00 09
3472: 01 00 00 00 09 01 00 00
3480: 00 09 01 00 00 00 09 01
3488: 00 00 00 09 01 00 00 00
3496: 09 01 00 00 00 09 01 00
3504: 00 00 09 01 00 00 00 09
3512: 01 00 00 00 09 01 00 00
3520: 00 09 01 00 00 00 09 01
3528: 00 00 00 09 01 00 00 00
3536: 09 01 00 00 00 09 01 00
3544: 00 00 09 01 00 00 00 09
3552: 01 00 00 00 09 01 00 00
3560: 00 09 01 00 00 00 09 01
3568: 00 00 00 09 01 00 00 00
3576: 09 01 00 00 00 09 01 00
3584: 00 00 09 01 00 00 00 09
3592: 01 00 00 00 09 01 00 00
3600: 00 09 01 00 00 00 09 01
3608: 00 00 00 09 01 00 00 00
3616: 09 01 00 00 00 09 01 00
3624: 00 00 09 01 00 00 00 09
3632: 01 00 00 00 09 01 00 00
3640: 00 09 01 00 00 00 09 01
3648: 00 00 00 09 01 00 00 00
3656: 09 01 00 00 00 09 01 00
3664: 00 00 09 01 00 00 00 09
3672: 01 00 00 00 09 01 00 00
3680: 00 09 01 00 00 00 09 01
3688: 00 00 00 09 01 00 00 00
3696: 09 01 00 00 00 09 01 00
3704: 00 00 09 01 00 00 00 09
3712: 01 00 00 00 09 01 00 00
3720: 00 09 01 00 00 00 09 01
3728: 00 00 00 09 01 00 00 00
3736: 09 01 00 00 00 09 01 00
3744: 00 00 09 01 00 00 00 09
3752: 01 00 00 00 09 01 00 00
3760: 00 09 01 00 00 00 09 01
3768: 00 00 00 09 01 00 00 00
3776: 09 01 00 00 00 09 01 00
3784: 00 00 09 01 00 00 00 09
3792: 01 00 00 00 09 01 00 00
3800: 00 09 01 00 00 00 09 01
3808: 00 00 00 09 01 00 00 00
3816: 09 01 00 00 00 09 01 00
3824: 00 00 09 01 00 00 00 09
3832: 01 00 00 00 09 01 00 00
3840: 00 09 01 00 00 00 09 01
3848: 00 00 00 09 01 00 00 00
3856: 09 01 00 00 00 09 01 00
3864: 00 00 09 01 00 00 00 09
3872: 01 00 00 00 09 01 00 00
3880: 00 09 01 00 00 00 09 01
3888: 00 00 00 09 01 00 00 00
3896: 09 01 00 00 00 09 01 00
3904: 00 00 09 01 00 00 00 09
3912: 01 00 00 00 09 01 00 00
3920: 00 09 01 00 00 00 09 01
3928: 00 00 00 09 01 00 00 00
3936: 09 01 00 00 00 09 01 00
3944: 00 00 09 01 00 00 00 09
3952: 01 00 00 00 09 01 00 00
3960: 00 09 01 00 00 00 09 01
3968: 00 00 00 09 01 00 00 00
3976: 09 01 00 00 00 09 01 00
3984: 00 00 09 01 00 00 00 09
3992: 01 00 00 00 09 01 00 00
4000: 00 09 01 00 00 00 09 01
4008: 00 00 00 09 01 00 00 00
4016: 09 01 00 00 00 09 01 00
4024: 00 00 09 01 00 00 00 09
4032: 01 00 00 00 09 01 00 00
4040: 00 09 01 00 00 00
|
Not sure if relevant, but I believe the input was doubles. |
Let us know if it would be helpful to have more memory dumps as well (we have a few more). |
I could not reproduce it so far. My test shows the correct size.
    @Test
    public void numbers() throws Exception {
      Comparator<Number> comparator =
          Comparator.comparing(num -> new BigDecimal(num.toString())); // very inefficient
      ItemsSketch<Number> sketch = ItemsSketch.getInstance(comparator);
      for (int i = 0; i < 1286693; i++) {
        sketch.update(i);
      }
      byte[] bytes = sketch.toByteArray(new ArrayOfNumbersSerDe());
      System.out.println("serialized size: " + bytes.length);
    }
serialized size: 4051
|
> Not sure if relevant, but I believe the input was doubles.
I see the repeating pattern of 09 xx xx xx xx in the dump. 09 is the code
for integer in ArrayOfNumbersSerDe.
By the way, using ItemsSketch<Number> with ArrayOfNumbersSerDe might make
sense if you wanted to keep a mix of different number types (bytes, shorts,
integers, longs and so on). One byte per item is used in the
ArrayOfNumbersSerDe to encode the type. If you want a particular fixed
type, say Long, it would be better to use ItemsSketch<Long> with
ArrayOfLongsSerDe. If you want doubles, I would suggest using specialized
DoublesSketch.
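To make the repeating pattern concrete, here is a minimal sketch that walks an item region and decodes records of the form 09 xx xx xx xx. It relies only on what is stated in this comment (type code 09 means integer) plus the little-endian byte order reported in the dump header. The class name and the choice to reject any other type code are illustrative assumptions and are not taken from ArrayOfNumbersSerDe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class TaggedIntDecoder {

    static final byte INT_TYPE_CODE = 9; // "09" in the dump, per the comment in this thread

    /** Decodes consecutive [type byte][4-byte little-endian int] records. */
    static List<Integer> decode(byte[] region) {
        ByteBuffer buf = ByteBuffer.wrap(region).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> items = new ArrayList<>();
        while (buf.hasRemaining()) {
            int offset = buf.position();
            byte type = buf.get();
            if (type != INT_TYPE_CODE) {
                throw new IllegalArgumentException(
                    "unexpected type code " + type + " at offset " + offset);
            }
            if (buf.remaining() < Integer.BYTES) {
                // A record cut short, as would happen if bytes were missing from the image.
                throw new IllegalArgumentException("truncated item at offset " + offset);
            }
            items.add(buf.getInt());
        }
        return items;
    }

    public static void main(String[] args) {
        // First ten bytes of the item region in the dump (offsets 16..25).
        byte[] region = {9, 1, 0, 0, 0, 9, 1, 0, 0, 0};
        System.out.println(decode(region)); // prints [1, 1]
    }
}
```

Each record costs five bytes here, one for the type tag and four for the value, which is the per-item overhead that a fixed-type SerDe avoids.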
|
I see, this must have been a dump for an integer column then. I suppose the reproduction is complicated by the fact that we do other operations besides update. For example, we also merge and serialize/deserialize sketches during the process of computing them (ultimately driven by Spark). |
Is there a chance that you have different threads building sketches and
serializing? That would explain the issue and also the sporadic nature of
the failures. Perhaps you are not even aware of what Spark is doing. I would
suggest a simple test for this: just put the "synchronized" keyword on each
method of the wrapper (NumericDistribution). If the problem goes away, that
would prove that we are dealing with a multi-threading issue. By the way, I
don't see any methods to update the wrapped sketch. You must have
simplified the code just to show serialization and deserialization.
|
None of the sketches in the library are multi-threaded. If you have concurrent threads reading and writing to the same sketch you must make your sketch wrapper synchronized. |
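A minimal sketch of such a synchronized wrapper follows. To stay self-contained it wraps a deliberately non-thread-safe stand-in (a plain ArrayList) instead of a real sketch, and all class and method names are hypothetical. In a wrapper like NumericDistribution the same pattern would mean guarding every method that reads or mutates the wrapped sketch, including serialization.

```java
import java.util.ArrayList;
import java.util.List;

public class SynchronizedWrapperDemo {

    /** Stand-in for a sketch wrapper: all access to the shared state holds the object's lock. */
    static final class SafeAccumulator {
        private final List<Integer> items = new ArrayList<>(); // not thread-safe on its own

        public synchronized void update(int value) {
            items.add(value);
        }

        public void merge(SafeAccumulator other) {
            // Copy under the other object's lock first, so the two locks are never held together.
            List<Integer> copy = other.snapshot();
            synchronized (this) {
                items.addAll(copy);
            }
        }

        public synchronized List<Integer> snapshot() {
            return new ArrayList<>(items);
        }

        public synchronized int size() {
            return items.size();
        }
    }

    /** Hammers one accumulator from several threads and returns the final item count. */
    static int runConcurrentUpdates(int threads, int updatesPerThread) {
        SafeAccumulator acc = new SafeAccumulator();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    acc.update(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return acc.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentUpdates(8, 10_000));
    }
}
```

With the synchronized keyword in place no update is lost, so the count equals threads times updatesPerThread. Removing it from update makes the outcome unpredictable, which is the kind of sporadic corruption discussed in this thread.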
We don't have concurrent threads reading and writing to the same sketch (Spark parallelizes by splitting the data across machines -- within a given process we iterate over the data sequentially). @AlexanderSaydakov I've updated #178 (comment) to include our update+merge methods. When I say sporadic failures, I meant that the computation fails on certain datasets and not others. For a given dataset, the failure/nonfailure is consistent on retries. However, it's been difficult to reproduce locally even if I download the problematic dataset since I don't have a clustered spark setup on my local machine. |
I would still suggest making the wrapper synchronized just to rule
this out. Perhaps you don't realize what Spark is doing. If it is not a
synchronization problem, we don't yet have an explanation for how this sort of
corruption would be possible.
|
Fair enough, we can synchronize all the methods just to rule this out. |
> Let us know if it would be helpful to have more memory dumps as well (we
> have a few more).
Yes, we would like to have another example with a different data type
involved (not Integer). Preferably, the size of the type should be
different (Double would be great, for instance).
Did you have a chance to test the synchronized version yet?
|
@AlexanderSaydakov after digging into the Spark architecture a bit, our non-thread-safe code does seem like a likely root cause here. We are in the process of testing a thread-safe version, but as I'm sure you are aware, these sorts of race conditions can be hard to formally root out. Thanks so much for your insight and analysis. I suspect that by the end of this week we ought to have a more conclusive picture of whether this change fixed the issue or not. |
After running the synchronized code for a couple weeks, the issue has not resurfaced. Thanks as always for the help! |
Thank you for getting back to us. I think this thread will be valuable reading for a number of folks. |
Seeing this sporadically for FrequentItems:
as well as Quantiles:
Has anyone seen this before? Might it be related to memory corruption as we suspected in #175?