This repository has been archived by the owner on May 27, 2020. It is now read-only.

Altering UDTs cause ArrayIndexOutOfBoundsException in ColumnsMapper #395

Open
jgerew opened this issue Jul 9, 2018 · 14 comments

Comments

@jgerew

jgerew commented Jul 9, 2018

Apache Cassandra Version: 3.11.2
Stratio Cassandra Lucene Index Version: 3.11.1.0

Reproduction steps:

  1. Create a UDT
    CREATE TYPE test_udt (name text, type text);

  2. Create a stratio lucene index using the UDT

CREATE CUSTOM INDEX test_index ON test_table ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         "raw_data.test.name": {type: "text"}
      }
   }'
};
  3. Insert data into the test_table

  4. Execute a query using stratio lucene expression to verify results
    SELECT * FROM test_table WHERE expr(test_index, '{query:[{type:"boolean","should":[{type:"wildcard",field:"raw_data.test.name",value:"*"}]}]}');

  5. Add a field to the UDT
    ALTER TYPE test_udt ADD test_code int;

  6. Execute a query again

ERROR	[Native-Transport-Requests-1]	2018-07-09	18:37:55,141	QueryMessage.java:129	-	Unexpected	error	during	query
	java.lang.ArrayIndexOutOfBoundsException:	2
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8(ColumnsMapper.scala:215)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8$adapted(ColumnsMapper.scala:214)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractIterator.foldRight(Iterator.scala:1409)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractIterable.foldRight(Iterable.scala:54)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractTraversable.$colon$bslash(Traversable.scala:104)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:214)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:173)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8(ColumnsMapper.scala:222)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.$anonfun$columns$8$adapted(ColumnsMapper.scala:214)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractIterator.foldRight(Iterator.scala:1409)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractIterable.foldRight(Iterable.scala:54)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractTraversable.$colon$bslash(Traversable.scala:104)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:214)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:173)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:156)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:119)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper.$anonfun$columns$3(ColumnsMapper.scala:91)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractIterator.foldRight(Iterator.scala:1409)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractIterable.foldRight(Iterable.scala:54)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.AbstractTraversable.$colon$bslash(Traversable.scala:104)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:87)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:56)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.IndexPostProcessor.document(IndexPostProcessor.scala:141)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.IndexPostProcessor.$anonfun$top$1(IndexPostProcessor.scala:106)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:156)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.IndexPostProcessor.top(IndexPostProcessor.scala:103)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.IndexPostProcessor.process(IndexPostProcessor.scala:57)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.ReadCommandPostProcessor.apply(IndexPostProcessor.scala:168)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.ReadCommandPostProcessor.apply(IndexPostProcessor.scala:161)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	org.apache.cassandra.db.PartitionRangeReadCommand.postReconciliationProcessing(PartitionRangeReadCommand.java:408)	~[apache-cassandra-3.11.2.jar:3.11.2]
		at	org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:2288)	~[apache-cassandra-3.11.2.jar:3.11.2]
		at	org.apache.cassandra.db.PartitionRangeReadCommand.execute(PartitionRangeReadCommand.java:263)	~[apache-cassandra-3.11.2.jar:3.11.2]
		at	com.stratio.cassandra.lucene.IndexQueryHandler.executeSortedLuceneQuery(IndexQueryHandler.scala:226)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.IndexQueryHandler.executeLuceneQuery(IndexQueryHandler.scala:193)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.IndexQueryHandler.processStatement(IndexQueryHandler.scala:122)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	com.stratio.cassandra.lucene.IndexQueryHandler.process(IndexQueryHandler.scala:101)	~[cassandra-lucene-index-plugin-3.11.1.0.jar:na]
		at	org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)	~[apache-cassandra-3.11.2.jar:3.11.2]
		at	org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517)	[apache-cassandra-3.11.2.jar:3.11.2]
		at	org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410)	[apache-cassandra-3.11.2.jar:3.11.2]
		at	io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)	[netty-all-4.0.44.Final.jar:4.0.44.Final]
		at	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)	[netty-all-4.0.44.Final.jar:4.0.44.Final]
		at	io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)	[netty-all-4.0.44.Final.jar:4.0.44.Final]
		at	io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348)	[netty-all-4.0.44.Final.jar:4.0.44.Final]
		at	java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)	[na:1.8.0_171]
		at	org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)	[apache-cassandra-3.11.2.jar:3.11.2]
		at	org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)	[apache-cassandra-3.11.2.jar:3.11.2]
		at	java.lang.Thread.run(Thread.java:748)	[na:1.8.0_171]
@jgerew
Author

jgerew commented Jul 9, 2018

The problem appears to lie in how the value is split into UDT fields. All of the data saved prior to the ALTER statement has only 2 values (index 0 and 1), but the UDT now has 3 fields (index 0, 1, and 2). The code iterates over the UDT field names and expects the value to contain as many components as there are field names, which causes the exception. See the current code below:

  private[mapping] def columns(column: Column, udt: UserType, value: ByteBuffer): Columns = {
    val itemValues = udt.split(value)
    ((0 until udt.fieldNames.size) :\ Columns()) ((i, columns) => {
      val itemValue = itemValues(i) // causes ArrayIndexOutOfBoundsException
      if (itemValue == null) {
        columns
      } else {
        val itemName = udt.fieldNameAsString(i)
        val itemType = udt.fieldType(i)
        val itemColumn = column.withUDTName(itemName)
        this.columns(itemColumn, itemType, itemValue) ++ columns
      }
    })
  }

If we change the code to expect the possibility of an index mismatch we can resolve the issue:

  private[mapping] def columns(column: Column, udt: UserType, value: ByteBuffer): Columns = {
    val itemValues = udt.split(value)
    ((0 until udt.fieldNames.size) :\ Columns()) ((i, columns) => {
      val itemValue = if (i < itemValues.length) itemValues(i) else null // see here
      if (itemValue == null) {
        columns
      } else {
        val itemName = udt.fieldNameAsString(i)
        val itemType = udt.fieldType(i)
        val itemColumn = column.withUDTName(itemName)
        this.columns(itemColumn, itemType, itemValue) ++ columns
      }
    })
  }
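
For anyone who wants to see the mismatch in isolation, here is a minimal standalone sketch in Scala. It is not the plugin's code: `fieldNames`, `oldRowValues`, `unsafeColumns` and `safeColumns` are hypothetical names, and plain strings stand in for the ByteBuffer components returned by `udt.split(value)`. It only models the situation described above, where a row written before the ALTER carries two components while the type now declares three fields, and it shows `lift` as one possible alternative to the explicit length check.

```scala
// Standalone model of the mismatch; none of these names come from the plugin.
object UdtPaddingSketch {

  // Hypothetical stand-in for the UDT field names after ALTER TYPE ... ADD.
  val fieldNames: Vector[String] = Vector("name", "type", "test_code")

  // Hypothetical stand-in for the components of a value written before the
  // ALTER: only the two fields that existed at write time are present.
  val oldRowValues: Array[String] = Array("alice", "admin")

  // Unsafe: indexes the value array by the current number of fields.
  def unsafeColumns(values: Array[String]): Map[String, String] =
    fieldNames.indices.foldRight(Map.empty[String, String]) { (i, acc) =>
      val v = values(i) // throws ArrayIndexOutOfBoundsException when i >= values.length
      if (v == null) acc else acc + (fieldNames(i) -> v)
    }

  // Safe: `lift` yields None for an out-of-range index and Option(_) maps a
  // null component to None, so missing and null fields are both skipped.
  def safeColumns(values: Array[String]): Map[String, String] =
    fieldNames.indices.foldRight(Map.empty[String, String]) { (i, acc) =>
      values.lift(i).flatMap(Option(_)) match {
        case Some(v) => acc + (fieldNames(i) -> v)
        case None    => acc
      }
    }

  def main(args: Array[String]): Unit = {
    // The old row maps cleanly; the field added later is simply absent.
    println(safeColumns(oldRowValues))
    // The unsafe variant fails on the same input.
    println(scala.util.Try(unsafeColumns(oldRowValues)).isFailure)
  }
}
```

Running `main` should print the two columns recovered from the old row, followed by `true` for the failed unsafe call. Skipping the missing index treats a field added later the same way as a null field, which matches the intent of the change above.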

@jgerew jgerew changed the title Altering UDTs cause ArrayOutOfBoundsException in ColumnsMapper Altering UDTs cause ArrayIndexOutOfBoundsException in ColumnsMapper Jul 9, 2018
@smiklosovic

smiklosovic commented Aug 20, 2018

Hi @jgerew

We are hitting the very same issue.

Our "workflow" is like this:

We start with a completely empty DB, create the schema, insert data, and then create the index so everything is indexed. All works. After that we drop the index and recreate the very same index, and all queries give us this exception.

I was going through the very same code as you did and yes it failed on that row.

It is worth saying that if we don't use "expr" queries but "where lucene = query", it all works.

Why?

Could you look into this please?

We are using 3.7.2 Cassandra with 3.7.6 plugin.

@adelapena

@smiklosovic

I am taking back my point about expr vs lucene, it fails either way.

@smiklosovic

What is even stranger is that we cannot use sorting after we drop and recreate an index, but queries without sorting keep working: both lucene and expr queries work without sorting even after we drop and create the index again.

@jgerew
Author

jgerew commented Aug 20, 2018

Hi @smiklosovic,

I wish I could help, but it doesn't sound like the same thing we were facing. Our problem was due to modifying the UDT type. I don't think it being an index had anything to do with it, we just ran into issues with the stratio plugin when it was formulating the result set. In our case, stratio was expecting the same number of values as there were fields in the UDT and since we had added a new field we got the ArrayIndexOutOfBoundsException.

That being said, we did run into some issues with dropping/re-adding indexes and getting ArrayIndexOutOfBoundsExceptions. We didn't dig too far into it, but we figured it may have been due to schema replication. You may want to try to run your index alterations with consistency set to ALL.

Good luck!

Joe

@smiklosovic

smiklosovic commented Aug 20, 2018

@jgerew

We also thought this was due to altering a UDT: we have migration scripts, and the last one did indeed alter a UDT by adding a field.

But when we tried to replicate this "from scratch", we recreated the DB from "describe keyspace" output, where everything is consolidated into a flat schema, so there was no altering at all, yet we still hit this issue with index drop and sorting.

We wanted to work around it by making the timestamp a clustering column in the primary key so we could use "order by", but we got:

InvalidRequest: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."

@jgerew
Author

jgerew commented Aug 20, 2018

Interesting, do you have data stored for all of the UDT fields? If not, I wonder if that could be causing you to have the same issue I did. For us, the index had nothing to do with the problem. We got the ArrayIndexOutOfBoundsException whenever we queried the DB (using 'expr') where the result set would have returned records containing the UDT data with missing pieces.

Try inserting a new (and fully populated) record into your table and querying for just that one record (using 'expr', not the primary key). Does that query cause an ArrayIndexOutOfBoundsException?

@smiklosovic

There were some null fields in UDTs for sure. The field we were sorting by was not part of that UDT; it was a regular column of type date. We are going to try using a timeuuid in that field instead and sort by that one; there is already a use case like this in our app. I'll update you on the results.

@smiklosovic

It doesn't work at all. Whenever we drop the index we cannot run sorted queries. I made sure all fields in the UDT are non-null, and we are sorting by a timeuuid field in that table. Now I am getting this:

java.nio.BufferUnderflowException: null
at java.nio.Buffer.nextGetIndex(Buffer.java:506) ~[na:1.8.0_181]
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:361) ~[na:1.8.0_181]
at org.apache.cassandra.serializers.CollectionSerializer.readCollectionSize(CollectionSerializer.java:79) ~[apache-cassandra-3.7.2.jar:3.7.2]
at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.frozenCollectionSize(ColumnsMapper.scala:272) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:202) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:182) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
at com.stratio.cassandra.lucene.mapping.ColumnsMapper$.columns(ColumnsMapper.scala:175) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
at com.stratio.cassandra.lucene.mapping.ColumnsMapper.columns(ColumnsMapper.scala:133) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
at com.stratio.cassandra.lucene.mapping.ColumnsMapper.$anonfun$columns$4(ColumnsMapper.scala:105) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]
at scala.collection.TraversableOnce.$anonfun$foldRight$1(TraversableOnce.scala:162) ~[cassandra-lucene-index-plugin-3.7.6.jar:na]

@smiklosovic

So what is totally awkward is that it all works, even with sorts, but only if we use "select * from" instead of "select column1,column2 from" ....

It seems like selecting columns doesn't work but doing * does. This is the weirdest bug I have ever seen.

@rampeni

rampeni commented Aug 23, 2018

@smiklosovic that's exactly what we ran into in #394

@jgerew
Author

jgerew commented Aug 23, 2018

@smiklosovic @rampeni Just curious, does the fix I posted resolve the issues you guys are facing? I have yet to hear from anyone on stratio concerning this ticket unfortunately. We've been running with a forked branch with the code change above.

@smiklosovic

I have a feeling this project is dead.

@rampeni

rampeni commented Aug 24, 2018

Same feeling here; we will probably move on rather than spend more time trying to fix it.
