This repository has been archived by the owner on May 27, 2020. It is now read-only.

org.apache.cassandra.serializers.MarshalException: Invalid US-ASCII bytes #148

Closed
ghost opened this issue Jun 3, 2016 · 10 comments

Comments

@ghost

ghost commented Jun 3, 2016

@adelapena I wonder if you can help me with this issue.

I have created a new custom secondary index using the Lucene plugin with the following CQL:

CREATE CUSTOM INDEX idx_sec_symbol on securities()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'schema' : '{
fields : {
text: {
type : "text",
validated : true,
column : "symbol",
case_sensitive : false
}
}
}'};

But this produces the following stack trace in the Cassandra logs, and I cannot find a solution in any of the documentation for the Stratio cassandra-lucene-index plugin or on the Apache Cassandra pages.

ERROR 14:55:58 Exception in thread Thread[CompactionExecutor:78,1,main]
org.apache.cassandra.serializers.MarshalException: Invalid US-ASCII bytes 44656b6147656ec3bc737365202b2052656e74656e
    at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:45) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:28) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:114) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at com.stratio.cassandra.lucene.column.ColumnBuilder.buildWithDecomposed(ColumnBuilder.java:71) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.column.ColumnsMapper.addColumns(ColumnsMapper.java:166) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.column.ColumnsMapper.addColumns(ColumnsMapper.java:100) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.column.ColumnsMapper.addColumns(ColumnsMapper.java:53) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexServiceWide.columns(IndexServiceWide.java:100) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexService.needsReadBeforeWrite(IndexService.java:319) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexWriterWide.index(IndexWriterWide.java:71) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexWriter.insertRow(IndexWriter.java:85) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at org.apache.cassandra.index.SecondaryIndexManager.lambda$indexPartition$79(SecondaryIndexManager.java:544) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_72-internal]
    at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:544) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.db.Keyspace.indexPartition(Keyspace.java:538) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:69) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1345) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_72-internal]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72-internal]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72-internal]
ERROR 14:55:58 Exception in thread Thread[SecondaryIndexManagement:16,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.serializers.MarshalException: Invalid US-ASCII bytes 44656b6147656ec3bc737365202b2052656e74656e
    at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:384) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.index.SecondaryIndexManager.buildIndexesBlocking(SecondaryIndexManager.java:367) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.index.SecondaryIndexManager.buildIndexBlocking(SecondaryIndexManager.java:277) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at com.stratio.cassandra.lucene.Index.lambda$getInitializationTask$0(Index.java:136) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72-internal]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72-internal]
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.serializers.MarshalException: Invalid US-ASCII bytes 44656b6147656ec3bc737365202b2052656e74656e
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_72-internal]
    at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_72-internal]
    at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:380) ~[apache-cassandra-3.0.5.jar:3.0.5]
    ... 7 common frames omitted
Caused by: org.apache.cassandra.serializers.MarshalException: Invalid US-ASCII bytes 44656b6147656ec3bc737365202b2052656e74656e
    at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:45) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:28) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:114) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at com.stratio.cassandra.lucene.column.ColumnBuilder.buildWithDecomposed(ColumnBuilder.java:71) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.column.ColumnsMapper.addColumns(ColumnsMapper.java:166) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.column.ColumnsMapper.addColumns(ColumnsMapper.java:100) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.column.ColumnsMapper.addColumns(ColumnsMapper.java:53) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexServiceWide.columns(IndexServiceWide.java:100) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexService.needsReadBeforeWrite(IndexService.java:319) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexWriterWide.index(IndexWriterWide.java:71) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at com.stratio.cassandra.lucene.IndexWriter.insertRow(IndexWriter.java:85) ~[cassandra-lucene-index-plugin-3.0.5.2.jar:na]
    at org.apache.cassandra.index.SecondaryIndexManager.lambda$indexPartition$79(SecondaryIndexManager.java:544) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_72-internal]
    at org.apache.cassandra.index.SecondaryIndexManager.indexPartition(SecondaryIndexManager.java:544) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.db.Keyspace.indexPartition(Keyspace.java:538) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:69) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1345) ~[apache-cassandra-3.0.5.jar:3.0.5]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_72-internal]
    ... 4 common frames omitted

@ghost ghost closed this as completed Jun 3, 2016
@ghost ghost reopened this Jun 3, 2016
@adelapena
Contributor

Hi @TEAMKILLER,

What is the type of the column symbol? Can you please provide the schema of the indexed table?

By the way, the property case_sensitive is not supported by text mappers, only by string mappers. Are you sure you want the text to be analyzed using Lucene's default text analyzer?

@ghost
Author

ghost commented Jun 3, 2016

@adelapena

I don't have the full schema to hand at present, but the symbol column is TEXT. I have tried using the ASCII and VARCHAR data types with similar results, except that the error is reported as US-UTF-8 instead of US-ASCII.

I can add the index when the table is empty, and the Lucene query works as expected, except that it returns no results. If I then use sstableloader to bulk load data, the query still works but the result is missing a large number of rows. If I drop and try to recreate the index, I start getting errors that report "the index is not yet available" because the index fails to rebuild.

If you still need the full schema to help me resolve this issue, I will have access again on Monday.

@adelapena
Contributor

Yes, I would need the indexed table schema to be able to reproduce the problem. You can retrieve it with CQL:

DESCRIBE TABLE securities

Independently of the index creation problems, what do you mean by missing large amounts of rows? Is the index not retrieving the expected results even when there are no insertion problems? If so, could you please provide the failing search query? Just to be sure, are you aware of the difference between text and string mappers?

@adelapena
Contributor

By the way, which OS are you using?

@ghost
Author

ghost commented Jun 3, 2016

@adelapena I have managed to get access to the server out of office hours, and here is the schema for the table in question. The schema is very basic, and the trailing options are the defaults used when none are supplied, except for the CLUSTERING ORDER BY.

CREATE TABLE securities (
mic ASCII,
symbol TEXT,
pricedate INT,
currency ASCII,
exchgcd ASCII,
high DECIMAL,
isin ASCII,
last DECIMAL,
low DECIMAL,
mid DECIMAL,
open DECIMAL,
close DECIMAL,
primary_exchgcd ASCII,
total_trades DECIMAL,
traded_value DECIMAL,
traded_volume DECIMAL,
PRIMARY KEY ((mic, symbol), pricedate, currency)
) WITH CLUSTERING ORDER BY (pricedate DESC, currency ASC)
AND read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.1
AND gc_grace_seconds = 86400
AND bloom_filter_fp_chance = 0.01
AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
AND comment = ''
AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
AND compression = { 'chunk_length_in_kb' : 64, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
AND default_time_to_live = 0
AND speculative_retry = '99PERCENTILE'
AND min_index_interval = 128
AND max_index_interval = 2048
AND crc_check_chance = 1.0;

@ghost
Author

ghost commented Jun 3, 2016

I am using CentOS 7, but I have had the same issue with other operating systems such as Debian and Ubuntu. I have also used the official Docker Cassandra images, versions 3 and 3.0.5, as well as prebuilt cassandra-lucene-index images from SharkCell on Docker Hub, and all produce the same US-ASCII or US-UTF-8 byte issue.

@adelapena
Contributor

The provided stack trace shows that the value of the failing column is 44656b6147656ec3bc737365202b2052656e74656e, which is the hexadecimal encoding of the UTF-8 string DekaGenüsse + Renten. The stack trace message says that this value is being validated by an ASCII column type.

I have run some tests using this column value, inserting data before and after the index creation, and dropping the index and creating it again:
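The decoding can be checked outside Cassandra. The following is a minimal Python sketch, not part of the plugin or of Cassandra: it takes the hex payload from the exception message, decodes it as UTF-8, and shows why an ASCII decode of the same bytes fails. The variable names are illustrative only.

```python
# Hex payload copied from the MarshalException message in the stack trace.
raw = bytes.fromhex("44656b6147656ec3bc737365202b2052656e74656e")

# Decoding as UTF-8 succeeds and recovers the symbol.
symbol = raw.decode("utf-8")
print(symbol)

# Decoding as US-ASCII fails: the two bytes 0xC3 0xBC are the UTF-8
# encoding of "ü", and both lie outside the 7-bit ASCII range.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as err:
    ascii_ok = False
    bad_offset = err.start  # index of the first non-ASCII byte
    print(f"not ASCII: byte 0x{raw[bad_offset]:02x} at offset {bad_offset}")
```

So the stored value is valid UTF-8 and would be acceptable for a TEXT column; it is only rejected when something treats the column as ASCII.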

CREATE KEYSPACE pricing WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
USE pricing;

CREATE TABLE securities (
mic ASCII,
symbol TEXT,
pricedate INT,
currency ASCII,
exchgcd ASCII,
high DECIMAL,
isin ASCII,
last DECIMAL,
low DECIMAL,
mid DECIMAL,
open DECIMAL,
close DECIMAL,
primary_exchgcd ASCII,
total_trades DECIMAL,
traded_value DECIMAL,
traded_volume DECIMAL,
PRIMARY KEY ((mic, symbol), pricedate, currency)
) WITH CLUSTERING ORDER BY (pricedate DESC, currency ASC)
AND read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.1
AND gc_grace_seconds = 86400
AND bloom_filter_fp_chance = 0.01
AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
AND comment = ''
AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
AND compression = { 'chunk_length_in_kb' : 64, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
AND default_time_to_live = 0
AND speculative_retry = '99PERCENTILE'
AND min_index_interval = 128
AND max_index_interval = 2048
AND crc_check_chance = 1.0;

INSERT INTO securities(mic, symbol, pricedate, currency) VALUES ('a', 'DekaGenüsse + Renten', 1, 'a');

CREATE CUSTOM INDEX idx_sec_symbol ON pricing.securities () USING 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds' : '1', 'schema' : '{
fields : { 
symbol : {
type : "string",
validated : true,
column : "symbol",
case_sensitive : false
}
}
}'};

INSERT INTO securities(mic, symbol, pricedate, currency) VALUES ('a', 'DekaGenüsse + Renten', 1, 'a');

SELECT * FROM securities WHERE expr(idx_sec_symbol,'{filter:{type:"match", field:"symbol", value:"DekaGenüsse + Renten"}}');

DROP INDEX idx_sec_symbol ;

CREATE CUSTOM INDEX idx_sec_symbol ON pricing.securities () USING 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds' : '1', 'schema' : '{
fields : { 
symbol : {
type : "string",
validated : true,
column : "symbol",
case_sensitive : false
}
}
}'};

SELECT * FROM securities WHERE expr(idx_sec_symbol,'{filter:{type:"match", field:"symbol", value:"DekaGenüsse + Renten"}}');


INSERT INTO securities(mic, symbol, pricedate, currency) VALUES ('a', 'DekaGenüsse + Renten', 1, 'a');

And there are no failures. I have also used sstableloader to insert the data with and without the index, dropping and creating it again.

So I don't know how to reproduce the issue. Are you sure that the provided schema is the same in the sstableloader origin and destination? Can you give me access to some of the ingested data, or simulate a workload able to reproduce the problem?

@ghost
Author

ghost commented Jun 3, 2016

I can get you more details and data on Monday. I am also very grateful for all of your help.

@ghost
Author

ghost commented Jun 6, 2016

@adelapena The issue was not with the Stratio Lucene Index plugin itself, but with the sstableloader driver I was using in my Java application. As far as I can tell, the sstable loader was not validating the generated sstables correctly, and thus produced the error I reported. I did, however, find that using simple INSERT statements resolved the issue, and the data can now be ingested and indexed using Stratio's Cassandra-Lucene-Index plugin.

I would also like to thank you for the time and effort you gave to this issue. I now owe you a coffee or a beer.
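For anyone who still needs a bulk-load path, one possible safeguard is to check rows against the column types before the sstables are generated. The sketch below is only an illustration under assumptions: it is plain Python, it does not use any Cassandra or sstableloader API, the helper name non_ascii_columns is hypothetical, and the sample values "XETR" and "EUR" are made up. The column names follow the securities table posted earlier in this thread.

```python
def non_ascii_columns(row, ascii_columns):
    """Return the names of ascii-typed columns whose string value is not pure ASCII."""
    return [
        name
        for name in ascii_columns
        if isinstance(row.get(name), str) and not row[name].isascii()
    ]


# Columns declared ASCII in the securities schema from this thread.
ASCII_COLUMNS = ("mic", "currency", "exchgcd", "isin", "primary_exchgcd")

# Hypothetical row; only the symbol value comes from the reported error.
row = {"mic": "XETR", "symbol": "DekaGenüsse + Renten", "currency": "EUR"}

# symbol is TEXT in the posted schema, so it is not checked here.
print(non_ascii_columns(row, ASCII_COLUMNS))

# If the loader's schema declared symbol as ASCII, the row would be flagged.
print(non_ascii_columns(row, ASCII_COLUMNS + ("symbol",)))
```

A row flagged this way would point to a mismatch between the schema used to generate the sstables and the schema of the destination table.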

@adelapena
Contributor

OK, great. sstableloader can be treacherous sometimes :)

By the way, I encourage you to use, if possible, Cassandra 3.0.6 with Lucene's index 3.0.6.2. It includes a relatively important fix for deletion of frozen collections, support for columns with TTL, and paging over index-sorted results.
