Improve CQL3 type validation
patch by slebresne; reviewed by iamaleksey for CASSANDRA-5198
Sylvain Lebresne committed Jan 31, 2013
1 parent 4df6136 commit a67f779
Showing 42 changed files with 616 additions and 155 deletions.
11 changes: 11 additions & 0 deletions NEWS.txt
@@ -8,6 +8,17 @@ upgrade, just in case you need to roll back to the previous version.
(Cassandra version X + 1 will always be able to read data files created
by version X, but the inverse is not necessarily the case.)

1.2.2
=====

Upgrading
---------
- CQL3 type validation for constants has been fixed, which may require
fixing queries that were relying on the previous loose validation. Please
refer to the CQL3 documentation (http://cassandra.apache.org/doc/cql3/CQL.html)
and in particular the changelog section for more details.


1.2.1
=====

57 changes: 33 additions & 24 deletions doc/cql3/CQL.textile
@@ -1,6 +1,6 @@
<link rel="StyleSheet" href="CQL.css" type="text/css" media="screen">

h1. Cassandra Query Language (CQL) v3.0.1
h1. Cassandra Query Language (CQL) v3.0.2


<span id="tableOfContents">
@@ -47,12 +47,15 @@ p. There is a second kind of identifiers called _quoted identifiers_ defined by

h3(#constants). Constants

CQL defines 4 kinds of _implicitly-typed constants_: strings, numbers, uuids and booleans:
CQL defines the following kinds of _constants_: strings, integers, floats, booleans, uuids and blobs:
* A string constant is an arbitrary sequence of characters enclosed by single quotes (@'@). One can include a single quote in a string by repeating it, e.g. @'It''s raining today'@. Those are not to be confused with quoted identifiers that use double quotes.
* Numeric constants are either integer constants defined by @'-'?[0-9]+@ or float constants defined by @'-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)?@.
* A "UUID":http://en.wikipedia.org/wiki/Universally_unique_identifier constant is defined by @hex{8}-hex{4}-hex{4}-hex{4}-hex{12}@ where @hex@ is a hexadecimal character, i.e. @[0-9a-fA-F]@, and @{4}@ is the number of such characters.
* An integer constant is defined by @'-'?[0-9]+@.
* A float constant is defined by @'-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)?@.
* A boolean constant is either @true@ or @false@ up to case-insensitivity (i.e. @True@ is a valid boolean constant).
* A "UUID":http://en.wikipedia.org/wiki/Universally_unique_identifier constant is defined by @hex{8}-hex{4}-hex{4}-hex{4}-hex{12}@ where @hex@ is a hexadecimal character, i.e. @[0-9a-fA-F]@, and @{4}@ is the number of such characters.
* A blob constant is a hexadecimal number defined by @0[xX](hex)+@ where @hex@ is a hexadecimal character, i.e. @[0-9a-fA-F]@.

For how these constants are typed, see the "data types section":#types.

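The constant kinds listed above can be exercised with a small classifier. This is a minimal sketch, not Cassandra's lexer: the class `CqlConstantKinds`, its `Kind` enum and its `classify` method are hypothetical names, and the float pattern assumes the exponent takes one or more digits (`[0-9]+`). Integers are tried before floats because every integer literal also matches the float pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: classifies a single CQL3 literal using the documented patterns.
public class CqlConstantKinds {
    enum Kind { STRING, BLOB, UUID, INTEGER, FLOAT, BOOLEAN }

    private static final String HEX = "[0-9a-fA-F]";

    // Insertion order is the order in which the patterns are tried.
    private static final Map<Kind, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        // Single-quoted; a quote inside the string is escaped by doubling it.
        PATTERNS.put(Kind.STRING, Pattern.compile("'(?:[^']|'')*'"));
        PATTERNS.put(Kind.BLOB, Pattern.compile("0[xX]" + HEX + "+"));
        PATTERNS.put(Kind.UUID, Pattern.compile(
                HEX + "{8}-" + HEX + "{4}-" + HEX + "{4}-" + HEX + "{4}-" + HEX + "{12}"));
        // INTEGER must come before FLOAT: every integer also matches the float pattern.
        PATTERNS.put(Kind.INTEGER, Pattern.compile("-?[0-9]+"));
        // Assumption: the exponent takes one or more digits ([0-9]+).
        PATTERNS.put(Kind.FLOAT, Pattern.compile("-?[0-9]+(\\.[0-9]*)?([eE][+-]?[0-9]+)?"));
        // Booleans are case-insensitive.
        PATTERNS.put(Kind.BOOLEAN, Pattern.compile("(?i)(true|false)"));
    }

    static Kind classify(String literal) {
        for (Map.Entry<Kind, Pattern> e : PATTERNS.entrySet()) {
            if (e.getValue().matcher(literal).matches()) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("not a CQL3 constant: " + literal);
    }

    public static void main(String[] args) {
        String[] samples = {
            "'It''s raining today'", "42", "-3.5e2", "True", "0x2A",
            "62c36092-82a1-3a00-93d1-46196ee77204"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```
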
h3. Comments

@@ -692,7 +695,7 @@ SELECT firstname, lastname FROM users WHERE birth_year = 1981 AND country = 'FR'

h2(#types). Data Types

CQL supports a rich set of native data types for columns defined in a table. On top of those native types, users can also provide custom types (through a Java class extending @AbstractType@ loadable by Cassandra). The syntax of types is thus:
CQL supports a rich set of data types for columns defined in a table, including collection types. On top of those native and collection types, users can also provide custom types (through a Java class extending @AbstractType@ loadable by Cassandra). The syntax of types is thus:

bc(syntax)..
<type> ::= <native-type>
@@ -721,25 +724,27 @@ bc(syntax)..
| map '<' <native-type> ',' <native-type> '>'
p. Note that the native types are keywords and as such are case-insensitive. They are however not reserved ones.

p. The following table gives additional information on the native data types:

|_. type |_. description|
|@ascii@ |ASCII character string|
|@bigint@ |64-bit signed long|
|@blob@ |Arbitrary bytes (no validation)|
|@boolean@ |true or false|
|@counter@ |Counter column (64-bit signed value). See "Counters":#counters for details|
|@decimal@ |Variable-precision decimal|
|@double@ |64-bit IEEE-754 floating point|
|@float@ |32-bit IEEE-754 floating point|
|@inet@ |An IP address. It can be either 4 bytes long (IPv4) or 16 bytes long (IPv6)|
|@int@ |32-bit signed int|
|@text@ |UTF8 encoded string|
|@timestamp@|A timestamp. See "Working with dates":#usingdates below for more information.|
|@timeuuid@ |Type 1 UUID. This is generally used as a "conflict-free" timestamp. See "Working with @timeuuid@":#usingtimeuuid below.|
|@uuid@ |Type 1 or type 4 UUID|
|@varchar@ |UTF8 encoded string|
|@varint@ |Arbitrary-precision integer|
p. The following table gives additional information on the native data types, and on which kinds of "constants":#constants each type supports:

|_. type |_. constants supported|_. description|
|@ascii@ | strings |ASCII character string|
|@bigint@ | integers |64-bit signed long|
|@blob@ | blobs |Arbitrary bytes (no validation)|
|@boolean@ | booleans |true or false|
|@counter@ | integers |Counter column (64-bit signed value). See "Counters":#counters for details|
|@decimal@ | integers, floats |Variable-precision decimal|
|@double@ | integers, floats |64-bit IEEE-754 floating point|
|@float@ | integers, floats |32-bit IEEE-754 floating point|
|@inet@ | strings |An IP address. It can be either 4 bytes long (IPv4) or 16 bytes long (IPv6). There is no @inet@ constant; IP addresses should be input as strings|
|@int@ | integers |32-bit signed int|
|@text@ | strings |UTF8 encoded string|
|@timestamp@| integers, strings |A timestamp. String constants allow timestamps to be input as dates; see "Working with dates":#usingdates below for more information.|
|@timeuuid@ | uuids |Type 1 UUID. This is generally used as a "conflict-free" timestamp. See "Working with @timeuuid@":#usingtimeuuid below.|
|@uuid@ | uuids |Type 1 or type 4 UUID|
|@varchar@ | strings |UTF8 encoded string|
|@varint@ | integers |Arbitrary-precision integer|

For more information on how to use the collection types, see the "Working with collections":#collections section below.

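To make the "constants supported" column concrete, here is a hedged sketch of the type/constant compatibility check as a lookup table. `CqlTypeConstants`, `ACCEPTED` and `validate` are hypothetical names, the error uses `IllegalArgumentException` rather than Cassandra's own exception type, and @double@ is assumed to accept float constants as well as integers, like @float@ and @decimal@.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "constants supported" column as a lookup table.
public class CqlTypeConstants {
    enum Kind { STRING, INTEGER, FLOAT, BOOLEAN, UUID, BLOB }

    static final Map<String, Set<Kind>> ACCEPTED = new HashMap<>();
    static {
        ACCEPTED.put("ascii", EnumSet.of(Kind.STRING));
        ACCEPTED.put("bigint", EnumSet.of(Kind.INTEGER));
        ACCEPTED.put("blob", EnumSet.of(Kind.BLOB));
        ACCEPTED.put("boolean", EnumSet.of(Kind.BOOLEAN));
        ACCEPTED.put("counter", EnumSet.of(Kind.INTEGER));
        ACCEPTED.put("decimal", EnumSet.of(Kind.INTEGER, Kind.FLOAT));
        ACCEPTED.put("double", EnumSet.of(Kind.INTEGER, Kind.FLOAT)); // assumption, see lead-in
        ACCEPTED.put("float", EnumSet.of(Kind.INTEGER, Kind.FLOAT));
        ACCEPTED.put("inet", EnumSet.of(Kind.STRING));
        ACCEPTED.put("int", EnumSet.of(Kind.INTEGER));
        ACCEPTED.put("text", EnumSet.of(Kind.STRING));
        ACCEPTED.put("timestamp", EnumSet.of(Kind.INTEGER, Kind.STRING));
        ACCEPTED.put("timeuuid", EnumSet.of(Kind.UUID));
        ACCEPTED.put("uuid", EnumSet.of(Kind.UUID));
        ACCEPTED.put("varchar", EnumSet.of(Kind.STRING));
        ACCEPTED.put("varint", EnumSet.of(Kind.INTEGER));
    }

    // Throws if a constant of the given kind may not be used for a column of the given type.
    static void validate(String type, Kind constant) {
        // Native type names are case-insensitive keywords.
        Set<Kind> accepted = ACCEPTED.get(type.toLowerCase(Locale.ROOT));
        if (accepted == null) {
            throw new IllegalArgumentException("unknown native type: " + type);
        }
        if (!accepted.contains(constant)) {
            throw new IllegalArgumentException(
                    "invalid " + constant + " constant for a column of type " + type);
        }
    }

    public static void main(String[] args) {
        validate("int", Kind.INTEGER); // 2 for an int column: accepted
        try {
            validate("int", Kind.STRING); // '2' for an int column: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```
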
h3(#usingdates). Working with dates

@@ -1000,6 +1005,10 @@ h2(#changes). Changes

The following describes the additions/changes brought by each version of CQL.

h3. 3.0.2

- Type validation for the "constants":#constants has been fixed. For instance, the implementation used to allow @'2'@ as a valid value for an @int@ column (interpreting it as the equivalent of @2@), or @42@ as a valid @blob@ value (in which case @42@ was interpreted as a hexadecimal representation of the blob). This is no longer the case: type validation of constants is now stricter. See the "data types":#types section for details on which constants are allowed for which type, and note that this led to the introduction of "blob constants":#constants.

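The blob constants introduced in 3.0.2 can be illustrated with a small decoder. This is a sketch under assumptions: `BlobConstant.parseBlobConstant` is a hypothetical helper, and rejecting an odd number of hex digits is a choice made here, not something the text above specifies.

```java
import java.util.Arrays;

// Hypothetical helper: turns a blob constant such as 0xcafe into raw bytes.
public class BlobConstant {
    static byte[] parseBlobConstant(String literal) {
        if (!literal.matches("0[xX][0-9a-fA-F]+")) {
            throw new IllegalArgumentException("not a blob constant: " + literal);
        }
        String hex = literal.substring(2);
        // Choice made by this sketch: each byte needs exactly two hex digits.
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + literal);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseBlobConstant("0x2a")));   // [42]
        System.out.println(Arrays.toString(parseBlobConstant("0xCAFE"))); // [-54, -2]
        try {
            // Before CQL 3.0.2, a bare 42 was read as hex for blob columns;
            // it is now an integer constant and is rejected here.
            parseBlobConstant("42");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```
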
h3. 3.0.1

- "Date strings":#usingdates (and timestamps) are no longer accepted as valid @timeuuid@ values. Accepting them was a bug, since date strings are not valid @timeuuid@ values, and it resulted in "confusing behaviors":https://issues.apache.org/jira/browse/CASSANDRA-4936. However, the following new methods have been added to help working with @timeuuid@: @now@, @minTimeuuid@, @maxTimeuuid@, @dateOf@ and @unixTimestampOf@. See the "section dedicated to these methods":#usingtimeuuid for more detail.
@@ -22,13 +22,13 @@
import org.apache.cassandra.exceptions.ConfigurationException;
import org.apache.cassandra.exceptions.SyntaxException;

public interface ParsedType
public interface CQL3Type
{
public boolean isCollection();
public boolean isCounter();
public AbstractType<?> getType();

public enum Native implements ParsedType
public enum Native implements CQL3Type
{
ASCII (AsciiType.instance),
BIGINT (LongType.instance),
@@ -68,15 +68,26 @@ public boolean isCounter()
{
return this == COUNTER;
}

@Override
public String toString()
{
return super.toString().toLowerCase();
}
}

public static class Custom implements ParsedType
public static class Custom implements CQL3Type
{
private final AbstractType<?> type;

public Custom(AbstractType<?> type)
{
this.type = type;
}

public Custom(String className) throws SyntaxException, ConfigurationException
{
this.type = TypeParser.parse(className);
this(TypeParser.parse(className));
}

public boolean isCollection()
@@ -93,18 +104,24 @@ public boolean isCounter()
{
return false;
}

@Override
public String toString()
{
return "'" + type + "'";
}
}

public static class Collection implements ParsedType
public static class Collection implements CQL3Type
{
CollectionType type;

private Collection(CollectionType type)
public Collection(CollectionType type)
{
this.type = type;
}

public static Collection map(ParsedType t1, ParsedType t2) throws InvalidRequestException
public static Collection map(CQL3Type t1, CQL3Type t2) throws InvalidRequestException
{
if (t1.isCollection() || t2.isCollection())
throw new InvalidRequestException("map type cannot contain another collection");
@@ -114,7 +131,7 @@ public static Collection map(ParsedType t1, ParsedType t2) throws InvalidRequestException
return new Collection(MapType.getInstance(t1.getType(), t2.getType()));
}

public static Collection list(ParsedType t) throws InvalidRequestException
public static Collection list(CQL3Type t) throws InvalidRequestException
{
if (t.isCollection())
throw new InvalidRequestException("list type cannot contain another collection");
@@ -124,7 +141,7 @@ public static Collection list(ParsedType t) throws InvalidRequestException
return new Collection(ListType.getInstance(t.getType()));
}

public static Collection set(ParsedType t) throws InvalidRequestException
public static Collection set(CQL3Type t) throws InvalidRequestException
{
if (t.isCollection())
throw new InvalidRequestException("set type cannot contain another collection");
@@ -148,5 +165,21 @@ public boolean isCounter()
{
return false;
}

@Override
public String toString()
{
switch (type.kind)
{
case LIST:
return "list<" + ((ListType)type).elements.asCQL3Type() + ">";
case SET:
return "set<" + ((SetType)type).elements.asCQL3Type() + ">";
case MAP:
MapType mt = (MapType)type;
return "map<" + mt.keys.asCQL3Type() + ", " + mt.values.asCQL3Type() + ">";
}
throw new AssertionError();
}
}
}
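
The `toString` overrides added above render a type back to its CQL3 spelling. The stand-alone model below sketches that behavior under simplifying assumptions: `Cql3TypeNames`, its `Type` interface and the reduced set of `Native` values are hypothetical and do not wrap `AbstractType`. It also mirrors the rule enforced by the `map`, `list` and `set` factories that collections cannot be nested, and it renders maps as `map<K, V>`.

```java
import java.util.Locale;

// Hypothetical, self-contained model of how CQL3 types print themselves.
public class Cql3TypeNames {
    interface Type {
        boolean isCollection();
    }

    enum Native implements Type {
        INT, TEXT, UUID, TIMESTAMP;

        public boolean isCollection() { return false; }

        // Native type names print in lower case, e.g. "int".
        @Override
        public String toString() { return name().toLowerCase(Locale.ROOT); }
    }

    static final class Collection implements Type {
        enum Kind { LIST, SET, MAP }

        private final Kind kind;
        private final Type elements; // list/set elements, or map keys
        private final Type values;   // map values; null for list and set

        private Collection(Kind kind, Type elements, Type values) {
            this.kind = kind;
            this.elements = elements;
            this.values = values;
        }

        static Collection list(Type t) { return new Collection(Kind.LIST, notCollection(t, "list"), null); }
        static Collection set(Type t) { return new Collection(Kind.SET, notCollection(t, "set"), null); }
        static Collection map(Type k, Type v) {
            return new Collection(Kind.MAP, notCollection(k, "map"), notCollection(v, "map"));
        }

        // Collections of collections are rejected, as in the factories in the diff above.
        private static Type notCollection(Type t, String what) {
            if (t.isCollection()) {
                throw new IllegalArgumentException(what + " type cannot contain another collection");
            }
            return t;
        }

        public boolean isCollection() { return true; }

        @Override
        public String toString() {
            switch (kind) {
                case LIST: return "list<" + elements + ">";
                case SET:  return "set<" + elements + ">";
                case MAP:  return "map<" + elements + ", " + values + ">";
            }
            throw new AssertionError(kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(Collection.list(Native.INT));              // list<int>
        System.out.println(Collection.map(Native.TEXT, Native.UUID)); // map<text, uuid>
        try {
            Collection.set(Collection.list(Native.INT));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // set type cannot contain another collection
        }
    }
}
```
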
82 changes: 45 additions & 37 deletions src/java/org/apache/cassandra/cql3/Cql.g
@@ -662,8 +662,8 @@ map_literal returns [Map<Term, Term> value]
;

finalTerm returns [Term term]
: t=(STRING_LITERAL | UUID | INTEGER | FLOAT | K_TRUE | K_FALSE ) { $term = new Term($t.text, $t.type); }
| f=(K_MIN_TIMEUUID | K_MAX_TIMEUUID | K_NOW) '(' (v=(STRING_LITERAL | INTEGER))? ')' { $term = new Term($f.text + "(" + ($v == null ? "" : $v.text) + ")", UUID); }
: t=(STRING_LITERAL | UUID | INTEGER | FLOAT | BOOLEAN | HEXNUMBER ) { $term = new Term($t.text, $t.type); }
| f=(K_MIN_TIMEUUID | K_MAX_TIMEUUID | K_NOW) '(' (v=(STRING_LITERAL | INTEGER))? ')' { $term = new Term($f.text + "(" + ($v == null ? "" : $v.text) + ")", Term.Type.UUID, true); }
;

term returns [Term term]
@@ -741,15 +741,15 @@ property[PropertyDefinitions props]
;
propertyValue returns [String str]
: v=(STRING_LITERAL | IDENT | INTEGER | FLOAT | K_TRUE | K_FALSE) { $str = $v.text; }
: v=(STRING_LITERAL | IDENT | INTEGER | FLOAT | BOOLEAN | HEXNUMBER) { $str = $v.text; }
| u=unreserved_keyword { $str = u; }
;
// Either a string or a list of terms
tokenDefinition returns [Pair<String, List<Term>> tkdef]
// Either a term or a list of terms
tokenDefinition returns [Pair<Term, List<Term>> tkdef]
: K_TOKEN { List<Term> l = new ArrayList<Term>(); }
'(' t1=term { l.add(t1); } ( ',' tn=term { l.add(tn); } )* ')' { $tkdef = Pair.<String, List<Term>>create(null, l); }
| t=STRING_LITERAL { $tkdef = Pair.<String, List<Term>>create($t.text, null); }
'(' t1=term { l.add(t1); } ( ',' tn=term { l.add(tn); } )* ')' { $tkdef = Pair.<Term, List<Term>>create(null, l); }
| t=term { $tkdef = Pair.<Term, List<Term>>create(t, null); }
;
relation[List<Relation> clauses]
@@ -764,10 +764,10 @@ relation[List<Relation> clauses]
}
else
{
Term str = tkd.left == null ? null : new Term(tkd.left, Term.Type.STRING);
Term tokenLitteral = tkd.left;
for (int i = 0; i < l.size(); i++)
{
Term tt = str == null ? Term.tokenOf(tkd.right.get(i)) : str;
Term tt = tokenLitteral == null ? Term.tokenOf(tkd.right.get(i)) : tokenLitteral;
$clauses.add(new Relation(l.get(i), $type.text, tt, true));
}
}
@@ -776,13 +776,13 @@ relation[List<Relation> clauses]
'(' f1=term { rel.addInValue(f1); } (',' fN=term { rel.addInValue(fN); } )* ')' { $clauses.add(rel); }
;
comparatorType returns [ParsedType t]
comparatorType returns [CQL3Type t]
: c=native_type { $t = c; }
| c=collection_type { $t = c; }
| s=STRING_LITERAL
{
try {
$t = new ParsedType.Custom($s.text);
$t = new CQL3Type.Custom($s.text);
} catch (SyntaxException e) {
addRecognitionError("Cannot parse type " + $s.text + ": " + e.getMessage());
} catch (ConfigurationException e) {
@@ -791,36 +791,36 @@ comparatorType returns [ParsedType t]
}
;
native_type returns [ParsedType t]
: K_ASCII { $t = ParsedType.Native.ASCII; }
| K_BIGINT { $t = ParsedType.Native.BIGINT; }
| K_BLOB { $t = ParsedType.Native.BLOB; }
| K_BOOLEAN { $t = ParsedType.Native.BOOLEAN; }
| K_COUNTER { $t = ParsedType.Native.COUNTER; }
| K_DECIMAL { $t = ParsedType.Native.DECIMAL; }
| K_DOUBLE { $t = ParsedType.Native.DOUBLE; }
| K_FLOAT { $t = ParsedType.Native.FLOAT; }
| K_INET { $t = ParsedType.Native.INET;}
| K_INT { $t = ParsedType.Native.INT; }
| K_TEXT { $t = ParsedType.Native.TEXT; }
| K_TIMESTAMP { $t = ParsedType.Native.TIMESTAMP; }
| K_UUID { $t = ParsedType.Native.UUID; }
| K_VARCHAR { $t = ParsedType.Native.VARCHAR; }
| K_VARINT { $t = ParsedType.Native.VARINT; }
| K_TIMEUUID { $t = ParsedType.Native.TIMEUUID; }
;
collection_type returns [ParsedType pt]
native_type returns [CQL3Type t]
: K_ASCII { $t = CQL3Type.Native.ASCII; }
| K_BIGINT { $t = CQL3Type.Native.BIGINT; }
| K_BLOB { $t = CQL3Type.Native.BLOB; }
| K_BOOLEAN { $t = CQL3Type.Native.BOOLEAN; }
| K_COUNTER { $t = CQL3Type.Native.COUNTER; }
| K_DECIMAL { $t = CQL3Type.Native.DECIMAL; }
| K_DOUBLE { $t = CQL3Type.Native.DOUBLE; }
| K_FLOAT { $t = CQL3Type.Native.FLOAT; }
| K_INET { $t = CQL3Type.Native.INET;}
| K_INT { $t = CQL3Type.Native.INT; }
| K_TEXT { $t = CQL3Type.Native.TEXT; }
| K_TIMESTAMP { $t = CQL3Type.Native.TIMESTAMP; }
| K_UUID { $t = CQL3Type.Native.UUID; }
| K_VARCHAR { $t = CQL3Type.Native.VARCHAR; }
| K_VARINT { $t = CQL3Type.Native.VARINT; }
| K_TIMEUUID { $t = CQL3Type.Native.TIMEUUID; }
;
collection_type returns [CQL3Type pt]
: K_MAP '<' t1=comparatorType ',' t2=comparatorType '>'
{ try {
// if we can't parse either t1 or t2, antlr will "recover" and we may have t1 or t2 null.
if (t1 != null && t2 != null)
$pt = ParsedType.Collection.map(t1, t2);
$pt = CQL3Type.Collection.map(t1, t2);
} catch (InvalidRequestException e) { addRecognitionError(e.getMessage()); } }
| K_LIST '<' t=comparatorType '>'
{ try { if (t != null) $pt = ParsedType.Collection.list(t); } catch (InvalidRequestException e) { addRecognitionError(e.getMessage()); } }
{ try { if (t != null) $pt = CQL3Type.Collection.list(t); } catch (InvalidRequestException e) { addRecognitionError(e.getMessage()); } }
| K_SET '<' t=comparatorType '>'
{ try { if (t != null) $pt = ParsedType.Collection.set(t); } catch (InvalidRequestException e) { addRecognitionError(e.getMessage()); } }
{ try { if (t != null) $pt = CQL3Type.Collection.set(t); } catch (InvalidRequestException e) { addRecognitionError(e.getMessage()); } }
;
username
@@ -945,9 +945,6 @@ K_WRITETIME: W R I T E T I M E;
K_MAP: M A P;
K_LIST: L I S T;
K_TRUE: T R U E;
K_FALSE: F A L S E;
K_MIN_TIMEUUID: M I N T I M E U U I D;
K_MAX_TIMEUUID: M A X T I M E U U I D;
K_NOW: N O W;
@@ -1027,10 +1024,21 @@ FLOAT
| INTEGER '.' DIGIT* EXPONENT?
;
/*
* This has to be before IDENT so it takes precedence over it.
*/
BOOLEAN
: T R U E | F A L S E
;
IDENT
: LETTER (LETTER | DIGIT | '_')*
;
HEXNUMBER
: '0' X HEX+
;
UUID
: HEX HEX HEX HEX HEX HEX HEX HEX '-'
HEX HEX HEX HEX '-'
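
The comment on the `BOOLEAN` rule relies on how the lexer resolves ties. A common convention is longest match first, then the earliest-declared rule on equal length; the sketch below implements that convention with `java.util.regex`. It is not how the ANTLR-generated lexer is written, and `RuleOrderLexer` with its four simplified rules is hypothetical. It shows why `TRUE` becomes a `BOOLEAN` rather than an `IDENT`, why `trueish` is still an `IDENT`, and why `0x2a` lexes as one `HEXNUMBER` rather than `0` followed by an identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: longest match wins; on equal length, the earlier-declared rule wins.
public class RuleOrderLexer {
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Simplified subset; BOOLEAN, IDENT and HEXNUMBER keep their relative order from Cql.g.
    static final List<Rule> RULES = List.of(
            new Rule("INTEGER", "-?[0-9]+"),
            new Rule("BOOLEAN", "(?i)(true|false)"),
            new Rule("IDENT", "[a-zA-Z][a-zA-Z0-9_]*"),
            new Rule("HEXNUMBER", "0[xX][0-9a-fA-F]+"));

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : RULES) {
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(best.name + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // [BOOLEAN(TRUE), IDENT(trueish), HEXNUMBER(0x2a), INTEGER(42)]
        System.out.println(tokenize("TRUE trueish 0x2a 42"));
    }
}
```
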