diff --git a/docs/ws/api.md b/docs/ws/api.md index c7b8c044..8349a856 100644 --- a/docs/ws/api.md +++ b/docs/ws/api.md @@ -12,55 +12,89 @@ Bullet-BQL provides users with a friendly SQL-like API to submit queries to the ## Statement Syntax - SELECT select - FROM stream +`query` is one of + + innerQuery + outerQuery + +where `innerQuery` is + + SELECT select FROM stream + ( LATERAL VIEW lateralView )? ( WHERE expression )? - ( GROUP BY expression ( , expression )* )? + ( GROUP BY expressions )? ( HAVING expression )? ( ORDER BY orderBy )? ( WINDOWING window )? ( LIMIT Integer )? - ';'? - -where `select` is + +and `outerQuery` is + SELECT select FROM ( innerQuery ) + ( LATERAL VIEW lateralView )? + ( WHERE expression )? + ( GROUP BY expressions )? + ( HAVING expression )? + ( ORDER BY orderBy )? + ( LIMIT Integer )? + +where `select` is + DISTINCT? selectItem ( , selectItem )* - + and `selectItem` is one of expression ( AS? identifier )? + tableFunction * and `expression` is one of valueExpression - fieldExpression + fieldExpression ( : fieldType )? + subFieldExpression ( : fieldType )? + subSubFieldExpression ( : fieldType )? listExpression expression IS NULL expression IS NOT NULL unaryExpression - functionExpression - expression NOT? IN expression - expression RLIKE ANY? expression - expression ( * | / ) expression + functionExpression + expression ( * | / | % ) expression expression ( + | - ) expression expression ( < | <= | > | >= ) ( ANY | ALL )? expression - expression ( = | != ) ( ANY | ALL )? expression + expression ( = | != ) ( ANY | ALL )? expression + expression NOT? RLIKE ANY? expression + expression NOT? IN expression + expression NOT? IN ( expressions ) + expression NOT? 
BETWEEN ( expression, expression ) expression AND expression expression XOR expression expression OR expression ( expression ) -where `valueExpression` is one of Null, Boolean, Integer, Long, Float, Double, or String +and `expressions` is + + expression ( , expression )* -and `fieldExpression` is one of +where `valueExpression` is one of Null, Boolean, Integer, Long, Float, Double, String, or `NOW` - a keyword that is converted to the current unix time in milliseconds - identifier ( : fieldType )? - identifier [ Integer ] ( : fieldType )? - identifier [ Integer ] . identifier ( : fieldType )? - identifier . identifier ( : fieldType )? - identifier . identifier . identifier ( : fieldType )? +and `fieldExpression` is + identifier + +and `subFieldExpression` is one of + + fieldExpression [ Integer ] + fieldExpression [ String ] + fieldExpression [ expression ] + fieldExpression . identifier + +and `subSubFieldExpression` is one of + + subFieldExpression [ String ] + subFieldExpression [ expression ] + subFieldExpression . identifier + `fieldType` is one of primitiveType @@ -68,22 +102,25 @@ and `fieldExpression` is one of MAP [ primitiveType ] LIST [ MAP [ primitiveType ] ] MAP [ MAP [ primitiveType ] ] - + and `primitiveType` is `INTEGER`, `LONG`, `FLOAT`, `DOUBLE`, `BOOLEAN`, or `STRING` where `listExpression` is one of - + [] - [ expression ( , expression )* ] - -`unaryExpression` is + [ expressions ] +`unaryExpression` is + ( NOT | SIZEOF ) ( expression ) with optional parentheses + ( ABS | TRIM | LOWER | UPPER ) ( expression ) with non-optional parentheses `functionExpression` is one of - ( SIZEIS | CONTAINSKEY | CONTAINSVALUE | FILTER ) ( expression, expression ) - IF ( expression ( , expression )* ) three arguments + ( SIZEIS | CONTAINSKEY | CONTAINSVALUE | FILTER ) ( expression , expression ) + UNIXTIMESTAMP ( expressions? ) zero, one, or two arguments + SUBSTRING ( expressions? ) two or three arguments + ( IF | BETWEEN ) ( expressions? 
) three arguments aggregateExpression CAST ( expression AS primitiveType ) @@ -104,23 +141,32 @@ and `inputMode` is one of MANUAL, Number ( , Number )* defined points +and `tableFunction` is one of + + OUTER? EXPLODE ( expression ) AS identifier explode a list to one column + OUTER? EXPLODE ( expression ) AS ( identifier , identifier ) explode a map to a key and a value column + and `stream` is one of STREAM() default time duration will be set from BQLConfig - STREAM( ( Integer | MAX ), TIME ) time based duration control + STREAM( ( Integer | MAX ), TIME ) time based duration control `RECORD` will be supported in the future. -and `orderBy` is +and `lateralView` is + + tableFunction (LATERAL VIEW tableFunction)* + +and `orderBy` is expression ( ASC | DESC )? ( , expression ( ASC | DESC )? )* -and `window` is one of +and `window` is one of EVERY ( Integer, ( TIME | RECORD ), include ) TUMBLING ( Integer, ( TIME | RECORD ) ) -`include` is one of +`include` is one of ALL FIRST, Integer, ( TIME | RECORD ) diff --git a/docs/ws/examples.md b/docs/ws/examples.md index 0c24910f..00dfa91e 100644 --- a/docs/ws/examples.md +++ b/docs/ws/examples.md @@ -156,6 +156,44 @@ WHERE NOT CONTAINSVALUE(data_map, 'btsg8l9b234ha') LIMIT 1; ``` +### Filtering with NOW Keyword + +```SQL +SELECT * +FROM STREAM(30000, TIME) +WHERE event_timestamp >= NOW +LIMIT 10; +``` + +### BETWEEN Filter + +This query checks whether the field ```heart_rate``` is between 70 and 100, inclusive, and returns all records for which this is true. The ```BETWEEN``` operator can be written in two ways, as shown below. + +```SQL +SELECT * +FROM STREAM(30000, TIME) +WHERE heart_rate BETWEEN (70, 100) +LIMIT 10; +``` + +```SQL +SELECT * +FROM STREAM(30000, TIME) +WHERE BETWEEN(heart_rate, 70, 100) +LIMIT 10; +``` + +### IN Filter + +This query checks whether the field ```color``` is in the given list and returns all records for which this is true. 
+ +```SQL +SELECT * +FROM STREAM(30000, TIME) +WHERE color IN ('red', 'green', 'blue') +LIMIT 10; +``` + ### Relational Filter comparing to other fields Instead of comparing to static, constant values, you may use the extended values notation and set ```kind``` to ```FIELD``` to compare to other fields within the same record. The following query returns the first record for which the ```id``` field is set to the ```uid``` field. @@ -1004,6 +1042,109 @@ subtract 24 from it, you get the lower bound of the true count. Note that this also means the order of the items could be off. If two items had ```Count``` within 24 of each other, it is possible that the higher one *may* actually have had a true count *lower* than the second one and possibly be ranked higher. There is no such situation in this result set. +### Lateral View Explode + +```SQL +SELECT student, score +FROM STREAM(30000, TIME) +LATERAL VIEW EXPLODE(test_scores) AS (student, score) +WHERE score >= 80 +LIMIT 10; +``` + +This query explodes the map ```test_scores``` to the fields ```student``` and ```score```. This effectively generates a record with a key field and value field for each entry in the exploded map. +The lateral view means the generated records are appended to the original record, though in this query, only the exploded fields have been selected. + +```javascript +{ + "records":[ + { + "student": "Roger", + "score": 90 + }, + { + "student": "Albert", + "score": 92 + }, + { + "student": "Emily", + "score": 90 + }, + { + "student": "Winston", + "score": 81 + }, + { + "student": "Jeff", + "score": 95 + }, + { + "student": "Kristen", + "score": 97 + }, + { + "student": "Percy", + "score": 85 + }, + { + "student": "Tyson", + "score": 80 + }, + { + "student": "Jackie", + "score": 89 + }, + { + "student": "Alice", + "score": 100 + } + ], + "meta": "" +} +``` + +### Multiple Lateral View Explodes + +Multiple lateral view explodes can also be chained in the same query. 
For instance, building on the above example, suppose that instead of the map ```test_scores```, there is a list of maps ```tests```. +This list could be exploded into a field ```test_scores```, which could then be exploded into the fields ```student``` and ```score``` as before. + +```SQL +SELECT student, score +FROM STREAM(30000, TIME) +LATERAL VIEW EXPLODE(tests) AS test_scores +LATERAL VIEW EXPLODE(test_scores) AS (student, score) +WHERE score >= 80 +LIMIT 10; +``` + +### Outer Query + +```SQL +SELECT COUNT(*) +FROM ( + SELECT browser_name, COUNT(*) + FROM STREAM(30000, TIME) + GROUP BY browser_name + HAVING COUNT(*) > 10 +) +``` + +This query has an inner query wrapped by an outer query. Note that the inner query selects from ```STREAM``` and is thus the main query, while the outer query selects from the inner query. +Note also that the inner/main query can have a window while the outer query cannot. + +The query above counts the number of browser names that appear more than 10 times in 30 seconds. + +```javascript +{ + "records":[ + { + "COUNT(*)": 6 + } + ], + "meta": "" +} +``` + ### Window - Tumbling Group-By ```SQL