Cep 29 not operator #2556

pkolaczk · 2023-08-08T14:55:18Z

NOT IN
NOT CONTAINS / NOT CONTAINS KEY
Inequality operator (!=) for map entries
Inequality operator (!=) for clustering columns
Both ALLOW FILTERING and indexing (SAI) supported

Ability to use NOT operator in filters on lists, sets and maps. Requires ALLOW FILTERING. All those operators are implemented in a ContainsRestriction, therefore they allow to be mixed: i.e. you can use multiple CONTAINS, NOT CONTAINS, CONTAINS KEY, NOT CONTAINS KEY in a single WHERE clause.

Ability to use NOT IN restrictions in CQL WHERE clause. Can be used on any column with ALLOW FILTERING. Can be also used on clustering columns w/o ALLOW FILTERING. Multicolumn NOT IN restrictions are supported (e.g. (c1, c2) NOT IN ((1, 2), (3, 4))). NOT IN restrictions are implemented as slice restrictions, therefore they can be merged with other slice restrictions using <, >, <=, >=. Also multiple NOT IN restrictions can be merged together. Overview of the code changes: - NOT IN operator has been added to the CQL WHERE clause parser definition. - SliceRestriction got a new field with a list of terms to skip, received from the terms passed in the NOT IN clause. - SliceRestriction methods have been adjusted to generate filters that filter out the items to skip, as well as generate proper clustering bounds, omitting the items to skip. - Because the list of items to skip can be given either as a marker or a list literal, MarkerOrList has been introduced to abstract out the concern of getting the items out of the SliceRestriction. This way both SliceRestriction and INRestriction could be reduced. INRestriction hierarchy has been flattened to a single class. - MultiCBuilder has been rewritten. It has been both simplified and added new functionality. The specialization for single column clusterings has been removed as it had almost doubled the complexity for a questionable performance gain. The code for computing cartesian products has been simplified and optimized: now it has proper O(n^2) complexity instead of O(n^3). MultiCBuilder now also tracks the inclusiveness of each clustering bound. This is required because when merging NOT IN with inequality like <= or >=, we need to mix bounds differing in inclusiveness. - MultiColumnRestriction.appendBoundTo has been modified to use the new MultiCBuilder and also simplified and documented, as this part of code turned out to be extremely hard to understand for me.

Now that we implemented NOT IN, both equals and non-equals can be rewritten as slices. column != value ---> column NOT IN (value) column = value ---> column >= value AND column <= value And because slices can be freely merged, this opens room for many interesting queries that were previously impossible, e.g.: SELECT ... WHERE c1 != ? SELECT ... WHERE c1 != ? AND c1 != ? AND ... SELECT ... WHERE c1 = ? AND (c1, c2) != (v1, v2) SELECT ... WHERE c1 IN ? AND (c1, c2) != (v1, v2)

NOT CONTAINS, NOT CONTAINS KEY and map entry inequality can be now used on SAI indexed columns without need for ALLOW FILTERING. The result sets are evaluated by taking an iterator to all primary keys (in all active memtables and sstables) and anti-joining them with the primary keys matching the CONTAINS predicate, read from the index. This way there was no need to change the index structure. Although a full scan over all primary keys is needed, the scan involves scanning of primary keys only, without deserializing the whole rows. For sstables, the primary key trie component is used, which is likely much smaller than the whole sstable. Therefore, this method is likely faster than filtering, as long as rows are big enough.

src/antlr/Parser.g

src/java/org/apache/cassandra/cql3/MarkerOrList.java

src/java/org/apache/cassandra/cql3/MultiColumnRelation.java

src/java/org/apache/cassandra/cql3/Relation.java

src/java/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictions.java

src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java

src/java/org/apache/cassandra/db/MultiCBuilder.java

src/java/org/apache/cassandra/index/sai/disk/EmptyIndex.java

test/unit/org/apache/cassandra/index/sai/cql/CollectionIndexingTest.java

src/java/org/apache/cassandra/index/sai/IndexContext.java

src/java/org/apache/cassandra/index/sai/disk/IndexSearchResultIterator.java

src/java/org/apache/cassandra/index/sai/disk/PrimaryKeyMapIterator.java

src/java/org/apache/cassandra/index/sai/disk/SSTableIndex.java

src/java/org/apache/cassandra/index/sai/disk/v1/PerColumnIndexFiles.java

src/java/org/apache/cassandra/index/sai/memory/MemtableKeyRangeIterator.java

src/java/org/apache/cassandra/index/sai/memory/TrieMemoryIndex.java

src/java/org/apache/cassandra/index/sai/plan/Operation.java

src/java/org/apache/cassandra/utils/btree/BTree.java

src/java/org/apache/cassandra/cql3/restrictions/TermSlice.java

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

Renamed: CBuilder -> ClusteringBuilder MultiCBuilder -> MultiClusteringBuilder Added a longer explanation on how MultiClusteringBuilder works.

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

When computing non-equality searches (NEQ, NOT CONTAINS, NOT CONTAINS KEY), perform anti-join operation on data of each index separately, before computing the union. This may cause returning false positives from the search, but those will be filtered out by post-filtering. Unfortunately, each index search may return false positive matches, due to presence of updates and deletes that happened after a flush. Hence, we cannot compute a set difference of keys returned from one index with the keys of another index, as subtracting a false positive would lead to a false negative, i.e. missing keys in the search results. We have no way to correct false negatives.

src/java/org/apache/cassandra/db/MultiClusteringBuilder.java

src/java/org/apache/cassandra/cql3/restrictions/SingleRestriction.java

src/java/org/apache/cassandra/db/MultiClusteringBuilder.java

src/java/org/apache/cassandra/cql3/MultiColumnRelation.java

Supports queries of the following type: SELECT * FROM t WHERE indexed_column != ?;

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

adelapena · 2023-11-06T12:43:41Z

src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java

+            boolean supportsSlice = slice.isSupportedBy(columnDef, index);
+            boolean supportsNeq = index.supportsExpression(columnDef, Operator.NEQ);
+            return  supportsSlice || !skippedValues.isEmpty() && supportsNeq;


The implementations of SingleColumnRestriction.SliceRestriction#isSupportedBy and MultiColumnRestriction.SliceRestriction#isSupportedBy are identical. Also, both implementations are the only callers of TermSlice#isSupportedBy(ColumnMetadata, Index).

Maybe we could pass the list of skipped values to TermSlice#isSupportedBy and delegate the NEQ check to it, putting similar things together and preventing some duplication:

public boolean isSupportedBy(ColumnMetadata column, Index index, List<MarkerOrTerms> skippedValues) { boolean supported = false; if (hasBound(Bound.START)) supported |= isInclusive(Bound.START) ? index.supportsExpression(column, Operator.GTE) : index.supportsExpression(column, Operator.GT); if (hasBound(Bound.END)) supported |= isInclusive(Bound.END) ? index.supportsExpression(column, Operator.LTE) : index.supportsExpression(column, Operator.LT); supported |= !skippedValues.isEmpty() && index.supportsExpression(column, Operator.NEQ); return supported; }

I don't agree on this one. This is just coincidental duplication - accidentally both single and multi restrictions work the same way here, but there is no fundamental reason they have to. The amount of duplication is also very tiny. And introducing an unrelated parameter to TermSlice is mixing responsibilities which is IMHO worse.

I thought that maybe we could move the skipped values to the TermSlice itself. Then it would make more sense to have TermSlice handle the isSupportedBy logic as it would have all the information it needs. WDYT?

Agree, moving the skipped values to the TermSlice sounds good to me.

adelapena · 2023-11-06T12:49:42Z

@pkolaczk there is a bunch of unresolved nits above that only show up if you click on the 12 hidden conversations Load more... link on the GH web UI, maybe you have missed them?

src/java/org/apache/cassandra/cql3/MultiColumnRelation.java

adelapena · 2023-11-06T13:37:52Z

I have noticed that this test returns more rows than expected:

@Test
public void notInTest()
{
    createTable("CREATE TABLE %s (k int PRIMARY KEY, v1 int, v2 int)");
    createIndex("CREATE INDEX ON %s(v1) USING 'sai'");
    createIndex("CREATE INDEX ON %s(v2) USING 'sai'");

    execute("INSERT INTO %s (k, v1, v2) VALUES (1, 1, 4)");
    execute("INSERT INTO %s (k, v1, v2) VALUES (2, 2, 3)");
    execute("INSERT INTO %s (k, v1, v2) VALUES (3, 3, 2)");
    execute("INSERT INTO %s (k, v1, v2) VALUES (4, 4, 1)");

    waitForTableIndexesQueryable(currentTable());

    assertRowsIgnoringOrder(execute("SELECT * FROM %s"),
            row(1, 1, 4),
            row(2, 2, 3),
            row(3, 3, 2),
            row(4, 4, 1));

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (1)"),
            row(2, 2, 3),
            row(3, 3, 2),
            row(4, 4, 1));

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (4)"),
            row(1, 1, 4),
            row(2, 2, 3),
            row(3, 3, 2));

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (1, 4)"),
            row(2, 2, 3),
            row(3, 3, 2)); // Got 1 extra row(s) in result: (k=1, v1=1, v2=4)

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (1, 4) AND v2 NOT IN (3, 2)")); // 2 extra rows
}

However, the test passes if we don't create the indexes and use ALLOW FILTERING instead:

@Test
public void notInTest()
{
    createTable("CREATE TABLE %s (k int PRIMARY KEY, v1 int, v2 int)");

    execute("INSERT INTO %s (k, v1, v2) VALUES (1, 1, 4)");
    execute("INSERT INTO %s (k, v1, v2) VALUES (2, 2, 3)");
    execute("INSERT INTO %s (k, v1, v2) VALUES (3, 3, 2)");
    execute("INSERT INTO %s (k, v1, v2) VALUES (4, 4, 1)");

    assertRowsIgnoringOrder(execute("SELECT * FROM %s"),
            row(1, 1, 4),
            row(2, 2, 3),
            row(3, 3, 2),
            row(4, 4, 1));

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (1) ALLOW FILTERING"),
            row(2, 2, 3),
            row(3, 3, 2),
            row(4, 4, 1));

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (4) ALLOW FILTERING"),
            row(1, 1, 4),
            row(2, 2, 3),
            row(3, 3, 2));

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (1, 4) ALLOW FILTERING"),
            row(2, 2, 3),
            row(3, 3, 2));

    assertRowsIgnoringOrder(execute("SELECT * FROM %s WHERE v1 NOT IN (1, 4) AND v2 NOT IN (3, 2) ALLOW FILTERING"));
}

pkolaczk · 2023-11-06T17:49:11Z

there is a bunch of unresolved nits above that only show up if you click on the 12 hidden conversations Load more... link on the GH web UI, maybe you have missed them?

lol, indeed, I missed them.

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

This reverts commit f574547.

pkolaczk · 2023-11-07T17:49:37Z

I reverted the support for != and NOT IN on indexed columns for now as current index search does not allow multiple exclusions.

adelapena · 2023-11-08T12:38:34Z

Albeit the reversal of the support for != and NOT IN on indexed columns has made it necessary to use ALLOW FILTERING again, it seems that the indexes are still involved in the query and producing wrong results:

@Test
public void notInTest()
{
    createTable("CREATE TABLE %s (k int PRIMARY KEY, v int)");

    execute("INSERT INTO %s (k, v) VALUES (1, 1)");
    execute("INSERT INTO %s (k, v) VALUES (2, 2)");
    execute("INSERT INTO %s (k, v) VALUES (3, 3)");
    execute("INSERT INTO %s (k, v) VALUES (4, 4)");

    String query = "SELECT * FROM %s WHERE v NOT IN (1, 4) ALLOW FILTERING";
    Object[][] rows = rows(row(2, 2), row(3, 3));

    assertRowsIgnoringOrder(execute(query), rows);

    createIndex("CREATE INDEX ON %s(v) USING 'sai'");
    waitForTableIndexesQueryable();

    assertRowsIgnoringOrder(execute(query), rows); // Got 1 extra row(s) in result: (k=1, v1=1)
}

I suggest we add some additional tests around the use of unsupported operators with SAI indexes and ALLOW FILTERING.

Mmuzaf · 2023-12-13T12:43:54Z

src/java/org/apache/cassandra/db/ClusteringBuilder.java

@@ -28,9 +28,9 @@
 /**
 * Allows to build ClusteringPrefixes, either Clustering or ClusteringBound.
 */
-public abstract class CBuilder


Hi, I prefer to make these sorts of changes in a dedicated issue, so it looks like we can significantly reduce the number of changed files if we revert it, and keep only necessary changes here.

See the discussion below, we can ignore code formatting changes by listing the relevant commit hashes in a special file (this is not used by the project right now). So if we do this change in another issue and put that commit hash in a special file, it won't pollute the git blame in our IDEs.
https://github.com/orgs/community/discussions/5033
https://git-scm.com/docs/git-blame#Documentation/git-blame.txt-blameignoreRevsFile

This is my humble argument for such not making such a change here :-)

Mmuzaf · 2023-12-13T13:23:44Z

src/java/org/apache/cassandra/index/sai/disk/PrimaryKeyMapIterator.java

+ * <p>
+ * The keys are returned in token-clustering order.
+ */
+public final class PrimaryKeyMapIterator extends KeyRangeIterator


The iterator looks very similar to this one, do we still need to keep it?

private SSTableRowIdKeyRangeIterator(PrimaryKey min, PrimaryKey max, long count, PrimaryKeyMap primaryKeyMap, QueryContext queryContext, PostingList postingList)

pkolaczk added 4 commits August 8, 2023 15:56

pkolaczk force-pushed the cep-29-not-operator branch from 0a98baf to 5a2b739 Compare August 9, 2023 09:48

adelapena reviewed Aug 9, 2023

View reviewed changes

adelapena reviewed Aug 10, 2023

View reviewed changes

adelapena reviewed Aug 11, 2023

View reviewed changes

src/java/org/apache/cassandra/cql3/restrictions/TermSlice.java Show resolved Hide resolved

pkolaczk and others added 5 commits October 24, 2023 12:26

Fix comment in Parser.g (IN -> NOT IN)

46712bc

Use inheritance in MarkerOrList/MarkerOrTerms

c827494

Fix Javadoc in Relation

cf8ddd1

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

Add missing JavaDoc in MultiColumnRelation.java

c5f4a81

Split newContainsRestriction and remove isNot param

90315cc

pkolaczk force-pushed the cep-29-not-operator branch from 9801ef2 to 90315cc Compare October 24, 2023 11:54

pkolaczk and others added 14 commits October 24, 2023 14:49

Replace buildSize() == 0 with buildIsEmpty()

db5ece9

Add missing final and move the constructor up

5288f67

Rename CBuilder to ClusteringBuilder and add docs

fc91240

Renamed: CBuilder -> ClusteringBuilder MultiCBuilder -> MultiClusteringBuilder Added a longer explanation on how MultiClusteringBuilder works.

Remove double blank lines in MultiClusteringBuilder

6e8d818

Remove unused imports

f783b3e

Remove unused IndexContext#isCollection

b57f4be

Fix typo in a comment

26f0d27

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

Format Javadoc link

9e2aa46

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

Remove unneeded this

7c710d3

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

Remove uneeded blank line

6f4b166

Co-authored-by: Andrés de la Peña <adelapena@users.noreply.github.com>

Remove extra blank line in CollectionIndexingTest

a305a1a

Remove extra blank lines in TrieMemoryIndex

db56a76

Remove unused AND_NOT operator

570ce2a

Uncomment accidentally commented out code in BTree

2b74005

pkolaczk added 2 commits October 26, 2023 11:04

Add final to BOTTOM, TOP and ROOT in MultiClusteringBuilder

e7ac2f9