Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't filter by more than one indexed column #208

Open
yeudit opened this issue Jan 12, 2015 · 9 comments
Open

Can't filter by more than one indexed column #208

yeudit opened this issue Jan 12, 2015 · 9 comments

Comments

@yeudit
Copy link

yeudit commented Jan 12, 2015

I'm getting an 'Cequel::Record::IllegalQuery: Can't scope by more than one indexed column in the same query' exception.
I have a record class:

class Space
  include Cequel::Record
  timestamps

  key :id, :uuid, :auto => true
  column :name,         :text,     :index => true
  column :type,          :text,      :index => true
end

I'm trying to get all spaces with name => x and type => y. Example:
Space.where({name: 'name', type: 'type'})

This raises 'Cequel::Record::IllegalQuery: Can't scope by more than one indexed column in the same query' error.
I want to know how can I filter records by 2 or more columns?

@Micka33
Copy link

Micka33 commented Mar 2, 2015

From the cassandra documentation

CREATE TABLE ruling_stewards (
  steward_name text,
  king text,
  reign_start int,
  event text,
  PRIMARY KEY (steward_name, king, reign_start)
);

This query constructs a filter that selects data about stewards whose reign started by 2450 and ended before 2500. If king were not a component of the primary key, you would need to create an index on king to use this query:

SELECT * FROM ruling_stewards
  WHERE king = 'Brego'
  AND reign_start >= 2450
  AND reign_start < 2500 ALLOW FILTERING;

The output is:

steward_name | king  | reign_start | event
--------------+-------+-------------+--------------------
      Boromir | Brego |        2477 |   Attacks continue
       Cirion | Brego |        2489 | Defeat of Balchoth

To allow Cassandra to select a contiguous ordering of rows, you need to include the king component of the primary key in the filter using an equality condition. The ALLOW FILTERING clause is also required.

ALLOW FILTERING clause

When you attempt a potentially expensive query, such as searching a range of rows, this prompt appears:

Bad Request: Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance. If you want
to execute this query despite the performance unpredictability,
use ALLOW FILTERING.

To run the query, use the ALLOW FILTERING clause. Imposing a limit using the LIMIT n clause is recommended to reduce memory used.


I guess the solution is to add ALLOW FILTERING to the request sent by cequel. however I can't find in the documentation how to do that.

@outoftime
Copy link
Member

@Micka33 Cequel currently doesn't allow filtering, but I've been thinking it would be best to change that

@Micka33
Copy link

Micka33 commented Mar 3, 2015

@outoftime Yes, I think it's should be accessible in some way.
Something like MyModel.where(...).allow_filtering.
I'd be happy to add it myself, but I don't have much time at the moment to get familiar with the code, understand if there are any implications or if it's as simple as a += ' ALLOW FILTERING'.

@yeudit
Copy link
Author

yeudit commented Mar 3, 2015

I tried to add allow_filtering to my query:
MyModel.where({:filed1=>'x',:filed2=>'y'}).allow_filtering
and it's still failed on the same error:
Cequel::Record::IllegalQuery: Can't scope by more than one indexed column in the same query

this error raised before the query executed, it is from:
https://github.com/cequel/cequel/blob/master/lib/cequel/record/record_set.rb#L766

but when my query filters by only one field:
MyModel.where({:field1=>'x'})
it works for me.

the problem is when I'm trying to filter by more than one field.

Thanks

@Micka33
Copy link

Micka33 commented Mar 3, 2015

@yeudit .allow_filtering do not exist yet.
Now, looking at what you are trying to do I believe you are not making the right use of either Cassandra or your model.
I suggest that you read the answer of this stackoverflow post about Table design.

Also here is part of a model I have.

  class Propositions
    include Cequel::Record
    key :id,                :bigint,      { partition: true }
    key :ending_period,     :timestamp,   { order: :desc }
    key :starting_period,   :timestamp
    key :ia,                :text

    column :resourceType,   :text
  end

I suggest that you read about compound keys and about what a partition key is.

When you add :index => true to a column, you are making it a secondary index. The error you have is telling you that you can't scope using 2 indexed column in the same query. And, if you read the whole answer in the stackoverflow link I gave above, you know it's not the best idea to use secondary indexes.


Hint: When you query in your rails console, cequel display the CQL used. Try to execute it directly into a cqlsh terminal. If you get the same error, then refer to the datastax documentation to see why cassandra reject your query.

@phoet
Copy link
Contributor

phoet commented Jan 19, 2017

i'm not an cassandra expert, so i don't know if this considered a bad practice, but i would really like to use allow filtering.

those are the use-cases:

  • explore data in rails-console instead of cqlsh
  • when secondary index is not a good option (has a high cardinality)
  • i want cassandra to filter a result-set, ie lookup posts for a user with multiple conditions

something that might be explicit and easy to implement could be like

# reference-id would have high cardinality, so not a good secondary index
# column :reference_id, :bigint, index: true
column :reference_id, :bigint, allow_filtering: true # i know what i'm doing here!

@pezra
Copy link
Contributor

pezra commented Jan 19, 2017

I think it would be a bit too dangerous to globally allow filtering on a column. Whether or not filtering will have acceptable performance depends on the rest of the query, in addition to the table structure. I would merge a PR that provided a way to add allow filtering to individual queries, though.

@phoet
Copy link
Contributor

phoet commented Jan 19, 2017

@pezra what do you think should the api look like? something like this?

@pezra
Copy link
Contributor

pezra commented Jan 20, 2017

I think @Micka33's suggestion of MyModel.where(...).allow_filtering is a good one. Though, I would add a ! to the method name since it is a little dangerous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants