Add non skip take pagination #341

Merged
merged 16 commits into from Apr 7, 2019
17 changes: 16 additions & 1 deletion README.md
@@ -542,7 +542,8 @@ u.addresses.where(city: 'Chicago').all

But keep in mind Dynamoid -- and document-based storage systems in general -- are not drop-in replacements for existing relational databases. The above query does not efficiently perform a conditional join, but instead finds all the user's addresses and naively filters them in Ruby. For large associations this is a performance hit compared to relational database engines.

#### Limits
#### Pagination
##### Limits / Skip-Take

There are three types of limits that you can query with:

@@ -580,6 +581,20 @@ Address.record_limit(10_000).batch(100).each { … } # Batch specified as part o
Batching means that the underlying requests are issued in chunks of the batch size, which keeps each request and response
manageable. Note that this batching applies to `Query` and `Scan` requests, not to `BatchGetItem` commands.
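
As a rough illustration of the batching described above, the following plain-Ruby sketch issues requests of at most the batch size until the record limit is reached. It does not use Dynamoid or DynamoDB: `TABLE`, `fetch`, and `fetch_in_batches` are hypothetical stand-ins invented for this example and are not part of the library.

```ruby
# In-memory stand-in for a table (assumption: not a real DynamoDB table).
TABLE = (1..25).to_a.freeze

# Hypothetical single request: returns at most `limit` items starting at `start`,
# plus the position to resume from (nil once the table is exhausted).
def fetch(start, limit)
  items = TABLE[start, limit] || []
  next_start = start + items.size
  [items, next_start < TABLE.size ? next_start : nil]
end

# Collects up to `record_limit` items while never asking for more than
# `batch_size` items in a single request.
def fetch_in_batches(record_limit:, batch_size:)
  records = []
  requests = 0
  start = 0
  while start && records.size < record_limit
    limit = [batch_size, record_limit - records.size].min
    items, start = fetch(start, limit)
    requests += 1
    records.concat(items)
  end
  [records, requests]
end

records, requests = fetch_in_batches(record_limit: 12, batch_size: 5)
```

With a record limit of 12 and a batch size of 5, the sketch needs three requests (5, 5, and 2 items).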

##### DynamoDB Native Pages
At times it can be useful to rely on DynamoDB [default pages](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Pagination)
instead of fixed page sizes. Each page results in a single Query or Scan call
to DynamoDB, but returns an unknown number of records.

Access to the native DynamoDB pages can be obtained via the `find_by_pages`
method, which yields arrays of records.

```ruby
Address.find_by_pages do |addresses|
  # addresses is an array of records from a single DynamoDB page
end
```
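
To make the difference between pages and records concrete, here is a minimal plain-Ruby sketch that does not touch DynamoDB. `FAKE_PAGES` and `each_page` are hypothetical names invented for this example; `each_page` only imitates the block-or-enumerator shape used by the methods in this PR.

```ruby
# Stand-in for DynamoDB responses: each inner array is one native page.
# Page sizes differ because DynamoDB bounds pages by response size (1 MB),
# not by record count.
FAKE_PAGES = [
  %w[addr-1 addr-2 addr-3],
  %w[addr-4],
  %w[addr-5 addr-6]
].freeze

# Hypothetical helper: yields one array per page, or returns an Enumerator
# when called without a block.
def each_page
  return enum_for(:each_page) unless block_given?

  FAKE_PAGES.each { |page| yield page }
end

# Block form: the block receives a whole page (an array) each time.
page_sizes = []
each_page { |addresses| page_sizes << addresses.size }

# Enumerator form: flattening the pages gives back individual records.
all_records = each_page.flat_map { |page| page }
```

Callers that want individual records can flatten the pages, which is what the `flat_map { |i| i }` calls elsewhere in this diff do.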
Author

My next PR adding the restart hashes will continue with this section for both fixed limit and native pages.


#### Sort Conditions and Filters

You can optimize a query by adding a condition on the sort key. The following operators are available: `gt`, `lt`, `gte`, `lte`,
16 changes: 7 additions & 9 deletions lib/dynamoid/adapter_plugin/aws_sdk_v3.rb
@@ -532,12 +532,11 @@ def put_item(table_name, object, options = {})
#
# @todo Provide support for various other options http://docs.aws.amazon.com/sdkforruby/api/Aws/DynamoDB/Client.html#query-instance_method
def query(table_name, options = {})
return enum_for(:query, table_name, options) unless block_given?
table = describe_table(table_name)

Enumerator.new do |yielder|
Query.new(client, table, options).call.each do |page|
page.items.each { |row| yielder << result_item_to_hash(row) }
end
Query.new(client, table, options).call.each do |page|
yield page.items.map{ |row| result_item_to_hash(row) }
end
end

@@ -562,12 +561,11 @@ def query_count(table_name, options = {})
#
# @todo: Provide support for various options http://docs.aws.amazon.com/sdkforruby/api/Aws/DynamoDB/Client.html#scan-instance_method
def scan(table_name, conditions = {}, options = {})
return enum_for(:scan, table_name, conditions, options) unless block_given?
table = describe_table(table_name)

Enumerator.new do |yielder|
Scan.new(client, table, conditions, options).call.each do |page|
page.items.each { |row| yielder << result_item_to_hash(row) }
end
Scan.new(client, table, conditions, options).call.each do |page|
yield page.items.map{ |row| result_item_to_hash(row) }
end
end

@@ -591,7 +589,7 @@ def truncate(table_name)
hk = table.hash_key
rk = table.range_key

scan(table_name, {}, {}).each do |attributes|
scan(table_name, {}, {}).flat_map{ |i| i }.each do |attributes|
opts = {}
opts[:range_key] = attributes[rk.to_sym] if rk
delete_item(table_name, attributes[hk], opts)
2 changes: 1 addition & 1 deletion lib/dynamoid/criteria.rb
@@ -8,7 +8,7 @@ module Criteria
extend ActiveSupport::Concern

module ClassMethods
%i[where all first last each record_limit scan_limit batch start scan_index_forward].each do |meth|
%i[where all first last each record_limit scan_limit batch start scan_index_forward find_by_pages].each do |meth|
# Return a criteria chain in response to a method that will begin or end a chain. For more information,
# see Dynamoid::Criteria::Chain.
#
65 changes: 42 additions & 23 deletions lib/dynamoid/criteria/chain.rb
@@ -75,12 +75,12 @@ def delete_all
ranges = []

if key_present?
Dynamoid.adapter.query(source.table_name, range_query).collect do |hash|
Dynamoid.adapter.query(source.table_name, range_query).flat_map{ |i| i }.collect do |hash|
ids << hash[source.hash_key.to_sym]
ranges << hash[source.range_key.to_sym] if source.range_key
end
else
Dynamoid.adapter.scan(source.table_name, scan_query, scan_opts).collect do |hash|
Dynamoid.adapter.scan(source.table_name, scan_query, scan_opts).flat_map{ |i| i }.collect do |hash|
ids << hash[source.hash_key.to_sym]
ranges << hash[source.range_key.to_sym] if source.range_key
end
@@ -128,6 +128,10 @@ def each(&block)
records.each(&block)
end

def find_by_pages(&block)
Author

What is the final verdict on the naming for this? If you would prefer `pages`, I can move that method up.

Member

The name find_by_pages looks good.

pages.each(&block)
end

private

# The actual records referenced by the association.
@@ -136,42 +140,57 @@ def each(&block)
#
# @since 0.2.0
def records
pages.lazy.flat_map { |i| i }
Author

The explicit `lazy` turns out to be needed here; without it, `flat_map` ends up calling scan/query immediately.
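
The behaviour described in this comment can be reproduced in plain Ruby without DynamoDB. In the sketch below, `pages` is a hypothetical stand-in for the paged adapter call, and `calls` records which pages are actually fetched. It is only an illustration of eager versus lazy `flat_map`, not the code from this PR.

```ruby
calls = []

# Stand-in for the paged query/scan: logs every page that is actually produced.
pages = Enumerator.new do |yielder|
  [[1, 2], [3, 4], [5, 6]].each_with_index do |page, index|
    calls << index
    yielder << page
  end
end

# Eager: Enumerable#flat_map walks every page right away and returns an Array.
eager = pages.flat_map { |page| page }
eager_calls = calls.dup

calls.clear

# Lazy: building the chain fetches nothing; pages are pulled only on demand.
lazy_records = pages.lazy.flat_map { |page| page }
calls_before_use = calls.dup
first_three = lazy_records.first(3)
calls_after_use = calls.dup
```

In the eager case all three pages are fetched as soon as `flat_map` is called. In the lazy case nothing is fetched until `first(3)` runs, and the third page is never requested.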

end

# Arrays of records, sized based on the actual pages produced by DynamoDB
#
# @return [Enumerator] an iterator of the found records.
#
# @since 3.1.0
def pages
if key_present?
records_via_query
pages_via_query
else
records_via_scan
issue_scan_warning if Dynamoid::Config.warn_on_scan && query.present?
pages_via_scan
end
end

def records_via_query
Enumerator.new do |yielder|
Dynamoid.adapter.query(source.table_name, range_query).each do |hash|
yielder.yield source.from_database(hash)
end
# If the query matches an index, we'll query the associated table to find results.
#
# @return [Enumerator] an iterator of the found pages. An array of records
#
# @since 3.1.0
def pages_via_query
return enum_for(:pages_via_query) unless block_given?

Dynamoid.adapter.query(source.table_name, range_query).each do |items|
yield items.map { |hash| source.from_database(hash) }
end
end

# If the query does not match an index, we'll manually scan the associated table to find results.
#
# @return [Enumerator] an iterator of the found records.
# @return [Enumerator] an iterator of the found pages. An array of records
#
# @since 0.2.0
def records_via_scan
if Dynamoid::Config.warn_on_scan && query.present?
Dynamoid.logger.warn 'Queries without an index are forced to use scan and are generally much slower than indexed queries!'
Dynamoid.logger.warn "You can index this query by adding index declaration to #{source.to_s.downcase}.rb:"
Dynamoid.logger.warn "* global_secondary_index hash_key: 'some-name', range_key: 'some-another-name'"
Dynamoid.logger.warn "* local_secondary_index range_key: 'some-name'"
Dynamoid.logger.warn "Not indexed attributes: #{query.keys.sort.collect { |name| ":#{name}" }.join(', ')}"
end
# @since 3.1.0
def pages_via_scan
return enum_for(:pages_via_scan) unless block_given?

Enumerator.new do |yielder|
Dynamoid.adapter.scan(source.table_name, scan_query, scan_opts).each do |hash|
yielder.yield source.from_database(hash)
end
Dynamoid.adapter.scan(source.table_name, scan_query, scan_opts).each do |items|
yield items.map { |hash| source.from_database(hash) }
end
end

def issue_scan_warning
Dynamoid.logger.warn 'Queries without an index are forced to use scan and are generally much slower than indexed queries!'
Dynamoid.logger.warn "You can index this query by adding index declaration to #{source.to_s.downcase}.rb:"
Dynamoid.logger.warn "* global_secondary_index hash_key: 'some-name', range_key: 'some-another-name'"
Dynamoid.logger.warn "* local_secondary_index range_key: 'some-name'"
Dynamoid.logger.warn "Not indexed attributes: #{query.keys.sort.collect { |name| ":#{name}" }.join(', ')}"
end

def count_via_query
Dynamoid.adapter.query_count(source.table_name, range_query)
end
4 changes: 2 additions & 2 deletions lib/dynamoid/finders.rb
@@ -183,7 +183,7 @@ def find_by_composite_key(hash_key, range_key, options = {})
def find_all_by_composite_key(hash_key, options = {})
ActiveSupport::Deprecation.warn('[Dynamoid] .find_all_composite_key is deprecated! Call .where instead of')

Dynamoid.adapter.query(table_name, options.merge(hash_value: hash_key)).collect do |item|
Dynamoid.adapter.query(table_name, options.merge(hash_value: hash_key)).flat_map{ |i| i }.collect do |item|
from_database(item)
end
end
@@ -240,7 +240,7 @@ def find_all_by_secondary_index(hash, options = {})
opts[range_op_mapped] = range_key_value
end
dynamo_options = opts.merge(options.reject { |key, _| key == :range })
Dynamoid.adapter.query(table_name, dynamo_options).map do |item|
Dynamoid.adapter.query(table_name, dynamo_options).flat_map{ |i| i }.map do |item|
from_database(item)
end
end