Skip to content

BuildableSearch#complete!.to_a auto-paginates silently and Limit is per-page (companion to #153) #157

@thomaswitt

Description

@thomaswitt

Describe the feature

Two coordinated improvements to Aws::Record::BuildableSearch and Aws::Record::ItemCollection, motivated by a real-world refactor we just shipped:

  1. Documentation that clarifies (a) BuildableSearch#limit is per-request page size, NOT a total cap; (b) iterating an
    ItemCollection (via #to_a, #each, or any other Enumerable method) auto-paginates the entire query.

  2. Strategy-explicit terminals on BuildableSearch as first-class methods, replacing the implicit "what does .complete!.to_a mean for cost?" question with explicit verbs: a single-page method, a take-N method, and an exhaust-with-cap method. Concrete reference implementation included below.

  3. A safe iteration mode that surfaces transient errors as inspectable Result objects instead of raising mid-walk - companion to [BUG] Aws::Record::ItemCollection intermittently returns nil/raises from to_a and each #153, which has been sitting on this exact reliability concern for 5 months.

Use Case

I'm always frustrated when innocent-looking code silently issues many more DynamoDB requests than the author intended.

A pattern that appears across our codebase (and the community — see #39 from 7 years ago, where the same misunderstanding was raised and closed without a doc fix):

  records = build_query
    .on_index(:tenant_and_key_gsi)
    .key_expr(':tenant_and_key = ? AND :range_value = ?', partition, value)
    .limit(1)
    .complete!
    .to_a

  records.first

Reads as: "fetch one record by GSI key." Actually does: "fetch in pages of 1, walking every page until the partition is exhausted."

BuildableSearch#limit(n) sets DynamoDB's Limit parameter (per-request page size), not a total cap. Combined with ItemCollection#to_a silently auto-paginating, callers consistently get this wrong. Our internal audit found ~26 .limit(n) call sites where the author trusted limit as a total cap; several were latent unbounded reads waiting for the data shape to grow.

The deeper issue is that complete! returns an ItemCollection whose iteration semantics (one DynamoDB request? all pages? what happens on mid-walk failure?) aren't visible at the call site. The verb complete! tells you the query is built, not how iterating its result will fan out into DynamoDB requests. Every caller has to remember the contract.

This is also the companion to #153: auto-pagination amplifies iteration fragility because every additional page is another chance for transient throttling or a network blip mid-walk. Five months after we filed #153, our codebase still wraps #to_a/#each in defensive rescue StandardError — because the underlying contract is still fuzzy.

Proposed Solution

We just shipped a refactor that introduces three explicit terminals on Aws::Record::BuildableSearch via Module#prepend, migrating ~24 call sites away from the ambiguous .complete!.to_a pattern. Our reference iplementation in production today:

  module Patches
    module AwsRecordBuildableSearchSafeTerminals
      TERMINALS = %i[to_a_page to_a_take to_a_exhaust!].freeze

      def self.install!
        require 'aws-record'
        collisions = TERMINALS.select do |method_name|
          Aws::Record::BuildableSearch.method_defined?(method_name) ||
            Aws::Record::BuildableSearch.private_method_defined?(method_name)
        end
        if collisions.any? && (Aws::Record::BuildableSearch.ancestors & [self]).empty?
          raise "Aws::Record::BuildableSearch already defines #{collisions.join(', ')}; " \
                're-evaluate the safe terminal patch.'
        end
        Aws::Record::BuildableSearch.prepend(self) unless
  Aws::Record::BuildableSearch.ancestors.include?(self)
      end

      # Single DynamoDB request. Returns first page only.
      def to_a_page
        result = complete!
        return [] if result.nil?
        return result.page || [] if result.is_a?(Aws::Record::ItemCollection)
        terminal_array(result)
      rescue StandardError => e
        log_terminal_failure(:to_a_page, e)
        []
      end

      # Walk pages collecting up to `count` records. Pushes
      # min(existing_limit, count) into the per-request Limit so
      # the gem doesn't over-fetch the first page.
      def to_a_take(count)
        count_int = count.respond_to?(:to_i) ? count.to_i : 0
        return [] if count_int <= 0
        @params[:limit] = [@params[:limit], count_int].compact.min
        result = complete!
        return [] if result.nil?
        return terminal_array(result).first(count_int) unless
  result.is_a?(Aws::Record::ItemCollection)
        collected = []
        result.each do |item|
          collected << item
          break if collected.size >= count_int
        end
        collected
      rescue StandardError => e
        log_terminal_failure(:to_a_take, e)
        []
      end

      # Walk every page; raise once the cap is exceeded.
      # `max: nil` opts out of the cap.
      def to_a_exhaust!(max: 1_000)
        result = complete!
        return [] if result.nil?
        collected = []
        enumerable = result.is_a?(Aws::Record::ItemCollection) ? result : terminal_array(result)
        enumerable.each do |item|
          collected << item
          if max && collected.size > max
            raise ResultSetTooLargeError.new(model: model_name, max_items: max)
          end
        end
        collected
      end

      private

      def terminal_array(result)
        result.is_a?(Array) ? result : [result]
      end

      def log_terminal_failure(method_name, error)
        Rails.logger&.warn("AwsRecordBuildableSearchSafeTerminals##{method_name} failed:
  #{error.class}: #{error.message}")
      end

      def model_name
        instance_variable_get(:@model)&.name
      end
    end
  end

How call sites changed

Single-record GSI lookup — was implicit, now explicit:

  # before
  records = wrap_records(
    build_query.on_index(:tenant_and_key_gsi).key_expr(...).limit(1).complete!,
    single_page: true,
  )
  records.first

  # after
  records = build_query.on_index(:tenant_and_key_gsi).key_expr(...).to_a_page
  records.first

Duplicate detection — was silently auto-paginating, now bounded:

  # before
  results = wrap_records(build_query.on_index(:class_gsi).key_expr(...).complete!)
  raise DuplicateSlug if results.count > 1

  # after
  results = build_query.on_index(:class_gsi).key_expr(...).to_a_take(2)
  raise DuplicateSlug if results.count > 1

Full enumeration with safety cap — was open-ended each, now loud failure mode:

  # before
  base_query.complete!.each do |record|
    next unless record.version.zero?
    results << record
  end

  # after
  base_query
    .filter_expr('version = ?', 0)
    .to_a_exhaust!(max: 1_000)

Asks (priority order, all first-class)

  1. Loud rdoc warnings on ItemCollection#to_a, #each, and BuildableSearch#complete! clarifying that iteration auto-paginates AND may raise mid-walk on transient AWS errors.
  2. Clarify BuildableSearch#limit rdoc to state that it is per-request, not total. Even one sentence would prevent a class of bugs.
  3. Document #to_a failure modes — when nil vs empty Array, what happens to already-yielded items on mid-walk failure, the retry contract for transient errors.
  4. Adopt strategy-explicit terminals on BuildableSearch along the lines of the reference implementation above. complete! could remain for cursor-aware callers; the new terminals would be the primary path for everyone else.
  5. A safe iteration mode that returns inspectable Result objects (yielded items + error + continuation token) instead of raising mid-walk. Companion to [BUG] Aws::Record::ItemCollection intermittently returns nil/raises from to_a and each #153.

Other Information

Real-world impact. Before this refactor, our codebase had
~26 .limit(n) call sites trusting limit as a total cap that
the gem doesn't actually provide, plus several latent
unbounded-pagination paths that iterated .complete!.to_a
directly. The migration touched ~50 files and removed our
defensive wrap_records helper entirely.

Spec coverage of the reference implementation. 10 RSpec
examples covering: single-page semantics, n <= 0 early return,
per-request Limit pushdown, min(existing_limit, n)
precedence, multi-page bounded collect, max: boundary raise,
max: nil unbounded path, non-ItemCollection Hash/Struct
fallback, and a load-time collision guard. Test fake walks real
multi-page data and tracks page-walk count to verify pagination
efficiency.

Why we still need defensive rescue around to_a. In our
implementation, to_a_page and to_a_take wrap their work in
rescue StandardError → log + return [] because #to_a and
#each on ItemCollection still raise unpredictably (the
exact symptom #153 was filed about, 5 months ago). We could not
delete the defensive rescue; the gem's contract is still too
fuzzy. This is fresh evidence that #153 remains live.

Related issues:

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

aws-sdk-ruby-record version used

latest

Environment details (OS name and version, etc.)

  • aws-record: 2.14.0 - Ruby: 3.4 - DynamoDB single-table data model - Falcon (fiber-based) server runtime

Metadata

Metadata

Assignees

No one assigned

    Labels

    feature-requestA feature should be added or improved.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions