
Query yielding heterogeneous results #107

Closed
sereneiconoclast opened this issue Aug 24, 2020 · 9 comments · Fixed by #108
Labels
feature-request A feature should be added or improved.

Comments

@sereneiconoclast

From what I can tell, currently all queries are performed through a specific model type, and must therefore return only records of that type.

I'd like to execute a single query that can return records of mixed types:

```
{hk: "me@here.com", rk: "PhoneNumber 123-555-1234", ... more fields specific to model type 'User'}
{hk: "me@here.com", rk: "Order 1000001", ... more fields specific to model type 'Order'}
{hk: "me@here.com", rk: "Order 1000002", ... more fields specific to model type 'Order'}
{hk: "me@here.com", rk: "Review 5000", ... more fields specific to model type 'Review'}
```

Instead of calling User.query(...), I would then call BaseTable.query(...) (as per issue 92) and pass a Proc whose job is to examine the Hash of raw attribute values, and return a reference to the appropriate child class to instantiate (User, Order, Review). In the example above, I'd probably do that by looking at the first word of the range key. It could do something else instead, such as attempting to match each range key against a regex ("this looks like a phone number"), or switching based on some other attribute (item_type=="User"), or going by which attributes are present and which aren't.

Does this sound reasonable?
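A minimal plain-Ruby sketch of the kind of selector Proc described above, with no aws-record dependency. It assumes the raw attribute Hash uses String keys (DynamoDB attribute names) and that the attributes are named `hk`, `rk`, and `item_type`; `User`, `Order`, and `Review` are empty hypothetical stand-ins. Each lambda returns a class, or nil for an item it does not recognize.

```ruby
# Empty stand-ins for the model classes named in the issue (hypothetical).
class User; end
class Order; end
class Review; end

# Strategy 1: switch on the first word of the range key,
# e.g. "Order 1000001" -> Order. Unrecognized prefixes yield nil.
SELECT_BY_PREFIX = lambda do |raw|
  case raw["rk"].to_s.split(" ", 2).first
  when "PhoneNumber" then User
  when "Order"       then Order
  when "Review"      then Review
  end
end

# Strategy 2: match the range key against a pattern
# ("this looks like a phone number").
SELECT_BY_PATTERN = lambda do |raw|
  raw["rk"].to_s.match?(/\d{3}-\d{3}-\d{4}\z/) ? User : nil
end

# Strategy 3: switch on a dedicated type attribute (item_type == "User").
TYPE_MAP = { "User" => User, "Order" => Order, "Review" => Review }.freeze
SELECT_BY_TYPE_ATTR = ->(raw) { TYPE_MAP[raw["item_type"]] }
```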

@awood45
Member

awood45 commented Aug 24, 2020

This is definitely a use case that would be helpful to support. My thought is to provide an enumeration where, on each iteration, you signal which model class should be used, using whatever logic you wish. I'm on mobile now, but I can sketch out an example soon.

@awood45
Member

awood45 commented Aug 26, 2020

Here's how I imagined this. Let's pretend we have a couple of model classes that share a single table:

```ruby
require "aws-record"

class Project
  include Aws::Record
  set_table_name(ENV["TABLE_NAME"])

  string_attr :uuid, hash_key: true
  string_attr :table_name, range_key: true

  string_attr :project_name
end

class Task
  include Aws::Record
  set_table_name(ENV["TABLE_NAME"])

  string_attr :uuid, hash_key: true
  string_attr :table_name, range_key: true

  string_attr :task_name
  string_attr :parent_project_uuid
  string_attr :status
end
```

Fairly simple example, but we could then run this against any model class:

```ruby
# Raw item attributes are keyed by their DynamoDB attribute names (Strings).
scan = Project.build_scan.multi_model_filter do |raw_item_attributes|
  if raw_item_attributes["table_name"] == "PROJECT"
    Project
  elsif raw_item_attributes["table_name"] == "TASK"
    Task
  else
    nil
  end
end
```

What I'm imagining here is that we let you pass in a block instead of calling complete!, for example, and the block returns the model class based on whatever inspection of the raw item you like, or nil if no model applies and the item should be skipped. This could also apply to built queries, though with one limitation: you need some model class to use as a starting point. That seems like a reasonable compromise, since you could define a base class for single-table query building as needed.
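To make the proposed semantics concrete, here is a rough plain-Ruby sketch of applying a filter block to one page of raw items: the block picks the class per item, a nil result drops the item, and without a block every item becomes the starting model. `build_items`, `RawBacked`, `ProjectItem`, and `TaskItem` are hypothetical names for illustration; this is not aws-record's actual implementation.

```ruby
# Hypothetical stand-ins for the Project and Task models above; each simply
# keeps the raw attribute Hash it was built from.
class RawBacked
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class ProjectItem < RawBacked; end
class TaskItem < RawBacked; end

# Builds model instances from one page of raw items. With a filter block,
# the block chooses each item's class and a nil result skips the item;
# without one, every item becomes default_model.
def build_items(raw_items, default_model, &filter)
  raw_items.map { |raw|
    model = filter ? filter.call(raw) : default_model
    model && model.new(raw)
  }.compact
end

page = [
  { "uuid" => "p1", "table_name" => "PROJECT", "project_name" => "Apollo" },
  { "uuid" => "t1", "table_name" => "TASK", "task_name" => "Write docs" },
  { "uuid" => "x1", "table_name" => "UNKNOWN" }
]

items = build_items(page, ProjectItem) do |raw|
  case raw["table_name"]
  when "PROJECT" then ProjectItem
  when "TASK"    then TaskItem
  end
end
```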

@awood45
Member

awood45 commented Aug 26, 2020

I should add: when you enumerate the scan (scan.each, etc.), each item in that enumeration would be an instance of the class returned by the filter block.

@awood45
Member

awood45 commented Aug 26, 2020

So presumably, if you use this logic, you need to be prepared for heterogeneous sets, but you're opting in to that behavior anyways.
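On the consuming side, opting in to heterogeneous sets typically means dispatching on each item's class. A small sketch, with `ProjectRecord` and `TaskRecord` as hypothetical Struct stand-ins for the records a multi-model scan would yield:

```ruby
# Hypothetical stand-ins for records yielded by a multi-model scan.
ProjectRecord = Struct.new(:uuid, :project_name)
TaskRecord = Struct.new(:uuid, :task_name, :status)

# Each consumer dispatches on the item's class; unexpected classes fail loudly.
def describe(item)
  case item
  when ProjectRecord then "project #{item.project_name}"
  when TaskRecord    then "task #{item.task_name} (#{item.status})"
  else raise ArgumentError, "unexpected model #{item.class}"
  end
end

results = [ProjectRecord.new("p1", "Apollo"), TaskRecord.new("t1", "Write docs", "OPEN")]
lines = results.map { |item| describe(item) }
```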

@sereneiconoclast
Author

Nice. So build_scan or build_query returns a builder as an intermediate result, and multi_model_filter augments it, similar to RSpec's syntax for configuring mocks: expect(thing).to receive(:method_name).with(...).and_return(...)

An alternate style would be to accept a Proc as an optional argument, so you could write

```ruby
BaseTable.query(...normal query terms...,
  select_model: ->(raw_attributes) { ... some logic returning Project, Task, or nil }
)
```

This doesn't look as clean as the style you suggested, but it's probably less work to implement. I'd be happy with either.
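The two call styles can share one mechanism, since a block captured with `&block` is just a Proc. The sketch below is a hypothetical in-memory `FakeQuery` (it never touches DynamoDB) that accepts the selector either as a `select_model:` keyword argument or through a chained `multi_model_filter` block, and produces the same result both ways. `ProjectModel` and `TaskModel` are hypothetical placeholder classes.

```ruby
# Hypothetical in-memory query used only to compare the two call styles;
# it filters an Array of raw Hashes instead of calling DynamoDB.
class FakeQuery
  def initialize(raw_items, select_model: nil)
    @raw_items = raw_items
    @select_model = select_model # Proc style: passed as a keyword argument
  end

  # Builder style: store the block and return self so calls can chain.
  def multi_model_filter(&block)
    @select_model = block
    self
  end

  # Returns [model_class, raw_item] pairs, skipping items mapped to nil.
  def results
    raise ArgumentError, "no model selector given" unless @select_model

    @raw_items.map { |raw|
      model = @select_model.call(raw)
      model && [model, raw]
    }.compact
  end
end

ProjectModel = Class.new
TaskModel = Class.new

raw = [{ "kind" => "PROJECT" }, { "kind" => "TASK" }, { "kind" => "OTHER" }]
pick = ->(item) { { "PROJECT" => ProjectModel, "TASK" => TaskModel }[item["kind"]] }

chained = FakeQuery.new(raw).multi_model_filter(&pick).results
keyword = FakeQuery.new(raw, select_model: pick).results
```

The keyword form is little more than a constructor argument, while the builder form needs a method that returns `self`; both end up storing the same Proc.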

> So presumably, if you use this logic, you need to be prepared for heterogeneous sets, but you're opting in to that behavior anyways.

Yes, the straightforward behavior would be to return a single array containing objects of various types, in whatever order they were found. It might be nicer for the consumer, though, to return a Hash grouping them by type:

```
{
  Project => [project_1, project_2, project_3...],
  Task => [task_1, task_2...]
}
```

...since nearly everyone will, as a first step, be sorting through the results in this fashion.
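In Ruby, the grouping described above is essentially a one-liner with `Enumerable#group_by`, shown here on a single page of results. `ProjectRow` and `TaskRow` are hypothetical stand-ins for model instances:

```ruby
# Hypothetical record types standing in for Project and Task instances.
ProjectRow = Struct.new(:name)
TaskRow = Struct.new(:name)

page = [ProjectRow.new("p1"), TaskRow.new("t1"), ProjectRow.new("p2")]

# group_by buckets items by class, preserving encounter order within each
# bucket: { ProjectRow => [p1, p2], TaskRow => [t1] }
by_type = page.group_by(&:class)
```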

Bonus: allowing the block to return nil to mean "skip this" means this also functions similarly to aws dynamodb query --filter-expression.

@alextwoods alextwoods self-assigned this Aug 31, 2020
@alextwoods alextwoods added the feature-request A feature should be added or improved. label Aug 31, 2020
@alextwoods alextwoods linked a pull request Sep 3, 2020 that will close this issue
@alextwoods
Contributor

I've got a draft PR (#108) that implements this; I'm still thinking through some of the behavior...

> It might be nicer for the consumer, perhaps, to return a Hash sorting them by type

Since results are returned page by page it would require iterating through the entire set to build a sorted Hash which in many cases isn't desirable.

@awood45
Copy link
Member

awood45 commented Sep 3, 2020

I'd say that's actually an unacceptable outcome; results should be returned one page at a time no matter what. Otherwise you can accidentally pull up millions of records.

@sereneiconoclast
Author

> Since results are returned page by page it would require iterating through the entire set to build a sorted Hash which in many cases isn't desirable.

I'm not sure I see what one has to do with the other. You can still return paginated results, it's just that each page would be a Hash with items bucketed by type.

If you don't think it desirable then you could certainly leave that part out. But I know that, as a customer, the first thing I'm going to do with each heterogeneous result set is to divide them up by item type, and I would expect most consumers would do likewise.
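Bucketing each page by type does not require reading the whole result set. The sketch below simulates pagination with a hypothetical `FakePager#each_page` (not an aws-record method) and applies `group_by` to each page lazily; taking the first bucketed page fetches only one page. `ItemA` and `ItemB` are hypothetical record types.

```ruby
# Hypothetical page source standing in for DynamoDB pagination; it counts
# how many pages have been fetched so the laziness can be observed.
class FakePager
  attr_reader :fetched

  def initialize(pages)
    @pages = pages
    @fetched = 0
  end

  # Yields one page (an Array of items) at a time; returns an Enumerator
  # when called without a block.
  def each_page
    return enum_for(:each_page) unless block_given?

    @pages.each do |page|
      @fetched += 1
      yield page
    end
  end
end

ItemA = Struct.new(:id)
ItemB = Struct.new(:id)

pager = FakePager.new([[ItemA.new(1), ItemB.new(2)], [ItemA.new(3)], [ItemB.new(4)]])

# Bucket each page by class as it arrives; .lazy stops after the first page.
first_bucketed = pager.each_page.lazy.map { |page| page.group_by(&:class) }.first
```

Without `.lazy`, `map` would walk every page before `first` ran, which is exactly the full-scan cost the thread wants to avoid.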

@awood45
Member

awood45 commented Sep 5, 2020

Yes, true. The important part is returning results one page at a time.


3 participants