
Question about the item limit and associations #123

Open
cavi21 opened this issue Jan 11, 2017 · 11 comments
@cavi21
Contributor

cavi21 commented Jan 11, 2017

Hey guys! I started wondering about the 400 KB item limit that DynamoDB has and the associations the Dynamoid gem provides. For instance, a has_many association is contained (as the README mentions) within the object, so potentially that field could hold a value (in this case the set of foreign ids) larger than 400 KB, right?

It seems I'm missing something, or maybe I'm misunderstanding DynamoDB's limit on items.

Do you see what I mean?

Thanks a lot for the time and the hard work on the gem!

@mattwelke

mattwelke commented Jan 12, 2017

The limit refers to the amount of space a single DynamoDB item can occupy when stored in the database. The neat thing about these ORMs is how abstract they are: they return objects of types defined in the ORM, not the document data itself. For example, suppose we have a User which has_many :addresses and an Address which belongs_to :user. If we create a User, add an Address to its addresses, and then call user.addresses, we actually get something like this back:

irb(main):033:0> pp user.addresses
#<Dynamoid::Associations::HasMany:0x000000038227a8
 @loaded=false,
 @name=:addresses,
 @options={},
 @query={},
 @source=
  #<User:0x00000004909210
   @associations=
    {:addresses_ids=>#<Dynamoid::Associations::HasMany:0x000000038227a8 ...>},
   @attributes=
    {:created_at=>Wed, 11 Jan 2017 19:15:26 -0500,
     :updated_at=>Wed, 11 Jan 2017 19:22:14 -0500,
     :id=>"7495a9e2-6b6d-411e-b6b6-69ef58a3dec8",
     :email=>"bob@email.com",
     :password=>"abc",
     :addresses_ids=>#<Set: {"6d4dc20d-657b-462e-b353-c0c54159f28a"}>},
   @changed_attributes={},
   @errors=
    #<ActiveModel::Errors:0x000000048c3058
     @base=#<User:0x00000004909210 ...>,
     @details={},
     @messages={}>,
   @new_record=false,
   @previously_changed=
    {"addresses_ids"=>[nil, #<Set: {"6d4dc20d-657b-462e-b353-c0c54159f28a"}>]},
   @validation_context=nil>,
 @target=nil>

We get back an object of type Dynamoid::Associations::HasMany, which has the methods necessary to go get the data we need, no matter how much of it there is.

It isn't until we call user.addresses.all that we get an array of objects of type Address. Even these are still pretty custom, with the field information tucked away inside an @attributes variable, alongside extra info that goes with the attributes.

irb(main):050:0> pp user.addresses.all
[#<Address:0x00000002392e78
  @associations={},
  @attributes=
   {:created_at=>Wed, 11 Jan 2017 19:22:08 -0500,
    :updated_at=>Wed, 11 Jan 2017 19:22:14 -0500,
    :id=>"6d4dc20d-657b-462e-b353-c0c54159f28a",
    :body=>"321 Some St.",
    :user_ids=>#<Set: {"7495a9e2-6b6d-411e-b6b6-69ef58a3dec8"}>},
  @changed_attributes={},
  @new_record=false>]

So you can see that you don't actually download the information until you call methods like first and all, and at that point any data beyond 400 KB lives in the Ruby variable, not in the DynamoDB item.

EDIT:

I think after rereading your question it was me misunderstanding you. It sounds like you are aware that the 400 KB limit refers to the stored data, not the Ruby variable created when you use Dynamoid, and that you were just concerned about the size of the array of foreign keys being more than 400 KB. And that is a valid concern. As it stands, Dynamoid appears to store the keys on both sides of the relationship. This means that eventually, when there are many, many items associated with a particular item, that particular item will reach the 400 KB limit.
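To get a feel for when that limit would bite, here is a back-of-the-envelope sketch (my own arithmetic, not Dynamoid code; it assumes each foreign id is stored as a 36-byte UUID string):

```ruby
# Back-of-the-envelope estimate: how many UUID foreign keys fit in a single
# item before the 400 KB item size limit is reached.
ITEM_LIMIT_BYTES = 400 * 1024
UUID_BYTES = 36 # e.g. "6d4dc20d-657b-462e-b353-c0c54159f28a"

max_ids = ITEM_LIMIT_BYTES / UUID_BYTES
puts max_ids # 11377 ids, ignoring the item's other attributes and overhead
```

In practice the ceiling is lower, since attribute names and the item's other attributes count against the same 400 KB.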

I know from my experience with Mongoid that with the type of association I used in my example above, the keys are created on only one side, the many side of a one-to-many relationship, where they act as foreign keys. So perhaps Dynamoid stores the keys on both sides for technical reasons, the differences between the way the two different databases work. Perhaps one of the current maintainers could shed some light on that.

@cavi21
Contributor Author

cavi21 commented Jan 12, 2017

Hi @Welkie, thanks a lot for taking the time to write about it. As you said in the "edit" part, my concern is about the stored data, and in particular what an item means to DynamoDB. My intuition is that an item is a whole record (object) with its attribute values, right?

And reading about the limits here, in the section Number of Values in List, Map, or Set, it seems that (quoting):

There is no limit on the number of values in a List, a Map, or a Set, as long as the item containing the values fits within the 400 KB item size limit.

So it's clear that the 400 KB applies to the item, but the phrase "... item containing the values ..." is tricky, because what contains the values is an attribute (a.k.a. a column in an SQL-type DB) of an object. I feel that's not what they mean by that expression, though, right? Rather, the item is the object with all its attributes.
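For illustration, here is a toy sketch of that reading (my own arithmetic, not DynamoDB's exact algorithm, which also adds small per-element overhead for lists and sets): the 400 KB cap applies to the sum of every attribute name and value in the item, not to any single attribute.

```ruby
# Illustrative only: an item's size counts every attribute name plus every
# value, nested or not. The 400 KB limit applies to this whole total.
item = {
  "id"            => "7495a9e2-6b6d-411e-b6b6-69ef58a3dec8",
  "email"         => "bob@email.com",
  "addresses_ids" => ["6d4dc20d-657b-462e-b353-c0c54159f28a"]
}

size = item.sum { |name, value| name.bytesize + Array(value).sum(&:bytesize) }
puts size # 105 bytes for this toy item
```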

EDIT:

Does anyone have recommendations about this, or has anyone faced this issue, and how did you address it? I thought about a few approaches, but they seem to add unnecessary complexity... ideas welcome 😃

@mattwelke

My interpretation is that item is the noun they use to refer to objects stored in the database, equivalent to an SQL row or MongoDB document. So when they say the item containing the values, they mean the values could be anywhere in that item, perhaps a top level field, perhaps nested somewhere inside, but that entire item can't go over 400 KB.

I would love to hear, from those who have used DynamoDB, whether they found this limit to be a problem. This context is interesting in particular - when you need to store foreign keys for many associated items in one item.

@pboling
Member

pboling commented Jan 13, 2017

I have not looked into this, and have not run into any problems... yet. It would seem that creating a record with thousands of associated records through has_many would explode this limit. It is now on my list of things to try eventually.

@cavi21
Contributor Author

cavi21 commented Jan 16, 2017

Awesome @pboling, thanks for the insight! It's weird, though, that no one has run into this so far, right? Maybe we can create a spec for it, though I'm not sure how without creating thousands of records.

Let me know if I can help in any way.
Have a nice week.

@mattwelke

mattwelke commented Jan 16, 2017

I'm definitely a noob when it comes to this gem and DynamoDB, so correct me if I'm wrong, but isn't the reason the foreign keys are stored on both sides of the association DynamoDB's older, weaker support for indexes? And doesn't the official Ruby SDK support indexes better now?

Perhaps it's worth re-examining the index features now available to us, and maybe we'll have the option of no longer needing foreign keys on the one side of one-to-many associations, because we can rely on the foreign keys on the many side being indexed?
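One possible shape would be to keep the foreign key only on the many side and index it. This is a hypothetical sketch, not current Dynamoid behavior: the user_id field, the GSI declaration, and the addresses query method are all assumptions for illustration.

```ruby
require 'dynamoid'

# Hypothetical: store the foreign key only on the many side and index it.
class Address
  include Dynamoid::Document
  field :body
  field :user_id
  global_secondary_index hash_key: :user_id, projected_attributes: :all
end

class User
  include Dynamoid::Document
  field :email

  # The one side derives its children via a query instead of a stored Set,
  # so there is no per-item Set of ids that can grow past 400 KB.
  def addresses
    Address.where(user_id: id)
  end
end
```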

@pboling
Member

pboling commented Jan 16, 2017

There really aren't any experts here at all. One of us will have to put in the work to understand these issues, hopefully before one of us goes down in flames. ;) Given enough time I will look into this, but it likely won't be soon, since there is no pain here for me yet and I am very busy.

This project needs more maintainers who use it more fully. The only other maintainer besides me is moving off DynamoDB, and my use case is very thin: I don't use associations at all; I use the gem for its global secondary index querying support.

@mattwelke

mattwelke commented Jan 17, 2017

I'm strongly considering using Dynamo for a few mobile app back ends. I like the idea of getting 25 GB of space for free instead of just 500 MB (with Mongo) when I use Heroku. I will definitely need associations for the various data types I'd be using.

I'll try to learn more about how this gem ticks and I can see myself becoming a maintainer when I get the hang of things. :)

@pboling
Member

pboling commented Jan 17, 2017

@Welkie You get free space because they charge you for a "query rate" instead of for space. It only marginally scales unless you upgrade your plan, and if you exceed your paid rate and have no accumulated extra to draw from (accumulations have a short life span), your queries die a hard death: literally just data loss. So, to be robust, all your queries have to go through a system with exponential backoff and retry.

It is an excellent service for things that fit the use case well, such as a fairly constant rate of queries with little to no spiking. It is terrible for things that do not fit the use case.

E.g., if you are a storefront, do not expect to survive Black Friday unless you have said exponential-backoff queueing system for queries. Amazon's DynamoDB docs have a nice article on the benefits, and near necessity, of an exponential backoff system. For my use case I don't have one, as data loss is not critical for me and my usage rate is nearly constant.
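For reference, the retry idea described above can be sketched in a few lines. This is generic Ruby, not the AWS SDK's built-in retry configuration; the with_backoff helper is an invented name for illustration.

```ruby
# Minimal exponential-backoff retry sketch. Delays double on each failed
# attempt: base_delay, then 2x, 4x, ... until max_attempts is exhausted.
def with_backoff(max_attempts: 5, base_delay: 0.1)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Simulated throttle-prone call (e.g. a DynamoDB write): fails twice, then
# succeeds on the third attempt.
calls = 0
result = with_backoff(base_delay: 0.001) do
  calls += 1
  raise "throttled" if calls < 3
  :ok
end
puts result # :ok, reached on the third attempt
```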

@cavi21
Contributor Author

cavi21 commented Jan 17, 2017

Thanks @pboling for taking the time to maintain this gem and also to explain some of the cons it may have depending on the use case. Like @Welkie, I'm not that far into DynamoDB internals, and the project I'm working on doesn't (as of right now) need a lot of the features that DynamoDB offers, though that could change eventually... not sure. I'm also happy to help with it as much as I can.

Also, @Welkie mentioned aws-sdk-ruby-record, which is the official gem but very poor in features for DynamoDB. Maybe it makes sense to get some of the Amazon folks working on that project to give a hand and collaborate here as well, WDYT?

And we could add a section to the README about the use cases you mention, and maybe others, right? Let me know and I can open a PR for that if you think it's valuable.

@mattwelke

mattwelke commented Jan 17, 2017

@pboling Actually, at work we studied Dynamo and other database technologies heavily before settling on one that worked best for us, and I found that Dynamo's cost is a function of data stored, provisioned throughput capacity, and data egress. Each has a free tier: 25 GB stored, 25 capacity units, and 1 GB/month respectively. We settled on Mongo for the project because it involved storing massive amounts of data, which made Mongo more cost effective than Dynamo. But what intrigues me about Dynamo is the simplicity. I kind of like the idea of only having to increase my provisioned throughput capacity (provided it's cost effective) and call it a day, knowing I can then handle my increased load.

@cavi21 My guess is that the official Amazon developers would be more interested in developing that official gem; any features they understand well would be implemented there. Perhaps the most effective way of benefiting from that would be to have Dynamoid extend aws-record, or otherwise use it, piggybacking off the features implemented by that team?

EDIT: We should consider moving this conversation to the other issue created calling for maintainers by the way.
