Initial commit of DynamoDB batch writer #118

Merged: jamesls merged 4 commits into boto:develop from jamesls:ddb-batch-write on Jun 2, 2015

@jamesls jamesls commented Jun 1, 2015

This is similar to what boto2 does in terms of
the interface with a few internal details changed:

  • Keep a single buffer of puts and deletes. This simplifies
    the logic of when to send requests. It also sends the
    requests in the order they were called. In boto2, puts
    were always sent before deletes(). This shouldn't affect
    the semantics though because you can't put/delete the same
    object in a batch request.
  • Immediately handle unprocessed items in the next batch. boto2
    would keep these and flush them only at `__exit__`. This meant
    you could have unbounded growth of unprocessed items.
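
The two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: `SketchBatchWriter`, `FakeClient`, and the `flush_amount` parameter are hypothetical names, and the fake client stands in for a real DynamoDB client by rejecting the last request of its first call.

```python
class SketchBatchWriter:
    """Illustrative only; not the implementation from this pull request."""

    def __init__(self, table_name, client, flush_amount=25):
        self._table_name = table_name
        self._client = client
        self._flush_amount = flush_amount  # assumed batch size limit
        self._items_buffer = []            # single buffer for puts AND deletes

    def put_item(self, Item):
        self._add_request({'PutRequest': {'Item': Item}})

    def delete_item(self, Key):
        self._add_request({'DeleteRequest': {'Key': Key}})

    def _add_request(self, request):
        # Requests are kept in call order, whatever their type.
        self._items_buffer.append(request)
        if len(self._items_buffer) >= self._flush_amount:
            self._flush()

    def _flush(self):
        batch = self._items_buffer[:self._flush_amount]
        self._items_buffer = self._items_buffer[self._flush_amount:]
        response = self._client.batch_write_item(
            RequestItems={self._table_name: batch})
        unprocessed = response.get('UnprocessedItems') or {}
        # Requeue unprocessed requests so they go out with the next batch
        # instead of accumulating until __exit__.
        self._items_buffer.extend(unprocessed.get(self._table_name, []))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Drain whatever is left, including requeued requests.
        while self._items_buffer:
            self._flush()


class FakeClient:
    """Hypothetical stand-in that rejects one request on its first call."""

    def __init__(self):
        self.calls = []
        self._rejected_once = False

    def batch_write_item(self, RequestItems):
        (table, requests), = RequestItems.items()
        self.calls.append(list(requests))
        if not self._rejected_once:
            self._rejected_once = True
            return {'UnprocessedItems': {table: requests[-1:]}}
        return {'UnprocessedItems': {}}


client = FakeClient()
with SketchBatchWriter('users', client, flush_amount=2) as batch:
    batch.put_item(Item={'id': {'S': '1'}})
    batch.delete_item(Key={'id': {'S': '2'}})
    batch.put_item(Item={'id': {'S': '3'}})
```

In this run the rejected delete is resent at the front of the second call, ahead of the later put, so the buffer never grows beyond one batch plus the leftovers of the previous one.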

Perf is about the same as boto2.

cc @kyleknap @mtdowling

@coveralls
Coverage Status

Coverage decreased (-0.0%) to 97.66% when pulling 7f518e7 on jamesls:ddb-batch-write into 95c377b on boto:develop.

RequestItems={self._table_name: self._items_buffer})
unprocessed_items = response['UnprocessedItems']

if unprocessed_items and unprocessed_items[self._table_name]:

Is it guaranteed that the table_name is in the unprocessed_items? I would be wary of accessing it directly. Probably would use a get().

@jamesls (Author) replied:

Yes. From the API docs, UnprocessedItems is a map of table name to unprocessed items. Since we control the request and can guarantee that we only ever add items from a single table, a non-empty UnprocessedItems has to come from the table we specified in the originating request.
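
To make the two options in this thread concrete, here is a small sketch comparing direct indexing with the suggested `get()`. The helper names and sample responses are hypothetical; the response shape (a map of table name to a list of write requests) follows the description above.

```python
def unprocessed_direct(response, table_name):
    # Mirrors the check under review: index by table name only when
    # the map is non-empty.
    unprocessed_items = response['UnprocessedItems']
    if unprocessed_items and unprocessed_items[table_name]:
        return unprocessed_items[table_name]
    return []


def unprocessed_defensive(response, table_name):
    # The get() variant suggested in the review comment.
    return response['UnprocessedItems'].get(table_name, [])


# Sample responses for a request that only targeted the 'users' table.
all_done = {'UnprocessedItems': {}}
partial = {
    'UnprocessedItems': {
        'users': [{'PutRequest': {'Item': {'id': {'S': '1'}}}}],
    }
}

for response in (all_done, partial):
    assert (unprocessed_direct(response, 'users')
            == unprocessed_defensive(response, 'users'))
```

For single-table requests both variants return the same result. They differ only if the response names a table that was not in the request, where the direct version raises `KeyError` and the `get()` version returns an empty list.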

kyleknap commented Jun 2, 2015

Looks good. Code is pretty clean. I had a couple comments. Otherwise, 🚢

@coveralls
Coverage Status

Coverage decreased (-0.01%) to 97.66% when pulling 6535a93 on jamesls:ddb-batch-write into 95c377b on boto:develop.

@jamesls jamesls merged commit 6535a93 into boto:develop Jun 2, 2015