Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Amazon DynamoDB online indexing support on High level API #2925

Merged
merged 14 commits into from Feb 6, 2015

Conversation

PauloMigAlmeida
Copy link
Contributor

Hi,

Yesterday (Jan-29), Amazon has added support to add or delete global indexes at any time, which is by far the feature I was most looking forward using for over a year.

I've changed dynamodb2.table.update method taking into account the backward campatibility as well. Documentation amendments and a unit test case has been provided too accordingly to the contributing guidelines.

Looking forward to see this in the new version of boto.


Currently, the only thing you can modify about a table after it has
been created is the throughput.
Now, you can modify table's throughput , create/update/delete global
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be useful to mention Now. It's pretty much "The feature does not exist" / "The feature exist" type of comment :)

@PauloMigAlmeida
Copy link
Contributor Author

@xethorn Wow, so many comments at once. It’s really cool to see I'm not the only guy interested on this DynamoDB feature :)

@xethorn
Copy link

xethorn commented Jan 30, 2015

@pauloubuntu : this is really looking great! I've added minor comments (it's mostly about keep the methods as small as possible to keep its complexity to the bare minimum.) – those comments are definitely not blocking if @danielgtaylor is fine with the changes!

@xethorn
Copy link

xethorn commented Jan 30, 2015

@pauloubuntu : This change is definitely super exciting 👍

One more thing: to get your code merged faster, it's better if the commits are squashed (so one commit per pull request.)

@PauloMigAlmeida
Copy link
Contributor Author

@xethorn

I've actually changed update method because it calls UpdateTableAPI which is the same endpoint the aws-cli uses to create/update/delete global secondary indexes. When I was first implementing this feature I thought I should keep it in that way. However, I agree that there are a couple of points that can be improved and I’m really willing to do it.

Thanks for all the code line comments you've made.

I hope I can get 'em done as soon as I get home

Cheers

@PauloMigAlmeida
Copy link
Contributor Author

I really liked the idea of creating delete_global_secondary_index, update_global_secondary_index, create_global_secondary_index methods instead of keeping 'em under the update method. Once again thanks for pointing this out.

@xethorn
Copy link

xethorn commented Jan 30, 2015

👍 ! Obrigado Paulo!

@danielgtaylor
Copy link
Member

@pauloubuntu, @xethorn I like the idea of separate methods, and this is both simpler to call and seems consistent with other pieces of Boto 👍 Aside from that change this looks very good. The only other comment I have is maybe to add some unit tests, since the integration test suite tends to get ignored by a lot of contributors and failing during unit tests would be preferable as this is a module with a lot of use and a lot of contributors.

Edit: I forgot to mention, thanks for the pull request!

@@ -716,3 +716,68 @@ def test_query_after_describe_with_gsi(self):

for rs_item in rs:
self.assertEqual(rs_item['username'], ['johndoe'])

def test_update_table_online_indexing_support(self):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielgtaylor : are we running those tests against a local dynamodb?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pauloubuntu : nothing to do here, i'm just curious.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xethorn, these tests are run against the live DynamoDB service using your configured credentials. It's possible that you could run against a DynamoDB Local service, but it is not used by default.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

@PauloMigAlmeida
Copy link
Contributor Author

Hi, @xethorn , @danielgtaylor

I've splitted the logic in smaller methods as we've discussed before. Here goes a few observations:

  • I've reverted the update method to the previous implementation and added a note for future developers that if they're writing a new code then they should be using update_global_secondary_index method instead
  • On create_global_secondary_index I've decided to use GlobalBaseIndexField subclass as a parameter since it has already all the paramters we need in a well-known form instead of breaking its attributes in three different parameters such as :
def create_global_secondary_index(self, index_name, parts, throughput)

Let me know if you guys want to change anything while I'm coding the unit tests.

Cheers

gsi_data = None
gsi_data_attr_def = None

if global_index:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global index should not be null – otherwise you shouldn't update the table. Not sure what @danielgtaylor would prefer for this, but it seems like if not global_index, just return False and log the fact that it's empty? This approach will also simplify your tests.

@xethorn
Copy link

xethorn commented Jan 31, 2015

This looks even better than the previous one! Almost there. Let me know if you have any questions :)

@PauloMigAlmeida
Copy link
Contributor Author

@xethorn what about now ? :)

@PauloMigAlmeida
Copy link
Contributor Author

@xethorn @danielgtaylor
Actually. I was little confused about using boto.log.error in this case. Should I use boto.log.warn instead ?

What do you think ?

@xethorn
Copy link

xethorn commented Jan 31, 2015

I would say it's an error, because you're not suppose to call this method without the right params. If you called the cli client with missing params, you usually get an exception / failure.

Warn will display it, but it might remain more silent. That said, I don't have strong preferences, and @danielgtaylor can point you to the right direction.

Code looks great! 👍


# Wait for it.
time.sleep(60)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tests, we typically like it if you explicitly assert something as opposed to just testing that you can successfully call the method. I would suggest using the table's layer2 to make a call to describe_table() and then asserting that the response has the global secondary indexes that you are expecting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Thanks for pointing this out.

@kyleknap
Copy link
Member

kyleknap commented Feb 2, 2015

Thanks for the PR! It looks good to me. I just had a few comments.

The only other general comment that I had was similar to the @danielgtaylor comment, which was to add some unit tests because those are ran more than the integration tests. If you need help writing some, I would suggest following the style here:
https://github.com/boto/boto/blob/develop/tests/unit/dynamodb2/test_table.py#L1519

where you patch the layer1's update_table method and make sure that the correct parameters were passed to the patched method.

@PauloMigAlmeida
Copy link
Contributor Author

@kyleknap I've changed that parameter to global_indexes and I've also added a few assertions on test_highlevel.py. Just a few observations and one question:

  • I've used the highlevel describe() method to retrieve and assert if everything is working as expected instead of the layer1 since I couldn't find usages of any method that wasn't from the highlevel api. So I'm assuming this integration should only use highlevel api methods. ( Correct if I'm wrong please :) )
  • I haven't touched the update_global_secondary_index internal logic since we are arguing which should be the best approach for this case. I agree with @xethorn that we should keep it as a separate method and deprecate the use of update method to update global_indexes throughput. This is going to make things a bit easier as DynamoDB continues to add new features. Let me know your opinion about this.
  • I'm still working on some unit tests which I hope to finish in a few hours.

Cheers

@kyleknap
Copy link
Member

kyleknap commented Feb 3, 2015

@pauloubuntu Awesome! Thanks for seeing this through. Changes that you have made so far look good.

Also @pauloubuntu @xethorn that is fine if we do not share code between the update() and update_global_secondary_index() method. I was just a suggesting a way to save you some code/logic.

I agree that we should not be bloating the update() method with more features. This is will have to be for a PR later down the road, but we cannot begin deprecating the update() method till a Table level method is added that can update throughput because I believe the update() method is the only function that can, not including the layer1 methods.

@PauloMigAlmeida
Copy link
Contributor Author

@kyleknap , @xethorn , @danielgtaylor I've just pushed a few unit tests to test_table.py following the style suggested.

Before you check it I need to ask you a question. As @kyleknap said, unit tests are supposed to explicitly assert something, but to assert that I have created/updated/deleted any online global indexes I should update the self.global_indexes variable which wasn't been updated by neither the update method nor the newest methods we've created. I have also noticed that the describe method only updates the self.global_indexes if it's None so I should remove that condition as well.

( I'll write just a method to give an example)
The create method would be something like it:

def create_global_secondary_index(self, global_index):
        if global_index:
            # Add a GSI to global_indexes
            self.global_indexes.append(global_index)

            gsi_data = []
            gsi_data_attr_def = []

            gsi_data.append({
                "Create": global_index.schema()
            })

            for attr_def in global_index.parts:
                gsi_data_attr_def.append(attr_def.definition())

            self.connection.update_table(
                self.table_name,
                global_secondary_index_updates=gsi_data,
                attribute_definitions=gsi_data_attr_def
            )

            return True
        else:
            msg = 'You need to provide the global_index to ' \
                  'create_global_secondary_index method'
            boto.log.error(msg)

            return False

What do you think about it ?

Obs.: I'm only asking it, I haven't added any of these lines yet

@kyleknap
Copy link
Member

kyleknap commented Feb 4, 2015

The unit test look great. You are explicitly asserting something in those tests. You are asserting what the connection mock was called with, which is all the testing needed for this abstraction.

For actually updating the Table instances when calling those new methods you added, I would say do not update the Table instance. Instead, you can just document that you need to call the describe() method to actually update the Table. But I do like the idea of removing that if statement in the describe() call to account for the ability to now create and delete gsi's for tables.

@PauloMigAlmeida
Copy link
Contributor Author

@kyleknap I just removed that if statement and added a few notes about the need to call describe to update the Table instance.

Apart from the fact we are not going to deprecate update due to a couple of reasons, I guess we've solved every point discussed. You may now merge this pull request if you want.

@kyleknap , @xethorn , @danielgtaylor
I want to thank you for the time you've spent to help me and exposing your opinions. I've undoubtedly learnt a lot with this PR.

@xethorn
Copy link

xethorn commented Feb 6, 2015

De nada

Sent from myMail app for Android
Thursday, 05 February 2015, 08:34PM -05:00 from Paulo Miguel Almeida notifications@github.com:
@kyleknap I just removed that if statement and added a few notes about the need to call describe to update the Table instance.
Apart from the fact we are not going to deprecate update due to a couple of reasons, I guess we've solved every point discussed. You may now merge this pull request if you want.
@kyleknap@xethorn@danielgtaylor
I want to thank you for the time you've spent to help me and exposing your opinions. I've undoubtedly learnt a lot with this PR.

Reply to this email directly or  view it on GitHub .

@kyleknap
Copy link
Member

kyleknap commented Feb 6, 2015

Awesome! Thanks for all of the work everyone put into this PR. Merging

kyleknap added a commit that referenced this pull request Feb 6, 2015
Add Amazon DynamoDB online indexing support on High level API
@kyleknap kyleknap merged commit 0621c53 into boto:develop Feb 6, 2015
awatts pushed a commit to awatts/boto that referenced this pull request Feb 13, 2015
…4-08-15_MTurkAPI

# By Paulo Almeida (13) and others
# Via Daniel G. Taylor (2) and others
* 'develop' of git://github.com/boto/boto:
  add some minor documentation for Route53 tutorial
  script glacier upload now returns the ArchiveID
  Remove trailing whitespace
  Updated code snippets in DynamoDB tutorial
  Updated documentation for create/update/delete global secondary indexes
  Added unit tests for online global secondary indexes updates
  Added a few assertions to test_highlevel.py
  Added a few assertions to test_highlevel.py
  Fixing a few documentation typos that @kyleknap pointed out
  Add false return for a few conditional rules
  Fixed unit test for test_update method
  split global secondary indexes online suppot in smaller methods as discussed on PR boto#2925
  Added unit test
  Added attribute_definitions = None on assert_caled_once_with test
  PEP adjustments
  amended dynamoDB update method
  Add Amazon DynamoDB online indexing support on high level api
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants