feat(cursor): Add method insert_data_bulk #81
Conversation
Hi @Yash621 ,
Thank you for opening this PR in response to this feature request. The team appreciates the time and effort you've put into this PR.
I've left some granular comments, but on a high level I don't see any tests for this method. Could you please add some so we can ensure it's behaving as desired?
Also, it seems like there are a number of whitespace changes in this PR unrelated to the desired change. Could you please revert those?
@Brooke-white can you guide me on how I can write the tests for this?
Sure, @Yash621. I've listed a couple of ideas below. I think the testing for this method can be done using unit tests:
- Reading the csv
- Building the SQL statement

For the above ideas, when we aren't confirming an exception is raised, we often want to check what is passed to execute. Please let me know if you have any questions about writing the tests, or other comments on the PR. I'm more than happy to discuss :)
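The in-memory CSV idea mentioned above can be sketched roughly like this (a minimal, hypothetical example built on unittest.mock.mock_open, not the PR's actual test code):

```python
import csv
from unittest.mock import mock_open, patch

# Hypothetical in-memory CSV standing in for a file on disk.
in_mem_csv = "col1,col2\n1,a\n2,b\n"

def read_rows(filename):
    # The code under test calls open(); the patch below redirects it to
    # the in-memory string, so no real file is needed during the test.
    with open(filename) as f:
        return list(csv.reader(f))

with patch("builtins.open", mock_open(read_data=in_mem_csv)):
    rows = read_rows("fake.csv")

print(rows)  # [['col1', 'col2'], ['1', 'a'], ['2', 'b']]
```

This pattern is what the tests further down use via the mock_open decorator form.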
Can the AWS team please prioritize support for this? It's a pretty big limitation, as this is a common feature across other drivers.
@david-dest01, we are working with @Yash621, the author of this PR, to get this feature ready for being merged. In the meantime, feel free to review this PR to ensure it suits your needs. @Yash621 -- do you have an idea of when you plan to post a revision of the PR? It'd be nice if we could include this feature in our next release (scheduled for early February). Please let me know how I can help make this happen.
@Brooke-white will surely raise the revision by today or tomorrow. I'm just stuck on writing tests for this, as I haven't done it before. Will raise the revision soon :)
@Brooke-white thank you. @Yash621 thank you for contributing this - happy to help with testing and modifications based on feedback from the AWS team. In the interim I added some comments as well that you may want to consider, but no pressure.
There's a PR template for new features. Could you fill that out when finished?
@david-dest01, thank you for your help reviewing this PR, it's much appreciated! @Yash621, I have resolved the PR comments from David and myself that you addressed so we can more easily see those remaining.
@Yash621 lmk if you want to pair program to work through multiple args in execute. Hope the feedback provided helps clarify some desirable behavior that accomplishes what you're looking to do, plus other desirable use cases.
@david-dest01 would love to pair program, let me know when we can do this :)
@Brooke-white thanks :) Working on other changes, will push soon :)
Let's coordinate - will shoot you an email.
@Brooke-white I have improved the implementation of the function and have also updated its docstring. I will add the tests within 1-2 days.
Thanks for the quick turnaround, @Yash621! I'll post a review by end of day.
@Yash621, I've added one small note re: docs and two suggestions for validating table_name and column_names before we do the bulk insertion. Once these comments are addressed + the tests are added we should be good to get this merged :). Let me know if you have any questions or need any help!
Thanks for the quick turnaround, @Yash621, the tests can go in
@Brooke-white thanks, can you also tell what is
@Brooke-white I have added error handling for the cases you mentioned above, but I am still a bit confused about how I should use mock for spying on parameters?
Thanks, @Yash621! I'll write up a sample test that you can build off of (with an explanation) and post that shortly :)
Could you clarify this for me? I'm not fully understanding. Are you saying
@Brooke-white yes, do we need column_indexes?
@Yash621, I think it could be valuable to leave in, since it would give more flexibility in addressing cases such as the following: a user is interested in bulk insertion of only columns
@Brooke-white Yes, I added it with exactly that in mind; I just wanted to confirm whether we need that flexibility or not.
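To illustrate the flexibility being discussed (a hypothetical sketch, not the PR's code): with column indexes, a caller can select only a subset of the CSV's columns for insertion.

```python
import csv
import io

# Hypothetical CSV with three columns; suppose only the first two
# ('id' and 'name') should be inserted.
csv_text = "id,name,notes\n1,ada,x\n2,bob,y\n"
column_indexes = [0, 1]

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
# Flatten the selected column values row by row, as a bulk INSERT
# parameter list would expect.
values = [row[i] for row in reader for i in column_indexes]
print(values)  # ['1', 'ada', '2', 'bob']
```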
@Brooke-white I have resolved the prepending issue :) |
@Yash621, while testing I noticed a minor issue with how values_list and sql_query are built. I added some comments showing what changes are needed to send things to the server in the way it expects. I also included some tests I wrote for this method; feel free to add more if you'd like.
After making these final changes, please make sure to run the pre-commit hooks to ensure this change matches the rest of the code base. Then let's get this merged! :)
Here are the tests; I had placed them in test/unit/test_cursor.py. The mocking and "spy" bits were a bit tricky to get set up :) Make sure unittest.mock.mock_open is imported, as it's used for reading in a testing csv from memory.
import pytest
from unittest.mock import Mock, mock_open, patch

from redshift_connector import Cursor, InterfaceError

@pytest.mark.parametrize("indexes, names", [([1], []), ([], ["c1"])])
def test_insert_data_column_names_indexes_mismatch_raises(indexes, names, mocker):
    # mock fetchone to return "True" to ensure the table_name and
    # column_names validation steps pass
mocker.patch("redshift_connector.Cursor.fetchone", return_value=[1])
mock_cursor: Cursor = Cursor.__new__(Cursor)
# mock out the connection
mock_cursor._c = Mock()
mock_cursor.paramstyle = 'qmark'
with pytest.raises(InterfaceError, match="Column names and indexes must be the same length"):
mock_cursor.insert_data_bulk(
filename="test_file", table_name='test_table',
column_indexes=indexes, column_names=names, delimeter=','
)
in_mem_csv = """\
col1,col2,col3
1,3,foo
2,5,bar
-1,7,baz"""
insert_bulk_data = [
([0], ['col1'], ('INSERT INTO test_table (col1) VALUES (%s), (%s), (%s);', ['1', '2', '-1'])),
([1], ['col2'], ('INSERT INTO test_table (col2) VALUES (%s), (%s), (%s);', ['3', '5', '7'])),
([2], ['col3'], ('INSERT INTO test_table (col3) VALUES (%s), (%s), (%s);', ['foo', 'bar', 'baz'])),
([0, 1], ['col1', "col2"], ('INSERT INTO test_table (col1, col2) VALUES (%s, %s), (%s, %s), (%s, %s);', ['1', '3', '2', '5', '-1', '7'])),
([0, 2], ['col1', "col3"], ('INSERT INTO test_table (col1, col3) VALUES (%s, %s), (%s, %s), (%s, %s);', ['1', 'foo', '2', 'bar', '-1', 'baz'])),
([1, 2], ['col2', "col3"], ('INSERT INTO test_table (col2, col3) VALUES (%s, %s), (%s, %s), (%s, %s);', ['3', 'foo', '5', 'bar', '7', 'baz'])),
([0, 1, 2], ['col1', 'col2', "col3"], ('INSERT INTO test_table (col1, col2, col3) VALUES (%s, %s, %s), (%s, %s, %s), (%s, %s, %s);', ['1', '3', 'foo', '2', '5', 'bar', '-1', '7', 'baz'])),
]
@patch("builtins.open", new_callable=mock_open, read_data=in_mem_csv)
@pytest.mark.parametrize("indexes,names,exp_execute_args", insert_bulk_data)
def test_insert_data_column_stmt(mocked_csv, indexes, names, exp_execute_args, mocker):
# mock fetchone to return "True" to ensure the table_name and column_name
# validation steps pass
mocker.patch("redshift_connector.Cursor.fetchone", return_value=[1])
mock_cursor: Cursor = Cursor.__new__(Cursor)
# spy on the execute method, so we can check value of sql_query
spy = mocker.spy(mock_cursor, "execute")
# mock out the connection
mock_cursor._c = Mock()
mock_cursor.paramstyle = 'qmark'
mock_cursor.insert_data_bulk(
filename='mocked_csv', table_name='test_table',
column_indexes=indexes, column_names=names, delimeter=','
)
assert spy.called is True
assert spy.call_args[0][0] == exp_execute_args[0]
assert spy.call_args[0][1] == exp_execute_args[1]
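For reference, the parameterized INSERT text asserted in the expected values above can be derived with a small helper along these lines (a sketch assuming qmark-style placeholders rendered as %s, as in the tests; this is not the PR's exact code):

```python
# Hypothetical helper: build a multi-row parameterized INSERT statement
# with one (%s, ...) placeholder group per row of data.
def build_insert(table, cols, nrows):
    placeholders = ", ".join(["%s"] * len(cols))
    values = ", ".join("({})".format(placeholders) for _ in range(nrows))
    return "INSERT INTO {} ({}) VALUES {};".format(
        table, ", ".join(cols), values
    )

print(build_insert("test_table", ["col1", "col2"], 3))
# INSERT INTO test_table (col1, col2) VALUES (%s, %s), (%s, %s), (%s, %s);
```

The output matches the sql_query strings the spy assertions check against.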
@Brooke-white I have resolved all the changes you suggested and have also added tests.
Thank you for bearing with us through the review process, @Yash621! We appreciate your hard work and persistence in making this contribution :).
@Brooke-white my pleasure :)
Hi @Yash621, this is a great feature I have been looking for. Below is what I am passing to insert_data_bulk. After running insert_data_bulk, I executed cursor.rowcount and it returns -1. I also went to the database and checked the table's rowcount, and it doesn't show any increment. Any advice on this issue?
@kishaningithub This PR is with respect to issue #75.
I have implemented the bulk insert function; please review it and suggest any changes that are required.