Duplicate header check in 5.2.0 is not backward compatible #1007
Comments
Hi @robrap, thank you for raising this issue. I confirm that it breaks if all your headers are empty strings. I am wondering why you would use the method Instead you could use the following methods:
I am still thinking about a way to prevent this breaking change and keep the new feature. |
Good question. Not all of our columns have empty string headers. We just have a number of columns at the end of the sheet with empty strings. Some of these empty header columns contain a pivot table related to the actual data in the sheet, so I don't want to just delete the columns. |
gspread 5.2.0 introduced a backward-incompatible change related to a sheet with extra columns with blank headers. For details of the bug, see burnash/gspread#1007. We can upgrade once the issue is resolved, or if we delete these extra columns. Note: For edX specific ownership spreadsheet, this sheet currently contains a pivot table that would need to be moved elsewhere.
I understand; it bothers me to introduce a backward-incompatible feature. The simplest way I can think of right now is to add a new flag to the function that enables or disables this feature. I still want to keep the feature enabled by default, for the simple reason that if two headers are equal, the column content of the first header is overwritten by the second column's content, and that is the purpose of this method ( |
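To illustrate the overwriting concern raised in the comment above, here is a minimal sketch in plain Python. It assumes each record is built by pairing every header with the cell in the same column; that is an assumption for illustration, not gspread's actual code, and the sample headers and values are made up.

```python
# Hypothetical sheet data: the last two columns share the blank header "".
headers = ["name", "owner", "", ""]
row = ["repo-a", "team-x", "pivot-1", "pivot-2"]

# Building a dict from (header, cell) pairs keeps only the last value
# seen for a repeated key, so "pivot-1" is silently dropped.
record = dict(zip(headers, row))

print(record)
```

With four columns but only three distinct headers, the resulting record has three keys, which is the silent data loss that the uniqueness check is meant to surface.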
Thanks. A workaround would be great.
I'm not clear on what "the feature" is? Is it just the more strict check on the headers? Note: it's up to you whether you change the default in a 6.0.0 release to convey the breaking change, or do it in a minor release. You may want to update your release notes either way. Thanks again! |
I solved this problem. The problem was in gspread version. Just install 5.1.1 instead of 5.2.0. |
something is still not clear to me with this situation, if you use the version |
@lavigne958: Here is an example of what I described above (in CSV format):
Notice that the
|
Sure it makes it clear. Thank you for this data sample. What I can think of that would solve your issue and provide a nice addition to the method I need some time to think it through, but that should solve your issue. |
Thanks @lavigne958. That might work. Unfortunately, it makes it a little more brittle, because if you care most about a subset of columns by header/key that are at the end of the spreadsheet (the right), and someone adds new columns to the left, they would fall out of range. What if you could declare the column header keys you care about, and those must be unique, and are what gets loaded? It could even be a different method. Maybe something like |
This is not a bad idea; it is what has been asked for in issue #976. I will look at it; it could be one way to solve this issue. |
Add a new argument to `get_all_records` to provide the list of expected headers. The given expected headers must:
- be unique
- be part of the complete headers list
- must not contain extra headers

This will provide a way for users to use this method and still have *some* duplicated headers that are not relevant to pull. This will ensure the columns that matter have unique headers. Closes #1007
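The three rules listed in the commit message above can be sketched as a standalone check. This is only an illustration under stated assumptions: `check_expected_headers` is a hypothetical helper, not a gspread function, and the error messages are invented.

```python
def check_expected_headers(all_headers, expected_headers):
    """Hypothetical helper (not part of gspread) that validates expected headers.

    Raises ValueError if any rule is violated; returns None otherwise.
    """
    # Rule 1: the expected headers themselves must be unique.
    if len(set(expected_headers)) != len(expected_headers):
        raise ValueError("expected headers must be unique")

    # Rules 2 and 3: every expected header must exist in the sheet's header row,
    # so the list cannot contain headers the sheet does not have.
    missing = [h for h in expected_headers if h not in all_headers]
    if missing:
        raise ValueError(f"expected headers not found in sheet: {missing}")

    # The columns that matter must not be duplicated in the sheet itself.
    relevant = [h for h in all_headers if h in expected_headers]
    if len(set(relevant)) != len(relevant):
        raise ValueError("expected headers are duplicated in the sheet")


# Made-up header row with two trailing blank headers.
sheet_headers = ["name", "owner", "", ""]
check_expected_headers(sheet_headers, ["name", "owner"])  # passes silently
```

Under this sketch, the duplicated blank headers are tolerated as long as they are not listed among the expected headers.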
I found a potential way to make the best of both worlds:
See linked PR. |
Thank you @lavigne958. |
It's done ✔️ This proposal for a fix has been released in https://github.com/burnash/gspread/releases/tag/v5.3.2 |
Hi all, thanks for putting in the above information. I too faced a similar error with version 5.3.2 today and had to downgrade back to 5.1.1 to make it work. I tried the above data sample, but it still didn't work for me 🤔 |
Hi, version 5.3.2 provides an extra parameter that allows you to pass a list of headers you expect from the spreadsheet. This allows you to use the method |
Note: The change is still backward incompatible, but passing |
This was still happening to me. Workaround:
sheet_ref = gspread_client.open_by_key(sheet_key).get_worksheet_by_id(worksheet_gid)
expected_headers = sheet_ref.row_values(1)
all_records = sheet_ref.get_all_records(expected_headers=expected_headers) |
Describe the bug
A spreadsheet with multiple columns that had a blank header used to load using `get_all_records` before 5.2.0, but it now fails with a "headers must be uniques" exception. I presume, but did not confirm, that it is due to this simplification: c8a5a73
To Reproduce
Steps to reproduce the behavior:
Call `get_all_records` on a spreadsheet with multiple columns with a blank header.
Expected behavior
This should work as it used to without an error.
Environment info:
Stack trace or other output that would be helpful
Traceback (most recent call last):
  File "", line 1, in
  File "/edx/other/edx-repo-health/repo_health/check_ownership.py", line 79, in check_ownership
    records = find_worksheet(google_creds_file, spreadsheet_url, worksheet_id)
  File "/edx/other/edx-repo-health/repo_health/check_ownership.py", line 44, in find_worksheet
    return worksheet.get_all_records()
  File "/edx/venvs/edx-repo-health/lib/python3.8/site-packages/gspread/worksheet.py", line 408, in get_all_records
    raise GSpreadException("headers must be uniques")
gspread.exceptions.GSpreadException: headers must be uniques