Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_all_records with unique expected_headers fails in version 5.12 #1352

Closed
g-l-mansell opened this issue Nov 15, 2023 · 3 comments · Fixed by #1353
Closed

get_all_records with unique expected_headers fails in version 5.12 #1352

g-l-mansell opened this issue Nov 15, 2023 · 3 comments · Fixed by #1353
Assignees
Labels
Milestone

Comments

@g-l-mansell
Copy link

g-l-mansell commented Nov 15, 2023

Describe the bug
We have a spreadsheet with duplicated headers. We find a list of just the unique headers and pass this to worksheet.get_all_records(expected_headers=expected_headers)
In version 5.11.3 this works, only the columns passed to expected_headers are returned
In version 5.12.0 this no longer works, we get the error gspread.exceptions.GSpreadException: headers must be unique

(sorry I dont have time to add more info now, but I can later if the above is not sufficient!)

To Reproduce
Steps to reproduce the behavior:
1.
2.
3.

Expected behavior
A clear and concise description of what you expected to happen.

Code example*
If applicable, provide a code example to help explain your problem.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment info:

  • Operating System: Windows laptop, but running in linux docker container
  • Python version: 3.10.11
  • gspread version: 5.12.0

Stack trace or other output that would be helpful

Additional context
Add any other context about the problem here.

@lavigne958
Copy link
Collaborator

Hello thank you for raising this issue.

We'll have a look at it.

We might need more information later.
For example the list of actual headers you're passing to the method get_all_records.

@lavigne958 lavigne958 added Bug Need investigation This issue needs to be tested or investigated labels Nov 15, 2023
@alifeee
Copy link
Collaborator

alifeee commented Nov 15, 2023

Hi, I have looked into it and it is a bug which was introduced by #1301. It is a fault of us and the tests that it was not caught. Thanks for letting us know about this bug!

I have fixed the bug in #1353, which will soon be released as v5.12.1.

Context

expected_headers was added by #1021, and should be used when there are duplicate headers in a spreadsheet and you want to ignore them. e.g.,

"datum 1" "datum 2" "foo" "foo" "foo"
important important wever wever wever
important important wever wever wever

here I only care about the "datum 1" and "datum 2" columns, but I want to use get_all_records. So, I would use `expected_headers=["datum 1", "datum 2"], and the "foo" columns would be combined (and data lost). But! I don't care, I didn't want those columns anyway.

The problem

In v5.11.3 and before the logic in get_all_records worked like this:

if expected_headers is not set:
 all headers must be unique
 data is returned
if expected_headers is set:
 expected_headers must be unique
 expected_headers must be a subset of headers
 data is returned and data from columns with headers not in expected_headers might be lost

In v5.12.0 this changed and the logic became

+all headers must be unique
if expected_headers is not set:
-all headers must be unique
 data is returned
if expected_headers is set:
 expected_headers must be unique
 expected_headers must be a subset of headers
 data is returned and data from columns with headers not in expected_headers might be lost

That is to say, the uniqueness check was outside the expected_headers branch so it was always hit,, and an error thrown, even if expected_headers was set (which is not what we want as if expected_headers is set, we know there are probably identical headers - that's the whole point of expected_headers)

I suspect this was changed because the purpose of expected_headers was not clear in the original implementation. Either way, I have reverted it to the original logic in #1353, and changed the documentation of the expected_headers kwarg to be a bit clearer on what it is for.

@alifeee alifeee self-assigned this Nov 15, 2023
@alifeee alifeee added this to the 5.12.1 milestone Nov 15, 2023
@alifeee alifeee removed the Need investigation This issue needs to be tested or investigated label Nov 15, 2023
@alifeee
Copy link
Collaborator

alifeee commented Dec 7, 2023

Hi. Please read the proposal for changing how get_all_records works for next release 6.0.0 -> #1367. We would enjoy if you would contribute your opinion :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants