Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty cells in xlsx become the non-empty string "None" in the dictionary #6

Closed
JoshuaCapel opened this issue Apr 24, 2022 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@JoshuaCapel
Copy link

Thank you for writing this. I have recently moved from using csv files to excel files and this has made the transition easier.

One issue I have encountered is that empty cells are not being returned as a Falsey value like None, or the empty string ''.

From what I have been able to understand, openpyxl reads the empty cell as None, and the following code in sheet2dict uses str to convert this into the string 'None'.

def item(row, col):
return (
sheet.cell(row=1, column=col).value,
str(sheet.cell(row=row, column=col).value),
)

I have filtered my own dictionaries to turn 'None' into '' but that will cause issues when any of my sheet really do contain the word None.

I just did some quick testing, and it seems that the csv.DictReader() in the python standard library uses the empty string '' for missing values in the a csv file, so that might be a good thing to do for these cases.

@Pytlicek
Copy link
Owner

Good idea @JoshuaCapel 👍 thanks for pointing this out 🙏
I hope I will have some spare time this weekend and I can work on this.

@Pytlicek Pytlicek self-assigned this Apr 27, 2022
@Pytlicek Pytlicek added the enhancement New feature or request label Apr 27, 2022
@bai-yi-bai
Copy link
Collaborator

@JoshuaCapel thank you for raising this. I dusted off sheet2dict today for a new project and tried the latest version and experienced this. I had a un-merged a lot of cells and was shocked to see them full of "None". It's a real shame that openpxyl doesn't support merged cells... but then again, I have no idea how sheet2dict would handle them.

@Pytlicek - I lost your contact information somehow, please ping me. In the meantime, let's talk about a solution. I need to re-set up my IDE/computer to push/pull correctly.

Maybe we can add an if just to check to see if the result is None. I did not modify the sheet2dict source, I made allowances in my code.

  • This is untested.
  • I cannot recall if the current Python convention is to use equality checks or is/is not. Is there a more elegant solution? Is type checking the way to go?
  • I know this implementation will add CPU cycles... I'm not sure if we should use if/else instead since every cell is checked twice with my implementation.
 def item(row, col):
     if type(sheet.cell(row=row, column=col).value) is None:
        return (sheet.cell(row=1, column=col).value, '')
    if type(sheet.cell(row=row, column=col).value) is not None:
        return (sheet.cell(row=1, column=col).value, str(sheet.cell(row=row, column=col).value)) 

Also, whatever we do, we should add some test cases to check for this (You taught me well @Pytlicek!)

@Pytlicek
Copy link
Owner

@bai-yi-bai do you have an example file? I want to look at this and better understand what is the case, what exactly we need and so on. Thanks

@bai-yi-bai
Copy link
Collaborator

bai-yi-bai commented Oct 31, 2022

@Pytlicek:

I created a PR #9
(Actually there were 2 PRs, my IDE didn't like the paths hard-coded into the test files.)

  • inventory.xlsx is sufficient since it has empty cells.

  • The existing tests were insufficient to detect this bug.

  • I'm not sure why empty headers are not affected, but this uncovered a new bug: if two header columns are empty (None), then one of the column's data will be ignored/overwritten. We should examine this further; meaning we need to generate a new excel file with two empty columns.

  • main.py

        def item(row, col):
            if sheet.cell(row=row, column=col).value is None:
                return (sheet.cell(row=1, column=col).value, '')
            else:
                return (sheet.cell(row=1, column=col).value, str(sheet.cell(row=row, column=col).value))
  • \tests\test_3_xlsx.py
def test_parse_xlsx_sheet_items(worksheet):
      ws = worksheet
      ws.xlsx_to_dict(path="inventory.xlsx", select_sheet='SJ3')
      ws_items = ws.sheet_items
      assert "Taipei" in str(ws_items)
      assert "Bratislava" not in str(ws_items)
      assert "None:" in str(ws_items)
      assert "'None'," not in str(ws_items) # Tests whether empty cells were converted to "" strings and not "None"
      assert None in ws_items[0]
      assert len(ws_items) > 1
      assert len(ws_items) == 6
  • I inserted a print(ws_items) and compared the output with https://www.diffchecker.com/diff
  • Compare the difference between the d8d363b and my proposed changes. I tried to use BOLD on 'None' and ''. Theref are four changes.
  • d8d363b:
  • test_3_xlsx.py::test_parse_xlsx_sheet_items [{'country': 'Taiwan', 'city': 'Taipei', 'area (km)': '271.8', None: '88', 'rand_field': 'ee'}, {'country': 'South Korea', 'city': 'Seoul', 'area (km)': '605.2', None: '77', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'New York', 'area (km)': '783.8', None: '66', 'rand_field': 'ee'}, {'country': 'None', 'city': 'None', 'area (km)': 'None', None: '55', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'Miami', 'area (km)': '143.1', None: 'None', 'rand_field': 'ee'}, {'country': ' ', 'city': ' ', 'area (km)': ' ', None: ' ', 'rand_field': ' '}]
  • My change:
  • test_3_xlsx.py::test_parse_xlsx_sheet_items [{'country': 'Taiwan', 'city': 'Taipei', 'area (km)': '271.8', None: '88', 'rand_field': 'ee'}, {'country': 'South Korea', 'city': 'Seoul', 'area (km)': '605.2', None: '77', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'New York', 'area (km)': '783.8', None: '66', 'rand_field': 'ee'}, {'country': '', 'city': '', 'area (km)': '', None: '55', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'Miami', 'area (km)': '143.1', None: '', 'rand_field': 'ee'}, {'country': ' ', 'city': ' ', 'area (km)': ' ', None: ' ', 'rand_field': ' '}]

@Pytlicek
Copy link
Owner

Thank you @bai-yi-bai 🙏 Merged and published on PyPI with version: sheet2dict 0.1.5
@JoshuaCapel pls test if you are still interested in this issue. THX 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants