Empty cells in xlsx become the non-empty string "None" in the dictionary #6

JoshuaCapel · 2022-04-24T12:16:18Z

Thank you for writing this. I have recently moved from using csv files to excel files and this has made the transition easier.

One issue I have encountered is that empty cells are not being returned as a Falsey value like None, or the empty string ''.

From what I have been able to understand, openpyxl reads the empty cell as None, and the following code in sheet2dict uses str to convert this into the string 'None'.

sheet2dict/sheet2dict/main.py

Lines 30 to 34 in 93bf731

    
           def item(row, col): 
        
               return ( 
        
                   sheet.cell(row=1, column=col).value, 
        
                   str(sheet.cell(row=row, column=col).value), 
        
               )

I have filtered my own dictionaries to turn 'None' into '' but that will cause issues when any of my sheet really do contain the word None.

I just did some quick testing, and it seems that the csv.DictReader() in the python standard library uses the empty string '' for missing values in the a csv file, so that might be a good thing to do for these cases.

The text was updated successfully, but these errors were encountered:

Pytlicek · 2022-04-27T11:16:33Z

Good idea @JoshuaCapel 👍 thanks for pointing this out 🙏
I hope I will have some spare time this weekend and I can work on this.

bai-yi-bai · 2022-10-20T10:44:19Z

@JoshuaCapel thank you for raising this. I dusted off sheet2dict today for a new project and tried the latest version and experienced this. I had a un-merged a lot of cells and was shocked to see them full of "None". It's a real shame that openpxyl doesn't support merged cells... but then again, I have no idea how sheet2dict would handle them.

@Pytlicek - I lost your contact information somehow, please ping me. In the meantime, let's talk about a solution. I need to re-set up my IDE/computer to push/pull correctly.

Maybe we can add an if just to check to see if the result is None. I did not modify the sheet2dict source, I made allowances in my code.

This is untested.
I cannot recall if the current Python convention is to use equality checks or is/is not. Is there a more elegant solution? Is type checking the way to go?
I know this implementation will add CPU cycles... I'm not sure if we should use if/else instead since every cell is checked twice with my implementation.

 def item(row, col):
     if type(sheet.cell(row=row, column=col).value) is None:
        return (sheet.cell(row=1, column=col).value, '')
    if type(sheet.cell(row=row, column=col).value) is not None:
        return (sheet.cell(row=1, column=col).value, str(sheet.cell(row=row, column=col).value))

Also, whatever we do, we should add some test cases to check for this (You taught me well @Pytlicek!)

Pytlicek · 2022-10-23T18:29:33Z

@bai-yi-bai do you have an example file? I want to look at this and better understand what is the case, what exactly we need and so on. Thanks

bai-yi-bai · 2022-10-31T05:33:02Z

@Pytlicek:

I created a PR #9
(Actually there were 2 PRs, my IDE didn't like the paths hard-coded into the test files.)

inventory.xlsx is sufficient since it has empty cells.
The existing tests were insufficient to detect this bug.
I'm not sure why empty headers are not affected, but this uncovered a new bug: if two header columns are empty (None), then one of the column's data will be ignored/overwritten. We should examine this further; meaning we need to generate a new excel file with two empty columns.
main.py

        def item(row, col):
            if sheet.cell(row=row, column=col).value is None:
                return (sheet.cell(row=1, column=col).value, '')
            else:
                return (sheet.cell(row=1, column=col).value, str(sheet.cell(row=row, column=col).value))

\tests\test_3_xlsx.py

def test_parse_xlsx_sheet_items(worksheet):
      ws = worksheet
      ws.xlsx_to_dict(path="inventory.xlsx", select_sheet='SJ3')
      ws_items = ws.sheet_items
      assert "Taipei" in str(ws_items)
      assert "Bratislava" not in str(ws_items)
      assert "None:" in str(ws_items)
      assert "'None'," not in str(ws_items) # Tests whether empty cells were converted to "" strings and not "None"
      assert None in ws_items[0]
      assert len(ws_items) > 1
      assert len(ws_items) == 6

I inserted a print(ws_items) and compared the output with https://www.diffchecker.com/diff
Compare the difference between the d8d363b and my proposed changes. I tried to use BOLD on 'None' and ''. Theref are four changes.
d8d363b:
test_3_xlsx.py::test_parse_xlsx_sheet_items [{'country': 'Taiwan', 'city': 'Taipei', 'area (km)': '271.8', None: '88', 'rand_field': 'ee'}, {'country': 'South Korea', 'city': 'Seoul', 'area (km)': '605.2', None: '77', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'New York', 'area (km)': '783.8', None: '66', 'rand_field': 'ee'}, {'country': 'None', 'city': 'None', 'area (km)': 'None', None: '55', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'Miami', 'area (km)': '143.1', None: 'None', 'rand_field': 'ee'}, {'country': ' ', 'city': ' ', 'area (km)': ' ', None: ' ', 'rand_field': ' '}]
My change:
test_3_xlsx.py::test_parse_xlsx_sheet_items [{'country': 'Taiwan', 'city': 'Taipei', 'area (km)': '271.8', None: '88', 'rand_field': 'ee'}, {'country': 'South Korea', 'city': 'Seoul', 'area (km)': '605.2', None: '77', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'New York', 'area (km)': '783.8', None: '66', 'rand_field': 'ee'}, {'country': '', 'city': '', 'area (km)': '', None: '55', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'Miami', 'area (km)': '143.1', None: '', 'rand_field': 'ee'}, {'country': ' ', 'city': ' ', 'area (km)': ' ', None: ' ', 'rand_field': ' '}]

Pytlicek · 2022-10-31T12:35:16Z

Thank you @bai-yi-bai 🙏 Merged and published on PyPI with version: sheet2dict 0.1.5
@JoshuaCapel pls test if you are still interested in this issue. THX 👍

Pytlicek self-assigned this Apr 27, 2022

Pytlicek added the enhancement New feature or request label Apr 27, 2022

bai-yi-bai mentioned this issue Oct 31, 2022

Fix for issue 6 #9

Merged

Pytlicek closed this as completed Oct 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty cells in xlsx become the non-empty string "None" in the dictionary #6

Empty cells in xlsx become the non-empty string "None" in the dictionary #6

JoshuaCapel commented Apr 24, 2022

Pytlicek commented Apr 27, 2022

bai-yi-bai commented Oct 20, 2022

Pytlicek commented Oct 23, 2022

bai-yi-bai commented Oct 31, 2022 •

edited

Pytlicek commented Oct 31, 2022

Empty cells in xlsx become the non-empty string "None" in the dictionary #6

Empty cells in xlsx become the non-empty string "None" in the dictionary #6

Comments

JoshuaCapel commented Apr 24, 2022

Pytlicek commented Apr 27, 2022

bai-yi-bai commented Oct 20, 2022

Pytlicek commented Oct 23, 2022

bai-yi-bai commented Oct 31, 2022 • edited

Pytlicek commented Oct 31, 2022

bai-yi-bai commented Oct 31, 2022 •

edited