XLSX conversion crashes with NoneType for authorId, seems related to cell comments #107

Open
jbthiel opened this issue Apr 1, 2024 · 0 comments

jbthiel commented Apr 1, 2024

With a complex HTML source converted to a multi-sheet XLSX file via Gnumeric, sqlitebiter crashes during conversion (log below).
This is the same table source described in my other report, #106.
Judging from the error log, the crash appears to be caused by cell comments that have no authorId attribute.
The Gnumeric import does create comments for many of the cells, and inspecting/editing them shows a blank OldAuthor and NewAuthor="Unknown".
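One way to confirm which comments are affected is to inspect the XLSX package directly. The sketch below uses only the standard library; comments_missing_author_id is a hypothetical helper, not part of sqlitebiter or excelrd. It assumes comments parts can be recognised by a root <comments> element in the SpreadsheetML main namespace, and it parses every .xml part in the archive, which may be slow on very large workbooks.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace; comments parts use it for their elements.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def comments_missing_author_id(xlsx_path):
    """Return (part_name, cell_ref) pairs for comments that have no authorId."""
    missing = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            if not name.endswith(".xml"):
                continue
            root = ET.fromstring(zf.read(name))
            # Identify comments parts by their root element, not their file name.
            if root.tag != f"{{{MAIN_NS}}}comments":
                continue
            for comment in root.iter(f"{{{MAIN_NS}}}comment"):
                if comment.get("authorId") is None:
                    missing.append((name, comment.get("ref")))
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    for part, ref in comments_missing_author_id(sys.argv[1]):
        print(f"{part}: comment on {ref} has no authorId")
```

Run against demo.xlsx, a non-empty result would support the theory that the Gnumeric-written comments are what trips the reader.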

The tail of the error log shows:

File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/excelrd/xlsx.py", line 731, in process_comments_stream
    note.author = authors[int(elem.get("authorId"))]
                          ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
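The failing line indexes the authors list with int(elem.get("authorId")), so any <comment> without that attribute makes int() receive None. A more tolerant lookup could fall back to a default author instead. This is a minimal sketch with a hypothetical helper name, not excelrd's actual code or an upstream fix; returning an empty string for unknown authors is an assumption.

```python
def resolve_author(comment_elem, authors, default=""):
    """Return the author name for a <comment>, tolerating a bad or missing authorId.

    Hypothetical defensive replacement for:
        note.author = authors[int(elem.get("authorId"))]
    """
    author_id = comment_elem.get("authorId")
    if author_id is None:
        # Comments written by Gnumeric in this report appear to omit authorId.
        return default
    try:
        index = int(author_id)
    except ValueError:
        # A non-numeric authorId is treated the same as a missing one.
        return default
    # Reject ids outside the <authors> list (including negative ones).
    if 0 <= index < len(authors):
        return authors[index]
    return default
```

In process_comments_stream the assignment would then read note.author = resolve_author(elem, authors), assuming authors is the list of author names that the indexing in the traceback suggests.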

To reproduce:
1. wget https://en.wikipedia.org/wiki/Comparison_of_text_editors
2. Import the page into Gnumeric with Data.Import as HTML; it loads 12 sheets.
3. Most of the sheets are missing their column-1 header cells. Replace the blank column-1 headers with "A", and rename the sheets to simple strings like "sNN".
4. In Gnumeric, Save As Excel 2007/2010 (XLSX) to demo.xlsx.
5. sqlitebiter file -f excel demo.xlsx
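Until the reader tolerates missing authorId values, a possible workaround is to patch the saved file before running sqlitebiter. The sketch below (patch_xlsx is a hypothetical helper, not a sqlitebiter feature) copies the workbook and sets authorId="0" on every comment that lacks one, adding a fallback author named "Unknown" if the <authors> list is empty. It assumes the comments parts are plain SpreadsheetML; ElementTree may rename other namespace prefixes when it re-serialises a part, so keep the original file and treat the output as a test copy.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Serialise the SpreadsheetML namespace as the default one (no "ns0:" prefix).
# Note: this registration is global to the ElementTree module.
ET.register_namespace("", MAIN_NS)


def _patch_comments_xml(data, fallback_author="Unknown"):
    """Give every <comment> an authorId; return (new_bytes, number_fixed)."""
    root = ET.fromstring(data)
    authors = root.find(f"{{{MAIN_NS}}}authors")
    if authors is None:
        authors = ET.Element(f"{{{MAIN_NS}}}authors")
        root.insert(0, authors)
    if len(authors) == 0:
        # authorId="0" must point at an existing <author> entry.
        ET.SubElement(authors, f"{{{MAIN_NS}}}author").text = fallback_author
    fixed = 0
    for comment in root.iter(f"{{{MAIN_NS}}}comment"):
        if comment.get("authorId") is None:
            comment.set("authorId", "0")
            fixed += 1
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True), fixed


def patch_xlsx(src, dst):
    """Copy the XLSX at src to dst, adding authorId="0" where it is missing."""
    total = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename.endswith(".xml"):
                if ET.fromstring(data).tag == f"{{{MAIN_NS}}}comments":
                    data, fixed = _patch_comments_xml(data)
                    total += fixed
            # Reusing the original ZipInfo keeps each entry's name, date and
            # compression type.
            zout.writestr(info, data)
    return total
```

For example, patch_xlsx("demo.xlsx", "demo-patched.xlsx") returns the number of comments it changed, after which sqlitebiter file -f excel demo-patched.xlsx can be retried.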

The whole error log is:

$ sqlitebiter file Comparison_of_text_editors-fixed-excel2007.xlsx
Traceback (most recent call last):
  File "/usr/local/src/sqlitebiter.git/.venv/bin/sqlitebiter", line 8, in <module>
    sys.exit(cmd())
             ^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/sqlitebiter/__main__.py", line 363, in file
    converter.convert(file_path)
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/sqlitebiter/converter/_file.py", line 106, in convert
    self.__convert(fpath, source_info_record_base)
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/sqlitebiter/converter/_file.py", line 137, in __convert
    for table_data in loader.load():
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/pytablereader/spreadsheet/excelloader.py", line 88, in load
    workbook = xlrd.open_workbook(self.source)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/excelrd/__init__.py", line 159, in open_workbook
    bk = xlsx.open_workbook_2007_xml(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/excelrd/xlsx.py", line 1011, in open_workbook_2007_xml
    x12sheet.process_comments_stream(comments_stream)
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/excelrd/xlsx.py", line 731, in process_comments_stream
    note.author = authors[int(elem.get("authorId"))]
                          ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

