list index out of range when trying to convert file saved with LibreOffice #611

Open
lognaturel opened this issue Jun 9, 2022 · 2 comments

Comments

@lognaturel
Contributor

ODK forum thread.

Manually copying the contents of the file and saving them in a new document worked.

Related to #604 in that these document compatibility issues come from openpyxl.

@lindsay-stevens
Contributor

From the original thread, this comment has an example file, PRUEBA.xlsx, which seems corrupted somehow. I can open it with Excel (2010) or LibreOffice (7.2.7.2), but it has excessive style / format data. It is 674KB on disk, and approx. 85% of that comes from the ./xl/styles.xml document within the xlsx zip file. After opening the file, there are dozens of hyperlink (Hipervínculo) custom formats. Opening the file with openpyxl (with read_only and data_only modes) takes about 45 seconds on my machine. For this kind of file, perhaps pyxform could have a file read timeout kwarg (in xls2xform_convert) to optionally limit resource usage when pyxform is used in a server / service context?
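
For anyone wanting to check whether another workbook has the same style bloat, here is a minimal sketch that measures how much of the archive's compressed size belongs to xl/styles.xml. It uses only the standard library `zipfile` module; the function name `styles_share` is made up for illustration and is not part of pyxform or openpyxl. The result is approximate, since the size on disk also includes zip headers.

```python
import zipfile


def styles_share(xlsx_path, member="xl/styles.xml"):
    """Return the fraction of the archive's compressed bytes used by `member`.

    `xlsx_path` may be a path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        infos = archive.infolist()
        # compress_size is the size of each member as stored in the archive.
        total = sum(info.compress_size for info in infos)
        styles = sum(
            info.compress_size for info in infos if info.filename == member
        )
    return styles / total if total else 0.0
```

Usage would look like `styles_share("PRUEBA.xlsx")`; a value close to 1.0 suggests the workbook is dominated by style data rather than sheet content.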

@lindsay-stevens
Contributor

From the original thread, this comment also has an example file, PRUEBA.xlsx, which also seems corrupted. I can open it with Excel (2010) or LibreOffice (7.2.7.2), but it has excessive style / format data, as described above. The file won't open with openpyxl; instead, an error relating to the workbook style data is thrown, as copied below. For this kind of file, perhaps pyxform could catch the error and, in the warning, suggest that the user re-save the file with Excel or try copying the XLSForm data into a new workbook file? Alternatively, perhaps this is a known issue for openpyxl or could be fixed upstream there.

Error traceback
Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/local/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/local/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/home/lindsay/repos/pyxform/repo/tests/test_xls2json_backends.py", line 170, in test_xlsx_with_many_empty_cells2
    xlsx_data = xlsx_to_dict(xlsx_path)
  File "/home/lindsay/repos/pyxform/repo/pyxform/xls2json_backends.py", line 219, in xlsx_to_dict
    workbook = openpyxl.open(filename=path_or_file, read_only=True, data_only=True)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
    reader.read()
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 281, in read
    apply_stylesheet(self.archive, self.wb)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 94, in __init__
    self.named_styles = self._merge_named_styles()
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 114, in _merge_named_styles
    self._expand_named_style(style)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 124, in _expand_named_style
    xf = self.cellStyleXfs[named_style.xfId]
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/cell_style.py", line 185, in __getitem__
    return self.xf[idx]
IndexError: list index out of range
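
The idea of catching this error and pointing the user at a workaround could look roughly like the sketch below. It is not pyxform's actual code: `WorkbookReadError`, `RESAVE_HINT`, and `open_workbook_with_hint` are hypothetical names, and it assumes that an `IndexError` raised while loading always indicates broken style data, which may be too broad. The only openpyxl call used is `openpyxl.load_workbook` with the same `read_only` and `data_only` arguments that appear in the traceback.

```python
class WorkbookReadError(Exception):
    """Hypothetical error type; pyxform may prefer its own exception class."""


RESAVE_HINT = (
    "The workbook could not be read, possibly because of corrupted style data. "
    "Try re-saving the file with Excel, or copy the XLSForm data into a new "
    "workbook file."
)


def _openpyxl_loader(path):
    # Imported lazily so the wrapper can be exercised with a stand-in loader.
    import openpyxl

    return openpyxl.load_workbook(filename=path, read_only=True, data_only=True)


def open_workbook_with_hint(path, loader=_openpyxl_loader):
    """Load a workbook, turning the IndexError above into an actionable message."""
    try:
        return loader(path)
    except IndexError as err:
        # Assumption: an IndexError at this point comes from the stylesheet parsing.
        raise WorkbookReadError(f"{RESAVE_HINT} (original error: {err})") from err
```

Chaining with `from err` keeps the original openpyxl traceback available for debugging while the user sees the re-save suggestion first.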
