Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Form conversion is slow on form with lots of blank rows and columns #604

Closed
yanokwa opened this issue Apr 12, 2022 · 3 comments · Fixed by #612
Closed

Form conversion is slow on form with lots of blank rows and columns #604

yanokwa opened this issue Apr 12, 2022 · 3 comments · Fixed by #612
Labels

Comments

@yanokwa
Copy link
Contributor

yanokwa commented Apr 12, 2022

Software and hardware versions

pyxform v1.9

Problem description

UCL_Biomass_Plot_Form.xlsx when converted with pyxform-http (running in Docker on my computer) takes 27 seconds.

The form has lots of blank columns and rows and seems to be corrupt in some way.

I re-saved the form as XLSX in Excel 16.59 for Mac and that seems to have cleaned things up. Conversion speed went from to 27 seconds to .43 seconds.

Other information

https://forum.getodk.org/t/the-xlsform-conversion-service-could-not-be-contacted/37315/ is the forum thread with the original issue.

This sounds a lot like #596, but I have confirmed I'm running 1.9. It's hard to verify directly with xls2xform because my computer is really fast :(

https://github.com/getodk/pyxform-http/blob/master/app/main.py#L40 is the specific pyxform-http function call.

My guess is that the form in question has the wrong dimensions. This script points to that.

from openpyxl import load_workbook
wb = load_workbook(filename='UCL_Biomass_Plot_Form.xlsx', read_only=True)
ws = wb['survey']
print(ws.calculate_dimension()) # A1:AMJ1048576
ws.reset_dimensions()
print(ws.calculate_dimension(force=True)) # A1:AMJ91

Maybe if we see a survey sheet with more than 50 columns (2x what we'd expect) or 5000 rows, we should reset the dimensions?

https://openpyxl.readthedocs.io/en/2.5.14/optimized.html#worksheet-dimensions

@yanokwa yanokwa changed the title Form conversation is slow forms with lots of blank rows and columns Form conversation is slow on form with lots of blank rows and columns Apr 12, 2022
@lognaturel lognaturel changed the title Form conversation is slow on form with lots of blank rows and columns Form conversion is slow on form with lots of blank rows and columns Apr 12, 2022
@qlands
Copy link
Contributor

qlands commented Apr 29, 2022

Hi. I am experiencing the same issue but with a form that does not have (I think) blank columns or rows, however, it has colors across columns.

The slow file (143,961 bytes): www.qlands.com/other_files/spanish_english_slow.xlsx
Converting to XML: 65.300415 seconds
Converting to JSON: 61.0054 seconds

The fast file without any colors (182,741 bytes): http://www.qlands.com/other_files/spanish_english_fast.xlsx
Converting to XML: 5.888324 seconds
Converting to JSON: 0.522944 seconds

I am using 1.10.0

@seadowg
Copy link
Contributor

seadowg commented May 30, 2022

I noticed I was getting OOM on my 1GB Central instance errors (forcing me to power cycle to get it working again) when uploading a form I created in Libre Office. @lognaturel thought that sounds related to this issue and suggested I try creating a fresh version in Google Docs to compare, and it indeed seems to work fine. This zip contains the Libre Office and Google Docs XLSX files for comparison.

@lognaturel
Copy link
Contributor

@qlands were your files created with LibreOffice as well? So far all reports of misbehaving files seem to be related to LibreOffice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants