Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

698: library API for non-path input, accept markdown and dict input #712

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Jun 25, 2024

Closes #698
Closes #599

  • add: xls2xform.convert for library users to call pyxform without needing to use files (accepts bytes/bufferedreader (from open())/strings)
    • accepts markdown input since this is widely used by pyxform
    • accepts dict to avoid needing to use internal funcs that may change (also you could more easily go from pandas/database -> dict -> pyxform -> xform; the dict use case could be improved by adding a schema or something).
  • chg: avoid writing to files unless validate=True (for ODK Validate)
    • also avoid assuming any files were written, e.g. missing_ok=True
  • chg: move xls/x_sheet_to_csv, sheet_to_csv from utils.py to xls2json_backends.py because they are backends for csv input.
  • chg: move md_to_dict from test directory into xls2json_backends.py
  • chg: refactor pyxform_test_case.py to use xls2xform.convert only, instead of internal funcs associated with md_to_dict, so that the existing tests check API stability e.g. file types, dict input, etc.
  • chg: survey.py print_xform_to_file returns xml, so now survey.to_xml() doesn't create xml twice (once itself, once via print_xform_to_file).
    • similarly, xls2xform would read the input XLSForm a 2nd time if there were external choices. Now it uses the data from the first reading.

Why is this the best possible solution? Were any other approaches considered?

Differences to #698: as shown in the new tests in test_xls2xform.py, there are a few reasonable call patterns that don't return BytesIO; and adding a new function allows more easily tailoring it to the library use case.

Was not initially planning on the markdown/dict changes but considering a) #599 has been open for a while, b) the recent review of pyxform-related projects showed there is a need for alternative input formats, or at least more flexibility, c) I figured that it would be best to support markdown/dict now so that the API was designed for that, rather than needing to potentially refactor for it later. I think the main value in pyxform is in workbook_to_json -> survey.py -> xform, so by adding the option of providing a dict, library users can do whatever deserialisation is necessary and pass the data to pyxform.

What are the regression risks?

There are some functions moved around as mentioned above, so at upgrade time, library users would need to change import locations or switch to using xls2xform.convert. Functionality should be the same.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

I think there's a good amount of info in the docstrings and tests, but a readme section could be added to make it easier to take in, if required.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run python -m unittest and verified all tests pass
  • run ruff format pyxform tests and ruff check pyxform tests to lint code
  • verified that any code or assets from external sources are properly credited in comments

- add: xls2xform.convert for library users to call pyxform without
  needing to use files (accepts bytes/file handles/strings)
  - accepts markdown input since this is widely used by pyxform
  - accepts dict to avoid needing to use internal funcs that may change
- chg: avoid writing to files unless validate=True (for ODK Validate)
  - also avoid assuming any files were written, e.g. missing_ok=True
- chg: move xls/x_sheet_to_csv, sheet_to_csv from utils.py to
  xls2json_backends.py because they are backends for csv input.
- chg: move md_to_dict from test directory into xls2json_backends.py
- chg: refactor pyxform_test_case.py to use xls2xform.convert only,
  instead of internal funcs associated with md_to_dict, so that the
  existing tests check API stability e.g. file types, dict input, etc.
@lindsay-stevens
Copy link
Contributor Author

FYI @spwoodcock

@lindsay-stevens
Copy link
Contributor Author

lindsay-stevens commented Jun 26, 2024

Forum thread for community input: https://forum.getodk.org/t/handling-xlsform-xform-conversion-in-memory/45830/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move markdown to pyxform functionality out of tests and into main package
1 participant