Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track input data generation process in the repo #236

Merged
merged 27 commits into from
Apr 25, 2023

Conversation

willGraham01
Copy link
Collaborator

@willGraham01 willGraham01 commented Feb 22, 2023

Closes #240 if the branch chain below is completely merged into main |

Part of a series of PRs that progressively adds the system tests to the new workflow piece by piece:
main <- #236 <- #241 <- #242 <- #243 <- #244

Ongoing conversations:

Changes

Introduces the new workflow (for a system test) of:

  1. Regenerate the input .mat files
  2. Run tdms executable
  3. Compare .mat outputs to the reference material from Zenodo

This is just a PoC for now, so only arc_01 has the updated workflow (and associated files) currently.

  • The tdms/tests/system/data/input_generation directory has been created. The src/matlab folder has been moved to be the matlab/ subdirectory of this new directory, since these functions are only used in the generation of the input files.
  • A README.md has been added to tdms/tests/system to outline the system test workflow
  • generate_test_input.py is a python file that defines the objects which will allow the system tests (written in Python) to call the necessary MATLAB scripts/functions and regenerate the input data.

There are also two futher files, config_01.yaml and input_file_01.m, whose purpose is explained in the README. The config_01.yaml files can also double-up as the config files that test_system.py already reads from when working in the zipped directories.

Testing

Sadly, #239 and #238 show that it is impossible for us to regenerate the .mat inputs on GitHub runners all the while MATLAB files are required. As such, we are forced to make use of the current test_system.py and read_config.py supporting files.

However, test_regen.py and the supporting tdms_testing_class.py are available in the repository to be run locally on a machine that has the requirements installed. These can be run locally before pushing whenever we need to alter the format of the inputs to tdms themselves.

As we remove the dependency on MATLAB (#70), one of the task might be to translate the input generation into Python anyway. In which case, test_system.py and read_config.py would be made redundant.

@samcunliffe
Copy link
Member

I like this more than #229, indeed. Thanks for the (moderately) drastic reworking ❤️ .

You might be able to avoid some of your python classes altogether. You can take advantage of python's implicit tuple (un)packing and maybe parameters as a vanilla python dictionary.

The bscan/run_bscan.m file contains the matlab function which generates the input data. Regrettably, we need to specify particular inputs to this script for each test, which requires us to translate the argument values as read from the config.yaml file into a long string of values in the correct order, which can in turn be called from matlab.

BScanArguments is essentially a glorified dictionary, it's members sharing the names of the input arguments to run_bscan. It's create_bscan_argument can be used to convert the values that need to be passed into a string of the form:
run_bscan(input_arguments_in_the_correct_order).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does black allow this? I'm expecting a forced wrap at 88.

Copy link
Member

@samcunliffe samcunliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, as I said in person: no big structural problems with this, but I do have niggly comments.

The main ones:

  • Can haz simpler python pls? Simple is better than complex
  • I think tests/system/README.md should be integrated into doc/developers.md.
  • I'm pretty sure sys.path.insert followed by further imports is bad python style ([citation needed]).

tdms/tests/system/README.md Outdated Show resolved Hide resolved
tdms/tests/system/README.md Outdated Show resolved Hide resolved
tdms/tests/system/README.md Outdated Show resolved Hide resolved
tdms/tests/system/README.md Outdated Show resolved Hide resolved
tdms/tests/system/README.md Outdated Show resolved Hide resolved
tdms/tests/system/README.md Outdated Show resolved Hide resolved
# Run the test workflow, for this test
workflow(test_id)
# End of system test
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file is a lot of WET from test_system.test_system and test_system.workflow 👎 . Why is it duplicated?

If we really need a second function, I'd rename it and merge the comments into the docstring. But you don't need to repeat your code. Just import the functionality from test_system and wrap the extra setup (or whatever).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I anticipated that test_system and test_regen will diverge very quickly as more tests (requiring different input methods) are added, which they do: final version of test_regen is here. I'd argue that the workflows are sufficiently divergent as to warrant two separate files, but there are some other supporting reasons:

  • test_regen's workflow is already different to test_system's. Additionally, the pytest parametrisation also changes to use the config files rather than relying on the manually-typed Zenodo list
  • workflow exists to provide some distinction between the fetch-request for the data and the actual steps in the test. Granted, not the most informative name, but I would want to separate it from the setup steps regardless.
  • test_system.py (and it's supporting classes) will be made obsolete if we ever convert the system tests entirely into Python or get MATLAB to behave on runners. In which case, I'd rather have test_regen working stand-alone so we don't get surprised when other files get deleted. (Also applies to a couple of the other classes in the files below)

Although I'm aware that I'm very much grounding my "justification" in the future. Importing from test_system would've been nicer to look at, at this stage of the chain.

Comment on lines 15 to 22
"""One run, or execution, of the TDMS executable. That is, one call to

tdms [OPTIONS] [input_file] [gridfile] [output_file]

that is required as part of a single system test.
The run() command executes the above call to TDMS
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need another whole class to wrap this? I'm lost as to why what we had before is insufficient.

We now have TDMSRun wrapping utils.run_tdms wrapping a relatively simple subprocess.Popen. Can we stick with utils.run_tdms?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd much rather reduce the wrapping by having what is presently the content of utils.run_tdms as part of this class' method. We only use utils.run_tdms in test_system and test_regen (technically in the classes they rely on), one of which we want to get rid of eventually.

Besides run_tdms, utils.py itself only really contains:

  • The H5File class
  • relative_mean_squared_difference function (which is only called once, within the H5File class)
  • work_in_zipped_dir, which only test_system uses
  • download_data, which test_regen & test_system use.
    So I'd rather break utils into a separate files:
  • H5File .py file,
  • Move download_data into test_regen (or test_system?)
  • Move the others into test_system so they disappear with it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid, I'm not convinced. But you can at least make this a dataclasses.dataclass and use minimal post_init to check the file exists.

I would probably err on the side of having Paths rather than Union[Path, str]. Just because less code is better and we're the only people using this.

(Tangential: I modern python allows, and maybe even prefers, the | or... flibble: Path | str).

Comment on lines 115 to 118
@dataclass
class TDMSRunAndReference:
tdms_run: TDMSRun # The object that handles the tdms call itself
ref_data: str # The name of the reference data within the .zip archive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't I have a plain python tuple?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can, but it makes this typedef look nasty and on this line, I'd rather write

run.tdms_run.run() 
# OK, maybe I should name things
# run_and_ref.tdms_run.run()
# instead

over

run[0].run()

Because the latter looks like I've forgotten to loop through an index of runs, which I'm not doing here.

Comment on lines 121 to 65
class TDMSSystemTest:
"""Instance of one tdms system test. This object controls all the tdms executations that required for the system test arc_{test_id} to run, and handles the setup and tear-down of each run.

The runs themselves are performed by TDMSRun instances.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐢
🐢
🐢

@samcunliffe samcunliffe self-requested a review March 22, 2023 10:52
Copy link
Member

@samcunliffe samcunliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still uneasy that there are now three (four?) python classes to wrap: calling MATLAB to generate an input array, and calling tdms.

... sorry for the slow review 😔

.github/workflows/linux_tests.yml Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
doc/developers.md Outdated Show resolved Hide resolved
willGraham01 and others added 6 commits April 21, 2023 13:47
- Move content of tdms/tests/system/README.md into doc/developers.md to avoid duplication
- Remove unnecessary spaces in run_bscan script
- Type-hinting guards in BScanArguments (cast to str to account for Path inputs)
- Remove _stop_engine method and just write the .quit() command explicitly
- Proper imports with __init__.py's in test_regen.py
Co-authored-by: Sam Cunliffe <samcunliffe@users.noreply.github.com>
@samcunliffe
Copy link
Member

willGraham01 force-pushed the wgraham-input_data_generation branch from 7967bbd to 5aa7cd2

💪

@willGraham01
Copy link
Collaborator Author

willGraham01 commented Apr 24, 2023

Still uneasy that there are now three (four?) python classes to wrap: calling MATLAB to generate an input array, and calling tdms

Have rebased this guy onto main and removed a lot of the wrapping classes. Diff is now ~700 lines as opposed to nearly 1000, and we mostly have functions where requested:

Have included actual links here since the old comments will be out of date with the combination of a rebase + conflict fix + class -> function reworking.

Copy link
Member

@samcunliffe samcunliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for all of the back and forth. And sorry again for the epic review.

@samcunliffe
Copy link
Member

Also, I managed to avoid conflicts in the merge of #265.
🍉

... I leave you the joy of clicking merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Track system test .mat input generation.
2 participants