Track input data generation process in the repo #236

willGraham01 · 2023-02-22T14:15:49Z

Closes #240 once the branch chain below has been completely merged into main.

Part of a series of PRs that moves the system tests over to the new workflow piece by piece:
main <- #236 <- #241 <- #242 <- #243 <- #244

Ongoing conversations:

Changes

Introduces the new workflow for a system test:

1. Regenerate the input .mat files.
2. Run the tdms executable.
3. Compare the .mat outputs to the reference data from Zenodo.
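As a rough sketch of how the three steps fit together, the hypothetical driver below takes each step as a plain callable. The names run_workflow, regenerate_inputs, run_tdms and outputs_match are illustrative assumptions, not functions from this PR.

```python
from pathlib import Path
from typing import Callable, List


def run_workflow(
    regenerate_inputs: Callable[[], List[Path]],
    run_tdms: Callable[[Path], Path],
    outputs_match: Callable[[Path], bool],
) -> bool:
    """Hypothetical driver for one system test (not the PR's code)."""
    # 1. Regenerate the input .mat files and collect their paths.
    inputs = regenerate_inputs()
    # 2. Run tdms once per input, collecting the output paths.
    outputs = [run_tdms(path) for path in inputs]
    # 3. The test passes only if every output matches its reference.
    return all(outputs_match(out) for out in outputs)
```

Keeping each step behind a callable means the MATLAB-dependent step can be swapped out (or skipped) without touching the comparison logic.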

This is just a proof of concept (PoC) for now, so currently only arc_01 uses the updated workflow (and has the associated files).

- The tdms/tests/system/data/input_generation directory has been created. The src/matlab folder has been moved to become the matlab/ subdirectory of this new directory, since these functions are only used to generate the input files.
- A README.md has been added to tdms/tests/system to outline the system test workflow.
- generate_test_input.py is a Python file that defines the objects which will allow the system tests (written in Python) to call the necessary MATLAB scripts/functions and regenerate the input data.
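A minimal sketch of how Python test code might call a MATLAB function through the MATLAB Engine API for Python. It assumes an engine object that, like a matlabengine session, exposes MATLAB functions as attributes accepting an nargout keyword; call_matlab_function is a hypothetical helper, not the one in generate_test_input.py.

```python
from pathlib import Path
from typing import Any, Union


def call_matlab_function(
    engine: Any,
    matlab_dir: Path,
    function_name: str,
    *args: Union[Path, str, float],
) -> None:
    """Add matlab_dir to the MATLAB path, then call function_name(*args).

    `engine` is assumed to behave like a matlabengine session: MATLAB
    functions are available as attributes and accept an nargout keyword.
    """
    # Make the MATLAB helper functions visible to this session.
    engine.addpath(str(matlab_dir), nargout=0)
    # Cast Path arguments to str before handing them to MATLAB.
    matlab_args = [str(a) if isinstance(a, Path) else a for a in args]
    # nargout=0: we only want the side effect (writing the .mat inputs).
    getattr(engine, function_name)(*matlab_args, nargout=0)
```

Against a real installation the session would come from matlab.engine.start_matlab(); wrapping the call in try/finally with eng.quit() in the finally clause makes sure the session is closed even if the MATLAB call fails.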

There are also two further files, config_01.yaml and input_file_01.m, whose purpose is explained in the README. The config_01.yaml files can also double up as the config files that test_system.py already reads when working in the zipped directories.

Testing

Sadly, #239 and #238 show that we cannot regenerate the .mat inputs on GitHub runners for as long as MATLAB files are required. As such, we are forced to keep using the current test_system.py and read_config.py supporting files.

However, test_regen.py and the supporting tdms_testing_class.py are available in the repository to be run locally on a machine that has the requirements installed. They can be run before pushing whenever we need to change the format of the tdms inputs themselves.
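Because MATLAB may be missing on a given machine, one option (sketched here, not the PR's code) is a helper that checks whether the matlab.engine package can be located at all before any MATLAB-dependent test runs.

```python
import importlib.util


def matlab_engine_available() -> bool:
    """Return True if the matlab.engine module can be located."""
    try:
        # For a dotted name, find_spec imports the parent package ("matlab")
        # first, so a missing parent raises instead of returning None.
        return importlib.util.find_spec("matlab.engine") is not None
    except ModuleNotFoundError:
        return False
```

In a test module this could drive pytest.mark.skipif(not matlab_engine_available(), reason="MATLAB engine not installed"), or the module could call pytest.importorskip("matlab.engine") at the top. Judging by the commit history, the PR instead has CI ignore test_regen.py as a whole file (for example via pytest's --ignore option), presumably because the MATLAB engine is needed just to import it.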

As we remove the dependency on MATLAB (#70), one of the tasks might be to translate the input generation into Python anyway, in which case test_system.py and read_config.py would become redundant.

tdms/tests/system/data/input_generation/README.md

samcunliffe · 2023-02-23T15:53:54Z

I like this more than #229, indeed. Thanks for the (moderately) drastic reworking ❤️.

You might be able to avoid some of your Python classes altogether. You can take advantage of Python's implicit tuple (un)packing, and maybe pass parameters around as a vanilla Python dictionary.
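A small illustration of the suggestion, with hypothetical names and file names: the parameters live in a plain dict and are unpacked into keyword arguments with `**`, while a function returning two values relies on implicit tuple packing and unpacking instead of a wrapper class.

```python
from typing import Tuple

# Hypothetical parameters for one run, kept in a vanilla dict, not a class.
params = {
    "test_id": "01",
    "input_file": "input_file_01.m",
    "output_file": "output_01.mat",  # hypothetical output name
}


def describe_run(test_id: str, input_file: str, output_file: str) -> str:
    return f"arc_{test_id}: {input_file} -> {output_file}"


def output_and_reference(test_id: str) -> Tuple[str, str]:
    # Returning two values packs them into a tuple implicitly.
    return f"output_{test_id}.mat", f"reference_{test_id}.mat"


# ** unpacks the dict into keyword arguments matching the parameter names.
summary = describe_run(**params)
# Tuple unpacking on assignment: no wrapper class needed.
output, reference = output_and_reference(params["test_id"])
```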

tdms/tests/system/data/input_generation/generate_test_input.py

samcunliffe · 2023-02-23T15:59:55Z

+    The bscan/run_bscan.m file contains the matlab function which generates the input data. Regrettably, we need to specify particular inputs to this script for each test, which requires us to translate the argument values as read from the config.yaml file into a long string of values in the correct order, which can in turn be called from matlab.
+
+    BScanArguments is essentially a glorified dictionary, its members sharing the names of the input arguments to run_bscan. Its create_bscan_argument method can be used to convert the values that need to be passed into a string of the form:
+    run_bscan(input_arguments_in_the_correct_order).
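For illustration only, a sketch of what a create_bscan_argument-style helper might do: render each Python value as a MATLAB literal and join them in a fixed order. The argument names in BSCAN_ARG_ORDER are placeholders; the real run_bscan signature is not shown in this thread.

```python
from pathlib import Path
from typing import Mapping, Union

MatlabValue = Union[str, Path, bool, int, float]

# Placeholder argument order; the real run_bscan signature is not shown here.
BSCAN_ARG_ORDER = ("test_directory", "input_file", "use_obstacle")


def format_matlab_value(value: MatlabValue) -> str:
    """Render one Python value as a MATLAB literal."""
    # Check bool before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings and Paths become single-quoted char arrays; MATLAB escapes ' as ''.
    return "'" + str(value).replace("'", "''") + "'"


def create_bscan_call(args: Mapping[str, MatlabValue]) -> str:
    """Build 'run_bscan(...)' with the arguments in BSCAN_ARG_ORDER."""
    missing = [name for name in BSCAN_ARG_ORDER if name not in args]
    if missing:
        raise KeyError(f"missing run_bscan arguments: {missing}")
    rendered = ", ".join(format_matlab_value(args[name]) for name in BSCAN_ARG_ORDER)
    return f"run_bscan({rendered})"
```

The later switch to matlabengine (commit fcbad8f, "Update to use matlabengine for cleaner calls") likely removes the need for this string building, since values can then be passed to the MATLAB function directly.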


How does black allow this? I'm expecting a forced wrap at 88.

samcunliffe

Yep, as I said in person: no big structural problems with this, but I do have niggly comments.

The main ones:

- Can haz simpler python pls? Simple is better than complex.
- I think tests/system/README.md should be integrated into doc/developers.md.
- I'm pretty sure sys.path.insert followed by further imports is bad Python style ([citation needed]).

tdms/tests/system/README.md

samcunliffe · 2023-03-10T14:54:15Z

tdms/tests/system/test_regen.py

+    # Run the test workflow, for this test
+    workflow(test_id)
+    # End of system test
+    return


This whole file is a lot of WET from test_system.test_system and test_system.workflow 👎. Why is it duplicated?

If we really need a second function, I'd rename it and merge the comments into the docstring. But you don't need to repeat your code. Just import the functionality from test_system and wrap the extra setup (or whatever).

I anticipated that test_system and test_regen would diverge very quickly as more tests (requiring different input methods) are added, and they do: the final version of test_regen is here. I'd argue that the workflows are sufficiently divergent to warrant two separate files, but there are some other supporting reasons:

test_regen's workflow is already different from test_system's. Additionally, the pytest parametrisation changes to use the config files rather than relying on the manually typed Zenodo list.

workflow exists to provide some distinction between the fetch-request for the data and the actual steps in the test. Granted, not the most informative name, but I would want to separate it from the setup steps regardless.

test_system.py (and its supporting classes) will become obsolete if we ever convert the system tests entirely into Python or get MATLAB to behave on runners. In that case, I'd rather have test_regen working stand-alone so we aren't surprised when other files get deleted. (This also applies to a couple of the other classes in the files below.)

I'm aware, though, that I'm very much grounding my "justification" in the future. Importing from test_system would have looked nicer at this stage of the chain.
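A sketch of parametrising from the config files instead of a hand-typed list, assuming they follow the config_&lt;id&gt;.yaml naming of config_01.yaml; discover_test_ids is a hypothetical helper, not code from test_regen.

```python
import re
from pathlib import Path
from typing import List

# Assumes config files are named config_<id>.yaml, as with config_01.yaml.
CONFIG_PATTERN = re.compile(r"config_(\w+)\.yaml")


def discover_test_ids(config_dir: Path) -> List[str]:
    """Return the sorted test IDs that have a config file in config_dir."""
    ids = []
    for path in config_dir.glob("config_*.yaml"):
        match = CONFIG_PATTERN.fullmatch(path.name)
        if match:
            ids.append(match.group(1))
    return sorted(ids)
```

The result could then feed something like @pytest.mark.parametrize("test_id", discover_test_ids(CONFIG_DIR)), so that adding a config file is enough to add a test.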

samcunliffe · 2023-03-10T14:58:32Z

tdms/tests/system/tdms_testing_class.py

+    """One run, or execution, of the TDMS executable. That is, one call to
+
+    tdms [OPTIONS] [input_file] [gridfile] [output_file]
+
+    that is required as part of a single system test.
+    The run() command executes the above call to TDMS
+    """


Do we really need another whole class to wrap this? I'm lost as to why what we had before is insufficient.

We now have TDMSRun wrapping utils.run_tdms wrapping a relatively simple subprocess.Popen. Can we stick with utils.run_tdms?

I'd much rather reduce the wrapping by having what is presently the content of utils.run_tdms as part of this class' method. We only use utils.run_tdms in test_system and test_regen (technically in the classes they rely on), one of which we want to get rid of eventually.

Besides run_tdms, utils.py itself only really contains:

- the H5File class
- the relative_mean_squared_difference function (which is only called once, within the H5File class)
- work_in_zipped_dir, which only test_system uses
- download_data, which test_regen and test_system both use.

So I'd rather break utils into separate files:

- an H5File .py file
- move download_data into test_regen (or test_system?)
- move the others into test_system so they disappear with it.

I'm afraid I'm not convinced. But you can at least make this a dataclasses.dataclass and use a minimal __post_init__ to check that the file exists.

I would probably err on the side of having Paths rather than Union[Path, str], just because less code is better and we're the only people using this.

(Tangential: I think modern Python, 3.10 onwards, allows and maybe even prefers the | or... flibble: Path | str.)
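A sketch of the suggested shape, assuming Path-only fields and subprocess.run in place of the Popen wrapper in utils.run_tdms. The field names are assumptions; this is not the TDMSRun class that was eventually merged.

```python
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class TDMSRun:
    """One call to: tdms [OPTIONS] input_file [gridfile] output_file."""

    executable: Path
    input_file: Path
    output_file: Path
    options: List[str] = field(default_factory=list)
    gridfile: Optional[Path] = None

    def __post_init__(self) -> None:
        # Fail early, at construction time, if a required input is missing.
        for path in (self.input_file, self.gridfile):
            if path is not None and not path.is_file():
                raise FileNotFoundError(path)

    def command(self) -> List[str]:
        cmd = [str(self.executable), *self.options, str(self.input_file)]
        if self.gridfile is not None:
            cmd.append(str(self.gridfile))
        cmd.append(str(self.output_file))
        return cmd

    def run(self) -> int:
        """Run the executable and return its exit code."""
        return subprocess.run(self.command(), check=False).returncode
```

Inlining the subprocess call as a method removes one layer of wrapping, which is the point raised above about TDMSRun wrapping utils.run_tdms wrapping Popen.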

samcunliffe · 2023-03-10T14:59:08Z

tdms/tests/system/tdms_testing_class.py

+@dataclass
+class TDMSRunAndReference:
+    tdms_run: TDMSRun  # The object that handles the tdms call itself
+    ref_data: str  # The name of the reference data within the .zip archive


Can't I have a plain python tuple?

You can, but it makes this typedef look nasty, and on this line I'd rather write

run.tdms_run.run()
# OK, maybe I should name things
# run_and_ref.tdms_run.run()
# instead

over

run[0].run()

Because the latter looks like I've forgotten to loop through an index of runs, which I'm not doing here.
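As a possible middle ground (not what the PR does), typing.NamedTuple keeps named access such as run_and_ref.tdms_run and still behaves like a plain tuple for indexing and unpacking. RunAndReference and the reference file name are hypothetical.

```python
from typing import Any, NamedTuple


class RunAndReference(NamedTuple):
    """Hypothetical tuple-compatible replacement for TDMSRunAndReference."""

    tdms_run: Any  # the object that handles the tdms call itself
    ref_data: str  # the name of the reference data within the .zip archive


class _DummyRun:
    """Stand-in for a TDMS run object; only .run() matters here."""

    def run(self) -> str:
        return "ran"


run_and_ref = RunAndReference(_DummyRun(), "arc_01_reference.mat")
result = run_and_ref.tdms_run.run()  # named access reads clearly
tdms_run, ref_data = run_and_ref     # and it still unpacks like a tuple
```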

samcunliffe · 2023-03-10T14:59:46Z

tdms/tests/system/tdms_testing_class.py

+class TDMSSystemTest:
+    """Instance of one tdms system test. This object controls all the tdms executations that required for the system test arc_{test_id} to run, and handles the setup and tear-down of each run.
+
+    The runs themselves are performed by TDMSRun instances.
+    """


🐢
🐢
🐢

samcunliffe

Still uneasy that there are now three (four?) python classes to wrap: calling MATLAB to generate an input array, and calling tdms.

... sorry for the slow review 😔

.github/workflows/linux_tests.yml

doc/developers.md

Co-authored-by: Sam Cunliffe <samcunliffe@users.noreply.github.com>

- Move content of tdms/tests/system/README.md into doc/developers.md to avoid duplication
- Remove unnecessary spaces in run_bscan script
- Type-hinting guards in BScanArguments (cast to str to account for Path inputs)
- Remove _stop_engine method and just write the .quit() command explicitly
- Proper imports with __init__.py's in test_regen.py

Co-authored-by: Sam Cunliffe <samcunliffe@users.noreply.github.com>

samcunliffe · 2023-04-24T09:19:11Z

willGraham01 force-pushed the wgraham-input_data_generation branch from 7967bbd to 5aa7cd2

💪

willGraham01 · 2023-04-24T09:24:03Z

"Still uneasy that there are now three (four?) python classes to wrap: calling MATLAB to generate an input array, and calling tdms"

Have rebased this guy onto main and removed a lot of the wrapping classes. Diff is now ~700 lines as opposed to nearly 1000, and we mostly have functions where requested:

- TDMSRun is a dataclass with this redundant class now absorbed into it. The new TDMSRun class is here.
- TDMSSystemTest is no longer a class; it is just the run_system_test function, which infers all the TDMSRuns that make up a system test.
- test_regen.py is still very similar to test_system.py, but the two will diverge further up the chain.
- run_bscan is now called by a wrapping function rather than a class; the function takes a MatlabEngine session as input (connect_engine didn't work for some reason).
- There is now a function to start a MATLAB session with the extra options TDMS needs, rather than a wrapping class.

Have included actual links here since the old comments will be out of date with the combination of a rebase + conflict fix + class -> function reworking.

tdms/tests/system/run_system_test.py

samcunliffe

Thanks a lot for all of the back and forth. And sorry again for the epic review.

samcunliffe · 2023-04-24T14:55:44Z

Also, I managed to avoid conflicts in the merge of #265.
🍉

... I leave you the joy of clicking merge.

samcunliffe reviewed Feb 23, 2023

tdms/tests/system/data/input_generation/README.md (outdated)

samcunliffe reviewed Feb 23, 2023

tdms/tests/system/data/input_generation/generate_test_input.py (outdated)

samcunliffe reviewed Feb 23, 2023

This was referenced Feb 27, 2023

setup-matlab doesn't allow us to run MATLAB directly or via python #239

Open

Track generation for arc_02 and arc_03 #241

Merged

willGraham01 marked this pull request as ready for review March 1, 2023 16:10

samcunliffe reviewed Mar 10, 2023

samcunliffe self-requested a review March 22, 2023 10:52

samcunliffe reviewed Apr 13, 2023

samcunliffe mentioned this pull request Apr 20, 2023

Final part of #230: clarity + the flow of information #265

Merged

willGraham01 and others added 19 commits April 21, 2023 13:42

Update .gitignore to only ignore data/*.zip rather than whole directory contents

2e767ac

Relocate common MATLAB functionality

8888e82

Clarify what we're generating here

d718c35

Create script for generating test data

5d90f8d

Add a readme so other people know what's going on

74e8a31

Fix now-broken pathing on MATLAB tests

bffb08c

Add regenerate_all script

b93d2db

More generality for the classes

1f9b147

1/2: new tests regenerate input data

fa53176

Regen tests now work!

3a3600a

Remove Will's quick-run hack

db765e2

Update README.md now PoConcept is complete

252feaa

Mark test_regen for skipping ATM

11133a4

Update so the mark is actually applied now

87ccfac

Update to use matlabengine for cleaner calls

fcbad8f

Update README to reflect use of matlabengine. Mark test_regen.py to be skipped by pytest on CI

ebfab3f

Update ci.yml to have pytest ignore test_regen.py on GH runners.

4240fa0

Force test_regen file ignore rather than test ignore due to MATLABEngine necessity

85b5e80

Apply suggestions from code review

dcbc87b

Co-authored-by: Sam Cunliffe <samcunliffe@users.noreply.github.com>

willGraham01 and others added 6 commits April 21, 2023 13:47

Apply suggestions from code review

6943561

Co-authored-by: Sam Cunliffe <samcunliffe@users.noreply.github.com>

Apply doc updates that were hanging from GitHub

a52492d

Simpler python plz: request granted

86f7cbb

Remove duplicate functions and wrappers

a209b25

Update paths and move regenerate_all script to top-level

5aa7cd2

willGraham01 force-pushed the wgraham-input_data_generation branch from 7967bbd to 5aa7cd2 April 24, 2023 09:15

willGraham01 requested a review from samcunliffe April 24, 2023 09:24

Function and file renames accordingly

67a80a9

willGraham01 commented Apr 24, 2023

tdms/tests/system/run_system_test.py (outdated)

samcunliffe approved these changes Apr 24, 2023

Update tdms/tests/system/run_system_test.py

e202ba1

Fix typo in refactoring

willGraham01 merged commit e58da5e into main Apr 25, 2023
10 checks passed

willGraham01 deleted the wgraham-input_data_generation branch April 25, 2023 08:08

This was referenced Apr 28, 2023

Add arc_{08,12,13,example_fdtd} to test_regen.py #242

Merged

Add arc_09 and arc_10 to the scope of test_regen.py #243

Merged

General tidy up of matlab directory #244

Merged

This was referenced Jun 6, 2023

Input Generation: convert MATLAB scripts #342

Open

Choose a consistent file structure for integration test data #116

Closed
