
Refactoring parsers #22

Merged
merged 126 commits into from Mar 31, 2022

Conversation

@simonbowly (Member)
@maliheha and I are working on this on my fork. These updates aim to separate the parsers for the different sections of the log into submodules, so things are easier to unit-test and modify as we go.

@mattmilten no need to review for now, we'll let you know when it's in a more complete state.

@ronaldvdv we noticed you're also adding some tests and example data - can we fold this into the refactored code? Just want to make sure we aren't duplicating work and writing tests that may eventually clash with one another. Happy to discuss and coordinate together.

maliheha and others added 30 commits December 27, 2021 15:58
Added temporary tests asserting against current parser outputs
from log files in the data directory
Separate parsing components for norel and nodelog sections.
Replace main code sections with the new parsers.
Use common interface and helper functions
maliheha and others added 12 commits February 15, 2022 22:57
Update tox config and package data specs
Move loaders for parameter defaults and descriptions to
dedicated grblogtools.parameters package
merged_logs argument is not needed, we always collect multiple
logs from a file and use a LogNumber column to distinguish them
Avoid trying to create Log/ModelFile/Model columns when
ModelFilePath is not available.
Co-authored-by: Maliheh Aramon <maliheha@users.noreply.github.com>
@simonbowly (Member Author)

@mattmilten we are pretty close to done with refactoring in this PR :) There are some very minor behaviour changes which we've documented in the changelog. To try to avoid regressions, the previous code is still in the repo, and tests/test_regression.py tests the refactored code against the v1.3.2 code (we should delete both of these before finalising).

When you have a chance to review could you let us know your thoughts? Probably a few things could still be tidied up but we're quite happy with the structure.

Thanks @maliheha for your great work on this so far, and for putting together the summary of the new structure below:

  • In the new design, parsers form the innermost layer, each responsible for parsing a specific section of the log file.
  • The parsers can be found in src/grblogtools/parsers/. Each parser has a main method parse() that takes a single line as input and returns True/False depending on whether the line is matched by a pattern associated with the parser.
  • Each parser has two other methods, get_summary() and get_progress() (where applicable), which return the parsed summary or the detailed progress.
  • The next layer is the SingleLogParser, which is responsible for parsing a single log run.
  • The SingleLogParser (see src/grblogtools/parsers/single_log.py) also has a main method parse(), which takes a line as input and returns True/False depending on whether the line is matched. It keeps two internal variables, current_parser and future_parsers, updated as it sees more lines. It initializes current_parser to the header parser and future_parsers to the list of remaining parsers. If a line is not matched by current_parser, the SingleLogParser checks whether the line should be handed to any of the remaining parsers. As soon as one of the future_parsers returns True, current_parser and future_parsers are updated.
  • The final (API) layer (see src/grblogtools/api.py) also has a method named parse(), which takes strings of log file patterns as individual arguments and returns a ParseResult object. The summary() and progress() methods of the ParseResult class return the summary dict and search progress information as before.
  • The main method of the ParseResult class is parse(), which takes the path to a single log file (the file can include multiple runs) and uses SingleLogParser objects to parse each run.
  • The API also includes a legacy function named get_dataframe(logfiles, timelines=False, prettyparams=False); i.e. the API is unchanged apart from dropping merged_logs (merged logs are now handled automatically).
  • A usage example of the API can be found in the header of the api.py module.
  • Unit tests can be found in tests/, with separate tests for the individual parsers and the API.
  • The file test_refactor_regression.py contains regression tests which ensure equivalent outputs from the current grblogtools API and the newly designed API.
  • The tests run against the current test data in the data folder and the newly added test data in tests/assets.
  • The plotting API is left untouched; there is no plan to refactor it as part of this project.
  • Extending the current API to tuner, multi-objective, concurrent, and distributed optimization logs is the next step; these extensions can be incorporated directly into the main grblogtools repo once the current code base is reviewed and, hopefully, merged.
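The layering described above can be roughly sketched as follows. This is a simplified illustration, not the actual grblogtools code: the SectionParser class, its regex patterns, and the field names are invented for the example; only the parse()/get_summary() interface and the current_parser/future_parsers hand-off mirror the description.

```python
import re


class SectionParser:
    """Minimal stand-in for one section parser exposing the
    parse()/get_summary() interface described above."""

    def __init__(self, pattern: str):
        self._pattern = re.compile(pattern)
        self._summary = {}

    def parse(self, line: str) -> bool:
        """Return True if the line matches this parser's pattern."""
        match = self._pattern.search(line)
        if match:
            self._summary.update(match.groupdict())
            return True
        return False

    def get_summary(self) -> dict:
        return self._summary


class SingleLogParser:
    """Sketch of the current_parser/future_parsers hand-off for one run."""

    def __init__(self, parsers):
        # The header parser starts as current; the rest wait in order.
        self.current_parser = parsers[0]
        self.future_parsers = list(parsers[1:])

    def parse(self, line: str) -> bool:
        # First offer the line to the current parser.
        if self.current_parser.parse(line):
            return True
        # Otherwise, check whether a later section has started.
        for i, parser in enumerate(self.future_parsers):
            if parser.parse(line):
                self.current_parser = parser
                self.future_parsers = self.future_parsers[i + 1:]
                return True
        return False
```

A caller feeds the log line by line; once a future parser claims a line, it becomes the current parser and everything before it in the list is discarded, so sections are matched in order.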

@mattmilten self-requested a review March 10, 2022 14:55
@mattmilten (Member) left a comment:
Excellent work! One question: would it make sense to define a base class for the different parsers? They all implement the same function anyway but the classes do not seem to be connected by a common base class.

Files with review comments (resolved): src/grblogtools/__init__.py (outdated), src/grblogtools/cli.py (outdated), src/grblogtools/parsers/nodelog.py (outdated), CONTRIBUTING.md, CITATION.cff, README.md
@simonbowly (Member Author)

> Excellent work! One question: would it make sense to define a base class for the different parsers? They all implement the same function anyway but the classes do not seem to be connected by a common base class.

Thanks! I don't think abstract base classes really add much here. There's a common API between classes but the abstract class doesn't add any functionality, so I'm inclined to just leave it as is.
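For reference, if the shared interface ever needs to be stated explicitly without coupling the classes through inheritance, a structural type via typing.Protocol is one lightweight option. This is a sketch only; Parser, HeaderLikeParser, and feed are invented names for illustration, not part of the PR.

```python
from typing import Protocol


class Parser(Protocol):
    """Structural type for the shared parser interface. Classes never
    inherit from it; they only need to provide matching methods."""

    def parse(self, line: str) -> bool: ...

    def get_summary(self) -> dict: ...


class HeaderLikeParser:
    # No inheritance needed: satisfies Parser structurally.
    def parse(self, line: str) -> bool:
        return line.startswith("Gurobi")

    def get_summary(self) -> dict:
        return {}


def feed(parser: Parser, lines: list) -> int:
    """Count lines matched by any object implementing Parser."""
    return sum(parser.parse(line) for line in lines)
```

A type checker such as mypy would flag any class passed to feed() that is missing one of the two methods, which gives the interface guarantee of a base class without the extra machinery.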

Click 8.1.0 broke something in the black formatter.
Temporarily fix to <= 8.0.4
Update author list and bump version
Remove -m (merged logs) option from cli.
Update cli to use new API.
Run api tests against top level import.
Termination regexes could return None values, which then
failed in type conversion.
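The None-value issue mentioned in the last commit can be sketched like this. The pattern and helper below are hypothetical, not the actual grblogtools regexes: an optional regex group that did not participate in the match yields None in groupdict(), and converting it blindly (e.g. float(None)) raises a TypeError, so the conversion has to guard against it.

```python
import re

# Hypothetical termination pattern with an optional gap group;
# the real grblogtools regexes differ.
TERMINATION = re.compile(
    r"Best objective (?P<ObjVal>[^,]+), best bound (?P<ObjBound>[^,]+)"
    r"(?:, gap (?P<Gap>\S+))?"
)


def parse_termination(line: str) -> dict:
    match = TERMINATION.match(line)
    if match is None:
        return {}
    result = {}
    for key, value in match.groupdict().items():
        # Guard: optional groups can be None, and float(None) raises.
        if value is None:
            continue
        try:
            result[key] = float(value.rstrip("%"))
        except ValueError:
            result[key] = value
    return result
```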
(excerpt of the test code under review)

        has seen a proper log start line."""

        parser = HeaderParser()
        parse_lines(parser, ["Presolved: 390 rows, 316 columns, 1803 nonzeros"])
I have seen this happen in the dev log when we have node presolve output too. Unless I am overlooking something, in a typical log the header parser should not be interrupted here, because we should already be in the presolve parser when reaching this line. That said, thank you for the change; it is definitely safer to guard against such patterns.

@simonbowly marked this pull request as ready for review March 31, 2022 00:54
@simonbowly changed the title WIP refactoring parsers Refactoring parsers Mar 31, 2022
@simonbowly
Copy link
Member Author

@mattmilten this is ready to go!

@mattmilten merged commit 6a7783e into Gurobi:master Mar 31, 2022