Make some daffodil-runtime2 improvements by tuxji · Pull Request #525 · apache/daffodil

tuxji · 2021-04-07T21:19:58Z

Refactor duplicate code in Binary*CodeGenerator.scala files.

Change PState/UState member field from a string error message to a
struct error object. Make parse/unparse functions initialize that
member field if any error happens. Move checks for errors from
parse/unparse functions to their callers, e.g., "if (ustate->error)
return;".

Collect validation errors without stopping parsing or unparsing by
initializing another PState/UState member field pointing to a
validation struct with some fields. As last step, update all C files
using include-what-you-use to document what's used from each include.

Add instructions to README.md how to setup required C tool-chain
elements on Linux and Windows.

Generate daffodil-runtime2's name and version into generated_code.c
automatically so we won't need to keep it up to date.

Changelog:

In README.md, fix or disable markdownlint warnings. Expand Build
Requirements with instructions how to install C compiler, Mini-XML
library, java, and sbt on Linux and Windows. Verify/update old
hyperlinks and add new MSYS2 and clang hyperlinks.

In build.sbt, add Compile / cCompiler and ccArchiveCommand settings as
a guide to show developers how to override these default settings if
they want sbt to call the C compiler with a different name than "cc".
The sbt-cc plugin doesn't iterate through a list (${CC}, cc, gcc,
clang, zig cc) to find a C compiler like our runtime2 code does.

In daffodil_argp.c, remove argp_program_version's definition since we
now generate it automatically within generated_code.c.

In daffodil_main.c, add fflush_or_exit and change fopen_or_exit /
fclose_or_open to call continue_or_exit with error message instead of
calling exit directly. Make main print any validation diagnostics
before printing any errors since they've been split into separate
PState/UState member fields. Also make main call continue_or_exit
instead of calling error directly and pass Error structs, not strings,
to all continue_or_exit calls.

In stack.c, make stack functions call continue_or_exit instead of
calling error directly and pass Error structs, not strings, to all
continue_or_exit calls.

In xml_reader.c, make conversion functions return Error structs, not
strings, to indicate errors converting strings to primitive datums.
Make reader functions return Error structs, not strings, to indicate
errors reading XML data.

In xml_writer.c, make writer functions return Error structs, not
strings, to indicate errors writing XML data. Remove redundant stack
checking code already performed within stack fuctions.

In errors.c (new file), use switch statement inside error_message
function to centralize all error messages in one place for easier
future internationalization. Use switch statement inside
print_mybe_stop function to allow error messages to interpolate an
extra detail (string or integer) if needed. Define a single static
array that can be used to store validation diagnostics and define a
function to print validation diagnostics. Implement continue_or_exit
and eof_or_error functions here as well.

In errors.h (new file), define enumerations for each type of error
which could occur in C code. Define an Error struct to store an
enumeration and optionally an extra detail (string or integer). Also
define a Diagnostics struct to store an array of data validation
errors which should be reported separately from parse errors. Move
PState and UState structs here from infoset.h and give them member
fields pointing to data validation and parse errors. Also declare
need_diagnostics, print_diagnostics, continue_or_exit, eof_or_error
prototypes and UNUSED macro in errors.h.

In infoset.c, make walkInfosetNode and walkInfoset functions return
Error struct, not string. Remove eof_or_error_msg function (now moved
to errors.c with shorter name).

In infoset.h, change InitChoiceRD and Visit* prototypes to return
Error structs, not strings. Remove PState & UState structs,
eof_or_error_msg prototype, and UNUSED macro (all moved to errors.h).

In parsers.c, make parse__ functions stop enclosing
their code in "if (!pstate->error_msg)" statements and use Error
structs, not strings. Callers are now expected to check for errors
after each call immediately. Make parse_fill_bytes use
read_stream_update_position helper macro too. Make
parse_validate_fixed report errors in separate pstate->validati field,
not pstate->error field.

In unparsers.c, make unparse__ functions stop enclosing
their code in "if (!ustate->error_msg)" statements and use Error
structs, not strings. Callers are now expected to check for errors
after each call immediately. Make unparse_fill_bytes use
write_stream_update_position helper macro too. Make
unparse_validate_fixed report errors in separate ustate->validati
field, not ustate->error field.

In NestedUnion.c and ex_nums.c examples, update with freshly generated
code to illustrate code generator changes (in particular, defining
argp_program_version, adding an if statement after each parse/unparse
call to check for an error and skip rest of calls immediately, and
returning errors directly from initChoice functions with choice key
that failed to match any branch key).

In BinaryAbstractCodeGenerator.scala (new file), refactor common code
that used to be duplicated in BinaryBooleanCodeGenerator.scala,
BinaryFloatCodeGenerator.scala, and
BinaryIntegerKnownLengthCodeGenerator.scala; now they pass parameters
to binaryAbstractGenerateCode instead.

In CodeGeneratorState.scala, make all necessary changes to report
errors using structs instead of strings in generated code including
initChoice functions, add if statements after each parse/unparse call,
and generate the program name and version automatically.

In build.properties, change sbt.version from 1.4.1 to 1.4.9 to ensure
successful sbt build from scratch in a new Ubuntu or Windows VM (sbt
was getting java.lang.NoClassDefFound: scala/Serializer).

Refactor duplicate code in Binary*CodeGenerator.scala files. Change PState/UState member field from a string error message to a struct error object. Make parse/unparse functions initialize that member field if any error happens. Move checks for errors from parse/unparse functions to their callers, e.g., "if (ustate->error) return;". Collect validation errors without stopping parsing or unparsing by initializing another PState/UState member field pointing to a validation struct with some fields. As last step, update all C files using include-what-you-use to document what's used from each include. Add instructions to README.md how to setup required C tool-chain elements on Linux and Windows. Generate daffodil-runtime2's name and version into generated_code.c automatically so we won't need to keep it up to date. Changelog: In README.md, fix or disable markdownlint warnings. Expand Build Requirements with instructions how to install C compiler, Mini-XML library, java, and sbt on Linux and Windows. Verify/update old hyperlinks and add new MSYS2 and clang hyperlinks. In build.sbt, add Compile / cCompiler and ccArchiveCommand settings as a guide to show developers how to override these default settings if they want sbt to call the C compiler with a different name than "cc". The sbt-cc plugin doesn't iterate through a list (${CC}, cc, gcc, clang, zig cc) to find a C compiler like our runtime2 code does. In daffodil_argp.c, remove argp_program_version's definition since we now generate it automatically within generated_code.c. In daffodil_main.c, add fflush_or_exit and change fopen_or_exit / fclose_or_open to call continue_or_exit with error message instead of calling exit directly. Make main print any validation diagnostics before printing any errors since they've been split into separate PState/UState member fields. Also make main call continue_or_exit instead of calling error directly and pass Error structs, not strings, to all continue_or_exit calls. In stack.c, make stack functions call continue_or_exit instead of calling error directly and pass Error structs, not strings, to all continue_or_exit calls. In xml_reader.c, make conversion functions return Error structs, not strings, to indicate errors converting strings to primitive datums. Make reader functions return Error structs, not strings, to indicate errors reading XML data. In xml_writer.c, make writer functions return Error structs, not strings, to indicate errors writing XML data. Remove redundant stack checking code already performed within stack fuctions. In errors.c (new file), use switch statement inside error_message function to centralize all error messages in one place for easier future internationalization. Use switch statement inside print_mybe_stop function to allow error messages to interpolate an extra detail (string or integer) if needed. Define a single static array that can be used to store validation diagnostics and define a function to print validation diagnostics. Implement continue_or_exit and eof_or_error functions here as well. In errors.h (new file), define enumerations for each type of error which could occur in C code. Define an Error struct to store an enumeration and optionally an extra detail (string or integer). Also define a Diagnostics struct to store an array of data validation errors which should be reported separately from parse errors. Move PState and UState structs here from infoset.h and give them member fields pointing to data validation and parse errors. Also declare need_diagnostics, print_diagnostics, continue_or_exit, eof_or_error prototypes and UNUSED macro in errors.h. In infoset.c, make walkInfosetNode and walkInfoset functions return Error struct, not string. Remove eof_or_error_msg function (now moved to errors.c with shorter name). In infoset.h, change InitChoiceRD and Visit* prototypes to return Error structs, not strings. Remove PState & UState structs, eof_or_error_msg prototype, and UNUSED macro (all moved to errors.h). In parsers.c, make parse_<endian>_<type> functions stop enclosing their code in "if (!pstate->error_msg)" statements and use Error structs, not strings. Callers are now expected to check for errors after each call immediately. Make parse_fill_bytes use read_stream_update_position helper macro too. Make parse_validate_fixed report errors in separate pstate->validati field, not pstate->error field. In unparsers.c, make unparse_<endian>_<type> functions stop enclosing their code in "if (!ustate->error_msg)" statements and use Error structs, not strings. Callers are now expected to check for errors after each call immediately. Make unparse_fill_bytes use write_stream_update_position helper macro too. Make unparse_validate_fixed report errors in separate ustate->validati field, not ustate->error field. In NestedUnion.c and ex_nums.c examples, update with freshly generated code to illustrate code generator changes (in particular, defining argp_program_version, adding an if statement after each parse/unparse call to check for an error and skip rest of calls immediately, and returning errors directly from initChoice functions with choice key that failed to match any branch key). In BinaryAbstractCodeGenerator.scala (new file), refactor common code that used to be duplicated in BinaryBooleanCodeGenerator.scala, BinaryFloatCodeGenerator.scala, and BinaryIntegerKnownLengthCodeGenerator.scala; now they pass parameters to binaryAbstractGenerateCode instead. In CodeGeneratorState.scala, make all necessary changes to report errors using structs instead of strings in generated code including initChoice functions, add if statements after each parse/unparse call, and generate the program name and version automatically. In build.properties, change sbt.version from 1.4.1 to 1.4.9 to ensure successful sbt build from scratch in a new Ubuntu or Windows VM (sbt was getting java.lang.NoClassDefFound: scala/Serializer).

mbeckerle

+1 I had minor comments/suggestions. In general this is a really nice change set, as it handles the whole issue of generalizing the error diagnostics system to one that is more appropriately abstract. Not string oriented.

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

tuxji

Thanks for the suggestions, I'll take care of one now and add the other to the runtime2 todos. I have a question below as well.

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

daffodil-runtime2/src/main/resources/c/libruntime/parsers.c

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

In errors.[ch], define new function add_diagnostic to add an error to a Diagnostics struct. In parsers.c and unparsers.c, make (un)parse_validate_fixed call add_diagnostic instead of adding error directly to Diagnostics struct.

tuxji · 2021-04-10T17:48:28Z

I can run sbt compile in a Windows VM fine, so I'm not sure why the Windows checks are hanging at that step. Let's continue reviewing the PR and merge it after a second +1 since I need to rebase the branch on the latest master anyway. Will someone else please review this PR?

mbeckerle

+1 once windows CI issues are fixed, or understood to be spurious.

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

mbeckerle · 2021-04-12T15:13:20Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+    const Error *error;    // any error which stops program
+} UState;
+
+// need_diagnostics - return pointer to validation diagnostics


Is this a naming convention? I would have expected this to be named get_diagnostics, as it is a "getter".

No, your suggestion is better. Renamed to get_diagnostics.

mbeckerle · 2021-04-12T15:17:21Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.c

+    if (validati && error)
+    {
+        if (validati->length <
+            sizeof(validati->array) / sizeof(*validati->array))


I guess the only way more than one can accumulate here is if they are warnings.

But to continue execution when this is full feels like the wrong choice.

Maybe return a boolean which indicates whether the diagnostic was able to be recorded or not.

Then the place issuing the warning can decide if there are "too many warnings" and exit, or not?

Makes sense. I've made add_diagnostic return false if the array was too full to add another diagnostic.

stevedlawrence

+1

Handful of questions, but looks good! Some nice refactorings.

stevedlawrence · 2021-04-08T12:20:09Z

README.md

+data processing frameworks so as to bypass any XML/JSON overheads.

-For more information about Daffodil, see https://daffodil.apache.org/.
+For more information about Daffodil, see the [Website].


Minor, but I actually prefer the full URL here. The url is something that some people might want to know so we might not want to hide it.

Sure, I put the full URL back with brackets around it to avoid markdownlint's bare url warning (that was why I changed it before).

stevedlawrence · 2021-04-12T15:51:21Z

README.md

+and more.  [SDK] offers an easy and uniform way to install both java
+and sbt on any Unix based platform:
+
+    curl -s "https://get.sdkman.io" | bash


I don't think we should reccomend that people download a script from the internet and pipe it to bash. Especially when I'd assume most Linux distros already support installing openjdk. I'd also rather reference the sbt download page, which has deb, rpm, and windows installers

You're right, I actually read the script carefully before I ran it and I shouldn't suggest to anyone it's okay to download a script from the Internet and pipe it to bash. If you already have sdk installed, though, then showing people how to install the JDK and SBT become very simple and uniform one-liners that work on any UNIX platform (sigh, wish sdk was everywhere). I'll just recommend people install JDK and SBT directly from their websites, though.

stevedlawrence · 2021-04-12T15:54:07Z

README.md

+and Ubuntu):
+
+    # Just mentioning all other packages you might need too
+    sudo apt install build-essential curl git libmxml-dev


I'm wondering if it would be better to have a separate mardown file that describes details to how to get the required packages from different distros if that's something we want to maintain. That way the main readme just lists the dependencies. Some users can easily figure that out, and then a more detailed page is available for those that need commands. Or at the very list this can be broken down into a "Buid Setup" section with different sections, e.g . DNF-based distros, APT-based distros, Windows, etc. so people can easily skip to the section they need.

Another thought, we could make build setup instructions for different distos collapsable: https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab

That way it doesn't make the README huge by default, but has the information people need if they need it. For example:

Build Setup

Fedora

sudo dnf install ...

More stuff here

Ubuntu/Debian

sudo apt install ...

More stuff here

Windows

Windows Instructions here

I like the idea of putting the instructions in a separate markdown file better. I've moved everything I added to README.md into BUILD.md except for the four build requirements and "See BUILD.md for more details" below them. I still improved some parts of README.md anyway :).

stevedlawrence · 2021-04-12T15:59:12Z

README.md

+* C compiler C99 or higher
+* Mini-XML Version 3.2 or higher
+
+Since Daffodil has a DFDL to C backend, you will need a C compiler


Are the're any thoughts on making this C backend optional / is it even possible to be optional? I know of people that build Daffodil from source and build an RPM, and they probably would not want to the C backend, so a way to turn it off so that the potential attack surface is lessend might be a beneficial option. Is it even possible to turn off from the build right now, or is it too tightly coupled into the Daffodil compiler?

The C backend is a code generator written in 100% Scala. The most important reason why we call the C compiler during the sbt build is to alert developers as soon as possible if any change anywhere causes a runtime2 problem. We call the C compiler in multiple places:

Explicitly in build.sbt (building daffodil-runtime2 always calls the sbt-cc plugin on all C source files)

Implicitly in sbt test (running daffodil-runtime2's unit/TDML tests and daffodil-test's runtime2-specific TDML tests also tests Daffodil's ability to call the C compiler)

We probably could find a way (with a custom sbt plugin or some conditional code in build.sbt) to still build daffodil-runtime2 so that the CLI's "generate c" command still works but disable running both 1) the sbt-cc plugin and 2) the daffodil-runtime2 unit/TDML tests during a "special" sbt build. I'd prefer to put the onus on people to run a "special" sbt build that disables any C compilation so that we still alert developers to problems as early as possible (the longer problems hide, the more time it takes to fix them once they surface).

You're more familiar with SBT's DSL scripting than I am; what's the easiest way to make build.sbt skip running sbt-cc and skip running runtime2-specific unit/TDML tests upon request?

README.md

stevedlawrence · 2021-04-12T16:51:00Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.c

+need_diagnostics(void)
+{
+    static Diagnostics validati;
+    return &validati;


I can't say I ever seen these usage of static variable before. Is this just a way to have a global variable without actually looking like a global variable?

Returning a pointer to a static variable defined inside a function is a good way to provide some storage without polluting the global namespace with a global variable's name, yes. It's also more flexible since we can change the function to use an array, a heap, etc., at a later time.

stevedlawrence · 2021-04-12T17:01:12Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+    ERR_XML_LEFT,
+    ERR_XML_MISMATCH,
+    ERR_XML_WRITE
+};


Runtime1 has different kinds of diagnostics, e.g. error, warning, validation error, recoverable error. Are these all just considered "errors" in runtime1? Are there any diagnostics that are fatal errors?

ERR_FIXED_VALUE is used only as a validation error. The rest are considered fatal errors which cause the program to stop immediately. When runtime2 grows larger, then we may need ERR_, WRN_, VLD_, and TRY_ to distinguish different kinds of diagnostics, but we haven't gotten that far yet.

stevedlawrence · 2021-04-12T17:04:49Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+
+// need_diagnostics - return pointer to validation diagnostics
+
+extern Diagnostics *need_diagnostics(void);


Why get diagnostics from the static variable, rather than just having the PState/UState have a Diagnostics instances? Then PState/UState can be passed around and this state can be mutated as diagnostics/errors are created. Avoids potentialy issues with global state/threading.

True, we could make the PState/UState have a Diagnostics instance itself. Then there would be no need for a get_diagnostics function. My thinking is that the majority of schemas I've seen don't use fixed values so I don't want to even call get_diagnostics and initialize the Diagnostics instance unless I have to. That thinking probably will change over time.

stevedlawrence · 2021-04-12T17:15:36Z

...me2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryAbstractCodeGenerator.scala

+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""


Would prefer we add this fixed logic to the dsom. For example, right now there is nothing pasing the fixed value and making sure it matches the type. For example

<xs:element name="someBool" type="xs:boolean" fixed="invalidBooleanValue" />

Ideally the dsom would convert the "fixed" value to match the primitive type, which the DSOM should be doing.

I've moved the fixed value logic to the DSOM and updated BinaryAbstractCodeGenerator.scala to use the DSOM getters.

stevedlawrence · 2021-04-12T17:20:01Z

...me2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryAbstractCodeGenerator.scala

+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 =
+          s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);


Wondering if there's a potentially security issue here? Looks like we just inject the fixed value from teh schema directly into this C code. Which means I think a schema could look something like this to inject some code?

<xs:element name="foo" ... fixed='0, "foo", pstate); malcious C code here; //' />

Yes, we had a security issue. Now that I'm using the DSOM to convert the XML attribute to the primitive data type and back to a string, a fixed='0, "foo", pstate); malicious C code here; //' will result in an error.

I think this only supports fixed values where the primitive is a numeric? Are string primitives supported yet, or do we need a check here that SDE's if hasFixedValue but primitive type is not a numeric?

Runtime2 doesn't support strings yet, and we do need to consider how to handle fixed values that are strings (we'd have to SDE or call strcmp instead of using value == fixed). I would expect that strings in binary data could have length fields as well as be terminated by a null byte.

In BUILD.md, copy build requirements and setup instructions here from README.md with some reordering and edits for better clarity. In README.md, restore Daffodil website full url, remove build setup instructions, and turn getting started instructions into bullet items for less vertical space and better readability. In build.sbt, allow environment variables CC and AR to override cCompiler and ccArchiveCommand's default values. In ElementBase.scala, ElementDeclMixin.scala, and ElementRef.scala, add hasFixedValue, fixedValueAsString, and fixedValue getters. In .clang-format, increase column limit to 110 (default had been 80). Reformat C files accordingly. In xml_reader.c, make xmlNumberElem define error_erd in block inside switch statement and make all cases call return. In xml_writer.c, use LIMIT_XML_NESTING constant from errors.h. In errors.c, add assert to default case in error_message, rename need_diagnostics to get_diagnostics, and make add_diagnostic check LIMIT_DIAGNOSTICS and return false if array is full. In errors.h, define limit constants inside enum Limits and use LIMIT_DIAGNOSTICS constant in Diagnostics. Rename member field of PState/UState from validati to diagnostics, rename need_diagnostics to get_diagnostics, and make add_diagnostic return bool. In infoset.c, use LIMIT_NAME_LENGTH constant from errors.h. In BinaryAbstractCodeGenerator.scala, use DSOM fixed getters instead of reading e.xml.attribute("fixed") directly to ensure fixed is a valid primitive data value before using it. In CodeGeneratorState.scala, make addImplementation count both parserStatements and unparserStatements, not only parserStatements.

tuxji

Have addressed reviewer requested changes. Will wait for rebase on master branch to determine whether Windows checks really still fail.

tuxji · 2021-04-13T13:48:28Z

README.md

+data processing frameworks so as to bypass any XML/JSON overheads.

-For more information about Daffodil, see https://daffodil.apache.org/.
+For more information about Daffodil, see the [Website].


Sure, I put the full URL back with brackets around it to avoid markdownlint's bare url warning (that was why I changed it before).

tuxji · 2021-04-13T14:21:36Z

README.md

+* C compiler C99 or higher
+* Mini-XML Version 3.2 or higher
+
+Since Daffodil has a DFDL to C backend, you will need a C compiler


The C backend is a code generator written in 100% Scala. The most important reason why we call the C compiler during the sbt build is to alert developers as soon as possible if any change anywhere causes a runtime2 problem. We call the C compiler in multiple places:

Explicitly in build.sbt (building daffodil-runtime2 always calls the sbt-cc plugin on all C source files)

Implicitly in sbt test (running daffodil-runtime2's unit/TDML tests and daffodil-test's runtime2-specific TDML tests also tests Daffodil's ability to call the C compiler)

We probably could find a way (with a custom sbt plugin or some conditional code in build.sbt) to still build daffodil-runtime2 so that the CLI's "generate c" command still works but disable running both 1) the sbt-cc plugin and 2) the daffodil-runtime2 unit/TDML tests during a "special" sbt build. I'd prefer to put the onus on people to run a "special" sbt build that disables any C compilation so that we still alert developers to problems as early as possible (the longer problems hide, the more time it takes to fix them once they surface).

You're more familiar with SBT's DSL scripting than I am; what's the easiest way to make build.sbt skip running sbt-cc and skip running runtime2-specific unit/TDML tests upon request?

tuxji · 2021-04-13T16:59:47Z

README.md

+and Ubuntu):
+
+    # Just mentioning all other packages you might need too
+    sudo apt install build-essential curl git libmxml-dev


I like the idea of putting the instructions in a separate markdown file better. I've moved everything I added to README.md into BUILD.md except for the four build requirements and "See BUILD.md for more details" below them. I still improved some parts of README.md anyway :).

tuxji · 2021-04-13T17:18:27Z

README.md

+and more.  [SDK] offers an easy and uniform way to install both java
+and sbt on any Unix based platform:
+
+    curl -s "https://get.sdkman.io" | bash


You're right, I actually read the script carefully before I ran it and I shouldn't suggest to anyone it's okay to download a script from the Internet and pipe it to bash. If you already have sdk installed, though, then showing people how to install the JDK and SBT become very simple and uniform one-liners that work on any UNIX platform (sigh, wish sdk was everywhere). I'll just recommend people install JDK and SBT directly from their websites, though.

tuxji · 2021-04-13T18:17:06Z

README.md

+    sdk install sbt
+
+You can edit the Compile / cCompiler setting in build.sbt if you don't
+want sbt to call your C compiler with "cc" as the driver command.


Unfortunately, only Daffodil has logic for checking an environment variable and trying a list of known driver commands to find a C compiler. However, your set Compiler / cCompiler := "gcc" line makes me realize we could embed at least the environment variable logic into build.sbt. I've made a one-line change to build.sbt that sets cCompiler to either "cc" or the value of the environment variable "CC".

Also note that setting CC = "true" could be one way people could avoid calling the C compiler although some unit/TDML tests still would break.

tuxji · 2021-04-14T03:20:19Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+    ERR_XML_LEFT,
+    ERR_XML_MISMATCH,
+    ERR_XML_WRITE
+};


ERR_FIXED_VALUE is used only as a validation error. The rest are considered fatal errors which cause the program to stop immediately. When runtime2 grows larger, then we may need ERR_, WRN_, VLD_, and TRY_ to distinguish different kinds of diagnostics, but we haven't gotten that far yet.

tuxji · 2021-04-14T03:43:21Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+    const Error *error;    // any error which stops program
+} UState;
+
+// need_diagnostics - return pointer to validation diagnostics


No, your suggestion is better. Renamed to get_diagnostics.

tuxji · 2021-04-14T03:52:05Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+
+// need_diagnostics - return pointer to validation diagnostics
+
+extern Diagnostics *need_diagnostics(void);


True, we could make the PState/UState have a Diagnostics instance itself. Then there would be no need for a get_diagnostics function. My thinking is that the majority of schemas I've seen don't use fixed values so I don't want to even call get_diagnostics and initialize the Diagnostics instance unless I have to. That thinking probably will change over time.

tuxji · 2021-04-14T14:30:35Z

...me2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryAbstractCodeGenerator.scala

+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""


I've moved the fixed value logic to the DSOM and updated BinaryAbstractCodeGenerator.scala to use the DSOM getters.

tuxji · 2021-04-14T14:36:34Z

...me2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryAbstractCodeGenerator.scala

+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 =
+          s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);


Yes, we had a security issue. Now that I'm using the DSOM to convert the XML attribute to the primitive data type and back to a string, a fixed='0, "foo", pstate); malicious C code here; //' will result in an error.

stevedlawrence

Some additional minor questions/comments on the updates.

stevedlawrence · 2021-04-14T17:52:05Z

BUILD.md

+then run the following commands in a "MSYS2 MSYS" window:
+
+    pacman -S gcc git libargp-devel make pkgconf
+    git clone https://github.com/michaelrsweet/mxml.git


Should we include a specific tag here, e.g.

git clone -t v3.2 https://github.com/michaelrsweet/mxml.git

I don't really like the idea of reccommending people use the master branch of a repo we don't control.

Yes, I will put -t v3.2 in the git clone command both here and in the CI workflow when I rebase the runtime2-2204 branch.

stevedlawrence · 2021-04-14T17:57:04Z

BUILD.md

+operating system's package manager (for example, `apt` on Debian or
+Ubuntu):
+
+    sudo apt install build-essential curl git libmxml-dev


Is libxml-dev MiniXML in ubuntu? On fedora libxml is something completely different. This is why I would prefer that we have step by step build instructions for common operating systems to avoid this kind of confusion. Build instructions like "you might need this because your OS my not provide it can be confusing for new users. By giving specific instructions, it makes it easier to figure out exactly what steps to follow, and if you'r on an varient, likely commands that need to be run.

It's also a bit confusing because Windows and no windows things are interleaved. If I'm not on windows, I want to just ignored anything windows related to quickly get set up. For example, I'm on Fedora, there are git clone commands to clone and install mxml. I don't know if I need to do that or not. Turns out, Fedora does have mxml via sudo dnf instal mxml-devel but that isn't anywhere in the instructions. And that version is only 3.1, so I assume it won't work for this, and I still need to install via source?

Mini-XML is mxml-dev in Fedora and libmxml-dev in Ubuntu; libxml is something completely different on both Fedora and Ubuntu. If Fedora's mxml-dev is only 3.1, the API is missing a function which we expect to call so we still need to install Mini-XML from source there too. I'll separate the build instructions for Fedora, Ubuntu, and Windows and make them more clear when I rebase the runtime2-2204 branch.

stevedlawrence · 2021-04-14T18:00:17Z

README.md

+some of the more commonly used commands for Daffodil development.

-### Compile
+* Compile source code:


I personally perfer the ### headings over the bullets. It makes it easier to skim over and find the command you might need, rather than having the full sentences.

OK, fair enough. I'll put the ### headings back.

stevedlawrence · 2021-04-14T18:01:25Z

build.sbt

-                                Compile / cCompiler := "cc",
-                                Compile / ccArchiveCommand := "ar",
+                                Compile / cCompiler := sys.env.getOrElse("CC", "cc"),
+                                Compile / ccArchiveCommand := sys.env.getOrElse("AR", "ar"),


Is it worth mentiong the AR enironment variable in the BUILD setup readme?

Yes, if someone says CC=true, they probably need to say AR=true too. I'll mention AR too. Can you think of a clean way to skip the tests too?

Not sure if there's a simple way to do it. If all that's really needed is to install a compiler and mini xml, I dont't think it's worth it. I think maybe if we get more runtimes or more devs focusing on different runtimes, it might be worth figuring out how to make things more pluggable so people can really only focus on runtimes they care about (as long as CI continues to test all runtimes). But for now, I dont' think it's a big deal. And it will also force people to test it, which is probably important until we get more people working on runtime2.

stevedlawrence · 2021-04-14T18:03:38Z

daffodil-core/src/main/scala/org/apache/daffodil/dsom/ElementBase.scala

  }

+  /**
+   * Is either None or Some(primTypeValue).


This comment isn't accurent. This is always a pimitive, but might be the special NoValue.

I had noticed that the comment seemed inaccurate when I copied it too. Yes, I'll update both places.

stevedlawrence · 2021-04-14T18:07:45Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+enum Limits
+{
+    LIMIT_DIAGNOSTICS = 100,  // limits how many diagnostics can accumulate
+    LIMIT_NAME_LENGTH = 9999, // limits how long infoset names can become


This limit seems pretty huge. I would think we could make this smaller and reduce memory usage?

If you look at infoset.c which uses LIMIT_NAME_LENGTH, it's just a couple of functions with static char[9999] buffers and their storage are used only temporarily by the callers long enough to hand off to the Mini-XML library (which copies the actual, usually less than 20 characters long, strings immediately) or compare against strings from the Mini-XML library using strcmp. Setting the limit to 9999 characters only uses static storage in a couple of places and reduces the chance that an infoset namespace or name might not fit in the static buffer.

stevedlawrence · 2021-04-14T18:10:31Z

...me2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryAbstractCodeGenerator.scala

+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 =
+          s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);


I think this only supports fixed values where the primitive is a numeric? Are string primitives supported yet, or do we need a check here that SDE's if hasFixedValue but primitive type is not a numeric?

tuxji

Thanks, I've updated my todos accordingly.

tuxji · 2021-04-14T19:02:44Z

BUILD.md

+operating system's package manager (for example, `apt` on Debian or
+Ubuntu):
+
+    sudo apt install build-essential curl git libmxml-dev


Mini-XML is mxml-dev in Fedora and libmxml-dev in Ubuntu; libxml is something completely different on both Fedora and Ubuntu. If Fedora's mxml-dev is only 3.1, the API is missing a function which we expect to call so we still need to install Mini-XML from source there too. I'll separate the build instructions for Fedora, Ubuntu, and Windows and make them more clear when I rebase the runtime2-2204 branch.

tuxji · 2021-04-14T19:14:29Z

BUILD.md

+then run the following commands in a "MSYS2 MSYS" window:
+
+    pacman -S gcc git libargp-devel make pkgconf
+    git clone https://github.com/michaelrsweet/mxml.git


Yes, I will put -t v3.2 in the git clone command both here and in the CI workflow when I rebase the runtime2-2204 branch.

tuxji · 2021-04-14T19:17:00Z

README.md

+some of the more commonly used commands for Daffodil development.

-### Compile
+* Compile source code:


OK, fair enough. I'll put the ### headings back.

tuxji · 2021-04-14T19:20:07Z

build.sbt

-                                Compile / cCompiler := "cc",
-                                Compile / ccArchiveCommand := "ar",
+                                Compile / cCompiler := sys.env.getOrElse("CC", "cc"),
+                                Compile / ccArchiveCommand := sys.env.getOrElse("AR", "ar"),


Yes, if someone says CC=true, they probably need to say AR=true too. I'll mention AR too. Can you think of a clean way to skip the tests too?

tuxji · 2021-04-14T19:23:38Z

daffodil-core/src/main/scala/org/apache/daffodil/dsom/ElementBase.scala

  }

+  /**
+   * Is either None or Some(primTypeValue).


I had noticed that the comment seemed inaccurate when I copied it too. Yes, I'll update both places.

tuxji · 2021-04-14T19:41:34Z

daffodil-runtime2/src/main/resources/c/libruntime/errors.h

+enum Limits
+{
+    LIMIT_DIAGNOSTICS = 100,  // limits how many diagnostics can accumulate
+    LIMIT_NAME_LENGTH = 9999, // limits how long infoset names can become


If you look at infoset.c which uses LIMIT_NAME_LENGTH, it's just a couple of functions with static char[9999] buffers and their storage are used only temporarily by the callers long enough to hand off to the Mini-XML library (which copies the actual, usually less than 20 characters long, strings immediately) or compare against strings from the Mini-XML library using strcmp. Setting the limit to 9999 characters only uses static storage in a couple of places and reduces the chance that an infoset namespace or name might not fit in the static buffer.

tuxji · 2021-04-14T19:57:17Z

...me2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryAbstractCodeGenerator.scala

+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 =
+          s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);


Runtime2 doesn't support strings yet, and we do need to consider how to handle fixed values that are strings (we'd have to SDE or call strcmp instead of using value == fixed). I would expect that strings in binary data could have length fields as well as be terminated by a null byte.

mbeckerle approved these changes Apr 9, 2021

View reviewed changes

daffodil-runtime2/src/main/resources/c/libruntime/errors.h Show resolved Hide resolved

daffodil-runtime2/src/main/resources/c/libruntime/errors.h Outdated Show resolved Hide resolved

tuxji commented Apr 9, 2021

View reviewed changes

daffodil-runtime2/src/main/resources/c/libruntime/errors.h Show resolved Hide resolved

daffodil-runtime2/src/main/resources/c/libruntime/parsers.c Outdated Show resolved Hide resolved

daffodil-runtime2/src/main/resources/c/libruntime/errors.h Outdated Show resolved Hide resolved

Refactor into add_diagnostic

8b4e44e

In errors.[ch], define new function add_diagnostic to add an error to a Diagnostics struct. In parsers.c and unparsers.c, make (un)parse_validate_fixed call add_diagnostic instead of adding error directly to Diagnostics struct.

mbeckerle approved these changes Apr 12, 2021

View reviewed changes

stevedlawrence approved these changes Apr 12, 2021

View reviewed changes

tuxji commented Apr 14, 2021

View reviewed changes

Fix ratcheck

d895641

tuxji merged commit e73310c into apache:runtime2-2202 Apr 14, 2021

tuxji deleted the ji/runtime2-2202/make-improvements branch April 14, 2021 17:59

stevedlawrence reviewed Apr 14, 2021

View reviewed changes

tuxji commented Apr 14, 2021

View reviewed changes


		// need_diagnostics - return pointer to validation diagnostics

		extern Diagnostics *need_diagnostics(void);

		val fixed = e.xml.attribute("fixed")
		val fixedValue = if (fixed.isDefined) fixed.get.text else ""

Conversation

tuxji commented Apr 7, 2021

Uh oh!

mbeckerle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tuxji left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tuxji commented Apr 10, 2021

Uh oh!

mbeckerle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stevedlawrence left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Build Setup

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tuxji left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment