DRILL-8474: Add Daffodil Format Plugin #2836

mbeckerle · 2023-10-14T03:11:24Z

Adding Daffodil to Drill as a 'contrib'

Requires Daffodil 3.7.0-SNAPSHOT which has metadata support we're using.

New format-daffodil module created

Still uses absolute paths for the schemaFileURI. (which is cheating. Wouldn't work in a true distributed drill environment.)

We have yet to work out how to enable Drill to provide access for DFDL schemas in XML form with include/import to be resolved.

The input data stream is, however, being accessed in the proper Drill manner. Gunzip happened automatically. Nice.

Note: Fix boxed Boolean vs. boolean problem. Don't use boxed primitives in Format config objects.

Tests show Daffodil works for data as complex as having nested repeating sub-records.

These DFDL types are supported:

int
long
short
byte
boolean
double
float (does not work. Bug DAFFODIL-2367)
hexBinary
string

#2835

cgivre

@mbeckerle
I mistakenly pushed some code cleanup I did directly to your branch. I apologize for that. In any event, I added some comments to the BatchReader and FormatPlugin which I think will help you get unblocked.

cgivre · 2023-10-19T02:17:59Z

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java

+    dafParser.setInfosetOutputter(outputter);
+    // Lastly, we open the data stream
+    try {
+      dataInputStream = dataInputURI.toURL().openStream();


Ok, I'm not sure why we need to do this. Drill can get you an input stream of the input file.
All you need to do is:

dataInputStream = negotiator.file().fileSystem().openPossiblyCompressedStream(negotiator.file().split().getPath());

For the data files this works.

For schemas, this will not be a solution even temporarily. Daffodil loads schemas from the classpath. Large schemas are complex objects, akin to a software system with dependencies expressed via XML Schema include/import statements with schemaLocation attributes that contain relative URLs or "absolute" URLs where absolute means relative to some root of some jar file on the classpath.

Even simple DFDL schemas are routinely spread over a couple jars.

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java

...b/format-daffodil/src/test/java/org/apache/drill/exec/store/daffodil/TestDaffodilReader.java

distribution/src/assemble/component.xml

contrib/pom.xml

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java

...ffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilDrillInfosetOutputter.java

cgivre · 2023-10-29T04:01:43Z

@mbeckerle Looks like you're making good progress!

...il/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java

...b/format-daffodil/src/test/java/org/apache/drill/exec/store/daffodil/TestDaffodilReader.java

...ib/format-daffodil/src/test/java/org/apache/drill/exec/store/daffodil/TestDaffodilUtils.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatCreator.java

contrib/format-daffodil/.gitignore

...ffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilDrillInfosetOutputter.java

...-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/framework/SchemaNegotiator.java

mbeckerle · 2023-12-22T00:43:26Z

This is pretty much working now, in terms of constructing drill metadata from DFDL schemas, and
Daffodil delivering data to Drill.

There were dozens of commits to get here, so I squashed them as they were no longer helpful.

Obviously more test are needed, but the ones there show nested subrecords working.

The issues like how schemas get distributed, and how Daffodil gets invoked in parallel by drill are still open.

3.7.0-SNAPSHOT of Daffodil which has metadata support we're using. New format-daffodil module created Still uses absolute paths for the schemaFileURI. (which is cheating. Wouldn't work in a true distributed drill environment.) We have yet to work out how to enable Drill to provide access for DFDL schemas in XML form with include/import to be resolved. The input data stream is, however, being accessed in the proper Drill manner. Gunzip happened automatically. Nice. Note: Fix boxed Boolean vs. boolean problem. Don't use boxed primitives in Format config objects. Test show this works for data as complex as having nested repeating sub-records. These DFDL types are supported: - int - long - short - byte - boolean - double - float (does not work. Bug DAFFODIL-2367) - hexBinary - string apache#2835

mbeckerle · 2023-12-22T01:02:19Z

Rebased onto latest Drill master as of 2023-12-21 (force pushed one more time)

Note that this is never going to pass automated tests until the Daffodil release this depends on is official (currently it needs a locally build Daffodil 3.7.0-snapshot, though the main daffodil branch has the changes integrated so any 3.7.0-snapshot build will work.

cgivre

Hi Mike,
This is looking good. I have some minor comments, mostly formatting. It seems like the next step would be to figure out where and how we store the DFDL files.

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java

...b/format-daffodil/src/test/java/org/apache/drill/exec/store/daffodil/TestDaffodilReader.java

...ffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilDrillInfosetOutputter.java

...ormat-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilMessageParser.java

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java

mbeckerle · 2024-01-02T15:45:06Z

...ffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilDrillInfosetOutputter.java

+    extends InfosetOutputter {
+
+  private boolean isOriginalRoot() {
+    boolean result = currentTupleWriter() == rowSetWriter;


Is the Drill coding style defined in a wiki or other doc page somewhere? I didn't find one.

If this is just java-standard, then I need reminding, as I have not coded Java prior to this effort for 12+ years now.

...ffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilDrillInfosetOutputter.java

mbeckerle · 2024-01-02T16:00:39Z

@cgivre yes, the next architectural-level issue is how to get a compiled DFDL schema out to everyplace Drill will run a Daffodil parse. Every one of those JVMs needs to reload it.

I'll do the various cleanups and such. The one issue I don't know how to fix is the "typed setter" vs. (set-object) issue, so if you could steer me in the right direction on that it would help.

paul-rogers · 2024-01-03T04:48:41Z

Hi Mike, Just jumping in with a random thought. Drill has accumulated a number of schema systems: Parquet metadata cache, HMS, Drill's own metastore, "provided schema", and now DFDL. All provide ways of defining data: be it Parquet, JSON, CSV or whatever. One can't help but wonder, should some future version try to reduce this variation somewhat? Maybe map all the variations to DFDL? Map DFDL to Drill's own mechanisms? Drill uses two kinds of metadata: schema definitions and file metadata used for scan pruning. Schema information could be used at plan time (to provide column types), but certainly at scan time (to "discover" the defined schema.) File metadata is used primarily at plan time to work out how to distribute work. A bit of background on scan pruning. Back in the day, it was common to have thousands or millions of files in Hadoop to scan: this was why tools like Drill were distributed: divide and conquer. And, of course, the fastest scan is to skip files that we know can't contain the information we want. File metadata captures this information outside of the files themselves. HMS was the standard solution in the Hadoop days. (Amazon Glue, for S3, is evidently based on HMS.) For example, Drill's Parquet metadata cache, the Drill metastore and HMS all provide both schema and file metadata information. The schema information mainly helped with schema evolution: over time, different files have different sets of columns. File metadata provides information *about* the file, such as the data ranges stored in each file. For Parquet, we might track that '2023-01-Boston.parquet' has data from the office='Boston' range. (So, no use scanning the file for office='Austin'.) And so on. With Hadoop HFS, it was customary to use directory structure as a partial primary index: our file above would live in the /sales/2023/01 directory, for example, and logic chooses the proper set of directories to scan. In Drill, it is up to the user to add crufty conditionals on the path name. In Impala, and other HMS-aware tools, the user just says WHERE order_year = 2023 AND order_month = 1, and HMS tells the tool that the order_year and order_month columns translate to such-and-so directory paths. Would be nice if Drill could provide that feature as well, given the proper file metadata: in this case, the mapping of column names to path directories and file names. Does DFDL provide only schema information? Does it support versioning so that we know that "old.csv" lacks the "version" column, while "new.csv" includes that column? Does it also include the kinds of file metadata mentioned above? Or, perhaps DFDL is used in a different context in which the files have a fixed schema and are small in number? This would fit well the "desktop analytics" model that Charles and James suggested is where Drill is now most commonly used. The answers might suggest if DFDL can be the universal data description. or if DFDL applies just to individual file schemas, and Drill would still need a second system to track schema evolution and file metadata for large deployments. Further, if DFDL is kind of a stand-alone thing, with its own reader, then we end up with more complexity: the Drill JSON reader and the DFDL JSON reader. Same for CSV, etc. JSON is so complex that we'd find ourselves telling people that the quirks work one way with the native reader, another way with DFDL. Plus, the DFDL readers might not handle file splits the same way, or support the same set of formats that Drill's other readers support, and so on. It would be nice to separate the idea of schema description from reader implementation, so that DFDL can be used as a source of schema for any arbitrary reader: both at plan and scan times. If DFDL uses its own readers, then we'd need DFDL reader representations in Calcite, which would pick up DFDL schemas so that the schemas are reliably serialized out to each node as part of the physical plan. This is possible, but it does send us down the two-readers-for-every-format path. On the other hand, if DFDL mapped to Drill's existing schema description, then DFDL could be used with our existing readers and there would be just one schema description sent to readers: Drill's existing provided schema format that EVF can already consume. At present, just a few formats support provided schema in the Calcite layer: CSV for sure, maybe JSON? Any thoughts on where this kind of thing might evolve with DFDL in the picture? Thanks, - Paul

…

On Tue, Jan 2, 2024 at 8:00 AM Mike Beckerle ***@***.***> wrote: @cgivre <https://github.com/cgivre> yes, the next architectural-level issue is how to get a compiled DFDL schema out to everyplace Drill will run a Daffodil parse. Every one of those JVMs needs to reload it. I'll do the various cleanups and such. The one issue I don't know how to fix is the "typed setter" vs. (set-object) issue, so if you could steer me in the right direction on that it would help. — Reply to this email directly, view it on GitHub <#2836 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAYZF4MFVRCUYDCKJYSKKYTYMQVLFAVCNFSM6AAAAAA576F7J2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZUGIYTGNZYGA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

Date, Time, DateTime, Boolean, Unsigned integers, Integer, NonNegativeInteger,Decimal, float, double, hexBinary.

mbeckerle · 2024-01-04T21:56:37Z

Let me respond between the paragraphs.... On Tue, Jan 2, 2024 at 11:49 PM Paul Rogers ***@***.***> wrote:

Hi Mike, Just jumping in with a random thought. Drill has accumulated a number of schema systems: Parquet metadata cache, HMS, Drill's own metastore, "provided schema", and now DFDL. All provide ways of defining data: be it Parquet, JSON, CSV or whatever. One can't help but wonder, should some future version try to reduce this variation somewhat? Maybe map all the variations to DFDL? Map DFDL to Drill's own mechanisms? Well we can dream can't we :-)

I can contribute the ideas in https://daffodil.apache.org/dev/design-notes/Proposed-DFDL-Standard-Profile.md which is an effort to restrict the DFDL language so that schemas written in DFDL can work more smoothly with Drill, NiFi, Spark, Flink, Beam, etc. etc. DFDL's data model is too restrictive to be "the model" for Drill since Drill wants to query even unstructured data like XML without schema. DFDL's data model is targeted only at structured data. Drill's data model and APIs seem optimized for streaming block-buffered top-level rows of data (the EVF API does anyway). Top level row-sets are first-class citizens, as are the fields of said rows. Fields containing arrays of maps (possibly containing more arrays of maps, and so on deeply nested) are not handled uniformly with the same block-buffered "row-like" mechanisms. The APIs are similar, but not polymorphic. I suspect that the block-buffered data streaming in Drill only happens for top-level rows, because there is no test for whether or not you are allowed to create another array item like there is a test for creating another row in a row-set writer. There is no control inversion where an adapter must give back control to Drill in the middle of trying to write an array. The current Drill/Daffodil interface I've created doesn't cope with header-body* files (ex: PCAP which format has a header record, then repeating packet records) as it has no way of returning just the body records as top level rows. So while there exists a DFDL schema for PCAP, you really do want to use a dedicated PCAP Drill adapter which hands back rows, not Daffodil which will parse the entire PCAP file into one huge row containing a monster sub-array of packets, where each packet is a map within the array of maps. This is ok for now as many files where DFDL is used are not like PCAP. They are just repeating records of one format with no special whole-file header. Eventually we will want to be able to supply a path to tell the Drill/Daffodil interface that you only want the packet array as the output rows. (This is the unimplemented Daffodil "onPath(...)" API feature. We haven't needed this yet for DFDL work in cybersecurity, but it was anticipated 10+ years back as essential for data integration.)

Drill uses two kinds of metadata: schema definitions and file metadata used for scan pruning. Schema information could be used at plan time (to provide column types), but certainly at scan time (to "discover" the defined schema.) File metadata is used primarily at plan time to work out how to distribute work.

DFDL has zero notion of file metadata. It doesn't know whether data even comes from a file or an open TCP socket. Daffodil/DFDL just sees a java.io.InputStream. The schema it uses for a given file is specified by the API call. Daffodil does nothing itself to try to find or identify any schema. So we're "blank slate" on this issue with DFDL.

A bit of background on scan pruning. Back in the day, it was common to have thousands or millions of files in Hadoop to scan: this was why tools like Drill were distributed: divide and conquer. And, of course, the fastest scan is to skip files that we know can't contain the information we want. File metadata captures this information outside of the files themselves. HMS was the standard solution in the Hadoop days. (Amazon Glue, for S3, is evidently based on HMS.) For example, Drill's Parquet metadata cache, the Drill metastore and HMS all provide both schema and file metadata information. The schema information mainly helped with schema evolution: over time, different files have different sets of columns. File metadata provides information *about* the file, such as the data ranges stored in each file. For Parquet, we might track that '2023-01-Boston.parquet' has data from the office='Boston' range. (So, no use scanning the file for office='Austin'.) And so on. With Hadoop HFS, it was customary to use directory structure as a partial primary index: our file above would live in the /sales/2023/01 directory, for example, and logic chooses the proper set of directories to scan. In Drill, it is up to the user to add crufty conditionals on the path name. In Impala, and other HMS-aware tools, the user just says WHERE order_year = 2023 AND order_month = 1, and HMS tells the tool that the order_year and order_month columns translate to such-and-so directory paths. Would be nice if Drill could provide that feature as well, given the proper file metadata: in this case, the mapping of column names to path directories and file names.

The above all makes perfect sense to me, and DFDL schemas are completely orthogonal to this. If a file naming convention tells *Drill* that it doesn't need to open and parse some data using Daffodil, great, then *Drill* will not invoke Daffodil to do so. DFDL/Daffodil doesn't know nor care about this.

Does DFDL provide only schema information? Does it support versioning so that we know that "old.csv" lacks the "version" column, while "new.csv" includes that column? Does it also include the kinds of file metadata mentioned above?

DFDL only provides structural schema information. Data formats do versioning in a wide variety of ways, so DFDL can't take any position on how this is done, but many DFDL schemas adapt to multiple versions of the data formats they describe based on the existence of different fields or values of those fields. This can only work for formats where there are data fields that identify the versions. But nothing based on file metadata.

Or, perhaps DFDL is used in a different context in which the files have a fixed schema and are small in number? This would fit well the "desktop analytics" model that Charles and James suggested is where Drill is now most commonly used.

The cybersecurity use case is one of the prime motivators for DFDL work. Often the cyber gateways are file movers, files arrive spontaneously in various locations, and are moved across the cyber boundary. The use cases continue to grow in scale, and some people use Apache NiFi with DFDL for large scale such file moving. Unlike Drill, these use cases all parse and then re-serialize the data after extensive validation and rule-based filtering. The same sort of file-metadata based stuff - ex: rules like all the files in this directory named X with extension ".dat" use schema S - all applies in the cyber-gateway use case. Apache Daffodil doesn't know anything about this cyber use case however, nor anything about data integration. Daffodil is actually a quite narrow library. Stays in its lane.

The answers might suggest if DFDL can be the universal data description. or if DFDL applies just to individual file schemas, and Drill would still need a second system to track schema evolution and file metadata for large deployments.

Yeah. Drill needs a separate system for this. Not at all a DFDL-specific issue. DFDL/Daffodil take no position on schema evolution. However, to Daffodil devs, a DFDL schema is basically source code. We keep them in git. They have releases. We package them in jars and use managed dependency tools to grab them from repositories the same way java code jars are grabbed by maven. One of my concerns about metadata repositories/registries is that they are not thought of as configuration management systems. But DFDL schemas are certainly large formal objects that require configuration management. For example, the VMF schema we have is over 180K lines of DFDL "code", spread over hundreds of files. It is actually an assembly composed of specific versions of 4 different smaller DFDL schemas and the large corpus of VMF-specific schema files. There is documentation, analysis reports, etc. that go along with it. So some sort of repository that makes specific schemas available to Drill makes sense, but cannot be confused with the configuration management system. I quite literally just got a Maven Central/Sonotype account yesterday so that I can push some DFDL schemas up to maven central so they can be reused from there via jars.

Further, if DFDL is kind of a stand-alone thing, with its own reader, then we end up with more complexity: the Drill JSON reader and the DFDL JSON reader. Same for CSV, etc. JSON is so complex that we'd find ourselves telling people that the quirks work one way with the native reader, another way with DFDL. Plus, the DFDL readers might not handle file splits the same way,

Daffodil knows no concept of "file splits". It doesn't even know about files actually. It's just an input byte stream. literally a java.io.InputStream.

or support the same set of formats that Drill's other readers support, and so on. It would be nice to separate the idea of schema description from reader implementation, so that DFDL can be used as a source of schema for any arbitrary reader: both at plan and scan times.

The DFDL/Drill integration converts DFDL-described data directly to Drill with no intermediate form like XML nor JSON. One hop. E.g., drillScalaWriter.setInt(daffodilInfosetElement.getInt()); There is no notion of Daffodil "also" reading JSON. You wouldn't parse JSON with DFDL typically. You would use a JSON library and hopefully a JSON schema that describes the JSON. Ditto for XML, Google protocol buffers, Avro, etc.

If DFDL uses its own readers, then we'd need DFDL reader representations in

DFDL is a specific reader, this notion of "its own readers" doesn't apply.

Calcite, which would pick up DFDL schemas so that the schemas are reliably serialized out to each node as part of the physical plan. This is possible, but it does send us down the two-readers-for-every-format path.

On the other hand, if DFDL mapped to Drill's existing schema description,

then DFDL could be used with our existing readers

I don't get "DFDL used with existing readers".... by "with" you mean "along-side" or "incorporating"?

and there would be just one schema description sent to readers: Drill's existing provided schema format that EVF can already consume. At present, just a few formats support provided schema in the Calcite layer: CSV for sure, maybe JSON?

This is what we need. The Daffodil/Drill integration walks DFDL metadata and creates Drill metadata 100% in advance and this should, I think, automatically find its way to all the right places without anything else being needed beyond today's Drill behavior. But besides Drill's metadata the Daffodil execution at each node needs to load up the compiled DFDL schema. That object, which can be several megabytes of stuff. Needs to find its way out to all the nodes that need it. This I have no idea how we make happen.

…

Any thoughts on where this kind of thing might evolve with DFDL in the picture? Thanks, - Paul On Tue, Jan 2, 2024 at 8:00 AM Mike Beckerle ***@***.***> wrote: > @cgivre <https://github.com/cgivre> yes, the next architectural-level > issue is how to get a compiled DFDL schema out to everyplace Drill will run > a Daffodil parse. Every one of those JVMs needs to reload it. > > I'll do the various cleanups and such. The one issue I don't know how to > fix is the "typed setter" vs. (set-object) issue, so if you could steer me > in the right direction on that it would help. > > — > Reply to this email directly, view it on GitHub > <#2836 (comment)>, or > unsubscribe > < https://github.com/notifications/unsubscribe-auth/AAYZF4MFVRCUYDCKJYSKKYTYMQVLFAVCNFSM6AAAAAA576F7J2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZUGIYTGNZYGA> > . > You are receiving this because you were mentioned.Message ID: > ***@***.***> > — Reply to this email directly, view it on GitHub <#2836 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALUDA4H366DXIG2RATIV4TYMTPLHAVCNFSM6AAAAAA576F7J2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZUHA2DKMRXGQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

I imported the dev-support/formatter/eclipse settings and used them to reformat the code in IntelliJ IDEA. No functional changes in this commit.

Also a few code cleanups.

mbeckerle · 2024-01-04T23:48:28Z

This is ready for a next review. All the scalar types are now implemented with typed setter calls.

The prior review comments have all been addressed I believe.

Remaining things to do include:

How to get the compiled DFDL schema object so it can be loaded by daffodil out at the distributed Drill nodes.
Test of nilled values (and more tests generally to show deeply nested and repeating nested objects work.)
Errors - revisit every place errors are detected or thrown to make sure these are being done the right way for DFDL schema compilation and runtime errors as well.

cgivre · 2024-01-05T00:48:48Z

@mbeckerle I had a thought about your TODO list. See inline.

This is ready for a next review. All the scalar types are now implemented with typed setter calls.

The prior review comments have all been addressed I believe.

Remaining things to do include:

How to get the compiled DFDL schema object so it can be loaded by daffodil out at the distributed Drill nodes.

I was thinking about this and I remembered something that might be useful. Drill has support for User Defined Functions (UDF) which are written in Java. To add a UDF to Drill, you also have to write some Java classes in a particular way, and include the JARs. Much like the DFDL class files, the UDF JARs must be accessible to all nodes of a Drill cluster.

Additionally, Drill has the capability of adding UDFs dynamically. This feature was added here: #574. Anyway, I wonder if we could use a similar mechanism to load and store the DFDL files so that they are accessible to all Drill nodes. What do you think?

Test of nilled values (and more tests generally to show deeply nested and repeating nested objects work.)

Errors - revisit every place errors are detected or thrown to make sure these are being done the right way for DFDL schema compilation and runtime errors as well.

mbeckerle · 2024-01-05T16:01:56Z

@mbeckerle I had a thought about your TODO list. See inline.

This is ready for a next review. All the scalar types are now implemented with typed setter calls.
The prior review comments have all been addressed I believe.
Remaining things to do include:

How to get the compiled DFDL schema object so it can be loaded by daffodil out at the distributed Drill nodes.

I was thinking about this and I remembered something that might be useful. Drill has support for User Defined Functions (UDF) which are written in Java. To add a UDF to Drill, you also have to write some Java classes in a particular way, and include the JARs. Much like the DFDL class files, the UDF JARs must be accessible to all nodes of a Drill cluster.

Additionally, Drill has the capability of adding UDFs dynamically. This feature was added here: #574. Anyway, I wonder if we could use a similar mechanism to load and store the DFDL files so that they are accessible to all Drill nodes. What do you think?

Excellent: So drill has all the machinery, it's just a question of repackaging it so it's available for this usage pattern, which is a bit different from Drill's UDFs, but also very similar.

There are two user scenarios which we can call production and test.

Production: binary compiled DFDL schema file + code jars for Daffodil's own UDFs and "layers" plugins. This should, ideally, cache the compiled schema and not reload it for every query (at every node), but keep the same loaded instance in memory in a persistant JVM image on each node. For large production DFDL schemas this is the only sensible mechanism as it can take minutes to compile large DFDL schemas.
Test: on-the-fly centralized compilation of DFDL schema (from a combination of jars and files) to create and cache (to avoid recompiling) the binary compiled DFDL schema file. Then using that compiled binary file, as item 1. For small DFDL schemas this can be fast enough for production use. Ideally, if the DFDL schema is unchanged this would reuse the compiled binary file, but that's an optimization that may not matter much.

Kinds of objects involved are:

Daffodil plugin code jars
DFDL schema jars
DFDL schema files (just not packaged into a jar)
Daffodil compiled schema binary file
Daffodil config file - parameters, tunables, and options needed at compile time and/or runtime

Code jars: Daffodil provides two extension features for DFDL users - DFDL UDFs and DFDL 'layers' (ex: plug-ins for uudecode, or gunzip algorithms used in part of the data format). Those are ordinary compiled class files in jars, so in all scenarios those jars are needed on the node class path if the DFDL schema uses them. Daffodil dynamically finds and loads these from the classpath in regular Java Service-Provider Interface (SPI) mechanisms.

Schema jars: Daffodil packages DFDL schema files (source files i.e., mySchema.dfdl.xsd) into jar files to allow inter-schema dependencies to be managed using ordinary jar/java-style managed dependencies. Tools like sbt and maven can express the dependencies of one schema on another, grab and pull them together, etc. Daffodil has a resolver so when one schema file referenes another with include/import it searches the class path directories and jars for the files.

Schema jars are only needed centrally when compiling the schema to a binary file. All references to the jar files for inter-schema file references are compiled into the compiled binary file.

It is possible for one DFDL schema 'project' to define a DFDL schema, along with the code for a plugin like a Daffodil UDF or layer. In that case the one jar created is both a code jar and a schema jar. The schema jar aspects are used when the schema is compiled and ignored at Daffodil runtime. The code jar aspects are used at Daffodil run time and ignored at schema compilation time. So such a jar that is both code and schema jar needs to be on the class path in both places, but there's no interaction of the two things.

Binary Compiled Schema File: Centrally, DFDL schemas in files and/or jars are compiled to create a single binary object which can be reloaded in order to actually use the schema to parse/unparse data.

These binary files are tied to a specific version+build of Daffodil. (They are just a java object serialization of the runtime data structures used by Daffodil).
Once reloaded into a JVM to create a Daffodil DataProcessor object, that object is read-only so thread safe, and can be shared by parse calls happening on many threads.

Daffodil Config File: This contains settings like what warnings to suppress when compiling and/or at runtime, tunables, such as how large to allow a regex match attempt, maximum parsed data size limit, etc. This also is needed both at schema compile and at runtime, as the same file contains parameters for both DFDL schema compile time and runtime.

...il/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java

...odil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaUtils.java

mbeckerle · 2024-01-05T15:19:48Z

.../src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java

+
+  private void loadSchema(URI schemaFileURI) throws IOException, InvalidParserException {
+    Compiler c = Daffodil.compiler();
+    dp = c.reload(Channels.newChannel(schemaFileURI.toURL().openStream()));


@cgivre This reload call is the one that has to happen on every drill node.
It needs only to happen once for that schema for the life of the JVM. The "dp" object created here can be reused every time that schema is needed to parse more data. The dp (DataProcessor) is a read only (thread safe) data structure.

As you see, this can throw exceptions, so the question of how those situations should be handled arises.
Even if drill perfectly makes the file available to every node for this, that would rule out the IOException due to file not found or access rights, but a user can create a compiled DFDL schema binary file using the wrong version of the Daffodil schema compiler which is a mismatch for the runtime; hence, it is possible for the InvalidParserException to be thrown.

This definitely seems like an area where there is potential for a lot of different things to go wrong. My view is we should just do our best to provide clear error messages so that the user can identify and fix the issues.

mbeckerle · 2024-01-05T15:21:36Z

.../src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java

+      try {
+        dmp.loadSchema(schemaFileURI);
+      } catch (IOException | InvalidParserException e) {
+        throw new CompileFailure(e);


Error architecture?

This loadSchema call needs to happen on every node, and so has the potential (if the loaded binary schema file is no good or mismatches the Daffodil library version) to fail. Is throwing this exception the right thing here or are other steps preferred/necessary?

My thought here would be to fail as quickly as possible. If the DFDL schema can't be read, I'm assuming that we cannot proceed, so throwing an exception would be the right thing to do IMHO. With that said, we should make sure we provide a good error message that would explain what went wrong.
One of the issues we worked on for a while with Drill was that it would fail and you'd get a stack trace w/o a clear idea of what the actual issue is and how to rectify it.

.../src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java

mbeckerle · 2024-01-05T15:29:39Z

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java

+            .addContext(errorContext).build(logger);
+      }
+      if (dafParser.isValidationError()) {
+        logger.warn(dafParser.getDiagnosticsAsString());


Do we need an option here to convert validation errors to fatal?

Will logger.warn be seen by a query user, or is that just for someone dealing with the logs?

Validation errors either should be escalated to fatal, OR they should be visible in the query output display to a user somehow.

Either way, users will need a mechanism to suppress validation errors that prove to be unavoidable since they could be common place. Nodody wants thousands of warnings about something they can't avoid that doesn't stop parsing and querying the data.

@mbeckerle The question I'd have is whether the query can proceed if validation fails. (I don't know the answer)
If the answer is no, then we need to halt execution ASAP and throw an exception. If the answer is it can proceed, but the data might be less than ideal, maybe we add a configuration option which will allow the user to decide the behavior on a validation failure.

I could imagine situations where you have Drill unable to read a huge file because someone fat fingered a quotation mark somewhere or something like that. In a situation like that, sometimes you might just want to say I'll accept a row or two of bad data just so I can read the whole file.

Agree.

We draw a distinction between "well formed" and "invalid" data and whether one does validation seems like the right switch in daffodil to use.

If data is malformed, that means you can't successfully parse it. If it is invalid, that just means values are unexpected. Example: A 3 digit number representing a percentage 0 to 100. -1 is invalid, ABC is malformed.

If data is not well formed, you really cannot continue parsing it, as you cannot convert it to the type expected. But, if you are able to determine at least how big it is, it's possible to capture that length of data into a dummy "badData" element which is always invalid (so isn't a "false positive" parse). This capability has to be designed into the DFDL schema, but it is something we've been doing more and more.

Hence, one can tolerate even some malformed data. If it is malformed to where you cannot determine the length, then continuing is impossible.

We will see if more than this is needed. Options like the "use all strings/varchar" or all numbers are float, which you have for toleratng situations with other data connectors may prove useful, particularly while a DFDL schema is in development and you are really just testing it (and the corresponding data) using Drill.

cgivre · 2024-01-07T16:46:32Z

@mbeckerle
With respect to style, I tried to reply to that comment, but the thread won't let me. In any event, Drill classes will typically start with the constructor, then have whatever methods are appropriate for the class. The logger creation usually happens before the constructor. I think all of your other classes followed this format, so the one or two that didn't kind of jumped out at me.

cgivre · 2024-01-07T16:49:41Z

@mbeckerle I had a thought about your TODO list. See inline.

This is ready for a next review. All the scalar types are now implemented with typed setter calls.
The prior review comments have all been addressed I believe.
Remaining things to do include:

How to get the compiled DFDL schema object so it can be loaded by daffodil out at the distributed Drill nodes.

I was thinking about this and I remembered something that might be useful. Drill has support for User Defined Functions (UDF) which are written in Java. To add a UDF to Drill, you also have to write some Java classes in a particular way, and include the JARs. Much like the DFDL class files, the UDF JARs must be accessible to all nodes of a Drill cluster.
Additionally, Drill has the capability of adding UDFs dynamically. This feature was added here: #574. Anyway, I wonder if we could use a similar mechanism to load and store the DFDL files so that they are accessible to all Drill nodes. What do you think?

Excellent: So drill has all the machinery, it's just a question of repackaging it so it's available for this usage pattern, which is a bit different from Drill's UDFs, but also very similar.

There are two user scenarios which we can call production and test.

Production: binary compiled DFDL schema file + code jars for Daffodil's own UDFs and "layers" plugins. This should, ideally, cache the compiled schema and not reload it for every query (at every node), but keep the same loaded instance in memory in a persistant JVM image on each node. For large production DFDL schemas this is the only sensible mechanism as it can take minutes to compile large DFDL schemas.

Test: on-the-fly centralized compilation of DFDL schema (from a combination of jars and files) to create and cache (to avoid recompiling) the binary compiled DFDL schema file. Then using that compiled binary file, as item 1. For small DFDL schemas this can be fast enough for production use. Ideally, if the DFDL schema is unchanged this would reuse the compiled binary file, but that's an optimization that may not matter much.

Kinds of objects involved are:

Daffodil plugin code jars

DFDL schema jars

DFDL schema files (just not packaged into a jar)

Daffodil compiled schema binary file

Daffodil config file - parameters, tunables, and options needed at compile time and/or runtime

Code jars: Daffodil provides two extension features for DFDL users - DFDL UDFs and DFDL 'layers' (ex: plug-ins for uudecode, or gunzip algorithms used in part of the data format). Those are ordinary compiled class files in jars, so in all scenarios those jars are needed on the node class path if the DFDL schema uses them. Daffodil dynamically finds and loads these from the classpath in regular Java Service-Provider Interface (SPI) mechanisms.

Schema jars: Daffodil packages DFDL schema files (source files i.e., mySchema.dfdl.xsd) into jar files to allow inter-schema dependencies to be managed using ordinary jar/java-style managed dependencies. Tools like sbt and maven can express the dependencies of one schema on another, grab and pull them together, etc. Daffodil has a resolver so when one schema file referenes another with include/import it searches the class path directories and jars for the files.

Schema jars are only needed centrally when compiling the schema to a binary file. All references to the jar files for inter-schema file references are compiled into the compiled binary file.

It is possible for one DFDL schema 'project' to define a DFDL schema, along with the code for a plugin like a Daffodil UDF or layer. In that case the one jar created is both a code jar and a schema jar. The schema jar aspects are used when the schema is compiled and ignored at Daffodil runtime. The code jar aspects are used at Daffodil run time and ignored at schema compilation time. So such a jar that is both code and schema jar needs to be on the class path in both places, but there's no interaction of the two things.

Binary Compiled Schema File: Centrally, DFDL schemas in files and/or jars are compiled to create a single binary object which can be reloaded in order to actually use the schema to parse/unparse data.

These binary files are tied to a specific version+build of Daffodil. (They are just a java object serialization of the runtime data structures used by Daffodil).

Once reloaded into a JVM to create a Daffodil DataProcessor object, that object is read-only so thread safe, and can be shared by parse calls happening on many threads.

Daffodil Config File: This contains settings like what warnings to suppress when compiling and/or at runtime, tunables, such as how large to allow a regex match attempt, maximum parsed data size limit, etc. This also is needed both at schema compile and at runtime, as the same file contains parameters for both DFDL schema compile time and runtime.

@mbeckerle Would you want to chat sometime next week and I can walk you through the UDF architecture? I don't know how relevant it would be, but you'd at least see how things are installed and so forth.

mbeckerle · 2024-01-09T23:35:43Z

@mbeckerle With respect to style, I tried to reply to that comment, but the thread won't let me. In any event, Drill classes will typically start with the constructor, then have whatever methods are appropriate for the class. The logger creation usually happens before the constructor. I think all of your other classes followed this format, so the one or two that didn't kind of jumped out at me.

@cgivre I believe the style issues are all fixed. The build did not get any codestyle issues.

cgivre · 2024-01-14T15:58:05Z

@mbeckerle With respect to style, I tried to reply to that comment, but the thread won't let me. In any event, Drill classes will typically start with the constructor, then have whatever methods are appropriate for the class. The logger creation usually happens before the constructor. I think all of your other classes followed this format, so the one or two that didn't kind of jumped out at me.

@cgivre I believe the style issues are all fixed. The build did not get any codestyle issues.

The issue I was referring to was more around the organization of a few classes. Usually we'll have the constructor (if present) at the top followed by any class methods. I think there was a class or two where the constructor was at the bottom or something like that. In any event, consider the issue resolved.

This significantly simplifies the metadata walking to convert Daffodil metadata to drill metadata.

mbeckerle · 2024-01-21T20:17:57Z

@cgivre @paul-rogers is there an example of a Drill UDF that is not part of the drill repository tree?

I'd like to understand the mechanisms for distributing any jar files and dependencies of the UDF that drill uses. I can't find any such in the quasi-USFs that are in the Drill tree, because well, since they are part of Drill, and so are their dependencies, this problem doesn't exist.

cgivre · 2024-01-21T20:23:39Z

@cgivre @paul-rogers is there an example of a Drill UDF that is not part of the drill repository tree?

I'd like to understand the mechanisms for distributing any jar files and dependencies of the UDF that drill uses. I can't find any such in the quasi-USFs that are in the Drill tree, because well, since they are part of Drill, and so are their dependencies, this problem doesn't exist.

@mbeckerle Here's an example: https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't able to connect last week.

mbeckerle · 2024-01-23T17:24:25Z

@cgivre @paul-rogers is there an example of a Drill UDF that is not part of the drill repository tree?
I'd like to understand the mechanisms for distributing any jar files and dependencies of the UDF that drill uses. I can't find any such in the quasi-USFs that are in the Drill tree, because well, since they are part of Drill, and so are their dependencies, this problem doesn't exist.

@mbeckerle Here's an example: https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't able to connect last week.

If I understand this correctly, if a jar is on the classpath and has drill-module.conf in its root dir, then drill will find it and read that HOCON file to get the package to add to drill.classpath.scanning.packages.

Drill then appears to scan jars for class files for those packages. Not sure what it is doing with the class files. I imagine it is repackaging them somehow so Drill can use them on the drill distributed nodes. But it isn't yet clear to me how this aspect works. Do these classes just get loaded on the distributed drill nodes? Or is the classpath augmented in some way on the drill nodes so that they see a jar that contains all these classes?

I have two questions:

(1) what about dependencies? The UDF may depend on libraries which depend on other libraries, etc.

(2) what about non-class files, e.g., things under src/main/resources of the project that go into the jar, but aren't "class" files? How do those things also get moved? How would code running in the drill node access these? The usual method is to call getResource(URL) with a URL that gives the path within a jar file to the resource in question.

Thanks for any info.

cgivre · 2024-01-23T18:36:04Z

@cgivre @paul-rogers is there an example of a Drill UDF that is not part of the drill repository tree?
I'd like to understand the mechanisms for distributing any jar files and dependencies of the UDF that drill uses. I can't find any such in the quasi-USFs that are in the Drill tree, because well, since they are part of Drill, and so are their dependencies, this problem doesn't exist.

@mbeckerle Here's an example: https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't able to connect last week.

If I understand this correctly, if a jar is on the classpath and has drill-module.conf in its root dir, then drill will find it and read that HOCON file to get the package to add to drill.classpath.scanning.packages.

I believe that is correct.

Drill then appears to scan jars for class files for those packages. Not sure what it is doing with the class files. I imagine it is repackaging them somehow so Drill can use them on the drill distributed nodes. But it isn't yet clear to me how this aspect works. Do these classes just get loaded on the distributed drill nodes? Or is the classpath augmented in some way on the drill nodes so that they see a jar that contains all these classes?

I have two questions:

(1) what about dependencies? The UDF may depend on libraries which depend on other libraries, etc.

So UDFs are a bit of a special case, but if they do have dependencies, you have to also include those JAR files in the UDF directory, or in Drill's 3rd party JAR folder. I'm not that good with maven, but I've often wondered about making a so-called fat-JAR which includes the dependencies as part of the UDF JAR file.

(2) what about non-class files, e.g., things under src/main/resources of the project that go into the jar, but aren't "class" files? How do those things also get moved? How would code running in the drill node access these? The usual method is to call getResource(URL) with a URL that gives the path within a jar file to the resource in question.

Take a look at this UDF. https://github.com/datadistillr/drill-geoip-functions
This UDF has a few external resources including a CSV file and the MaxMind databases.

Thanks for any info.

mbeckerle · 2024-01-23T19:59:37Z

Ok, so the geo-ip UDF stuff has no special mechanisms or description about those resource files, so the generic code that "scans" must find them and drag them along automatically.

That's the behavior I want.

@cgivre What is "Drill's 3rd Party Jar folder"?

If a magic folder just gets dragged over to all nodes, and drill uses a class loader that arranges for jars in that folder to be searched, then there is very little to do, since a DFDL schema can be just a set of jar files containing related resources, and the classes for Daffodil's own UDFs and layers which are java code extensions of its own kind.

Uses JPrimType now, not strings.

mbeckerle · 2024-04-27T19:49:47Z

This now passes all the daffodil contrib tests using the published official Daffodil 3.7.0.

It does not yet run in any scalable fashion, but the metadata/data interfacing is complete.

I would like to squash this to a single commit before merging, and it needs to be tested rebased onto the latest Drill commit.

mbeckerle · 2024-04-27T20:39:29Z

Creating a new squashed PR so as to avoid loss of the comments on this PR.

mbeckerle mentioned this pull request Oct 14, 2023

WIP: Metadata walker for JAPI apache/daffodil#1092

Closed

cgivre assigned mbeckerle Oct 17, 2023

cgivre added enhancement PRs that add a new functionality to Drill new-format New Format Plugin doc-impacting PRs that affect the documentation labels Oct 17, 2023

cgivre requested changes Oct 19, 2023

View reviewed changes

cgivre marked this pull request as draft October 19, 2023 02:20

cgivre reviewed Oct 19, 2023

View reviewed changes

...b/format-daffodil/src/test/java/org/apache/drill/exec/store/daffodil/TestDaffodilReader.java Outdated Show resolved Hide resolved

cgivre reviewed Oct 19, 2023

View reviewed changes

...b/format-daffodil/src/test/java/org/apache/drill/exec/store/daffodil/TestDaffodilReader.java Outdated Show resolved Hide resolved

cgivre reviewed Oct 19, 2023

View reviewed changes

distribution/src/assemble/component.xml Outdated Show resolved Hide resolved

cgivre reviewed Oct 19, 2023

View reviewed changes

contrib/pom.xml Outdated Show resolved Hide resolved

cgivre reviewed Oct 29, 2023

View reviewed changes

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java Outdated Show resolved Hide resolved

cgivre reviewed Oct 29, 2023

View reviewed changes

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java Outdated Show resolved Hide resolved

cgivre reviewed Oct 29, 2023

View reviewed changes

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java Outdated Show resolved Hide resolved

cgivre reviewed Oct 29, 2023

View reviewed changes

.../format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java Show resolved Hide resolved

cgivre reviewed Oct 29, 2023

View reviewed changes

...ffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilDrillInfosetOutputter.java Outdated Show resolved Hide resolved

mbeckerle commented Oct 31, 2023

View reviewed changes

mbeckerle force-pushed the daffodil-2835 branch 2 times, most recently from eb418bf to c36cc07 Compare November 7, 2023 00:00

mbeckerle commented Nov 7, 2023

View reviewed changes

mbeckerle force-pushed the daffodil-2835 branch from 96e602e to bf5e16c Compare November 10, 2023 04:34

cgivre mentioned this pull request Dec 6, 2023

elasticsearch storage is not correctly saved #2856

Closed

mbeckerle force-pushed the daffodil-2835 branch from 9b01eb0 to 7e77f19 Compare December 22, 2023 00:29

mbeckerle changed the title ~~WIP: Preliminary Review on adding Daffodil to Drill~~ DRILL-2835: Daffodil Feature for Drill Dec 22, 2023

mbeckerle marked this pull request as ready for review December 22, 2023 00:43

mbeckerle force-pushed the daffodil-2835 branch from 7e77f19 to ca709af Compare December 22, 2023 01:01

cgivre requested changes Jan 2, 2024

View reviewed changes

mbeckerle commented Jan 2, 2024

View reviewed changes

cgivre changed the title ~~DRILL-2835: Daffodil Feature for Drill~~ DRILL-8474: Add Daffodil Format Plugin Jan 3, 2024

Uses ScalarWriter now for typed setters for all DFDL types

13183ac

Date, Time, DateTime, Boolean, Unsigned integers, Integer, NonNegativeInteger,Decimal, float, double, hexBinary.

mbeckerle added 2 commits January 4, 2024 17:24

Code reformatting and reorg

225504a

I imported the dev-support/formatter/eclipse settings and used them to reformat the code in IntelliJ IDEA. No functional changes in this commit.

Remove catches of Exception and test printing

b80e74a

Also a few code cleanups.

mbeckerle commented Jan 5, 2024

View reviewed changes

Factored common MapBuilderLike out of SchemaBuilder and MapBuilder

ab567d9

This significantly simplifies the metadata walking to convert Daffodil metadata to drill metadata.

mbeckerle added 3 commits February 12, 2024 21:13

Update to latest flavor of Daffodil Metadata

ad25972

Uses JPrimType now, not strings.

Uses DFDLPrimType now, not JPrimType

7567911

Change to Daffodil 3.7.0 official release.

e15707a

mbeckerle closed this Apr 27, 2024

mbeckerle mentioned this pull request Apr 27, 2024

DRILL-8474: Adding Daffodil to Drill as a contrib. #2909

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DRILL-8474: Add Daffodil Format Plugin #2836

DRILL-8474: Add Daffodil Format Plugin #2836

mbeckerle commented Oct 14, 2023 •

edited

cgivre left a comment

cgivre Oct 19, 2023

mbeckerle Oct 30, 2023

cgivre commented Oct 29, 2023

mbeckerle commented Dec 22, 2023

mbeckerle commented Dec 22, 2023

cgivre left a comment

mbeckerle Jan 2, 2024

mbeckerle commented Jan 2, 2024

paul-rogers commented Jan 3, 2024 via email

mbeckerle commented Jan 4, 2024 via email

mbeckerle commented Jan 4, 2024

cgivre commented Jan 5, 2024

mbeckerle commented Jan 5, 2024

mbeckerle Jan 5, 2024

cgivre Jan 14, 2024

mbeckerle Jan 5, 2024

cgivre Jan 14, 2024

mbeckerle Jan 5, 2024

cgivre Jan 14, 2024

mbeckerle Jan 16, 2024

cgivre commented Jan 7, 2024

cgivre commented Jan 7, 2024

mbeckerle commented Jan 9, 2024

cgivre commented Jan 14, 2024

mbeckerle commented Jan 21, 2024

cgivre commented Jan 21, 2024

mbeckerle commented Jan 23, 2024

cgivre commented Jan 23, 2024

mbeckerle commented Jan 23, 2024 •

edited

mbeckerle commented Apr 27, 2024

mbeckerle commented Apr 27, 2024

DRILL-8474: Add Daffodil Format Plugin #2836

DRILL-8474: Add Daffodil Format Plugin #2836

Conversation

mbeckerle commented Oct 14, 2023 • edited

cgivre left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgivre commented Oct 29, 2023

mbeckerle commented Dec 22, 2023

mbeckerle commented Dec 22, 2023

cgivre left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbeckerle commented Jan 2, 2024

paul-rogers commented Jan 3, 2024 via email

mbeckerle commented Jan 4, 2024 via email

mbeckerle commented Jan 4, 2024

cgivre commented Jan 5, 2024

mbeckerle commented Jan 5, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgivre commented Jan 7, 2024

cgivre commented Jan 7, 2024

mbeckerle commented Jan 9, 2024

cgivre commented Jan 14, 2024

mbeckerle commented Jan 21, 2024

cgivre commented Jan 21, 2024

mbeckerle commented Jan 23, 2024

cgivre commented Jan 23, 2024

mbeckerle commented Jan 23, 2024 • edited

mbeckerle commented Apr 27, 2024

mbeckerle commented Apr 27, 2024

mbeckerle commented Oct 14, 2023 •

edited

mbeckerle commented Jan 23, 2024 •

edited