Skip to content

Daffodil 2663 pluggable charsets#768

Merged
stevedlawrence merged 1 commit intoapache:mainfrom
alexanderrevello:daffodil-2663-pulggable-charsets
May 23, 2022
Merged

Daffodil 2663 pluggable charsets#768
stevedlawrence merged 1 commit intoapache:mainfrom
alexanderrevello:daffodil-2663-pulggable-charsets

Conversation

@alexanderrevello
Copy link
Contributor

No description provided.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work so far. I'll review the tests more on next round.

* A charset transformer is created at runtime as part of a single parse/unparse call.
* Hence, they can be stateful without causing thread-safety issues.
*/
abstract class CharsetTransformer(charsetName: String){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can collapse CharsetTransformer and have a CharsetTransformer just be a BitsCharset.

Unless there is a need for CharsetTransformer, the added indirection is misleading, as people will do what I just did, which is search for the reason it is needed.

.newInstance()
if (cs2 == null) {
tci.schemaDefinitionError(
"Unsupported encoding: %s. Supported encodings: %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just Assert.invariant(cs2 ne null) because the find() call above will SDE if the encoding cannot be found, and the other calls are never supposed to return null. So this schemaDefinitionError call will never be hit.

That will get rid of your codeCov errors.

# layering, but one should expect to have to adapt their code substantially.
#
org.apache.daffodil.charsets.CustomNonByteSizeCharsetCompiler
org.apache.daffodil.charsets.CustomJavaCharsetCompiler No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add missing final line ending

maybeCompiler.get
}
}
} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing final NL

val maybeCompiler: Option[CharsetCompiler] = charsetCompilerMap.get(name)
if (maybeCompiler.isEmpty) {
val choices = charsetCompilerMap.keySet.mkString(", ")
context.SDE("The charset '%s' was not found. Available choices are: %s", name, choices + ", " + CharsetUtils.supportedEncodingsString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Messages should not use the term "charset", but either "encoding" or "charset encoding". The XML term for 'charset' is 'encoding' and DFDL schema authors know the concept by that name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike also said he would like Daffodil's current charsets re-implemented as pluggable charsets, so you'll probably need to remove CharsetUtils.supportedEncodingsString from this line.

* The state is passed in order to provide diagnostic context if not found.
*/
def find(name: String, context: ThrowsSDE): CharsetCompiler = {
val maybeCompiler: Option[CharsetCompiler] = charsetCompilerMap.get(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the standard scala Option type, and our own more efficient but less rich Maybe type.
In the fast paths of the runtime we try to use Maybe, but elsewhere we use Option. You are using Option here which is right. I just suggest name this variable optCompiler for consistency.

@@ -95,10 +96,23 @@ abstract class CharsetEvBase(encodingEv: EncodingEvBase, tci: DPathCompileInfo)
val encString = encodingEv.evaluate(state)
val cs = CharsetUtils.getCharset(encString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get rid of all the built-in charsets, and implement them all as CharsetCompiler (with meta-inf, etc.) so there is only one way a charset is found/resolved, etc.

They would still all be part of daffodil-runtime1, just they would be detected and loaded using the dynamic load SPI mechanism, just like any user-supplied charset.

It is quite confusing to have two such mechanisms, when there is no good reason not to just have one.

So CharsetUtils.getCharset will go away, or perhaps become the code that does what you are doing here.
DaffodilCharsetProvider would also go away, or morph similarly.

import org.apache.daffodil.io.FormatInfo

final class CustomJavaCharset(
name2: String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange indentation.

When there are only a small number of args, we tend not to do the thing where args each get a line, as it reduces code density for no adantage. At 3+ args, sure, because it will line wrap anyway. But just 1 or 2 args keep args on one line.

extends CharsetTransformer(
name2){

def bitsCharset = BitsCharsetISO885913
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is named CustomJavaCharset fot the test, but the bitsCharset is wired to BitsCharsetISO885913 ?

I think CharsetTransformers should go away entirely, but until then I would named this BitsCharsetISO885913Transformer, and why is name2 being ignored?

If you ignore an arg like this, probably a comment in the code is required.

val byte = getByte(dis, 0)
byte.toChar
}
} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final line ending

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly amplifying Mike's suggestions for improving the pull request, including re-implementing Daffodil's own charsets as pluggable charsets.

val maybeCompiler: Option[CharsetCompiler] = charsetCompilerMap.get(name)
if (maybeCompiler.isEmpty) {
val choices = charsetCompilerMap.keySet.mkString(", ")
context.SDE("The charset '%s' was not found. Available choices are: %s", name, choices + ", " + CharsetUtils.supportedEncodingsString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike also said he would like Daffodil's current charsets re-implemented as pluggable charsets, so you'll probably need to remove CharsetUtils.supportedEncodingsString from this line.

abstract class CharsetTransformerFactory(val name: String)
extends Serializable {

def newInstance(): CharsetTransformer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike said CharsetTransformer should simply be BitsCharset, so you probably want to rename CharsetTransformerFactory to BitsCharsetFactory.

abstract class CharsetTransformerFactory(val name: String)
extends Serializable {

def newInstance(): CharsetTransformer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike said CharsetTransformer should simply be BitsCharset, so you probably want to rename CharsetTransformerFactory to BitsCharsetFactory.

#
# These are partial/subsets of "real" functionality that give us test coverage
# of the layering features of Daffodil. They can be reused to create "real"
# layering, but one should expect to have to adapt their code substantially.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those comments were pasted here from another place and don't belong here, or should be updated to talk only about your test charsets.

#
# These are partial/subsets of "real" functionality that give us test coverage
# of the layering features of Daffodil. They can be reused to create "real"
# layering, but one should expect to have to adapt their code substantially.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those comments were pasted here from another place and don't belong here, or should be updated to talk only about your test charsets.


<annotation>
<appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format ref="tns:GeneralFormat" initiator="" terminator="" leadingSkip="0" trailingSkip="0" textBidi="no" floating="no" encoding="ASCII" byteOrder="bigEndian" alignment="implicit" alignmentUnits="bits" fillByte="f" occursCountKind="implicit" truncateSpecifiedLengthString="no" ignoreCase="no" representation="text" lengthKind="delimited" nilValueDelimiterPolicy="both" emptyValueDelimiterPolicy="none" documentFinalTerminatorCanBeMissing="yes" initiatedContent="no" separatorSuppressionPolicy="anyEmpty" separatorPosition="infix" textTrimKind="none" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please override only properties that really need to be different than GeneralFormat's properties, and define them on multiple lines for readability.


<annotation>
<appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format ref="tns:GeneralFormat" initiator="" terminator="" leadingSkip="0" trailingSkip="0" textBidi="no" floating="no" encoding="ASCII" byteOrder="bigEndian" alignment="implicit" alignmentUnits="bits" fillByte="f" occursCountKind="implicit" truncateSpecifiedLengthString="no" ignoreCase="no" representation="text" lengthKind="delimited" nilValueDelimiterPolicy="both" emptyValueDelimiterPolicy="none" documentFinalTerminatorCanBeMissing="yes" initiatedContent="no" separatorSuppressionPolicy="anyEmpty" separatorPosition="infix" textTrimKind="none" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please override only properties that really need to be different than GeneralFormat's properties.

xmlns:ex="http://example.com"
xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset" defaultRoundTrip="none">

<tdml:parserTestCase name="parse_charsets" root="s1" model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd" roundTrip="none">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have defaultRoundTrip="none" above so we don't need roundTrip="none" in each test.

xmlns:ex="http://example.com"
xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset" defaultRoundTrip="none">

<tdml:parserTestCase name="parse_charsets" root="s1" model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd" roundTrip="none">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have defaultRoundTrip="none" above so we don't need roundTrip="none" in each test.

@mbeckerle
Copy link
Contributor

This now includes commits that are not really part of this change set, but they are showing up as commits in this review.

That and... github UI is having trouble. I can't seem to narrow the review to just the last 3 commits. It doesn't seem to allow me to select that. (I can select 2 commits, but not 3)

The exact best way to fix this isn't clear. I think the first 2 and last 5 commits should be in this change set, the other commits shouldn't be part of this.

@alexanderrevello alexanderrevello force-pushed the daffodil-2663-pulggable-charsets branch from 03fe6dd to 3269145 Compare March 23, 2022 19:11
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting very close. I have only a few more changes I'd like to see.


package org.apache.daffodil.processors.charset

//import org.apache.daffodil.processors.charset.BitsCharset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. How does this work without this import. You are definitely referencing this BitsCharset below.

# limitations under the License.
#

org.apache.daffodil.charsets.BitsCharsetISO88591ReverseCompiler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good case for exception to the caml-case naming conventions is warranted.
BitsCharset_ISO_8859_1_ReverseCompiler
BitsCharset_ISO_8859_13_Compiler
Those would be far easier to understand. Being able to pick out the "13" is important.

<complexType>
<sequence>
<element name="e1" dfdl:encoding="ISO-8859-13" dfdl:lengthKind="explicit" dfdl:length="8" type="xsd:string" />
<element name="e2" dfdl:encoding="X-DFDL-ISO-88591-8-BIT-PACKED-LSB-FIRST-REVERSE" dfdl:lengthKind="delimited" type="xsd:string" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ISO-8859-1 (missing a hyphen) consistent with ISO-8859-13 above.

import org.apache.daffodil.processors.charset.BitsCharsetFactory

object BitsCharsetISO88591Reverse extends{
override val name = "X-DFDL-ISO-88591-8-BIT-PACKED-LSB-FIRST-REVERSE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISO-8859-1 (add hypen)

import org.apache.daffodil.processors.charset.CharsetCompiler
import org.apache.daffodil.processors.charset.BitsCharsetFactory

object BitsCharsetISO885913 extends {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this BitsCharsetTest8859_13 or something.

So this is half a charset.

But this makes your tests wierd. Since they're asymmetric. Parse it behaves as 8859-1, unparse as 8859-13.

You can fix this easily because you just need to define decodeOneChar so that it does the lookup of the char in a length 256 string having the chars corresponding to bytes 0 to 255 of iso-8859-13, which you can do via

val decodeString = {
  val bytes = ByteBuffer.wrap((0 to 255).map{ _.toByte }.toArray)
  Charset.forName("iso-8859-13").newDecoder().decode(bytes).toString
  }
  
  protected override def decodeOneChar(dis: InputSourceDataInputStream, finfo: FormatInfo): Char = {
    val byte = getByte(dis, 0)
    decodeString(byte)
  }



final class BitsCharsetISO88591ReverseCompiler
extends CharsetCompiler("X-DFDL-ISO-88591-8-BIT-PACKED-LSB-FIRST-REVERSE") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add hyphen 8859-1

<tdml:infoset>
<tdml:dfdlInfoset>
<s1>
<e1>À1234567</e1>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment that first character is Unicode Code point 0xC0, LATIN CAPITAL LETTER A WITH GRAVE. Just in case someone is lacking a font that displays it.

<tdml:dfdlInfoset>
<s2>
<e1>À1234567</e1>
<e2>À1234567</e2>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First char inside should be Ą once you have a real 8859-13 decoder, which I suggest you fix below in comments on the scala code for this charset.

@mbeckerle mbeckerle changed the title Daffodil 2663 pulggable charsets Daffodil 2663 pluggable charsets Mar 23, 2022
@mbeckerle
Copy link
Contributor

This now includes commits that are not really part of this change set, but they are showing up as commits in this review.

That and... github UI is having trouble. I can't seem to narrow the review to just the last 3 commits. It doesn't seem to allow me to select that. (I can select 2 commits, but not 3)

The exact best way to fix this isn't clear. I think the first 2 and last 5 commits should be in this change set, the other commits shouldn't be part of this.

Note to other reviewers - this got fixed - it's now just the right set of commits.

@alexanderrevello alexanderrevello force-pushed the daffodil-2663-pulggable-charsets branch from 3269145 to f1a4187 Compare March 24, 2022 22:42
Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your pull request looks pretty close to completion. I have proposed a new naming strategy that would make your pull request even better. If the new naming strategy makes sense, the changes won't take too much time if you use your IDE's refactoring support to make the changes easier to do.


org.apache.daffodil.processors.charset.ASCIICompiler
org.apache.daffodil.processors.charset.ASCIICompilerAlias
org.apache.daffodil.processors.charset.BitsCharsetAISPayloadArmoringCompiler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked at this pull request as carefully as I had time to do and I think there are better ways to name some things. I know that Mike suggested doing the same thing as daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/Layer* but I think copying Compiler and Transformer from the Layer* classes is going too far. These names make some sense for layer classes which transform data, but make less sense for classes which only create new BitsCharset objects.

I suggest that we use this naming strategy instead:

BitsCharset - Daffodil's primary abstraction for dealing with character sets
BitsCharsetFactory - Daffodil's abstraction for creating a new BitsCharset (no Transformer)
BitsCharsetDefinition - Daffodil's abstraction for creating a new BitsCharsetFactory (not CharsetCompiler)

BitsCharsetUSASCII - charset for "US-ASCII" and "ASCII" character sets
BitsCharsetUSASCIIFactory - factory for creating a new BitsCharsetUSACII (no Transformer)
BitsCharsetUSASCIIDefinition - definition for creating a new BitsCharsetUSASCIIFactory from "US-ASCII"
BitsCharsetUSASCIIAliasDefinition - alias definition for creating a new BitsCharsetUSASCIIFactory from "ASCII"

Given this new naming strategy, I would make the first three lines of this file look like this and replace Compiler with Definition in the rest of the lines:

org.apache.daffodil.processors.charset.BitsCharsetUSASCIIDefinition
org.apache.daffodil.processors.charset.BitsCharsetUSASCIIAliasDefinition
org.apache.daffodil.processors.charset.BitsCharsetAISPayloadArmoringDefinition

Then I would sort all the definitions' lines alphabetically to establish a standard ordering of definitions in this file. Finally, I would rename this file from CharsetCompiler to BitsCharsetDefinition.

Comment on lines 48 to 52
final class BitsCharsetAISPayloadArmoringCompiler
extends CharsetCompiler("X-DAFFODIL-AIS-PAYLOAD-ARMORING") {

override def compileCharset() = {
new BitsCharsetAISPayloadArmoringTransformerFactory(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given our new naming strategy, I would make these lines look like this,

final class BitsCharsetAISPayloadArmoringDefinition
  extends BitsCharsetDefinition("X-DAFFODIL-AIS-PAYLOAD-ARMORING") {

  override def newFactory() = {
    new BitsCharsetAISPayloadArmoringFactory
  }
}

Note that I would not pass name to the factory because I see no justification for passing it. The factory only creates one charset statically chosen at compile time; it's not supposed to look at name and create one of several possible charsets dynamically chosen at runtime. Do you agree, Mike?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind if BitsCharsetAISPayloadArmoringFactory can ever fail and issue a diagnostic, the schema author/user knows this as "X-DAFFODIL-AIS-PAYLOAD-ARMORING", and that's the meaningful name to them for use in a diagnostic. That's the only reason I can think of for passing the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name() is required for the SimpleNamedServiceLoader, however it does not need to be passed, the implementing class can just set it. I will change it

}
}

class BitsCharsetAISPayloadArmoringTransformerFactory(name: String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final class BitsCharsetAISPayloadArmoringFactory()

I would add 'final' (unless there is a reason to subclass factories?), remove Transformer, and remove name: String since I see no reason to pass a string that's never used inside the class and I can't imagine it would be useful for debugging either.

Comment on lines 34 to 42
final class BitsCharsetBase4LSBFCompiler
extends CharsetCompiler("X-DFDL-BASE4-LSBF") {

override def compileCharset() = {
new BitsCharsetBase4LSBFTransformerFactory(name)
}
}

class BitsCharsetBase4LSBFTransformerFactory(name: String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, I would make similar changes to the rest of the new definitions in this pull request (Compiler -> Definition, compileCharset -> newFactory, TransfomerFactory -> Factory, add final, remove name):

final class BitsCharsetBase4LSBFDefinition
  extends BitsCharsetDefinition("X-DFDL-BASE4-LSBF") {

  override def newFactory() = {
    new BitsCharsetBase4LSBFFactory
  }
}

final class BitsCharsetBase4LSBFFactory()

@@ -0,0 +1,33 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this file from BitsCharsetTransformerFactory.scala to BitsCharsetFactory.scala and remove the first line (it's just a blank line).


object CharsetCompilerRegistry {

private lazy val charsetCompilerMap: Map[String, CharsetCompiler] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to

  private lazy val definitionMap: Map[String, BitsCharsetDefinition] =
    SimpleNamedServiceLoader.loadClass[BitsCharsetDefinition](classOf[BitsCharsetDefinition])

Update rest of file likewise.

Comment on lines 52 to 54
def SDE(id: String, args: Any*): Nothing = {
throw new Exception(id.format(args: _*))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation.

def getCharset(name: String): BitsCharset = {
val cs = DaffodilCharsetProvider.charsetForName(name)
cs
CharsetCompilerRegistry.find(name).compileCharset.newInstance()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will become

    BitsCharsetDefinitionRegistry.find(name).newFactory().newInstance()

Not sure if Scala conventions would keep empty parentheses or not.

#

org.apache.daffodil.charsets.BitsCharset_ISO_8859_1_Reverse_Compiler
org.apache.daffodil.charsets.BitsCharsetTest_ISO_8859_13_Compiler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename file from CharsetCompiler to BitsCharsetDefinition. Replace Compiler with Definition.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple buggy comments in tests (you have your Grave accents and Greek Ogoneks backward :-)

I think the renaming @tusji suggested will be a big improvement as well, so is very worth doing.

<tdml:dfdlInfoset>
<s1>
<!-- First char in e1 is unicode char 0xC0 captial A with grave -->
<e1>Ą1234567</e1>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have the comment for the wrong char. This one is the 0x104 captial A with Greek Ogonek. (Ą)

<tdml:infoset>
<tdml:dfdlInfoset>
<s2>
<!-- First char in e1 is unicode char 0x104 captial A with Greek Ogonek -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have these two comments backward. This first one is the A with Grave.

<!-- First char in e1 is unicode char 0x104 captial A with Greek Ogonek -->
<e1>À1234567</e1>
<!-- First char in e2 is unicode char 0xC0 captial A with grave -->
<e2>Ą1234567</e2>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second one is the A with greek ogonek.

@alexanderrevello alexanderrevello deleted the daffodil-2663-pulggable-charsets branch March 30, 2022 14:25
@alexanderrevello alexanderrevello restored the daffodil-2663-pulggable-charsets branch March 30, 2022 14:35
@alexanderrevello
Copy link
Contributor Author

corrected typo closed the branch, reopening with typo to preserve comments

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 This is an excellent change set.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent progress! Still needs some changes to ensure error paths are handled appropriately, though.

}
}

class BitsCharsetBase4LSBFFactory()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed a final here :).

Comment on lines 20 to 21
* Must be implemented by all charsets.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these lines. I'd meant to ask you to move the first sentence after the second sentence, and you've done that so these lines are redundant.

import org.apache.daffodil.exceptions.ThrowsSDE
import org.apache.daffodil.util.SimpleNamedServiceLoader

object BitsCharsetDefinitionRegistry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be wonderful to add a comment here briefly explaining what this object is used for, something like "Finds all pluggable BitCharsets and makes them available to Daffodil."

P.S. In later comments, I'm now suggesting that you chop Registry off BitsCharsetDefinitionRegistry's name and move it to BitsCharsetDefinition.scala.

}
}

def SDE(id: String, args: Any*): Nothing = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to make this helper function private.

P.S. In later comments, I'm now suggesting that you change find(name) to return an Option[BitsCharsetDefinition] instead of throwing a SDE, move this private SDE function to CharsetUtils.scala, and call it in CharsetUtils.getCharset(name) instead.

throw new Exception(id.format(args: _*))
}

def supportedEncodingsString = {bitsCharsetDefinitionMap.keySet.mkString(", ")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you have a one-liner, you don't need braces, and it would be nice to add a test to TestBitsCharsetDefinition.scala which calls this function and verifies the existence of an encoding's name with a comma within the returned string to check the string has the right content. While you're at it, let's also add two negative tests which call each find method with an impossible to find encoding in order to put the error paths under test coverage.

extends Serializable {

def newInstance(): BitsCharset
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought. In Scala, we don't have to put each trait/object/class in its own file like we do in Java. You could define BitsCharsetDefinition, BitsCharsetDefinitionRegistry, and BitsCharsetFactory in the same file (BitsCharsetDefinition.scala) and that'll mean the extra tests logically belong in TestBitsCharsetDefinition.scala (no need to add another test class). Also, you can remove newInstance()'s parentheses.


def name(): String

def newFactory(): BitsCharsetFactory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these methods have no side effects and no arguments, we can remove the parentheses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parentheses for name are required for simplenamedserviceloader consistency

def getCharset(name: String): BitsCharset = {
val cs = DaffodilCharsetProvider.charsetForName(name)
cs
BitsCharsetDefinitionRegistry.find(name).newFactory.newInstance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized something concerning about this change to Charset.getCharset. The scaladoc contract for DaffodilCharsetProvider.charsetForName(name) promised it would return null if the charset couldn't be found. However, the scaladoc contract for BitsCharsetDefinitionRegistry.find(name) (well, there isn't one because you haven't written it yet, but if you did base it on find(name)'s existing code, it would say) promises it will throw a SDE if the charset can't be found. We've just changed Charset.getCharset(name)'s behavior on its error path, but there are two places in Daffodil code which still check if getCharset returns null on its error path:

daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/EvEncoding.scala
98:      tci.schemaDefinitionError("Unsupported encoding: %s. Supported encodings: %s", encString, CharsetUtils.supportedEncodingsString)

daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
2115:      case null => Assert.usageError("Unsupported encoding: " + encodingName + ". Supported encodings: " + CharsetUtils.supportedEncodingsString)

Both of these error paths throw exceptions as well, so you can simply delete their code since they're now unnecessary. (P.S. I see you've already deleted the error path code from EvEncoding.scala but TDMLRunner.scala still has error path code.) I think it's better to guarantee by contract that getCharset will return a Charset or throw a SDE since too many existing places just call CharsetUtils.getCharset(name) without checking for a null - these places are sure to throw a NPE on the error path anyway.

However, I think we want to change BitsCharsetDefinitionRegistry.find(name) so it returns Option[BitsCharsetDefinition] instead of throwing a SDE and throw the SDE here from getCharset instead. Someone may have an use case that requires them to programmatically look up an user-specified encoding and use that charset if it can be found or else fall back to a default charset without throwing a SDE. Do you agree, @mbeckerle and @stevedlawrence?

P.S. If we're going to expect someone to call BitsCharsetDefinitionRegistry.find(name) directly, I think we ought to chop Registry off the object's name; don't you think BitsCharsetDefinition.find(name) is a more readable call (especially since it will return Option[BitsCharsetDefinition])?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There does seem to be precedent for find() methods to return Option types in scala, and since getCharset provides a single place for the error throw, that does seem to be a useful change.

I don't like stripping Registry suffix off the names of the registry. Something needs to indicate that this is a collection of BitsCharsetDefinitions that you are finding a charset within. I don't like just using plurality (BitsCharsetDefinions plural).

The "Registry" suffix connotes that these are gathered from some diverse places and "registered" which they are. So I think this convention, despite the name getting long-ish, is useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's keep the Registry suffix then.

}
}

class BitsCharsetHexLSBFFactory()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another final here :)

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change! Couple questions on deprecation ability and maybe we can remove the factory indirection?

org.apache.daffodil.processors.charset.BitsCharset6BitDFI264DUI001Definition
org.apache.daffodil.processors.charset.BitsCharset6BitDFI311DUI002Definition
org.apache.daffodil.processors.charset.BitsCharset6BitICAOAircraftIDAliasDefinition
org.apache.daffodil.processors.charset.BitsCharset6BitICAOAircraftIDDefinition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have this capability for schemas to provide their own charset implementations, does it make sense to deprecate some of these that are very specific to certain schema? Perhaps we output a warning when certain charsets are used in a schema and nudge users to move these to some other package?

Doesn't have to be done as part of this, but maybe a separate issue/PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer deprecating charsets to be done as a separate issue/PR. Whoever is in the best place to know which charsets should be deprecated (Mike or Steve?) should list these charsets in the issue, and then any committer can work on the PR.


def name(): String

def newFactory(): BitsCharsetFactory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a use case for the Factory indirection? Based on the comment it seems to be related to serialization, but I don't see anywhere where we store the created factories where they could be serialized.

The only use I see is find(...).newFactory().newInstance(). So we create a factory, create an instance from that factory, and throw away the factory. I wonder if we can simplify this and just treat BitsCharsetDefinition as if it were the factory--it can have the newInstance function that is directly called when a new instance of a charset is needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the comments about serialization were copied from the LayerCompiler classes. It probably is true that Daffodil processors store / serialize layer classes in such a way that this indirection makes sense, but it may not be true that Daffodil processors need this indirection for BitCharset classes at all. @alexanderrevello, try removing the indirection and see if all the tests (including integration/CLI tests) still pass. If so, then Steve is right and the indirection is unneeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indirection is not needed and has been removed

* Factory for a Daffodil charset.
*
* This is the serialized object which is saved as part of a processor.
* It constructs a charset at runtime when newInstance() is called.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true anymore?Mentioned in another comment, but I don't see this factory being stored anywhere to be serialized? I wonder if things changed so we only serialized the name of a charset, and on deserialization we look up the charset by that string and create a new instance?

@alexanderrevello alexanderrevello force-pushed the daffodil-2663-pulggable-charsets branch from f66908d to 1cc18db Compare April 11, 2022 05:10
cs
val cs = BitsCharsetDefinitionRegistry.find(name).getOrElse(null)
if (cs == null)
SDE("The encoding '%s' was not found. Available choices are: %s", name, BitsCharsetDefinitionRegistry.supportedEncodingsString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the behavior of this function. In the past it returned null if the encoding wasn't found and the caller was expected to handle it correctly. I'd guess we want to keep that behavior, since some callers might just want to know if a charset exists or not, and do something other than error if not.

Maybe something like this instead?

val cs = BitsCharsetDefinitionRegistry.find(name).map(_.newInstance).getOrElse(null)
cs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this SDE should be removed. This function should either return a charset or null. The callers will determine how to handle null.


private def SDE(id: String, args: Any*): Nothing = {
throw new Exception(id.format(args: _*))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually avoid throwing generic Exception's. It often makes it hard to know how to best handle the error. There's also no context associated with this SDE, like schema file or line number, which most SDE's have. I think that's because it just doesn't exist here. Which makes me think more that CharsetUtils should not be throwing any exceptions/SDE's but should instead rely on callers to handle that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest get rid of SDE here. (remove the method) You don't have enough context to know whether something is a schema-definition-error or some other kind of error at this point. These are just charset utils, general purpose code.

You either need to define a EncodingUnavailableException, or you should just be returning null/None or such to indicate the lookup failed. I guess I think the latter is preferable.

There is a method to return a list of the available charsets so returning null is preferable, and the caller can create a sensible diagnostic message.

# layering, but one should expect to have to adapt their code substantially.
#
org.apache.daffodil.processors.charset.CustomNonByteSizeCharsetCompiler
org.apache.daffodil.processors.charset.CustomJavaCharsetCompiler No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments mention that these charsets are defined in src/test/resources since they are only used for tests. But this services file is in src/main/resources. I think this service file also needs to be in src/test/resources so that these charsets are only loaded during tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont' think this comment was resolved. This file should be in src/test/resources, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file was removed as it was no longer needed. Correct version is in the test folder


override def compute(state: ParseOrUnparseState) = {
val encString = encodingEv.evaluate(state)
val cs = CharsetUtils.getCharset(encString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still use CharsetUtils, and CharsetUtils should be the only thing that uses the registry? This way we can easily change how the registry works if we ever need to since it's hidden behind CharsetUtils

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment wasn't addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

tci.schemaDefinitionError("Unsupported encoding: %s. Supported encodings: %s", encString, CharsetUtils.supportedEncodingsString)
val bcd = BitsCharsetDefinitionRegistry.find(encString).getOrElse(null)
if (bcd == null) {
tci.schemaDefinitionError("Unsupported encoding: %s. Supported encodings: %s", encString, BitsCharsetDefinitionRegistry.supportedEncodingsString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't your change, but supportedEncodingString is potentially very large (we have a lot of encodings) so this could be very verbose. I wonder if we should instead just output the unsuported encoding? We can modify the registry so that it does something like log.debug to output the encodings so that if a user really can't figure it out they can enable debug logs to get that information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to be verbose here.

I think there are two errors. One the user didn't get the charset name right, so it isn't found.

The other is the user does have the name right, but the list of charsets doesn't contain the charset because they didn't get the classpath right.

The latter error you need to see that the charset you seek isn't in the list at all. So the verbose list is needed.

So I'm ok with this as is.

object BitsCharsetDefinitionRegistry {

private lazy val bitsCharsetDefinitionMap: Map[String, BitsCharsetDefinition] =
SimpleNamedServiceLoader.loadClass[BitsCharsetDefinition](classOf[BitsCharsetDefinition])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All charsets have a name, but some also have aliases. For example, US-ASCII is the official name, but we also support ASCII. This SimpleNamedServiceLoader only knows about the name, which I think will break aliases?

Does our SimpleNamedServiceLoader need to change so that it can somehow support aliases?

/**
* Given name, finds the BitsCharsetDefinition or null if not found
*/
def find(name: String): Option[BitsCharsetDefinition] = bitsCharsetDefinitionMap.get(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do any case insensitive stuff. I thought we supported lower or upper case charset names? But maybe I'm wrong? Should we, or should all encodings match exactly what is defined by the name function returned by the definition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have confirmed that we do currently support case insensitive encodings. I actually don't see in the code where we handle that, but creating schemas with different cases of the same encoding does work as expected. I'm not sure if we need to do anything special with these new changes or if will just work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I found how we handle case insensitivity:

https://github.com/apache/daffodil/blob/main/daffodil-lib/src/main/scala/org/apache/daffodil/cookers/Cookers.scala#L82=

The encoding property is always capitalized by our encoding cooker. So in order for a charset to be findable, the name() function must return an all capitalized string. Ours all currently do that I think, but we should probably document that in the BitsCharsetDefinition class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add that as a comment

@tuxji
Copy link
Contributor

tuxji commented Apr 11, 2022

Hi @alexanderrevello, somehow you've merged two new commits from the main branch into your pull request which are now causing two conflicting files:

daffodil-cli/bin.LICENSE
daffodil-cli/bin.NOTICE

from these commits:

97992f6 Update copyright year to 2022
e9e3703 Refresh bin.NOTICE and improve bin.LICENSE

I did review your other actual (new) commits,

f1f1ac6 requested changes including new naming structure, making name not be passed where it isn't needed and comment fixes
1cc18db removed factory misdirection, BitsCharsetDefinitionRegistry.find returns

Aside from requesting that the charsets in daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.processors.charset.BitsCharsetDefinition be sorted alphabetically, I think your actual changes look good. If @stevedlawrence agrees, I propose that you make that additional change, rebase and squash your pull request so that these conflicting files go away, we let CI run its tests, and we take one last look before we approve and merge your pull request.

Thanks for seeing this PR through!

@alexanderrevello
Copy link
Contributor Author

yeah I will get rid of those extra commits

@alexanderrevello alexanderrevello force-pushed the daffodil-2663-pulggable-charsets branch from 1cc18db to e11c4aa Compare April 11, 2022 14:31
*/
abstract class BitsCharsetDefinition {

def name(): String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I believe what @stevedlawrence meant was that we should add a comment here saying that name() must return an all-upper-case string for the encoding to be findable by Daffodil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's correct, sorry for the confusion. Eventually users are going to start implementing their own custom BitsCharsetDefinition and looking at this class for documentation on what should be returned. Making it clear that name() must return all uppercase is important.

Another option might be modify SimpleNamedServiceLoader to have some kind of option to specifiy this behavior, so that it just always capitalizes whatever name returns. Then users don't have to really care. Might need changes to support aliases anyways (not sure how best to handle that).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DFDL encoding names are case insensitive. Users will be defining a DFDL schema and its associated new encodings together. It should not matter what case they use here or in the DFDL schema. We should standardize this to upper-case or whatever we want internally to make the lookups work efficiently, which I think means we upcase the name here, and the user name supplied to the dfdl:encoding property gets upcased before being used for lookup.

}

final class BitsCharsetIBM037AliasDefinition
extends BitsCharsetDefinition {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about how to reduce much of this boilerplate and also how to handle aliases and the upper case issue. I like the idea of have separate Definition classes for aliases, which I didn't notice when I previously mentioned aliases, but what if we make BitsCharsetDefinition something like this:

abstract class BitsCharsetDefinition(charset: BitsCharset, alias: Option[String] = None) {
  final def charset(): BitsCharset = charset
  final def name(): String = alias.getOrElse(charset.name).toUpperCase()
}

So the name is taken from the actual BitsCharset if there's no alias, but has a way to override that with an optional alias parameter.

Then the definitions become less verbose one-liners like this:

final class BitsCharsetIBM037Definition extends BitsCharsetDefinition(BitsCharsetIBM037)
final class BitsCharsetEBCDIC_CP_USDefinition extends BitsCharsetDefinition(BitsCharsetIBM037, Some("EBCDIC-CP-US"))

Note that this replaces the newInstance function with charset because a BitsCharsetDefintion now can't ever actually return a new instance--it always returns whatever instance was passed to the constructor. But this is probably good. We shouldn't make it possible to create more than one instance of BitsCharset per charset. There should only ever be one stateless thread-safe BitsCharset instance per charset. It is the decoders/encoders created from those that can have state, and Daffodil will create new instances of those as needed.

Also note that we can remove the alias member from BitsCharset, since those are all now handled with unique Definition classes.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more minor change to get rid of Exception throw method.


private def SDE(id: String, args: Any*): Nothing = {
throw new Exception(id.format(args: _*))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest get rid of SDE here. (remove the method) You don't have enough context to know whether something is a schema-definition-error or some other kind of error at this point. These are just charset utils, general purpose code.

You either need to define a EncodingUnavailableException, or you should just be returning null/None or such to indicate the lookup failed. I guess I think the latter is preferable.

There is a method to return a list of the available charsets so returning null is preferable, and the caller can create a sensible diagnostic message.

tci.schemaDefinitionError("Unsupported encoding: %s. Supported encodings: %s", encString, CharsetUtils.supportedEncodingsString)
val bcd = BitsCharsetDefinitionRegistry.find(encString).getOrElse(null)
if (bcd == null) {
tci.schemaDefinitionError("Unsupported encoding: %s. Supported encodings: %s", encString, BitsCharsetDefinitionRegistry.supportedEncodingsString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to be verbose here.

I think there are two errors. One the user didn't get the charset name right, so it isn't found.

The other is the user does have the name right, but the list of charsets doesn't contain the charset because they didn't get the classpath right.

The latter error you need to see that the charset you seek isn't in the list at all. So the verbose list is needed.

So I'm ok with this as is.

BitsCharsetIBM037
}
}
final class BitsCharsetIBM037EBCDIC_CP_USDefinition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thhoughts on naming this just BitsCharsetEBCDIC_CP_USDefintion? The fact that this is aliased to IBM037 doesn't really need to be in the name and makes things a bit less verbose.

cs
val cs = BitsCharsetDefinitionRegistry.find(name).getOrElse(null)
if (cs == null)
SDE("The encoding '%s' was not found. Available choices are: %s", name, BitsCharsetDefinitionRegistry.supportedEncodingsString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this SDE should be removed. This function should either return a charset or null. The callers will determine how to handle null.

# layering, but one should expect to have to adapt their code substantially.
#
org.apache.daffodil.processors.charset.CustomNonByteSizeCharsetCompiler
org.apache.daffodil.processors.charset.CustomJavaCharsetCompiler No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont' think this comment was resolved. This file should be in src/test/resources, right?


override def compute(state: ParseOrUnparseState) = {
val encString = encodingEv.evaluate(state)
val cs = CharsetUtils.getCharset(encString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment wasn't addressed.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Steve and Mike's suggestions. Once you make the changes they suggest, we can merge your pull request (which looks like a really nice improvement).

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 👍 looks great!

Please squash into a single commit, update the commit message if needed (include the bug number on the last line) and this should be good to merged.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@alexanderrevello alexanderrevello force-pushed the daffodil-2663-pulggable-charsets branch from dbb4eec to e6e2d58 Compare May 20, 2022 21:46
Refactored existing charsets to use the BitsCharsetDefinition and created support for it. To implement a charset follow the format and add the classpath in the file in resources/META-INF/services. This process has been shown in daffodil-test/.../charsets

DAFFODIL-2663
@stevedlawrence stevedlawrence merged commit c81b213 into apache:main May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments