Skip to content

Review and Update Usage of .toSeq#1511

Merged
olabusayoT merged 1 commit intoapache:mainfrom
olabusayoT:daf-2972-review-use-toSeq
Aug 14, 2025
Merged

Review and Update Usage of .toSeq#1511
olabusayoT merged 1 commit intoapache:mainfrom
olabusayoT:daf-2972-review-use-toSeq

Conversation

@olabusayoT
Copy link
Contributor

  • we make use of .toSeq liberally within the code which can lead to unnecessary copies, so we review the code and update where possible to avoid .toSeq and replace with a more performant option
  • ArraySeq.unsafeWrapArray wraps an array in a ArraySeq is a performant way to get a seq from an array
  • use Arrays, Iterables or original collection in certain places instead of converting to Seq

DAFFODIL-2972

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, though I think many of these changes probably make readability worse without much gain. I would recomend with only remove .toSeq in performance critical sections. I think most of these are not critical so readability is more important and using .toSeq is probably fine.

val jarFiles = Paths.get("daffodil-cli/target/universal/stage/lib").toFile.listFiles().toSeq
val jarFiles = ArraySeq.unsafeWrapArray(
Paths.get("daffodil-cli/target/universal/stage/lib").toFile.listFiles()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's fine to use .toSeq in test code where performance isn't critical. Readability is probably more important in cases like this, especially if it avoids an unsafe wrap.

@@ -124,7 +125,7 @@ class DaffodilCCodeGenerator(root: Root) extends api.CodeGenerator {
val cgState = new CodeGeneratorState(root)
DaffodilCCodeGenerator.generateCode(root.document, cgState)
diagnostics = new java.util.LinkedList[api.Diagnostic](diagnostics)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned in another PR, but I'm not sure LinkedList the best choice for diagnostics due to it's mutability. I think maybe the best approach might be to use Seq internally for diagnostics (which gives us constant prepend and immutability), and only when API users call getDiagnostics do we call .asJava to wrap it with a more Java-friendly java.util.List.

Then this code would avoid asJava completely and just do a normal Seq prepend.

val command = commands.find(inPath)
if (command.isDefined)
command.get.split(' ').toSeq
ArraySeq.unsafeWrapArray(command.get.split(' '))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toSeq is probably preferred here. There's is not performance critical.

optAttr.map { _.trim.split("\\s+").toSeq }.getOrElse { Seq.empty }
optAttr.map { w => ArraySeq.unsafeWrapArray(w.trim.split("\\s+")) }.getOrElse {
Seq.empty
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous is probably preferred, not performance critical.

)
None
}
val instancesFound: Map[String, mutable.Buffer[T]] = instanceBuf.groupBy { _.name() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think .toSeq is fine here. This isn't performance critical. It's also kindof nice since it switches to immutable data structure once we have found all the instances.

Seq()
Array.empty[Class[_]]
} else {
functionClasses.toSeq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.toSeq is probably fine here. Not performance critical and more clear.

defaultValidationDefault: String = defaultValidationDefaultDefault,
defaultImplementationsDefault: Seq[String] =
ArraySeq.unsafeWrapArray(defaultImplementationsDefaultDefault)
defaultImplementationsDefault: Array[String] = defaultImplementationsDefaultDefault
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't performance critical, I'd suggest we change this back to a Seq[String], and change defaultImmplementationsDefaultDefault to do something like:

def defaultImplementationsDefaultDefault = TDMLImplementation.values.map(_.toString).toSeq

The .toSeq is not a big deal in this case and it avoids the unsafewraparray.

if (str == "") defaultImplementationsDefault
else {
// parse the str to get a list of strings
str.split("""\s+""").toSeq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.toSeq is fine here for readability.

val dpParseDiag = new java.util.ArrayList[String]
actual.getDiagnostics.forEach(d => dpParseDiag.add(d.toString))
val saxParseDiag = new java.util.ArrayList[String]
errorHandler.getDiagnostics.forEach(d => saxParseDiag.add(d.toString))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the previous code is fine and more readable here. This code is only used for regression testing when DAFFODIL_TDML_API_INFOSETS is "all". When that is enabled, we're going to parse with all infosets which will have a lots of extra overhead, so this little bit of .toSeq doesn't matter. Readability is more important here.

seqDiagExpected: Seq[String],
seqDiagActual: Seq[String]
expected: java.util.List[String],
actual: java.util.List[String]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we stick with Seq. More readable, and this is already going to be slow so not a big deal.

@olabusayoT olabusayoT requested a review from mbeckerle August 6, 2025 16:17
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I have nothing to add over Steve's comments. It does seem unfortunate that Scala code ends up having to use obscure bases like Iterable rather than just being concrete with Seq.

@olabusayoT olabusayoT force-pushed the daf-2972-review-use-toSeq branch from 7c46afa to c3f8fe5 Compare August 14, 2025 18:28
- we make use of .toSeq liberally within the code which can lead to unnecessary copies, so we review the code and update where possible to avoid .toSeq and replace with a more performant option
- in none performance-critical areas (eg tests), prioritize readability
- update old code to use .count instead of the more expensive .filter.sizey
- use Iterables or original collection in certain places instead of converting to Seq

DAFFODIL-2972
@olabusayoT olabusayoT force-pushed the daf-2972-review-use-toSeq branch from c3f8fe5 to 00ccac2 Compare August 14, 2025 19:01
@olabusayoT olabusayoT merged commit 1c468b3 into apache:main Aug 14, 2025
12 checks passed
@olabusayoT olabusayoT deleted the daf-2972-review-use-toSeq branch August 14, 2025 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants