Optimizing the creation of paths #61

Merged
merged 5 commits into from

2 participants

@ncreep
  • Merging the steps of traversal and selector application.
  • Code requires serious clean up.
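
The first bullet can be illustrated with a minimal sketch. It uses a simplified tree (`Node`, `Elem`, `Text`) and a `PartialFunction` as a stand-in for a selector; these types and the names `traverseThenSelect` and `selectWhileTraversing` are hypothetical and are not the anti-xml API. Both functions walk the tree breadth first and return the same matches, but the second applies the selector at each level instead of first building a list of every node.

```scala
object MergedTraversalSketch {
  // Simplified stand-in for an XML tree; not anti-xml's Node/Elem types.
  sealed trait Node
  final case class Elem(name: String, children: List[Node]) extends Node
  final case class Text(value: String) extends Node

  // Hypothetical selector: a partial function over nodes.
  type Selector[A] = PartialFunction[Node, A]

  // Separate steps: gather every node breadth first, then apply the selector.
  def traverseThenSelect[A](roots: List[Node], s: Selector[A]): List[A] = {
    @annotation.tailrec
    def allNodes(level: List[Node], acc: List[Node]): List[Node] =
      if (level.isEmpty) acc
      else {
        val next = level.collect { case Elem(_, cs) => cs }.flatten
        allNodes(next, acc ++ level)
      }
    allNodes(roots, Nil).collect(s)
  }

  // Merged steps: apply the selector to each level as it is visited,
  // so no intermediate list of all nodes is kept.
  def selectWhileTraversing[A](roots: List[Node], s: Selector[A]): List[A] = {
    @annotation.tailrec
    def loop(level: List[Node], acc: List[A]): List[A] =
      if (level.isEmpty) acc
      else {
        val next = level.collect { case Elem(_, cs) => cs }.flatten
        loop(next, acc ++ level.collect(s))
      }
    loop(roots, Nil)
  }
}
```

Merging the steps also gives the traversal a place to skip whole subtrees when the selector cannot match there, which the separate version has no opportunity to do.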

And the performance:

[info] -- System Information --
[info] Heap: 1820MB
[info] Java: Sun Microsystems Inc. 1.6.0_25
[info] OS: Windows 7 6.1 amd64
[info]
[info] -- Memory Usage (7 MB) --
[info] anti-xml: 56605672
[info] scala.xml: 26514864
[info] javax.xml: 49023256
[info]
[info] -- Execution Time --
[info] Loading a 7 MB XML file
[info] + anti-xml: min: 451 ms, max: 515 ms, average: 469 ms
[info] + anti-xml StAX: min: 423 ms, max: 521 ms, average: 439 ms
[info] + scala.xml: min: 153 ms, max: 158 ms, average: 154 ms
[info] + javax.xml: min: 113 ms, max: 118 ms, average: 114 ms
[info] Shallow selection in a 7 MB tree
[info] + anti-xml: min: 9 ms, max: 41 ms, average: 12 ms
[info] + scala.xml: min: 33 ms, max: 49 ms, average: 37 ms
[info] Deep selection in a 7 MB tree
[info] + anti-xml: min: 7 ms, max: 13 ms, average: 8 ms
[info] + scala.xml: min: 447 ms, max: 488 ms, average: 459 ms
[info] + javax.xml: min: 13 ms, max: 18 ms, average: 14 ms

ncreep added some commits
@ncreep ncreep Doc fix. 23579c3
@ncreep ncreep Merge branch 'zipper-replacement' of git://github.com/djspiewak/anti-…
…xml into zipper-replacement
e345654
@ncreep ncreep Path creation relies on efficient hashing of Elems.
Added a comment about hashing in PathCreator and DeepZipper.
e735f2b
@ncreep ncreep Reimplemented stringToSelector in terms of an OptimizingSelector.
- Removed the ElemSelector class.
- This shows off the usage of OptimizingSelector and removes a case
match in PathCreator.
433c63f
@ncreep ncreep Optimizing the creation of paths.
- Merging the steps of traversal and selector application.
- Code requires serious clean up.
ec7bc79
@djspiewak djspiewak merged commit ec7bc79 into from
@djspiewak
Owner

Woah…

[info] Heap: 2039MB
[info] Java: Apple Inc. 1.6.0_26
[info] OS: Mac OS X 10.7.1 x86_64
[info] 
[info] -- Memory Usage (7 MB) --
[info] anti-xml:  56605672
[info] scala.xml: 26514864
[info] javax.xml: 49023256
[info] 
[info] -- Execution Time --
[info] Loading a 7 MB XML file
[info]  + anti-xml: min: 498 ms, max: 584 ms, average: 509 ms
[info]  + anti-xml StAX: min: 493 ms, max: 613 ms, average: 525 ms
[info]  + scala.xml: min: 230 ms, max: 279 ms, average: 241 ms
[info]  + javax.xml: min: 124 ms, max: 134 ms, average: 125 ms
[info] Shallow selection in a 7 MB tree
[info]  + anti-xml: min: 8 ms, max: 13 ms, average: 9 ms
[info]  + scala.xml: min: 22 ms, max: 26 ms, average: 23 ms
[info] Deep selection in a 7 MB tree
[info]  + anti-xml: min: 6 ms, max: 8 ms, average: 6 ms
[info]  + scala.xml: min: 357 ms, max: 381 ms, average: 369 ms
[info]  + javax.xml: min: 16 ms, max: 19 ms, average: 16 ms

Is this with or without using the bloom filter optimization all the way down? Either way, these are ridiculously impressive numbers! Merged.

@ncreep

With bloom filters all the way down.
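
As a rough illustration of what Bloom-filter pruning at every level could look like, here is a minimal sketch. It assumes each element caches a 64-bit filter over the names of its direct child elements; the types and the names `filterOf`, `mayContain`, and `selectChildren` are hypothetical, and the real anti-xml filter may be built quite differently.

```scala
object BloomPruningSketch {
  // Simplified tree; every Elem caches a small Bloom filter of the
  // element names found among its direct children.
  sealed trait Node
  final case class Text(value: String) extends Node
  final class Elem(val name: String, val children: List[Node]) extends Node {
    val childNames: Long = BloomPruningSketch.filterOf(children)
  }

  // Two bit positions per name, both derived from the name's hash code.
  private def bits(name: String): Long = {
    val h = name.hashCode
    (1L << (h & 63)) | (1L << ((h >>> 6) & 63))
  }

  // Builds the filter for a group of nodes; non-element nodes are ignored.
  def filterOf(nodes: List[Node]): Long =
    nodes.foldLeft(0L) {
      case (acc, e: Elem) => acc | bits(e.name)
      case (acc, _)       => acc
    }

  // false means "definitely absent"; true only means "possibly present".
  def mayContain(filter: Long, name: String): Boolean = {
    val b = bits(name)
    (filter & b) == b
  }

  // Selects children by name, skipping the scan when the filter rules it out.
  def selectChildren(parent: Elem, name: String): List[Elem] =
    if (!mayContain(parent.childNames, name)) Nil
    else parent.children.collect { case e: Elem if e.name == name => e }
}
```

A filter like this can report false positives but never false negatives, so a negative answer is always safe to act on; a positive answer still requires the actual scan.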

It's strange that the benchmarks are inconsistent, both across XML file sizes and across machines.
Your machine is faster on selection, but mine is faster on loading.

Is there a reason for this kind of behavior?

@djspiewak
Owner

I'm guessing it's because the storage layer on my machine is an SSD, which is substantially faster than a hard drive on random access, but substantially slower than a hard drive for sequential reads (which is a large factor in the load times).

Commits on Sep 24, 2011
  1. @ncreep: Doc fix.
Commits on Sep 26, 2011
  1. @ncreep: Merge branch 'zipper-replacement' of git://github.com/djspiewak/anti-…
     …xml into zipper-replacement
  2. @ncreep: Path creation relies on efficient hashing of Elems.
     Added a comment about hashing in PathCreator and DeepZipper.
  3. @ncreep: Reimplemented stringToSelector in terms of an OptimizingSelector.
     - Removed the ElemSelector class.
     - This shows off the usage of OptimizingSelector and removes a case
       match in PathCreator.
  4. @ncreep: Optimizing the creation of paths.
     - Merging the steps of traversal and selector application.
     - Code requires serious clean up.
1  src/main/scala/com/codecommit/antixml/DeepZipper.scala
@@ -300,6 +300,7 @@ trait DeepZipper[+A <: Node] extends Group[A] with IndexedSeqLike[A, DeepZipper[
* of contexts at depth -1 from the original.
*/
private def mergeDepth(singleDepthContexts: Seq[FullContext], singleDepthTransforms: Map[FullLoc, NodeTransform]) = {
+ // the following can only be used if [[Elem]] has efficient hashing
val contexts = singleDepthContexts.groupBy(_.parentsList) withDefaultValue Seq()
val transforms = singleDepthTransforms.groupBy(_._1.parentsList) withDefaultValue Map()
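
The added comment matters because `groupBy` hashes every key it sees. A minimal sketch of the idea follows, assuming a key type that computes its hash once and reuses it. `ParentsKey`, `Context`, and `byParents` are hypothetical stand-ins for the types in `DeepZipper`, not the actual implementation.

```scala
object GroupByHashSketch {
  // Hypothetical key: a list of parent indices standing in for `parentsList`.
  // The hash is computed once at construction, so repeated lookups stay cheap.
  final class ParentsKey(val locs: List[Int]) {
    private val cachedHash: Int = locs.hashCode()
    override def hashCode(): Int = cachedHash
    override def equals(other: Any): Boolean = other match {
      case that: ParentsKey => cachedHash == that.cachedHash && locs == that.locs
      case _                => false
    }
  }

  final case class Context(parents: ParentsKey, value: String)

  // Groups contexts by their parents; a missing key yields an empty Seq
  // instead of throwing, mirroring `withDefaultValue Seq()` in the diff.
  def byParents(contexts: Seq[Context]): Map[ParentsKey, Seq[Context]] =
    contexts.groupBy(_.parents).withDefaultValue(Seq.empty)
}
```

If hashing a key instead required walking a large structure on every call, each grouping and lookup would pay that cost again, which is presumably the concern behind the comment.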
124 src/main/scala/com/codecommit/antixml/PathCreator.scala
@@ -12,95 +12,102 @@ private[antixml] object PathCreator {
/** A function that creates paths on group, to be used when constructing zippers. */
type PathFunction[+A] = Group[Node] => PathVals[A]
- /*
- * First applying the paths using the overloads without the selector,
- * then applying the selector.
- * This way the traversal is not modified during selection.
- */
-
/** A path function that selects on nodes in the given group. */
def fromNodes[A](selector: Selector[A])(nodes: Group[Node]): PathVals[A] = {
- applySelector(selector)(fromNodesWithParent(Nil, nodes))
+ collectGroup(nodes, selector, Nil)
}
-
+
/** A path function that selects on the given nodes and recursively on the children (breadth first). */
def all[A](selector: Selector[A])(nodes: Group[Node]): PathVals[A] = {
- fromNodes(selector)(nodes) ++ allChildren(selector)(nodes)
+ collectGroupRecursive(List((nodes, Nil)), selector)
}
/** A path function that selects on the children of the given group. */
def directChildren[A](selector: Selector[A])(nodes: Group[Node]): PathVals[A] = {
- if (dispatchSelector(selector, nodes)) applySelector(selector)(directChildren(nodes))
- else Nil // nothing to collect
+ collectChildrenOfGroup(nodes, selector)
}
/** A path function that selects recursively on all the children of the given group (breadth first). */
def allChildren[A](selector: Selector[A])(nodes: Group[Node]): PathVals[A] = {
- applySelector(selector)(allChildren(nodes))
- }
-
- /** Lifting the selector so that it can operate on path entries. */
- private def liftSelector[A](s: Selector[A]): PartialFunction[(WithLoc[Node], ParentsList), (WithLoc[A], ParentsList)] = {
- case (WithLoc(n, i), p) if s.isDefinedAt(n) => (WithLoc(s(n), i), p)
+ collectGroupRecursive(collectGroupChildren(nodes, Nil, selector), selector)
}
- /** Applies the selector to the given path. */
- private def applySelector[A](s: Selector[A])(path: PathVals[Node]): PathVals[A] = {
- path.collect(liftSelector(s))
+ /** Collects items from the given group that match the selector. */
+ private def collectGroup[A](nodes: Group[Node], s: Selector[A], p: ParentsList): PathVals[A] = {
+ dispatchSelector(s, nodes) {
+ val ni = nodes.zipWithIndex
+ for ((n, i) <- ni if s isDefinedAt n) yield (WithLoc(s(n), i), p)
+ }
}
- /** Converting a group of nodes to the corresponding node locations. */
- private def nodesToLocs[A <: Node](nodes: Group[Node]) = {
- nodes.zipWithIndex.map(Function.tupled(WithLoc[Node]))
+ /** Collects items from the list of groups that match the selector. */
+ private def collectGroups[A](groups: Seq[(Group[Node], ParentsList)], s: Selector[A]): PathVals[A] = {
+ groups flatMap {gp =>
+ val (g, p) = gp
+ collectGroup(g, s, p)
+ }
}
- /** Creating a path from this group of nodes. */
- private def fromNodesWithParent(p: ParentsList, n: Group[Node]) = {
- nodesToLocs(n) map ((_, p))
+ /** Applies the group selector collection function on the children of the given group. */
+ private def collectChildrenOfGroupWith[A]
+ (nodes: Group[Node], s: Selector[A], p: ParentsList)
+ (toVals: (Group[Node], Selector[A], ParentsList) => PathVals[A]): PathVals[A] = {
+ dispatchSelector(s, nodes) {
+ val ni = nodes.zipWithIndex
+ ni flatMap {
+ case (e: Elem, i) => toVals(e.children, s, ParentLoc(e, i) :: p)
+ case _ => Nil
+ }
+ }
}
- private def directChildren(nodes: Group[Node]): PathVals[Node] = collectChild(nodes, Nil)
-
- private def allChildren(nodes: Group[Node]): PathVals[Node] = {
- allChildren(directChildren(nodes))
+ /** Collects items from the children of the given group that match the selector. */
+ private def collectChildrenOfGroup[A](nodes: Group[Node], s: Selector[A]): PathVals[A] = {
+ collectChildrenOfGroupWith(nodes, s, Nil) (collectGroup _)
}
- /** Recursively taking all the children of a given path. */
- private def allChildren(p: PathVals[Node]): PathVals[Node] = {
- if (p.isEmpty) Nil
+ /** Recursively collects items from the given group that match the selector. */
+ private def collectGroupRecursive[A](groups: Seq[(Group[Node], ParentsList)], s: Selector[A]): PathVals[A] = {
+ if (groups.isEmpty) Nil
else {
- val children =
- p.flatMap{ nlp =>
- val (WithLoc(n, l), p) = nlp
- collectChild(n, l, p)
- }
- p ++ allChildren(children)
+ val allChildren =
+ groups flatMap { gp =>
+ val (g, p) = gp
+ collectGroupChildren(g, p, s)
+ }
+ collectGroups(groups, s) ++ collectGroupRecursive(allChildren, s)
}
}
- /** Collecting the children of a single node. */
- private def collectChild(n: Node, l: Location, p: ParentsList): PathVals[Node] = {
- n match {
- case e: Elem => fromNodesWithParent(ParentLoc(e, l) :: p, e.children)
- case _ => Nil
+ /** Gathering all the children of the group that may match the selector. */
+ private def collectGroupChildren(g: Group[Node], p: ParentsList, s: Selector[_]): Seq[(Group[Node], ParentsList)] = {
+ dispatchSelector[Seq[(Group[Node], ParentsList)]](s, g)(Nil) {
+ val gi = g.zipWithIndex
+ gi flatMap {
+ case (e: Elem, i) => Some((e.children, ParentLoc(e, i) :: p))
+ case _ => None
+ }
}
}
- /** Collecting the children of the given nodes. */
- private def collectChild(n: Group[Node], p: ParentsList): PathVals[Node] = {
- n.zipWithIndex flatMap { nl =>
- val (n, l) = nl
- collectChild(n, l, p)
- }
+ /** If dispatching on the selector yields true, executing the given code block, otherwise returning
+ * the default value.*/
+ private def dispatchSelector[A](s: Selector[_], g: Group[Node])(default: A)(vals: => A): A = {
+ if (dispatchSelector(s, g)) vals
+ else default
+ }
+
+ /** If dispatching on the selector yields true, executing the given code block, otherwise returning
+ * an empty list.
+ */
+ private def dispatchSelector[A](s: Selector[A], g: Group[Node])(vals: => PathVals[A]): PathVals[A] = {
+ dispatchSelector[PathVals[A]](s, g)(Nil)(vals)
}
-
-
/** Returns true if there is a chance that applying the given selector on the group
* would yield some results. */
private def dispatchSelector(s: Selector[_], g: Group[Node]) = {
s match {
- case e: ElemSelector => g matches e.elementName
case opt: OptimizingSelector[_] => opt.canMatchIn(g)
case _ => true // no info about the selector, should proceed
}
@@ -116,14 +123,7 @@ private[antixml] object PathCreator {
/** The location contexts and the corresponding contents. */
val (locs, contents) = contexts.unzip
- require(removeParents(locs).toSet.size == locs.size, "Cannot have duplicate locations in path") // enforcing no duplicates policy
-
- /** A wrapper for location which omits [[Elem]] data from the parents list.
- * Hashing on this is faster than of on [[LocationContext]]. */
- private case class LocContextNoParents(loc: Location, parents: Seq[Location])
- private def removeParents(locs: Seq[LocationContext]) = {
- locs map (l => LocContextNoParents(l.loc, l.parentsList.map(_.loc)))
- }
-
+ // this can only be used if [[Elem]] has efficient hashing
+ require((locs).toSet.size == locs.size, "Cannot have duplicate locations in path") // enforcing no duplicates policy
}
}
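
The curried `dispatchSelector` overloads in this diff rely on a by-name parameter so that the collection code only runs when the selector might match. A minimal sketch of that pattern, with the selector check reduced to a plain `Boolean`, is below. The names `guarded` and `guardedList` are hypothetical.

```scala
object GuardSketch {
  // Runs `body` only when `canMatch` holds; otherwise returns `default`.
  // `body` is a by-name parameter, so it is not evaluated when the guard fails.
  def guarded[A](canMatch: Boolean)(default: A)(body: => A): A =
    if (canMatch) body else default

  // Convenience variant for list results, defaulting to the empty list.
  def guardedList[A](canMatch: Boolean)(body: => List[A]): List[A] =
    guarded[List[A]](canMatch)(Nil)(body)
}
```

Because the last parameter list takes a block, a call site reads like a control structure, for example `guardedList(mightMatch) { expensiveScan() }`, where `mightMatch` and `expensiveScan` are placeholders.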
23 src/main/scala/com/codecommit/antixml/Selector.scala
@@ -42,17 +42,6 @@ trait OptimizingSelector[+A] extends Selector[A] {
def canMatchIn(group: Group[Node]): Boolean
}
-/** A selector that selects an element by name. */
-private[antixml] class ElemSelector(val elementName: String) extends Selector[Elem] {
- // not using a case class to allow inheritance
- private val pf: PartialFunction[Node, Elem] = {
- case e @ Elem(_, `elementName`, _, _, _) => e
- }
-
- def apply(node: Node) = pf(node)
- def isDefinedAt(node: Node) = pf isDefinedAt node
-}
-
object Selector {
/**
@@ -60,7 +49,17 @@ object Selector {
* which can then be passed to the appropriate methods on [[com.codecommit.antixml.Group]].
* For example: `ns \ "name"`
*/
- implicit def stringToSelector(name: String): Selector[Elem] = new ElemSelector(name)
+ implicit def stringToSelector(name: String): Selector[Elem] =
+ new OptimizingSelector[Elem] {
+ private val pf: PartialFunction[Node, Elem] = {
+ case e @ Elem(_, `name`, _, _, _) => e
+ }
+
+ def apply(node: Node) = pf(node)
+ def isDefinedAt(node: Node) = pf isDefinedAt node
+ def canMatchIn(group: Group[Node]) = group.matches(name)
+ }
+
/**
* Implicitly lifts a [[scala.Symbol]] into an instance of [[com.codecommit.antixml.Selector]]
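
To show how the reimplemented `stringToSelector` and the remaining cases in `dispatchSelector` fit together, here is a self-contained sketch with simplified types. The `Node`, `Elem`, and `Text` classes, the `List[Node]` group, and the names `byName` and `worthSearching` are assumptions made for illustration; in particular, `canMatchIn` here does a plain scan where the real code calls `group.matches(name)`.

```scala
object OptimizingSelectorSketch {
  // Simplified stand-ins for anti-xml's node types.
  sealed trait Node
  final case class Elem(name: String, children: List[Node]) extends Node
  final case class Text(value: String) extends Node

  trait Selector[+A] extends PartialFunction[Node, A]

  // A selector that can cheaply say whether a group could contain a match.
  trait OptimizingSelector[+A] extends Selector[A] {
    def canMatchIn(group: List[Node]): Boolean
  }

  // Selects elements by name, in the shape of the new stringToSelector.
  def byName(name: String): Selector[Elem] = new OptimizingSelector[Elem] {
    private val pf: PartialFunction[Node, Elem] = { case e @ Elem(`name`, _) => e }
    def apply(node: Node): Elem = pf(node)
    def isDefinedAt(node: Node): Boolean = pf.isDefinedAt(node)
    // Stand-in for the real check: scan the group for the name.
    def canMatchIn(group: List[Node]): Boolean =
      group.exists { case Elem(n, _) => n == name; case _ => false }
  }

  // Mirrors dispatchSelector: ask the selector when it can answer,
  // and proceed when nothing is known about it.
  def worthSearching(s: Selector[_], group: List[Node]): Boolean = s match {
    case opt: OptimizingSelector[_] => opt.canMatchIn(group)
    case _                          => true
  }
}
```

With this shape, any selector can opt in to pruning by extending `OptimizingSelector`, so the dispatch code no longer needs a dedicated case for a name-based selector class.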