Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds subrange indexer for XMLElement #174

Merged
merged 5 commits into from
Feb 4, 2018
Merged

Adds subrange indexer for XMLElement #174

merged 5 commits into from
Feb 4, 2018

Conversation

ChristianSteffens
Copy link
Contributor

First draft for a subrange XMLIndexer based on an XMLElement - see #170

@drmohundro
Copy link
Owner

Thanks! I like your means of having a builder method to create the XMLIndexer - seems like a clean way to handle this. I'll take a look more closely later.

In the meantime, could you look into getting a unit test added?

@drmohundro
Copy link
Owner

I'm just hypothesizing at the moment... trying to consider the best and most flexible design for this. Given that, take all of this with a grain of salt..

I'm hesitant about the filtering being strictly index based, because I could see a filter working based on child elements as well. That leads me to think about possibly passing in a callback function... something like this based on your test:

let element: XMLElement = xml!["root"]["catalog"]["book"][1].element!
let subIndexer: XMLIndexer = try! element.filteredIndexer { elem in
    let subrangeChildren: [XMLContent] = Array(elem.xmlChildren[1...3])
    let elem = XMLElement(name: name, index: index, options: options, children: subrangeChildren, allAttributes: allAttributes)
    return XMLIndexer.element(elem)
}

In this case, filteredIndexer would have a signature like:

public func filteredIndexer(_ filter: (XMLElement) -> XMLIndexer) -> XMLIndexer { }

That means that the filtering is customizable from the caller versus either 1) forcing it to be index-based or 2) having multiple overloaded methods to handle filtering.

Does that seem at all reasonable to you? Or am I just overcomplicating things? 😅

@ChristianSteffens
Copy link
Contributor Author

I'm not sure I follow everything you proposed. Generally I'm open to a more 'generic' approach, for now there're only two options for filtering the new indexer:

  1. The index(es) itself
  2. filter by xmlChildren or not

You're proposing now, to avoid creating specific functions / arguments for these and (!) upcoming options, but instead using a filter specified by a callback.

Okay, seem reasonable, even though it's not necessary for our usecase I see the advantage in the long run. In the end it would then makes sense to somehow add this method to the XMLIndexer itself and then rename it to filter.

Then we should mimic the behaviour from Swift's own filter's on CollectionTypes, this would keep it consistent (like with the userInfo #171).

I'm willing to take a look at that - if I understood everything correctly ;-)

@drmohundro
Copy link
Owner

Yes, you're correct - following something like the filter on CollectionTypes approach sounds like what I was thinking.

I'm happy to help throw some code or gists up, too, if that would help!

@ChristianSteffens
Copy link
Contributor Author

Okay, I created now two filter methods. I tried including the filter by indexes in one and the same filter method. Even though this is possible it's not really useful.

Imagine our specific usecase: We like to create a (Sub)XMLIndexer based on all XMLElements (filter 1) and a specific range of indexes (filter 2). Doing this in one call is technical possible, but then the user would have to know how the indexes are going to change due to the applying of filter 1. I tried messing / updating the indexes internally, but that makes things quite complicated.

@drmohundro
Copy link
Owner

Sorry for the long delay on responses - I've been traveling the past few days. Let me see if I can play around with the tests in your PR to see if I can come up with any other possible filter implementations that might be more flexible. If not, no big deal. And again, thanks for your suggestions and patience! :)

drmohundro added a commit that referenced this pull request Feb 3, 2018
@drmohundro
Copy link
Owner

@LordNali - I just created a branch to try to experiment with this a little bit... see c3406cf and https://github.com/drmohundro/SWXMLHash/compare/subindexer-suggestions. The main things I did were:

  • moved the filtering to XMLIndexer
  • passed along the index from the filter method

I left your tests mostly as-is aside from moving the call to the indexer. You can also see an additional test I added as I was thinking that the filter calls could possibly be chained.

Does it 1) make sense and 2) cover your specific use case, too?

@ChristianSteffens
Copy link
Contributor Author

Does it 1) make sense and 2) cover your specific use case, too?

Yes and yes - your proposal is indeed much more cleaner and powerful, so I would certainly go for it.

@codecov-io
Copy link

Codecov Report

Merging #174 into master will increase coverage by 0.09%.
The diff coverage is 28.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #174      +/-   ##
==========================================
+ Coverage   27.13%   27.23%   +0.09%     
==========================================
  Files          13       13              
  Lines        2034     2122      +88     
==========================================
+ Hits          552      578      +26     
- Misses       1482     1544      +62
Impacted Files Coverage Δ
Tests/SWXMLHashTests/XMLParsingTests.swift 0% <0%> (ø) ⬆️
Tests/SWXMLHashTests/LazyXMLParsingTests.swift 0% <0%> (ø) ⬆️
Source/SWXMLHash.swift 76.92% <76.47%> (-0.04%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 991ff20...9d4f07d. Read the comment docs.

@drmohundro drmohundro merged commit 0b18f5b into drmohundro:master Feb 4, 2018
@drmohundro
Copy link
Owner

I just pushed 4.4.0 to track this. Also, I pushed my commits over to your branch - keeps all of the changes in this pull request. Thanks again for the great suggestion and the PR!

@ChristianSteffens
Copy link
Contributor Author

Hmm, I just tried using the new version / filter. The filter seem to work just fine (I printed the new indexers all property and I see all the expected elements). But calling the value() on this new indexer to extract our custom object fails.

It seems that our custom XMLIndexerDeserializable method static func deserialize(_ indexer: XMLIndexer) throws -> OurClass doesn't get called at all - could it be possible that the new indexer can't use the value() method like the original indexer?

@drmohundro
Copy link
Owner

Hmm... that is strange. I'll have to check that and see if I can figure out what is going on. I'll see if I can get some test cases in place to reproduce the behavior.

drmohundro added a commit that referenced this pull request Feb 6, 2018
See discussion in #174. The issue was related to whether or not filter
should return a list or an element... that should depend on if a single
element was filtered to or not.
@drmohundro
Copy link
Owner

drmohundro commented Feb 6, 2018

@LordNali I was able to reproduce it. I've got a fix in place with a test... see #177. I've pushed out 4.4.1 to track this. Let me know if you still see issues after this version.

(as an aside, the issue was related to how the filter method works. Because XMLIndexer is an enum and can model lists or single elements, it has to contextually determine whether or not to return a list or an element... and the code was always returning a list when it in this one case)

@ChristianSteffens
Copy link
Contributor Author

I just tried to new version, but still the same issue. We're filtering only by index not by specific elements) and then using value() on it. But the actual value implementation is never called (I used a breakpoint). I saw your test with BasicItem - that test works without any problems, weird.

@drmohundro
Copy link
Owner

Do you think you could get a repro set up that I could look at? Or a failing test? If we can only reproduce it given your XML and the data is sensitive, then if you'd be able to shoot me a zip of a repro instead? I think we're close here, but I'm just not getting the exact set up locally I guess.

@ChristianSteffens
Copy link
Contributor Author

ChristianSteffens commented Feb 7, 2018

Okay, I narrowed it a bit down - this is the method that fails:

func value<T: XMLIndexerDeserializable>() throws -> T {
    switch self {
    case .element:
        return try T.deserialize(self)
    case .stream(let opStream):
        return try opStream.findElements().value()
    default:
        throw XMLDeserializationError.nodeIsInvalid(node: self)
    }
}

And the reason is the provided XMLIndexer which is in fact an List and not an element (or stream). So we've got an error nodeIsInvalid. Did I miss something, because using the element of the new (sub) indexer wan't help me here, since I'd like to build a value based on the subIndexerer's children / all property.

Maybe I understood the behaviour of the deserializer wrong, but then again, with the previous subIndexer implementation our code did work without any problems.

drmohundro added a commit that referenced this pull request Feb 8, 2018
In #174, there appears to still be some cases where filter doesn't seem
to work as expected. Hopefully these tests will help iron out some other
possibilities.
@drmohundro
Copy link
Owner

Well, it seems odd to me that there would ever be a case where the XMLAttributeDeserializable implementation would hit, as you can't really have attributes within the context of a list of elements. If there is a way to hit this code with a list (and it seems your code is doing that), then I'm missing something!

I just added 5460d01 to cover a few more cases... it has some print statements that show that the same method you referenced is getting hit, but those tests seem to pass for me.

Do you see anything, particularly in the deserialization of BasicItem (particularly here), that doesn't look like your usage?

I feel like we're getting closer.

drmohundro added a commit that referenced this pull request Feb 8, 2018
In #174, there appears to still be some cases where filter doesn't seem
to work as expected. Hopefully these tests will help iron out some other
possibilities.
@ChristianSteffens
Copy link
Contributor Author

Oh that was my fault - I updated the method that fails. It's indeed XMLIndexerDeserializable.

But still it fails due to the fact, that it's called by a List instead of XMLElements (or Stream). Is this the expected behaviour, List instead of Element?

I integrated SWXMLHash now from source instead as framework, you make debugging much easier (for now).

@drmohundro
Copy link
Owner

It depends on which path it falls into based on what the target deserialization type is. So, for example, if you're deserializing into an array of objects (i.e. [T]), then it should fall into a deserialization method that does handle the .list case. See https://github.com/drmohundro/SWXMLHash/blob/master/Source/SWXMLHash%2BTypeConversion.swift#L362-L373 as an example.

When your call to .value() fails, what is the left side of the call? Is it something like this?

let foo: [Stuff] = xml["stuff"].filter { }.value()

Meaning, it should deserialize to a list of Stuff instead of a single instance?

@ChristianSteffens
Copy link
Contributor Author

We're trying to read one (1) single object - so the call should be redirected to the value method value<T: XMLIndexerDeserializable>() throws -> T. Which it does - since we've got an error in exactly that method. So the left side of our call is OurClass, and not [OurClass].

The problem is the fact, that the new indexer from the new filter method doesn't return a single XMLElement, but a List. Therefore this call to the mentioned value method obviously fails. So the issue isn't the value method - but the return value of the filter method, which is a list in this case.

I understand or at least find in somehow logic, that the filter returns a list, since we're specifying the valid indexes and expect a list of elements to be returned. But this certainly makes it impossible to use our existing deserialization method. The thing is: With "my" previous filter implementation the valuemethod did work without any problems, so in this case there was no list returned, but a new indexer based on the existing one - so the overall structure is the same only some children have been filtered out. This is different to the current filter implementation I guess?

Is there now way to imitate that behaviour in the new filter method?

@drmohundro
Copy link
Owner

Just a quick note to explain why I'm a little hesitant to add additional functionality to XMLElement - I've spent a lot of time on support related questions because of confusion around multiple extension points (whether around XMLIndexer, XMLElement, etc.). As a result, I'm considering in a future version moving the public methods all to XMLIndexer.

So, the new filter method returns a list if more than one element is found - it checks the count of the elements currently indexed and, if it is 1, it returns XMLIndexer.element; otherwise it returns XMLIndexer.list (or at least that's the intent).

This code below works for me:

<root>
  <arrayOfGoodBasicItems>
    <basicItem id="1234a">
      <name>item 1</name>
      <price>1</price>
    </basicItem>
    <basicItem id="1234b">
      <name>item 2</name>
      <price>2</price>
    </basicItem>
    <basicItem id="1234c">
      <name>item 3</name>
      <price>3</price>
    </basicItem>
  </arrayOfGoodBasicItems>
</root>
let subParser = parser!["root"]["arrayOfGoodBasicItems"].filter { _, idx in idx == 0 }
let value: BasicItem = try subParser.value()

That said, if I change the filter method to look for more than one index (e.g. idx >= 0), then it will fail because it matches multiple elements.

Here's a question... are you expecting the filter method to be operating on the child elements or on what is currently indexed? Today, filter is looking at children elements versus all. Is that what might be happening?

For example:

parser!["root"]["arrayOfGoodBasicItems"].filter { _, idx in idx == 0 }

vs.

parser!["root"]["arrayOfGoodBasicItems"]["basicItems"].filter { _, idx in idx == 0 }

Which one makes more sense to you?

@ChristianSteffens
Copy link
Contributor Author

Okay, that's a detailed explanation and helps at lot. Our issue is certainly the actual style of our XML (I mentioned it in the beginning and unfortunately we can't change the XML input).

Our XML look likes this - to keep working with your BasicItem Example:

   <root>
      <...>
      <basicItem id="1234a">
      <name>item 1</name>
      <price>1</price>
      <...>
   </root>

The problem is the fact, that the BasicItem properties are on the same level like the BasicItems root element. So these properties are not child elements of the key element. So to be fair, the filter behaves perfect by creating a List instead of an Element with several children.

The actual question is how to handle this? Is this a task for the filter method or should the be some kind of external transformation on the XML? In the end there're two tasks to be accomplished for our usecase:

  1. Filter the Elements
  2. Change the hierarchy of the Elements

Some might argue this is cleary a problem of the input XML and I would certainly agree to that, but like I said - we can't change the style / hierarchy of the XML.

@drmohundro
Copy link
Owner

I totally hear you - blaming the XML is valid and can't be changed... if we could blame the XML and have it fixed, we wouldn't still be dealing with SOAP 😄

So, I was thinking about this and I think the solution is to have two filter methods:

  • filterAll (to correspond to the existing all method)
    • it would do what I think you're actually wanting to do
  • filterChildren (to correspond to the existing children method)
    • it would act like the current filter does

Given the above implementations, I think you could solve it like this:

// NOTE: using filterAll instead of filterChildren
let subParser = xml["root"].filterAll { _, idx in idx >= 1 && idx <= 3 }
let basicItem: BasicItem = subParser["basicItem"].value()
// and so on...

I'll see if I can get this in a branch. if this works, I'll just deprecate the existing filter method and point it to filterChildren instead.

@drmohundro
Copy link
Owner

I just opened #178 that will hopefully resolve this. Please take a look at it... I'll hold off on merging until you get a chance to check it out. In hindsight, my original implementation was definitely off - it wasn't even consistent in whether it filtered against all or children... it depended entirely on if the filter was against a list or an element. Not the most intuitive or easy to follow! Hopefully this makes more sense.

@ChristianSteffens
Copy link
Contributor Author

I’ll check this out next week. Thanks a lot for the new PR.

@ChristianSteffens
Copy link
Contributor Author

Okay, so I tried #178 - but IMHO this doesn't address our actual issue. I'm using now the new method XMLIndexer.filterChildren() to filter the current XMLIndexer children by their index. The result is a list of XMLElements - which seems fine at first (and the actual entries of the list are the one we're expecting).

But then we're trying to call func value<T: XMLIndexerDeserializable>() throws -> T on this new XMLIndexer. This then obviously fails, because T is here not a single XMLElement, but a list of XMLElements.

I'm not 100 % sure we're using the provided methods in the right way. We ned a way of filtering the all / children property of an XMLIndexer so that the XMLIndexer is still of the same type (here XMLElement), but has different all / children properties. Currently I'm not seeing a way of doing this just using the provided filter methods?

@drmohundro
Copy link
Owner

I'm going to describe to you what I'm understanding and you tell me if I'm stating it correctly.

<root>
  <...>
  <basicItem id="1234a">
  <name>item 1</name>
  <price>1</price>
  <...>
</root>

You want to deserialize the root element, but only with the child elements of "basicItem", "name" and "price". So something like:

// stay on "root", don't "index" down
let result = xml["root"].filterChildren({ _, idx in idx >= 1 && idx <= 3 }).value()
// result above would only have <root><basicItem/><name/><price/></root>?
// ... versus what it does today which is having <basicItem/><name/></price>?

So, given the above, this would STILL remain on the "root" element, though? It wouldn't index down to be siblings with "basicItem", "name" and "price"? Am I following?

@ChristianSteffens
Copy link
Contributor Author

I think it's a bit more complicated then that:

   <root>
      <...>
      <root property 1>
      <root property 2>
      <basicItem id="1234a">
      <name>item 1</name>
      <price>1</price>
      <root property 3>
      <...>
   </root>

Our XML contains here the actual root element and inside the root element (a list, but for this example only one) additional element, this element belongs like root property 1 - 3 to root, but is not of primitive type (e.g. Int or String), but of a custom type (here Class) for which we wrote an own deserializer.

What the filter then should in the end archive is the creation of a indexer that looks like this:

   <root>
      <basicItem id="1234a">
      <name>item 1</name>
      <price>1</price>
   </root>

So the same XMLIndexer like for the root element, but without the primitive root properties like.

@drmohundro
Copy link
Owner

Okay, here's how it works now. Instead of filterChildren moving the indexing down, it now stays at the same level.

Given the latest commit in #178, I put the following in a Swift playground to play with:

let xmlToParse = """
<root>
<some-weird-element />
<another-weird-element />
<prop1 name="prop1" />
<prop2 name="prop2" />
<basicItem id="1234a">
<name>item 1</name>
<price>1</price>
</basicItem>
<prop3 name="prop3" />
<last-weird-element />
</root>
"""

let parser = SWXMLHash.parse(xmlToParse)

let subParser = parser["root"].filterChildren { _, index in index >= 2 && index <= 5 }

print("non-filtered")
print(parser)

print("\nfiltered")
print(subParser)

It outputs this:

non-filtered
<root>
<some-weird-element></some-weird-element>
<another-weird-element></another-weird-element>
<prop1 name="prop1"></prop1>
<prop2 name="prop2"></prop2>
<basicItem id="1234a">
<name>item 1</name>
<price>1</price>
</basicItem>
<prop3 name="prop3"></prop3>
<last-weird-element></last-weird-element>
</root>

filtered
<root><prop1 name="prop1"></prop1><prop2 name="prop2"></prop2><basicItem id="1234a">
<name>item 1</name>
<price>1</price>
</basicItem><prop3 name="prop3"></prop3></root>

I think this more closely aligns with what you're looking for. The only caveat I can think of is that this excludes any non-XML content, mainly like whitespace. That's why the filtered output isn't lined up the same.

@ChristianSteffens
Copy link
Contributor Author

ChristianSteffens commented Feb 21, 2018

The only caveat I can think of is that this excludes any non-XML content, mainly like whitespace.

Could this be an issue - in the future? I don't want you to introduce something only for our usecase, that mess things up.

I just tested the new commit, this time it works perfectly for our XML.

@drmohundro
Copy link
Owner

Nah, I'm not too worried about the whitespace issue... the only case that I'm aware of that people are explicitly interested in it is related to displaying XML, usually in terms of attempts to parse HTML.

Also, thanks for the feedback and working with me through this - I'll see if I can get this version pushed out soon!

@drmohundro
Copy link
Owner

4.5.0 is now released to track this! Thanks again for working with me through this to get the right code out 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants