[GSoC] [new package] Add std.experimental.xml package #4741

Closed
wants to merge 6 commits into base: master from

@lodo1995
Contributor

lodo1995 commented Aug 22, 2016

Here is the result of these months of work.
I want to thank my mentor Robert @burner Schadek for his great help, and everybody who already gave their feedback.

This is an (almost exact) copy of my repository which is also available as a dub package.
The documentation is available here.

I would really love it if feedback could focus on design considerations first, then on naming issues (I'm not that good at naming things), and then on small nitpicks.

I will try to keep the load on the testing environment low by doing all further development based on your feedback in my repository (see above) and pushing here no more than twice a day.

Things still to do (work in progress):

  • better DOM: the W3C specification is huge; that said, almost all functionality is there and only some small things are missing;
  • better docs: should check if the docs are pretty with the Phobos styling, and should add docs for some functions;
  • advanced DTD handling: this functionality is being worked on in my repo; it can be added to this package in a second iteration;
  • legacy API: in my repo I have a crude wrapper that exposes the old std.xml, to ease transition; should it be included? probably not a good idea
  • check that integration is complete (makefiles, indexes and so on)
  • more unittests (especially for domimpl.d)

Wishlist:

  • Issue 16410: attribute inference inside templated classes (not a blocker, but would make @nogc DOM less impossible)
  • std.container.AA with custom allocators; again, not a blocker, but would help @nogc
  • an Appender with custom allocators, for the very same reason; currently this package uses a custom one.

Open questions:

  • currently the DOM implementation tries to follow the W3C spec; what about the WHATWG spec? Which one should we try to adhere to?
  • to include or not to include the "legacy layer" to allow use of the old API backed by the new library, to ease transition? probably not a good idea

(EDIT: of course this needs @andralex approval)

@dlang-bot

dlang-bot Aug 22, 2016

Contributor

@lodo1995, thanks for your PR! By analyzing the annotation information on this pull request, we identified @9rnsr, @9il and @WalterBright to be potential reviewers. @9rnsr: The PR was automatically assigned to you, please reassign it if you were identified mistakenly.

(The DLang Bot is under development. If you experience any issues, please open an issue at its repo.)

@codecov-io

codecov-io Aug 22, 2016

Codecov Report

Merging #4741 into master will decrease coverage by 0.63%.
The diff coverage is 84.46%.

@@            Coverage Diff             @@
##           master    #4741      +/-   ##
==========================================
- Coverage   88.78%   88.15%   -0.64%     
==========================================
  Files         121      134      +13     
  Lines       74159    76768    +2609     
==========================================
+ Hits        65845    67671    +1826     
- Misses       8314     9097     +783

Impacted Files                        Coverage Δ
std/experimental/xml/domimpl.d        49.67% <ø> (ø)
std/experimental/xml/interfaces.d     0% <0%> (ø)
std/experimental/xml/dom.d            0% <0%> (ø)
std/experimental/xml/package.d        100% <100%> (ø)
std/experimental/xml/appender.d       71.42% <71.42%> (ø)
std/experimental/xml/domparser.d      76.31% <76.31%> (ø)
std/experimental/xml/lexers.d         83.39% <83.39%> (ø)
std/experimental/xml/cursor.d         83.54% <83.54%> (ø)
std/experimental/xml/validation.d     84.69% <84.69%> (ø)
std/experimental/xml/writer.d         85.62% <85.62%> (ø)
... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0538d0c...2645b26.

@burner burner added the @andralex label Aug 22, 2016

@burner burner self-assigned this Aug 22, 2016

@don-clugston-sociomantic

don-clugston-sociomantic Aug 22, 2016

Contributor

legacy API: in my repo I have a crude wrapper that exposes the old std.xml, to ease transition; should it be included?

I do not think it should. For it to actually ease transition, the behaviour must not differ from the existing std.xml in even the slightest way. I doubt you can do that without introducing a large maintenance burden and a string of bug reports. And the reason this new package exists is that the existing package is so poor, and not fixable by incremental improvement. It is an opportunity to make a clean break.

@andralex

andralex Aug 22, 2016

Member

Yah, don't worry about compatibility. We only need to make sure we use different entity names (either the whole module or the artifacts in it). I'm okay with just aiming at calling it std.xml2 once adopted. The review should address that.

@lodo1995

lodo1995 Aug 22, 2016

Contributor

@don-clugston-sociomantic @andralex Seems reasonable. The behaviour of the current library is not that easy to reproduce correctly.

By the way, does anyone have a clue why the autotester fails?
The compiler complains that some overrides in domimpl.d do not correctly override the functions in dom.d.
But if they were in fact wrong, the error would appear on all platforms; it shouldn't be platform-specific.
Instead, the error appears randomly (looking at the history, it is not consistent even on the same platform), more often with 32-bit targets. I'm not able to reproduce it on my machine.

Now I feel very stupid... I will commit a fix ASAP.

@andralex

andralex Aug 22, 2016

Member

@lodo1995 I'm seeing e.g. in https://auto-tester.puremagic.com/show-run.ghtml?projectid=1&runid=2150441&isPull=true:

std/experimental/xml/domimpl.d(1525): Error: function std.experimental.xml.domimpl.DOMImplementation!(string, shared(GCAllocator), bool delegate(DOMError!string)).DOMImplementation.Element.Map.length does not override any function, did you mean to override 'std.experimental.xml.dom.NamedNodeMap!string.NamedNodeMap.length'?
std/experimental/xml/domimpl.d(1536): Error: function std.experimental.xml.domimpl.DOMImplementation!(string, shared(GCAllocator), bool delegate(DOMError!string)).DOMImplementation.Element.Map.item does not override any function, did you mean to override 'std.experimental.xml.dom.NamedNodeMap!string.NamedNodeMap.item'?
std/experimental/xml/domparser.d(228): Error: template instance std.experimental.xml.domimpl.DOMImplementation!(string, shared(GCAllocator), bool delegate(DOMError!string)) error instantiating

Are these traceable to the code?

@lodo1995

lodo1995 Aug 22, 2016

Contributor

@andralex Yes, I found the issue. It's a size_t vs ulong mismatch; that's why it appears in 32-bit builds. I will upload a fix. The failures in 64-bit builds are older and are due to formatting issues.
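A minimal reduced sketch of this kind of mismatch, assuming a base class shaped roughly like NamedNodeMap (the classes and member bodies below are hypothetical, not the actual code in dom.d or domimpl.d). On 64-bit targets size_t is an alias for ulong, so an override written with ulong happens to match; on 32-bit targets size_t is uint, and the same override no longer matches any base function. Spelling the type as size_t on both sides removes the platform dependence:

```d
// Hypothetical, reduced stand-in for dom.d's NamedNodeMap.
abstract class NamedNodeMap
{
    abstract size_t length();
    abstract string item(size_t index);
}

// Hypothetical stand-in for domimpl.d's Element.Map.
class Map : NamedNodeMap
{
    private string[] names;

    this(string[] names) { this.names = names; }

    override size_t length() { return names.length; }

    // Declaring the parameter as `ulong index` would still compile on 64-bit
    // targets (where size_t is ulong) but fail on 32-bit ones (where size_t
    // is uint), because the signature would no longer match the base class.
    override string item(size_t index)
    {
        return index < names.length ? names[index] : null;
    }
}
```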

@jacob-carlborg

jacob-carlborg Aug 24, 2016

Contributor

Phobos naming convention is not followed.

@lodo1995

lodo1995 Aug 24, 2016

Contributor

Phobos naming convention is not followed.

@jacob-carlborg Yes, you are right. I forgot to change that. Enum members should not be all uppercase.

If I remember correctly, you also pointed out on the forum that the cursor members getXxx() should simply be called xxx(). I'd like to get some more feedback on this, as those fields are not in fact properties of the cursor, but of the node "pointed to" by the cursor (the node does not really exist as an object, though). So maybe something like nodeXxx() would be better?
But of course I'm open to any suggestion.
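For reference, the Phobos convention is a PascalCase type name with camelCase members. A minimal sketch; the enum name and member list are hypothetical, not the package's actual set of node kinds:

```d
// Hypothetical node-kind enum following the Phobos naming style.
enum NodeKind
{
    elementStart,   // not ELEMENT_START
    elementEnd,
    elementEmpty,
    text,
    comment,
}
```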

@jacob-carlborg

jacob-carlborg Aug 24, 2016

Contributor

If I remember correctly, you also pointed out on the forum that the cursor members getXxx() should simply be called xxx(). I'd like to get some more feedback on this, as those fields are not in fact properties of the cursor, but of the node "pointed by" the cursor (the node does not really exist as an object, though). So maybe something like nodeXxx() would be better?
But of course I'm open to any suggestion.

Hmm, not sure. Let's see if someone else has a suggestion.

@lodo1995

lodo1995 Aug 25, 2016

Contributor

@andralex @burner any opinion on the naming issue?

@burner

burner Aug 25, 2016

Member

IMO, if they are members of a struct or class, use an all-lowercase property, if a property is needed at all. If the member function computes something, or calls something that might compute something, that fact should be reflected in the name: getXXX or computeXXX.

@lodo1995

lodo1995 Aug 25, 2016

Contributor

@burner These methods do very simple computations; some of them cache their results for successive calls. For reference, they are getKind, getLocalName, getPrefix, getContent, ...

@wilzbach

wilzbach Aug 25, 2016

Member

@burner These methods do very simple computations; some of them cache their results for successive calls. For reference, they are getKind, getLocalName, getPrefix, getContent, ...

A good example would be std.container, where the complexity of an operation is part of the documentation and, if there are several ways to do it, even part of its name.
So, e.g., even though it takes logarithmic time to compute, it uses front and not getFront.

Hence, AFAIK and as @jacob-carlborg mentioned, in D style there are no getters, and to avoid confusion it could be something like localName, content, prefix, kind (maybe here something like nodeType?).

@andralex @burner any opinion on the naming issue?

You should try to ping @andralex via mail. He is pretty good at naming ;-)

@jacob-carlborg

jacob-carlborg Aug 25, 2016

Contributor

@burner the point of having properties is to be able to have computed fields. Example:

struct Person
{
    string firstName;
    string lastName;

    string fullName() { return firstName ~ " " ~ lastName; }
}
@burner

burner Aug 25, 2016

Member

@jacob-carlborg I disagree, just look at all the crazy long threads on the forums.
Anyway, I think a productive way forward is to follow @wilzbach's advice and have a look at std.container for inspiration.

@jacob-carlborg

jacob-carlborg Aug 25, 2016

Contributor

@burner then what is the point of having properties?

@burner

burner Aug 25, 2016

Member

To assert on assignments, and for trivial access operations like a[idx] or, even better, return this._current.
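A minimal sketch of the two uses described here, with hypothetical names: a trivial getter that just returns a stored field, and a setter that asserts on the assigned value before storing it:

```d
// Hypothetical type, for illustration only.
struct Selection
{
    private int[] items;
    private size_t _current;

    this(int[] items) { this.items = items; }

    // Trivial access: no computation, just the stored field.
    @property size_t current() const { return _current; }

    // Checked assignment: reject out-of-range indices before storing.
    @property void current(size_t value)
    {
        assert(value < items.length, "index out of range");
        _current = value;
    }
}
```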

@andralex

andralex Aug 25, 2016

Member

Looking over the doc at https://lodo1995.github.io/experimental.xml/std/experimental/xml.html (nice writeup), I see no need for e.g. getContent - just content seems just fine. However, if there's some traditional nomenclature in the XML space that uses get then great, keep it.

Related: saxParser.setSource(input) may be more elegantly done as saxParser.source = input.

Indeed the all-lowercase enum names should be converted to EnumName.camelCase.

A few more unasked-for thoughts: in this example

auto saxParser =
     chooseParser!input     // this is a shorthand for chooseLexer!Input.parse
    .cursor
    .saxParser!MyHandler;
saxParser.setSource(input);

I'm unclear why input must be used in both chooseParser and setSource. Also, frequently the input is unknown during compilation, how does that affect chooseParser?
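A minimal sketch of the naming suggested above, using a hypothetical DemoCursor rather than the package's real cursor: the getter is named after the value it returns (content rather than getContent), and a one-argument @property setter lets callers write cursor.source = input instead of cursor.setSource(input):

```d
// Hypothetical cursor-like type; only the naming style is the point here.
struct DemoCursor
{
    private string input;

    // Setter: called as `cursor.source = text;`
    // (a real component would also reset its parsing state here).
    @property void source(string text) { input = text; }

    // Getter: read as `cursor.content`, with no get prefix.
    @property string content() const { return input; }
}
```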

@lodo1995

lodo1995 Sep 3, 2016

Contributor

Ok, I finally found some time to work on this PR.
@schveiguy @andralex I was thinking about initialization. Currently, the free functions (chooseLexer, parser, cursor, ...) create uninitialized components, and setSource is needed to complete initialization. If I change them to create initialized components (assuming the ones passed as input are themselves initialized), so that setSource is no longer needed, then I can no longer create uninitialized components. This ability is needed to write things like the following:

// create uninitialized
auto cursor = chooseLexer!string
                      .parser
                      .cursor
;
foreach (input; inputs)
{
    // reinitialize every time
    cursor.setSource(input);
    foo(cursor);
}

So this is a tradeoff we need to decide. Steven brought up this issue, so I guess he prefers full initialization without setSource. I would like to hear what other people think, especially Andrei.

@klickverbot

klickverbot Sep 3, 2016

Member

Why would you need the ability to create uninitialised components? It doesn't seem like there would be a large performance cost associated with constructing the chain (although I haven't looked at what your code is doing there), so moving the declaration inside the loop would be just fine.

@lodo1995

lodo1995 Sep 3, 2016

Contributor

@klickverbot Yes, currently the only different behaviour is that internal buffers are kept for reuse.

@schveiguy

schveiguy Sep 3, 2016

Member

I have to take a closer look at the code. Will get back to you.

@andralex

andralex Sep 3, 2016

Member

@lodo1995 leaving this to you, feel free to choose whatever you think is most flexible

@lodo1995

lodo1995 Sep 7, 2016

Contributor

Changes in the new commit:

  • default error handlers now throw exceptions instead of using assert(0)
  • now if a chain starts with chooseLexer!InputType or chooseParser!InputType then the components are not initialized and a call to setSource(inputData) is needed; if, instead, the chain starts with inputData.lexer or inputData.parser then the components are created already initialized, without any need for setSource
  • alias InputType and void setSource(InputType) are no longer mandatory for components; this allows the implementation of components with hard-coded inputs; higher-level components should detect the availability of these facilities and act accordingly.
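A minimal sketch of how a higher-level component could detect these optional facilities at compile time. ReusableLexer, HardCodedLexer and Wrapper are hypothetical names used only for illustration; the check is a plain is(...) plus __traits(hasMember, ...) test, and the wrapper exposes InputType and setSource only when the wrapped component provides them:

```d
// Hypothetical component that can be (re)initialized with a new input.
struct ReusableLexer
{
    alias InputType = string;
    private string input;

    void setSource(InputType s) { input = s; }
    string text() const { return input; }
}

// Hypothetical component with a hard-coded input: no InputType, no setSource.
struct HardCodedLexer
{
    string text() const { return "<root/>"; }
}

// Hypothetical higher-level component wrapping either of the above.
struct Wrapper(L)
{
    L lower;

    // Expose InputType/setSource only if the lower component has them.
    static if (is(L.InputType) && __traits(hasMember, L, "setSource"))
    {
        alias InputType = L.InputType;
        void setSource(InputType input) { lower.setSource(input); }
    }

    string text() const { return lower.text(); }
}
```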
@schveiguy

schveiguy Sep 8, 2016

Member

now if a chain starts with chooseLexer!InputType or chooseParser!InputType then the components are not initialized and a call to setSource(inputData) is needed; if, instead, the chain starts with inputData.lexer or inputData.parser then the components are created already initialized, without any need for setSource

Awesome! Sorry I haven't had a chance to look yet. I will get to this... so many things to do...

private Unqual!T[] arr;
private size_t used;
public this(ref Alloc alloc)

@BBasile

BBasile Sep 21, 2016

Contributor

This is trivial, but you use public inconsistently here and in the next 3 methods. It would only be justified after a global private:, since public is the default protection attribute.

{
allocator = &alloc;
}
public this(Alloc* alloc)

@BBasile

BBasile Sep 21, 2016

Contributor

Can you add @disable this();? This is just in case someone uses the appender in the future, to prevent any misuse.
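A minimal sketch of an allocator-aware appender that folds in both review suggestions: the redundant public attributes are dropped, since public is the default, and @disable this(); prevents creating an appender without an allocator. DemoAllocator is a hypothetical stand-in that only mimics the allocate/deallocate shape of std.experimental.allocator allocators, and the doubling growth policy is an assumption, not the PR's code:

```d
import std.traits : Unqual;

// Hypothetical allocator exposing allocate/deallocate; it simply uses the GC.
struct DemoAllocator
{
    void[] allocate(size_t bytes) { return new ubyte[bytes]; }
    bool deallocate(void[] block) { return true; }
}

struct Appender(T, Alloc)
{
    private Unqual!T[] arr;
    private size_t used;
    private Alloc* allocator;

    // An appender without an allocator cannot grow, so forbid `Appender a;`.
    @disable this();

    this(ref Alloc alloc) { allocator = &alloc; }
    this(Alloc* alloc) { allocator = alloc; }

    void put(T item)
    {
        if (used == arr.length)
            grow();
        arr[used++] = item;
    }

    // Elements appended so far.
    T[] data() { return cast(T[]) arr[0 .. used]; }

    // Double the capacity (starting at 8) and move existing elements over.
    private void grow()
    {
        immutable newLength = arr.length ? arr.length * 2 : 8;
        auto block = cast(Unqual!T[]) allocator.allocate(newLength * T.sizeof);
        block[0 .. used] = arr[0 .. used];
        if (arr.length)
            allocator.deallocate(arr);
        arr = block;
    }
}
```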

@NVolcz

NVolcz Oct 11, 2016

Contributor

Is it possible to get the doctype of the document? It seems that <!DOCTYPE html> is currently returned as a "dtdEmpty" with content "html".

@lodo1995

lodo1995 Oct 12, 2016

Contributor

@NVolcz The parser has no other information, except that the root element must be <html>. That doctype does not give any other meaningful information.

@NVolcz

NVolcz Oct 15, 2016

Contributor

@lodo1995 Pardon the confusion. What I was trying to say is that the entity name (is that what it is called?) is needed in some applications. Currently only the content is returned.
Is there any example of how to use the cursor API to build a depth-first parser? My current approach is something along these lines:

while(!cursor.documentEnd) {
    if(cursor.enter()) {
        // Do stuff
    } else if(cursor.next()) {
        // Do other stuff
    } else if(!cursor.documentEnd) {
        cursor.exit();
    }
}

But it then skips all elementEnds.

@lodo1995

lodo1995 Oct 15, 2016

Contributor

@NVolcz

Pardon the confusion. What I was trying to say is that the entity name (is that what it is called?) is needed in some applications. Currently only the content is returned.

I still don't understand... dtdEmpty means that the node is an empty <!DOCTYPE declaration. If, on the other hand, you have (for example) an <!ATTLIST node, then your token will be attlistDecl. If you use an undefined declaration (such as <!FOO), then your token will be declaration and the name you used will be part of the declaration content. So all the information is always available to the application.

Is there any example of how to use the cursor API to build a depth-first parser?

Look at the 4th example at https://lodo1995.github.io/experimental.xml/std/experimental/xml.html. Your snippet looks good. The closing tag (elementEnd) should be available just before calling cursor.exit, but usually you shouldn't be using it.
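The sketch below is a rough illustration of the bookkeeping involved. It walks a tree depth-first with a toy cursor, records a start event on entering each element, and records an end event once the element's subtree is finished. Node, ToyCursor and walk are hypothetical. Their method names only mirror those in the snippet above. This toy does not model the real cursor's elementEnd positions, so it synthesizes end events from the cursor moves instead. The descend flag is the piece the original loop lacks: after exit() the parent must be closed rather than re-entered.

```d
// Hypothetical in-memory element tree used only for this illustration.
final class Node
{
    string name;
    Node[] children;

    this(string name, Node[] children = null)
    {
        this.name = name;
        this.children = children;
    }
}

// Hypothetical cursor: enter/next/exit mirror the names in the snippet above,
// but this is NOT the std.experimental.xml Cursor.
struct ToyCursor
{
    private Node[][] levels;     // sibling list at each depth
    private size_t[] positions;  // index of the current node in each sibling list

    this(Node root)
    {
        levels = [[root]];
        positions = [size_t(0)];
    }

    Node current() { return levels[$ - 1][positions[$ - 1]]; }

    // Move to the first child; returns false if the current node has none.
    bool enter()
    {
        auto kids = current.children;
        if (kids.length == 0)
            return false;
        levels ~= kids;
        positions ~= size_t(0);
        return true;
    }

    // Move to the next sibling; returns false if the current node is the last one.
    bool next()
    {
        if (positions[$ - 1] + 1 >= levels[$ - 1].length)
            return false;
        ++positions[$ - 1];
        return true;
    }

    // Move back to the parent; returns false when already at the root.
    bool exit()
    {
        if (levels.length == 1)
            return false;
        levels = levels[0 .. $ - 1];
        positions = positions[0 .. $ - 1];
        return true;
    }
}

// Depth-first walk that records an "end" event once an element's subtree is done.
string[] walk(Node root)
{
    string[] events;
    auto c = ToyCursor(root);
    events ~= "start " ~ c.current.name;
    bool descend = true; // false right after exit(): the parent's children are already done

    for (;;)
    {
        if (descend && c.enter())
        {
            events ~= "start " ~ c.current.name;
            continue;
        }
        // The current element's subtree is finished, so close it now.
        events ~= "end " ~ c.current.name;
        if (c.next())
        {
            events ~= "start " ~ c.current.name;
            descend = true;
        }
        else if (c.exit())
            descend = false;
        else
            break; // the root has been closed
    }
    return events;
}

void main()
{
    import std.stdio : writeln;

    auto doc = new Node("a", [new Node("b", [new Node("c")]), new Node("d")]);
    writeln(walk(doc));
}
```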

@NVolcz

NVolcz Nov 9, 2016

Contributor

Sorry, I misunderstood the API for dtdEmpty.
My newest issue:
rejectedsoftware/diet-ng#11
The parser skips the closing tag of empty elements without telling you:
Works (elementEmpty):

<works />

Does not work:

<notWork></notWork>
@wilzbach

wilzbach Dec 20, 2016

Member

Ping @lodo1995 - how about wrapping this up within 2016?

@lodo1995

lodo1995 Dec 20, 2016

Contributor

@wilzbach I'd really like to finish this up, but I really don't have any free time to work on it currently. I'm really sorry, but this will have to wait.

@lesderid

lesderid Feb 6, 2017

Contributor

Any update on this? (@lodo1995)

@wilzbach

wilzbach Jul 9, 2017

Member

As @lodo1995 seems to have moved on and left stdx.xml in the graveyard, is someone interested in reviving it from the dead?
Otherwise how about creating a special stdx repository which doesn't require so many formal barriers and can be used as a staging spot for collaboration on new modules? (Of course it would also be available via dub)

@wilzbach

wilzbach Jul 9, 2017

Member

Otherwise how about creating a special stdx repository

We could think about hosting this at dlang-community if there are concerns about having it at /dlang.

@andralex

andralex Jul 9, 2017

Member

We should develop stricter standards for deliverables with our GSoC students.

@burner

burner Jul 17, 2017

Member

I wrote to @lodo1995 asking what his plans are, or whether I can take over.

@wilzbach

wilzbach Jul 17, 2017

Member

I wrote to @lodo1995 asking what his plans are, or whether I can take over.

Judging by the fact that we haven't heard anything from him in the last seven months, it's sadly not a "can", but "have to" or "would be awesome if" 😉
(though of course I would be happy to hear from @lodo1995 again)

@wilzbach wilzbach added the orphaned label Dec 21, 2017

@wilzbach

wilzbach Dec 21, 2017

Member

Closing this now, as it doesn't seem like @lodo1995 is still among the living. If someone else wants to pick up the torch, please feel free to do so.
There's also an open discussion about moving the existing DUB package to dlang-community:

dlang-community/discussions#23

@wilzbach wilzbach closed this Dec 21, 2017
