Full validation feature #34

funarog · 2021-10-20T17:03:55Z

Is there a plan to incorporate the same level of validation performed by the Universal Dependency tool, validate.py, found here https://github.com/UniversalDependencies/tools? Also Is there a plan to do a performance comparison between the official Universal Dependency validation python script and hs-conllu?

The universal dependency organization's validation software provides 5 levels of validation. Because it is written in python, I suspect it would be slower than written in Haskell or C++.

odanoburu · 2021-10-21T12:41:20Z

Hi @funarog!

There is no such plan, no :/ I'm afraid that having a competing implementation without a formal specification of UD constraints would lead to countless mismatches between them, not the mention the duplication of effort!

I do have some related plans; my biggest problem with UD validation is not slowness, but that it is not declarative. Adding new rules is likely hard to newcomers, and I'd wager the validation script could be made easier to maintain. But I'm curious: is slowness such a great problem for you? My plans might help with that too, and if you have a concrete problem that might motivate me to actually work on it :P (and perhaps you are even willing to collaborate on it!)

funarog · 2021-10-21T13:54:51Z

Bruno Cuconato Thank you for responding. I am interested in collaboration. My primary interest is in creating a mapping between syntactic structure and the underlying semantic structure; linguistic mappings in general. I agree there is not a formal specification of UD constraints. I am currently reciewing the UD tool validate.py and UD web page ( https://universaldependencies.org/validation-rules.html ) to determine if the rules can be more declarative. That would allow a parser generator to be built. I plan is to contact the UD developers and find out if that can be done.I believe it would be beneficial to the computational linguistic community. I agree with your statement: Adding new rules is likely hard to newcomers, and I'd wager the validation script could be made easier to maintain. It would also make it easier for the computational linguistic community to have easier access to the consul formats. Performance was just a curiousity. If a parser generator could easily be built one could easily use c++ Spirit and determine the benefits of different libraries/ languages. Performance probably matters to the UD contributors, since it appears they validate every language treebank on a regular bases. I am slowly getting familiar with Haskell, though I have been programming for 45 years. Keep in touch and we can determine where are interests intersect. By the way, I find hs-conllu very useful. I had started writing my own parser when I ran across hs-conllu. Which I have been reviewing the code. If I run across some suggested improvementsI will let you know. Thanks again. Greg Funaro

…

On Oct 21, 2021, at 7:41 AM, bc² ***@***.***> wrote: Hi @funarog! There is no such plan, no :/ I'm afraid that having a competing implementation without a formal specification of UD constraints would lead to countless mismatches between them, not the mention the duplication of effort! I do have some related plans; my biggest problem with UD validation is not slowness, but that it is not declarative. Adding new rules is likely hard to newcomers, and I'd wager the validation script could be made easier to maintain. But I'm curious: is slowness such a great problem for you? My plans might help with that too, and if you have a concrete problem that might motivate me to actually work on it :P (and perhaps you are even willing to collaborate on it!) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.

funarog · 2022-06-25T19:14:39Z

Bruno Cuconato
I made some modifications to hs-conllu. Specifically, I converted String to Text and some design changes. I got about 137% improvement in speed and about 20% reduction in memory. Tried to push the changes into a new branch stringToText but ran into problems

odanoburu · 2022-06-27T12:01:30Z

hi @funarog , that's great! there are certainly design changes to be made, and it's cool you got a performance improvement :)

I don't think you can push to my version of the repository directly, so you'd have to push to your mirror/fork and make a pull request, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request

funarog · 2022-06-27T17:59:41Z

Okay I forked and pushed the changes to here: https://github.com/funarog/hs-conllu/blob/stringToText/src/Conllu/Print.hs <https://github.com/funarog/hs-conllu/blob/stringToText/src/Conllu/Print.hs> It looks to me there is a pull request here: https://github.com/odanoburu/hs-conllu/pulls <https://github.com/odanoburu/hs-conllu/pulls> There were a few design changes easily seen with diff.

…

On Jun 27, 2022, at 7:01 AM, bc² ***@***.***> wrote: hi @funarog <https://github.com/funarog> , that's great! there are certainly design changes to be made, and it's cool you got a performance improvement :) I don't think you can push to my version of the repository directly, so you'd have to push to your mirror/fork and make a pull request, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request> — Reply to this email directly, view it on GitHub <#34 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFB7CS7PT7ANWIE77TLJWQLVRGJ2NANCNFSM5GMEG4UA>. You are receiving this because you were mentioned.

odanoburu · 2022-06-29T12:55:47Z

Hi @funarog , I don't think you've managed to create the pull request, although the fork was successful and the changes have been committed. I could pull the changes myself, but if you do open the PR I can then comment the changes on a line-by-line basis, because some things I disagree with (e.g. removing the Enum instance from EP), and others I don't really understand (and would like to ask about)

funarog · 2022-06-29T15:44:48Z

Bruno I will attempt the pull request again; probably later this evening. Regarding changes. My ultimate goal is to write a validating parser with the capabilities of the python validating parser used by the ConLLU formatting group found here: https://github.com/UniversalDependencies/tools/blob/master/validate.py <https://github.com/UniversalDependencies/tools/blob/master/validate.py> and with improved speed. As you already know simply converting the hs-connlu code from string to text will not guarantee a speed up. I was pleased to see a speed up. But I did not do an in depth performance analysis to see from where it came. Any changes I made 1) were done so that I could understand better what the hs-conllu code was doing 2) remove warnings or errors 3) try to follow recommended haskell practices. I would not change code simple for the sake of changing code. The changes I made, made it easier for me to follow and understand the code. I am not a Haskell expert by any means. (though I have been programming in the large for about 50 years in various languages). I removed the items like enum because it did not seem to impact the code, And, for the case of EP, I followed LYAHFGG: Enum members are sequentially ordered types — they can be enumerated. The main advantage of the Enum typeclass is that we can use its types in list ranges. They also have defined successors and predecesors, which you can get with the succ and pred functions. Since this did not seem the case, I removed the Enum derivation. If there is something I missed or a benefit that I am probably not aware, I would appreciate any insights you may have. Thanks Greg

…

On Jun 29, 2022, at 7:55 AM, bc² ***@***.***> wrote: Hi @funarog <https://github.com/funarog> , I don't think you've managed to create the pull request, although the fork was successful and the changes have been committed. I could pull the changes myself, but if you do open the PR I can then comment the changes on a line-by-line basis, because some things I disagree with (e.g. removing the Enum instance from EP), and others I don't really understand (and would like to ask about) — Reply to this email directly, view it on GitHub <#34 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFB7CS4OBSR44UXDD7RJIO3VRRBV5ANCNFSM5GMEG4UA>. You are receiving this because you were mentioned.

odanoburu · 2022-06-29T22:38:23Z

My ultimate goal is to write a validating parser with the capabilities of the python validating parser used by the ConLLU formatting group found here

I see. That's an endearing goal; is it really no up to speed? I would be happy to merge any speed improvements on the parser and better design changes, but I don't think I'd want to extend the scope of the library to this level of validation. (You can of course use this library as a dependency of a new one.)

I would not change code simple for the sake of changing code.

Of course not :) But if we disagree we'll have to discuss before merging anything, and that's when the pull request interface shines! Sometimes that's too much trouble, feel free to use your fork as a dependency instead, and if after some time your library is clearly better and with a similar scope I'd be glad to forgo the hs-conllu name on hackage.

funarog · 2022-06-30T00:52:25Z

Bruno I think this is a pull request: #35 <#35> As far as accepting changes or modifications, feel free to accept or reject anything you wish. Greg

…

On Jun 27, 2022, at 7:01 AM, bc² ***@***.***> wrote: hi @funarog <https://github.com/funarog> , that's great! there are certainly design changes to be made, and it's cool you got a performance improvement :) I don't think you can push to my version of the repository directly, so you'd have to push to your mirror/fork and make a pull request, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request> — Reply to this email directly, view it on GitHub <#34 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFB7CS7PT7ANWIE77TLJWQLVRGJ2NANCNFSM5GMEG4UA>. You are receiving this because you were mentioned.

arademaker · 2022-07-08T00:53:57Z

@funarog , would you be interested in implementing the support to read/write conllup

https://universaldependencies.org/ext-format.html

??

funarog · 2022-07-08T11:08:18Z

Yes. I am interested in implementing conllup. My current goal is to provide the parsing capabilities found in the python script: https://github.com/UniversalDependencies/tools/blob/master/validate.py .Then add conllup. IS there an urgent need for conllup format?

arademaker · 2022-07-08T16:23:44Z

Yes, in the https://universalpropositions.github.io/ we are using mainly conllup format. I do have many details to fix and I would love work with haskell .. ;-)

arademaker · 2022-07-08T16:24:56Z

For validation , as @odanoburu said. We need more thinking and probably an declarative approach .. I have some ideas…

odanoburu mentioned this issue Aug 20, 2022

Plans for the library #36

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Full validation feature #34

Full validation feature #34

funarog commented Oct 20, 2021

odanoburu commented Oct 21, 2021

funarog commented Oct 21, 2021 via email

funarog commented Jun 25, 2022

odanoburu commented Jun 27, 2022

funarog commented Jun 27, 2022 via email

odanoburu commented Jun 29, 2022

funarog commented Jun 29, 2022 via email

odanoburu commented Jun 29, 2022

funarog commented Jun 30, 2022 via email

arademaker commented Jul 8, 2022 •

edited

Loading

funarog commented Jul 8, 2022

arademaker commented Jul 8, 2022

arademaker commented Jul 8, 2022

Full validation feature #34

Full validation feature #34

Comments

funarog commented Oct 20, 2021

odanoburu commented Oct 21, 2021

funarog commented Oct 21, 2021 via email

funarog commented Jun 25, 2022

odanoburu commented Jun 27, 2022

funarog commented Jun 27, 2022 via email

odanoburu commented Jun 29, 2022

funarog commented Jun 29, 2022 via email

odanoburu commented Jun 29, 2022

funarog commented Jun 30, 2022 via email

arademaker commented Jul 8, 2022 • edited Loading

funarog commented Jul 8, 2022

arademaker commented Jul 8, 2022

arademaker commented Jul 8, 2022

arademaker commented Jul 8, 2022 •

edited

Loading