-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full validation feature #34
Comments
Hi @funarog! There is no such plan, no :/ I'm afraid that having a competing implementation without a formal specification of UD constraints would lead to countless mismatches between them, not the mention the duplication of effort! I do have some related plans; my biggest problem with UD validation is not slowness, but that it is not declarative. Adding new rules is likely hard to newcomers, and I'd wager the validation script could be made easier to maintain. But I'm curious: is slowness such a great problem for you? My plans might help with that too, and if you have a concrete problem that might motivate me to actually work on it :P (and perhaps you are even willing to collaborate on it!) |
Bruno Cuconato
Thank you for responding. I am interested in collaboration. My primary interest is in creating a mapping between syntactic structure and the underlying semantic structure; linguistic mappings in general.
I agree there is not a formal specification of UD constraints. I am currently reciewing the UD tool validate.py and UD web page ( https://universaldependencies.org/validation-rules.html ) to determine if the rules can be more declarative. That would allow a parser generator to be built. I plan is to contact the UD developers and find out if that can be done.I believe it would be beneficial to the computational linguistic community.
I agree with your statement: Adding new rules is likely hard to newcomers, and I'd wager the validation script could be made easier to maintain. It would also make it easier for the computational linguistic community to have easier access to the consul formats.
Performance was just a curiousity. If a parser generator could easily be built one could easily use c++ Spirit and determine the benefits of different libraries/ languages. Performance probably matters to the UD contributors, since it appears they validate every language treebank on a regular bases.
I am slowly getting familiar with Haskell, though I have been programming for 45 years.
Keep in touch and we can determine where are interests intersect.
By the way, I find hs-conllu very useful. I had started writing my own parser when I ran across hs-conllu. Which I have been reviewing the code. If I run across some suggested improvementsI will let you know.
Thanks again.
Greg Funaro
… On Oct 21, 2021, at 7:41 AM, bc² ***@***.***> wrote:
Hi @funarog!
There is no such plan, no :/ I'm afraid that having a competing implementation without a formal specification of UD constraints would lead to countless mismatches between them, not the mention the duplication of effort!
I do have some related plans; my biggest problem with UD validation is not slowness, but that it is not declarative. Adding new rules is likely hard to newcomers, and I'd wager the validation script could be made easier to maintain. But I'm curious: is slowness such a great problem for you? My plans might help with that too, and if you have a concrete problem that might motivate me to actually work on it :P (and perhaps you are even willing to collaborate on it!)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications on the go with GitHub Mobile for iOS or Android.
|
Bruno Cuconato |
hi @funarog , that's great! there are certainly design changes to be made, and it's cool you got a performance improvement :) I don't think you can push to my version of the repository directly, so you'd have to push to your mirror/fork and make a pull request, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request |
Hi @funarog , I don't think you've managed to create the pull request, although the fork was successful and the changes have been committed. I could pull the changes myself, but if you do open the PR I can then comment the changes on a line-by-line basis, because some things I disagree with (e.g. removing the Enum instance from EP), and others I don't really understand (and would like to ask about) |
Bruno
I will attempt the pull request again; probably later this evening.
Regarding changes. My ultimate goal is to write a validating parser with the capabilities of the python validating parser used by the ConLLU formatting group found here: https://github.com/UniversalDependencies/tools/blob/master/validate.py <https://github.com/UniversalDependencies/tools/blob/master/validate.py>
and with improved speed. As you already know simply converting the hs-connlu code from string to text will not guarantee a speed up. I was pleased to see a speed up. But I did not do an in depth performance analysis to see from where it came.
Any changes I made 1) were done so that I could understand better what the hs-conllu code was doing 2) remove warnings or errors 3) try to follow recommended haskell practices.
I would not change code simple for the sake of changing code. The changes I made, made it easier for me to follow and understand the code. I am not a Haskell expert by any means. (though I have been programming in the large for about 50 years in various languages). I removed the items like enum because it did not seem to impact the code, And, for the case of EP, I followed LYAHFGG:
Enum members are sequentially ordered types — they can be enumerated. The main advantage of the Enum typeclass is that we can use its types in list ranges. They also have defined successors and predecesors, which you can get with the succ and pred functions.
Since this did not seem the case, I removed the Enum derivation.
If there is something I missed or a benefit that I am probably not aware, I would appreciate any insights you may have.
Thanks
Greg
… On Jun 29, 2022, at 7:55 AM, bc² ***@***.***> wrote:
Hi @funarog <https://github.com/funarog> , I don't think you've managed to create the pull request, although the fork was successful and the changes have been committed. I could pull the changes myself, but if you do open the PR I can then comment the changes on a line-by-line basis, because some things I disagree with (e.g. removing the Enum instance from EP), and others I don't really understand (and would like to ask about)
—
Reply to this email directly, view it on GitHub <#34 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFB7CS4OBSR44UXDD7RJIO3VRRBV5ANCNFSM5GMEG4UA>.
You are receiving this because you were mentioned.
|
I see. That's an endearing goal; is it really no up to speed? I would be happy to merge any speed improvements on the parser and better design changes, but I don't think I'd want to extend the scope of the library to this level of validation. (You can of course use this library as a dependency of a new one.)
Of course not :) But if we disagree we'll have to discuss before merging anything, and that's when the pull request interface shines! Sometimes that's too much trouble, feel free to use your fork as a dependency instead, and if after some time your library is clearly better and with a similar scope I'd be glad to forgo the hs-conllu name on hackage. |
Bruno
I think this is a pull request: #35 <#35>
As far as accepting changes or modifications, feel free to accept or reject anything you wish.
Greg
… On Jun 27, 2022, at 7:01 AM, bc² ***@***.***> wrote:
hi @funarog <https://github.com/funarog> , that's great! there are certainly design changes to be made, and it's cool you got a performance improvement :)
I don't think you can push to my version of the repository directly, so you'd have to push to your mirror/fork and make a pull request, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request>
—
Reply to this email directly, view it on GitHub <#34 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AFB7CS7PT7ANWIE77TLJWQLVRGJ2NANCNFSM5GMEG4UA>.
You are receiving this because you were mentioned.
|
@funarog , would you be interested in implementing the support to read/write conllup https://universaldependencies.org/ext-format.html ?? |
Yes. I am interested in implementing conllup. My current goal is to provide the parsing capabilities found in the python script: https://github.com/UniversalDependencies/tools/blob/master/validate.py .Then add conllup. IS there an urgent need for conllup format? |
Yes, in the https://universalpropositions.github.io/ we are using mainly conllup format. I do have many details to fix and I would love work with haskell .. ;-) |
For validation , as @odanoburu said. We need more thinking and probably an declarative approach .. I have some ideas… |
Is there a plan to incorporate the same level of validation performed by the Universal Dependency tool, validate.py, found here https://github.com/UniversalDependencies/tools? Also Is there a plan to do a performance comparison between the official Universal Dependency validation python script and hs-conllu?
The universal dependency organization's validation software provides 5 levels of validation. Because it is written in python, I suspect it would be slower than written in Haskell or C++.
The text was updated successfully, but these errors were encountered: