Full validation feature #34

Open
funarog opened this issue Oct 20, 2021 · 13 comments

@funarog

funarog commented Oct 20, 2021

Is there a plan to incorporate the same level of validation performed by the Universal Dependencies tool validate.py, found here: https://github.com/UniversalDependencies/tools? Also, is there a plan to do a performance comparison between the official Universal Dependencies validation Python script and hs-conllu?

The Universal Dependencies organization's validation software provides 5 levels of validation. Because it is written in Python, I suspect it is slower than an equivalent written in Haskell or C++ would be.

@odanoburu
Collaborator

Hi @funarog!

There is no such plan, no :/ I'm afraid that having a competing implementation without a formal specification of UD constraints would lead to countless mismatches between them, not to mention the duplication of effort!

I do have some related plans; my biggest problem with UD validation is not slowness, but that it is not declarative. Adding new rules is likely hard for newcomers, and I'd wager the validation script could be made easier to maintain. But I'm curious: is slowness such a great problem for you? My plans might help with that too, and if you have a concrete problem, that might motivate me to actually work on it :P (and perhaps you are even willing to collaborate on it!)
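
To make "declarative" concrete, here is a minimal sketch of rules expressed as data, so that adding a rule means adding one list entry. Everything in it is an assumption made for illustration: the `Token` and `Rule` types, the three example rules, and the level numbers are hypothetical and are not taken from hs-conllu or validate.py.

```haskell
-- Hypothetical types for illustration only; not the hs-conllu API.
data Token = Token
  { tokId     :: Int
  , tokForm   :: String
  , tokUpos   :: String
  , tokHead   :: Int
  , tokDeprel :: String
  } deriving (Show, Eq)

-- A rule is plain data: a name, a validation level, and a predicate
-- that holds when the sentence satisfies the rule.
data Rule = Rule
  { ruleName  :: String
  , ruleLevel :: Int
  , ruleCheck :: [Token] -> Bool
  }

-- Adding a rule means adding an entry here. The level numbers are
-- assumed, not copied from the official validator.
rules :: [Rule]
rules =
  [ Rule "single-root" 2
      (\ts -> length (filter ((== 0) . tokHead) ts) == 1)
  , Rule "head-in-range" 2
      (\ts -> all (\t -> tokHead t >= 0 && tokHead t <= length ts) ts)
  , Rule "no-self-head" 2
      (\ts -> all (\t -> tokHead t /= tokId t) ts)
  ]

-- Names of the rules that a sentence violates, considering only
-- rules at or below the requested level.
violations :: Int -> [Token] -> [String]
violations maxLevel ts =
  [ ruleName r | r <- rules, ruleLevel r <= maxLevel, not (ruleCheck r ts) ]
```

Calling `violations 2 sentence` returns the names of the failed rules in the order they appear in `rules`; a lower level simply filters the list.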

@funarog
Author

funarog commented Oct 21, 2021 via email

@funarog
Author

funarog commented Jun 25, 2022

Bruno Cuconato,
I made some modifications to hs-conllu. Specifically, I converted String to Text and made some design changes. I got about a 137% improvement in speed and about a 20% reduction in memory. I tried to push the changes to a new branch, stringToText, but ran into problems.
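
The changes themselves are not shown in this thread. As a rough illustration of the kind of String-to-Text change described, here is a sketch that splits one CoNLL-U token line into its 10 tab-separated columns using `Data.Text`. The function name `splitFields` is hypothetical and is not part of hs-conllu.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical helper, for illustration only.
-- Splits a CoNLL-U token line on tabs and accepts it only if it has
-- exactly 10 columns (ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC).
splitFields :: Text -> Maybe [Text]
splitFields line =
  let fields = T.splitOn "\t" line
  in if length fields == 10 then Just fields else Nothing
```

Text stores characters in a packed array rather than as a linked list of `Char`, which is the usual reason such a conversion reduces memory use and parsing time.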

@odanoburu
Collaborator

Hi @funarog, that's great! There are certainly design changes to be made, and it's cool you got a performance improvement :)

I don't think you can push to my version of the repository directly, so you'd have to push to your mirror/fork and make a pull request, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request

@funarog
Author

funarog commented Jun 27, 2022 via email

@odanoburu
Collaborator

Hi @funarog, I don't think you've managed to create the pull request, although the fork was successful and the changes have been committed. I could pull the changes myself, but if you open the PR I can comment on the changes line by line, because there are some things I disagree with (e.g. removing the Enum instance from EP) and others I don't really understand (and would like to ask about).

@funarog
Author

funarog commented Jun 29, 2022 via email

@odanoburu
Collaborator

> My ultimate goal is to write a validating parser with the capabilities of the python validating parser used by the ConLLU formatting group found here

I see. That's an endearing goal; is the Python validator really not up to speed? I would be happy to merge any speed improvements to the parser and better design changes, but I don't think I'd want to extend the scope of the library to this level of validation. (You can of course use this library as a dependency of a new one.)

> I would not change code simply for the sake of changing code.

Of course not :) But if we disagree we'll have to discuss before merging anything, and that's when the pull request interface shines! If that's too much trouble, feel free to use your fork as a dependency instead, and if after some time your library is clearly better and has a similar scope, I'd be glad to forgo the hs-conllu name on Hackage.

@funarog
Author

funarog commented Jun 30, 2022 via email

@arademaker
Owner

arademaker commented Jul 8, 2022

@funarog, would you be interested in implementing support for reading and writing conllup?

https://universaldependencies.org/ext-format.html
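
For context, a CoNLL-U Plus file declares its columns on the first line with a `# global.columns = ...` comment. A minimal sketch of reading that header follows; the function name `parseGlobalColumns` is hypothetical, not part of hs-conllu, and the sketch assumes the header uses exactly the spacing shown in the comment.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper, for illustration only.
-- Parses the first line of a CoNLL-U Plus file, for example
--   # global.columns = ID FORM UPOS HEAD DEPREL MISC PARSEME:MWE
-- and returns the declared column names, or Nothing if the line is
-- not a column declaration or declares no columns.
parseGlobalColumns :: String -> Maybe [String]
parseGlobalColumns line =
  case stripPrefix "# global.columns =" line of
    Just rest | not (null (words rest)) -> Just (words rest)
    _                                   -> Nothing
```

A reader could use the resulting list to decide how many fields to expect per token line and how to label them.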

@funarog
Author

funarog commented Jul 8, 2022

Yes. I am interested in implementing conllup. My current goal is to provide the parsing capabilities found in the Python script https://github.com/UniversalDependencies/tools/blob/master/validate.py and then add conllup. Is there an urgent need for the conllup format?

@arademaker
Owner

Yes, at https://universalpropositions.github.io/ we mainly use the conllup format. I do have many details to fix, and I would love to work with Haskell .. ;-)

@arademaker
Owner

For validation, as @odanoburu said, we need more thinking and probably a declarative approach. I have some ideas…
