Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code refactor, setup.py, auto-formatting #7

Closed
wants to merge 9 commits into from

Conversation

sai-prasanna
Copy link

@sai-prasanna sai-prasanna commented Mar 16, 2019

  1. I have created setup.py to make it pip installable. If you opt to accept these changes we can publish to PyPI under name errant.
  2. Added Types everywhere. And add a way to type check. If add a CI in future we can automate checking it on pull requests.
  3. Creating a Edit type, Error Type to use throughout instead of tuples.
  4. Move scripts to commands. Now we can invoke errant parallel_to_m2 etc to get the job done.
  5. Best practice for formatting code uniformly is to not do it by hand, so I have added black auto formatter, and simple way to invoke it to format the code.
  6. Used Levenshtien package for character edits. In my initial tests it gave a huge increase in speed, yet to check it overall on diverse sentences.

@chrisjbryant
Copy link
Owner

Thanks again for all of this. I don't want to release it during the shared task as it might confuse people, but will certainly take a look and do some testing afterwards, probably April. Hope you can wait!

@sai-prasanna
Copy link
Author

@chrisjbryant Cool, btw Spacy 2.1 released today https://explosion.ai/blog/spacy-v2-1 , so we can release a v1.0.1 with spacy 2.1 I guess.

@chrisjbryant
Copy link
Owner

chrisjbryant commented Mar 25, 2019

Fyi, Spacy 2.1.2 finally changed the POS mappings so I'd definitely have to revisit the rules to be compatible with spacy 2.1. Cf. explosion/spaCy#3455.

@sai-prasanna
Copy link
Author

@chrisjbryant A gentle reminder. If there is anything to be done my side, will do.

@chrisjbryant
Copy link
Owner

Yep, it's on my to-do list! The shared task finished last week and we're releasing the official results tomorrow so I still have a bit to do.
I'm also slightly concerned about releasing a new version so quickly since it will probably produce slightly different scores to the shared task. I'll test your version next week, but it would be good to have an option to use the official shared task setup as well as the newer, more updated version.

@sai-prasanna
Copy link
Author

Awesome, Our team has a submission going there in the shared task 😀 . We can release multiple versions probably the 1.0.0 with official scorer and 1.1.0 with updated spacy. Or if you can adapt the setup.py and the scaffolding code to existing code without all these other changes, that can be 1.0.0.

@chrisjbryant
Copy link
Owner

Awkward question: how easy is it to get rid of all the explicit typing?
I read that typing is only supported by python 3.6 and later, but we only have 3.5 on our main server and can't update it in case it breaks other people's projects. I suspect other people might find themselves in a similar position, so would really like the code to be compatible with python >=3.4 to give us a reasonable degree of backwards compatibility (3.4 was released in 2014).

@sai-prasanna
Copy link
Author

I guess they can be removed by a simple regex or some find and replace .. Will check if there is any automated way to remove them. I thought people generally used virtualenvs, but yeah for backward compatibility we can move types to comments for typepchecker to still function.

@sam-writer
Copy link
Contributor

Sorry to resurrect this, but we'd love to help make ERRANT pip-installable, ideally as a module, so it can be called from within Python.

I was under the same impression about version restrictions, but it turns out older version of Python can support typing, similar to how from __future__ import print_function would allow you to use print() in Python 2 code.

This stackoverflow answer discusses how you can support type syntax with Python 3.0-3.4 (3.5+ already have typing support). It seems to be just adding a dependency on typing, "a backport of the standard library typing module to Python versions older than 3.5." See also a github issue in the Python org which seems to agree.

So to make progress on this PR, I propose:

  1. Use Docker to verify that the code runs on 3.4, 3.5, 3.6 and 3.7 (possibly with the addition of a dependency on typing) by, for example, running the demo and confirming that it produces a file called test.m2 which is identical to out.m2.
  2. revisit the rules to confirm they work with spaCy 2.1.x POS mappings (including this to be thorough, without being sure what it means)

@chrisjbryant , if you can tell me more about 2, I am happy to help. Also, if 1 were completed as described, would you feel that there is adequate backwards compatibility?

@chrisjbryant
Copy link
Owner

It took 9 months, but I finally found the time to learn setup.py and refactor everything (inspired by this pull request)!
Sorry it took so long, but I also took the opportunity to learn about python packaging and OOP.
I hope you don't mind I also acknowledged your help in the changelog. @sai-prasanna
:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants