New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code base cleanups #2825

jbrockmendel opened this Issue Feb 7, 2019 · 3 comments


None yet
2 participants
Copy link

jbrockmendel commented Feb 7, 2019

There's a pylintrc file, but pylint throws a bunch of complaints, so it doesn't appear to be enforced.

What's the policy on "cleanup" PRs, modernization, etc?


This comment has been minimized.

Copy link
Contributor Author

jbrockmendel commented Feb 7, 2019

Collecting a few potential-cleanup targets as I find them, in case the answer is "go for it!"

  • Cython.Compiler.TreePath appears unused outside of tests, may be unnecessary.
  • commented-out code in Cython.Plex that can presumably be removed
  • a lot of compatibility code could be simplified by using six
  • #1845 if jedityper is unsupported and non-functional, might not be worth keeping around
  • I don't totally get what Plex.Timing is supposed to do, looks like time.time() might be simpler?

Bigger-picture, coming at this from a direction of a newcomer interested in solving #1628, #1461, and implementing linting for cython files, it seems like there would need to be a stronger separation between the "parsing cython" part of the pipeline and the "turning it into C" part.


This comment has been minimized.

Copy link

scoder commented Feb 8, 2019

Thanks for volunteering. I agree that improving the linting coverage would be nice. Cython has a very old code base (since 2001), and we only recently added a simple initial linting configuration at all. But read on.

TreePath is used inside of the compiler pipeline for tests, so it should stay. It implements an XPath-like language that executes assertions on the parse tree, normally right before code generation. It's important for verifying that specific optimisations or other changes in the tree were really applied.

Cleaning up Plex seems good. It's been mostly unchanged since we inherited it from Pyrex (where it already was a copy of some old Plex version), and it's unlikely to receive many changes in the future. It does what it should, and it does it well. So, stripping down the code to what's actually used is reasonable. But again, read on.

I doubt that using six would provide major simplifications. In any case, Py2 support is scheduled for removal in the course of this year, so any earlier changes in that regard would be a waste of time. That will also be the right time for broader modernisation changes in the code base.

Removing the jedityper seems reasonable. A new tool would best start from scratch anyway (and probably not be part of Cython itself).

Plex.Timing is unused and seems obsolete. The scanner hasn't been a bottleneck ever since we started, so we're unlikely to become interested in tuning it in the future.

I'm not sure what you mean by "a stronger separation" between parsing and code generation. Both are independent steps in the compiler pipeline that are connected by a parse tree. The main issue in #1628 is that this parse tree was not meant as a user facing AST, so it a) lacks some abstraction and b) is not meant as a stable API. Committing to it as it stands would reduce our future flexibility to evolve Cython as a language, so I'm not very eager to open it up at this point.

If you're interested in the parser part, then look at Cython/Parser/. It contains an initial attempt at a pgen grammar that is a little outdated by now, but was a first step (by @robertwb) towards replacing the custom parser in with a "standard" generated parser, based on the CPython grammar. Note that the C-implemented CPython pgen tool is up for removal, but there is already a Python replacement in lib2to3.

That's not an easy project, though. In order to make sure the grammar does not get stale again, we'd best use it ourselves as a parser in Cython, which means that we'd have to extract some smartness from the existing parser into later pipeline steps. That should be somewhat straight forward in most cases, but needs some work. And it also has a somewhat open outcome in that the result needs to convince us to give up the current convenience of just being able to do stuff in the existing parser. The main advantage would certainly be that such a parser could be reused and extended by external tools, such as a linter. Although I don't know of any linting tools that are directly based on Python's Grammar file… Anyway, #1628 is the right ticket for that.

#1461 seems unrelated. It's mostly about fixing the way Cython generates line tracing calls. Probably trivial to fix once we know when/where exactly to report the line execution.

@scoder scoder changed the title Linting policy? Code base cleanups Feb 8, 2019


This comment has been minimized.

Copy link
Contributor Author

jbrockmendel commented Feb 8, 2019

Cleaning up Plex seems good

I'll put together a PR.

I doubt that using six would provide major simplifications.

six is nixed, got it.

Removing the jedityper seems reasonable.

Will do, probably a separate PR (I hope you're OK with the multiple-self-contained-PRs style)

I'm not sure what you mean by "a stronger separation" between parsing and code generation.

Anyway, #1628 is the right ticket for that.

#1461 seems unrelated.

The three are connected in as much as the complex dependency graph makes a serious barrier to entry for newcomers (at least this one). The "stronger separation" suggestion is speculation about refactoring that might make this easier; even if that were a high priority, mine is not a particularly expert opinion. I'll take this part of the conversation to #1628.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment