Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Public Interface #145

Open
alecandido opened this issue Aug 31, 2022 · 9 comments
Open

Public Interface #145

alecandido opened this issue Aug 31, 2022 · 9 comments
Assignees
Labels
refactor Refactor code
Milestone

Comments

@alecandido
Copy link
Member

alecandido commented Aug 31, 2022

We have to decide the public interface of EKO, for the following reason: at the moment, everything is potentially public.
In this way, we won't ever have a v1.0.0, since it is impossible that all internal details will be forever stable.

We also have to decide how to discriminate the public interface from the non-public one. Essentially, I know two ways:

  1. Python underscore convention: if a name is led by an underscore, e.g. _myfunc, then it is private
  2. Numpy export style: everything and only what is accessible in the top-level scope is public, i.e.
    import numpy as np
    
    np.myobj
    then myobj is public

The point about majors is preserving backward compatibility, if we only add stuffs to the public API we won't need a new major ever.
So new major are needed only for refactoring the public API. Otherwise, EKO will be v1.x.y forever :)


the list of exposed features should be (at least):

@felixhekhorn felixhekhorn added the refactor Refactor code label Sep 5, 2022
@felixhekhorn
Copy link
Contributor

Let's do 2) first and then, if needed, afterwards 1)

I'll edit the top entry to add a list of needed entries

@felixhekhorn
Copy link
Contributor

We also have to agree on a strategy for ekobox and ekomark

@alecandido
Copy link
Member Author

We also have to agree on a strategy for ekobox and ekomark

That's fair, but less urgent. ekomark in particular is really a byproduct, mainly for internal use (while ekobox is somewhat more relevant, getting closer to the user).

@felixhekhorn
Copy link
Contributor

I believe pandas is following our current best proposal and it provides the two functions we need (here and here) together with the pre-commit hook

@alecandido
Copy link
Member Author

alecandido commented Jan 27, 2023

First, we definitely need to prioritize this issue.

In particular, we need a way to mark the package content as private/public, in order to start doing it.

The general scheme is that everything is going to be private, unless those things that we really want to provide to the users, and we know that are going to be useful.
Examples:

  • ekore will be mostly public, but some content of harmonics might be kept private
  • ekomark will be entirely private
  • ekobox is user-facing, but if possible is going to be mostly private as well: the CLI itself is going to be public, but not the CLI defining functions, so ekobox.cli will be private, the eko_identity and product will be private (they have no known external users)
  • eko is complex and complicated: mellin is private without any doubt, msbar_masses is definitely private for the time being, basis_rotation is not private (used everywhere) and we have to decide which part of its content is public, and interpolation should be private, because it is messy and prone to change, but definitely used in yadism, and so on...

We want to allow public members to access private members, but this is complicated because when importing a module, it enters the scope of the importing one.
Thus, the current proposal is:

  1. to prepend an _ to the higher level object being private (subpackage, module, member), the objects inside do not need any leading _ (since they are intended to be already in a private scope)
  2. while importing an object not lead by _ directly into a public scope, rename it with:
    from .my.module import myobj as _myobj

We also want a tool to automatically check that rule 2. is respected (so it is not violating the publicity defined by rule 1. and underscores).
If such a tool exists, we will use it in CI and pre-commit. Otherwise, we will write ourselves using standard library's ast module to parse top-level import statements in all public scopes. This will be placed in the workflows repo and used to migrate to explicit publicity also the other packages. A suitable pre-commit hook will be provided.

The following stages will be followed:

  • a 1.0.0 alpha version will be released the moment the explicit publicity will have been defined in the whole package
  • a 1.0.0 beta version will be released as soon as all the main ingredients known to be strictly required (QED, N3LO, distributed computation) will be merged (only those that are going to make big breaking changes)
  • once we will have tested for a while the beta version, and we have fixed all the initial bugs (for sure present) we will release the first 1.0.0 release candidate
  • after a fixed amount of time without breaking bugs, we will finally release the actual 1.0.0
  • all non-breaking issues are going to be post-1.0.0

Please @NNPDF/theory start marking EKO issues according to the latest possible required stage, i.e. do not make your issue beta, if it is not a big breaking change, and if it is not breaking at all it has to be post.
Example: the migration to Rust will be a big step, but we can preserve a non-breaking Python interface, so #185 will be marked as post. The breaking part is already in #194

@alecandido alecandido pinned this issue Jan 27, 2023
@alecandido
Copy link
Member Author

alecandido commented Jan 29, 2023

Because of the we-are-adults-here policy, the state of solutions for public API definition in Python packages is rather poor.

Here possible approaches:

  1. leading underscores: officially supported and widely used, but a bit cumbersome to apply consistently at modules and sub-packages level
    • problem: essentially, you would like the content of _module.py to be private, even if it is not lead by an _ itself, e.g. ciao, but if you import it with from ._module import ciao in a public module pubmod.py, that it will be to the user as pubmod.ciao, and it will look like public itself
    • solutions:
      • put leading underscores to any private variable, regardless of being already in a private module (ugly and cumbersome)
      • always rename private imports in public modules, i.e. from ._module import ciao as _ciao
  2. https://gitlab.com/warsaw/public (not much adopted)
  3. https://github.com/andrewp-as-is/public.py (not much adopted)

The preferred solution at the moment is 1. with import renaming.
It would be possible to write a function import_from_private(), but we would lose autocompletion and more language support, available to the native import.
Better to write an automatic checker, and possibly automated formatter (renaming violating variables).

@alecandido
Copy link
Member Author

alecandido commented Jan 29, 2023

Otherwise, we will write ourselves using standard library's ast module to parse top-level import statements in all public scopes.

This was half a lie: ast is enough to analyze the code, e.g.

import ciao
from ciao import come
from come import va as bene

into

Module(
    body=[
        Import(
            names=[
                alias(name='ciao')]),
        ImportFrom(
            module='ciao',
            names=[
                alias(name='come')],
            level=0),
        ImportFrom(
            module='come',
            names=[
                alias(name='va', asname='bene')],
            level=0)],
    type_ignores=[])

but it is meant to be compiled to executable code, not to be dumped again.

For round-trip parsing, we need a Concrete Syntax Tree, not an abstract one. One of the following should perfectly work:

  • https://github.com/Instagram/LibCST written and supported by Instagram, it has exactly our scope
  • https://github.com/PyCQA/astroid used by PyLint and other related tools internally
    • it is just another, more elaborate, AST parser - for us ast would be sufficient
  • https://github.com/davidhalter/parso powering Jedi (that is used by IPython)
  • unfortunately Black is using its own internally, I would not recommend taking it (since it is not part of the public API, speaking of...)
  • isort as well uses its own, of course fully focused on imports (but it's internal, and adds also support for Cython...)
  • pyupgrade uses ast, and similarly other related tools (or even directly strings manipulation)

So the actual options are three: LibCST, parso, or stick to ast (and do not attempt to replace).

In case we want to limit deps (not incredibly critical) but still attempt replacement, we could also use ast, using the nodes' lineno and endlineno attributes to delete the lines and write a very stupid replacement (manually encoded in a string), followed by an isort step.

@alecandido
Copy link
Member Author

A popular solution (adopted by Pandas, but I seem to remember to have seen also somewhere else) is to move the full implementation inside a core package essentially:

cd src/eko
mkdir core
mv * core/

And then use modules and sub-packages outside core to define the public API.

With this convention, everything that is outside core is public, and inside core is private (and you can never export an entire module from core to somewhere else, because you would export also its imports).
This is not a Python convention, but it is also a simple enough and sensible convention. We could make more Pythonic by using _core.

However, it adds the overhead of manually defining all the public modules. But, possibly, it is more clear and less cumbersome than writing our own checker...

@alecandido
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Refactor code
Projects
None yet
Development

No branches or pull requests

2 participants