
libconda: uncouple application from library #13789

Open
7 tasks
jaimergp opened this issue Apr 11, 2024 · 5 comments
Labels: epic, source::contributor, type::tech-debt

Comments

@jaimergp
Contributor

jaimergp commented Apr 11, 2024

Checklist

  • I added a descriptive title
  • I searched open reports and couldn't find a duplicate

Summary

This is the meta issue that lists the strategy for, and tracks ongoing efforts toward, a cleaner separation between library and application in conda, the Python project.

For the rationale and previous discussions on the matter, refer to #11925.

I put together a rough sketch of how this repository would look if the different components were already uncoupled. See jaimergp/conda:libconda. Note that this only moves files around; the imports were not adjusted or tested. A real-world example can also be seen in the mamba-org/mamba project: libmamba contains the library and API, while micromamba implements a CLI on top of the API parts of libmamba.

The gist of the proposal is to have something like this:

.
├── .github/workflows: redesign so file filters only test the required packages in PRs 
│                      (for reduced CI runtime), but everything in main
├── docs/: general conda usage documentation, development guidelines, repository structure; 
│          it will rely on the sub packages to generate the API stuff and other 
│          subpackage-specific docs.
├── recipe/: the conda recipe that packages everything
├── src/
│   ├── conda-app/: the application that bundles everything, maybe just a conda recipe
│   │   ├── conda_app/: the configuration system, initialization, the shell integrations,
│   │   │               and the CLI
│   │   └── pyproject.toml
│   └── libconda/: the Python library, which does not import anything else
│       ├── conda/: named like this to keep the same import, but we can think about it
│       └── pyproject.toml
├── README.md and other stuff
└── base linter, pytest configuration for all subprojects too?
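The rule that libconda "does not import anything else" is something CI could enforce. A minimal sketch, assuming the application-side package names from the trees in this issue (conda_app, conda_cli, conda_shell), which are still only proposals: it parses source with the standard-library `ast` module and reports every absolute import of a forbidden top-level package. Dedicated tools such as import-linter cover the same ground with more options.

```python
import ast

# Hypothetical application-side packages the library must never import.
APP_PACKAGES = {"conda_app", "conda_cli", "conda_shell"}


def forbidden_imports(source: str, forbidden: set[str], filename: str = "<string>") -> list[str]:
    """Return "line: module" for each absolute import of a forbidden top-level package."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # not an import, or a relative import that stays inside the package
        for module in modules:
            if module.split(".")[0] in forbidden:
                hits.append(f"{node.lineno}: {module}")
    return hits
```

In CI this would run over every `.py` file under `src/libconda/` and fail on any hit. It only sees static `import` statements, so dynamic imports via `importlib` would still slip through.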

We can also go deeper and split conda_app further:

...
├── src/
│   ├── conda-app/: the application that bundles everything, maybe just a conda recipe
│   │   └── recipe/: the conda recipe for everything
│   ├── conda-config/: the configuration system that replaces the default values 
│   │   │              provided by libconda. maybe this is still part of libconda, though.
│   │   ├── conda_config/: a Pydantic-based config system (Travis has something for this)
│   │   └── pyproject.toml
│   ├── conda-shell/: the activation and initialization stuff
│   │   ├── conda_shell/
│   │   └── pyproject.toml
│   ├── conda-cli/: the command line interface (allows different frontends!)
│   │   ├── conda_cli/
│   │   └── pyproject.toml
│   └── libconda/: the Python library, which does not import anything else
│       ├── conda/: named like this to keep the same import, but we can think about it
│       └── pyproject.toml
...

We can start small and then split further if deemed necessary.

Challenges

One of the main problems is how to do this without blocking work on other items that depend on the current layout (just conda/). I suggest starting a new branch, libconda, and then creating small PRs targeting that branch until everything passes. We should try to move as quickly as possible to reduce the amount of rebasing needed.

If possible, the layout and import refactor should be automated in a script so it can be re-applied to open PRs once the refactor is merged. This only makes sense if we can guarantee the script works well, and that guarantee could itself involve a lot of work. I'm assuming this is mostly about "moving files and renaming imports", but I expect we'll also need to move functions and classes between modules, plus other not-so-automatable work.
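A minimal sketch of the "renaming imports" half of such a script, assuming a hand-maintained table of old module paths to new ones (the entries below are hypothetical). It uses a regular expression over `import X` and `from X import ...` lines; a production version would more likely use a concrete-syntax-tree library such as LibCST, which is syntax-aware and preserves formatting and comments.

```python
import re

# Hypothetical old -> new module paths; the real table would be produced
# while moving files into the new layout.
MODULE_MOVES = {
    "conda.cli": "conda_app.cli",
    "conda.activate": "conda_shell.activate",
}


def rewrite_imports(source: str, moves: dict[str, str]) -> str:
    """Rewrite the module path in `import X...` and `from X... import ...`
    statements when it equals, or is a dotted child of, a key in `moves`."""
    # Longest keys first so a more specific move wins over its parent.
    alternatives = "|".join(re.escape(old) for old in sorted(moves, key=len, reverse=True))
    pattern = re.compile(
        rf"^(?P<head>[ \t]*(?:from|import)[ \t]+)(?P<mod>{alternatives})(?=[\s.,]|$)",
        re.MULTILINE,
    )
    # Only the matched prefix is replaced; the rest of the line is untouched.
    return pattern.sub(lambda m: m["head"] + moves[m["mod"]], source)
```

What it does not handle is telling: `from conda import cli`, modules after the first one on `import a, b` lines, string-based imports, and lines inside docstrings that happen to look like imports. Those cases fall under the not-so-automatable work.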

Questions

  • How do we make sure this is delivered without entering a cherry-picking / rebasing hell?
  • Survey of opinions: What's your preferred split? How do you envision the project structure? If you had to import from libconda, what would it look like in your head?

Linked Issues & PRs

This list will be updated as needed to reflect ongoing work. The list below is just an idea of the tasks needed.

Tasks

jaimergp added the epic, type::tech-debt, and source::contributor labels on Apr 11, 2024
jaimergp self-assigned this on Apr 11, 2024
jezdez changed the title from "[META] libconda: uncouple application from library" to "libconda: uncouple application from library" on Apr 11, 2024
beeankha added this to the 24.9.X milestone on Apr 15, 2024
@schuylermartin45
Contributor

I am generally in favor of having a second dev branch (tracking main) for this kind of work. I've used this strategy in the past, and as long as its approval process matches main's, it will make the big nasty merge PR back to main at the end a little less painful.

My only immediate concern is that we should feel confident (enough) in our automated tests before we take on a massive refactor to ensure stability for our users. I've seen so many refactors go south because features broke or got dropped without anyone realizing it.

As for the refactoring of module paths, I know VS Code has some ability to do this natively but off the top of my head, I don't know of a script that does this. I'm sure one exists.

@jaimergp
Contributor Author

Thanks @schuylermartin45! One thing I want to ensure is that we have a list of all importable symbols at the beginning of the refactor, plus some kind of import checker on CI, so all those "routes" are available at the end of the exercise (similar to the sitemap in a website + a link checker). We can always forward things with apipkg.
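The "sitemap plus link checker" idea could start as small as the sketch below, which uses only `importlib` and `pkgutil`: `snapshot_symbols` records every public `module:attribute` route under a package, and `missing_symbols` reports recorded routes that no longer resolve. Storing the snapshot as a text file in the repository and running the check in CI is an assumption about how it would be wired up. The example runs against the standard-library `json` package so it works without conda installed.

```python
import importlib
import pkgutil


def snapshot_symbols(package_name: str) -> set[str]:
    """Record every public `module:attribute` route reachable under a package."""
    package = importlib.import_module(package_name)
    module_names = [package_name]
    if hasattr(package, "__path__"):  # only packages have submodules to walk
        module_names += [
            info.name
            for info in pkgutil.walk_packages(package.__path__, prefix=package_name + ".")
            if not info.name.endswith(".__main__")  # __main__ modules may run code on import
        ]
    routes = set()
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            continue  # e.g. optional dependencies or platform-specific modules
        routes.update(f"{name}:{attr}" for attr in dir(module) if not attr.startswith("_"))
    return routes


def missing_symbols(recorded: set[str]) -> list[str]:
    """Return the recorded routes that no longer resolve, sorted for stable CI output."""
    missing = []
    for route in sorted(recorded):
        module_name, attr = route.split(":")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            missing.append(route)
            continue
        if not hasattr(module, attr):
            missing.append(route)
    return missing
```

Because it relies on `dir()`, the snapshot also records names a module merely re-imports; whether those count as public routes is a policy decision, and an `__all__`-only variant would be stricter.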

@kenodegard
Contributor

🌶️ I still don’t think this is needed, at least not this drastic.

How does splitting conda mesh with CEP 8 and CEP 9? Is libconda subject to both? Or are we ripping out a large portion of conda’s source into a new library in order to revert back to SemVer? How often will libconda be released? Will it be in parallel with conda_app?

The conda application is already siloed within conda.cli (conda/shells is perhaps an abomination, but we can address that without creating libconda), so I see this as a highly complex refactor that won't help decrease overall complexity.

I completely agree that conda as a library needs to be improved and fixed, but IMO this can be done by continuing to do what we’ve been doing; fix logging (only call init_logging for CLI), fix error handling (remove all sys.exit calls, only use error reporting for CLI, etc.), rethink organization (e.g., conda.cli should not contain any logic just the CLI interface), etc.

@jaimergp
Copy link
Contributor Author

Thanks @kenodegard! I'll reply inline with my current point of view and the constraints/goals I have in mind. I'm open to changing my stance on almost all points provided we can satisfy the desired outcomes. I'm excited about the opportunities this work presents and, to be clear, I don't intend this to be a solo project. I will gladly welcome anyone who wants to participate!

> How does splitting conda mesh with CEP 8 and CEP 9? Is libconda subject to both? Or are we ripping out a large portion of conda’s source into a new library in order to revert back to SemVer? How often will libconda be released? Will it be in parallel with conda_app?

I am not proposing to change versioning schemes or release cadences. All subprojects would share the same release cycle and versioning. libconda doesn't have to follow SemVer. If needed, we could have dual versioning for libconda, but I feel CalVer plus the deprecation cycle is working nicely already.

> The conda application is already siloed within conda.cli (conda/shells is perhaps an abomination, but we can address that without creating libconda), so I see this as a highly complex refactor that won't help decrease overall complexity.

The command-line setup is more or less siloed (with the exception of conda.activate, conda.initialize, conda.plan and whatnot), but imo the application logic is leaking eeeeverywhere. There are also instances of logic that would be worth having in a library but is currently confined to the cli namespace (e.g. conda list). Some other examples off the top of my head:

  • Almost every core class is metaclass'd to implement memoization and singletons. This can be surprising when doing library work. One example that comes to mind is PrefixData: it is cached per path, so if you want to compare states after a prefix modification, you need to clear the cache.
  • The context singleton is everywhere, and it's always initialized from the user's on-disk configuration because the hooks run at import time. That initialization is only relevant for the command-line application.
  • The output reporting in the CLI is done from every part of the project; e.g. the Solver API class is also responsible for printing the Spinner objects in place.
  • Let's not talk about logging too much :P
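To illustrate the PrefixData surprise without depending on conda internals, here is a hypothetical per-path caching metaclass and a stand-in class (`PerPathSingleton` and `FakePrefixData` are illustrative names, not conda's implementation). The second construction for the same path returns the cached object, still holding the snapshot taken before the prefix changed, until the cache is cleared by hand.

```python
import tempfile
from pathlib import Path


class PerPathSingleton(type):
    """Hypothetical metaclass: hands out one cached instance per resolved path."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = {}  # each class using this metaclass gets its own cache

    def __call__(cls, path):
        key = Path(path).resolve()
        if key not in cls._instances:
            cls._instances[key] = super().__call__(key)
        return cls._instances[key]


class FakePrefixData(metaclass=PerPathSingleton):
    """Stand-in for a prefix-scanning class; not conda's PrefixData."""

    def __init__(self, path: Path):
        self.path = path
        # The directory is scanned once, when the instance is first built.
        self.packages = sorted(p.name for p in path.glob("*.json"))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.json").write_text("{}")
        first = FakePrefixData(tmp)
        Path(tmp, "b.json").write_text("{}")  # the prefix changes on disk...
        second = FakePrefixData(tmp)           # ...but the cached object comes back
        stale = second.packages
        FakePrefixData._instances.clear()      # manual cache invalidation
        fresh = FakePrefixData(tmp).packages
        return second is first, stale, fresh
```

Calling `demo()` returns `(True, ['a.json'], ['a.json', 'b.json'])`. A more library-friendly design would make caching opt-in, for example through a cache object owned by the caller, so that constructing the class always reflects the current state of the disk.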

My intention here is to uncouple the two layers explicitly so there's a namespace boundary. I agree it is a complex refactor. I'd concede it doesn't reduce complexity; instead, it compartmentalises it. After years of contributing to the project I still don't know the differences between conda.core, conda.base and conda.common :D

> I completely agree that conda as a library needs to be improved and fixed, but IMO this can be done by continuing to do what we’ve been doing; fix logging (only call init_logging for CLI), fix error handling (remove all sys.exit calls, only use error reporting for CLI, etc.), rethink organization (e.g., conda.cli should not contain any logic just the CLI interface), etc.

My concern with this continuous improvement plan is that it'll be tricky to define when we are done and what the deliverable is. We could leave things in the same package namespace if you want, provided there's a well-defined end state we'd like to achieve. However:

  • This is also an opportunity to change how things are packaged and distributed.
  • We could finally have a reasonable core (and lightweight?) library other projects could depend on. Even on PyPI if that's desired.
  • The namespacing should help folks navigate the codebase better. If it doesn't improve the current situation, then we chose a bad design. This is what this issue is here for too.
  • It will also provide the ground work towards turning legacy code into legacy plugins, which should alleviate maintenance costs and simplify deprecation cycles while giving the opportunity for other teams to take over maintenance if they do need it (thinking of conda activate and friends).

All in all, I don't have a strong opinion on how to achieve the following items as long as the situation improves:

  • Clear separation of library and application.
  • Easier navigation of the codebase for newcomers.
  • Improved packaging and distribution opportunities.
  • Encapsulation of unwanted tech debt in isolated components that can be maintained separately.

@travishathaway
Contributor

travishathaway commented May 2, 2024

@jaimergp,

I had a little time to think through this proposal and have the following feedback for you...

First, I like the lines of separation that have been drawn. I've long thought that having config be its own thing is a good thing and will make it easier for plugin authors to use it in the future. My only fear is that the context object has become a kind of God Object in our current code and may require some careful refactoring before it can be separated into an isolated module.
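One way to read "config as its own thing" is that library code receives a settings object explicitly instead of consulting a singleton populated at import time. The sketch below uses a frozen standard-library dataclass as a stand-in for the Pydantic-based system mentioned in the proposal; `Settings`, `from_sources`, and `plan_install` are hypothetical names, and only three conda setting names are modeled.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical, trimmed-down stand-in for conda's context object."""

    channels: tuple[str, ...] = ("defaults",)
    always_yes: bool = False
    pkgs_dirs: tuple[str, ...] = ()

    @classmethod
    def from_sources(cls, *sources: Mapping[str, Any]) -> "Settings":
        """Merge mappings left to right (later wins) and reject unknown keys."""
        known = {f.name for f in fields(cls)}
        merged: dict[str, Any] = {}
        for source in sources:
            unknown = set(source) - known
            if unknown:
                raise ValueError(f"unknown settings: {sorted(unknown)}")
            merged.update(source)
        return cls(**merged)


def plan_install(specs: list[str], settings: Settings) -> dict[str, Any]:
    """Library-side function: gets its configuration as an argument, never from a global."""
    return {
        "specs": list(specs),
        "channels": list(settings.channels),
        "confirm": not settings.always_yes,
    }
```

Under this split, the application layer would build `Settings` from defaults, condarc files, environment variables, and CLI flags at startup and pass it down; importing the library would not read anything from disk.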

The second thing I worry about with this proposal is that we don't go far enough and fail to properly organize libconda. I understand that at first it will most likely just be "everything else", but we should prioritize refactoring and reorganizing it once the first phase is in place. That first phase would be this higher-level organization (i.e. getting things into the appropriate folders while separating out the configuration and shell-related code as well as possible).

Speaking of phases, it might be nice to create a broad roadmap where we can easily understand what is being done in each phase to see what the critical path is for development and the goal we eventually want to reach.

I think you listed the goals of the project in your response to @kenodegard well, but you may want to further elaborate on them in order to get a very clear idea of why this work is important and perhaps even come up with specific examples for the benefits it will bring.

Here are my answers to your questions:

> What's your preferred split? How do you envision the project structure? If you had to import from libconda, what would it look like in your head?

I really like how py-rattler has done this, specifically the quick start example in their docs:

https://mamba-org.github.io/rattler/py-rattler/

It might help us to start with an example that does something like that and use it to guide our decisions for refactoring/re-organizing libconda.

> How do we make sure this is delivered without entering a cherry-picking / rebasing hell?

Not sure about this! I also wonder how we deprecate all of this. 🤔 It probably won't be that big of a deal because most things will still live in libconda, which will keep the conda namespace. As we go forward, though, managing all these new import paths might get really tricky.
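For the deprecation question, one stdlib-only option (an alternative to apipkg, not its API) is a shim module at the old import path that forwards attribute access to the new location through a module-level `__getattr__` (PEP 562) and emits a `DeprecationWarning`. `install_forwarding_module` is a hypothetical helper; the demo forwards a made-up `legacy_json_alias` name to the standard-library `json` module.

```python
import importlib
import sys
import types
import warnings


def install_forwarding_module(old_name: str, new_name: str) -> types.ModuleType:
    """Register `old_name` in sys.modules as a shim that forwards attribute
    lookups to `new_name`, warning on each use (PEP 562 module __getattr__)."""
    shim = types.ModuleType(old_name, f"Deprecated alias of {new_name}.")

    def __getattr__(attr: str):
        # Leave dunder lookups (e.g. __path__) alone so the import system
        # treats the shim as a plain module rather than a package.
        if attr.startswith("__") and attr.endswith("__"):
            raise AttributeError(attr)
        value = getattr(importlib.import_module(new_name), attr)  # AttributeError if absent
        warnings.warn(
            f"{old_name}.{attr} is deprecated; import it from {new_name} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return value

    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim
    return shim
```

In the real layout the shim would more likely be an actual file left at the old path containing just the `__getattr__` function, so it can go through the existing deprecation cycle and be deleted when that cycle ends.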

Projects
Status: 🏗️ In Progress
Development

No branches or pull requests

5 participants