Merge all packages in a single pypi package (for real) #815
Hi @Fale, thanks for the feedback, but we do not plan to merge all packages into one.
@lmazuel thanks for the answer. Debian is not the only distro out there. I'm trying to understand whether the Debian approach is feasible for us as well, but from what I see, they have packaged only a minimal part of the Azure Python code. I also noticed that openSUSE has the same problem (BUG #525). Also, the argument that this approach is consistent with the other languages is pointless, since it is not consistent with Python (and guess what, the users of the python-sdk do not care about the other SDKs, while they do care about consistency with the rest of Python).
+1 for Fale's issue. The code base is more like a collection of individual SDKs rather than a single SDK with many classes. There is a lot of boilerplate code caused by the mass of PyPI modules. For example, each PyPI module has a setup.py (similar to project files in Visual Studio).
Thanks @CalvinHartwell for your comment :) @Fale I agree that the cross-language point is not interesting, let's forget I said that :) I cited Debian as an example because it's recent work for us, but I know there are more distros: RHEL, CentOS, SUSE, Arch Linux, etc. (Mandriva when I was young...). Please be sure I keep an open mind here and I'm very interested in the conversation (I really am). What I don't get is why you think a meta-package "azure" that installs some other packages is a problem. Meta-packages are common in the Linux world; they're the whole point of the dependency system. In this situation, you have (for instance, with random version numbers) a package "azure" 2.4.2 that will install "azure-mgmt-resource" 0.45.0 and "azure-mgmt-compute" 0.65.0 at the same time. Currently, the "azure" meta-package is not accurate because some core libraries are still in preview. But the final plan is to make sure that "azure" will install every stable package at the same time, with fixed versions. "Preview" services will be available for testing as separate packages if you want. Please note that we also have people who are happy to install exactly what they want (only the Azure services they need). Thoughts?
The current implementation has 3 problems from my point of view:
Also, the point of "many libraries" is a problem during the packaging effort, because for many distros (including Fedora and *EL, for which I'm looking at this SDK) every single PyPI package has to be managed in a different (rpm) package, and therefore to maintain the Azure SDK and Azure CLI I will have to maintain ~50 packages. On the other hand, I'm the maintainer of the AWS Python SDK and AWS CLI tool and those are only 3 packages (2 for the SDK, 1 for the CLI).
@Fale To discuss, let's assume I merge everything into one package. Botocore uses meta-descriptions of the REST API "on the fly"; we use our meta-descriptions to generate Python code, which takes more space on disk. Our biggest package (Web) is currently 1.8 MB. On average, packages are around 500 KB. This means that, with the 40 services of Azure, we can estimate the total size accordingly. Edit: Changed numbers to more accurate ones.
@Fale Also, why can't you put several Python packages inside one rpm file at the same time? Is there some technical limitation somewhere, or is it just unconventional?
Not to be read in a bad way, but your point is that your code is bloated and for this reason it is better to split it? This argument is not very strong, I think... As for putting more than one Python package in the same rpm: it is against policy in many cases, and in this specific case it is not possible (technically speaking), because an RPM package has a single version while the Azure packages all have different versions, and using the wrong version is definitely against the policies.
If you have a strong equivalence
Fedora guidelines force us to package things as much as possible "as the upstream" does. So, currently I should do the following packages:
Can I create the package python?-azure 2.1.2 that ships all files of the other modules? No
I can do what you describe only if the release cycle and the version number will be the same for all modules. |
So originally we also struggled with the issue of many Python packages vs. one rpm package, which is why we opened the other issue, and it took us a while to wrap our heads around how to approach this. We finally decided to basically follow the Python package strategy. So what we have today are 2 packages https://build.opensuse.org/project/show/Cloud:Tools?search=azure, python-azure-sdk and python-azure-sdk-storage. As the whole thing gets broken into smaller pieces with different upstream teams managing different code streams, I see the argument about the coordinated release problem. Also note that we decided to pull our sources from GitHub rather than PyPI. We did struggle with the way things are pushed to PyPI and found pulling from GitHub to be a better approach for us for package creation. To a certain degree we/I followed a similar approach with the ec2utils we provide in our Enceladus project, https://github.com/SUSE/Enceladus/tree/master/ec2utils, meaning different release cycles for each utility. Having different Python packages, which for us will eventually translate into different rpms, implies that client code, for us azurectl, https://github.com/SUSE/azurectl, can be more precise about dependencies, which is an advantage. We have not yet packaged the new az tools, thus I cannot speak to the effect on that packaging effort and dependency management from that point of view, but I am certain we'll solve that in a reasonable way. While I share the concern of @Fale regarding package proliferation as well as the multi-Python-package approach not being "Pythonic", there are equally valid arguments on the other side, meaning to have a sdk-storage package, maybe one for networking, etc. I think the argument about being "Pythonic" mostly comes into play after install. Meaning as long as I can `from azure.storage import ...`,
as a Python developer I really do not care whether site-packages/azure/storage and site-packages/azure/networking were installed by 2 distro packages or 1. Having multiple packages may be a bit more cumbersome for the developer to set up the system, meaning the developer has to potentially install many packages, but that can be easily done with a one-liner: `pkgmgr --search python-azure | grep sdk | xargs pkgmgr install` or a meta-package, i.e. we can easily create a python-azure-sdk-all package or create a pattern that pulls all the other packages. I think it is equally valid to look at each service Azure provides as a separate target and have a separate SDK for that target as it is to look at the API as a whole. That it was decided at the origination of boto, and now carried into botocore, that all of the AWS API should be in one SDK, one Python package, is just as valid a decision as the decision made here that each service should have its own SDK managed by separate teams. So long story short, from my perspective either way is fine. If from a development perspective things are managed more easily at Msft with multiple Python packages, we are game to follow that route.
Referencing the related issue created for the Python CLI, which also has separate packages: Azure/azure-cli#1055
From the Debian perspective, I am ignoring PyPI and packaging from the Git repo. The idea that in the future there will be changes to the individual packages which will then be released via PyPI and not git tags breaks this. If the releases of all the core modules (i.e. the ones in this repo) are synchronised, it makes everything a lot easier for me, and I guess for other distros too.
Thanks @rjschwei and @irl!
It will be long and painful, but I can make it work with the policies. Thanks
Yes, this makes packaging work much easier. The same code base referencing
Yes, and it should be possible to make this an automatic step. Regards,
@Fale fwiw - "do what upstream does" doesn't have to mean PyPI, as PyPI is a downstream distribution of the upstream; there's no reason not to package up the sources from Git (possibly my Debian-oriented frame of reference). I have one source package, but plan to build one binary package for each of the logical PyPI packages within that. @lmazuel That's perfect for me (: @derekbekoe This would work great for the azure-cli package also.
@irl: I think I'll go the GitHub way too. We do this in many situations :). I'll go for one src.rpm and multiple rpms, but even with a single src.rpm package, it will be a fairly complex spec to generate all the various sub-packages properly, considering files and versions etc.
@Fale I will just be shipping everything with the version number of the metapackage. Subcomponents may have differing versions, but they're all part of the larger "unified" release.
@irl How do you manage package dependencies that depend on the subcomponents? i.e.: https://github.com/Azure/azure-cli/blob/master/src/azure-cli-core/setup.py#L49
@Fale If there's a tagged unified release in git and the dependencies don't line up, then Microsoft has done a terrible job at release management. I don't anticipate this happening often.
@irl Microsoft's point is that they want different version numbers to be able to have different development cycles for the various parts of the codebase, so I anticipate this happening often going forward.
@Fale yes, between releases it may be broken and not all lined up, but the metapackage needs to be released with everything lined up, otherwise it will never be installable. So the idea would be to package in distributions when, and only when, the metapackage sees a release and the git repo is tagged.
Also:
@Fale not through the package management system they can't, and it's not a bug in your system if they've done something to break it. I fully anticipate other packages depending on the sdk, I have vagrant-azure in Debian depending on the Ruby SDK, and I'm quite happy to continue supporting this. I've had to patch the crap out of it to get it to work with the latest SDK, but as a distribution packager I expect to have to do some work occasionally. From what I can see, Microsoft are new to this and I'd rather compromise and accept a little extra work than make demands that they change their entire project management workflow and put them off engaging in open source communities. This situation is no different from any other situation where you have a library and it has dependencies, some external. As a distribution packager, you should be performing QA to catch these problems and working with upstream to find resolutions, or patching locally within your distribution to ensure all your packages line up.
A couple of points and then I'll stop with this since we are going OT:
@Fale I think we're aggressively agreeing with each other perhaps. The important thing is that there are releases of the metapackage that have all the dependencies working together nicely. To summarise my view:
There is a great article on the topic of packages vs. pip here: https://notes.pault.ag/debian-python/ |
@lmazuel the dependency dropping would solve this circularity problem :) |
@bear454 I mean "no ARM", so ASM +
Azure/azure-sdk-for-python#815 explains some of the issues
Is there a chance we could at least see another bundled release? I'm trying to maintain packages for azure-cli on Arch, but I've run into a problem where the current 2.0.0rc6 release is too old and results in module errors, while the git builds are too new, resulting in a different set of errors. I've tried building each module in this repo separately, but there's a ridiculous number of them and several of them seem to be unable to install independently, as needed for Arch packaging. I can see the argument for this because of independent release cycles, but as a user I don't care. I need something that just works. The easiest solution, I think, would be to package major releases every so often that contain stable versions of each module.
Hi @optlink Yes, the rc7 is planned. The problem is still that for a meta-package to be tagged as "stable", I need all sub-dependencies to be stable. And that's not the case currently. However, if you do packages for the CLI, the CLI uses the same approach: it is cut into services and sub-packages. For instance, this is the Network implementation of the CLI, and this package follows a specific version of Network directly (its 2.0.1 is linked to 0.30.0). So even if I do an rc7, I can't ensure that all packages will match all the sub-dependencies of the CLI. And even if I ensure it today, this can change tomorrow with an update of Network or something else. I'm not sure I understand your constraints clearly (I'm an Ubuntu user, I just know Arch by name, sorry :-( ), but send me an email at MS (<githubalias> at microsoft.com) and we will discuss with the CLI team how to help, or at least brainstorm something. FYI @derekbekoe @johanste
On 03/30/2017 11:03 AM, Kelsey Maes wrote:
<snip>
For what it's worth. So far we have also created a bundled package for
openSUSE and SUSE Linux Enterprise. However, we are starting with
packaging the az tools (azure-cli) and it is also split into many
pieces. Those in turn depend on the pieces of the SDK rather than the
SDK as a whole. Thus as a packager you end up having to either maintain a
large number of packages or a large number of directives for provides in
order to sort out the proper version dependencies. We are going down the
road of many packages.
Thank you @rjschwei for your message. We will still continue to release one package per service, but do you have a suggestion - a zip/tar.gz/package/tool or something - that we could provide to simplify your process?
On 04/03/2017 12:19 PM, Laurent Mazuel wrote:
<snip>
I don't think there is anything else to do. The only simplification
would be to release everything as one, but we've already had that
discussion, and then have the cli package depend on that SDK version.
Anyway, since the cli gets released as Python packages based on service
components that in turn depend on Python packages that are released per
service component, that's really the model we have to follow on the
packaging level.
Hi! I have recently picked up the task to package the Azure SDK in openSUSE. For openSUSE, the current plan is to use the packages from the PyPI repository. However, while working through the various modules there, I ran into problems. On the other hand, I could also use the tarballs generated by the git tags in the GitHub repository. However, it's not clear to me which of these tags should be used when packaging the whole SDK while ensuring all modules are compatible with each other. If I read the discussion correctly, some of the modules can be too new so that they won't work with certain other modules anymore and can only be used individually. And if one wants to deploy the whole SDK, all modules must have the version belonging to a particular version of the whole SDK. So, my question now is: How do I get release tarballs with the proper versions for each module so that I get a complete and working SDK in the end? PyPI is currently apparently not the best source for the aforementioned reasons, and so are the releases created through the git tags on GitHub. Thanks!
I just figured out that the SDK releases are available as single tarballs generated from the git tags; they follow this pattern:
`https://github.com/Azure/azure-sdk-for-python/archive/v?(\d.*)[A-Z,a-z,0-9]*\.zip`
e.g.:
`https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz`
So, I suggest just pulling the tarball from there and using this as a base for the packaging.
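As a small sketch (a hypothetical helper, assuming the tag-to-URL pattern described above holds), the archive URL for a given release tag can be derived mechanically:

```python
def release_tarball_url(tag):
    """Build the GitHub archive URL for a tagged SDK release (hypothetical helper)."""
    return "https://github.com/Azure/azure-sdk-for-python/archive/%s.tar.gz" % tag

# The v2.0.0rc6 example from above:
print(release_tarball_url("v2.0.0rc6"))
```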
Hi,
On 05/10/2017 05:02 AM, John Paul Adrian Glaubitz wrote:
I just figured out that the SDK releases are available as single
tarballs generated from the git tags, they follow this pattern:
|https://github.com/Azure/azure-sdk-for-python/archive/v?(\d.*)[A-Z,a-z,0-9]*\.zip|
e.g.:
|https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz|
So, I suggest just pulling the tarball from there and using this as a
base for the packaging.
That doesn't work because the azure-cli releases depend on the
individual components of the SDK, not on the SDK as a whole.
So as a packager there are two choices:
a.) Create 1 package for SDK, as we pretty much do in openSUSE right now
and then have a very long list of Provides: statements where each
Provides lists a component. This list is going to be a PITA to maintain
and will inevitably be wrong and cause headaches
b.) Package each individual component of the SDK, the approach we are
now taking.
@derekbekoe, do you have any suggestions?
Hi @glaubitz, sorry I didn't answer earlier; I was busy with //build/ this week and PyCon next, and I wanted to take the time to answer you correctly. I just want you to be sure I'm not ignoring you; I'll be back with my full brain soon :)
Sorry, I wasn't clear enough then. I was not talking about creating a single RPM package, but about using the GitHub tarball as a single source. Not because I particularly prefer GitHub over PyPI, but rather because the packages on PyPI are either outdated or broken.
I agree and that's definitely not what I want. However, having to pull every
That's definitely what I want to do. However, my problem currently is that I don't know for sure which set of packages I should use. Should I: a) use the individually tagged tarballs, or b) just use the latest tarball available in the GitHub "Releases" tab, unpack that archive and generate the individual .zip files from there? For example, downloading https://github.com/Azure/azure-sdk-for-python/archive/azure-keyvault_0.3.3.tar.gz, unpacking it and creating the individual .zip files using that archive. The reason I ask is that each of these tarballs always contains the complete SDK and not just the individual package. It's just confusing that the individual packages and the complete SDK show up on the same "Releases" tab. A release normally indicates something that is stable - or at least beta - that users can download and use. That's why tagging releases for the individual packages while still shipping the complete SDK is confusing as hell. Adrian
To elaborate a little more: I just ran my small script over the unpacked
@glaubitz I'm not really sure that managing tens of highly coupled packages is any easier... and that's why I opened this ticket (aka: I think it is not possible to manage this project in a sensible way, and this is why - given the IMHO unsatisfactory answers - I'm not packaging this for Fedora/EL).
@Fale But you can just download the tarballs from the GitHub releases page and you get all modules in a single tarball. In fact, that's what @irl is doing for Debian, and since the Debian version is currently at v2.0.0rc6, it has fewer modules than are currently visible in the GitHub repository. I generally don't have a problem juggling a large number of sources - it's just a matter of good packaging tools, after all - I'm just confused as to which versions to use for a stable distribution.
Hi @glaubitz I understand it's complicated, really :/. GitHub is not really built to host several packages in one repo. Here are some answers:
Let's be pragmatic about what you want (before talking about how to do it): do you want to release one package like Debian does, i.e. python-azure 2.0.0rc6? Or separate packages for each component? As @rjschwei was saying, the CLI is using each component package independently, so we might have an issue with that. Let's say we can sync the azure-cli and azure-sdk bundle packages; do you want to:
Once I get what you want the user experience to be, we will figure out the "how".
Hi @lmazuel
You could put each package into a separate git repository and then use git submodules to reference these modules in the git repository for the whole package. Lots of projects actually do this when they use third-party libraries like ffmpeg.
Ok. Thus, anyone wanting to use the releases from GitHub really needs to download the tarball separately for each tagged package version.
Isn't
Sure. Will do that once I have finished writing this message ;).
I want to release separate components. But I also want these components to work with each other; at least that's what users are going to expect. If they use the package manager to install them, they expect the result to work together. For me as the packager, it doesn't really matter whether the whole SDK is released in one tarball or as individual packages. I am writing some simple scripts that will help me deal with the upstream format to generate the RPM packages. What matters is that I know which versions I have to use to be able to assemble something that is going to work in the end on the user's side. For example, if you have released any of the packages in a version which breaks compatibility with most of the other packages, I will naturally not use the latest version of that particular package. I will use the version which is still compatible with the rest and only update once all the other packages have made the transition upstream.
Yes, that's what I want. But again, creating a single package out of individual packages or vice versa is not the actual problem. The problem I have is that I don't know which versions are compatible with each other to form a complete, working SDK.
So, here's what I suggest: If I understand correctly, all the various packages are developed separately. So these packages should naturally end up in separate git repositories. Then use git submodules to link the packages in the main git repository of the Azure SDK. Git submodules allow you to link specific git commit versions of another repository. Thus, you are able to assemble the SDK from specific versions that are known to work together, and you always have something releasable. If users want to use individual packages, they'll download the tagged tarball from the corresponding package's repository. If they want the whole SDK, they just download the latest tagged version as a tarball.
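The submodule layout suggested above might look roughly like this (a sketch with hypothetical repository names; the per-component repositories do not actually exist today):

```ini
; .gitmodules in the top-level azure-sdk-for-python repository (sketch;
; the component repository URLs below are hypothetical)
[submodule "azure-mgmt-resource"]
	path = azure-mgmt-resource
	url = https://github.com/Azure/azure-mgmt-resource.git
[submodule "azure-mgmt-compute"]
	path = azure-mgmt-compute
	url = https://github.com/Azure/azure-mgmt-compute.git
```

Each submodule entry pins an exact commit of its component repository, so tagging the top-level repository captures a set of component versions known to work together.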
Ideally we'd have 1 upstream tarball for each, the SDK and the CLI, such that we can create python-azure-sdk-x.y.z and azure-cli-a.b.c packages with azure-cli-a.b.c depending on python-azure-sdk-x.y.z. That's how the other guys do it ;) aws-cli has only a few dependencies, with python-botocore being the equivalent to azure-sdk as the primary dependency. Anyway, I understand, as does probably everyone else interested in this topic, that there are tradeoffs either way and going with a development model of individual components is just as valid a choice as going with a development model that keeps everything together. However, with the chosen model of many components, people downstream (packagers or direct users) still need to have some moment in time every now and then where all the pieces fit together. Based on the findings of @glaubitz, this point in time is incredibly difficult to determine. So somehow a mechanism should exist that allows us to pull what would be considered a consistent SDK. If the answer to that is "whatever is on pypi" then that's OK, and maybe we just have to clean up a few things that @glaubitz ran across on pypi and then we are good to go.
About SDK consistency:
- For packages that depend on msrestazure, the only requirement is ">= 0.4", meaning you can install azure-mgmt-resource 0.30.0rc6 and azure-mgmt-compute 1.0.0rc2 together with no issue. It's consistent in terms of installation, it's just weird in terms of features.
- For packages that do not depend on msrestazure (I think there are only three: azure-servicebus, azure-servicemanagement-legacy and azure-storage), they are independent and consistent from version 0.20.0.
Also, the source of truth for the code is the sdist on PyPI. It's easy to get with XMLRPC; example for azure-keyvault 0.3.3:

```python
import xmlrpc.client

client = xmlrpc.client.ServerProxy("https://pypi.python.org/pypi")
[pkg['url'] for pkg in client.release_urls('azure-keyvault', '0.3.3') if pkg['python_version'] == 'source'][0]
```

gives `https://pypi.python.org/packages/82/8b/9761cf4a00d9a9bdaf58507f21fce6ea5ea13236165afc0a0c19a74ac497/azure-keyvault-0.3.3.zip`

I'll discuss it with the CLI team today; I'll see if we can sync our releases (for instance, every 6 months). I want to release a 2.0.0, and I will try to use the exact same packages as CLI 2.0.6. This way you can package azure-python-sdk 2.0.0 as a whole, and package azure-python-cli 2.0.6 as a whole as well, depending on azure-python-sdk 2.0.0. Thoughts? FYI @johanste
On Mon, May 15, 2017 at 10:44:03AM -0700, Laurent Mazuel wrote:
About SDK consistency:
- For packages who depends on msrestazure, they must be have ">= 0.4". This is the only condition, meaning you can install azure-mgmt-resource 0.30.0rc6 and azure-mgmt-compute 1.0.0rc2 together with no issue. It's consistent in terms of installation, it's just weird in terms of features.
- For packages that not depends on msrestazure (I think there is three only, `azure-servicebus`, `azure-servicemanagement-legacy` and `azure-storage`), they are independant and consistent from version 0.20.0
Thanks. This answers my question.
Also, the source code truth is the sdist on PyPI. It's easy to get with XMLRPC, example for azure-keyvault 0.3.3:
```python
import xmlrpc.client
client = xmlrpc.client.ServerProxy("https://pypi.python.org/pypi")
[pkg['url'] for pkg in client.release_urls('azure-keyvault', '0.3.3') if pkg['python_version']=='source'][0]
```
gives
`https://pypi.python.org/packages/82/8b/9761cf4a00d9a9bdaf58507f21fce6ea5ea13236165afc0a0c19a74ac497/azure-keyvault-0.3.3.zip`
Aha, I wasn't aware of that. Thanks for the heads-up!
I'll discuss it with the CLI team today, I'll see if we can sync our release (for instance each 6 months). I want to release a 2.0.0, and I will try to use the exact same package than CLI 2.0.6. This way you can package azure-python-sdk 2.0.0 as a whole, and package azur-python-cli 2.0.6 as a whole as well, depending of azure-python-sdk 2.0.0
Thoughts?
We wanted to have separate packages in SUSE anyway, so that isn't
important. I really just wanted to know whether the version
dependencies are critical.
Thanks,
Adrian
<snip>
I'll discuss it with the CLI team today, I'll see if we can sync our release (for instance each 6 months). I want to release a 2.0.0, and I will try to use the exact same package than CLI 2.0.6. This way you can package azure-python-sdk 2.0.0 as a whole, and package azur-python-cli 2.0.6 as a whole as well, depending of azure-python-sdk 2.0.0
Thoughts?
That would be great but would require significant changes in the setup
of the CLI, i.e. within the components of the CLI the dependencies in
setup.py could no longer refer to the individual components of the SDK.
That's a bunch of work that will probably not fit the development model.
@rjschwei I'm not sure I get your issue. When you install a distrib package, the Python package metadata gets installed as well. @irl what do you think about that? Because if trying to sync SDK and CLI bundles makes no sense, I have no reason to do it.
@lmazuel, sorry for falling off the face of the planet for a bit and creating a large time gap in the discussion. You are correct that the installed rpm package will also leave behind the Python information to satisfy installing the CLI bits. Thus, if SDK and CLI releases can be synced such that cli-a.b.c depends on sdk-x.y.z, then we could go to a one-package model and we'd basically have 1 dependency in the CLI package. However, my concern with this approach would be that, to the best of my knowledge, no tools exist today to ensure this consistency. Of course such tools can be created, but in a sense these tools would counteract the development separation that has been instituted in the SDK and CLI projects. So if you'd go through the effort to sync everything, which would really be nice for packagers, I think the development model would have to change to a certain degree. Getting everything in sync would basically mean collecting all the components and verifying that their dependencies are consistent within each, the SDK and the CLI, and consistent across the boundary. Creating a tool that ensures such consistency should be reasonably straightforward, but it still has to be created and maintained. However, during the "development phase" this consistency is not necessarily given, meaning CLI component A may depend on version X of SDK component H and CLI component B may depend on version Y of SDK component H. Which is fine as long as at the end of the development cycle both CLI components A and B depend on the same version of SDK component H. This drift makes testing very difficult. Also, when there is a security issue, because continuous testing is difficult it will not be a good idea to release the security fix from the development branch. The security fix will have to be inserted in two places: the current consistent (synced) code, and the development code with a point release off the previous consistent set.
This can of course all be managed, but the point is that developers on two teams will have to work more closely together than it appears was intended when the current development model was chosen. If we look at the same problem using the many-packages approach, we can still get into a similar situation: if SDK component H gets a security fix and the version gets advanced, the CLI may now be broken. However, because the dependencies are not conglomerated, we know exactly which CLI packages need potential updates to accommodate the version bump of SDK component H. To make a long story short, a sync will make initial packaging easier, but individual packages will make dealing with version bumps due to security issues easier. One thing that would help tremendously would be if you and the CLI team could commit to semantic versioning http://semver.org/ at all levels and change the dependencies in all setup.py files accordingly, i.e. every dependency should be >= one major version and < the next major version. There should not be any exact version matches enforced. If we can get to that point, managing the plethora of packages will be reasonably straightforward.
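As a sketch of that suggestion (hypothetical package names and versions, not the SDK's actual dependencies), a setup.py dependency list under semantic versioning would bound each requirement by major version rather than pinning exact releases:

```python
# Hypothetical install_requires list: each dependency allows any release
# within one major version, never an exact pin.
install_requires = [
    "azure-mgmt-resource>=1.0,<2.0",
    "azure-mgmt-compute>=1.0,<2.0",
]

def satisfies(version, lower, upper):
    """Return True if lower <= version < upper, comparing dotted versions numerically."""
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(lower) <= to_tuple(version) < to_tuple(upper)

# A patch or minor bump within the major version still satisfies the range,
# while the next major version does not:
print(satisfies("1.4.2", "1.0.0", "2.0.0"))  # True
print(satisfies("2.0.0", "1.0.0", "2.0.0"))  # False
```

This is exactly what makes security point releases manageable downstream: a fixed 1.4.3 slots into every range that 1.4.2 satisfied, without touching any dependent package.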
Closing in favor of #1295.
Having tens of PyPI packages that are kind of united but not really makes it very difficult to package this in distros; please fix it (also because, due to how Python imports work, there is no real advantage in splitting into a huge number of small packages).