
Merge all packages in a single pypi package (for real) #815

Closed
Fale opened this issue Oct 10, 2016 · 63 comments
Labels
Packaging · question (The issue doesn't require a change to the product in order to be resolved. Most issues start as that.)

Comments

@Fale

Fale commented Oct 10, 2016

Having tens of PyPI packages that are loosely related but not really unified makes it very difficult to package in distros; please fix it (also because, due to how Python imports work, there is no real advantage in having a huge number of small packages).

@lmazuel
Member

lmazuel commented Oct 10, 2016

Hi @Fale

Thanks for the feedback, but we do not plan to merge all packages into one.
Each endpoint is managed by a different service team, and those teams don't share the same deadlines. For instance, just because the Batch team made a breaking change doesn't mean it's worth updating a big package that also includes Compute and Resources. Actually, most of our users use 2 or 3 packages (like Compute+Network) and don't want to install every service on their machine (size on disk + time to install).
In addition, this is a consistent experience across languages.
We are already working with Debian to provide one "azure" package with a frozen set of several packages, and we're happy so far with that solution. We will update the "azure" meta-package in the near future, so you'll be able to install one package with one command if you want, while still giving other people the opportunity to choose precisely what they want.

@Fale
Author

Fale commented Oct 11, 2016

@lmazuel thanks for the answer. Debian is not the only distro out there. I'm trying to understand whether the Debian approach is feasible for us as well, but from what I see, they have packaged only a minimal part of the Azure Python code. I also noticed that openSUSE has the same problem (bug #525).

Also, the argument that this approach is consistent with the other languages is beside the point, since it is not consistent with Python (and guess what, users of the Python SDK do not care about the other SDKs, while they do care about consistency with the rest of the Python ecosystem).

@CalvinHartwell

CalvinHartwell commented Oct 11, 2016

+1 for Fale's issue

The code base is more like an SDK of individual SDKs rather than a single SDK with many classes. There is a lot of boilerplate code caused by the mass of PyPI modules; for example, each PyPI module has its own setup.py (similar to project files in Visual Studio).

@lmazuel
Member

lmazuel commented Oct 11, 2016

Thanks @CalvinHartwell for your comment :)

@Fale I agree that the cross language point is not interesting, let's forget I said that :)

I cited Debian as an example because it's recent work for us, but I know there are more distros: RHEL, CentOS, SUSE, Arch Linux, etc. (Mandriva when I was young...). Please be assured that I keep an open mind here and I'm very interested in the conversation (I really am).

What I don't get is why you think a meta-package "azure" that installs some other packages is a problem. A meta-package is something common in the Linux world; it's the whole point of the dependency system. In this situation, you have (with made-up version numbers) a package "azure" 2.4.2 that installs "azure-mgmt-resource" 0.45.0 and "azure-mgmt-compute" 0.65.0 at the same time. So you apt-get/yum install azure and get several Python packages at once. What's the issue?

Currently, the "azure" meta-package is not accurate because some core libraries are still in preview. But the final plan is to make sure that "azure" installs every stable package at the same time, with pinned versions. "Preview" services will be available for testing as separate packages if you want.
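Such a meta-package can be sketched as a setup.py that ships no code of its own and only pins the component versions. This is a hypothetical illustration (the version numbers are the made-up ones from this comment, not real releases):

```python
# Hypothetical "azure" meta-package: it contains no code itself and only
# pulls in a frozen, mutually compatible set of service packages.
import sys
from setuptools import setup

# One meta-package version maps to exactly one version of each component.
PINNED = [
    "azure-mgmt-resource==0.45.0",
    "azure-mgmt-compute==0.65.0",
]

# Use a harmless display option so the script can be run directly for
# illustration (normally pip or "python setup.py sdist" drives this).
sys.argv = [sys.argv[0], "--version"]

dist = setup(
    name="azure",
    version="2.4.2",
    packages=[],                # nothing to install besides the dependencies
    install_requires=PINNED,
)
```

The key property is that installing `azure==2.4.2` always yields the same frozen set of component versions, which is what a distro packager can mirror one-to-one.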

Please note that we also have people that are happy to install exactly what they want (only the Azure services they need).

Thoughts?

@Fale
Author

Fale commented Oct 11, 2016

The current implementation has 3 problems from my point of view:

  1. You put a huge number of packages in the same repo. I get that this approach has some advantages, but it is exactly the opposite of git and Python best practices and common practices
  2. Every single package has its own release cycle (and therefore version number), so the azure one is not a real meta-package but an installer.
  3. If this is an SDK, it should have a single release cycle and version number; otherwise it is a collection of libraries (which is not the same thing as an SDK)

Also, the "many libraries" point is a problem during the packaging effort, because for many distros (including Fedora and *EL, for which I'm looking at this SDK) every single PyPI package has to be managed in a different (rpm) package, and therefore maintaining the Azure SDK and Azure CLI means maintaining ~50 packages. On the other side, I'm the maintainer of the AWS Python SDK and AWS CLI tool, and those are only 3 packages (2 for the SDK, 1 for the CLI).

@lmazuel
Member

lmazuel commented Oct 11, 2016

@Fale To discuss, let's assume I merge everything into one package. Botocore uses meta-descriptions of the REST API "on the fly"; we use our meta-descriptions to generate Python code, which takes more space on disk. Our biggest package (Web) is currently 1.8 MB. On average, packages are 500 KB. This means that with the 40 services of Azure, we can estimate the azure package would reach 20 MB. We plan to support several API versions for compatibility in the near future, which could easily lead to a Python package of 200 MB. Of course it's an estimate, but it's a likely scenario. What do you think?

Edit: Change numbers to more accurate ones
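The back-of-the-envelope arithmetic behind those figures can be written out; the per-package sizes are the rough numbers quoted above, and the API-version count is an assumption for illustration:

```python
# Size estimate for a hypothetical merged "azure" package, using the
# rough figures from the comment above (not measurements).
avg_package_kb = 500      # average size of one service package
num_services = 40         # approximate number of Azure services
api_versions = 10         # assumed number of API versions kept for compatibility

merged_mb = num_services * avg_package_kb / 1024
multi_version_mb = merged_mb * api_versions

print(f"one API version per service: ~{merged_mb:.0f} MB")
print(f"{api_versions} API versions per service: ~{multi_version_mb:.0f} MB")
```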

@lmazuel
Member

lmazuel commented Oct 11, 2016

@Fale Also, why can't you put several Python packages inside one rpm file at the same time? Is there some technical limitation somewhere, or is it just unconventional?

@Fale
Author

Fale commented Oct 11, 2016

Not to be read in a bad way, but your point is that your code is bloated and for this reason it's better to split it? This argument is not very strong, I think...

About putting more Python packages in the same rpm: it's against policy in many cases, and in this specific case it's not technically possible, because an RPM package has a single version while all the azure packages have different versions, and using a wrong version is definitely against the policies.

@lmazuel
Member

lmazuel commented Oct 11, 2016

If you have a strong equivalence azure 2.1.2 == azure-mgmt-resource 0.40.0 + azure-mgmt-compute 0.35.0 and so on, why don't you create a python3-azure package 2.1.2? There is no ambiguity, no cheating, and you have a single version to use. It's the approach used by Debian, and I don't see how it's against any policy. Really, I'd like to understand your point, but I'm not seeing the technical problem yet :(

@lmazuel
Member

lmazuel commented Oct 11, 2016

@rjschwei @bear454 @schaefi, would you like to share your point of view for Suse?
@irl, would you like to share your point of view for Debian?

@Fale
Author

Fale commented Oct 11, 2016

Fedora guidelines force us to package things as much as possible "as the upstream" does.

So, currently I should do the following packages:

  • python?-azure 2.1.2
  • python?-azure-mgmt-resource 0.40.0
  • python?-azure-mgmt-compute 0.35.0
  • etc

Can I create the package python?-azure 2.1.2 that ships all files of the other modules? No
Why?

  1. It violates the "stick to what upstream does" policy
  2. It will become a mess to manage, because other packages that depend on specific libs of the SDK (probably the majority) will declare dependencies on that specific module with a set of versions (or a specific version) which will not be 2.1.2 but could be, for instance, [>=0.30.0; <0.40.0] for azure-mgmt-compute.

I can do what you describe only if the release cycle and the version number are the same for all modules.
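The version-range point above can be made concrete with a small sketch. This is a toy dotted-version comparison purely for illustration (real resolvers like pip and rpm use proper version parsing): a downstream project pins the component's version range, which has nothing to do with the meta-package's 2.1.2:

```python
# Minimal illustration of the version-range clash described above.
# Toy parser: only handles plain dotted numeric versions.
def parse(version):
    return tuple(int(part) for part in version.split("."))

def satisfies(version, lower, upper):
    """True if lower <= version < upper (the [>=lower; <upper] range)."""
    return parse(lower) <= parse(version) < parse(upper)

# A downstream package depends on the component, not the meta-package:
assert satisfies("0.35.0", "0.30.0", "0.40.0")     # component pin is satisfiable
assert not satisfies("2.1.2", "0.30.0", "0.40.0")  # meta-package version is unrelated
```

This is exactly why shipping everything under the single version 2.1.2 breaks downstream dependency declarations that target the component versions.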

@rjschwei
Contributor

So originally we also struggled with the issue of many Python packages vs. one rpm package, which is why we opened the other issue, and it took us a while to wrap our heads around how to approach this. We finally decided to basically follow the Python package strategy.

So what we have today are 2 packages (https://build.opensuse.org/project/show/Cloud:Tools?search=azure): python-azure-sdk and python-azure-sdk-storage. As the whole thing gets broken into smaller pieces, with different upstream teams managing different code streams, I see the argument about the coordinated-release problem.

Also note that we decided to pull our sources from GitHub rather than pypi. We did struggle with the way things are pushed to pypi and found pulling from GitHub to be a better approach for us for package creation.

To a certain degree we/I followed a similar approach with the ec2utils we provide in our Enceladus project, https://github.com/SUSE/Enceladus/tree/master/ec2utils, meaning different release cycles for each utility.

Having different Python packages, which for us will eventually translate into different rpms, implies that client code (for us azurectl, https://github.com/SUSE/azurectl) can be more precise about dependencies, which is an advantage. We have not yet packaged the new az tools, thus I cannot speak to the effect on that packaging effort and dependency management from that point of view, but I am certain we'll solve that in a reasonable way.

While I share @Fale's concern regarding package proliferation, as well as the multi-Python-package approach not being "Pythonic", there are equally valid arguments on the other side, meaning having an sdk-storage package, maybe one for networking, etc. I think the argument about being "Pythonic" mostly comes into play after install. Meaning, as long as I can

from azure.storage import ...
from azure.networking import ...

as a Python developer I really do not care whether site-packages/azure/storage and site-packages/azure/networking were installed by 2 distro packages or 1. Having multiple packages may be a bit more cumbersome for the developer setting up the system, meaning the developer potentially has to install many packages, but that can easily be done with a one-liner:

pckgmgr --search python-azure | grep sdk | xargs pckgmgr install

or a meta-package, i.e. we can easily create a python-azure-sdk-all package or create a pattern that pulls all the other packages.

I think it is equally valid to look at each service Azure provides as a separate target and have a separate SDK for that target as it is to look at the API as a whole.

That it was decided at the origination of boto, and now carried into botocore, that all of the AWS API should be in one SDK, one Python package, is just as valid a decision as the decision made here that each service should have its own SDK managed by separate teams.

So long story short, from my perspective either way is fine. If from a development perspective things are managed more easily at Msft to have multiple Python packages we are game to follow that route.

@derekbekoe
Member

Referencing related issue created for Python CLI which also has separate packages - Azure/azure-cli#1055

@irl
Contributor

irl commented Oct 11, 2016

From the Debian perspective, I am ignoring PyPI and packaging from the Git repo. The idea that in the future there will be changes to the individual packages which will then be released via PyPI and not git tags breaks this. If the releases of all the core modules (i.e. the ones in this repo) are synchronised it makes everything a lot easier for me, and I guess for other distros too.

@lmazuel
Member

lmazuel commented Oct 11, 2016

Thanks @rjschwei and @irl !
So, in summary, what I understand is: as long as I create some checkpoints as tags in the repo, you're good to package the current GitHub state under that tag's version number. I can publish new packages on PyPI for a specific service when available, but they will be synced into a Linux package only when I release a new version of the "azure" meta-package (with a new associated tag on GitHub).
That seems fair to me. Anyway, I plan to release the "azure" meta-package more often once the core ARM modules (Storage/Compute/Network/Resource) are officially stable.
@Fale are we fine with that plan?

@Fale
Author

Fale commented Oct 11, 2016

It will be long and painful, but I can make it work with the policies. Thanks

@schaefi
Contributor

schaefi commented Oct 12, 2016

So, in summary, what I understand is as long I create some checkpoint
as tags in the repo, you're good to package the current Github state as
"tag" version number.

yes, this makes packaging work much easier. The same code base referencing
that release tag should also exist on PyPI. In my projects that happens
automatically; see:

https://docs.travis-ci.com/user/deployment/pypi

I can publish on PyPI new packages for a specific
service if available, but it will be sync as a Linux package only when
I release a new version of the "azure" meta-package (with a new
associated tag on Github).

yes and it should be possible to make this an automatic step

Regards,
Marcus

@irl
Contributor

irl commented Oct 12, 2016

@Fale fwiw, "do what upstream does" doesn't have to mean PyPI, as PyPI is a downstream distribution of the upstream; there's no reason not to package up the sources from Git (possibly my Debian-oriented frame of reference). I have one source package, but plan to build one binary package for each of the logical PyPI packages within it.

@lmazuel That's perfect for me (:

@derekbekoe This would work great for the azure-cli package also.

@Fale
Author

Fale commented Oct 12, 2016

@irl: I think I'll go the GitHub way too; we do this in many situations :). I'll go for one src.rpm and multiple rpms, but even with a single src.rpm package, it will be a fairly complex spec to generate all the various subpackages properly, considering files and versions, etc.

@irl
Contributor

irl commented Oct 12, 2016

@Fale I will just be shipping everything with the version number of the metapackage. Subcomponents may have differing versions, but they're all part of the larger "unified" release.

@Fale
Author

Fale commented Oct 12, 2016

@irl How do you manage package dependencies that depend on the subcomponents? i.e.: https://github.com/Azure/azure-cli/blob/master/src/azure-cli-core/setup.py#L49

@irl
Contributor

irl commented Oct 12, 2016

@Fale If there's a tagged unified release in git and the dependencies don't line up, then Microsoft has done a terrible job at release management. I don't anticipate this happening often.

@Fale
Author

Fale commented Oct 12, 2016

@irl Microsoft's point is that they want different version numbers so they can have different development cycles for the various parts of the codebase, so I anticipate this happening often going forward.

@irl
Contributor

irl commented Oct 12, 2016

@Fale yes, between releases things may be broken and not all lined up, but the metapackage needs to be released with everything lined up, otherwise it would never be installable. So the idea would be to package in distributions when, and only when, the metapackage sees a release and the git repo is tagged.

@Fale
Author

Fale commented Oct 12, 2016

Also:

  • people could "cherry pick" an update and break the others
  • other packages (non-Microsoft) could depend on Microsoft code

@irl
Contributor

irl commented Oct 12, 2016

@Fale not through the package management system they can't, and it's not a bug in your system if they've done something to break it. I fully anticipate other packages depending on the sdk, I have vagrant-azure in Debian depending on the Ruby SDK, and I'm quite happy to continue supporting this. I've had to patch the crap out of it to get it to work with the latest SDK, but as a distribution packager I expect to have to do some work occasionally.

From what I can see, Microsoft are new to this and I'd rather compromise and accept a little extra work than make demands that they change their entire project management workflow and put them off engaging in open source communities. This situation is no different from any other situation where you have a library and it has dependencies, some external. As a distribution packager, you should be performing QA to catch these problems and working with upstream to find resolutions, or patching locally within your distribution to ensure all your packages line up.

@Fale
Author

Fale commented Oct 12, 2016

A couple of points and then I'll stop with this since we are going OT:

Microsoft are new to this and I'd rather compromise and accept a little extra work than make demands that they change their entire project management workflow and put them off engaging in open source communities.

  1. Since Microsoft is new to this, it is even more important to discuss with them "how open source works", so that they can understand it before making errors
  2. I can compromise and accept extra work; I will not compromise and accept breaking Fedora dependencies

@irl
Contributor

irl commented Oct 12, 2016

@Fale I think we're aggressively agreeing with each other perhaps. The important thing is that there are releases of the metapackage that have all the dependencies working together nicely.

To summarise my view:

  • Releases get tagged in the git repo
  • When a release is tagged, all the modules will work together
  • Distributions will update packages when a metapackage is released
  • It is expected that changes in modules will be backwards compatible with the last version released with a metapackage, or the metapackage version gets bumped (version numbers are cheap)
  • It is expected that changes will not be massive and sweeping every release, allowing others to build applications against the SDK
  • Distribution packages are aimed at production use by end-users; developers are still able to install more recent releases from PyPI in order to test their applications with future versions
  • It is expected that if a developer breaks their system by mixing versions from distribution packages with PyPI, they should be able to understand why it is broken

There is a great article on the topic of packages vs. pip here: https://notes.pault.ag/debian-python/

@Fale
Author

Fale commented Oct 18, 2016

@lmazuel the dependency dropping would solve this circularity problem :)

@lmazuel
Member

lmazuel commented Oct 18, 2016

@bear454 I mean "no ARM", so ASM + azure-servicebus.
All common libraries for ARM are in msrest/msrestazure packages.

jonathangray added a commit to jonathangray/ports-azure that referenced this issue Jan 6, 2017
@optlink

optlink commented Mar 30, 2017

Is there a chance we could at least see another bundled release? I'm trying to maintain packages for azure-cli on Arch, but I've run into a problem where the current 2.0.0rc6 release is too old and results in module errors, while the git builds are too new, resulting in a different set of errors.

I've tried building each module in this repo separately, but there's a ridiculous number of them, and several of them seem unable to install independently as needed for Arch packaging.

I can see the argument for this because of independent release cycles, but as a user I don't care; I need something that just works. The easiest solution, I think, would be to package major releases every so often that contain stable versions of each module.

@lmazuel
Member

lmazuel commented Mar 30, 2017

Hi @optlink

Yes, the rc7 is planned. The problem is still that, for a meta-package to be tagged as "stable", I need all sub-dependencies to be stable, and that's not the case currently.

However, if you do packages for the CLI, the CLI follows the same approach: it is cut into services and sub-packages. For instance, there is a Network implementation of the CLI, and that package directly follows a specific version of Network (this 2.0.1 is linked to 0.30.0). So even if I do an rc7, I can't ensure that all packages will match all the sub-dependencies of the CLI. And even if I ensure it today, this can change tomorrow with an update of Network or something else.

I'm not sure I understand your constraints clearly (I'm an Ubuntu user, I just know Arch by name, sorry :-( ), but send me an email at MS (<githubalias> at microsoft.com) and we will discuss with the CLI team how to help, or at least brainstorm something.

FYI @derekbekoe @johanste

@rjschwei
Contributor

rjschwei commented Apr 2, 2017 via email

@lmazuel
Member

lmazuel commented Apr 3, 2017

Thank you @rjschwei for your message. We will still continue to release one package per service, but do you have a suggestion of zip/tar.gz/package/tools or something that we can do to simplify your process?

@rjschwei
Contributor

rjschwei commented Apr 4, 2017 via email

@glaubitz

glaubitz commented May 8, 2017

Hi!

I have recently picked up the task to package the Azure SDK in openSUSE. For openSUSE, the current plan is to use the packages from the PyPI repository. However, while working through the various azure-mgmt-* packages, I noticed that many packages are either outdated on PyPI (those are commerce, compute, network, powerbiembedded, resource, servicebus and storage from the mgmt packages, and the meta-packages azure-mgmt and azure-nspkg) or are missing the __init__.py files, so that setuptools fails to install them properly (those are eventhub, media, network, resource, search, servermanager, servicebus and storage).

On the other hand, I could also use the tarballs generated from the git tags in the GitHub repository. However, it's not clear to me which of these tags should be used when packaging the whole SDK while ensuring all modules are compatible with each other. If I read the discussion correctly, some of the modules can be too new, so that they won't work with certain other modules anymore and can only be used individually. And if one wants to deploy the whole SDK, all modules must have the version belonging to a particular release of the whole SDK.

So, my question now is: how do I get release tarballs with the proper versions for each module, so that I get a complete and working SDK in the end? PyPI is currently apparently not the best source, for the aforementioned reasons, and neither are the releases created through the git tags on GitHub.

Thanks!

@glaubitz

I just figured out that the SDK releases are available as single tarballs generated from the git tags; they follow this pattern:

https://github.com/Azure/azure-sdk-for-python/archive/v?(\d.*)[A-Z,a-z,0-9]*\.zip

e.g.:

https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz

So, I suggest just pulling the tarball from there and using this as a base for the packaging.
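Given that naming scheme, building the download URL for a tag can be scripted; this is a hypothetical helper, with only the URL pattern itself taken from the observation above:

```python
import urllib.request  # only needed if you actually download the archive

def release_tarball_url(tag):
    """Return the GitHub archive URL for a tag such as 'v2.0.0rc6'."""
    return f"https://github.com/Azure/azure-sdk-for-python/archive/{tag}.tar.gz"

url = release_tarball_url("v2.0.0rc6")
# urllib.request.urlretrieve(url, "azure-sdk-v2.0.0rc6.tar.gz")  # uncomment to fetch
print(url)
```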

@rjschwei
Contributor

rjschwei commented May 10, 2017 via email

@johanste
Member

@derekbekoe, do you have any suggestions?

@lmazuel
Member

lmazuel commented May 11, 2017

Hi @glaubitz, sorry I didn't answer earlier, it was busy with //build/ this week and PyCon next, and I wanted to take the time to answer you correctly. I just want you to be sure I don't ignore you, I'll be back with my full brain soon :)

@glaubitz

That doesn't work because the azure-cli releases depend on the individual components of the SDK, not on the SDK as a whole.

Sorry, I wasn't clear enough then. I was not talking about creating a single RPM package, but about using the GitHub tarball as a single source; not because I particularly prefer GitHub over PyPI, but because the packages on PyPI are either outdated or broken.

a.) Create 1 package for the SDK, as we pretty much do in openSUSE right now, and then have a very long list of Provides: statements where each Provides lists a component. This list is going to be a PITA to maintain and will inevitably be wrong and cause headaches

I agree and that's definitely not what I want. However, having to pull every

b.) Package each individual component of the SDK, the approach we are now taking.

That's definitely what I want to do. However, my problem currently is that I don't know for sure which set of packages I should use.

Should I:

a) Use the v2.0.0rc6.tar.gz as the source for all base packages and create RPMs from that? I have written a small script which creates the individual .zip files for all individual packages. Then complement these RPM packages with the remaining packages from our list, just using the latest available release version for each package.

or

b) Just use the latest tarball available in the GitHub "Releases" tab, unpack that archive, and generate the individual .zip files from there? For example, downloading https://github.com/Azure/azure-sdk-for-python/archive/azure-keyvault_0.3.3.tar.gz, unpacking it, and creating the individual .zip files using that archive.

The reason I ask is that each of these tarballs always contains the complete SDK and not just azure-keyvault, for example. Thus, when I download and use the tarball azure-keyvault_0.3.3.tar.gz, can I still assemble a working SDK from it, or does that only work with the v2.0.0rc6.tar.gz tarball, as it has been tagged as a release of the whole SDK?

It's just confusing that the individual packages and the complete SDK show up on the same "Releases" tab. A release normally indicates something stable, or at least beta, that users can download and use. That's why tagging releases for the individual packages, while the tarballs still contain the complete SDK, is confusing as hell.

Adrian

@glaubitz

To elaborate a little more: I just ran my small script over the unpacked azure-keyvault_0.3.3.tar.gz, and it created azure-2.0.0rc7.zip among others, so the resulting SDK I got is something between rc6 and rc7 (since rc7 has not been officially tagged yet).
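A script of the kind mentioned might look roughly like this. It is a hypothetical sketch, assuming the unpacked tarball contains one subdirectory per package, each with its own setup.py (which matches the repo layout discussed in this thread):

```python
import os
import zipfile

def zip_subpackages(sdk_root, out_dir):
    """Create one .zip per subdirectory of sdk_root that ships its own setup.py."""
    os.makedirs(out_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(sdk_root)):
        pkg_dir = os.path.join(sdk_root, name)
        if not os.path.isfile(os.path.join(pkg_dir, "setup.py")):
            continue  # not an installable subpackage (docs, scripts, ...)
        zip_path = os.path.join(out_dir, name + ".zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(pkg_dir):
                for fname in files:
                    full = os.path.join(root, fname)
                    # store paths relative to the SDK root so the archive
                    # unpacks into "<package-name>/..."
                    zf.write(full, os.path.relpath(full, sdk_root))
        created.append(zip_path)
    return created
```

Each resulting .zip can then feed one rpm subpackage, keeping the upstream per-package split intact.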

@Fale
Author

Fale commented May 11, 2017

@glaubitz I'm not really sure that managing tens of highly coupled packages is anywhere easier... and that's why I opened this ticket (i.e., I think it is not possible to manage this project in a sensible way, and this is why, given the IMHO unsatisfactory answers, I'm not packaging this for Fedora/EL).

@glaubitz

@Fale But you can just download the tarballs from the GitHub releases page and you get all modules in a single tarball. In fact, that's what @irl is doing for Debian, and since the Debian version is currently at v2.0.0rc6, it has fewer modules than are currently visible in the GitHub repository.

I generally don't have a problem juggling a large number of sources (it's just a matter of good packaging tools, after all); I'm just confused as to which versions to use for a stable distribution.

@lmazuel
Member

lmazuel commented May 11, 2017

Hi @glaubitz

I understand it's complicated, really :/. GitHub is not really built to host several packages in one repo. Here are some answers:

  • Tags are on purpose "<package>_<version>" and are made just for the specific package mentioned in the tag. I don't recommend, for instance, using the tag "azure-keyvault_0.3.3" to install "azure-mgmt-compute"
  • Tags like "v2.0.0rc6" are also intended to be accurate for "azure 2.0.0rc6" only, even if I'm pretty sure the state of the repo at that tag was correct, according to the content of v2.0.0rc6
  • Packages on PyPI are not outdated; I'm surprised you got issues. About the issues you found, could you send me a more detailed email at <githubalias>@microsoft.com?

Let's be pragmatic about what you want (before talking about how to do it): do you want to release one package like Debian does, e.g. python-azure 2.0.0rc6? Or separate packages for each component? As @rjschwei was saying, the CLI uses each component package independently, so we might have an issue with that. Assuming we can sync the azure-cli and azure-sdk bundle packages, do you want to:

  • Release python-azure x.y.z with a lot of packages
  • Release python-azure-cli x.y.z that depends on python-azure x.y.z?
  • Something else?

Once I get what you want the user experience to be, we will figure out the "how".

@glaubitz

Hi @lmazuel

I understand it's complicated, really :/. GitHub is not really built to host several packages in one repo.

You could put each package into a separate git repository and then use git submodules to reference these modules from the git repository for the whole package. Lots of projects actually do this when they use third-party libraries like ffmpeg.

Tags are on purpose "<package>_<version>" and are made just for the specific package mentioned in the tag. I don't recommend for instance to use tag "azure-keyvault_0.3.3" to install "azure-mgmt-compute"

Ok, so this means that, despite azure-keyvault_0.3.3 containing the whole SDK, I should always assume the remaining packages are effectively git snapshots and should not be used for anything but development. Thus, when I download azure-keyvault_0.3.3, the azure-mgmt-compute package inside this tarball is probably version 1.0.0rc1 plus some extra commits and shouldn't be used for production.

Thus, anyone wanting to use the releases from GitHub really needs to download a separate tarball for each tagged package version.

Tag like "v2.0.0rc6" are also intended to be accurate for "azure 2.0.0rc6" only, even if I'm pretty sure the state of the repo at this state was correct, according to the content of v2.0.0rc6

Isn't azure supposed to be the primary meta-package which allows installing the whole SDK in one step? I'm not sure what the point of tagging a version number for the whole SDK would be if the generated tarball doesn't produce something that works.

Packages on PyPI are not outdated, I'm surprised you got issues? About the issues you found, could send me a more detailed email at <githubalias>@microsoft.com?

Sure. Will do that once I have finished writing this message ;).

Let's be pragmatic on what you want (before talking about how to do it): do you want to release one package like Debian like python-azure 2.0.0rc6? Or separate packages for each components?

I want to release separate components. But I also want these components to work with each other; at least, that's what users are going to expect. If they use the package manager to install azure, they expect to get the SDK installed and ready to use, without having to replace individual components.

For me as the packager, it doesn't really matter whether the whole SDK is released in one tarball or as individual packages. I am writing some simple scripts that will help me deal with the upstream format to generate the RPM packages. What matters is that I know which versions I have to use to be able to assemble something that is going to work in the end on the user's side.

For example, if you have released any of the packages in a version which breaks compatibility with most of the other packages, I will naturally not use the latest version of that particular package. I will use the version which is still compatible with the rest and only update once all the other packages have made the transition upstream.

As @rjschwei was saying, the CLI is using each component package independently, so we might have an issue with that. Let's say we can sync azure-cli and azure-sdk bundle package, do you want to:

Release python-azure x.y.z with a lot of packages

Yes, that's what I want. But again, creating a single package out of individual packages or vice versa is not the actual problem. The problem I have is that I don't know which versions are compatible with each other to form a complete, working SDK.

Once I get what you want the user experience to be, we will figure out the "how".

So, here's what I suggest:

If I understand correctly, all the various packages are developed separately. So these packages should naturally end up in separate git repositories. Then use git submodules to link the packages into the main git repository of the Azure SDK. git submodules allow linking specific git commits of another repository. Thus, you are able to assemble the SDK from specific versions that are known to work together, and you always have something releasable.

If users want to use individual packages, they'll download the tagged tarball from the corresponding package's repository. If they want the whole SDK, they just download the latest tagged version of the main repository as a tarball.
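A minimal sketch of that workflow (component and repository names purely illustrative; local throwaway repositories stand in for real remotes so the commands can run anywhere):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d) && cd "$work"

# A stand-in component repo with one tagged release.
git init -q azure-mgmt-compute
(cd azure-mgmt-compute \
  && git commit -q --allow-empty -m "compute 1.0.0rc2" \
  && git tag v1.0.0rc2)

# The umbrella SDK repo links the component at that exact commit.
git init -q sdk && cd sdk
git -c protocol.file.allow=always submodule add -q ../azure-mgmt-compute
git commit -q -m "Pin azure-mgmt-compute at v1.0.0rc2"

# .gitmodules plus the recorded gitlink identify the tested combination;
# consumers reproduce it with: git clone --recurse-submodules <sdk-repo>
git submodule status
```

Bumping a component is then an explicit, reviewable commit in the umbrella repo, which is exactly the "known-good combination" record a packager needs.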

@rjschwei
Contributor

@lmazuel

Ideally we'd have 1 upstream tarball for each, the SDK and the CLI such that we can create

python-azure-sdk-x.y.z and azure-cli.a.b.c packages with azure-cli.a.b.c depending on python-azure-sdk-x.y.z

That's how the other guys do it ;) aws-cli has only a few dependencies, with python-botocore, the equivalent of azure-sdk, as the primary dependency.
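In RPM terms, the proposed relationship would look something like this spec-file fragment (package names and version numbers purely illustrative):

```spec
Name:           azure-cli
Version:        2.0.6
Release:        1
Summary:        Azure command-line interface
# The CLI package declares the exact SDK bundle it was tested against:
Requires:       python-azure-sdk = 2.0.0
```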

Anyway, I understand, as does probably everyone else interested in this topic, that there are tradeoffs either way, and going with a development model of individual components is just as valid a choice as keeping everything together. However, with the chosen model of many components, people downstream (packagers or direct users) still need some moment in time, every now and then, where all the pieces fit together. Based on the findings of @glaubitz, this point in time is incredibly difficult to determine.

So somehow a mechanism should exist that allows us to pull what would be considered a consistent SDK. If the answer to that is "whatever is on PyPI", then that's OK, and maybe we just have to clean up a few things that @glaubitz ran across on PyPI and then we are good to go.

@lmazuel
Member

lmazuel commented May 15, 2017

@glaubitz @rjschwei

About SDK consistency:

  • Packages that depend on msrestazure must have ">= 0.4". This is the only condition, meaning you can install azure-mgmt-resource 0.30.0rc6 and azure-mgmt-compute 1.0.0rc2 together with no issue. It's consistent in terms of installation; it's just weird in terms of features.
  • Packages that do not depend on msrestazure (I think there are only three: azure-servicebus, azure-servicemanagement-legacy and azure-storage) are independent and consistent from version 0.20.0

Also, the source of truth for the code is the sdist on PyPI. It's easy to get with XML-RPC; for example, for azure-keyvault 0.3.3:

import xmlrpc.client
client = xmlrpc.client.ServerProxy("https://pypi.python.org/pypi")
[pkg['url'] for pkg in client.release_urls('azure-keyvault', '0.3.3') if pkg['python_version']=='source'][0]

gives
https://pypi.python.org/packages/82/8b/9761cf4a00d9a9bdaf58507f21fce6ea5ea13236165afc0a0c19a74ac497/azure-keyvault-0.3.3.zip
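For what it's worth, PyPI's JSON API (pypi.python.org/pypi/<package>/<version>/json) exposes the same file list. A sketch of picking out the sdist URL, run here against a hard-coded, trimmed sample rather than a live request (real responses carry many more fields):

```python
import json

# Hypothetical trimmed sample of the JSON API response for azure-keyvault 0.3.3.
sample = json.loads("""
{
  "urls": [
    {"packagetype": "bdist_wheel",
     "url": "https://pypi.python.org/packages/.../azure_keyvault-0.3.3-py2.py3-none-any.whl"},
    {"packagetype": "sdist",
     "url": "https://pypi.python.org/packages/.../azure-keyvault-0.3.3.zip"}
  ]
}
""")

# Filter the release files down to the source distribution.
sdist_url = [f["url"] for f in sample["urls"] if f["packagetype"] == "sdist"][0]
print(sdist_url)
```

For a live lookup, the same filter would be applied to the `urls` list fetched from the JSON endpoint.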

I'll discuss it with the CLI team today and see if we can sync our releases (for instance, every 6 months). I want to release a 2.0.0, and I will try to use the exact same packages as CLI 2.0.6. This way you can package azure-python-sdk 2.0.0 as a whole, and package azure-python-cli 2.0.6 as a whole as well, depending on azure-python-sdk 2.0.0.

Thoughts?

FYI @johanste

@glaubitz

glaubitz commented May 16, 2017 via email

@rjschwei
Contributor

rjschwei commented May 16, 2017 via email

@lmazuel
Member

lmazuel commented May 16, 2017

@rjschwei I'm not sure I get your issue. When you install a distro package like python-azure-sdk, I think you install the necessary "dist-info" folders, so pip is not able to tell the difference between a yum installation and a pip installation, correct? So here, if your python-azure-cli depends on python-azure-sdk, and I take care to keep them in sync, this should on the contrary make your life easier?

@irl what do you think about that? Because if trying to sync the SDK and CLI bundles makes no sense, I have no reason to do it.

@rjschwei
Contributor

@lmazuel , sorry for falling off the face of the planet for a bit and creating a large time gap in the discussion.

You are correct that the installed RPM package will also leave behind the Python information needed to satisfy installing the CLI bits. Thus, if SDK and CLI releases can be synced such that cli-a.b.c depends on sdk-x.y.z, then we could go to a one-package model and we'd basically have one dependency in the CLI package.

However, my concern with this approach is that, to the best of my knowledge, no tools exist today to ensure this consistency. Of course such tools can be created, but in a sense these tools would counteract the development separation that has been instituted in the SDK and CLI projects.

So if you'd go through the effort to sync everything, which would really be nice for packagers, I think the development model would have to change to a certain degree.

Getting everything in sync would basically mean collecting all the components and verifying that their dependencies are consistent within each of the SDK and the CLI, and consistent across the boundary. Creating a tool that ensures such consistency should be reasonably straightforward, but it still has to be created and maintained.

However, during the "development phase" this consistency is not necessarily given: CLI component A may depend on version X of SDK component H, while CLI component B depends on version Y of SDK component H. That is fine as long as, at the end of the development cycle, both CLI components A and B depend on the same version of SDK component H. But this drift makes testing very difficult.

It also complicates security fixes: because continuous testing is difficult, it will not be a good idea to release a security fix from the development branch. The fix will have to be inserted in two places, the current consistent (synced) code, as a point release off the previous consistent set, and the development code. This can all be managed, of course, but the point is that developers on the two teams would have to work more closely together than appears to have been intended when the current development model was chosen.

If we look at the same problem using the many-packages approach, we can still get into a similar situation if SDK component H gets a security fix and the version gets bumped. Now the CLI may be broken. However, because the dependencies are not conglomerated, we know exactly which CLI packages need potential updates to accommodate the version bump of SDK component H.

To make a long story short, a sync will make initial packaging easier, but individual packages will make dealing with version bumps due to security issues easier.

One thing that would help tremendously would be if you and the CLI team could commit to semantic versioning (http://semver.org/) at all levels and change the dependencies in all setup.py files accordingly, i.e. every dependency should be >= the current major version and < the next major version. There should not be any exact version matches enforced. If we can get to that point, managing the plethora of packages will be reasonably straightforward.
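To illustrate what such a bound buys you, a range like ">= 0.4, < 1.0" accepts any release within the 0.x line from 0.4 onward while rejecting the next, potentially breaking, major version. A minimal stdlib sketch of that comparison (not how pip actually parses specifiers, and handling numeric releases only; pre-release tags like "rc2" would need extra work):

```python
def in_semver_range(version, lower, upper):
    """True if lower <= version < upper, comparing dotted numeric parts."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(lower) <= parse(version) < parse(upper)

# A dependency like "msrestazure >= 0.4, < 1.0" would accept a patch release:
print(in_semver_range("0.4.29", "0.4", "1.0"))  # True
# ...but reject the next major version, which may break compatibility:
print(in_semver_range("1.0.0", "0.4", "1.0"))   # False
```

An exact pin ("== 0.4.11") would instead force a coordinated re-release of every dependent package for each patch, which is precisely the packaging pain described above.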

@lmazuel
Member

lmazuel commented Jul 24, 2018

Closing, in favor of #1295

@lmazuel lmazuel closed this as completed Jul 24, 2018
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2023