Shared package versions break mirroring? #60

Closed
travisgroth opened this Issue May 22, 2014 · 7 comments

In some repositories there are packages that are identical across distributions. An example would be the .deb packages in the puppetlabs repos: the same package might be used for squeeze, wheezy and precise. When this happens, aptly fails to update the second mirror and reports a conflict (see the log below). It would be great if it handled this gracefully and de-duplicated the packages correctly in the database. Am I missing the correct way to handle this?

# aptly mirror list
List of mirrors:
 * [puppet-precise]: http://apt.puppetlabs.com/ precise
 * [puppet-squeeze]: http://apt.puppetlabs.com/ squeeze
Mirror `puppet-squeeze` has been successfully updated.

# aptly mirror update puppet-precise
Downloading http://apt.puppetlabs.com/dists/precise/Release...
Downloading & parsing package files...
Downloading http://apt.puppetlabs.com/dists/precise/dependencies/binary-amd64/Packages.bz2...
Downloading http://apt.puppetlabs.com/dists/precise/dependencies/binary-amd64/Packages.gz...
ERROR: unable to update: unable to save: leiningen_1.7.1-1puppetlabs1_all, conflict with existing packge
admin@netvault:/stor/media/repo/aptly# aptly mirror update puppet-squeeze
Downloading http://apt.puppetlabs.com/dists/squeeze/Release...
Downloading & parsing package files...
Downloading http://apt.puppetlabs.com/dists/squeeze/dependencies/binary-amd64/Packages.bz2...
Downloading http://apt.puppetlabs.com/dists/squeeze/dependencies/binary-amd64/Packages.gz...
Downloading http://apt.puppetlabs.com/dists/squeeze/devel/binary-amd64/Packages.bz2...
Downloading http://apt.puppetlabs.com/dists/squeeze/devel/binary-amd64/Packages.gz...
Downloading http://apt.puppetlabs.com/dists/squeeze/main/binary-amd64/Packages.bz2...
Downloading http://apt.puppetlabs.com/dists/squeeze/main/binary-amd64/Packages.gz...
Building download queue...
Download queue: 0 items (0 B)

Mirror `puppet-squeeze` has been successfully updated.

# aptly mirror update puppet-precise
Downloading http://apt.puppetlabs.com/dists/precise/Release...
Downloading & parsing package files...
Downloading http://apt.puppetlabs.com/dists/precise/dependencies/binary-amd64/Packages.bz2...
Downloading http://apt.puppetlabs.com/dists/precise/dependencies/binary-amd64/Packages.gz...
ERROR: unable to update: unable to save: leiningen_1.7.1-1puppetlabs1_all, conflict with existing packge

ryanuber May 22, 2014

Contributor

@travisgroth, it actually looks like Aptly might be right in this case. Although the packages are the same in terms of name-version-arch, they have different checksums when downloaded. I downloaded both and checked them with md5sum:

a793c4b1325865ca848a27d7b83322f7  leiningen_1.7.1-1puppetlabs1_all (1).deb
6df7a71ede94e16ec2da528f455ec837  leiningen_1.7.1-1puppetlabs1_all.deb

This actually makes it hard to mirror both distributions without using two distinct Aptly data directories or something like that, in which case everything would be duplicated. A change in the storage schema would probably be required to handle this elegantly while still offering package deduplication.

@smira what do you think?
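
For readers who want to repeat the check without md5sum, here is a minimal sketch that hashes two downloaded files and compares the digests. The helper names `md5_of_file` and `same_content` are made up for this illustration and are not part of aptly.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling `same_content` on the two leiningen downloads would be expected to return `False`, given the differing md5sum output above.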

travisgroth May 22, 2014

@ryanuber oh wow I actually thought they were the same. I have some internal packages that are definitely identical across distros, so I'll test the behavior there.

The puppetlabs repo is certainly problematic. I guess you can't efficiently store those since the checksum isn't the same, but managing different aptly installs per distro due to that kind of upstream sounds like a pain. Even if inefficient, it would be nice to handle that kind of situation gracefully.

smira May 22, 2014

Member

@ryanuber, @travisgroth, aptly was designed to deduplicate packages right from the start. Packages are stored as one central key -> value mapping (where the key is name_arch_version and the value is the package metadata). Storing a different set of files with different checksums is not a problem. Based on that design, mirrors, snapshots and local repos are just references to the central DB of packages.
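
To make the collision concrete, here is a minimal sketch of a central key -> value store keyed by name_arch_version, as described above. It is not aptly's actual code: `legacy_key`, `save`, `PackageConflict` and the dict layout are hypothetical, and the error text only imitates the message in the log.

```python
class PackageConflict(Exception):
    """Raised when a key is already taken by a package with different files."""


def legacy_key(pkg):
    # Key layout follows the description above: name_arch_version.
    return "{name}_{arch}_{version}".format(**pkg)


def save(db, pkg):
    """Store pkg in db (a plain dict), deduplicating identical packages."""
    key = legacy_key(pkg)
    existing = db.get(key)
    if existing is None:
        db[key] = pkg
    elif existing["md5"] != pkg["md5"]:
        # Same name/arch/version but different content: the key cannot
        # point at both builds, so the save has to fail.
        raise PackageConflict(f"{key}, conflict with existing package")
    return key
```

Saving the same build twice is a no-op (that is the deduplication), while saving a second build with the same name, architecture and version but a different checksum raises `PackageConflict`.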

My assumption was based on https://wiki.debian.org/RepositoryFormat:

Duplicate Packages
A repository must not include different packages (different content) with the same package name, version, and architecture. When a repository is meant to be used as a supplement to another repository this should hold for the joint main+supplement repository as well.

I understand that in this particular case the two repositories are not expected to be main+supplement, as they're for different distros.

On the other hand, if I instruct aptly to pull package leiningen (=1.7.1-1) from one snapshot to another, what should it pull? The package specification becomes ambiguous.

I'll think more about this issue and the best way to handle it. Probably something like "prefixing" would work here (you could give a unique prefix to the mirror, and packages coming from that mirror would never conflict with packages coming from other mirrors).

smira May 22, 2014

Member

There's an issue with the squeeze main + security repositories (https://groups.google.com/d/msg/aptly-discuss/XKdTA1NIjVU/xCmZ1YtI8DAJ) reported in the aptly-discuss group. It looks the same, but I believe it is a bug in the Debian archive (waiting for a reply from ftp-masters).

travisgroth May 25, 2014

@smira I assumed that the uniqueness constraint was in place in the puppet repo as well. That said, since aptly is meant to handle multiple repos, this sounds like a problem that could come up regardless: two unrelated mirrors could, in theory, be maintaining their own build of leiningen 1.7.1-1. Internally, prefixing sounds OK. When pushing packages around, though, why not make the origin repo an optional parameter that either becomes required if there is more than one copy of a package name/version/arch tuple, or defaults to the newest or oldest one?
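
The "optional origin" idea could look roughly like the sketch below. Everything here is hypothetical (`resolve`, `AmbiguousPackage`, the `origin` field); it shows only the variant where the origin becomes required once a name/version/arch tuple matches more than one build, not the "default to newest or oldest" variant.

```python
class AmbiguousPackage(Exception):
    """Raised when several builds match and no origin was given."""


def resolve(candidates, name, version, arch, origin=None):
    """Pick one package from candidates (a list of dicts).

    origin is optional as long as the name/version/arch tuple is unique.
    """
    matches = [
        c for c in candidates
        if (c["name"], c["version"], c["arch"]) == (name, version, arch)
    ]
    if origin is not None:
        matches = [c for c in matches if c["origin"] == origin]
    if not matches:
        raise LookupError(f"no package matches {name} {version} {arch}")
    if len(matches) > 1:
        origins = sorted(c["origin"] for c in matches)
        raise AmbiguousPackage(f"specify an origin, one of: {origins}")
    return matches[0]
```

With two mirrors that each carry their own leiningen build, `resolve(..., origin="puppet-precise")` returns exactly one package, while omitting `origin` raises `AmbiguousPackage`.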

@smira smira added this to the v0.6 milestone May 29, 2014

smira May 29, 2014

Member

In the end I've found a method, transparent to the user, to overcome these problems with duplicates: new versions of aptly will be able to handle such packages.

smira added a commit that referenced this issue May 29, 2014

Change the way package key works: now it includes FilesHash. #60
Now duplicate packages (the same name/version/arch) but with different set of files
would be handled as separate entities.
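
As a rough illustration of what including a files hash in the key achieves, here is a minimal sketch. The hash function (truncated SHA-256 over sorted filename/checksum pairs) and the key layout are assumptions made for this example; the commit's real FilesHash computation and key format are not shown in this thread.

```python
import hashlib


def files_hash(files):
    """Hash a package's file list; files is an iterable of (filename, checksum)."""
    digest = hashlib.sha256()
    # Sorting makes the result independent of the order the files are listed in.
    for filename, checksum in sorted(files):
        digest.update(f"{filename}\x00{checksum}\n".encode("utf-8"))
    return digest.hexdigest()[:16]


def package_key(name, arch, version, files):
    # Hypothetical layout: the old name_arch_version key plus the files hash.
    return f"{name}_{arch}_{version}_{files_hash(files)}"
```

Two builds with the same name, version and architecture but different file checksums now get different keys, so both can sit in the central store; identical builds still collapse to a single key and are deduplicated as before.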

smira added a commit that referenced this issue May 29, 2014

smira May 29, 2014

Member

Now aptly should be able to handle conflicts like that, as long as the conflicting packages never "meet" in one snapshot/mirror/local repo.

If a snapshot contains duplicate packages, it can't be published, and some operations (like snapshot pull) won't work.
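
The "never meet" rule can be pictured as a check over a snapshot's package list. This is a minimal sketch, not aptly's implementation: `find_duplicates`, `can_publish` and the `files_hash` field are hypothetical names.

```python
from collections import defaultdict


def find_duplicates(packages):
    """Return name/version/arch tuples that appear with more than one files hash."""
    seen = defaultdict(set)
    for pkg in packages:
        ident = (pkg["name"], pkg["version"], pkg["arch"])
        seen[ident].add(pkg["files_hash"])
    return {ident: hashes for ident, hashes in seen.items() if len(hashes) > 1}


def can_publish(packages):
    """A package list is publishable only if no conflicting builds meet in it."""
    return not find_duplicates(packages)
```

A snapshot built from a single mirror passes the check; merging the precise and squeeze mirrors into one snapshot would put both leiningen builds side by side and fail it.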

@smira smira closed this Jun 7, 2014