Skip to content
This repository has been archived by the owner on Mar 5, 2021. It is now read-only.

Add support for SHA256HEADER, FILE_DIGESTALGO, and Zstd compression #133

Merged
merged 15 commits into from Jan 4, 2019

Conversation

dwalluck
Copy link
Contributor

Signed-off-by: David Walluck dwalluck@redhat.com

@ctron
Copy link
Contributor

ctron commented Oct 29, 2018

I just had the time to look into this PR. Wow! 😃 I really like it.

There are a few minor things, but I definitely want to get this in.

I will need to file a few CQs for updated dependencies. This currently prevents the merging of this PR, I will take care of this, but it might take a while until they are processed by the Eclipse IP team.

I saw that the payload coding is a string, and the file digest is an integer. I would prefer them to be enums, if that is possible. As my understanding is, that they are both a limited set of choices, and putting in an unsupported value actually doesn't work as there would be no code to process this.

Maybe it would also make sense to extract the payload coding part as an actual interface, with different implementations such as Gzip, Xz, … only an idea, I would still merge it as it is. But that would make the payload codings actually pluggable.

I also add a few comments in the files themselves, only small things.

And thanks for this PR!


Btw, if your are interested, the "Eclipse Linux Package Tools" project proposal was made public last week 1. The idea is to focus this project on the RPM code of Package Drone and give a dedicated home. I definitely want to go ahead with this PR, before the project is approved. But I would encourage you to have a look.

@@ -132,16 +134,21 @@ private InputStream setupPayloadStream () throws IOException

switch ( payloadCoding )
{
case "none":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I saw none at some point. What is the reason to remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit I made to this pull request handled the case of "none" meaning no compression, following what you originally had in the code.

However, when I looked closer, it seems that RPM handles a missing compression flag in the header as defaulting to gzip. Therefore, it seems that if the payload coding is missing or empty, that is is supposed to default to gzip and not no compression. Also, it seems that it really shouldn't be possible to create RPMs without some compression, or it breaks spec.

Can you verify this? If it turns out not to be the case, then we can see about adding this back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both variants are wrong 😁 -> https://github.com/rpm-software-management/rpm/blob/ff4b9111aeba01dd025dd133ce617fb80f7398a0/build/pack.c#L281

To me it looks like the correct value would be null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the original code likely set the value to "none" on a null simply because switch doesn't support a null case. It's a bit confusing since "none" (if it were actually set in the header) is not a valid compressor. I will put the code back to allow creating uncompressed RPMs.

However, I am trying to see if the reading is consistent with this. For example,
https://github.com/rpm-software-management/rpm/blob/c7e68754584c49764c397cbccc23aebc491b1a3a/lib/rpmte.c#L632

Is it true that a null value defaults to gzip? I think this is where I got that idea that it could not read uncompressed RPMs, but I should have just tried it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the document you referenced, it states:

If the payload coding is not specified via the 1125 (Payload coding) header entry, then it is assumed to be “gzip”. The “passthrough” mode feature of zlib is used to allow the lack of the 1125 (Payload coding) header entry to also represent “none”.

So, maybe that is how it get away with using gzip? In any case, this string "none" does not seem to appear anywhere in the source code.

Additionally, according to the documentation, we apparently need to distinguish between lack of a header and a header value of null. However, the C code does not seem to distinguish between these as a single statement seems to check for both the header existence and the value at once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, maybe that is how it get away with using gzip? In any case, this string "none" does not seem to appear anywhere in the source code.

It looks as if when rpm writes a file a missing header always indicates no compression. But, the way the reading is implemented, it may be that a missing header can contain either no compression or gzip compression. If this is the case, it should try gzip first and then fallback to no compression if not in gzip format. But, I'd really like to see this behavior verified against the official rpm first.

@@ -77,12 +88,25 @@ public long getSize ()

private boolean closed;

private String payloadCoding;

private Optional<String> payloadFlags;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this to be a general purpose string. But I don't have a good idea either, except from allowing a generic Object instance, or refactoring the payload coding as an interface, with different implementations, allow for distinct constructors.

Maybe you have an idea.

Copy link
Contributor Author

@dwalluck dwalluck Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, the type could be an int representing the compression level. However, there could be other flags as well, which is the reason for using a string.

The C version of RPM unfortunately has C-style IO flags in this string. Such a string looks like w9.gzdio or w9.bzdio, etc. This string is used to set the payload compressor type and flags. The string must start with w followed by the compressor flags (if any), followed by a ., followed by the compressor type.

A real-world example is w7T16.xzdio. The Java implementation of xz does not yet support setting the number of threads. The javadocs state that "threading is planned but it is unknown when it will be implemented".

Is there a a way to improve this data type while future-proofing for additional flags? It is unfortunate that commons-compress does not provide a common class for options. The LZMA class seems to be missing support for setting the compression level even though xz has support for it (consider sending a pull request for this to commons-compress).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two possible options here:

  1. Make the "options" a plain Object and create a few distinct options classes. Like GzipOptions. This would allow to extend this over time, and be specific about the options. In case GzipOptions are passed and "xz" is selected, then this would simply fail with a proper exception.
  2. The payload coding is being extracted to an interface e.g. PayloadCoding and then having an GzipPayloadCoding, which provides the actual implementation of setting up a compressed stream. That could have all necessary options in its constructor.

Actually going for option 1 would be fine now, as you always could come back at a later time and simple use the options objects in the constructors of option 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the payload coding should be made Optional<String> instead of String, if null encoding means we should not write that header. It looks like we currently always write it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently allowing a value of null (see the comments on the commit). To be honest, I don't understand the use of Optional all that well (besides that it is useful for streams).

@ctron ctron self-assigned this Oct 29, 2018
@dwalluck
Copy link
Contributor Author

dwalluck commented Oct 30, 2018

I saw that the payload coding is a string, and the file digest is an integer. I would prefer them to be enums, if that is possible. As my understanding is, that they are both a limited set of choices, and putting in an unsupported value actually doesn't work as there would be no code to process this.

This is due to org.bouncycastle.bcpg.HashAlgorithmTags itself being an interface with int fields instead of an enum, unless you really want to wrap this class. Currently, putting an unsupported value will throw an exception, so it should not get through unnoticed.

I plan to create an enum which will reuse the org.bouncycastle.bcpg.HashAlgorithmTags values as these are the values used by RPM.

@dwalluck
Copy link
Contributor Author

Maybe it would also make sense to extract the payload coding part as an actual interface, with different implementations such as Gzip, Xz, … only an idea, I would still merge it as it is. But that would make the payload codings actually pluggable.

You mean implementing factories similar to how commons-compress works? This seems like overkill to me right now. But, I will at least try moving all compression-related functionality to a separate class and see how you like it.

Signed-off-by: David Walluck <dwalluck@redhat.com>
Signed-off-by: David Walluck <dwalluck@redhat.com>
Signed-off-by: David Walluck <dwalluck@redhat.com>
Signed-off-by: David Walluck <dwalluck@redhat.com>
Signed-off-by: David Walluck <dwalluck@redhat.com>
Signed-off-by: David Walluck <dwalluck@redhat.com>
Signed-off-by: David Walluck <dwalluck@redhat.com>
Signed-off-by: David Walluck <dwalluck@redhat.com>
{
throw new IOException ( String.format ( "Unknown payload format: %s", payloadFormat ) );
return this.in;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the header does not exist, it will simply return the input stream. I believe this is a fix from the behavior of returning "gzip".

return new XZInputStream ( this.in );
default:
throw new IOException ( String.format ( "Unknown coding: %s", payloadCoding ) );
coding = PayloadCoding.GZIP;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the payload coding header exists, but the value is null, we return GZIP. However, I believe that the upstream RPM may allow passthrough, supporting both GZIP and NONE with a null value.

throw new IOException ( String.format ( "Unknown coding: %s", coding ) );
}

public static Optional<Dependency> getDependency ( final PayloadCoding payloadCoding ) throws IOException
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider whether we should move the following methods into an interface and have different implementations for each PayloadCoding value. At least we should determine if these are the right methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could add them to the enum itself.

enum PayloadCoding {
  private final Optional<Dependency> dependency;

  private PayloadCoding(final Dependency dependency) {
    this.dependency = Optional.ofNullable(dependency);
  }

  public Optional<Dependency> getDependency () {
    return this.dependency;
  }

  NONE(null),
  …
  XZ(new Dependency ( "PayloadIsXz", "5.2-1", RpmDependencyFlags.LESS, RpmDependencyFlags.EQUAL, RpmDependencyFlags.RPMLIB )),
  …
  ;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please forget my previous comment. You already worked towards that idea.

Consider whether we should move the following methods into an interface and have different implementations for each PayloadCoding value. At least we should determine if these are the right methods.

Yes, those are method which should go into an interface, and then their implementations.

I think that once this interface and the implementations are extracted, the actual enum may be irrelevant. As all the necessary information is inside the implementation of each coding.

Signed-off-by: David Walluck <dwalluck@redhat.com>
import java.util.Map;
import java.util.TreeMap;

public class PayloadCodingRegistry
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the class instance needs to be looked up based on the string value in the RPM header, some way is needed to look up the class instance. I couldn't prevent a new class instance from being created manually, although default instances that need gzip (such as in PayloadRecorder) are looking it up in the registry.

@ctron ctron removed the CQ label Jan 4, 2019
@ctron ctron merged commit 03bd99a into eclipse-archived:master Jan 4, 2019
@ctron
Copy link
Contributor

ctron commented Jan 4, 2019

I am really, really sorry for the long delay in this PR. I merged it now and will take care of the small remaining changes myself.

Thanks for your patience on this!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants