Ignore duplicate input archive entries #119
Comments
Or we could track the class names that have already been put into the output in a HashSet. I will create a PR for that.
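A minimal sketch of that idea, not the code from the eventual PR: copy an archive entry by entry, remember every name already written in a `HashSet`, and skip any later entry with the same name. The class and method names are hypothetical, the archive is held in memory for brevity, and the real transformer would rewrite each entry where this sketch simply copies the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
final class DuplicateSkippingCopy {

    /** Copies an in-memory zip archive, keeping only the first entry for each name. */
    public static byte[] copySkippingDuplicates(byte[] archive) {
        Set<String> seen = new HashSet<>();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!seen.add(entry.getName())) {
                    // Name already written: log and skip. The next getNextEntry()
                    // call discards the unread data of this entry.
                    System.err.println("Ignoring duplicate entry: " + entry.getName());
                    continue;
                }
                // A fresh ZipEntry carries over only the name, not timestamps or extras.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zin.transferTo(zout); // a real transformer would rewrite the bytes here
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Because the input is read as a stream, the first occurrence of a name is the one that survives; there is no way to go back and replace it with a later one.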
This seems like garbage-in, garbage-out. That is, the input jar file is malformed. It should not have duplicate entries. It seems the best solution is to fix the input jar so that it is not malformed. I don't think it is valid for the transformer to ignore duplicate entries in the input. Which of the duplicate entries is the correct entry? First, last, middle? The proper solution is to fix the malformed Jakarta EE 9 Platform TCK test archive.
I agree that it hides the case where the entries are not true duplicates. However, this has been an under-specified area of the Jakarta EE requirements for a very long time. There also seems to be a mismatch between what jars allow and what java.util.zip.ZipOutputStream allows, which may in turn differ from what the Jakarta EE implementation classloader does.
It is too late in the Jakarta EE 9 Platform TCK schedule to make those changes, but that is only my current use case. If someone is trying to transform a large Jakarta EE 8 application to EE 9, they may not have the ability to update their application archive (or may not want to). https://github.com/scottmarlow/transformer-1/tree/issues_119_ignoreduplicate is the potential workaround.
While the Zip File specification may allow duplicate entries, I don't think there is any support for this in the Java JAR File specification. Individual sections of the manifest do not permit duplicate entries.
Perhaps, but it seems fairly easy to delete the duplicate entry from the archive. Also, I would have assumed the Jakarta EE 9 TCK would be using the jakarta.* package names already, no? So why would this archive need transformation?
This just silently ignores later duplicate entries. If the solution is to ignore later duplicate entries, wouldn't it be better to elide the duplicate entries up front before transformation to avoid other potential failures due to errors in transforming the later duplicate entries that will end up being ignored? I would think we should have an option to enable this ignoring so that users must opt-in to this behavior. That is, the default behavior should be the current behavior where duplicate entries are a failure.
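One way to picture the opt-in behavior is a small guard that is consulted before an entry is transformed. This is only a sketch under assumptions: `DuplicateEntryGuard` and its `ignoreDuplicates` flag are hypothetical names, not part of the transformer. With the flag off (the default suggested above) a duplicate is a failure; with it on, the later duplicate is skipped before any transformation work is done on it.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipException;

// Hypothetical guard, for illustration only.
final class DuplicateEntryGuard {
    private final boolean ignoreDuplicates; // hypothetical opt-in flag
    private final Set<String> seen = new HashSet<>();

    public DuplicateEntryGuard(boolean ignoreDuplicates) {
        this.ignoreDuplicates = ignoreDuplicates;
    }

    /**
     * Returns true if the entry should be transformed and written,
     * false if it is a duplicate that the caller opted in to skipping.
     * Throws when duplicates are not being ignored.
     */
    public boolean accept(String entryName) throws ZipException {
        if (seen.add(entryName)) {
            return true; // first time this name is seen
        }
        if (ignoreDuplicates) {
            return false; // later duplicate, skipped before transformation
        }
        throw new ZipException("duplicate entry: " + entryName);
    }
}
```

A caller would invoke `accept(entry.getName())` for each input entry and only transform the entry when it returns true.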
Yes, that is definitely one of the workarounds/solutions.
I agree that it is definitely a no-op operation to transform an (already) EE 9 application to EE 9.
Yes, I agree. Perhaps they could first run their application through a filter that eliminates duplicate classes. I think that it is good that we are discussing this case here, as many users will hit this and they will wonder if they are hitting a bug or if there is really something wrong with their application. The current error message is
👍
Updated to fail by default unless system property "FAIL_DUPLICATE_ARCHIVE_ENTRIES" is true.
Duplicated by #156. I'm cancelling that, and have modified this issue to include any duplication within an archive. The problem is not specific to class type entries.

My tendency is to add an option that controls the behavior. For example:

I agree that the best fix is to correct the original archive to remove the duplicate entry. I understand, however, that this is not always easy to make happen.

My understanding of how the error occurs has to do with how various archive builders function. That is, they stream out archive entries while building a list of entries to write to the archive central header (the table at the end of the archive). Many tools don't check for duplicates.

Within the archive format, the entries themselves can tolerate duplications. Each entry is a more-or-less self-contained region of the archive which has sufficient data to be extracted. The central header can also tolerate duplications. However, in either case, there are problems:

- Streaming through the entries, which is usually stateless except for the current entry cursor, causes the first duplicate to be overwritten by a second duplicate, while often leading to an unexpected file overwrite warning.
- Using a random access API (for example, using WinZip) to view the archive will read the central header, and will usually build a table that takes only the last entry. (For example, using a sorted dictionary data structure.)
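Since streaming through an archive visits every copy of a duplicated name, a pre-scan can report the duplicates before any transformation starts. The following is a rough sketch with hypothetical names (`DuplicateEntryScan`, `findDuplicates`), assuming the archive fits in memory; it applies to any entry type, not only classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper, for illustration only.
final class DuplicateEntryScan {

    /**
     * Streams through the local entries of an in-memory archive and returns
     * every name that occurs more than once, mapped to its number of
     * occurrences, in the order the names were first seen.
     */
    public static Map<String, Integer> findDuplicates(byte[] archive) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                counts.merge(entry.getName(), 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        counts.values().removeIf(n -> n == 1); // keep only the duplicated names
        return counts;
    }
}
```

An empty result means the archive has no duplicated names as seen by a streaming reader; a non-empty result could be used either to fail early with a clear message or to drive the skip logic.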
Java's ZipOutputStream implementation does not tolerate duplicate entries, and it seems to me that a zip with duplicate entries is not well formed. What do duplicate entries mean? A simple unzip would have the last duplicate overwriting the earlier duplicates. But in transforming a ZipInputStream into a ZipOutputStream, we would have to let the first duplicate entry take precedence, as we could not reverse the stream to replace an earlier duplicate entry.
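The `ZipOutputStream` behavior is easy to reproduce in isolation. The sketch below (the class and method names are made up for the demonstration) writes the same entry name twice to an in-memory stream and returns the message of the resulting `ZipException`, which is the same kind of failure quoted in this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, for illustration only.
final class DuplicateEntryDemo {

    /**
     * Tries to write two entries with the same name. Returns the message of
     * the ZipException raised by the second putNextEntry call, or null if
     * no exception was thrown.
     */
    public static String writeSameNameTwice(String name) {
        try (ZipOutputStream zout = new ZipOutputStream(new ByteArrayOutputStream())) {
            zout.putNextEntry(new ZipEntry(name));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry(name)); // second use of the same name
            zout.closeEntry();
            return null;
        } catch (ZipException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `DuplicateEntryDemo.writeSameNameTwice("a.txt")` is expected to return a non-null message, which is why a straight copy from a ZipInputStream with duplicates fails unless the later entries are filtered out first.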
https://issues.redhat.com/browse/WFLY-14014 is for a Jakarta EE 9 Platform TCK test archive that contains duplicate classes. I'm thinking that we could catch/log/ignore duplicate classes such as
java.util.zip.ZipException: duplicate entry: com/sun/ts/tests/servlet/api/jakarta_servlet/singlethreadmodel/STMClientServlet$ThreadClient$TestThread.class