Resolve conflicting version of MIME4J #10301

Open: 2 commits to merge into base branch develop

poikilotherm (Contributor)

What this PR does / why we need it:

  • Apache Abdera Parser, Apache Tika, and RESTeasy (testing only) all depend on MIME4J.
  • Tika and RESTeasy use newer APIs that exist only in v0.8 and later.
  • Abdera is an abandoned project that uses v0.7.2; it is hopefully compatible with newer releases.
  • The v0.8.4 that Apache Tika pulls in depends on the vulnerable Apache Commons IO 2.6, while our dependency management calls for 2.11. This PR therefore upgrades to v0.8.7, the earliest version that depends on 2.11.
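
When several libraries pull in different MIME4J versions, it can help to confirm which jar actually wins on the runtime classpath. The following is a minimal sketch, not part of this PR: the class name `Mime4jLocation` and the helper `locationOf` are made up for illustration, and it assumes `org.apache.james.mime4j.MimeException` is a class present in the MIME4J core jar.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the Dataverse code base.
class Mime4jLocation {

    /**
     * Returns the location (usually a jar URL) the named class was loaded from.
     * Empty if the class is missing or the JVM reports no code source for it.
     */
    static Optional<String> locationOf(String className) {
        try {
            // Load without initializing the class; we only need its origin.
            Class<?> type = Class.forName(className, false, Mime4jLocation.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return Optional.empty();
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumption: this class ships in the MIME4J core jar.
        String probe = "org.apache.james.mime4j.MimeException";
        System.out.println(locationOf(probe).orElse("not on classpath: " + probe));
    }
}
```

Run inside the deployed application (or a test with the same dependency set), the printed jar file name normally includes the resolved version, which shows whether the older or the newer MIME4J was picked.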

Which issue(s) this PR closes:

Closes #9077

Special notes for your reviewer:
None

Suggestions on how to test this:
Let Jenkins run the SWORD2 tests. Maybe @qqmyers can tell us how to run tests for full text indexing?

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope

Is there a release notes update needed for this change?:
Nope

Additional documentation:
None


coveralls commented Feb 6, 2024

Coverage Status

coverage: 20.139% (-0.002%) from 20.141%
when pulling 477eb06 on 9077-fix-mime4j
into 98231c5 on develop.

qqmyers (Member) commented Feb 6, 2024

@poikilotherm Tika has a v2.9.1 release (we're at 2.4.1), which I think includes the MIME4J v0.8.7 you want. Should we upgrade Tika in addition to, or instead of, this change? At QDR, 2.9.1 looks like it works as well as or better than the earlier version.

Re: testing - we don't have a suite of files that exercises all of full-text indexing, so the basic test would be to enable full-text indexing (":SolrFullTextIndexing":"true"), reindex a dataset containing test files of various types, and check that the files appear in search results for a term from their text (and do not appear when full-text indexing is off).
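
The manual check described above could be scripted roughly as follows. This is a sketch under assumptions, not an existing test: the class `FullTextIndexingCheck`, the base URL, the persistent ID, and the search term are placeholders, and the endpoint paths reflect my understanding of the Dataverse admin settings, index, and search APIs, so verify them against the guides for your version.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; endpoint paths are assumptions.
class FullTextIndexingCheck {

    /** Builds the request that switches the :SolrFullTextIndexing setting on or off. */
    static HttpRequest setFullTextIndexing(String baseUrl, boolean enabled) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/settings/:SolrFullTextIndexing"))
                .PUT(HttpRequest.BodyPublishers.ofString(Boolean.toString(enabled)))
                .build();
    }

    /** Builds the request that asks the admin API to reindex one dataset. */
    static HttpRequest reindexDataset(String baseUrl, String persistentId) {
        String pid = URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/index/dataset?persistentId=" + pid))
                .GET()
                .build();
    }

    /** Builds a search request for a term expected to occur only inside a file's text. */
    static HttpRequest search(String baseUrl, String term) {
        String q = URLEncoder.encode(term, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/search?q=" + q))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8080";   // placeholder local instance
        String pid = "doi:10.5072/FK2/ABCDEF";      // placeholder dataset with test files
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest[] steps = {
            setFullTextIndexing(baseUrl, true),
            reindexDataset(baseUrl, pid),
            search(baseUrl, "needle")               // placeholder term from a test file
        };
        for (HttpRequest step : steps) {
            HttpResponse<String> response = client.send(step, HttpResponse.BodyHandlers.ofString());
            System.out.println(step.method() + " " + step.uri() + " -> " + response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

Indexing may not finish immediately, so the search step might need a short wait or a retry. Repeating the run with `setFullTextIndexing(baseUrl, false)` covers the negative case, where the term should no longer match.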

poikilotherm added the labels "Feature: Indexing" and "Size: 3" (a percentage of a sprint; 2.1 hours) on Feb 6, 2024
poikilotherm (Contributor, Author)

I agree - we should upgrade Tika. Let me check whether I can provide a Testcontainers-based integration test. It would be good to have this use case properly covered by an integration test.

github-actions bot commented Feb 7, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9077-fix-mime4j
ghcr.io/gdcc/configbaker:9077-fix-mime4j

🚢 See on GHCR. To use an image, reference it by its full name as printed above, including the registry name.

qqmyers added the labels "GDCC: DANS" (related to GDCC work for DANS) and "GDCC: QDR" (of interest to QDR) on Feb 7, 2024
Development

Successfully merging this pull request may close these issues.

sword2-server library overrides tika's apache-mime4j-core dependency with older version