
TIKA-4723 follow-up: fix sqlite3 shade filter and correct docs #2810

Merged
tballison merged 3 commits into apache:main from nddipiazza:TIKA-4723-followup-fixes on May 13, 2026


@nddipiazza
Contributor

Summary

Follow-up fixes for TIKA-4723 (merged via #2809).

Changes

1. tika-parser-sqlite3-package/pom.xml — align shade filter with sister packages

The maven-shade-plugin filter in tika-parser-sqlite3-package was missing three exclusions present in both tika-parser-scientific-package and tika-parser-nlp-package:

  • module-info.class — without this exclusion, shading multiple deps that each carry a module-info.class causes a duplicate-entry error in the shaded jar on Java 9+.
  • META-INF/LICENSE.md — duplicate clutter; the ApacheLicenseResourceTransformer already handles the text-format LICENSE.
  • META-INF/NOTICE.md — same rationale as LICENSE.md.

2. docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc — fix incorrect TikaConfigException claim

The doc said:

tika-grpc requires at least one pf4j plugin to be loaded at startup; an empty plugins/ directory triggers a TikaConfigException with a download URL pointing at Apache dist.

This is factually wrong. TikaGrpcServerImpl (line 133) calls LOG.warn when pluginManager.getPlugins().isEmpty(); it does not throw a TikaConfigException. The server still starts, and fetcher-dependent RPC calls simply fail at runtime. Corrected the description to match the actual code path.

Review Focus Areas

  • tika-parser-sqlite3-package/pom.xml shade <filters> block — confirm the three new exclusions are correct and complete.
  • release-artifacts.adoc paragraph about empty plugins — confirm the new wording accurately reflects startup behaviour.

Critical Files

  • tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/pom.xml
  • docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc

Testing Instructions

# Verify the sqlite3 shaded jar builds without duplicate module-info errors
mvn package -pl tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package -am -DskipTests

# Confirm shaded jar exists and no module-info duplication
jar tf tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/target/tika-parser-sqlite3-package-*-shaded.jar \
  | grep -c module-info   # should be 0
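The grep above counts every entry whose path merely contains "module-info", and it does not check the two .md files at all. A slightly stricter check is sketched below as a hypothetical shell helper (check_excluded_entries is not part of the Tika build). It assumes a jar listing on stdin in the one-path-per-line format printed by jar tf, and fails if any of the three newly excluded paths appears as an exact entry.

```shell
# Hypothetical helper, not part of the Tika build: reads a jar listing
# (one entry per line, as printed by `jar tf`) on stdin and reports any of
# the three newly excluded paths that still appear verbatim.
check_excluded_entries() {
  listing=$(cat)
  status=0
  for entry in module-info.class META-INF/LICENSE.md META-INF/NOTICE.md; do
    # -x: whole-line match, -F: treat the pattern as a literal string
    if printf '%s\n' "$listing" | grep -qxF "$entry"; then
      echo "unexpected entry in shaded jar: $entry" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, pipe jar tf <path-to-shaded-jar> into check_excluded_entries and chain && echo ok. Matching exact paths keeps the check aligned with the filter patterns as written, so a multi-release entry such as META-INF/versions/9/module-info.class would not trip it, although the grep -c above would count it.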

Review Checklist

  • sqlite3 shade filter exclusions match scientific and nlp packages
  • Docs accurately describe tika-grpc startup behaviour when no plugins loaded
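The first checklist item can be checked mechanically by comparing the exclude patterns across the poms. A rough sketch follows, assuming a POSIX shell with grep -o available; pom_excludes and missing_excludes are hypothetical helper names. Because it is a plain text match, it picks up every <exclude> element in a pom, including any that belong to other plugins, so treat its output as a prompt for manual review rather than a verdict.

```shell
# Hypothetical helper: prints the <exclude> values found in a pom,
# one per line, sorted and de-duplicated. Text match only, so excludes
# from other plugins in the same pom are included too.
pom_excludes() {
  grep -o '<exclude>[^<]*</exclude>' "$1" \
    | sed -e 's/<exclude>//' -e 's#</exclude>##' \
    | sort -u
}

# Hypothetical helper: prints every exclude present in the reference
# pom ($2) but missing from the candidate pom ($1).
missing_excludes() {
  cand=$(mktemp) ref=$(mktemp)
  pom_excludes "$1" > "$cand"
  pom_excludes "$2" > "$ref"
  # -1 hides lines only in cand, -3 hides shared lines: left with ref-only
  comm -13 "$cand" "$ref"
  rm -f "$cand" "$ref"
}
```

For example, run missing_excludes tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/pom.xml <path-to-scientific-package>/pom.xml, then repeat with the nlp package pom. This PR does not state where those two poms live, so substitute the real paths. Any lines printed are exclusions present in the reference pom but absent from sqlite3.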

nddipiazza and others added 2 commits May 12, 2026 15:22
…ackages

Add module-info.class, META-INF/LICENSE.md, and META-INF/NOTICE.md
exclusions to the maven-shade-plugin filter in tika-parser-sqlite3-package.
These were present in tika-parser-scientific-package and
tika-parser-nlp-package but missing in sqlite3, which could cause
duplicate module-info.class entries in the shaded jar on Java 9+.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…se-artifacts.adoc

The doc stated that starting tika-grpc with no plugins loaded triggers a
TikaConfigException. The actual behaviour (TikaGrpcServerImpl.java line 133)
is a LOG.warn with a helpful download URL; the server continues to start.
Correct the doc to match the real code path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@nddipiazza
Contributor Author

@tballison I reviewed your changes and found no issues. Claude recommended these small fixes, but I think we can close this PR unless you find value in them.

tballison marked this pull request as ready for review May 12, 2026 21:13
@tballison
Contributor

Looks good. Will take a look early tomorrow. Thank you!

This doesn't break anything for grpc, right?

@nddipiazza
Contributor Author

We have e2e tests, @tballison. I'll check how they look.


Copilot AI left a comment


Pull request overview

Follow-up for TIKA-4723 to keep the sqlite3 parser “shaded jar” behavior aligned with other parser packages and to correct release-guide documentation about tika-grpc behavior when no PF4J plugins are present.

Changes:

  • Added missing maven-shade-plugin filter exclusions (module-info.class, META-INF/LICENSE.md, META-INF/NOTICE.md) to the sqlite3 shaded package, matching sister packages.
  • Updated the release artifacts guide to reflect that tika-grpc logs a warning (rather than throwing TikaConfigException) when started without plugins.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/pom.xml: Aligns shade filter exclusions with other shaded parser-package modules to avoid duplicate entries and reduce metadata clutter.
  • docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc: Corrects documentation describing tika-grpc startup behavior with an empty plugin set.


Review comment thread on docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc (outdated)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
tballison merged commit 3bbf65c into apache:main May 13, 2026
5 checks passed


3 participants