Bump tika.version from 2.1.0 to 2.2.0 #1330

Merged: 4 commits, Jan 10, 2022

@dependabot dependabot bot commented on behalf of github Dec 20, 2021

Bumps tika.version from 2.1.0 to 2.2.0.
Updates tika-core from 2.1.0 to 2.2.0

Changelog

Sourced from tika-core's changelog.

Release 2.2.1 - 12/19/2021

  • Upgrade log4j to 2.17.0 (TIKA-3625).

  • Upgrade to PDFBox 2.0.25 (TIKA-3622)

  • Fix bug that prevented metadata keys in the UnpackerResource in tika-server (TIKA-3624).

  • Upgrade log4j to 2.16.0 (TIKA-3623)

Release 2.2.0 - 12/13/2021

  • Add support for OneNote files downloaded from O365 (TIKA-3446).

  • Fix logic bug in PipesServer that prevented concatenation of content from attachments (TIKA-3609).

  • Improve extraction of embedded files from MSOffice files created by non-Microsoft tools (TIKA-3526).

  • Added back ability to ignore load errors in TikaConfig (TIKA-3575).

  • Make SecureContentHandler and other parameters configurable in AutoDetectParser programmatically and via tika-config.xml (TIKA-3594).

  • Fix default logging in tika-app in batch mode (TIKA-3589).

  • Fix bug that prevented specifying a config with the long --config= option in tika-app in batch mode (TIKA-3589).

  • Fix thread starvation after numerous restarts in PipesClient (TIKA-3588).

  • Fix race condition when starting multiple forked servers on multiple ports (TIKA-3586).

  • Add timeout per task to be configured via headers for tika-server's legacy endpoints /tika and /rmeta. Note that this timeout cannot be greater than taskTimeoutMillis (TIKA-3582).

  • Add metadata item for whether or not a PDF has a collection/ is a Portfolio PDF (TIKA-3579).

  • Add detection of ESRI Layer files (TIKA-3570).

  • Add detection of JPEG XL, MARC, ICC profiles, NES-ROM file types (TIKA-3562 and TIKA-3563)

  • Remove duplicate "subject" metadata keys that were intended

... (truncated)

Updates tika-parsers-standard-package from 2.1.0 to 2.2.0

Updates tika-parser-scientific-module from 2.1.0 to 2.2.0

Updates tika-parser-sqlite3-module from 2.1.0 to 2.2.0

Updates tika-langdetect-optimaize from 2.1.0 to 2.2.0

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the `update` label (When updating an existing feature) Dec 20, 2021
@dependabot dependabot bot requested a review from dadoonet December 20, 2021 04:13
@dependabot dependabot bot added this to the 2.8 milestone Dec 20, 2021
@dependabot dependabot bot force-pushed the dependabot/maven/tika.version-2.2.0 branch from c9dc99f to 7d54384 Compare December 20, 2021 11:07
@dadoonet
Owner

Failures are happening because of https://issues.apache.org/jira/browse/TIKA-3629

Let's see how we can solve this.

@dadoonet dadoonet marked this pull request as draft December 20, 2021 11:34
@dadoonet
Owner

@dependabot rebase

@dependabot @github
Contributor Author

dependabot bot commented on behalf of github Jan 10, 2022

Looks like this PR has been edited by someone other than Dependabot. That means Dependabot can't rebase it - sorry!

If you're happy for Dependabot to recreate it from scratch, overwriting any edits, you can request @dependabot recreate.

dependabot bot and others added 3 commits January 10, 2022 16:21
Bumps `tika.version` from 2.1.0 to 2.2.0.

Updates `tika-core` from 2.1.0 to 2.2.0
- [Release notes](https://github.com/apache/tika/releases)
- [Changelog](https://github.com/apache/tika/blob/main/CHANGES.txt)
- [Commits](https://github.com/apache/tika/commits/2.2.0)

Updates `tika-parsers-standard-package` from 2.1.0 to 2.2.0

Updates `tika-parser-scientific-module` from 2.1.0 to 2.2.0

Updates `tika-parser-sqlite3-module` from 2.1.0 to 2.2.0

Updates `tika-langdetect-optimaize` from 2.1.0 to 2.2.0

---
updated-dependencies:
- dependency-name: org.apache.tika:tika-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.tika:tika-parsers-standard-package
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.tika:tika-parser-scientific-module
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.tika:tika-parser-sqlite3-module
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.tika:tika-langdetect-optimaize
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
* `meta:keyword` has been removed
* `cp:subject` has been removed
* `pdf:hasCollection` has been added
@dadoonet dadoonet force-pushed the dependabot/maven/tika.version-2.2.0 branch from 3bb4a53 to 4353b16 Compare January 10, 2022 15:21
As we don't have the keywords anymore, we can extract them from `pdf:docinfo:keywords`.

This should be reverted when https://issues.apache.org/jira/browse/TIKA-3629 is fixed.
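
For illustration, the workaround described in the commit message above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the class `KeywordFallback`, the method `keywords`, and the use of a plain `Map` in place of Tika's `Metadata` object are all hypothetical. Only the two key names come from this conversation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class KeywordFallback {
    // Key reported in this PR as removed after the bump to Tika 2.2.0.
    static final String LEGACY_KEY = "meta:keyword";
    // PDF-specific key named in the commit message as the alternative source.
    static final String PDF_KEY = "pdf:docinfo:keywords";

    // Hypothetical helper: prefer the legacy key, fall back to the PDF key.
    // A plain Map stands in for the parser's metadata object (an assumption).
    static Optional<String> keywords(Map<String, String> metadata) {
        String value = metadata.get(LEGACY_KEY);
        if (value == null || value.isBlank()) {
            value = metadata.get(PDF_KEY);
        }
        // Treat missing or whitespace-only values as "no keywords".
        return Optional.ofNullable(value)
                .map(String::trim)
                .filter(s -> !s.isEmpty());
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(PDF_KEY, "keyword1, keyword2");
        System.out.println(keywords(metadata).orElse("<none>"));
    }
}
```

Checking the legacy key first means the fallback is simply never reached if a later Tika release restores `meta:keyword`, which should keep the revert mentioned above small.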
@dadoonet dadoonet marked this pull request as ready for review January 10, 2022 16:38
@dadoonet dadoonet merged commit a06e102 into master Jan 10, 2022
@dependabot dependabot bot deleted the dependabot/maven/tika.version-2.2.0 branch January 10, 2022 16:39
@dadoonet dadoonet modified the milestones: 2.8, 2.9 Jan 10, 2022