Bump tika-core from 1.26 to 1.27 #7897

Merged
Siedlerchr merged 1 commit into main from dependabot/gradle/org.apache.tika-tika-core-1.27 on Jul 12, 2021

dependabot[bot] (Contributor) commented on behalf of github on Jul 12, 2021

Bumps tika-core from 1.26 to 1.27.

Changelog

Sourced from tika-core's changelog.

Release 2.0.0 - ???

  • Cleanup of fetcher integration with tika-server.

Release 2.0.0-BETA - 05/19/2021

  • Refactor pipes module for resilience.

  • Add transcribe capability (TIKA-94).

Release 2.0.0-ALPHA - 01/13/2021

BREAKING CHANGES in 2.0.0

  • General

    • OCR is now triggered automatically for PDFs if tesseract is on the user's path; see https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr for how to disable OCR.
    • We upgraded from log4j to log4j2 in tika-app, tika-server and anywhere else we used to use log4j.
    • By default, when rendering a page for OCR, the PDFParser does not render glyphs/text.
    • Removed deprecated Metadata keys/properties (TIKA-1974).
    • Removed deprecated PDFPreflightParser (TIKA-3437).
    • Removed dangerous calls that read an InputStream or convert to bytes without specifying a charset.
    • Parsers can be configured via tika-config.xml on instantiation. We have moved away from configuration via .properties files because of confusion among users. This affects the PDFParser, TesseractOCRParser and the StringsParser.
    • Changed namespaces of translator implementations (o.a.t.language.translate.impl) to avoid split-package with tika-core.
  • tika-parsers

    • The parser modules have been broken into three main modules: tika-parsers-standard, tika-parsers-extended and tika-parsers-ml. Users may now need to add tika-parsers-extended to tika-app and tika-server to include parsers that used to be included by default (for example: envi, gdal, grib, isatab, netcdf).
    • PDFParser -- a) see above on OCR. b) This parser no longer warns if the jpeg2000 dependency is not included. Tika now relies on PDFBox to log an error if a jpeg2000 image should be processed but can't because the required external dependency is not available. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for the non-ASF-2.0-compatible jpeg2000 library.
    • CompressorParser -- users must add the com.github.luben:zstd-jni dependency to the classpath to process zstd files. This is an optional library that is no longer bundled in tika-parsers-standard-package because it contains native libs.
    • ChmParser was moved to org.apache.tika.parser.microsoft.chm.
    • RTFParser was moved to org.apache.tika.parser.microsoft.rtf.
    • We are now using non-shaded versions of xmpcore with namespaces com.adobe.internal.* vs com.adobe.*.

... (truncated)
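
The changelog item above about removing calls that convert to bytes without specifying a charset refers to a general Java pitfall: before Java 18, `String.getBytes()` and `new String(byte[])` used the platform default charset, so results could differ between machines. The following is a minimal sketch using only the JDK; `CharsetDemo` and its methods are hypothetical names for illustration and are not Tika code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Apache Tika.
final class CharsetDemo {
    private CharsetDemo() {}

    // Encode with an explicit charset instead of relying on the platform default.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset so the round trip is lossless.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself encoding-independent.
        String original = "Gr\u00f6\u00dfe";
        String roundTripped = decode(encode(original));
        System.out.println(original.equals(roundTripped)); // prints "true"
    }
}
```

Passing the charset on both the encoding and decoding side makes the behavior independent of the JVM's default settings.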

Commits
  • ccf9442 [maven-release-plugin] prepare release 1.27-rc1
  • 31d44e9 prep for 1.27-rc1
  • f414130 TIKA-3459 -- integrate Drew Noakes metadata-extractor as the underlying MP4 p...
  • 74c5e5a TIKA-3460 -- add missing properties files for jaiimageio-core
  • 57f5912 TIKA-3457 -- general upgrades for 1.27
  • 4ba5fd7 TIKA-3456 -- LanguageDetector should chunk long strings and test for hasEnoug...
  • 90c6ea4 TIKA-3444 -- upgrade to pdfbox 2.0.24
  • 1224f88 TIKA-3441 -- improve likelihood that tesseract processes will be shutdown on ...
  • e8ec223 Merge remote-tracking branch 'origin/branch_1x' into branch_1x
  • d7fa2cd TIKA-3441 -- improve likelihood that tesseract processes will be shutdown on ...
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [tika-core](https://github.com/apache/tika) from 1.26 to 1.27.
- [Release notes](https://github.com/apache/tika/releases)
- [Changelog](https://github.com/apache/tika/blob/main/CHANGES.txt)
- [Commits](apache/tika@1.26...1.27)

---
updated-dependencies:
- dependency-name: org.apache.tika:tika-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
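
The `update-type: version-update:semver-minor` entry in the metadata above follows from comparing the old and new version numbers component by component. A minimal sketch of that comparison is shown below, assuming purely numeric dotted versions such as `1.26` and `1.27` (pre-release tags like `2.0.0-BETA` are not handled). `VersionBump` is a hypothetical helper and is not Dependabot's actual implementation.

```java
// Hypothetical helper for illustration; not Dependabot's implementation.
final class VersionBump {
    private VersionBump() {}

    // Parses "1.26" into {1, 26, 0}; missing components default to 0.
    // Assumes every component is a plain non-negative integer.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < Math.min(parts.length, 3); i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Returns which component changed first: major, minor, or patch.
    static String classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return "semver-major";
        if (a[1] != b[1]) return "semver-minor";
        if (a[2] != b[2]) return "semver-patch";
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(classify("1.26", "1.27")); // prints "semver-minor"
    }
}
```

For this pull request the major component stays at 1 and the second component changes from 26 to 27, which is why the bump is labeled as a minor update.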
Siedlerchr merged commit fea2435 into main on Jul 12, 2021
Siedlerchr deleted the dependabot/gradle/org.apache.tika-tika-core-1.27 branch on July 12, 2021 16:00
Siedlerchr added a commit that referenced this pull request on Jul 15, 2021
* upstream/main: (45 commits)
  Squashed 'buildres/csl/csl-styles/' changes from ec4a4c0..176997d (#7910)
  Update citeproc-java to 3.0.0-alpha.2 (#7911)
  Search in PDF Files (#2838)
  Removed references to apache commons logging (#7907)
  Oobranch c : ootext and rangesort (#7788)
  Bump jackson-datatype-jsr310 from 2.12.3 to 2.12.4 (#7901)
  fix markdownlint
  Bump jackson-dataformat-yaml from 2.12.3 to 2.12.4 (#7899)
  Bump postgresql from 42.2.22 to 42.2.23 (#7902)
  Bump classgraph from 4.8.109 to 4.8.110 (#7900)
  Bump gittools/actions from 0.9.9 to 0.9.10 (#7898)
  Bump flowless from 0.6.3 to 0.6.4 (#7903)
  Bump jsoup from 1.13.1 to 1.14.1 (#7904)
  Bump tika-core from 1.26 to 1.27 (#7897)
  Try even more empty lines to provoke conflicts in CHANGELOG.md after a release
  Fix position in CHANGELOG.md
  Add corporate proxy workaround for Version.getAllAvailableVersions() (#7890)
  Update to Java 16 in build.gradle (#7892)
  Preparing Changelog for the next release cycle
  New development cycle
  ...

# Conflicts:
#	gradle/wrapper/gradle-wrapper.properties