Old Tika Version Pulling Wrong Width from Image for Metadata #23934

Closed
waqasakramdot opened this issue Jan 30, 2023 · 9 comments · Fixed by #24684, dotCMS/plugin-com.dotcms.tika#2 or #25084

Comments

@waqasakramdot

waqasakramdot commented Jan 30, 2023

Problem Statement

Tika 1.19 returns the same metadata for two versions of the same image that differ only in resolution.

dotCMS Ticket: https://dotcms.zendesk.com/agent/tickets/109867
Tika Ticket: https://issues.apache.org/jira/browse/TIKA-2630

Steps to Reproduce

For the following reproduction steps, I've attached two versions of an image that you can use. They are the same image, but in different sizes, 3600x2400 and 1152x768.

Reproduction steps:

  1. Create a new File Asset
  2. Upload one version of the image
  3. Publish and reopen the file asset
  4. Metadata displays correct information
  5. Replace the image with the other file
  6. Publish and reopen the file asset

Expected results: The metadata is updated to reflect the current live file.
Actual results: The metadata still shows the old image's values, which are inaccurate.
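
As a way to double-check the expected result outside of Tika, the dimensions of the current file can be read straight from the image header and compared with whatever the metadata panel shows. The following is only a sketch using the JDK's `javax.imageio` API; the class and method names (`ImageDimensionCheck`, `readDimensions`, `blankPng`, `metadataMatches`) are hypothetical and are not part of dotCMS or Tika, and the blank PNGs merely stand in for the two attached images.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper for verifying stored metadata; not dotCMS or Tika code. */
final class ImageDimensionCheck {

    /** Reads {width, height} from the image header without decoding the pixels. */
    static int[] readDimensions(byte[] imageBytes) throws IOException {
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(imageBytes))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader found for this format");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                return new int[] {reader.getWidth(0), reader.getHeight(0)};
            } finally {
                reader.dispose();
            }
        }
    }

    /** Builds a blank grayscale PNG that stands in for an uploaded file. */
    static byte[] blankPng(int width, int height) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        return out.toByteArray();
    }

    /** True when the stored width and height match the file that is currently live. */
    static boolean metadataMatches(int storedWidth, int storedHeight, byte[] currentFile)
            throws IOException {
        int[] actual = readDimensions(currentFile);
        return actual[0] == storedWidth && actual[1] == storedHeight;
    }

    public static void main(String[] args) throws IOException {
        byte[] replacement = blankPng(1152, 768);
        // Stale metadata from the first upload (3600x2400) should not match the replacement.
        System.out.println("stale metadata matches: " + metadataMatches(3600, 2400, replacement));
        System.out.println("fresh metadata matches: " + metadataMatches(1152, 768, replacement));
    }
}
```

If the values shown in dotCMS after step 6 disagree with what a check like this reports for the replaced file, the metadata was not regenerated.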

First image

test1

Second image

test2

Acceptance Criteria

Metadata should be correctly extracted and shown in dotCMS.

dotCMS Version

All dotCMS LTS and Agile versions up to the current 23.01

Proposed Objective

Customer Support

Proposed Priority

Priority 3 - Average

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

dotCMS Ticket: https://dotcms.zendesk.com/agent/tickets/109867
Tika Ticket: https://issues.apache.org/jira/browse/TIKA-2630

https://dotcms.slack.com/archives/CSHTYUR7H/p1674828766355419

https://dotcms.slack.com/archives/CSHTYUR7H/p1674831755123549

Assumptions & Initiation Needs

No response

Sub-Tasks & Estimates

No response

@damen-dotcms damen-dotcms changed the title from "Wrong meta data extractor because of old Tika version" to "Old Tika Version Pulling Wrong Width from Image for Metadata" Jan 30, 2023
wezell added a commit that referenced this issue Feb 8, 2023
@damen-dotcms damen-dotcms removed the OKR : Code Maintenance Owned by Erick label Feb 15, 2023
@dcolina dcolina self-assigned this Mar 16, 2023
@wezell
Contributor

wezell commented Mar 22, 2023

Also:

#24051

dcolina pushed a commit that referenced this issue Apr 19, 2023
* New Tika version 2.7.0 (latest)

* Tika dependencies are included into tika plugin.
@nollymar
Contributor

Tika was upgraded to version 2.7.0.
PRs:
dotCMS/plugin-com.dotcms.tika#2
#24684

nollymar pushed a commit that referenced this issue Apr 25, 2023
* New Tika version 2.7.0 (latest)

* Tika dependencies are included into tika plugin.

Co-authored-by: daniel.colina <daniel.colina@dotcms.com>
@nollymar nollymar reopened this Apr 25, 2023
@fabrizzio-dotCMS
Contributor

Looks good to me. The metadata is now recalculated properly and accurately shows the respective dimensions.

dcolina pushed a commit that referenced this issue May 16, 2023
…cies issue.

* SLF4J has been upgraded to 1.7.35
* org.apache.logging.log4j has been included as part of osgi-extra.conf to avoid dependency clashes inside of Tika Microsoft parsers.
@josemejias11

Approved QA - Tested on 23.06_8c0a542e_SNAPSHOT // Docker // macOS 13.0 // FF v113.0

@erickgonzalez erickgonzalez added the LTS : Next Ticket that will be added to LTS label May 19, 2023
nollymar pushed a commit that referenced this issue May 24, 2023
nollymar pushed a commit that referenced this issue May 30, 2023
…cies issue.

* SLF4J has been upgraded to 1.7.35
* org.apache.logging.log4j has been included as part of osgi-extra.conf to avoid dependency clashes inside of Tika Microsoft parsers.
@nollymar nollymar linked a pull request May 30, 2023 that will close this issue
@nollymar
Contributor

Note to QA: Please make sure blog titles are resolved when a site search index is generated

nollymar added a commit that referenced this issue May 31, 2023
* #23934 Tika fragment has been added in order to resolve tika dependencies issue.

* SLF4J has been upgraded to 1.7.35
* org.apache.logging.log4j has been included as part of osgi-extra.conf to avoid dependency clashes inside of Tika Microsoft parsers.

* #23934 Including keys from tika 1.x that were missing in tika 2.x

* #23934 Removing log4j libraries from felix-system

---------

Co-authored-by: daniel.colina <daniel.colina@dotcms.com>
Co-authored-by: nollymar <nollymarlonga@Nollymars-MacBook-Pro-2.local>
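
One way to picture the "keys from tika 1.x that were missing in tika 2.x" item is as a backfill step: after extraction, any legacy key that consumers still expect is copied from its newer counterpart when it is absent. The sketch below only illustrates that idea. The class name `LegacyKeyBackfill` is hypothetical, and the alias table (`tiff:ImageWidth` to `width`, `tiff:ImageLength` to `height`) is an assumption for the example, not the list of keys restored by the commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of backfilling legacy metadata keys; not the dotCMS code. */
final class LegacyKeyBackfill {

    // Assumed alias table (newer key -> legacy key). Illustrative only.
    static final Map<String, String> ALIASES = Map.of(
        "tiff:ImageWidth", "width",
        "tiff:ImageLength", "height");

    /** Returns a copy of the extracted metadata with missing legacy keys filled in. */
    static Map<String, String> withLegacyKeys(Map<String, String> extracted) {
        Map<String, String> result = new LinkedHashMap<>(extracted);
        for (Map.Entry<String, String> alias : ALIASES.entrySet()) {
            String value = extracted.get(alias.getKey());
            if (value != null) {
                // Never overwrite a legacy key the extractor already produced.
                result.putIfAbsent(alias.getValue(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("tiff:ImageWidth", "1152");
        extracted.put("tiff:ImageLength", "768");
        System.out.println(withLegacyKeys(extracted));
    }
}
```

Copying rather than renaming keeps both key styles available, so code written against either naming keeps working.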
@bryanboza
Member

Fixed; tested as part of #24051.

@erickgonzalez erickgonzalez added Next LTS Release and removed LTS : Next Ticket that will be added to LTS labels Jul 11, 2023
erickgonzalez added a commit that referenced this issue Jul 31, 2023
@erickgonzalez erickgonzalez added LTS: Excluded Ticket that has been excluded from at least one LTS and removed Next LTS Release labels Aug 1, 2023
@erickgonzalez
Contributor

This change breaks over 200 tests, so I am labeling it as excluded from all LTS releases.

erickgonzalez added a commit that referenced this issue Aug 1, 2023