Issue #944: Un-hide arguments field in Text Extraction action. #945

Merged: 2 commits, Jun 9, 2023

@alxp commented Jun 2, 2023

GitHub Issue: (link) #944

What does this Pull Request do?

Un-hides the command-line arguments configuration option for the standard Text Extraction action.

What's new?

A site builder can now fine-tune Tesseract, including generating hOCR, without having to define a whole new action.
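As a rough illustration of what "fine-tuning" through the arguments field means, the sketch below appends a free-text arguments string to a base Tesseract command line. The function name `build_tesseract_command` and the `tesseract stdin stdout` base invocation are assumptions made for this example, not the module's actual code; `-c tessedit_create_hocr=1` is the Tesseract config variable that switches the output to hOCR.

```python
import shlex
from typing import List


def build_tesseract_command(extra_args: str = "") -> List[str]:
    """Return an argv list for a Tesseract call with optional extra arguments.

    Hypothetical helper: the base command below is an assumption for
    illustration and may differ from what the text extraction service runs.
    """
    base = ["tesseract", "stdin", "stdout"]
    # shlex.split tokenises the free-text field the way a POSIX shell would,
    # so quoted values stay together as one argument.
    return base + shlex.split(extra_args)


# Empty arguments field: plain-text OCR.
plain_cmd = build_tesseract_command()

# Arguments field asking Tesseract for hOCR output instead.
hocr_cmd = build_tesseract_command("-c tessedit_create_hocr=1")

print(plain_cmd)
print(hocr_cmd)
```

The command is only built here, not executed, so the sketch runs without Tesseract installed.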

  • Does this change add any new dependencies? No
  • Does this change require any other modifications to be made to the repository
    (i.e. Regeneration activity, etc.)? No
  • Could this change impact execution of existing code? No

How should this be tested?

Go to admin/index and click on Actions

Create a new Advanced Action with the Text Extraction as a base.

Follow the help text to add hOCR generation via the arguments field

Observe that the command-line arguments field is now visible, and that the Mime Type field will save changes made to it.

Go to Contexts under Context UI

Add this action as a reaction to e.g. a page derivative context.

Remove the existing Text Extraction reaction if present.

Create a new Repository Item with Page model.

Add a media of type File and upload a TIFF with text in the image.

Observe that hOCR is now generated in the Extracted Text field.
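For the final check, one quick way to tell hOCR from plain text is to look for the `ocr_page` class that hOCR markup carries. The helper `looks_like_hocr` and both sample strings below are hypothetical and hand-written for this sketch; they are not real output from this PR.

```python
import re


def looks_like_hocr(text: str) -> bool:
    """Return True if the text contains an element with class ocr_page.

    hOCR is HTML whose elements carry classes such as ocr_page, ocr_line
    and ocrx_word; checking for ocr_page is a heuristic, not a validator.
    """
    return re.search(r"class=[\"']ocr_page[\"']", text) is not None


# Hand-written samples for illustration only.
sample_hocr = "<div class='ocr_page' id='page_1' title='bbox 0 0 100 100'></div>"
sample_plain = "Some plain OCR text without any markup."

print(looks_like_hocr(sample_hocr))
print(looks_like_hocr(sample_plain))
```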

Documentation Status

  • Does this change existing behaviour that's currently documented? No
  • Does this change require new pages or sections of documentation? No
  • Who does this need to be documented for?
  • Associated documentation pull request(s): ___ or documentation issue ___

Additional Notes:

There is forthcoming work that will include a new hOCR Extracted Text Media Use so I'm leaving creation of this out for this ticket.

Interested parties

Tag (@ mention) interested parties or, if unsure, @Islandora/committers

@wgilling left a comment

does the job! :)
