
Metadata manipulator: SimpleReplaceOai

Mark Jordan edited this page Oct 18, 2017 · 1 revision

Overview

This metadata manipulator performs simple search and replace on XML data generated by MIK's OAI-PMH toolchains. It does the same job as the SimpleReplace metadata manipulator, but it operates on entire MODS and DC XML documents rather than on top-level MODS XML fragments.

Toolchains

Can be used within any OAI-PMH toolchain.

Configuration

To register this manipulator in your toolchain, add an entry similar to the following to the "[MANIPULATORS]" section of your .ini file. The manipulator's configuration signature is:

metadatamanipulators[] = "SimpleReplaceOai|/pattern/|replacement text"

For example, to replace the word "Page" with the word "Part" if it immediately follows the MODS markup <title>, use this configuration:

metadatamanipulators[] = "SimpleReplaceOai|/<title>Page/|<title>Part"

Some additional examples include:

  • metadatamanipulators[] = "SimpleReplaceOai|/<roleTerm\stype=\"text\">photographer/|<roleTerm type=\"text\">Creator"
  • metadatamanipulators[] = 'SimpleReplaceOai|/<identifier\stype="local"\sdisplayLabel="Local\sidentifier">image(\d\d)<\/identifier>/|<identifier type="local" displayLabel="Local number">Number $1</identifier>'
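
As a minimal sketch of what the second example does, the PHP below runs the same pattern and replacement through preg_replace() against a sample fragment. The sample XML string is an assumption made up for illustration; in practice the input is whatever the OAI metadata parser produced.

```php
<?php
// Hypothetical sample fragment; real input comes from the OAI MODS parser.
$xml = '<identifier type="local" displayLabel="Local identifier">image07</identifier>';

// Pattern and replacement copied from the second example above.
// Single quotes keep \s, \d and $1 literal so PCRE sees them unchanged.
$pattern = '/<identifier\stype="local"\sdisplayLabel="Local\sidentifier">image(\d\d)<\/identifier>/';
$replacement = '<identifier type="local" displayLabel="Local number">Number $1</identifier>';

// $1 is filled with the two digits captured by (\d\d).
$result = preg_replace($pattern, $replacement, $xml);
echo $result, PHP_EOL;
```

Here the two digits captured by (\d\d) are carried into the replacement through $1, so "image07" becomes "Number 07" and the displayLabel attribute is rewritten at the same time.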

Because MIK uses the pipe (|) as the delimiter between manipulator parameters, that character cannot be used in patterns, which rules out regular expression alternation such as /TO|toronto/. To work around this, register multiple instances of the SimpleReplaceOai manipulator, one per pattern. For example, to replace both "TO" and "toronto" with "Toronto", use this configuration:

metadatamanipulators[] = "SimpleReplaceOai|/TO/|Toronto"
metadatamanipulators[] = "SimpleReplaceOai|/toronto/|Toronto"
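
The effect of registering two instances can be sketched as two preg_replace() calls applied one after the other. This assumes MIK runs manipulators in the order they are listed, which this page does not state; the sample XML and the $rules array are hypothetical.

```php
<?php
// Pattern/replacement pairs taken from the two configuration lines above.
$rules = [
    ['/TO/', 'Toronto'],
    ['/toronto/', 'Toronto'],
];

// Hypothetical sample text containing both variants.
$xml = '<placeTerm>TO</placeTerm><placeTerm>toronto</placeTerm>';

// Apply each rule in turn; each pass works on the output of the previous one.
// (Array destructuring in foreach needs PHP 7.1 or later.)
foreach ($rules as [$pattern, $replacement]) {
    $xml = preg_replace($pattern, $replacement, $xml);
}
echo $xml, PHP_EOL;
```

Both patterns are case-sensitive and unanchored, so /TO/ would also match inside a longer uppercase string such as "TORONTO". Adding word boundaries (for example /\bTO\b/) or surrounding markup to the pattern narrows the match.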

A common use for this manipulator is to remove characters you don't want to show up in your Islandora MODS files. For example, if you have migrated your metadata from a legacy database, you may have Unicode replacement characters () in your data. You can use the following regular expression to replace the character with nothing:

metadatamanipulators[] = "SimpleReplaceOai|/\x{FFFD}/u|"
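
A rough PHP equivalent of this configuration, using a made-up sample value, looks like the following. The /u modifier makes PCRE treat both the pattern and the subject as UTF-8, which is what allows \x{FFFD} to refer to a code point above 0xFF.

```php
<?php
// Hypothetical sample value containing U+FFFD (the "\u{...}" escape needs PHP 7+).
$xml = "<title>Caf\u{FFFD} menu</title>";

// Same pattern as the configuration line above; an empty replacement
// deletes every match.
$clean = preg_replace('/\x{FFFD}/u', '', $xml);
echo $clean, PHP_EOL;
```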

Parameters

This manipulator takes two parameters:

  • The first parameter (required) is the pattern to match on. This pattern is a Perl Compatible Regular Expression (PCRE) as used by PHP, including its delimiters (e.g., /pattern/) but without any leading or trailing quotation marks.
  • The second parameter (optional) is the replacement text. If you want to remove the text matched by the regular expression, leave the second parameter empty, e.g., "SimpleReplaceOai|/foo/|".

Backreferences (e.g., $1, $2, etc.) are allowed.
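
To illustrate how a pipe-delimited signature breaks down into these parameters, here is a minimal sketch. The function parseSimpleReplaceSignature() is hypothetical and is not MIK's actual parsing code; it only assumes that the signature is split on the pipe and that a missing or empty replacement means "replace with nothing".

```php
<?php
// Hypothetical helper, not MIK's actual code: split a manipulator
// signature on the pipe and default the optional replacement to ''.
function parseSimpleReplaceSignature(string $signature): array
{
    $parts = explode('|', $signature);
    return [
        'class' => $parts[0],
        'pattern' => $parts[1] ?? null,
        'replacement' => $parts[2] ?? '',
    ];
}

// A signature with both parameters.
$withReplacement = parseSimpleReplaceSignature('SimpleReplaceOai|/<title>Page/|<title>Part');

// A signature with an empty second parameter: matches are removed.
$removal = parseSimpleReplaceSignature('SimpleReplaceOai|/foo/|');

// A pipe inside the pattern produces an extra segment, so the
// pattern is cut in two and no longer forms a valid expression.
$broken = explode('|', 'SimpleReplaceOai|/TO|toronto/|Toronto');

print_r($withReplacement);
print_r($removal);
print_r($broken);
```

The last call shows why a pipe cannot appear inside a pattern: under this splitting scheme the pattern arrives as "/TO" and "toronto/", neither of which is a complete expression.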

Functionality

This metadata manipulator does not manipulate the PHP DOM. Instead, it performs preg_replace() operations directly on the XML document generated by the OAI DC and OAI MODS Metadata Parsers. Therefore, to make matches (and corresponding replacements) as precise as possible, you should include in your pattern any contextual string data that will limit the search and replace to only the elements you want to modify. In the Page/Part example in the Configuration section, Page is replaced with Part only where it immediately follows the <title> markup.
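
The difference that contextual markup makes can be sketched with two preg_replace() calls on the same string. The MODS fragment below is a hypothetical sample, chosen so that "Page" appears in more than one element.

```php
<?php
// Hypothetical MODS fragment where "Page" appears in two elements.
$xml = '<titleInfo><title>Page 12</title></titleInfo><note>Page is torn</note>';

// Without context, every occurrence of "Page" changes, including the note.
$broad = preg_replace('/Page/', 'Part', $xml);

// With the <title> markup in the pattern, only the title changes.
$narrow = preg_replace('/<title>Page/', '<title>Part', $xml);

echo $broad, PHP_EOL;
echo $narrow, PHP_EOL;
```

Because the replacement works on the serialized XML as plain text, the pattern is the only thing that scopes the change; the narrower pattern leaves the <note> element untouched.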

If this manipulator modifies the XML document, it writes an entry to the manipulator log containing the record key and the affected fragment as it appeared before and after the change.
