
Problem: PREMIS events from previous transfers are appearing in current transfers in qa/1.x #43

Closed
ross-spencer opened this issue Jul 26, 2018 · 2 comments

ross-spencer (Member) commented Jul 26, 2018

Expected behaviour

PREMIS events should be encapsulated in a single transfer.

Current behaviour

PREMIS events are being aggregated across all transfers in the pipeline's database, so each new METS also contains entries from earlier transfers.

qa/1.x is at this commit: https://github.com/artefactual/archivematica/tree/0d466ec8329a37cbb45545f9bffc2f3c0bc3c92a

Steps to reproduce

  • Deploy qa/1.x.
  • Make sure you are using a new DB.
  • Create a transfer, transfer_1, containing one text file.
  • Process it and store the AIP.
  • Create a transfer, transfer_2, containing the same text file as before.
  • Process it and store the AIP.

Some results:

  • transfer_1: METS: 73.0 KB
  • transfer_2: METS: 144.2 KB
  • Opening lines of transfer_1 METS:
<?xml version='1.0' encoding='UTF-8'?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version111/mets.xsd">
  <mets:metsHdr CREATEDATE="2018-07-26T18:05:48"/>
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>0ee7acc8-bd30-4467-b0f6-b845453324d2</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_1-0ee7acc8-bd30-4467-b0f6-b845453324d2</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  • Opening lines of transfer_2 METS:
<?xml version='1.0' encoding='UTF-8'?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version111/mets.xsd">
  <mets:metsHdr CREATEDATE="2018-07-26T18:09:09"/>
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>0ee7acc8-bd30-4467-b0f6-b845453324d2</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_1-0ee7acc8-bd30-4467-b0f6-b845453324d2</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:dmdSec ID="dmdSec_2">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>84f84d4c-6f6a-4fa3-a3ca-a8ac3a203264</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_2-84f84d4c-6f6a-4fa3-a3ca-a8ac3a203264</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  • Both transfer 1 and transfer 2 entries exist in the second AIP.

  • Create a third transfer, transfer_3, and process it in the same way.

More results:

  • transfer_3: METS: 215.4 KB
  • Opening lines of transfer_3 METS:
<?xml version='1.0' encoding='UTF-8'?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version111/mets.xsd">
  <mets:metsHdr CREATEDATE="2018-07-26T18:21:16"/>
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>0ee7acc8-bd30-4467-b0f6-b845453324d2</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_1-0ee7acc8-bd30-4467-b0f6-b845453324d2</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:dmdSec ID="dmdSec_2">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>84f84d4c-6f6a-4fa3-a3ca-a8ac3a203264</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_2-84f84d4c-6f6a-4fa3-a3ca-a8ac3a203264</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:dmdSec ID="dmdSec_3">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>876530d4-3f49-4d03-8f65-26a59b87cd41</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_3-876530d4-3f49-4d03-8f65-26a59b87cd41</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>

Entries from transfers 1, 2 and 3 all appear in the same AIP METS.
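To check a stored AIP for this leak without reading the whole METS by eye, one option is to list the intellectual-entity `dmdSec` entries it contains. This is a minimal sketch using only Python's standard library. It assumes, as in the snippets above, that each transfer contributes one `dmdSec` whose `mdWrap` has `MDTYPE="PREMIS:OBJECT"`, and that a METS built from a single transfer should contain exactly one such entry. `transfer_names` and `main` are hypothetical helpers, not part of Archivematica, and an AIP deliberately arranged from several transfers would legitimately report more than one entry.

```python
import sys
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the METS snippets in this issue.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "http://www.loc.gov/premis/v3",
}


def transfer_names(source):
    """Return the premis:originalName of every PREMIS:OBJECT dmdSec.

    `source` may be a file path or a binary file object.
    """
    root = ET.parse(source).getroot()
    names = []
    # Only look at dmdSecs that wrap a PREMIS object (one per transfer here).
    for wrap in root.findall("mets:dmdSec/mets:mdWrap[@MDTYPE='PREMIS:OBJECT']", NS):
        name = wrap.find("mets:xmlData/premis:object/premis:originalName", NS)
        if name is not None and name.text:
            names.append(name.text)
    return names


def main(sources):
    """Print the entries in each METS; return 1 if any has more than one."""
    status = 0
    for source in sources:
        names = transfer_names(source)
        print(f"{source}: {len(names)} PREMIS:OBJECT dmdSec(s)")
        for name in names:
            print(f"  {name}")
        if len(names) > 1:
            status = 1
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Run against the transfer_2 METS from the steps above (for example `python check_mets.py path/to/METS.xml`), a healthy single-transfer AIP should list one name, while the buggy output would list both transfer_1 and transfer_2.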

Your environment (version of Archivematica, OS version, etc.)

qa/1.x docker-compose install on Ubuntu. Commit: https://github.com/artefactual/archivematica/tree/0d466ec8329a37cbb45545f9bffc2f3c0bc3c92a

ross-spencer (Member, Author) commented Jul 26, 2018

Per @sevein's suggestion, I tried the same steps at this commit: artefactual/archivematica@53167ee

And the opening lines of transfer_1 look like this:

<?xml version='1.0' encoding='UTF-8'?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version111/mets.xsd">
  <mets:metsHdr CREATEDATE="2018-07-26T19:20:27"/>
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>3c8af6e9-a7a6-4678-8088-4141fbfaeb2e</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_1-3c8af6e9-a7a6-4678-8088-4141fbfaeb2e</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>

And transfer_2:

<?xml version='1.0' encoding='UTF-8'?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version111/mets.xsd">
  <mets:metsHdr CREATEDATE="2018-07-26T19:26:48"/>
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/premis/v3" xsi:type="premis:intellectualEntity" xsi:schemaLocation="http://www.loc.gov/premis/v3 http://www.loc.gov/standards/premis/v3/premis.xsd" version="3.0">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
            <premis:objectIdentifierValue>a3689c2f-0bf1-46d7-9efa-dad34789a7c3</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:originalName>transfer_2-a3689c2f-0bf1-46d7-9efa-dad34789a7c3</premis:originalName>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>

Both files are approximately 70 KB, so it looks like the issue was introduced by the more recent qa/1.x commits.
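The METS sizes reported for qa/1.x (73.0 KB, 144.2 KB, 215.4 KB) grow by a constant step, which is consistent with each new METS carrying the metadata of every earlier transfer. Reading that step as a per-transfer contribution is an inference from these three data points only, not a separate measurement:

$$
\begin{aligned}
S_2 - S_1 &= 144.2 - 73.0 = 71.2\ \text{KB} \\
S_3 - S_2 &= 215.4 - 144.2 = 71.2\ \text{KB} \\
S_n &\approx 73.0 + 71.2\,(n - 1)\ \text{KB}
\end{aligned}
$$

Here $S_n$ is the size of the METS for the $n$-th transfer processed on a fresh database. At 53167ee both files stay near the $n = 1$ size, which is why the growth points to the newer commits.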

jhsimpson added the Jisc RDSS label Jul 26, 2018
marktriggs added a commit to hudmol/archivematica that referenced this issue Jul 27, 2018
The `create_mets_v2` client script uses a bunch of global state for
things like counters and file IDs.  The original implementation could
rely on these globals being reset between runs (by virtue of running
the script in a new subprocess each time) but, after the batching
changes, this is no longer the case.

To make things more manageable, we pull all of this global stuff into
a single top-level object which gets replaced at the beginning of each
job run.  Where other modules call the functions of create_mets_v2,
they reset the global state as appropriate as well.

A secondary piece of shared state is the
`sharedVariablesAcrossModules` module, used by `create_mets_v2` and a
handful of other scripts.  Handling for this needed to be updated for
the batching changes in a small number of places.

Fixes archivematica/Issues#43 and archivematica/Issues#42
sevein added a commit to artefactual/archivematica that referenced this issue Jul 27, 2018
The `create_mets_v2` client script uses a bunch of global state for
things like counters and file IDs.  The original implementation could
rely on these globals being reset between runs (by virtue of running
the script in a new subprocess each time) but, after the batching
changes, this is no longer the case.

To make things more manageable, we pull all of this global stuff into
a single top-level object which gets replaced at the beginning of each
job run.  Where other modules call the functions of create_mets_v2,
they reset the global state as appropriate as well.

A secondary piece of shared state is the
`sharedVariablesAcrossModules` module, used by `create_mets_v2` and a
handful of other scripts.  Handling for this needed to be updated for
the batching changes in a small number of places.

This is connected to archivematica/Issues#42.
This is connected to archivematica/Issues#43.
This is connected to #938.
sromkey added this to the 1.8.0 milestone Jul 30, 2018
peterVG commented Sep 27, 2018

Fixed in #1225. Tested with three transfers in a row and could not reproduce the duplication error.

6 participants