Cannot submit or edit datasets from Chrome version 120.0.6099.71 #2235

Closed
jeanetteclark opened this issue Dec 13, 2023 · 19 comments · Fixed by #2236

Comments

@jeanetteclark
Collaborator

Describe the bug
Users of the new version of Chrome cannot submit datasets. This has been seen on both the ADC and SCTLD.

To Reproduce
Steps to reproduce the behavior:

  1. Try to submit a dataset on test.arcticdata.io using the latest Chrome
  2. Observe error

The error that shows up in the red banner says:

There was problem to save the system metadata: urn:uuid:419ef6d8-3aa1-426e-a826-2385b47df3ca since CONCURRENT_MAP_PUT failed at Address[127.0.0.1]:5701 because of an exception thrown at Address[127.0.0.1]:5701

In the console we have:

Screen Shot 2023-12-13 at 11 29 12 AM

In catalina.out:

[ERROR]: Error while creating systemmetadata record: urn:uuid:54600ff9-7627-487a-8a68-a81ecc8b2c79 [edu.ucsb.nceas.metacat.IdentifierManager:insertOrUpdateSystemMetadata:1437]
[2023-12-12 10:11:51] [info] org.dataone.service.exceptions.InvalidSystemMetadata: The Permission shouldn't be null. It may result from sepcifying a permission by a typo, which is not one of read, write and changePermission.

Interestingly, an object is actually submitted: the EML saved as plain text, which is our attempt to help users avoid losing work when an error occurs. If you try to save again after the first error, you'll get a new error saying the PID is already in use.

Desktop (please complete the following information):

  • OS: iOS
  • Browser: Chrome
  • Version: 120.0.6099.71

Related bug

If, instead of submitting a new dataset, you edit a dataset that is not your own, that dataset will have its access policy removed. The EML is otherwise submitted normally.

The catalina.out error here is:

[2023-12-12 10:06:04] [info] metacat 20231212-10:06:04: [ERROR]: D1ResourceHandler: Serializing exception with code 401: READ not allowed on urn:uuid:a5a8a40a-00aa-42c1-ac61-8f426560e444 for subject[s]: CN=arctic-data-admins,DC=dataone,DC=org; public; http://orcid.org/0000-0003-2993-378X; authenticatedUser; [edu.ucsb.nceas.metacat.restservice.D1ResourceHandler:serializeException:591]
[2023-12-12 10:06:04] [info] org.dataone.service.exceptions.NotAuthorized: READ not allowed on urn:uuid:a5a8a40a-00aa-42c1-ac61-8f426560e444 for subject[s]: CN=arctic-data-admins,DC=dataone,DC=org; public; http://orcid.org/0000-0003-2993-378X; authenticatedUser;

@rushirajnenuji
Member

Able to reproduce this with MetacatUI releases 2.25.0, 2.26.0, and 2.27.0.

@taojing2002

The SFWMD folks reported that they couldn't add two data objects to an existing package. @jeanetteclark reproduced the issue with the latest Chrome browser, while she was able to add the files with Firefox. I think this is related even though the error message is different. The error shows that the new EML object is invalid:

[ERROR]: D1NodeService.insertOrUpdateDocument - Error inserting or updating document: urn:uuid:8ae75c48-0d17-472a-9863-5ed93f8b5af6 since 
<?xml version="1.0"?><error>cvc-complex-type.2.3: Element 'allow' cannot have character [children], because the type's content type is element-only.</error> 
[edu.ucsb.nceas.metacat.dataone.D1NodeService:insertOrUpdateDocument:1185]

The package originated in Morpho, so it has the access section in the EML document, even though we no longer use that section for access control.

@taojing2002

The SFWMD folks reported both Chrome and Microsoft Edge had the issue.

@rushirajnenuji
Member

Not entirely sure if this is the problem, but sharing some observations. The sysmeta that we're trying to upload looks jumbled in Chrome 120.X compared with Chrome 119.X. Here is the file that I was testing this with on test.arcticdata.io

Console log of sysmeta from Chrome 120.X:

<d1_v2.0:systemMetadata xmlns:d1_v2.0="http://ns.dataone.org/service/types/v2.0" xmlns:d1="http://ns.dataone.org/service/types/v1">    <serialVersion>0</serialVersion>    <identifier>urn:uuid:968b244e-b424-4d9a-8ee6-be51bc9f1b78</identifier>    <formatId>text/csv</formatId>    <size>1649</size>    <checksum algorithm="SHA-1">f09e6a3752cbc88d08b2417d6c5708dbfe618f29</checksum>    <submitter>http://orcid.org/0000-0003-4678-5213</submitter>    <rightsHolder>http://orcid.org/0000-0003-4678-5213</rightsHolder><accessPolicy>
	<allow>
		<subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
		<permission></permission>read
		<permission></permission>write
		<permission></permission>changePermission
	</allow>

</accessPolicy>    <fileName>Alaska_Salmon_Fishery_Exvessel_Prices_by_Area_and_Species.csv</fileName></d1_v2.0:systemMetadata>

Console log of sysmeta from Chrome 119.X:

<d1_v2.0:systemMetadata xmlns:d1_v2.0="http://ns.dataone.org/service/types/v2.0" xmlns:d1="http://ns.dataone.org/service/types/v1">    <serialVersion>0</serialVersion>    <identifier>urn:uuid:bb5d7a22-be3a-49a9-a7be-519997a8ab75</identifier>    <formatId>text/csv</formatId>    <size>1649</size>    <checksum algorithm="SHA-1">f09e6a3752cbc88d08b2417d6c5708dbfe618f29</checksum>    <submitter>http://orcid.org/0000-0003-4678-5213</submitter>    <rightsHolder>http://orcid.org/0000-0003-4678-5213</rightsHolder><accessPolicy>
	<allow>
		<subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
		<permission>read</permission>
		<permission>write</permission>
		<permission>changePermission</permission>
	</allow>

</accessPolicy>    <fileName>Alaska_Salmon_Fishery_Exvessel_Prices_by_Area_and_Species.csv</fileName></d1_v2.0:systemMetadata>

Please ignore the SHA-1 checksum; that was something I tried while figuring out the issue. I initially thought the problem was checksum related, so I switched from MD5 to SHA-1 (MetacatUI defaults to MD5).

I think this is the code block that generates this xml. Not sure why the same code is giving two different results in different versions of the browser.

Still testing a few things before I can tell for sure if this is the problem.
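
The difference between the two console logs above can be checked mechanically. The following is a minimal Python sketch, not MetacatUI code: it flags `<permission>` elements that are empty while their text sits right after the closing tag. The function name `find_displaced_permissions` and the trimmed sample policy are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def find_displaced_permissions(access_policy_xml):
    """Return (subject, stray_text) pairs for empty <permission> elements
    whose value ended up as trailing text after the closing tag."""
    root = ET.fromstring(access_policy_xml)
    problems = []
    for allow in root.iter("allow"):
        subject = allow.findtext("subject", default="")
        for perm in allow.findall("permission"):
            text = (perm.text or "").strip()   # content inside the element
            tail = (perm.tail or "").strip()   # content right after it
            if not text and tail:
                problems.append((subject, tail))
    return problems


# Trimmed copy of the access policy logged by Chrome 120.X.
garbled = """<accessPolicy>
  <allow>
    <subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
    <permission></permission>read
    <permission></permission>write
    <permission></permission>changePermission
  </allow>
</accessPolicy>"""

# Trimmed copy of the access policy logged by Chrome 119.X.
well_formed = """<accessPolicy>
  <allow>
    <subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
    <permission>read</permission>
    <permission>write</permission>
    <permission>changePermission</permission>
  </allow>
</accessPolicy>"""

for subject, stray in find_displaced_permissions(garbled):
    print(f"{subject}: empty <permission>, stray text {stray!r}")
```

An empty `<permission>` element is consistent with the "The Permission shouldn't be null" error reported by Metacat, and the stray text inside `<allow>` is consistent with the "cvc-complex-type.2.3" validation error quoted earlier in this thread.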

rushirajnenuji added a commit that referenced this issue Dec 18, 2023
Return serialized XML objects instead of String

Reference: #2235
rushirajnenuji added a commit that referenced this issue Dec 19, 2023
Update AccessRule model to parse garbled XML

Reference #2235
@rushirajnenuji
Member

update: 12/18/2023

refactored the accessRule.serialize() and accessPolicy.serialize() functions to return XML objects instead of XMLString objects (commit). This seems to resolve the parsing issues with jQuery replaceWith that we’ve been experiencing with Chromium 120.X. (The idea is to provide jQuery with structured content instead of letting it parse and form a structure from the given XML string).

I’ve added this fix to this branch and deployed it on handy-owl (requires UCSB VPN) for testing.

status: able to successfully submit new datasets on Chromium 120.X; still seeing issues with editing
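
The idea of handing over structured content instead of a string can be illustrated outside the browser. This is a rough Python analog under stated assumptions, not the MetacatUI JavaScript from the commit: the functions `serialize_access_rule` and `serialize_access_policy` are hypothetical stand-ins that build element nodes directly, so no parser ever has to guess where the permission text belongs.

```python
import xml.etree.ElementTree as ET


def serialize_access_rule(subject, permissions):
    """Build an <allow> element node (not a string) for one subject."""
    allow = ET.Element("allow")
    ET.SubElement(allow, "subject").text = subject
    for permission in permissions:
        # The text is attached to the node itself, so it cannot drift
        # outside the element when the tree is inserted elsewhere.
        ET.SubElement(allow, "permission").text = permission
    return allow


def serialize_access_policy(rules):
    """Build an <accessPolicy> element from (subject, permissions) pairs."""
    policy = ET.Element("accessPolicy")
    for subject, permissions in rules:
        policy.append(serialize_access_rule(subject, permissions))
    return policy


policy = serialize_access_policy(
    [("CN=arctic-data-admins,DC=dataone,DC=org",
      ["read", "write", "changePermission"])]
)
print(ET.tostring(policy, encoding="unicode"))
```

The string form is produced only once, at the very end, from a tree that is already well formed.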


update: 12/19/2023

While editing existing datasets, the parsing of the access rule failed due to a garbled XML string (see below), despite the sysmeta response from the /object endpoint initially appearing to be okay.

<accesspolicy>
    <allow>
        <subject>http://orcid.org/0000-0001-8888-547X</subject>
        <permission></permission>read
        <permission></permission>write
        <permission></permission>changePermission
    </allow>
    <allow>
        <subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
        <permission></permission>
        "read "
        <permission></permission>
        "write "
        <permission></permission>
        "changePermission "
    </allow>
</accesspolicy>

Consequently, the accessRule model only captured the subject information, with no permissions set.

I have added a fix in the above commit to rectify this issue. I conducted tests with a few of Jeanette's datasets on handy-owl pointing at test.arcticdata.io, and it appears that I am now able to edit them successfully and access the new dataset version on MetadataView.

status: ready for additional testing at handy-owl (requires UCSB VPN)
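
For readers who want to see what tolerant parsing of the garbled access policy could look like, here is a minimal Python sketch. It is not the code from the commit above; the function name `recover_permissions` and the fallback order (element text first, then the text following the element, with whitespace and quotes stripped) are assumptions.

```python
import xml.etree.ElementTree as ET

VALID_PERMISSIONS = ("read", "write", "changePermission")


def recover_permissions(allow):
    """Collect permissions for one <allow> element, falling back to the
    text that follows an empty <permission> element."""
    found = []
    for perm in allow.findall("permission"):
        for candidate in (perm.text, perm.tail):
            # Drop surrounding whitespace, then quotes, then inner padding.
            value = (candidate or "").strip().strip('"').strip()
            if value in VALID_PERMISSIONS:
                found.append(value)
                break
    return found


# The garbled access policy shown above, reproduced as a string.
garbled = """<accesspolicy>
    <allow>
        <subject>http://orcid.org/0000-0001-8888-547X</subject>
        <permission></permission>read
        <permission></permission>write
        <permission></permission>changePermission
    </allow>
    <allow>
        <subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
        <permission></permission>
        "read "
        <permission></permission>
        "write "
        <permission></permission>
        "changePermission "
    </allow>
</accesspolicy>"""

root = ET.fromstring(garbled)
recovered = {
    allow.findtext("subject"): recover_permissions(allow)
    for allow in root.findall("allow")
}
for subject, permissions in recovered.items():
    print(subject, permissions)
```

Only values that match one of the three known permissions are accepted, so unrelated stray text is ignored rather than turned into a rule.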

@jeanetteclark
Collaborator Author

Nice work @rushirajnenuji - I just tested editing and submitting from Chrome and it seems to be working for me

@helbashandy
Collaborator

Excellent prompt response @rushirajnenuji! Thanks for the fix.

@robyngit robyngit linked a pull request Dec 19, 2023 that will close this issue
robyngit added a commit that referenced this issue Dec 19, 2023
Already merged into main.
@vchendrix
Collaborator

@rushirajnenuji @robyngit @mbjones @artntek We are seeing strange behavior in Metacat when reproducing this issue on our deployments. I created a new dataset (in Chrome 120.0.6099.109), tried to add a file, and saw the error described above in the UI. I also saw errors in Metacat (see below). I also tried to save the dataset and received an error: Your submission was cancelled due to an error.

For each PID in the error (see example below):

  • There is a sysmeta record with no access policy /catalog/d1/mn/v2/meta/$PID
  • There is no data /catalog/d1/mn/v2/object/$PID

Looking in Postgres

  • There are no access_log records for these actions
  • There are systemmetadata records
  • There are no records in identifier or index_event
  • There were files in the /var/metacat/temporary directory initially, and then they were cleaned out

System metadata records

metacat=# select * from systemmetadata order by date_modified desc limit 3;
                    guid                     | series_id | serial_version |      date_uploaded      |            rights_holder             |             checksum             | checksum_algorithm |   origin_member_node    | authoritive_member_node |      date_modified      |              submitter               |              object_format               |  size   | archived | replication_allowed | number_replicas | obsoletes | obsoleted_by | media_type |       file_name       
---------------------------------------------+-----------+----------------+-------------------------+--------------------------------------+----------------------------------+--------------------+-------------------------+-------------------------+-------------------------+--------------------------------------+------------------------------------------+---------+----------+---------------------+-----------------+-----------+--------------+------------+-----------------------
 ess-dive-a311878b5f17555-20231220T064215230 |           | 0              | 2023-12-20 06:42:15.468 | http://orcid.org/0000-0001-9061-8952 | 7ce5fa15e5a2fb042bca36514787b907 | MD5                | urn:node:mnTestESS_DIVE | urn:node:mnTestESS_DIVE | 2023-12-20 06:42:15.468 | http://orcid.org/0000-0001-9061-8952 | text/plain                               | 3019    | f        | f                   |              -1 |           |              |            | eml_draft_Hendrix.txt
 ess-dive-6b3b77b9e6b5884-20231220T064057978 |           | 0              | 2023-12-20 06:42:15.181 | http://orcid.org/0000-0001-9061-8952 | d6ef25443d49cf014786b417e1594d64 | MD5                | urn:node:mnTestESS_DIVE | urn:node:mnTestESS_DIVE | 2023-12-20 06:42:15.181 | http://orcid.org/0000-0001-9061-8952 | https://eml.ecoinformatics.org/eml-2.2.0 | 2685    | f        | f                   |              -1 |           |              |            | FOO.xml
 ess-dive-aeec62b628f13df-20231220T064123192 |           | 0              | 2023-12-20 06:41:24.44  | http://orcid.org/0000-0001-9061-8952 | 19b007b8a87dfc8cc1b007e12f9d77a6 | MD5                | urn:node:mnTestESS_DIVE | urn:node:mnTestESS_DIVE | 2023-12-20 06:41:24.44  | http://orcid.org/0000-0001-9061-8952 | application/octet-stream                 | 1048576 | f        | f                   |              -1 |           |              |            | test_data_1mb.dat
(3 rows)

Example of Metacat Issue
The affected files were all of the files related to the new dataset that I tried to create: the data file, the EML, and the draft EML text file (eml_draft_Hendrix.txt) that was POSTed after the save error.

% PID=ess-dive-aeec62b628f13df-20231220T064123192

% curl -H "Authorization: Bearer $ESS_DIVE_AUTH_TOKEN" "https://data-stage.ess-dive.lbl.gov/catalog/d1/mn/v2/meta/$PID"   
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:systemMetadata xmlns:ns2="http://ns.dataone.org/service/types/v1" xmlns:ns3="http://ns.dataone.org/service/types/v2.0">
    <serialVersion>0</serialVersion>
    <identifier>ess-dive-aeec62b628f13df-20231220T064123192</identifier>
    <formatId>application/octet-stream</formatId>
    <size>1048576</size>
    <checksum algorithm="MD5">19b007b8a87dfc8cc1b007e12f9d77a6</checksum>
    <submitter>http://orcid.org/0000-0001-9061-8952</submitter>
    <rightsHolder>http://orcid.org/0000-0001-9061-8952</rightsHolder>
    <replicationPolicy replicationAllowed="false"/>
    <archived>false</archived>
    <dateUploaded>2023-12-20T06:41:24.440+00:00</dateUploaded>
    <dateSysMetadataModified>2023-12-20T06:41:24.440+00:00</dateSysMetadataModified>
    <originMemberNode>urn:node:mnTestESS_DIVE</originMemberNode>
    <authoritativeMemberNode>urn:node:mnTestESS_DIVE</authoritativeMemberNode>
    <fileName>test_data_1mb.dat</fileName>
</ns3:systemMetadata>
 
% curl -H "Authorization: Bearer $ESS_DIVE_AUTH_TOKEN" "https://data-stage.ess-dive.lbl.gov/catalog/d1/mn/v2/object/$PID" 
<?xml version="1.0" encoding="UTF-8"?><error detailCode="1020" errorCode="404" name="NotFound">
    <description>The object specified by ess-dive-aeec62b628f13df-20231220T064123192 does not exist at this node.</description>
</error>
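
The two curl checks above can be scripted for a list of PIDs. This is a minimal sketch using only the Python standard library; the base URL is taken from the curl commands, while the helper names (`http_status`, `classify`, `check_pid`) and the state labels are assumptions made for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request

# Base URL taken from the curl commands above.
BASE_URL = "https://data-stage.ess-dive.lbl.gov/catalog/d1/mn/v2"


def http_status(url, token):
    """Return the HTTP status code of a GET request sent with a bearer token."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def classify(meta_status, object_status):
    """Map the /meta and /object status codes to a short description."""
    if meta_status == 200 and object_status == 200:
        return "ok"
    if meta_status == 200 and object_status == 404:
        # The state described above: sysmeta saved, object bytes missing.
        return "orphaned system metadata"
    if meta_status == 404:
        return "not found"
    return "unknown"


def check_pid(pid, token):
    """Query both endpoints for one PID (performs network requests)."""
    encoded = urllib.parse.quote(pid, safe="")
    meta = http_status(f"{BASE_URL}/meta/{encoded}", token)
    obj = http_status(f"{BASE_URL}/object/{encoded}", token)
    return classify(meta, obj)


# The status pair observed above for the example PID.
print(classify(200, 404))
```

Calling `check_pid(pid, token)` with a valid token would perform the same two requests as the curl commands; it is not called here so that the sketch runs without network access.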


Metacat Log Errors

metacat 20231220-06:41:24: [ERROR]: Error while creating systemmetadata record: ess-dive-aeec62b628f13df-20231220T064123192 [edu.ucsb.nceas.metacat.IdentifierManager:insertOrUpdateSystemMetadata:1437]
org.dataone.service.exceptions.InvalidSystemMetadata: The Permission shouldn't be null. It may result from sepcifying a permission by a typo, which is not one of read, write and changePermission.

metacat 20231220-06:42:15: [ERROR]: Error while creating systemmetadata record: ess-dive-6b3b77b9e6b5884-20231220T064057978 [edu.ucsb.nceas.metacat.IdentifierManager:insertOrUpdateSystemMetadata:1437]
org.dataone.service.exceptions.InvalidSystemMetadata: The Permission shouldn't be null. It may result from sepcifying a permission by a typo, which is not one of read, write and changePermission.

metacat 20231220-06:42:15: [ERROR]: Error while creating systemmetadata record: ess-dive-a311878b5f17555-20231220T064215230 [edu.ucsb.nceas.metacat.IdentifierManager:insertOrUpdateSystemMetadata:1437]
org.dataone.service.exceptions.InvalidSystemMetadata: The Permission shouldn't be null. It may result from sepcifying a permission by a typo, which is not one of read, write and changePermission.
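
To collect every affected PID from a larger log, the error lines above can be matched with a regular expression. This is a small sketch that assumes the log format shown above stays the same; the function name `failed_pids` is hypothetical.

```python
import re

# Matches the PID between the fixed message prefix and the trailing
# "[class:method:line]" marker, as in the log lines above.
PATTERN = re.compile(r"Error while creating systemmetadata record: (\S+) \[")


def failed_pids(log_text):
    """Return the PIDs of all failed system metadata inserts, in log order."""
    return PATTERN.findall(log_text)


# The three error lines quoted above.
log_excerpt = """\
metacat 20231220-06:41:24: [ERROR]: Error while creating systemmetadata record: ess-dive-aeec62b628f13df-20231220T064123192 [edu.ucsb.nceas.metacat.IdentifierManager:insertOrUpdateSystemMetadata:1437]
metacat 20231220-06:42:15: [ERROR]: Error while creating systemmetadata record: ess-dive-6b3b77b9e6b5884-20231220T064057978 [edu.ucsb.nceas.metacat.IdentifierManager:insertOrUpdateSystemMetadata:1437]
metacat 20231220-06:42:15: [ERROR]: Error while creating systemmetadata record: ess-dive-a311878b5f17555-20231220T064215230 [edu.ucsb.nceas.metacat.IdentifierManager:insertOrUpdateSystemMetadata:1437]
"""

for pid in failed_pids(log_excerpt):
    print(pid)
```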

@vchendrix
Collaborator

@rushirajnenuji I have also noticed that if I am not the rightsHolder but am in the group that is in the access policy, I cannot share a dataset. In the example below, I should have changePermission.

Screenshot 2023-12-19 at 11 06 50 PM

@vchendrix
Collaborator

This is for Metacat 2.18.0

@robyngit
Copy link
Member

Hi @vchendrix, thanks for the detailed report! Was this error produced when submitting a dataset via the editor using the latest patched version of MetacatUI, or with one of the older versions?

@vchendrix
Collaborator

Hi @vchendrix, thanks for the detailed report! Was this error produced when submitting a dataset via the editor using the latest patched version of MetacatUI, or with one of the older versions?

Hey @robyngit. The error was produced with Metacat 2.23.0. We have yet to deploy the patch release.

@vchendrix
Collaborator

vchendrix commented Dec 20, 2023

Looking in the index_event table on our test server, I see events for some tests that @helbashandy ran before me. My tests didn't produce any of these events. We should find out what steps produced them. We have 63 of these events on our production server.

metacat=# select * from index_event;
                            guid                            | event_action |                                                                                                                                                                                                                                                        description                                                                                                                                                                                                                                                        |       event_date        

 ess-dive-5b68e3560849b9b-20231218T210640692                | update       | Failed to updatethe solr index for the id ess-dive-5b68e3560849b9b-20231218T210640692 since SolrIndex.update - could not update the solr index for the object ess-dive-5b68e3560849b9b-20231218T210640692 since The indexed document itself for pid ess-dive-5b68e3560849b9b-20231218T210640692 should not be null.                                                                                                                                                                                                       | 2023-12-19 23:50:46.308
 ess-dive-2ffcdfc42d4dc2f-20230720T032747391629             | update       | Failed to updatethe solr index for the id ess-dive-2ffcdfc42d4dc2f-20230720T032747391629 since SolrIndex.update - could not update the solr index for the object ess-dive-2ffcdfc42d4dc2f-20230720T032747391629 since Solr index doesn't have the information about the id ess-dive-030d8df7264a897-20230223T194749497 which is a component in the resource map ess-dive-2ffcdfc42d4dc2f-20230720T032747391629. Metacat-Index can't process the resource map prior to its components.                                     | 2023-12-19 23:50:55.671
 ess-dive-e78b80471562d52-20180321T064037133                | update       | Failed to updatethe solr index for the id ess-dive-e78b80471562d52-20180321T064037133 since SolrIndex.update - could not update the solr index for the object ess-dive-e78b80471562d52-20180321T064037133 since java.lang.NullPointerException                                                                                                                                                                                                                                                                            | 2023-12-19 23:50:55.906
 resource_map_urn:uuid:4be04935-247e-46dd-bf32-79f4697a4c71 | update       | Failed to updatethe solr index for the id resource_map_urn:uuid:4be04935-247e-46dd-bf32-79f4697a4c71 since SolrIndex.update - could not update the solr index for the object resource_map_urn:uuid:4be04935-247e-46dd-bf32-79f4697a4c71 since Solr index doesn't have the information about the id ess-dive-4341bba357edbd6-20230728T042347173 which is a component in the resource map resource_map_urn:uuid:4be04935-247e-46dd-bf32-79f4697a4c71. Metacat-Index can't process the resource map prior to its components. | 2023-12-19 23:51:14.11
 ess-dive-2ffcdfc42d4dc2f-20230720T033814257448             | update       | Failed to updatethe solr index for the id ess-dive-2ffcdfc42d4dc2f-20230720T033814257448 since SolrIndex.update - could not update the solr index for the object ess-dive-2ffcdfc42d4dc2f-20230720T033814257448 since Solr index doesn't have the information about the id ess-dive-f9278ce67775ee9-20220928T205533678228 which is a component in the resource map ess-dive-2ffcdfc42d4dc2f-20230720T033814257448. Metacat-Index can't process the resource map prior to its components.                                  | 2023-12-19 23:51:23.242
 ess-dive-79dd4920593b2be-20231219T005942775                | update       | Failed to updatethe solr index for the id ess-dive-79dd4920593b2be-20231219T005942775 since SolrIndex.update - could not update the solr index for the object ess-dive-79dd4920593b2be-20231219T005942775 since The indexed document itself for pid ess-dive-79dd4920593b2be-20231219T005942775 should not be null.                                                                                                                                                                                                       | 2023-12-19 23:52:58.032
 ess-dive-26803915cc504e2-20231218T210835400                | update       | Failed to updatethe solr index for the id ess-dive-26803915cc504e2-20231218T210835400 since SolrIndex.update - could not update the solr index for the object ess-dive-26803915cc504e2-20231218T210835400 since The indexed document itself for pid ess-dive-26803915cc504e2-20231218T210835400 should not be null.                                                                                                                                                                                                       | 2023-12-19 23:50:46.32
 urn:uuid:bee48133-4460-40d6-a7ed-db9e5a3f3891              | update       | Failed to updatethe solr index for the id urn:uuid:bee48133-4460-40d6-a7ed-db9e5a3f3891 since SolrIndex.update - could not update the solr index for the object urn:uuid:bee48133-4460-40d6-a7ed-db9e5a3f3891 since /var/metacat/documents/autogen.2021021817080046300.1 (No such file or directory)                                                                                                                                                                                                                      | 2023-12-19 23:50:46.519
 ess-dive-adabd576b1b5077-20230720T034246831147             | update       | Failed to updatethe solr index for the id ess-dive-adabd576b1b5077-20230720T034246831147 since SolrIndex.update - could not update the solr index for the object ess-dive-adabd576b1b5077-20230720T034246831147 since Solr index doesn't have the information about the id ess-dive-f9278ce67775ee9-20220928T205533678228 which is a component in the resource map ess-dive-adabd576b1b5077-20230720T034246831147. Metacat-Index can't process the resource map prior to its components.                                  | 2023-12-19 23:50:55.752
 ess-dive-2ffcdfc42d4dc2f-20230720T033616269649             | update       | Failed to updatethe solr index for the id ess-dive-2ffcdfc42d4dc2f-20230720T033616269649 since SolrIndex.update - could not update the solr index for the object ess-dive-2ffcdfc42d4dc2f-20230720T033616269649 since Solr index doesn't have the information about the id ess-dive-030d8df7264a897-20230223T194749497 which is a component in the resource map ess-dive-2ffcdfc42d4dc2f-20230720T033616269649. Metacat-Index can't process the resource map prior to its components.                                     | 2023-12-19 23:51:04.853
 ess-dive-4df34ff2fa636ab-20230720T034009723959             | update       | Failed to updatethe solr index for the id ess-dive-4df34ff2fa636ab-20230720T034009723959 since SolrIndex.update - could not update the solr index for the object ess-dive-4df34ff2fa636ab-20230720T034009723959 since Solr index doesn't have the information about the id ess-dive-030d8df7264a897-20230223T194749497 which is a component in the resource map ess-dive-4df34ff2fa636ab-20230720T034009723959. Metacat-Index can't process the resource map prior to its components.                                     | 2023-12-19 23:51:05.035
 resource_map_urn:uuid:ebb66562-cc7d-4462-938b-c832a34b54d2 | update       | Failed to updatethe solr index for the id resource_map_urn:uuid:ebb66562-cc7d-4462-938b-c832a34b54d2 since SolrIndex.update - could not update the solr index for the object resource_map_urn:uuid:ebb66562-cc7d-4462-938b-c832a34b54d2 since Solr index doesn't have the information about the id ess-dive-1e51b3cfe991f7b-20230728T042552195 which is a component in the resource map resource_map_urn:uuid:ebb66562-cc7d-4462-938b-c832a34b54d2. Metacat-Index can't process the resource map prior to its components. | 2023-12-19 23:51:15.153
 resource_map_urn:uuid:fda564d0-bfa6-4899-bd92-161303bfa7fe | update       | Failed to updatethe solr index for the id resource_map_urn:uuid:fda564d0-bfa6-4899-bd92-161303bfa7fe since SolrIndex.update - could not update the solr index for the object resource_map_urn:uuid:fda564d0-bfa6-4899-bd92-161303bfa7fe since Solr index doesn't have the information about the id ess-dive-4945f281035d2ff-20230728T042615262 which is a component in the resource map resource_map_urn:uuid:fda564d0-bfa6-4899-bd92-161303bfa7fe. Metacat-Index can't process the resource map prior to its components. | 2023-12-19 23:51:24.3
 ess-dive-53384ea2dcc02d2-20231219T005911613                | update       | Failed to updatethe solr index for the id ess-dive-53384ea2dcc02d2-20231219T005911613 since SolrIndex.update - could not update the solr index for the object ess-dive-53384ea2dcc02d2-20231219T005911613 since The indexed document itself for pid ess-dive-53384ea2dcc02d2-20231219T005911613 should not be null.                                                                                                                                                                                                       | 2023-12-19 23:53:12.01
 ess-dive-342523000c78bf3-20180601T145733873                | update       | Failed to updatethe solr index for the id ess-dive-342523000c78bf3-20180601T145733873 since SolrIndex.update - could not update the solr index for the object ess-dive-342523000c78bf3-20180601T145733873 since java.lang.NullPointerException                                                                                                                                                                                                                                                                            | 2023-12-19 23:50:46.628
 ess-dive-2bd8a6f4b6d2246-20180323T140322999362             | update       | Failed to updatethe solr index for the id ess-dive-2bd8a6f4b6d2246-20180323T140322999362 since SolrIndex.update - could not update the solr index for the object ess-dive-2bd8a6f4b6d2246-20180323T140322999362 since Solr index doesn't have the information about the id doi:10.3334/CDIAC/ATG.NDP013 which is a component in the resource map ess-dive-2bd8a6f4b6d2246-20180323T140322999362. Metacat-Index can't process the resource map prior to its components.                                                    | 2023-12-19 23:50:55.793
 resource_map_urn:uuid:1647c61c-f115-40cc-84bc-6a5693b5b5ea | update       | Failed to updatethe solr index for the id resource_map_urn:uuid:1647c61c-f115-40cc-84bc-6a5693b5b5ea since SolrIndex.update - could not update the solr index for the object resource_map_urn:uuid:1647c61c-f115-40cc-84bc-6a5693b5b5ea since Solr index doesn't have the information about the id ess-dive-c27cefcf7b5ce36-20230517T193855736 which is a component in the resource map resource_map_urn:uuid:1647c61c-f115-40cc-84bc-6a5693b5b5ea. Metacat-Index can't process the resource map prior to its components. | 2023-12-19 23:51:04.941
 ess-dive-2ffcdfc42d4dc2f-20230720T031733954840             | update       | Failed to updatethe solr index for the id ess-dive-2ffcdfc42d4dc2f-20230720T031733954840 since SolrIndex.update - could not update the solr index for the object ess-dive-2ffcdfc42d4dc2f-20230720T031733954840 since Solr index doesn't have the information about the id ess-dive-814f72a57944495-20210629T221923757 which is a component in the resource map ess-dive-2ffcdfc42d4dc2f-20230720T031733954840. Metacat-Index can't process the resource map prior to its components.                                     | 2023-12-19 23:51:15.189
 ess-dive-f5a273165045400-20180424T150126049266             | update       | Failed to updatethe solr index for the id ess-dive-f5a273165045400-20180424T150126049266 since SolrIndex.update - could not update the solr index for the object ess-dive-f5a273165045400-20180424T150126049266 since Solr index doesn't have the information about the id ess-dive-675826519565b98-20180424T150053297482 which is a component in the resource map ess-dive-f5a273165045400-20180424T150126049266. Metacat-Index can't process the resource map prior to its components. 
(19 rows)

@robyngit
Member

robyngit commented Dec 20, 2023

My tests didn't produce any of these events. We should find out what steps produced these events.

And just to double check: in your tests, were you using the latest version of Chrome and submitting a new or updated dataset via the editor? I'm trying to understand whether you've found that this problem goes beyond the issue we tracked down and fixed in the latest release. If so, we should reopen this issue.


Edit:

Thanks to @artntek for filling in some of the details I missed here! I understand now that this was with the latest version of Chrome and in the editor, and that the unexpected part is that the data files are missing.

@artntek
Contributor

artntek commented Dec 20, 2023

@mbjones and I investigated the scenario reported by @vchendrix. In summary:

  • the metacat code saves the metadata to Hazelcast (which, in turn, saves it to the database systemmetadata table), BEFORE checking the accessPolicy fields.
  • the Chrome bug causes the accessPolicy section to be messed up (see @rushirajnenuji's example), which in turn causes metacat to throw an exception at this point (...InvalidSystemMetadata: The Permission shouldn't be null....), so it fails to save the data object.
  • Because the above steps are not part of a transaction, we now have the metadata saved, but not the data.

We therefore believe that you should be able to delete the orphaned systemmetadata records that have neither a corresponding file on the filesystem nor an entry in the access_log table.
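
The selection rule above can be sketched as a small filter. This is only an illustration of the criteria (no file on disk and no access_log entry), not the actual cleanup script; the function name `find_orphaned_sysmeta` and its three inputs are hypothetical, and it assumes the identifiers have already been collected from the systemmetadata table, the filesystem, and the access_log table.

```python
def find_orphaned_sysmeta(sysmeta_ids, ids_with_files, ids_in_access_log):
    """Return ids that have a systemmetadata record but no file and no access_log entry.

    All three arguments are iterables of identifier strings (hypothetical inputs,
    gathered beforehand from the database and the filesystem).
    """
    has_file = set(ids_with_files)
    was_logged = set(ids_in_access_log)
    # Keep the input order so the result is easy to review before deleting anything.
    return [
        pid
        for pid in sysmeta_ids
        if pid not in has_file and pid not in was_logged
    ]


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    candidates = find_orphaned_sysmeta(
        sysmeta_ids=["id-1", "id-2", "id-3"],
        ids_with_files=["id-1"],
        ids_in_access_log=["id-2"],
    )
    print(candidates)
```

Reviewing the returned list by hand before deleting any records would be a sensible precaution, since an id is only treated as orphaned when both conditions hold.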

@vchendrix
Collaborator

Ok, this makes sense. Do you think that you can write a cleanup script for this?

@mbjones
Member

mbjones commented Dec 20, 2023

Because the above steps are not part of a transaction, we now have the metadata saved, but not the data

To elaborate on this slightly: because the steps are not part of a transaction, we have the systemmetadata record saved, but the xml_access metadata record fails with an exception. The exception generated there prevents the rest of the downstream data objects and access log records from being saved.

@taojing2002 @doulikecookiedough in the upcoming metacat 3.0.0 release, Hazelcast will be gone, but this transaction bug will still be latent in the insertOrUpdateSystemMetadata() method and should be fixed.
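
A minimal sketch of what a transactional save buys here, assuming a toy schema rather than Metacat's real one. It uses Python's built-in sqlite3 purely for illustration; the table names systemmetadata and xml_access come from this thread, while the column names and the helpers `make_demo_db`, `save_system_metadata`, and `count_rows` are hypothetical. The point is only that when both inserts share one transaction, an invalid permission rolls back the systemmetadata row as well, so no orphan is left behind.

```python
import sqlite3

# Permission names as listed in the InvalidSystemMetadata error message.
ALLOWED_PERMISSIONS = {"read", "write", "changePermission"}


def make_demo_db():
    """Create an in-memory database with a toy schema (columns are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE systemmetadata (guid TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE xml_access (guid TEXT, principal_name TEXT, permission TEXT)"
    )
    conn.commit()
    return conn


def save_system_metadata(conn, guid, access_rules):
    """Insert the sysmeta row and its access rules as one atomic unit."""
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO systemmetadata (guid) VALUES (?)", (guid,))
        for principal, permission in access_rules:
            if permission not in ALLOWED_PERMISSIONS:
                raise ValueError(f"invalid permission for {principal!r}: {permission!r}")
            conn.execute(
                "INSERT INTO xml_access (guid, principal_name, permission) "
                "VALUES (?, ?, ?)",
                (guid, principal, permission),
            )


def count_rows(conn, table):
    """Count rows in one of the two demo tables."""
    if table not in ("systemmetadata", "xml_access"):
        raise ValueError("unknown table")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = make_demo_db()
    try:
        # A null permission, as produced by the broken accessPolicy section.
        save_system_metadata(conn, "urn:uuid:demo", [("public", None)])
    except ValueError as err:
        print("rejected:", err)
    print("sysmeta rows after failure:", count_rows(conn, "systemmetadata"))
```

Validating the access policy before writing anything would avoid the failed write in the first place; the transaction is the safety net for failures that only surface partway through.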

@artntek
Contributor

artntek commented Dec 20, 2023

Actually, on develop, the code that needs a transaction has now moved to SystemMetadataManager.java -> updateSystemMetadata()

@artntek
Contributor

artntek commented Dec 21, 2023

Ok, this makes sense. Do you think that you can write a cleanup script for this?

Yep - we’re working on it

DONE: Metacat PR #1764
