Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Validation and CSV #1927

Closed
wants to merge 184 commits into from

Conversation

bobbi-SMR
Copy link
Contributor

This branch provides support for ingesting CSV files as well as Excel files; it also provides support for validating those files without creating archival objects

Description

Related JIRA Ticket or GitHub Issue

https://archivesspace.atlassian.net/browse/ANW-386
(I can't find the validate JIRA)

How Has This Been Tested?

Manually tested using various spreadsheets; updated specs files

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • [ X] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • [ X] My change requires a change to the documentation.
  • [ X] I have read the CONTRIBUTING document.
  • I have authority to submit this code.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@lorawoodford lorawoodford added this to the 2.8.1 milestone Jul 20, 2020
@lorawoodford lorawoodford added config_changes PRs that include changes to config.rb/should be called out in release notes translations Adds/updates translations labels Jul 20, 2020
@lorawoodford lorawoodford added the contractor Work completed by program-employed contract developers label Aug 4, 2020
@mark-cooper
Copy link
Member

This PR was accepted in: #1980

@jdshaw
Copy link
Contributor

jdshaw commented Oct 20, 2020

While the importer works for the most part including the CSV import, there are a bunch of issues that need work, especially around the validation.

Job Runner Issues

  1. When the job is initially loaded, ie is in the queue, but has not started processing, the file presented for download is the file initially loaded by the end user. This should not be displayed as available for download.

Validation

  1. Validation ignores second dates and discards them without warning. To reproduce, add a date type without any additional data to the spreadsheet. No warnings generated, but the AO is created without the second date.
  2. Validation errors on extents does not stop ingest. To reproduce, add an extent number but no other data to the spreadsheet. An error is generated (Row 7: ERROR Unable to validate extent (Extent: whole 1 ): NOT FOUND: '' not found in list extent_extent_type), but the AO is ingested without the extent.
  3. Validator ignores child number without child type and discards information though ingest succeeds. To reproduce, add a container type and child indicator to the spreadsheet without a child type. AO and container are created, but no validation error is generated, and child indicator is silently discarded.
  4. In general, it seems like the ingest can succeed with creating AOs even when there are errors in the row. Multiple paths to create a bad row that succeeds, including bad data for DO, agent, container, extent, etc.

DO

  1. When the spreadsheet is just an empty template, we get a undefined on nil class error: ERROR: Error(s) detected: Processing stopped at row 6 [Some system error has occurred [undefined method `errors' for nil:NilClass]. Stack trace points to backend/app/lib/bulk_import/bulk_import_parser.rb:84
  2. Thumbnail or DO URL not marked as required in spreadsheet template. Why is this required by the importer? Not required to construct a DO object.
  3. Localization for error message is inconsistent and specifies URN instead of URI as used in staff frontend - should be consistent. Example: "Row 6: ERROR Row 6 will not be processed due to errors: Neither the Digital Object URN or the Thumbnail URN is specified"
  4. The Resource Identifier column is hidden in the XLSX but is part of the CSV. Is this intended and should they be consistent?
  5. The AO unique ID is used as the identifier string for the newly created DO. This prevents subsequent imports of DOs to the AO. Suggest that the DO identifier be a random UUID provided by the ingest script.

General & UI

  1. Selected type button state should be more apparent than just dark blue. Hover tip text or other information would be useful for type selection.
  2. Maybe make the selection more apparent by creating a select list? This would also be more extensible if other bulk import types are allowed in future.
  3. When the import is initiated at the AO level, only one option available is bulk loading AOs. This should be made more explicit in the modal.
  4. Also, why is bulk loading of DOs not allowed at the AO level? I don't see a technical reason why this shouldn't work.

@jdshaw
Copy link
Contributor

jdshaw commented Oct 26, 2020

See #2035

@lorawoodford lorawoodford removed this from the 2.8.1 milestone Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
config_changes PRs that include changes to config.rb/should be called out in release notes contractor Work completed by program-employed contract developers translations Adds/updates translations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants