Skip to content

Generate CollectionSpace data overviews from profile/tenant configs

License

Notifications You must be signed in to change notification settings

collectionspace/cspace-config-untangler

Repository files navigation

CspaceConfigUntangler

Conceptual overview

Reads JSON config file output by CollectionSpace application.

Gets field definitions (field_defs), including repeatability, data type, value source, XML field name and parents, etc.

Gets fields as defined for use in forms (form_fields), including the panel in which the field is included, and UI hierarchy.

Gets messages assigned to fields, panels, and input tables from field_defs and the messages hash under the profile and record types. It is assumed messages set at profile level will override those at lower levels

For a given profile, matches each form_field to its corresponding field_def and creates a field object that combines all info for the field. If a form_field represents a field group populated with structured date fields, the individual structured date fields are provided from the extension, and the original form_field is treated as the parent UI grouping.

Note: there may be field_defs in a profile which do not match any form_fields. Field objects are not created/reported for these, because if a field has not been made available for viewing/editing in a form, it is not considered included in the profile.

Setup

  • Tested with Ruby 3.1.0

  • Do bundle --version

    • If the version of Bundler is lower than 2.2.29, do gem update bundler

    • Bundler should come standard with Ruby 2.7.0, but may be an older version. If you get an error that you don’t have Bundler installed when you try to check the version, do gem install bundler

  • Clone this repo

  • cd into cloned directory

  • bundle/install

  • Download your configs into the appropriate data/configs directory or directories

  • Configure your settings in lib/cspace_config_untangler.rb.

Optional: add the exe directory to your PATH

The benefit of this is that you can run ccu from the command line anywhere to interact with the application. If you don’t do this, you can still use the tool, but must cd into the cloned repository directory and use exe/ccu when entering a command in your terminal.

The way you do this is different depending on your operating system, terminal configuration, and whether you want it to be permanent or not, so google it.

Optional: create a client connection config file

Why/when is this necessary?

Unfortunately the UI config JSON file does not contain any information about which vocabularies (dynamic term lists) are configured for an instance. The only way to get this data programmatically is via API calls.

For the purposes of this code base, that means using collectionspace-client to interact with the API, and this requires, at minimum, management of authentication credentials for a user with at least the TENANT_READER role/permission. The purpose of cspace-config-untangler is solely to read available CollectionSpace configuration info, not to make any changes to any CollectionSpace instance, so we recommend against configuring connections with more than TENANT_READER permissions.

Currently, you only need to create this config file if you wish to run commands found beneath ccu vocabs. If you run one of those commands on a profile without a client connection config, it will just fail gracefully, telling you that you need to configure a connection.

You do not need to create a client connection config file if you are working only with the community supported profiles. The Reader user in those instances is set up following a pattern which works without needing configuration.

Client connection config file location

By default, the application will look for the config file in (your home directory}/.config/cspace-config-untangler/client_connection_config.yml. You can override that that by changing the default value of the :client_connection_config_path setting in lib/cspace_config_untangler.rb from nil to your path (a String).

Client connection config file format

The config file must be a valid YAML file.

Here is a sample, which will be explained below:

[source,yaml]
instanced:
  base_uri: https://collectionspace.institution_d_name.org/cspace-services
  username: readonlyuser@institution_d_name.org
  password: randomstring
  profile: anthro

The top level key is the profile name the connection will be associated, minus any version suffix associated with the profile/UI config file. For example, you may have a UI config file for instancea_8-0-1. The key here would be just instancea.

The profile key is not required, but can be used to indicate that the instance is using a UI config you have accessible in the Untangler.

⚠️
While we can keep previous versions of UI config files to compare or go back in time, this is not true for client connections. Those are always made to the current, live instance.
ℹ️
There is an --env option on relevant commands that allow you to specify that the dev or qa instances of community supported profiles should be used.

Usage

Once the setup is done, you should be able to cd into the cloned directory and type exe/ccu (or just ccu if you have installed as a gem) at the command prompt to get the list of available functions with their brief descriptions.

💡

The best source of info on what each function does and how to use it is the documentation available from the command line interface (CLI).

For the top-level command groups:

exe/ccu

For an overview of the specific commands inside a group (using the profiles group as an example):

exe/ccu profiles

For details on usage of a specific command (using the profiles compare command as an example):

exe/ccu profiles help compare

There are detailed instructions for some common tasks in the doc directory.

Known limitations/issues

General

This tool can only be used confidently with configs from CollectionSpace 6.1 and newer
  • For 5.2 configs, data source values are not consistently supplied for structured date fields. This is because configuration of the structured date fields was not written out to the JSON config in a standard way until 6.0.

  • The 6.1 release further refined the JSON config output allowing the full functionality of this tool

  • Does not currently report on fields in the ns2:collectionspace_core namespace

  • Does not currently report on fields in the rel:relations-common-list namespace because the way this data is defined in the config is very different from the rest

  • contact and blob get reported/treated as extensions within the tool, rather than sub-records

  • Does not support fields in custom namespaces added to contact or blob

Working with non-community profiles

  • Do exe/ccu fields csv -p all and check whether the data_type column has any blank values. If so, probably your profile has configured some fields from extensions in an unexpected manner. This can cause forms/default/props/subpath values (used to create form_field ids) to not match the fields/document/…​/{fieldname}/[config]/messages/name/id values (used to create field_def ids) for some fields. The Untangler is then unable to match up form_field info with field_def info to generate the necessary combined field info required for fully-populated fields CSV, CSV template, and RecordMapper output. You’ll need to do some hard-coding somewhere in the code to get a match

  • Do you have fields with the same name in different namespaces in the same record type? Use exe/ccu fields nonunique to generate a listing of any such fields.

    • The code tries to automatically fix this here but if any non-unique field names are sneaking through, you may need to hard-code something to fix this. Otherwise, you will get two columns in your CSV template with the same header and it won’t be clear which field that data should be imported into.

  • If you have record types with (a) no required field; or (b) multiple required fields, you will need to hard-code identifier_field values in record_mapper.rb’s `get_id_field method.

  • The mini template for a record type is ignored as a source for field information. If you have a field that is used only in a mini template, it will not be included in the field data, mappers, or CSV templates this tool produces.

  • RECOMMENDED: add your profile name and the last version of that profile that should be handled with fancy column/fieldname style. If you do not configure this for your profile, you will get warnings on the screen and in your log file, and data exported from CollectionSpace for round-tripping with the CSV importer may not be importable without fixing some column headers. See Other topics > Column styles for more explanation.

Other topics

JSON config source files

Since there is no way to programmatically grab the JSON config, this currently requires you to manually download the JSON config files from the following links. The JSON files should be saved as {profilename}.json in the data/configs directory.

You must follow the config naming conventions specified below in order for the Untangler to properly identify profile name and version!

And for the latest dev versions of profiles:

Set CCU.const_set('MAINPROFILE') value in lib/cspace_config_untangler.rb.

Config (and resulting mapper/template) naming conventions

Config file name must contain the profile name and profile version.

Use _ (underscore) to separate the profile name and profile version sections of the name.

Use - (hyphen) to separate words/numbers within a section.

Examples:

anthro_4-1-2.json

my-custom-config_2-0.json

This allows the Untangler to split the config file name on _ and unambiguously determine profile name vs. profile version.

Output files follow the same convention, adding the recordtype section:

anthro_4-1-2_concept-associated.json

Column styles ("last fancy column version")

This is related to:

  • the field names/column headers in CSVs exported from CollectionSpace

  • the field names/column headers in the CSV templates generated by this tool, and for which mapping instructions are generated for CSV import

💡

You can pretty much ignore this if:

  • you are using a pre-6.1 release of CollectionSpace, since you are unable to export data in CSV from search results.

  • you are not roundtripping exported data from CollectionSpace back in via the CSV Import Tool

If you are annoyed by warnings about it on the screen and in your logs, you can configure it, but it won’t really matter what you enter as the last fancy column version

This mainly affects fields which may be populated with terms from multiple authorities, where several columns of CSV data map into one CollectionSpace data field.

Prior to CollectionSpace 7.0, CollectionSpace export and this tool both tried to create shorter, less redundant column names using a more "fancy" algorithm, but the two tools ended up creating columns with slightly different names. We realized this, and the fact that it would require more data prep for roundtripping, while building 7.0.

In CollectionSpace 7.0 and beyond, the column names are longer and sometimes a bit internally redundant, but they are consistent with each other for both export and import.

For the community profiles, we increment the profile version with each CollectionSpace release, so the version used with 6.1 is enterd in the settings as the last fancy version for each profile.

If this affects you, add a line for your profile to the default_last_fancy_column_versions hash, and include the version of your profile that was used with CollectionSpace 6.1.

If you do not configure this for your profile, the consistent column naming style will be used.

If you are on 6.1 and configure this correctly, you will get fancy column headers. You may still have to fix some column names for import (the pre-processing step of the import will warn you about them). You would have to fix a lot more column names if you are exporting from 6.1 (fancy export column names), but using the consistent headers in your CSV import data.