This set of program examples accommodates workflows for tape-optimized recalls of files using the metadata management capabilities of IBM Spectrum Discover.
The solution where these program examples are used is shown in the figure below:
Files are stored in an IBM Spectrum Scale file system that is space managed by IBM Spectrum Archive Enterprise Edition. The user can see and access files, except for migrated files. Accessing migrated files requires interaction with the metadata management system provided by IBM Spectrum Discover.
IBM Spectrum Discover catalogs the metadata for files stored in the IBM Spectrum Scale file system, including the migration state of each file. Furthermore, IBM Spectrum Discover allows adding custom tags to files that are migrated and must be recalled in a tape-optimized manner. This gives the user the ability to see the migration state and to tag files to be recalled.
The program examples in the repository use the IBM Spectrum Discover REST APIs.
For more information about the solution refer to this article.
This project is under MIT license.
The following workflow can be accommodated using the programming examples in this repository. These programming examples use the IBM Spectrum Discover REST APIs. The programming examples must be run on a server that has access to the files in the data source (Storage Scale file system).
- The user can query the IBM Spectrum Discover metadata catalog to determine the migration state of files, using the lstag.sh program.
- When the user requires access to migrated files, they can add a custom recall tag to the files in the IBM Spectrum Discover metadata catalog, using the ftag.sh program.
- The scheduled recallTagged.sh program periodically queries the IBM Spectrum Discover metadata catalog for files that are migrated and have the custom recall tag set. From this query the scheduled process composes a list of files that must be recalled and invokes the IBM Spectrum Archive Enterprise Edition tape-optimized recall function. This process must be scheduled and executed on an IBM Spectrum Archive Enterprise Edition server.
- Once the recall is completed, the scheduled scancol.sh program scans the IBM Spectrum Scale data source to update the migration status of all files. It also updates the collection containing the set of files that are space managed by IBM Spectrum Archive Enterprise Edition. Finally, it resets the custom recall tag for those files that are no longer migrated.
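The last two steps of the workflow can be chained in a single scheduled job. The following is a minimal sketch and not part of the repository: the function name recall_and_rescan and the variables RECALL_CMD and SCAN_CMD are hypothetical, and the sketch assumes that recallTagged.sh exits with a non-zero status when the recall fails.

```shell
# Hypothetical wrapper: recall tagged files, then refresh the catalog.
# RECALL_CMD and SCAN_CMD default to the repository scripts and can be
# overridden, for example for a dry run.
RECALL_CMD="${RECALL_CMD:-./recallTagged.sh}"
SCAN_CMD="${SCAN_CMD:-./scancol.sh}"

recall_and_rescan() {
  local datasource="$1" collection="$2"
  # Only update the catalog when the recall reported success (assumption:
  # recallTagged.sh signals failure through its exit status).
  if "$RECALL_CMD" "$collection"; then
    "$SCAN_CMD" "$datasource" "$collection"
  else
    echo "Error: recall failed, skipping catalog update." >&2
    return 1
  fi
}
```

Called as `recall_and_rescan archive archivecollection` from a scheduler such as cron on the IBM Spectrum Archive Enterprise Edition server, this would run the catalog update right after the recall.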
To accommodate this workflow, the solution must be deployed and prepared.
The following preparation steps must be performed in the IBM Spectrum Discover server to accommodate the workflow. In each step some configuration variables must be recorded:
- Add the IBM Spectrum Scale data source and record the data source name and file system name.
- If required, create a collection that includes the partition of the file system that is space managed. The partition can be the entire file system with some exclusions or it can be a fileset that is space managed. The collection can be used to narrow the scope of metadata records. Record the collection name.
- Optionally, create a custom tag that is used to tag files for recall. The tag type can be Open, Restricted or Characteristics. Record the tag name and tag value. If the tag is not created, then the program ftag.sh will automatically create it.
- Optionally, create an IBM Spectrum Discover user with the role Data Admin. Record the data admin username and the data admin user password.
Once these steps are completed, the program examples can be installed and configured.
Clone this repository to a server that can access the files in the data source:
# git clone git@github.com:IBM/discover-tape-recall-integration.git
Enter the configuration variables that were recorded in the preparation steps above in the file configParms.rc. The configuration parameters defined in this file are sourced by all other program examples:
| Parameter | Required | Description |
|---|---|---|
| sdServer | Yes | IP alias or address of the IBM Spectrum Discover server |
| sdUser | Yes | data admin username of the Data Admin user that was created in the IBM Spectrum Discover server |
| sdPasswd | Yes | data admin user password of the Data Admin user that was created in the IBM Spectrum Discover server |
| sdDb | Yes | Name of the IBM Spectrum Discover database. The default database name is metaocean. |
| collName | No | collection name is the name of the collection(s) that represent the metadata records for the space managed file system partition(s). If no collection is used, then set this parameter to collName="". If one or more collections are used, then set the name to collName="'collection-name1', 'collection-name2', ..." |
| tagName | Yes | tag name is the name of the custom tag that is used to tag files for recall. If the tag was not created upfront, then the program creates the tag automatically. The default tag name is recallMe. |
| tagValue | Yes | tag value is the value of the custom tag tag name that is set for the files. The default tag value is true, indicating that the file must be recalled |
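As an illustration, a filled-in configParms.rc might look like the sketch below. Because the file is sourced by the other programs, it is assumed here to consist of plain shell variable assignments. The server name, user name, password and collection name are hypothetical placeholders; the values for sdDb, tagName and tagValue are the defaults named in the table.

```shell
# configParms.rc - illustrative values only, replace with your own
sdServer="discover.example.com"   # hypothetical host name
sdUser="recalladmin"              # hypothetical Data Admin user
sdPasswd="change-me"              # hypothetical password
sdDb="metaocean"                  # default database name
collName="'archivecollection'"    # use collName="" if no collection is used
tagName="recallMe"                # default tag name
tagValue="true"                   # default tag value
```

Because the file contains a password, restricting its permissions (for example with `chmod 600 configParms.rc`) is advisable.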
Get familiar with the program examples below.
lstag.sh - display migration status and tags
This program allows the user to display the migration state and the value of the tag recallMe for a given path and file name specification. It queries the IBM Spectrum Discover metadata catalog with the filter provided by the user. The filter is a path and file name specification and can either be a fully qualified path name or a fully qualified file name. Wildcards are not currently supported.
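The column-oriented output of lstag.sh can be post-processed with standard tools. The following sketch counts files per migration state. The function name summarize_states is hypothetical, and the sketch assumes the output layout of the examples in this document: two header lines followed by one line per file with the state in the first column.

```shell
# Count files per migration state from lstag.sh-style output on stdin.
# Assumption: two header lines, then rows with at least five columns
# (State, Size, recallMe, Collection, Path-and-Filename).
summarize_states() {
  awk 'NR > 2 && NF >= 5 { count[$1]++ }
       END { for (s in count) printf "%s %d\n", s, count[s] }' | sort
}
```

A possible invocation is `lstag.sh /ibm/fs1/discover1/test1 | summarize_states`, which prints one line per state together with the number of files in that state.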
The example below shows the selected metadata fields for the files in path /ibm/fs1/discover1/test1:
# lstag.sh /ibm/fs1/discover1/test1
State Size recallMe Collection Path-and-Filename
------ ------- -------- ---------- -------------------
migrtd 857088 true archivecollection /ibm/fs1/discover1/test1/file_8.pdf
migrtd 788480 true archivecollection /ibm/fs1/discover1/test1/file_9.pdf
migrtd 236550 true archivecollection /ibm/fs1/discover1/test1/file_0.pdf
premig 848896 false archivecollection /ibm/fs1/discover1/test1/file_1.pdf
migrtd 290816 true archivecollection /ibm/fs1/discover1/test1/file_2.pdf
premig 599040 false archivecollection /ibm/fs1/discover1/test1/file_3.pdf
migrtd 386048 true archivecollection /ibm/fs1/discover1/test1/file_4.pdf
migrtd 795648 true archivecollection /ibm/fs1/discover1/test1/file_5.pdf
migrtd 644096 true archivecollection /ibm/fs1/discover1/test1/file_6.pdf
migrtd 117760 true archivecollection /ibm/fs1/discover1/test1/file_7.pdf

ftag.sh - set the recallMe tag to true
This program allows the user to tag metadata records for a given path and file name specification with the tag recallMe=true. It updates and executes an auto-tagging policy in the IBM Spectrum Discover server that adds the tag recallMe=true to metadata records that match the path and file name specification and whose state is migrated. The path and file name specification provided by the user can either be a fully qualified path name or a fully qualified file name. Wildcards are not currently supported.
In the example below, the file /ibm/fs1/discover1/test1/file_1.pdf is tagged with recallMe=true. Before the tag is added, IBM Spectrum Discover reports the following state for the file:
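Because wildcards are not supported, tagging several individual files requires one ftag.sh call per file. The following sketch wraps that loop. The function name tag_each and the variable FTAG_CMD are hypothetical, and the sketch assumes that ftag.sh accepts exactly one path or file name per call, as in the example in this section.

```shell
# Hypothetical helper: call ftag.sh once per argument.
# FTAG_CMD defaults to the repository script and can be overridden.
FTAG_CMD="${FTAG_CMD:-./ftag.sh}"

tag_each() {
  local rc=0 f
  for f in "$@"; do
    # Remember a failure but continue with the remaining files.
    "$FTAG_CMD" "$f" || rc=1
  done
  return "$rc"
}
```

With this helper, the calling shell can expand a pattern before ftag.sh is invoked, for example `tag_each /ibm/fs1/discover1/test1/*.pdf`. Each call runs a tagging policy, so for many files in one directory it is likely cheaper to pass the directory path to ftag.sh once.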
# lstag.sh /ibm/fs1/discover1/test1/file_1.pdf
State Size recallMe Collection Path-and-Filename
----- ------- -------- ---------- -----------------
migrtd 848896 false archivecollection /ibm/fs1/discover1/test1/file_1.pdf

Adding the tag:
# ftag.sh /ibm/fs1/discover1/test1/file_1.pdf
Info: checking if tag recallMe exists.
Info: creating and executing policy to tag the files

Finally, check the state again. The tag was successfully added:
# lstag.sh /ibm/fs1/discover1/test1/file_1.pdf
State Size recallMe Collection Path-and-Filename
----- ------- -------- ---------- -----------------
migrtd 848896 true archivecollection /ibm/fs1/discover1/test1/file_1.pdf

recallTagged.sh - recall tagged files
This program queries the metadata catalog for files in a specified collection that have the tag recallMe set to true and recalls these files. This program must run on an IBM Spectrum Archive server because it uses the eeadm recall command. The user provides the collection as an input parameter.
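The recall step can be pictured as follows. This is a minimal sketch under stated assumptions, not the code of recallTagged.sh: the function name recall_from_list and the variable EEADM_CMD are hypothetical, the file names are expected on standard input with one name per line, and the list is passed to eeadm recall as a file.

```shell
# Hypothetical sketch: recall all files whose names arrive on stdin.
# EEADM_CMD can be overridden, for example for a dry run.
EEADM_CMD="${EEADM_CMD:-eeadm}"

recall_from_list() {
  local listfile rc=0
  listfile="$(mktemp)" || return 1
  # Keep non-empty lines only; grep exits non-zero when nothing is left.
  grep -v '^[[:space:]]*$' > "$listfile" || true
  if [ ! -s "$listfile" ]; then
    echo "Info: no files to recall."
    rm -f "$listfile"
    return 0
  fi
  echo "Info: recalling $(wc -l < "$listfile" | tr -d ' ') files."
  # One recall request for the whole list lets the recall be tape optimized.
  "$EEADM_CMD" recall "$listfile" || rc=$?
  rm -f "$listfile"
  return "$rc"
}
```

Submitting all file names in one request, rather than recalling files one by one, is what allows IBM Spectrum Archive Enterprise Edition to optimize the order of tape access.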
The example below recalls all tagged files in the archivecollection:
# recallTagged.sh archivecollection
Info: Checking configuration parameters.
Info: obtaining file list from Spectrum Discover.
Info: recalling 10 files.
2021-12-31 10:33:51 GLESL268I: 10 file name(s) have been provided to recall.
2021-12-31 10:33:54 GLESL839I: All 10 file(s) has been successfully processed.

This program is not intended for use by the user of the file system. It is an administrative program that the administrator of the IBM Spectrum Archive EE system should use. This program can be scheduled to run at regular intervals.
Note: After files are recalled with IBM Spectrum Archive EE, the metadata records in the IBM Spectrum Discover catalog are not automatically updated. An additional program is used to update the catalog.
scancol.sh - Update metadata catalog
This program updates the IBM Spectrum Discover catalog for a specified data source and collection. It first scans the data source that the user provides as an input parameter. Then it runs the collection policy for the collection that the user provides as an input parameter. Finally, it runs an auto-tagging policy that sets the recallMe tag to false for all files in the collection that are not migrated.
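One way to check the result of the last step is to look for files that are no longer migrated but still carry recallMe=true. The sketch below filters lstag.sh-style output for such records. The function name stale_tags is hypothetical, and the sketch assumes the output layout of the examples in this document (two header lines, state in column 1, tag value in column 3, path in column 5) and path names without blanks.

```shell
# Print paths of files that are not migrated but still tagged recallMe=true.
# Reads lstag.sh-style output on stdin; no output means the tags are consistent.
stale_tags() {
  awk 'NR > 2 && NF >= 5 && $1 != "migrtd" && $3 == "true" { print $5 }'
}
```

After a successful run of scancol.sh, a command such as `lstag.sh /ibm/fs1/discover1/test1 | stale_tags` would be expected to print nothing.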
The example below shows how to run this program for the data source name archive and the collection archivecollection (both created during preparation):
# scancol.sh archive archivecollection
Info: Checking configuration parameters.
--------------------------------------------------------------------
Info: checking and scanning data source connection archive
Info: Data source connection archive exists, scanning it.
Info: status: Complete
--------------------------------------------------------------------
Info: checking if collection policy exists.
Info: Collection policy archivecollection_tagpolicy exists, starting it.
Info: status: complete
---------------------------------------------------------------------
Info: checking if policy to remove tag recallMe exists.
Info: Starting policy recallMeNot-policy to remove the tag recallMe.
Info: status: complete

After running the program scancol.sh, the state and the recallMe tag were adjusted as shown in the lstag.sh output below:
# lstag.sh /ibm/fs1/discover1/test1
State Size recallMe Collection Path-and-Filename
------ ------- -------- ---------- -------------------
premig 857088 false archivecollection /ibm/fs1/discover1/test1/file_8.pdf
premig 788480 false archivecollection /ibm/fs1/discover1/test1/file_9.pdf
premig 236550 false archivecollection /ibm/fs1/discover1/test1/file_0.pdf
premig 848896 false archivecollection /ibm/fs1/discover1/test1/file_1.pdf
premig 290816 false archivecollection /ibm/fs1/discover1/test1/file_2.pdf
premig 599040 false archivecollection /ibm/fs1/discover1/test1/file_3.pdf
premig 386048 false archivecollection /ibm/fs1/discover1/test1/file_4.pdf
premig 795648 false archivecollection /ibm/fs1/discover1/test1/file_5.pdf
premig 644096 false archivecollection /ibm/fs1/discover1/test1/file_6.pdf
premig 117760 false archivecollection /ibm/fs1/discover1/test1/file_7.pdf

This program is not intended for use by the user of the file system. It is an administrative program that the administrator of the IBM Spectrum Archive EE system should use. This program can be scheduled to run at regular intervals, or it can be executed right after the tape-optimized recall.
