## Use case for NFDInspector
In this use case, we will explore the capabilities of NFDInspector a little further. We will go through a typical application. In our scenario, a collection of stereo glass plates was digitized as part of a digitization project. The associated metadata was partly generated from existing legacy data and partly created by new indexing.
NFDInspector for LIDO-XML lets you adapt the parameters of the analysis to the respective use case. For this purpose, we will take a closer look at the configuration file and the setting options within it.<br>
The LIDO-XML files were created and collected in the /LIDO_xml folder. There are 2,063 LIDO records in total, each in a separate XML file. The existing data was generated from the portal __[westfalen.museum-digital.de](https://westfalen.museum-digital.de/objects?s=collection:670)__. For more information on the specifications of the LIDO format, please refer to the __[documentation](https://cidoc.mini.icom.museum/working-groups/lido/lido-overview/about-lido/what-is-lido/)__.

### Initialization
To begin the inspection, the ``LIDOInspector`` class is imported from the NFDInspector package and a ``lido_inspector`` instance is created. English is selected as the output language.

In [1]:
from nfdinspector.lido_inspector import LIDOInspector

lido_inspector = LIDOInspector(error_lang='en')

### Configuration
Next, we take a closer look at the configuration and the configuration file. As there is no ready-made file available, the standard configuration can be retrieved from __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__ via the __[``.configuration``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.configuration)__ property. The resulting dictionary is saved as a JSON file and also displayed in the notebook.

In [2]:
import json

lido_config = lido_inspector.configuration
with open('lido_config.json', "w") as outfile:
    json.dump(lido_config, outfile, indent=4)

lido_config

{'work_id': {'pattern': ''},
 'title': {'inspect': True,
  'unique': True,
  'distinct_from_type': True,
  'min_word_num': 2,
  'max_word_num': 20},
 'category': {'inspect': True,
  'ref': True,
  'patterns': {'label': '', 'ref': ''}},
 'object_work_type': {'inspect': True,
  'ref': True,
  'patterns': {'label': '', 'ref': ''}},
 'classification': {'inspect': True,
  'ref': True,
  'patterns': {'label': '', 'ref': ''}},
 'object_description': {'inspect': True,
  'unique': True,
  'min_word_num': 20,
  'max_word_num': 500},
 'materials_tech': {'inspect': True, 'ref': True, 'differentiated': False},
 'object_measurements': {'inspect': True},
 'event': {'inspect': True, 'ref': True},
 'subject_concept': {'inspect': True, 'ref': True, 'min_num': 3},
 'resource': {'inspect': True},
 'record_type': {'inspect': True,
  'ref': True,
  'patterns': {'label': '', 'ref': ''}},
 'repository_name': {'inspect': True, 'ref': True},
 'record_source': {'inspect': True, 'ref': True},
 'record_rights': {'

The configuration shows the various setting options of NFDInspector. In addition to checking whether certain data fields have entries, it can check properties such as uniqueness within a dataset, the minimum and maximum length of entries, and the presence of a reference or ID. Entries can also be checked for formal correctness using regular expressions.<br>
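
As a rough illustration of what such checks amount to, the following sketch applies a regular expression and a word-count range to single values in plain Python. It does not use NFDInspector itself: the helper functions, the sample pattern and the sample values are hypothetical, and how the library evaluates ``pattern``, ``min_word_num`` and ``max_word_num`` internally is an assumption here.

```python
import re

# Hypothetical pattern for a twelve-digit inventory number;
# it is not taken from the dataset and only illustrates the idea.
WORK_ID_PATTERN = r"\d{12}"


def matches_pattern(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the regular expression."""
    # Assumption: an empty pattern (the default in the configuration) means "no check".
    if not pattern:
        return True
    return re.fullmatch(pattern, value) is not None


def word_count_in_range(text: str, min_words: int, max_words: int) -> bool:
    """Return True if the number of whitespace-separated words lies within the range."""
    return min_words <= len(text.split()) <= max_words


print(matches_pattern("030006001001", WORK_ID_PATTERN))
print(matches_pattern("unknown", WORK_ID_PATTERN))
print(word_count_in_range("Stereo glass plate of a colliery", 2, 20))
```

Whether a pattern should match the whole value (as ``re.fullmatch`` does here) or only a part of it is a design choice; the stricter variant is used in this sketch because identifiers are usually expected to follow one fixed scheme.
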
In this case, the metadata should be checked for compatibility with the requirements of the German Digital Library (DDB). These requirements are listed on the __[DDB website](https://wiki.deutsche-digitale-bibliothek.de/display/DFD/Anforderungen+an+die+Lieferdaten)__. The minimum requirements stipulate that eight metadata elements must be present, among them:

- Data partner identifier
    - unique and persistent identifier for the institution supplying the dataset to the DDB
    - The identifier should preferably be an International Standard Identifier for Libraries and Related Organizations (ISIL), ISO 15511, assigned by the German ISIL Agency and Sigelstelle at the Staatsbibliothek zu Berlin. For museums, the ISIL identifiers are assigned by the Institute for Museum Research Berlin.
- Link to the digital object
- Legal status of the digital object
- Object title
    - The object title should not be longer than 200 characters
    - It is not sufficient to enter the object type or the object name as the object title for the DDB
- Object type
    - The object type must reference a controlled vocabulary
- Media type

<br>
A configuration file can map these requirements. The exported file can either be edited and read in again, or the configuration within the ``LIDOInspector`` instance can be changed directly. In this case, we edit the exported standard configuration file and disable all checks except the ones specified above.
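
One way to produce such a reduced configuration is sketched below. It works on a plain dictionary, so it can be tried without NFDInspector. The excerpt of the standard configuration is copied from the output above; the helper ``keep_only`` and the selection of sections in ``ddb_sections`` are assumptions for illustration and not the contents of the ``DDB_lido_config.json`` file used in this notebook.

```python
import copy
import json

# Excerpt of the standard configuration shown above (not the complete file).
standard_config = {
    "work_id": {"pattern": ""},
    "title": {"inspect": True, "unique": True, "distinct_from_type": True,
              "min_word_num": 2, "max_word_num": 20},
    "object_work_type": {"inspect": True, "ref": True,
                         "patterns": {"label": "", "ref": ""}},
    "object_measurements": {"inspect": True},
    "event": {"inspect": True, "ref": True},
    "resource": {"inspect": True},
}


def keep_only(config: dict, sections: set) -> dict:
    """Return a copy of config in which only the given sections stay enabled."""
    reduced = copy.deepcopy(config)
    for name, settings in reduced.items():
        # Sections without an 'inspect' switch (e.g. 'work_id') are left untouched.
        if "inspect" in settings and name not in sections:
            settings["inspect"] = False
    return reduced


# Assumed mapping of DDB requirements to configuration sections.
ddb_sections = {"title", "object_work_type", "resource"}
ddb_config = keep_only(standard_config, ddb_sections)

print(json.dumps(ddb_config, indent=4))
```

The resulting dictionary could then be written to a file with ``json.dump`` and loaded with ``.config_file()``. Working on a deep copy keeps the standard configuration unchanged, so it can be reused for other profiles.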

In [22]:
# read the modified configuration file
lido_inspector.config_file('../nfdinspector_tutorials/DDB_lido_config.json')

Now we can ingest the LIDO-XML files we want to inspect by passing the containing directory to ``lido_inspector`` with the __[``.read_lido_files()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido_files)__ method, and then start the inspection according to the specifications in the configuration file.

In [23]:
lido_inspector.read_lido_files(files_path='../nfdinspector_tutorials/LIDO_xml/stereo_montandok')
lido_inspector.inspect()

The results of the inspections are stored as a list of dictionaries under __[``LIDOInspector.inspections``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.inspections)__ and can be accessed from there. Using __[``.to_json()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_json)__ we can write the results of the inspection to a JSON file.

In [None]:
lido_inspector.to_json(file_path='../nfdinspector_tutorials/results/results_DDB.json', indent=4)

To clean up the data, we remove all keys whose value is ``None`` and save the new list of results as ``cleaned_results_DDB``.

In [None]:
cleaned_results_DDB = [
    {key: value for key, value in inspection.items() if value is not None}
    for inspection in lido_inspector.inspections
]
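
To get a first overview of such cleaned results, the remaining keys can be counted per field. The sketch below uses three made-up inspection dictionaries; the identifier keys follow the column names used later in this notebook, while the messages and the exact structure of a real inspection are assumptions.

```python
from collections import Counter

# Hypothetical inspection results; real entries come from lido_inspector.inspections.
sample_inspections = [
    {"lidoRecID": "rec-1", "workID": "001", "title": ["title too short"], "event": None},
    {"lidoRecID": "rec-2", "workID": "002", "title": None, "event": ["no reference"]},
    {"lidoRecID": "rec-3", "workID": "003", "title": ["title too short"], "event": ["no reference"]},
]

# Same cleaning step as above: drop every key whose value is None.
cleaned = [
    {key: value for key, value in inspection.items() if value is not None}
    for inspection in sample_inspections
]

# Count in how many records each field was flagged, ignoring identifier keys.
id_keys = {"lidoRecID", "workID"}
flag_counts = Counter(
    key for inspection in cleaned for key in inspection if key not in id_keys
)

for field, count in flag_counts.most_common():
    print(f"{field}: {count} of {len(cleaned)} records")
```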

### Data quality

What we have measured here has little significance for the formal quality of the metadata. Rather, we checked the compatibility with the DDB's requirements for data delivery. As was to be expected, the dataset fulfills these requirements. <br>
In order to measure the quality with the tool, we need better parameters against which the data records can be checked. The __[Minimum Record Working Group](https://wiki.deutsche-digitale-bibliothek.de/pages/viewpage.action?pageId=218628097)__ (__[AG Minimaldatensatz](https://wiki.deutsche-digitale-bibliothek.de/pages/viewpage.action?pageId=120422678)__) offers a possible approach to this. The Minimum Record Recommendation helps museums and collections publish their object data online while adhering to the FAIR and CARE principles. It defines a basic dataset structure that ensures a minimum standard of data quality while remaining accessible, even for institutions with limited resources. The goal is to provide consistent, high-quality, and reusable data for identifying and describing collection objects. The recommendation also supports museums in preparing their data for modern use cases like Linked Open Data.
We can incorporate these parameters into a configuration file and test the quality of our data based on the recommendation.

As discussed in the introduction, our aim is to test the metadata quality of newly created catalogue items. For this purpose, the test follows the __[specifications](https://wiki.deutsche-digitale-bibliothek.de/pages/viewpage.action?pageId=218628099)__ of the Minimum Record Working Group for those data elements that are populated during cataloguing. The following table shows how these specifications relate to the settings in the configuration file.

| MDR                              | config element          |
|----------------------------------|-------------------------|
| Object title or name             | ``title``               |
| Object type or designation       | ``object_work_type``    |
| Classification (recommended)     | ``classification``      |
| Inventory number                 | ``work_id``             |
| Object description (recommended) | ``object_description``  |
| Materials (recommended)          | ``materials_tech``      |
| Techniques (recommended)         | ``materials_tech``      |
| Measurements (recommended)       | ``object_measurements`` |
| Event in object history          | ``event``               |
| Subject keyword (recommended)    | ``subject_concept``     |
| Media file                       | ``resource``            |

The other options are disabled to make the result easier to use and read.

In [None]:
# configuration sections that correspond to the MDR elements in the table above
mdr_list = ['title', 'object_work_type', 'classification', 'work_id', 'object_description', 'materials_tech', 'object_measurements', 'event', 'subject_concept', 'resource']
# disable every check that is not part of the recommendation
for key in lido_inspector.configuration.keys():
    if key not in mdr_list:
        lido_inspector.configuration[key]['inspect'] = False

After we have modified the configuration, we can now perform the inspection by first reading the LIDO-XML data from a folder with the __[``.read_lido_files()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido_files)__ method and calling the __[``.inspect()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.inspect)__ method afterwards.

In [None]:
# reading LIDO-XML data
lido_inspector.read_lido_files('../nfdinspector_tutorials/LIDO_xml/stereo_montandok')
# executing the inspection
lido_inspector.inspect()

### Output
In addition to the methods described in the tutorial for outputting inspection results, __[``.to_json()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_json)__ and __[``.to_csv()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_csv)__, it is also possible to customise the output to suit our requirements. For further processing of the results, it is useful to summarise the data in an Excel spreadsheet. The formatting, filtering and evaluation options available there give subsequent contributors an easy way to identify and, if necessary, correct problems with the indexing. We use the __[pandas](https://pandas.pydata.org/)__ library to clean up the table and write it to an Excel file. This library needs to be installed on the machine; writing ``.xlsx`` files with pandas additionally requires an Excel engine such as openpyxl.

In [None]:
import pandas as pd

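# flatten the inspection dictionaries into a table and drop columns without any entries
df = pd.json_normalize(lido_inspector.inspections).dropna(axis=1, how="all")
for col in df:
    # identifier columns are left unchanged
    if col in ["lidoRecID", "workID", "id", "unitid"]:
        continue
    # the remaining columns are expected to hold lists, which are joined into one string per cell
    df[col] = df[col].str.join("; ")
# write the table to an Excel file, marking empty cells with "-"
df.to_excel('../nfdinspector_tutorials/results/LIDO_results.xlsx', index=False, sheet_name="inspections", na_rep="-")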
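
One caveat of ``.str.join()`` is that plain strings are iterable as well, so a cell that contains a single string instead of a list would be joined character by character. If the result columns are not known to contain only lists, a small helper like the following can be applied per cell instead. The function name ``join_messages`` is hypothetical and the sample values are made up.

```python
def join_messages(value, sep="; "):
    """Join a list of messages into one string; return any other value unchanged."""
    if isinstance(value, (list, tuple)):
        return sep.join(str(item) for item in value)
    return value


print(join_messages(["title too short", "title not unique"]))
print(join_messages("title too short"))
print(join_messages(None))
```

With pandas, the helper could replace the ``.str.join()`` call via ``df[col] = df[col].map(join_messages)``.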