## LIDOInspector Tutorial

### Authors
Rodion Lischnewski

### Introduction
NFDInspector is designed to facilitate the inspection of formal quality issues pertaining to research data. It is currently compatible with the LIDO and EAD metadata standards. The project has been funded by the “4Memory Incubator Funds” of the NFDI4Memory consortium and is being developed and maintained by the Montanhistorisches Dokumentationszentrum (montan.dok) of the Deutsches Bergbau-Museum Bochum.

### Target Group
This tutorial is aimed at new users of the NFDInspector. It walks through a single use case to make getting started with the tool easier. A basic understanding of Python programming is required to use the tool independently. Because the tool is open source, it can be integrated into other programs, tools and applications.

### Requirements
This tutorial assumes that Python version 3.10 is installed on the machine. The NFDInspector package itself requires the lxml library. The following packages are also recommended for further processing of the results (json is part of the Python standard library and needs no separate installation):
+ __[lxml](https://lxml.de/)__
+ __[pandas](https://pandas.pydata.org/)__
+ __[json](https://docs.python.org/3/library/json.html)__


### Learning goals
* [Installation](#installation)
* [Read metadata records from different sources](#read-metadata-records-from-different-sources)
* [Customize inspection configuration](#configuration)
* [Carry out inspection](#inspection)
* [Process or output the results](#file-output)

### Installation
The __[NFDInspector](https://github.com/montan-code/nfdinspector)__ package includes modules for the inspection of LIDO-xml and EAD-xml formats as standard. To install NFDInspector using pip on macOS or Linux, run:

In [None]:
python3 -m pip install nfdinspector

To install with pip under Windows, run:

In [None]:
py -m pip install nfdinspector

### Import and initialize
You can import the __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__, a class from the __[``nfdinspector.lido_inspector``](https://montan-code.github.io/nfdinspector/nfdinspector.html#module-nfdinspector.lido_inspector)__ module.
When initializing the LIDOInspector, you can specify the language for the error messages. Currently only ``'en'`` and ``'de'`` are available. In this tutorial we will stick to English output.

In [2]:
from nfdinspector.lido_inspector import LIDOInspector

lido_inspector = LIDOInspector(error_lang='en')

### Read metadata records from different sources

The easiest way to ingest metadata is from a standalone ``.xml`` file. An XML file can contain more than one LIDO object. The NFDInspector can distinguish between the LIDO objects in one file and stores them separately in a list, which is available as the __[``lido_objects``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.lido_objects)__ property of the __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__ class.

In [3]:
file_path = '../nfdinspector_tutorials/LIDO_xml/23310.xml'
lido_inspector.read_lido_file(file_path)

### Read metadata from multiple sources

You can read several files by specifying the folder path containing the ``.xml`` files.

In [3]:
lido_inspector.read_lido_files('../nfdinspector_tutorials/LIDO_xml')
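Before reading a whole folder, it can help to check which ``.xml`` files it actually contains. The following sketch uses only the standard library; `list_xml_files` is a hypothetical helper, and it only looks at the top level of the folder, since this tutorial does not state whether the reader also descends into subfolders.

```python
from pathlib import Path


def list_xml_files(folder: str) -> list[Path]:
    """Return the .xml files directly inside `folder`, sorted by path."""
    directory = Path(folder)
    if not directory.is_dir():
        # A missing folder yields an empty list instead of an exception.
        return []
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() == ".xml"
    )


# Same folder as in the cell above.
xml_files = list_xml_files("../nfdinspector_tutorials/LIDO_xml")
print(f"Found {len(xml_files)} XML file(s)")
for xml_file in xml_files:
    print(" ", xml_file.name)
```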

Alternatively, you can parse LIDO-XML directly from a string and forward it to the Inspector by using the function __[``.read_lido()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido)__.
This is useful if you are integrating the NFDInspector functionality into a larger workflow.

In [None]:
lido_inspector.read_lido(lido_xml_string)
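The variable ``lido_xml_string`` is not defined in the cell above. As a rough illustration, it could hold a record such as the one below. The record is hypothetical and deliberately incomplete (it is not a schema-valid LIDO document), and the well-formedness check uses the standard library rather than NFDInspector. Whether ``.read_lido()`` expects a plain string or an already parsed tree should be checked in the package documentation.

```python
import xml.etree.ElementTree as ET

# Namespace of the LIDO schema.
LIDO_NS = "http://www.lido-schema.org"

# Hypothetical, deliberately minimal record used only for illustration.
lido_xml_string = f"""<lido:lido xmlns:lido="{LIDO_NS}">
  <lido:lidoRecID lido:type="local">example-0001</lido:lidoRecID>
</lido:lido>
"""

# Parsing raises xml.etree.ElementTree.ParseError if the string is not well-formed.
root = ET.fromstring(lido_xml_string)
record_id = root.find("lido:lidoRecID", {"lido": LIDO_NS})

print(root.tag)
print(record_id.text)
```

Checking well-formedness first keeps XML syntax errors separate from the quality issues that the inspection itself reports.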

### Configuration
NFDInspector offers the ability to customise the inspection to your specific needs. If no special configuration is specified, the built-in configuration will be used. 
Customisation is usually done via a configuration file. Configurations can be exported and imported in JSON format.
The default configuration is not fixed and may change with package updates. To view the current default configuration, retrieve it from __[``LIDOInspector()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector)__ via the __[``.configuration``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.configuration)__ property, which returns the configuration as a dictionary.

In [18]:
lido_config = lido_inspector.configuration

The following script stores the configuration in JSON format at the location given by ``config_path`` (here ``lido_config.json``).

In [22]:
import json

config_path = 'lido_config.json'
with open(config_path, 'w') as outfile:
    json.dump(lido_config, outfile, indent=4)

The __[``.config_file()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.config_file)__ function can be used to import configuration files from JSON files with compatible syntax.

In [20]:
lido_inspector.config_file(config_path)
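Between exporting and re-importing, the configuration can be adjusted either by editing the JSON file by hand or programmatically. The sketch below shows one possible approach: a recursive merge that changes individual settings while leaving all other values untouched. The helper `deep_merge` and the keys `title`, `inspect` and `min_word_num` are hypothetical placeholders, not actual NFDInspector settings; the real key names should be taken from the exported file.

```python
import json


def deep_merge(base: dict, changes: dict) -> dict:
    """Return a copy of `base` with `changes` applied recursively."""
    merged = dict(base)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested dictionaries instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-in for an exported configuration.
example_config = {"title": {"inspect": True, "min_word_num": 2}}

# Change a single nested setting; the sibling key "inspect" is preserved.
custom_config = deep_merge(example_config, {"title": {"min_word_num": 3}})
print(json.dumps(custom_config, indent=4))
```

The merged dictionary can then be written back with ``json.dump`` as shown earlier and imported again with ``.config_file()``.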

We will take a closer look at the configuration options later on. For now, let's take a look at the basic functions.

### Inspection
First, read the LIDO XML files by passing the folder path to the __[``.read_lido_files()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.read_lido_files)__ function. The data is then inspected based on the settings in the [configuration](#configuration) by calling the __[``.inspect()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.lido_inspector.LIDOInspector.inspect)__ function.


In [4]:
lido_inspector.inspect()

The results of the inspection are stored as a list of dictionaries under __[``LIDOInspector.inspections``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.inspections)__ and can be accessed from there.

In [None]:
print(lido_inspector.inspections)
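Printing a long list of dictionaries in one go is hard to read. A small helper can show only the first few entries in indented form. This is a sketch using the standard library; `preview_records` is a made-up helper, and the sample records and their keys are hypothetical and do not reflect the actual structure of the inspection results.

```python
import json


def preview_records(records: list[dict], limit: int = 2) -> str:
    """Return the first `limit` records as indented JSON text."""
    # default=str keeps the preview from failing on non-JSON values.
    return json.dumps(records[:limit], indent=2, ensure_ascii=False, default=str)


# Hypothetical sample data standing in for lido_inspector.inspections.
sample_inspections = [
    {"record": "example-0001", "messages": ["hypothetical message"]},
    {"record": "example-0002", "messages": []},
    {"record": "example-0003", "messages": []},
]

print(preview_records(sample_inspections, limit=2))
```

With real data, the call would be ``preview_records(lido_inspector.inspections)``.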

### File output

Using __[``.to_json()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_json)__ and __[``.to_csv()``](https://montan-code.github.io/nfdinspector/nfdinspector.html#nfdinspector.metadata_inspector.MetadataInspector.to_csv)__, the inspection results can be written to JSON and CSV files. Both functions take the path of the output file as their first argument. For JSON, the indentation level is set with the ``indent`` parameter, as in ``.to_json("filepath", indent=4)``. For CSV, the ``delimiter`` is passed as a string, as in ``.to_csv("filepath", delimiter=';')``.

In [15]:
lido_inspector.to_json('../nfdinspector_tutorials/results/results.json', indent=4)
lido_inspector.to_csv('../nfdinspector_tutorials/results/results.csv', delimiter=';')
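Two small helpers can round off the output step: one makes sure the target folder exists before writing, the other reads a semicolon-delimited CSV file back for further processing. Both are sketches built on the standard library. The names `ensure_parent_dir` and `read_results_csv` are made up, and the assumptions that the results file is UTF-8 encoded and starts with a header row are not confirmed by this tutorial. The demo data below is likewise hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path: str) -> Path:
    """Create the folder of `file_path` if needed and return the path."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def read_results_csv(file_path, delimiter: str = ";") -> list[dict]:
    """Read a delimited file with a header row into a list of dictionaries."""
    with open(file_path, newline="", encoding="utf-8") as infile:
        return list(csv.DictReader(infile, delimiter=delimiter))


# Demo in a temporary folder with hypothetical column names and values.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = ensure_parent_dir(f"{tmp_dir}/results/results.csv")
    demo_path.write_text("id;message\n23310;example\n", encoding="utf-8")
    for row in read_results_csv(demo_path):
        print(row)
```

In the tutorial's setting, ``ensure_parent_dir('../nfdinspector_tutorials/results/results.csv')`` would be called before ``.to_csv()``, and ``read_results_csv`` afterwards with the same path and delimiter.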

For further information, please refer to the __[NFDInspector Documentation](https://montan-code.github.io/nfdinspector/)__.