# RUDI Node tools: *rudinode-read* library
This library offers tools to take advantage of the [external API](https://app.swaggerhub.com/apis/OlivierMartineau/RUDI-PRODUCER) of a RUDI Producer node (also referred to as a RUDI node).

## Initialization (optional)
You may need to install the dev requirements to be able to run this Python notebook from the source.
If not, skip the box below.

In [None]:
%%sh
VENV_DIR=.venv
# Create the virtual environment if it does not exist yet
[ -d "$VENV_DIR" ] || python3 -m venv "$VENV_DIR"
# Call the venv's pip directly: `source` and aliases are unreliable in a non-interactive sh
"$VENV_DIR/bin/pip3" install -r requirements-dev.txt

You may also need to add the sources directory to the Python module search path (`sys.path`) for the subsequent code to run correctly.
If not, skip the box below.

In [None]:
import os
import sys

print(os.getcwd())
sys.path.append('./src')

## Initializing the `RudiNodeReader` object
File [rudi_node_reader.py](https://github.com/OlivierMartineau/rudi-node-read/blob/release/src/rudi_node_read/rudi_node_reader.py) contains a class `RudiNodeReader` that makes it easier to access the data and metadata on a RUDI Producer node.
This class fetches all the metadata once and lets you access the result.

The only required parameter is the RUDI node URL.
You can optionally provide an identifier that will be used for every request made to the node.

The external API of a RUDI node does not require any identification, but this identifier lets you give an indication of the caller in the request headers.

In [None]:
from rudi_node_read.rudi_node_reader import RudiNodeReader

rudi_node_url = 'https://bacasable.fenix.rudi-univ-rennes1.fr'
rudi_node_info = RudiNodeReader(server_url=rudi_node_url, headers_user_agent='RudiNodeRead-Example01')
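
To make the idea of an identifier in the request header more concrete, here is a minimal sketch using only the standard library. It assumes, from the parameter name `headers_user_agent`, that the identifier ends up in the `User-Agent` HTTP header; this is not taken from the library's code. The request is built but never sent.

```python
from urllib.request import Request

node_url = 'https://bacasable.fenix.rudi-univ-rennes1.fr'

# Build (but do not send) a request carrying an identifying User-Agent header.
# Assumption: this mirrors what `headers_user_agent` is used for.
request = Request(node_url, headers={'User-Agent': 'RudiNodeRead-Example01'})

# urllib normalizes header names with str.capitalize(), hence 'User-agent'
print(request.get_header('User-agent'))
```

Such an identifier is purely informative: it lets the node administrators recognize where requests come from in their logs.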

## Access to metadata information
The `RudiNodeReader` object lets you access and take advantage of the metadata stored on the node:
- access to the full list of metadata

In [None]:
info_tag = 'RudiNode info'

print(info_tag, 'metadata nb:', rudi_node_info.metadata_count)
print(info_tag, 'metadata list nb:', len(rudi_node_info.metadata_list))
print(info_tag, 'metadata 1:', rudi_node_info.metadata_list[0])

- access to the producer and contact information

In [None]:
print(info_tag, 'list of producers:', rudi_node_info.organization_list)
print(info_tag, 'producer names:', rudi_node_info.organization_names)

print(info_tag, 'list of contacts:', rudi_node_info.contact_list)
print(info_tag, 'contact names:', rudi_node_info.contact_names)

- access to the classification tags

In [None]:
print(info_tag, 'themes:', rudi_node_info.themes)
print(info_tag, 'keywords:', rudi_node_info.keywords)

## Filtering the metadata
The `RudiNodeReader` object offers some tools to filter the metadata with a partial JSON.
> You will need to understand how RUDI metadata are structured to create adequate filters. See the RUDI node [external API](https://app.swaggerhub.com/apis/OlivierMartineau/RUDI-PRODUCER) documentation for this.

> A metadata is kept in the result of the filtering operation only if it matches every element given in the filter.

In [None]:
filter_tag = 'Filtering metadata'
example_filter = {'producer': {'organization_id': '52e7fa02-1b5c-42f8-9738-816ad933bb17', 'organization_name': 'Univ. Rennes / IRISA - Logistica'}}
print(filter_tag, 'with JSON:', rudi_node_info.filter_metadata(example_filter))
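
The matching rule stated above can be sketched in a few lines of plain Python. This is an illustration of partial JSON matching in general, not the library's actual implementation: the helper `matches_filter`, the sample records and the `title` field are all hypothetical.

```python
def matches_filter(metadata, partial):
    """Return True if every element of `partial` is found in `metadata`."""
    if isinstance(partial, dict):
        # Every key of the filter must exist in the metadata and match recursively
        return isinstance(metadata, dict) and all(
            key in metadata and matches_filter(metadata[key], value)
            for key, value in partial.items()
        )
    if isinstance(partial, list):
        # Every item of the filter must match at least one item of the metadata list
        return isinstance(metadata, list) and all(
            any(matches_filter(item, wanted) for item in metadata)
            for wanted in partial
        )
    # Scalar values are compared for equality
    return metadata == partial


# Hypothetical sample records, reduced to the fields used by the filter
sample_metadata = [
    {'title': 'A', 'producer': {'organization_id': '1', 'organization_name': 'Org A'}},
    {'title': 'B', 'producer': {'organization_id': '2', 'organization_name': 'Org B'}},
]
sample_filter = {'producer': {'organization_name': 'Org B'}}

kept = [m for m in sample_metadata if matches_filter(m, sample_filter)]
print([m['title'] for m in kept])  # prints ['B']
```

Note that the filter only needs to mention the fields it constrains: fields that are absent from the filter are ignored.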

Some shortcuts have been implemented to make it easier to filter the metadata:

In [None]:

meta_producer = 'Univ. Rennes / IRISA - Logistica'
print(filter_tag, f"with producer name '{meta_producer}':", rudi_node_info.get_metadata_with_producer(meta_producer))

meta_contact = 'Univ. Rennes / IRISA / Logistica'
print(filter_tag, f"with contact name '{meta_contact}':", rudi_node_info.get_metadata_with_contact(meta_contact))

meta_theme = 'education'
print(filter_tag, f"with theme '{meta_theme}':", rudi_node_info.get_metadata_with_theme(meta_theme))

meta_keywords = ['test']
print(filter_tag, f"with keywords '{meta_keywords}':", rudi_node_info.get_metadata_with_keywords(meta_keywords))

print(filter_tag, "with available media:", len(rudi_node_info.metadata_with_available_media))


Additional methods are provided to find a given metadata:

In [None]:
find_tag = 'Finding a metadata'

meta_id = '3bddc7c0-a3eb-4d51-b9c9-b5ea5199d15d'
print(find_tag, f"with metadata uuid '{meta_id}':", rudi_node_info.find_metadata_with_uuid(meta_id))

meta_title = 'parcours pédestre sur la ville de rennes'
print(find_tag, f"with metadata title '{meta_title}':", rudi_node_info.find_metadata_with_title(meta_title))

file_name = 'toucan.jpg'
print(find_tag, f"with file name '{file_name}':", rudi_node_info.find_metadata_with_media_name(file_name))

file_uuid = '782bab2d-7ee8-4633-9c0a-173649b4d879'
print(find_tag, f"with file uuid '{file_uuid}':", rudi_node_info.find_metadata_with_media_uuid(file_uuid))


## Downloading a file
The `RudiNodeReader` object also provides methods to download the data stored on the node:

In [None]:
dwnld_tag = 'Downloading'
dwnld_dir = './dwnld'

print(dwnld_tag, f"media with uuid '{file_uuid}':", rudi_node_info.download_file_with_uuid(file_uuid, dwnld_dir))
print(dwnld_tag, f"media with name '{file_name}':", rudi_node_info.download_file_with_name(file_name, dwnld_dir))
print(dwnld_tag, f"media for metadata '{meta_id}':", rudi_node_info.download_files_for_metadata(meta_id, dwnld_dir))


## Dumping the metadata into a file
The `RudiNodeReader` object also provides a method to dump the metadata into a local file:

In [None]:
rudi_node_info.save_metadata_to_file(dwnld_dir, 'rudi_node_metadata.json')
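
Once dumped, the metadata can be reloaded without querying the node again. The sketch below assumes that the dump is a plain JSON file located at `<directory>/<file name>`, which is suggested by the file name but not confirmed here. The helper `load_metadata_dump` is hypothetical, and a small stand-in file is written to a temporary directory so that the example runs offline.

```python
import json
import os
import tempfile


def load_metadata_dump(dir_path, file_name):
    """Read a JSON dump back into Python objects."""
    with open(os.path.join(dir_path, file_name), encoding='utf-8') as dump_file:
        return json.load(dump_file)


# Stand-in for a real dump: two hypothetical, heavily reduced metadata records
sample = [{'title': 'A'}, {'title': 'B'}]

with tempfile.TemporaryDirectory() as tmp_dir:
    dump_path = os.path.join(tmp_dir, 'rudi_node_metadata.json')
    with open(dump_path, 'w', encoding='utf-8') as dump_file:
        json.dump(sample, dump_file)
    reloaded = load_metadata_dump(tmp_dir, 'rudi_node_metadata.json')

print(len(reloaded), 'metadata reloaded')
```

With a real dump, the call would presumably be `load_metadata_dump(dwnld_dir, 'rudi_node_metadata.json')`.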


## Analysing the (meta)data on the RUDI node
Example: extracting the proportions of file types on a RUDI node

In [None]:
from collections import Counter

# Count the files of each MIME type declared in the metadata
format_counts = Counter(
    media_file["file_type"]
    for metadata in rudi_node_info.metadata_list
    for media_file in metadata["available_formats"]
    if media_file["media_type"] == "FILE"
)
total_nb = sum(format_counts.values())
print(f"{total_nb} files found on {rudi_node_url}")
# Share of each MIME type, as a percentage of the total number of files
for mime_type, nb in format_counts.items():
    print(f"- {mime_type}: {round(nb / total_nb * 100, 2)}%")

## Displaying a summary of the metadata on the RUDI node

Example: displaying the whole catalogue

In [None]:
print(rudi_node_info.catalogue_summary)

Example: displaying only some of the metadata

In [None]:
# Keep only the first two metadata of the catalogue
example_metadata = rudi_node_info.metadata_list[0:2]
print(rudi_node_info.create_textual_description_metadata(example_metadata))