# RUDI Node tools: *rudinode-read* library

This library offers tools to take advantage of the [external API](https://app.swaggerhub.com/apis/OlivierMartineau/RUDI-PRODUCER) of a RUDI Producer node (also referred to as a RUDI node).

The file [rudi_node_reader.py](lib_rudi_meta/rudi_node_reader.py) contains a class `RudiNodeReader` that makes it easier to access the data of a RUDI Producer node.
This class fetches all the metadata from the node once, then lets you access the resulting metadata.

## Initializing the `RudiNodeReader` object
This object only requires the RUDI node URL as a parameter.
You can optionally provide an identifier that will be sent with every request made to the node.
> The RUDI node external API does not require any identification, but you can still identify yourself in the request header.

In [None]:
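import os
import sys

# Show the working directory: the path added below is relative to it
print(os.getcwd())
# Make the 'src' folder importable, since the rudi_node_read package is expected there
sys.path.append('./src')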



In [None]:
from rudi_node_read.rudi_node_reader import RudiNodeReader
from rudi_node_read.utils.log import log_d

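# URL of the RUDI node to read (here, a sandbox node)
rudi_node_url = 'https://bacasable.fenix.rudi-univ-rennes1.fr'
# 'headers_user_agent' is the optional identifier sent with every request to the node
rudi_node_info = RudiNodeReader(server_url=rudi_node_url, headers_user_agent='RudiNodeGet-Example01')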
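
Since the external API needs no authentication, this identifier is purely informative. The sketch below shows what such a request could look like with the Python standard library, assuming `headers_user_agent` is sent as the HTTP `User-Agent` header. The helper `build_request` is hypothetical and is not part of `rudi_node_read`.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that identifies the caller through the User-Agent header.

    Hypothetical helper: it only assumes that RudiNodeReader forwards
    `headers_user_agent` as a User-Agent header.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://bacasable.fenix.rudi-univ-rennes1.fr", "RudiNodeGet-Example01")
# urllib normalizes header names with str.capitalize(), hence 'User-agent'
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with this header.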

## Access to metadata information
The `RudiNodeReader` object lets you access and make use of the metadata stored on the node:
- access to the full list of metadata

In [None]:
info_tag = 'RudiNode info'

log_d(info_tag, 'metadata nb', rudi_node_info.metadata_count)
log_d(info_tag, 'metadata list nb', len(rudi_node_info.metadata_list))
log_d(info_tag, 'metadata 1', rudi_node_info.metadata_list[0])

- access to the producer and contact information

In [None]:
log_d(info_tag, 'list of producers', rudi_node_info.organization_list)
log_d(info_tag, 'producer names', rudi_node_info.organization_names)

log_d(info_tag, 'list of contacts', rudi_node_info.contact_list)
log_d(info_tag, 'contact names', rudi_node_info.contact_names)

- access to the classification tags

In [None]:
log_d(info_tag, 'themes', rudi_node_info.themes)
log_d(info_tag, 'keywords', rudi_node_info.keywords)
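
To illustrate what these lists summarize, here is a minimal sketch that tallies themes and keywords over a list of metadata dictionaries. It assumes each metadata item has a `theme` string and a `keywords` list (check the field names against the external API documentation); the two sample items are made up for the example.

```python
from collections import Counter

# Made-up metadata items; only the fields used below are shown.
# Assumption: each item has a 'theme' string and a 'keywords' list.
sample_metadata = [
    {"theme": "citizenship", "keywords": ["Commune", "répartition"]},
    {"theme": "transportation", "keywords": ["Commune"]},
]

# Number of metadata items per theme
theme_counts = Counter(meta.get("theme") for meta in sample_metadata)
# Number of metadata items tagged with each keyword
keyword_counts = Counter(kw for meta in sample_metadata for kw in meta.get("keywords", []))

print(sorted(theme_counts))           # distinct themes
print(keyword_counts.most_common())   # keywords, most frequent first
```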

## Filtering the metadata
The `RudiNodeReader` object offers tools to filter the metadata with a partial JSON object.
> You will need to understand how RUDI metadata is structured to create adequate filters. See the RUDI node [external API](https://app.swaggerhub.com/apis/OlivierMartineau/RUDI-PRODUCER) documentation for this.

> A metadata item is kept in the result of the filtering operation only if it matches every element given in the filter.
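
The matching rule above amounts to a recursive "is this partial JSON contained in the metadata?" check. The sketch below is one way to express it; `matches_filter` is a hypothetical helper, not the library's implementation, and it compares lists by strict equality, which may differ from how `filter_metadata` treats them.

```python
from typing import Any


def matches_filter(metadata: Any, partial: Any) -> bool:
    """Return True if every element of `partial` is found in `metadata`.

    Dictionaries are matched key by key, recursively; any other value
    (including lists) must be strictly equal.
    """
    if isinstance(partial, dict):
        if not isinstance(metadata, dict):
            return False
        return all(
            key in metadata and matches_filter(metadata[key], value)
            for key, value in partial.items()
        )
    return metadata == partial


meta = {
    "producer": {"organization_id": "1d6bc543-07ed-46f6-a813-958edb73d5f0",
                 "organization_name": "SIB (Test)"},
    "theme": "citizenship",
}
print(matches_filter(meta, {"producer": {"organization_name": "SIB (Test)"}}))
print(matches_filter(meta, {"producer": {"organization_name": "Univ. Rennes"}}))
```

Applied to a whole list, this gives `[m for m in metadata_list if matches_filter(m, example_filter)]`.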

In [None]:
filter_tag = 'Filtering metadata'
example_filter = {'producer': {'organization_id': '1d6bc543-07ed-46f6-a813-958edb73d5f0', 'organization_name': 'SIB (Test)'}}
log_d(filter_tag, 'with JSON', rudi_node_info.filter_metadata(example_filter))

Some shortcuts have been implemented to make it easier to filter the metadata:

In [None]:

meta_producer = 'Univ. Rennes'
log_d(filter_tag, f"with producer name '{meta_producer}'", rudi_node_info.get_metadata_with_producer(meta_producer))

meta_contact = 'Bacasable'
log_d(filter_tag, f"with contact name '{meta_contact}'", rudi_node_info.get_metadata_with_contact(meta_contact))

meta_theme = 'citizenship'
log_d(filter_tag, f"with theme '{meta_theme}'", rudi_node_info.get_metadata_with_theme(meta_theme))

meta_keywords = ['répartition', 'Commune']
log_d(filter_tag, f"with keywords '{meta_keywords}'", rudi_node_info.get_metadata_with_keywords(meta_keywords))

log_d(filter_tag, "with available media", len(rudi_node_info.metadata_with_available_media))


Additional methods are provided to find a specific metadata item:

In [None]:
find_tag = 'Finding a metadata'

meta_id = 'f48b4bcd-bba3-47ba-86e6-c0754b748728'
log_d(find_tag, f"with metadata uuid '{meta_id}'", rudi_node_info.find_metadata_with_uuid(meta_id))

meta_title = 'parcours pédestre sur la ville de rennes'
log_d(find_tag, f"with metadata title '{meta_title}'", rudi_node_info.find_metadata_with_title(meta_title))

file_name = 'toucan.jpg'
log_d(find_tag, f"with file name '{file_name}'", rudi_node_info.find_metadata_with_media_name(file_name))

file_uuid = '782bab2d-7ee8-4633-9c0a-173649b4d879'
log_d(find_tag, f"with file uuid '{file_uuid}'", rudi_node_info.find_metadata_with_media_uuid(file_uuid))
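
The media lookups presumably search inside each metadata item's `available_formats` list, the same list used in the analysis example at the end of this notebook. Here is a hypothetical re-implementation for illustration, assuming each media entry exposes `media_id` and `media_name` fields (check the exact names against the external API documentation).

```python
from typing import Optional


def find_metadata_with_media(metadata_list: list, key: str, value: str) -> Optional[dict]:
    """Return the first metadata item with a media entry where media[key] == value, else None."""
    for metadata in metadata_list:
        for media in metadata.get("available_formats", []):
            if media.get(key) == value:
                return metadata
    return None


# Made-up metadata; 'media_id', 'media_name' and 'resource_title' are assumed field names.
sample = [
    {"resource_title": "Birds", "available_formats": [
        {"media_type": "FILE", "media_id": "782bab2d-7ee8-4633-9c0a-173649b4d879",
         "media_name": "toucan.jpg", "file_type": "image/jpeg"}]},
]
found = find_metadata_with_media(sample, "media_name", "toucan.jpg")
print(found["resource_title"] if found else "not found")
```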


## Downloading a file
The `RudiNodeReader` object also provides methods to download the files stored on the node: by media UUID, by file name, or for all the media of a metadata item:

In [None]:
dwnld_tag = 'Downloading'
dwnld_dir = './dwnld'

log_d(dwnld_tag, f"media with uuid '{file_uuid}'", rudi_node_info.download_file_with_uuid(file_uuid, dwnld_dir))
log_d(dwnld_tag, f"media with name '{file_name}'", rudi_node_info.download_file_with_name(file_name, dwnld_dir))
log_d(dwnld_tag, f"media for metadata '{meta_id}'", rudi_node_info.download_files_for_metadata(meta_id, dwnld_dir))
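
Downloading a media essentially means streaming the file behind its URL into the target folder. A minimal standard-library sketch, where `download_to_dir` is a hypothetical helper and the media URL is whatever the node exposes for that media (not shown in this notebook). The demo uses a local `file://` URL so that it runs without network access.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request


def download_to_dir(url: str, dest_dir: str, file_name: str) -> str:
    """Stream the resource at `url` into `dest_dir/file_name` and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target folder if needed
    local_path = os.path.join(dest_dir, file_name)
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out_file:
        # Copy in chunks instead of loading the whole file in memory
        shutil.copyfileobj(response, out_file)
    return local_path


# Demo with a local file:// URL standing in for a media URL on the node
with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp, "toucan.jpg")
    source.write_bytes(b"fake image bytes")
    saved = download_to_dir(source.as_uri(), os.path.join(tmp, "dwnld"), "toucan.jpg")
    same_content = pathlib.Path(saved).read_bytes() == source.read_bytes()
print(same_content)
```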


## Dumping the metadata into a file
The `RudiNodeReader` object also provides a method to dump the metadata into a local JSON file:

In [None]:
rudi_node_info.save_metadata_to_file(dwnld_dir, 'rudi_node_metadata.json')
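
Once dumped, the metadata can be reloaded without querying the node again. A minimal sketch, assuming the file is written as `dwnld_dir/rudi_node_metadata.json` and holds a JSON array of metadata objects (check the actual layout of the file the method produces). `load_metadata` is a hypothetical helper, and the demo writes a made-up dump to a temporary folder.

```python
import json
import os
import tempfile


def load_metadata(dir_path: str, file_name: str) -> list:
    """Read back a metadata dump saved as a JSON file."""
    with open(os.path.join(dir_path, file_name), encoding="utf-8") as json_file:
        return json.load(json_file)


# Demo: write a made-up dump, then reload it
sample = [{"resource_title": "parcours pédestre sur la ville de rennes"}]
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "rudi_node_metadata.json"), "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    reloaded = load_metadata(tmp, "rudi_node_metadata.json")
print(len(reloaded))
```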


## Analysing the (meta)data on the RUDI node
Example: computing the proportion of each file type among the files stored on a RUDI node

In [None]:
from collections import Counter

# Count the MIME types of the 'FILE' media described in every metadata item
file_type_counts = Counter()
for metadata in rudi_node_info.metadata_list:
    for media in metadata.get("available_formats", []):
        if media["media_type"] == "FILE":
            file_type_counts[media["file_type"]] += 1

total_nb = sum(file_type_counts.values())
print(f"{total_nb} files found on {rudi_node_url}")
# Print the share of each file type, most frequent first
for mime_type, nb in file_type_counts.most_common():
    print(f"- {mime_type}: {round(nb / total_nb * 100, 2)}%")
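
The same approach works for other fields. For instance, this sketch counts the metadata items published by each producer, reading the `producer` / `organization_name` field used in the filtering example above. It runs on made-up items so that it is self-contained; with a real node you would pass `rudi_node_info.metadata_list` instead. `count_by_producer` is a hypothetical helper.

```python
from collections import Counter


def count_by_producer(metadata_list: list) -> Counter:
    """Count metadata items per producer name, using '(unknown)' when the field is missing."""
    return Counter(
        meta.get("producer", {}).get("organization_name", "(unknown)")
        for meta in metadata_list
    )


# Made-up metadata; only the producer field is shown
sample = [
    {"producer": {"organization_name": "SIB (Test)"}},
    {"producer": {"organization_name": "Univ. Rennes"}},
    {"producer": {"organization_name": "SIB (Test)"}},
]
counts = count_by_producer(sample)
for name, nb in counts.most_common():
    print(f"- {name}: {nb}")
```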