# RUDI Node tools: *rudinode-read* library
This library offers tools to take advantage of the [external API](https://app.swaggerhub.com/apis/OlivierMartineau/RUDI-PRODUCER) of a RUDI Producer node (also referred to as a RUDI node).

## Initialization (optional)
You may need to install the dev requirements to be able to run this Python notebook from the source.
If not, skip the box below.

In [5]:
%%sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt




[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


You may also need to add the sources path to Python's module search path (`sys.path`) for the subsequent code to run correctly.
If not, skip the box below.

In [6]:
import os
import sys

print(os.getcwd())
sys.path.append('./src')

/Users/omartine/Wk/Dev/_Projets/Rudi/rudinode-libs/rudinode-read


## Initializing the `RudiNodeReader` object
File [rudi_node_reader.py](https://github.com/OlivierMartineau/rudi-node-read/blob/release/src/rudi_node_read/rudi_node_reader.py) contains a class `RudiNodeReader` that makes it easier
to access the data and metadata on a RUDI Producer node.
This class fetches all the metadata once and lets you access the resulting metadata.

The only required parameter is the RUDI node URL.
You can optionally give an identifier that will be used for every request made to the node.

The RUDI node external API does not require any identification, but you can identify your client in the request header (here through `headers_user_agent`, which sets the `User-Agent` header).

In [7]:
from rudi_node_read.rudi_node_reader import RudiNodeReader

rudi_node_url = 'http://127.0.0.1:3030'
rudi_node_info = RudiNodeReader(server_url=rudi_node_url, headers_user_agent='RudiNodeGet-OM')

D 2025-01-30 11:09:15 [RudiNodeReader.__init__] connecting
D 2025-01-30 11:09:15 [Connector] base_url: http://127.0.0.1:3030
D 2025-01-30 11:09:15 [RudiNodeConnector.request] to: http://127.0.0.1:3030/api/admin/hash
D 2025-01-30 11:09:15 [('RudiNodeConnector',)] redirection to /catalog/admin/hash
D 2025-01-30 11:09:15 [('RudiNodeConnector',)] base_url:: http://127.0.0.1:3030
D 2025-01-30 11:09:15 [('RudiNodeConnector',)] replaced:: /catalog/
D 2025-01-30 11:09:15 [('RudiNodeConnector',)] base_url:: http://127.0.0.1:3030
D 2025-01-30 11:09:15 [RudiNodeConnector.request] to: http://127.0.0.1:3030/catalog/admin/hash
D 2025-01-30 11:09:15 [('RudiNodeConnector',)] Node '127.0.0.1:3030': redirection OK
D 2025-01-30 11:09:15 [('RudiNodeConnector',)] Node '127.0.0.1:3030': connection OK
D 2025-01-30 11:09:15 [RudiNodeReader.__init__] {"scheme": "http", "host": "127.0.0.1:3030", "path": "", "base_url": "http://127.0.0.1:3030", "_headers": {"User-Agent": "RudiNodeGet-OM", "Content-type": "text/p
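
To illustrate what the optional identifier amounts to at the HTTP level, the sketch below builds a request carrying a `User-Agent` header with the standard library only, without sending it. This is not the library's connector code: `build_identified_request` is a hypothetical helper, and the URL is simply the one shown in the debug log above.

```python
from urllib.request import Request


def build_identified_request(url: str, user_agent: str) -> Request:
    """Build a GET request that announces the client in its User-Agent header.

    Hypothetical helper for illustration; not part of rudi_node_read.
    """
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


request = build_identified_request("http://127.0.0.1:3030/catalog/admin/hash", "RudiNodeGet-OM")

# The request is only built here; urllib.request.urlopen(request) would send it.
# urllib stores header names capitalized, hence the "User-agent" spelling below.
print(request.get_method(), request.full_url, request.get_header("User-agent"))
```

A node administrator can then tell apart the clients in the server logs, even though no authentication takes place.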

## Access to metadata information
The `RudiNodeReader` object lets you access and take advantage of the metadata stored on the node:
- access to the full list of metadata

In [8]:
info_tag = 'RudiNode info'

print(info_tag, 'metadata nb:', rudi_node_info.metadata_count)
print(info_tag, 'metadata list nb:', len(rudi_node_info.metadata_list))
print(info_tag, 'metadata 1:', rudi_node_info.metadata_list[0])

D 2025-01-30 11:09:15 [RudiNodeConnector.request] to: http://127.0.0.1:3030/catalog/v1/resources?limit=1
RudiNode info metadata nb: 2
D 2025-01-30 11:09:15 [RudiNodeConnector.request] to: http://127.0.0.1:3030/catalog/v1/resources?limit=1
D 2025-01-30 11:09:15 [RudiNodeConnector.request] to: http://127.0.0.1:3030/catalog/v1/resources?sort_by=-updatedAt&limit=2&offset=0
D 2025-01-30 11:09:15 [get_metadata_list] total: 2
D 2025-01-30 11:09:15 [get_metadata_list] len: 2
RudiNode info metadata list nb: 2
RudiNode info metadata 1: {'global_id': 'e717bc56-9027-4828-b3a9-ad81b5200409', 'local_id': 'id-prod-manager-soft-checks', 'resource_title': 'test CA 2009 - Ville de Rennes - Budget Principal', 'synopsis': [{'lang': 'fr', 'text': "Les données du Compte administratif de la Ville de Rennes sont des données de consommation effective qui sont en général publiées en juin de l'année suivante."}], 'summary': [{'lang': 'fr', 'text': "Les données du Compte administratif de la Ville de Rennes sont d

- access to the producers and contacts information

In [9]:
print(info_tag, 'list of producers:', rudi_node_info.organization_list)
print(info_tag, 'producer names:', rudi_node_info.organization_names)

print(info_tag, 'list of contacts:', rudi_node_info.contact_list)
print(info_tag, 'contact names:', rudi_node_info.contact_names)

RudiNode info list of producers: [{'organization_id': 'b7687eb8-3c1e-4b56-bf6c-42ef3adc9aeb', 'organization_name': 'UR1'}]
RudiNode info producer names: ['UR1']
RudiNode info list of contacts: [{'contact_id': 'b06c3183-458b-4c63-9842-da8c7dbf14b7', 'contact_name': 'Toto', 'email': 'toto@irisa.fr'}]
RudiNode info contact names: ['Toto']


- access to the classification tags

In [10]:
print(info_tag, 'themes:', rudi_node_info.themes)
print(info_tag, 'keywords:', rudi_node_info.keywords)

RudiNode info themes: ['children', 'economy']
RudiNode info keywords: ['CA', 'Compte administratif', 'budget', 'test']
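
As a rough idea of how such lists can be derived from the metadata, the sketch below collects the sorted distinct values of a field over a list of metadata. It assumes that each metadata carries a `theme` string and a `keywords` list; these field names, the `distinct_values` helper and the sample data are assumptions made for illustration, not the library's implementation.

```python
def distinct_values(metadata_list, field):
    """Collect the sorted distinct values of `field` (a string or a list of strings)."""
    values = set()
    for metadata in metadata_list:
        value = metadata.get(field)
        if value is None:
            continue
        if isinstance(value, str):
            values.add(value)
        else:
            values.update(value)
    return sorted(values)


# Invented sample data, shaped after the output shown above
sample_metadata = [
    {"theme": "economy", "keywords": ["CA", "Compte administratif", "budget", "test"]},
    {"theme": "children", "keywords": ["test"]},
]

print(distinct_values(sample_metadata, "theme"))     # ['children', 'economy']
print(distinct_values(sample_metadata, "keywords"))  # ['CA', 'Compte administratif', 'budget', 'test']
```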


## Filtering the metadata
The `RudiNodeReader` object offers some tools to filter the metadata with a partial JSON.
> You will need to understand how RUDI metadata is structured to create adequate filters. See the RUDI node [external API](https://app.swaggerhub.com/apis/OlivierMartineau/RUDI-PRODUCER) documentation for this.

> Only the metadata that match every element given in the filter are kept in the result of the filtering operation.

In [11]:
filter_tag = 'Filtering metadata'
example_filter = {'producer': {'organization_id': '1d6bc543-07ed-46f6-a813-958edb73d5f0', 'organization_name': 'SIB (Test)'}}
print(filter_tag, 'with JSON:', rudi_node_info.filter_metadata(example_filter))

Filtering metadata with JSON: []
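
The matching rule can be pictured with the minimal sketch below. It is one possible reading of "partial JSON" matching, not the code of `filter_metadata`: it assumes that dictionaries are compared key by key, that every element of a filter list must be found in the corresponding metadata list, and that other values are compared by equality. `matches_filter`, `filter_records` and the sample records are hypothetical.

```python
def matches_filter(candidate, partial):
    """Return True if every element of `partial` is found in `candidate`."""
    if isinstance(partial, dict):
        return isinstance(candidate, dict) and all(
            key in candidate and matches_filter(candidate[key], value)
            for key, value in partial.items()
        )
    if isinstance(partial, list):
        # Assumption: each wanted element must match at least one item of the list
        return isinstance(candidate, list) and all(
            any(matches_filter(item, wanted) for item in candidate) for wanted in partial
        )
    return candidate == partial


def filter_records(records, partial):
    """Keep only the records that match the whole partial JSON."""
    return [record for record in records if matches_filter(record, partial)]


# Invented sample records
sample_records = [
    {"global_id": "a", "producer": {"organization_id": "1", "organization_name": "UR1"}, "keywords": ["budget", "test"]},
    {"global_id": "b", "producer": {"organization_id": "2", "organization_name": "Other"}, "keywords": ["test"]},
]

by_producer = filter_records(sample_records, {"producer": {"organization_name": "UR1"}})
print([record["global_id"] for record in by_producer])  # ['a']
print(filter_records(sample_records, {"producer": {"organization_name": "SIB (Test)"}}))  # []
```

An empty list, as in the output above, means that no metadata matched the whole filter: the only producer listed on this node is 'UR1'.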


Some shortcuts have been implemented to make it easier to filter the metadata:

In [12]:

meta_producer = 'Univ. Rennes'
print(filter_tag, f"with producer name '{meta_producer}':", rudi_node_info.get_metadata_with_producer(meta_producer))

meta_contact = 'Bacasable'
print(filter_tag, f"with contact name '{meta_contact}':", rudi_node_info.get_metadata_with_contact(meta_contact))

meta_theme = 'citizenship'
print(filter_tag, f"with theme '{meta_theme}':", rudi_node_info.get_metadata_with_theme(meta_theme))

meta_keywords = ['répartition', 'Commune']
print(filter_tag, f"with keywords '{meta_keywords}':", rudi_node_info.get_metadata_with_keywords(meta_keywords))

print(filter_tag, "with available media:", len(rudi_node_info.metadata_with_available_media))


Filtering metadata with producer name 'Univ. Rennes': []
Filtering metadata with contact name 'Bacasable': []
Filtering metadata with theme 'citizenship': []
Filtering metadata with keywords '['répartition', 'Commune']': []
Filtering metadata with available media: 1


Additional methods are provided to find a single metadata:

In [13]:
find_tag = 'Finding a metadata'

meta_id = 'f48b4bcd-bba3-47ba-86e6-c0754b748728'
print(find_tag, f"with metadata uuid '{meta_id}':", rudi_node_info.find_metadata_with_uuid(meta_id))

meta_title = 'parcours pédestre sur la ville de rennes'
print(find_tag, f"with metadata title '{meta_title}':", rudi_node_info.find_metadata_with_title(meta_title))

file_name = 'toucan.jpg'
print(find_tag, f"with file name '{file_name}':", rudi_node_info.find_metadata_with_media_name(file_name))

file_uuid = '782bab2d-7ee8-4633-9c0a-173649b4d879'
print(find_tag, f"with file uuid '{file_uuid}':", rudi_node_info.find_metadata_with_media_uuid(file_uuid))


Finding a metadata with metadata uuid 'f48b4bcd-bba3-47ba-86e6-c0754b748728': None
Finding a metadata with metadata title 'parcours pédestre sur la ville de rennes': None
Finding a metadata with file name 'toucan.jpg': None
Finding a metadata with file uuid '782bab2d-7ee8-4633-9c0a-173649b4d879': None
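
The `find_*` calls above print `None` when nothing matches, so the result should be checked before it is used. The sketch below shows this pattern on a plain list of dictionaries. `find_first` is a hypothetical helper that compares values by strict equality, which may differ from what the library does; the `global_id` and `resource_title` values are taken from the metadata printed earlier.

```python
def find_first(metadata_list, field, value):
    """Return the first metadata whose `field` equals `value`, or None if there is none."""
    return next((metadata for metadata in metadata_list if metadata.get(field) == value), None)


sample_metadata = [
    {
        "global_id": "e717bc56-9027-4828-b3a9-ad81b5200409",
        "resource_title": "test CA 2009 - Ville de Rennes - Budget Principal",
    },
]

found = find_first(sample_metadata, "global_id", "e717bc56-9027-4828-b3a9-ad81b5200409")
missing = find_first(sample_metadata, "global_id", "f48b4bcd-bba3-47ba-86e6-c0754b748728")

for label, result in (("found", found), ("missing", missing)):
    if result is None:
        print(label, "-> no match")
    else:
        print(label, "->", result["resource_title"])
```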


## Downloading a file
The `RudiNodeReader` object also provides methods to download the data stored on the node:

In [14]:
dwnld_tag = 'Downloading'
dwnld_dir = './dwnld'

print(dwnld_tag, f"media with uuid '{file_uuid}':", rudi_node_info.download_file_with_uuid(file_uuid, dwnld_dir))
print(dwnld_tag, f"media with name '{file_name}':", rudi_node_info.download_file_with_name(file_name, dwnld_dir))
print(dwnld_tag, f"media for metadata '{meta_id}':", rudi_node_info.download_files_for_metadata(meta_id, dwnld_dir))


Downloading media with uuid '782bab2d-7ee8-4633-9c0a-173649b4d879': None
Downloading media with name 'toucan.jpg': None
Downloading media for metadata 'f48b4bcd-bba3-47ba-86e6-c0754b748728': None


## Dumping the metadata into a file
The `RudiNodeReader` object also provides a method to dump the metadata into a local file:

In [15]:
rudi_node_info.save_metadata_to_file(dwnld_dir, 'rudi_node_metadata.json')
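
Once dumped, the metadata can be reloaded without contacting the node. The sketch below assumes that the dump is a JSON-encoded list of metadata; this is an assumption about the file content. `load_metadata_dump` is a hypothetical helper, and a stand-in file is written to a temporary directory so that the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_metadata_dump(directory, file_name):
    """Read a JSON dump back into Python objects."""
    with open(Path(directory) / file_name, encoding="utf-8") as dump_file:
        return json.load(dump_file)


# Stand-in for the real dump, so that the sketch runs without a RUDI node
with tempfile.TemporaryDirectory() as tmp_dir:
    stand_in = [{"global_id": "e717bc56-9027-4828-b3a9-ad81b5200409"}]
    (Path(tmp_dir) / "rudi_node_metadata.json").write_text(json.dumps(stand_in), encoding="utf-8")
    reloaded = load_metadata_dump(tmp_dir, "rudi_node_metadata.json")

print(len(reloaded), "metadata reloaded")
```

With the real file, the call would presumably be `load_metadata_dump('./dwnld', 'rudi_node_metadata.json')`.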


## Analysing the (meta)data on the RUDI node
Example: extracting the proportions of file types on a RUDI node

In [16]:
# Count the files of each MIME type across all the metadata
list_formats = {"total": 0}
for metadata in rudi_node_info.metadata_list:
    for media_file in metadata["available_formats"]:
        # Only count the media that are files
        if media_file["media_type"] == "FILE":
            file_type = media_file["file_type"]
            list_formats[file_type] = list_formats.get(file_type, 0) + 1
            list_formats["total"] += 1

# Print the total first, then the share of each MIME type
total_nb = list_formats["total"]
for mime_type, nb in list_formats.items():
    if mime_type == "total":
        print(f"{total_nb} files found on {rudi_node_url}")
    else:
        print(f"- {mime_type}: {round(nb / total_nb * 100, 2)}%")

2 files found on http://127.0.0.1:3030
- application/json: 50.0%
- video/mp4: 50.0%
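
The same computation can be packaged as a reusable function. The variant below is a sketch built on `collections.Counter` that returns the percentages instead of printing them and returns an empty dictionary when no file is found. `file_type_shares` is a hypothetical name, and the sample data (including the `SERVICE` media type value) is invented to keep the example self-contained.

```python
from collections import Counter


def file_type_shares(metadata_list):
    """Return {file_type: percentage} for the media whose media_type is 'FILE'."""
    counts = Counter(
        media["file_type"]
        for metadata in metadata_list
        for media in metadata.get("available_formats", [])
        if media.get("media_type") == "FILE"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {file_type: round(nb / total * 100, 2) for file_type, nb in counts.items()}


# Invented sample data; the last entry stands for a media that is not a file
sample_metadata = [
    {"available_formats": [{"media_type": "FILE", "file_type": "application/json"}]},
    {"available_formats": [{"media_type": "FILE", "file_type": "video/mp4"}, {"media_type": "SERVICE"}]},
]

print(file_type_shares(sample_metadata))  # {'application/json': 50.0, 'video/mp4': 50.0}
```

With a live node, `file_type_shares(rudi_node_info.metadata_list)` should give the same proportions as the cell above.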
