A Python client for downloading data from https://nerdpool-api.acdh-dev.oeaw.ac.at
Install the package with pip:

```shell
pip install nerdpool_client
```
List the available data sets:

```python
from nerdpool_client import NerdPoolClient

client = NerdPoolClient()
print(client.data_sets)
# ['RTA', 'RITA', 'MRP', 'Chronik Aldersbach', 'DIPKO']
```
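The data set titles look like the values the API accepts for its `ner_source__title` filter (the MRP example URL below uses exactly that). Assuming this holds for every title, a filter URL can also be built in code instead of on the website. `build_sample_url` is a hypothetical helper, not part of `nerdpool_client`; the query parameters are copied from the example URL.

```python
from urllib.parse import urlencode

# Base endpoint taken from the example URLs in this README.
API_URL = "https://nerdpool-api.acdh-dev.oeaw.ac.at/api/ner-sample/"


def build_sample_url(source_title: str, ent_type: str = "") -> str:
    """Hypothetical helper: build a filtered ner-sample URL for one data set.

    Assumes the API filters on `ner_source__title` and `ner_ent_type__contains`
    as in the MRP example; an empty `ent_type` leaves the entity filter open.
    """
    params = {
        "format": "json",
        "ner_ent_type__contains": ent_type,
        "ner_source__title": source_title,
    }
    # urlencode quotes spaces as "+", e.g. "Chronik Aldersbach" -> "Chronik+Aldersbach"
    return f"{API_URL}?{urlencode(params)}"


print(build_sample_url("MRP"))
```

For `"MRP"` this produces the same URL as the example that follows.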
- Go to nerdpool-api, create/filter your preferred data sample (e.g. all samples from MRP), and pass the resulting API URL to `dump_to_jsonl()`:
```python
from nerdpool_client import NerdPoolClient

url = "https://nerdpool-api.acdh-dev.oeaw.ac.at/api/ner-sample/?format=json&ner_ent_type__contains=&ner_source__title=MRP"
client = NerdPoolClient()
client.dump_to_jsonl(url)
# 'out.jsonl'
```
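To work with the dumped file afterwards, it can be read back line by line. This sketch assumes `out.jsonl` follows the JSON Lines convention its extension suggests (one JSON object per line); the fields inside each sample are not described here, so the code does not rely on any of them.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def read_jsonl(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Load a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

Usage: `samples = read_jsonl("out.jsonl")` returns a list of dicts, one per downloaded sample.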
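- `dump_to_train_eval()` saves the samples into a training and an evaluation file.
- With `file_name_prefix` you can add a custom prefix to the default file names `train.jsonl` and `eval.jsonl`.
- The parameter `split` defines that every `split`-th sample is saved into `eval.jsonl` instead of `train.jsonl`.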
```python
from nerdpool_client import NerdPoolClient

url = "https://nerdpool-api.acdh-dev.oeaw.ac.at/api/ner-sample/?format=json&ner_ent_type__contains=&ner_source__title=MRP"
client = NerdPoolClient()
client.dump_to_train_eval(url, file_name_prefix="mrp__", split=10)
# ['mrp__train.jsonl', 'mrp__eval.jsonl']
```
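The following is a minimal sketch of the split rule described above, for readers who want to reproduce it on their own data. It is not the library's implementation: it assumes "every `split`-th sample" means the 10th, 20th, ... sample counting from 1, with no shuffling, and that both files are written as JSON Lines. `split_train_eval` and `write_train_eval` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Iterable, List, Tuple


def split_train_eval(samples: Iterable[Any], split: int) -> Tuple[List[Any], List[Any]]:
    """Send every `split`-th sample (1-based: split, 2*split, ...) to eval, the rest to train."""
    if split < 1:
        raise ValueError("split must be a positive integer")
    train, evaluation = [], []
    for i, sample in enumerate(samples, start=1):
        (evaluation if i % split == 0 else train).append(sample)
    return train, evaluation


def write_train_eval(samples, file_name_prefix="", split=10, out_dir="."):
    """Write both parts as JSON Lines, mirroring the train.jsonl / eval.jsonl naming."""
    train, evaluation = split_train_eval(samples, split)
    names = [f"{file_name_prefix}train.jsonl", f"{file_name_prefix}eval.jsonl"]
    for name, part in zip(names, (train, evaluation)):
        with open(Path(out_dir) / name, "w", encoding="utf-8") as f:
            for sample in part:
                f.write(json.dumps(sample, ensure_ascii=False) + "\n")
    return names
```

With 20 samples and `split=10`, the 10th and 20th samples go to `eval.jsonl` and the other 18 to `train.jsonl`, i.e. roughly `1/split` of the data is held out for evaluation.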