In [1]:
from pprint import pprint
from bigbio.dataloader import BigBioConfigHelpers
from bigbio.utils.constants import Tasks

# BigBioConfigHelpers

Start by creating an instance of `BigBioConfigHelpers`. It helps you locate and filter the datasets available in the BigBIO package.

In [2]:
conhelps = BigBioConfigHelpers()
print("found {} dataset configs from {} datasets".format(
    len(conhelps),
    len(conhelps.available_dataset_names)
))

found 453 dataset configs from 127 datasets


Each dataset has at least one source config and at least one bigbio config. Source configs attempt to preserve the original structure of the dataset while bigbio configs are normalized into one of several bigbio [task schemas](https://github.com/bigscience-workshop/biomedical/blob/master/task_schemas.md). Some datasets have several source configs and/or bigbio configs (e.g. multi-lingual datasets or datasets supporting multiple cross-validation folds). This is why the number of configs is greater than twice the number of datasets.

### Examine One Helper

`conhelps` is list-like, and its elements can be accessed with integer indices. Let's examine one.

In [3]:
pprint(conhelps[0])

BigBioConfigHelper(script='/home/galtay/repos/biomedical/bigbio/biodatasets/an_em/an_em.py', dataset_name='an_em', tasks=[<Tasks.NAMED_ENTITY_RECOGNITION: 'NER'>, <Tasks.COREFERENCE_RESOLUTION: 'COREF'>, <Tasks.RELATION_EXTRACTION: 'RE'>], languages=[<Lang.EN: 'English'>], config=BigBioConfig(name='an_em_source', version=1.0.4, data_dir=None, data_files=None, description='AnEM source schema', schema='source', subset_id='an_em'), is_local=False, is_pubmed=True, is_bigbio_schema=False, bigbio_schema_caps=None, is_large=False, is_resource=False, is_default=True, is_broken=False, bigbio_version='1.0.0', source_version='1.0.4', citation="@inproceedings{ohta-etal-2012-open,\n  author    = {Ohta, Tomoko and Pyysalo, Sampo and Tsujii, Jun{'}ichi and Ananiadou, Sophia},\n  title     = {Open-domain Anatomical Entity Mention Detection},\n  journal   = {},\n  volume    = {W12-43},\n  year      = {2012},\n  url       = {https://aclanthology.org/W12-4304},\n  doi       = {},\n  biburl    = {},\n  bi

### Show All Available Datasets

In [4]:
print(conhelps.available_dataset_names)

['an_em', 'anat_em', 'ask_a_patient', 'bc5cdr', 'bc7_litcovid', 'bio_sim_verb', 'bio_simlex', 'bioasq_2021_mesinesp', 'bioasq_task_b', 'bioasq_task_c_2017', 'bioinfer', 'biology_how_why_corpus', 'biomrc', 'bionlp_shared_task_2009', 'bionlp_st_2011_epi', 'bionlp_st_2011_ge', 'bionlp_st_2011_id', 'bionlp_st_2011_rel', 'bionlp_st_2013_cg', 'bionlp_st_2013_ge', 'bionlp_st_2013_gro', 'bionlp_st_2013_pc', 'bionlp_st_2019_bb', 'biored', 'biorelex', 'bioscope', 'biosses', 'cadec', 'cantemist', 'cas', 'cellfinder', 'chebi_nactem', 'chemdner', 'chemprot', 'chia', 'citation_gia_test_collection', 'codiesp', 'cord_ner', 'ctebmsp', 'ddi_corpus', 'diann_iber_eval', 'distemist', 'ebm_pico', 'ehr_rel', 'essai', 'euadr', 'evidence_inference', 'gad', 'genetag', 'genia_ptm_event_corpus', 'genia_relation_corpus', 'genia_term_corpus', 'geokhoj_v1', 'gnormplus', 'hallmarks_of_cancer', 'hprd50', 'iepa', 'jnlpba', 'linnaeus', 'lll', 'mantra_gsc', 'mayosrs', 'med_qa', 'medal', 'meddialog', 'meddocan', 'medhop',

### Show Helpers for Specific Dataset

We can also get the helpers for a specific dataset using the dataset name. 

In [5]:
bc5cdr_helpers = conhelps.for_dataset("bc5cdr")
print(len(bc5cdr_helpers))
pprint(bc5cdr_helpers[0].config)
pprint(bc5cdr_helpers[1].config)

2
BigBioConfig(name='bc5cdr_source', version=1.5.16, data_dir=None, data_files=None, description='BC5CDR source schema', schema='source', subset_id='bc5cdr')
BigBioConfig(name='bc5cdr_bigbio_kb', version=1.0.0, data_dir=None, data_files=None, description='BC5CDR simplified BigBio schema', schema='bigbio_kb', subset_id='bc5cdr')
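
To see how source and bigbio configs are distributed across datasets, you can tally helpers by dataset name and schema kind. The sketch below is not part of the BigBIO package: `count_configs` is a hypothetical function that only assumes each element exposes the `dataset_name` and `is_bigbio_schema` attributes seen in the printed helper. It runs on stand-in objects so that it works without the package installed; passing `conhelps` instead should behave the same way if those attributes are present.

```python
from collections import Counter
from types import SimpleNamespace


def count_configs(helpers):
    """Tally (dataset_name, schema kind) pairs over an iterable of helpers."""
    counts = Counter()
    for helper in helpers:
        kind = "bigbio" if helper.is_bigbio_schema else "source"
        counts[(helper.dataset_name, kind)] += 1
    return counts


# Stand-in objects that mimic only the two attributes used above.
# With the package installed, pass `conhelps` instead of this list.
demo_helpers = [
    SimpleNamespace(dataset_name="bc5cdr", is_bigbio_schema=False),
    SimpleNamespace(dataset_name="bc5cdr", is_bigbio_schema=True),
    SimpleNamespace(dataset_name="cantemist", is_bigbio_schema=True),
    SimpleNamespace(dataset_name="cantemist", is_bigbio_schema=True),
]

config_counts = count_configs(demo_helpers)
for (name, kind), n in sorted(config_counts.items()):
    print(f"{name:10s} {kind:6s} {n}")
```

Datasets with more than one bigbio config (for example, one per task schema or per language) show up with a count above one.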


# Loading Datasets

Each config helper provides a wrapper around the [load_dataset](https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/loading_methods#datasets.load_dataset) function from Hugging Face's [datasets](https://huggingface.co/docs/datasets/) package. The wrapper automatically populates the first two arguments of load_dataset:

* path: path to the dataloader script
* name: name of the dataset configuration
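
One way to picture the wrapper is as a function that builds these two arguments from the helper before calling `load_dataset`. The following is a minimal sketch under stated assumptions, not the package's implementation: `build_load_args` is a hypothetical function, it assumes the helper exposes `script` and `config.name` as in the printed helper earlier, and the script path is a placeholder.

```python
from types import SimpleNamespace


def build_load_args(helper):
    """Return the two arguments that the wrapper is described as filling in."""
    return {
        "path": helper.script,        # path to the dataloader script
        "name": helper.config.name,   # name of the dataset configuration
    }


# Stand-in helper with only the attributes used above (placeholder path).
demo_helper = SimpleNamespace(
    script="/path/to/bigbio/biodatasets/bc5cdr/bc5cdr.py",
    config=SimpleNamespace(name="bc5cdr_source"),
)

load_args = build_load_args(demo_helper)
print(load_args)
# With the datasets package installed, the equivalent direct call would be
# datasets.load_dataset(**load_args)
```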

If you have a specific dataset and config in mind, you can:

* fetch the helper from conhelps with the for_config_name method
* load the dataset using the load_dataset wrapper

In [6]:
bc5cdr_source = conhelps.for_config_name("bc5cdr_source").load_dataset()
bc5cdr_bigbio = conhelps.for_config_name("bc5cdr_bigbio_kb").load_dataset()

Reusing dataset bc5cdr_dataset (/home/galtay/.cache/huggingface/datasets/bc5cdr_dataset/bc5cdr_source/1.5.16/f01f16ea9b65ead985bedadf7335195c32297c8f1b09417fc607b102a6757d6f)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bc5cdr_dataset (/home/galtay/.cache/huggingface/datasets/bc5cdr_dataset/bc5cdr_bigbio_kb/1.0.0/f01f16ea9b65ead985bedadf7335195c32297c8f1b09417fc607b102a6757d6f)


  0%|          | 0/3 [00:00<?, ?it/s]

The wrapper passes through any other keyword arguments you supply. For example, use data_dir for datasets that are not public:

In [7]:
# note this will not work unless you have the n2c2 dataset locally

#n2c2_2011_source = (
#    conhelps.
#    for_config_name("n2c2_2011_source").
#    load_dataset(data_dir="/path/to/n2c2_2011/data")
#)
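
When a collection mixes public and local datasets, it can help to decide per helper whether `data_dir` is needed. The sketch below assumes that `is_local` marks datasets that must be supplied from disk, which the attribute name suggests but which this notebook does not state outright. `extra_load_kwargs` and the `local_data_dirs` mapping are hypothetical, and the path is a placeholder.

```python
from types import SimpleNamespace


def extra_load_kwargs(helper, local_data_dirs):
    """Return {'data_dir': ...} for local datasets and {} for public ones."""
    if not helper.is_local:
        return {}
    if helper.dataset_name not in local_data_dirs:
        raise ValueError(
            f"{helper.dataset_name} is a local dataset but no data_dir was given"
        )
    return {"data_dir": local_data_dirs[helper.dataset_name]}


# Hypothetical user-maintained mapping from dataset name to local directory.
local_data_dirs = {"n2c2_2011": "/path/to/n2c2_2011/data"}

# Stand-ins that mimic only the attributes used above.
public = SimpleNamespace(dataset_name="bc5cdr", is_local=False)
private = SimpleNamespace(dataset_name="n2c2_2011", is_local=True)

print(extra_load_kwargs(public, local_data_dirs))
print(extra_load_kwargs(private, local_data_dirs))
# With real helpers:
# helper.load_dataset(**extra_load_kwargs(helper, local_data_dirs))
```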

# Filter and Load Multiple Datasets

You can use any attribute of a BigBioConfigHelper to filter the collection. Here are some examples:

### BigBIO schema datasets that are public and not "large"

In [8]:
bb_public_helpers = conhelps.filtered(
    lambda x:
        x.is_bigbio_schema
        and not x.is_local
        and not x.is_large
)

### Source schema for n2c2 datasets

In [9]:
n2c2_source_helpers = conhelps.filtered(
    lambda x:
        x.dataset_name.startswith("n2c2")
        and not x.is_bigbio_schema
)

### BigBIO schema datasets that are public and support textual entailment

In [10]:
entailment_helpers = conhelps.filtered(
    lambda x:
        x.is_bigbio_schema
        and not x.is_local
        and Tasks.TEXTUAL_ENTAILMENT in x.tasks
)
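
The three filters above share several conditions, so the predicates can also be written once and combined. This is a sketch of one possible style, not something the package provides: `all_of`, `is_public`, `is_bigbio`, and `supports` are hypothetical names, and the demo uses stand-in objects with plain strings in place of `Tasks` enum members.

```python
from types import SimpleNamespace


def all_of(*predicates):
    """Combine predicates into one that holds only when every predicate holds."""
    return lambda helper: all(pred(helper) for pred in predicates)


def is_public(helper):
    return not helper.is_local


def is_bigbio(helper):
    return helper.is_bigbio_schema


def supports(task):
    """Build a predicate that checks whether `task` is among a helper's tasks."""
    return lambda helper: task in helper.tasks


# Stand-ins; the strings "TE" and "NER" replace Tasks enum members here.
demo_helpers = [
    SimpleNamespace(name="a", is_local=False, is_bigbio_schema=True, tasks=["TE"]),
    SimpleNamespace(name="b", is_local=True, is_bigbio_schema=True, tasks=["TE"]),
    SimpleNamespace(name="c", is_local=False, is_bigbio_schema=True, tasks=["NER"]),
]

public_te = all_of(is_bigbio, is_public, supports("TE"))
selected = [h.name for h in demo_helpers if public_te(h)]
print(selected)
# With real helpers:
# conhelps.filtered(all_of(is_bigbio, is_public, supports(Tasks.TEXTUAL_ENTAILMENT)))
```

Because `filtered` accepts any callable that takes a helper, the combined predicate can be passed in place of a lambda.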

### Loading filtered datasets

Note that the `filtered` method returns another instance of `BigBioConfigHelpers`. This means you can iterate over any of the helpers defined above and load all of the datasets. 

In [11]:
print(len(bb_public_helpers))

182
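
With this many configs, a single failed download can abort a long loading run. A minimal sketch of a more tolerant loop follows, assuming only that each helper has `dataset_name`, `config.name`, and a `load_dataset()` method as used in this notebook. `load_all` and `FakeHelper` are hypothetical; `FakeHelper` exists only so the sketch runs without downloading anything.

```python
from types import SimpleNamespace


def load_all(helpers, skip_datasets=()):
    """Load each helper's dataset, recording failures instead of raising."""
    loaded, failed = {}, {}
    for helper in helpers:
        if helper.dataset_name in skip_datasets:
            continue
        try:
            loaded[helper.config.name] = helper.load_dataset()
        except Exception as exc:  # broad on purpose: download errors vary
            failed[helper.config.name] = exc
    return loaded, failed


class FakeHelper:
    """Stand-in for a config helper; it never touches the network."""

    def __init__(self, dataset_name, config_name, fails=False):
        self.dataset_name = dataset_name
        self.config = SimpleNamespace(name=config_name)
        self._fails = fails

    def load_dataset(self, **kwargs):
        if self._fails:
            raise ConnectionError("simulated download failure")
        return {"train": []}


demo_helpers = [
    FakeHelper("bc5cdr", "bc5cdr_bigbio_kb"),
    FakeHelper("biosses", "biosses_bigbio_pairs", fails=True),
    FakeHelper("cadec", "cadec_bigbio_kb"),
]

loaded, failed = load_all(demo_helpers, skip_datasets={"cadec"})
print(sorted(loaded), sorted(failed))
```

Configs that end up in `failed` can be retried later without reloading the ones that succeeded.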


In [12]:
# NOTE the first time you run this cell, the public datasets will be downloaded and cached.
# Depending on your internet connection speed, this can take minutes or hours. 

bb_public_datasets = {
    helper.config.name: helper.load_dataset()
    for helper in bb_public_helpers
    if helper.dataset_name != 'bioasq_2021_mesinesp'  # intermittent download failures from the dataset host
}

Reusing dataset an_em_dataset (/home/galtay/.cache/huggingface/datasets/an_em_dataset/an_em_bigbio_kb/1.0.0/7a60ad16ca5e51e4ccc8d8de0169364df987f7c94f307050e0ddab5473d4f13b)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset anat_em_dataset (/home/galtay/.cache/huggingface/datasets/anat_em_dataset/anat_em_bigbio_kb/1.0.0/5fdfe355922d9812744a70efd7fe46a322f9dba70f3ed29464e0c1009e4d04a1)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset ask_a_patient (/home/galtay/.cache/huggingface/datasets/ask_a_patient/ask_a_patient_bigbio_kb/1.0.0/b3a42779cd6f10ed9a7d30be59cf32b0046c9d6e259e93c72fefbfd4f9716ef4)


  0%|          | 0/30 [00:00<?, ?it/s]

Reusing dataset bc5cdr_dataset (/home/galtay/.cache/huggingface/datasets/bc5cdr_dataset/bc5cdr_bigbio_kb/1.0.0/f01f16ea9b65ead985bedadf7335195c32297c8f1b09417fc607b102a6757d6f)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bc7_lit_covid_dataset (/home/galtay/.cache/huggingface/datasets/bc7_lit_covid_dataset/bc7_litcovid_bigbio_text/1.0.0/96c21c98083595b15fd7ba063c4400026ccd75027316a6099df49598a2c5c557)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bio_sim_verb (/home/galtay/.cache/huggingface/datasets/bio_sim_verb/bio_sim_verb_bigbio_pairs/1.0.0/602af796168079d1c6ebcbc5b8997f6ded65244319deed4432bd6347529e1f3a)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset bio_simlex_dataset (/home/galtay/.cache/huggingface/datasets/bio_simlex_dataset/bio_simlex_bigbio_pairs/1.0.0/5568efc37fbd642518f04c5d871e14fe7781fa3edea8fb8da96f75ed1c36d85c)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset bioinfer_dataset (/home/galtay/.cache/huggingface/datasets/bioinfer_dataset/bioinfer_bigbio_kb/1.0.0/333a78767f28b6fdb78f7a4b87035e5f15641caeba7da6a2602ddb0a00d098eb)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset biology_how_why_corpus_dataset (/home/galtay/.cache/huggingface/datasets/biology_how_why_corpus_dataset/biology_how_why_corpus_bigbio_qa/1.0.0/69a76824cbc92fc33ed0b2946cb68c69b78575d17f71031bb2eb42ead8269e8d)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset biomrc_dataset (/home/galtay/.cache/huggingface/datasets/biomrc_dataset/biomrc_small_A_bigbio_qa/1.0.0/50019e3cc2ca2b443c65db67862fc9ce568f86da69f534487269cd62032c8334)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset biomrc_dataset (/home/galtay/.cache/huggingface/datasets/biomrc_dataset/biomrc_tiny_A_bigbio_qa/1.0.0/50019e3cc2ca2b443c65db67862fc9ce568f86da69f534487269cd62032c8334)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset biomrc_dataset (/home/galtay/.cache/huggingface/datasets/biomrc_dataset/biomrc_small_B_bigbio_qa/1.0.0/50019e3cc2ca2b443c65db67862fc9ce568f86da69f534487269cd62032c8334)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset biomrc_dataset (/home/galtay/.cache/huggingface/datasets/biomrc_dataset/biomrc_tiny_B_bigbio_qa/1.0.0/50019e3cc2ca2b443c65db67862fc9ce568f86da69f534487269cd62032c8334)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset bio_nlp_shared_task2009 (/home/galtay/.cache/huggingface/datasets/bio_nlp_shared_task2009/bionlp_shared_task_2009_bigbio_kb/1.0.0/49d4711f376f923398b34edeae28c47bbe018dc93d7b2162107944a7b5a87fb6)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2011_epi (/home/galtay/.cache/huggingface/datasets/bionlp_st_2011_epi/bionlp_st_2011_epi_bigbio_kb/1.0.0/068468f74ae36d0767a9390eb74dad0da9406be80da64064507ca802bef279ab)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2011_ge (/home/galtay/.cache/huggingface/datasets/bionlp_st_2011_ge/bionlp_st_2011_ge_bigbio_kb/1.0.0/7b87d387e9f5526bf15744e17aed32a21d6f2817ca344d010e7f87e6b76a9101)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2011_id (/home/galtay/.cache/huggingface/datasets/bionlp_st_2011_id/bionlp_st_2011_id_bigbio_kb/1.0.0/134e4afb43363c8957ed14fe90a575ddb56f6dff0d2e52179a2a1227fa8f90f8)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2011_rel (/home/galtay/.cache/huggingface/datasets/bionlp_st_2011_rel/bionlp_st_2011_rel_bigbio_kb/1.0.0/d33ada70ddfc893084cf58c86b6bdd899d73dd0ad62b7520b6d09dd9a900b939)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2013_cg (/home/galtay/.cache/huggingface/datasets/bionlp_st_2013_cg/bionlp_st_2013_cg_bigbio_kb/1.0.0/27b65d5a82c6d5098b62be71d839a748db029255a4260a9de19d9d151c4c2544)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2013_ge (/home/galtay/.cache/huggingface/datasets/bionlp_st_2013_ge/bionlp_st_2013_ge_bigbio_kb/1.0.0/84681c21e48d265b447737ba168f34a5fe7ca227a8fe5de250aec0e24236316d)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2013_gro (/home/galtay/.cache/huggingface/datasets/bionlp_st_2013_gro/bionlp_st_2013_gro_bigbio_kb/1.0.0/adb60378934cfc5610bc57f7922f83f10de72b454ea0a21ebee65af51d069b30)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2013_pc (/home/galtay/.cache/huggingface/datasets/bionlp_st_2013_pc/bionlp_st_2013_pc_bigbio_kb/1.0.0/c8ed69d8cad83c76f51171c918327b1d92a01f305ece1bee9fe8b7b1255cf803)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bionlp_st_2019_bb (/home/galtay/.cache/huggingface/datasets/bionlp_st_2019_bb/bionlp_st_2019_bb_bigbio_kb/1.0.0/c8e8e9978d6d4c2aa476bfc58833d9133195f3e02f6716461c805d2223c7257c)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset biored_dataset (/home/galtay/.cache/huggingface/datasets/biored_dataset/biored_bigbio_kb/1.0.0/a591f9d3071b2ad87c7aa4ec6a12e03e98d44eedf450da79061997d9f85c29c1)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset bio_rel_ex_dataset (/home/galtay/.cache/huggingface/datasets/bio_rel_ex_dataset/biorelex_bigbio_kb/1.0.0/eb69ec962b1355047ec4ec079b38f159b99ce497294d337f408c507a10be9320)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset bioscope_dataset (/home/galtay/.cache/huggingface/datasets/bioscope_dataset/bioscope_bigbio_kb/1.0.0/fe0fbe3d3661dc97a5599e10bbff64f83a80cd04380d90d16bc5fda9e5f74aef)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset bioscope_dataset (/home/galtay/.cache/huggingface/datasets/bioscope_dataset/bioscope_abstracts_bigbio_kb/1.0.0/fe0fbe3d3661dc97a5599e10bbff64f83a80cd04380d90d16bc5fda9e5f74aef)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset bioscope_dataset (/home/galtay/.cache/huggingface/datasets/bioscope_dataset/bioscope_papers_bigbio_kb/1.0.0/fe0fbe3d3661dc97a5599e10bbff64f83a80cd04380d90d16bc5fda9e5f74aef)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset bioscope_dataset (/home/galtay/.cache/huggingface/datasets/bioscope_dataset/bioscope_medical_texts_bigbio_kb/1.0.0/fe0fbe3d3661dc97a5599e10bbff64f83a80cd04380d90d16bc5fda9e5f74aef)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset biosses_dataset (/home/galtay/.cache/huggingface/datasets/biosses_dataset/biosses_bigbio_pairs/1.0.0/89e91fa4056118a4c1683b47698dfef1746396bb0f0dafb15f85453fdefbcc98)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset cadec_dataset (/home/galtay/.cache/huggingface/datasets/cadec_dataset/cadec_bigbio_kb/1.0.0/e4a2b9f54ca80c14071a43f8cc4479e5b738fbe9c4f553bbc999dfcc8dcb0dd9)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset cantemist_dataset (/home/galtay/.cache/huggingface/datasets/cantemist_dataset/cantemist_bigbio_kb/1.0.0/7b8b18017e8a11c7113a7252b9c5bb6cf6cc669ae9c3509bc24fd4d02d469362)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset cantemist_dataset (/home/galtay/.cache/huggingface/datasets/cantemist_dataset/cantemist_bigbio_text/1.0.0/7b8b18017e8a11c7113a7252b9c5bb6cf6cc669ae9c3509bc24fd4d02d469362)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset cell_finder_dataset (/home/galtay/.cache/huggingface/datasets/cell_finder_dataset/cellfinder_bigbio_kb/1.0.0/6e13003de485ac8fca3ff82f8dab2b2d01615ab47d3cb1d15ac3da1ceea6c0c4)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset cell_finder_dataset (/home/galtay/.cache/huggingface/datasets/cell_finder_dataset/cellfinder_splits_bigbio_kb/1.0.0/6e13003de485ac8fca3ff82f8dab2b2d01615ab47d3cb1d15ac3da1ceea6c0c4)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset chebi_nactem_datasset (/home/galtay/.cache/huggingface/datasets/chebi_nactem_datasset/chebi_nactem_abstr_ann1_bigbio_kb/1.0.0/12a07a3b91cb99c2e43666adb248d3acb9330d9ce3e907ade974b6e5b8ffe7df)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset chebi_nactem_datasset (/home/galtay/.cache/huggingface/datasets/chebi_nactem_datasset/chebi_nactem_abstr_ann2_bigbio_kb/1.0.0/12a07a3b91cb99c2e43666adb248d3acb9330d9ce3e907ade974b6e5b8ffe7df)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset chebi_nactem_datasset (/home/galtay/.cache/huggingface/datasets/chebi_nactem_datasset/chebi_nactem_fullpaper_bigbio_kb/1.0.0/12a07a3b91cb99c2e43666adb248d3acb9330d9ce3e907ade974b6e5b8ffe7df)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset chemdner_dataset (/home/galtay/.cache/huggingface/datasets/chemdner_dataset/chemdner_bigbio_kb/1.0.0/f55818d2067367e05317b993e50675a4a9dedd1c8d8f5d49e33177f569afebd1)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset chemdner_dataset (/home/galtay/.cache/huggingface/datasets/chemdner_dataset/chemdner_bigbio_text/1.0.0/f55818d2067367e05317b993e50675a4a9dedd1c8d8f5d49e33177f569afebd1)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset chemprot_dataset (/home/galtay/.cache/huggingface/datasets/chemprot_dataset/chemprot_bigbio_kb/1.0.0/d621093ca60927acf7784c181b4a4f1067f655c83ac18af82cd1723d99c4df2a)


  0%|          | 0/4 [00:00<?, ?it/s]

Reusing dataset chia_dataset (/home/galtay/.cache/huggingface/datasets/chia_dataset/chia_bigbio_kb/1.0.0/cad9b1810e0d3b47812b1b6ce3a087254e0728f061b04f51e8ab7b6b1ffce813)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset citation_gia_test_collection (/home/galtay/.cache/huggingface/datasets/citation_gia_test_collection/citation_gia_test_collection_bigbio_kb/1.0.0/d873bd3cd5030334b3bfe603fe9333f592b486277bceb13c5385a8b318e3e06c)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset codiesp_dataset (/home/galtay/.cache/huggingface/datasets/codiesp_dataset/codiesp_D_bigbio_text/1.0.0/1c3821f6ed08e10dc0260d174d27b7fcd54192c17e855cfdc3dedfa16aa8007e)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset codiesp_dataset (/home/galtay/.cache/huggingface/datasets/codiesp_dataset/codiesp_P_bigbio_text/1.0.0/1c3821f6ed08e10dc0260d174d27b7fcd54192c17e855cfdc3dedfa16aa8007e)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset codiesp_dataset (/home/galtay/.cache/huggingface/datasets/codiesp_dataset/codiesp_X_bigbio_kb/1.0.0/1c3821f6ed08e10dc0260d174d27b7fcd54192c17e855cfdc3dedfa16aa8007e)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset codiesp_dataset (/home/galtay/.cache/huggingface/datasets/codiesp_dataset/codiesp_extra_mesh_bigbio_text/1.0.0/1c3821f6ed08e10dc0260d174d27b7fcd54192c17e855cfdc3dedfa16aa8007e)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset codiesp_dataset (/home/galtay/.cache/huggingface/datasets/codiesp_dataset/codiesp_extra_cie_bigbio_text/1.0.0/1c3821f6ed08e10dc0260d174d27b7fcd54192c17e855cfdc3dedfa16aa8007e)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset cord_ner_dataset (/home/galtay/.cache/huggingface/datasets/cord_ner_dataset/cord_ner_bigbio_kb/1.0.0/f98a368b63f7f0c6c3ec0fab310162c1a022b7771967e8238d39fb87706f9d60)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset ctebm_sp_dataset (/home/galtay/.cache/huggingface/datasets/ctebm_sp_dataset/ctebmsp_abstracts_bigbio_kb/1.0.0/82370182768ade343e04528716eca142a847754d645c683a727f9445aac75bf4)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset ctebm_sp_dataset (/home/galtay/.cache/huggingface/datasets/ctebm_sp_dataset/ctebmsp_eudract_bigbio_kb/1.0.0/82370182768ade343e04528716eca142a847754d645c683a727f9445aac75bf4)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset ddi_corpus_dataset (/home/galtay/.cache/huggingface/datasets/ddi_corpus_dataset/ddi_corpus_bigbio_kb/1.0.0/ce2fe8ef8befc09a8e90da92d4abf2c8607ea036ebc06d7f1954a1b34106d151)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset diann_iber_eval_dataset (/home/galtay/.cache/huggingface/datasets/diann_iber_eval_dataset/diann_iber_eval_bigbio_kb/1.0.0/070bbaba168cb76f41952674aa9e94557c22cbbe7e514d7d692fbff2de297e04)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset diann_iber_eval_dataset (/home/galtay/.cache/huggingface/datasets/diann_iber_eval_dataset/diann_iber_eval_en_bigbio_kb/1.0.0/070bbaba168cb76f41952674aa9e94557c22cbbe7e514d7d692fbff2de297e04)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset diann_iber_eval_dataset (/home/galtay/.cache/huggingface/datasets/diann_iber_eval_dataset/diann_iber_eval_es_bigbio_kb/1.0.0/070bbaba168cb76f41952674aa9e94557c22cbbe7e514d7d692fbff2de297e04)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset diann_iber_eval_dataset (/home/galtay/.cache/huggingface/datasets/diann_iber_eval_dataset/diann_iber_eval_bigbio_t2t/1.0.0/070bbaba168cb76f41952674aa9e94557c22cbbe7e514d7d692fbff2de297e04)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset distemist_dataset (/home/galtay/.cache/huggingface/datasets/distemist_dataset/distemist_bigbio_kb/1.0.0/55f95e5183ef1f29f4078bc60cddab039991dc38e8af6dd66c0f4a76a68aba97)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset ebm_pico (/home/galtay/.cache/huggingface/datasets/ebm_pico/ebm_pico_bigbio_kb/1.0.0/224085a540def3600016d03f4f475c35e77bd5987be262503052e7728771ee9a)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset ehr_rel_dataset (/home/galtay/.cache/huggingface/datasets/ehr_rel_dataset/ehr_rel_bigbio_pairs/1.0.0/3786552d5033645f58d695a6238e3fbaf2fe235b5de663b5ab1fb3a89a82f409)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset euadr (/home/galtay/.cache/huggingface/datasets/euadr/euadr_bigbio_kb/1.0.0/a03e12e240d0884b555f721c5f88e554c06dfd5b74d12c569fa7c96586c721c6)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset evidence_inference_dataset (/home/galtay/.cache/huggingface/datasets/evidence_inference_dataset/evidence-inference_bigbio_te/1.0.0/a22bca7bd8fcdd9236ef714f266bf2316e1c0fad8f4648eb52305adc3726cf95)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold0_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold1_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold2_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold3_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold4_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold5_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold6_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold7_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold8_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_fold9_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gad (/home/galtay/.cache/huggingface/datasets/gad/gad_blurb_bigbio_text/1.0.0/60f2fe4eb74428a8ae13f757ea2931f42afa7ecf18f2393109e2e052efa5500c)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset genetag_dataset (/home/galtay/.cache/huggingface/datasets/genetag_dataset/genetaggold_bigbio_kb/1.0.0/67951603bb8cc6dc366b0ca502894ee41d288a56d4f6de70471ee81b6de32039)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset genetag_dataset (/home/galtay/.cache/huggingface/datasets/genetag_dataset/genetagcorrect_bigbio_kb/1.0.0/67951603bb8cc6dc366b0ca502894ee41d288a56d4f6de70471ee81b6de32039)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset genia_ptm_event_corpus_dataset (/home/galtay/.cache/huggingface/datasets/genia_ptm_event_corpus_dataset/genia_ptm_event_corpus_bigbio_kb/1.0.0/5c8be67cd6b83c548675129d163a42646420385a576969c86c1cab230dec989c)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset genia_relation_corpus_dataset (/home/galtay/.cache/huggingface/datasets/genia_relation_corpus_dataset/genia_relation_corpus_bigbio_kb/1.0.0/7fc1da5c37c5596b284b011d077fb7dcf7c99cd2f5b34c3907cfde694b39e08b)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset genia_term_corpus_dataset (/home/galtay/.cache/huggingface/datasets/genia_term_corpus_dataset/genia_term_corpus_bigbio_kb/1.0.0/00c9b0d6cfd7903ecd38e6612748dbc4c6249d0595b977f555a220cc5031f2e7)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset geokhojv1_dataset (/home/galtay/.cache/huggingface/datasets/geokhojv1_dataset/geokhoj_v1_bigbio_text/1.0.0/231efbd35f9f5564157e0197a4599e28a0563c82334100e4968b89fafa959e11)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset gnormplus_dataset (/home/galtay/.cache/huggingface/datasets/gnormplus_dataset/gnormplus_bigbio_kb/1.0.0/1bb16f1b4abf9394b9180cac70edc575cc5e7d32c697f8b9f69ba2f643d2fc95)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset hallmarks_of_cancer_dataset (/home/galtay/.cache/huggingface/datasets/hallmarks_of_cancer_dataset/hallmarks_of_cancer_bigbio_text/1.0.0/8b34c44f46003945f2ac4b0745ba705da96ae9bf9f5f626d59bce0fa92f52ba2)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset hprd50_dataset (/home/galtay/.cache/huggingface/datasets/hprd50_dataset/hprd50_bigbio_kb/1.0.0/8c03f0f6fc736b3842ff473a2d0cc4ef85dd178e76884c78721b29e7e2db6300)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset iepa_dataset (/home/galtay/.cache/huggingface/datasets/iepa_dataset/iepa_bigbio_kb/1.0.0/200b819f9b791ae233cc5ae2a58d039be47ac8755845c481ddf67d5283b20ebd)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset jnlpba_dataset (/home/galtay/.cache/huggingface/datasets/jnlpba_dataset/jnlpba_bigbio_kb/1.0.0/53912b11c844e899a8940bb77ebd81d3a58748632a0b785f24d4aa5b829fa0a0)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset linnaeus_dataset (/home/galtay/.cache/huggingface/datasets/linnaeus_dataset/linnaeus_bigbio_kb/1.0.0/f91b6061c1ec68a3b18eb1dd8af37953f56b217ed4c654072aaccdc630aeb0c5)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset linnaeus_dataset (/home/galtay/.cache/huggingface/datasets/linnaeus_dataset/linnaeus_filtered_bigbio_kb/1.0.0/f91b6061c1ec68a3b18eb1dd8af37953f56b217ed4c654072aaccdc630aeb0c5)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset lll_dataset (/home/galtay/.cache/huggingface/datasets/lll_dataset/lll_bigbio_kb/1.0.0/a4d0660d0383a5441090dc5d57ee4a3c7cacde80e14b4dff5f3b9d00ce4760eb)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_es_emea_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_es_medline_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_fr_emea_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_fr_medline_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_fr_patents_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_de_emea_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_de_medline_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_de_patents_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_nl_emea_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_nl_medline_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_en_emea_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_en_medline_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mantra_gsc_dataset (/home/galtay/.cache/huggingface/datasets/mantra_gsc_dataset/mantra_gsc_en_patents_bigbio_kb/1.0.0/7e442f3531d6990d7ebcb6382ff75257856f0bcc3e8941f3dc3b474d6ea7ce79)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mayosrs_dataset (/home/galtay/.cache/huggingface/datasets/mayosrs_dataset/mayosrs_bigbio_pairs/1.0.0/2ae3973a6d9a4288005affdfff508a8e01fd90b40cb858a8389871ed2f0375b1)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset med_qa_dataset (/home/galtay/.cache/huggingface/datasets/med_qa_dataset/med_qa_en_bigbio_qa/1.0.0/8d47fd5e34d1f7b1b04bfe89c9d7e10ca47020b1b847f362e685da00438b4b49)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_qa_dataset (/home/galtay/.cache/huggingface/datasets/med_qa_dataset/med_qa_zh_bigbio_qa/1.0.0/8d47fd5e34d1f7b1b04bfe89c9d7e10ca47020b1b847f362e685da00438b4b49)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_qa_dataset (/home/galtay/.cache/huggingface/datasets/med_qa_dataset/med_qa_tw_bigbio_qa/1.0.0/8d47fd5e34d1f7b1b04bfe89c9d7e10ca47020b1b847f362e685da00438b4b49)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_qa_dataset (/home/galtay/.cache/huggingface/datasets/med_qa_dataset/med_qa_tw_en_bigbio_qa/1.0.0/8d47fd5e34d1f7b1b04bfe89c9d7e10ca47020b1b847f362e685da00438b4b49)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_qa_dataset (/home/galtay/.cache/huggingface/datasets/med_qa_dataset/med_qa_tw_zh_bigbio_qa/1.0.0/8d47fd5e34d1f7b1b04bfe89c9d7e10ca47020b1b847f362e685da00438b4b49)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_dialog (/home/galtay/.cache/huggingface/datasets/med_dialog/meddialog_en_bigbio_text/1.0.0/baeea92e10915a8c6393ed8d5d355209142807a2ff1e9fa92fd5aeee65f6c502)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset meddocan_dataset (/home/galtay/.cache/huggingface/datasets/meddocan_dataset/meddocan_bigbio_kb/1.0.0/964604ff5d3ab34c05b0b205c0890f58c759c23b0dccf89db6e32c9aaa57e8f6)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_hop_dataset (/home/galtay/.cache/huggingface/datasets/med_hop_dataset/medhop_bigbio_qa/1.0.0/802061253658b81d588ed3c5c5b3b78d3a5f48e2b3a9298ba7cfacb021db47a5)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset mediqa_qa_dataset (/home/galtay/.cache/huggingface/datasets/mediqa_qa_dataset/mediqa_qa_bigbio_qa/1.0.0/2aca6bfd63148f3b3e18e50db7b23e918326310e52ae2e7a9dce256ca5bb225d)


  0%|          | 0/4 [00:00<?, ?it/s]

Reusing dataset mediqa_rqe_dataset (/home/galtay/.cache/huggingface/datasets/mediqa_rqe_dataset/mediqa_rqe_bigbio_te/1.0.0/26f3791910196b2018ed51926e87310584efd2d6bc1d1b6c4ae80297d4f0c30f)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_mentions_dataset (/home/galtay/.cache/huggingface/datasets/med_mentions_dataset/medmentions_full_bigbio_kb/1.0.0/b5c8691186d4701f9b18eddbe36d178ccf7e55761dcc6140c57f4410754511ac)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset med_mentions_dataset (/home/galtay/.cache/huggingface/datasets/med_mentions_dataset/medmentions_st21pv_bigbio_kb/1.0.0/b5c8691186d4701f9b18eddbe36d178ccf7e55761dcc6140c57f4410754511ac)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset me_q_sum_dataset (/home/galtay/.cache/huggingface/datasets/me_q_sum_dataset/meqsum_bigbio_t2t/1.0.0/0cd33941c5241e70b16d6db94299ba2572b21847cace2b768aeb551a3886c869)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset minimayosrs_dataset (/home/galtay/.cache/huggingface/datasets/minimayosrs_dataset/minimayosrs_bigbio_pairs/1.0.0/564b19cfba7fb458733e551a96c2bb02f90d57f44228a3a9001d87ad9eaa31cb)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset mi_rna_dataset (/home/galtay/.cache/huggingface/datasets/mi_rna_dataset/mirna_bigbio_kb/1.0.0/1235ed5dcf1fad04baee19c6254866b2daaf8358c11d6477834e5e12c98b4657)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset mlee (/home/galtay/.cache/huggingface/datasets/mlee/mlee_bigbio_kb/1.0.0/d42fcd6d0f5b31a49bc3c1cdb9302a7fcadcda06523c560367d5360189d0be48)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset mqp_dataset (/home/galtay/.cache/huggingface/datasets/mqp_dataset/mqp_bigbio_pairs/1.0.0/17893bd8246c1f4d7108e693a1b9f37ce5d64f78b952b356003d4c58558be93c)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset much_more_dataset (/home/galtay/.cache/huggingface/datasets/much_more_dataset/muchmore_bigbio_kb/1.0.0/a11b927d0f9b3e5a0730f1a49adbdba5b244c77d9d190e5b1ce0b60c32e0ea6f)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset much_more_dataset (/home/galtay/.cache/huggingface/datasets/much_more_dataset/muchmore_en_bigbio_kb/1.0.0/a11b927d0f9b3e5a0730f1a49adbdba5b244c77d9d190e5b1ce0b60c32e0ea6f)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset much_more_dataset (/home/galtay/.cache/huggingface/datasets/much_more_dataset/muchmore_de_bigbio_kb/1.0.0/a11b927d0f9b3e5a0730f1a49adbdba5b244c77d9d190e5b1ce0b60c32e0ea6f)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset much_more_dataset (/home/galtay/.cache/huggingface/datasets/much_more_dataset/muchmore_bigbio_t2t/1.0.0/a11b927d0f9b3e5a0730f1a49adbdba5b244c77d9d190e5b1ce0b60c32e0ea6f)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset multi_x_science (/home/galtay/.cache/huggingface/datasets/multi_x_science/multi_xscience_bigbio_t2t/1.0.0/5372d2f04ab0c807f7cc339c10b1a9941d5243df8ae09f015222c2f4a5efcde8)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset mutation_finder_dataset (/home/galtay/.cache/huggingface/datasets/mutation_finder_dataset/mutation_finder_bigbio_kb/1.0.0/a11a87d71b8a82109163d6abc4fbc7b46a2c89663b621eec89571e19823d9451)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset ncbi_disease_dataset (/home/galtay/.cache/huggingface/datasets/ncbi_disease_dataset/ncbi_disease_bigbio_kb/1.0.0/10a393201e55b403e5d107701b719368f54f1bf3d3438a1233f99be0badeb034)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset nlm_gene_dataset (/home/galtay/.cache/huggingface/datasets/nlm_gene_dataset/nlm_gene_bigbio_kb/1.0.0/4291402c7589961b34294745522bffe7e289d8f6c11d99a66d93362c0d30187d)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset nlm_chem_dataset (/home/galtay/.cache/huggingface/datasets/nlm_chem_dataset/nlmchem_bigbio_kb/1.0.0/d91131823c66b7dd1162027991ea47c342e478209b37cf261c5f122d30409594)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset nlm_chem_dataset (/home/galtay/.cache/huggingface/datasets/nlm_chem_dataset/nlmchem_bigbio_text/1.0.0/d91131823c66b7dd1162027991ea47c342e478209b37cf261c5f122d30409594)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset osiris (/home/galtay/.cache/huggingface/datasets/osiris/osiris_bigbio_kb/1.0.0/5aae14241c67fbe761843bedc3ab1123302773163574bf23bdd1c81fbb145485)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset paramed_dataset (/home/galtay/.cache/huggingface/datasets/paramed_dataset/paramed_bigbio_t2t/1.0.0/c7c545f9e448eedad7e528cc495ae31a47228ccfd052c74207040943b58cfab7)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pdr_dataset (/home/galtay/.cache/huggingface/datasets/pdr_dataset/pdr_bigbio_kb/1.0.0/9ec253d2bc6fcb4e915732a73be847d608383dca336ad90464ffbeab22f4f935)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset pharmaconer_dataset (/home/galtay/.cache/huggingface/datasets/pharmaconer_dataset/pharmaconer_bigbio_kb/1.0.0/0bfb0ab001bb8e93f01292bccfc411ea2ddfc50622975f82bf33353959f6c1e5)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pharmaconer_dataset (/home/galtay/.cache/huggingface/datasets/pharmaconer_dataset/pharmaconer_bigbio_text/1.0.0/0bfb0ab001bb8e93f01292bccfc411ea2ddfc50622975f82bf33353959f6c1e5)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pho_ner_dataset (/home/galtay/.cache/huggingface/datasets/pho_ner_dataset/pho_ner_bigbio_kb/1.0.0/b63c03a21ab8fd3aea54418ad680f3e1479d90e3f74faee93603c75aeb440f09)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pho_ner_dataset (/home/galtay/.cache/huggingface/datasets/pho_ner_dataset/pho_ner_syllable_bigbio_kb/1.0.0/b63c03a21ab8fd3aea54418ad680f3e1479d90e3f74faee93603c75aeb440f09)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pico_extraction_dataset (/home/galtay/.cache/huggingface/datasets/pico_extraction_dataset/pico_extraction_bigbio_kb/1.0.0/e44a7c77a2a3cd9b6f4a2ed8e0af76d74ba9259106a26e2ae98b15cbfd1d16b6)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset pmc_patients_dataset (/home/galtay/.cache/huggingface/datasets/pmc_patients_dataset/pmc_patients_bigbio_pairs/1.0.0/8b53a07c8b6b88d49fcf0909dfe46a61e9b778ba681151eb9b4ed53f97f6fe90)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset progene_dataset (/home/galtay/.cache/huggingface/datasets/progene_dataset/progene_bigbio_kb/1.0.0/f47dd117a24026c11f24285a8dc6921f68746d3379affca5f0022512c1cad0e0)


  0%|          | 0/30 [00:00<?, ?it/s]

Reusing dataset pubhealth_dataset (/home/galtay/.cache/huggingface/datasets/pubhealth_dataset/pubhealth_bigbio_pairs/1.0.0/4abda944fddf7b2b825a65adcc815fd98dc3008623301268fa69795f6b2a2b9b)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_artificial_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_unlabeled_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold0_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold1_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold2_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold3_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold4_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold5_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold6_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold7_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold8_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubmed_qa_dataset (/home/galtay/.cache/huggingface/datasets/pubmed_qa_dataset/pubmed_qa_labeled_fold9_bigbio_qa/1.0.0/43353ba5c6e691785b41bd24638e416d6f120111a0f2fd33d250ffe337c415d0)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset pubtator_central_dataset (/home/galtay/.cache/huggingface/datasets/pubtator_central_dataset/pubtator_central_sample_bigbio_kb/1.0.0/1942b2ed6ca53071c1dbcb604ed9b0a1e26c2a0b84839c6d7bbed1b82776b2c0)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset quaero (/home/galtay/.cache/huggingface/datasets/quaero/quaero_emea_bigbio_kb/1.0.0/774678ab312901ed68bd16ef32004f241b028aeab7713109b788cc031774fd9c)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset quaero (/home/galtay/.cache/huggingface/datasets/quaero/quaero_medline_bigbio_kb/1.0.0/774678ab312901ed68bd16ef32004f241b028aeab7713109b788cc031774fd9c)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset scai_chemical_dataset (/home/galtay/.cache/huggingface/datasets/scai_chemical_dataset/scai_chemical_bigbio_kb/1.0.0/ad206c78b1592b11002e702f356e76b053510aa5a7c99884636443bcc500fc00)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset scai_disease_dataset (/home/galtay/.cache/huggingface/datasets/scai_disease_dataset/scai_disease_bigbio_kb/1.0.0/2f31b34d3ebd26b72d0f8cc07c83d4730ed6a24dc117589975e22fb742180d1c)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset scicite_dataset (/home/galtay/.cache/huggingface/datasets/scicite_dataset/scicite_bigbio_text/1.0.0/68442c96e497d1b72c0437590d489ddd30adc5e325feee9057fc43f2d65a20d7)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset scielo_dataset (/home/galtay/.cache/huggingface/datasets/scielo_dataset/scielo_en_es_bigbio_t2t/1.0.0/ee3c81633c3b223c9bab99bb498dafa8d4f00aeb78313a123bcdf597ce4269a8)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset scielo_dataset (/home/galtay/.cache/huggingface/datasets/scielo_dataset/scielo_en_pt_bigbio_t2t/1.0.0/ee3c81633c3b223c9bab99bb498dafa8d4f00aeb78313a123bcdf597ce4269a8)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset sci_fact (/home/galtay/.cache/huggingface/datasets/sci_fact/scifact_rationale_bigbio_te/1.0.0/32a72fe1020e258ce659a565479106d3fe85e80bce650f0b292bd32f6d692e8c)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset sci_fact (/home/galtay/.cache/huggingface/datasets/sci_fact/scifact_labelprediction_bigbio_te/1.0.0/32a72fe1020e258ce659a565479106d3fe85e80bce650f0b292bd32f6d692e8c)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset sci_q (/home/galtay/.cache/huggingface/datasets/sci_q/sciq_bigbio_qa/1.0.0/9842ca282ee6b4beb0130a8bf5b2fd45d42e0d691834ab9009a1f7c7e92a624c)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset sci_tail_dataset (/home/galtay/.cache/huggingface/datasets/sci_tail_dataset/scitail_bigbio_te/1.0.0/ecf42397d8ebbc639750e1c62dbc6a945dc3111418e93b26174831ed786b2a42)


  0%|          | 0/3 [00:00<?, ?it/s]

Reusing dataset seth_corpus_dataset (/home/galtay/.cache/huggingface/datasets/seth_corpus_dataset/seth_corpus_bigbio_kb/1.0.0/8408379b021d9337b98455928ed2fc0be00ff3a1a3127a788623674d002f1415)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset spl_adr200_db_dataset (/home/galtay/.cache/huggingface/datasets/spl_adr200_db_dataset/spl_adr_200db_train_bigbio_kb/1.0.0/4164d73dc5914f20d6b43071459bd5749c121f911a676af4313a8f50586019c1)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset spl_adr200_db_dataset (/home/galtay/.cache/huggingface/datasets/spl_adr200_db_dataset/spl_adr_200db_unannotated_bigbio_kb/1.0.0/4164d73dc5914f20d6b43071459bd5749c121f911a676af4313a8f50586019c1)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset swedish_medical_ner_dataset (/home/galtay/.cache/huggingface/datasets/swedish_medical_ner_dataset/swedish_medical_ner_wiki_bigbio_kb/1.0.0/168be35daa935974181f57f545421e14dd43de7d371e354f0a2bc27c1a498106)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset swedish_medical_ner_dataset (/home/galtay/.cache/huggingface/datasets/swedish_medical_ner_dataset/swedish_medical_ner_lt_bigbio_kb/1.0.0/168be35daa935974181f57f545421e14dd43de7d371e354f0a2bc27c1a498106)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset swedish_medical_ner_dataset (/home/galtay/.cache/huggingface/datasets/swedish_medical_ner_dataset/swedish_medical_ner_1177_bigbio_kb/1.0.0/168be35daa935974181f57f545421e14dd43de7d371e354f0a2bc27c1a498106)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset thomas2011_dataset (/home/galtay/.cache/huggingface/datasets/thomas2011_dataset/thomas2011_bigbio_kb/1.0.0/5e9c1ec6acf01c45154fa2b498e340f5742516b0265308ce89955fc05f331beb)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset tmvar_v1_dataset (/home/galtay/.cache/huggingface/datasets/tmvar_v1_dataset/tmvar_v1_bigbio_kb/1.0.0/860549c471eb256552ae9ded879cbf1079cde78a799a13b8af59e0d0d0cfff90)


  0%|          | 0/2 [00:00<?, ?it/s]

Reusing dataset tmvar_v2_dataset (/home/galtay/.cache/huggingface/datasets/tmvar_v2_dataset/tmvar_v2_bigbio_kb/1.0.0/9dc11c7a56cbef69cd37f9444821ee1e7a63fb041ae696d54872fa4e5f19e215)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset tmvar_v3_dataset (/home/galtay/.cache/huggingface/datasets/tmvar_v3_dataset/tmvar_v3_bigbio_kb/1.0.0/0a53a482bab48f4c7f34298ee9fb39ccecf048bee3b0116b76df630b0f075532)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset tw_adrl (/home/galtay/.cache/huggingface/datasets/tw_adrl/twadrl_bigbio_kb/1.0.0/50f1c12d9fcc52da1820d0083e6c1f32a47b8639b44f8b37abcde00740b16364)


  0%|          | 0/30 [00:00<?, ?it/s]

Reusing dataset umnsrs_dataset (/home/galtay/.cache/huggingface/datasets/umnsrs_dataset/umnsrs_similarity_mod_bigbio_pairs/1.0.0/8c7002a472ad988e442babd6c70bd30d98eede363282d56922ace6db655cbf3a)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset umnsrs_dataset (/home/galtay/.cache/huggingface/datasets/umnsrs_dataset/umnsrs_similarity_bigbio_pairs/1.0.0/8c7002a472ad988e442babd6c70bd30d98eede363282d56922ace6db655cbf3a)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset umnsrs_dataset (/home/galtay/.cache/huggingface/datasets/umnsrs_dataset/umnsrs_relatedness_mod_bigbio_pairs/1.0.0/8c7002a472ad988e442babd6c70bd30d98eede363282d56922ace6db655cbf3a)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset umnsrs_dataset (/home/galtay/.cache/huggingface/datasets/umnsrs_dataset/umnsrs_relatedness_bigbio_pairs/1.0.0/8c7002a472ad988e442babd6c70bd30d98eede363282d56922ace6db655cbf3a)


  0%|          | 0/1 [00:00<?, ?it/s]

Reusing dataset verspoor2013_dataset (/home/galtay/.cache/huggingface/datasets/verspoor2013_dataset/verspoor_2013_bigbio_kb/1.0.0/7494b1d9332951566d7e01281b0e7ced1634721918510ce9fb40795490119ee0)


  0%|          | 0/1 [00:00<?, ?it/s]

# Dataset Metadata

Each BigBioConfigHelper provides a get_metadata method that calculates schema-specific metadata, per split, for configs that implement a BigBIO schema. For example,

In [13]:
conhelps.for_config_name('bc5cdr_bigbio_kb').get_metadata()

Reusing dataset bc5cdr_dataset (/home/galtay/.cache/huggingface/datasets/bc5cdr_dataset/bc5cdr_bigbio_kb/1.0.0/f01f16ea9b65ead985bedadf7335195c32297c8f1b09417fc607b102a6757d6f)


  0%|          | 0/3 [00:00<?, ?it/s]

{'train': BigBioKbMetadata(samples_count=500, passages_count=1000, passages_char_count=652177, passages_type_counter={'title': 500, 'abstract': 500}, entities_count=9570, entities_normalized_count=9599, entities_type_counter={'Chemical': 5207, 'Disease': 4363}, entities_db_name_counter={'MESH': 9599}, entities_unique_db_ids_count=1328, events_count=0, events_type_counter={}, events_arguments_count=0, events_arguments_role_counter={}, coreferences_count=0, relations_count=15072, relations_type_counter={'CID': 15072}, relations_db_name_counter={}, relations_unique_db_ids_count=0),
 'test': BigBioKbMetadata(samples_count=500, passages_count=1000, passages_char_count=676751, passages_type_counter={'title': 500, 'abstract': 500}, entities_count=9928, entities_normalized_count=9919, entities_type_counter={'Chemical': 5394, 'Disease': 4534}, entities_db_name_counter={'MESH': 9919}, entities_unique_db_ids_count=1315, events_count=0, events_type_counter={}, events_arguments_count=0, events_argu