## Hugging Face Hub API Examples
For more information, see the [Hugging Face Hub API documentation](https://huggingface.co/docs/hub/api).

## Basic Dataset Information and Metadata

In [3]:
import huggingface_hub

In [6]:
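# recent huggingface_hub releases return a lazy iterator here; list() makes len() and indexing work
datasets = list(huggingface_hub.list_datasets())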
len(datasets)

39360

In [8]:
dataset_example = datasets[0]
dataset_example

DatasetInfo: { 
  {'_id': '621ffdd236468d709f181d58',
   'author': None,
   'cardData': None,
   'citation': '@inproceedings{veyseh-et-al-2020-what,\n'
               '   title={{What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and '
               'Disambiguation}},\n'
               '   author={Amir Pouran Ben Veyseh and Franck Dernoncourt and Quan Hung Tran and Thien Huu Nguyen},\n'
               '   year={2020},\n'
               '   booktitle={Proceedings of COLING},\n'
               '   link={https://arxiv.org/pdf/2010.14678v1.pdf}\n'
               '}',
   'description': 'Acronym identification training and development sets for the acronym identification task at '
                  'SDU@AAAI-21.',
   'disabled': False,
   'downloads': 7576,
   'gated': False,
   'id': 'acronym_identification',
   'lastModified': '2023-01-25T14:18:28.000Z',
   'likes': 14,
   'paperswithcode_id': 'acronym-identification',
   'private': False,
   'sha': 'c3c245a1

In [9]:
# get the metadata of a dataset (e.g., downloads)
dataset_example.downloads

7576
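Several metadata fields can be gathered in one pass instead of reading attributes one at a time. The following is a minimal sketch: `summarize_metadata` is a hypothetical helper (not part of `huggingface_hub`), and the `SimpleNamespace` object is a stand-in for `dataset_example`, populated with values copied from the output above so the snippet runs without network access.

```python
from types import SimpleNamespace


def summarize_metadata(info, fields=("id", "downloads", "likes", "private")):
    """Collect selected attributes into a plain dict; missing ones become None."""
    return {name: getattr(info, name, None) for name in fields}


# Stand-in for dataset_example, using values shown in the output above
example = SimpleNamespace(
    id="acronym_identification",
    downloads=7576,
    likes=14,
    private=False,
)

summary = summarize_metadata(example)
print(summary)
```

Using `getattr` with a default keeps the helper tolerant of fields that a given library version or dataset does not provide.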

## Access the Dataset Repo and Download It

In [10]:
import git
username = ''  # specify your Hugging Face username
password = ''  # specify your Hugging Face password or access token

In [11]:
dataset_name = dataset_example.id
dataset_name

'acronym_identification'

In [19]:
name = dataset_name.replace('/', '_')  # make the name usable as a single directory name
file_path = f'../dataset_repo/{name}' # specify your storage path

In [17]:
# credentials are embedded in the URL, so special characters in them must be percent-encoded
git_clone = git.Repo.clone_from(url=f'https://{username}:{password}@huggingface.co/datasets/{dataset_name}', to_path=file_path)
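Because the clone URL above embeds the credentials directly, a username or password containing characters such as `@` or `/` would produce a malformed URL. The sketch below shows one way to build the URL safely with the standard library. `build_clone_url` is a hypothetical helper name, and the credentials in the example are made up for illustration.

```python
from urllib.parse import quote


def build_clone_url(username, password, dataset_name):
    """Build an HTTPS clone URL with percent-encoded credentials."""
    user = quote(username, safe="")    # encode every reserved character
    secret = quote(password, safe="")
    return f"https://{user}:{secret}@huggingface.co/datasets/{dataset_name}"


# Made-up credentials containing characters that would break a raw URL
url = build_clone_url("user@example.com", "p@ss/word", "acronym_identification")
print(url)
```

The returned string can be passed as the `url` argument of `git.Repo.clone_from` in place of the inline f-string.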

## Get the Dataset Card (README.md)

In [20]:
from git.repo import Repo
import os
repo = Repo(file_path)
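# ls_files() returns one newline-separated string; split it so the check matches whole paths
if 'README.md' not in repo.git.ls_files().splitlines():
    print('No README.md file found in the dataset repository.')
else:
    readme_path = os.path.join(file_path, 'README.md')
    with open(readme_path, 'r', encoding='utf-8') as f:
        content = f.read()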
    print('Dataset Card:')
    print(content)

Dataset Card:
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- token-classification
task_ids: []
paperswithcode_id: acronym-identification
pretty_name: Acronym Identification Dataset
tags:
- acronym-identification
dataset_info:
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: labels
    sequence:
      class_label:
        names:
          '0': B-long
          '1': B-short
          '2': I-long
          '3': I-short
          '4': O
  splits:
  - name: train
    num_bytes: 7792803
    num_examples: 14006
  - name: validation
    num_bytes: 952705
    num_examples: 1717
  - name: test
    num_bytes: 987728
    num_examples: 1750
  download_size: 8556464
  dataset_size: 9733236
train-eval-index:
- config: default
  task: token-classification
  task_id: entity_extraction
  splits:
    ev
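The dataset card printed above opens with a YAML metadata block delimited by `---` lines, followed by the Markdown body. A minimal sketch for separating the two using only the standard library follows. `split_front_matter` is a hypothetical helper, and the sample card is a shortened illustration assembled from fields shown above, not the full README.

```python
def split_front_matter(text):
    """Return (metadata_text, body_text) for a card with a leading '---' block.

    If the text has no complete front-matter block, the metadata part is empty
    and the whole text is returned as the body.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":  # closing delimiter found
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # opening delimiter without a closing one


# Shortened sample card for illustration only
sample_card = (
    "---\n"
    "license:\n"
    "- mit\n"
    "pretty_name: Acronym Identification Dataset\n"
    "---\n"
    "# Dataset Card\n"
    "Body text."
)

metadata_text, body_text = split_front_matter(sample_card)
print(metadata_text)
print(body_text)
```

The metadata text can then be handed to a YAML parser of your choice if structured access to fields such as `license` or `task_categories` is needed.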