hffs

hffs builds on huggingface_hub and fsspec to provide a convenient Python filesystem interface to the 🤗 Hub.
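For private repos you also need to authenticate. A minimal sketch, assuming HfFileSystem forwards a token keyword to huggingface_hub (an assumption; if your version lacks it, a token cached via huggingface-cli login should be picked up instead):

>>> import hffs
>>> # `token=` is an assumption here; a token cached via `huggingface-cli login` also works
>>> fs = hffs.HfFileSystem("my-username/my-private-repo", repo_type="model", token="hf_...")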

Examples

Locate and read a file from a 🤗 Hub repo:

>>> import hffs
>>> fs = hffs.HfFileSystem("my-username/my-dataset-repo", repo_type="dataset")
>>> fs.ls("")
['.gitattributes', 'my-file.txt']
>>> with fs.open("my-file.txt", "r") as f:
...     f.read()
'Hello, world'
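HfFileSystem implements fsspec's AbstractFileSystem, so the generic fsspec helpers work on it as well. A small sketch against the same repo (glob and isfile are inherited fsspec methods, not hffs-specific API):

>>> fs.glob("*.txt")
['my-file.txt']
>>> fs.isfile("my-file.txt")
True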

Write a file to the repo:

>>> with fs.open("my-file-new.txt", "w") as f:
...     f.write("Hello, world1")
...     f.write("Hello, world2")
>>> fs.exists("my-file-new.txt")
True
>>> fs.du("my-file-new.txt")
26
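Deletion goes through the standard fsspec surface too; a sketch, assuming the usual rm method (like writes, deletions end up as commits to the repo):

>>> fs.rm("my-file-new.txt")
>>> fs.exists("my-file-new.txt")
False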

Instantiation via fsspec:

>>> import fsspec

>>> # Instantiate a `hffs.HfFileSystem` object
>>> fs = fsspec.filesystem("hf://my-username/my-model-repo", repo_type="model")
>>> fs.ls("")
['.gitattributes', 'config.json', 'pytorch_model.bin']

>>> # Instantiate a `hffs.HfFileSystem` object and write a file to it
>>> with fsspec.open("hf://my-username/my-dataset-repo:/my-file-new.txt", "w", repo_type="dataset") as f:
...     f.write("Hello, world1")
...     f.write("Hello, world2")

Note: to be recognized as an hffs URL, the path passed to fsspec.open must adhere to the following scheme:

hf://<repo_id>[@<revision>]:/<path/in/repo>
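For instance (hypothetical repo and branch names; the @<revision> part is optional):

hf://my-username/my-dataset-repo:/data/train.csv
hf://my-username/my-dataset-repo@my-branch:/data/train.csv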

Installation

pip install hffs

Integrations

hffs URLs can be passed to any library that reads or writes paths through fsspec, for example pandas, datasets, and zarr:

pandas

>>> import pandas as pd

>>> # Read a remote CSV file into a dataframe
>>> df = pd.read_csv("hf://my-username/my-dataset-repo:/train.csv", storage_options={"repo_type": "dataset"})

>>> # Write a dataframe to a remote CSV file
>>> df.to_csv("hf://my-username/my-dataset-repo:/test.csv", storage_options={"repo_type": "dataset"})
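The other fsspec-aware pandas readers accept the same URLs; a sketch, assuming the repo also holds a Parquet file (hypothetical name):

>>> df = pd.read_parquet("hf://my-username/my-dataset-repo:/train.parquet", storage_options={"repo_type": "dataset"})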

datasets

>>> import datasets

>>> # Export a (large) dataset to a repo
>>> cache_dir = "hf://my-username/my-dataset-repo"
>>> builder = datasets.load_dataset_builder("path/to/local/loading_script/loading_script.py", cache_dir=cache_dir, storage_options={"repo_type": "dataset"})
>>> builder.download_and_prepare(file_format="parquet")

>>> # Load the dataset back from the repo (pass streaming=True to stream it)
>>> dset = datasets.load_dataset("my-username/my-dataset-repo", split="train")
>>> # Process the examples
>>> for ex in dset:
...     ...

zarr

>>> import numpy as np
>>> import zarr

>>> embeddings = np.random.randn(50000, 1000).astype("float32")

>>> # Write an array to a repo acting as a remote zarr store
>>> with zarr.open_group("hf://my-username/my-model-repo:/array-store", mode="w", storage_options={"repo_type": "model"}) as root:
...     foo = root.create_group("embeddings")
...     foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4')
...     foobar[:] = embeddings

>>> # Read from a remote zarr store
>>> with zarr.open_group("hf://my-username/my-model-repo:/array-store", mode="r", storage_options={"repo_type": "model"}) as root:
...     first_row = root["embeddings/experiment_0"][0]
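A quick sanity check that the round trip preserved the data (same session as the write above):

>>> np.allclose(first_row, embeddings[0])
True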
