v0.1.0 - Initial Release
dataset-core v0.1.0
A generic, thread-safe dataset container with lazy loading and caching for Rust.
Note: This is an initial release. The API is not yet stable and may change in future versions.
Highlights
- Zero-dependency core: `Dataset<T>` pairs a storage directory with lazily-initialized data of any type. The first call to `load()` runs your closure and caches the result via `OnceLock`; every subsequent call returns the cached `&T` without re-running the closure, even across threads.
- Feature-gated modules: opt in to only what you need.

| Feature | What it adds | Extra deps |
|---|---|---|
| (none) | `Dataset<T>` | none |
| `utils` | `download_to`, `unzip`, `create_temp_dir`, `file_sha256_matches`, `acquire_dataset`, and the `error` module | ureq, zip, tempfile, sha2 |
| `datasets` | 6 built-in ML dataset loaders (implies `utils`) | ndarray, csv |
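The load-once behaviour described above can be illustrated with a minimal sketch built directly on `std::sync::OnceLock`. The `LazyDir` type and its methods are hypothetical stand-ins written for this note, not the crate's actual `Dataset<T>` implementation, and the closure signature (`&Path` in, `Result` out) is an assumption.

```rust
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

/// Hypothetical stand-in for a lazily loaded dataset; not the crate's `Dataset<T>`.
pub struct LazyDir<T> {
    dir: PathBuf,
    cell: OnceLock<T>,
}

impl<T> LazyDir<T> {
    pub fn new(dir: impl Into<PathBuf>) -> Self {
        Self { dir: dir.into(), cell: OnceLock::new() }
    }

    /// Runs `init` only if nothing is cached yet; otherwise returns the cached value.
    pub fn load<E>(&self, init: impl FnOnce(&Path) -> Result<T, E>) -> Result<&T, E> {
        // Fast path: a value is already cached, so `init` is never called.
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        // Slow path: run the closure. An error is returned and nothing is cached.
        let value = init(&self.dir)?;
        // If another thread stored a value in the meantime, that value is kept
        // and ours is dropped, so every caller observes the same `&T`.
        Ok(self.cell.get_or_init(|| value))
    }
}
```

In this sketch two threads racing on the first call may both run their closure, although only one result is kept. A failed closure leaves the cell empty, so a later call can retry.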
Built-in Datasets
Six classic machine learning datasets, ready to use with a consistent API (`new` → `features()` / `labels()` / `targets()` / `data()`):
| Dataset | Samples | Features | Task |
|---|---|---|---|
| Iris | 150 | 4 | Classification |
| Boston Housing | 506 | 13 | Regression |
| Diabetes (Pima) | 768 | 8 | Classification |
| Titanic | 891 | 11 (mixed) | Classification |
| Wine Quality (Red) | 1,599 | 11 | Regression |
| Wine Quality (White) | 4,898 | 11 | Regression |
All datasets are automatically downloaded, cached locally, and validated with SHA-256 checksums.
Utility Functions (utils feature)
- `download_to`: download a remote file into a directory
- `unzip`: extract a ZIP archive
- `create_temp_dir`: create a self-cleaning temporary directory
- `file_sha256_matches`: verify a file's SHA-256 hash
- `acquire_dataset`: cache-aware dataset acquisition workflow (temp dir → prepare → optional hash check → move to final location)
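The cache-aware workflow listed for `acquire_dataset` (temp dir → prepare → optional hash check → move to final location) could look roughly like the sketch below. It uses only the standard library. The names `acquire_sketch`, `stage_and_install`, and `Verify` are hypothetical, and the integrity check is a caller-supplied function rather than a real SHA-256 comparison, so this should be read as an illustration of the steps and not as the crate's implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Optional integrity check; a real implementation might compare a SHA-256 digest here.
type Verify = fn(&Path) -> io::Result<bool>;

/// Hypothetical sketch of the workflow; not the crate's `acquire_dataset`.
pub fn acquire_sketch(
    final_path: &Path,
    prepare: impl FnOnce(&Path) -> io::Result<PathBuf>,
    verify: Option<Verify>,
) -> io::Result<PathBuf> {
    // Cache hit: the file is already in place, so no work is needed.
    if final_path.exists() {
        return Ok(final_path.to_path_buf());
    }
    // Stage in a scratch directory so a failed run never leaves a partial file
    // at the final location. Naming it after the process id is a simplification.
    let staging = std::env::temp_dir().join(format!("acquire-sketch-{}", std::process::id()));
    fs::create_dir_all(&staging)?;
    let outcome = stage_and_install(&staging, final_path, prepare, verify);
    // Remove the scratch directory whether or not the run succeeded.
    let _ = fs::remove_dir_all(&staging);
    outcome
}

fn stage_and_install(
    staging: &Path,
    final_path: &Path,
    prepare: impl FnOnce(&Path) -> io::Result<PathBuf>,
    verify: Option<Verify>,
) -> io::Result<PathBuf> {
    // `prepare` downloads or builds the file inside the staging directory.
    let prepared = prepare(staging)?;
    if let Some(check) = verify {
        if !check(&prepared)? {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "integrity check failed"));
        }
    }
    if let Some(parent) = final_path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Copy instead of rename: rename can fail when the paths are on different filesystems.
    fs::copy(&prepared, final_path)?;
    Ok(final_path.to_path_buf())
}
```

Staging first means the final location only ever holds a file that passed the check, which is what makes the early cache-hit return safe.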
Requirements
- Rust edition 2024, MSRV 1.88.0
- License: MIT
Quick Start
```rust
use dataset_core::Dataset;

let ds = Dataset::<String>::new("./cache");
let data = ds.load(|dir| Ok(std::fs::read_to_string(format!("{dir}/my_file.txt"))?))?;
println!("{data}");
```

With built-in datasets:

```rust
use dataset_core::datasets::Iris;

let iris = Iris::new("./data");
let (features, labels) = iris.data()?;
println!("shape: {:?}, first label: {}", features.shape(), labels[0]);
```