v0.1.0 - Initial Release

@SomeB1oody SomeB1oody released this 11 Apr 22:44
· 29 commits to master since this release

dataset-core v0.1.0

A generic, thread-safe dataset container with lazy loading and caching for Rust.

Note: This is an initial release. The API is not yet stable and may change in future versions.

Highlights

  • Zero-dependency core: Dataset<T> pairs a storage directory with lazily initialized data of any type. The first call to load() runs your closure and caches the result via OnceLock; every subsequent call returns the cached &T without re-running the closure, even across threads.

  • Feature-gated modules — opt in to only what you need:

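    Feature  | What it adds                                                                                   | Extra deps
    ---------|------------------------------------------------------------------------------------------------|---------------------------
    (none)   | Dataset<T>                                                                                     | none
    utils    | download_to, unzip, create_temp_dir, file_sha256_matches, acquire_dataset, and the error module | ureq, zip, tempfile, sha2
    datasets | 6 built-in ML dataset loaders (implies utils)                                                  | ndarray, csv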
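
The load-once-and-cache behavior described in the first highlight can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `LazyDataset` and its `load` signature are hypothetical stand-ins, and the only thing taken from the notes above is that the cache is a `OnceLock` filled by the first successful load.

```rust
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

/// Hypothetical stand-in for `Dataset<T>`; names and signatures are assumptions.
pub struct LazyDataset<T> {
    dir: PathBuf,
    cell: OnceLock<T>,
}

impl<T> LazyDataset<T> {
    pub fn new(dir: impl Into<PathBuf>) -> Self {
        Self { dir: dir.into(), cell: OnceLock::new() }
    }

    /// Runs `loader` only when nothing is cached yet; a failed load caches nothing.
    pub fn load<E>(&self, loader: impl FnOnce(&Path) -> Result<T, E>) -> Result<&T, E> {
        // Fast path: a previous call already stored a value.
        if let Some(value) = self.cell.get() {
            return Ok(value);
        }
        let value = loader(&self.dir)?;
        // If another thread stored a value first, that one is kept and ours is dropped.
        Ok(self.cell.get_or_init(|| value))
    }
}
```

In this sketch two threads racing on the very first call may both run the loader, but only one result is ever stored, so every caller ends up with a reference to the same value.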

Built-in Datasets

Six classic machine learning datasets, ready to use with a consistent API (new(), then features() / labels() / targets() / data()):

Dataset              | Samples | Features   | Task
---------------------|---------|------------|---------------
Iris                 | 150     | 4          | Classification
Boston Housing       | 506     | 13         | Regression
Diabetes (Pima)      | 768     | 8          | Classification
Titanic              | 891     | 11 (mixed) | Classification
Wine Quality (Red)   | 1,599   | 11         | Regression
Wine Quality (White) | 4,898   | 11         | Regression

All datasets are automatically downloaded, cached locally, and validated with SHA-256 checksums.

Utility Functions (utils feature)

  • download_to — download a remote file into a directory
  • unzip — extract a ZIP archive
  • create_temp_dir — create a self-cleaning temporary directory
  • file_sha256_matches — verify a file's SHA-256 hash
  • acquire_dataset — cache-aware dataset acquisition workflow (temp dir → prepare → optional hash check → move to final location)
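
The acquisition workflow listed for acquire_dataset (temp dir → prepare → optional hash check → move to final location) can be sketched roughly as follows. This is a standard-library-only illustration under stated assumptions: `acquire_file`, its parameters, and the staging directory name are hypothetical and do not mirror the crate's actual signature, and the integrity check is passed in as a plain function because the standard library has no SHA-256.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cache-aware acquisition step; not the crate's `acquire_dataset`.
pub fn acquire_file(
    final_dir: &Path,
    file_name: &str,
    prepare: impl FnOnce(&Path) -> io::Result<PathBuf>,
    verify: Option<fn(&Path) -> io::Result<bool>>,
) -> io::Result<PathBuf> {
    let target = final_dir.join(file_name);
    // Cache hit: the file is already in place, so skip all work.
    if target.exists() {
        return Ok(target);
    }

    // Stage in a scratch directory so a failed run leaves nothing in `final_dir`.
    let staging = std::env::temp_dir().join(format!("acquire-sketch-{}", std::process::id()));
    fs::create_dir_all(&staging)?;

    let result = (|| -> io::Result<PathBuf> {
        // `prepare` downloads or builds the file and reports where it put it.
        let prepared = prepare(&staging)?;
        if let Some(check) = verify {
            if !check(&prepared)? {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "integrity check failed"));
            }
        }
        fs::create_dir_all(final_dir)?;
        // Copy instead of rename: the temp dir may live on another filesystem.
        fs::copy(&prepared, &target)?;
        Ok(target)
    })();

    // Best-effort cleanup of the staging directory on success and on failure.
    let _ = fs::remove_dir_all(&staging);
    result
}
```

Verifying before the copy means a corrupted download never reaches the final directory, so a later run sees a cache miss and simply tries again.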

Requirements

  • Rust edition 2024, MSRV 1.88.0
  • License: MIT

Quick Start

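use dataset_core::Dataset;

// Box<dyn Error> is an assumption; it fits if the crate's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ds = Dataset::<String>::new("./cache");
    // The first load() runs the closure and caches the result; later calls reuse it.
    let data = ds.load(|dir| Ok(std::fs::read_to_string(format!("{dir}/my_file.txt"))?))?;
    println!("{data}");
    Ok(())
}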

With built-in datasets:

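use dataset_core::datasets::Iris;

// Requires the `datasets` feature; the Box<dyn Error> return type is an assumption.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let iris = Iris::new("./data");
    // data() returns the features together with the labels.
    let (features, labels) = iris.data()?;
    println!("shape: {:?}, first label: {}", features.shape(), labels[0]);
    Ok(())
}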