Qumulo/qfetch

Qumulo Fetch

qfetch is a standalone CLI tool that warms Qumulo cluster caches by recursively walking a directory tree and calling the fetch-data REST API for each file. No file data is transferred to the client; the cluster reads the data directly into its server-side caches.

Requirements

  • Python 3.10+. No dependencies beyond the standard library.
  • Qumulo Core 7.8.4+ for the fetch-data API.

Install

python3 -m pip install --user --break-system-packages git+https://github.com/Qumulo/qfetch.git

The --break-system-packages flag is safe here: qfetch has no dependencies, so it cannot conflict with packages managed by the system.

Usage

qfetch --host <host> --token <token> --path <dir> [options]

Or, from within the project directory:

python3 -m qfetch --host <host> --token <token> --path <dir>

Getting a token

Create a long-lived access token using the Qumulo CLI:

qq auth_create_access_token --self -f token.json

Then pass the file to qfetch with --token-file token.json.
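
For scripts that need the same token outside qfetch, a minimal sketch of reading the file might look like the following. The key name bearer_token is an assumption about the layout of the JSON written by qq, and load_token and auth_header are hypothetical helpers, not part of qfetch; inspect your own token.json before relying on it.

```python
import json
from pathlib import Path


def load_token(path: str, key: str = "bearer_token") -> str:
    """Return the bearer token stored in a JSON token file.

    The default key name is an assumption; check the file written by qq.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    token = data.get(key) if isinstance(data, dict) else None
    if not isinstance(token, str) or not token:
        raise ValueError(f"no {key!r} string found in {path}")
    return token


def auth_header(token: str) -> dict[str, str]:
    # Bearer tokens travel in the standard HTTP Authorization header.
    return {"Authorization": f"Bearer {token}"}
```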

Options

Flag            Default      Description
--host          (required)   Hostname, IP, comma-separated list, or IP range
--token                      API bearer token
--token-file                 Path to token JSON file (as created by qq auth_create_access_token -f)
--path          (required)   Directory path to fetch
--port          8000         REST API port
--walkers       4            Parallel directory walker threads
--workers       8            Parallel file fetch threads
--insecure      off          Disable SSL certificate verification
--max-bytes     unlimited    Max bytes to fetch per file (B, KB, MB, GB, TB suffixes)
--no-progress   off          Disable progress output on stderr

One of --token or --token-file is required.
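
As an illustration of the suffixes accepted by --max-bytes, here is a rough sketch of a size parser. It is not qfetch's own parser: parse_size is a hypothetical name, and the choice of binary multiples (1 KB = 1024 bytes) is an assumption.

```python
import re

# Assumption: binary multiples, so 1 KB = 1024 bytes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)?\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert strings such as '512', '64MB' or '1.5GB' to a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    # A bare number is treated as a count of bytes.
    return int(float(number) * _UNITS[(unit or "B").upper()])
```

With these assumptions, parse_size("64MB") gives 67108864.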

Multi-node

Distribute connections across cluster nodes for higher throughput. Each thread gets a sticky connection to one node, assigned round-robin.

# Comma-separated
qfetch --host 10.0.0.1,10.0.0.2,10.0.0.3 ...

# IP range (expands last octet)
qfetch --host 10.0.0.1-10.0.0.4 ...

# Mixed
qfetch --host node1,10.0.0.1-10.0.0.3 ...
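
The host syntax and the sticky round-robin assignment can be sketched roughly as follows. This is an illustration under stated assumptions, not qfetch's actual code: expand_hosts and StickyConnections are hypothetical names, and a range is assumed to be valid only when both ends share the first three octets.

```python
import http.client
import ipaddress
import itertools
import ssl
import threading


def _ipv4(text: str):
    """Return an IPv4Address, or None if text is not a dotted-quad address."""
    try:
        return ipaddress.IPv4Address(text)
    except ValueError:
        return None


def expand_hosts(spec: str) -> list[str]:
    """Expand 'a,b,10.0.0.1-10.0.0.3' into a flat list of hosts."""
    hosts: list[str] = []
    for part in (p.strip() for p in spec.split(",")):
        if not part:
            continue
        start, sep, end = part.partition("-")
        first, last = _ipv4(start), _ipv4(end)
        if sep and first is not None and last is not None:
            # Only the last octet may differ between the two ends.
            if int(first) >> 8 != int(last) >> 8 or last < first:
                raise ValueError(f"bad IP range: {part}")
            hosts.extend(
                str(ipaddress.IPv4Address(n))
                for n in range(int(first), int(last) + 1)
            )
        else:
            # Anything else (including hyphenated hostnames) is kept as-is.
            hosts.append(part)
    return hosts


class StickyConnections:
    """Give each thread one persistent connection, hosts chosen round-robin."""

    def __init__(self, hosts, port: int = 8000, insecure: bool = False):
        self._hosts = itertools.cycle(hosts)
        self._lock = threading.Lock()
        self._local = threading.local()
        self._port = port
        self._context = ssl.create_default_context()
        if insecure:
            # check_hostname must be disabled before verify_mode is relaxed.
            self._context.check_hostname = False
            self._context.verify_mode = ssl.CERT_NONE

    def get(self) -> http.client.HTTPSConnection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            with self._lock:
                host = next(self._hosts)
            # The constructor does not open a socket; that happens on first use.
            conn = http.client.HTTPSConnection(
                host, self._port, context=self._context
            )
            self._local.conn = conn
        return conn
```

Here expand_hosts("node1,10.0.0.1-10.0.0.3") yields node1 followed by the three addresses, and repeated get() calls from one thread return the same connection object.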

Example

$ qfetch \
    --host 10.100.0.33-10.100.0.36 \
    --token-file token.json \
    --insecure \
    --walkers 32 \
    --workers 64 \
    --path /data

Discovered: 3897 files | Fetched: 3897/3897 files | 3.7 GB fetched (1.5 GB/s)
{
  "files_found": 3897,
  "files_fetched": 3897,
  "bytes_fetched": 4013286225,
  "elapsed_seconds": 2.49,
  "bytes_per_second": 1614567465
}

Progress is printed to stderr; the JSON summary goes to stdout.
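
Because the summary is plain JSON on stdout, a wrapper script can capture it while progress still reaches the terminal. The sketch below assumes the field names shown in the example output; run_qfetch, human_bytes, and summarize are hypothetical helpers, and the binary units are inferred from the example (4013286225 bytes shown as 3.7 GB).

```python
import json
import subprocess


def run_qfetch(args: list[str]) -> str:
    """Run qfetch and return its stdout; stderr (progress) is left untouched."""
    result = subprocess.run(
        ["qfetch", *args], stdout=subprocess.PIPE, text=True, check=True
    )
    return result.stdout


def human_bytes(n: float) -> str:
    # Binary multiples, inferred from the example output above.
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1024
    raise AssertionError("unreachable")


def summarize(stdout_text: str) -> str:
    """Turn the JSON summary into a one-line report."""
    s = json.loads(stdout_text)
    missed = s["files_found"] - s["files_fetched"]
    return (
        f'{s["files_fetched"]}/{s["files_found"]} files, '
        f'{human_bytes(s["bytes_fetched"])} at '
        f'{human_bytes(s["bytes_per_second"])}/s, {missed} not fetched'
    )
```

A caller would combine the two, for example summarize(run_qfetch(["--host", host, "--token-file", "token.json", "--path", "/data"])), where host is whatever node address applies.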

Architecture

resolve_path(--path)
    |
    v
dir_queue (seeded with root)
    |
    v
N walker threads ──> list directory entries (paginated) ───┐
    |                                                      |
    ├── subdirectories -> back into dir_queue              |
    └── files -> file_queue                                |
                    |                                      |
                    v                                      |
          M fetcher threads ──> POST /fetch-data (loop)    |
                    |                                      |
                    v                                      |
               Progress counters <─────────────────────────┘

  • Walkers expand the tree in parallel (BFS-like). A WalkCoordinator tracks in-flight directories and signals completion when all are enumerated.
  • Fetchers drain the file queue and call the fetch-data API in a loop until each file is fully cached.
  • Connections are persistent per thread (http.client.HTTPSConnection with thread-local storage) to avoid TLS handshake overhead.
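
The completion-tracking idea behind the walkers can be sketched as follows. This is a simplified illustration, not the project's implementation: the WalkCoordinator here only shares the name with the one mentioned above, the walk function and the in-memory tree (a dict mapping each directory to its subdirectories and files) are assumptions standing in for paginated REST listings, and files are collected at the end rather than drained by fetcher threads.

```python
import queue
import threading


class WalkCoordinator:
    """Counts directories that are queued or currently being listed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending = 0
        self.done = threading.Event()

    def add(self, count: int = 1) -> None:
        with self._lock:
            self._pending += count

    def finish(self) -> None:
        with self._lock:
            self._pending -= 1
            if self._pending == 0:
                self.done.set()  # every known directory has been enumerated


def walk(tree: dict, root: str, n_walkers: int = 4) -> list[str]:
    dir_queue: queue.Queue = queue.Queue()
    file_queue: queue.Queue = queue.Queue()
    coord = WalkCoordinator()
    coord.add()  # the root counts as in flight
    dir_queue.put(root)

    def walker() -> None:
        while not coord.done.is_set():
            try:
                path = dir_queue.get(timeout=0.05)
            except queue.Empty:
                continue  # re-check the done flag
            subdirs, files = tree.get(path, ((), ()))
            # Register children before finishing the parent, so the
            # pending count cannot reach zero while work remains.
            coord.add(len(subdirs))
            for d in subdirs:
                dir_queue.put(d)
            for f in files:
                file_queue.put(f)
            coord.finish()

    threads = [threading.Thread(target=walker) for _ in range(n_walkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    found = []
    while not file_queue.empty():
        found.append(file_queue.get())
    return found
```

Counting children before marking the parent finished is the design point: an empty queue alone does not mean the walk is over, since another walker may still be about to enqueue subdirectories.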

Tests

From the project directory:

python3 -m unittest
