FileStag is a high-performance, unified file access library for Python. It provides a consistent API for reading, writing, and managing files across different storage backends—including the local filesystem, ZIP archives, HTTP/HTTPS URLs, and Azure Blob Storage.
FileStag is designed to be lightweight, efficient, and developer-friendly.
- Unified API: Use the same
load,save, andcopymethods regardless of the underlying storage. - Protocol Support:
file://(Local filesystem)zip://(Direct access to files inside ZIP archives)http://&https://(Web fetching with built-in caching)azure://(Azure Blob Storage support)
- Smart Caching: Built-in web caching to speed up repeated remote file access.
- Async Support: Full
async/awaitsupport for high-concurrency applications. - Developer Friendly: Type-hinted, intuitive methods for text, JSON, and binary data.
Install FileStag via pip or poetry:
# Standard installation
pip install filestag
# With Azure Blob Storage support
pip install "filestag[azure]"Using Poetry:
poetry add filestag
# or
poetry add filestag -E azureFileStag makes file I/O simple and consistent.
from filestag import FileStag
# Save text to a local file
FileStag.save_text("hello.txt", "Hello World!")
# Load it back
content = FileStag.load_text("hello.txt")
print(content) # "Hello World!"
# Check if it exists
if FileStag.exists("hello.txt"):
print("File exists!")Read and write JSON with zero boilerplate.
data = {"name": "FileStag", "version": "0.1.0"}
# Save dictionary as JSON
FileStag.save_json("config.json", data, indent=4)
# Load JSON back into a dict
config = FileStag.load_json("config.json")
print(config["name"])Access files directly inside ZIP archives without manual extraction.
# Read a file directly from a ZIP archive
content = FileStag.load_text("zip://my_archive.zip/data/readme.txt")
# Check existence inside ZIP
if FileStag.exists("zip://my_archive.zip/images/logo.png"):
print("Logo found in archive")Download files from the web easily. Enable caching to prevent redundant network requests during development or analysis.
# Fetch a file from the web
data = FileStag.load("https://example.com/data.csv")
# Fetch with caching (cached for 1 hour)
# If the file was downloaded recently, it loads from disk instead.
data = FileStag.load(
"https://example.com/large_dataset.csv",
max_cache_age=3600
)Building a modern async application? FileStag has you covered.
import asyncio
from filestag import FileStag
async def main():
# Async load
text = await FileStag.load_text_async("data.txt")
# Async save
await FileStag.save_text_async("output.txt", "Processed data")
# Async web fetch
data = await FileStag.load_async("https://api.example.com/data.json")
asyncio.run(main())(Requires filestag[azure] install)
Access Azure Blob Storage using connection strings or SAS URLs directly in the path.
from filestag import FileSource, FileSink
# Using a connection string with container and path
conn_string = (
"azure://DefaultEndpointsProtocol=https;"
"AccountName=myaccount;AccountKey=mykey...;"
"EndpointSuffix=core.windows.net/mycontainer/data"
)
# Iterate files from Azure
source = FileSource.from_source(conn_string)
for file in source:
print(f"{file.filename}: {len(file.data)} bytes")
# Upload files to Azure
sink = FileSink.with_target(conn_string)
sink.store("output.json", b'{"result": "success"}')
# Using a SAS URL (read-only access via shared link)
sas_url = "https://myaccount.blob.core.windows.net/container?sp=r&sig=..."
source = FileSource.from_source(sas_url, search_path="data/")
print(f"Found {len(source.file_list)} files")When working with Azure containers containing thousands of files, listing operations can be slow. FileStag can cache the file list locally for instant access on subsequent runs:
# Cache the file list locally - huge speedup for large containers!
source = FileSource.from_source(
conn_string,
file_list_name="my_azure_cache.json",
)
# First run: fetches file list from Azure (may take seconds for 10k+ files)
# Subsequent runs: loads instantly from local cache file
print(f"Found {len(source.file_list)} files")
# Invalidate cache by bumping the version number:
source = FileSource.from_source(
conn_string,
file_list_name=("my_azure_cache.json", 2), # Change 2 -> 3 to invalidate
)
# Or force refresh programmatically (re-fetches and updates cache):
source.refresh()
# Automatic cache validation - checks if Azure has newer files:
source = FileSource.from_source(
conn_string,
file_list_name="my_azure_cache.json",
validate_cache=True, # Auto-refreshes if files changed on Azure
)Keep credentials out of your code using {{env.VAR}} syntax:
# Credentials are substituted from environment variables
conn = "azure://...AccountName={{env.AZURE_ACCOUNT}};AccountKey={{env.AZURE_KEY}}..."
source = FileSource.from_source(conn + "/mycontainer")Contributions are welcome!
- Clone the repository
- Install dependencies:
poetry install - Run tests:
poetry run pytest - Check types:
poetry run mypy filestag
This project is licensed under the MIT License - see the LICENSE file for details.