SUSFlow

SUSFlow is a modern Python library for downloading, parsing, and engineering DATASUS public health datasets. It provides:

resilient FTP access to DATASUS
a local cache that mirrors the FTP tree
transparent decompression of proprietary .dbc files to tabular data
helpers that load datasets as pandas DataFrames ready for analysis

This repository focuses on practical reproducibility and safe access to legacy public data systems.

Portuguese (Brazil) documentation and module index: Português do Brasil

Installation

Install in editable mode during development:

git clone https://github.com/OncoAtlas/susflow.git
cd susflow
python -m venv .venv
. ./.venv/bin/activate
pip install -U pip
pip install -e .

Install from PyPI (recommended for most users):

pip install susflow

To install a specific released version:

pip install susflow==0.1.1

Core runtime dependencies are declared in pyproject.toml. Typical extras for performance:

pyarrow or fastparquet (Parquet cache)
pandas (DataFrame API)

Basic usage

Each DATASUS system is available under susflow.systems. The API is lightweight: the list_files, download, and read helpers handle discovery, download, and conversion, respectively.

Example: SINASC (Live Births)

from susflow.systems import sinasc

# list files for a state
sinasc.list_files(uf="SP")

# download and return a pandas.DataFrame
df = sinasc.read(uf="SP", year=2020)

Example: PNI (Vaccinations)

from susflow.systems import pni
df = pni.read(uf="RJ", year=2015)

Caching behavior

By default, downloads are stored under ~/.susflow/cache/, mirroring the FTP directory layout. If a requested file is already present locally, the library skips the download and reads directly from the cache. To force a re-download, pass force=True to the download or read helpers.
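The cache-or-download decision described above can be sketched as follows. This is an illustration only, not SUSFlow's actual implementation: `cache_path_for`, `fetch_cached`, the `fetch` callable, and the example path are hypothetical names introduced here.

```python
from pathlib import Path, PurePosixPath
from typing import Callable

# Default cache root as stated in this README.
CACHE_ROOT = Path.home() / ".susflow" / "cache"


def cache_path_for(ftp_path: str, cache_root: Path = CACHE_ROOT) -> Path:
    """Map an FTP path onto the local cache tree, keeping its directory layout."""
    relative = PurePosixPath(ftp_path.lstrip("/"))
    return cache_root.joinpath(*relative.parts)


def fetch_cached(
    ftp_path: str,
    fetch: Callable[[str], bytes],  # hypothetical downloader: FTP path -> file bytes
    cache_root: Path = CACHE_ROOT,
    force: bool = False,
) -> Path:
    """Return the local copy of ftp_path, downloading only when needed."""
    local = cache_path_for(ftp_path, cache_root)
    if local.exists() and not force:
        return local  # cache hit: skip the download
    local.parent.mkdir(parents=True, exist_ok=True)
    local.write_bytes(fetch(ftp_path))
    return local
```

With this layout, a file requested as `/some/system/FILE2020.dbc` (a made-up path) would land at `~/.susflow/cache/some/system/FILE2020.dbc`, and a second request would be served from disk unless `force=True` is passed.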

Performance guidance

Downcast numeric types and convert repeated strings to the category dtype to reduce memory use.
Convert commonly used datasets to Parquet once and reuse the local Parquet caches.
For very large datasets, prefer processing in chunks or using DuckDB/Polars to avoid excessive RAM use.
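As an illustration of the first recommendation, here is a minimal pandas sketch that downcasts numeric columns and converts low-cardinality string columns to category. The `shrink` helper and its `max_unique_ratio` threshold are hypothetical and not part of SUSFlow; the column names in the usage lines are made up.

```python
import pandas as pd


def shrink(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where that is safe."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # int64 -> smallest integer type that holds the values
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float64 -> float32 when the values fit
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_string_dtype(series):
            # Only convert columns whose values repeat often enough to pay off.
            if series.nunique(dropna=True) <= max_unique_ratio * len(series):
                out[col] = series.astype("category")
    return out


df = pd.DataFrame(
    {"count": [1, 2, 3, 4], "weight": [0.5, 1.5, 2.5, 3.5], "uf": ["SP", "SP", "RJ", "RJ"]}
)
small = shrink(df)
print(small.dtypes)
```

Comparing `df.memory_usage(deep=True)` before and after on a real dataset shows whether the conversion is worthwhile; on tiny frames the category overhead can outweigh the savings.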

Developer tools and linters

We recommend the following dev tools for contributors:

. ./.venv/bin/activate
pip install -U ruff black isort pytest pytest-mock coverage
ruff check .
black --check .
isort --check-only .
pytest -q

Testing strategy

Unit tests should mock FTP and file IO; see tests/unit/ for examples.
Integration tests that access live FTP data should be opt-in and run manually (network-dependent).
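A unit test that mocks FTP access might look like the sketch below. It uses only the standard library (`ftplib` and `unittest.mock`); `list_remote_files` is a hypothetical helper written for this example, not a SUSFlow function, and the host and directory are placeholders.

```python
import ftplib
from unittest import mock


def list_remote_files(host: str, directory: str) -> list[str]:
    """Hypothetical helper: return the sorted file names in an FTP directory."""
    ftp = ftplib.FTP(host)
    try:
        ftp.login()  # anonymous login
        ftp.cwd(directory)
        return sorted(ftp.nlst())
    finally:
        ftp.quit()


def test_list_remote_files_uses_no_network():
    # Replace ftplib.FTP with a mock so no connection is ever opened.
    with mock.patch("ftplib.FTP") as fake_ftp_cls:
        fake_ftp = fake_ftp_cls.return_value
        fake_ftp.nlst.return_value = ["B.dbc", "A.dbc"]

        result = list_remote_files("ftp.example.org", "/some/dir")

    assert result == ["A.dbc", "B.dbc"]
    fake_ftp_cls.assert_called_once_with("ftp.example.org")
    fake_ftp.cwd.assert_called_once_with("/some/dir")
    fake_ftp.quit.assert_called_once()
```

Because the mock replaces the FTP class itself, the test also verifies that the connection is closed, which is easy to miss when testing against a live server.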

Utilities

tools/mapear_ftp.py helps locate and audit DATASUS FTP directory structures when paths change. It can save structured maps to tools/mapas/ for offline analysis.

Contributing

See CONTRIBUTING.md for guidelines: coding style, tests, and PR workflow. See docs/contributing/coverage.md for coverage instructions.

License

This project is released under the MIT License; see LICENSE.

Contact

Open issues and pull requests are welcome. For larger changes please open an issue to discuss scope before implementing.
