-
Notifications
You must be signed in to change notification settings - Fork 1
Getting Started
github-actions[bot] edited this page May 12, 2026
·
1 revision
This guide walks through a complete PyJeb setup using a data ingestion job configuration as a running example.
You manage a set of data ingestion jobs. Each job reads files from a source path and writes them to a target path, with operational parameters (format, retries, threshold, etc.). You want to:
- Validate that required fields are present
- Apply sensible defaults for optional fields
- Inject the current date and environment name dynamically
pip install pyjebCreate job.yaml with one or more job entries:
# job.yaml
ingest_customers:
source:
path: "/landing/$var.env/customers/$sys.timestamp('YYYY-MM-DD')"
format: "csv"
pattern: "*.csv"
target:
path: "/bronze/$var.env/customers"
mode: "append"
enabled: true
max_retries: 3
threshold: 0.95
tags:
- "crm"
- "daily"
ingest_orders:
source:
path: "/landing/$var.env/orders/$sys.timestamp('YYYY-MM-DD')"
format: "json"
target:
path: "/bronze/$var.env/orders"
enabled: true
$var.envand$sys.timestamp(...)are PyJeb variables resolved at runtime. See Variables.
The control list describes the expected structure of each job entry:
controls = [
{ "name": "source", "type": "dict" },
{ "name": "source.path" },
{ "name": "source.format", "validset": ["csv", "json", "parquet"] },
{ "name": "source.pattern", "default": "*" },
{ "name": "target", "type": "dict" },
{ "name": "target.path" },
{ "name": "target.mode", "default": "append", "validset": ["append", "overwrite"] },
{ "name": "enabled", "type": "boolean", "default": True },
{ "name": "max_retries", "type": "integer", "default": 3 },
{ "name": "threshold", "type": "decimal", "default": 0.95 },
{ "name": "tags", "type": "list", "default": [] },
]See Controls — Overview for all available control fields.
import yaml
from pyjeb import control_and_setup
with open("job.yaml") as f:
config = yaml.safe_load(f)
variables = { "env": "prod" }
for job_name, job_config in config.items():
job = control_and_setup(job_config, controls, variables=variables, to_object=True)
print(f"[{job_name}]")
print(f" source.path = {job.source.path}")
print(f" source.format = {job.source.format}")
print(f" source.pattern = {job.source.pattern}")
print(f" target.path = {job.target.path}")
print(f" target.mode = {job.target.mode}")
print(f" enabled = {job.enabled}")
print(f" max_retries = {job.max_retries}")
print(f" threshold = {job.threshold}")
print(f" tags = {job.tags}")Output (run on 2026-05-11 with env = "prod"):
[ingest_customers]
source.path = /landing/prod/customers/2026-05-11
source.format = csv
source.pattern = *.csv
target.path = /bronze/prod/customers
target.mode = append
enabled = True
max_retries = 3
threshold = 0.95
tags = ['crm', 'daily']
[ingest_orders]
source.path = /landing/prod/orders/2026-05-11
source.format = json
source.pattern = *
target.path = /bronze/prod/orders
target.mode = append
enabled = True
max_retries = 3
threshold = 0.95
tags = []
Notice that for ingest_orders:
-
source.patterndefaulted to"*"— not provided in the file -
target.modedefaulted to"append"— not provided in the file -
enabled,max_retries,threshold,tagsall received their default values -
$var.envwas replaced by"prod" -
$sys.timestamp('YYYY-MM-DD')was replaced by the current date
- Customize your control definitions: Controls — Overview
- Use variables and functions: Variables — Overview
- Handle validation errors: Exceptions