Data Farm is a schema-aware test data generation tool designed for engineers, QA professionals, and developers who need structured, reproducible data for relational databases.
It inspects existing schemas, suggests generation strategies, and emits SQL scripts using deterministic random generation.
Data Farm is in early development.
Version v0.1.0 is a proof-of-concept release that validates the core workflow:
- Project initialization
- Configuration-driven schema inspection
- Strategy suggestion
- Deterministic data generation
- SQL emission via CLI
At this stage:
- Generated values are intentionally simple (e.g., placeholder text patterns and small randomized sets).
- Type coverage is limited.
- Realistic domain-aware data generation is still evolving.
- Breaking changes should be expected as the architecture matures.
This release exists to:
- Establish a stable CLI and packaging foundation.
- Validate the inspect → plan → generate → emit pipeline.
- Enable incremental improvement of planners and generators.
- Provide a public artifact for feedback and iteration.
Data Farm is not yet intended for production data generation workflows. Future releases will expand planner sophistication, improve realism, and strengthen architectural boundaries.
- Inspect relational database schemas
- Suggest generation strategies based on column types and patterns
- Deterministic random data generation (seed-based)
- SQL INSERT script emission
- CLI-first design
- Extensible architecture (planners, suggestors, emitters)
- Structured logging with UTC timestamps
- Python 3.11+
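To illustrate what seed-based deterministic generation means in practice, here is a minimal sketch using only the Python standard library. It is not Data Farm's actual generator: the function name `generate_rows`, the column names, and the value patterns are all assumptions made for illustration.

```python
import random


def generate_rows(seed: int, count: int) -> list[dict[str, object]]:
    """Generate placeholder rows from a private, seeded RNG (hypothetical helper)."""
    rng = random.Random(seed)  # isolated from the global random state
    statuses = ["active", "inactive", "pending"]  # a small randomized set
    return [
        {
            "id": i,
            "name": f"name_{rng.randint(1000, 9999)}",  # placeholder text pattern
            "status": rng.choice(statuses),
        }
        for i in range(1, count + 1)
    ]


first = generate_rows(seed=42, count=3)
second = generate_rows(seed=42, count=3)
print(first == second)  # True: the same seed reproduces the same rows
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the output reproducible even if other code touches the global random state.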
Install from PyPI (Coming Soon!):
pip install datafarm
After installation, the CLI command dfarm will be available.
Create a new Data Farm project:
dfarm project init my_project
This creates a project directory containing configuration files.
Edit the generated configuration file (TOML format) to define:
- Database connection settings
- Target schema
- Generation options
- Output settings
Run schema inspection:
dfarm inspect --config path/to/config.toml
Data Farm will:
- Connect to the configured data source
- Inspect tables and columns
- Suggest generation strategies
- Emit SQL scripts according to configuration
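The steps above can be pictured with a small, self-contained sketch that runs against an in-memory SQLite database. This is not how Data Farm is implemented; the function names, the strategy names, and the matching rules are assumptions chosen to show the inspect, suggest, and emit flow end to end.

```python
import random
import sqlite3


def inspect_columns(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(row[1], row[2].upper()) for row in rows]


def suggest_strategy(name: str, sql_type: str) -> str:
    """Pick a strategy from the column name pattern and type (hypothetical rules)."""
    if name == "id" or name.endswith("_id"):
        return "sequence"
    if "INT" in sql_type:
        return "random_int"
    return "placeholder_text"


def emit_inserts(table: str, columns: list[tuple[str, str]], count: int, seed: int) -> list[str]:
    """Build INSERT statements from a seeded RNG so output is reproducible."""
    rng = random.Random(seed)
    col_list = ", ".join(name for name, _ in columns)
    statements = []
    for i in range(1, count + 1):
        values = []
        for name, sql_type in columns:
            strategy = suggest_strategy(name, sql_type)
            if strategy == "sequence":
                values.append(str(i))
            elif strategy == "random_int":
                values.append(str(rng.randint(0, 100)))
            else:
                values.append(f"'{name}_{i}'")  # simple placeholder text
        statements.append(f"INSERT INTO {table} ({col_list}) VALUES ({', '.join(values)});")
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

columns = inspect_columns(conn, "customer")
script = emit_inserts("customer", columns, count=3, seed=7)
print("\n".join(script))

# The emitted script can be executed against the target database.
conn.executescript("\n".join(script))
```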
dfarm project init demo_project
# edit demo_project/config.toml
dfarm inspect --config demo_project/config.toml
Output SQL scripts can then be executed against your target database.
Data Farm uses structured logging with:
- UTC timestamps
- Verbosity levels (-v)
- Optional file logging (--log-file)
Example:
dfarm inspect --config config.toml -v --log-file logs/
If a directory is provided to --log-file, a timestamped log file will be created automatically.
dfarm --help
dfarm project --help
dfarm inspect --help
Data Farm follows a layered design that separates:
- CLI interface
- Application orchestration
- Domain logic (suggestors, planners)
- Infrastructure (inspectors, emitters)
Future releases will further expand this architecture toward a full DDD/Clean Architecture structure.
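One way to picture this layering is with small interfaces that the orchestration layer depends on, so that inspectors and emitters can be swapped without touching domain logic. The sketch below is an assumption about how such boundaries could be expressed with `typing.Protocol`; none of the class or method names come from Data Farm's codebase.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    sql_type: str


class Inspector(Protocol):  # infrastructure: reads a schema
    def inspect(self) -> list[Column]: ...


class Suggestor(Protocol):  # domain: maps a column to a strategy name
    def suggest(self, column: Column) -> str: ...


class Emitter(Protocol):  # infrastructure: renders the plan as text
    def emit(self, plan: dict[Column, str]) -> str: ...


def run_pipeline(inspector: Inspector, suggestor: Suggestor, emitter: Emitter) -> str:
    """Application orchestration: wire the layers together."""
    columns = inspector.inspect()
    plan = {column: suggestor.suggest(column) for column in columns}
    return emitter.emit(plan)


# In-memory stand-ins, for illustration only.
class StaticInspector:
    def inspect(self) -> list[Column]:
        return [Column("customer", "id", "INTEGER"), Column("customer", "name", "TEXT")]


class TypeSuggestor:
    def suggest(self, column: Column) -> str:
        return "sequence" if column.sql_type == "INTEGER" else "placeholder_text"


class CommentEmitter:
    def emit(self, plan: dict[Column, str]) -> str:
        return "\n".join(f"-- {c.table}.{c.name}: {s}" for c, s in plan.items())


print(run_pipeline(StaticInspector(), TypeSuggestor(), CommentEmitter()))
# -- customer.id: sequence
# -- customer.name: placeholder_text
```

Because `run_pipeline` only depends on the protocols, a test can pass in-memory stand-ins like these while the CLI wires in real database inspectors and SQL emitters.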
Clone the repository:
git clone https://github.com/PhoenixAnvil/data-farm-python.git
cd data-farm-python
Create a virtual environment:
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # macOS/Linux
Install development dependencies:
pip install -r requirements-dev.txt
Run tests:
pytest
Build locally:
python -m build
twine check dist/*
Please see CONTRIBUTING.md for contribution guidelines.
All contributors are expected to follow the CODE_OF_CONDUCT.md.
This project is licensed under the MIT License.
See the LICENSE file for details.
- v0.1.x: Stabilization and packaging
- v0.2.0: DDD/Clean Architecture refactor
- v0.3.x: Expanded planners and generation strategies