Data Farm (Python Edition)

Data Farm is a schema-aware test data generation tool designed for engineers, QA professionals, and developers who need structured, reproducible data for relational databases.

It inspects existing database schemas, suggests generation strategies, and emits SQL scripts whose data comes from seeded, and therefore reproducible, random generation.

Project Status

Data Farm is in early development.

Version 0.1.0 is a proof-of-concept release that validates the core workflow:

Project initialization
Configuration-driven schema inspection
Strategy suggestion
Deterministic data generation
SQL emission via CLI

At this stage:

Generated values are intentionally simple (e.g., placeholder text patterns and small randomized sets).
Type coverage is limited.
Realistic domain-aware data generation is still evolving.
Breaking changes should be expected as the architecture matures.
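To make seed-based determinism concrete, here is a minimal sketch of the kind of simple values described above: a placeholder text pattern plus a choice from a small set. The `generate_rows` helper and the `name_NNNN` pattern are hypothetical illustrations, not Data Farm's actual generators. Note that Python only guarantees identical `random` sequences for a given seed within the same Python version.

```python
import random


def generate_rows(seed: int, count: int) -> list[dict[str, object]]:
    """Generate simple placeholder rows from a dedicated, seeded RNG (hypothetical helper)."""
    rng = random.Random(seed)  # isolated RNG: leaves the global random state untouched
    statuses = ["active", "inactive", "pending"]  # a small randomized set
    rows: list[dict[str, object]] = []
    for i in range(1, count + 1):
        rows.append(
            {
                "id": i,
                "name": f"name_{rng.randint(1000, 9999)}",  # placeholder text pattern
                "status": rng.choice(statuses),
            }
        )
    return rows


# The same seed reproduces the same rows on every run.
first = generate_rows(seed=42, count=3)
second = generate_rows(seed=42, count=3)
print(first == second)  # True
```

Using a private `random.Random` instance rather than the module-level functions keeps generation reproducible even if other code also draws random numbers.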

This release exists to:

Establish a stable CLI and packaging foundation.
Validate the inspect → plan → generate → emit pipeline.
Enable incremental improvement of planners and generators.
Provide a public artifact for feedback and iteration.

Data Farm is not yet intended for production data generation workflows. Future releases will expand planner sophistication, improve realism, and strengthen architectural boundaries.

Features

Inspect relational database schemas
Suggest generation strategies based on column types and patterns
Deterministic random data generation (seed-based)
SQL INSERT script emission
CLI-first design
Extensible architecture (planners, suggestors, emitters)
Structured logging with UTC timestamps
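Suggesting a strategy "based on column types and patterns" could look roughly like the sketch below: name patterns are checked first, then the declared type is normalized and looked up. The strategy names, patterns, and `suggest_strategy` function are hypothetical and are not Data Farm's actual suggestors.

```python
import re

# Hypothetical strategy names. Name patterns are checked before types
# because they carry more specific intent.
NAME_PATTERNS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"email", re.IGNORECASE), "email"),
    (re.compile(r"^(first_|last_)?name$", re.IGNORECASE), "person_name"),
    (re.compile(r"(_at|_date)$", re.IGNORECASE), "timestamp"),
]

TYPE_STRATEGIES: dict[str, str] = {
    "INTEGER": "random_int",
    "REAL": "random_float",
    "TEXT": "placeholder_text",
    "BOOLEAN": "random_bool",
}

# Common aliases folded into the base types above.
TYPE_ALIASES: dict[str, str] = {"INT": "INTEGER", "VARCHAR": "TEXT", "CHAR": "TEXT"}


def suggest_strategy(column_name: str, column_type: str) -> str:
    """Suggest a generation strategy from a column's name and declared type."""
    for pattern, strategy in NAME_PATTERNS:
        if pattern.search(column_name):
            return strategy
    # Normalize declarations such as "varchar(255)" before lookup.
    base_type = column_type.split("(")[0].strip().upper()
    base_type = TYPE_ALIASES.get(base_type, base_type)
    # Unknown types fall through, reflecting limited type coverage.
    return TYPE_STRATEGIES.get(base_type, "unsupported")


print(suggest_strategy("contact_email", "varchar(255)"))  # email
print(suggest_strategy("age", "int"))  # random_int
```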

Requirements

Python 3.11+

Installation

Install from PyPI (coming soon; the package is not yet published):

pip install datafarm

After installation, the CLI command dfarm will be available.

Quick Start

1️⃣ Initialize a Project

Create a new Data Farm project:

dfarm project init my_project

This creates a my_project directory containing the project's configuration files, including the TOML file you edit in the next step.

2️⃣ Configure the Project

Edit the generated configuration file (TOML format) to define:

Database connection settings
Target schema
Generation options
Output settings

3️⃣ Inspect a Schema

Run schema inspection:

dfarm inspect --config path/to/config.toml

The inspect command runs the full pipeline. Data Farm will:

Connect to the configured data source
Inspect tables and columns
Suggest generation strategies
Emit SQL scripts according to configuration
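Inspecting tables and columns amounts to querying the database catalog. As an illustration only, assuming a SQLite source (this README does not list which databases Data Farm supports), the standard library can do this with `sqlite_master` and `PRAGMA table_info`. The `inspect_schema` function is hypothetical.

```python
import sqlite3


def inspect_schema(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Return {table: [(column_name, declared_type), ...]} for user tables (hypothetical)."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema: dict[str, list[tuple[str, str]]] = {}
    for table in tables:
        quoted = table.replace('"', '""')  # escape embedded double quotes in the identifier
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT)")
print(inspect_schema(conn))
# {'users': [('id', 'INTEGER'), ('email', 'TEXT'), ('created_at', 'TEXT')]}
```

Each (name, type) pair is the kind of input a suggestion step could then map to a generation strategy.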

Example Workflow

dfarm project init demo_project
# edit demo_project/config.toml

dfarm inspect --config demo_project/config.toml

Output SQL scripts can then be executed against your target database.
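The sketch below illustrates that last step end to end: it renders rows as SQL INSERT statements and executes the resulting script against an in-memory SQLite database. The `sql_literal` and `render_insert` helpers are hypothetical and deliberately simplified (no date or binary handling); they are not Data Farm's emitter.

```python
import sqlite3


def sql_literal(value: object) -> str:
    """Render a Python value as a SQL literal (simplified, hypothetical helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    # Standard SQL escapes a single quote by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def render_insert(table: str, row: dict[str, object]) -> str:
    columns = ", ".join(row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


rows = [
    {"id": 1, "name": "O'Brien", "active": True},
    {"id": 2, "name": "name_4821", "active": False},
]
script = "\n".join(render_insert("users", r) for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
conn.executescript(script)  # runs every statement in the generated script
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```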

Logging

Data Farm uses structured logging with:

UTC timestamps
Verbosity levels (-v)
Optional file logging (--log-file)

Example:

dfarm inspect --config config.toml -v --log-file logs/

If a directory is provided to --log-file, a timestamped log file will be created automatically.

CLI Overview

dfarm --help
dfarm project --help
dfarm inspect --help
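A nested command layout such as `dfarm project init` alongside `dfarm inspect` can be modeled with `argparse` subparsers, as in the sketch below. It mirrors only the commands and flags shown in this README; it is not Data Farm's actual CLI implementation, which may use a different framework. Note that argparse generates the `--help` output for each level automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring only the commands and flags shown in this README."""
    parser = argparse.ArgumentParser(prog="dfarm")
    commands = parser.add_subparsers(dest="command", required=True)

    # dfarm project init NAME
    project_cmd = commands.add_parser("project", help="manage Data Farm projects")
    project_sub = project_cmd.add_subparsers(dest="project_command", required=True)
    init_cmd = project_sub.add_parser("init", help="create a new project")
    init_cmd.add_argument("name")

    # dfarm inspect --config PATH [-v ...] [--log-file PATH]
    inspect_cmd = commands.add_parser("inspect", help="inspect a schema and emit SQL")
    inspect_cmd.add_argument("--config", required=True)
    inspect_cmd.add_argument("-v", dest="verbosity", action="count", default=0)
    inspect_cmd.add_argument("--log-file")
    return parser


args = build_parser().parse_args(
    ["inspect", "--config", "config.toml", "-v", "--log-file", "logs/"]
)
print(args.command, args.config, args.verbosity, args.log_file)  # inspect config.toml 1 logs/
```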

Architecture

Data Farm follows a layered design that separates:

CLI interface
Application orchestration
Domain logic (suggestors, planners)
Infrastructure (inspectors, emitters)

Future releases will further expand this architecture toward a full DDD/Clean Architecture structure.
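One way to keep these layers apart is for the orchestration layer to depend only on small interfaces, with infrastructure supplying the implementations. The sketch below uses `typing.Protocol` for that purpose; every class and method name in it is hypothetical and only illustrates the dependency direction, not Data Farm's actual modules.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str


# Interfaces the orchestration layer depends on (hypothetical names).
class Inspector(Protocol):
    def inspect(self) -> dict[str, list[Column]]: ...


class Suggestor(Protocol):
    def suggest(self, column: Column) -> str: ...


class Emitter(Protocol):
    def emit(self, table: str, plan: dict[str, str]) -> str: ...


def run_pipeline(inspector: Inspector, suggestor: Suggestor, emitter: Emitter) -> list[str]:
    """Orchestration: wires the layers together without knowing concrete types."""
    outputs = []
    for table, columns in inspector.inspect().items():
        plan = {col.name: suggestor.suggest(col) for col in columns}
        outputs.append(emitter.emit(table, plan))
    return outputs


# Infrastructure stand-ins; they satisfy the protocols structurally, without inheriting.
class StaticInspector:
    def inspect(self) -> dict[str, list[Column]]:
        return {"users": [Column("id", "INTEGER"), Column("email", "TEXT")]}


class TypeSuggestor:
    def suggest(self, column: Column) -> str:
        return "random_int" if column.sql_type == "INTEGER" else "placeholder_text"


class CommentEmitter:
    def emit(self, table: str, plan: dict[str, str]) -> str:
        return f"-- {table}: " + ", ".join(f"{c}={s}" for c, s in plan.items())


print(run_pipeline(StaticInspector(), TypeSuggestor(), CommentEmitter()))
# ['-- users: id=random_int, email=placeholder_text']
```

Swapping in a real database inspector or SQL emitter would not require any change to `run_pipeline`, which is the point of the layering.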

Development

Clone the repository:

git clone https://github.com/PhoenixAnvil/data-farm-python.git
cd data-farm-python

Create and activate a virtual environment (run only the activation command for your platform):

python -m venv .venv
.venv\Scripts\activate  # Windows
source .venv/bin/activate  # macOS/Linux

Install development dependencies:

pip install -r requirements-dev.txt

Run tests:

pytest

Build locally:

python -m build
twine check dist/*

Contributing

Please see CONTRIBUTING.md for contribution guidelines.

All contributors are expected to follow the code of conduct in CODE_OF_CONDUCT.md.

License

This project is licensed under the MIT License.

See the LICENSE file for details.

Roadmap

v0.1.x: Stabilization and packaging
v0.2.0: DDD/Clean Architecture refactor
v0.3.x: Expanded planners and generation strategies
