
A practical pytest tutorial using real-world campaign data with quality issues (nulls, invalid dates, financial inconsistencies).


aboyalejandro/pytest-data-engineers


Pytest for Data Engineers


The repository also includes a Claude Code pre-commit hook that showcases automated testing in AI-assisted development workflows.

📁 Files

  • campaigns.json - Messy marketing campaign dataset (100+ campaigns with data quality issues)
  • conftest.py - Pytest fixtures for clean/messy campaign data
  • main.py - 5 pytest tests demonstrating key concepts
  • .claude/settings.json - Claude Code hook configuration
  • .claude/pre-git-hook.sh - Pre-commit test runner for Claude Code
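Fixtures for clean and messy records could look roughly like the sketch below. It is not the repository's conftest.py: the field names (budget, spend, revenue, roi and so on), the values, and the make_campaign helper are illustrative assumptions. A plain factory function keeps the fixtures short and lets each fixture return a fresh dict.

```python
# conftest.py (sketch): field names and values are illustrative assumptions,
# not copied from the repository's campaigns.json.
import pytest


def make_campaign(**overrides):
    """Build a fresh, valid campaign dict; keyword arguments replace fields."""
    campaign = {
        "name": "Spring Sale",
        "channel": "email",
        "status": "active",
        "start_date": "2024-03-01",
        "end_date": "2024-03-31",
        "budget": 5000.0,
        "spend": 4200.0,
        "revenue": 9800.0,
        "roi": 1.33,  # (revenue - spend) / spend, rounded
    }
    campaign.update(overrides)
    return campaign


@pytest.fixture
def clean_campaign():
    """A record that satisfies every validation rule."""
    return make_campaign()


@pytest.fixture
def messy_campaign():
    """A record with the kinds of defects described for campaigns.json."""
    return make_campaign(
        name=None,              # missing required field
        end_date="2024-02-15",  # ends before it starts
        revenue=None,           # ROI reported without revenue
    )
```

Any test in the same directory can request clean_campaign or messy_campaign as an argument. Because these fixtures are function-scoped, pytest builds a new dict for every test, so one test mutating its record cannot leak into another.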

📊 Dataset Issues

The campaigns.json dataset (from Synthetic Data Gen) contains realistic data quality problems:

  • Missing required fields (name, channel, status)
  • End dates before start dates
  • ROI values without revenue/spend data
  • Budget exceeded by actual spend
  • Negative ROI values
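Before writing tests, it can help to count how often each problem occurs. The sketch below does this with vectorized pandas masks. It assumes each record uses the keys name, channel, status, start_date, end_date, budget, spend, revenue and roi, with numeric financial fields; the real keys in campaigns.json may differ, and profile_issues is a hypothetical helper, not part of the repository.

```python
import pandas as pd

REQUIRED_FIELDS = ["name", "channel", "status"]


def profile_issues(records):
    """Count how many campaign records exhibit each known data quality problem."""
    df = pd.DataFrame(records)  # keys absent from a record become NaN

    # Unparseable dates become NaT, and any comparison involving NaT is False
    start = pd.to_datetime(df["start_date"], errors="coerce")
    end = pd.to_datetime(df["end_date"], errors="coerce")

    has_financials = df["revenue"].notna() & df["spend"].notna()

    return {
        "missing_required": int(df[REQUIRED_FIELDS].isna().any(axis=1).sum()),
        "end_before_start": int((end < start).sum()),
        "roi_without_financials": int((df["roi"].notna() & ~has_financials).sum()),
        "over_budget": int((df["spend"] > df["budget"]).sum()),
        "negative_roi": int((df["roi"] < 0).sum()),
    }
```

Because every check is a boolean Series, the same masks can also filter the offending rows (for example df[end < start]) when you need to inspect them rather than count them.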

🚀 Quick Start

```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run all tests
pytest main.py -v

# Run a specific test
pytest main.py::test_valid_campaign_passes -v
```

Output

```text
========================================= test session starts ==========================================
platform darwin -- Python 3.13.2, pytest-8.4.2, pluggy-1.6.0 -- /Users/your-user/Desktop/pytest-data-engineers/.venv/bin/python3.13
cachedir: .pytest_cache
rootdir: /Users/your-user/path/pytest-data-engineers
collected 5 items

main.py::test_convert_campaigns_to_df PASSED                                                     [ 20%]
main.py::test_valid_campaign_passes PASSED                                                       [ 40%]
main.py::test_invalid_dates_fail PASSED                                                          [ 60%]
main.py::test_mock_api_call PASSED                                                               [ 80%]
main.py::test_mock_s3_load PASSED                                                                [100%]

========================================== 5 passed in 0.69s ===========================================
```

🧪 What We're Testing

Example 1: DataFrame Conversion

Convert campaign dictionaries to pandas DataFrames for analysis
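A conversion test along these lines is sketched below. The test name matches the one in the output above, but its body and the campaigns_to_df helper are hypothetical illustrations, not the code in main.py. Parsing dates with errors="coerce" is one assumed design choice: invalid dates become NaT instead of aborting the whole conversion.

```python
import pandas as pd


def campaigns_to_df(campaigns):
    """Convert campaign dicts to a DataFrame, parsing date columns when present."""
    df = pd.DataFrame(campaigns)
    for col in ("start_date", "end_date"):
        if col in df.columns:
            # errors="coerce" turns invalid dates into NaT instead of raising
            df[col] = pd.to_datetime(df[col], errors="coerce")
    return df


def test_convert_campaigns_to_df():
    campaigns = [
        {"name": "Spring Sale", "channel": "email",
         "start_date": "2024-03-01", "end_date": "2024-03-31"},
        {"name": None, "channel": "social",
         "start_date": "2024-04-01", "end_date": "bad-date"},
    ]
    df = campaigns_to_df(campaigns)

    assert len(df) == len(campaigns)  # one row per campaign
    assert {"name", "channel", "start_date", "end_date"} <= set(df.columns)
    assert pd.api.types.is_datetime64_any_dtype(df["start_date"])
    assert df["end_date"].isna().sum() == 1  # "bad-date" became NaT
    assert df["name"].isna().sum() == 1      # the null name survives as a missing value
```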

Example 2: Data Validation

  • Required fields (name, channel, status)
  • Date logic (start_date <= end_date)
  • Financial consistency (ROI requires revenue & spend)
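The three rules above can be expressed as a validator that returns a list of violations, which makes failing tests easy to read. This is a sketch: validate_campaign and its error messages are hypothetical, dates are assumed to be ISO 8601 strings, and the two test names simply mirror those in the output above.

```python
from datetime import date

REQUIRED_FIELDS = ("name", "channel", "status")


def validate_campaign(campaign):
    """Return a list of rule violations; an empty list means the record is valid."""
    errors = []

    # Rule 1: required fields must be present and non-empty (None and "" both fail)
    for field in REQUIRED_FIELDS:
        if not campaign.get(field):
            errors.append(f"missing required field: {field}")

    # Rule 2: start_date <= end_date (ISO 8601 date strings assumed)
    start, end = campaign.get("start_date"), campaign.get("end_date")
    if start and end:
        try:
            if date.fromisoformat(start) > date.fromisoformat(end):
                errors.append("start_date is after end_date")
        except ValueError:
            errors.append("unparseable start_date or end_date")

    # Rule 3: an ROI value is only meaningful when revenue and spend exist
    if campaign.get("roi") is not None and (
        campaign.get("revenue") is None or campaign.get("spend") is None
    ):
        errors.append("roi requires revenue and spend")

    return errors


def test_valid_campaign_passes():
    campaign = {
        "name": "Spring Sale", "channel": "email", "status": "active",
        "start_date": "2024-03-01", "end_date": "2024-03-31",
        "revenue": 9800.0, "spend": 4200.0, "roi": 1.33,
    }
    assert validate_campaign(campaign) == []


def test_invalid_dates_fail():
    campaign = {
        "name": "Spring Sale", "channel": "email", "status": "active",
        "start_date": "2024-03-31", "end_date": "2024-03-01",
    }
    assert validate_campaign(campaign) == ["start_date is after end_date"]
```

Returning every violation, rather than raising on the first one, lets a single assertion report all the problems in a messy record at once.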

Example 3: Mocking API Calls

Mock external API calls using the @patch decorator from unittest.mock
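A minimal sketch of the pattern follows, assuming a hypothetical JSON endpoint (API_URL) and a hypothetical fetch_campaigns helper. It uses the standard library's urllib.request so it has no third-party dependencies; if the code under test uses requests, you would patch requests.get instead. The key point is to patch the name where the code under test looks it up at call time.

```python
import json
import urllib.request
from unittest.mock import patch

API_URL = "https://api.example.com/campaigns"  # hypothetical endpoint


def fetch_campaigns(url=API_URL):
    """Download campaigns as a list of dicts from a JSON API."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read())


@patch("urllib.request.urlopen")
def test_mock_api_call(mock_urlopen):
    payload = [{"name": "Spring Sale", "channel": "email", "status": "active"}]
    # urlopen is used as a context manager, so configure what __enter__ returns
    mock_urlopen.return_value.__enter__.return_value.read.return_value = (
        json.dumps(payload).encode()
    )

    result = fetch_campaigns()

    assert result == payload
    # No network traffic happened; the mock records how it was called instead
    mock_urlopen.assert_called_once_with(API_URL, timeout=10)
```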

Example 4: Mocking AWS S3

Mock the boto3 S3 client to test data loading without AWS access
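This sketch swaps @patch for dependency injection: a hypothetical load_campaigns_from_s3 function accepts an optional client, and the test passes a MagicMock in place of boto3.client("s3"). The bucket and key names are placeholders. The get_object(Bucket=..., Key=...) call and the "Body" stream in its response mirror boto3's S3 API; an io.BytesIO object stands in for the streaming body.

```python
import io
import json
from unittest.mock import MagicMock


def load_campaigns_from_s3(bucket, key, s3_client=None):
    """Read a JSON list of campaigns from an S3 object."""
    if s3_client is None:
        import boto3  # deferred so tests never need boto3 or AWS credentials
        s3_client = boto3.client("s3")
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


def test_mock_s3_load():
    campaigns = [{"name": "Spring Sale", "channel": "email", "status": "active"}]
    fake_s3 = MagicMock()
    # get_object returns a dict whose "Body" is a readable stream
    fake_s3.get_object.return_value = {
        "Body": io.BytesIO(json.dumps(campaigns).encode())
    }

    result = load_campaigns_from_s3("my-bucket", "campaigns.json", s3_client=fake_s3)

    assert result == campaigns
    fake_s3.get_object.assert_called_once_with(Bucket="my-bucket", Key="campaigns.json")
```

Patching with @patch("boto3.client") works too, but it requires boto3 to be importable in the test environment; injecting the client keeps the test independent of it.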

💡 Key Concepts

  • Fixtures: Reusable test data in conftest.py
  • Mocking: Test external services (APIs, S3) without real calls
  • Assertions: Validate data quality rules
  • Real-world data: Handle nulls, invalid dates, bad financials

🤖 Claude Code Integration

Includes a pre-commit hook (configured in .claude/settings.json and implemented in .claude/pre-git-hook.sh) that automatically runs pytest before git commits. When Claude Code attempts a commit, the tests run first and the commit is blocked if any test fails, which showcases automated testing in AI-assisted workflows.
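The repository implements the hook as a shell script; the Python version below only sketches the same idea. It assumes the Claude Code hook conventions described in its hooks documentation (verify against your version): the hook is registered as a PreToolUse hook for the Bash tool, receives the pending tool call as JSON on stdin with tool_name and tool_input.command, and exiting with code 2 blocks the call and shows stderr to Claude. The function names are hypothetical, and the substring check for "git commit" is a deliberately simple heuristic.

```python
import json
import subprocess
import sys


def is_git_commit(payload):
    """True when a hook payload describes a Bash call that runs `git commit`."""
    if payload.get("tool_name") != "Bash":
        return False
    command = payload.get("tool_input", {}).get("command", "")
    return "git commit" in command


def main():
    """Run the test suite before commits; return the hook's exit code."""
    payload = json.load(sys.stdin)  # pending tool call, as sent by Claude Code
    if not is_git_commit(payload):
        return 0  # not a commit: allow the tool call

    result = subprocess.run(
        [sys.executable, "-m", "pytest", "main.py", "-q"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Exit code 2 blocks the commit; stderr explains why to Claude
        sys.stderr.write(result.stdout + result.stderr)
        return 2
    return 0
```

To use it as the hook script, end the file with `if __name__ == "__main__": sys.exit(main())` and point the hook command in .claude/settings.json at it.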
