A comprehensive object-oriented Python project template designed for ETL (Extract, Transform, Load) processes, API integrations, software tools or automation workflows. This template provides a structured foundation for building scalable data processing applications with proper logging, error handling, and modular design patterns.
This template is designed for general software or backend projects that involve:
- Database Operations: Reading from and writing to various databases
- API Integrations: Consuming and processing data from REST APIs
- File Processing: Handling CSV, Excel, and text files
- Data Transformation: ETL pipelines with pandas and other data tools
- GUI Tools: Tkinter-based interfaces for manual operations
- Automation: Scheduled processes and batch operations
📁 Project Root
├── 📁 main/     # Entry points and batch files
├── 📁 scripts/  # Core classes (Base & Child)
├── 📁 utils/    # Utility classes (DB, API, File operations)
├── 📁 tools/    # GUI tools and manual operation interfaces
├── 📁 tests/    # Unit tests and test utilities
├── 📁 config/   # Configuration files and mappings
├── 📁 docs/     # Documentation, schemas, and references
└── 📁 logs/     # Application logs (organized by module)
- Python 3.8 or higher (Note: Python 3.13+ may have compatibility issues with some data libraries)
- Virtual environment (recommended)
- Windows (recommended); on other platforms, replace the .bat files with a suitable alternative such as shell scripts
- Clone or download this template
- Set up a virtual environment:
  python -m venv venv
  venv\Scripts\activate        # Windows
  # source venv/bin/activate   # Linux/Mac
- Install dependencies:
  pip install -r requirements.txt
- Configure environment variables:
  cp .env.template .env
  # Edit .env with project-specific configuration
- Run the example:
  python -m main.main          # or use the batch file main/main.bat
A .env file can be created in the project root to store secrets, tokens, logins, etc., such as the example below:
# Database Connections
SQL_DATABASE_CONN=mssql+pyodbc://server/database?driver=ODBC+Driver+17+for+SQL+Server
# API Configuration
API_BASE_URL=https://api.example.com
API_KEY=api_key
API_TIMEOUT=30
# File Paths
INPUT_PATH=./data/input
OUTPUT_PATH=./data/output
ARCHIVE_PATH=./data/archive
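As a rough illustration (the template's own modules handle this in practice), these values can be loaded with python-dotenv, which is already on the dependency list:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into the process environment

sql_conn = os.getenv("SQL_DATABASE_CONN")
api_base_url = os.getenv("API_BASE_URL")
api_timeout = int(os.getenv("API_TIMEOUT", "30"))  # default to 30 seconds if unset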
When adding new dependencies to the project, update the requirements.txt file to maintain consistency across development environments.
Replace requirements.txt with installed dependencies:
pip freeze > requirements.txt
- pandas - Data manipulation and analysis
- requests - HTTP library for API calls
- sqlalchemy - Database toolkit and ORM
- python-dotenv - Environment variable management
- pytest - Testing framework
The main directory contains the primary entry-point scripts and the batch files that run them.
- Initializes Child class instances with specific log files
- Executes main workflows with proper resource disposal
- Demonstrates multiple instances running with separate logging
- Handles command-line arguments
- Manages process logging and error handling
- Activates virtual environment
- Updates code from version control
- Installs/updates dependencies
- Executes main process
- Activates virtual environment
- Updates code from version control
- Installs/updates dependencies
- Launches GUI or CLI tools for manual operations
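As an illustration of the first group of bullets above (the entry-point script), main/main.py could be structured roughly like this; the argument names and the Child import path are assumptions based on the project layout, not the template's exact code.

# Illustrative sketch of main/main.py
import argparse

from scripts.child import Child  # assumed import path per the project layout


def main() -> None:
    parser = argparse.ArgumentParser(description="Run the example workflow")
    parser.add_argument("--log-file", default="logs/child/child.log")
    args = parser.parse_args()

    # Each run gets its own instance and log file, so parallel runs stay isolated
    child = Child(file_path=args.log_file)
    try:
        child.main()
    finally:
        child.dispose()  # always release logging handlers and connections


if __name__ == "__main__":
    main()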
This template implements a sophisticated instance-specific logging system that enables multiple script instances to run simultaneously with isolated log files while maintaining clear hierarchical logger naming conventions.
All logging uses this format:
%(asctime)s - %(name)s - %(levelname)s - %(message)s
Example log output:
2024-01-15 10:30:45,123 - scripts.base.instance_1 - INFO - Initialized scripts.base class
2024-01-15 10:30:45,124 - utils.api.instance_1 - INFO - Initialized API utility
2024-01-15 10:30:45,125 - utils.file.instance_1 - INFO - Initialized File utility
2024-01-15 10:30:45,125 - utils.db.instance_1 - INFO - Initialized DB utility
2024-01-15 10:30:45,126 - scripts.child.instance_1 - INFO - Initialized scripts.child class
2024-01-15 10:30:45,127 - scripts.child.instance_1 - INFO - Starting ETL workflow
2024-01-15 10:30:46,456 - utils.db.instance_1 - INFO - Read 1,250 rows from database.sales.transactions
2024-01-15 10:30:47,789 - scripts.child.instance_1 - INFO - Transformed data: 1,250 → 890 rows
2024-01-15 10:30:48,012 - utils.db.instance_1 - INFO - Upsert completed: 45 inserted, 123 updated
2024-01-15 10:30:48,013 - scripts.child.instance_1 - INFO - ETL workflow completed
2024-01-15 10:30:48,014 - scripts.base.instance_1 - INFO - Disposing of base class
The instance-specific logging system ensures that multiple objects can run simultaneously with separate log files while maintaining clear logger hierarchies.
- Unique instances: Each object gets instance-specific loggers (e.g., scripts.child.instance_1, scripts.child.instance_2)
- Separate files: Multiple instances can run simultaneously with their own log files without cross-contamination
- Clear hierarchy: Logger names preserve module structure while adding instance identification
- Always provide file paths: All classes require explicit log file paths
- Module list maintenance: When adding/removing utility classes or renaming script files, update the base_modules list in scripts/base.py to ensure proper logger cleanup
- File management: Logs overwrite by default; use timestamps in filenames for archival
- Include metrics: Always log row counts, processing times, and aggregate information
- Proper disposal: Always call dispose() to clean up logging handlers
- Hierarchical naming: Use module.class.instance naming (e.g., utils.db.instance_{n}, scripts.child.instance_{n}); see the sketch after this list
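The mechanics behind these conventions can be approximated as follows. This is a sketch of the idea only (a class-level counter, instance-numbered logger names, one file handler per instance), not the actual implementation in scripts/base.py.

import logging
import os


class InstanceLogged:
    """Toy example: each instance gets its own numbered logger and log file."""

    _instance_count = 0  # class-level counter shared by all instances

    def __init__(self, module_name: str, log_file: str):
        type(self)._instance_count += 1
        self.instance_id = type(self)._instance_count
        self.logger = logging.getLogger(f"{module_name}.instance_{self.instance_id}")
        self.logger.setLevel(logging.INFO)
        handler = logging.FileHandler(log_file, mode="w")  # overwrite by default
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        self.logger.addHandler(handler)

    def dispose(self) -> None:
        for handler in list(self.logger.handlers):
            handler.close()
            self.logger.removeHandler(handler)


# Two simultaneous instances, isolated log files
os.makedirs("logs", exist_ok=True)
a = InstanceLogged("scripts.child", "logs/child_1.log")
b = InstanceLogged("scripts.child", "logs/child_2.log")
a.logger.info("Starting ETL workflow")   # -> scripts.child.instance_1
b.logger.info("Starting ETL workflow")   # -> scripts.child.instance_2
a.dispose()
b.dispose()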
Core classes that implement the main processes and business logic, including the abstract base class and concrete implementations.
- Instance-Specific Logging: Configures standardized logging for each instance
- Utility Initialization: Initializes utility objects with instance-specific loggers
- Utility Wrappers: Provides wrapper methods for accessing utility objects.
- Abstract Methods: Enforces common patterns children must follow (e.g., extract, transform, load, main)
- Common Functionality: Provides common methods used across all children (e.g., a dispose() method for cleanup)
- Required Parameters: Optionally includes parameters for configurations common among all children (e.g., file_path: str to specify the log file location)
- Abstract Method Implementation: Implements required abstract methods from base class (main workflow and any defined process steps)
- Business Logic Methods: Contains processing logic and workflows specific to the use case (e.g., financial calculations, inventory management, customer segmentation)
- Workflow Execution: Coordinates the complete process pipeline (e.g., extract-transform-load for ETL projects)
- Instance Configuration: Accepts child-specific parameters and passes common configurations to the base class (e.g., file_path: str for the log file location)
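Putting the base and child responsibilities together, a skeleton could look like the following; the class and method names mirror the conventions above, but the bodies are illustrative placeholders rather than the template's actual implementation.

import logging
from abc import ABC, abstractmethod

import pandas as pd


class Base(ABC):
    """Abstract base: instance logging, utility wrappers, and enforced workflow methods."""

    def __init__(self, file_path: str):
        self.file_path = file_path  # instance-specific log file location
        # Simplified logger setup; the real template assigns instance-numbered names
        self.logger = logging.getLogger(f"scripts.{self.__class__.__name__.lower()}.instance_1")
        handler = logging.FileHandler(file_path, mode="w")
        handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    @abstractmethod
    def extract(self) -> pd.DataFrame: ...

    @abstractmethod
    def transform(self, data: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def load(self, data: pd.DataFrame) -> None: ...

    @abstractmethod
    def main(self) -> None: ...

    def dispose(self) -> None:
        """Close and detach logging handlers so instances can be recreated cleanly."""
        for handler in list(self.logger.handlers):
            handler.close()
            self.logger.removeHandler(handler)


class Child(Base):
    """Concrete workflow with placeholder extract/transform/load steps."""

    def extract(self) -> pd.DataFrame:
        return pd.DataFrame({"score": [0.2, 0.7, 0.9]})  # stand-in for a real source

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data[data["score"] >= 0.5]

    def load(self, data: pd.DataFrame) -> None:
        self.logger.info(f"Loaded {len(data)} rows")

    def main(self) -> None:
        self.load(self.transform(self.extract()))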
- Method Organization: Separate methods into logical groups using comments (e.g., # MARK: Wrappers, # MARK: Business Logic)
- High-Level Structure: Organize code into well-defined, high-level functions that follow the workflow pattern (e.g., extract, transform, and load methods for ETL projects)
- Clear Purpose: Each method should have a clear purpose with concise names
- Input validation: Check for empty DataFrames, null values, and invalid parameters
- Error handling: Wrap all methods in try-except blocks with logging and optionally re-raising
- Comments: Include concise comments in appropriate locations
- Code Grouping: Use strategic whitespace to group related code blocks, avoiding excessive newlines while maintaining readability
- Documentation: Complete docstrings following the specified format; only include headers that apply to the method (e.g., omit Parameters when the method takes no parameters)
- Type hints: Include method signatures with proper type annotations
"""
Short method description.
Parameters
----------
parameter name : DataType
Short description
Returns
-------
DataType
Short description
Raises
------
ErrorType
Short description
"""
def process_data(self, data: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """
    Process and filter data based on threshold criteria.

    Parameters
    ----------
    data : pd.DataFrame
        Input dataframe to process
    threshold : float, default 0.5
        Minimum threshold for filtering

    Returns
    -------
    pd.DataFrame
        Processed dataframe with applied filters

    Raises
    ------
    ValueError
        If threshold is not between 0 and 1
    """
    try:
        # Validate inputs
        if data.empty:
            self.logger.warning("Received empty dataframe")
            return pd.DataFrame()
        if not 0 <= threshold <= 1:
            raise ValueError(f"Threshold must be between 0 and 1, got {threshold}")

        # Process data
        filtered_data = data[data['score'] >= threshold]
        self.logger.info(f"Filtered data: {len(data)} → {len(filtered_data)} rows")
        return filtered_data
    except Exception as e:
        self.logger.error(f"Error processing data: {e}")
        raise
GUI applications and interactive tools for manual operations, data management, and process monitoring. Tools integrate with the instance-specific logging system and provide sensible defaults for log file locations.
- Automated Script Integration: Optional integration with Child class functionality
- Resource Management: Proper disposal of resources
Example Tool Usage:
from tools.tool import Tool
# Using default log location (tool/tool.log)
tool = Tool()
tool.run()
tool.dispose()
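A minimal shape for such a tool, consistent with the usage above; the widget layout and method names are illustrative only, not the template's actual Tool class.

import logging
import tkinter as tk


class Tool:
    """Tiny Tkinter tool reusing the template's logging and disposal pattern."""

    def __init__(self, file_path: str = "tool/tool.log"):  # default per the usage example above
        self.logger = logging.getLogger("tools.tool.instance_1")
        handler = logging.FileHandler(file_path, mode="w")
        handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
        self.root = tk.Tk()
        self.root.title("Manual Operations")
        tk.Button(self.root, text="Run process", command=self._on_run).pack(padx=20, pady=20)

    def _on_run(self) -> None:
        self.logger.info("Manual run triggered")  # hook Child workflow here if desired

    def run(self) -> None:
        self.root.mainloop()

    def dispose(self) -> None:
        for handler in list(self.logger.handlers):
            handler.close()
            self.logger.removeHandler(handler)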
Reusable utility classes for common operations like database access, API interactions, and file processing. See below for common examples of utility classes.
- Connection Management: Multiple database connections with pooling
- Pandas Integration: Direct DataFrame read/write operations
- Bulk Operations: Efficient insert/upsert capabilities
- Error Handling: Robust exception handling with detailed logging
- Request Handling: GET, POST, PUT, DELETE operations
- Authentication: Bearer token and API key management
- Retry Logic: Automatic retry with exponential backoff
- Rate Limiting: Request throttling and queue management
- Format Support: CSV, Excel, TXT file handling
- Data Validation: Schema validation and data quality checks
- File Management: Archive, backup, and cleanup operations
- Encoding Handling: UTF-8, ASCII, and other encoding support
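For reference, a cut-down database utility along the lines of the bullets above can be built on SQLAlchemy and pandas; the class and method names here are illustrative, not the template's actual API.

import logging
from typing import Optional

import pandas as pd
from sqlalchemy import create_engine


class DBUtil:
    """Thin wrapper around a pooled SQLAlchemy engine with DataFrame helpers."""

    def __init__(self, conn_string: str, logger: logging.Logger):
        self.engine = create_engine(conn_string, pool_pre_ping=True)  # pooled connections
        self.logger = logger

    def read_df(self, query: str) -> pd.DataFrame:
        try:
            df = pd.read_sql(query, self.engine)
            self.logger.info(f"Read {len(df):,} rows")
            return df
        except Exception as e:
            self.logger.error(f"Database read failed: {e}")
            raise

    def write_df(self, df: pd.DataFrame, table: str, schema: Optional[str] = None) -> None:
        df.to_sql(table, self.engine, schema=schema, if_exists="append", index=False)
        self.logger.info(f"Wrote {len(df):,} rows to {table}")

Similarly, retry with exponential backoff for the API utility can lean on requests plus urllib3's Retry; the status codes and limits below are examples only.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(api_key: str, retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Session with bearer-token auth and exponential backoff on transient errors."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {api_key}"})
    retry = Retry(
        total=retries,
        backoff_factor=backoff,  # 1s, 2s, 4s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST", "PUT", "DELETE"],
    )
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


# Usage: build_session(api_key).get(f"{base_url}/endpoint", timeout=30)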
Comprehensive testing framework with pytest fixtures, test utilities, and examples for unit and integration testing.
📁 tests/
├── test_base.py      # Base class tests
├── test_child.py     # Child class tests
├── test_db_utils.py  # Database utility tests
├── conftest.py       # Pytest fixtures
└── data/             # Test data files
pytest # Run all tests
pytest --cov=scripts # Run with coverage
pytest tests/test_db_utils.py -v # Run specific file
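A typical conftest.py for these tests might define fixtures along these lines; the fixtures shown are examples, not the template's exact set.

# tests/conftest.py - illustrative fixtures
import pandas as pd
import pytest


@pytest.fixture
def sample_data() -> pd.DataFrame:
    """Small in-memory frame mirroring the expected input schema."""
    return pd.DataFrame({"id": [1, 2, 3], "score": [0.2, 0.7, 0.9]})


@pytest.fixture
def tmp_log_file(tmp_path) -> str:
    """Throwaway log file so tests never write into logs/."""
    return str(tmp_path / "test.log")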
The config directory contains project-specific configuration files and data mappings.
Use this directory for:
- Data mappings - Column mappings, data type definitions, validation rules
- Business rules - Thresholds, categories, lookup tables
- Environment configs - Different settings for dev/staging/production
- API configurations - Endpoint mappings, request templates
Template for data transformation and validation configurations:
{
    "column_mappings": {
        "source_column_name": "target_column_name",
        "old_field": "new_field"
    },
    "data_types": {
        "id": "int64",
        "amount": "float64",
        "date": "datetime64[ns]"
    },
    "validation_rules": {
        "required_columns": ["id", "amount"],
        "numeric_columns": ["amount", "quantity"],
        "max_null_percentage": 0.05
    }
}
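Applied in code, a mapping file like the one above could drive a transform step roughly as follows; the config path is a placeholder.

import json

import pandas as pd


def apply_mapping(df: pd.DataFrame, config_path: str = "config/mappings.json") -> pd.DataFrame:
    """Rename, cast, and validate a dataframe according to a mapping config."""
    with open(config_path) as f:
        cfg = json.load(f)

    df = df.rename(columns=cfg["column_mappings"])
    df = df.astype({col: dtype for col, dtype in cfg["data_types"].items() if col in df.columns})

    missing = set(cfg["validation_rules"]["required_columns"]) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return df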
The docs directory contains project documentation, references, and technical specifications.
Use this directory for:
- Database schemas - SQL scripts for table creation, indexes, stored procedures
- API documentation - Endpoint specifications, authentication details, examples
- Data flow diagrams - Visual representations of processes and system architecture
- Business requirements - Functional specifications and business rules documentation
- External references - Links to third-party APIs, vendor documentation
- schema.sql - Database schema definitions and setup scripts for the project's tables
- ERD.png - Entity Relationship Diagram showing table structures and relationships
- api_reference.md - Documentation for external APIs used in the project
- data_dictionary.xlsx - Field definitions, data types, and business meanings
The template includes Windows batch files for automated deployment:
- main/main.bat - Production automation execution with environment setup
- main/tool.bat - GUI tool launcher with environment setup
TBD - This section will be expanded when CI/CD pipelines are implemented. Examples include:
- GitHub Actions or Azure DevOps workflows
- Automated testing and deployment
- Environment-specific configurations
- Containerization with Docker
Links to external documentation and learning resources for commonly used modules.
- SQLAlchemy Documentation
- Pandas Documentation
- Requests Documentation
- Tkinter Tutorial
- Pytest Documentation
Template Version: 1.0