feat: Core components - exceptions, file handling, and HTTP client #1

jdrhyne · 2025-06-17T17:07:57Z

Overview

This PR implements the core components of the Nutrient DWS Python client library as specified in the Software Design Specification v1.1.

🎯 Major Discovery: Implicit Document Conversion

During implementation and testing with the live API, we discovered that the Nutrient API automatically converts Office documents (DOCX, XLSX, PPTX) to PDF when processing them. This means:

No explicit conversion needed - Just pass Office documents to any method
All methods accept Office documents - rotate_pages(), ocr_pdf(), etc. work seamlessly with DOCX files
Simplified workflows - Users can merge PDFs and Office documents in a single operation

This significantly enhances the library's capabilities beyond the original specification.

Features Implemented

🏗️ Project Structure

Modern Python packaging with src layout
Comprehensive development tooling (pytest, mypy, ruff)
Pre-commit hooks for code quality
GitHub Actions CI/CD pipeline

🔧 Core Components

Custom Exception Hierarchy
- NutrientError (base exception)
- AuthenticationError (401/403 errors)
- APIError (general API errors with status codes)
- ValidationError (request validation failures)
- TimeoutError (request timeouts)
- FileProcessingError (file operation failures)
HTTP Client Layer
- Connection pooling for performance
- Automatic retry logic with exponential backoff
- Bearer token authentication
- Request/response logging capabilities
File Handling Utilities
- Support for multiple input types (paths, Path objects, bytes, file-like objects)
- Automatic streaming for large files (>10MB)
- Memory-efficient processing

📚 API Implementation

Supported Operations (via API testing):
- convert_to_pdf - Leverages implicit conversion
- flatten_annotations - Flatten PDF annotations and forms
- rotate_pages - Rotate specific or all pages
- ocr_pdf - Make PDFs searchable (supports English and German)
- watermark_pdf - Add text/image watermarks
- apply_redactions - Apply existing redaction annotations
- merge_pdfs - Merge multiple files (PDFs and Office docs)

Builder API: Fluent interface for complex workflows

# Works with Office documents too\!
client.build(input_file="report.docx") \
    .add_step("watermark-pdf", {"text": "DRAFT", "width": 200, "height": 100}) \
    .add_step("flatten-annotations") \
    .execute(output_path="processed.pdf")

🧪 Testing

Unit Tests: 82 tests with 92.46% coverage
Live API Testing: Validated all operations against production API
Type Safety: Full mypy type checking
Code Quality: Ruff linting and formatting

📖 Documentation

Comprehensive README with examples
SUPPORTED_OPERATIONS.md documenting all available methods
Inline code documentation
Type hints throughout

🚀 CI/CD

GitHub Actions workflow for Python 3.8-3.12
Automated testing, linting, and type checking
PyPI release automation
Dependabot configuration

Test Results

# Unit Tests
✅ 82 tests passing
✅ 92.46% code coverage

# Live API Tests
✅ All supported operations validated
✅ Implicit conversion confirmed
✅ Error handling verified

# Type Checking
✅ mypy: Success: no issues found

# Linting
✅ ruff: All checks passed

Usage Examples

from nutrient import NutrientClient

# Initialize client
client = NutrientClient(api_key="your-api-key")

# Convert Office document to PDF (implicit conversion)
client.convert_to_pdf("document.docx", output_path="converted.pdf")

# Process Office document directly
client.ocr_pdf("scanned_document.docx", output_path="searchable.pdf")

# Merge PDFs and Office documents together
client.merge_pdfs([
    "report.pdf",
    "data.xlsx",
    "presentation.pptx"
], output_path="combined.pdf")

# Builder API with Office document
client.build(input_file="contract.docx") \
    .add_step("watermark-pdf", {
        "text": "CONFIDENTIAL",
        "width": 300,
        "height": 150,
        "opacity": 0.5
    }) \
    .add_step("flatten-annotations") \
    .execute(output_path="final_contract.pdf")

Checklist

Next Steps

After this PR is merged, the following tasks remain:

Set up documentation site (Sphinx/MkDocs)
Prepare for PyPI publication
Add more integration test scenarios
Performance benchmarking

This implementation follows the Software Design Document specifications and includes significant enhancements discovered during API testing.

- Implement comprehensive exception classes for error handling - Add rich error context with status codes and request IDs - Create dedicated exceptions for auth, validation, timeout, and file errors - Add unit tests for all exception classes with 100% coverage

- Implement HTTPClient with automatic retries for transient errors - Add connection pooling for performance optimization - Handle all API error responses with appropriate exceptions - Support multipart/form-data for file uploads and JSON actions - Add comprehensive unit tests with mocked responses - Include context manager support for proper resource cleanup

- Complete NutrientClient with authentication and configuration - Add Direct API methods generated from common operations - Support both parameter and environment variable API key - Implement _process_file method for handling API requests - Add comprehensive unit tests for client functionality - Include context manager support for proper cleanup

- Complete Builder API with fluent interface for chaining operations - Support multiple document processing steps in a single API call - Map tool names to Build API action types - Add output options configuration (metadata, optimization) - Include comprehensive unit tests for all builder functionality - Support both in-memory and file output options

- Fix file handler to properly extract basenames from Path objects - Update session close tests to match requests library behavior - Add TYPE_CHECKING imports to resolve circular dependencies - Improve type annotations throughout codebase - Fix linting issues identified by ruff - All 82 tests now passing with 92.46% coverage

jdrhyne · 2025-06-17T17:31:43Z

Test Results Update ✅

All tests are now passing! Here's the latest status:

Test Suite

82 tests passing
92.46% code coverage
All unit tests for core components working correctly

Type Checking

All mypy type checking errors resolved
Added proper TYPE_CHECKING imports to handle circular dependencies
Improved type annotations throughout the codebase

Code Quality

Linting with ruff completed successfully
All style issues automatically fixed
Code follows Python best practices

Changes Made

Fixed file handler to properly extract basenames from Path objects
Updated session close tests to match requests library behavior
Resolved circular import issues with TYPE_CHECKING
Improved type safety across all modules

The core components implementation is now complete and ready for review!

- Add CI workflow for testing across Python 3.8-3.12 - Include linting, type checking, and test coverage - Add release workflow for PyPI publishing - Configure Dependabot for dependency updates - Set up caching for faster builds

- Add detailed installation and quick start guide - Include examples for both Direct API and Builder API - Document all available tools and their usage - Add error handling examples - Include development setup instructions - Add contribution guidelines

- Add integration tests for Direct API operations - Add integration tests for Builder API workflows - Test various file input methods (path, bytes, file object) - Test authentication and error handling - Add pytest markers and configuration for test separation - Integration tests require NUTRIENT_API_KEY environment variable

- Change base URL to https://api.pspdfkit.com - Use Bearer token authentication instead of X-Api-Key - Send instructions as JSON string in form data - Update Direct API to use Build API internally - Add Path object support to file handlers - Fix tests passing: 14/22 tests now working

- Remove unsupported methods (convert-to-pdf, export-to-images, etc) - Fix watermark to require width/height parameters - Add OCR language code mapping (en -> english) - Update merge_pdfs to work with Build API - Add comprehensive documentation of supported operations - Update README to reflect only supported features Based on API testing: - Only 6 operations are currently supported - All operations go through the Build API - Watermark requires width/height parameters - OCR supports english/eng/deu languages

- Discovered that the Nutrient API automatically converts Office documents (DOCX, XLSX, PPTX) to PDF - Added convert_to_pdf method that leverages implicit conversion - Updated all Direct API method documentation to reflect Office document support - Updated SUPPORTED_OPERATIONS.md with comprehensive documentation of the discovery - All methods now accept both PDFs and Office documents seamlessly - Updated examples to show mixing PDFs and Office documents in operations like merge This is a significant improvement to the library's capabilities, as users can now: - Convert Office documents to PDF without explicit conversion steps - Use any processing operation (rotate, OCR, watermark, etc.) directly on Office files - Mix PDFs and Office documents in merge operations

jdrhyne added 5 commits June 17, 2025 12:58

jdrhyne added 6 commits June 17, 2025 13:34

ci: add GitHub Actions workflows for CI/CD

07b711c

- Add CI workflow for testing across Python 3.8-3.12 - Include linting, type checking, and test coverage - Add release workflow for PyPI publishing - Configure Dependabot for dependency updates - Set up caching for faster builds

jdrhyne merged commit 7016643 into main Jun 17, 2025
1 of 6 checks passed

jdrhyne deleted the feature/core-components branch June 17, 2025 18:29

jdrhyne mentioned this pull request Jun 21, 2025

Enhancement Roadmap: Comprehensive Feature Plan #9

Closed

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Core components - exceptions, file handling, and HTTP client #1

feat: Core components - exceptions, file handling, and HTTP client #1

Uh oh!

jdrhyne commented Jun 17, 2025 •

edited

Loading

Uh oh!

jdrhyne commented Jun 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: Core components - exceptions, file handling, and HTTP client #1

feat: Core components - exceptions, file handling, and HTTP client #1

Uh oh!

Conversation

jdrhyne commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

🎯 Major Discovery: Implicit Document Conversion

Features Implemented

🏗️ Project Structure

🔧 Core Components

📚 API Implementation

🧪 Testing

📖 Documentation

🚀 CI/CD

Test Results

Usage Examples

Checklist

Next Steps

Uh oh!

jdrhyne commented Jun 17, 2025

Test Results Update ✅

Test Suite

Type Checking

Code Quality

Changes Made

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jdrhyne commented Jun 17, 2025 •

edited

Loading