Skip to content

Conversation

jaseemjaskp
Copy link
Contributor

Summary

This PR adds support for two new API patterns to the apihub-python-client:

  1. Doc-splitter APIs - Job ID-based workflow for document splitting
  2. Generic Unstract APIs - Execution ID-based workflow for dynamic endpoints

New Features

🔧 DocSplitterClient

  • File upload with multipart form-data support
  • Job status polling with configurable intervals
  • Binary file download (zip files)
  • Methods: upload(), get_job_status(), download_result(), wait_for_completion()
  • Uses job_id for tracking operations

🚀 GenericUnstractClient

  • Dynamic endpoint support (invoice, contract, receipt, etc.)
  • Execution ID-based tracking
  • Multipart form-data uploads with 'files' field
  • Methods: process(), get_result(), wait_for_completion(), check_status()
  • Uses execution_id for tracking operations

Implementation Details

  • Consistent API Design: Both clients follow existing patterns for consistency
  • Comprehensive Testing: 55 new tests added (94 total tests passing)
  • Type Safety: Full type hints with proper error handling
  • Documentation: Updated README with usage examples and complete API reference
  • Error Handling: All clients share the same ApiHubClientException

API Examples

DocSplitterClient Usage

from apihub_client import DocSplitterClient

doc_client = DocSplitterClient(
    api_key="your-api-key",
    base_url="http://localhost:8005"
)

result = doc_client.upload(
    file_path="document.pdf",
    wait_for_completion=True
)

doc_client.download_result(
    job_id=result["job_id"],
    output_path="result.zip"
)

GenericUnstractClient Usage

from apihub_client import GenericUnstractClient

client = GenericUnstractClient(
    api_key="your-api-key",
    base_url="http://localhost:8005"
)

result = client.process(
    endpoint="invoice",
    file_path="invoice.pdf",
    wait_for_completion=True
)

Files Changed

  • New Files:

    • src/apihub_client/doc_splitter.py - DocSplitterClient implementation
    • src/apihub_client/generic_client.py - GenericUnstractClient implementation
    • test/test_doc_splitter.py - DocSplitterClient tests (21 tests)
    • test/test_generic_client.py - GenericUnstractClient tests (34 tests)
  • Modified Files:

    • src/apihub_client/__init__.py - Export new clients
    • README.md - Add usage examples and API documentation

Testing

  • ✅ All 94 tests passing
  • ✅ Comprehensive coverage of success/failure paths
  • ✅ Performance benchmarks included
  • ✅ Real-world usage scenarios tested
  • ✅ Code formatting and linting checks pass
  • ✅ Type checking passes

Backwards Compatibility

This PR is fully backwards compatible. Existing ApiHubClient functionality remains unchanged, and new clients are additive.

Summary

The client now supports all three API patterns:

  • ApiHubClient: Original extract APIs with file_hash tracking
  • DocSplitterClient: Doc-splitter APIs with job_id tracking
  • GenericUnstractClient: Generic Unstract APIs with execution_id tracking

All functionality is production-ready with comprehensive testing and documentation.

Add support for two new API patterns:
1. Doc-splitter APIs (job_id-based workflow)
2. Generic Unstract APIs (execution_id-based workflow)

## New Features

### DocSplitterClient
- File upload with form-data support
- Job status polling with configurable intervals
- Binary file download (zip files)
- Methods: upload(), get_job_status(), download_result(), wait_for_completion()

### GenericUnstractClient
- Dynamic endpoint support (invoice, contract, receipt, etc.)
- Execution ID-based tracking
- Multipart form-data uploads with 'files' field
- Methods: process(), get_result(), wait_for_completion(), check_status()

## Implementation Details
- Both clients follow existing patterns for consistency
- Comprehensive test coverage (55 new tests)
- Full type safety with proper error handling
- Updated README with usage examples and API documentation
- All clients share the same ApiHubClientException

## Testing
- 94/94 tests passing
- Comprehensive coverage of success/failure paths
- Performance benchmarks included
- Real-world usage scenarios tested
- Remove test/ directory from tox lint and format commands
- Focus tox linting only on src/ directory
- Prevents import sorting conflicts between test files and tox
- Resolves GitHub Actions CI failures
…ling

- Extract status from nested 'data' structure in wait_for_completion
- Support both uppercase and lowercase status values
- Add comprehensive test for nested response format
- Fixes infinite polling issue with real doc-splitter API
- Add comprehensive test_imports.py for package-level imports and metadata testing
- Enhance test_client.py with additional test cases for wait_for_complete methods
- Fix timeout exception test with proper time.time() mocking
- Add tests for client initialization edge cases
- Achieve 100% line coverage (221/221 lines covered)
- All 97 tests now pass successfully

Coverage improvements:
- __init__.py: 0% → 100% (package imports and metadata)
- client.py: ~98.6% → 100% (timeout and edge cases)
- Overall: 46% → 100% (exceeds 85% requirement)
- Update GitHub Action workflow to run all tests in test/ directory
- Update tox configuration to run all test files instead of hardcoded subset
- Fixes coverage failure in CI by including all test files for complete coverage

This ensures that the CI environment runs the same comprehensive test suite
that achieves 100% coverage locally, including:
- test/test_client.py
- test/test_integration.py
- test/test_doc_splitter.py
- test/test_generic_client.py
- test/test_imports.py
- test/test_performance.py
- Remove test/test_performance.py as it's not required for core test coverage
- Maintains 100% coverage with 97 tests instead of 108
- Reduces CI complexity and focuses on functional test coverage
- Performance testing can be added separately if needed in the future
Copy link
Contributor

🧪 Test Report

Test Results

Test Environment

  • Python Version: 3.12
  • OS: Ubuntu Latest
  • Tox Environment: py312

Status

✅ All tests passed successfully!

@jaseemjaskp jaseemjaskp requested a review from nagesh-zip August 24, 2025 06:43
@jaseemjaskp jaseemjaskp merged commit e4d4c6a into main Aug 24, 2025
3 checks passed
@jaseemjaskp jaseemjaskp deleted the feature/add-doc-splitter-and-generic-clients branch August 24, 2025 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant