Merged
98 changes: 74 additions & 24 deletions specifications/SPEC-APPLICATION-SERVICE.md
@@ -1,11 +1,11 @@
---
itemId: SPEC-APPLICATION-SERVICE
itemId: SPEC-APPLICATION-SERVICE
itemTitle: Application Module Specification
itemType: Software Item Spec
itemFulfills: SWR-APPLICATION-1-1, SWR-APPLICATION-1-2
Module: Application
Layer: Domain Service
Version: 0.2.106
itemType: Software Item Spec
itemFulfills: SWR-APPLICATION-1-1, SWR-APPLICATION-1-2, SHR-APPLICATION-3, SWR-APPLICATION-2-12
Module: Application
Layer: Domain Service
Version: 0.2.106
Date: 2025-09-09
---

@@ -88,25 +88,73 @@ application/

### 3.1 Inputs

| Input Type | Source | Code Location |
| -------------------------- | ------------- | ------------------------------------------------------------- |
| **Supported WSI Files** | CLI/GUI | `WSI_SUPPORTED_FILE_EXTENSIONS` in `constants.py` |
| **Application Version ID** | API | `Service.application_run_submit(application_version_id: str)` |
| **Input Items** | API | `Service.application_run_submit(items: list[InputItem])` |
| **Run ID** | API | `Service.application_run_download(run_id: str)` |
| **Upload Chunks** | Configuration | Constants in `_service.py` |
| Input Type | Source | Data Type/Format | Validation Rules | Business Rules |
| -------------------------- | ------------- | ---------------- | ------------------------------------------------------ | ----------------------------------------------- |
| **Supported WSI Files** | CLI/GUI | Path object | Must exist, extension in WSI_SUPPORTED_FILE_EXTENSIONS | File must be readable, format must be supported |
| **Application Version ID** | API | String | Must be valid UUID format | Must correspond to existing application version |
| **Input Items** | API | List[InputItem] | Each item must have valid metadata | Items must match application input schema |
| **Run ID** | API | String | Must be valid UUID format | Must correspond to existing application run |
| **Upload Chunks** | Configuration | Integer | Must be positive value | Configurable based on platform limits |
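
The validation rules in the revised table can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: the helper names are hypothetical, and the extension set is a stand-in because the real `WSI_SUPPORTED_FILE_EXTENSIONS` in `constants.py` is not shown in this diff.

```python
import uuid
from pathlib import Path

# Hypothetical stand-in: the real set is defined in `constants.py`.
WSI_SUPPORTED_FILE_EXTENSIONS = {".dcm", ".tiff", ".svs"}


def is_valid_uuid(value: str) -> bool:
    """Rule for Application Version ID and Run ID: must parse as a UUID."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def is_supported_wsi_file(path: Path) -> bool:
    """Rule for WSI files: must exist and carry a supported extension."""
    return path.is_file() and path.suffix.lower() in WSI_SUPPORTED_FILE_EXTENSIONS


def is_valid_chunk_size(size: int) -> bool:
    """Rule for upload chunks: must be a positive integer."""
    return isinstance(size, int) and not isinstance(size, bool) and size > 0
```

Note that `uuid.UUID` also accepts some non-canonical spellings (for example, hex digits without hyphens), so a stricter format check may be needed if only the hyphenated form should pass.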

### 3.2 Outputs

| Output Type | Destination | Code Location |
| ---------------------- | ---------------- | ------------------------------------------------ |
| **Application Runs** | Platform API | `Service.application_run_submit()` return value |
| **Downloaded Results** | Local filesystem | `Service.application_run_download()` side effect |
| **QuPath Projects** | Local filesystem | QuPath integration when `has_qupath_extra=True` |
| **Progress Updates** | Callback/GUI | `DownloadProgress` model with computed fields |
| **Metadata Reports** | CLI/GUI | CLI commands and service methods |
| Output Type | Destination | Data Type/Format | Success Criteria | Error Conditions |
| ---------------------- | ---------------- | --------------------- | -------------------------------------------------- | --------------------------------------- |
| **Application Runs** | Platform API | ApplicationRun object | Run successfully submitted with valid ID | Platform API failure, validation errors |
| **Downloaded Results** | Local filesystem | Directory structure | All artifacts downloaded to organized directories | Network failure, permission errors |
| **QuPath Projects** | Local filesystem | .qpproj file | Valid QuPath project with input/result integration | QuPath dependency missing, file errors |
| **Progress Updates** | Callback/GUI | DownloadProgress | Real-time progress tracking with normalized values | Callback execution errors |
| **Metadata Reports** | CLI/GUI | Formatted text/JSON | Human-readable metadata display | Processing errors, missing files |

### 3.3 Data Schemas

**InputItem Schema:**

```yaml
InputItem:
  type: object
  properties:
    path:
      type: string
      description: File system path to WSI file
    metadata:
      type: object
      description: Extracted WSI metadata including dimensions and format
    bucket_key:
      type: string
      description: Cloud storage key after upload
  required: [path, metadata]
```

**DownloadProgress Schema:**

```yaml
DownloadProgress:
  type: object
  properties:
    state:
      type: string
      enum: [INITIALIZING, CHECKING, DOWNLOADING, QUPATH_ADD_RESULTS, COMPLETED]
    total_artifact_count:
      type: integer
      description: Total number of artifacts to download
    total_artifact_index:
      type: integer
      description: Current artifact being processed
    item_progress_normalized:
      type: number
      minimum: 0
      maximum: 1
      description: Progress for current item (0-1)
    artifact_progress_normalized:
      type: number
      minimum: 0
      maximum: 1
      description: Overall progress across all artifacts (0-1)
  required: [state, total_artifact_count, total_artifact_index]
```
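
To make the schema above concrete, here is a small sketch of how such a model could derive a normalized value. It uses a plain dataclass rather than the Pydantic model the spec refers to, and the formula for `artifact_progress_normalized` (index divided by count, clamped to 0-1) is an assumption: the diff does not say how the real computed field is calculated or whether `total_artifact_index` is zero-based.

```python
from dataclasses import dataclass
from enum import Enum


class DownloadProgressState(str, Enum):
    """States listed in the schema's `state` enum."""

    INITIALIZING = "INITIALIZING"
    CHECKING = "CHECKING"
    DOWNLOADING = "DOWNLOADING"
    QUPATH_ADD_RESULTS = "QUPATH_ADD_RESULTS"
    COMPLETED = "COMPLETED"


@dataclass
class DownloadProgress:
    state: DownloadProgressState
    total_artifact_count: int
    total_artifact_index: int
    item_progress_normalized: float = 0.0

    @property
    def artifact_progress_normalized(self) -> float:
        """Assumed formula: share of artifacts processed, clamped to [0, 1]."""
        if self.total_artifact_count <= 0:
            return 0.0
        ratio = self.total_artifact_index / self.total_artifact_count
        return min(1.0, max(0.0, ratio))
```

Clamping keeps the value inside the schema's `minimum`/`maximum` bounds even if the index briefly overshoots the count.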

### 3.3 Data Flow
### 3.4 Data Flow

```mermaid
graph TD
@@ -371,20 +419,22 @@ Configuration is managed through environment variables with the prefix `AIGNOSTI

## 9. Implementation Details

### 9.1 Key Algorithms
### 9.1 Key Algorithms and Business Logic

- **Metadata Generation Pipeline**: Multi-stage pipeline for WSI file discovery, metadata extraction, and validation
- **Progress Tracking Algorithm**: Normalized progress calculation with multi-level aggregation across files and operations
- **Chunked Upload Algorithm**: Memory-efficient streaming upload with integrity verification and resume capability

### 9.2 State Management
### 9.2 State Management and Data Flow

- **Configuration State**: Environment-aware settings management with Pydantic validation and secure credential handling
- **Runtime State**: Progress tracking state persistence for resumable operations and error recovery
- **Cache Management**: Platform client caching with lazy initialization and automatic session management

### 9.3 Concurrency and Threading
### 9.3 Performance and Scalability Considerations

- **Async Operations**: Asynchronous file upload/download operations with configurable concurrency limits
- **Thread Safety**: Thread-safe progress tracking and state management with queue-based communication
- **Resource Management**: Proper cleanup of network connections and file handles with context managers
- **Memory Efficiency**: Handle multi-gigabyte files through streaming and chunked operations
- **Scalability Patterns**: Integration with cloud storage services for horizontal scaling
121 changes: 77 additions & 44 deletions specifications/SPEC-BUCKET-SERVICE.md
@@ -1,11 +1,11 @@
---
itemId: SPEC-BUCKET-SERVICE
itemId: SPEC-BUCKET-SERVICE
itemTitle: Bucket Module Specification
itemType: Software Item Spec
itemFulfills: SWR-BUCKET-1-1, SWR-BUCKET-1-2, SWR-BUCKET-1-3
Module: Bucket
Layer: Domain Service
Version: 0.2.105
itemType: Software Item Spec
itemFulfills: SWR-BUCKET-1-1, SWR-BUCKET-1-2, SWR-BUCKET-1-3
Module: Bucket
Layer: Domain Service
Version: 0.2.105
Date: 2025-09-09
---

@@ -90,25 +90,71 @@ def read_in_chunks():

### 3.1 Inputs

| Input Type | Source | Format/Type | Validation Rules | Code Location |
| ------------------ | ------------- | --------------- | --------------------------------------------------------- | ------------------------------------------------------ |
| Bucket Name | CLI/GUI/API | String | Must match GCS bucket naming conventions | `_service.py::Service.get_bucket_name()` method |
| Object Key/Pattern | CLI/GUI/API | String/Regex | Valid path characters, regex patterns for bulk operations | `_service.py` upload/download methods, `_cli.py` args |
| Local File Path | CLI/GUI/API | Path | Must exist for upload, valid directory for download | `_cli.py` typer Path validation, `_service.py` checks |
| Credentials | Environment | HMAC Key Pair | Required AIGNOSTICS_BUCKET_HMAC_* variables | `_settings.py::Settings` environment variable binding |
| Protocol | Configuration | String | Must be "gs" or "s3" | `_settings.py::BucketProtocol` enum validation |
| Input Type | Source | Data Type/Format | Validation Rules | Business Rules |
| ------------------ | ------------- | ---------------- | --------------------------------------------------------- | ------------------------------------------------------- |
| Bucket Name | CLI/GUI/API | String | Must match GCS bucket naming conventions | Must correspond to accessible cloud storage bucket |
| Object Key/Pattern | CLI/GUI/API | String/Regex | Valid path characters, regex patterns for bulk operations | Keys must follow cloud storage path conventions |
| Local File Path | CLI/GUI/API | Path | Must exist for upload, valid directory for download | File must be readable, directories must be writable |
| Credentials | Environment | HMAC Key Pair | Required AIGNOSTICS_BUCKET_HMAC_* variables | Keys must have appropriate bucket permissions |
| Protocol | Configuration | String | Must be "gs" or "s3" | Protocol must match configured cloud storage provider |

### 3.2 Outputs

| Output Type | Destination | Format/Type | Success Criteria | Code Location |
| ---------------- | --------------- | ---------------- | --------------------------------------------- | ------------------------------------------------------- |
| Uploaded Files | Cloud Storage | Binary/Metadata | Successful S3 PUT with ETag confirmation | `_service.py::Service.upload()` method return |
| Downloaded Files | Local Filesystem| Binary | Complete download with ETag validation | `_service.py::Service.download()` method with progress |
| Signed URLs | Client/Platform | HTTPS URL | Valid URL with correct expiration time | `_service.py::Service.create_signed_*_url()` methods |
| Progress Updates | CLI/GUI | Progress Models | Real-time byte-level progress information | `_service.py::DownloadProgress/UploadProgress` models |
| Operation Status | Logs/Console | Structured Logs | Success/failure with detailed error messages | `_cli.py` console output, `_service.py` logger calls |
| Output Type | Destination | Data Type/Format | Success Criteria | Error Conditions |
| ---------------- | ---------------- | ---------------- | --------------------------------------------- | ------------------------------------------- |
| Uploaded Files | Cloud Storage | Binary/Metadata | Successful S3 PUT with ETag confirmation | Network failure, permission errors |
| Downloaded Files | Local Filesystem | Binary | Complete download with ETag validation | Disk space issues, permission errors |
| Signed URLs | Client/Platform | HTTPS URL | Valid URL with correct expiration time | Credential errors, invalid object keys |
| Progress Updates | CLI/GUI | Progress Models | Real-time byte-level progress information | Callback execution errors |
| Operation Status | Logs/Console | Structured Logs | Success/failure with detailed error messages | Logging system failures |

### 3.3 Data Schemas

**DownloadProgress Schema:**

```yaml
DownloadProgress:
  type: object
  properties:
    total_bytes:
      type: integer
      description: Total bytes to download
    downloaded_bytes:
      type: integer
      description: Bytes downloaded so far
    current_file:
      type: string
      description: Current file being downloaded
    progress_percentage:
      type: number
      minimum: 0
      maximum: 100
      description: Download progress as percentage
  required: [total_bytes, downloaded_bytes]
```

**UploadProgress Schema:**

```yaml
UploadProgress:
  type: object
  properties:
    total_bytes:
      type: integer
      description: Total bytes to upload
    uploaded_bytes:
      type: integer
      description: Bytes uploaded so far
    current_file:
      type: string
      description: Current file being uploaded
    upload_speed:
      type: number
      description: Upload speed in bytes per second
  required: [total_bytes, uploaded_bytes]
```
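
The derived fields in the two schemas above (`progress_percentage` and `upload_speed`) could be computed along these lines. The function names and the zero-handling are assumptions made for illustration; the diff does not show how the actual progress models compute them.

```python
def progress_percentage(done_bytes: int, total_bytes: int) -> float:
    """Percentage of bytes transferred, clamped to the schema's 0-100 range."""
    if total_bytes <= 0:
        # Assumed convention: report 0% when the total size is unknown.
        return 0.0
    return min(100.0, max(0.0, 100.0 * done_bytes / total_bytes))


def upload_speed(uploaded_bytes: int, elapsed_seconds: float) -> float:
    """Average transfer speed in bytes per second since the upload started."""
    if elapsed_seconds <= 0:
        return 0.0
    return uploaded_bytes / elapsed_seconds
```

An average over the whole transfer is the simplest choice; a moving window would react faster to changes in throughput but needs extra state.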

### 3.3 Data Flow
### 3.4 Data Flow

```mermaid
graph LR
@@ -305,39 +351,26 @@ uvx aignostics bucket [subcommand] [options]

---

## 9. Testing and Quality Assurance

### 9.1 Testing Strategy

- **Unit Tests**: Mock S3 client for isolated service testing, validate all public methods
- **Integration Tests**: Real cloud storage operations in test environment
- **Performance Tests**: Large file upload/download benchmarks, concurrent operation testing
- **Security Tests**: Credential handling validation, input sanitization verification

### 9.2 Quality Metrics

- **Code Coverage**: Minimum 80% test coverage for service layer
- **Performance Benchmarks**: <30s for 1GB file operations, <5s for signed URL generation
- **Reliability Targets**: 99.9% operation success rate, <1% data corruption tolerance

---

## 10. Implementation Details
## 9. Implementation Details

### 10.1 Key Algorithms
### 9.1 Key Algorithms and Business Logic

- **Chunked Transfer**: Adaptive chunk sizing based on operation type (1MB upload, 10MB download, 100MB ETag)
- **ETag Caching**: MD5-based content comparison to avoid redundant downloads
- **Progress Calculation**: Byte-level progress tracking with transfer speed estimation
- **Pattern Matching**: Regex-based object filtering for bulk operations and content discovery
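
The chunked reading, ETag comparison, and regex filtering listed above can be sketched as follows. This is a hedged illustration under stated assumptions: the function names and signatures are hypothetical (the spec's own `read_in_chunks()` is not shown in this diff), and the 100MB constant simply mirrors the "100MB ETag" figure in the list. An S3-style ETag equals the plain MD5 digest only for objects uploaded in a single part without KMS encryption, so the comparison below does not hold for multipart uploads.

```python
import hashlib
import re
from collections.abc import Iterator
from pathlib import Path

ETAG_CHUNK_SIZE = 100 * 1024 * 1024  # mirrors the "100MB ETag" figure; assumed value


def read_in_chunks(path: Path, chunk_size: int) -> Iterator[bytes]:
    """Yield the file's content piece by piece so it never sits fully in memory."""
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def md5_hex(path: Path, chunk_size: int = ETAG_CHUNK_SIZE) -> str:
    """Compute the MD5 hex digest of a file incrementally."""
    digest = hashlib.md5()
    for chunk in read_in_chunks(path, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path: Path, remote_etag: str) -> bool:
    """Skip the download when the local file's MD5 matches the remote ETag."""
    if not local_path.is_file():
        return True
    # ETags are usually returned wrapped in double quotes.
    return md5_hex(local_path) != remote_etag.strip('"')


def filter_keys(keys: list[str], pattern: str) -> list[str]:
    """Keep only object keys that match the regular expression."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.search(key)]
```

For example, `filter_keys(["a/x.tiff", "a/y.txt"], r"\.tiff$")` keeps only the first key, and `needs_download` returns `True` for any path that does not exist locally.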

### 10.2 State Management
### 9.2 State Management and Data Flow

- **Configuration State**: Settings cached from environment variables with lazy loading
- **Runtime State**: Progress models maintain operation state with real-time updates
- **Cache Management**: ETag-based file validation cache for efficient re-download detection
- **Session Management**: S3 client connection pooling and automatic retry mechanisms

### 10.3 Concurrency and Threading
### 9.3 Performance and Scalability Considerations

- **Async Operations**: Generator-based progress callbacks for non-blocking UI updates
- **Thread Safety**: Immutable progress models, thread-safe logging configuration
- **Memory Efficiency**: Streaming operations for large files with configurable chunk sizes
- **Network Optimization**: Connection pooling, retry mechanisms, and bandwidth throttling
- **Concurrent Operations**: Thread-safe progress tracking and parallel transfer support
- **Resource Management**: Proper cleanup of S3 client connections and file handles
- **Scalability Patterns**: Support for high-throughput operations with memory constraints
3 changes: 1 addition & 2 deletions specifications/SPEC-BUILD-CHAIN-CICD-SERVICE.md
@@ -2,8 +2,7 @@
itemId: SPEC-BUILD-CHAIN-CICD-SERVICE
itemTitle: Build Chain and CI/CD Module Specification
itemType: Software Item Spec
itemFulfills: SWR-BUILD-CHAIN-1 _(Comprehensive Build Chain and CI/CD Pipeline)_
Module: Build Chain and CI/CD
itemFulfills: TBD _(No infrastructure requirements currently defined)_Module: Build Chain and CI/CD
Layer: Infrastructure Service
Version: 0.2.140
Date: 2025-09-11
23 changes: 9 additions & 14 deletions specifications/SPEC-DATASET-SERVICE.md
@@ -1,22 +1,14 @@
---
itemId: SPEC-DATASET-SERVICE
itemTitle: Dataset Module Specification
itemType: Software Item Spec
itemFulfills: FE-6386
Module: Dataset
Layer: Domain Service
Version: 0.2.105
itemType: Software Item Spec
itemFulfills: SHR-DATASET-1, SWR-DATASET-1-1, SWR-DATASET-1-3
Module: Dataset
Layer: Domain Service
Version: 0.2.105
Date: 2025-09-11
---

## Documentation Guidelin| Parameter | Type | Default | Description | Required |

| -------------------- | ---- | ---------------------------------------------------------------- | --------------------------------- | -------- |
| `target_layout` | str | `%collection_id/%PatientID/%StudyInstanceUID/%Modality_%SeriesInstanceUID/` | Directory layout template | No |
| `portal_url` | str | `https://portal.imaging.datacommons.cancer.gov/explore/` | IDC portal base URL | No |
| `example_dataset_id` | str | `1.3.6.1.4.1.5962.99.1.1069745200.1645485340.1637452317744.2.0` | Example dataset for testing | No |
| `path_length_max` | int | 260 | Maximum path length (Windows) | No |# Code in Specifications - Best Practices

## 1. Description

### 1.1 Purpose
@@ -380,4 +372,7 @@ _Note: For exact version requirements, refer to `pyproject.toml` and dependency
**API Documentation**: Auto-generated from docstrings in service classes

---
````

```

```
2 changes: 1 addition & 1 deletion specifications/SPEC-MODULE-SERVICE-TEMPLATE.md
@@ -9,7 +9,7 @@ Version: [VERSION] _(e.g., 0.2.105)_
Date: [DATE]
---

## Documentation Guidelines
## Documentation Guidelines [DO NOT ADD]

### Code in Specifications - Best Practices
