feat(sdk/ts): Add multimodal helpers#321
Conversation
Performance
⚠ Regression detected:
|
santoshkumarradha
left a comment
There was a problem hiding this comment.
Code Review — PR #321: Multimodal Helpers
Overall Assessment: ✅ Good implementation
The PR adds 942 lines across two well-structured files that closely follow the Python reference implementation. The API surface matches the issue spec.
Strengths
- Clean API design —
Image.fromFile(),Audio.fromUrl(),File.fromBuffer()factory methods match the issue requirements exactly - Private constructors — Forces use of factory methods, preventing invalid state
- MIME type inference — Automatically determines content type from file extensions
- Comprehensive response handler —
MultimodalResponsewithhasAudio(),hasImage(),save()methods - No
anytypes — Proper TypeScript typing throughout - JSDoc comments on all public methods
Issues to Address
-
Image.fromUrlandAudio.fromUrlare async but don't await anything — ThefromUrlmethods returnPromise<Image>but just construct synchronously. Should either be sync or actually fetch/validate the URL. -
Missing export from index — Neither file is re-exported from the SDK's main entry point. Users can't
import { Image } from '@agentfield/sdk'without an index update. -
fetchdependency —Audio.fromUrluses globalfetch()but this isn't available in Node.js < 18. Should document the requirement or add a polyfill guard. -
No input validation —
fromFiledoesn't check if the file exists before reading. A clear error message like "File not found: /path" would be better than a raw ENOENT. -
MultimodalResponse.save()creates directories implicitly — Usesfs.mkdirwithrecursive: truewhich is fine, but should be documented.
Minor Suggestions
- Consider making
fromUrlsync since it doesn't do I/O (just stores the URL) - Add
.svgtoIMAGE_MIME_TYPES(image/svg+xml) - The
Textclass could extend or implement a commonContentPartinterface for type discrimination
Verdict
Good first implementation. Address items 1-2 (async consistency + index export) before merge. Items 3-5 are nice-to-haves for a follow-up.
santoshkumarradha
left a comment
There was a problem hiding this comment.
PR #321 Review: MultimodalResponse.ts Implementation
Summary
PR #321 adds a new TypeScript SDK file (sdk/typescript/src/ai/MultimodalResponse.ts, 537 lines) implementing multimodal response handling. Analysis based on PR diff and comparison against Issue #91 requirements reveals 2 missing functions and field name inconsistencies that need clarification before merge.
Overall Assessment: ~80% aligned with requirements. MEDIUM severity blockers identified.
Missing Functionality (Critical)
| Function | Required | Status | Action Needed |
|---|---|---|---|
| toMultimodalResponse | Yes | NOT FOUND | Clarify if alias for createMultimodalResponse |
| parseMultimodalResponse | Yes | NOT FOUND | Clarify if alias for createMultimodalResponse |
Recommendation: Confirm with author if these are aliases or need separate implementation.
Field Name Inconsistencies (Medium)
| Interface | Python SDK (Expected) | PR #321 (Actual) | Impact |
|---|---|---|---|
| ImageOutput.base64 | base64 | b64Json | Naming mismatch |
| ImageOutput.mimeType | mimeType | (missing) | Field missing |
| AudioOutput.base64 | base64 | data | Naming mismatch |
| AudioOutput.mimeType | mimeType | format | Naming mismatch |
Recommendation: Decide whether to align with Python SDK conventions (base64, mimeType) or keep TypeScript-native naming (b64Json, format).
Convention Violations
Note: Direct file analysis blocked - files not in this repo. Based on PR diff review.
- Underscore prefix on private fields - _text, _audio, _images, _files, _rawResponse, _costUsd, _usage violates TS SDK convention
- Missing rate limiting - No withRateLimitRetry pattern in MultimodalResponse.ts
- Duplicated base64 decoding - getImageBytes, getAudioBytes, getFileBytes repeat similar logic
- Inconsistent naming - hasImage vs other patterns
- Missing usage/cost extraction - createMultimodalResponse factory doesn't extract usage/cost
- No error context - Generic error messages lack debugging context
Recommendations
- Add or clarify toMultimodalResponse and parseMultimodalResponse functions
- Standardize field names to match Python SDK or document why TS diverges
- Consider refactoring base64 decoding into a shared utility to reduce duplication
- Add rate limiting using existing withRateLimitRetry pattern from codebase
- Add error context to improve debuggability
Data Sources
- PR #321 diff (task t-83e0)
- Issue #91 requirements analysis (task t-04ff)
- Gap analysis (task t-d6d0)
- TS SDK conventions reference (task t-6102)
Status: Ready for author response on missing functions and naming conventions.
|
Ignore tests for https://github.com/Agent-Field/SWE-AF updates |
Implements #91 — Image, Audio, File input helpers and MultimodalResponse output handler. Closes #91