
feat(sdk/ts): Add multimodal helpers #321

Closed
santoshkumarradha wants to merge 1 commit into main from feat/ts-multimodal-helpers


@santoshkumarradha
Member

Implements #91: Image, Audio, and File input helpers plus a MultimodalResponse output handler. Closes #91.

santoshkumarradha requested review from a team and AbirAbbas as code owners on April 1, 2026, 14:09
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Performance

| SDK | Memory | Δ | Latency | Δ | Tests | Status |
|-----|--------|------|---------|------|-------|--------|
| TS | 465 B | +33% | 1.36 µs | -32% | | |

Regression detected:

  • TypeScript memory: 350 B → 465 B (+33%)

Member Author

santoshkumarradha left a comment


Code Review — PR #321: Multimodal Helpers

Overall Assessment: ✅ Good implementation

The PR adds 942 lines across two well-structured files that closely follow the Python reference implementation. The API surface matches the issue spec.

Strengths

  1. Clean API design: Image.fromFile(), Audio.fromUrl(), and File.fromBuffer() factory methods match the issue requirements exactly
  2. Private constructors: force use of the factory methods, preventing invalid state
  3. MIME type inference: automatically determines the content type from file extensions
  4. Comprehensive response handler: MultimodalResponse with hasAudio(), hasImage(), and save() methods
  5. No any types: proper TypeScript typing throughout
  6. JSDoc comments on all public methods
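To make strengths 2 and 3 concrete, here is a minimal sketch of the private-constructor plus static-factory pattern with extension-based MIME inference. It is not the code from PR #321: the class name ImagePart, the IMAGE_MIME_TYPES subset, and the application/octet-stream fallback are all assumptions for illustration, and fromBuffer relies on Node's Buffer.

```typescript
// Hypothetical sketch, not the code from PR #321: a private constructor plus
// static factories, with the MIME type inferred from the file extension.
const IMAGE_MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
  ".svg": "image/svg+xml",
};

const FALLBACK_MIME = "application/octet-stream";

// Look up the extension of a path or URL, ignoring any query string or fragment.
function inferImageMimeType(source: string): string {
  const clean = source.split(/[?#]/)[0].toLowerCase();
  const dot = clean.lastIndexOf(".");
  return dot === -1 ? FALLBACK_MIME : (IMAGE_MIME_TYPES[clean.slice(dot)] ?? FALLBACK_MIME);
}

class ImagePart {
  readonly source: "url" | "base64";
  readonly value: string;
  readonly mimeType: string;

  // Private: callers must go through a factory, so every instance is well-formed.
  private constructor(source: "url" | "base64", value: string, mimeType: string) {
    this.source = source;
    this.value = value;
    this.mimeType = mimeType;
  }

  // Synchronous: only records the URL, no network I/O.
  static fromUrl(url: string): ImagePart {
    return new ImagePart("url", url, inferImageMimeType(url));
  }

  // Encodes raw bytes as base64 using Node's Buffer.
  static fromBuffer(data: Uint8Array, mimeType: string): ImagePart {
    return new ImagePart("base64", Buffer.from(data).toString("base64"), mimeType);
  }
}
```

Because `new ImagePart(...)` is rejected by the compiler outside the class, the only way to obtain an instance is through a factory that always fills in a MIME type.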

Issues to Address

  1. Image.fromUrl and Audio.fromUrl are async but don't await anything: the fromUrl methods return Promise<Image> but construct the object synchronously. They should either be sync or actually fetch/validate the URL.

  2. Missing export from index: neither file is re-exported from the SDK's main entry point, so users can't import { Image } from '@agentfield/sdk' until the index is updated.

  3. fetch dependency: Audio.fromUrl uses the global fetch(), which isn't available by default in Node.js < 18. The requirement should be documented or protected with a polyfill guard.

  4. No input validation: fromFile doesn't check whether the file exists before reading. A clear error message such as "File not found: /path" would be better than a raw ENOENT.

  5. MultimodalResponse.save() creates directories implicitly: it calls fs.mkdir with recursive: true, which is fine but should be documented.
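Items 3 to 5 can be addressed with small helpers. The sketch below is hypothetical: the names requireFetch, readInputFile, and saveBytes are invented for illustration and do not come from the PR. It guards against a missing global fetch(), turns a raw ENOENT into a readable "File not found" error, and creates parent directories before writing, mirroring what the review says save() does.

```typescript
// Hypothetical helpers sketching review items 3-5; not the code from PR #321.
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { dirname } from "node:path";

// Item 3: fail fast with a clear message when global fetch() is missing (Node.js < 18).
function requireFetch(): void {
  if (typeof (globalThis as { fetch?: unknown }).fetch !== "function") {
    throw new Error("global fetch() is unavailable: use Node.js 18+ or install a fetch polyfill");
  }
}

// Item 4: translate a raw ENOENT into a readable "File not found" error.
async function readInputFile(filePath: string): Promise<Uint8Array> {
  try {
    return await readFile(filePath);
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") {
      throw new Error(`File not found: ${filePath}`);
    }
    throw err; // Other errors (EACCES, EISDIR, ...) pass through unchanged.
  }
}

// Item 5: create missing parent directories before writing the bytes.
async function saveBytes(filePath: string, data: Uint8Array): Promise<void> {
  await mkdir(dirname(filePath), { recursive: true });
  await writeFile(filePath, data);
}
```

Catching only ENOENT keeps permission and other I/O errors visible with their original codes instead of hiding them behind a misleading "not found" message.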

Minor Suggestions

  • Consider making fromUrl sync since it doesn't do I/O (just stores the URL)
  • Add .svg to IMAGE_MIME_TYPES (image/svg+xml)
  • The Text class could extend or implement a common ContentPart interface for type discrimination
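The last suggestion, a common ContentPart type, could take the form of a discriminated union. In this sketch the part shapes, the `type` tag values, and the field names are assumptions rather than the PR's definitions; it only shows how a shared tag lets the compiler narrow each branch and flag unhandled variants.

```typescript
// Hypothetical part shapes; field names and tag values are assumptions.
interface TextPart { type: "text"; text: string; }
interface ImageUrlPart { type: "image"; url: string; mimeType: string; }
interface AudioPart { type: "audio"; base64: string; mimeType: string; }
interface FilePart { type: "file"; base64: string; mimeType: string; name: string; }

type ContentPart = TextPart | ImageUrlPart | AudioPart | FilePart;

// The `type` tag narrows each case; the `never` assignment turns a newly
// added, unhandled variant into a compile-time error.
function describePart(part: ContentPart): string {
  switch (part.type) {
    case "text":
      return `text(${part.text.length} chars)`;
    case "image":
      return `image ${part.mimeType} from ${part.url}`;
    case "audio":
      return `audio ${part.mimeType}`;
    case "file":
      return `file ${part.name} (${part.mimeType})`;
    default: {
      const unreachable: never = part;
      return unreachable;
    }
  }
}
```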

Verdict

Good first implementation. Address items 1-2 (async consistency + index export) before merge. Items 3-5 are nice-to-haves for a follow-up.

Member Author

santoshkumarradha left a comment


PR #321 Review: MultimodalResponse.ts Implementation

Summary

PR #321 adds a new TypeScript SDK file (sdk/typescript/src/ai/MultimodalResponse.ts, 537 lines) that implements multimodal response handling. Comparing the PR diff against the Issue #91 requirements reveals two missing functions and field-name inconsistencies that need clarification before merge.

Overall Assessment: ~80% aligned with requirements. MEDIUM severity blockers identified.


Missing Functionality (Critical)

| Function | Required | Status | Action Needed |
|----------|----------|--------|---------------|
| toMultimodalResponse | Yes | NOT FOUND | Clarify if alias for createMultimodalResponse |
| parseMultimodalResponse | Yes | NOT FOUND | Clarify if alias for createMultimodalResponse |

Recommendation: Confirm with author if these are aliases or need separate implementation.
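If the answer is "aliases", the fix is a one-line export per name. The sketch below uses a stand-in createMultimodalResponse whose signature and return shape are placeholders, not the PR's API. It also shows an alternative, under the unconfirmed assumption that parseMultimodalResponse is meant to accept serialized JSON, where a thin wrapper parses first and then delegates.

```typescript
// Placeholder shape; the real MultimodalResponse in PR #321 carries more fields.
interface ResponseSketch {
  text: string;
}

// Stand-in for the PR's factory; assumes it accepts a raw provider payload.
function createMultimodalResponse(raw: unknown): ResponseSketch {
  const text =
    typeof raw === "object" && raw !== null && typeof (raw as { text?: unknown }).text === "string"
      ? (raw as { text: string }).text
      : "";
  return { text };
}

// Pure alias: one implementation, identical behaviour under both names.
const toMultimodalResponse = createMultimodalResponse;

// Assumed variant: accept a JSON string or an already-parsed object, then delegate.
function parseMultimodalResponse(input: string | object): ResponseSketch {
  return createMultimodalResponse(typeof input === "string" ? JSON.parse(input) : input);
}
```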


Field Name Inconsistencies (Medium)

| Interface field | Python SDK (Expected) | PR #321 (Actual) | Impact |
|-----------------|-----------------------|------------------|--------|
| ImageOutput.base64 | base64 | b64Json | Naming mismatch |
| ImageOutput.mimeType | mimeType | (missing) | Field missing |
| AudioOutput.base64 | base64 | data | Naming mismatch |
| AudioOutput.mimeType | mimeType | format | Naming mismatch |

Recommendation: Decide whether to align with Python SDK conventions (base64, mimeType) or keep TypeScript-native naming (b64Json, format).
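If the TS-native names are kept, a small adapter could still expose Python-aligned shapes. This is a hypothetical sketch built only from the table above. Because the PR's ImageOutput has no MIME field, the caller must supply one (the image/png default is an assumption), and the format-to-MIME mapping for audio is also an assumption, with unknown formats falling back to audio/<format>.

```typescript
// Shapes from the review table: PR #321 naming vs. Python SDK naming.
interface TsImageOutput { b64Json: string; }
interface TsAudioOutput { data: string; format: string; }
interface PyAlignedOutput { base64: string; mimeType: string; }

// Hypothetical adapter; the default MIME type is an assumption, not PR behaviour.
function toPyImage(img: TsImageOutput, mimeType = "image/png"): PyAlignedOutput {
  return { base64: img.b64Json, mimeType };
}

// Assumed mapping from short audio format names to MIME types.
const AUDIO_FORMAT_MIME: Record<string, string> = {
  mp3: "audio/mpeg",
  wav: "audio/wav",
  ogg: "audio/ogg",
  flac: "audio/flac",
};

function toPyAudio(audio: TsAudioOutput): PyAlignedOutput {
  const fmt = audio.format.toLowerCase();
  return { base64: audio.data, mimeType: AUDIO_FORMAT_MIME[fmt] ?? `audio/${fmt}` };
}
```

An adapter like this lets both naming schemes coexist, but it also shows the cost of divergence: the missing ImageOutput.mimeType has to be guessed or passed in.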


Convention Violations

Note: Direct file analysis was blocked because the files are not in this repo; the findings below are based on the PR diff review.

  1. Underscore prefix on private fields: _text, _audio, _images, _files, _rawResponse, _costUsd, and _usage violate the TS SDK convention
  2. Missing rate limiting: MultimodalResponse.ts does not use the withRateLimitRetry pattern
  3. Duplicated base64 decoding: getImageBytes, getAudioBytes, and getFileBytes repeat similar logic
  4. Inconsistent naming: hasImage vs. other patterns
  5. Missing usage/cost extraction: the createMultimodalResponse factory doesn't extract usage or cost
  6. No error context: generic error messages lack debugging context

Recommendations

  1. Add or clarify the toMultimodalResponse and parseMultimodalResponse functions
  2. Standardize field names to match the Python SDK, or document why the TS SDK diverges
  3. Consider refactoring base64 decoding into a shared utility to reduce duplication
  4. Add rate limiting using the existing withRateLimitRetry pattern from the codebase
  5. Add error context to improve debuggability
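Recommendations 3 and 5 can be combined: one shared decoder, with the caller's name included in any error. This is a hypothetical sketch; decodeBase64Payload is an invented name, the getter bodies are not the PR's, and accepting a "data:<mime>;base64," prefix is an assumption. It validates before decoding because Node's Buffer.from(..., "base64") silently skips invalid characters instead of throwing.

```typescript
// Standard base64 alphabet with up to two padding characters.
const BASE64_RE = /^[A-Za-z0-9+\/]*={0,2}$/;

// Hypothetical shared helper: strips an optional data-URL prefix, validates,
// and decodes. `context` names the caller so errors are easy to trace.
function decodeBase64Payload(payload: string, context: string): Uint8Array {
  const comma = payload.startsWith("data:") ? payload.indexOf(",") : -1;
  const body = (comma >= 0 ? payload.slice(comma + 1) : payload).replace(/\s+/g, "");
  if (body.length % 4 !== 0 || !BASE64_RE.test(body)) {
    throw new Error(
      `${context}: invalid base64 payload (length ${body.length}, starts with "${body.slice(0, 16)}")`,
    );
  }
  return new Uint8Array(Buffer.from(body, "base64"));
}

// The three getters named in the review then reduce to one-liners (bodies hypothetical).
const getImageBytes = (b64: string) => decodeBase64Payload(b64, "getImageBytes");
const getAudioBytes = (b64: string) => decodeBase64Payload(b64, "getAudioBytes");
const getFileBytes = (b64: string) => decodeBase64Payload(b64, "getFileBytes");
```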

Data Sources

  • PR #321 diff (task t-83e0)
  • Issue #91 requirements analysis (task t-04ff)
  • Gap analysis (task t-d6d0)
  • TS SDK conventions reference (task t-6102)

Status: Ready for author response on missing functions and naming conventions.

@santoshkumarradha
Member Author

Ignore tests for https://github.com/Agent-Field/SWE-AF updates


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TypeScript SDK] Add helpers for image, audio, and file inputs/outputs

1 participant