Skip to content

fix: replace insecure pickle deserialization with JSON serialization#4747

Closed
devin-ai-integration[bot] wants to merge 2 commits into
mainfrom
devin/1772809995-fix-insecure-pickle-deserialization
Closed

fix: replace insecure pickle deserialization with JSON serialization#4747
devin-ai-integration[bot] wants to merge 2 commits into
mainfrom
devin/1772809995-fix-insecure-pickle-deserialization

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot commented Mar 6, 2026

Summary

Fixes #4746 — replaces all pickle serialization/deserialization with JSON-based alternatives to prevent arbitrary code execution via insecure deserialization (CWE-502).

Changes:

  • file_handler.py: PickleHandler now uses json.dump/json.load instead of pickle. File extension changed from .pkl to .json. Includes a one-time migration path that reads legacy .pkl files and converts them to .json.
  • upload_cache.py: Replaced PickleSerializer with a custom _CachedUploadSerializer (extends JsonSerializer) that handles CachedUpload dataclass round-tripping via a __cached_upload__ marker field.
  • file_store.py: Swapped PickleSerializerJsonSerializer for the in-memory file store cache.
  • agent_card.py: Swapped PickleSerializerJsonSerializer for the @cached decorator on agent card fetching.
  • Updated all related tests and added new security-focused tests.

Review & Testing Checklist for Human

  • Integer key coercion in training data: JSON converts integer dict keys to strings. The tests were updated to use str(iteration), but verify that all production consumers of CrewTrainingHandler.load() (in crew.py, crew_agent_executor.py, agent/core.py) correctly handle string keys instead of integer keys. This is a behavioral change that could break training workflows.
  • file_store.pyJsonSerializer with complex objects: The file store caches FileInput objects. Verify JsonSerializer can handle these; no tests cover this path. If FileInput isn't JSON-serializable, this will break file store set/get operations.
  • agent_card.pyJsonSerializer with AgentCard: The @cached decorator now uses JsonSerializer for AgentCard objects. Verify AgentCard (from a2a.types) is JSON-serializable. If not, the A2A cached agent card fetch will fail.
  • Migration path still uses pickle.load: _migrate_legacy_pkl() calls pickle.load() on existing .pkl files as a one-time migration. An attacker who can write a .pkl to the working directory before the first load could still achieve code execution through this path. Evaluate whether this migration is acceptable or if legacy files should just be discarded.
  • Constants file not updated: TRAINING_DATA_FILE and TRAINED_AGENTS_DATA_FILE in constants.py still say .pkl. The code works (it strips .pkl and appends .json), but the constants should probably be renamed for clarity.

Test Plan

  1. Training handler: Create a crew, run crew.train() multiple times, verify training data persists correctly and integer iteration keys work.
  2. File uploads: Upload a file via crewai-files, verify upload cache correctly stores and retrieves CachedUpload entries.
  3. Agent card fetching: Test A2A agent card caching to confirm AgentCard JSON serialization works.
  4. Legacy migration: Place an old .pkl file (from a prior crewai version) in the working directory, run the code, verify it auto-migrates to .json.

Notes


Note

Medium Risk
Medium risk because it changes on-disk training/data persistence from .pkl to .json (including coercing iteration keys to strings) and adds a one-time legacy migration path that still invokes pickle.load() on existing files.

Overview
Replaces pickle-based persistence/caching with JSON to mitigate insecure deserialization (CWE-502). PickleHandler now reads/writes JSON, switches storage to .json, and adds automatic migration from legacy .pkl files (then deletes the old file).

Upload caching in crewai-files switches from PickleSerializer to a custom JSON serializer (_CachedUploadSerializer) that round-trips CachedUpload safely, with new tests covering JSON output, round-trips, and corrupted data handling.

Training data writes are updated to use string iteration keys for JSON compatibility (in CrewTrainingHandler and both agent executors), and tests are updated accordingly; in-memory caches that require complex object serialization keep PickleSerializer with explicit safety comments.

Written by Cursor Bugbot for commit c1ae3da. This will update automatically on new commits. Configure here.

Fixes #4746 - Security: Insecure Pickle Deserialization enables Arbitrary Code Execution

- Replace pickle.load/dump with json.load/dump in PickleHandler (file_handler.py)
- Add backward compatibility to auto-migrate legacy .pkl files to .json
- Replace PickleSerializer with JSON-based _CachedUploadSerializer in upload_cache.py
- Replace PickleSerializer with JsonSerializer in file_store.py and agent_card.py
- Update and add comprehensive security tests for all changes

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Prompt hidden (unlisted session)

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Comment thread lib/crewai/src/crewai/utilities/file_handler.py
Comment thread lib/crewai/src/crewai/a2a/utils/agent_card.py Outdated
Comment thread lib/crewai/src/crewai/utilities/file_store.py Outdated
…he serializers

- Convert train_iteration to string keys in crew_agent_executor.py,
  experimental/agent_executor.py, and training_handler.py for JSON compatibility
- Revert file_store.py and agent_card.py back to PickleSerializer since they
  use in-memory caches only (no untrusted deserialization risk)
- Add explanatory comments for why PickleSerializer is safe for in-memory caches

Co-Authored-By: João <joao@crewai.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

with open(self.file_path, "wb") as f:
pickle.dump(obj=data, file=f)
with open(self.file_path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2, default=str)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent data corruption via default=str in JSON serialization

Medium Severity

The save method uses json.dump(data, f, indent=2, default=str) which silently converts any non-JSON-serializable value to its str() representation. Unlike pickle which could round-trip arbitrary Python objects, this silently produces lossy output that cannot be deserialized back to the original types. For example, a datetime becomes the string "2024-01-01 12:00:00" and loads back as a plain string, not a datetime. This masks serialization bugs instead of failing loudly, and could silently corrupt data for any current or future caller of PickleHandler.save().

Fix in Cursor Fix in Web

else None
),
)
return data
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unhandled KeyError/ValueError in cached upload deserialization

Medium Severity

The loads method of _CachedUploadSerializer gracefully returns None for None input, decode errors, and invalid JSON. However, when a dict has the __cached_upload__ marker, it accesses data["file_id"], data["content_type"], and data["uploaded_at"] without handling KeyError, and calls datetime.fromisoformat() without handling ValueError. If the cached data is incomplete or contains an invalid date string (e.g., from a corrupted Redis entry), this raises an unhandled exception instead of returning None like all other error paths.

Fix in Cursor Fix in Web

@greysonlalonde
Copy link
Copy Markdown
Contributor

Closing as stale — no activity in 30+ days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: Insecure Pickle Deserialization enables Arbitrary Code Execution in cache handling

1 participant