fix: replace insecure pickle deserialization with JSON serialization#4747
fix: replace insecure pickle deserialization with JSON serialization#4747devin-ai-integration[bot] wants to merge 2 commits into
Conversation
Fixes #4746 - Security: Insecure Pickle Deserialization enables Arbitrary Code Execution - Replace pickle.load/dump with json.load/dump in PickleHandler (file_handler.py) - Add backward compatibility to auto-migrate legacy .pkl files to .json - Replace PickleSerializer with JSON-based _CachedUploadSerializer in upload_cache.py - Replace PickleSerializer with JsonSerializer in file_store.py and agent_card.py - Update and add comprehensive security tests for all changes Co-Authored-By: João <joao@crewai.com>
|
Prompt hidden (unlisted session) |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
…he serializers - Convert train_iteration to string keys in crew_agent_executor.py, experimental/agent_executor.py, and training_handler.py for JSON compatibility - Revert file_store.py and agent_card.py back to PickleSerializer since they use in-memory caches only (no untrusted deserialization risk) - Add explanatory comments for why PickleSerializer is safe for in-memory caches Co-Authored-By: João <joao@crewai.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| with open(self.file_path, "wb") as f: | ||
| pickle.dump(obj=data, file=f) | ||
| with open(self.file_path, "w", encoding="utf-8") as f: | ||
| json.dump(data, f, indent=2, default=str) |
There was a problem hiding this comment.
Silent data corruption via default=str in JSON serialization
Medium Severity
The save method uses json.dump(data, f, indent=2, default=str) which silently converts any non-JSON-serializable value to its str() representation. Unlike pickle which could round-trip arbitrary Python objects, this silently produces lossy output that cannot be deserialized back to the original types. For example, a datetime becomes the string "2024-01-01 12:00:00" and loads back as a plain string, not a datetime. This masks serialization bugs instead of failing loudly, and could silently corrupt data for any current or future caller of PickleHandler.save().
| else None | ||
| ), | ||
| ) | ||
| return data |
There was a problem hiding this comment.
Unhandled KeyError/ValueError in cached upload deserialization
Medium Severity
The loads method of _CachedUploadSerializer gracefully returns None for None input, decode errors, and invalid JSON. However, when a dict has the __cached_upload__ marker, it accesses data["file_id"], data["content_type"], and data["uploaded_at"] without handling KeyError, and calls datetime.fromisoformat() without handling ValueError. If the cached data is incomplete or contains an invalid date string (e.g., from a corrupted Redis entry), this raises an unhandled exception instead of returning None like all other error paths.
|
Closing as stale — no activity in 30+ days. |


Summary
Fixes #4746 — replaces all
pickleserialization/deserialization with JSON-based alternatives to prevent arbitrary code execution via insecure deserialization (CWE-502).Changes:
file_handler.py:PickleHandlernow usesjson.dump/json.loadinstead ofpickle. File extension changed from.pklto.json. Includes a one-time migration path that reads legacy.pklfiles and converts them to.json.upload_cache.py: ReplacedPickleSerializerwith a custom_CachedUploadSerializer(extendsJsonSerializer) that handlesCachedUploaddataclass round-tripping via a__cached_upload__marker field.file_store.py: SwappedPickleSerializer→JsonSerializerfor the in-memory file store cache.agent_card.py: SwappedPickleSerializer→JsonSerializerfor the@cacheddecorator on agent card fetching.Review & Testing Checklist for Human
str(iteration), but verify that all production consumers ofCrewTrainingHandler.load()(increw.py,crew_agent_executor.py,agent/core.py) correctly handle string keys instead of integer keys. This is a behavioral change that could break training workflows.file_store.py—JsonSerializerwith complex objects: The file store cachesFileInputobjects. VerifyJsonSerializercan handle these; no tests cover this path. IfFileInputisn't JSON-serializable, this will break file store set/get operations.agent_card.py—JsonSerializerwithAgentCard: The@cacheddecorator now usesJsonSerializerforAgentCardobjects. VerifyAgentCard(froma2a.types) is JSON-serializable. If not, the A2A cached agent card fetch will fail.pickle.load:_migrate_legacy_pkl()callspickle.load()on existing.pklfiles as a one-time migration. An attacker who can write a.pklto the working directory before the first load could still achieve code execution through this path. Evaluate whether this migration is acceptable or if legacy files should just be discarded.TRAINING_DATA_FILEandTRAINED_AGENTS_DATA_FILEinconstants.pystill say.pkl. The code works (it strips.pkland appends.json), but the constants should probably be renamed for clarity.Test Plan
crew.train()multiple times, verify training data persists correctly and integer iteration keys work.crewai-files, verify upload cache correctly stores and retrievesCachedUploadentries.AgentCardJSON serialization works..pklfile (from a prior crewai version) in the working directory, run the code, verify it auto-migrates to.json.Notes
Note
Medium Risk
Medium risk because it changes on-disk training/data persistence from
.pklto.json(including coercing iteration keys to strings) and adds a one-time legacy migration path that still invokespickle.load()on existing files.Overview
Replaces pickle-based persistence/caching with JSON to mitigate insecure deserialization (CWE-502).
PickleHandlernow reads/writes JSON, switches storage to.json, and adds automatic migration from legacy.pklfiles (then deletes the old file).Upload caching in
crewai-filesswitches fromPickleSerializerto a custom JSON serializer (_CachedUploadSerializer) that round-tripsCachedUploadsafely, with new tests covering JSON output, round-trips, and corrupted data handling.Training data writes are updated to use string iteration keys for JSON compatibility (in
CrewTrainingHandlerand both agent executors), and tests are updated accordingly; in-memory caches that require complex object serialization keepPickleSerializerwith explicit safety comments.Written by Cursor Bugbot for commit c1ae3da. This will update automatically on new commits. Configure here.