fix: chroma destination connector serialization #2425
def to_dict(self, **kwargs):
    """
    The _collection variable in this dataclass breaks deepcopy due to:
    TypeError: cannot pickle '_thread.lock' object
This comment looks copy pasted from the Postgres. Is it correct?
It is false - fixing now
Addressed by fixing the docstring; I had missed those two lines.
    """
    self_cp = copy.copy(self)
    if hasattr(self_cp, "_collection"):
        setattr(self_cp, "_collection", None)
Would it be safer to delattr? Though now that I think about it, probably not.
Since we have it in dataclass initialization as a None value, is setattr to None better?
unstructured/unstructured/ingest/connector/chroma.py
Lines 44 to 48 in 4a34765
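To make the setattr vs. delattr question concrete, here is a minimal sketch. `Connector` is a hypothetical stand-in, not the real Chroma connector, and it assumes `_collection` is declared as a dataclass field with a default of `None`, as described in the comment above.

```python
import copy
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class Connector:  # hypothetical stand-in for the connector dataclass
    name: str
    _collection: Optional[Any] = None  # assumed: field defaults to None


conn = Connector(name="demo")
conn._collection = object()  # pretend a live collection handle was attached

# Option 1: overwrite the attribute on a shallow copy.
with_none = copy.copy(conn)
setattr(with_none, "_collection", None)

# Option 2: delete the attribute from the shallow copy's instance dict.
with_del = copy.copy(conn)
delattr(with_del, "_collection")

# Reads still work after delattr only because the dataclass keeps the
# default (None) as a class attribute, which the lookup falls back to.
none_dict = asdict(with_none)
del_dict = asdict(with_del)
```

Under that assumption both options serialize the same way, and the original object is untouched in either case. The difference is that setattr keeps the field present on the instance, while delattr relies on the class-level default; for a field declared without a default, reading it after delattr would raise AttributeError. That makes setattr to None the more defensive choice here.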
jasonbot left a comment:
Looks fine as long as you’ve validated it. Test would be nice.
potter-potter left a comment:
Looks good. Basically the same code we've had to add to most of the dest connectors.
This fixes the serialization of the ChromaDB destination connector. The presence of the _collection object breaks serialization with TypeError: cannot pickle 'module' object. This change sets _collection to None on a shallow copy of the connector before serialization, leaving the original object untouched.
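The failure and the fix can be reproduced in isolation with a rough sketch like the one below. `FakeCollection` and `DemoConnector` are hypothetical names, and `dataclasses.asdict` is used here as a stand-in for whatever serializer the real connector's `to_dict` delegates to; the real code path may differ.

```python
import copy
import threading
from dataclasses import asdict, dataclass
from typing import Any, Optional


class FakeCollection:
    """Hypothetical handle that cannot be deep-copied (it references a module)."""

    def __init__(self):
        self._backend = threading  # module objects cannot be pickled or deep-copied


@dataclass
class DemoConnector:  # hypothetical stand-in for the destination connector
    host: str
    _collection: Optional[Any] = None

    def to_dict(self) -> dict:
        # Work on a shallow copy so the live connector keeps its collection.
        self_cp = copy.copy(self)
        if hasattr(self_cp, "_collection"):
            setattr(self_cp, "_collection", None)
        return asdict(self_cp)


conn = DemoConnector(host="localhost", _collection=FakeCollection())

# Serializing the connector directly deep-copies the collection and fails.
try:
    asdict(conn)
    failed = False
except TypeError:
    failed = True

# Serializing through to_dict drops the collection first and succeeds.
result = conn.to_dict()
```

The shallow copy is the important design choice: clearing `_collection` on `self` directly would make serialization succeed but would also disconnect a connector that is still in use.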