RecursiveDocumentSplitter updates Document's meta field after initializing it

**Describe the bug**
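In https://github.com/deepset-ai/haystack/blob/a28b2851d9251ad2275d344ba46d1bb8fb35932e/haystack/components/preprocessors/recursive_splitter.py#L426 the `RecursiveDocumentSplitter` creates each new `Document` first and only afterwards adds the split-related fields to its `meta`. Because a Document's id is generated when the object is initialized, those later changes to `meta` do not affect it.

Documents with the same content (and the same initial metadata) are therefore assigned the same id, so the `run` method of the RecursiveDocumentSplitter might return several documents with the same id. That looks like a bug to me too.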
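
To illustrate the mechanism without depending on Haystack itself, here is a minimal sketch. `StandInDocument` is a hypothetical stand-in, not Haystack's `Document`; it only assumes that the id is hashed from content and meta once, at construction time, which is the behavior this issue relies on.

```python
import hashlib
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StandInDocument:
    """Hypothetical stand-in for a document whose id is hashed once, at init."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    id: str = ""

    def __post_init__(self) -> None:
        # The id is derived from content and meta as they are right now.
        if not self.id:
            payload = f"{self.content}{sorted(self.meta.items())}"
            self.id = hashlib.sha256(payload.encode("utf-8")).hexdigest()


parent_meta = {"source": "report.txt"}

# Two chunks with identical text, created in the order the issue describes:
# the document is built first and the split fields are added afterwards.
chunks = []
for split_nr in range(2):
    doc = StandInDocument(content="Same text.", meta=deepcopy(parent_meta))
    doc.meta["split_id"] = split_nr  # too late: the id is already fixed
    chunks.append(doc)

# True, although the two chunks now carry different meta.
same_id_despite_different_meta = chunks[0].id == chunks[1].id
```

In this sketch the two chunks end up with different `meta` but the same `id`, which mirrors the collision reported here.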

A possible fix is to build the new metadata first, including fields that are currently set afterwards such as `new_doc.meta["split_id"] = split_nr`, and only then create the new document. In addition, we should add the id of the parent document. I have in mind something like:

```python
# Build the complete meta first so that it is included in the id computation.
meta = deepcopy(doc.meta)
meta["parent_id"] = doc.id
meta["split_id"] = split_nr
meta["split_idx_start"] = current_position
meta["_split_overlap"] = [] if self.split_overlap > 0 else None
# Only now create the Document.
new_doc = Document(content=chunk, meta=meta)
```

**Error message**
None. Documents with the same id might be handled as duplicates later in a pipeline.

**Expected behavior**
Chunks with the same content but different metadata should have different document ids.
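
A regression test could check this expectation directly. The helper below is only a sketch: `find_duplicate_ids` is a hypothetical name, and the `SimpleNamespace` objects are placeholders for whatever the splitter returns (anything with an `id` attribute works).

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable, List


def find_duplicate_ids(documents: Iterable) -> List[str]:
    """Return every id that occurs more than once, in first-seen order."""
    counts = Counter(doc.id for doc in documents)
    return [doc_id for doc_id, n in counts.items() if n > 1]


# Placeholder objects; in a real test these would be the splitter's output.
sample = [SimpleNamespace(id="a"), SimpleNamespace(id="b"), SimpleNamespace(id="a")]
duplicates = find_duplicate_ids(sample)
```

A test would then assert that the returned list is empty for the documents produced by a single `run` call.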


RecursiveDocumentSplitter updates Document's meta field after initializing it #9508
