`Collection.validate(eager=True)` slower than `Collection.validate(eager=False).collect_all()`

Currently these two are **not equivalent** in terms of performance:

```python
MyCollection.validate(data, eager=True)                    # Scans source N times
MyCollection.validate(data, eager=False).collect_all()     # Scans source once
```

When members share a common source (e.g., the same Parquet file), `eager=True` collects each member independently, so the shared source is scanned more than once. With `eager=False` followed by `collect_all()`, Polars can apply common subplan elimination and scan the shared source a single time.

Is this behavior intentional?


<details>
<summary>MRE</summary>

```python

import polars as pl
import dataframely as dy

scan_count = 0


def count_scans(s: pl.Series) -> pl.Series:
    global scan_count
    scan_count += 1
    return s


class A(dy.Schema):
    x = dy.Integer(primary_key=True)


class B(dy.Schema):
    x = dy.Integer(primary_key=True)
    y = dy.Integer()


class MyCollection(dy.Collection):
    a: dy.LazyFrame[A]
    b: dy.LazyFrame[B]


# Both members derive from the same source, which contains an "expensive" operation
source = pl.LazyFrame({"x": [1, 2, 3]}).with_columns(
    pl.col("x").map_batches(count_scans)
)
data = {
    "a": source,
    "b": source.with_columns(y=pl.col("x") * 2),
}

# Test 1: eager=True
scan_count = 0
MyCollection.validate(data, eager=True)
print(f"eager=True:                        {scan_count} scans")

# Test 2: eager=False + collect_all()
scan_count = 0
MyCollection.validate(data, eager=False).collect_all()
print(f"eager=False + collect_all():       {scan_count} scans")
# Observed output:
# eager=True:                        3 scans
# eager=False + collect_all():       1 scans
```
</details>

