Merged
1 change: 1 addition & 0 deletions docs/docs/core/cli.mdx
@@ -65,6 +65,7 @@ The following subcommands are available:
| `setup` | Check and apply setup changes for flows, including the internal and target storage (to export). |
| `show` | Show the spec for a specific flow. |
| `update` | Update the index defined by the flow. |
| `evaluate` | Evaluate the flow and dump flow outputs to files. Instead of updating the index, it dumps what would be indexed to files. Mainly used for evaluation purposes. |

Use `--help` to see the full list of subcommands, and `subcommand --help` to see the usage of a specific one.

22 changes: 21 additions & 1 deletion docs/docs/core/flow_methods.mdx
@@ -12,7 +12,7 @@ After a flow is defined as discussed in [Flow Definition](/docs/core/flow_def),

## update

The `update()` method will update will update the index defined by the flow.
The `update()` method will update the index defined by the flow.

Once the function returns, the index is fresh up to the moment when the function is called.

@@ -23,5 +23,25 @@ Once the function returns, the index is fresh up to the moment when the function is called.
flow.update()
```

</TabItem>
</Tabs>

## evaluate_and_dump

The `evaluate_and_dump()` method evaluates the flow and dumps flow outputs to files.

It takes an `EvaluateAndDumpOptions` dataclass as input, with the following fields:

* `output_dir` (type: `str`, required): The directory to dump the result to.
* `use_cache` (type: `bool`, default: `True`): Use already-cached intermediate data if available.
Note that existing cached data is only reused; the cache itself is not updated, even when this option is on.

<Tabs>
<TabItem value="python" label="Python" default>

```python
flow.evaluate_and_dump(EvaluateAndDumpOptions(output_dir="./eval_output"))
```

</TabItem>
</Tabs>
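
The two fields described above can be illustrated with a small stand-in dataclass. This is only a sketch: `EvaluateAndDumpOptionsSketch` is a hypothetical name that mirrors the documented fields and defaults, not the actual definition of `EvaluateAndDumpOptions` in the library, which may differ in detail.

```python
from dataclasses import dataclass


# Hypothetical stand-in mirroring the two documented fields; the real class
# is cocoindex.EvaluateAndDumpOptions and is not reproduced here.
@dataclass
class EvaluateAndDumpOptionsSketch:
    output_dir: str          # required: directory to dump the result to
    use_cache: bool = True   # default: reuse already-cached intermediate data


# Rely on the default: cached intermediate data is reused when available.
cached = EvaluateAndDumpOptionsSketch(output_dir="./eval_output")

# Opt out of reusing cached intermediate data.
uncached = EvaluateAndDumpOptionsSketch(output_dir="./eval_output", use_cache=False)

print(cached)
print(uncached)
```

Assuming the real dataclass accepts the same keyword arguments as in the documented example, the equivalent call would be `flow.evaluate_and_dump(EvaluateAndDumpOptions(output_dir="./eval_output", use_cache=False))`.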
2 changes: 1 addition & 1 deletion python/cocoindex/__init__.py
@@ -2,7 +2,7 @@
Cocoindex is a framework for building and running indexing pipelines.
"""
from . import flow, functions, query, sources, storages, cli
from .flow import FlowBuilder, DataScope, DataSlice, Flow, flow_def
from .flow import FlowBuilder, DataScope, DataSlice, Flow, flow_def, EvaluateAndDumpOptions
from .llm import LlmSpec, LlmApiType
from .vector import VectorSimilarityMetric
from .lib import *
11 changes: 8 additions & 3 deletions python/cocoindex/cli.py
@@ -57,13 +57,18 @@ def update(flow_name: str | None):
@click.argument("flow_name", type=str, required=False)
@click.option(
"-o", "--output-dir", type=str, required=False,
help="The directory to dump the evaluation output to.")
help="The directory to dump the output to.")
@click.option(
"-c", "--use-cache", is_flag=True, show_default=True, default=True,
help="Use cached evaluation results if available.")
help="Use already-cached intermediate data if available. "
"Note that we only reuse existing cached data without updating the cache "
"even if it's turned on.")
def evaluate(flow_name: str | None, output_dir: str | None, use_cache: bool = True):
"""
Evaluate and dump the flow.
Evaluate the flow and dump flow outputs to files.

Instead of updating the index, it dumps what should be indexed to files.
Mainly used for evaluation purposes.
"""
fl = _flow_by_name(flow_name)
if output_dir is None:
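
Review note on `--use-cache`: the option in this hunk is a single flag whose default is already on, so the declaration alone does not show how a user would switch cache reuse off from the command line. One common click pattern for that is a paired on/off flag. The following is a sketch of that pattern only, not what this PR implements; `evaluate_sketch` is a hypothetical command that just echoes the parsed value.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option(
    "--use-cache/--no-use-cache", default=True, show_default=True,
    help="Use already-cached intermediate data if available.")
def evaluate_sketch(use_cache: bool):
    """Hypothetical command, used only to show the paired-flag form."""
    click.echo(f"use_cache={use_cache}")


# Exercise the command in-process with click's test runner.
runner = CliRunner()
default_run = runner.invoke(evaluate_sketch, [])
disabled_run = runner.invoke(evaluate_sketch, ["--no-use-cache"])

print(default_run.output, end="")
print(disabled_run.output, end="")
```

With the paired form, omitting the flag keeps the default and `--no-use-cache` disables it explicitly, which keeps the CLI aligned with the `use_cache` field of the options dataclass.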
2 changes: 1 addition & 1 deletion python/cocoindex/flow.py
@@ -372,7 +372,7 @@ def update(self):

def evaluate_and_dump(self, options: EvaluateAndDumpOptions):
"""
Evaluate and dump the flow.
Evaluate the flow and dump flow outputs to files.
"""
return self._lazy_engine_flow().evaluate_and_dump(_dump_engine_object(options))
