# Evaluate Module

The `evaluate` module in the DSPy framework is designed to evaluate the performance of DSPy programs. It plays a crucial role in the framework by providing a way to measure how effectively a program and its configuration perform against a labeled development set.

## Evaluate Class

The `Evaluate` class is the main class in the `evaluate` module. It provides methods to execute the evaluation in a single thread or across multiple threads, to update the progress of the evaluation, and to call an `Evaluate` instance as a function.
### __init__(self, *, devset, metric=None, num_threads=1, display_progress=False, display_table=False, display=True, max_errors=5)

This method initializes the `Evaluate` class with the following parameters (a construction example follows the list):

- `devset`: The development set of examples to evaluate against.
- `metric`: The metric used to score each prediction. Defaults to `None`.
- `num_threads`: The number of threads used for evaluation. Defaults to `1`.
- `display_progress`: Whether to display a progress bar during evaluation. Defaults to `False`.
- `display_table`: Whether to display the evaluation results in a table. Defaults to `False`.
- `display`: Whether to display the evaluation results. Defaults to `True`.
- `max_errors`: The maximum number of errors tolerated during evaluation. Defaults to `5`.
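For illustration, here is one way to construct an `Evaluate` instance with these parameters. The `exact_match` metric and the `devset` contents below are hypothetical stand-ins rather than part of the module; DSPy metrics conventionally take an example, a prediction, and an optional `trace`:

```python
import dspy
from dspy.evaluate import Evaluate

# Hypothetical metric: DSPy metrics conventionally accept
# (example, prediction, trace=None) and return a score.
# Exact string match is used here purely for simplicity.
def exact_match(example, pred, trace=None):
    return example.answer == pred.answer

# Hypothetical development set of labeled dspy.Example objects.
devset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
]

evaluator = Evaluate(
    devset=devset,
    metric=exact_match,
    num_threads=4,          # evaluate up to four examples concurrently
    display_progress=True,  # show a progress bar while evaluating
    max_errors=10,          # tolerate up to ten program errors instead of five
)
```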
### _execute_single_thread(self, wrapped_program, devset, display_progress)

This method executes the evaluation sequentially in a single thread. It takes the following parameters (an illustrative sketch follows the list):

- `wrapped_program`: The program to be evaluated, wrapped so that it runs and scores one example.
- `devset`: The development set for evaluation.
- `display_progress`: Whether to display a progress bar during evaluation.
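The method is private, so the sketch below is only an approximation of its behavior, not the actual DSPy source. It assumes `wrapped_program` returns the example's index, the example, the prediction, and the metric score:

```python
from tqdm import tqdm

# Illustrative sketch of single-threaded evaluation (not the actual DSPy source).
def execute_single_thread(wrapped_program, devset, display_progress):
    ncorrect, ntotal = 0, 0
    results = []
    pbar = tqdm(total=len(devset), disable=not display_progress)
    for idx, example in enumerate(devset):
        # Run the program on one example and score it with the metric
        # (the 4-tuple return shape is an assumption for this sketch).
        example_idx, example, prediction, score = wrapped_program(idx, example)
        results.append((example_idx, example, prediction, score))
        ncorrect += score
        ntotal += 1
        pbar.set_description(f"Average Metric: {ncorrect} / {ntotal}")
        pbar.update()
    pbar.close()
    return results, ncorrect, ntotal
```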
### _execute_multi_thread(self, wrapped_program, devset, num_threads, display_progress)

This method executes the evaluation across multiple threads. It takes the following parameters (an illustrative sketch follows the list):

- `wrapped_program`: The program to be evaluated.
- `devset`: The development set for evaluation.
- `num_threads`: The number of threads used for evaluation.
- `display_progress`: Whether to display a progress bar during evaluation.
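As a minimal sketch, the same loop can be distributed over a thread pool using Python's standard `concurrent.futures`; the actual DSPy implementation may differ in detail (for instance, in how it counts errors against `max_errors`):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

# Illustrative sketch of multi-threaded evaluation (not the actual DSPy source).
def execute_multi_thread(wrapped_program, devset, num_threads, display_progress):
    ncorrect, ntotal = 0, 0
    pbar = tqdm(total=len(devset), disable=not display_progress)
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [
            executor.submit(wrapped_program, idx, example)
            for idx, example in enumerate(devset)
        ]
        for future in as_completed(futures):
            # Each future is assumed to yield (example_idx, example, prediction, score).
            *_, score = future.result()
            ncorrect += score
            ntotal += 1
            pbar.set_description(f"Average Metric: {ncorrect} / {ntotal}")
            pbar.update()
    pbar.close()
    return ncorrect, ntotal
```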
### _update_progress(self, pbar, ncorrect, ntotal)

This method updates the progress bar during the evaluation. It takes the following parameters (a sketch follows the list):

- `pbar`: The progress bar to update.
- `ncorrect`: The number of correct predictions so far.
- `ntotal`: The total number of predictions made so far.
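A plausible sketch of what this amounts to with a `tqdm` progress bar; the exact label format in DSPy may differ:

```python
# Illustrative sketch: refresh the bar's label with the running average, then advance it.
def update_progress(pbar, ncorrect, ntotal):
    percentage = round(100 * ncorrect / ntotal, 1)
    pbar.set_description(f"Average Metric: {ncorrect} / {ntotal} ({percentage}%)")
    pbar.update()
```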
### __call__(self, program, metric=None, devset=None, num_threads=None, display_progress=None, display_table=None, display=None, return_all_scores=False)

This method allows an `Evaluate` instance to be called as a function. It takes the following parameters; the `None` defaults suggest that any argument left unspecified falls back to the value given at initialization (see the example after the list):

- `program`: The program to be evaluated.
- `metric`: The metric used for evaluation. Defaults to `None`.
- `devset`: The development set for evaluation. Defaults to `None`.
- `num_threads`: The number of threads used for evaluation. Defaults to `None`.
- `display_progress`: Whether to display a progress bar during evaluation. Defaults to `None`.
- `display_table`: Whether to display the evaluation results in a table. Defaults to `None`.
- `display`: Whether to display the evaluation results. Defaults to `None`.
- `return_all_scores`: Whether to return the individual per-example scores in addition to the aggregate score. Defaults to `False`.
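In practice, an `Evaluate` instance is configured once and then invoked per program, with per-call arguments overriding the instance's settings for that run. A short example, assuming an `evaluator` and a `program` like those in the usage section below; the two-value return shape for `return_all_scores=True` is an assumption, not confirmed by this page:

```python
# Per-call arguments override the values set in __init__ for this run only.
score = evaluator(program, num_threads=8, display_table=True)

# With return_all_scores=True, the individual per-example scores are assumed
# to be returned alongside the aggregate score.
score, all_scores = evaluator(program, return_all_scores=True)
```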
## Usage

Here is an example of how to use the `evaluate` module. It assumes a language model has already been configured for DSPy and uses a simple `question -> answer` program for illustration:

```python
import dspy
from dspy.evaluate import Evaluate

# Assumes a language model has been configured beforehand, e.g.:
# dspy.settings.configure(lm=...)

# `devset` and `metric` are assumed to be defined as in the __init__ example above.
evaluator = Evaluate(devset=devset, metric=metric, display_progress=True)

# Call the Evaluate class instance as a function on a DSPy program.
program = dspy.Predict("question -> answer")
score = evaluator(program)
```
## Conclusion

The `evaluate` module is a powerful tool in the DSPy framework. It provides a flexible and efficient way to evaluate the performance of your programs and their configurations. By using the `evaluate` module, you can verify that your programs are performing as expected and make the adjustments needed to improve them.