Add example of intersectional slicing to user guide #625

Merged
merged 1 commit into from
May 22, 2024
11 changes: 6 additions & 5 deletions docs/source/evaluation.rst
@@ -5,15 +5,16 @@ The Evaluation API equips you with a rich toolbox to assess your models across k
dimensions. Dive into detailed performance metrics, unveil potential fairness
concerns, and gain granular insights through data slicing.

Key capabilities:
Key capabilities
****************

* Performance: Employ a robust selection of common metrics to evaluate your
* **Performance**: Employ a robust selection of common metrics to evaluate your
model's effectiveness and identify areas for improvement.
* Fairness: Uncover and analyze potential biases within your model to ensure
responsible and equitable outcomes.
* Data slicing: Isolate the model's behavior on specific subsets of your
* **Data slicing**: Isolate the model's behavior on specific subsets of your
data, revealing performance nuances across demographics, features, or other
important characteristics.
* **Fairness**: Uncover and analyze potential biases within your model to ensure
responsible and equitable outcomes.

.. image:: https://github.com/VectorInstitute/cyclops/assets/8986523/416170db-1265-42a3-a3c1-d34558b72b65

73 changes: 61 additions & 12 deletions docs/source/examples/metrics.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Breast Cancer Classification and Evaluation\n",
"\n",
"The Breast Cancer dataset is a well-suited example for demonstrating Cyclops features due to its two distinct classes (binary classification) and complete absence of missing values. This clean and organized structure makes it an ideal starting point for exploring Cyclops Evaluator."
"The Breast Cancer dataset is a well-suited example for demonstrating CyclOps features due to its two distinct classes (binary classification) and complete absence of missing values. This clean and organized structure makes it an ideal starting point for exploring CyclOps Evaluator."
]
},
{
@@ -27,7 +27,8 @@
"from cyclops.evaluate.fairness import evaluate_fairness\n",
"from cyclops.evaluate.metrics import BinaryAccuracy, create_metric\n",
"from cyclops.evaluate.metrics.experimental import BinaryAUROC, BinaryAveragePrecision\n",
"from cyclops.evaluate.metrics.experimental.metric_dict import MetricDict"
"from cyclops.evaluate.metrics.experimental.metric_dict import MetricDict\n",
"from cyclops.report.plot.classification import ClassificationPlotter"
]
},
{
@@ -86,7 +87,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use Cyclops evaluation metrics to evaluate our model's performance. You can either use each metric individually by calling them, or define a ``MetricDict`` object.\n",
"Now we can use CyclOps evaluation metrics to evaluate our model's performance. You can either use each metric individually by calling them, or define a ``MetricDict`` object.\n",
"Here, we show both methods."
]
},
@@ -172,7 +173,7 @@
"spec_list = [\n",
" {\n",
" \"worst radius\": {\n",
" \"min_value\": 10.0,\n",
" \"min_value\": 14.0,\n",
" \"max_value\": 15.0,\n",
" \"min_inclusive\": True,\n",
" \"max_inclusive\": False,\n",
@@ -181,7 +182,15 @@
" {\n",
" \"worst radius\": {\n",
" \"min_value\": 15.0,\n",
" \"max_value\": 37.0,\n",
" \"max_value\": 17.0,\n",
" \"min_inclusive\": True,\n",
" \"max_inclusive\": False,\n",
" },\n",
" },\n",
" {\n",
" \"worst texture\": {\n",
" \"min_value\": 23.1,\n",
" \"max_value\": 28.7,\n",
" \"min_inclusive\": True,\n",
" \"max_inclusive\": False,\n",
" },\n",
@@ -190,13 +199,39 @@
"slice_spec = SliceSpec(spec_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Intersectional slicing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When subpopulation slices are specified using the ``SliceSpec``, we sometimes wish to create combinations of intersectional slices. We can use the ``intersections`` argument to specify this."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"slice_spec = SliceSpec(spec_list, intersections=2)\n",
"slice_spec"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing Result\n",
"\n",
"Cyclops Evaluator takes data as a HuggingFace Dataset object, so we combine predictions and features in a dataframe, and create a `Dataset` object:"
"CyclOps Evaluator takes data as a HuggingFace Dataset object, so we combine predictions and features in a dataframe, and create a `Dataset` object:"
]
},
{
@@ -219,7 +254,6 @@
"source": [
"# Create Dataset object\n",
"breast_cancer_data = Dataset.from_pandas(df)\n",
"\n",
"breast_cancer_sliced_result = evaluator.evaluate(\n",
" dataset=breast_cancer_data,\n",
" metrics=metric_collection, # type: ignore[list-item]\n",
@@ -233,7 +267,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's the evaluation result for the data slices we defined:"
"We can visualize the ``BinaryF1Score`` and ``BinaryPrecision`` for the different slices:"
]
},
{
@@ -242,7 +276,22 @@
"metadata": {},
"outputs": [],
"source": [
"breast_cancer_sliced_result"
"# Extracting the metric values for all the slices.\n",
"slice_metrics = {\n",
" slice_name: {\n",
" metric_name: metric_value\n",
" for metric_name, metric_value in slice_results.items()\n",
" if metric_name in [\"BinaryF1Score\", \"BinaryPrecision\"]\n",
" }\n",
" for slice_name, slice_results in breast_cancer_sliced_result[\n",
" \"model_for_preds_prob\"\n",
" ].items()\n",
"}\n",
"# Plotting the metric values for all the slices.\n",
"plotter = ClassificationPlotter(task_type=\"binary\", class_names=[\"0\", \"1\"])\n",
"plotter.set_template(\"plotly_white\")\n",
"slice_metrics_plot = plotter.metrics_comparison_bar(slice_metrics)\n",
"slice_metrics_plot.show()"
]
},
{
@@ -280,7 +329,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "cyclops",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -294,9 +343,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
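
Note on the intersectional slicing cell added in `metrics.ipynb`: the sketch below illustrates one plausible reading of `intersections=2`, namely that every pair of slices in `spec_list` is combined and a row belongs to the combined slice only if it satisfies both. It uses only the standard library and is not the CyclOps implementation; the helper names `in_range`, `matches`, and `matches_all`, and the three sample rows, are hypothetical and exist only for illustration.

```python
from itertools import combinations

# The three range slices from the notebook diff, copied as plain dicts.
spec_list = [
    {"worst radius": {"min_value": 14.0, "max_value": 15.0,
                      "min_inclusive": True, "max_inclusive": False}},
    {"worst radius": {"min_value": 15.0, "max_value": 17.0,
                      "min_inclusive": True, "max_inclusive": False}},
    {"worst texture": {"min_value": 23.1, "max_value": 28.7,
                       "min_inclusive": True, "max_inclusive": False}},
]


def in_range(value, rule):
    """Check one value against a range rule, honouring the inclusive flags."""
    if rule["min_inclusive"]:
        lower_ok = value >= rule["min_value"]
    else:
        lower_ok = value > rule["min_value"]
    if rule["max_inclusive"]:
        upper_ok = value <= rule["max_value"]
    else:
        upper_ok = value < rule["max_value"]
    return lower_ok and upper_ok


def matches(row, spec):
    """A row matches a slice if every feature rule in the slice holds."""
    return all(in_range(row[feature], rule) for feature, rule in spec.items())


def matches_all(row, specs):
    """A row is in an intersection if it matches every slice in the group."""
    return all(matches(row, spec) for spec in specs)


# Assumption: intersections=2 corresponds to all 2-element combinations.
pairs = list(combinations(spec_list, 2))

# Hypothetical rows, not taken from the Breast Cancer dataset.
rows = [
    {"worst radius": 14.5, "worst texture": 25.0},
    {"worst radius": 15.0, "worst texture": 30.0},
    {"worst radius": 16.2, "worst texture": 23.1},
]

# Number of rows that fall into each pairwise intersection.
counts = [sum(matches_all(row, pair) for row in rows) for pair in pairs]
print(len(pairs), counts)
```

Under this reading, the two `worst radius` ranges are disjoint (the first excludes 15.0 and the second starts at 15.0), so their intersection can never contain a row. Whether CyclOps keeps, merges, or drops such same-feature combinations is not shown in this diff and would need to be checked against the `SliceSpec` documentation.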
4 changes: 2 additions & 2 deletions docs/source/tutorials/kaggle/heart_failure_prediction.ipynb
@@ -1374,7 +1374,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1388,7 +1388,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
"version": "3.10.12"
}
},
"nbformat": 4,