Fixing many issues in the docs (#2343)
ItayGabbay committed Feb 20, 2023
1 parent 018884d commit f148ae7
Showing 12 changed files with 26 additions and 14 deletions.
Original file line number Diff line number Diff line change
@@ -127,6 +127,7 @@ def add_condition_kurtosis_greater_than(self, threshold: float = -0.1):
Kurtosis is a measure of the shape of the distribution, helping us understand if the distribution
is significantly "wider" than a normal distribution. A lower value indicates a "wider" distribution.
Parameters
----------
threshold : float , default: -0.1
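To make the condition concrete, here is a small runnable sketch of the underlying measure — illustrative only, using scipy's Fisher kurtosis rather than the deepchecks API, with synthetic column data:

```python
# Illustrative sketch of the kurtosis condition's logic, not the
# deepchecks implementation. scipy's Fisher kurtosis scores a normal
# distribution at ~0; lower values mean a "wider" distribution.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal_col = rng.normal(size=10_000)    # kurtosis close to 0
uniform_col = rng.uniform(size=10_000)  # kurtosis close to -1.2

threshold = -0.1  # the condition's default
for name, col in [("normal", normal_col), ("uniform", uniform_col)]:
    k = kurtosis(col)  # Fisher definition: normal distribution -> 0
    print(f"{name}: kurtosis={k:.2f} -> {'passes' if k > threshold else 'fails'}")
```

With the default threshold of -0.1, the normal column passes while the uniform column, being "wider" than normal, fails.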
@@ -42,6 +42,7 @@ class WeakSegmentsPerformance(SingleDatasetCheck, WeakSegmentAbstract):
In order to achieve this, the check trains several simple tree-based models which try to predict the error of the
user-provided model on the dataset. The relevant segments are detected by analyzing the different
leaves of the trained trees.
Parameters
----------
columns : Union[Hashable, List[Hashable]] , default: None
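The idea of predicting a model's error with simple trees can be sketched outside deepchecks as follows — a toy illustration with made-up data and a single sklearn tree, not the check's actual implementation:

```python
# Toy sketch of the weak-segment idea: train a shallow tree to predict
# the per-sample error of a "user model", then treat the tree's leaves
# as candidate segments.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2))
y = (X[:, 0] > 0.5).astype(int)

# Hypothetical "user model": systematically wrong whenever feature 1 is small.
preds = np.where(X[:, 1] < 0.2, 1 - y, y)
per_sample_error = (preds != y).astype(float)

# A shallow tree regressing the per-sample error; its leaves partition
# the feature space into candidate segments.
seg_tree = DecisionTreeRegressor(max_depth=2, random_state=0)
seg_tree.fit(X, per_sample_error)
leaves = seg_tree.apply(X)  # leaf index per sample

# The leaf with the highest mean error is the weakest segment.
leaf_err = {leaf: per_sample_error[leaves == leaf].mean() for leaf in np.unique(leaves)}
weakest = max(leaf_err, key=leaf_err.get)
print(f"weakest segment mean error: {leaf_err[weakest]:.2f}")
```

Here the tree isolates the region ``feature 1 < 0.2`` where the hypothetical model fails, which is exactly the kind of segment the check surfaces.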
4 changes: 2 additions & 2 deletions deepchecks/tabular/suites/default_suites.py
@@ -74,7 +74,7 @@ def data_integrity(columns: Union[Hashable, List[Hashable]] = None,
* - :ref:`plot_tabular_identifier_label_correlation`
- :class:`~deepchecks.tabular.checks.data_integrity.IdentifierLabelCorrelation`
* - :ref:`plot_tabular_feature_feature_correlation`
- :class:`~deepchecks.tabular.checks.data_integrity.FeatureFeatureCorrelation`
- :class:`~deepchecks.tabular.checks.data_integrity.FeatureFeatureCorrelation`
Parameters
----------
@@ -270,7 +270,7 @@ def model_evaluation(alternative_scorers: Dict[str, Callable] = None,
* - :ref:`plot_tabular_model_inference_time`
- :class:`~deepchecks.tabular.checks.model_evaluation.ModelInferenceTime`
* - :ref:`plot_tabular_train_test_prediction_drift`
- :class:`~deepchecks.tabular.checks.model_evaluation.TrainTestPredictionDrift`
- :class:`~deepchecks.tabular.checks.model_evaluation.TrainTestPredictionDrift`
Parameters
----------
@@ -74,7 +74,7 @@

#%%
# Load Dataset
# ----------------
# ------------


train_ds = load_dataset(train=True, batch_size=64, object_type='VisionData')
@@ -83,7 +83,7 @@

#%%
# Running TrainTestPredictionDrift on classification
# ---------------------------------------------
# --------------------------------------------------

check = TrainTestPredictionDrift()
result = check.run(train_ds, test_ds)
@@ -99,6 +99,7 @@
#%%
# Understanding the results
# -------------------------
#
# We can see there is almost no drift between the train & test predictions. This means the
# train/test split was good (balanced and random). Let's check the
# performance of a simple model trained on MNIST.
@@ -109,7 +110,8 @@

#%%
# MNIST with prediction drift
# ======================
# ===========================
#
# Now, let's try to separate the MNIST dataset in a different manner that will result
# in a prediction drift, and see how it affects the performance. We are going to create a
# custom ``collate_fn`` in the test dataset, that will select a few of the samples with class 0
@@ -64,8 +64,10 @@
# To display the results in an IDE like PyCharm, you can use the following code:

# result.show_in_window()

#%%
# The result will be displayed in a new window.

#%%
# Observe the check's output
# --------------------------
2 changes: 2 additions & 0 deletions docs/source/nitpick-exceptions
@@ -18,3 +18,5 @@ py:meth torch.nn.Module.train
py:func load_state_dict
py:attr state_dict
py:meth load_state_dict
py:meth ignite.metrics
py:attr ignite.metrics
4 changes: 2 additions & 2 deletions docs/source/user-guide/general/ci_cd.rst
@@ -59,9 +59,9 @@ Deepchecks can be used in the CI/CD process at 2 main steps of model development
In this guide we will show end-to-end examples of validating both the data and the trained model. In most use cases these
processes will be split into two separate pipelines, one for data validation and one for model validation.
We will use the default suites provided by deepchecks, but it's possible to create a
:doc:`custom suite</user-guide/general/customizations/plot_create_a_custom_suite>`
:doc:`custom suite</user-guide/general/customizations/examples/plot_create_a_custom_suite>`
containing hand chosen checks and
:doc:`conditions</user-guide/general/customizations/plot_configure_check_conditions>`
:doc:`conditions</user-guide/general/customizations/examples/plot_configure_check_conditions>`
in order to cater to the specific needs of the project.

Airflow Integration
4 changes: 2 additions & 2 deletions docs/source/user-guide/general/drift_guide.rst
@@ -118,14 +118,14 @@ In general, it is recommended to use Cramer's V, unless your variable includes c
However, in cases of a variable with many categories with few samples, it is still recommended to use Cramer's V, as PSI will not be able to detect change in the smaller categories.

Detecting Drift in Unbalanced Classification Tasks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In classification problems, it is common to have unbalanced data, meaning that the number of samples in each class is
highly skewed. For example, in a dataset of credit card transactions, the number of fraudulent transactions is usually
much lower than the number of non-fraudulent transactions, and can be below 1% of the total number of samples.

In such cases, running the :doc:`TrainTestLabelDrift </checks_gallery/tabular/train_test_validation/plot_train_test_label_drift>`:
or :doc:`TrainTestPredictionDrift </checks_gallery/tabular/train_test_validation/plot_train_test_prediction_drift>` checks
or :doc:`TrainTestPredictionDrift </checks_gallery/tabular/model_evaluation/plot_train_test_prediction_drift>` checks
with the default parameters will likely lead to a false negative: for example, a change in the share of fraudulent
transactions from 0.2% to 0.4% will not be detected, yet may in fact be very significant for our business.
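This false-negative behavior is easy to reproduce from PSI's formula — a short sketch, using one common form of the index and made-up class shares:

```python
# Sketch of why PSI misses a change in a tiny class: the index sums
# (p - q) * ln(p / q) over categories, so a fraud share doubling from
# 0.2% to 0.4% contributes almost nothing to the total score.
import math

def psi(expected, actual):
    # One common form of the Population Stability Index.
    return sum((p - q) * math.log(p / q) for p, q in zip(expected, actual))

train_dist = [0.998, 0.002]  # [non-fraud, fraud] shares in train
test_dist = [0.996, 0.004]   # fraud share doubled in test

score = psi(train_dist, test_dist)
print(f"PSI = {score:.4f}")  # far below common alert thresholds (~0.2)
```

Even though the fraud rate doubled, the PSI stays tiny, which is why the guide recommends adjusting parameters for unbalanced tasks.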

@@ -196,7 +196,8 @@ def deepchecks_collate_fn(batch) -> BatchOutputFormat:
test_data = VisionData(batch_loader=test_loader, task_type='classification', label_map=LABEL_MAP)
#%%
# Making sure our data is in the correct format:
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# The VisionData object automatically validates your data format and will alert you if there is a problem.
# However, you can also manually view your images and labels to make sure they are in the correct format by using
# the ``head`` function to conveniently visualize your data:
@@ -207,7 +208,8 @@ def deepchecks_collate_fn(batch) -> BatchOutputFormat:
# And observe the output:
#
# Running Deepchecks' suite on our data and model!
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Now that we have defined the task class, we can validate the train and test data with deepchecks' train test validation
# suite.
# This can be done with these few simple lines of code:
@@ -176,7 +176,7 @@ def load_or_download(cls, root: Path, train: bool) -> 'CocoInstanceSegmentationD
# predictions (when available). In order to do that, those must be in a pre-defined format, according to the task type.
#
# In the following example we're using pytorch. To see how this can be done using tensorflow or a generic generator,
# please refer to :doc:`creating VisionData guide </user-guide/vision/VisionData#creating-a-visiondata-object>`.
# please refer to :doc:`creating VisionData guide </user-guide/vision/VisionData>`.
#
# For pytorch, we will use our DataLoader, but we'll create a new collate function for it, that transforms the batch to
# the correct format. Then, we'll create a :class:`deepchecks.vision.vision_data.vision_data.VisionData` object,
@@ -292,7 +292,8 @@ def deepchecks_collate_fn(batch) -> BatchOutputFormat:

#%%
# Making sure our data is in the correct format:
# ~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# The VisionData object automatically validates your data format and will alert you if there is a problem.
# However, you can also manually view your images and labels to make sure they are in the correct format by using
# the ``head`` function to conveniently visualize your data:
@@ -132,7 +132,8 @@ def deepchecks_collate_fn(batch) -> BatchOutputFormat:

#%%
# Making sure our data is in the correct format:
# ~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# The VisionData object automatically validates your data format and will alert you if there is a problem.
# However, you can also manually view your images and labels to make sure they are in the correct format by using
# the ``head`` function to conveniently visualize your data:
