Updated readme, user guide and how-to-guides (#340)
* adding getting started and a few sections to the user guide * updating readme and docs Co-authored-by: Itay Gabbay <itay@deepchecks.com>
1 parent
df7ad4e
commit 652a1f4
Showing 14 changed files with 6,600 additions and 2,468 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
0.0.17 | ||
0.1.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,65 @@ | ||
.. _getting_started: | ||
|
||
================ | ||
Getting Started | ||
=============== | ||
================ | ||
|
||
Welcome to Deepchecks! | ||
|
||
To get started and easily validate your data and models, make sure to | ||
install the deepchecks Python package. | ||
|
||
Installation | ||
============== | ||
|
||
Local Installation | ||
--------------------- | ||
|
||
Deepchecks requires Python 3 and can be installed using pip or conda, depending on the package manager you're working with for most of your packages. | ||
|
||
Using Pip | ||
~~~~~~~~~~ | ||
As a best practice, we recommend working in a `virtual environment <https://docs.python.org/3/library/venv.html>`__. | ||
|
||
The deepchecks package can be installed from `PyPI <https://pypi.org/project/deepchecks/>`__ using the following command: | ||
|
||
.. code-block:: bash | ||
pip install deepchecks --upgrade | ||
.. note:: | ||
Deepchecks is in alpha development, which means that new versions are released on a weekly basis and new features are frequently added. If you experience any unexpected behavior from deepchecks, the first step to troubleshoot is to upgrade to the latest version. | ||
|
||
|
||
Using Conda | ||
~~~~~~~~~~~~~ | ||
As a best practice we recommend `creating a conda environment <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands>`__. | ||
|
||
To install the deepchecks package with the conda package manager, run: | ||
|
||
.. code-block:: bash | ||
conda install -c deepchecks deepchecks | ||
Or, if deepchecks is already installed and you want to update it, run: | ||
|
||
.. code-block:: bash | ||
conda update -c deepchecks deepchecks | ||
Installing On Google Colab or on a Kaggle Kernel | ||
------------------------------------------------- | ||
Simply run the following command in a notebook cell: | ||
|
||
.. code-block:: bash | ||
!pip install deepchecks | ||
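After installing, it can be useful to confirm which version ended up in the active environment, since the note above recommends upgrading as a first troubleshooting step. The following is a minimal sketch using only the Python standard library (Python 3.8+); the helper name ``installed_version`` is hypothetical and is not part of deepchecks.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "deepchecks") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing.

    Hypothetical helper for illustration; it only queries package metadata
    and does not import the package itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("deepchecks is not installed in this environment")
    else:
        print(f"deepchecks {found} is installed")
```

In a notebook, calling ``installed_version()`` in a cell after the ``pip install`` cell serves the same purpose.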
Here we will show you how to get started with deepchecks! | ||
Start Working with the Package! | ||
================================= | ||
To get started with deepchecks with 5 lines of code, head to :doc:`examples/howto-guides/quickstart_in_5_minutes`. | ||
|
||
.. toctree:: | ||
:caption: Getting Started | ||
For additional usage examples and for understanding the best practices of how to use the package, stay tuned for our additional | ||
guides that will be added here with our official launch! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,40 +1,41 @@ | ||
.. mlchecks documentation master file, created by | ||
sphinx-quickstart on Sun Oct 24 10:19:19 2021. | ||
You can adapt this file completely to your liking, but it should at least | ||
contain the root `toctree` directive. | ||
Welcome to Deepchecks' documentation! | ||
===================================== | ||
Welcome to Deepchecks! | ||
===================================== | ||
Deepchecks is the leading tool for validating your machine learning models | ||
and data, and it enables doing so with minimal effort. Deepchecks accompanies you through | ||
various validation needs such as verifying your data's integrity, inspecting its distributions, | ||
validating data splits, evaluating your model and comparing between different models. | ||
|
||
Head over to the :doc:`getting-started/index` guide to get it up and running. | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:hidden: | ||
|
||
getting-started/index | ||
.. note:: | ||
Welcome! In addition to perusing the documentation, please feel free | ||
to ask questions on our `Slack channel <https://join.slack.com/t/deepcheckscommunity/shared_invite/zt-y28sjt1v-PBT50S3uoyWui_Deg5L_jg>`__, or to post an issue or start a discussion on `GitHub <https://github.com/deepchecks/deepchecks/>`__. | ||
|
||
|
||
Documentation Structure | ||
========================= | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:maxdepth: 2 | ||
|
||
examples/index | ||
getting-started/index | ||
|
||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:maxdepth: 2 | ||
|
||
user-guide/index | ||
user-guide/index | ||
|
||
|
||
.. toctree:: | ||
:maxdepth: 3 | ||
:maxdepth: 2 | ||
|
||
api/index | ||
examples/index | ||
|
||
|
||
Indices and tables | ||
~~~~~~~~~~~~~~~~~~~~ | ||
.. toctree:: | ||
:maxdepth: 3 | ||
|
||
* :ref:`modindex` | ||
* :ref:`search` | ||
api/index |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
.. _dataset_object: | ||
|
||
==================== | ||
The Dataset Object | ||
==================== | ||
The Dataset is one of the basic building blocks of deepchecks. It is a container for the data and its relevant metadata, such as special column names (label, date, index, etc.). | ||
Some checks can run directly on a dataframe, but others require this metadata in order to run, so they work only with Datasets. | ||
|
||
Class Parameters | ||
=================== | ||
All of the parameters are optional. | ||
|
||
.. list-table:: | ||
:widths: 20 20 50 10 | ||
:header-rows: 1 | ||
|
||
* - Name | ||
- Type | ||
- Description | ||
- Default | ||
* - label | ||
- pandas.Series | ||
- Label data, passed as a series separate from the data | ||
- None | ||
* - features | ||
- List[Hashable] | ||
- Names of the features in the data | ||
- None | ||
* - cat_features | ||
- List[Hashable] | ||
- Names of the categorical features in the data. Must be a subset of `features` | ||
- None | ||
* - label_name | ||
- Hashable | ||
- Name of label column in the data | ||
- None | ||
* - use_index | ||
- bool | ||
- If the data is a dataframe, whether to use the dataframe index as the index column for index-related checks | ||
- False | ||
* - index_name | ||
- Hashable | ||
- Name of index column in the data | ||
- None | ||
* - date_name | ||
- Hashable | ||
- Name of date column in the data | ||
- None | ||
* - date_unit_type | ||
- str | ||
- Unit used to convert the date column if it is numeric. The conversion uses `pandas.Timestamp <https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html>`__ | ||
- None | ||
* - max_categorical_ratio | ||
- float | ||
- Used to infer which columns are categorical (if `cat_features` isn't explicitly passed). | ||
Sets the maximum ratio of unique values to samples in a column for it to be considered categorical. | ||
The higher the value, the more likely a column is to be inferred as categorical | ||
- 0.01 | ||
* - max_categories | ||
- int | ||
- Used to infer which columns are categorical (if `cat_features` isn't explicitly passed). | ||
Sets the maximum number of unique values in a column for it to be considered categorical. | ||
The higher the value, the more likely a column is to be inferred as categorical | ||
- 30 | ||
* - max_float_categories | ||
- int | ||
- Same as `max_categories` but for columns of type float | ||
- 5 | ||
* - convert_date | ||
- bool | ||
- Whether to convert the date column to dates if it is numeric | ||
- True | ||
|
||
Inferring Features And Categorical Features | ||
================================================ | ||
The Dataset defines which columns of the data are features and which of those are categorical features. | ||
For features, if the `features` parameter is not passed explicitly, all columns apart from the label, index and date are considered features. | ||
For categorical features, if the `cat_features` parameter is not passed explicitly, the following logic runs on every column to determine | ||
whether the column is categorical or not: | ||
|
||
* If the column is of float type: | ||
  * number of unique values < `max_float_categories` | ||
* Otherwise: | ||
  * number of unique values < `max_categories` AND (number of unique values / number of samples) < `max_categorical_ratio` | ||
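The rule above can be sketched in plain Python. This is an illustrative sketch rather than the library's implementation: the function name ``is_categorical`` is hypothetical, a column is represented as a simple sequence of values, and treating a column as "float type" when any value is a Python ``float`` is an assumption made for the example. The defaults mirror the parameter table.

```python
from typing import Hashable, Sequence


def is_categorical(
    values: Sequence[Hashable],
    max_categorical_ratio: float = 0.01,
    max_categories: int = 30,
    max_float_categories: int = 5,
) -> bool:
    """Apply the documented inference rule to a single column (hypothetical helper)."""
    n_samples = len(values)
    if n_samples == 0:
        # Assumption: an empty column is not treated as categorical.
        return False
    n_unique = len(set(values))
    # Assumption: a column is "float type" if it contains any Python float.
    if any(isinstance(v, float) for v in values):
        return n_unique < max_float_categories
    return (
        n_unique < max_categories
        and (n_unique / n_samples) < max_categorical_ratio
    )


# 3 unique values over 600 samples: ratio 0.005, below both thresholds.
colors = ["red", "green", "blue"] * 200
# 600 unique values: exceeds max_categories.
ids = list(range(600))
# 4 unique float values: below max_float_categories.
ratings = [0.5, 1.0, 1.5, 2.0] * 10
# 2 unique values over 100 samples: ratio 0.02 exceeds max_categorical_ratio.
flags = ["yes", "no"] * 50

print(is_categorical(colors), is_categorical(ids))
print(is_categorical(ratings), is_categorical(flags))
```

Note how the ``flags`` column has very few unique values yet is still rejected because of the ratio test on a small sample; passing `cat_features` explicitly avoids relying on the inference in such cases.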
|
||
Creating a Dataset | ||
====================== | ||
|
||
From a DataFrame | ||
~~~~~~~~~~~~~~~~ | ||
The default Dataset constructor expects a dataframe. The rest of the properties | ||
are optional, but if your data has a date, index or label column you will want to define it. | ||
|
||
.. code-block:: python | ||
from deepchecks import Dataset | ||
Dataset(my_dataframe, features=['feat1', 'feat2', 'feat3'], label_name='target', index_name='id', date_name='timestamp') | ||
From Numpy Arrays | ||
~~~~~~~~~~~~~~~~~~~ | ||
A Dataset can be created using a 2D numpy array for features and 1D numpy array for the labels. The features array is mandatory, and the labels array is optional. | ||
|
||
.. code-block:: python | ||
import numpy as np | ||
from deepchecks import Dataset | ||
features = np.array([[0.25, 0.3, 0.3], [0.14, 0.75, 0.3], [0.23, 0.39, 0.1]]) | ||
labels = np.array([0.1, 0.1, 0.7]) | ||
dataset_with_labels = Dataset.from_numpy(features, labels) | ||
dataset_without_labels = Dataset.from_numpy(features) | ||
Also, it's possible to assign names to the features and label: | ||
|
||
.. code-block:: python | ||
Dataset.from_numpy( | ||
features, labels, | ||
feature_names=['feat1', 'feat2', 'feat3',], | ||
label_name='target' | ||
) | ||
All other Dataset properties can also be passed as regular keyword arguments: | ||
|
||
.. code-block:: python | ||
Dataset.from_numpy( | ||
features, labels, | ||
feature_names=['feat1', 'feat2', 'feat3',], | ||
label_name='target', | ||
max_float_categories=10 | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,14 @@ | ||
.. _user_guide: | ||
|
||
========== | ||
User Guide | ||
========== | ||
|
||
Here you can find the key concepts, structure and recommended flow, and dive into many of the deepchecks functionalities. | ||
|
||
.. toctree:: | ||
:caption: User Guide | ||
:caption: User Guide | ||
|
||
key_concepts | ||
when_should_you_use | ||
dataset_object |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
.. _user_guide__key_concepts: | ||
|
||
============== | ||
Key Concepts | ||
============== | ||
|
||
Check | ||
======== | ||
|
||
Each check enables you to inspect a specific aspect of your data and models. | ||
Checks are the basic building blocks of the deepchecks package, covering all kinds of common issues, | ||
such as PerformanceOverfit, DataSampleLeakage, SingleFeatureContribution, DataDuplicates, and :ref:`many more checks <examples/checks>`. | ||
Each check can have two types of results: | ||
|
||
#. A visual result meant for display (e.g. a figure or a table). | ||
#. A return value that can be used for validating the expected check results | ||
(validations are typically done by adding a "condition" to the check, as explained below). | ||
|
||
|
||
Condition | ||
========== | ||
A condition is a function that can be added to a Check, which returns | ||
a pass ✓, fail ✖ or warning ! result, intended for validating the Check's return value. | ||
An example of adding a condition: | ||
|
||
.. code-block:: python | ||
from deepchecks.checks import BoostingOverfit | ||
BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05) | ||
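To make the idea of a condition concrete, here is a small self-contained sketch of a function that maps a check's return value to a pass, fail or warning outcome. None of these names come from deepchecks: ``Outcome``, ``Condition``, ``decline_not_greater_than`` and the ``warn_margin`` parameter are all hypothetical, and the warning rule in particular is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    """Possible results of evaluating a condition (hypothetical names)."""
    PASS = "pass"
    FAIL = "fail"
    WARN = "warning"


@dataclass
class Condition:
    """A named function that validates a check's return value."""
    name: str
    evaluate: Callable[[float], Outcome]


def decline_not_greater_than(threshold: float, warn_margin: float = 0.8) -> Condition:
    """Build a condition on a score decline, expressed as a fraction.

    Assumption for this sketch: values above the threshold fail, and values
    above `warn_margin * threshold` produce a warning.
    """
    def evaluate(decline: float) -> Outcome:
        if decline > threshold:
            return Outcome.FAIL
        if decline > threshold * warn_margin:
            return Outcome.WARN
        return Outcome.PASS

    return Condition(f"decline not greater than {threshold:.0%}", evaluate)


condition = decline_not_greater_than(threshold=0.05)
for decline in (0.02, 0.045, 0.08):
    print(condition.name, decline, condition.evaluate(decline).value)
```

The point of the sketch is the separation of concerns: the check computes a value, and the condition decides whether that value is acceptable.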
Suite | ||
======== | ||
|
||
A suite is an ordered collection of checks that can have conditions added to them. The Suite enables displaying a concluding report for all of the Checks that ran. `Here <https://github.com/deepchecks/deepchecks/blob/main/deepchecks/suites>`__ you can find the predefined suites and a code example demonstrating how to build your own custom suite. The predefined suites include default conditions for most of their checks. You can edit the preconfigured suites or build a suite of your own with a collection of checks and optional conditions. | ||
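The structure described here, an ordered collection of checks with optional conditions and a concluding report, can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the deepchecks Suite API: ``MiniSuite``, its methods, and the two toy checks are hypothetical names invented for the example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

CheckFn = Callable[[Dict[str, Any]], float]
ConditionFn = Callable[[float], bool]
ReportRow = Tuple[str, float, Optional[bool]]


class MiniSuite:
    """Hypothetical ordered collection of checks with optional conditions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._checks: List[Tuple[str, CheckFn, Optional[ConditionFn]]] = []

    def add(self, name: str, check: CheckFn,
            condition: Optional[ConditionFn] = None) -> "MiniSuite":
        # Checks run in the order in which they were added.
        self._checks.append((name, check, condition))
        return self

    def run(self, data: Dict[str, Any]) -> List[ReportRow]:
        # The report holds (check name, value, condition result or None).
        report: List[ReportRow] = []
        for name, check, condition in self._checks:
            value = check(data)
            passed = condition(value) if condition is not None else None
            report.append((name, value, passed))
        return report


def duplicate_ratio(data: Dict[str, Any]) -> float:
    rows = data["rows"]
    return 1 - len(set(rows)) / len(rows)


def null_ratio(data: Dict[str, Any]) -> float:
    rows = data["rows"]
    return sum(row is None for row in rows) / len(rows)


suite = (
    MiniSuite("integrity")
    .add("duplicates", duplicate_ratio, condition=lambda v: v <= 0.1)
    .add("nulls", null_ratio)  # no condition: the value is reported only
)
report = suite.run({"rows": [1, 2, 2, 3, None]})
for row in report:
    print(row)
```

A check without a condition still contributes its value to the report, which matches the description of conditions as optional additions to the checks in a suite.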
|
||
|
||
|
||
|