Updated readme, user guide and how-to-guides (#340)
* adding getting started and a few sections to the user guide

* updating readme and docs

Co-authored-by: Itay Gabbay <itay@deepchecks.com>
shir22 and ItayGabbay committed Jan 2, 2022
1 parent df7ad4e commit 652a1f4
Showing 14 changed files with 6,600 additions and 2,468 deletions.
313 changes: 190 additions & 123 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.0.17
0.1.0
8 changes: 4 additions & 4 deletions deepchecks/suites/README.md
@@ -10,9 +10,9 @@
~ ----------------------------------------------------------------------------
~
-->
# Check Suites
# Suites

## Using Existing CheckSuites
## Using an Existing Suite

### [List of Prebuilt Suites](./default_suites.py)

@@ -24,12 +24,12 @@


### Running a Suite
to run a suite, first import it
To run a prebuilt suite, first import it

```python
from deepchecks.suites import *
```
Then run it with the required input parameters (datasets and models)
Then run it with the required (suite-dependent) input parameters
```python
model_evaluation().run(model=my_classification_model, train_dataset=ds_train, test_dataset=ds_test)
```
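To inspect which checks (and default conditions) a prebuilt suite is configured with before running it, printing the suite object should display its contents (a small sketch, assuming the suite's printable representation lists its checks):

```python
from deepchecks.suites import model_evaluation

suite = model_evaluation()
print(suite)  # should list the checks (and any conditions) that make up the suite
```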
Binary file added docs/images/deepchecks-banner.png
Binary file added docs/images/train-test-drift-output.png
66 changes: 62 additions & 4 deletions docs/source/getting-started/index.rst
@@ -1,7 +1,65 @@
.. _getting_started:

================
Getting Started
===============
================

Welcome to Deepchecks!

To get started and easily validate your data and models, make sure to
install the deepchecks python package.

Installation
==============

Local Installation
---------------------

Deepchecks requires Python 3 and can be installed with either pip or conda, depending on the package manager you use for most of your packages.

Using Pip
~~~~~~~~~~
As a best practice we recommend working on a `virtual environment <https://docs.python.org/3/library/venv.html>`__.

The deepchecks package can be installed from `PyPi <https://pypi.org/project/deepchecks/>`__ using the following command:

.. code-block:: bash
    pip install deepchecks --upgrade

.. note::

    Deepchecks is in alpha development, which means that new versions are released on a weekly basis and new features are frequently added. If you experience any unexpected behavior from deepchecks, the first troubleshooting step is to upgrade to the latest version.
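To verify which version is installed (for example, right after upgrading), you can print the package version. This is a small sketch that assumes the package exposes a ``__version__`` attribute:

.. code-block:: bash

    python -c "import deepchecks; print(deepchecks.__version__)"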


Using Conda
~~~~~~~~~~~~~
As a best practice we recommend `creating a conda environment <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands>`__.

To install the deepchecks package from the conda package manager run

.. code-block:: bash
    conda install -c deepchecks deepchecks

or, if you already have deepchecks installed and want to update, run:

.. code-block:: bash
    conda update -c deepchecks deepchecks
Installing On Google Colab or on a Kaggle Kernel
-------------------------------------------------
Simply run the following command in a notebook cell:

.. code-block:: bash
    !pip install deepchecks
Here we will show you how to get started with deepchecks!
Start Working with the Package!
=================================
To get started with deepchecks in 5 lines of code, head to :doc:`examples/howto-guides/quickstart_in_5_minutes`.
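As a rough preview, a minimal flow could look like the sketch below. It assumes a fitted scikit-learn style classifier and two pandas dataframes; ``df_train``, ``df_test``, ``'target'`` and ``my_model`` are placeholder names, not part of the package:

.. code-block:: python

    from deepchecks import Dataset
    from deepchecks.suites import model_evaluation

    # Wrap the raw dataframes with deepchecks metadata (here, only the label column name)
    ds_train = Dataset(df_train, label='target')
    ds_test = Dataset(df_test, label='target')

    # Run the prebuilt model evaluation suite and display its report
    model_evaluation().run(model=my_model, train_dataset=ds_train, test_dataset=ds_test)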

.. toctree::
:caption: Getting Started
For additional usage examples and best practices for working with the package, stay tuned for the additional
guides that will be added here with our official launch!
43 changes: 22 additions & 21 deletions docs/source/index.rst
@@ -1,40 +1,41 @@
.. mlchecks documentation master file, created by
sphinx-quickstart on Sun Oct 24 10:19:19 2021.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to Deepchecks' documentation!
=====================================
Welcome to Deepchecks!
=====================================
Deepchecks is the leading tool for validating your machine learning models
and data, and it enables doing so with minimal effort. Deepchecks accompanies you through
various validation needs, such as verifying your data's integrity, inspecting its distributions,
validating data splits, evaluating your model and comparing between different models.

Head over to the :doc:`getting-started/index` guide to get it up and running.

.. toctree::
:maxdepth: 2
:hidden:

getting-started/index
.. note::
Welcome! In addition to perusing the documentation, please feel free
to ask questions on our `Slack channel <https://join.slack.com/t/deepcheckscommunity/shared_invite/zt-y28sjt1v-PBT50S3uoyWui_Deg5L_jg>`__, or to post an issue or start a discussion on `Github <https://github.com/deepchecks/deepchecks/>`__.


Documentation Structure
=========================

.. toctree::
:maxdepth: 2
:maxdepth: 2

examples/index
getting-started/index


.. toctree::
:maxdepth: 2
:maxdepth: 2

user-guide/index
user-guide/index


.. toctree::
:maxdepth: 3
:maxdepth: 2

api/index
examples/index


Indices and tables
~~~~~~~~~~~~~~~~~~~~
.. toctree::
:maxdepth: 3

* :ref:`modindex`
* :ref:`search`
api/index
128 changes: 128 additions & 0 deletions docs/source/user-guide/dataset_object.rst
@@ -0,0 +1,128 @@
.. _dataset_object:

====================
The Dataset Object
====================
The Dataset is one of the basic building blocks of deepchecks. It is a container for the data and its relevant metadata, such as special column names (label, date, index, etc.).
Some checks can work with a dataframe directly, but others require the metadata in order to run, so they are limited to working only with Datasets.

Class Parameters
===================
All of the parameters are optional (an example that combines several of them follows the table).

.. list-table::
:widths: 20 20 50 10
:header-rows: 1

* - Name
- Type
- Description
- Default
* - label
- pandas.Series
- Data of labels as separate series from the data
- None
* - features
- List[Hashable]
- Names of the features in the data
- None
* - cat_features
- List[Hashable]
- Names of the categorical features in the data. Must be a subset of `features`
- None
* - label_name
- Hashable
- Name of label column in the data
- None
* - use_index
- bool
- If the data is a dataframe, whether to use the dataframe index as the index column for index-related checks
- False
* - index_name
- Hashable
- Name of index column in the data
- None
* - date_name
- Hashable
- Name of date column in the data
- None
* - date_unit_type
- str
- Unit used to convert the date column if it is numeric, using `pandas.Timestamp <https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html>`__ for the conversion
- None
* - max_categorical_ratio
- float
- Used to infer which columns are categorical (if `cat_features` isn't explicitly passed).
  Sets the maximum ratio of unique values in a column for it to be considered categorical.
  The higher the value, the more likely a column is to be inferred as categorical
- 0.01
* - max_categories
- int
- Used to infer which columns are categorical (if `cat_features` isn't explicitly passed).
  Sets the maximum number of unique values in a column for it to be considered categorical.
  The higher the value, the more likely a column is to be inferred as categorical
- 30
* - max_float_categories
- int
- Same as `max_categories` but for columns of type float
- 5
* - convert_date
- bool
- Whether to convert the date column to a date type if it is numeric
- True
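For illustration, several of the parameters above can be combined when wrapping a dataframe. This is a hedged sketch: ``df`` and its column names are placeholders, and it follows the keyword style used in the constructor example later on this page, so verify the exact parameter names against your installed version:

.. code-block:: python

    from deepchecks import Dataset

    ds = Dataset(
        df,                               # a pandas DataFrame
        cat_features=['city', 'device'],  # treated as categorical instead of being inferred
        label='target',                   # label column
        date='timestamp'                  # date column
    )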

Inferring Features And Categorical Features
================================================
The Dataset defines which columns of the data are features, and which of those are categorical features.
If the `features` parameter is not passed explicitly, all columns are considered features apart from the label, index and date.
If the `cat_features` parameter is not passed explicitly, the following logic runs on every feature column to determine
whether it is categorical or not (a sketch of this logic appears after the list):

* if the column is of float type:

  * number of unique values < `max_float_categories`

* else:

  * number of unique values < `max_categories` AND (number of unique values / number of samples) < `max_categorical_ratio`
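The rule above can be expressed roughly as the following standalone sketch (for illustration only, not the library's actual implementation):

.. code-block:: python

    import pandas as pd

    def is_categorical(column: pd.Series,
                       max_categorical_ratio: float = 0.01,
                       max_categories: int = 30,
                       max_float_categories: int = 5) -> bool:
        """Rough sketch of the categorical-inference rule described above."""
        n_unique = column.nunique(dropna=True)
        n_samples = max(len(column.dropna()), 1)  # avoid division by zero on empty columns
        if pd.api.types.is_float_dtype(column):
            return n_unique < max_float_categories
        return n_unique < max_categories and (n_unique / n_samples) < max_categorical_ratio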

Creating a Dataset
======================

From a DataFrame
~~~~~~~~~~~~~~~~
The default Dataset constructor expects to receive a dataframe. The rest of the properties
are optional, but if your data has a date/index/label column you will want to define it.

.. code-block:: python
    Dataset(my_dataframe, features=['feat1', 'feat2', 'feat3'], label='target', index='id', date='timestamp')
From Numpy Arrays
~~~~~~~~~~~~~~~~~~~
A Dataset can be created using a 2D numpy array for the features and a 1D numpy array for the labels. The features array is mandatory, and the labels array is optional.

.. code-block:: python
    features = np.array([[0.25, 0.3, 0.3], [0.14, 0.75, 0.3], [0.23, 0.39, 0.1]])
    labels = np.array([0.1, 0.1, 0.7])
    dataset_with_labels = Dataset.from_numpy(features, labels)
    dataset_without_labels = Dataset.from_numpy(features)
Also, it's possible to assign names to the features and label:

.. code-block:: python
    Dataset.from_numpy(
        features, labels,
        feature_names=['feat1', 'feat2', 'feat3'],
        label_name='target'
    )

All the rest of the Dataset's properties can also be passed as regular keyword arguments:

.. code-block:: python
    Dataset.from_numpy(
        features, labels,
        feature_names=['feat1', 'feat2', 'feat3'],
        label_name='target',
        max_float_categories=10
    )
11 changes: 10 additions & 1 deletion docs/source/user-guide/index.rst
@@ -1,5 +1,14 @@
.. _user_guide:

==========
User Guide
==========

Here you can find the key concepts, structure, and recommended flow, and dive into many of the deepchecks functionalities.

.. toctree::
:caption: User Guide
:caption: User Guide

key_concepts
when_should_you_use
dataset_object
39 changes: 39 additions & 0 deletions docs/source/user-guide/key_concepts.rst
@@ -0,0 +1,39 @@
.. _user_guide__key_concepts:

==============
Key Concepts
==============

Check
========

Each check enables you to inspect a specific aspect of your data and models.
Checks are the basic building blocks of the deepchecks package, covering all kinds of common issues,
such as: PerformanceOverfit, DataSampleLeakage, SingleFeatureContribution, DataDuplicates, and :ref:`many more checks <examples/checks>`.
Each check can have two types of results (see the example after the list):

#. A visual result meant for display (e.g. a figure or a table).
#. A return value that can be used for validating the expected check results
(validations are typically done by adding a "condition" to the check, as explained below)
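For example, running a single check on a Dataset returns a result object that holds both of these (a hedged sketch; ``ds`` is assumed to be an existing deepchecks ``Dataset``):

.. code-block:: python

    from deepchecks.checks import DataDuplicates

    result = DataDuplicates().run(ds)  # ds is a deepchecks Dataset
    result                             # in a notebook, displays the visual output
    result.value                       # the raw return value that conditions validate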


Condition
==========
A condition is a function that can be added to a Check, which returns
a pass ✓, fail ✖ or warning ! result, intended for validating the Check's return value.
An example of adding a condition:

.. code-block:: python
    from deepchecks.checks import BoostingOverfit

    BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05)
Suite
========

A suite is an ordered collection of checks that can have conditions added to them. The Suite enables displaying a concluding report for all of the Checks that ran. `Here <https://github.com/deepchecks/deepchecks/blob/main/deepchecks/suites>`__ you can find the predefined existing suites and a code example demonstrating how to build your own custom suite. The existing suites include default conditions added for most of the checks. You can edit the preconfigured suites or build a suite of your own with a collection of checks and optional conditions.
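As an illustration, building and running a custom suite could look like the following sketch. It is hedged: the exact import path of the ``Suite`` class, the check names, and the run signature should be verified against the installed version, and ``ds_train``, ``ds_test`` and ``model`` are placeholders:

.. code-block:: python

    from deepchecks import Suite
    from deepchecks.checks import DataDuplicates, BoostingOverfit

    my_suite = Suite(
        'My Custom Suite',
        DataDuplicates(),
        BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05),
    )
    my_suite.run(train_dataset=ds_train, test_dataset=ds_test, model=model)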



