Updated readme, user guide and how-to-guides (#340)
* adding getting started and a few sections to the user guide

* updating readme and docs

Co-authored-by: Itay Gabbay <itay@deepchecks.com>
shir22 and ItayGabbay committed Jan 2, 2022
1 parent df7ad4e commit 652a1f4
Showing 14 changed files with 6,600 additions and 2,468 deletions.
313 changes: 190 additions & 123 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.0.17
0.1.0
8 changes: 4 additions & 4 deletions deepchecks/suites/README.md
@@ -10,9 +10,9 @@
~ ----------------------------------------------------------------------------
~
-->
# Check Suites
# Suites

## Using Existing CheckSuites
## Using an Existing Suite

### [List of Prebuilt Suites](./default_suites.py)

@@ -24,12 +24,12 @@


### Running a Suite
to run a suite, first import it
To run a prebuilt suite, first import it

```python
from deepchecks.suites import *
```
Then run it with the required input parameters (datasets and models)
Then run it with the required (suite-dependent) input parameters
```python
model_evaluation().run(model=my_classification_model, train_dataset=ds_train, test_dataset=ds_test)
```
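To inspect which checks (and default conditions) a prebuilt suite is configured with before running it, printing the suite object should display its contents (a small sketch, assuming the suite's printable representation lists its checks):

```python
from deepchecks.suites import model_evaluation

suite = model_evaluation()
print(suite)  # should list the checks (and any conditions) that make up the suite
```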
Binary file added docs/images/deepchecks-banner.png
Binary file added docs/images/train-test-drift-output.png
66 changes: 62 additions & 4 deletions docs/source/getting-started/index.rst
@@ -1,7 +1,65 @@
.. _getting_started:

================
Getting Started
===============
================

Welcome to Deepchecks!

To get started and easily validate your data and models, make sure to
install the deepchecks python package.

Installation
==============

Local Installation
---------------------

Deepchecks requires Python 3 and can be installed with either pip or conda, depending on the package manager you use for most of your packages.

Using Pip
~~~~~~~~~~
As a best practice we recommend working on a `virtual environment <https://docs.python.org/3/library/venv.html>`__.

The deepchecks package can be installed from `PyPi <https://pypi.org/project/deepchecks/>`__ using the following command:

.. code-block:: bash
    pip install deepchecks --upgrade

.. note::

    Deepchecks is in alpha development, which means that new versions are released on a weekly basis and new features are frequently added. If you experience any unexpected behavior from deepchecks, the first troubleshooting step is to upgrade to the latest version.
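To verify which version is installed (for example, right after upgrading), you can print the package version. This is a small sketch that assumes the package exposes a ``__version__`` attribute:

.. code-block:: bash

    python -c "import deepchecks; print(deepchecks.__version__)"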


Using Conda
~~~~~~~~~~~~~
As a best practice we recommend `creating a conda environment <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands>`__.

To install the deepchecks package from the conda package manager run

.. code-block:: bash
    conda install -c deepchecks deepchecks

or, if you already have deepchecks installed and want to update, run:

.. code-block:: bash
    conda update -c deepchecks deepchecks
Installing On Google Colab or on a Kaggle Kernel
-------------------------------------------------
Simply run the following command in a notebook cell:

.. code-block:: bash
    !pip install deepchecks
Here we will show you how to get started with deepchecks!
Start Working with the Package!
=================================
To get started with deepchecks in 5 lines of code, head to :doc:`examples/howto-guides/quickstart_in_5_minutes`.
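As a rough preview, a minimal flow could look like the sketch below. It assumes a fitted scikit-learn style classifier and two pandas dataframes; ``df_train``, ``df_test``, ``'target'`` and ``my_model`` are placeholder names, not part of the package:

.. code-block:: python

    from deepchecks import Dataset
    from deepchecks.suites import model_evaluation

    # Wrap the raw dataframes with deepchecks metadata (here, only the label column name)
    ds_train = Dataset(df_train, label='target')
    ds_test = Dataset(df_test, label='target')

    # Run the prebuilt model evaluation suite and display its report
    model_evaluation().run(model=my_model, train_dataset=ds_train, test_dataset=ds_test)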

.. toctree::
:caption: Getting Started
For additional usage examples and best practices for working with the package, stay tuned for the additional
guides that will be added here with our official launch!
43 changes: 22 additions & 21 deletions docs/source/index.rst
@@ -1,40 +1,41 @@
.. mlchecks documentation master file, created by
sphinx-quickstart on Sun Oct 24 10:19:19 2021.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to Deepchecks' documentation!
=====================================
Welcome to Deepchecks!
=====================================
Deepchecks is the leading tool for validating your machine learning models
and data, and it enables doing so with minimal effort. Deepchecks accompanies you through
various validation needs, such as verifying your data's integrity, inspecting its distributions,
validating data splits, evaluating your model and comparing between different models.

Head over to the :doc:`getting-started/index` guide to get it up and running.

.. toctree::
:maxdepth: 2
:hidden:

getting-started/index
.. note::
Welcome! In addition to perusing the documentation, please feel free
to ask questions on our `Slack channel <https://join.slack.com/t/deepcheckscommunity/shared_invite/zt-y28sjt1v-PBT50S3uoyWui_Deg5L_jg>`__, or to post an issue or start a discussion on `Github <https://github.com/deepchecks/deepchecks/>`__.


Documentation Structure
=========================

.. toctree::
:maxdepth: 2
:maxdepth: 2

examples/index
getting-started/index


.. toctree::
:maxdepth: 2
:maxdepth: 2

user-guide/index
user-guide/index


.. toctree::
:maxdepth: 3
:maxdepth: 2

api/index
examples/index


Indices and tables
~~~~~~~~~~~~~~~~~~~~
.. toctree::
:maxdepth: 3

* :ref:`modindex`
* :ref:`search`
api/index
128 changes: 128 additions & 0 deletions docs/source/user-guide/dataset_object.rst
@@ -0,0 +1,128 @@
.. _dataset_object:

====================
The Dataset Object
====================
The Dataset is one of the basic building blocks of deepchecks. It is a container for the data and its relevant metadata, such as special column names (label, date, index, etc.).
Some checks can work with a dataframe directly, but others require the metadata in order to run, so they are limited to working only with Datasets.

Class Parameters
===================
All of the parameters are optional (an example that combines several of them follows the table).

.. list-table::
:widths: 20 20 50 10
:header-rows: 1

* - Name
- Type
- Description
- Default
* - label
- pandas.Series
- Data of labels as separate series from the data
- None
* - features
- List[Hashable]
- Names of the features in the data
- None
* - cat_features
- List[Hashable]
- Names of the categorical features in the data. Must be a subset of `features`
- None
* - label_name
- Hashable
- Name of label column in the data
- None
* - use_index
- bool
- If the data is a dataframe, whether to use the dataframe index as the index column for index-related checks
- False
* - index_name
- Hashable
- Name of index column in the data
- None
* - date_name
- Hashable
- Name of date column in the data
- None
* - date_unit_type
- str
- Unit used to convert the date column if it is numeric, using `pandas.Timestamp <https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html>`__ for the conversion
- None
* - max_categorical_ratio
- float
- Used to infer which columns are categorical (if `cat_features` isn't explicitly passed).
  Sets the maximum ratio of unique values in a column for it to be considered categorical.
  The higher the value, the more likely a column is to be inferred as categorical
- 0.01
* - max_categories
- int
- Used to infer which columns are categorical (if `cat_features` isn't explicitly passed).
  Sets the maximum number of unique values in a column for it to be considered categorical.
  The higher the value, the more likely a column is to be inferred as categorical
- 30
* - max_float_categories
- int
- Same as `max_categories` but for columns of type float
- 5
* - convert_date
- bool
- Whether to convert the date column to a date type if it is numeric
- True
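For illustration, several of the parameters above can be combined when wrapping a dataframe. This is a hedged sketch: ``df`` and its column names are placeholders, and it follows the keyword style used in the constructor example later on this page, so verify the exact parameter names against your installed version:

.. code-block:: python

    from deepchecks import Dataset

    ds = Dataset(
        df,                               # a pandas DataFrame
        cat_features=['city', 'device'],  # treated as categorical instead of being inferred
        label='target',                   # label column
        date='timestamp'                  # date column
    )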

Inferring Features And Categorical Features
================================================
The Dataset defines which columns of the data are features, and which of those are categorical features.
If the `features` parameter is not passed explicitly, all columns are considered features apart from the label, index and date.
If the `cat_features` parameter is not passed explicitly, the following logic runs on every feature column to determine
whether it is categorical or not (a sketch of this logic appears after the list):

* if the column is of float type:

  * number of unique values < `max_float_categories`

* else:

  * number of unique values < `max_categories` AND (number of unique values / number of samples) < `max_categorical_ratio`
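The rule above can be expressed roughly as the following standalone sketch (for illustration only, not the library's actual implementation):

.. code-block:: python

    import pandas as pd

    def is_categorical(column: pd.Series,
                       max_categorical_ratio: float = 0.01,
                       max_categories: int = 30,
                       max_float_categories: int = 5) -> bool:
        """Rough sketch of the categorical-inference rule described above."""
        n_unique = column.nunique(dropna=True)
        n_samples = max(len(column.dropna()), 1)  # avoid division by zero on empty columns
        if pd.api.types.is_float_dtype(column):
            return n_unique < max_float_categories
        return n_unique < max_categories and (n_unique / n_samples) < max_categorical_ratio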

Creating a Dataset
======================

From a DataFrame
~~~~~~~~~~~~~~~~
The default Dataset constructor expects to receive a dataframe. The rest of the properties
are optional, but if your data has a date/index/label column you will want to define it.

.. code-block:: python
    Dataset(my_dataframe, features=['feat1', 'feat2', 'feat3'], label='target', index='id', date='timestamp')
From Numpy Arrays
~~~~~~~~~~~~~~~~~~~
A Dataset can be created using a 2D numpy array for the features and a 1D numpy array for the labels. The features array is mandatory, and the labels array is optional.

.. code-block:: python
    features = np.array([[0.25, 0.3, 0.3], [0.14, 0.75, 0.3], [0.23, 0.39, 0.1]])
    labels = np.array([0.1, 0.1, 0.7])
    dataset_with_labels = Dataset.from_numpy(features, labels)
    dataset_without_labels = Dataset.from_numpy(features)
Also, it's possible to assign names to the features and label:

.. code-block:: python
    Dataset.from_numpy(
        features, labels,
        feature_names=['feat1', 'feat2', 'feat3'],
        label_name='target'
    )

All the rest of the Dataset's properties can also be passed as regular keyword arguments:

.. code-block:: python
    Dataset.from_numpy(
        features, labels,
        feature_names=['feat1', 'feat2', 'feat3'],
        label_name='target',
        max_float_categories=10
    )
11 changes: 10 additions & 1 deletion docs/source/user-guide/index.rst
@@ -1,5 +1,14 @@
.. _user_guide:

==========
User Guide
==========

Here you can find the key concepts, structure, and recommended flow, and dive into many of the deepchecks functionalities.

.. toctree::
:caption: User Guide
:caption: User Guide

key_concepts
when_should_you_use
dataset_object
39 changes: 39 additions & 0 deletions docs/source/user-guide/key_concepts.rst
@@ -0,0 +1,39 @@
.. _user_guide__key_concepts:

==============
Key Concepts
==============

Check
========

Each check enables you to inspect a specific aspect of your data and models.
Checks are the basic building blocks of the deepchecks package, covering all kinds of common issues,
such as: PerformanceOverfit, DataSampleLeakage, SingleFeatureContribution, DataDuplicates, and :ref:`many more checks <examples/checks>`.
Each check can have two types of results (see the example after the list):

#. A visual result meant for display (e.g. a figure or a table).
#. A return value that can be used for validating the expected check results
(validations are typically done by adding a "condition" to the check, as explained below)
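For example, running a single check on a Dataset returns a result object that holds both of these (a hedged sketch; ``ds`` is assumed to be an existing deepchecks ``Dataset``):

.. code-block:: python

    from deepchecks.checks import DataDuplicates

    result = DataDuplicates().run(ds)  # ds is a deepchecks Dataset
    result                             # in a notebook, displays the visual output
    result.value                       # the raw return value that conditions validate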


Condition
==========
A condition is a function that can be added to a Check, which returns
a pass ✓, fail ✖ or warning ! result, intended for validating the Check's return value.
An example of adding a condition:

.. code-block:: python
    from deepchecks.checks import BoostingOverfit

    BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05)
Suite
========

A suite is an ordered collection of checks that can have conditions added to them. The Suite enables displaying a concluding report for all of the Checks that ran. `Here <https://github.com/deepchecks/deepchecks/blob/main/deepchecks/suites>`__ you can find the predefined existing suites and a code example demonstrating how to build your own custom suite. The existing suites include default conditions added for most of the checks. You can edit the preconfigured suites or build a suite of your own with a collection of checks and optional conditions.
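As an illustration, building and running a custom suite could look like the following sketch. It is hedged: the exact import path of the ``Suite`` class, the check names, and the run signature should be verified against the installed version, and ``ds_train``, ``ds_test`` and ``model`` are placeholders:

.. code-block:: python

    from deepchecks import Suite
    from deepchecks.checks import DataDuplicates, BoostingOverfit

    my_suite = Suite(
        'My Custom Suite',
        DataDuplicates(),
        BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05),
    )
    my_suite.run(train_dataset=ds_train, test_dataset=ds_test, model=model)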



