Update README_CONSOLE.md

Add readme for console repo and package. Remove cli-related information from README.md.
Desbordante · Jun 15, 2024 · 5dda96d
1 parent 0a44627

Showing 2 changed files with 107 additions and 51 deletions.
diff --git a/README.md b/README.md
@@ -46,46 +46,7 @@ A brief introduction to the tool and its use cases can be found [here](https://m
 
 ## Console
 
-Usage examples:
-1) Discover all exact functional dependencies in a table stored in a comma-separated file with a header row. In this example the default FD discovery algorithm (HyFD) is used.
-
-```sh
-python3 cli.py --task=fd --table=../examples/datasets/university_fd.csv , True
-```
-
-```text
-[Course Classroom] -> Professor
-[Classroom Semester] -> Professor
-[Classroom Semester] -> Course
-[Professor] -> Course
-[Professor Semester] -> Classroom
-[Course Semester] -> Classroom
-[Course Semester] -> Professor
-```
-
-2) Discover all approximate functional dependencies with error less than or equal to 0.1 in a table represented by a .csv file that uses a comma as the separator and has a header row. In this example the default AFD discovery algorithm (Pyro) is used.
-
-```sh
-python3 cli.py --task=afd --table=../examples/datasets/inventory_afd.csv , True --error=0.1
-```
-
-```text
-[Id] -> ProductName
-[Id] -> Price
-[ProductName] -> Price
-```
-
-3) Check whether metric functional dependency “Title -> Duration” with radius 5 (using the Euclidean metric) holds in a table represented by a .csv file that uses a comma as the separator and has a header row. In this example the default MFD validation algorithm (BRUTE) is used.
-
-```sh
-python3 cli.py --task=mfd_verification --table=../examples/datasets/theatres_mfd.csv , True --lhs_indices=0 --rhs_indices=2 --metric=euclidean --parameter=5
-```
-
-```text
-True
-```
-
-For more information consult documentation and help files.
+For information about the console interface, see its [repository](https://github.com/Desbordante/desbordante-cli).
 
 ## Python bindings
 
@@ -250,17 +211,6 @@ $ pip install desbordante
 
 However, as Desbordante core uses C++, additional requirements on the machine are imposed. Therefore this installation option may not work for everyone. Currently, only manylinux2014 (Ubuntu 20.04+, or any other linux distribution with gcc 10+) is supported. If the above does not work for you consider building from sources.
 
-## CLI installation
-
-**NOTE**: Only Python 3.11+ is supported for CLI
-
-Clone the repository, change the current directory to the project directory and run the following commands:
-
-```sh
-pip install -r cli/requirements.txt
-python3 cli/cli.py --help
-```
-
 ## Build instructions
 
 ### Ubuntu

diff --git a/README_CONSOLE.md b/README_CONSOLE.md
@@ -0,0 +1,106 @@
+<p>
+   <img src="https://github.com/Mstrutov/Desbordante/assets/88928096/d687809b-5a3b-420e-a192-a1a2b6697b2a"/>
+</p>
+
+---
+
+# Desbordante: high-performance data profiler (console interface)
+
+## What is it?
+
+[**Desbordante**](https://github.com/Desbordante/desbordante-core) is a high-performance data profiler oriented towards exploratory data analysis. This is the repository for the Desbordante console interface, which is published as a separate [package](https://pypi.org/project/desbordante-cli/). This package depends on the [desbordante package](https://pypi.org/project/desbordante/), which contains the C++ code for pattern discovery and validation. As a result, depending on the algorithm and dataset, runtimes may be 2-10 times shorter than with alternative tools.
+
+## Table of Contents
+
+- [Main Features](#main-features)
+- [Usage Examples](#usage-examples)
+- [Installation](#installation)
+- [Contacts and Q&A](#contacts-and-qa)
+
+## Main Features
+
+[**Desbordante**](https://github.com/Desbordante/desbordante-core) is a high-performance data profiler that can discover and validate many different patterns in data using a variety of algorithms.
+
+The **Discovery** task is designed to identify all instances of a specified pattern *type* in a given dataset.
+
+The **Validation** task is different: it checks whether a specified pattern *instance* holds in a given dataset. Besides returning True or False, it also explains why the instance does not hold (e.g., it can list table rows with conflicting values).
+
+The currently supported data patterns are:
+* Functional dependency variants:
+    - Exact functional dependencies (discovery and validation)
+    - Approximate functional dependencies, with g<sub>1</sub> metric (discovery and validation)
+    - Probabilistic functional dependencies, with PerTuple and PerValue metrics (discovery)
+* Graph functional dependencies (validation)
+* Conditional functional dependencies (discovery)
+* Inclusion dependencies (discovery)
+* Order dependencies:
+   - set-based axiomatization (discovery)
+   - list-based axiomatization (discovery)
+* Metric functional dependencies (validation)
+* Fuzzy algebraic constraints (discovery)
+* Unique column combinations:
+   - Exact unique column combinations (discovery and validation)
+   - Approximate unique column combinations, with g<sub>1</sub> metric (discovery and validation)
+* Association rules (discovery)
+
+For more information about the supported patterns check the main [repo](https://github.com/Desbordante/desbordante-core).
+
+## Usage examples
+
+Each example below shows the command followed by its output.
+1) Discover all exact functional dependencies in a table stored in a comma-separated file with a header row. In this example, the default FD discovery algorithm (HyFD) is used.
+
+```sh
+python3 cli.py --task=fd --table=../examples/datasets/university_fd.csv , True
+```
+
+```text
+[Course Classroom] -> Professor
+[Classroom Semester] -> Professor
+[Classroom Semester] -> Course
+[Professor] -> Course
+[Professor Semester] -> Classroom
+[Course Semester] -> Classroom
+[Course Semester] -> Professor
+```
+
+2) Discover all approximate functional dependencies with an error less than or equal to 0.1 in a table stored in a comma-separated file with a header row. In this example, the default AFD discovery algorithm (Pyro) is used.
+
+```sh
+python3 cli.py --task=afd --table=../examples/datasets/inventory_afd.csv , True --error=0.1
+```
+
+```text
+[Id] -> ProductName
+[Id] -> Price
+[ProductName] -> Price
+```
+
+3) Check whether the metric functional dependency `Title -> Duration` with radius 5 (using the Euclidean metric) holds in a table stored in a comma-separated file with a header row. In this example, the default MFD validation algorithm (BRUTE) is used.
+
+```sh
+python3 cli.py --task=mfd_verification --table=../examples/datasets/theatres_mfd.csv , True --lhs_indices=0 --rhs_indices=2 --metric=euclidean --parameter=5
+```
+
+```text
+True
+```
+
+For more information, use the `--help` option:
+
+```sh
+desbordante --help
+```
+
+## Installation
+
+The source code is currently hosted on GitHub at https://github.com/Desbordante/desbordante-console. To run it, you first need to install the latest version of the main Desbordante [package](https://pypi.org/project/desbordante/).
+
+**NOTE**: The CLI supports only Python 3.11+.
+
+From the root of the cloned repository, run the following commands:
+
+```sh
+pip install -r cli/requirements.txt
+python3 cli/cli.py --help
+```