# What is an Approximate Functional Dependency?

In Desbordante, we consider an approximate functional dependency ($AFD$) to be
any functional dependency ($FD$) that is evaluated with an error metric and has no established name of its own (unlike, e.g., *soft functional dependencies*).

The metric quantifies how strongly the data violates a given exact $FD$. It takes values in the range `[0, 1]`: the lower the value, the fewer violations
are found in the data.
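To make the notion of an error metric concrete, here is a minimal sketch of `g1` in one common formulation: the number of ordered row pairs that agree on the left-hand side but differ on the right-hand side, divided by the squared number of rows. The helper name `g1_error` is hypothetical and not part of Desbordante, and the normalization used inside Desbordante may differ from this one. The rows are those of the example dataset shown later in this notebook.

```python
import pandas as pd

# Same rows as the inventory_afd.csv dataset displayed later in this notebook.
df = pd.DataFrame({
    "Id": list(range(1, 11)),
    "ProductName": ["Laptop", "Laptop", "Laptop", "Laptop", "Smartwatch",
                    "Headphones", "Tablet", "Tablet", "Smartphone", "Headphones"],
    "Price": [3000, 3000, 300, 3000, 600, 500, 300, 500, 1000, 500],
})


def g1_error(table, lhs, rhs):
    """Hypothetical helper: fraction of ordered row pairs violating lhs -> rhs."""
    n = len(table)
    # Ordered pairs (self-pairs included) that agree on the left-hand side.
    same_lhs = (table.groupby(lhs).size() ** 2).sum()
    # Ordered pairs that agree on both sides; self-pairs cancel in the difference.
    same_both = (table.groupby(lhs + [rhs]).size() ** 2).sum()
    return float(same_lhs - same_both) / n ** 2


print(g1_error(df, ["ProductName"], "Price"))  # 8 violating pairs out of 100
print(g1_error(df, ["Id"], "Price"))           # Id is unique, so no violations
```

Under this formulation, `[ProductName] -> Price` has an error of 0.08, because only the `Laptop` and `Tablet` groups contain conflicting prices.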

For the discovery task, a user specifies a threshold, and Desbordante
finds all $AFDs$ whose error, according to the selected metric, is less than or equal to that threshold.

# What we have to offer

Currently, Desbordante supports:
1. Five metrics: `g1`, `pdep`, `tau`, `mu+`, `rho`.
2. Two algorithms for the discovery of $AFDs$: `Tane` and `Pyro`, with `Pyro` being the faster of the two.

  *Unfortunately, Pyro supports only the g1 metric; for the other metrics, use Tane.*

For more information, see:
1. *Measuring Approximate Functional Dependencies: A Comparative Study by M. Parciak et al.*
2. *Efficient Discovery of Approximate Dependencies by S. Kruse and F. Naumann.*
3. *TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies by Y. Huhtala et al.*

# Example

Now, we are going to demonstrate how to discover $AFDs$.

First, install dependencies, import the modules and load the dataset.

In [None]:
!pip install desbordante==2.3.2
!pip install pandas

Collecting desbordante==2.3.2
  Downloading desbordante-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Downloading desbordante-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB)
Installing collected packages: desbordante
Successfully installed desbordante-2.3.2


In [None]:
import desbordante as db
import pandas as pd

In [None]:
!wget -q https://raw.githubusercontent.com/Desbordante/desbordante-core/main/examples/datasets/inventory_afd.csv

Display the dataset using `pandas`.

In [None]:
df = pd.read_csv("inventory_afd.csv")
df

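|    | Id | ProductName | Price |
|---:|---:|:------------|------:|
| 0 | 1 | Laptop | 3000 |
| 1 | 2 | Laptop | 3000 |
| 2 | 3 | Laptop | 300 |
| 3 | 4 | Laptop | 3000 |
| 4 | 5 | Smartwatch | 600 |
| 5 | 6 | Headphones | 500 |
| 6 | 7 | Tablet | 300 |
| 7 | 8 | Tablet | 500 |
| 8 | 9 | Smartphone | 1000 |
| 9 | 10 | Headphones | 500 |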


---
AFDs mined by Pyro

In [None]:
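# The default AFD algorithm is used here as Pyro (see the heading above).
pyro_alg = db.afd.algorithms.Default()
pyro_alg.load_data(table=df)
# Keep only dependencies whose g1 error does not exceed 0.3.
pyro_alg.execute(error=0.3)

for fd in pyro_alg.get_fds():
    print(fd)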

[ProductName] -> Id
[Id] -> ProductName
[Price] -> Id
[Price] -> ProductName
[Id] -> Price
[ProductName] -> Price
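To see why a dependency such as `[ProductName] -> Price` is only approximate, it helps to look at the rows that break the exact $FD$. The following is a small sketch that uses plain `pandas` rather than any Desbordante API; the variable names are illustrative, and the rows are copied from the dataset displayed above.

```python
import pandas as pd

# Same rows as the inventory_afd.csv dataset displayed above.
df = pd.DataFrame({
    "Id": list(range(1, 11)),
    "ProductName": ["Laptop", "Laptop", "Laptop", "Laptop", "Smartwatch",
                    "Headphones", "Tablet", "Tablet", "Smartphone", "Headphones"],
    "Price": [3000, 3000, 300, 3000, 600, 500, 300, 500, 1000, 500],
})

# Count distinct prices per product; more than one distinct price means
# the exact FD ProductName -> Price is violated within that group.
distinct_prices = df.groupby("ProductName")["Price"].nunique()
violating_products = sorted(distinct_prices[distinct_prices > 1].index)
print(violating_products)

# Show the rows that belong to the violating groups.
print(df[df["ProductName"].isin(violating_products)])
```

Only the `Laptop` and `Tablet` groups contain conflicting prices, which is why the dependency still fits under an error threshold of 0.3.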


---
AFDs mined by Tane

In [None]:
ERROR_MEASURES = ['g1', 'pdep', 'tau', 'mu_plus', 'rho']

tane_alg = db.afd.algorithms.Tane()
tane_alg.load_data(table=df)

# Rerun the discovery with the same threshold for each error measure.
for measure in ERROR_MEASURES:
    tane_alg.execute(error=0.3, afd_error_measure=measure)
    result = tane_alg.get_fds()
    print(f'{measure}:')
    for fd in result:
        print(fd)
    print()

g1:
[Id] -> ProductName
[Id] -> Price
[ProductName] -> Id
[Price] -> Id
[ProductName] -> Price
[Price] -> ProductName

pdep:
[Id] -> ProductName
[Id] -> Price
[ProductName] -> Price

tau:
[Id] -> ProductName
[Id] -> Price

mu_plus:
[Id] -> ProductName
[Id] -> Price

rho:
[Id] -> ProductName
[Id] -> Price
[ProductName] -> Price
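The Tane output above can be compared across measures with a few set operations. This sketch does not call Desbordante: the dependencies are transcribed by hand from the printed results as `(lhs, rhs)` pairs, and the names `results` and `common` are illustrative.

```python
# Dependencies transcribed from the Tane output above, as (lhs, rhs) pairs.
results = {
    "g1": {("Id", "ProductName"), ("Id", "Price"), ("ProductName", "Id"),
           ("Price", "Id"), ("ProductName", "Price"), ("Price", "ProductName")},
    "pdep": {("Id", "ProductName"), ("Id", "Price"), ("ProductName", "Price")},
    "tau": {("Id", "ProductName"), ("Id", "Price")},
    "mu_plus": {("Id", "ProductName"), ("Id", "Price")},
    "rho": {("Id", "ProductName"), ("Id", "Price"), ("ProductName", "Price")},
}

# Dependencies reported under every measure.
common = set.intersection(*results.values())
print("common:", sorted(common))

# Dependencies that each measure reports beyond the common core.
for measure, fds in results.items():
    print(measure, sorted(fds - common))
```

With the threshold of 0.3, only the dependencies with `Id` on the left-hand side are reported by all five measures, while `g1` accepts every single-column candidate.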

