# Analysis of the Database With Cause-effect Pairs

This notebook was created for section 4.2.2, which requires a monotonic dataset.

In [1]:
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as plt

In [2]:
import pandas as pd
from tqdm import tqdm

In [3]:
def load_dataset(number: int) -> pd.DataFrame:
    assert number > 0, f"The dataset number must be greater than zero, got '{number}'."
    url = "https://webdav.tuebingen.mpg.de/cause-effect/pair{:04d}.txt".format(number)
    df = pd.read_csv(url, sep=" ", header=None)
    df.columns = ["x", "y"]  # all datasets have two variables
    return df

In [4]:
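def is_monotonic(data: pd.DataFrame) -> bool:
    """Return True if y is non-decreasing when the rows are ordered by x."""
    data = data.sort_values(by='x')
    last_y = None
    for _, row in data.iterrows():
        current_y = row['y']
        # A single drop in y is enough to rule out monotonicity.
        if last_y is not None and last_y > current_y:
            return False
        last_y = current_y
    return True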
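
The check above only accepts a non-decreasing $y$. As a minimal sketch, assuming that a non-increasing $y$ should count as monotonic too, the same test can be written with the pandas `Series` attributes `is_monotonic_increasing` and `is_monotonic_decreasing`. The name `monotonic_direction` is hypothetical and is not used elsewhere in this notebook.

```python
from typing import Optional

import pandas as pd


def monotonic_direction(data: pd.DataFrame) -> Optional[str]:
    """Return 'increasing', 'decreasing', or None for y ordered by x.

    Hypothetical helper: both checks are non-strict, so ties in y are allowed.
    """
    # mergesort is stable, so rows sharing the same x keep their original order.
    y = data.sort_values(by="x", kind="mergesort")["y"]
    if y.is_monotonic_increasing:
        return "increasing"
    if y.is_monotonic_decreasing:
        return "decreasing"
    return None
```

Because both checks are non-strict, a constant $y$ is reported as `'increasing'`.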

In [5]:
def swap_xy(df: pd.DataFrame) -> pd.DataFrame:
    # Return a new frame with the two columns exchanged; the argument is not modified.
    return df.rename(columns={"x": "y", "y": "x"})[["x", "y"]]

In [6]:
monotonic, monotonic_xtoy = set(), set()
errors = set()
for i in tqdm(range(1, 109)):
    try:
        data = load_dataset(i)
        if is_monotonic(data):
            monotonic.add(i)
        data = swap_xy(data)
        if is_monotonic(data):
            monotonic_xtoy.add(i)
    except Exception:  # loading or checking failed; a bare except would also catch KeyboardInterrupt
        errors.add(i)

100%|██████████| 108/108 [00:22<00:00,  4.82it/s]
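
The loop above records which datasets failed but not why. A minimal sketch of a variant that keeps the exception message for each failure could look as follows. The names `scan_pairs`, `loader`, and `check` are hypothetical; `loader` stands in for `load_dataset` and `check` for `is_monotonic`.

```python
from typing import Any, Callable, Dict, Iterable, Set, Tuple


def scan_pairs(
    numbers: Iterable[int],
    loader: Callable[[int], Any],
    check: Callable[[Any], bool],
) -> Tuple[Set[int], Dict[int, str]]:
    """Apply check(loader(n)) to each n and collect passes and failure reasons."""
    passed: Set[int] = set()
    failures: Dict[int, str] = {}
    for n in numbers:
        try:
            if check(loader(n)):
                passed.add(n)
        except Exception as exc:
            # Keep the exception type and message so the cause can be inspected later.
            failures[n] = f"{type(exc).__name__}: {exc}"
    return passed, failures
```

In this notebook it might be called as `scan_pairs(range(1, 109), load_dataset, is_monotonic)`, after which `failures` maps each dataset number to the reason it could not be analyzed.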


## Results: Monotonic

Datasets where $y$ is monotonic with respect to $x$:

In [7]:
print(monotonic)

set()


Datasets where $x$ is monotonic with respect to $y$:

In [8]:
print(monotonic_xtoy)

set()


Datasets which could not be analyzed:

In [9]:
print(errors)

{42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 69, 70, 71, 73, 74, 75, 76, 77, 81, 82, 83, 84, 85, 87, 89, 90, 91, 92, 105}
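
The notebook does not record why these datasets failed. One plausible cause, stated here only as an assumption, is that some files separate values with tabs or repeated spaces, or contain more than two columns, which `sep=" "` followed by a two-name column assignment cannot handle. The sketch below parses the text of one file with a whitespace regex and reports the column count explicitly. The helper `parse_pair` is hypothetical, and it takes the file content as a string so that it can be tried without network access.

```python
import io

import pandas as pd


def parse_pair(text: str) -> pd.DataFrame:
    """Parse the text of one pair file into a frame with columns x and y.

    Hypothetical helper; raises ValueError when the file is not two-column.
    """
    # sep=r"\s+" accepts tabs and runs of spaces, unlike sep=" ".
    df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
    if df.shape[1] != 2:
        raise ValueError(f"expected 2 columns, found {df.shape[1]}")
    df.columns = ["x", "y"]
    return df
```

Files with more than two columns still raise an error here, because this notebook gives no rule for choosing which columns play the roles of $x$ and $y$.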


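None of the 63 datasets that could be analyzed is monotonic, neither $y$ with respect to $x$ nor $x$ with respect to $y$. The remaining 45 datasets could not be analyzed, so nothing can be concluded about them.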