# Introduction to Tetrad

Tetrad is a Java library that offers several interfaces for exploring causal explanations of data. With the py-tetrad Python interface and JPype, we can access all of Tetrad's functionality, from learning graphs of relationships within data to simulating data from constructed models.

## Prerequisites
- Java JDK 21+ installed on your system
- Python 3.12+
- py-tetrad for basic Tetrad functionality
- JPype for basic and extended Tetrad functionality
- PyTorch for the CausalPerceptronNetwork, used to simulate data from existing graphs

In [2]:
!sudo pip install -U --break-system-packages JPype1 git+https://github.com/cmu-phil/py-tetrad torch

Collecting git+https://github.com/cmu-phil/py-tetrad
  Cloning https://github.com/cmu-phil/py-tetrad to /tmp/pip-req-build-8cyqkfh7
  Running command git clone --filter=blob:none --quiet https://github.com/cmu-phil/py-tetrad /tmp/pip-req-build-8cyqkfh7
  Resolved https://github.com/cmu-phil/py-tetrad to commit 275850d12bb29e392580ee96bff633928a8296bb
  Preparing metadata (setup.py) ... done

## Imports

In [3]:
from importlib import resources as impresources
from pytetrad import resources
import pytetrad.tools.TetradSearch as ts
import edu.cmu.tetrad.data as td
import pandas as pd

## Knowledge

In [2]:
# Reading sample data taken from py-tetrad
resource_file = impresources.files(resources) / 'aw-fb-pruned18.data.mixed.numeric.txt'
knowledge_df = pd.read_csv(resource_file, sep='\t')

knowledge_df.head(5)

Unnamed: 0,age,gender,height,weight,steps,heart_rate,calories,distance,entropy_heart,entropy_steps,resting_heart,corr_heart_steps,norm_heart,sd_norm_heart,steps_times_distance,device,activity
0,20.0,1.0,168.0,65.4,10.7714,78.5313,0.3445,0.0083,6.2216,6.1163,59.0,1.0,19.5313,1.0,0.0897,0.0,0.0
1,20.0,1.0,168.0,65.4,11.4753,78.4534,3.2876,0.0089,6.2216,6.1163,59.0,1.0,19.4534,1.0,0.1021,0.0,0.0
2,20.0,1.0,168.0,65.4,12.1792,78.5408,9.484,0.0095,6.2216,6.1163,59.0,1.0,19.5408,1.0,0.1153,0.0,0.0
3,20.0,1.0,168.0,65.4,12.8831,78.6283,10.1546,0.01,6.2216,6.1163,59.0,1.0,19.6283,1.0,0.1293,0.0,0.0
4,20.0,1.0,168.0,65.4,13.587,78.7157,10.8251,0.0106,6.2216,6.1163,59.0,0.9828,19.7157,0.2416,0.1441,0.0,0.0


We know that factors like age, gender, height, weight, resting heart rate, and whether the user has a device or recorded activity are all determined prior to the activity test recorded in this data. As such, we can place these factors in Tier 0 and the rest in later tiers. Additionally, we can tell the search that Tier 0 variables should not affect one another.

In [3]:
# Declaring Knowledge tiers
tier0 = ['age', 'gender', 'height', 'weight', 'resting_heart', 'device', 'activity']
tier1 = ['steps', 'heart_rate', 'calories', 'distance']

# Adding tiers to knowledge
knowledge = td.Knowledge()
for col in tier0:
    knowledge.addToTier(0, col)
for col in tier1:
    knowledge.addToTier(1, col)

# Forbid edges within tier0
knowledge.setTierForbiddenWithin(0, True)

As examples, we can specify required and forbidden edges.

In [4]:
# Required edge
knowledge.setRequired('activity', 'steps')

# Forbidden edge
knowledge.setForbidden('age', 'device')
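
The tier rules above can be read as a simple yes/no test on directed edges. The sketch below is a plain-Python illustration, not Tetrad's implementation. `edge_allowed` and its parameters are hypothetical names, and it assumes the usual tier semantics: a variable in a later tier may not cause a variable in an earlier tier, and edges inside a tier are ruled out only when that tier is flagged as forbidden-within.

```python
def edge_allowed(cause, effect, tiers, forbidden_within=(), forbidden=()):
    """Return True if the directed edge cause -> effect respects the constraints.

    tiers: list of lists of variable names, earliest tier first (hypothetical layout).
    forbidden_within: tier indices whose members may not cause one another.
    forbidden: explicit (cause, effect) pairs that are ruled out.
    """
    tier_of = {name: index for index, tier in enumerate(tiers) for name in tier}
    if (cause, effect) in forbidden:
        return False
    # Variables that belong to no tier are not restricted by tier ordering
    if cause not in tier_of or effect not in tier_of:
        return True
    if tier_of[cause] > tier_of[effect]:
        return False  # a later tier cannot cause an earlier tier
    if tier_of[cause] == tier_of[effect] and tier_of[cause] in forbidden_within:
        return False
    return True


tiers = [
    ['age', 'gender', 'height', 'weight', 'resting_heart', 'device', 'activity'],
    ['steps', 'heart_rate', 'calories', 'distance'],
]
print(edge_allowed('age', 'steps', tiers, forbidden_within={0}))   # tier 0 -> tier 1
print(edge_allowed('steps', 'age', tiers, forbidden_within={0}))   # tier 1 -> tier 0
print(edge_allowed('age', 'gender', tiers, forbidden_within={0}))  # inside tier 0
```

Required edges work the other way around: they do not rule anything out but must appear in the result.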

## TetradSearch
To demonstrate use of TetradSearch, first we will run the FGES algorithm on a continuous dataset. The FGES algorithm produces a Completed Partially Directed Acyclic Graph (CPDAG).

In [5]:
# Reading sample data from py-tetrad
resource_file = impresources.files(resources) / 'airfoil-self-noise.continuous.txt'
continuous_df = pd.read_csv(resource_file, sep='\t')
continuous_df = continuous_df.astype({col: "float64" for col in continuous_df.columns})

# Instantiating TetradSearch
search = ts.TetradSearch(continuous_df)

# Adding knowledge through TetradSearch
search.set_forbidden('Frequency', 'Attack')
search.set_required('Frequency', 'Pressure')

# Specify SEM BIC scoring
search.use_sem_bic(penalty_discount=2)

# Run the search -- These are default FGES search parameters
search.run_fges(symmetric_first_step=False, max_degree=-1, parallelized=False, faithfulness_assumed=False)
print(search.get_string())


Error computing BIC: Graph must not be null.
Graph Nodes:
Frequency;Attack;Chord;Velocity;Displacement;Pressure

Graph Edges:
1. Attack --> Frequency
2. Attack --> Pressure
3. Chord --> Attack
4. Chord --> Frequency
5. Chord --> Pressure
6. Displacement --> Attack
7. Displacement --- Chord
8. Displacement --> Pressure
9. Frequency --> Pressure
10. Velocity --> Attack
11. Velocity --> Frequency
12. Velocity --> Pressure

Graph Attributes:
Score: -46106.834126

Graph Node Attributes:
Score: [Frequency: -28323.59197296753;Attack: -7882.65059720316;Chord: 2856.9619390766175;Velocity: -12518.376781557072;Displacement: 8815.0969551796;Pressure: -9054.27366892865]



The error displayed in the above output, "Error computing BIC," is an issue with the py-tetrad library that appears on any FGES-based search; the graph itself is still returned. The score attributes report the score of the whole graph and of each node. These are BIC-based scores, and under Tetrad's convention a higher value indicates a better fit to the data after penalizing model complexity.

To analyze temporal relationships, we can use the time lag parameter to include the effect of earlier records on the current record. Below, we see how a time lag of 1 affects the causal graph by adding a copy of each variable taken from one record prior (shown as `Name:1`). Each additional lag adds another full set of variables, so increasing the time lag significantly increases the complexity of the model.

In [6]:
# Set time lag parameter
search.set_time_lag(1)

# Run the search again
search.run_fges()
print(search.get_string())

Error computing BIC: Graph must not be null.
Graph Nodes:
Frequency;Attack;Chord;Velocity;Displacement;Pressure;Frequency:1;Attack:1;Chord:1;Velocity:1;Displacement:1;Pressure:1

Graph Edges:
1. Attack --> Displacement
2. Attack --> Frequency
3. Attack --> Pressure
4. Chord --> Pressure
5. Frequency --> Pressure
6. Velocity --> Frequency
7. Velocity --> Pressure
8. Attack:1 --> Attack
9. Attack:1 --> Displacement
10. Attack:1 --> Frequency:1
11. Attack:1 --> Pressure:1
12. Chord:1 --> Chord
13. Chord:1 --> Pressure
14. Chord:1 --> Attack:1
15. Chord:1 --- Displacement:1
16. Chord:1 --> Frequency:1
17. Chord:1 --> Pressure:1
18. Displacement:1 --> Displacement
19. Displacement:1 --> Attack:1
20. Displacement:1 --> Pressure:1
21. Frequency:1 --> Frequency
22. Frequency:1 --> Pressure
23. Frequency:1 --> Pressure:1
24. Pressure:1 --> Pressure
25. Velocity:1 --> Pressure
26. Velocity:1 --> Velocity
27. Velocity:1 --> Attack:1
28. Velocity:1 --> Frequency:1
29. Velocity:1 --> Pressure:1
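
The lagged nodes in this output can be pictured as shifted copies of the original columns. The following sketch is only an illustration of that idea in plain Python: `add_lagged_columns` is a hypothetical helper, the `Name:k` naming simply mirrors the node names printed above, and Tetrad's own lagging code may handle details differently.

```python
def add_lagged_columns(columns, lag=1):
    """Return current columns plus lagged copies named 'Name:k'.

    columns: dict mapping a column name to a list of values in time order.
    The first `lag` records are dropped because they have no full history.
    """
    n = len(next(iter(columns.values())))
    lagged = {}
    for name, values in columns.items():
        lagged[name] = values[lag:]  # the current record
    for k in range(1, lag + 1):
        for name, values in columns.items():
            lagged[f"{name}:{k}"] = values[lag - k:n - k]  # the record k steps earlier
    return lagged


toy = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]}
for name, values in add_lagged_columns(toy, lag=1).items():
    print(name, values)
```

With six original variables and a lag of 1, this construction yields twelve columns, matching the twelve nodes listed in the graph.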


TetradSearch also provides algorithms that work on discrete or mixed datasets. Below is an example of using TetradSearch to run the RFCI algorithm on a discrete dataset. RFCI allows for latent variables and returns a Partial Ancestral Graph (PAG); the chi-square test selected here is what makes the run appropriate for discrete data.

In [7]:
# Reading sample data from py-tetrad
resource_file = impresources.files(resources) / 'bridges.data.version211_rev.txt'
discrete_df = pd.read_csv(resource_file, sep='\t')

# Instantiating TetradSearch
search = ts.TetradSearch(discrete_df)

# Select tests and scores
search.use_bdeu(sample_prior=10, structure_prior=0)
search.use_chi_square(alpha=0.05)

# Run the search
search.run_rfci(depth=-1, stable_fas=True, max_disc_path_length=-1, complete_rule_set_used=True)
print(search.get_string())

Graph Nodes:
RIVER;ERECTED;PURPOSE;LENGTH;LANES;CLEAR_G;T_OR_D;MATERIAL;SPAN;REL_L;TYPE

Graph Edges:
1. CLEAR_G o-> TYPE
2. LANES o-> T_OR_D
3. LENGTH o-> SPAN
4. MATERIAL o-> TYPE
5. REL_L o-> SPAN
6. T_OR_D <-> TYPE
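
The edge marks in this output differ from those in the CPDAGs above. As a reading aid, the sketch below parses a printed edge line and attaches the standard interpretation of each mark. `parse_edge` and `EDGE_MEANINGS` are hypothetical names for illustration and are not part of py-tetrad.

```python
import re

# Standard readings of the edge marks that appear in this notebook
EDGE_MEANINGS = {
    "-->": "left is a cause (ancestor) of right",
    "<->": "neither causes the other; a latent common cause is implied",
    "o->": "right does not cause left; left may cause right, share a latent cause with it, or both",
    "o-o": "no orientation could be determined at either end",
    "---": "in a CPDAG, adjacent but the direction is not determined",
}


def parse_edge(line):
    """Parse a line such as '3. LENGTH o-> SPAN' into (left, mark, right)."""
    text = re.sub(r"^\s*\d+\.\s+", "", line.strip())  # drop the leading '3. '
    left, mark, right = text.split()
    return left, mark, right


for line in ["1. CLEAR_G o-> TYPE", "6. T_OR_D <-> TYPE"]:
    left, mark, right = parse_edge(line)
    print(f"{left} {mark} {right}: {EDGE_MEANINGS[mark]}")
```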




## Running Tetrad Directly with JPype
Some of Tetrad's functionality is not included directly in py-tetrad and must instead be called through JPype. One search algorithm that TetradSearch does not include is IMaGES, a modified version of the FGES algorithm that accepts multiple similarly structured datasets and produces a single CPDAG. The example below uses the IMaGES-BOSS algorithm, which uses BOSS instead of FGES and is suited to continuous variables.

In [6]:
import pytetrad.tools.translate as ptt
import edu.cmu.tetrad.util as util
import edu.cmu.tetrad.algcomparison.algorithm.multi as multi
import java.util as jutil

# Reading sample data from py-tetrad
resource_file = impresources.files(resources) / 'airfoil-self-noise.continuous.txt'
continuous_df = pd.read_csv(resource_file, sep='\t')
continuous_df = continuous_df.astype({col: "float64" for col in continuous_df.columns})

# Instantiate the IMaGES-BOSS algorithm 
alg = multi.ImagesBoss()

# Set parameters. Images uses SEM BIC for scoring by default
params = util.Parameters()
params.set(util.Params.PENALTY_DISCOUNT, 2)
params.set(util.Params.RANDOM_SELECTION_SIZE, 1)

# Convert dataframe to Tetrad data object
tetrad_data = ptt.pandas_data_to_tetrad(continuous_df)

# Add data to an ArrayList
data_list = jutil.ArrayList()
# This example reuses the same dataset three times; different, similarly structured datasets also work
data_list.add(tetrad_data)
data_list.add(tetrad_data)
data_list.add(tetrad_data)

# Run and print results
cpdag = alg.search(data_list, params)
ptt.print_java(cpdag)

Graph Nodes:
Frequency;Attack;Chord;Velocity;Displacement;Pressure

Graph Edges:
1. Attack --> Displacement
2. Attack --> Pressure
3. Chord --> Attack
4. Chord --> Displacement
5. Chord --> Pressure
6. Displacement --> Pressure
7. Frequency --> Attack
8. Frequency --> Pressure
9. Velocity --> Attack
10. Velocity --- Frequency
11. Velocity --> Pressure
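
One way to picture how several datasets lead to a single graph is score averaging: each candidate structure is scored on every dataset and the mean score drives the search. The sketch below assumes that this is how the datasets are combined. `averaged_score` and `toy_score` are hypothetical names, and the numbers are made-up toy values rather than airfoil data.

```python
import math
import statistics


def averaged_score(datasets, local_score):
    """Score one candidate on every dataset and return the mean of the scores."""
    scores = [local_score(data) for data in datasets]
    return sum(scores) / len(scores)


def toy_score(data):
    """Hypothetical stand-in for a per-dataset score: higher when y varies less."""
    values = data["y"]
    return -len(values) * math.log(statistics.pvariance(values))


# Two made-up datasets with the same structure (one column named 'y')
run_a = {"y": [1.2, 0.7, 1.9, 1.1, 0.4]}
run_b = {"y": [0.9, 1.4, 0.2, 1.6, 1.0]}

print(averaged_score([run_a, run_b], toy_score))
print(math.isclose(averaged_score([run_a, run_a, run_a], toy_score), toy_score(run_a)))
```

Under this assumption, repeating one dataset three times, as the cell above does, leaves the averaged score equal to the single-dataset score, so the demonstration is mainly about the calling convention.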




## Simulating Data with Tetrad

### Simulating Continuous Data

In [9]:
import pytetrad.tools.simulate as sim

# Simulate data with its true graph
d_continuous, g_continuous = sim.simulateContinuous(num_meas = 20, num_lat = 0, avg_deg = 4, samp_size = 200, 
                                                    coef_low = 0, coef_high = 1, var_low = 1, var_high = 3, 
                                                    rand_cols=False)

# View true graph
ptt.print_java(g_continuous)

# View data with pandas
df = ptt.tetrad_data_to_pandas(d_continuous)
df.head(5)

Graph Nodes:
X1;X2;X3;X4;X5;X6;X7;X8;X9;X10;X11;X12;X13;X14;X15;X16;X17;X18;X19;X20

Graph Edges:
1. X1 --> X2
2. X1 --> X4
3. X1 --> X7
4. X1 --> X8
5. X1 --> X12
6. X1 --> X17
7. X1 --> X20
8. X2 --> X3
9. X2 --> X5
10. X3 --> X13
11. X3 --> X16
12. X4 --> X8
13. X4 --> X9
14. X4 --> X12
15. X4 --> X16
16. X4 --> X17
17. X6 --> X10
18. X6 --> X12
19. X6 --> X14
20. X7 --> X11
21. X7 --> X15
22. X8 --> X9
23. X8 --> X17
24. X8 --> X18
25. X8 --> X19
26. X9 --> X13
27. X9 --> X19
28. X10 --> X13
29. X10 --> X15
30. X12 --> X15
31. X12 --> X19
32. X13 --> X15
33. X13 --> X18
34. X14 --> X15
35. X14 --> X16
36. X14 --> X17
37. X15 --> X20
38. X16 --> X19
39. X17 --> X18
40. X19 --> X20




Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,X20
0,3.917677,-1.014223,-0.28911,-3.742876,-1.267677,2.328146,2.375434,5.473889,-4.066737,-2.819086,0.092815,1.0819,-0.707093,0.142024,2.290921,-6.251781,8.210197,4.895512,2.329169,-3.797691
1,2.578913,-0.778541,-1.994886,0.28195,2.478758,-2.056178,1.615939,2.011372,-2.85092,5.722634,0.478972,-1.704014,-1.673368,-0.742578,-7.571754,1.951863,2.349743,3.538466,3.048942,4.537513
2,-1.758419,-0.909899,-1.836314,1.129903,2.921092,-0.094451,-4.061908,-1.816634,0.823412,-0.97527,-0.383905,-1.28897,1.133605,-0.136437,5.223333,2.055732,-2.092541,0.545167,1.075073,-5.286149
3,-0.944805,1.89238,1.256632,1.787365,-1.603913,0.198759,-0.468707,-0.56103,2.727938,0.799146,-1.057838,-1.467815,1.162031,1.384879,1.219631,0.630107,0.757051,0.362992,1.391626,-1.890697
4,0.650975,-0.891033,-0.157883,-2.714893,0.616899,-0.329968,-2.28829,2.980452,-0.946839,1.567209,0.362529,-0.9114,0.761004,-2.598468,4.286951,-3.912184,2.757018,2.906181,1.83523,-5.638241


### Simulating Discrete Data 

In [10]:
# Simulate data with its true graph
d_discrete, g_discrete = sim.simulateDiscrete(num_meas = 20, num_lat = 0, avg_deg = 4, min_cat=3, max_cat=3, 
                                                  samp_size=1000)

# View true graph
ptt.print_java(g_discrete)

# View data with pandas
df = ptt.tetrad_data_to_pandas(d_discrete)
df.head(5)

Graph Nodes:
X1;X2;X3;X4;X5;X6;X7;X8;X9;X10;X11;X12;X13;X14;X15;X16;X17;X18;X19;X20

Graph Edges:
1. X1 --> X2
2. X1 --> X3
3. X1 --> X6
4. X1 --> X14
5. X1 --> X15
6. X1 --> X18
7. X2 --> X3
8. X2 --> X5
9. X2 --> X7
10. X2 --> X17
11. X2 --> X18
12. X2 --> X20
13. X3 --> X4
14. X3 --> X6
15. X3 --> X10
16. X3 --> X12
17. X4 --> X8
18. X4 --> X9
19. X4 --> X13
20. X4 --> X16
21. X5 --> X6
22. X5 --> X15
23. X5 --> X20
24. X6 --> X11
25. X6 --> X12
26. X6 --> X17
27. X6 --> X19
28. X7 --> X8
29. X7 --> X16
30. X7 --> X19
31. X8 --> X15
32. X9 --> X10
33. X9 --> X14
34. X10 --> X12
35. X10 --> X13
36. X10 --> X14
37. X12 --> X19
38. X17 --> X19
39. X17 --> X20
40. X19 --> X20




Unnamed: 0,X14,X12,X2,X11,X18,X1,X5,X6,X16,X13,X15,X10,X7,X9,X19,X3,X17,X20,X4,X8
0,(0),(1),(0),(0),(0),(1),(0),(1),(0),(2),(2),(2),(0),(1),(2),(0),(2),(1),(2),(1)
1,(2),(0),(2),(1),(2),(0),(0),(0),(1),(0),(1),(2),(2),(1),(0),(0),(0),(0),(2),(2)
2,(1),(2),(0),(2),(0),(1),(0),(1),(2),(2),(2),(0),(0),(2),(0),(1),(0),(1),(0),(2)
3,(0),(1),(0),(2),(0),(1),(0),(1),(1),(2),(2),(2),(0),(1),(2),(1),(2),(1),(2),(1)
4,(1),(0),(1),(2),(0),(2),(2),(1),(0),(0),(0),(1),(2),(0),(1),(2),(2),(0),(1),(2)


### Simulating Mixed Data with Lee & Hastie
The [Lee & Hastie method](https://proceedings.mlr.press/v31/lee13a.html) for simulating mixed datasets treats mixtures of continuous and discrete data as log-linear.

In [11]:
# Simulate data with its true graph
d_mixed, g_mixed = sim.simulateLeeHastie(num_meas = 20, num_lat = 0, avg_deg = 4, min_cat=3, max_cat=3, 
                                         perc_disc=50, samp_size=1000)

# View true graph
ptt.print_java(g_mixed)

# View data with pandas
df = ptt.tetrad_data_to_pandas(d_mixed)
df.head(5)

Graph Nodes:
X1;X2;X3;X4;X5;X6;X7;X8;X9;X10;X11;X12;X13;X14;X15;X16;X17;X18;X19;X20

Graph Edges:
1. X1 --> X3
2. X1 --> X5
3. X1 --> X6
4. X2 --> X6
5. X2 --> X8
6. X2 --> X11
7. X2 --> X12
8. X2 --> X15
9. X2 --> X18
10. X3 --> X5
11. X3 --> X11
12. X3 --> X19
13. X4 --> X6
14. X4 --> X9
15. X4 --> X17
16. X5 --> X7
17. X5 --> X11
18. X5 --> X14
19. X5 --> X18
20. X5 --> X19
21. X6 --> X12
22. X6 --> X15
23. X6 --> X18
24. X7 --> X18
25. X7 --> X19
26. X8 --> X14
27. X9 --> X14
28. X9 --> X17
29. X9 --> X18
30. X9 --> X19
31. X10 --> X11
32. X10 --> X14
33. X11 --> X18
34. X12 --> X15
35. X12 --> X16
36. X12 --> X17
37. X13 --> X15
38. X15 --> X17
39. X15 --> X19
40. X18 --> X19




Unnamed: 0,X20,X12,X17,X8,X7,X1,X14,X9,X19,X11,X6,X13,X10,X16,X4,X15,X18,X3,X2,X5
0,-0.078325,0.134885,0,2,1,-0.773934,0,0.723786,2,0,-0.228449,2,1.202918,2,0.338711,1,-1.394207,-2.007607,-0.469089,1
1,-0.638732,0.298619,1,2,1,-0.258239,0,0.378877,2,1,2.629355,1,0.728368,2,0.334428,0,2.401044,-2.03979,-1.190205,1
2,-1.954954,0.643924,2,0,1,1.792331,2,-0.477489,2,1,-0.829142,2,1.662202,1,0.609947,1,2.838105,2.623676,0.721279,0
3,0.770293,0.921582,0,2,0,-0.912488,0,0.721542,1,0,-1.589278,0,-0.475801,2,0.492657,1,1.427559,0.649983,0.261497,0
4,1.055458,-2.741489,0,2,2,-2.037988,1,-1.46381,2,0,1.162191,1,-1.311442,1,1.700365,0,-2.666066,-0.301788,-1.481669,1


### Simulating Data from an Existing Graph
Py-tetrad provides the CausalPerceptronNetwork class based on PyTorch for simulating data based on an input graph. This can be useful for assessing the robustness of a generated graph.

In [12]:
import pytetrad.tools.cpn as cpn
import torch.nn as nn

# Set up a CausalPerceptronNetwork to simulate data from the CPDAG above
noise_distributions = {}
for node in cpdag.getNodes():
    noise_distributions[node] = cpn.NoiseDistribution(distribution_type="normal", mean=0, std=1)

# Use a distinct variable name so the imported `cpn` module is not shadowed
network = cpn.CausalPerceptronNetwork(
    graph=cpdag,
    num_samples=10000,
    noise_distributions=noise_distributions,  # Noise distribution for each node
    hidden_dimensions=[50, 50, 50, 50, 50],
    input_scale=1,
    activation_module=nn.LeakyReLU(),
    nonlinearity='leaky_relu',
    discrete_prob=0,  # No discrete variables
    seed=40
)

# Simulate the data from the network that was built on the CPDAG
simulated_df = network.generate_data()

simulated_df.head(5)

Unnamed: 0,Frequency,Attack,Chord,Velocity,Displacement,Pressure
0,0.134287,2.102219,1.113105,-0.574891,0.119222,-0.546117
1,0.064648,1.72888,1.890632,-1.656146,0.766208,-0.692765
2,0.078281,0.823285,1.111294,-1.202268,1.664933,1.220293
3,0.096521,0.803707,0.75175,-0.620423,2.05905,-0.404596
4,0.089519,1.287818,-0.844076,-0.862712,0.675245,-0.200326
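
The general idea behind simulating from a graph is to visit nodes in causal order and compute each one from its parents plus noise. The sketch below shows that idea with a much simpler stand-in than the CausalPerceptronNetwork: a `tanh` of a weighted sum of parents instead of a multilayer perceptron. `simulate_from_dag` is a hypothetical helper, and the toy graph uses only the directed edges among four of the nodes in the CPDAG printed earlier.

```python
import math
import random
from graphlib import TopologicalSorter


def simulate_from_dag(parents, num_samples, seed=0):
    """Sample data from a DAG given as {node: [parent, ...]}.

    Each node is tanh(weighted sum of its parents) plus standard normal noise.
    Returns a dict mapping each node to a list of samples.
    """
    rng = random.Random(seed)
    order = list(TopologicalSorter(parents).static_order())  # parents come first
    weights = {(p, c): rng.uniform(0.5, 1.5) for c, ps in parents.items() for p in ps}
    data = {node: [] for node in order}
    for _ in range(num_samples):
        row = {}
        for node in order:
            signal = sum(weights[(p, node)] * row[p] for p in parents.get(node, []))
            row[node] = math.tanh(signal) + rng.gauss(0, 1)
            data[node].append(row[node])
    return data


toy_parents = {
    "Chord": [],
    "Attack": ["Chord"],
    "Displacement": ["Attack", "Chord"],
    "Pressure": ["Attack", "Chord", "Displacement"],
}
samples = simulate_from_dag(toy_parents, 100, seed=40)
print({name: round(values[0], 3) for name, values in samples.items()})
```

Running a search on data simulated this way and comparing the result with the generating graph is one way to check how reliably a structure can be recovered.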
