#### About
> Conditional Independence

- It is a concept in probability theory that describes the relationship between two random variables, given the value of a third random variable. 
- It is a form of independence in which, once a third event is known to have occurred, the occurrence of one event has no effect on the probability of the other.

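- Two random variables X and Y are said to be conditionally independent given a third random variable Z if and only if
P(X, Y | Z) = P(X | Z) * P(Y | Z)
for every value of Z that has non-zero probability. Equivalently, P(X | Y, Z) = P(X | Z): once Z is known, learning Y tells us nothing more about X.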

- It is a useful concept in many areas, including machine learning, where it is often used in Bayesian networks to simplify the representation and computation of complex probabilistic models. It can also be used to make inferences about causal relationships between variables.

- A causal relationship between variables is one in which one variable, known as the cause or independent variable, directly affects another variable, known as the effect or dependent variable. An example is the amount of fertilizer applied and the resulting growth in plant length.
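The points above can be illustrated with a small numeric sketch. It assumes three binary variables with made-up probabilities, where Z is a common cause of X and Y. The joint distribution is built from the factorization P(Z) * P(X | Z) * P(Y | Z), which is the kind of compact representation a Bayesian network uses. The numbers and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical numbers: Z is a common cause of X and Y (all binary).
p_z = np.array([0.5, 0.5])            # P(Z)
p_x_given_z = np.array([[0.9, 0.1],   # P(X | Z=0)
                        [0.2, 0.8]])  # P(X | Z=1)
p_y_given_z = np.array([[0.8, 0.2],   # P(Y | Z=0)
                        [0.3, 0.7]])  # P(Y | Z=1)

# Joint built so that X and Y are conditionally independent given Z:
# P(X, Y, Z) = P(Z) * P(X | Z) * P(Y | Z), indexed as joint[z, x, y]
joint = p_z[:, None, None] * p_x_given_z[:, :, None] * p_y_given_z[:, None, :]

# Marginal check: compare P(X, Y) with P(X) * P(Y)
p_xy = joint.sum(axis=0)
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
marginally_independent = np.allclose(p_xy, np.outer(p_x, p_y))

# Conditional check: compare P(X, Y | Z=z) with P(X | Z=z) * P(Y | Z=z) for each z
p_xy_given_z = joint / p_z[:, None, None]
conditionally_independent = all(
    np.allclose(p_xy_given_z[z], np.outer(p_x_given_z[z], p_y_given_z[z]))
    for z in range(len(p_z))
)

print(marginally_independent)     # False
print(conditionally_independent)  # True
```

Here X and Y look related when Z is ignored, but the relationship disappears once Z is fixed. This is the pattern expected when a shared cause, rather than a direct effect of X on Y, produces the association.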

> Example

Suppose a dataset of medical records exists that includes each patient's age, smoking status, and whether they have lung cancer. We are interested in the relationship between age and lung cancer, and we want to know whether that relationship persists once smoking status is taken into account.

- We use conditional independence to determine whether age and lung cancer are independent given smoking status. If they are conditionally independent, then once smoking status is known, age carries no further information about lung cancer, so any association between the two is accounted for by smoking. If they are not, then age remains informative about lung cancer even among patients with the same smoking status.

> For this, we calculate the following probabilities
- P(age, lung_cancer | smoking_status)
- P(age | smoking_status)
- P(lung_cancer | smoking_status)

If these satisfy the equation above for every combination of values, we can say that age and lung cancer are conditionally independent given smoking status.
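Written out for this example, the check looks as follows. The symbols a, c, and s stand for particular values of age, lung cancer, and smoking status, and N(·) counts the matching records. These symbols are introduced here for illustration, and each probability is assumed to be estimated as a relative frequency.

```latex
\hat{P}(a, c \mid s) = \frac{N(a, c, s)}{N(s)}, \qquad
\hat{P}(a \mid s) = \frac{N(a, s)}{N(s)}, \qquad
\hat{P}(c \mid s) = \frac{N(c, s)}{N(s)}

\hat{P}(a, c \mid s) = \hat{P}(a \mid s)\,\hat{P}(c \mid s)
\quad \text{for all } a, c, s \text{ with } N(s) > 0
```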


In [7]:
import numpy as np
import pandas as pd

# create a dataset
data = pd.DataFrame({
    'age': [1, 1, 2, 2, 3, 3, 4, 4],
    'smoking': [0, 1, 0, 1, 0, 1, 0, 1],
    'lung_cancer': [0, 0, 0, 0, 1, 1, 1, 1]
})
# calculate conditional probabilities given smoking status
# normalize='columns' makes each smoking column sum to 1, so every table is conditioned on smoking
p_age_lung_smoking = pd.crosstab(index=[data['age'], data['lung_cancer']], columns=data['smoking'], normalize='columns')
p_age_smoking = pd.crosstab(index=data['age'], columns=data['smoking'], normalize='columns')
p_lung_smoking = pd.crosstab(index=data['lung_cancer'], columns=data['smoking'], normalize='columns')


In [13]:
print(p_age_lung_smoking.shape)
p_age_lung_smoking

(4, 2)


| age | lung_cancer | smoking=0 | smoking=1 |
|---|---|---|---|
| 1 | 0 | 0.25 | 0.25 |
| 2 | 0 | 0.25 | 0.25 |
| 3 | 1 | 0.25 | 0.25 |
| 4 | 1 | 0.25 | 0.25 |


In [14]:
print(p_age_smoking.shape)

p_age_smoking

(4, 2)


| age | smoking=0 | smoking=1 |
|---|---|---|
| 1 | 0.25 | 0.25 |
| 2 | 0.25 | 0.25 |
| 3 | 0.25 | 0.25 |
| 4 | 0.25 | 0.25 |


In [15]:
print(p_lung_smoking.shape)

p_lung_smoking

(2, 2)


| lung_cancer | smoking=0 | smoking=1 |
|---|---|---|
| 0 | 0.5 | 0.5 |
| 1 | 0.5 | 0.5 |


In [16]:
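# test for conditional independence: within each smoking group, compare
# P(age, lung_cancer | smoking) with P(age | smoking) * P(lung_cancer | smoking)
is_cond_ind = True
for _, group in data.groupby('smoking'):
    # joint distribution of age and lung_cancer among patients with this smoking status
    joint = pd.crosstab(group['age'], group['lung_cancer'], normalize='all')
    # product of the two conditional marginals, laid out like the joint table
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    is_cond_ind = is_cond_ind and np.allclose(joint.values, expected, rtol=1e-5, atol=1e-8)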

print(f"Age and lung cancer are {'conditionally independent' if is_cond_ind else 'not conditionally independent'} given smoking status.")


Age and lung cancer are not conditionally independent given smoking status.
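
For contrast, here is a minimal sketch of a dataset where the factorization does hold. The helper `is_conditionally_independent` and the `balanced` table are hypothetical and are not part of the original notebook. The data is constructed by hand so that, within each smoking group, every age has the same lung cancer rate.

```python
import numpy as np
import pandas as pd


def is_conditionally_independent(df, x, y, z, atol=1e-8):
    """Check whether P(x, y | z) == P(x | z) * P(y | z) holds exactly in the empirical counts."""
    for _, group in df.groupby(z):
        joint = pd.crosstab(group[x], group[y], normalize='all')   # P(x, y | z)
        expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # P(x | z) * P(y | z)
        if not np.allclose(joint.to_numpy(), expected, atol=atol):
            return False
    return True


# Hypothetical dataset, built so the factorization holds within each smoking group
balanced = pd.DataFrame({
    'age':         [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4],
    'smoking':     [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    'lung_cancer': [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1],
})

print(is_conditionally_independent(balanced, 'age', 'lung_cancer', 'smoking'))  # True
```

In this table the overall lung cancer rate still differs by age (1/4 for ages 1 and 2, 1/2 for ages 3 and 4), but the difference vanishes within each smoking group. On real data, sampling noise means the empirical equality will almost never hold exactly, so a statistical test is the usual tool. The exact check is only meant for small illustrative tables like this one.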
