#### About
> Independence

- Independence is a statistical concept that describes the relationship between two random variables. Two random variables X and Y are said to be independent if knowing the value of one does not change the probability distribution of the other.

- Equivalently, the joint probability factorizes for all values x and y: P(X=x, Y=y) = P(X=x) * P(Y=y)

- Use cases
1. Coin toss - The probability of getting heads on the first toss doesn't depend on the outcome of the second toss, and vice versa.
2. Medical tests - Results of multiple tests are often used to make a diagnosis. If the test results are independent (given the patient's true condition), the probability that every test comes back positive is the product of the probabilities of each test being positive.
3. Weather - In a simplified forecasting model, temperature and precipitation may be treated as independent variables, so that the occurrence of rain doesn't depend on temperature and vice versa. In practice the two are often correlated, so this is a modeling assumption that should be checked against data.
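
As a small illustration of the coin-toss case, the product rule can be checked exactly by enumerating the sample space. This is a minimal sketch that assumes a fair coin; the helper `prob` and the variable names are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin tosses: 4 equally likely outcomes.
outcomes = list(product("HT", repeat=2))
p_outcome = Fraction(1, len(outcomes))

def prob(event):
    """Probability of an event given as a predicate over (first, second)."""
    return sum(p_outcome for o in outcomes if event(o))

# Check P(X=x, Y=y) == P(X=x) * P(Y=y) for every pair of values.
is_independent = all(
    prob(lambda o: o == (x, y))
    == prob(lambda o: o[0] == x) * prob(lambda o: o[1] == y)
    for x, y in product("HT", repeat=2)
)

p_both_heads = prob(lambda o: o == ("H", "H"))
print("P(first=H, second=H):", p_both_heads)
print("Product rule holds for all outcomes:", is_independent)
```

Using `Fraction` keeps the arithmetic exact, so the equality check is not affected by floating-point rounding.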

Example 

A survey to check whether there is a relationship between gender and liking dogs.

- Use the chi-square test for independence to determine whether there is a significant association between gender and liking dogs.

In [7]:
import pandas as pd
from scipy.stats import chi2_contingency

# data collected in survey
data = {'gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Female'], 
        'likes_dogs': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No']}
df = pd.DataFrame(data)

# create a contingency table
contingency_table = pd.crosstab(df['gender'], df['likes_dogs'])

print(contingency_table)

likes_dogs  No  Yes
gender             
Female       1    2
Male         2    1


- Computation of chi-square statistic and p-value : If the p-value is less than a chosen significance level (e.g., 0.05), we reject the null hypothesis of independence and conclude that the variables are associated. If the p-value is greater than the chosen significance level, we fail to reject the null hypothesis. This does not prove that the variables are independent; it only means the data do not provide enough evidence of an association.
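
To make the computation concrete, the sketch below works out the expected counts and the Pearson chi-square statistic by hand for the survey table. The counts are copied from the contingency table above; the variable names are illustrative. Note that this is the uncorrected statistic, whereas `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so the two can give different values.

```python
import numpy as np
from scipy import stats

# Observed counts (rows: Female, Male; columns: No, Yes)
observed = np.array([[1, 2],
                     [2, 1]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
n = observed.sum()

# Expected counts under independence: E_ij = row_i * col_j / n
expected = row_totals * col_totals / n

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-square distribution
p_value = stats.chi2.sf(chi2_stat, dof)

print("Expected counts:\n", expected)
print("Uncorrected chi-square statistic:", chi2_stat)
print("Degrees of freedom:", dof)
print("p-value:", p_value)
```

The expected counts follow directly from the product rule: under independence, each cell's probability is the product of its row and column proportions.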

In [8]:

# perform the chi-square test for independence
chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print('Chi-square statistic:', chi2)
print('p-value:', p_value)
print('Degrees of freedom:', dof)
print('Expected values:', expected)


Chi-square statistic: 0.0
p-value: 1.0
Degrees of freedom: 1
Expected values: [[1.5 1.5]
 [1.5 1.5]]


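The p-value is 1.0, which is greater than the significance level of 0.05, so we fail to reject the null hypothesis of independence: there is not enough evidence to suggest a significant association between gender and liking dogs. The statistic is exactly 0.0 because `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default. With only six respondents, every expected count is 1.5, well below the usual rule of thumb of 5, so the chi-square approximation is unreliable here and the result should be read as an illustration of the procedure only.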
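
For a table this small, it can help to compare the default result with the uncorrected chi-square test and with Fisher's exact test, which does not rely on a large-sample approximation. The sketch below is one way to do that with SciPy; the table values are the survey counts from above and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Observed counts (rows: Female, Male; columns: No, Yes)
observed = np.array([[1, 2],
                     [2, 1]])

# Default behaviour: Yates' continuity correction is applied for 2x2 tables
chi2_yates, p_yates, _, _ = chi2_contingency(observed)

# Same test without the continuity correction
chi2_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

# Fisher's exact test (two-sided by default), suited to small 2x2 tables
odds_ratio, p_fisher = fisher_exact(observed)

print("With Yates correction:", chi2_yates, p_yates)
print("Without correction:   ", chi2_plain, p_plain)
print("Fisher's exact test:  ", odds_ratio, p_fisher)
```

All three approaches fail to reject independence at the 0.05 level for this data, but they make different assumptions, so on other small tables they may disagree.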
