# Test for independence

Two discrete random variables ***X*** and ***Y*** are independent if the joint probability mass function satisfies

***P(X=x and Y=y)=P(X=x)·P(Y=y)***

for all x and y.
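
The definition can be checked mechanically on a small joint table. The sketch below is only an illustration: `is_independent` is a hypothetical helper (not part of this lab's code) and the two joint tables are made-up examples, not terrain data.

```python
def is_independent(joint, tol=1e-9):
    """Check P(X=x and Y=y) == P(X=x) * P(Y=y) for every cell of a joint PMF.

    `joint` is a list of rows; joint[i][j] is P(X=i and Y=j).
    """
    p_x = [sum(row) for row in joint]        # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    return all(
        abs(joint[i][j] - p_x[i] * p_y[j]) <= tol
        for i in range(len(p_x))
        for j in range(len(p_y))
    )

# Two fair coins tossed separately: every cell equals 0.5 * 0.5.
independent_joint = [[0.25, 0.25],
                     [0.25, 0.25]]
# Y always equals X: the off-diagonal cells are 0 instead of 0.25.
dependent_joint = [[0.5, 0.0],
                   [0.0, 0.5]]

print(is_independent(independent_joint))  # True
print(is_independent(dependent_joint))    # False
```

A single cell that fails the comparison is enough to rule out independence, which is why the check runs over all x and y.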

This lab checks pairs of cells in the grid to see whether their contents are independent of each other.

The grid index is set up as

| 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 16  | 17  | 18  | 19  | 20  | 21  | 22  | 23  | 24  | 25  | 26  | 27  | 28  | 29  | 30  | 31  |
| 32  | 33  | 34  | 35  | 36  | 37  | 38  | 39  | 40  | 41  | 42  | 43  | 44  | 45  | 46  | 47  |
| 48  | 49  | 50  | 51  | 52  | 53  | 54  | 55  | 56  | 57  | 58  | 59  | 60  | 61  | 62  | 63  |
| 64  | 65  | 66  | 67  | 68  | 69  | 70  | 71  | 72  | 73  | 74  | 75  | 76  | 77  | 78  | 79  |
| 80  | 81  | 82  | 83  | 84  | 85  | 86  | 87  | 88  | 89  | 90  | 91  | 92  | 93  | 94  | 95  |
| 96  | 97  | 98  | 99  | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 |
| 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 |
| 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 |
| 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 |
| 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 |


The x,y coordinates are set up as follows, with x running across the top and y running down the left side

| 0  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|----|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|
| 1  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 2  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 3  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 4  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 5  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 6  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 7  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 8  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 9  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
| 10 |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
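
Read together, the two tables suggest a row-major layout: 16 columns (x = 0..15) and 11 rows (y = 0..10), giving indices 0..175. Under that assumption, converting between a grid index and its x,y coordinates might look like the sketch below. The names `GRID_WIDTH`, `GRID_HEIGHT`, `index_to_xy` and `xy_to_index` are hypothetical and do not come from the lab code.

```python
GRID_WIDTH = 16   # assumed from the tables: columns x = 0..15
GRID_HEIGHT = 11  # assumed from the tables: rows y = 0..10

def index_to_xy(index):
    """Convert a row-major grid index (0..175) to (x, y)."""
    y, x = divmod(index, GRID_WIDTH)
    return x, y

def xy_to_index(x, y):
    """Convert (x, y) back to the row-major grid index."""
    return y * GRID_WIDTH + x

print(index_to_xy(39))      # (7, 2)
print(xy_to_index(15, 10))  # 175
```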


## References
* https://en.wikipedia.org/wiki/Joint_probability_distribution
* http://stattrek.com/online-calculator/probability-calculator.aspx
* https://www.boundless.com/statistics/textbooks/boundless-statistics-textbook/probability-8/what-are-the-chances-33/unions-and-intersections-168-4442/


In [5]:
import pymongo
import operator
import numpy as np
from pymongo import MongoClient
import pandas as pd

In [6]:
# Load dataset
client = MongoClient('localhost', 27017)
db = client.nesoi
terrain = db.terrain
terrain.count() # Should be 22639

22639

In [7]:
# createCType() is a helper defined outside this notebook excerpt
CType = createCType()
print(CType)

[u'bridge', u'bridgeGreen', u'bushGreen', u'bushRed', u'bushWhite', u'cave', u'dirt', u'dirtGrey', u'gateGreenCyclopsTop', u'gateGreenLeft', u'gateGreenRight', u'gateGreenTop', u'gateGreenTopLeft', u'gateGreenTopRight', u'gateRedLeft', u'gateRedRight', u'gateRedTop', u'gateRedTopLeft', u'gateRedTopRight', u'gateWhiteCyclopsTop', u'gateWhiteLeft', u'gateWhiteRight', u'gateWhiteTop', u'gateWhiteTopLeft', u'gateWhiteTopRight', u'guardGreen', u'guardRed', u'guardWhite', u'headstone', u'ladderGreen', u'ladderRed', u'ladderWhite', u'mountainGreen', u'mountainGreenLowerLeft', u'mountainGreenLowerRight', u'mountainGreenTop', u'mountainGreenTopLeft', u'mountainGreenTopRight', u'mountainRed', u'mountainRedLowerLeft', u'mountainRedLowerRight', u'mountainRedTop', u'mountainRedTopLeft', u'mountainRedTopRight', u'mountainWhite', u'mountainWhiteLowerLeft', u'mountainWhiteLowerRight', u'mountainWhiteTop', u'mountainWhiteTopLeft', u'mountainWhiteTopRight', u'rockGreen', u'rockRed', u'sand', u'sandRedLe

In [9]:
getDistribution(0)

[(u'mountainRed', 50.78125),
 (u'bushGreen', 14.0625),
 (u'mountainGreen', 10.9375),
 (u'bushRed', 7.8125),
 (u'mountainWhite', 6.25),
 (u'waterRed', 5.46875),
 (u'waterGreen', 4.6875)]

In [10]:
getDistribution(1)

[(u'mountainRed', 51.93798449612403),
 (u'bushGreen', 12.4031007751938),
 (u'mountainGreen', 10.852713178294573),
 (u'bushRed', 7.751937984496124),
 (u'mountainWhite', 6.2015503875969),
 (u'waterGreen', 4.651162790697675),
 (u'waterRed', 3.10077519379845),
 (u'dirt', 2.3255813953488373),
 (u'sand', 0.7751937984496124)]
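
The body of `getDistribution` is not shown here. As a rough sketch of how a percentage distribution like the ones above could be produced from a list of observed tile names, one option is `collections.Counter`. The function `distribution` and the sample list are hypothetical, made up for illustration, and are not taken from the lab code or the dataset.

```python
from collections import Counter

def distribution(tiles):
    """Return (tile, percent) pairs, most frequent first."""
    counts = Counter(tiles)
    total = float(len(tiles))
    return [(tile, 100.0 * n / total) for tile, n in counts.most_common()]

# Made-up sample: four observations of a single grid cell.
sample = ["mountainRed", "mountainRed", "mountainRed", "bushGreen"]
result = distribution(sample)
print(result)  # [('mountainRed', 75.0), ('bushGreen', 25.0)]
```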

In [11]:
df = getPairs(0,1)

In [None]:
def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')
    
print_full(df)

In [None]:
# Independence requires P(X=x and Y=y) = P(X=x)·P(Y=y),
# so start with the marginal distribution of each cell
marginal0 = getMarginal(df, "Cell0")
marginal1 = getMarginal(df, "Cell1")
print(marginal0)
print(marginal1)


In [None]:
# Product of the two marginals for every (cell0, cell1) pair of values
cp = cartisianProduct(marginal0, marginal1)

In [None]:
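# Compare the observed joint probabilities with the product of the marginals.
# If the cells are independent, every entry of this difference is (close to) zero.
df_pivot_prob - cp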
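
The helpers `getMarginal` and `cartisianProduct` are not shown in this notebook. The sketch below shows one way the same comparison could be done directly with pandas and NumPy on a small table of counts. The counts are made-up toy numbers, not the terrain data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy co-occurrence counts (made-up numbers, NOT from the terrain dataset):
# rows are the tile in cell 0, columns are the tile in cell 1.
counts = pd.DataFrame(
    [[30, 10],
     [10, 50]],
    index=["mountainRed", "bushGreen"],
    columns=["mountainRed", "bushGreen"],
)

total = float(counts.values.sum())
joint = counts / total        # P(cell0 = x and cell1 = y)
p_cell0 = joint.sum(axis=1)   # marginal for cell 0 (row sums)
p_cell1 = joint.sum(axis=0)   # marginal for cell 1 (column sums)

# Product of the marginals, laid out like the joint table.
expected = pd.DataFrame(
    np.outer(p_cell0, p_cell1),
    index=joint.index,
    columns=joint.columns,
)

diff = joint - expected
cells_independent = np.allclose(diff.values, 0.0, atol=1e-9)
print(diff)
print(cells_independent)
```

Comparing with a tolerance avoids false negatives from floating-point rounding. With real data the differences will rarely be exactly zero even for independent cells, so how large a difference counts as evidence of dependence is a separate judgment call.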

In [None]:
# We can find P(A ∪ B)

# P(cell0=mountainRed ∪ cell1=mountainRed)
# Add up all the times cell0=mountainRed or cell1=mountainRed (64 + 3 + 1 ...),
# subtracting the both-mountainRed count once so the 64 is not counted twice
c0mr_union_c1mr = (df_pivot.loc["mountainRed"].sum() + df_pivot["mountainRed"].sum() - df_pivot.loc["mountainRed"]["mountainRed"]) / float(df_pivot.sum().sum())
print(c0mr_union_c1mr)

In [None]:
# P(cell0=mountainRed) and P(cell1=mountainRed)
c0_mr = marginal0.loc["mountainRed"].Prob
c1_mr = marginal1.loc["mountainRed"].Prob
print(c0_mr)
print(c1_mr)
print(c0_mr + c1_mr)

In [None]:
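# Inclusion-exclusion: this holds whether or not the cells are independent
# P(A ∩ B) = P(A) + P(B) - P(A ∪ B)
# Assumed definition (not shown in the original cell): both cells are mountainRed
c0mr_intersection_c1mr = df_pivot.loc["mountainRed"]["mountainRed"] / float(df_pivot.sum().sum())
print(c0mr_intersection_c1mr)
print((c0_mr + c1_mr) - c0mr_union_c1mr)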

In [None]:
# If independent then this should be true
# P(A ∩ B) = P(A) * P(B)
print(c0mr_intersection_c1mr)
print(c0_mr * c1_mr)

In [None]:
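# Chain rule: P(A ∩ B) = P(A) * P(B|A) holds for any two events,
# so it cannot show dependence by itself. The cells are dependent when P(B|A) != P(B).
c1mr_given_c0mr = (df_pivot.loc["mountainRed"] / df_pivot.loc["mountainRed"].sum())["mountainRed"]
print(c1mr_given_c0mr)

print(c0mr_intersection_c1mr)
print(c0_mr * c1mr_given_c0mr)
# Compare floats with a tolerance rather than ==
print(np.isclose(c0mr_intersection_c1mr, c0_mr * c1mr_given_c0mr))
# Independence check for this pair of values: is P(B|A) equal to P(B)?
print(np.isclose(c1mr_given_c0mr, c1_mr))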
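
To separate the identities that always hold from the one that actually tests independence, here is a small self-contained sketch. The counts are made-up toy numbers (not from the terrain data), and `close` is a hypothetical helper for tolerant float comparison.

```python
# Toy co-occurrence counts (made-up numbers, not taken from the terrain data).
# Keys are (tile in cell 0, tile in cell 1).
counts = {
    ("mountainRed", "mountainRed"): 30,
    ("mountainRed", "bushGreen"): 10,
    ("bushGreen", "mountainRed"): 10,
    ("bushGreen", "bushGreen"): 50,
}
total = float(sum(counts.values()))

# A: cell 0 is mountainRed.  B: cell 1 is mountainRed.
p_a = sum(n for (c0, _), n in counts.items() if c0 == "mountainRed") / total
p_b = sum(n for (_, c1), n in counts.items() if c1 == "mountainRed") / total
p_a_and_b = counts[("mountainRed", "mountainRed")] / total
p_a_or_b = sum(
    n for (c0, c1), n in counts.items()
    if c0 == "mountainRed" or c1 == "mountainRed"
) / total
p_b_given_a = p_a_and_b / p_a

def close(u, v, tol=1e-9):
    """Tolerant float comparison."""
    return abs(u - v) <= tol

# Always true, for dependent and independent events alike:
inclusion_exclusion_holds = close(p_a_and_b, p_a + p_b - p_a_or_b)
chain_rule_holds = close(p_a_and_b, p_a * p_b_given_a)
# True only if A and B are independent:
independent = close(p_b_given_a, p_b)

print(inclusion_exclusion_holds)  # True
print(chain_rule_holds)           # True
print(independent)                # False
```

In this toy table P(B|A) is 0.75 while P(B) is 0.4, so knowing that cell 0 is mountainRed changes the odds for cell 1, and the two events are dependent.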