In this lesson, we'll examine how to use <b>contingency tables</b> and concepts from probability to explore the governors' data in new ways. First, some vocabulary:

<ul>
    <li><b>Outcome table:</b> lists the various possible outcomes of a set of similar or related events.</li>
    <li><b>Binomial:</b> literally "two names"; binomial data have just two possible outcomes.</li>
    <li><b>Binomial distribution:</b> a model of the probability of each possible number of successes across multiple independent trials of a binomial event.</li>
    <li><b>Trial:</b> one repetition of an experiment or event whose outcome is uncertain; repeating trials lets us test the probability that a given outcome occurs.</li>
    <li><b>Contingency table:</b> a tool for asking 'what if' questions about a more complex, linked set of outcomes. A contingency table must have at least four cells. Each cell lies at the intersection of exactly one row and one column and represents the number of times (or the percentage of times) that the events described in the corresponding row and column occur together.</li>
    </ul>
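
To make the binomial vocabulary concrete, here is a minimal sketch using a hypothetical example that is not drawn from the governors' data: three independent trials, each with two possible outcomes and an assumed success probability of 0.5. The names `n_trials`, `p_success`, and `binom_probs` are illustrative only.

```r
# Hypothetical illustration: 3 independent trials, each with two outcomes
# ("success" or "failure") and an assumed success probability of 0.5.
n_trials <- 3
p_success <- 0.5

# dbinom() returns the probability of each possible number of successes (0 to 3)
binom_probs <- dbinom(0:n_trials, size = n_trials, prob = p_success)
names(binom_probs) <- paste(0:n_trials, "successes")
binom_probs
```

The four probabilities cover every possible result of the three trials, so they sum to 1.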


In [1]:
#As always, we must begin by first reading the dataset into R.
GovData.SimplifiedFates <- read.csv("C:/Users/ASG/Dropbox/Scholarship/AhmedBey/Data/GovernorsData-Git/GovData/GovData-SimplifiedFates.csv")

#Preview the first five rows of the data.
head(GovData.SimplifiedFates, 5)

,Transliterated.Name,Start.Date,End.Date,Tenure..Years.,Fate,Ethnicity
,<chr>,<int>,<int>,<dbl>,<chr>,<chr>
1,Ramdān-Tshūlaq Bāy,1567,1574,7,Non-Violent Fate,Ottoman
2,Jʿfar Bāy,1574,1588,14,Non-Violent Fate,Ottoman
3,Muhammad Ben Ferḥāt Bāy,1588,1608,20,Killed in battle,Algerian
4,Ḥasan Bāy,1608,1622,14,Non-Violent Fate,Ottoman
5,Murād Bāy,1622,1647,25,Killed in battle,Ottoman


In [5]:
#Install (if necessary) and load packages we will need for this lesson
#install.packages("dplyr")
library(dplyr)

Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




In [6]:
#Convert the data frame to a tibble for easier previewing
fates <- as_tibble(GovData.SimplifiedFates)

#preview data frame
head(fates, 5)

Transliterated.Name,Start.Date,End.Date,Tenure..Years.,Fate,Ethnicity
<chr>,<int>,<int>,<dbl>,<chr>,<chr>
Ramdān-Tshūlaq Bāy,1567,1574,7,Non-Violent Fate,Ottoman
Jʿfar Bāy,1574,1588,14,Non-Violent Fate,Ottoman
Muhammad Ben Ferḥāt Bāy,1588,1608,20,Killed in battle,Algerian
Ḥasan Bāy,1608,1622,14,Non-Violent Fate,Ottoman
Murād Bāy,1622,1647,25,Killed in battle,Ottoman


## Outcome Tables

Let's begin by looking at outcome tables for two of our nominal variables: fate and ethnicity.


In [7]:
#Outcome table for governors' ethnicity by raw count (the number of governors of each ethnicity)

table(fates$Ethnicity)


Algerian European Kulughlu  Ottoman  Unknown 
       3        3        6       31        3 

In [8]:
#Outcome table for governors' ethnicity by proportion (multiply by 100 for percentages)
prop.table(table(fates$Ethnicity))


  Algerian   European   Kulughlu    Ottoman    Unknown 
0.06521739 0.06521739 0.13043478 0.67391304 0.06521739 

In [9]:
#Outcome table for governors' fates by raw count (the number of governors who met each fate)

table(fates$Fate)


       Killed in battle        Non-Violent Fate                 Unknown 
                      4                      18                       4 
Willful Violent Removal 
                     20 

In [10]:
#Outcome table for governors' fates by proportion (multiply by 100 for percentages)
prop.table(table(fates$Fate))


       Killed in battle        Non-Violent Fate                 Unknown 
             0.08695652              0.39130435              0.08695652 
Willful Violent Removal 
             0.43478261 
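
The proportions above can be easier to read as rounded, sorted percentages. The following is a minimal sketch rather than part of the original analysis: it rebuilds the fate counts by hand from the outcome table above (so that it runs without the CSV file), and the names `fate_counts`, `fate_vector`, `fate_table`, and `fate_percent` are illustrative.

```r
# Counts copied from the outcome table of fates shown above
fate_counts <- c("Killed in battle" = 4, "Non-Violent Fate" = 18,
                 "Unknown" = 4, "Willful Violent Removal" = 20)

# Rebuild a vector with one entry per governor, then tabulate it
fate_vector <- rep(names(fate_counts), times = fate_counts)
fate_table <- table(fate_vector)

# Convert proportions to percentages, round to one decimal place,
# and sort from most to least common
fate_percent <- sort(round(100 * prop.table(fate_table), 1), decreasing = TRUE)
fate_percent
```

With the real data, the same idea would apply directly to `table(fates$Fate)`.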

## Contingency Tables

The outcome tables give us a quick summary of single variables, but it is often far more interesting to see two variables in relation to one another. Contingency tables are perfectly suited to this task. The contingency table of governors' ethnicities and fates will tell us how many governors of each ethnicity met which fate. 

In [17]:
conTable <- table(fates$Ethnicity, fates$Fate)
conTable

          
           Killed in battle Non-Violent Fate Unknown Willful Violent Removal
  Algerian                1                0       2                       0
  European                0                1       0                       2
  Kulughlu                1                0       1                       4
  Ottoman                 2               15       0                      14
  Unknown                 0                2       1                       0

With the simple contingency table above, we can answer a number of questions:

<ul>
    <li>How many governors were killed in battle, enjoyed a non-violent fate, were removed violently or met an unknown fate?</li>
    <li>Was it more likely for governors to meet a violent or non-violent end?</li>
    <li>What was the likely ethnicity and fate of a governor of Constantine chosen at random?</li>
    <li>Did a governor's ethnicity make a difference in his likely fate?</li>
</ul>

This list of questions is not exhaustive, but we will begin to answer these and other questions as we add some useful information to this table and execute several transformations.

In [18]:
#To make answering some of the questions above easier, let's calculate the row and column totals, or marginal totals

rowSums(conTable)

In [19]:
colSums(conTable)

In [22]:
#Here's another way to calculate the marginal totals
margin.table(conTable) #Total number of governors
margin.table(conTable,1) #Marginal totals for rows
margin.table(conTable,2) #Marginal totals for columns


Algerian European Kulughlu  Ottoman  Unknown 
       3        3        6       31        3 


       Killed in battle        Non-Violent Fate                 Unknown 
                      4                      18                       4 
Willful Violent Removal 
                     20 
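
As a further option, base R's `addmargins()` attaches the marginal totals to the table itself. The sketch below assumes the counts shown in the contingency table above and re-enters them by hand so that it runs on its own; `conTable_sketch`, `with_totals`, `violent`, and `non_violent` are illustrative names. It also uses the column totals to compare violent and non-violent ends, under the assumption that "violent" means killed in battle or willfully and violently removed.

```r
# Counts copied from the contingency table above (rows: ethnicity, columns: fate)
conTable_sketch <- as.table(matrix(
  c(1,  0, 2,  0,
    0,  1, 0,  2,
    1,  0, 1,  4,
    2, 15, 0, 14,
    0,  2, 1,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Ethnicity = c("Algerian", "European", "Kulughlu", "Ottoman", "Unknown"),
    Fate = c("Killed in battle", "Non-Violent Fate", "Unknown",
             "Willful Violent Removal")
  )
))

# addmargins() appends a "Sum" row and a "Sum" column holding the marginal totals
with_totals <- addmargins(conTable_sketch)
with_totals

# Compare violent and non-violent ends using the column (fate) totals
fate_totals <- colSums(conTable_sketch)
violent <- fate_totals[["Killed in battle"]] + fate_totals[["Willful Violent Removal"]]
non_violent <- fate_totals[["Non-Violent Fate"]]
c(violent = violent, non_violent = non_violent)
```

Under that assumption, 24 governors met a violent end and 18 a non-violent one, with 4 fates unknown.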

In [23]:
fateProbs <- conTable/margin.table(conTable) #Divide each cell by the total number of governors to get its probability
fateProbs #view table of probabilities

          
           Killed in battle Non-Violent Fate    Unknown Willful Violent Removal
  Algerian       0.02173913       0.00000000 0.04347826              0.00000000
  European       0.00000000       0.02173913 0.00000000              0.04347826
  Kulughlu       0.02173913       0.00000000 0.02173913              0.08695652
  Ottoman        0.04347826       0.32608696 0.00000000              0.30434783
  Unknown        0.00000000       0.04347826 0.02173913              0.00000000

In [12]:
# Another way to create the above table of probabilities is to use prop.table

prop_conTable <- prop.table(table(fates$Ethnicity, fates$Fate))
prop_conTable

          
           Killed in battle Non-Violent Fate    Unknown Willful Violent Removal
  Algerian       0.02173913       0.00000000 0.04347826              0.00000000
  European       0.00000000       0.02173913 0.00000000              0.04347826
  Kulughlu       0.02173913       0.00000000 0.02173913              0.08695652
  Ottoman        0.04347826       0.32608696 0.00000000              0.30434783
  Unknown        0.00000000       0.04347826 0.02173913              0.00000000

In [13]:
# You can also convert the above decimal format to percentages by multiplying by 100

prop_conTable <- prop.table(table(fates$Ethnicity, fates$Fate))*100
prop_conTable

          
           Killed in battle Non-Violent Fate   Unknown Willful Violent Removal
  Algerian         2.173913         0.000000  4.347826                0.000000
  European         0.000000         2.173913  0.000000                4.347826
  Kulughlu         2.173913         0.000000  2.173913                8.695652
  Ottoman          4.347826        32.608696  0.000000               30.434783
  Unknown          0.000000         4.347826  2.173913                0.000000

The above table is a <b>joint probability table</b>. We can identify it as such because no single row or column sums to 100 percent; instead, all of the cells taken together sum to 100 percent. Joint probability is the probability that two events co-occur, or happen together. Visually, a joint probability is depicted as the area of intersection between two circles in a Venn diagram:

![alt text](https://www.statisticshowto.com/wp-content/uploads/2013/12/venn-diagram-intersection.jpg "A Venn diagram intersection shows the intersection of events A and B happening together")
Image from: Stephanie Glen, <a href = "https://www.statisticshowto.com/joint-probability-distribution/">"Joint Probability and Joint Distributions: Definition, Examples"</a>, <a href = "https://www.statisticshowto.com/">StatisticsHowTo.com: Elementary Statistics for the rest of us!</a> (Accessed 25 January 2021).

Therefore, according to the table above, the joint probability that a governor was both Algerian and killed in battle, for example, is 2.17% (1 of the 46 governors).
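
The properties of a joint probability table can be checked with a few lines of R. This is a minimal sketch that re-enters the counts from the contingency table by hand so that it runs on its own; `counts_sketch` and `joint` are illustrative names, not objects defined earlier in the lesson.

```r
# Counts copied from the contingency table (rows: ethnicity, columns: fate)
counts_sketch <- as.table(matrix(
  c(1,  0, 2,  0,
    0,  1, 0,  2,
    1,  0, 1,  4,
    2, 15, 0, 14,
    0,  2, 1,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Ethnicity = c("Algerian", "European", "Kulughlu", "Ottoman", "Unknown"),
    Fate = c("Killed in battle", "Non-Violent Fate", "Unknown",
             "Willful Violent Removal")
  )
))

# Joint probabilities: each cell divided by the grand total (46 governors)
joint <- prop.table(counts_sketch)

# Probability of being both Algerian and killed in battle (1 of 46)
joint["Algerian", "Killed in battle"]

# All cells of a joint probability table together sum to 1
sum(joint)

# Summing each row of joint probabilities recovers the ethnicity proportions
rowSums(joint)
```

The row sums match the outcome table of ethnicities by proportion, which is one way to confirm that the joint table was built correctly.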