IUSSI2022.Rmd


---
title: "Demography data for IUSSI"
author:
- name: DEIJ Committee, IUSSI
  affiliation: In case of questions, please contact [Biplabendu Das](https://biplabendu.github.io/homepage/)
date: Last updated on `r format(Sys.time(), "%d %B, %Y")`
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
    keep_md: no
    code_folding: hide
    includes:
      in_header: analytics.html
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = F)
knitr::opts_chunk$set(message = F)

## For more inspiration on customizing the html output, refer to the following:
# https://bookdown.org/yihui/rmarkdown/html-document.html#table-of-contents

```

***

> What is this document?   

The goal:

Perform a rapid exploratory analyses of the demography data obtained from the survey performed on the audience interested in or/and attending the upcoming IUSSI international conference in 2022, San Diego.


```{r, include=F, echo=F}
# Let's load the data and get a glimpse.

rm(list = ls())
library(tidyverse)
library(kableExtra)
library(plot.matrix)

# Load custom functions
source("./functions/theme_publication.R")

# Load data
path_to_repo <- "/Users/biplabendudas/Documents/GitHub/homepage/"
folder.name <- "2022_demography_data/"

raw.dat <- read.csv(paste0(path_to_repo,
                           "data/",
                           folder.name,"iussi_2022_data_30Jul22.csv"), 
                header = T, stringsAsFactors = F,
                na.strings = c("NA",""," "))

# Get a glimpse
raw.dat %>% as.tibble() %>% head()

# Let's rename the columns to simple, easy-to-read/access data names
colnames(raw.dat)

new.colnames <- c("time_of_entry",
                  "age",
                  "gender",
                  "add_gender",
                  "sex_orientation",
                  "add_sex_orientation",
                  "residence_location",
                  "research_location",
                  "ethnic_identity",
                  "add_ethnic_identity",
                  "religious_identity",
                  "add_religious_identity",
                  "disability_identity",
                  "add_disability_identity",
                  "written_english",
                  "spoken_english",
                  "children_responsibility",
                  "adult_responsibility",
                  "funds_for_care_inhibit",
                  "firstgen",
                  "current_position",
                  "which_attended_before",
                  "role_attended_before",
                  "current_institute_desc",
                  "barriers_what_degree",
                  "barriers_which",
                  "add_barriers_which")

## Rename column names
colnames(raw.dat) <- new.colnames

## Specify what to call the group in which we club everything else
## Some options: other, additional_category
other.var <- "additional_category"

```

***

## 01 Raw data

> Number of folks that took the survey: `r nrow(raw.dat)`

Let's take a look at few rows and columns of the raw data.

```{r raw_data, echo=F}
# get a glimpse of the column format features of the data
raw.dat[10:13,1:7] %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped"), full_width = F, position = "center", latex_options = "scaled_down")

raw.dat[10:13,8:13] %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped"), full_width = F, position = "center", latex_options = "scaled_down")

```

### What does each column contain?

**Column name: Description**

- time_of_entry: 
- age: 
- and so on..

***

## 02 Format data

> Get the data into a tidy format

- remove columns that are not useful for a first assessment
    - time_of_entry
    - add_xxx (all columns that contain additional response not listed)
- format each column by
    - separating multiple choices into separate rows
    - factorizing the variables
    - checking for erronneous choices/inputs


```{r format_data}

## Tidy and save data
dat <- 
  
  raw.dat %>% as_tibble() %>% 
  
  
  ## remove columns that you do not want
  select(-time_of_entry) %>% 
  select(-starts_with("add")) %>% 
  
  ## add a unique ID for each person
  rownames_to_column("ID") %>% 
  
  ## AGE
  mutate(age = str_split_fixed(age, " ", n=2)[,1]) %>%
  mutate(age = ifelse(age %in% c("","Prefer"), other.var, age)) %>%
  mutate(age = as.factor(age)) %>%
      # ## Plot: AGE
      # group_by(age) %>%
      # summarise(num = n()) %>% 
      # ##
      # ggplot(aes(x=age, y=num)) +
      #   geom_point() +
      #   coord_flip() +
      #   theme_Publication()

  
  ## GENDER
    # there are two entries that contains the following two responses for gender identity
    # c("Cisgender woman;Prefer not to respond", "Transgender man;Cisgender woman")),
    # five entries that stated "Prefer not to respond", and one which is NA.
    # Rename all to "Other"
  mutate(gender = ifelse(!gender %in% c("Cisgender man", "Cisgender woman", "Non-binary",
                       "Transgender man", "Transgender woman"), other.var, gender)) %>%
  mutate(gender = as.factor(gender)) %>%
      # ## Plot: GENDER
      # group_by(gender) %>%
      # summarise(num = n()) %>%
      # na.omit() %>%
      # ##
      # ggplot(aes(x=gender, y=num/sum(num)*100)) +
      #   geom_point() +
      #   coord_flip() +
      #   theme_Publication()

  
  ## sex_orientation
  mutate(sex_orientation = ifelse(sex_orientation %in% c("Additional identity not listed? (Please click next and share)","Prefer not to respond"), other.var, sex_orientation)) %>% 
  mutate(sex_orientation = as.factor(sex_orientation)) %>% 
  
  
  ## residence_location
  mutate(residence_location = ifelse(residence_location %in% c("Central America;Other",
                                     "Continental US;South Asia",
                                     "Europe;Middle East",
                                     "Europe;Southeast Asia",
                                     "Other"), other.var, residence_location)) %>%
  mutate(residence_location = as.factor(residence_location)) %>% 


  ## research location
  # filter(research_location %in% c("Central America;Continental US", "Europe;South America")) %>%
  separate_rows(research_location, sep = ";") %>% 
  mutate(research_location = as.factor(research_location)) %>% 
  
  
  ## ethnic_identity
  separate_rows(ethnic_identity, sep = ";") %>% 
  mutate(ethnic_identity = ifelse(ethnic_identity %in% c("A race or ethnicity not listed",
                                                         "Prefer not to respond"), 
                                  other.var, ethnic_identity)) %>% 
  mutate(ethnic_identity = as.factor(ethnic_identity)) %>% 
  
  
  ## religious_identity
  mutate(religious_identity = ifelse(religious_identity %in% c("An additional affiliation not listed",
                                                               "Prefer not to respond"), 
                                     other.var, religious_identity)) %>% 
  mutate(religious_identity = as.factor(religious_identity)) %>% 
  
  
  ## disability_identity
  separate_rows(disability_identity, sep = ";") %>% 
  mutate(disability_identity = as.factor(disability_identity)) %>% 
  

  ## written_english
  mutate(written_english = as.factor(written_english)) %>% 
  
  
  ## spoken_english
  mutate(spoken_english = as.factor(spoken_english)) %>% 
  
  
  ## children_responsibility
  mutate(children_responsibility = as.factor(children_responsibility)) %>% 
  
  
  ## adult_responsibility
  mutate(adult_responsibility = ifelse(adult_responsibility %in% c("Prefer not to respond", "Other"),
                                       other.var, adult_responsibility)) %>% 
  mutate(adult_responsibility = as.factor(adult_responsibility)) %>% 
  
  
  ## funds_for_care_inhibit
  mutate(funds_for_care_inhibit = as.factor(funds_for_care_inhibit)) %>% 
  
  
  ## firstgen
  mutate(firstgen = as.factor(firstgen)) %>%  
  
  
  ## current position
  separate_rows(current_position, sep = ";") %>% 
  mutate(current_position = as.factor(current_position)) %>% 
  
  
  ## which_attended_before
  separate_rows(which_attended_before, sep = ";") %>% 
  mutate(which_attended_before = as.factor(which_attended_before)) %>% 

  
  ## role_attended_before
  separate_rows(role_attended_before, sep = ";") %>% 
  mutate(role_attended_before = as.factor(role_attended_before)) %>% 

  
  ## current_institute_desc
  separate_rows(current_institute_desc, sep = ";") %>% 
  mutate(current_institute_desc = as.factor(current_institute_desc)) %>% 
  
  
  ## barriers_what_degree
  mutate(barriers_what_degree = as.factor(barriers_what_degree)) %>% 
  
  
  ## barriers_which
  separate_rows(barriers_which, sep = ";") %>% 
  mutate(barriers_which = as.factor(barriers_which))


# ## get a glimpse of the tidy data
# dat[10:13,1:7] %>% 
#   kable() %>% 
#   kable_styling(bootstrap_options = c("striped"), full_width = F, position = "center", latex_options = "scaled_down")
# 
# dat[10:13,8:13] %>% 
#   kable() %>% 
#   kable_styling(bootstrap_options = c("striped"), full_width = F, position = "center", latex_options = "scaled_down")

```


***

## 03 Visualize

```{r define_params}

## Use this code chunk to define paramaters, such as color palettes and such

## COLOR
# role attended before
roles.color <- c("#005C53", "#F2E744", "#D9814E", "#BF6854", "#261B1A", "grey60")

```


### Age & gender

```{r age_gender}

## AGE + GENDER
dat %>% 
  
  # remove the rows with no age/gender info
  filter(age != other.var) %>% 
  filter(!is.na(gender)) %>%
  # filter(gender != "Other") %>% 
  
  # tally the number of people in a given age group for a given gender
  distinct(ID, age, gender) %>%
  group_by(age, gender) %>% 
  summarize(num=n()) %>% 
    
  ## Plot: AGE + GENDER
  ggplot(aes(x=age, y=num, fill=gender, group=gender)) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    geom_bar(position="stack", stat="identity") +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_Publication() +
    guides(fill = guide_legend(nrow = 3))

```

***

### Residence location & gender

```{r residence_location_gender}
# Residence location & gender
dat %>%

  # filter(!residence_location %in% c("Other")) %>% 
  filter(!is.na(residence_location)) %>%
  filter(!is.na(gender)) %>%
  # filter(gender != "Other") %>% 
  
  distinct(ID, residence_location, gender) %>% 
  group_by(residence_location, gender) %>%
  summarize(num = n()) %>% 
  
  ggplot(aes(x=residence_location, y=num, fill=gender, group=gender)) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    geom_bar(position="stack", stat="identity") +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_Publication() +
    guides(fill = guide_legend(nrow = 3))

```

***

### Research location & gender

```{r research_location_gender}

## Plot: research_location + gender
dat %>% 
  
  filter(!is.na(research_location)) %>%
  filter(!is.na(gender)) %>%
  # filter(gender != "Other") %>%
  
  distinct(ID, research_location, gender) %>% 
  group_by(research_location, gender) %>%
  summarize(num = n()) %>% 
  
  ggplot(aes(x=research_location, y=num, fill=gender, group=gender)) +
    geom_bar(position="stack", stat="identity", size=2) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_Publication() +
    guides(fill = guide_legend(nrow = 3))
```

> A bunch of folks perform research in Central America, but there seems to be no representation from folks living in Central America. There is, of course, the possibility that folks residing in Central America did not take the survey.

***

### Role of attendee & gender

```{r role_attended_before_gender}

dat %>% 
  
  # remove the rows with no info
  filter(!is.na(role_attended_before)) %>%
  filter(!is.na(gender)) %>%
  # filter(gender != "Other") %>% 
  
  distinct(ID, role_attended_before, gender) %>% 
  group_by(role_attended_before, gender) %>%
  summarize(num = n()) %>% 
    
  ## Plot: research_location + gender
  ggplot(aes(x=role_attended_before, y=num, fill=gender, group=gender)) +
    geom_bar(position="stack", stat="identity", size=2) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_Publication() +
    guides(fill = guide_legend(nrow = 3))

```

***

### Current position & gender

```{r current_position_gender}

dat %>% 
  
  filter(!is.na(current_position)) %>%
  filter(!is.na(gender)) %>%
  # filter(gender != "Other") %>% 

  distinct(ID, current_position, gender) %>% 
  group_by(current_position, gender) %>%
  summarize(num = n()) %>% 
  
  ggplot(aes(x=current_position, y=log2(num+1), fill=gender, group=gender)) +
    geom_bar(position="stack", stat="identity", size=2) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_Publication() +
    guides(fill = guide_legend(nrow = 3))

```

***
***

### Role of attendee & residence location

```{r role_attended_before_residence_location}

dat %>% 
  
  # remove the rows with no age/gender info
  filter(!is.na(role_attended_before)) %>%
  filter(!is.na(residence_location)) %>%
  # filter(gender != "Other") %>% 
  
  distinct(ID, role_attended_before, residence_location) %>% 
  group_by(role_attended_before, residence_location) %>%
  summarize(num = n()) %>%
    
  ggplot(aes(x=residence_location, y=num, fill=role_attended_before, group=role_attended_before)) +
    geom_bar(position="stack", stat="identity", size=2) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    # scale_fill_Publication() +
    scale_fill_manual(values = roles.color) +
    guides(fill = guide_legend(nrow = 3))

```

***

### Role of attendee & current position

```{r role_attended_before_current_position}

dat %>% 
  
  # remove the rows with no age/gender info
  filter(!is.na(role_attended_before)) %>%
  filter(!is.na(current_position)) %>%
  # filter(gender != "Other") %>% 
  
  distinct(ID, role_attended_before, current_position) %>% 
  group_by(role_attended_before, current_position) %>%
  summarize(num = n()) %>% 
    
  ## Plot: research_location + gender
  ggplot(aes(x=current_position, y=num, fill=role_attended_before, group=role_attended_before)) +
    geom_bar(position="stack", stat="identity", size=2) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_manual(values = roles.color) +
    guides(fill = guide_legend(nrow = 3))
# to do:
# normalize the number of roles undertaken by a person in past IUSSI
# number of past roles / number of past IUSSI attended

```

The above plot might be misleading since the higher number of roles assumed by (activity of) late, tenured Profs might be due to the higher number of previous conferences attened by them. So, it might make sense to have a standardized metric that gives us a better understanding of the activity of each group in IUSSI.

> Activity = number of roles assumed (including attendee) / number of IUSSIs attended


```{r role_attended_before_current_position_v2}

id <- character()
num_attended <- numeric()  
for (i in 1:length(levels(as.factor(dat$ID)))) {
    
    id[i] <- levels(as.factor(dat$ID))[i]
    
    num_attended[i] <-
      dat %>% 
      filter(ID == id[i]) %>%
      # select(which_attended_before)
      mutate(which_attended_before=as.character(which_attended_before)) %>% 
      pull(which_attended_before) %>% 
      unique() %>% 
      as.factor() %>% 
      levels() %>% length() %>% 
      as.numeric()
}

dummy.dat <- data.frame(ID=id, num_attended=num_attended)

## Add this information to the primary dataset

dat %>% 
  
  # add the num_attended column to dat
  left_join(dummy.dat, by="ID") %>% 
    # check to see if it worked
    # select(ID, which_attended_before, num_attended) %>% head(20)
  
  # remove the rows with no info
  filter(!is.na(role_attended_before)) %>%
  filter(!is.na(current_position)) %>%
  # filter(gender != "Other") %>% 
  
  select(ID, current_position, role_attended_before, num_attended) %>% 
    # head(20)
  
  distinct(ID, role_attended_before, current_position, num_attended) %>% 
  
  group_by(ID, current_position, num_attended) %>%
  summarize(num_roles = n()) %>% 
  arrange(as.numeric(ID)) %>% 
  
  # calculate activity
  mutate(activity = num_roles/num_attended) %>% 
    
  ## Plot: research_location + gender
  ggplot(aes(x=current_position, y=activity)) +
    # geom_bar(position="stack", stat="identity", size=2) +
    geom_boxplot(alpha=0.7, outlier.colour = "darkred", fill=viridis::inferno(10)[4]) +
    labs(y="# roles/attendance") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_Publication() +
    # scale_fill_manual(values = roles.color) +
    guides(fill = guide_legend(nrow = 8)) +
    theme(legend.position = "none")
```

***

### Role of attendee & ethnic identity

```{r role_attended_before_ethnic_identity}

dat %>% 
  
  # remove the rows with no age/gender info
  filter(!is.na(role_attended_before)) %>%
  filter(!is.na(ethnic_identity)) %>%
  # filter(gender != "Other") %>% 
  
  distinct(ID, role_attended_before, ethnic_identity) %>% 
  group_by(role_attended_before, ethnic_identity) %>%
  summarize(num = n()) %>% 
    
  ## Plot: research_location + gender
  ggplot(aes(x=ethnic_identity, y=num, fill=role_attended_before, group=role_attended_before)) +
    geom_bar(position="stack", stat="identity", size=2) +
    labs(y="number of people") +
    # geom_hline(yintercept = 5, col="grey60", alpha=0.7, size=1) +
    coord_flip() +
    theme_Publication() +
    scale_colour_Publication() +
    scale_fill_manual(values = roles.color) +
    guides(fill = guide_legend(nrow = 3))

```

___
___

## 05 Over/Under-represented folks

> Check which groups are *over/under-represented* at IUSSI, and for which activities.


```{r over_v1}
#  load the GeneOverlap package to perform pairwise Fisher's exact test
library(GeneOverlap)

# which variable do you need to use for the x-axis
names.x <- c("gender", 
             # "age", 
             "residence_location", 
             # "research_location",
             # "current_position", 
             "ethnic_identity")

writeLines("Variables of interest:")
writeLines(names.x)

# initialize a list to hold all the possible groups that we want to plot
# on the x-axis
list.x <- list()
for (j in 1:length(names.x)) {
  
  # what are the different groups in the filtering column
  groups.x <- dat %>% 
  pull(names.x[j]) %>% 
  as.factor() %>% 
  levels()
  # what is the name of the filtering column
  var <- names.x[[j]]
  
  
  list.sub <- list()
  for (i in 1:length(groups.x)) {
    list.sub[[i]] <-
      dat %>% filter(.data[[var]] %in% groups.x[[i]] ) %>% pull(ID) %>% unique()
    names(list.sub)[i] <- groups.x[[i]] %>% as.character()
  }
  
  # Save the results for the filtering column and name the different lists
  list.x[[j]] <- list.sub
  
}

## DO THE SAME FOR THE Y-AXIS

names.y <- c("role_attended_before")
list.y <- list()

groups.y <- dat %>% 
  pull(names.y) %>% 
  as.factor() %>% 
  levels()
var <- names.y

list.sub <- list()
  for (i in 1:length(groups.y)) {
    list.sub[[i]] <-
      dat %>% filter(.data[[var]] %in% groups.y[[i]] ) %>% pull(ID) %>% unique()
    names(list.sub)[i] <- groups.y[[i]] %>% as.character()
  }
  
# Save the results for the filtering column and name the different lists
list.y <- list.sub


## Run the pairwise Fisher's tests ##

## What is the total sample size
nFolks <- dat %>% pull(ID) %>% unique() %>% length()

## Remove the groups that contain less than 5 individuals
list.y <- list.y[sapply(list.y, length)>=5]

for (i in 1:length(list.x)) {

  # remove for x
  list.x[[i]] <- list.x[[i]][sapply(list.x[[i]], length)>=5]
    
  if (i==4) {
    names(list.x[[i]]) <- names(list.x[[i]]) %>% str_sub(end=9)
    gom <- newGOM(list.y, list.x[[i]],
              nFolks)

  } 
  ## Let's perform the pairwise Fisher's exact test
gom <- newGOM(list.x[[i]], list.y,
              nFolks)
png(paste0(path_to_repo,
           "images/", folder.name,
           names.x[i],"_v_",names.y,"_gom.png"), 
    width = 30, height = 20, units = "cm", res = 300)
drawHeatmap(gom,
            # adj.p=T,
            cutoff=0.05,
            what="odds.ratio",
            # what="Jaccard",
            log.scale = T,
            grid.col="Oranges",
            note.col="black")
trash <- dev.off()

}

```


```{r plot_gom_1, echo = FALSE, fig.align='center', fig.cap='gender', out.width="95%"}
# > Plots: Gender
# sapply(list.x[[1]], length) %>% as.data.frame() %>% select(how_many=".")
# knitr::include_graphics(paste0(path_to_repo,
#            "images/", folder.name,
#            names.x[1],"_v_",names.y,"_gom.png"))

```


```{r plot_gom_2, echo = FALSE, fig.align='center', fig.cap='residence_location', out.width="95%"}
# > Plots: Residence location
# sapply(list.x[[2]], length) %>% as.data.frame() %>% select(how_many=".")
#   knitr::include_graphics(paste0(path_to_repo,
#            "images/", folder.name,
#            names.x[2],"_v_",names.y,"_gom.png"))

```


```{r plot_gom_3, echo = FALSE, fig.align='center', fig.cap='research_location', out.width="95%"}
# > Plots: Research location
# sapply(list.x[[3]], length) %>% as.data.frame() %>% select(how_many=".")
#   knitr::include_graphics(paste0(path_to_repo,
#            "images/", folder.name,
#            names.x[3],"_v_",names.y,"_gom.png"))

```


```{r plot_gom_4, echo = FALSE, fig.align='center', fig.cap='current_position', out.width="95%"}
# > Plots: Current position
# sapply(list.x[[4]], length) %>% as.data.frame() %>% select(how_many=".")
#   knitr::include_graphics(paste0(path_to_repo,
#            "images/", folder.name,
#            names.x[4],"_v_",names.y,"_gom.png"))

```


```{r plot_gom_5, echo = FALSE, fig.align='center', fig.cap='ethnic_identity', out.width="95%"}
# > Plots: Ethnic identity
# sapply(list.x[[5]], length) %>% as.data.frame() %>% select(how_many=".")
#   knitr::include_graphics(paste0(path_to_repo,
#            "images/", folder.name,
#            names.x[5],"_v_",names.y,"_gom.png"))

```

___


The goal is to perform enrichment analyses (using hypergeometric analyses) to identify under (or over) represented groups (based on gender, ethnicity, etc) in different activities (roles) at IUSSI.

> We found several overrepresented groups for specific roles; we did not find any significantly underrepresented terms. The overrepresented groups in different IUSSI roles are shown in the table/figure below.

### Results in a table

```{r under_v1}

source(paste0(path_to_repo,"functions/underrepresentation_analyses.R"))

## Format the data
bg.dat <- 
  dat %>%
  select(ID, role_attended_before) %>%
  group_by(ID) %>%
  summarize(role_attended_before = str_c(role_attended_before, collapse = "; ")) %>%
  ungroup()
  
results.list <- list()
k <- 1
# get a list of all gender IDs
for (i in 1:length(list.x)) {
  
  names.vars <- names(list.x[[i]])
  
  for (j in 1:length(names.vars)) {
  results.list[[k]] <- 
      list.x[[i]][[j]] %>% 
        check_under(.,
                    bg.file = bg.dat,
                    what = paste0(names.x[[i]],": ",names.vars[[j]]),
                    atleast = 3)
  k <- k+1
  }
}

bind_rows(results.list, .id = "column_label") %>%
    dplyr::select(-column_label) %>% 
    as_tibble() %>% 
    filter(adj_pVal < 0.05) %>%
    mutate(adj_pVal = round(adj_pVal,3)) %>%
    filter(over_under=="over") %>% 
  
    filter(what_desc != other.var) %>% 
    filter(annot_term != "no_annot") %>% 
    
    select(annot_term:adj_pVal) %>% 
    arrange(annot_term, adj_pVal) %>% 
    
    DT::datatable()

```

---

> And, here are the results as a heatmap.

### Results as a figure

```{r under_v2}

results.dat <- 
  bind_rows(results.list, .id = "column_label") %>%
    dplyr::select(-column_label) %>% 
    as_tibble() %>% 
    filter(adj_pVal < 0.05) %>% 
    mutate(adj_pVal = round(adj_pVal,3)) %>% 
    filter(over_under=="over") %>% 
  
    filter(what_desc != other.var) %>% 
    filter(annot_term != "no_annot") %>% 
    
    select(annot_term:adj_pVal) %>% 
    arrange(annot_term, adj_pVal)
  

mat <- 
  results.dat %>% 
  select(-over_under) %>% 
  pivot_wider(names_from = "what_desc",
              values_from = "adj_pVal",
              values_fn = list(adj_pVal = min)) %>% 
  as.data.frame() %>% 
  column_to_rownames("annot_term") %>% 
  as.matrix()


# Plot the matrix
### note: 
### make sure the library "plot.matrix" is loaded
### see here for more params: https://cran.r-project.org/web/packages/plot.matrix/vignettes/plot.matrix.html
png(paste0(path_to_repo,
           "images/", folder.name,
           "enrichement_results.png"), 
    width = 30, height = 20, units = "cm", res = 300)
par(mar=c(10.1, 16.1, 4.1, 6.1))
  mat %>% 
    t() %>% 
    plot(., 
         # border=NA,
         col=viridis::viridis(3),
         spacing.key=1.5,
         fmt.key="%.3f",
         xlab = "",
         ylab = "",
         axis.col=list(side=1, las=2), axis.row = list(side=2, las=1))
trash <- dev.off()
```


```{r plot_enrichment, echo = FALSE, fig.align='center', fig.cap='Results of the enrichment test; colors indicate p-values returned by the hypergeometric test. Lower p-values suggest possible over-representation.', out.width="95%"}
  knitr::include_graphics(paste0(path_to_repo,
                                 "images/", folder.name,
                                 "enrichement_results.png"))

```

> So, what does the data tell us?

Write down the possible interpretations here. For example, the above figure tells us that

- Folks of *ethnic_identity: white* are overrepresented in most roles at IUSSI,

- Folks that relate to *gender: Cisgender woman* and *residence_location: Middle East* participate in IUSSI as attendees more than expected at random,

- and so on.

___

## What next?

*My thoughts*: Once we, the committee, have all the summary statistics (and figures), and interpretations from the enrichment analyses, we should be ready to put together a brief document (pdf) for circulation. We might be able to highlight a few areas that we can improve on in the near future, and a few areas that we need to work on in the long-term. 


___

## Supplementary

#### 1. Reproducibility

- All code used for this document is available on my [Github](https://github.com/biplabendu/homepage/blob/master/IUSSI2022.Rmd).

- The function to perform enrichment analyses can be found [here](https://github.com/biplabendu/homepage/blob/master/functions/underrepresentation_analyses.R)


#### 2. IUSSI 2014 & 2018

A summary of the demography data for folks that attended IUSSI 2014 and 2018 can be found [here](https://biplabendu.github.io/homepage/IUSSI.html).