# Analysis 06: Organize Strain Outliers

In [None]:

# Script to identify outlier strains based on their response to toxicants
# This script reads in strain and drug means data and identifies strains that are
# outliers (either highly susceptible or minimally susceptible) based on 2SD thresholds

library(tidyverse)


── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

`summarise()` has grouped output by 'strain'. You can override using the
`.groups` argument.
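
The code that builds `strain_means_anno` and `outlier_summary_df` is not shown in this export. As a rough illustration of the 2SD rule described in the header comment, the sketch below flags outliers on toy data and counts how many drugs each strain is an outlier for. The toy values, the column names `drug` and `mean_response`, and the assumption that a larger value means a "high" responder are all hypothetical; only `strain`, `HighResponder`, `LowResponder`, and `Count` appear in the original code, and the actual analysis may compute them differently.

```r
library(dplyr)

# Toy data (hypothetical): 10 strains x 2 drugs, one extreme value per drug
toy_means <- tibble::tibble(
  strain = rep(paste0("S", 1:10), times = 2),
  drug = rep(c("drugA", "drugB"), each = 10),
  mean_response = c(
    0, 0, 0, 0, 0, 0, 0, 0, 0, 10,
    5, 5, 5, 5, 5, 5, 5, 5, 5, -5
  )
)

# Flag strains more than 2 SD above or below the per-drug mean.
# Assumption: the direction of "high" vs "low" follows the raw value.
flagged <- toy_means %>%
  dplyr::group_by(drug) %>%
  dplyr::mutate(
    drug_mean = mean(mean_response),
    drug_sd = stats::sd(mean_response),
    HighResponder = as.integer(mean_response > drug_mean + 2 * drug_sd),
    LowResponder = as.integer(mean_response < drug_mean - 2 * drug_sd)
  ) %>%
  dplyr::ungroup()

# Count the number of drugs for which each strain is an outlier
outlier_summary <- flagged %>%
  dplyr::filter(HighResponder == 1 | LowResponder == 1) %>%
  dplyr::count(strain, name = "Count")

print(outlier_summary)
```

In this toy example a single strain is flagged for both drugs, so it is the kind of strain that a `Count > 1` filter would pick up as a multi-drug outlier.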

## Strains that were outliers to multiple drugs

In [None]:

# count the unique strains that were outliers (high or low responders) to at least one drug
n_outlier_strains <- strain_means_anno %>%
  dplyr::filter(HighResponder == 1 | LowResponder == 1) %>%
  dplyr::pull(strain) %>%
  unique() %>%
  length()

multi_outlier_strains <- outlier_summary_df %>%
  dplyr::filter(Count > 1) %>%
  dplyr::pull(strain) %>%
  unique() %>%
  length()

# small summary table (written to .csv below): total outlier strains and strains that are outliers to multiple drugs
multi_outlier_strains_table <- tibble::tibble(
  n_outlier_strains = n_outlier_strains,
  multi_outlier_strains = multi_outlier_strains
)

# save the table
data.table::fwrite(
  multi_outlier_strains_table,
  "data/processed/phenotypes/number_outlier_strains_table.csv"
)

cat(sprintf("Number of strains that are outliers: %d\n", n_outlier_strains))
cat(sprintf("Number of strains that are outliers to multiple drugs: %d\n", multi_outlier_strains))


Number of strains that are outliers: 108

Number of strains that are outliers to multiple drugs: 46