# What causes antibiotic resistance?

**Project 2.**\
Lab journal by Anna Ogurtsova

---

### Before starting
Several tools are needed for the analysis. Download and install the following (setting up a virtual environment with conda, mamba, etc. before installing is recommended):

* EntrezDirect ```conda install -c bioconda entrez-direct``` (version 16.2 was used)
* BWA ```sudo apt-get install bwa``` (version 0.7.17-r1188 was used)
* SAMtools ```sudo apt-get install samtools``` (version 1.13 was used)
* IGV, desktop version: https://igv.org/doc/desktop/
* BCFtools ```sudo apt install bcftools``` (version 1.13 was used)
* SRA Toolkit [Download link](https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit) (version 3.0.7 was used)
* Snakemake ```sudo apt -y install snakemake``` (version 6.15.1 was used)
* Snakefile (the workflow file run in step 1)
* R [Installation instructions](https://linux.how2shout.com/how-to-install-r-base-ubuntu-22-04-lts-jammy/) (version 4.3.1 was used)

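Once everything is installed, a quick check that the command-line tools are on `PATH` can save debugging time later. A minimal Python sketch, assuming the usual executable names (`esearch` for EntrezDirect, `prefetch` for SRA Toolkit, `Rscript` for R); `find_tools` is a hypothetical helper, and the list may need adjusting for your installation:

```python
import shutil

# Executable names are assumptions based on the default install of each tool
TOOLS = ["esearch", "bwa", "samtools", "bcftools", "prefetch", "snakemake", "Rscript"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    print(f"{name:10s} {path or 'NOT FOUND'}")
```
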
### 1. Run Snakefile

In [None]:
# SRR1705851 is the roommate's sample; SRR1705858, SRR1705859 and SRR1705860 are the controls
!snakemake --cores 12 --config sample=SRR1705851 id=KF848938.1 freq=0.001 -p KF848938.1.SRR1705851.varscan.0.001.csv
!snakemake --cores 12 --config sample=SRR1705858 id=KF848938.1 freq=0.001 -p KF848938.1.SRR1705858.varscan.0.001.csv
!snakemake --cores 12 --config sample=SRR1705859 id=KF848938.1 freq=0.001 -p KF848938.1.SRR1705859.varscan.0.001.csv
!snakemake --cores 12 --config sample=SRR1705860 id=KF848938.1 freq=0.001 -p KF848938.1.SRR1705860.varscan.0.001.csv

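The four commands differ only in the sample accession, so they can also be generated in a loop. A minimal Python sketch, assuming the same Snakefile targets as above; `build_command` is a hypothetical helper, and the commands are only printed here, not executed:

```python
import shlex

REFERENCE_ID = "KF848938.1"
MIN_FREQ = "0.001"
SAMPLES = ["SRR1705851", "SRR1705858", "SRR1705859", "SRR1705860"]


def build_command(sample, ref_id=REFERENCE_ID, freq=MIN_FREQ, cores=12):
    """Return the snakemake argument list for one sample (hypothetical helper)."""
    target = f"{ref_id}.{sample}.varscan.{freq}.csv"
    return [
        "snakemake", "--cores", str(cores),
        "--config", f"sample={sample}", f"id={ref_id}", f"freq={freq}",
        "-p", target,
    ]


commands = [build_command(sample) for sample in SAMPLES]
for command in commands:
    # To actually run a command, pass it to subprocess.run(command, check=True)
    print(shlex.join(command))
```

Keeping the accession list in one place makes it harder to mistype a sample ID in one of the target file names.
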
### 2. Calculate the average frequencies and standard deviation of detected variants

In [None]:
%%R
# dplyr provides filter() and %>%, which are used in step 3
library(dplyr)

# Variant tables produced by the Snakemake workflow in step 1
control_1 <- read.csv('KF848938.1.SRR1705858.varscan.0.001.csv')
control_2 <- read.csv('KF848938.1.SRR1705859.varscan.0.001.csv')
control_3 <- read.csv('KF848938.1.SRR1705860.varscan.0.001.csv')
roommate <- read.csv('KF848938.1.SRR1705851.varscan.0.001.csv')

# Give all four tables the same column names
col_names <- c('sample', 'position', 'ref_base', 'alt_base', 'frequency')
names(control_1) <- col_names
names(control_2) <- col_names
names(control_3) <- col_names
names(roommate) <- col_names

# Mean variant frequency in each control
mean1 <- mean(control_1$frequency)
mean2 <- mean(control_2$frequency)
mean3 <- mean(control_3$frequency)

# Standard deviation of the variant frequency in each control
sd1 <- sd(control_1$frequency)
sd2 <- sd(control_2$frequency)
sd3 <- sd(control_3$frequency)

# Mean variant frequency in the roommate's sample
mean_roommate <- mean(roommate$frequency)

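For readers working outside R, the same per-sample statistics can be sketched in Python. R's `sd()` is the sample standard deviation (denominator n - 1), which corresponds to `statistics.stdev`. The frequency values below are made up for illustration and are not taken from the data:

```python
import statistics

# Hypothetical variant frequencies for one control sample (illustration only)
control_frequencies = [0.2, 0.4, 0.6]

control_mean = statistics.mean(control_frequencies)
# statistics.stdev uses the n - 1 denominator, like sd() in R
control_sd = statistics.stdev(control_frequencies)

print(f"mean = {control_mean:.3f}, sd = {control_sd:.3f}")
```
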
### 3. Search for rare mutations that aren't sequencing errors

In [None]:
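%%R
# Mean variant frequency in the roommate's sample, for comparison with the controls
mean(roommate$frequency)

# Average the control means and the control standard deviations
total_mean <- mean(c(mean1, mean2, mean3))
mean_sd <- mean(c(sd1, sd2, sd3))

# Three standard deviations around the control mean
# (lower_border is computed for reference; only upper_border is used below)
lower_border <- total_mean - 3 * mean_sd
upper_border <- total_mean + 3 * mean_sd

# Keep roommate variants that are more frequent than the control error level
chosen_mutations <- roommate %>% filter(frequency > upper_border)
print(chosen_mutations)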

*output*

```
             sample position ref_base alt_base frequency
KF848938.1    5         307       C        T     0.94
KF848938.1   21        1458       T        C     0.84
```
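
The filtering rule used in this step keeps a variant only if its frequency exceeds the average control mean plus three times the average control standard deviation. It can be sketched in Python as follows. All numbers are hypothetical placeholders rather than values from the samples, and `select_rare_variants` is an illustrative name:

```python
import statistics


def select_rare_variants(variants, control_means, control_sds, n_sigma=3):
    """Keep variants whose frequency exceeds mean + n_sigma * sd of the controls."""
    total_mean = statistics.mean(control_means)
    mean_sd = statistics.mean(control_sds)
    upper_border = total_mean + n_sigma * mean_sd
    return [v for v in variants if v["frequency"] > upper_border]


# Hypothetical per-control statistics and roommate variants (illustration only)
control_means = [0.25, 0.24, 0.26]
control_sds = [0.07, 0.05, 0.08]
roommate_variants = [
    {"position": 100, "ref_base": "A", "alt_base": "G", "frequency": 0.30},
    {"position": 200, "ref_base": "C", "alt_base": "T", "frequency": 0.90},
]

for variant in select_rare_variants(roommate_variants, control_means, control_sds):
    print(variant)
```

With these placeholder numbers the threshold is about 0.45, so only the second variant passes the filter.
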