# **CHAPTER 2. Differential analysis**

**Create the conda environment and activate it**

```
conda env create -f diff_an.yaml
```

```
conda activate diff_an
```

## **Part 0. Copy kreports from the server**

```
scp -r username@host.com:"/path/to/kreports/folder" data/kraken2_bracken/
```

## **Part 1. Data parsing**

Rename the files by deleting `_kraken_report` from their names.

In [None]:
# Usage
# {path_to_script} {path_to_folder}
%run scripts/rename_files.py data/kraken2_bracken
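
The contents of `scripts/rename_files.py` are not shown here. As a rough sketch of what such a renaming step could look like, the hypothetical helper `strip_from_names` below removes a substring from every file name in a folder. The function name and its exact behaviour are assumptions for illustration, not the actual script.

```python
from pathlib import Path


def strip_from_names(folder, pattern="_kraken_report"):
    """Remove `pattern` from the name of every file in `folder` (hypothetical helper)."""
    renamed = []
    # sorted() materialises the listing first, so renaming does not disturb iteration.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and pattern in path.name:
            target = path.with_name(path.name.replace(pattern, ""))
            if target.exists():
                # Refuse to overwrite a file that already has the target name.
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Called as `strip_from_names("data/kraken2_bracken")`, it would return the list of new file names and leave files without the substring untouched.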

In [None]:
! KrakenParser --complete -i data/kraken2_bracken/

In [2]:
import pandas as pd

In [3]:
# Load the CSV file
df = pd.read_csv("data/counts/csv/counts_species.csv")

# Rename the column
df.rename(columns={"Blattambidensovirus incertum1": "Parus major densovirus"}, inplace=True)

# Save back to CSV (if needed)
df.to_csv("data/counts/csv/counts_species_new.csv", index=False)

# Show the first rows as a check
df.head()

| Unnamed: 0 | Sample_id | Candidatus Chazhemtobacterium aquaticus | Candidatus Fervidibacter sacchari | Candidatus Bipolaricaulis anaerobius | Candidatus Absconditicoccus praedator | Candidatus Nanosynbacter lyticus | Candidatus Nanosynbacter sp. HMT-352 | Dissulfurimicrobium hydrothermale | Microvenator marinus | Persicimonas caeni | ... | Rosenblumvirus CSA13 | Andhravirus andhra | Eganvirus EtG | Eganvirus ev186 | Valbvirus ValB1MD2 | Circoviridae 2 LDMD-2013 | Gemykroznavirus hydro1 | Gemykibivirus rhina2 | Blattambidensovirus incertum3 | Parus major densovirus |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | D1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 16 | 353 | 0 | 0 | 3086 | 0 | 0 | 0 | 31 | 19826745 |
| 1 | D2 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 0 | 0 | ... | 14 | 1531 | 0 | 17 | 6351 | 0 | 0 | 0 | 0 | 14822235 |
| 2 | D3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 161 | 0 | 0 | 620 | 0 | 0 | 0 | 0 | 7311619 |
| 3 | D4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 51 | 0 | 0 | 1855 | 0 | 0 | 0 | 0 | 9510020 |
| 4 | D5 | 0 | 0 | 0 | 10 | 0 | 96 | 0 | 0 | 0 | ... | 44 | 1214 | 18 | 0 | 3386 | 0 | 0 | 0 | 12 | 12848618 |
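
By default, `DataFrame.rename` ignores keys that are not present in the table, so a typo in the old column name would pass silently. The minimal sketch below uses a made-up two-row table (not the real counts) to show how `errors="raise"` turns that situation into an explicit error.

```python
import pandas as pd

# Toy table for illustration only; the values are not taken from the real data.
toy = pd.DataFrame({"Sample_id": ["S1", "S2"], "Blattambidensovirus incertum1": [5, 7]})
mapping = {"Blattambidensovirus incertum1": "Parus major densovirus"}

# errors="raise" makes pandas fail if the old column name is absent.
renamed = toy.rename(columns=mapping, errors="raise")

try:
    # The column no longer exists under its old name, so this raises KeyError.
    renamed.rename(columns=mapping, errors="raise")
    second_rename_failed = False
except KeyError:
    second_rename_failed = True
```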


### **Part 1.4. Create metadata**

In [4]:
import csv

In [2]:
# Define the data
data = [
    {'sample_id': 'D1', 'Group': 'Vespertilio murinus'},
    {'sample_id': 'D2', 'Group': 'Vespertilio murinus'},
    {'sample_id': 'D3', 'Group': 'Vespertilio murinus'},
    {'sample_id': 'D4', 'Group': 'Vespertilio murinus'},
    {'sample_id': 'D5', 'Group': 'Vespertilio murinus'},
    {'sample_id': 'P1', 'Group': 'Nyctalus noctula'},
    {'sample_id': 'P2', 'Group': 'Nyctalus noctula'},
    {'sample_id': 'P3', 'Group': 'Nyctalus noctula'},
    {'sample_id': 'P4', 'Group': 'Nyctalus noctula'},
    {'sample_id': 'P5', 'Group': 'Nyctalus noctula'}
]

# Define the CSV file name
filename = 'metadata.csv'

# Write the data to the CSV file
with open(filename, mode='w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['sample_id', 'Group'])
    writer.writeheader()
    writer.writerows(data)

print(f'{filename} has been created successfully.')

metadata.csv has been created successfully.
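
A differential-abundance run can only pair samples that appear in both tables. Note that the counts table labels its sample column `Sample_id`, while the metadata file uses `sample_id`; how `scripts/MaAsLin2.R` handles this is not shown here. The sketch below is one possible sanity check, with a hypothetical helper `compare_sample_ids` and toy tables that stand in for the real files.

```python
import pandas as pd


def compare_sample_ids(metadata, counts, meta_col="sample_id", counts_col="Sample_id"):
    """Return (IDs only in metadata, IDs only in counts) as sorted lists."""
    meta_ids = set(metadata[meta_col])
    count_ids = set(counts[counts_col])
    return sorted(meta_ids - count_ids), sorted(count_ids - meta_ids)


# Toy tables for illustration only; group labels and counts are made up.
toy_meta = pd.DataFrame({"sample_id": ["D1", "D2", "P1"], "Group": ["A", "A", "B"]})
toy_counts = pd.DataFrame({"Sample_id": ["D1", "D2", "P2"], "taxon_x": [1, 0, 3]})

only_in_meta, only_in_counts = compare_sample_ids(toy_meta, toy_counts)
print(only_in_meta, only_in_counts)
```

With the real files, both returned lists should be empty; the tables could be loaded with `pd.read_csv("metadata.csv")` and `pd.read_csv("data/counts/csv/counts_species_new.csv")`.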


## **Part 2. Comparative statistics**

### **Part 2.1. Differential Microbial Abundance**

`MaAsLin2` is the next generation of `MaAsLin` (Microbiome Multivariable Association with Linear Models).

`MaAsLin2` is a comprehensive R package for efficiently determining multivariable associations between clinical metadata and microbial meta-omics features. `MaAsLin2` relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal ones, along with a variety of filtering, normalization, and transform methods.
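
The result tables written by `MaAsLin2` contain both a `pval` and a `qval` column, the latter being the p-value after multiple-testing correction. The settings used inside `scripts/MaAsLin2.R` are not shown here; assuming a Benjamini-Hochberg correction, the relationship between the two columns can be sketched as follows. The function `bh_qvalues` is a hypothetical helper written for illustration and is not part of `MaAsLin2`.

```python
import numpy as np


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D sequence."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    # Put the adjusted values back in the original order, capped at 1.
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q
```

A feature is then reported as significant when its q-value falls below the chosen threshold, so a table with no rows means that no feature passed that cut-off.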

In [8]:
! mkdir -p MaAsLin2_results

#### **Part 2.1.1. _`Species`_ level**

In [None]:
%%bash
# Usage
# {path_to_script} {path_to_metadata} {path_to_counts} {path_to_output}
Rscript scripts/MaAsLin2.R metadata.csv data/counts/csv/counts_species_new.csv MaAsLin2_results/species

In [53]:
MaAsLin2_results_species = pd.read_csv('MaAsLin2_results/species/significant_results.tsv', sep='\t')
MaAsLin2_results_species

Unnamed: 0,feature,metadata,value,coef,stderr,N,N.not.0,pval,qval


#### **Part 2.1.2. _`Genus`_ level**

In [None]:
%%bash
# Usage
# {path_to_script} {path_to_metadata} {path_to_counts} {path_to_output}
Rscript scripts/MaAsLin2.R metadata.csv data/counts/csv/counts_genus.csv MaAsLin2_results/genus

In [55]:
MaAsLin2_results_genus = pd.read_csv('MaAsLin2_results/genus/significant_results.tsv', sep='\t')
MaAsLin2_results_genus

Unnamed: 0,feature,metadata,value,coef,stderr,N,N.not.0,pval,qval


#### **Part 2.1.3. _`Family`_ level**

In [None]:
%%bash
# Usage
# {path_to_script} {path_to_metadata} {path_to_counts} {path_to_output}
Rscript scripts/MaAsLin2.R metadata.csv data/counts/csv/counts_family.csv MaAsLin2_results/family

In [57]:
MaAsLin2_results_family = pd.read_csv('MaAsLin2_results/family/significant_results.tsv', sep='\t')
MaAsLin2_results_family

Unnamed: 0,feature,metadata,value,coef,stderr,N,N.not.0,pval,qval


#### **Part 2.1.4. _`Order`_ level**

In [None]:
%%bash
# Usage
# {path_to_script} {path_to_metadata} {path_to_counts} {path_to_output}
Rscript scripts/MaAsLin2.R metadata.csv data/counts/csv/counts_order.csv MaAsLin2_results/order

In [59]:
MaAsLin2_results_order = pd.read_csv('MaAsLin2_results/order/significant_results.tsv', sep='\t')
MaAsLin2_results_order

Unnamed: 0,feature,metadata,value,coef,stderr,N,N.not.0,pval,qval


#### **Part 2.1.5. _`Class`_ level**

In [None]:
%%bash
# Usage
# {path_to_script} {path_to_metadata} {path_to_counts} {path_to_output}
Rscript scripts/MaAsLin2.R metadata.csv data/counts/csv/counts_class.csv MaAsLin2_results/class

In [61]:
MaAsLin2_results_class = pd.read_csv('MaAsLin2_results/class/significant_results.tsv', sep='\t')
MaAsLin2_results_class

Unnamed: 0,feature,metadata,value,coef,stderr,N,N.not.0,pval,qval


#### **Part 2.1.6. _`Phylum`_ level**

In [None]:
%%bash
# Usage
# {path_to_script} {path_to_metadata} {path_to_counts} {path_to_output}
Rscript scripts/MaAsLin2.R metadata.csv data/counts/csv/counts_phylum.csv MaAsLin2_results/phylum

In [63]:
MaAsLin2_results_phylum = pd.read_csv('MaAsLin2_results/phylum/significant_results.tsv', sep='\t')
MaAsLin2_results_phylum

Unnamed: 0,feature,metadata,value,coef,stderr,N,N.not.0,pval,qval


As can be seen, no taxonomic level yields significant results for differential microbial abundance. Let's visualize the results anyway to take a closer look!
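
The six per-level checks follow the same pattern, so they can also be summarised in one pass. A minimal sketch, assuming the directory layout used in this notebook (`MaAsLin2_results/<level>/significant_results.tsv`); `count_significant` is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd

LEVELS = ("species", "genus", "family", "order", "class", "phylum")


def count_significant(results_dir, levels=LEVELS):
    """Map each taxonomic level to its number of significant rows (None if the file is missing)."""
    counts = {}
    for level in levels:
        path = Path(results_dir) / level / "significant_results.tsv"
        if path.exists():
            # A header-only file yields an empty DataFrame, hence a count of 0.
            counts[level] = len(pd.read_csv(path, sep="\t"))
        else:
            counts[level] = None
    return counts
```

Calling `count_significant("MaAsLin2_results")` after the runs above would give one count per level.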

### **Part 2.2. Visualization**

Please open `RStudio` and go through the `Volcano_plots_journal.R` script.<br>
The plots require many manual adjustments, so the script is meant to be stepped through by hand rather than executed in one go.

### **Part 2.3. Alpha- and Beta diversities**

Alpha diversity calculations

In [5]:
%%bash
# Usage
# {path_to_script} {path_to_counts} {path_to_output}
Rscript scripts/Alpha_div_calculations.R data/counts/csv/counts_species_new.csv data/alpha_div_cult.csv

Loading required package: pacman
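
Which indices `Alpha_div_calculations.R` computes is not shown here. As an illustration of what an alpha-diversity measure does with one sample's counts, the sketch below implements two common choices, observed richness and the Shannon index $H = -\sum_i p_i \ln p_i$. Both function names are hypothetical.

```python
import numpy as np


def observed_richness(counts):
    """Number of taxa with a non-zero count in one sample."""
    return int((np.asarray(counts) > 0).sum())


def shannon_index(counts):
    """Shannon index (natural log) of one sample's count vector."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    # Zero counts contribute nothing, so drop them before taking the log.
    p = counts[counts > 0] / total
    return float(-(p * np.log(p)).sum())
```

For a sample with four equally abundant taxa, `shannon_index` returns ln 4, the maximum for that richness; a sample dominated by a single taxon scores close to 0.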


### **Part 2.4. Relative Abundance, Alpha- and Beta diversities visualization**

Please open `RStudio` and go through the `RA_n_Diversity_journal.R` script.<br>
The plots require many manual adjustments, so the script is meant to be stepped through by hand rather than executed in one go.
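
The measures used in `RA_n_Diversity_journal.R` are not shown here. As a hedged illustration of the two ideas named in this part's title, the sketch below converts a samples-by-taxa count matrix to relative abundances and computes the Bray-Curtis dissimilarity, one widely used beta-diversity measure. Both function names are hypothetical.

```python
import numpy as np


def relative_abundance(counts):
    """Divide each row (sample) of a samples-by-taxa matrix by its row total."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows with a zero total stay all-zero instead of producing NaN.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = (u + v).sum()
    if denom == 0:
        return 0.0
    return float(np.abs(u - v).sum() / denom)
```

The dissimilarity is 0 for identical samples and 1 for samples that share no taxa.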