## Overview

Nitrogen (N) fixation is a vital microbial process in soils. N-fixing microbes fix atmospheric N2 into a plant-usable form (NH3) using the nitrogenase enzyme, which is encoded by the *nif* genes (*nif*H, *nif*D, *nif*K).

I analyzed the metagenomes from 2 (of 48 total) soil samples that were used in a greenhouse study prior to BioBead treatment.

-   F1B - bulk field soil
-   F2R - rhizosphere (root space) soil

The goal is to identify the microbial communities in these 2 samples, compare shared and unique taxa, and screen each sample for the 3 *nif* genes. Ultimately, I want to establish a pipeline for analyzing the remaining samples for microbial identification and functional insights.

## BioBead project

## Challenges in soil biodiversity and inoculation

## Methods

-   Confirm data integrity with checksums
-   Quality control: FastQC and MultiQC
-   Preprocessing: trimming via Trimmomatic, merging via fastp
-   Assembly via MEGAHIT
-   Phylogenetic tree construction via MEGAN
-   Taxonomic identification via taxize
-   Gene prediction (protein-coding genes) via Prodigal
-   *nif* gene screening via HMMER
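
The first step in the list, checksum verification, could look something like the sketch below. It assumes the raw reads came with an MD5 manifest in the standard `md5sum` layout; the manifest name `checksums.md5` and the helper name `verify_checksums` are placeholders, not names taken from this project.

```bash
# Sketch: verify downloaded FASTQ files against an MD5 manifest.
# Assumption: the manifest is called checksums.md5 (hypothetical name) and
# uses the "<md5>  <filename>" layout written by md5sum.
verify_checksums() {
  local dir="$1"                        # directory holding reads + manifest
  local manifest="${2:-checksums.md5}"  # placeholder default name
  # Run in a subshell so the caller's working directory is unchanged.
  # --quiet reports only files that fail; exit status is non-zero on mismatch.
  ( cd "$dir" && md5sum --quiet -c "$manifest" )
}
```

A call such as `verify_checksums /path/to/raw_reads && echo "all files intact"` would then gate the rest of the pipeline on intact inputs.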

## Trimming

Trimming removes adapters and low-quality bases.


```{bash}
java -jar /home/shared/16TB_HDD_01/fish546/renee/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 4 \
  F1B-KM40_R1_001.fastq.gz F1B-KM40_R2_001.fastq.gz \
  F1B-KM40_trimmed_R1_paired.fastq.gz F1B-KM40_trimmed_R1_unpaired.fastq.gz \
  F1B-KM40_trimmed_R2_paired.fastq.gz F1B-KM40_trimmed_R2_unpaired.fastq.gz \
  ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:50  # drop reads shorter than 50 bp after trimming
```


This produces **paired** and **unpaired** output files. Paired reads are those where both the forward and reverse read survived trimming; these are used for downstream steps such as merging and assembly. Unpaired reads are those where only one read of the pair survived (its mate was discarded for low quality or short length).
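
To see how many reads ended up in each category, the records in each output file can be counted. The following is a rough sketch rather than part of the original pipeline: `count_reads` and `trimming_summary` are hypothetical helpers, and they assume gzip-compressed FASTQ files with exactly four lines per record and the output naming used in the Trimmomatic command.

```bash
# Sketch: count reads in a gzip-compressed FASTQ file.
# Assumption: every record occupies exactly four lines.
count_reads() {
  gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Print one "file<TAB>read count" line per Trimmomatic output for a sample prefix.
trimming_summary() {
  local prefix="$1" mate kind f
  for mate in R1 R2; do
    for kind in paired unpaired; do
      f="${prefix}_trimmed_${mate}_${kind}.fastq.gz"
      printf '%s\t%s\n' "$f" "$(count_reads "$f")"
    done
  done
}
```

For example, `trimming_summary F1B-KM40` would list the four output files for the bulk soil sample with their read counts. The two paired files should report the same number.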

## Merging

The trimmed paired files contain forward (R1) and reverse (R2) reads, which must be merged into single reads. Here, merging is done with fastp's `--merge` option. This is the last pre-processing step before metagenome assembly.

``` bash
/home/shared/fastp-v0.24.0/fastp \
  -i F1B-KM40_trimmed_R1_paired.fastq.gz \
  -I F1B-KM40_trimmed_R2_paired.fastq.gz \
  --merge \
  --merged_out F1B-KM40_merged.fastq.gz

/home/shared/fastp-v0.24.0/fastp \
  -i F2R-KM41_trimmed_R1_paired.fastq.gz \
  -I F2R-KM41_trimmed_R2_paired.fastq.gz \
  --merge \
  --merged_out F2R-KM41_merged.fastq.gz 
```
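
The two fastp calls above differ only in the sample prefix, so when scaling to more samples they could be driven by a loop. This is a minimal sketch under that assumption: `merge_sample` and `merge_all` are hypothetical helper names, and the `FASTP` variable simply defaults to the binary path used above.

```bash
# Sketch: run the same fastp merge for every sample prefix.
# FASTP defaults to the path used above; set FASTP=echo to preview the
# commands without running them.
FASTP="${FASTP:-/home/shared/fastp-v0.24.0/fastp}"

# Merge the trimmed, paired R1/R2 files of one sample.
merge_sample() {
  local prefix="$1"
  "$FASTP" \
    -i "${prefix}_trimmed_R1_paired.fastq.gz" \
    -I "${prefix}_trimmed_R2_paired.fastq.gz" \
    --merge \
    --merged_out "${prefix}_merged.fastq.gz"
}

# Merge every sample given on the command line, stopping at the first failure.
merge_all() {
  local prefix
  for prefix in "$@"; do
    merge_sample "$prefix" || return 1
  done
}
```

Calling `merge_all F1B-KM40 F2R-KM41` would reproduce the two commands above. Stopping at the first failure is a design choice for this sketch, so that a missing input file is noticed before assembly.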

## Assembly

The merged reads were assembled with MEGAHIT. The `-r` flag specifies the single-end (merged) input file, `-o` sets the output directory, `--min-contig-len 500` keeps only contigs of at least 500 bp, and `-t 8` runs 8 threads.

``` {.bash code-line-numbers="2-4"}
./megahit \
  -r ../F1B-KM40_merged.fastq.gz \
  -o megahit_F1B_KM40_out \
  --min-contig-len 500 \
  -t 8
```

As with the earlier steps, the same command is run for the second sample.

``` {.bash code-line-numbers="2-4"}
./megahit \
  -r ../F2R-KM41_merged.fastq.gz \
  -o megahit_F2R_KM41_out \
  --min-contig-len 500 \
  -t 8
```
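
A quick sanity check on each assembly is to count the contigs and total assembled bases. The sketch below is illustrative only: `contig_stats` is a hypothetical helper, and it assumes MEGAHIT's default output file name, `final.contigs.fa`, inside each output directory.

```bash
# Sketch: summarise a FASTA assembly (contig count, total bases, longest contig).
# Assumption: the input is a plain-text FASTA file such as MEGAHIT's
# default final.contigs.fa.
contig_stats() {
  awk '
    /^>/ {                      # header line: close out the previous contig
      n++
      if (len > max) max = len
      total += len
      len = 0
      next
    }
    { len += length($0) }       # sequence line: extend the current contig
    END {                       # close out the last contig
      if (len > max) max = len
      total += len
      printf "contigs=%d total_bp=%d longest_bp=%d\n", n, total, max
    }
  ' "$1"
}
```

For example, `contig_stats megahit_F1B_KM40_out/final.contigs.fa` prints a one-line summary that can be compared between the bulk and rhizosphere assemblies.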

## Phylogenetic tree construction via MEGAN

Phylogenetic trees are constructed using MEGAN. This is the result for F2R, the rhizosphere soil. The code and results for F1B will be included in the next update.

<embed src="https://gannet.fish.washington.edu/seashell/snaps/sr-blastx-meganized.pdf" width="100%" height="500px" />

## Rhizosphere soil top hits

| Organism                      | Classification |
|-------------------------------|----------------|
| Acidobacteria bacterium       | Bacteria       |
| Alphaproteobacteria bacterium | Bacteria       |
| Betaproteobacteria bacterium  | Bacteria       |
| Verrucomicrobia bacterium     | Bacteria       |
| Actinobacteria bacterium      | Bacteria       |

: Table 1: Rhizosphere soil top hits

## Bulk soil top hits

## Shared vs unique taxa

## nif gene analysis

## Prodigal to parse protein-coding genes

## HMMER for nif gene screening

## nif genes in metagenome samples

## Conclusions

## Next steps: scaling up