# Analysis 01: NemaScan Pipeline Documentation

## Overview

This document describes how the GWAS mapping results in `data/processed/20231116_Analysis_NemaScan/` were generated using the NemaScan pipeline. **These results are provided pre-computed and will be automatically decompressed during the analysis workflow. You do not need to re-run this pipeline unless you want to modify mapping parameters or use different input data.**
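
For reference, the automatic decompression step is roughly equivalent to the manual extraction below. This is a minimal sketch, not the workflow's actual code: the helper name `extract_nemascan` is hypothetical, and it assumes the archive is stored at `data/raw/nemascan_output/20231116_Analysis_NemaScan.tar.xz` and contains a top-level `20231116_Analysis_NemaScan/` directory.

``` bash
# Hypothetical helper: unpack an xz-compressed tar archive into a destination directory.
extract_nemascan() {
  local archive="$1"   # path to the .tar.xz archive
  local dest="$2"      # directory that will receive the extracted folder
  mkdir -p "$dest"
  tar -xJf "$archive" -C "$dest"
}

# Example call, run from the repository root (paths assumed from this document):
# extract_nemascan data/raw/nemascan_output/20231116_Analysis_NemaScan.tar.xz data/processed
```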

## Pipeline Information

-   **Pipeline**: [andersenlab/nemascan](https://github.com/andersenlab/nemascan)
-   **Commit hash**: `b58711369124885fb90ce9c53b720313fa68f79b`

## Input Data

The pipeline requires trait data generated by `analysis/trait_generate_mapping_inputs.qmd`, which processes raw phenotype data into the format required by NemaScan.

## Command Used

The pipeline was executed on an HPC cluster with the following command:

``` bash
nextflow run andersenlab/nemascan \
  -profile mappings \
  -r b58711369124885fb90ce9c53b720313fa68f79b \
  --vcf 20220216 \
  --traitfile /projects/b1059/projects/Tim/2021_GWA_manuscript/data/raw/traitfiles/toxicant.traits.tsv \
  --sthresh EIGEN \
  --group_qtl 200 \
  --ci_size 50 \
  --finemap TRUE \
  --mediation TRUE \
  --out /projects/b1059/projects/Tim/2021_GWA_manuscript/data/processed/20231116_Analysis_NemaScan
```

## Re-running the Pipeline

If you need to re-run the mapping analysis:

### 1. Clone the NemaScan repository at the specific commit

``` bash
git clone https://github.com/andersenlab/nemascan.git
cd nemascan
git checkout b58711369124885fb90ce9c53b720313fa68f79b
```
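
After the checkout, it can be useful to confirm that the working copy really is at the recorded commit. The sketch below uses `git rev-parse HEAD` for that check; the helper name `check_commit` is hypothetical and only illustrates one way to do it.

``` bash
# Commit hash recorded in "Pipeline Information".
expected="b58711369124885fb90ce9c53b720313fa68f79b"

# Hypothetical helper: succeed only if the repository's HEAD matches the expected commit.
check_commit() {
  local repo="$1" want="$2" have
  have="$(git -C "$repo" rev-parse HEAD)" || return 1   # fails if $repo is not a git repository
  [ "$have" = "$want" ]
}

# Example call from inside the cloned repository:
# check_commit . "$expected" && echo "commit matches" || echo "commit mismatch"
```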

### 2. Prepare input data

-   Run `analysis/trait_generate_mapping_inputs.qmd` to generate trait files
-   Ensure you have access to the wild isolate (WI) VCF, version 20220216, which can be downloaded from [CaeNDR](https://caendr.org/data/data-release)

### 3. Run the pipeline

``` bash
nextflow run main.nf \
  -profile mappings \
  --vcf 20220216 \
  --traitfile <path_to_trait_file> \
  --sthresh EIGEN \
  --group_qtl 200 \
  --ci_size 50 \
  --finemap TRUE \
  --mediation TRUE \
  --out <output_directory>
```

### 4. Replace the archived results

-   From the parent directory of the output folder, compress it using tar with xz compression (which typically produces smaller archives than gzip, at the cost of slower compression):

    ``` bash
    tar -cJf 20231116_Analysis_NemaScan.tar.xz 20231116_Analysis_NemaScan/
    ```

-   Replace `data/raw/nemascan_output/20231116_Analysis_NemaScan.tar.xz` with your new archive
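
Before overwriting the existing archive, it may be worth checking that the new one is readable and unpacks into the expected top-level directory. The following is a minimal sketch under those assumptions; `verify_archive` is a hypothetical helper, and it relies on a `tar` build with xz support.

``` bash
# Hypothetical helper: check that a .tar.xz archive is readable and that
# every entry sits under the expected top-level directory.
verify_archive() {
  local archive="$1" topdir="$2" listing entry
  listing="$(tar -tJf "$archive")" || return 1   # fails if the archive is missing or corrupt
  [ -n "$listing" ] || return 1                   # fails if the archive is empty
  while IFS= read -r entry; do
    case "$entry" in
      "$topdir"/*) ;;        # entry is inside the expected directory
      *) return 1 ;;         # anything else means an unexpected layout
    esac
  done <<< "$listing"
  return 0
}

# Example call, run where the archive was created:
# verify_archive 20231116_Analysis_NemaScan.tar.xz 20231116_Analysis_NemaScan && echo "archive OK"
```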