# Fine-mapping with SuSiE RSS model

This notebook takes a list of LD reference files and a list of summary statistics files from various association studies ...

## Input


I. **GWAS Summary Statistics Files**
- **Input**: Vector of files for one or more GWAS studies.
- **Format**: 
  - Tab-delimited files.
  - First 4 columns: `chrom`, `pos`, `A1`, `A2`
  - Additional columns can be loaded using a column mapping file (see below).
- **Column Mapping files (optional)**:
  - Optional YAML file for custom column mapping.
  - Required columns: `chrom`, `pos`, `A1`, `A2`, either `z` or (`beta` and `se`).
  - Optional columns: `n`, `var_y`.
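
The "either `z` or (`beta` and `se`)" requirement can be illustrated with a minimal sketch. It mirrors what the fine-mapping step in this notebook does in R (derive `z` as `beta / se`; when only `z` is given, set `beta = z` and `se = 1` so that the allele-flipping QC can run). The helper name `ensure_z_beta_se` and the dict-based record are assumptions made for illustration only.

```python
def ensure_z_beta_se(row):
    """Return a copy of `row` (column name -> value) with z, beta and se all filled in."""
    out = dict(row)
    if out.get("z") is None:
        if out.get("beta") is None or out.get("se") is None:
            raise ValueError("need either `z` or both `beta` and `se`")
        # z-score derived from the effect size and its standard error
        out["z"] = out["beta"] / out["se"]
    if out.get("beta") is None:
        # Placeholder effect size on the z-score scale
        out["beta"] = out["z"]
        out["se"] = 1.0
    return out

print(ensure_z_beta_se({"beta": 0.5, "se": 0.25}))
print(ensure_z_beta_se({"z": -1.5}))
```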

II. **GWAS Summary Statistics Meta-File** (optional): useful when many GWAS datasets are processed with the same command.
- **Columns**: `study_id`, chromosome number, path to summary statistics file, optional path to column mapping file.
- **Note**: Chromosome number `0` indicates a genome-wide file.

Example `gwas_meta.tsv`:

```
study_id    chrom    file_path            column_mapping_file
study1      1        gwas1.tsv.gz         column_mapping.yml
study1      2        gwas2.tsv.gz         column_mapping.yml
study2      0        gwas3.tsv.gz         column_mapping.yml
```

If both summary statistics files (I) and a meta-file (II) are specified, the union of the two is used.
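
As a rough sketch of how the meta-file layout can be interpreted, the snippet below expands chromosome `0` to chromosomes 1 to 22 and rejects duplicate study/chromosome pairs, which is the behavior of the `get_analysis_regions` step in this notebook. The function name `load_gwas_meta` is hypothetical, and the sketch uses only the standard library rather than the pandas-based code the workflow itself runs; the example input is written with literal tabs.

```python
import csv
import io

def load_gwas_meta(handle, autosomes=range(1, 23)):
    """Map study_id -> {chrom: [file_path, column_mapping_file or None]}."""
    studies = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        chrom = int(row["chrom"])
        mapping = row.get("column_mapping_file") or None
        per_chrom = studies.setdefault(row["study_id"], {})
        # chrom 0 marks a genome-wide file, so register it for every autosome
        for c in (autosomes if chrom == 0 else [chrom]):
            if c in per_chrom:
                raise ValueError(f"duplicate entry for {row['study_id']} chrom {c}")
            per_chrom[c] = [row["file_path"], mapping]
    return studies

example = (
    "study_id\tchrom\tfile_path\tcolumn_mapping_file\n"
    "study1\t1\tgwas1.tsv.gz\tcolumn_mapping.yml\n"
    "study1\t2\tgwas2.tsv.gz\tcolumn_mapping.yml\n"
    "study2\t0\tgwas3.tsv.gz\tcolumn_mapping.yml\n"
)
meta = load_gwas_meta(io.StringIO(example))
print(sorted(meta["study1"]))  # chromosomes listed explicitly for study1
print(len(meta["study2"]))     # number of chromosomes the genome-wide file covers
```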

Example `column_mapping.yml` (left: standard name; right: original column name):

```
chrom:chromosome
pos:base_pair_location
A1:effect_allele
A2:other_allele
beta:beta
se:standard_error
pvalue:p_value
maf:maf
n_cases:n_cases
n_controls:n_controls
```
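
The mapping file above can be applied to a summary statistics header roughly as follows. This is a sketch under the assumption that each line is a plain `standard:original` pair split on the first colon, which is how the R code in this notebook reads the file; `read_column_mapping` and `rename_header` are hypothetical helper names, and columns without a mapping entry are left unchanged.

```python
def read_column_mapping(lines):
    """Parse 'standard:original' lines into a dict {original: standard}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        standard, original = line.split(":", 1)
        mapping[original.strip()] = standard.strip()
    return mapping

def rename_header(header, mapping):
    """Replace original column names by standard ones; keep unmapped names as-is."""
    return [mapping.get(name, name) for name in header]

mapping = read_column_mapping([
    "chrom:chromosome",
    "pos:base_pair_location",
    "A1:effect_allele",
    "A2:other_allele",
    "se:standard_error",
])
header = ["chromosome", "base_pair_location", "effect_allele",
          "other_allele", "beta", "standard_error", "rsid"]
print(rename_header(header, mapping))
```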


III. **LD Reference Metadata File**
- **Format**: Single TSV file.
- **Contents**:
  - Columns: `chr`, `start`, `end`, path to the LD matrix, genomic build.
  - LD matrix path format: comma-separated, first entry is the LD matrix, second is the bim file.
- **Documentation**: Refer to our LD reference preparation document for detailed information (Tosin pending update).

IV. **Analysis regions (optional)**: To analyze specific genomic regions, pass them with the `--region-name` option in 'chr:start-end' format with a numeric chromosome; multiple regions are accepted. Alternatively, provide a file listing regions with the `--region-list` option, one region per line, either in the same 'chr:start-end' format or as three whitespace-separated columns (chromosome, start, end). When both `--region-name` and `--region-list` are provided, the union of the two is analyzed. When neither is specified, the analysis defaults to all regions in the LD reference metadata.
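
The region handling described above can be sketched as follows. The logic follows the `parse_region` and region-collection code in the `get_analysis_regions` step of this notebook; the wrapper name `collect_regions` and its arguments are assumptions for illustration.

```python
def parse_region(text):
    """Parse 'chr:start-end' (numeric chromosome) into (chrom, start, end)."""
    chrom, rest = text.split(":")
    start, end = rest.split("-")
    return (int(chrom), int(start), int(end))

def collect_regions(region_names=(), region_list_lines=()):
    """Union of regions from a list file and from the command line, keyed by 'chr:start-end'."""
    regions = {}
    for line in region_list_lines:
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        if len(parts) == 1:
            region = parse_region(parts[0])
        elif len(parts) >= 3:
            region = (int(parts[0]), int(parts[1]), int(parts[2]))
        else:
            raise ValueError(f"invalid region line: {line!r}")
        regions[f"{region[0]}:{region[1]}-{region[2]}"] = region
    for name in region_names:
        region = parse_region(name)
        # setdefault keeps the existing entry, so duplicates collapse into one region
        regions.setdefault(f"{region[0]}:{region[1]}-{region[2]}", region)
    return regions

regions = collect_regions(
    region_names=["1:1000-2000", "2:500-900"],
    region_list_lines=["1:1000-2000", "", "3 100 200"],
)
print(list(regions))
```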

## Output

1. An RDS file containing the SuSiE output object.
2. QC-ed summary statistics and a QC summary.

## MWE

In [None]:
sos run xqtl-pipeline/pipeline/SuSiE_RSS.ipynb SuSiE_RSS \
    --ld-meta-data ADSP_R4_EUR.LD.list \
    --gwas-meta-data AD_sumstat_list.txt \
    --impute \
    --container oras://ghcr.io/cumc/pecotmr_apptainer:latest

In [None]:
[global]
parameter: cwd = path("output/")
parameter: gwas_meta_data = path()
parameter: ld_meta_data = path()
parameter: gwas_name = []
parameter: gwas_data = []
parameter: column_mapping = []
parameter: region_list = path()
parameter: region_name = []
parameter: container = ''
import re
parameter: entrypoint= ('micromamba run -a "" -n' + ' ' + re.sub(r'(_apptainer:latest|_docker:latest|\.sif)$', '', container.split('/')[-1])) if container else ""
parameter: job_size = 10
parameter: walltime = "5h"
parameter: mem = "16G"
parameter: numThreads = 1
parameter: impute = True # Whether to impute the sumstat for all the snp in LD but not in sumstat.
parameter: QC = True

def group_by_region(lst, partition):
    # from itertools import accumulate
    # partition = [len(x) for x in partition]
    # Compute the cumulative sums once
    # cumsum_vector = list(accumulate(partition))
    # Use slicing based on the cumulative sums
    # return [lst[(cumsum_vector[i-1] if i > 0 else 0):cumsum_vector[i]] for i in range(len(partition))]
    return partition
import os
if (not os.path.isfile(region_list)) and len(region_name) == 0:
    region_list = ld_meta_data

In [None]:
[get_analysis_regions: shared = "regional_data"]
import os
import pandas as pd
from collections import OrderedDict

def file_exists(file_path, relative_path=None):
    """Check if a file exists at the given path or relative to a specified path."""
    if os.path.exists(file_path) and os.path.isfile(file_path):
        return True
    elif relative_path:
        relative_file_path = os.path.join(relative_path, file_path)
        return os.path.exists(relative_file_path) and os.path.isfile(relative_file_path)
    return False

def check_required_columns(df, required_columns):
    """Check if the required columns are present in the dataframe."""
    missing_columns = [col for col in required_columns if col not in df.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {', '.join(missing_columns)}")

def parse_region(region):
    """Parse a region string in 'chr:start-end' format into a list [chr, start, end]."""
    chrom, rest = region.split(':')
    start, end = rest.split('-')
    return [int(chrom), int(start), int(end)]

def extract_regional_data(gwas_meta_data, gwas_name, gwas_data, column_mapping, region_name=None, region_list=None):
    """
    Extracts data from GWAS metadata files and additional GWAS data provided. 
    Optionally filters data based on specified regions.

    Args:
    - gwas_meta_data (str): File path to the GWAS metadata file.
    - gwas_name (list): Vector of GWAS study names.
    - gwas_data (list): Vector of GWAS data.
    - column_mapping (list, optional): Vector of column mapping files.
    - region_name (list, optional): List of region names in 'chr:start-end' format.
    - region_list (str, optional): File path to a file containing regions.

    Returns:
    - GWAS Dictionary: Maps study IDs to a dictionary keyed by chromosome number, 
      whose values are lists [GWAS file path, optional column mapping file path].
    - Region Dictionary: Maps region names to lists [chr, start, end].

    Raises:
    - FileNotFoundError: If any specified file path does not exist.
    - ValueError: If required columns are missing in the input files or vector lengths mismatch.
    """
    # Check vector lengths
    if len(gwas_name) != len(gwas_data):
        raise ValueError("gwas_name and gwas_data must be of equal length")
    
    if len(column_mapping) > 0 and len(column_mapping) != len(gwas_name):
        raise ValueError("If column_mapping is provided, it must be of the same length as gwas_name and gwas_data")

    # Required columns for GWAS file type
    required_gwas_columns = ['study_id', 'chrom', 'file_path']

    # Base directory of the metadata files
    gwas_base_dir = os.path.dirname(gwas_meta_data)
    
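    # Reading the GWAS metadata file. It is optional, so fall back to an empty table
    # when no file is provided and rely on gwas_name / gwas_data instead.
    if os.path.isfile(gwas_meta_data):
        gwas_df = pd.read_csv(gwas_meta_data, sep="\t")
        check_required_columns(gwas_df, required_gwas_columns)
    else:
        gwas_df = pd.DataFrame(columns=required_gwas_columns)
    gwas_dict = OrderedDict()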

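    # Process additional GWAS data from vectors. These files are treated as genome-wide,
    # so register them for chromosomes 1-22 (as done for chrom 0 in the metadata file);
    # downstream steps look files up by the chromosome of each region.
    for name, data, mapping in zip(gwas_name, gwas_data, column_mapping or [None]*len(gwas_name)):
        gwas_dict[name] = {chrom: [data, mapping] for chrom in range(1, 23)}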

    for _, row in gwas_df.iterrows():
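        file_path = row['file_path']
        mapping_file = row.get('column_mapping_file')
        # An empty column_mapping_file cell is read by pandas as NaN; treat it as "no mapping file"
        if pd.isna(mapping_file):
            mapping_file = None

        # Check if the file and optional mapping file exist
        if not file_exists(file_path, gwas_base_dir):
            raise FileNotFoundError(f"File {file_path} not found for {row['study_id']}")
        if mapping_file and not file_exists(mapping_file, gwas_base_dir):
            raise FileNotFoundError(f"Column mapping file {mapping_file} not found for {row['study_id']}")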
        
        # Adjust paths if necessary
        file_path = file_path if file_exists(file_path) else os.path.join(gwas_base_dir, file_path)
        if mapping_file:
            mapping_file = mapping_file if file_exists(mapping_file) else os.path.join(gwas_base_dir, mapping_file)
        
        # Create or update the entry for the study_id
        if row['study_id'] not in gwas_dict:
            gwas_dict[row['study_id']] = {}

        # Expand chrom 0 to chrom 1-22 or use the specified chrom
        chrom_range = range(1, 23) if row['chrom'] == 0 else [row['chrom']]
        for chrom in chrom_range:
            if chrom in gwas_dict[row['study_id']]:
                existing_entry = gwas_dict[row['study_id']][chrom]
                raise ValueError(f"Duplicate chromosome specification for study_id {row['study_id']}, chrom {chrom}. "
                                 f"Conflicting entries: {existing_entry} and {[file_path, mapping_file]}")
            gwas_dict[row['study_id']][chrom] = [file_path, mapping_file]

    # Process region_list and region_name
    region_dict = dict()
    if region_list and os.path.isfile(region_list):
        with open(region_list, 'r') as file:
            for line in file:
                # Skip empty lines
                if not line.strip():
                    continue
                parts = line.strip().split()
                if len(parts) == 1:
                    region = parse_region(parts[0])
                elif len(parts) >= 3:
                    region = [int(parts[0]), int(parts[1]), int(parts[2])]
                else:
                    raise ValueError("Invalid region format in region_list")
                
                region_dict[f"{region[0]}:{region[1]}-{region[2]}"] = region
                
    if region_name:
        for region in region_name:
            parsed_region = parse_region(region)
            region_key = f"{parsed_region[0]}:{parsed_region[1]}-{parsed_region[2]}"
            if region_key not in region_dict:
                region_dict[region_key] = parsed_region

    return gwas_dict, region_dict

gwas_dict, region_dict = extract_regional_data(gwas_meta_data, gwas_name, gwas_data, column_mapping, region_name, region_list)
regional_data = dict([("GWAS", gwas_dict), ("regions", region_dict)])
print(regional_data)

In [None]:
[SuSiE_RSS_1]
parameter: L = 10
parameter: max_L = 100
# If available the column that indicates sample size within the sumstats
parameter: sample_size_col = []
# Sample size used to generate the sumstats
parameter: sample_size = 0
# filtering threshold for raiss imputation
parameter: rcond = 0.01
parameter: R2_threshold = 0.6
depends: sos_variable("regional_data")
regions = list(regional_data['regions'].keys())
studies = list(regional_data["GWAS"].keys())
input: for_each = ["regions", "studies"]
output: f'{cwd:a}/{step_name[:-2]}/{_studies}.{_regions.replace(":", "_")}.susie_rss.rds'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
R: expand = '${ }', stdout = f"{_output:n}.stdout", stderr = f"{_output:n}.stderr", container = container, entrypoint = entrypoint
    source("/home/hs3393/RSS_QC/pecotmr/R/raiss.R")
    library(dplyr)
    devtools::load_all("/home/hs3393/RSS_QC/previous_version/susieR/")
    library(data.table)
    sumstats=fread("${regional_data['GWAS'][_studies][regional_data['regions'][_regions][0]][0]}")
  
    # rename the columns by yml file -- make the column names consistent
    column_file_path = "${regional_data['GWAS'][_studies][regional_data['regions'][_regions][0]][1]}"
    column_data <- read.table(column_file_path, header = FALSE, sep = ":", stringsAsFactors = FALSE)
    colnames(column_data) = c("standard", "original")
    count = 1
    for (name in colnames(sumstats)){
        if(name %in% column_data$original){
            index = which(column_data$original == name)
            colnames(sumstats)[count] = column_data$standard[index]
        }
        count = count + 1
    }
  
    ## if the data don't have z scores, derive by beta/se, so that allele flip function can run
    if(length(sumstats$z) == 0){
          sumstats$z = sumstats$beta / sumstats$se
    }
  
    ## if the data don't have beta, derive it by making beta = z and se =1, so that allele flip function can run
    if(length(sumstats$beta) == 0){
          sumstats$beta = sumstats$z
          sumstats$se = 1
    }
    
    ## load region information
    region=data.frame(chrom = ${regional_data['regions'][_regions][0]},start = ${regional_data['regions'][_regions][1]},end = ${regional_data['regions'][_regions][2]})
    LD_meta_file=read.table("${ld_meta_data}", sep=" ", header = FALSE, col.names = c("chrom", "start", "end", "path"))
    ## Step 1: Load summary stats and LD data for a region, and match them, using the function in pecotmr::LD.R
    LD_data = load_LD_matrix(LD_meta_file, region, sumstats)
    ## Step 2: basic QC between LD and summary stats --- to correct allele flipping mainly in pecotmr
    allele_flip = allele_qc(sumstats, LD_data[[1]]$variants_df, match.min.prop=0.2, remove_dups=FALSE, flip=TRUE, remove=TRUE)
    allele_flip = allele_flip %>% mutate(variant_allele_flip = paste(chrom,pos,A1.sumstats,A2.sumstats,sep=":"))
    LD_extract = LD_data[[1]]$LD[allele_flip$variant_allele_flip,allele_flip$variant_allele_flip]
    ## Step 3: Perform SuSiE RSS with QC using Gao's prototype
    cols_sample_size=c(${','.join(['"%s"' % x for x in sample_size_col if x is not None])})
    sample_size = ${sample_size}
    L = ${L}
    sample_size_col = c(${','.join(['"%s"' % x for x in sample_size_col if x is not None])})
    ## get sample size: better specified. If not specified, calculate from median "sample_size_col". If columns to compute sample size not specified
    ## make sample size = 0, so that susie_rss will run without n (not meaning n will = 0)
    if(sample_size > 0){
      n = sample_size
    }else if(length(cols_sample_size) >= 1){
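      # Sum the sample-size columns per variant; works for one or more columns and avoids
      # indexing sample_size_col at workflow-expansion time, which fails when fewer than two are given
      n_col_sum <- rowSums(as.data.frame(allele_flip)[, cols_sample_size, drop = FALSE])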
      n = median(n_col_sum)
    }else{
      n = 0
    }
  
    # if include QC step, then correct_zR_discrepancy = TRUE
    if(${"TRUE" if QC else "FALSE"}){

      if( n > 0){
      susie_rss_result = susie_rss(bhat = allele_flip$beta, shat = allele_flip$se,
                              R = LD_extract, n = n, L = L,
                              correct_zR_discrepancy = TRUE, track_fit = FALSE)
      }else{
      # run without n
      susie_rss_result = susie_rss(bhat = allele_flip$beta, shat = allele_flip$se,
                              R = LD_extract, L = L,
                              correct_zR_discrepancy = TRUE, track_fit = FALSE)
      }

      if(${"TRUE" if impute else "FALSE"}){
        outlier = susie_rss_result$zR_outliers
        if(length(outlier) == 0){
            # no outliers: no imputation needed, report the fit directly
            result = susie_rss_result
        }else{
            # with outliers, raiss imputation
            ref_panel = allele_flip %>% select("chrom", "pos", "variant_allele_flip", "A1.ref", "A2.ref")
            colnames(ref_panel) = c("chr", "pos", "variant_id", "A0", "A1") 
            known_zscore =  allele_flip %>% select("chrom", "pos", "variant_allele_flip", "A1.ref", "A2.ref", "z")
            colnames(known_zscore) = c("chr", "pos", "variant_id", "A0", "A1", "Z")
            known_zscores = known_zscore[-outlier, ] %>% arrange(pos)
            imputation_result = raiss(ref_panel, known_zscores, LD_extract, rcond = ${rcond}, R2_threshold = ${R2_threshold})
            filtered_out_variant = setdiff(allele_flip$variant_allele_flip, imputation_result$variant_id)
            filtered_out_id = which(allele_flip$variant_allele_flip %in% filtered_out_variant)
            if(length(filtered_out_id) != 0){
                LD_extract_filtered = as.matrix(LD_extract)[-filtered_out_id,-filtered_out_id]
            }else{
                LD_extract_filtered = as.matrix(LD_extract)

            }
            ## repeat step: get same sample size, if n = 0, run without n parameter
            if(n > 0){
            impute_rss_fit = susie_rss(z = imputation_result$Z, R = LD_extract_filtered, 
                               n = n,
                               L = L, correct_zR_discrepancy = FALSE,
                               track_fit = FALSE)
            }else{
            impute_rss_fit = susie_rss(z = imputation_result$Z, R = LD_extract_filtered, 
                               L = L, correct_zR_discrepancy = FALSE,
                               track_fit = FALSE)        
            }
            result = impute_rss_fit
            result$z = imputation_result$Z
        }



      }else{
        ## no imputation
             result = susie_rss_result
  
  
          }
      }else{
        ## no QC
        if( n > 0){
          result = susie_rss(bhat = allele_flip$beta, shat = allele_flip$se,
                                  R = LD_extract, n = n, L = L,
                                  correct_zR_discrepancy = FALSE, track_fit = FALSE)
          }else{
          # run without n
          result = susie_rss(bhat = allele_flip$beta, shat = allele_flip$se,
                                  R = LD_extract, L = L,
                                  correct_zR_discrepancy = FALSE, track_fit = FALSE)
          }
          
      }

    saveRDS(result, file = "${_output}")
    #write.table(allele_flip, "${_output:n}.sumstats_qced", sep = "\t", col.names=TRUE, row.names=FALSE, quote=FALSE)


    ## Output are 1) RDS file of fine-mapping results and 2) summary stats file for the region after allele flipping QC as well as the SuSiE RSS based QC
    ## For fine-mapping results we would like to report both the top variant model (LD  reference free) and the conventional fine-mapping results

    ## After that we repeat Step 1 and Step 3 with RSS QC (susie_rss as is)

In [None]:
[SuSiE_RSS_2]
output: pip_plot = f"{cwd}/{_input:bn}.png"
task: trunk_workers = 1, trunk_size = job_size, walltime = '12h', mem = '20G', cores = numThreads, tags = f'{step_name}_{_output:bn}'
R: container=container, expand = "${ }", stderr = f'{_output[0]:n}.stderr', stdout = f'{_output[0]:n}.stdout', entrypoint = entrypoint
    res = readRDS(${_input:r})
    png(${_output[0]:r}, width = 14, height=6, unit='in', res=300)
    par(mfrow=c(1,2))
    susieR::susie_plot(res, y= "PIP", pos=list(attr='pos',start=res$pos[1],end=res$pos[length(res$pos)]), add_legend=T, xlab="position")
    susieR::susie_plot(res, y= "z", pos=list(attr='pos',start=res$pos[1],end=res$pos[length(res$pos)]), add_legend=T, xlab="position", ylab="-log10(p)")
    dev.off()

In [None]:
[SuSiE_RSS_3]
input: group_by = 'all'
output: analysis_summary = f'{cwd}/{sumstats_path:bnn}.analysis_summary.md', variants_csv = f'{cwd}/{sumstats_path:bnn}.variants.csv'
R: container=container, expand = "${ }", entrypoint = entrypoint
    # Define the theme string
    theme <- '---
    theme: base-theme
    style: |
     p {
       font-size: 24px;
       height: 900px;
       margin-top:1cm;
      }
      img {
        height: 70%;
        display: block;
        margin-left: auto;
        margin-right: auto;
      }
      body {
       margin-top: auto;
       margin-bottom: auto;
       font-family: verdana;
      }
    ---    
    '
    text <- ""
    sep <- '\n\n---\n'

    inp <- strsplit("${_input:r}", " ")[[1]]
    inp <- sapply(inp, function(x) paste(head(strsplit(x, "\\.")[[1]], -1), collapse = "."))

    r <- unique(strsplit("${_input:bn}", " ")[[1]])

    num_csets <- numeric()
    region_info <- character()

    variant_info <- list()

    for (reg_i in seq_along(unique(inp))) {

      rid <- unlist(strsplit(r[reg_i], '\\.'))[1]

      text_temp <- ""
      text_temp <- paste0(text_temp, "#\n\n SuSiE RSS ", r[reg_i], " \n")
      text_temp <- paste0(text_temp, "![](", r[reg_i], ".png)", sep, " \n \n")

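      # Drop the first character (the opening quote left by `_input:r`) and load the matching RDS file
      each <- unique(inp)[reg_i]
      rd <- readRDS(paste0(substr(each, 2, nchar(each)), ".rds"))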

      # find the number of cs in the current region
      if (is.null(rd$sets$cs)) {
        num_csets <- c(num_csets, 0)
      } else {
        num_csets <- c(num_csets, length(rd$sets$cs))
      }
      cat(num_csets, "\n")

      # this will store the indices of all variants that cross the threshold
      ind_p <- which(rd$pip >= ${pip_cutoff})
      sumvars <- 0

      # if we have at least one cs in the current region
      if (num_csets[reg_i] > 0) {
        tbl_header <- "| chr number | pos at highest pip | ref | alt | region id | cs | highest pip |  \n| --- | --- | --- | --- | --- | --- | --- |  \n"

        table <- ""

        sumpips <- 0

        for (cset in names(rd$sets$cs)) {
          print(cset)

          # if we have many variants in the cs
          if (length(rd$sets$cs[[cset]]) > 1) {
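            highestpip <- max(rd$pip[rd$sets$cs[[cset]]])
            # which.max() returns a position within the credible set; map it back to the variant's index in rd
            poswhighestpip <- rd$sets$cs[[cset]][which.max(rd$pip[rd$sets$cs[[cset]]])]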

            # we make sure that ind_p only stores the variants that aren't in any cs
            ind_p <- setdiff(ind_p, rd$sets$cs[[cset]])

            # append variant info
            i <- poswhighestpip
            variant_info[[length(variant_info) + 1]] <- list(rd$chr[i], rd$pos[i], rd$ref[i], rd$alt[i], rid, cset, rd$pip[i])

            table <- paste0(table, "| ", rd$chr[i], " | ", rd$pos[i], " | ", rd$ref[i], " | ", rd$alt[i], " | ", rid, " | ", cset, " | ", sprintf("%.2f", rd$pip[i]), " |  \n")

            sumpips <- sumpips + sum(rd$pip[rd$sets$cs[[cset]]])
            sumvars <- sumvars + length(rd$sets$cs[[cset]])
          } else { # if we have only one variant in the cs
            i <- rd$sets$cs[[cset]]

            # we make sure that ind_p only stores the variants that aren't in any cs
            ind_p <- setdiff(ind_p, i)

            # append variant info
            variant_info[[length(variant_info) + 1]] <- list(rd$chr[i], rd$pos[i], rd$ref[i], rd$alt[i], rid, cset, rd$pip[i])

            table <- paste0(table, "| ", rd$chr[i], " | ", rd$pos[i], " | ", rd$ref[i], " | ", rd$alt[i], " | ", rid, " | ", cset, " | ", sprintf("%.2f", rd$pip[i]), " |  \n")

            sumpips <- sumpips + rd$pip[i]
            sumvars <- sumvars + 1
          }
        }

        text_temp <- paste0(text_temp, "- Total number of variants: ", length(rd$pip), "\n")
        text_temp <- paste0(text_temp, "- Expected number of causal variants: ", sprintf("%.2f", sumpips), "\n")
        text_temp <- paste0(text_temp, "- Number of variants with PIP > ", ${pip_cutoff}, " and not in any CS: ", length(ind_p), "\n\n")
        text_temp <- paste0(text_temp, tbl_header, table, sep)

        if (num_csets[reg_i] > 1) {
          text_temp <- paste0(text_temp, "#### CORR: Correlation between CS | OLAP: Overlap between CS\n")

          cs <- names(rd$sets$cs)

          corrheader <- "|  |"
          corrbreak <- "| --- |"

          for (i in cs) {
            corrheader <- paste0(corrheader, " CORR ", i, " |")
            corrbreak <- paste0(corrbreak, " --- |")
          }

          corrheader <- paste0(corrheader, "  |")
          corrbreak <- paste0(corrbreak, " --- |")

          for (i in cs) {
            corrheader <- paste0(corrheader, " OLAP ", i, " |")
            corrbreak <- paste0(corrbreak, " --- |")
          }

          corrheader <- paste0(corrheader, "\n")
          corrbreak <- paste0(corrbreak, "\n")

          body <- ""

          for (en in seq_along(cs)) {
            i <- cs[en]
            body <- paste0(body, "| ", i, " |")
            for (j in rd$cscorr[[en]]) {
              body <- paste0(body, " ", sprintf("%.2f", j), " |")
            }
            body <- paste0(body, "  |")
            for (j in names(rd$sets$cs)) {
              body <- paste0(body, " ", length(intersect(rd$sets$cs[[i]], rd$sets$cs[[j]])), " |")
            }
            body <- paste0(body, "\n")
          }

          text_temp <- paste0(text_temp, corrheader, corrbreak, body, sep)
        }

        region_info <- c(region_info, text_temp)
      }
    }

    f <- file(${_output["analysis_summary"]:r}, "w")
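    # Write the theme followed by the per-region summaries collected in region_info
    writeLines(paste0(theme, paste(region_info, collapse = "")), f)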
    close(f)

    for (i in ind_p) {
      # append variant info
      variant_info[[length(variant_info) + 1]] <- list(rd$chr[i], rd$pos[i], rd$ref[i], rd$alt[i], rid, "None", rd$pip[i])
    }

    df <- do.call(rbind, variant_info)
    colnames(df) <- c("chr", "pos", "ref", "alt", "rid", "cs", "pip")
    write.table(df, ${_output["variants_csv"]:r}, sep = "\t", row.names = TRUE, col.names = TRUE)

In [None]:
# Generate analysis report: HTML file, and optionally PPTX file
[SuSiE_RSS_4]
output: f"{_input['analysis_summary']:n}.html"
sh: container=container_marp, expand = "${ }", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', entrypoint = entrypoint
    node /opt/marp/.cli/marp-cli.js ${_input['analysis_summary']} -o ${_output:a} \
        --title '${region_file:bnn} fine mapping analysis' \
        --allow-local-files
    node /opt/marp/.cli/marp-cli.js ${_input['analysis_summary']} -o ${_output:an}.pptx \
        --title '${region_file:bnn} fine mapping analysis' \
        --allow-local-files