# Getting Started

In this tutorial, we will cover some of the basics of `funkea`, and run through a few simple examples of how one can get various enrichment results from GWAS sumstats. In these examples, we will use the `Fisher` method for computing the enrichments, as it is simple and quick.

In [None]:
from funkea.core import data
from funkea.implementations import Fisher
from pyspark.sql import SparkSession

Provide the filepath to your GWAS sumstats. Here we used `ieu-b-7` from OpenGWAS, which is a Parkinson's disease study on a European population.

In [None]:
SUMSTATS_PATH = "data/sumstats.parquet"

In [None]:
spark = (
    SparkSession.builder
    .master("local[2]")
    .getOrCreate()
)
sumstats = spark.read.parquet(SUMSTATS_PATH)
sumstats.show()

Next, we instantiate the `AnnotationComponent` object. This is an abstraction layer on top of the genomic annotation data (which is provided in tabular format), so that different annotation sources can be used interchangeably. Here, we use (a subset of) the KEGG dataset, where the annotations are the genes and the partitions are the KEGG pathways. The partition type is `HARD`, i.e. a gene is either in a pathway or not; membership is binary rather than a distribution over pathways. The `dataset` is provided as a filepath, but we could just as well have passed in a Spark dataframe.

In [None]:
kegg = data.AnnotationComponent(
    columns=data.AnnotationColumns(
        annotation_id="gene_id", partition_id="pathway_name"
    ),
    partition_type=data.PartitionType.HARD,
    dataset="data/kegg.parquet",
)
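
To make the `HARD` partition type more concrete, here is a minimal pure-Python sketch of how such a table can be read: each row pairs one annotation with one partition, so membership is a yes/no question. The rows and gene identifiers are invented for illustration; only the column names `gene_id` and `pathway_name` come from the example above, and this says nothing about how `funkea` handles the data internally.

```python
from collections import defaultdict

# Hypothetical rows: one (annotation, partition) pair per row.
rows = [
    {"gene_id": "GENE_A", "pathway_name": "pathway_1"},
    {"gene_id": "GENE_B", "pathway_name": "pathway_1"},
    {"gene_id": "GENE_B", "pathway_name": "pathway_2"},
    {"gene_id": "GENE_C", "pathway_name": "pathway_2"},
]

# Group the annotations (genes) by partition (pathway).
partitions = defaultdict(set)
for row in rows:
    partitions[row["pathway_name"]].add(row["gene_id"])


def is_member(gene_id, pathway_name):
    """Hard membership: the answer is True or False, never a weight."""
    return gene_id in partitions.get(pathway_name, set())


print(is_member("GENE_B", "pathway_1"))  # True
print(is_member("GENE_C", "pathway_1"))  # False
```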

Now, we instantiate the model using the `default` configuration (more on this in the next section). We pass our annotation component to the default configuration so that it overrides the default annotation component (GTEx).

In [None]:
model = Fisher.default(annotation=kegg)
enrichments = model.transform(sumstats)

In [None]:
enrichments.show(truncate=False)
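
For intuition about what a Fisher-style enrichment measures, the sketch below computes a one-sided Fisher's exact test p-value for a single partition using only the standard library. It illustrates the general statistical test, not `funkea`'s code: the `Fisher` implementation may construct its contingency table differently. The function name, argument names and counts are all hypothetical.

```python
from math import comb


def fisher_enrichment_pvalue(overlap, study_size, partition_size, background_size):
    """One-sided Fisher's exact test, P(X >= overlap).

    X follows a hypergeometric distribution: we draw `study_size` genes
    from `background_size` genes, of which `partition_size` belong to the
    partition, and count how many drawn genes fall in the partition.
    """
    upper = min(study_size, partition_size)
    total = comb(background_size, study_size)
    tail = sum(
        comb(partition_size, i)
        * comb(background_size - partition_size, study_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 in the pathway, 4 genes near
# significant loci, and all 3 pathway genes are among those 4.
p = fisher_enrichment_pvalue(
    overlap=3, study_size=4, partition_size=3, background_size=10
)
print(round(p, 4))  # 0.0333
```

A small p-value means that observing at least this many pathway genes among the study genes would be unlikely if the genes had been picked at random from the background.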

## Composability

We saw how we could easily run functional enrichment experiments on GWAS sumstats using default configurations. However, `funkea` also offers ways of exploring various parameter settings and pipeline compositions to create new enrichment workflows.

Before we do so, let us consider some of the concepts `funkea` employs to make this possible. Every workflow implementation follows the schematic shown below:

![schematic](docs/source/_static/schematic.png)

That is, each workflow consists of (1) a data pipeline and (2) an enrichment method. The data pipeline filters the sumstats (`variant_selection`), creates loci from the remaining variants (`locus_definition`) and finally associates these loci with annotations (`annotation`). The enrichment method then takes the loci (including their annotations) and computes the study-wide enrichment for each annotation partition, together with its significance.
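
The three pipeline stages can be pictured as plain functions applied one after another. The following is a toy sketch in pure Python with invented data; the function bodies are simplifying assumptions chosen for illustration and do not reflect `funkea`'s API or its actual locus definitions.

```python
# Toy stand-ins for the three pipeline stages; none of this is funkea's API.

def variant_selection(variants, threshold):
    """Keep only variants whose association p-value is below the threshold."""
    return [v for v in variants if v["p"] < threshold]


def locus_definition(variants, extension):
    """Turn each remaining variant into a locus: a window around its position."""
    return [
        {"chrom": v["chrom"], "start": v["pos"] - extension, "end": v["pos"] + extension}
        for v in variants
    ]


def annotation(loci, genes):
    """Attach to each locus the genes whose span overlaps it."""
    annotated = []
    for locus in loci:
        hits = {
            g["gene_id"]
            for g in genes
            if g["chrom"] == locus["chrom"]
            and g["start"] <= locus["end"]
            and g["end"] >= locus["start"]
        }
        annotated.append({**locus, "genes": hits})
    return annotated


variants = [
    {"chrom": "1", "pos": 1_000, "p": 1e-12},
    {"chrom": "1", "pos": 50_000, "p": 0.3},
    {"chrom": "2", "pos": 7_000, "p": 2e-9},
]
genes = [
    {"gene_id": "GENE_A", "chrom": "1", "start": 900, "end": 1_500},
    {"gene_id": "GENE_B", "chrom": "1", "start": 49_000, "end": 51_000},
    {"gene_id": "GENE_C", "chrom": "2", "start": 7_050, "end": 9_000},
]

loci = annotation(locus_definition(variant_selection(variants, 5e-8), 100), genes)
study_genes = set().union(*(locus["genes"] for locus in loci))
print(sorted(study_genes))  # ['GENE_A', 'GENE_C']
```

An enrichment method would then compare `study_genes` against each annotation partition to decide which partitions are over-represented.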

In the following example, we will run the same enrichment experiment as above, but with some small modifications (purely for demonstration purposes):

1. We lower the $p$-value threshold and no longer drop ambiguous variants in the `variant_selector`.
2. We replace the simple locus-annotation overlap with an overlap on an extended locus, i.e. each locus is expanded by $10,000$ base pairs in either direction.

<div class="alert alert-block alert-warning">
<b>Note:</b> While "variant_selection" transforms are both idempotent and commutative, "locus_definition" transforms are not. This means that the order in which they appear matters, and some transforms assume that others have already been applied (e.g. "Merge" depends on "Collect").
</div>
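
The difference between the two kinds of transform can be illustrated with toy functions in pure Python. These are stand-ins written for this example, not `funkea`'s transforms: the two filters mimic variant selection, while `expand` and `merge` are hypothetical interval operations in the spirit of locus definition.

```python
# Toy functions for illustration only; they are not funkea's transforms.

def keep_significant(variants, threshold=5e-8):
    return [v for v in variants if v["p"] < threshold]


def drop_indels(variants):
    # Treat any variant with a multi-base allele as an indel.
    return [v for v in variants if len(v["ref"]) == 1 and len(v["alt"]) == 1]


variants = [
    {"ref": "A", "alt": "G", "p": 1e-10},
    {"ref": "AT", "alt": "A", "p": 1e-12},
    {"ref": "C", "alt": "T", "p": 0.2},
]

# Filters commute (order does not matter) and are idempotent
# (applying one twice changes nothing).
assert keep_significant(drop_indels(variants)) == drop_indels(keep_significant(variants))
assert drop_indels(drop_indels(variants)) == drop_indels(variants)


def expand(intervals, by):
    """Widen every interval by `by` on both sides."""
    return [(start - by, end + by) for start, end in intervals]


def merge(intervals):
    """Fuse intervals that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


intervals = [(0, 10), (15, 25)]
print(merge(expand(intervals, 5)))  # [(-5, 30)]
print(expand(merge(intervals), 5))  # [(-5, 15), (10, 30)]
```

Swapping the two interval operations changes the result, which is why the order of steps inside a locus definition has to be chosen deliberately.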

In [None]:
from funkea.components import locus_definition as ld
from funkea.components import variant_selection as vs
from funkea.implementations import fisher

In [None]:
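model = Fisher(
    pipeline=fisher.Pipeline(
        # Locus definition: expand each locus by 10,000 bp in either
        # direction, then overlap it with the KEGG annotations.
        ld.Compose(
            ld.Expand(extension=(10_000, 10_000)),
            ld.Overlap(),
            annotation=kegg
        ),
        # Variant selection: a lower p-value threshold, with no step
        # dropping ambiguous variants.
        variant_selector=vs.Compose(
            vs.AssociationThreshold(
                threshold=5e-10
            ),
            vs.DropHLA(),
            vs.DropIndel(),
        )
    ),
    method=fisher.Method()
)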

In [None]:
model.transform(sumstats).show(truncate=False)