# Getting Started

This tutorial covers the basics of `funkea` and walks through a few simple examples of computing enrichment results from GWAS summary statistics (sumstats). The examples use the `Fisher` method to compute the enrichments because it is simple and quick.
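
To give a sense of what a Fisher-style enrichment measures, here is a minimal sketch of a one-sided Fisher's exact test for the overlap between a set of trait-associated genes and a single pathway. It is an illustration under stated assumptions, not `funkea`'s implementation: the function name `fisher_enrichment_p` and its arguments are hypothetical, and the library may construct its contingency table differently.

```python
from math import comb


def fisher_enrichment_p(
    overlap: int,
    n_locus_genes: int,
    n_pathway_genes: int,
    n_total_genes: int,
) -> float:
    """One-sided Fisher's exact test (hypothetical helper, not funkea API).

    Returns P(X >= overlap), where X is the number of pathway genes found
    when drawing `n_locus_genes` genes at random from `n_total_genes`.
    """
    max_overlap = min(n_locus_genes, n_pathway_genes)
    # Number of ways to draw the locus gene set from the background.
    denominator = comb(n_total_genes, n_locus_genes)
    # Sum the hypergeometric probabilities of every overlap at least as large.
    tail = sum(
        comb(n_pathway_genes, k)
        * comb(n_total_genes - n_pathway_genes, n_locus_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator


# Toy numbers, chosen only for illustration.
p_value = fisher_enrichment_p(
    overlap=2, n_locus_genes=2, n_pathway_genes=3, n_total_genes=10
)
```

A small p-value here means the observed overlap would be unlikely if the trait-associated genes were drawn at random from the background.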

In [1]:
from funkea.core import data
from funkea.implementations import Fisher
from pyspark.sql import SparkSession

In [2]:
SUMSTATS_PATH = "data/sumstats.parquet"

In [3]:
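# Start a local Spark session with two worker threads.
spark = (
    SparkSession.builder
    .master("local[2]")
    .getOrCreate()
)
# Load the GWAS summary statistics and preview the first rows.
sumstats = spark.read.parquet(SUMSTATS_PATH)
sumstats.show()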

+---+------+------------+------------+-------------+-------+------+-------------------+------+-------+
|chr|   pos|        rsid|other_allele|effect_allele|   beta|    se|                  p|   maf|     id|
+---+------+------------+------------+-------------+-------+------+-------------------+------+-------+
|  1| 54591| rs561234294|           A|            G| 0.6797|0.8869|0.44350038379874507|0.0062|ieu-b-7|
|  1| 54676|   rs2462492|           C|            T|-0.0996|0.0656|0.12870013427164811|0.3166|ieu-b-7|
|  1| 79188| rs534350410|           G|            T| 0.6704|0.8795|0.44589951502820296|0.0062|ieu-b-7|
|  1| 82994| rs574556077|           A|            G|-1.2614|1.8502|0.49539999743363095|0.0016|ieu-b-7|
|  1| 86028| rs114608975|           T|            C| 0.2158|0.1222| 0.0774497463907569|0.0655|ieu-b-7|
|  1| 90231| rs553304094|           C|            A|-1.7603|1.8868| 0.3508003354622716|0.0017|ieu-b-7|
|  1| 91536|rs1251109649|           G|            T| -0.104|0.0625|0.0962

In [4]:
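# KEGG annotation: `gene_id` identifies the annotated gene and
# `pathway_name` identifies the pathway (partition) it belongs to.
kegg = data.AnnotationComponent(
    columns=data.AnnotationColumns(
        annotation_id="gene_id", partition_id="pathway_name"
    ),
    partition_type=data.PartitionType.HARD,
    dataset="data/kegg.parquet",
)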

In [5]:
# Build a Fisher model with its default configuration and apply it to the sumstats.
model = Fisher.default(annotation=kegg)
enrichments = model.transform(sumstats)

In [6]:
enrichments.show(truncate=False)

+-------+------------------------------------------+----------+--------------------+
|id     |pathway_name                              |enrichment|p_value             |
+-------+------------------------------------------+----------+--------------------+
|ieu-b-7|lysine degradation                        |2         |0.014877136266759378|
|ieu-b-7|phosphatidylinositol signaling system     |2         |0.024412004705053864|
|ieu-b-7|primary bile acid biosynthesis            |1         |0.04926694876315757 |
|ieu-b-7|synaptic vesicle cycle                    |1         |0.05526157259817756 |
|ieu-b-7|steroid biosynthesis                      |1         |0.05526157259817756 |
|ieu-b-7|transcriptional misregulation in cancer   |1         |0.05526157259817756 |
|ieu-b-7|glycosaminoglycan degradation             |1         |0.05824548059845314 |
|ieu-b-7|fatty acid elongation                     |1         |0.07888496219190465 |
|ieu-b-7|nicotinate and nicotinamide metabolism    |1         |0.

## Composability

In [7]:
from funkea.components import locus_definition as ld
from funkea.components import variant_selection as vs
from funkea.implementations import fisher

In [8]:
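model = Fisher(
    pipeline=fisher.Pipeline(
        # Locus definition: extend each variant by 10,000 base pairs on
        # either side, then overlap the resulting loci with the annotation.
        ld.Compose(
            ld.Expand(extension=(10_000, 10_000)),
            ld.Overlap(),
            annotation=kegg
        ),
        # Variant selection: apply the association p-value threshold,
        # then drop HLA variants and indels.
        variant_selector=vs.Compose(
            vs.AssociationThreshold(
                threshold=5e-10
            ),
            vs.DropHLA(),
            vs.DropIndel(),
        )
    ),
    method=fisher.Method()
)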
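
The composed pipeline above can be read as three steps: select variants, widen each selected variant into a locus, and collect the annotated genes that the locus touches. The sketch below mimics that flow in plain Python on toy data. Everything in it is hypothetical (the `Variant` and `Gene` tuples, the helper names, the coordinates). It assumes that `AssociationThreshold` keeps variants whose p-value is below the threshold and that `Expand` followed by `Overlap` behaves like a window-overlap join, which is a reading of the component names rather than a description of `funkea` internals.

```python
from typing import NamedTuple


class Variant(NamedTuple):  # hypothetical stand-in for a sumstats row
    rsid: str
    chr: int
    pos: int
    p: float


class Gene(NamedTuple):  # hypothetical stand-in for an annotation row
    gene_id: str
    chr: int
    start: int
    end: int


def select_variants(variants, threshold=5e-10):
    """Keep variants whose association p-value is below the threshold (assumed)."""
    return [v for v in variants if v.p < threshold]


def expand(variant, extension=(10_000, 10_000)):
    """Turn a single position into a window, clipped at the chromosome start."""
    upstream, downstream = extension
    return max(variant.pos - upstream, 0), variant.pos + downstream


def overlapping_genes(variant, genes, extension=(10_000, 10_000)):
    """Return IDs of genes on the same chromosome that intersect the window."""
    lo, hi = expand(variant, extension)
    return {
        g.gene_id
        for g in genes
        if g.chr == variant.chr and g.start <= hi and g.end >= lo
    }


# Toy data, invented for illustration only.
variants = [
    Variant("rs_a", 1, 50_000, 1e-12),
    Variant("rs_b", 1, 500_000, 0.2),
]
genes = [
    Gene("gene_1", 1, 55_000, 70_000),
    Gene("gene_2", 1, 495_000, 505_000),
    Gene("gene_3", 2, 45_000, 60_000),
]

selected = select_variants(variants)
locus_genes = set().union(*(overlapping_genes(v, genes) for v in selected))
```

In this toy run only `rs_a` passes the threshold, and only `gene_1` falls inside its window on the same chromosome. A gene set like `locus_genes` is the kind of input an enrichment method would then compare against each pathway.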

In [9]:
model.transform(sumstats).show(truncate=False)

+-------+------------------------------------------+----------+--------------------+
|id     |pathway_name                              |enrichment|p_value             |
+-------+------------------------------------------+----------+--------------------+
|ieu-b-7|lysine degradation                        |2         |0.020013031955997095|
|ieu-b-7|phosphatidylinositol signaling system     |2         |0.03263839498942856 |
|ieu-b-7|primary bile acid biosynthesis            |1         |0.05725346396108495 |
|ieu-b-7|synaptic vesicle cycle                    |1         |0.06418661410626866 |
|ieu-b-7|terpenoid backbone biosynthesis           |1         |0.0744953589290203  |
|ieu-b-7|fatty acid elongation                     |1         |0.09143667068429072 |
|ieu-b-7|nicotinate and nicotinamide metabolism    |1         |0.11137648478555355 |
|ieu-b-7|pancreatic secretion                      |1         |0.11465899908936757 |
|ieu-b-7|axon guidance                             |2         |0.