# OAK enrichment command

This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).

This notebook provides examples for the `enrichment` command, which produces a summary of the ontology classes that are enriched in the associations for an input set of entities.

See also the end of the [Command Line Tutorial](https://doi.org/10.5281/zenodo.7708963).

## Help Option

You can get help on any OAK command using `--help`:

In [1]:
!runoak enrichment --help

Usage: runoak enrichment [OPTIONS] [TERMS]...

  Run class enrichment analysis.

Options:
  -o, --output FILENAME           Output file, e.g. obo file
  -p, --predicates TEXT           A comma-separated list of predicates
  --autolabel / --no-autolabel    If set, results will automatically have
                                  labels assigned  [default: autolabel]
  -O, --output-type TEXT          Desired output type
  -o, --output FILENAME           Output file, e.g. obo file
  --if-absent [absent-only|present-only]
                                  determines behavior when the value is not
                                  present or is empty.
  -S, --set-value TEXT            the value to set for all terms for the given
                                  property.
  --cutoff FLOAT                  The cutoff for the p-value  [default: 0.05]
  -U, --sample-file FILENAME      file containing input list of entity IDs
                                  (e.g. gene IDs)


## Download example file and setup

We will use the HPO association file:

In [2]:
!curl -L -s http://purl.obolibrary.org/obo/hp/hpoa/genes_to_phenotype.txt > input/hpoa_g2p.tsv

Next we will set up an `hp` alias:

In [3]:
alias hp runoak -i sqlite:obo:hp

Test this out by querying for the associations of a particular gene (here `NCBIGene:8192`).

We need to pass in the association file we downloaded (with `-g`), as well as specify its file type (with `-G`):

In [4]:
hp -G hpoa_g2p -g input/hpoa_g2p.tsv associations -Q subject NCBIGene:8192 -O csv | head

subject	subject_label	predicate	object	object_label	property_values
NCBIGene:8192	None	None	HP:0000013	Hypoplasia of the uterus	[]
NCBIGene:8192	None	None	HP:0000815	Hypergonadotropic hypogonadism	[]
NCBIGene:8192	None	None	HP:0000786	Primary amenorrhea	[]
NCBIGene:8192	None	None	HP:0000007	Autosomal recessive inheritance	[]
NCBIGene:8192	None	None	HP:0000252	Microcephaly	[]
NCBIGene:8192	None	None	HP:0001250	Seizure	[]
NCBIGene:8192	None	None	HP:0004322	Short stature	[]
NCBIGene:8192	None	None	HP:0008527	Congenital sensorineural hearing impairment	[]
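The association output above is plain tab-separated text, so it can be loaded with standard tooling. The following is a minimal sketch using only the Python standard library; the `raw` string holds the header and first three rows copied from the output above, and the variable names are illustrative rather than part of OAK.

```python
import csv
import io

# Header plus the first three rows of the output above, copied verbatim.
# Note the columns are tab-separated even though `-O csv` was requested.
raw = (
    "subject\tsubject_label\tpredicate\tobject\tobject_label\tproperty_values\n"
    "NCBIGene:8192\tNone\tNone\tHP:0000013\tHypoplasia of the uterus\t[]\n"
    "NCBIGene:8192\tNone\tNone\tHP:0000815\tHypergonadotropic hypogonadism\t[]\n"
    "NCBIGene:8192\tNone\tNone\tHP:0000786\tPrimary amenorrhea\t[]\n"
)

# Parse each line into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

# Map each phenotype CURIE to its label for the queried gene.
phenotypes = {row["object"]: row["object_label"] for row in rows}

print(len(rows), "associations for", rows[0]["subject"])
for curie, label in phenotypes.items():
    print(curie, label)
```

In a real session you would read the saved command output from a file instead of an inline string.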


In [5]:
hp -G hpoa_g2p -g input/hpoa_g2p.tsv associations -Q subject .idfile input/eds-genes-ncbigene.tsv -O csv

subject	subject_label	predicate	object	object_label	property_values
NCBIGene:165	None	None	HP:0004976	Knee dislocation	[]
NCBIGene:165	None	None	HP:0002933	Ventral hernia	[]
NCBIGene:165	None	None	HP:0000465	Webbed neck	[]
NCBIGene:165	None	None	HP:0000028	Cryptorchidism	[]
NCBIGene:165	None	None	HP:0001382	Joint hypermobility	[]
NCBIGene:165	None	None	HP:0001582	Redundant skin	[]
NCBIGene:165	None	None	HP:0002616	Aortic root aneurysm	[]
NCBIGene:165	None	None	HP:0000470	Short neck	[]
NCBIGene:165	None	None	HP:0004602	Cervical C2/C3 vertebral fusion	[]
NCBIGene:165	None	None	HP:0000218	High palate	[]
NCBIGene:165	None	None	HP:0001763	Pes planus	[]
NCBIGene:165	None	None	HP:0011463	Childhood onset	[]
NCBIGene:165	None	None	HP:0001822	Hallux valgus	[]
NCBIGene:165	None	None	HP:0001075	Atrophic scars	[]
NCBIGene:165	None	None	HP:0000939	Osteoporosis	[]
NCBIGene:165	None	None	HP:0002162	Low posterior hairline	[]
NCBIGene:165	None	None	HP:0002761	Generalized joint laxity	[]
...

NCBIGene:91252	None	None	HP:0009803	Short phalanx of finger	[]
NCBIGene:91252	None	None	HP:0001388	Joint laxity	[]
NCBIGene:91252	None	None	HP:0001371	Flexion contracture	[]
NCBIGene:91252	None	None	HP:0001508	Failure to thrive	[]
NCBIGene:91252	None	None	HP:0000520	Proptosis	[]
NCBIGene:91252	None	None	HP:0010489	Absent palmar crease	[]
NCBIGene:91252	None	None	HP:0000944	Abnormal metaphysis morphology	[]
NCBIGene:91252	None	None	HP:0002652	Skeletal dysplasia	[]
NCBIGene:1278	None	None	HP:0000974	Hyperextensible skin	[]
NCBIGene:1278	None	None	HP:0001763	Pes planus	[]
NCBIGene:1278	None	None	HP:0000023	Inguinal hernia	[]
NCBIGene:1278	None	None	HP:0000767	Pectus excavatum	[]
NCBIGene:1278	None	None	HP:0000007	Autosomal recessive inheritance	[]
NCBIGene:1278	None	None	HP:0001075	Atrophic scars	[]
NCBIGene:1278	None	None	HP:0001634	Mitral valve prolapse	[]
NCBIGene:1278	None	None	HP:0000978	Bruising susceptibility	[]
NCBIGene:1278	None	None	HP:0002816	Genu recurvatum	[]
...

NCBIGene:1281	None	None	HP:0001123	Visual field defect	[]
NCBIGene:1281	None	None	HP:0007029	Cerebral berry aneurysm	[]
NCBIGene:1281	None	None	HP:0002621	Atherosclerosis	[]
NCBIGene:1281	None	None	HP:0002363	Abnormal brainstem morphology	[]
NCBIGene:1281	None	None	HP:0001269	Hemiparesis	[]
NCBIGene:1281	None	None	HP:0012518	Abnormal circle of Willis morphology	[]
NCBIGene:1281	None	None	HP:0002138	Subarachnoid hemorrhage	[]
NCBIGene:1281	None	None	HP:0002616	Aortic root aneurysm	[]
NCBIGene:1281	None	None	HP:0200042	Skin ulcer	[]
NCBIGene:1281	None	None	HP:0001773	Short foot	[]
NCBIGene:1281	None	None	HP:0000347	Micrognathia	[]
NCBIGene:1281	None	None	HP:0001249	Intellectual disability	[]
NCBIGene:1281	None	None	HP:0002652	Skeletal dysplasia	[]
NCBIGene:1281	None	None	HP:0007400	Irregular hyperpigmentation	[]
NCBIGene:1281	None	None	HP:0002650	Scoliosis	[]
NCBIGene:1281	None	None	HP:0200055	Small hand	[]
NCBIGene:1281	None	None	HP:0002213	Fine hair	[]
...

NCBIGene:1290	None	None	HP:0010754	Abnormality of the temporomandibular joint	[]
NCBIGene:1303	None	None	HP:0002751	Kyphoscoliosis	[]
NCBIGene:1303	None	None	HP:0003593	Infantile onset	[]
NCBIGene:1303	None	None	HP:0003557	Increased variability in muscle fiber diameter	[]
NCBIGene:1303	None	None	HP:0001319	Neonatal hypotonia	[]
NCBIGene:1303	None	None	HP:0001270	Motor delay	[]
NCBIGene:1303	None	None	HP:0001371	Flexion contracture	[]
NCBIGene:1303	None	None	HP:0003741	Congenital muscular dystrophy	[]
NCBIGene:1303	None	None	HP:0001382	Joint hypermobility	[]
NCBIGene:1303	None	None	HP:0002877	Nocturnal hypoventilation	[]
NCBIGene:1303	None	None	HP:0000218	High palate	[]
NCBIGene:1303	None	None	HP:0010628	Facial palsy	[]
NCBIGene:1303	None	None	HP:0001284	Areflexia	[]
NCBIGene:1303	None	None	HP:0000007	Autosomal recessive inheritance	[]
NCBIGene:1303	None	None	HP:0003623	Neonatal onset	[]
NCBIGene:1303	None	None	HP:0000006	Autosomal dominant inheritance	[]
...

NCBIGene:11107	None	None	HP:0000647	Sclerocornea	[]
NCBIGene:11107	None	None	HP:0000545	Myopia	[]
NCBIGene:11107	None	None	HP:0001382	Joint hypermobility	[]
NCBIGene:11107	None	None	HP:0000007	Autosomal recessive inheritance	[]
NCBIGene:11107	None	None	HP:0002757	Recurrent fractures	[]
NCBIGene:11285	None	None	HP:0002652	Skeletal dysplasia	[]
NCBIGene:11285	None	None	HP:0000974	Hyperextensible skin	[]
NCBIGene:11285	None	None	HP:0000973	Cutis laxa	[]
NCBIGene:11285	None	None	HP:0003202	Skeletal muscle atrophy	[]
NCBIGene:11285	None	None	HP:0001371	Flexion contracture	[]
NCBIGene:11285	None	None	HP:0004322	Short stature	[]
NCBIGene:11285	None	None	HP:0000160	Narrow mouth	[]
NCBIGene:11285	None	None	HP:0045075	Sparse eyebrow	[]
NCBIGene:11285	None	None	HP:0000256	Macrocephaly	[]
NCBIGene:11285	None	None	HP:0007469	Palmoplantar cutis gyrata	[]
NCBIGene:11285	None	None	HP:0002751	Kyphoscoliosis	[]
NCBIGene:11285	None	None	HP:0006481	Abnormality of primary teeth	[]
...

NCBIGene:5351	None	None	HP:0001537	Umbilical hernia	[]
NCBIGene:5351	None	None	HP:0005659	Thoracic kyphoscoliosis	[]
NCBIGene:5351	None	None	HP:0000545	Myopia	[]
NCBIGene:5351	None	None	HP:0008780	Congenital bilateral hip dislocation	[]
NCBIGene:5351	None	None	HP:0025019	Arterial rupture	[]
NCBIGene:5351	None	None	HP:0005692	Joint hyperflexibility	[]
NCBIGene:5351	None	None	HP:0001058	Poor wound healing	[]
NCBIGene:5351	None	None	HP:0001519	Disproportionate tall stature	[]
NCBIGene:5351	None	None	HP:0002761	Generalized joint laxity	[]
NCBIGene:5351	None	None	HP:0002194	Delayed gross motor development	[]
NCBIGene:5351	None	None	HP:0000023	Inguinal hernia	[]
NCBIGene:5351	None	None	HP:0031629	Impaired tandem gait	[]
NCBIGene:5351	None	None	HP:0000939	Osteoporosis	[]
NCBIGene:5351	None	None	HP:0004942	Aortic aneurysm	[]
NCBIGene:5351	None	None	HP:0001030	Fragile skin	[]
NCBIGene:5351	None	None	HP:0020152	Distal joint laxity	[]
...

NCBIGene:113189	None	None	HP:0003198	Myopathy	[]
NCBIGene:113189	None	None	HP:0000483	Astigmatism	[]
NCBIGene:113189	None	None	HP:0001238	Slender finger	[]
NCBIGene:113189	None	None	HP:0000175	Cleft palate	[]
NCBIGene:113189	None	None	HP:0000239	Large fontanelles	[]
NCBIGene:113189	None	None	HP:0000974	Hyperextensible skin	[]
NCBIGene:113189	None	None	HP:0005684	Distal arthrogryposis	[]
NCBIGene:113189	None	None	HP:0002036	Hiatus hernia	[]
NCBIGene:113189	None	None	HP:0031364	Ecchymosis	[]
NCBIGene:113189	None	None	HP:0001659	Aortic regurgitation	[]
NCBIGene:113189	None	None	HP:0000153	Abnormality of the mouth	[]
NCBIGene:113189	None	None	HP:0001537	Umbilical hernia	[]
NCBIGene:113189	None	None	HP:0001270	Motor delay	[]
NCBIGene:113189	None	None	HP:0000324	Facial asymmetry	[]
NCBIGene:113189	None	None	HP:0000100	Nephrotic syndrome	[]
NCBIGene:113189	None	None	HP:0001030	Fragile skin	[]
NCBIGene:113189	None	None	HP:0002566	Intestinal malrotation	[]
...
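Enrichment starts from a simple tally: for each phenotype term, how many genes in the sample are annotated with it. The sketch below counts distinct genes per term over a handful of (gene, term) pairs copied from the truncated output above. It is an illustration only: it uses a small subset of the rows, and it counts direct annotations, whereas the enrichment results further down suggest that annotations are also rolled up to ancestor terms.

```python
from collections import defaultdict

# (gene, phenotype) pairs copied from the output above; a small subset only.
pairs = [
    ("NCBIGene:165", "HP:0001382"),     # Joint hypermobility
    ("NCBIGene:1303", "HP:0001382"),
    ("NCBIGene:11107", "HP:0001382"),
    ("NCBIGene:1278", "HP:0000974"),    # Hyperextensible skin
    ("NCBIGene:11285", "HP:0000974"),
    ("NCBIGene:113189", "HP:0000974"),
    ("NCBIGene:165", "HP:0001075"),     # Atrophic scars
    ("NCBIGene:1278", "HP:0001075"),
]

# Collect the distinct genes annotated with each term.
genes_by_term = defaultdict(set)
for gene, term in pairs:
    genes_by_term[term].add(gene)

# Number of sample genes per term (direct annotations only).
sample_counts = {term: len(genes) for term, genes in genes_by_term.items()}

for term, count in sorted(sample_counts.items(), key=lambda kv: -kv[1]):
    print(term, count)
```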

## Enrichment

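Next we will run the enrichment analysis itself. The sample is the list of EDS gene IDs in `input/eds-genes-ncbigene.tsv`, passed with `-U`; the gene-to-phenotype associations come from the file loaded with `-g`, and the results are written to `output/eds-genes-enriched.tsv`.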

We will use HPO terms roughly inspired by https://www.omim.org/clinicalSynopsis/130060

In [6]:
hp -G hpoa_g2p -g input/hpoa_g2p.tsv enrichment -U input/eds-genes-ncbigene.tsv -O csv --autolabel -o output/eds-genes-enriched.tsv

In [7]:
import pandas as pd

In [8]:
df=pd.read_csv("output/eds-genes-enriched.tsv", sep="\t")
df

| Unnamed: 0 | class_id | p_value | class_label | p_value_adjusted | false_discovery_rate | fold_enrichment | sample_count | sample_total | background_count | background_total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HP:0000974 | 9.020119e-36 | Hyperextensible skin | 1.227638e-32 | | | 19 | 20 | 69 | 4896 |
| 1 | HP:0001075 | 1.695407e-30 | Atrophic scars | 2.307449e-27 | | | 15 | 20 | 34 | 4896 |
| 2 | HP:0008067 | 2.362580e-27 | Abnormally lax or hyperextensible skin | 3.215471e-24 | | | 19 | 20 | 175 | 4896 |
| 3 | HP:0010647 | 5.876369e-25 | Abnormal elasticity of skin | 7.997738e-22 | | | 19 | 20 | 231 | 4896 |
| 4 | HP:0100699 | 6.760144e-25 | Scarring | 9.200556e-22 | | | 18 | 20 | 175 | 4896 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 150 | HP:0002813 | 3.129277e-05 | Abnormality of limb bone morphology | 4.258947e-02 | | | 16 | 20 | 1662 | 4896 |
| 151 | HP:0040068 | 3.129277e-05 | Abnormality of limb bone | 4.258947e-02 | | | 16 | 20 | 1662 | 4896 |
| 152 | HP:0010488 | 3.156866e-05 | Aplasia/Hypoplasia of the palmar creases | 4.296495e-02 | | | 3 | 20 | 16 | 4896 |
| 153 | HP:0000153 | 3.265157e-05 | Abnormality of the mouth | 4.443879e-02 | | | 18 | 20 | 2195 | 4896 |
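To get a feel for where these numbers come from, the sketch below recomputes a p-value from the four count columns of the first row (19 of 20 sample genes, 69 of 4896 background genes). It assumes a one-sided hypergeometric test, a common choice for over-representation analysis; this notebook does not state which test `runoak enrichment` uses, so treat the function as an illustration rather than OAK's implementation. It also compares `p_value_adjusted` with `p_value` for that row. A constant ratio across rows would be consistent with a Bonferroni-style correction, which is likewise an inference from the table and not a documented fact.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n items without replacement from N items, K of which are hits."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    # Exact rational arithmetic avoids overflow before converting to float.
    return float(Fraction(tail, total))


# Counts from the first row (HP:0000974, Hyperextensible skin):
# sample_count=19, sample_total=20, background_count=69, background_total=4896
p = hypergeom_upper_tail(k=19, n=20, K=69, N=4896)
print(f"hypergeometric upper-tail p-value: {p:.3e}")

# Ratio of adjusted to raw p-value for the same row, values taken from the table.
ratio = 1.227638e-32 / 9.020119e-36
print(f"p_value_adjusted / p_value: {ratio:.1f}")
```

Under these assumptions the computed p-value can be compared directly with the `p_value` column, and the ratio gives a rough idea of how many classes were tested.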
