# Independent Enrichment Analysis

This Appyter performs enrichment analysis on an input set of items (for example, gene symbols) against a library of sets in GMT format (for example, a gene set library). It uses the Fisher exact test to compute enrichment p-values, adjusts them to q-values, and reports the results as a sorted table, a bar chart, and a Manhattan plot. For ranked input, the Appyter instead draws a bridge plot for each of the top sets in the library and reports p- and q-values from the Mann-Whitney U test.
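
As a rough illustration of the library input, the sketch below parses GMT-style rows, where each tab-separated row holds a set name, a description, and then the members of the set. The function name `parse_gmt_lines` and the example rows are hypothetical, and this is not the loader the Appyter itself uses; it only mirrors the idea that member names are upper-cased before matching.

```python
def parse_gmt_lines(lines):
    """Parse GMT-style rows into a dict of set name -> list of members."""
    library = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # A usable row needs a name, a description column, and >= 1 member
        if len(fields) < 3:
            continue
        term, _description, *members = fields
        # Upper-case members and drop empty cells left by trailing tabs
        members = [m.upper() for m in members if m]
        if members:
            library[term] = members
    return library


# Hypothetical rows, for illustration only
example_rows = [
    "Pathway A\tsome description\tTP53\tbrca1\tEGFR\n",
    "Pathway B\t\tMYC\n",
    "Malformed row without members\n",
]
library = parse_gmt_lines(example_rows)
print(library)
```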

In [None]:
#%%appyter init
from appyter import magic
magic.init(lambda _=globals: _())

In [None]:
from maayanlab_bioinformatics.enrichment.crisp import enrich_crisp, fisher_overlap
from maayanlab_bioinformatics.plotting import bridge_plot

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from IPython.display import display, FileLink, Markdown, HTML
from statsmodels.stats.multitest import multipletests
from scipy.stats import mannwhitneyu
from collections import OrderedDict
import urllib.request  # urlopen lives in the request submodule
# Manhattan Plot Imports
import matplotlib.patches as mpatches
import matplotlib.cm as cm

# Bokeh
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import HoverTool, CustomJS, ColumnDataSource, Span, Select, PreText, Paragraph
from bokeh.layouts import layout, row, column, gridplot
from bokeh.palettes import all_palettes

import base64

In [None]:
%%appyter hide_code_exec
{% do SectionField(
    name='Set_Section',
    title='Submit Your Set',
    subtitle='Upload a text file containing your set or copy and paste your set into the text box below (one item per row). You can also try the default set provided. ',
    img='analysis.png'
    
) %}
{% do SectionField(
    name='Library_Section',
    title='Submit Your Library',
    subtitle='Upload a GMT file containing your library. You can also load the default library.',
    img='analysis.png'
    
) %}
{% do SectionField(
    name='Background_Section',
    title='(Optional) Submit Your Background',
    subtitle='Upload a text file containing a list of background items to use as a filter for both the input set and the library sets. You can also copy and paste your background items into the text box below (one item per row). By default, no background is used and no items are filtered out.',
    img='analysis.png'
    
) %}

In [None]:
%%appyter hide_code

{% set set_kind = TabField(
    name='set_kind',
    label='Set',
    default='Try Example 1 (Gene Set)',
    description='Paste or upload your set',
    choices={
        'Paste': [ 
            TextField(
                name='set_input1',
                label='Set',
                default='',
                description='Paste your set (one item per row). Names in the set should match the names in the GMT file.',
                section = 'Set_Section'
            )
        ],
        
        'Upload': [
            FileField(
                name='set_filename',
                label='Set File',
                default='',
                description='Upload your set as a text file (one item per row). Names in the set should match the names in the GMT file.',
                section = 'Set_Section'
            ),
        ],
        
        'Try Example 1 (Gene Set)': [
            TextField(
                name='set_input2',
                label='Set',
                default='TAAR9\nEBF2\nWDR78\nRRAGA\nSPATA18\nSPINT2\nMRGPRD\nCD9\nRBP1\nCYB5RL\nMXRA8\nPM20D1\nITIH5\nEPAS1\nAHCYL2\nPANK2\nPON2\nLRP5\nSLC5A3\nNSL1\nCLDN2\nLRP8\nAQP1\nCLDN1\nTMEM72\nGNG4\nNHLH2\nC10ORF107\nS100A13\nLY6G6C\nPOF1B\nWLS\nC2ORF82\nFZD4\nCOG7\nFZD6\nFOXF1\nFZD7\nERLIN2\nTYSND1\nACADSB\nOR51I2\nPARP12\nPPFIBP2\nATP4A\nALDH7A1\nTCN2\nSLCO5A1\nSFXN4\nPRR15\nMOXD1\nCAPSL\nCOL13A1\nC1ORF177\nWFDC2\nSLC6A2\nDNALI1\nTNS1\nLGALS2\nT\nBLOC1S1\nHMOX1\nPDK4\nLRAT\nMNX1\nSLC19A1\nHOXC9\nSCARF2\nAS3MT\nARGLU1\nACE\nANXA2\nCARD9\nPAX7\nSORCS1\nRAB33B\nPHOX2A\nKIF9\nCLDN16\nPTPRB\nID3\nITPKB\nNCR1\nGAS6\nCC2D1B\nATR\nMYCBP\nIGSF6\nTPH1\nWFIKKN2\nIGSF5\nACY3\nMAOA\nCAB39L\nCTSZ\nPRDM16\nCYP7A1\nLIMD1\nTMEM27\nSLC22A18\nKRT28\nTIMP3\nEMB\nRNF152\nPLEKHN1\nCLIC3\nSTRA6\nCTSC\nCGNL1\nPARP4\nTMEM176B\nELOVL7\nSORBS3\nGPR4\nF5\nGUCA2B\nSERPINB6\nHADHB\nFOXR1\nNBR1\nSHKBP1\nRLIM\nDHRS13\nHRSP12\nCD63\nCCL11\nF13A1\nFAM69C\nKCNA7\nHCCS\nGUCA1A\nADAMTSL2\nLMAN1\nING3\nEGFLAM\nSCML4\nOLFML1\nSOSTDC1\nCTNNA1\nC16ORF78\nFADS1\nCCDC157\nPDGFRB\nCA12\nCD164\nPRLR\nLRRC69\nUNC5CL\nMPEG1\nSLC31A1\nTECRL\nVCAM1\nATP11A\nUBXN10\nZNF558\nDYDC1\nCD69\nS100A8\nFIGF\nPHLDB2\nERVFRD-1\nCD82\nASB14\nGPR65\nVWCE\nTEKT1\nTEKT4\nMSX1\nSLC16A9\nZNF423\nCA14\nIGFBP2\nSLC30A7\nLRRC46\nPDIA2\nPPEF1\nEPHX1\nFANCM\nRBPMS\nTTC21A\nMR1\nDDX52\nLSM5\nKRT31\nMAVS\nTMEM237\nSMO\nC6ORF118\nPGPEP1L\nIL7R\nC21ORF62\nC11ORF97\nDOCK6\nAKNA\nISYNA1\nCD151\nCBFB\nPYROXD2\nSLC2A1\nGSTCD\nLGALS3BP\nHIGD1B\nAK7\nLTBP1\nARHGAP5\nRGS5\nSALL1\nCOBLL1\nFHAD1\nMAEL\nBTLA\nIGFBP7\nODF1\nACAA2\nKL\nTTC16\nEMX2\nTTC12\nGGH\nCCDC37\nCFLAR\nGPR98\nLAMB2\nBICC1\nBMP6\nCUL4B\nDNAJC3\nSP1\nDAP\nDNAJC1\nPIKFYVE\nDMRTA1\nALPL\nMTRF1L\nBCAR3\nKDM5D\nSHC4\nTTC25\nDBH\nDBI\nCHD1\nWNT6\nSPN\nTTC23L\nPLTP\nCYP26B1\nCASP6\nTMEM204\nTMEM207\nCCDC180\nCCDC34\nCA9\nOVGP1\nPLEKHG2\nCPT1A\nPLEKHG3\nMYO10\nRNASET2\nTBC1D9\nNAGA\nPCOLCE\nMUT\nFOXJ1\nSOD3\nATOX1\nKRT73\nSNTB1\nRP2\nRPIA\nCOL8A1\nALS2\nCOL8A2\nSMPDL3A\nPCOLCE2\
\nSLC25A13\nTAF3\nFOLR1\nITGB2\nHEMGN\nPRPS2\nSLC24A5\nFLT1\nALAS2\nLSP1\nSYCP2\nSEMA3B\nETFB\nPRELP\nZBTB40\nPBXIP1\nSLC4A5\nCLN8\nEFS\nTTR\nRBM3\nHECTD3\nNAGLU\nALDH2\nCTNNAL1\nPCBD1\nCYTH2',
                description='Paste your set (one item per row). Names in the set should match the names in the GMT file.',
                section = 'Set_Section'
            )
        ],
        'Try Example 2 (Drug Set)': [
            TextField(
                name='set_input3',
                label='Set',
                default='hexachlorophene\nlopinavir\nbazedoxifene\nabemaciclib\ncamostat\nmefloquine\ncyclosporine\nanidulafungin\nchloroquine\namodiaquine\nloperamide\nalmitrine\nhydroxychloroquine\nniclosamide\nivacaftor\nproscillaridin\nremdesivir',
                description='Paste your set (one item per row). Names in the set should match the names in the GMT file.',
                section = 'Set_Section'
            )
        ],
        
    },
    section = 'Set_Section',
) %}

{% set is_ranked = BoolField(
    name='is_ranked', 
    label='Ranked?', 
    default='false',
    description='Check if your input set (>10,000 items) is ranked.', 
    section='Set_Section',
) 
%}

In [None]:
%%appyter code_exec
{% set library_kind = TabField(
    name='library_kind',
    label='Library',
    default='Upload',
    description='',
    choices={
        'Upload': [ 
            FileField(
                name='library_filename', 
                label='Library file (.gmt or .txt)', 
                default='GeneSet_Allen_Brain_Atlas_scRNAseq_10x.gmt',
                examples={'GeneSet_Allen_Brain_Atlas_scRNAseq_10x.gmt': "https://appyters.maayanlab.cloud/storage/Independent_Enrichment_Analysis/Example1_GeneSet_Allen_Brain_Atlas_scRNAseq_10x_2021.gmt", 'DrugSet_L1000FWD_Signature_Down.txt': "https://maayanlab.cloud/DrugEnrichr/geneSetLibrary?mode=text&libraryName=L1000FWD_Signature_Down"}, 
                description='A tab-delimited file format that describes sets. Visit https://bit.ly/35crtXQ for more information.', 
                section='Library_Section')
        ],
        
        'Select a library from Enrichr': [
            ChoiceField(
                name='enrichr_library', 
                description='Select one Enrichr library for which to perform an enrichment analysis.', 
                label='Enrichr Library', 
                default='WikiPathways_2019_Human', 
                section = 'Library_Section',
                choices=[
                    'ARCHS4_Cell-lines',
                    'ARCHS4_IDG_Coexp',
                    'ARCHS4_Kinases_Coexp',
                    'ARCHS4_TFs_Coexp',
                    'ARCHS4_Tissues',
                    'Achilles_fitness_decrease',
                    'Achilles_fitness_increase',
                    'Aging_Perturbations_from_GEO_down',
                    'Aging_Perturbations_from_GEO_up',
                    'Allen_Brain_Atlas_10x_scRNA_2021',
                    'Allen_Brain_Atlas_down',
                    'Allen_Brain_Atlas_up',
                    'BioCarta_2013',
                    'BioCarta_2015',
                    'BioCarta_2016',
                    'BioPlanet_2019',
                    'BioPlex_2017',
                    'CCLE_Proteomics_2020',
                    'CORUM',
                    'COVID-19_Related_Gene_Sets',
                    'Cancer_Cell_Line_Encyclopedia',
                    'ChEA_2013',
                    'ChEA_2015',
                    'ChEA_2016',
                    'Chromosome_Location',           
                    'Chromosome_Location_hg19',
                    'ClinVar_2019',
                    'dbGaP',
                    'DSigDB',
                    'Data_Acquisition_Method_Most_Popular_Genes',
                    'DepMap_WG_CRISPR_Screens_Broad_CellLines_2019',
                    'DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019',
                    'DisGeNET',
                    'Disease_Perturbations_from_GEO_down',
                    'Disease_Perturbations_from_GEO_up',
                    'Disease_Signatures_from_GEO_down_2014',
                    'Disease_Signatures_from_GEO_up_2014',
                    'DrugMatrix',
                    'Drug_Perturbations_from_GEO_2014',
                    'Drug_Perturbations_from_GEO_down',
                    'Drug_Perturbations_from_GEO_up',
                    'ENCODE_Histone_Modifications_2013',
                    'ENCODE_Histone_Modifications_2015',
                    'ENCODE_TF_ChIP-seq_2014',
                    'ENCODE_TF_ChIP-seq_2015',
                    'ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X',
                    'ESCAPE',
                    'Elsevier_Pathway_Collection',
                    'Enrichr_Libraries_Most_Popular_Genes',
                    'Enrichr_Submissions_TF-Gene_Coocurrence',
                    'Enrichr_Users_Contributed_Lists_2020',
                    'Epigenomics_Roadmap_HM_ChIP-seq',
                    'GO_Biological_Process_2013',
                    'GO_Biological_Process_2015',
                    'GO_Biological_Process_2017',
                    'GO_Biological_Process_2017b',
                    'GO_Biological_Process_2018',
                    'GO_Cellular_Component_2013',
                    'GO_Cellular_Component_2015',
                    'GO_Cellular_Component_2017',
                    'GO_Cellular_Component_2017b',
                    'GO_Cellular_Component_2018',
                    'GO_Molecular_Function_2013',
                    'GO_Molecular_Function_2015',
                    'GO_Molecular_Function_2017',
                    'GO_Molecular_Function_2017b',
                    'GO_Molecular_Function_2018',
                    'GTEx_Tissue_Sample_Gene_Expression_Profiles_down',
                    'GTEx_Tissue_Sample_Gene_Expression_Profiles_up',
                    'GWAS_Catalog_2019',
                    'GeneSigDB',
                    'Gene_Perturbations_from_GEO_down',
                    'Gene_Perturbations_from_GEO_up',
                    'Genes_Associated_with_NIH_Grants',
                    'Genome_Browser_PWMs',
                    'HMDB_Metabolites',
                    'HMS_LINCS_KinomeScan',
                    'HomoloGene',
                    'HumanCyc_2015',
                    'HumanCyc_2016',
                    'Human_Gene_Atlas',
                    'Human_Phenotype_Ontology',
                    'huMAP',
                    'InterPro_Domains_2019',
                    'Jensen_COMPARTMENTS',
                    'Jensen_DISEASES',
                    'Jensen_TISSUES',
                    'KEA_2013',
                    'KEA_2015',
                    'KEGG_2013',
                    'KEGG_2015',
                    'KEGG_2016',
                    'KEGG_2019_Human',
                    'KEGG_2019_Mouse',
                    'Kinase_Perturbations_from_GEO_down',
                    'Kinase_Perturbations_from_GEO_up',
                    'L1000_Kinase_and_GPCR_Perturbations_down',
                    'L1000_Kinase_and_GPCR_Perturbations_up',
                    'LINCS_L1000_Chem_Pert_down',
                    'LINCS_L1000_Chem_Pert_up',
                    'LINCS_L1000_Ligand_Perturbations_down',
                    'LINCS_L1000_Ligand_Perturbations_up',
                    'Ligand_Perturbations_from_GEO_down',
                    'Ligand_Perturbations_from_GEO_up',
                    'lncHUB_lncRNA_Co-Expression',
                    'MCF7_Perturbations_from_GEO_down',
                    'MCF7_Perturbations_from_GEO_up',
                    'MGI_Mammalian_Phenotype_2013',
                    'MGI_Mammalian_Phenotype_2017',
                    'MGI_Mammalian_Phenotype_Level_3',
                    'MGI_Mammalian_Phenotype_Level_4',
                    'MGI_Mammalian_Phenotype_Level_4_2019',
                    'MSigDB_Computational',
                    'MSigDB_Hallmark_2020',
                    'MSigDB_Oncogenic_Signatures',
                    'Microbe_Perturbations_from_GEO_down',
                    'Microbe_Perturbations_from_GEO_up',
                    'miRTarBase_2017',
                    'Mouse_Gene_Atlas',
                    'NCI-60_Cancer_Cell_Lines',
                    'NCI-Nature_2015',
                    'NCI-Nature_2016',
                    'NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions',
                    'NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions',
                    'NIH_Funded_PIs_2017_Human_AutoRIF',
                    'NIH_Funded_PIs_2017_Human_GeneRIF',
                    'NURSA_Human_Endogenous_Complexome',
                    'OMIM_Disease',
                    'OMIM_Expanded',
                    'Old_CMAP_down',
                    'Old_CMAP_up',
                    'PPI_Hub_Proteins',
                    'Panther_2015',
                    'Panther_2016',
                    'Pfam_Domains_2019',
                    'Pfam_InterPro_Domains',
                    'PheWeb_2019',
                    'Phosphatase_Substrates_from_DEPOD',
                    'ProteomicsDB_2020',
                    'RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO',
                    'Rare_Diseases_AutoRIF_ARCHS4_Predictions',
                    'Rare_Diseases_AutoRIF_Gene_Lists',
                    'Rare_Diseases_GeneRIF_ARCHS4_Predictions',
                    'Rare_Diseases_GeneRIF_Gene_Lists',
                    'Reactome_2013',
                    'Reactome_2015',
                    'Reactome_2016',
                    'SILAC_Phosphoproteomics',
                    'SubCell_BarCode',
                    'SysMyo_Muscle_Gene_Sets',
                    'TF-LOF_Expression_from_GEO',
                    'TF_Perturbations_Followed_by_Expression',
                    'TG_GATES_2020',
                    'TRANSFAC_and_JASPAR_PWMs',
                    'TRRUST_Transcription_Factors_2019',
                    'Table_Mining_of_CRISPR_Studies',
                    'TargetScan_microRNA',
                    'TargetScan_microRNA_2017',
                    'Tissue_Protein_Expression_from_Human_Proteome_Map',
                    'Tissue_Protein_Expression_from_ProteomicsDB.csv',
                    'Transcription_Factor_PPIs',
                    'UK_Biobank_GWAS_v1',
                    'Virus-Host_PPI_P-HIPSTer_2020',
                    'VirusMINT',
                    'Virus_Perturbations_from_GEO_down',
                    'Virus_Perturbations_from_GEO_up',
                    'WikiPathways_2013',
                    'WikiPathways_2015',
                    'WikiPathways_2016',
                    'WikiPathways_2019_Human',
                    'WikiPathways_2019_Mouse'
                ]
            )
        ],
        
        
    },
    section = 'Library_Section',
) %}



In [None]:
%%appyter hide_code

{% set background_kind = TabField(
    name='background_kind',
    label='Background',
    default='Paste',
    description='Paste or upload your background set',
    choices={
        'Paste': [
            TextField(
                name='background_input',
                label='Background Set',
                default='',
                description='Paste your background set (one item per row). Names in the background set should match the names in the GMT file.',
                section = 'Background_Section'
            ),
        ],
        'Upload': [
            FileField(
                name='background_filename',
                label='Background File',
                default='',
                description='Upload your background set as a text file (one item per row). Names in the background set should match the names in the GMT file.',
                section = 'Background_Section'
            ),
        ],
    },
    section = 'Background_Section',
) %}

In [None]:
%%appyter code_exec

{%- if set_kind.raw_value == 'Paste' or set_kind.raw_value == 'Try Example 1 (Gene Set)' or set_kind.raw_value == 'Try Example 2 (Drug Set)'%}
set_input = {{ set_kind.value[0] }}
{%- else %}
set_filename = {{ set_kind.value[0] }}
{%- endif %}

{%- if library_kind.raw_value == 'Upload' %}
library_kind = "Upload"
library_filename = {{ library_kind.value[0] }}
library_name = library_filename.replace("_", " ").replace(".txt", "").replace(".gmt", "")

{%- else %}
library_kind = "Select a library from Enrichr"
library_filename = "{{ library_kind.value[0] }}"
library_name = "{{ library_kind.value[0] }}"
{%- endif %}


{%- if background_kind.raw_value == 'Paste' %}
background_input = {{ background_kind.value[0] }}
{%- else %}
background_filename = {{ background_kind.value[0] }}
{%- endif %}


In [None]:
output_notebook()
# Table Parameters
significance_value = 0.05
display_topk = 20

# Bar Chart Parameters
figure_file_format = ['png', 'svg']
output_file_name = 'Enrichment_analysis_results_bar'
color = 'lightskyblue'
final_output_file_names = ['{0}.{1}'.format(output_file_name, file_type) for file_type in figure_file_format]
topk = 10

# Manhattan Plot Parameters
manhattan_colors = ['#003f5c', '#7a5195', '#ef5675', '#ffa600']

# Bridge Plot Parameters
bridge_plot_topk = 10

In [None]:
%%appyter code_exec

{%- if set_kind.raw_value == 'Paste' or set_kind.raw_value == 'Try Example 1 (Gene Set)' or set_kind.raw_value == 'Try Example 2 (Drug Set)' %}
items = set_input.split('\n')
items = [x.strip() for x in items if x.strip()]  # skip blank rows
{%- else %}
with open(set_filename, 'r') as open_set_file:
    items = [x.strip() for x in open_set_file if x.strip()]  # skip blank rows
{%- endif %}

# remove duplicates in items
items = list(OrderedDict.fromkeys(items)) 

In [None]:
%%appyter code_exec

{%- if background_kind.raw_value == 'Paste' %}
background_items = background_input.split('\n')
background_items = [x.strip() for x in background_items]
{%- else %}
with open(background_filename, 'r') as open_background_file:
    background_items = [x.strip() for x in open_background_file]
{%- endif %}
# Drop blank rows; an empty background means "no filtering"
background_items = [x for x in background_items if x]
background_items_bool = len(background_items) > 0
if background_items_bool:
    items = [x for x in items if x in background_items]

is_ranked = {{is_ranked.value}}

In [None]:
def download_library(library_name):
    # Fetch the library from Enrichr and save it locally under the library's name
    url = f'https://maayanlab.cloud/Enrichr/geneSetLibrary?mode=text&libraryName={library_name}'
    with urllib.request.urlopen(url) as f, open(library_name, "w") as fw:
        for line in f:
            fw.write(line.decode('utf-8'))

In [None]:
def load(library_filename, items, background_items):
    if library_kind == "Select a library from Enrichr":
        download_library(library_filename)
    library_data, average_items_in_library = load_library(library_filename, background_items)
    # to upper case
    items = [x.upper() for x in items]
    validate_inputs(items, library_data)
    
    try:
        validate_ranked_items(items, average_items_in_library)
        
    except Exception as error:
        display(Markdown(f'#### **Warning: {error}**'))
    return library_data, items

def load_library(library_filename, background_items):
    library_data = dict()
    items_in_library = list()
    with open(library_filename, "r") as f:
        for line in f:
            # GMT row: set name, description, then the members of the set
            fields = line.strip().split("\t")

            if background_items_bool:
                elements = [x for x in fields[2:] if x in background_items]
            else:
                elements = fields[2:]
            if len(elements) > 0:
                # to upper case
                library_data[fields[0]] = [x.upper() for x in elements]
                items_in_library.append(len(elements))
    # Avoid dividing by zero here; validate_inputs reports an empty library
    average_set_size = sum(items_in_library) / len(items_in_library) if items_in_library else 0.0
    return library_data, average_set_size

def validate_inputs(items, library_data):
    library_items = set()
    for key, values in library_data.items():
        library_items.update(set(values))
    if len(items) == 0:
        raise Exception('No items in the input set. Please check the background information.') 
    if len(library_data.keys()) == 0:
        raise Exception('No items in the input library. Please check the background information.') 
    if len(set(items).intersection(library_items)) == 0:
        raise Exception('No matches in the input set and library.')    
        
def validate_ranked_items(items, average_items_in_library):
    if is_ranked == True and len(items) < average_items_in_library*10:
        raise Exception('For ranked input, we recommend a list about 10 times longer than the average set in the library.')

# Enrichment analysis
def get_library_iter(library_data):
    for term in library_data.keys():
        single_set = library_data[term]
        yield term, single_set

def get_enrichment_results(items, library_data):
    return sorted(enrich_crisp(items, get_library_iter(library_data), 20000, True), key=lambda r: r[1].pvalue)


def get_pvalue(row, unzipped_results, all_results):
    if row['Name'] in list(unzipped_results[0]):
        index = list(unzipped_results[0]).index(row['Name'])
        return all_results[index][1].pvalue
    else:
        return 1
    
# Call enrichment results and return a plot and dataframe for Scatter Plot
def get_values(obj_list):
    pvals = []
    odds_ratio = []
    n_overlap = []
    overlap = []
    for i in obj_list:
        pvals.append(i.pvalue)
        odds_ratio.append(i.odds_ratio)
        n_overlap.append(i.n_overlap)
        overlap.append(i.overlap)
    return pvals, odds_ratio, n_overlap, overlap

def get_qvalue(p_vals):
    r = multipletests(p_vals, method="fdr_bh")
    return r[1]


def enrichment_analysis(items, library_data):    
    all_results = get_enrichment_results(items, library_data)
    unzipped_results = list(zip(*all_results))
    pvals, odds_ratio, n_overlap, overlap = get_values(unzipped_results[1])
    df = pd.DataFrame({"Name":unzipped_results[0], "p value": pvals, \
                       "odds_ratio": odds_ratio, "n_overlap": n_overlap, "overlap": overlap})
    df["-log(p value)"] = -np.log10(df["p value"])
    df["q value"] = get_qvalue(df["p value"].tolist())
    return [list(unzipped_results[0])], [pvals], df

# Output a table of significant p-values
def create_download_link(df, title = "Download CSV file of this table", filename = "data.csv"):  
    csv = df.to_csv(index = False)
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)



In [None]:
# Bar Chart Functions
def enrichr_figure(all_terms, all_pvalues, all_qvalues, plot_names, all_libraries, bar_color, topk=10): 
    all_terms = [all_terms[0][:topk]]
    all_pvalues = [all_pvalues[0][:topk]]
    all_qvalues = [all_qvalues[:topk]]
    # Bar colors
    if bar_color != 'lightgrey':
        bar_color_not_sig = 'lightgrey'
        edgecolor=None
        linewidth=0
    else:
        bar_color_not_sig = 'white'
        edgecolor='black'
        linewidth=1    

    plt.figure(figsize=(24, 12))
    
    i = 0
    bar_colors = [bar_color if (x < 0.05) else bar_color_not_sig for x in all_pvalues[i]]
    fig = sns.barplot(x=np.log10(all_pvalues[i])*-1, y=all_terms[i], palette=bar_colors, edgecolor=edgecolor, linewidth=linewidth)
    fig.axes.get_yaxis().set_visible(False)
    fig.set_title(all_libraries[i], fontsize=26)
    fig.set_xlabel('−log₁₀(p‐value)', fontsize=25)
    fig.tick_params(axis='x', which='major', labelsize=20)
    if max(np.log10(all_pvalues[i])*-1)<1:
        fig.xaxis.set_ticks(np.arange(0, max(np.log10(all_pvalues[i])*-1), 0.1))
    for ii,annot in enumerate(all_terms[i]):
        qvalue_text = np.format_float_scientific(all_qvalues[i][ii], precision=2)
        # Append the q-value to the label; an asterisk marks q < 0.05
        if all_qvalues[i][ii] < 0.05:
            annot = '  *'.join([annot, qvalue_text])
        else:
            annot = '  '.join([annot, qvalue_text])

        title_start= max(fig.axes.get_xlim())/200
        fig.text(title_start, ii, annot, ha='left', wrap = True, fontsize = 26)

    fig.spines['right'].set_visible(False)
    fig.spines['top'].set_visible(False)
    # Save results 
    for plot_name in plot_names:
        plt.savefig(plot_name, bbox_inches = 'tight')
    
    # Show plot 
    plt.show()  

In [None]:
# Create Manhattan Plots
def manhattan(df):
    df = df.sort_values("Name")
    list_of_xaxis_values = df["Name"].values.tolist()

    # define the output figure and the features we want
    p = figure(x_range = list_of_xaxis_values, plot_height=300, plot_width=750, tools='pan, box_zoom, hover, reset, save')

    # loop over all libraries
    r = []
    color_index = 0
    if color_index >= len(manhattan_colors):
        color_index = 0 

    # calculate actual p value from -log(p value)
    actual_pvalues = []
    for log_value in df["-log(p value)"].values.tolist():
        actual_pvalues += ["{:.5e}".format(10**(-1*log_value))]

    # define ColumnDataSource with our data for this library
    source = ColumnDataSource(data=dict(
        x = df["Name"].values.tolist(),
        y = df["-log(p value)"].values.tolist(),
        pvalue = actual_pvalues,
    ))

    # plot data from this library
    r += [p.circle(x = 'x', y = 'y', size=5, fill_color=manhattan_colors[color_index], line_color = manhattan_colors[color_index], line_width=1, source = source)]
    color_index += 1

    p.background_fill_color = 'white'
    p.xaxis.major_tick_line_color = None 
    p.xaxis.major_label_text_font_size = '0pt'
    p.y_range.start = 0
    p.yaxis.axis_label = '-log(p value)'

    p.hover.tooltips = [
        ("Name", "@x"),
        ("p value", "@pvalue"),
    ]
    p.output_backend = "svg"
    
    # returns the plot
    return p

In [None]:
# Create bridge plots and perform Mann-Whitney U tests
def get_mannwhitneyu_pvalue(set_name):
    all_ranks = list(range(len(items)))
    common_items = list(set(items).intersection(library_data[set_name]))
    selected_ranks = list()
    for common_item in common_items:
        selected_ranks.append(items.index(common_item))
        
    if len(common_items) == 0:
        return float('NaN')
    _, p_value = mannwhitneyu(all_ranks, selected_ranks)
    return p_value

def test_mannwhitneyu(library_data):
    mann_whitney_results = dict()
    set_names = library_data.keys()
    mann_whitney_results["p value"] = dict()
    for set_name in set_names:
        mann_whitney_results["p value"][set_name] = get_mannwhitneyu_pvalue(set_name)
    df = pd.DataFrame(mann_whitney_results)
    df = df.dropna()
    df = df.sort_values("p value", ascending=True).reset_index()
    df["q value"] = get_qvalue(df["p value"].tolist())
    df.columns = ["Set Name", "p value", "q value"]
    return df

def return_bridge_plot(key):
    result = dict()
    series = pd.Series(items)
    input_series = series.isin(library_data[key])
    x, y = bridge_plot(input_series)
    result["x"] = x
    result["y"] = y
    return result

def extract_pval(dictionary, key):
    if key in dictionary:
        return str(dictionary[key])
    else:
        return "undefined"
    
def plot_bridge(mannwhitney_df, bridge_plot_topk):
    
    pval = mannwhitney_df.set_index("Set Name").to_dict()['p value']    
    pval = {key: round(item, 6) for key, item in pval.items()}
    
    qval = mannwhitney_df.set_index("Set Name").to_dict()['q value']    
    qval = {key: round(item, 6) for key, item in qval.items()}
    
    # select topk
    mannwhitney_df = mannwhitney_df.iloc[:bridge_plot_topk, :]
    options = mannwhitney_df["Set Name"].tolist()
    
    
    # run bridge plot for all sets
    bridge_plot_results = dict()
    for key in options:
        bridge_plot_results[key] = return_bridge_plot(key)
    
    # init
    source = ColumnDataSource(data=dict(x=bridge_plot_results[options[0]]["x"], y=bridge_plot_results[options[0]]["y"]))
    plot = figure(plot_width=400, plot_height=400)
    plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)
    plot.output_backend = "svg"
    
    select = Select(title="Select a set:", value=options[0], options=options)
    
    default_caption_text = "Figure 1. Bridge plot of {} (p-value {}, q-value {}). The x-axis shows the ranks of items, and the y-axis shows scores."

    pre = Paragraph(text = default_caption_text.format(select.value, extract_pval(pval, select.value), extract_pval(qval, select.value)),
    width=500, height=100, style={"font-family":'Helvetica', "font-style": "italic"})
    
    callback = CustomJS(args=dict(source=source, pre=pre, pval=pval, qval=qval, bridge_plot_results=bridge_plot_results), code="""        
        var data = source.data
        const val = cb_obj.value;
        data.x = bridge_plot_results[val]['x']
        data.y = bridge_plot_results[val]['y']
        
        pre.text = "Figure 1. Bridge plot of " + val + " (p-value " + pval[val] + ", q-value " + qval[val] + "). The x-axis shows the ranks of items, and the y-axis shows scores." 
        
        source.change.emit();
        
    """)
    select.js_on_change('value', callback)
    
    
    col = column(select, plot, pre)
    show(col)
    

In [None]:
%%appyter markdown
{% if is_ranked.value == False %}
# Enrichment Analysis
The table below displays the top 20 enrichment analysis results for the given set library. For each set, it lists the name, p-value, odds ratio, number of overlapping items, the overlapping items themselves, -log(p-value), and q-value. The table is sorted by p-value in ascending order. The full results are downloadable in CSV format.
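
To make the p-value and q-value columns more concrete, here is a minimal standard-library sketch. It assumes a one-sided (over-representation) Fisher exact test, computed as a hypergeometric tail, followed by the Benjamini-Hochberg adjustment. The Appyter itself delegates these steps to `enrich_crisp` and to `multipletests` with `method="fdr_bh"`, whose exact conventions (for example, one- versus two-sided testing) may differ. The function names and toy numbers below are hypothetical.

```python
from math import comb


def fisher_right_tail(n_overlap, n_input, n_set, n_background):
    """P(X >= n_overlap) when n_input items are drawn from n_background
    items, of which n_set belong to the library set (assumed one-sided)."""
    total = comb(n_background, n_input)
    upper = min(n_input, n_set)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Toy example: 4 input items, a library set of 3 items, 2 shared,
# drawn from a background of 20 items.
p = fisher_right_tail(2, 4, 3, 20)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p)
print(q_values)
```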
{% endif %}


In [None]:
%%appyter code_exec
{% if is_ranked.value == False %}
library_data, items = load(library_filename, items, background_items)
results, pvals, results_df = enrichment_analysis(items, library_data)
if 'p value' in results_df.columns:
    sorted_df = results_df.sort_values(by = ['p value'])
    filtered_df = sorted_df.iloc[:display_topk]
    if len(filtered_df) != 0:
        display(HTML(filtered_df.to_html(index = False)))
        display(Markdown(f"*Table 1. Enrichment analysis results of {library_name}*"))        
        display(create_download_link(sorted_df))
{% endif %}

In [None]:
%%appyter markdown
{% if is_ranked.value == False %}
# Bar Chart
{% endif %}


In [None]:
%%appyter code_exec
{% if is_ranked.value == False %}
display(Markdown(f"The bar chart below shows the top {topk} enriched terms in a given library. Colored bars correspond to terms with significant p-values (<0.05). The bar chart is downloadable as an image in the PNG and SVG formats. "))
{% endif %}

In [None]:
%%appyter code_exec
{% if is_ranked.value == False %}
enrichr_figure(results, pvals, results_df["q value"].tolist(), final_output_file_names, [library_name], color, topk)
display(Markdown(f"*Figure 1. Bar chart of the top {topk} enriched terms in {library_name}. Bar length is the −log₁₀(p-value), and colored bars mark terms with significant p-values (<0.05). The number after each term is its q-value; an asterisk before it marks a significant q-value (<0.05).*"))
    
# Download Bar Chart
for i, file in enumerate(final_output_file_names):
    display(FileLink(file, result_html_prefix=str('Download ' + figure_file_format[i] + ': ')))
    
{% endif %}

In [None]:
%%appyter markdown
{% if is_ranked.value == False %}
# Manhattan Plot
In the Manhattan plot below, each point along the x-axis represents a single set from the library, and the y-axis shows the −log₁₀(p‐value) of that set. Hovering over a point displays the name of the set and its p-value. You can also zoom, pan, and save the plot as an SVG using the toolbar on the right.
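
The y-axis transform is simple enough to sketch directly. The snippet below, with made-up p-values, converts p-values to −log₁₀ heights and back again, which is how a tooltip can recover the original p-value from the plotted height. The helper names are hypothetical.

```python
import math


def to_height(p_value):
    """Height of a point on the plot: -log10 of its p-value."""
    return -math.log10(p_value)


def to_p_value(height):
    """Invert the transform to recover the p-value from a height."""
    return 10 ** (-height)


p_values = [0.5, 0.05, 0.001]  # made-up values
heights = [to_height(p) for p in p_values]
recovered = [to_p_value(h) for h in heights]

# Points above this height have p < 0.05 (about 1.3 on the y-axis)
threshold_height = to_height(0.05)
print(heights, recovered, threshold_height)
```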
{% endif %}


In [None]:
%%appyter code_exec
{% if is_ranked.value == False %}
show(manhattan(results_df))
display(Markdown(f"*Figure 2. Manhattan plot that displays sets from {library_name} and their p-values on a -log10 scale.*"))     
{% endif %}

In [None]:
%%appyter markdown
{% if is_ranked.value == True %}
# Bridge Plot
In a bridge plot, the x-axis shows the ranks of the input items and the y-axis shows a running score. Walking down the ranked list of input items, the score is incremented when an item belongs to the selected set from the library and decremented otherwise. You can view a bridge plot for each of the top sets using the drop-down menu, and zoom, pan, and download the plot in SVG format using the toolbar next to the plot. P-values are calculated by the Mann-Whitney U test.
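
The running score can be sketched as follows, assuming plain +1/−1 steps. The `bridge_plot` function imported by this Appyter may scale the steps differently, so treat `simple_bridge` (a hypothetical name) and its toy input as an illustration only.

```python
def simple_bridge(ranked_items, member_set):
    """Return x (rank) and y (running score) for an unweighted bridge plot."""
    score = 0
    xs, ys = [0], [0]
    for rank, item in enumerate(ranked_items, start=1):
        # Step up for a hit in the set, step down for a miss
        score += 1 if item in member_set else -1
        xs.append(rank)
        ys.append(score)
    return xs, ys


# Toy ranked list and set, for illustration only
xs, ys = simple_bridge(["a", "b", "c", "d", "e"], {"a", "b", "e"})
print(xs)
print(ys)
```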
{% endif %}

In [None]:
%%appyter code_exec
{% if is_ranked.value == True %}
library_data, items = load(library_filename, items, background_items)
# calculate p-values
mannwhitney_df = test_mannwhitneyu(library_data)
plot_bridge(mannwhitney_df, bridge_plot_topk)
{% endif %}

In [None]:
%%appyter markdown
{% if is_ranked.value == True %}
# Table of top-ranked sets
The Appyter performs a Mann-Whitney U test of the given ranked list against each set in the library. The table shows the top-ranked sets by p-value, and the full results are downloadable.
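
The statistic behind the test can be sketched by direct pair counting. The example below is hypothetical: it compares the ranks of all items with the ranks of the items that fall in one set, as the Appyter does, but it only computes the U statistic. Turning U into a p-value is left to `scipy.stats.mannwhitneyu` in the notebook.

```python
def mann_whitney_u(sample_x, sample_y):
    """U statistic for sample_x: each pair with x > y counts 1, ties 0.5."""
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Toy ranked list and set, for illustration only
ranked_items = ["a", "b", "c", "d", "e", "f"]
members = {"a", "b"}

all_ranks = list(range(len(ranked_items)))
selected_ranks = [i for i, item in enumerate(ranked_items) if item in members]

u_all = mann_whitney_u(all_ranks, selected_ranks)
u_selected = mann_whitney_u(selected_ranks, all_ranks)
# The two U values always sum to the number of pairs compared
print(u_all, u_selected, len(all_ranks) * len(selected_ranks))
```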
{% endif %}

In [None]:
%%appyter code_exec
{% if is_ranked.value == True %}
display(mannwhitney_df.iloc[:display_topk].set_index("Set Name"))
display(Markdown(f"*Table 1. Top {display_topk} sets and their associated p-values by the Mann-Whitney U test.*"))     

display(create_download_link(mannwhitney_df, title="Download CSV file of this table"))

{% endif %}