In [None]:
#hide
from glycowork import *
from IPython.display import HTML

# Glycowork

- Glycans are a fundamental biological sequence, similar to DNA, RNA, or proteins. Glycans are complex carbohydrates that can form branched structures comprising monosaccharides and linkages as constituents. Despite being conspicuously absent from most research, glycans are ubiquitous in biology. They decorate most proteins and lipids and direct the stability and functions of biomolecules, cells, and organisms. This also makes glycans relevant to every human disease.

- The analysis of glycans is made difficult by their nonlinearity and their astounding diversity, given the large number of monosaccharides and types of linkages. Glycowork is a Python package designed to process and analyze glycan sequences, with a special emphasis on glycan-focused machine learning. In addition to functions for working with glycans, Glycowork also contains glycan data that can be used for glycan alignments, model pre-training, motif comparisons, etc.

- The inspiration for glycowork can be found in [Bojar et al., 2020](https://www.cell.com/cell-host-microbe/fulltext/S1931-3128(20)30562-X) and [Burkholz et al., 2021](https://www.biorxiv.org/content/10.1101/2021.03.01.433491v1). There, you can also find examples of possible use cases for the functions in glycowork.

## Install

In [None]:
#later
#`pip install glycowork`

#if you don't have a security token
#`pip install git+https://YOURUSERNAME:YOURPASSWORD@github.com/BojarLab/glycowork.git`

#`pip install git+https://github.com/BojarLab/glycowork.git`
#`import glycowork`

## How to use

Glycowork currently contains four main modules:
 - **alignment**
     - can be used to find similar glycan sequences by alignment according to a glycan-specific substitution matrix
 - **glycan_data**
     - stores several glycan datasets and contains helper functions
 - **ml**
     - contains the functions for training and using machine learning models, including train-test splitting, getting glycan representations, etc.
 - **motif**
     - contains functions for processing glycan sequences, identifying motifs and features, and analyzing them

Below are some examples of what you can do with glycowork. Be sure to check out the full documentation for everything else that is available.

In [None]:
#converting a glycan into a graph object (node list + edge lists)
from glycowork.motif.graph import glycan_to_graph
glycan_to_graph('Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc')

([794, 1045, 450, 1045, 450, 281, 1015],
 [(0, 1, 2, 3, 5, 6), (1, 2, 3, 4, 6, 4)])
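The returned tuple can be read as a list of node labels followed by two parallel lists of edge endpoints. The sketch below is plain Python applied to the output shown above, not a glycowork function; it pairs the endpoints into edges and finds the terminal nodes. The variable names are illustrative, and the reading of the two tuples as edge sources and targets is an assumption based on this one example.

```python
# Output of glycan_to_graph, copied from the example above.
node_labels = [794, 1045, 450, 1045, 450, 281, 1015]
edge_sources = (0, 1, 2, 3, 5, 6)
edge_targets = (1, 2, 3, 4, 6, 4)

# Pair the two parallel lists into (source, target) edges.
edges = list(zip(edge_sources, edge_targets))

# Count how many edges touch each node.
degree = {node: 0 for node in range(len(node_labels))}
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# Nodes touched by exactly one edge are terminal nodes.
leaves = [node for node, d in degree.items() if d == 1]

print(edges)
print(leaves)
```

Read this way, nodes 0 and 5 are the two non-reducing ends of the glycan (Man and Fuc in the input string), and both chains meet at node 4.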

In [None]:
#using graphs, you can easily check whether two glycans are the same - even if they use different bracket notations!
from glycowork.motif.graph import compare_glycans
print(compare_glycans('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                     'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
print(compare_glycans('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                     'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc'))

True
False
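To see why branch order should not matter, here is a minimal string-based sketch that sorts the branches at every branching point before comparing. It is not how `compare_glycans` works (that function compares graphs); `canonical` is a hypothetical helper written for this illustration, and it assumes well-formed IUPAC-condensed strings in which every residue except the reducing end is followed by a parenthesised linkage.

```python
def canonical(glycan):
    """Return a branch-order-independent string for an IUPAC-condensed glycan."""
    # The reducing-end monosaccharide is whatever follows the last ')' or ']'.
    cut = max(glycan.rfind(')'), glycan.rfind(']')) + 1
    root, rest = glycan[cut:], glycan[:cut]
    children = []
    while rest:
        if rest.endswith(']'):
            # Walk left to the '[' that matches the final ']'.
            depth = 0
            for i in range(len(rest) - 1, -1, -1):
                if rest[i] == ']':
                    depth += 1
                elif rest[i] == '[':
                    depth -= 1
                if depth == 0:
                    break
            branch, rest = rest[i + 1:-1], rest[:i]
        else:
            # No trailing bracket: everything left is the main chain.
            branch, rest = rest, ''
        # Each branch ends with its linkage, e.g. '(a1-6)'.
        link_start = branch.rfind('(')
        children.append(canonical(branch[:link_start]) + branch[link_start:])
    if not children:
        return root
    # Sorting the branches removes the dependence on how they were written.
    children.sort()
    return children[0] + ''.join('[' + c + ']' for c in children[1:]) + root


a = 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
b = 'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
c = 'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc'
print(canonical(a) == canonical(b))
print(canonical(a) == canonical(c))
```

For these three strings the sketch agrees with the `compare_glycans` results above. A graph comparison is the more robust choice in practice, since it does not depend on parsing conventions.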


In [None]:
#querying some of the stored databases
from glycowork.motif.query import get_insight
get_insight('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc')

Let's get rolling! Give us a few moments to crunch some numbers.

This glycan occurs in the following species: ['Antheraea_pernyi', 'Apis_mellifera', 'Autographa_californica_nucleopolyhedrovirus', 'AvianInfluenzaA_Virus', 'Bombyx_mori', 'Bos_taurus', 'Caenorhabditis_elegans', 'Drosophila_melanogaster', 'Homo_sapiens', 'HumanImmunoDeficiency_Virus', 'Mamestra_brassicae', 'Megathura_crenulata', 'Mus_musculus', 'Rattus_norvegicus', 'Spodoptera_frugiperda', 'Sus_scrofa', 'Trichinella_spiralis']

This glycan contains the following motifs: ['Chitobiose', 'Trimannosylcore', 'core_fucose']

That's all we can do for you at this point!


In [None]:
#get motifs, graph features, and sequence features of a set of glycan sequences to train models or analyze glycan properties
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN']
from glycowork.motif.annotate import annotate_dataset
HTML(annotate_dataset(glycans, feature_set = ['known', 'graph', 'exhaustive']).to_html())

(Output truncated. The full table has one row per glycan and more than 1,000 feature columns: known motifs such as LewisX, Chitobiose, and core_fucose; graph features such as diameter, branching, and nbrLeaves; and counts for each monosaccharide and linkage in the library plus the disaccharides observed in these glycans, such as Man*a1-3*Man. A few columns are shown here.)

glycan,Chitobiose,Trimannosylcore,diameter,branching,nbrLeaves,avgDeg,varDeg,maxDeg
Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc,1,1,8.0,1.0,4.0,1.846154,0.591716,4.0
Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc,1,1,10.0,1.0,3.0,1.866667,0.248889,3.0
GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN,0,0,10.0,2.0,4.0,1.866667,0.382222,3.0
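To give a feel for the sequence-derived columns in this table, the following sketch counts the monosaccharides and linkages in a single glycan string using only the standard library. It is an illustration, not glycowork's implementation; `count_tokens` is a hypothetical helper, and it assumes IUPAC-condensed input in which linkages are the only parenthesised parts.

```python
import re
from collections import Counter


def count_tokens(glycan):
    """Count monosaccharides and linkages in an IUPAC-condensed glycan string."""
    # Linkages are the parenthesised parts, e.g. 'a1-3'.
    linkages = re.findall(r'\(([^)]*)\)', glycan)
    # Monosaccharides are what remains once linkages and brackets are removed.
    monosaccharides = [t for t in re.split(r'\([^)]*\)|\[|\]', glycan) if t]
    return Counter(monosaccharides + linkages)


counts = count_tokens('Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc')
print(counts)
```

Stacking such counters for many glycans, with one column per token, gives a sparse count matrix similar in spirit to the monosaccharide and linkage columns of the table.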


In [None]:
#identify significant binding motifs with (for instance) Z-score data
import pandas as pd
from glycowork.motif.analysis import get_pvals_motifs
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
label = [3.234, 2.423, 0.733, 3.102, 0.108]
test_df = pd.DataFrame({'glycan':glycans, 'binding':label})
HTML(get_pvals_motifs(test_df, glycan_col_name = 'glycan', label_col_name = 'binding').to_html())

Unnamed: 0,motif,pval,corr_pval
1075,Man*b1-4*GlcNAc,0.0,0.0
1074,Man*a1-6*Man,0.0,0.0
1073,Man*a1-3*Man,0.0,0.0
450,GlcNAc,0.0,0.0
1015,a1-6,0.0,0.0
1066,GlcNAc*b1-4*GlcNAc,0.0,0.0
1045,b1-4,0.0,0.0
1012,a1-3,0.018875,1.0
794,Man,0.019506,1.0
710,L-GulHep,1.0,1.0
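As a rough intuition for motif-level significance, the sketch below runs an exact one-sided permutation test on the same five glycans, asking whether glycans that contain `Man` have higher binding values than those that do not. This is an assumption-laden illustration and not the statistical procedure used by `get_pvals_motifs`, so its p-value is not expected to match the table above; `has_monosaccharide` and `permutation_pvalue` are hypothetical helpers.

```python
import re
from itertools import combinations

glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
binding = [3.234, 2.423, 0.733, 3.102, 0.108]


def has_monosaccharide(glycan, name):
    """Check for an exact monosaccharide token (so 'Man' does not match 'ManNAc')."""
    return name in re.split(r'\([^)]*\)|\[|\]', glycan)


def mean(values):
    return sum(values) / len(values)


def permutation_pvalue(scores, flags):
    """Exact one-sided permutation test; assumes both groups are non-empty."""
    indices = range(len(scores))
    flagged = [i for i in indices if flags[i]]

    def mean_gap(group):
        rest = [i for i in indices if i not in group]
        return mean([scores[i] for i in group]) - mean([scores[i] for i in rest])

    observed = mean_gap(flagged)
    # Enumerate every way of choosing the same number of glycans.
    gaps = [mean_gap(group) for group in combinations(indices, len(flagged))]
    # Fraction of relabelings with a gap at least as large as the observed one.
    return sum(gap >= observed - 1e-12 for gap in gaps) / len(gaps)


flags = [has_monosaccharide(g, 'Man') for g in glycans]
p_value = permutation_pvalue(binding, flags)
print(flags, p_value)
```

With only five glycans there are just ten possible relabelings, so this test cannot return a p-value below 0.1. That is a reminder that a dataset this small is useful for demonstrating the API but not for drawing conclusions.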
