# GeneNetworkAPI

Provides access to the [GeneNetwork](http://genenetwork.org) database
and analysis functions using the [GeneNetwork REST
API](https://github.com/genenetwork/gn-docs/blob/master/api/GN2-REST-API.md).

Karl Broman wrote the
[GNapi](https://github.com/kbroman/GNapi/blob/main/README.md) R
package for providing access to GeneNetwork from R.  This package
follows the structure and function of that package closely.

## Note on terminology

GeneNetwork collects data on genetically segregating populations
(called _groups_) in a number of _species_, including humans.  Most of
the phenotype data are "omic" data, which are organized as _datasets_.

## Import genenetworkapi

In [1]:
import genenetworkapi.v_pre1 as gnapi

## Check connection
To check if the website is responding properly:

In [2]:
gnapi.check_gn()

GeneNetwork is alive.


200

## Get species list
Which species have data available?

In [3]:
gnapi.list_species()

Unnamed: 0,FullName,Id,Name,TaxonomyId
0,Mus musculus,1,mouse,10090
1,Rattus norvegicus,2,rat,10116
2,Arabidopsis thaliana,3,arabidopsis,3702
3,Homo sapiens,4,human,9606
4,Hordeum vulgare,5,barley,4513
5,Fly (Drosophila melanogaster dm6),6,drosophila,7227
6,Macaca mulatta,7,monkey,9544
7,Glycine max,8,soybean,3847
8,Solanum lycopersicum,9,tomato,4081
9,Populus trichocarpa,10,poplar,3689


To get information on a single species:

In [4]:
gnapi.list_species("rat")

Unnamed: 0,FullName,Id,Name,TaxonomyId
0,Rattus norvegicus,2,rat,10116
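
The two calls above presumably correspond to two REST endpoints. The following is a minimal sketch of how such URLs could be built, assuming the `v_pre1` base URL and the `species` path described in the GeneNetwork REST API documentation. `species_url` is a hypothetical helper for illustration and is not part of this package.

```python
from urllib.parse import quote

# Assumed base URL of the GeneNetwork REST API (version v_pre1).
BASE_URL = "https://genenetwork.org/api/v_pre1"


def species_url(name=None):
    """Return the URL listing all species, or one species if a name is given."""
    if name is None:
        return f"{BASE_URL}/species"
    # quote() escapes characters that are not safe inside a URL path segment.
    return f"{BASE_URL}/species/{quote(name)}"


print(species_url())
print(species_url("rat"))
```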


## List groups for a species
Since the information is organized by segregating population
("group"), it is useful to get a list for a particular species you
might be interested in.

In [5]:
gnapi.list_groups("rat")

Unnamed: 0,DisplayName,FullName,GeneticType,Id,MappingMethodId,Name,SpeciesId,public
0,Hybrid Rat Diversity Panel (Includes HXB/BXH),Hybrid Rat Diversity Panel (Includes HXB/BXH),,10,1.0,HXBBXH,2,2
1,UIOWA SRxSHRSP F2,UIOWA SRxSHRSP F2,intercross,24,1.0,SRxSHRSPF2,2,2
2,NIH Heterogeneous Stock (RGSMC 2013),NIH Heterogeneous Stock (RGSMC 2013),,42,1.0,HSNIH-RGSMC,2,2
3,NIH Heterogeneous Stock (Palmer),NIH Heterogeneous Stock (Palmer),,55,,HSNIH-Palmer,2,2
4,NWU WKYxF344 F2 Behavior,NWU WKYxF344 F2 Behavior,intercross,82,3.0,NWU_WKYxF344_F2,2,2
5,HIV-1Tg and Control,HIV-1Tg and Control,,83,1.0,HIV-1Tg,2,2
6,HRDP-HXB/BXH Brain Proteome,HRDP-HXB/BXH Brain Proteome,,87,1.0,HRDP_HXB-BXH-BP,2,2
7,HRDP-HXB/BXH Metabolome,HRDP-HXB/BXH Metabolome,,98,1.0,HRDP_HXB-BXH-Metb,2,2


The `GeneticType` column shows the type of population, where known.
Note the short name (`Name`), as it is used in queries involving that
population (group).
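
As a small illustration of picking out short names, the sketch below filters groups by `GeneticType`. It works on `(Name, GeneticType)` pairs copied by hand from the rat output above rather than on a live query, and `names_by_type` is a hypothetical helper.

```python
# (Name, GeneticType) pairs copied from the list_groups("rat") output above.
# An empty string stands for a missing GeneticType.
rat_groups = [
    ("HXBBXH", ""),
    ("SRxSHRSPF2", "intercross"),
    ("HSNIH-RGSMC", ""),
    ("HSNIH-Palmer", ""),
    ("NWU_WKYxF344_F2", "intercross"),
    ("HIV-1Tg", ""),
    ("HRDP_HXB-BXH-BP", ""),
    ("HRDP_HXB-BXH-Metb", ""),
]


def names_by_type(groups, genetic_type):
    """Return the short names of all groups with the given genetic type."""
    return [name for name, gtype in groups if gtype == genetic_type]


print(names_by_type(rat_groups, "intercross"))
```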

## Get genotypes for a group
To get the genotypes of a group:

In [6]:
gnapi.get_geno("BXD").iloc[:10]


Unnamed: 0,Chr,Locus,cM,Mb,BXD1,BXD2,BXD5,BXD6,BXD8,BXD9,...,BXD077xBXD062F1,BXD083xBXD045F1,BXD087xBXD100F1,BXD065bxBXD055F1,BXD102xBXD077F1,BXD102xBXD73bF1,BXD170xBXD172F1,BXD172xBXD197F1,BXD197xBXD009F1,BXD197xBXD170F1
0,1,rsm10000000001,0.0,3.00149,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
1,1,rs31443144,0.11,3.010274,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
2,1,rs6269442,0.21,3.492195,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
3,1,rs32285189,0.32,3.511204,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
4,1,rs258367496,0.43,3.659804,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
5,1,rs32430919,0.53,3.777023,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
6,1,rs36251697,0.64,3.812265,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
7,1,rs30658298,0.75,4.430623,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
8,1,rs31879829,0.85,4.518714,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H
9,1,rs36742481,0.96,4.776319,B,B,D,D,D,B,...,D,H,B,H,H,H,B,H,H,H


Currently, we only support the `.geno` format, which returns a data
frame of genotypes with rows as markers and columns as individuals.
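
To make the layout concrete, here is a rough sketch of reading `.geno`-style text with the standard library. It assumes the file is tab-separated and that lines starting with `#` or `@` are comments or metadata. The two marker rows come from the BXD output above (first three strains only), the header lines in the toy text are invented, and `parse_geno` is a hypothetical helper, not the parser used by this package.

```python
import csv
import io

# Toy .geno-style text: marker rows from the output above, header lines assumed.
GENO_TEXT = (
    "# toy example\n"
    "@name:BXD\n"
    "Chr\tLocus\tcM\tMb\tBXD1\tBXD2\tBXD5\n"
    "1\trsm10000000001\t0.0\t3.00149\tB\tB\tD\n"
    "1\trs31443144\t0.11\t3.010274\tB\tB\tD\n"
)


def parse_geno(text):
    """Parse tab-separated genotype text into a list of dicts, one per marker."""
    # Drop blank lines and the assumed comment (#) and metadata (@) lines.
    data_lines = [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith(("#", "@"))
    ]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    return list(reader)


markers = parse_geno(GENO_TEXT)
print(len(markers), markers[0]["Locus"], markers[0]["BXD5"])
```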

## List datasets for a group

To list the (omic) datasets available for a group, you have to use the
name as listed in the group list for a species:

In [7]:
gnapi.list_datasets("HSNIH-Palmer")

Unnamed: 0,AvgID,CreateTime,DataScale,FullName,Id,Long_Abbreviation,ProbeFreezeId,ShortName,Short_Abbreviation,confidentiality,public
0,24,"Mon, 27 Aug 2018 00:00:00 GMT",log2,HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (A...,860,HSNIH-Rat-Acbc-RSeq-Aug18,347,HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (A...,HSNIH-Rat-Acbc-RSeq-0818,0,1
1,24,"Sun, 26 Aug 2018 00:00:00 GMT",log2,HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18...,861,HSNIH-Rat-IL-RSeq-Aug18,348,HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18...,HSNIH-Rat-IL-RSeq-0818,0,1
2,24,"Sat, 25 Aug 2018 00:00:00 GMT",log2,HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) ...,862,HSNIH-Rat-LHB-RSeq-Aug18,349,HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) ...,HSNIH-Rat-LHB-RSeq-0818,0,1
3,24,"Fri, 24 Aug 2018 00:00:00 GMT",log2,HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) ...,863,HSNIH-Rat-PL-RSeq-Aug18,350,HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) ...,HSNIH-Rat-PL-RSeq-0818,0,1
4,24,"Thu, 23 Aug 2018 00:00:00 GMT",log2,HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug...,864,HSNIH-Rat-VoLo-RSeq-Aug18,351,HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug...,HSNIH-Rat-VoLo-RSeq-0818,0,1
5,24,"Fri, 14 Sep 2018 00:00:00 GMT",log2,HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (A...,868,HSNIH-Rat-Acbc-RSeqlog2-Aug18,347,HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (A...,HSNIH-Rat-Acbc-RSeqlog2-0818,0,0
6,24,"Fri, 14 Sep 2018 00:00:00 GMT",log2,HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18...,869,HSNIH-Rat-IL-RSeqlog2-Aug18,348,HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18...,HSNIH-Rat-IL-RSeqlog2-0818,0,0
7,24,"Fri, 14 Sep 2018 00:00:00 GMT",log2,HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) ...,870,HSNIH-Rat-LHB-RSeqlog2-Aug18,349,HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) ...,HSNIH-Rat-LHB-RSeqlog2-0818,0,0
8,24,"Fri, 14 Sep 2018 00:00:00 GMT",log2,HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) ...,871,HSNIH-Rat-PL-RSeqlog2-Aug18,350,HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) ...,HSNIH-Rat-PL-RSeqlog2-0818,0,0
9,24,"Fri, 14 Sep 2018 00:00:00 GMT",log2,HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug...,872,HSNIH-Rat-VoLo-RSeqlog2-Aug18,351,HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug...,HSNIH-Rat-VoLo-RSeqlog2-0818,0,0
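
The `Short_Abbreviation` column holds the identifier that dataset-level queries expect. As a small sketch, the lookup below maps dataset `Id` to `Short_Abbreviation`, using pairs copied by hand from the first five rows of the output above rather than a live query.

```python
# (Id, Short_Abbreviation) pairs copied from the list_datasets output above.
datasets = [
    (860, "HSNIH-Rat-Acbc-RSeq-0818"),
    (861, "HSNIH-Rat-IL-RSeq-0818"),
    (862, "HSNIH-Rat-LHB-RSeq-0818"),
    (863, "HSNIH-Rat-PL-RSeq-0818"),
    (864, "HSNIH-Rat-VoLo-RSeq-0818"),
]

# Build a lookup table from dataset Id to short abbreviation.
short_by_id = dict(datasets)

print(short_by_id[861])
```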


## Get sample data for a group

This gives you a matrix with rows as individuals/samples/strains and
columns as "clinical" (non-omic) phenotypes.  The number after the
underscore is the phenotype number (to be used later).  Some data may
be missing.

In [8]:
gnapi.get_pheno("HSNIH-Palmer").iloc[81:100]


Unnamed: 0,id,HSR_10001,HSR_10002,HSR_10003,HSR_10004,HSR_10005,HSR_10006,HSR_10007,HSR_10008,HSR_10009,...,HSR_10499,HSR_10500,HSR_10501,HSR_10502,HSR_10503,HSR_10504,HSR_10505,HSR_10506,HSR_10507,HSR_10508
81,00072AAC0D,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
82,00072AC972,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
83,00077E61DC,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
84,00077E61EC,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
85,00077E61F3,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
86,00077E61F5,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
87,00077E6204,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
88,00077E6207,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
89,00077E6299,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
90,00077E62CD,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
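
Since the phenotype number is needed for later queries, it can be handy to pull it out of the column names. A minimal sketch, assuming column names of the form `PREFIX_NUMBER` as in the output above; `trait_number` is a hypothetical helper.

```python
def trait_number(column_name):
    """Return the part of a phenotype column name after the last underscore."""
    _prefix, sep, number = column_name.rpartition("_")
    if not sep or not number.isdigit():
        raise ValueError(f"not a phenotype column: {column_name!r}")
    return number


# A few column names taken from the output above.
columns = ["id", "HSR_10001", "HSR_10002", "HSR_10508"]
print([trait_number(c) for c in columns if c != "id"])
```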


To obtain omic phenotypes, use the `get_omics()` function.  As the
output below shows, it returns a matrix with omic phenotypes as rows
and individuals/samples/strains as columns.  The function takes the
short abbreviation of one of the (omic) datasets available for a group;
to find it, use `list_datasets()` as described in "List datasets for a
group".  For instance, to get the phenotype matrix for "HSNIH-Palmer
Infralimbic Cortex RNA-Seq (Aug18) rlog", use its short abbreviation:

In [9]:
gnapi.get_omics("HSNIH-Rat-IL-RSeq-0818")

Unnamed: 0,"(0, 00071F4FAF)","(1, 00071F6771)","(2, 00071F768E)","(3, 00071F95F9)","(4, 00071FB160)","(5, 00071FB747)","(6, 00072069AD)","(7, 0007207A73)","(8, 0007207BE7)","(9, 00072126F3)",...,"(6161, 0007899914)","(6162, 0007899976)","(6163, 0007929913)","(6164, 0007929918)","(6165, 0007929945)","(6166, 00077E840E)","(6167, 00077E9879)","(6168, 00077E9920)","(6169, 00077E9D84)","(6170, 00077E949D)"
ENSRNOG00000000001,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
ENSRNOG00000000007,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
ENSRNOG00000000008,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
ENSRNOG00000000009,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
ENSRNOG00000000010,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ENSRNOG00000062301,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
ENSRNOG00000062302,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
ENSRNOG00000062303,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x
ENSRNOG00000062304,x,x,x,x,x,x,x,x,x,x,...,x,x,x,x,x,x,x,x,x,x



## Get information about traits

To get information on a particular (non-omic) trait, pass the group
name as `dataset` and the trait number as `trait`:

In [10]:
gnapi.info_dataset(dataset="HSNIH-Palmer", trait="10308")

Unnamed: 0,dataset_type,description,id,name
0,phenotype,"Central nervous system, behavior: Reaction tim...",10308,reaction_time_pint1_5


## Summary information on traits

For each trait in a group, get the maximum LRS and the position (locus, chromosome, Mb) where it occurs.

In [11]:
gnapi.info_pheno(group="HXBBXH").iloc[:10]

Unnamed: 0,Additive,Authors,Chr,Description,Id,LRS,Locus,Mb,Mean,PubMedID,Year
0,0.049997,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",8,Original post publication description: insulin...,10001,16.283131,rsRn10010063,27.969673,0.18364,12016513.0,2002
1,-0.092636,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",14,Original post publication description: insulin...,10002,10.977678,rs63915446,0.439058,0.2814,12016513.0,2002
2,0.60189,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",20,Original post publication description: glucose...,10003,13.651471,rsRn10018260,0.11774,6.34948,12016513.0,2002
3,0.992576,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",8,Original post publication description: glucose...,10004,13.15206,rsRn10010761,109.713893,7.0864,12016513.0,2002
4,0.008542,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",8,Original post publication description: insulin...,10005,18.589521,rsRn10010063,27.969673,0.02916,12016513.0,2002
5,-0.035521,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",8,Original post publication description: insulin...,10006,13.389624,rsRn10010217,48.384095,0.0408,12016513.0,2002
6,0.413279,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",2,Original post publication description: glucose...,10007,10.034807,rsRn10002250,61.977051,6.6456,12016513.0,2002
7,-0.936806,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",3,Original post publication description: glucose...,10008,13.249384,rsRn10004899,146.548904,9.142,12016513.0,2002
8,1.23913,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",7,Original post publication description: glucose...,10009,12.081519,rsRn10009060,28.357297,8.57,12016513.0,2002
9,1.298196,"Pravenec M, Zidek V, Musilova A, Simakova M, K...",7,Original post publication description: glucose...,10010,10.594786,rsRn10009060,28.357297,8.38132,12016513.0,2002
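
If you are more used to LOD scores, an LRS can be converted with LOD = LRS / (2 ln 10), which is roughly LRS / 4.61. The sketch below assumes the `LRS` column is the usual likelihood ratio statistic and applies the conversion to two values from the table above; `lrs_to_lod` is a hypothetical helper.

```python
import math


def lrs_to_lod(lrs):
    """Convert a likelihood ratio statistic to a LOD score: LRS / (2 ln 10)."""
    return lrs / (2 * math.log(10))


# LRS values for traits 10001 and 10005, copied from the table above.
for trait, lrs in [("10001", 16.283131), ("10005", 18.589521)]:
    print(trait, round(lrs_to_lod(lrs), 2))
```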


You can also request a single trait by passing a trait number with a
group name, or a probe name with a dataset name.  In both cases the
name is passed as `group`; `info_pheno()` does not accept a `dataset`
argument.

In [12]:
gnapi.info_pheno(group="BXD", trait="10001")

In [None]:
gnapi.info_pheno(group="HC_M2_0606_P", trait="1436869_at")

## Analysis commands


### GEMMA

- db (required) - Dataset name for the trait below (Short_Abbreviation listed when you query for datasets)
- trait_id (required) - ID for trait being mapped
- use_loco - Whether to use LOCO (leave one chromosome out) method (default = False)
- maf - minor allele frequency (default = 0.01)

In [None]:
name, df = gnapi.run_gemma(db="BXDPublish", trait_id="10015", use_loco=True)
print(name)
print(df.iloc[:10])
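
For orientation, the sketch below shows how the arguments listed above could be encoded as a URL query string. It assumes the REST API takes them as query parameters and selects the mapping method with `method=gemma`; both are assumptions, and `gemma_query` is a hypothetical helper, not the code `run_gemma` uses.

```python
from urllib.parse import urlencode


def gemma_query(db, trait_id, use_loco=False, maf=0.01):
    """Encode GEMMA mapping arguments as a URL query string."""
    params = {
        "db": db,
        "trait_id": trait_id,
        "method": "gemma",  # assumed selector for the mapping method
        "use_loco": str(use_loco).lower(),  # booleans sent as "true"/"false"
        "maf": maf,
    }
    return urlencode(params)


print(gemma_query("BXDPublish", "10015", use_loco=True))
```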

### R/qtl

This function performs a one-dimensional genome scan.  The arguments
are:

- db (required) - Dataset name for trait below (Short_Abbreviation listed
  when you query for datasets)
- trait_id (required) - ID for trait being mapped
- method - Corresponds to the "method" option for the R/qtl scanone function (default = hk; Options: hk, ehk, em, imp, mr, mr-imp, mr-argmax)
- model - Corresponds to the "model" option for the R/qtl scanone function (default = normal; Options: normal, binary, 2-part, np)
- n_perm - number of permutations (default = 0)
- control_marker - Name of marker to use as control; this relies on
  the user knowing the name of the marker they want to use as a
  covariate
- interval_mapping - Whether to use interval mapping (default = False)

In [None]:
gnapi.run_rqtl(db="BXDPublish", trait_id="10015").iloc[:10]
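
Because `method` and `model` only accept the values listed above, it can be useful to check them before submitting a scan. A minimal sketch using the option lists from this section; `check_rqtl_args` is a hypothetical helper and is not part of this package.

```python
# Allowed values, taken from the argument list above.
RQTL_METHODS = {"hk", "ehk", "em", "imp", "mr", "mr-imp", "mr-argmax"}
RQTL_MODELS = {"normal", "binary", "2-part", "np"}


def check_rqtl_args(method="hk", model="normal", n_perm=0):
    """Validate R/qtl scan options and return them as a dict."""
    if method not in RQTL_METHODS:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(RQTL_METHODS)}")
    if model not in RQTL_MODELS:
        raise ValueError(f"unknown model {model!r}; choose from {sorted(RQTL_MODELS)}")
    if n_perm < 0:
        raise ValueError("n_perm must be zero or a positive integer")
    return {"method": method, "model": model, "n_perm": n_perm}


print(check_rqtl_args(method="em", model="binary", n_perm=100))
```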


### Correlation

This function correlates a trait in a dataset against all traits in a
target database.

- db (required) - DB name for the trait below (this is the Short_Abbreviation listed when you query for datasets)
- trait_id (required) - ID for trait used for correlation
- target_db (required) - Target DB name to be correlated against
- tp - Type of correlation (default = sample; Options: sample, tissue)
- method - (default = pearson; Options: pearson, spearman)
- return - Number of results to return (default = 500)

In [None]:
gnapi.run_correlation(
    trait_id="1427571_at", db="HC_M2_0606_P", target_db="BXDPublish"
).iloc[:10]
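
To illustrate the difference between the two `method` options, the sketch below computes Pearson and Spearman correlations by hand with the standard library. The numbers are made-up toy values, not GeneNetwork data, and this is not the code the server runs. Spearman is computed as the Pearson correlation of the ranks, with ties given average ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

For a monotonic but nonlinear relationship like this one, the Spearman correlation is higher than the Pearson correlation, because it depends only on the ordering of the values.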