# More reliable ways of accessing data

## Representational State Transfer - [REST](https://doi.org/10.1145/514183.514185)

REST is a design philosophy for client/server-style software.  It says:

1. Every "Thing" should be designated by ONE globally unique identifier/address
2. Every "Thing" exists in a particular "state"
    * e.g. if a=2, then the state of "a" is "2"
3. There are a limited number of uniform (i.e. globally accepted) functions a client may use to request/query/modify the state of the Thing
    * this means that you are not allowed to invent your own API!!!
4. The client is "stateless" - after performing an operation on a Thing, the client forgets everything it knows, and the Thing also forgets that it was visited by the client.  The client "has no memory" - all memory is in the "state" of the information system itself (i.e. the Web)
    * this means, NO COOKIES!
5. Because the system is stateless, the current state of the "Thing", and all operations on a "Thing" that are valid at a given moment, must be reported by the "Thing's" server, to the client, using "Hypertext" (a way to describe the current "state", and possible actions, in a machine-readable way)
6. The client then selects one of those valid operations and executes it, using one of the limited number of uniform functions (see [3] above).
7. Finally, the client is able to retrieve the "Thing" in one or more "representations" (e.g. retrieve the AB13345 record in GenBank or XML or EMBL or PDF format) using the same identifier in every case, and one of the functions in [3] above.



# REST on the Web

The most common way to implement REST on the Web is to use the Hypertext Transfer Protocol ([HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)) and Uniform Resource Identifiers ([URIs](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier)).


# HTTP

At the core of HTTP are 5 "methods" (there are additional methods that we won't discuss):

* GET - retrieve
* PUT - create/replace
* POST - process/update
* HEAD  - request metadata
* DELETE - remove
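
To make the five methods a little more concrete, here is a minimal sketch using Ruby's standard `net/http` library, which has one request class per HTTP method.  Nothing is sent over the network; the address `http://example.org/thing/1` and the helper `describe_request` are made up for this illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical address, used only for illustration (no request is actually sent)
uri = URI('http://example.org/thing/1')

# One standard-library request class for each of the five methods listed above
REQUEST_CLASSES = [
  Net::HTTP::Get,
  Net::HTTP::Put,
  Net::HTTP::Post,
  Net::HTTP::Head,
  Net::HTTP::Delete
].freeze

# Hypothetical helper: summarise what a request object says about itself
def describe_request(request)
  {
    method: request.method,                          # e.g. "GET"
    sends_body: request.request_body_permitted?,     # does the client send data?
    expects_body: request.response_body_permitted?   # does the client expect data back?
  }
end

REQUEST_CLASSES.each do |klass|
  info = describe_request(klass.new(uri))
  puts "#{info[:method].ljust(6)} sends body: #{info[:sends_body]}  expects body back: #{info[:expects_body]}"
end
```

Notice that HEAD is the only one of the five where no response body is expected: it asks for the metadata only.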


# REST using HTTP

* GET URL - retrieve the current state of URL
* PUT URL [data] - create/replace the current state of URL with [data]
* POST URL [data] - process [data] in manner defined by URL hypertext; update state of URL with that result
* HEAD URL  - request metadata about current state of URL
* DELETE URL - set state of URL to be NULL
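
The list above can be sketched as a toy, in-memory "server" that maps each URL to its current state.  This is only an illustration under simplifying assumptions: the class `ToyResourceStore` is invented for this example, no HTTP is involved, and "processing" in POST is assumed to mean "append the data to a list".

```ruby
# A toy stand-in for a server (hypothetical; not part of any library)
class ToyResourceStore
  def initialize
    @states = {}  # URL => current state of that URL
  end

  # GET URL - retrieve the current state of URL (nil if it has none)
  def get(url)
    @states[url]
  end

  # PUT URL [data] - create/replace the current state of URL with data
  def put(url, data)
    @states[url] = data
  end

  # POST URL [data] - process data in a manner defined by the resource,
  # then update the state with the result (here: append to a list)
  def post(url, data)
    @states[url] = Array(@states[url]) + [data]
  end

  # HEAD URL - metadata about the current state, without the state itself
  def head(url)
    state = @states[url]
    { exists: !state.nil?, type: state.class.name }
  end

  # DELETE URL - set the state of URL to NULL
  def delete(url)
    @states.delete(url)
    nil
  end
end

store = ToyResourceStore.new
url = 'http://example.org/notes/1'   # hypothetical URL

store.put(url, 'first note')
puts store.get(url)
store.post(url, 'second note')
p store.get(url)
p store.head(url)
store.delete(url)
p store.get(url)
```

The important point is that PUT replaces the state with exactly what the client sent, while the result of POST depends on what the resource decides to do with the data.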


# Requesting a specific Representation in REST+HTTP

In REST, everything is represented by a single, globally-unique URI (URL).  How, then, can we get the XML version, or the HTML version, or the PDF version?  On the Web, this is achieved through [Content-negotiation](https://en.wikipedia.org/wiki/Content_negotiation).  The client can, for example through a HEAD request, ask the server what representations are available for a Resource.  The server responds with a set of [Media types](https://en.wikipedia.org/wiki/Media_type) it can provide, using their [MIME-type abbreviation](https://www.iana.org/assignments/media-types/media-types.xhtml).

    e.g.  application/pdf, image/png
    
In this way, the client can say "please give me the current state of URL_XXX in the format 'image/png'".  In HTTP, the client states this preference in the `Accept` header of its request.

For this course, you don't need to understand REST any more deeply than that!
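
For the curious, here is a rough sketch of what a server might do when it receives the client's list of acceptable media types.  Everything in it is an assumption made for illustration: the list `AVAILABLE`, and the helpers `preferred_types` and `negotiate`, are invented, and the parsing is simplified (it understands `q=` preference weights and `*/*`, but not partial wildcards such as `text/*`).

```ruby
# Representations a hypothetical server can produce for one resource
AVAILABLE = ['application/json', 'text/turtle', 'application/pdf'].freeze

# Turn a header value such as "text/turtle;q=0.8, application/json"
# into a list of media types, most preferred first.
def preferred_types(accept_header)
  entries = accept_header.split(',').map do |part|
    media_type, *params = part.strip.split(';').map(&:strip)
    q_param = params.find { |param| param.start_with?('q=') }
    quality = q_param ? q_param.delete_prefix('q=').to_f : 1.0  # no q means "most preferred"
    [media_type, quality]
  end
  # sort_by is not guaranteed to be stable, so the original position breaks ties
  entries.each_with_index
         .sort_by { |(_media_type, quality), index| [-quality, index] }
         .map { |(media_type, _quality), _index| media_type }
end

# Pick the first acceptable media type that the server can actually provide
def negotiate(accept_header, available = AVAILABLE)
  preferred_types(accept_header).each do |type|
    return available.first if type == '*/*'   # the client accepts anything
    return type if available.include?(type)
  end
  nil  # nothing acceptable; a real server would answer "406 Not Acceptable"
end

puts negotiate('image/png, application/json')
puts negotiate('text/turtle;q=0.8, application/json;q=0.9')
p negotiate('image/png')
```

The same URL is used in every case; only the list of acceptable media types changes which representation comes back.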




# REST interfaces in Bioinformatics

First, there are hardly any interfaces in Bioinformatics that follow the rules above.  Even the ones that claim to be REST, almost never are!  **This is not a bad thing**!  REST is actually pretty hard, and most interfaces that claim to be REST are actually quite useful!!  The bad thing is that bioinformatics websites CLAIM to be REST only __because it is cool__, but without understanding how hard it really is.  Since they don't understand, they try to make REST simple, and they break the rules to achieve this... so they are not REST anymore :-)

We are going to use one of these **almost** REST interfaces in this lecture.  The interface we will use is called [TOGO](http://togows.dbcls.jp).

We will go to that website now to explore the interface together, then we will create some Ruby code that accesses the TOGO "REST" interface to retrieve data.
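
Judging from the address used in this lecture, TOGO entry addresses appear to follow the pattern `/entry/<database>/<entry id>/<field>.<format>`.  The sketch below builds such an address from its parts.  Both the pattern and the helper `togows_entry_url` are assumptions made for illustration, so check the TOGO website for the authoritative rules.

```ruby
TOGOWS_BASE = 'http://togows.dbcls.jp'.freeze

# Hypothetical helper: build an address of the (assumed) form
#   BASE/entry/<database>/<entry_id>[/<field>][.<format>]
def togows_entry_url(database, entry_id, field: nil, format: nil)
  # compact drops the field if none was given
  path = ['entry', database, entry_id, field].compact.join('/')
  url = "#{TOGOWS_BASE}/#{path}"
  url += ".#{format}" if format
  url
end

puts togows_entry_url('uniprot', 'AP3_ARATH', field: 'dr', format: 'json')
# prints http://togows.dbcls.jp/entry/uniprot/AP3_ARATH/dr.json
```

Building the address in one place makes it easy to change the database, the entry or the field when you explore the interface.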

In [1]:
require 'rest-client'
require 'json'  # to handle JSON format


address = 'http://togows.dbcls.jp/entry/uniprot/AP3_ARATH/dr.json'

response = RestClient::Request.execute(  #  or you can use the 'fetch' function we created last class
  method: :get,
  url: address)  

# puts response.body

data = JSON.parse(response.body)


#puts data[0]["InterPro"]

# each InterPro cross-reference is a two-element array: [identifier, name]
data[0]["InterPro"].each do |elem|
  # puts elem
  puts "InterPro ID: #{elem[0]}  name: #{elem[1]}"
end


InterPro ID: IPR033896  name: MADS_MEF2-like
InterPro ID: IPR002487  name: TF_Kbox
InterPro ID: IPR002100  name: TF_MADSbox
InterPro ID: IPR036879  name: TF_MADSbox_sf
InterPro ID: IPR036168  name: AP2_Mu_C_sf
InterPro ID: IPR001392  name: Clathrin_mu
InterPro ID: IPR018240  name: Clathrin_mu_CS
InterPro ID: IPR011012  name: Longin-like_dom_sf
InterPro ID: IPR028565  name: MHD
InterPro ID: IPR017105  name: AP3_complex_dsu
InterPro ID: IPR011989  name: ARM-like
InterPro ID: IPR016024  name: ARM-type_fold
InterPro ID: IPR002553  name: Clathrin/coatomer_adapt-like_N
InterPro ID: IPR029390  name: AP3B_C
InterPro ID: IPR026739  name: AP_beta
InterPro ID: IPR011989  name: ARM-like
InterPro ID: IPR016024  name: ARM-type_fold
InterPro ID: IPR002553  name: Clathrin/coatomer_adapt-like_N
InterPro ID: IPR001471  name: AP2/ERF_dom
InterPro ID: IPR036955  name: AP2/ERF_dom_sf
InterPro ID: IPR016177  name: DNA-bd_dom_sf
InterPro ID: IPR044808  name: ERF_plant
InterPro ID: IPR016635  name: AP_complex
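
Notice that some InterPro identifiers appear more than once in this output (e.g. IPR011989).  The sketch below shows one way to remove the duplicates, and to avoid a crash when a database is missing from the response.  It works on a small hand-made sample rather than a live request: the entries are copied from the output above, and the helper `unique_cross_references` is invented for this example.

```ruby
require 'json'

# A small sample with the same shape as the response parsed above:
# an array whose first element maps database names to lists of [id, name] pairs
sample_body = <<~SAMPLE
  [{"InterPro": [["IPR011989", "ARM-like"],
                 ["IPR016024", "ARM-type_fold"],
                 ["IPR011989", "ARM-like"],
                 ["IPR002100", "TF_MADSbox"]]}]
SAMPLE

# Hypothetical helper: the de-duplicated [id, name] pairs for one database
def unique_cross_references(body, database)
  data = JSON.parse(body)
  # dig returns nil (instead of raising an error) if the database is missing
  pairs = data.dig(0, database) || []
  pairs.uniq
end

unique_cross_references(sample_body, 'InterPro').each do |id, name|
  puts "InterPro ID: #{id}  name: #{name}"
end
```

To use it with real data, pass `response.body` from the request above in place of `sample_body`.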


## Below is an example of content-type negotiation

We are going to add a "header" to our request (you will see more of this in a later lecture) telling the server that we want JSON.


In [None]:
require 'rest-client'
require 'json'  # to handle JSON format


address = 'https://doi.org/10.1038/sdata.2016.18'

response = RestClient::Request.execute(  #  or you can use the 'fetch' function we created last class
  method: :get,
  url: address,
#  headers: {accept: 'text/plain'}
  headers: {accept: 'application/json'}  # only one 'headers' line may be active; swap the comment marks to try another format
#  headers: {accept: 'text/turtle'}
)  
puts response.body.force_encoding('UTF-8')  # the encoding instruction is to ensure 'strange' characters are converted into displayable characters

# Prove you understand

* Write the code that prints the journal titles for all of the 'references' of AP3_ARATH

* Write the code that retrieves and prints the Gene Ontology annotations from the UFO protein of _Arabidopsis thaliana_ (look up the UniProt protein ID); limit the output to only those that are "Inferred from Direct Assay" (IDA) or "Inferred from Mutant Phenotype" (IMP), and tell the user what the source was (e.g. IMP:TAIR)
* Pick a problem and a gene/protein/pathway that is interesting for you, and try to solve it using TOGO + Ruby
* COMBINE TWO REST APIs:  
    * Gibberellins are involved in a wide range of Plant metabolic activities.  I want you to use TOGO to find the (KEGG) reaction:  Gibberellin A12 aldehyde <=> Gibberellin A53 aldehyde, then extract the **Map identifier** for that reaction.
    * Use the Map Identifier with the [KEGG REST API](http://www.kegg.jp/kegg/docs/keggapi.html) to retrieve the image of the entire pathway 
        * **You have succeeded when you have [THIS](http://rest.kegg.jp/get/rn00904/image)**