# Package Manager

RubyGems is a package manager: software that makes it easier to find, share, and reuse other people's classes.

The website for RubyGems is:  http://rubygems.org

Browse to that site now...

Search for "xml". We are looking for a package that will give us an easy way to manage XML files  (please tell me if you need me to give you a lecture on XML...)



Found one here:  https://rubygems.org/gems/xml-simple

        
## xml-simple 1.1.5

A simple API for XML processing.


Click on the "documentation" link:  http://www.rubydoc.info/gems/xml-simple/1.1.5


    Class: XmlSimple

    Inherits:    Object
    Includes:  REXML
    Defined in:   lib/xmlsimple.rb


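The "Class" tells you the name of the class you will use in your code.

The "Defined in" tells you the name of the file you need to require: lib/xmlsimple.rb means require 'xmlsimple'.  Note that this is not the same as the name of the gem (xml-simple).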




In [None]:
# so we have...
require 'xmlsimple'

simple = XmlSimple.new




## Look at more of the documentation


### Class Method Summary

    .xml_in(string = nil, options = nil) ⇒ Object

    This is the functional version of the instance method xml_in.
    .xml_out(hash, options = nil) ⇒ Object

    This is the functional version of the instance method xml_out.

### Instance Method Summary

    #initialize(defaults = nil) ⇒ XmlSimple constructor

    Creates and initializes a new XmlSimple object.
    #xml_in(string = nil, options = nil) ⇒ Object

    Converts an XML document in the same way as the Perl module XML::Simple.
    #xml_out(ref, options = nil) ⇒ Object

    Converts a data structure into an XML document.


The documentation tells you that this object is extremely simple - basically, it can do two things:  read XML in, and write XML out.   Interestingly, it also tells you that the object has both Class methods, and Instance methods, and that these methods (xml_in and xml_out) are identical.  

That means that:



In [None]:
require 'xmlsimple'

simple = XmlSimple.new  # create an instance of XmlSimple
data1 = simple.xml_in("<xml>hello1</xml>")  # call the instance xml_in method

# is effectively the same as 

data2 = XmlSimple.xml_in("<xml>hello2</xml>")  # call the class xml_in method

puts data1
puts data2
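
How can one class offer the same method both ways?  Here is a minimal sketch, using a made-up class called Shouter (it is NOT part of xml-simple, and this is not necessarily how XmlSimple does it internally).  The class method creates a temporary instance and passes the call on to the instance method:

```ruby
# Hypothetical class, for illustration only (not part of xml-simple)
class Shouter
  # instance method: does the real work
  def shout(text)
    text.upcase + "!"
  end

  # class method: the "functional version".
  # It creates a temporary instance and delegates to the instance method.
  def self.shout(text)
    new.shout(text)
  end
end

via_instance = Shouter.new.shout("hello")  # call the instance method
via_class    = Shouter.shout("hello")      # call the class method

puts via_instance
puts via_class
```

Both calls give the same result, so the class method is just a shortcut for when you do not need to keep the object around.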




# Let's get some interesting XML data


surf to:  http://rest.ensembl.org  (https://academic.oup.com/bioinformatics/article/31/1/143/2366240)

This is another API into the EnsEMBL database.  Like DB Fetch, it provides predictably structured URLs for access to the data in EnsEMBL (these ones are somewhat "cleaner" than DB Fetch, but DB Fetch can access things that this API cannot)

Scroll down to the "Ontologies and Taxonomy" section.  

Click on "taxonomy/id/:id"

The documentation tells you that this will retrieve the taxonomy information for a given species.  Their examples are human (taxon:9606).  Arabidopsis is taxon:3701.

We want to know the names of all Arabidopsis species.  We will retrieve the taxon information for Arabidopsis and, using our new XmlSimple package, parse the data in XML format:

http://rest.ensembl.org/taxonomy/id/3701?content-type=text/xml



In [None]:
require 'net/http'   # this is how you access the Web
require 'xmlsimple'
#require 'pp'

address = URI('http://rest.ensembl.org/taxonomy/id/3701?content-type=text/xml')  # create a "URI" object (Uniform Resource Identifier: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier)
response = Net::HTTP.get_response(address)  # use the Net::HTTP object "get_response" method
                                               # to call that address
#puts response.body

# http://ruby-doc.org/core-1.9.3/String.html#method-i-gsub 
cleaned_body = response.body.gsub(/<(\/?)(\w+)\s(\w+)>/, '<\1\2\3>')
cleaned_body.gsub!(/<(\/?)(\w+)\s(\w+)\s(\w+)>/, '<\1\2\3\4>')

data = XmlSimple.xml_in(cleaned_body)
data["data"][0]["children"].each do |child|
  puts child["name"]
end
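
The two gsub calls above are easier to understand if you try them on a small string.  This sketch uses a made-up input (an assumption, it is not copied from the real Ensembl response) containing tag names with spaces in them, which a strict XML parser is unlikely to accept:

```ruby
# Made-up sample input: tag names that contain spaces
raw = '<scientific name>Arabidopsis</scientific name><a b c>x</a b c>'

# first pass: join tag names made of two words
# \1 is the optional "/", \2 and \3 are the two words
cleaned = raw.gsub(/<(\/?)(\w+)\s(\w+)>/, '<\1\2\3>')

# second pass: join tag names made of three words
cleaned = cleaned.gsub(/<(\/?)(\w+)\s(\w+)\s(\w+)>/, '<\1\2\3\4>')

puts raw
puts cleaned
```

After cleaning, every opening and closing tag is a single word, so the text can be given to XmlSimple.xml_in.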
  
  



## Prove that you understand by using a different Web resource

Use the Gene Ontology again.  Find the Ruby Gem that handles Gene Ontology (GO) files.

Reading the documentation, you see that it reads GO from a file, so you will need to create that data file.  Jupyter has specific locations for data files (see the documentation here:  http://jupyter.readthedocs.io/en/latest/projects/jupyter-directories.html)

 1. In your code, retrieve the GO Slim Plant Ontology:
http://www.geneontology.org/ontology/subsets/goslim_plant.obo
and write it to a file

    File.open('geneontology.obo', 'w') do |myfile|  # w makes it writable
      myfile.puts geneontologycontent  
    end  


Now read that file into your code.

 2. (~easy) In your code, parse that file, and for a GO identifier (e.g. GO:0006950) print the GO Term name to the screen (e.g. “response to stress”) (there are MANY different solutions for this!  All of them are regular expressions...)

 3. (hard) Add a “GO_Annotation” attribute to your AnnotatedGene Class (array of strings), then:
 
For every gene in the gene_information.tsv file, 
* Retrieve the UniProt Record
* Retrieve the GO annotations (GO_NNNNNN) from the UniProt record
* Retrieve the GO Term name from UniProt, and the Term definition from the goslim_plant.obo file
* Add those annotations to your AnnotatedGene object (think about this…..)
* Create a report like:


<code>Gene_ID: 
Term1 (def)
Term2 (def)
Term3 (def)
</code>
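
If the file handling in step 1 is new to you, here is a minimal sketch of writing some text to a file and reading it back.  The content is a made-up fragment shaped like one entry of an OBO file, and the file name and temporary folder are assumptions chosen so that the sketch does not touch your real data:

```ruby
require 'tmpdir'   # standard library: temporary folders

# Made-up content, shaped like one [Term] entry of an OBO file
content = "[Term]\nid: GO:0006950\nname: response to stress\n"

lines = Dir.mktmpdir do |dir|                   # the folder is deleted after the block
  path = File.join(dir, 'example.obo')          # hypothetical file name
  File.open(path, 'w') do |myfile|              # 'w' creates (or overwrites) the file
    myfile.puts content
  end
  File.readlines(path).map(&:chomp)             # read it back, one String per line
end

puts lines.length
puts lines.first
```

Once the file is in an Array of lines like this, you can loop over the lines and test each one.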

In [34]:
require 'net/http'   # this is how you access the Web
require 'gene_ontology'  # the gem for gene ontology obo files

puts Dir.pwd   # this is how you discover what folder you are in...

def fetch(uri_str)  # this "fetch" routine does some basic error-handling.  

  address = URI(uri_str)  # create a "URI" object (Uniform Resource Identifier: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier)
  response = Net::HTTP.get_response(address)  # use the Net::HTTP object "get_response" method
                                               # to call that address

  case response   # the "case" block allows you to test various conditions... it is like an "if", but cleaner!
    when Net::HTTPSuccess then  # when response is of type Net::HTTPSuccess
      # successful retrieval of web page
      return response  # return that response object
    else
      # note - if you want to learn more about Exceptions, and error-handling
      # read this page:  http://rubylearning.com/satishtalim/ruby_exceptions.html  
      # you could "raise" an Exception here instead, then capture it and do something useful with it!
      warn "Something went wrong... the call to #{uri_str} failed; type #{response.class}"
      return false  # now we are returning false, so the caller can test the result with 'if'
    end 
end
    

    
res = fetch('http://www.geneontology.org/ontology/subsets/goslim_plant.obo')
  
if res  # res is either the response object, or false, so you can test it with 'if'
  body = res.body  # get the "body" of the response
  #puts body
  # Create a new file and write to it  
  File.open('geneontology.obo', 'w') do |myfile|  
    # "puts" adds a final "\n" if the text does not already end with one
    myfile.puts body 
  end  
end

  
go = GeneOntology.new.from_file("geneontology.obo")
#puts go.header # => a GeneOntology::Header object
#puts go.id_to_term # => a hash from GO id to the GeneOntology::Term
  
term = go.id_to_term['GO:0005634']   # GO 5634 is the term for "nucleus"

#puts term.public_methods  # all of the methods that a term can respond to.  Two of these are "id" and "xref"
puts "cross-references for #{term.id} are #{term.xref}"


/home/osboxes/UPM_BioinfoCourse/Lectures
cross-references for GO:0005634 are ["NIF_Subcellular:sao1702920020", "Wikipedia:Cell_nucleus"]
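
The "case" block in the fetch routine picks a branch by looking at the class of the response object.  Here is a minimal sketch of the same idea using built-in classes; the method name "describe" is made up for this illustration:

```ruby
# Hypothetical helper: reports what kind of object it was given
def describe(value)
  case value
  when Integer then "an Integer"   # true when value is an instance of Integer
  when String  then "a String"     # true when value is an instance of String
  else "something else (#{value.class})"
  end
end

puts describe(42)
puts describe("hello")
puts describe(3.14)
```

Each "when" compares using the === method, and for a class this means "is the value an instance of this class (or one of its subclasses)?".  This is how a single "when Net::HTTPSuccess" can match any of the successful response types.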


# learn something new in Ruby --> Blocks

Look at the documentation for the "each" method:

--------------

### Instance Method Details

__permalink #each(&block) ⇒ Object__

starting with that term, traverses upwards in the tree

----------------

So if you call term.each it will go up the Gene Ontology tree until it gets to "root" (depending on the tree, this will be "biological_process", "molecular_function", or "cellular_component")... but what does it do at each step?

This is what a "&block" is.  It gives you the chance to tell the Object what __you want__ it to do!

&block is, therefore, a piece of code that you provide to the method, where the object sends information into your block of code.

For example:



In [51]:
  
go = GeneOntology.new.from_file("geneontology.obo")

term = go.id_to_term['GO:0003676']  # "nucleic acid binding"


# There are two ways to pass a block of code.  You can do it all on one line:

term.each {|thisterm| puts thisterm.name}


puts ""; puts""

# or you can do it on multiple lines as follows
term.each do |thisterm| 
  puts "The term #{thisterm.name} is at level #{thisterm.level} of the ontology"
end



nucleic acid binding
binding
molecular_function


The term nucleic acid binding is at level "Interacting selectively and non-covalently with any nucleic acid." [GOC:jl] of the ontology
The term binding is at level "The selective, non-covalent, often stoichiometric, interaction of a molecule with one or more specific sites on another molecule." [GOC:ceb, GOC:mah, ISBN:0198506732] of the ontology
The term molecular_function is at level "The actions of a single gene product or complex at the molecular level consisting of a single biochemical activity or multiple causally linked biochemical activities. A given gene product may exhibit one or more molecular functions." [GOC:go_curators] of the ontology


[<[1]GO:0005488: binding is_a.size=1>]

## note that this is an example of a class providing its own ".each" method

You already know what ".each" does on list objects...



In [41]:
[1,2,3,4].each do |number|
  puts "number #{number} plus 1 equals #{number + 1}"
end


number 1 plus 1 equals 2
number 2 plus 1 equals 3
number 3 plus 1 equals 4
number 4 plus 1 equals 5


[1, 2, 3, 4]

The author of the GeneOntology object wanted to provide the same kind of functionality, but a .each that simply walked through a list of terms would give you __everything__, not just the parent terms.  So the author implemented their own ".each" method, which takes a block, and then traverses up the ontology tree.  You can see the code in 

    /home/osboxes/.rvm/gems/ruby-2.4.2/gems/gene_ontology-0.0.1/lib/gene_ontology.rb

<code>
    def each(&block)
      block.call(self)
      is_a.each do |term|
        term.each(&block)
      end
    end

</code>

So now their code functions exactly like .each does on a list, but the list is generated by calling other methods (is_a) that traverse up the ontology tree.

That's quite cool!
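
You can try the same trick in your own classes.  This is a minimal sketch with a made-up Node class standing in for an ontology term (it is not the gem's class, and the "parents" attribute is an assumption that plays the role of is_a).  Including Enumerable is an extra that the code shown above does not do: once a class has an "each", Enumerable adds methods like "map" and "select" on top of it.

```ruby
# Hypothetical stand-in for an ontology term; not the gene_ontology gem's class
class Node
  include Enumerable   # builds map, select, etc. on top of our own "each"

  attr_reader :name, :parents

  def initialize(name, parents = [])
    @name = name
    @parents = parents
  end

  # hand this node to the block, then do the same for every parent
  def each(&block)
    block.call(self)
    parents.each do |parent|
      parent.each(&block)
    end
  end
end

root         = Node.new("molecular_function")
binding_term = Node.new("binding", [root])
leaf         = Node.new("nucleic acid binding", [binding_term])

leaf.each { |node| puts node.name }   # walks from the leaf up to the root

names = leaf.map { |node| node.name }  # "map" comes from Enumerable
puts names.inspect
```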

<pre>


</pre>


# Prove you understand

Create code that searches for the term "receptor activity", then reports the GO number, the GO Term name, and the definition (def) for each of its parent terms.

<pre>
  


</pre>
# Documentation

The documentation shown on the rubygems website comes from inside the Gems themselves.  The authors write their documentation in either RDoc format or YARD format.  We are going to look at YARD.

Good documentation is __critical__ if you write code for others to use!  

__NOTE:  I will now start including the quality of documentation in my evaluation of your assignments!!!__


YARD is explained on their website:  https://yardoc.org/guides/index.html

We will begin with a simple Patient.rb class:



In [52]:

class Patient

  attr_accessor :name
  attr_accessor :age 
  
  def initialize (thisname = "Some Person", thisage = 10) 
      @name = thisname 
      @age = thisage
  end
  
end




:initialize

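This class has an "initialize" method (called with Patient.new()), plus the "name" and "age" accessors.  Each attr_accessor creates two methods: a reader (e.g. "name") and a writer (e.g. "name=").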
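
To see those methods in action, here is a minimal sketch using a Patient class like the one above (repeated here so that the example runs on its own).  The patient's name and age are made-up values:

```ruby
class Patient
  attr_accessor :name
  attr_accessor :age

  def initialize(thisname = "Some Person", thisage = 10)
    @name = thisname
    @age = thisage
  end
end

patient = Patient.new("Ada", 36)   # calls initialize
puts patient.name                  # reader created by attr_accessor
patient.age = 37                   # writer created by attr_accessor
puts patient.age

# list the public methods defined by this class itself
methods = Patient.public_instance_methods(false).sort
puts methods.inspect
```

"initialize" does not appear in that list because Ruby makes it private; you reach it through Patient.new.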

We should document the behavior of these.   This is how it looks when we document the object using YARD tags


In [None]:
# == Patient
#
# This is a simple representation of a patient
# with name and age attributes
#
# == Summary
# 
# This can be used to represent aspects of sick people
#

class Patient

  # Get/Set the patient's name
  # @!attribute [rw]
  # @return [String] The name
  attr_accessor :name

  # Get/Set the patient's age
  # @!attribute [rw]
  # @return [Integer] The age
  attr_accessor :age 
  
  # Create a new instance of Patient
  #
  # @param name [String] the name of the patient as a String
  # @param age [Integer] the age of the patient as an Integer
  # @return [Patient] an instance of Patient
  def initialize (name = "Some Person", age = 10) 
      @name = name 
      @age = age
  end
end


# Generating YARD documentation

Open a terminal window.  Browse to /home/osboxes/UPM_BioinfoCourse/Lectures

Edit the Patient.rb file to include the YARD documentation above, then save it.

Now, in your terminal type:

    $  yard doc Patient.rb
    
You get a short report about how many things were documented.  All of the documentation is in a new folder called "doc"

Staying at the command prompt, type:

    $ firefox ./doc/index.html
    
There is your documentation!  



# try it yourself

Explore the documentation for yourself.  Look at the http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md website and try some other tags and markup.  Also look up RDoc, since the YARD documentation tool can also understand RDoc instructions.
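
As a starting point, here is a minimal sketch showing two more YARD tags, @raise and @example, next to the @param and @return tags you have already seen.  The DoseCalculator class and its numbers are made up for this illustration:

```ruby
# Hypothetical helper class, used only to show a few more YARD tags
class DoseCalculator

  # Calculate a dose from the body weight of a patient
  #
  # @param weight_kg [Numeric] body weight in kilograms
  # @param mg_per_kg [Numeric] dose per kilogram of body weight
  # @return [Float] the dose in milligrams
  # @raise [ArgumentError] if the weight is not positive
  # @example
  #   DoseCalculator.new.dose(70, 1.5)  # => 105.0
  def dose(weight_kg, mg_per_kg)
    raise ArgumentError, "weight must be positive" unless weight_kg.positive?
    (weight_kg * mg_per_kg).to_f
  end
end

result = DoseCalculator.new.dose(70, 1.5)
puts result
```

Save it to a file, run "yard doc" on that file as before, and compare how the @raise and @example sections appear in the generated pages.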
    