In [1]:
require 'csv'
require 'benchmark'

true

# [SEQUEL](https://sequel.jeremyevans.net/documentation.html) 

SEQUEL is an Object-Relational Mapping (ORM) library for Ruby. It can connect to various DBMSs (e.g., Postgres, SQLite3, MySQL, and so on).

In [2]:
require 'sequel'

true

In [3]:
PG_DB = Sequel.connect(adapter: 'postgres', 
                    host: 'postgres', 
                    user: 'postgres',
                    password: 'mysecretpassword')

#<Sequel::Postgres::Database: {:adapter=>"postgres", :host=>"postgres", :user=>"postgres", :password=>"mysecretpassword"}>

## [SQLITE](https://www.sqlite.org/index.html) 

SQLITE is a fast, easy-to-use, bare-bones file-based DBMS. It is well suited to testing, small computations, and fast prototyping.

In [4]:
SI_DB = Sequel.sqlite('local.db')

#<Sequel::SQLite::Database: {:adapter=>:sqlite, :database=>"local.db"}>

Many DBMSs offer an in-memory mode, but the combination of SQLITE and SEQUEL makes it extremely simple: calling $Sequel.sqlite$ without a file name opens an in-memory database.

In [5]:
SI_DB_INMEMORY = Sequel.sqlite

#<Sequel::SQLite::Database: {:adapter=>:sqlite}>

### Why in memory databases? 

Let us consider the following example: 

In [6]:
adrs = CSV.read('sequel_material/meddra.csv').to_a.map{|row| row[3]}; nil

In [7]:
def create_adrs_table!(db_connection)
  db_connection.create_table!(:adrs) do
    primary_key :id
    String :name
  end  
end  

:create_adrs_table!

The above code creates a table called $\mathbf{adrs}$ in the database pointed to by $\mathbf{db\_connection}$.

SEQUEL follows this convention for $create\_table$ and some other
"database-altering" methods:
<ul>
     <li>$create\_table!$ forces the creation of a new table (if a table with the same name already exists, it is dropped first);</li>
     <li>$create\_table?$ creates the table if it does not exist; otherwise
     the instruction is skipped;</li>
     <li>$create\_table$ creates the table if it does not exist; 
         if the table already exists, it raises an error.</li>
</ul>

This deviates slightly from the standard Ruby practice for method names. 
In Ruby we have the following standard practice:

In [8]:
h = {a: 1, b:1}

{:a=>1, :b=>1}

<ul>
     <li>$!$ marks methods that MODIFY the current object;</li>
</ul>     

In [9]:
h.merge!({a:3, c:4})
h

{:a=>3, :b=>1, :c=>4}

<ul>
<li>$?$  DENOTES methods that return a boolean value WITHOUT MODIFYING the current object;</li>
</ul>          

In [10]:
h.empty?

false

In [11]:
{}.empty?

true

<ul>     
<li>some methods come in two versions, with and without $!$. 
In this case the version WITHOUT $!$ COPIES the current object,
applies the operation to the copy, and returns the copy;
the original object is NOT MODIFIED. 
</li>
</ul>

In [12]:
h1 = {a: 1, b:1}

{:a=>1, :b=>1}

In [13]:
h2 = h1.merge({a:3, c:4})

{:a=>3, :b=>1, :c=>4}

In [14]:
h1

{:a=>1, :b=>1}

In [15]:
h2

{:a=>3, :b=>1, :c=>4}

WARNING: this practice holds ONLY for methods that have both versions, with and without $!$;
other methods may modify the object because that is part of their semantics.

In [16]:
h3 = {a: 1, b:8}

{:a=>1, :b=>8}

EXAMPLE: the method $delete$ of the Hash class deletes the key from the CURRENT hash and returns the value associated with the deleted key. 

In [17]:
h3.delete(:a) 

1

In [18]:
h3

{:b=>8}
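One more pitfall of the $!$ convention is worth a small sketch, using only core Array methods: some bang methods return nil when there was nothing to change, so their return value should not be chained blindly.

```ruby
# Non-bang version: returns a sorted COPY, the receiver is untouched.
numbers = [3, 1, 2]
sorted  = numbers.sort
p numbers   # still in the original order
p sorted

# Bang version: sorts the receiver in place.
numbers.sort!
p numbers

# Pitfall: uniq! returns nil when no duplicates were removed,
# whereas uniq always returns an array.
no_dups = [1, 2, 3]
p no_dups.uniq
p no_dups.uniq!
```

In practice, call the bang version for its side effect and then keep using the original variable rather than the returned value.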

In [19]:
create_adrs_table!(SI_DB)
create_adrs_table!(SI_DB_INMEMORY)

In [20]:
class Adr < Sequel::Model(SI_DB)
end

We are mapping the class $Adr$ to rows of the $adrs$ table in the $SI\_DB$ database. 
By a convenient naming convention, if no table is specified, SEQUEL maps 
the class to the table named after the PLURAL of the class name. 

In [21]:
class AdrMemory < Sequel::Model(SI_DB_INMEMORY[:adrs])
end  

Alternatively, as in the declaration above, we may specify explicitly the table to map.

<div style="background-color: lightgreen">
Course exercise - SEQUEL 1 <br><br>
class AdrMemory < Sequel::Model(SI_DB_INMEMORY)
<br>
end  
<br>

**Which table will be associated with the above declaration?**
</div>

Now the rows of the table are treated as objects in Ruby: we can access their fields, create them, delete them, and so on.

In [22]:
Adr.create(name: "Study induced Headache")

#<Adr @values={:id=>1, :name=>"Study induced Headache"}>

You may access the object/row by id:

In [25]:
Adr[1]

#<Adr @values={:id=>1, :name=>"Study induced Headache"}>

In [26]:
Adr.create(name: "BDSS Exercises induced Headache")

#<Adr @values={:id=>2, :name=>"BDSS Exercises induced Headache"}>

In [29]:
Adr.all.to_a

[#<Adr @values={:id=>1, :name=>"Study induced Headache"}>]

In [28]:
Adr[2].delete

#<Adr @values={:id=>2, :name=>"BDSS Exercises induced Headache"}>

In [30]:
Adr.all.to_a

[#<Adr @values={:id=>1, :name=>"Study induced Headache"}>]

WARNING: mind the difference between $create$ and $new$ for classes extending Sequel::Model. $create$ creates the object and STORES it in the table; $new$ works as a standard constructor and creates the object
in the current workspace WITHOUT STORING IT in the table. 

NOTICE: you may create the object with $new$ and, after some operations, store it with the method $save$.

In [31]:
adr =  Adr.new(name: "BDSS Exercises induced Headache")

#<Adr @values={:name=>"BDSS Exercises induced Headache"}>

In [32]:
Adr.all.to_a

[#<Adr @values={:id=>1, :name=>"Study induced Headache"}>]

In [33]:
adr.save

#<Adr @values={:id=>3, :name=>"BDSS Exercises induced Headache"}>

In [34]:
Adr.all.to_a

[#<Adr @values={:id=>1, :name=>"Study induced Headache"}>, #<Adr @values={:id=>3, :name=>"BDSS Exercises induced Headache"}>]

Going back to why in-memory databases are useful, let us consider the following example.

How many adrs are we going to load?

In [35]:
adrs.size

23088

In [36]:
time_storage = Benchmark.measure{adrs.each{|adr| Adr.create(name: adr)}}

#<Benchmark::Tms:0x000055a6c3962548 @label="", @real=226.3676983410005, @cstime=0.0, @cutime=0.0, @stime=19.772597, @utime=17.01705, @total=36.789647>

In [37]:
time_memory = Benchmark.measure{adrs.each{|adr| AdrMemory.create(name: adr)}}

#<Benchmark::Tms:0x000055a6c380c2c0 @label="", @real=3.8225148159981472, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=3.8225729999999984, @total=3.8225729999999984>

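An in-memory database lets us access the data much faster for heavy computations while retaining the consistency and the operations of a relational database. In the run above, loading the 23088 rows took about 226 s of wall-clock time on the file-based database and about 3.8 s in memory, roughly a 59-fold speed-up.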
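To read the $Benchmark::Tms$ outputs above: $@real$ is the elapsed wall-clock time, while $@total$ is the CPU time (user plus system). For the file-based run the wall-clock time is far larger than the CPU time, which suggests that most of the time was spent waiting, plausibly on disk writes. The following minimal sketch reproduces that pattern with a task that only waits; the helper name $timing\_summary$ is an assumption made for illustration.

```ruby
require 'benchmark'

# Returns the wall-clock time and the CPU time spent in the given block.
# `timing_summary` is a hypothetical helper name, not part of Benchmark.
def timing_summary
  tms = Benchmark.measure { yield }
  { real: tms.real, cpu: tms.total }
end

# sleep stands in for a task that mostly WAITS (as disk I/O does):
# it consumes wall-clock time but almost no CPU time.
waiting = timing_summary { sleep 0.05 }
puts format('real: %.3f s, cpu: %.3f s', waiting[:real], waiting[:cpu])
```

When comparing storage back ends, $real$ is usually the number that matters, since it is the time the user actually waits.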

## The ORM Layer

In [69]:
def create_reports_table!(db)
  db.create_table!(:reports) do
    primary_key :id
    Date :date
  end  
end  

def create_reported_adrs_table!(db)
  db.create_table!(:reported_adrs) do
    primary_key :id
    Integer :report_id
    Integer :adr_id
  end  
end  

def create_adrs_table!(db)
  db.create_table!(:adrs) do
    primary_key :id
    String :meddra_name
  end
end 

def create_drugs_table!(db)
  db.create_table!(:drugs) do
    primary_key :id
    String :name
    String :atc
  end
end 

def create_treatments_table!(db)
  db.create_table!(:treatments) do
    primary_key :id
    Integer :report_id
    Integer :drug_id
  end
end 

:create_treatments_table!

In [39]:
create_reports_table!(PG_DB)
create_reported_adrs_table!(PG_DB)
create_treatments_table!(PG_DB)
create_adrs_table!(PG_DB)
create_drugs_table!(PG_DB)

In [42]:
Object.send(:remove_const, :Adr) if defined?(Adr)

Adr

In [43]:
class Adr < Sequel::Model(PG_DB)
end 
class Report < Sequel::Model(PG_DB)
  many_to_many :adrs, join_table: :reported_adrs
  #,left_key: :report_id,
  #right_key: :adr_id
end  
class Drug < Sequel::Model(PG_DB)
end   

In [44]:
Adr.create(meddra_name: "Headache")

#<Adr @values={:id=>1, :meddra_name=>"Headache"}>

In [45]:
Report.create(date: Date.today)

#<Report @values={:id=>1, :date=>#<Date: 2020-03-23 ((2458932j,0s,0n),+0s,2299161j)>}>

In [46]:
Report[1].add_adr(Adr[1])

#<Adr @values={:id=>1, :meddra_name=>"Headache"}>

In [50]:
Report[1].adrs

[#<Adr @values={:id=>1, :meddra_name=>"Headache"}>, #<Adr @values={:id=>2, :meddra_name=>"Astenia"}>]

In [48]:
Adr.create(meddra_name: "Astenia")

#<Adr @values={:id=>2, :meddra_name=>"Astenia"}>

In [49]:
Report[1].add_adr(Adr[2])

#<Adr @values={:id=>2, :meddra_name=>"Astenia"}>

WARNING: reloading a class may cause a class mismatch error (for instance, when the same class name is redefined with a different superclass). To avoid restarting everything, use the following method, which removes the ENTIRE class, and then 
 reload the ENTIRE class.

In [None]:
Object.send(:remove_const, :Report)

Ruby is a (very) dynamic language: if you avoid its pitfalls, programming even complex things 
may become very easy and elegant. For instance, you may add (or override) methods of a class dynamically 
in your code.

For example, let us define the "entropy" method for the Ruby Array class.
Given a multiset $M=\langle a_1,\ldots, a_n\rangle$ whose distinct elements are $\{\overline{a}_1, \ldots, \overline{a}_m\}$, we define the 
entropy of $M$ as
$$  Entropy(M) = - \sum_{i=1}^m\left( \frac{|\{\ j: a_j = \overline{a}_i  \}|}{n}\log_2\frac{|\{\ j: a_j = \overline{a}_i  \}|}{n} \right) $$

In [51]:
class Array
  def entropy
    h = {}
    self.each{|item| h[item].nil? ? h[item] = 1 : h[item] += 1 }
    r = 0
    h.each{|_,v| r += (v.to_f/self.size) *  Math.log2(v.to_f/self.size) }
    (-1) * r
  end  
end  

:entropy

In [52]:
[ "a","a", "c", "h"].entropy

1.5
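The same computation can also be sketched as a plain function, without reopening the Array class. This version is an alternative, not the notebook's method: the name $entropy\_of$ is an assumption, and it relies on $Enumerable\#tally$, which requires Ruby 2.7 or later.

```ruby
# Entropy of a collection, computed without modifying the Array class.
# `entropy_of` is a hypothetical helper name; tally needs Ruby >= 2.7.
def entropy_of(items)
  n = items.size.to_f
  # tally maps each distinct element to its number of occurrences
  weighted_logs = items.tally.values.sum do |count|
    relative_frequency = count / n
    relative_frequency * Math.log2(relative_frequency)
  end
  -weighted_logs
end

p entropy_of(%w[a a c h])
```

For the array above the relative frequencies are 1/2, 1/4 and 1/4, so the entropy is 1/2 * 1 + 1/4 * 2 + 1/4 * 2 = 1.5, matching the result of the $entropy$ method.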

WARNING: if you know what you are doing you may redefine even existing methods, but this is not advisable if you don't 
have complete control over the class (e.g., for the Array class, redefining == would be risky).
The methods that the class object itself responds to may be retrieved with the following
method (the methods available on its instances are returned by $Array.instance\_methods$):

In [53]:
Array.methods

[:try_convert, :[], :new, :json_creatable?, :allocate, :superclass, :<=>, :<=, :>=, :==, :===, :autoload?, :autoload, :included_modules, :include?, :name, :ancestors, :attr, :attr_reader, :attr_writer, :attr_accessor, :instance_methods, :public_instance_methods, :protected_instance_methods, :private_instance_methods, :constants, :const_get, :const_set, :const_defined?, :class_variables, :remove_class_variable, :class_variable_get, :class_variable_set, :class_variable_defined?, :public_constant, :freeze, :inspect, :deprecate_constant, :private_constant, :const_missing, :singleton_class?, :prepend, :class_exec, :module_eval, :class_eval, :include, :<, :>, :remove_method, :undef_method, :alias_method, :protected_method_defined?, :module_exec, :method_defined?, :public_method_defined?, :to_s, :public_class_method, :public_instance_method, :define_method, :private_method_defined?, :private_class_method, :instance_method, :to_json, :instance_variable_set, :instance_variable_defined?, :remove
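The list above contains the methods of the Array class object itself (such as $new$ and $try\_convert$), not those of array instances. A short sketch of the difference, using only core reflection methods:

```ruby
# Methods the Array CLASS object responds to (e.g. Array.new, Array.try_convert).
class_level = Array.methods

# Methods defined by Array itself for its INSTANCES (e.g. [1, 2].push(3));
# passing false leaves out methods inherited from Object, Kernel, and so on.
instance_level = Array.instance_methods(false)

p class_level.include?(:try_convert)   # true:  Array.try_convert exists
p instance_level.include?(:push)       # true:  [1, 2].push(3) exists
p class_level.include?(:push)          # false: there is no Array.push
```

Before redefining a method, checking $instance\_methods(false)$ is a quick way to see whether the class already defines it.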

<div style="background-color: lightgreen">
Course exercise - SEQUEL 2 <br><br>
Complete (and test) the following method:
</div>

In [40]:
class Array
  def <(a)
    min_size = [self.size, a.size].min
    (0..min_size-1).each{ |i| 
      if self[i] < a[i] then return true end
      if self[i] > a[i] then return false end
    }
    return self.size < a.size
  end  
end  


:<

In [41]:
a = ['a', 'b', 'c']
b = ['a', 'b', 'c', 'd']
puts (a < b)

true


<div style="background-color: lightgreen">
where $<$ is the lexicographical order between arrays
</div>
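One way to test a solution to this exercise is to compare it against Ruby's built-in $Array\#<=>$, which already compares arrays element by element. The sketch below wraps it in a helper; the name $lex\_less?$ is an assumption introduced only for this check.

```ruby
# Reference check based on the built-in Array#<=>, which returns
# -1, 0 or 1 (or nil when the elements cannot be compared).
# `lex_less?` is a hypothetical helper name used only for this check.
def lex_less?(a, b)
  (a <=> b) == -1
end

p lex_less?(%w[a b c], %w[a b c d])  # true:  a proper prefix comes first
p lex_less?(%w[a c], %w[a b z])      # false: decided at the first differing element
p lex_less?(%w[a b], %w[a b])        # false: equal arrays are not strictly smaller
```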

In [76]:
class Report < Sequel::Model(PG_DB)
  many_to_many :adrs, join_table: :reported_adrs
  many_to_many :drugs, join_table: :treatments
end

#<Sequel::Model::Associations::ManyToManyAssociationReflection Report.many_to_many :drugs, :join_table=>:treatments>

In [57]:
class ReportedAdr < Sequel::Model(PG_DB) 
  many_to_one :report                     
  many_to_one :adr
end

#<Sequel::Model::Associations::ManyToOneAssociationReflection ReportedAdr.many_to_one :adr>

In [58]:
class Adr < Sequel::Model(PG_DB) 
end  

"ReportedAdr" is an example of "camel case" notation, where word boundaries are marked by uppercase letters instead of spaces. By convention SEQUEL
will look, unless specified otherwise, for the table "reported_adrs" in the database.

"many_to_one :report" and "many_to_one :adr" establish entity-relationship associations between tables and generate the 
associated methods without extra code.
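The naming convention can be illustrated with a simplified sketch. This is not SEQUEL's actual implementation: the helper $implicit\_table\_name$ is hypothetical, and it assumes plain camel case names and regular plurals formed by appending "s".

```ruby
# Simplified sketch of the class-name -> table-name convention.
# `implicit_table_name` is a hypothetical helper, NOT SEQUEL's actual code:
# it only handles plain CamelCase names and plurals formed by adding "s".
def implicit_table_name(class_name)
  # insert "_" at each lowercase/digit -> uppercase boundary, then downcase
  snake = class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  "#{snake}s".to_sym
end

p implicit_table_name('ReportedAdr')  # :reported_adrs
p implicit_table_name('Adr')          # :adrs
```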

In [59]:
r = Report.create(date: Date.today)

#<Report @values={:id=>2, :date=>#<Date: 2020-03-23 ((2458932j,0s,0n),+0s,2299161j)>}>

In [60]:
ra = ReportedAdr.create

#<ReportedAdr @values={:id=>3, :report_id=>nil, :adr_id=>nil}>

In [61]:
ra.report = r

#<Report @values={:id=>2, :date=>#<Date: 2020-03-23 ((2458932j,0s,0n),+0s,2299161j)>}>

In [62]:
adr  = Adr.all.sample 

#<Adr @values={:id=>1, :meddra_name=>"Headache"}>

In [63]:
ra.adr = adr

#<Adr @values={:id=>1, :meddra_name=>"Headache"}>

<div style="background-color: lightgreen">
Course exercise - SEQUEL 3 <br><br>
Complete the class declaration according to the schema ADR proposed in the past lectures.
<img src="sequel_material/adr_schema.png"/>
</div>

<div style="background-color: lightgreen">
Course exercise - SEQUEL 4 <br><br>
Rewrite the prr calculation of 
Course Exercises ETL 1, ETL 2, ETL 3 codes using sequel methods.
</div>

In [64]:
Report[1].adrs

[#<Adr @values={:id=>1, :meddra_name=>"Headache"}>, #<Adr @values={:id=>2, :meddra_name=>"Astenia"}>]

In [71]:
class Drug < Sequel::Model(PG_DB)
  many_to_many :reports,  join_table: :treatments
end  

#<Sequel::Model::Associations::ManyToManyAssociationReflection Drug.many_to_many :reports, :join_table=>:treatments>

In [72]:
Drug.create(name: "Paracetamol", atc: "N02BE01")

#<Drug @values={:id=>1, :name=>"Paracetamol", :atc=>"N02BE01"}>

In [77]:
Report[1].drugs

[]

In [78]:
Drug.create(name: "ibuprofen", atc: "M01AE01")

#<Drug @values={:id=>2, :name=>"ibuprofen", :atc=>"M01AE01"}>

In [79]:
Report[1].add_drug(Drug[1])

#<Drug @values={:id=>1, :name=>"Paracetamol", :atc=>"N02BE01"}>

In [81]:
Report[1].drugs

[#<Drug @values={:id=>1, :name=>"Paracetamol", :atc=>"N02BE01"}>]

In [82]:
Drug[2].add_report(Report[1])

#<Report @values={:id=>1, :date=>#<Date: 2020-03-23 ((2458932j,0s,0n),+0s,2299161j)>}>

In [83]:
Report[1].drugs

[#<Drug @values={:id=>1, :name=>"Paracetamol", :atc=>"N02BE01"}>, #<Drug @values={:id=>2, :name=>"ibuprofen", :atc=>"M01AE01"}>]

A Python alternative for Object-Relational Mapping (ORM): <a href="https://www.sqlalchemy.org/">SQLAlchemy</a>