Title: RDBMS & ActiveRecored Dev Meeting
Author: Guy Yogev
Date: March 9, 2016
#Relational DB & ORM
RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, Postgres, sqlite...
SQL stands for Structured Query Language, which is a computer language for storing, manipulating and retrieving data stored in relational database.
A table is the most common and simplest form of data storage in a relational database.
A record, also called a row of data, is each individual entry that exists in a table.
A column is a vertical entity in a table that contains all information associated with a specific field in a table.
id | first_name | last_name | catch_phrase | |
---|---|---|---|---|
1 | johny | bravo | man I'm pretty | <- record/row |
2 | elmaer | fudge | be vewi vewi quiet | |
3 | ... | ... | ... | |
^ column |
Object-Relational Mapping, commonly referred to as its abbreviation ORM, is a technique that connects the rich objects of an application to tables in a relational database management system. Using ORM, the properties and relationships of the objects in an application can be easily stored and retrieved from a database without writing SQL statements directly and with less overall database access code. For example, ActiveRecord (at rails apps) & Ecto (at PhoenixFramework).
Migrations are a convenient way to alter your database over time in a consistent and easy way. They use Ruby code so that you don't have to edit tables by SQL.
Migrations, mighty as they may be, are not the authoritative source for your database schema. That role is in the hands of db/schema.rb, which shows the db current state.
For example, the migration
class CreateStudents < ActiveRecord::Migration
def change
create_table :students do |t|
t.string :first_name
t.string :last_name
t.integer :age
t.belongs_to :house
t.timestamps
end
end
end
will be translated to schema.rb as
create_table "students", force: :cascade do |t|
t.string "first_name"
t.string "last_name"
t.integer "age"
t.integer "house_id"
t.datetime "created_at"
t.datetime "updated_at"
end
Lets review some basic AR code, and its SQL interpretation.
#####Insert The INSERT INTO Statement is used to add new rows of data to a table in the database.
Student.create(first_name: "Johny", last_name: "Bravo", age: 25)
INSERT INTO students (first_name, last_name, age) VALUES ('Johny', 'Bravo', 25)
#####Select SELECT statement is used to fetch the data from a database table which returns data in the form of result table. These result tables are called result-sets.
Student.first
SELECT * FROM students ORDER BY id LIMIT 1
The WHERE clause is used to specify a condition while fetching the data from single table or joining with multiple tables.
Student.where('age < 30')
SELECT * FROM students WHERE age < 30
Student.where("first_name LIKE (?)", "%Har%")
SELECT students.* FROM students WHERE (first_name LIKE ('%Har%'))
You can chain AR queries. AR will analyze the code, and convert it to one or more SQL queries.
Student.where('age < 30').select(:first_name, :last_name)
SELECT first_name, last_name FROM students WHERE age < 30
An SQL JOIN clause is used to combine rows from two or more tables, based on a common field between them.
The most common type of join is INNER JOIN (simple join). An SQL INNER JOIN returns all rows from multiple tables where the join condition is met.
Say we have 2 tables Students:
id | first_name | last_name | catch_phrase | house_id |
---|---|---|---|---|
1 | johny | bravo | man I'm pretty | 1 |
2 | elmaer | fudge | be vewi vewi quiet | 2 |
3 | ... | ... | ... | |
Houses:
id | city | street | number |
---|---|---|---|
1 | NY | Broadway | 1 |
2 | NY | 5th | 4 |
3 | ... | ... | ... |
We want to get a list of [student_id, house_id]
SELECT students.*, houses.*
FROM students
INNER JOIN houses
ON students.house_id = houses.id
Note that we don't really need to pull each student & houses data into memory. We can specify which fields are required for our purpose.
More about that at the best practices
section.
SELECT students.id as student_id , houses.number as houses_number
FROM students
INNER JOIN houses
ON students.house_id = houses.id
Result:
student_id | house_number |
---|---|
1 | 1 |
2 | 4 |
3 | ... |
Relations between tables can be defined using secondary indexes.
For example, at the AR migration below, belongs_to :house
simply adds a house_id column at that student table.
class CreateStudents < ActiveRecord::Migration
def change
create_table :students do |t|
t.string :first_name
t.string :last_name
t.integer :age
t.belongs_to :house
t.timestamps
end
end
end
We also want to be able to add this student-house functionality to the Student Model:
class Student < ActiveRecord::Base
belongs_to :house
def my_house_name
house.name
end
def self.first_student_house_name
Student.first.house.name
end
end
Many-to-many relations require an intimidate tables. So if a Student can take many courses & each course has many students - we need to create such table. Note that at intimidate tables, id column is not mandatory.
class CreateCoursesStudents < ActiveRecord::Migration
def change
create_table :courses_students do |t|
t.belongs_to :student, index: true
t.belongs_to :course, index: true
end
end
end
Now we can add functionality to our models:
class Student < ActiveRecord::Base
belongs_to :house
has_and_belongs_to_many :courses
def first_course
courses.first
end
end
class Course < ActiveRecord::Base
has_and_belongs_to_many :students
def sort_students_by_id
students.sort
end
end
By default tables are created with an primary id
integer column. The value of this this filed is determined by the table sequence
.
This is what makes sure ids are unique and not null, as primary keys requires.
Some DB's allows us to create a costume sequence. Assuming we use Postgres, we want models Potion & Charms that share an unique magical_id, and use that as our primary key.
First we'll define the migrations
class CreateMagicalSequence < ActiveRecord::Migration
def change
execute "CREATE SEQUENCE magical_seq INCREMENT BY 1 START WITH 1000"
end
end
class CreatePotions < ActiveRecord::Migration
def change
execute "CREATE TABLE potions(magical_id INTEGER DEFAULT NEXTVAL('magical_seq'), name VARCHAR(32));"
end
end
class CreateCharms < ActiveRecord::Migration
def change
execute "CREATE TABLE charms(magical_id INTEGER DEFAULT NEXTVAL('magical_seq'), name VARCHAR(32));"
end
end
Then set AR models to use 'magical_id' as primary key.
class Potion < ActiveRecord::Base
self.primary_key = 'magical_id'
end
class Charm < ActiveRecord::Base
self.primary_key = 'magical_id'
end
Now we can run something like
Potion.create(name: 'p1').magical_id
# => 1000
Charm.create(name: 'c1').magical_id
# => 1001
Charm.create(name: 'c1').magical_id
# => 1002
Potion.create(name: 'p2').magical_id
# => 1003
ActiveRecord has a lot of built-in functionality we can use.
Course.ids
Student.count
House.exists?
AR will convert different queries to the same SQL. It's easier to follow when we use the same patterns everywhere.
House.where(name: 'house_name').first # bad
House.find_by_name('house_name') # bad
House.find_by(name: 'house_name') # good
SELECT houses.* FROM houses WHERE houses.name = 'house_name' LIMIT 1
Use arrays at queries.
# bad
s1 = Student.find(1)
s2 = Student.find(5)
s3 = Student.find(23)
# good
Student.where(id: [1, 5, 23])
SELECT students.* FROM students WHERE students.id IN (1, 5, 23)
Use find_or_create
emails = %w(mama@email.com papa.email.com)
# not so good
emails.each do |e|
u = User.find_by_email(e)
unless u User.create(email: e)
end
# better
emails.each {|e| User.find_or_create_by(email: e)}
Each time you insert data to a table there are a few things going on. Creating a connection to the DB is an expensive action. Not only that, chances are the DB is on a remote system somewhere, so we need to think about network overhead too.
AR has some built-in bulk actions such as update_all, destroy_all...
def add
i = Student.count
Student.create(first_name: "fname_#{i}", last_name: "lname_#{i}", age: AGE_RANGE.sample)
end
# VERY BAD!
def bulk_insert(n)
(1..n).each { add }
end
# better...
def bulk_insert(n)
c = Student.count
ActiveRecord::Base.transaction do
(c + 1.. c + n).each { |i| Student.create(first_name: "fname_#{i}", last_name: "lname_#{i}", age: AGE_RANGE.sample) }
end
end
# Even Better!
def bulk_insert(n)
c = Student.count
values = (c + 1..c + n).map { |i| "('fname_#{i}', 'lname_#{i}', #{AGE_RANGE.sample})" }.join(', ')
sql = "INSERT INTO students (first_name, last_name, age) VALUES #{values}"
ActiveRecord::Base.connection.execute(sql)
end
Lets consider this code, what is the time complexity?
students = Student.where("first_name LIKE (?)", "%H%")
students.each {|s| s.house}
Now look at the console logs:
2.3.0 :004 > students = Student.where("first_name LIKE (?)", "%H%")
Student Load (0.5ms) SELECT "students".* FROM "students" WHERE (first_name LIKE ('%H%'))
2.3.0 :005 > students.each {|s| s.house}
House Load (0.1ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 1]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 1]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 3]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 3]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 3]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 1]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 3]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 4]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 2]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 1]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 2]]
House Load (0.0ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" = ? LIMIT 1 [["id", 3]]
Smells Bad aha?
Eager loading is AR mechanism for loading the associated records of the objects returned by Model.find using as few queries as possible.
Lets see what happens when we use the includes method
2.3.0 :006 > students = Student.includes(:house).where("first_name LIKE (?)", "%H%")
Student Load (0.5ms) SELECT "students".* FROM "students" WHERE (first_name LIKE ('%H%'))
House Load (0.2ms) SELECT "houses".* FROM "houses" WHERE "houses"."id" IN (1, 3, 4, 2)
This code will send only 2 queries to the db. AR loaded the needed Houses to memory too. Running the students.each
loop will not access the db.
You can also reduce time complexity by covering pure ruby logic down to db queries. For example say we want to get tuples of {student_first_name, student_last_name, course_name} for each student & his courses.
def students_names_and_courses
Student.all.map do |s|
res = []
s.courses.each do |c| res.push({
first_name: s.first_name,
last_name: s.last_name,
course_name: c.name
})
end
res
end
end
We can do that by a single SQL query
SELECT students.first_name as first_name, students.last_name as last_name, courses.name as course_name
FROM courses_students
JOIN courses ON courses_students.course_id = courses.id
JOIN students ON courses_students.student_id = students .id
Every time we make a query, we get back an ActiveRecord::Relation object. Sometimes those objects can be pretty big. Use select
to make then thin, or even better, use pluck!
Say we wanna do some calculations on our students ages
# bad - keeping unnecessary data in memory
ages = []
students = Student.all
students.each {|s| arr.push(s.age)}
do_some_calcs_on(ages)
# better, but still bad - minimize the object size.
ages = []
Student.all.select(:age)
students.each {|s| arr.push(s.age)}
do_some_calcs_on(s.pluck(ages)
# better - don't keep the object at memory at all!
ages = Student.pluck(:age)
do_some_calcs_on(ages)
You can also use limit
to get the same effect. Say we want to get our 3 oldest students.
# bad - doing the calculations in memory
students = Student.all.sort_by{|s| s.age}.take 3
# => SELECT "students".* FROM "students" ...
# better - let the DB do the work
Student.where('age > 17').order(:age).limit 3
# => SELECT "students".* FROM "students" WHERE (age > 17) ORDER BY "students"."age" ASC LIMIT 3 ...
####Resources