partial update to handbook
alexrodba committed Jul 8, 2021
1 parent 8bcfe12 commit 49b2966
Showing 21 changed files with 251 additions and 184 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -2,9 +2,9 @@

We welcome any suggestion, and we encourage discussions.

This is an opinionated resource. In order to change these guides a vast majority of the MarsBased developers should agree. You can open a PR directly with your suggestions or you can create an issue to discuss and collect opinions before opening it.
This is an opinionated resource. In order to change these guides, a vast majority of the MarsBased developers should agree. You can open a PR directly with your suggestions or you can create an issue to discuss and collect opinions before opening it.

Ask any question about the guides opening an issue.
Ask any questions about the guides by opening an issue.

## Submitting changes

7 changes: 5 additions & 2 deletions README.md
@@ -1,10 +1,12 @@
# The MarsBased Handbook
# The MarsBased Handbook 🚀

Welcome to __MarsBased__! This is the first thing you should read when boarding the __MarsBased__ spaceship.

Here you will find the most important information about the company to help you to get familiar with it. We also compiled some of our guides & interesting stuff you should read (and some trivia & fun stories to make it more digestible).

If you're a visitor and want to use them, feel free to use them anywhere, but we'd appreciate that you linked us back in appreciation.
If you're a visitor and want to use them, feel free to use them anywhere, but we'd appreciate it if you linked back to us.

We have decided to make this handbook publicly available so we can share as much as we can with other companies out there, potential candidates and prospective clients alike. Everyone is welcome!

# Sections

@@ -47,6 +49,7 @@ For now, we have the following resources available:

# Other useful resources

1. <a href="https://marsbased.com" title="MarsBased website" target="_blank">MarsBased website</a>
1. <a href="https://marsbased.com/blog" title="MarsBased blog" target="_blank">MarsBased blog</a>
1. <a href="https://pod.link/1516103872" title="Life on Mars - The MarsBased Podcast, English Edition" target="_blank">Life on Mars - The MarsBased Podcast, English Edition</a>
1. <a href="https://pod.link/1516103872" title="Life on Mars - The MarsBased Podcast, Spanish Edition" target="_blank">Life on Mars - The MarsBased Podcast, Spanish Edition</a>
64 changes: 43 additions & 21 deletions guides/development/activerecord-guide.md
@@ -1,7 +1,7 @@
# ActiveRecord guide

## Retrieving single records
There are three ways to retrieve a single record matching a certain criteria. The method to use depends on whether we want it to raise an exception if not found or if we want to find by the primary key or other attributes.
There are three ways to retrieve a single record matching certain criteria. The method to use depends on whether we want it to raise an exception if not found or if we want to find by the primary key or other attributes.

### Retrieve by primary key
Retrieving a record by primary key is the most common scenario. We generally do that in controller methods and background jobs.
@@ -12,7 +12,7 @@ If we want the retrieval to **raise an exception** we use `find`. Example:
Post.find(params[:id]) # Raises ActiveRecord::RecordNotFound error if the record does not exist
```

We need to be careful when using this form as it can make a request crash. Normally we use this form in controller methods because we want the request to return a 404 Not Found error and in background jobs to retry the job. In any other case we should evaluate if it's really a good idea to raise an exception.
We need to be careful when using this form as it can make a request crash. Normally, we use this form in controller methods because we want the request to return a 404 Not Found error and in background jobs to retry the job. In any other case, we should evaluate if it's really a good idea to raise an exception.

If we want the retrieval to **not raise an exception** we use `find_by`. Example:

@@ -23,10 +23,12 @@ Post.find_by(id: params[:id]) # Returns nil if the record does not exist
This is useful to handle cases when a record may or may not exist.

#### Performance
In any of the forms, since we are making a search by ID the query will always be fast (as long as the index on the primary key has not deliberately been deleted). So the performance of a query like this is always very good.
In any of the forms, since we are searching by ID, the query will always be fast, as long as the index on the primary key has not deliberately been deleted. The performance of this kind of query is always very good.

### Retrieve by other attributes
Retrieving a record by a non-primary key attribute is quite common. We may want to find a post by a slug, author, tag, etc. When more than one record match the condition, only the first one is returned (in general, it will be the most recent record).
Retrieving a record by a non-primary key attribute is quite common.

We may want to find a post by a slug, author, tag, etc. When more than one record matches the condition, only one of them is returned. `find_by` does not apply any ordering, so which record comes back is up to the database; chain an explicit `order` before `find_by` if it matters.

If we want the retrieval to **raise an exception** we use `find_by!`. Example:

@@ -43,11 +45,17 @@ Post.find_by(slug: params[:slug]) # Returns nil if the record does not exist
```

#### Performance
By default a query on a table by a condition needs to scan **all records** on the table in order to filter them. This is OK for small tables like settings-like tables that have the order of 100 records, but it starts to get slower with more than that.

In order to make these queries fast we need to have **an index** on the combination of attributes we want to retrive by or at least some of them. The best way to know if an index works for a query is to `EXPLAIN` the query and check if it uses the index or not.
By default, a query on a table by a condition needs to scan **all records** on the table in order to filter them.

This is OK for small tables, such as settings-like tables with on the order of 100 records, but it gets slower as the table grows by orders of magnitude.

To make these queries fast, we need to have **an index** on the combination of attributes we want to retrieve by or at least some of them.

The best way to know if an index works for a query is to `EXPLAIN` the query and check if it uses the index or not.
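
As a rough mental model only (plain Ruby, not ActiveRecord and not how a real database engine stores data), a lookup without an index resembles a linear scan over an array, while an index resembles a precomputed hash keyed by the attribute. The sample rows and the helper names `scan_by_slug` and `lookup_by_slug` are made up for this sketch; real indexes are usually tree structures rather than hashes.

```ruby
# Hypothetical rows standing in for a small posts table.
posts = [
  { id: 1, slug: "hello-world" },
  { id: 2, slug: "rails-tips" },
  { id: 3, slug: "sql-indexes" }
]

# Without an index: inspect rows one by one until a match is found.
# Returns the matching row and how many rows had to be inspected.
def scan_by_slug(rows, slug)
  inspected = 0
  row = rows.find do |candidate|
    inspected += 1
    candidate[:slug] == slug
  end
  [row, inspected]
end

# With an "index": a hash from slug to row, built once and reused.
slug_index = posts.to_h { |row| [row[:slug], row] }

# Looking up through the index does not walk the rows at all.
def lookup_by_slug(index, slug)
  index[slug]
end

row, inspected = scan_by_slug(posts, "sql-indexes")
puts "scan inspected #{inspected} rows and found post #{row[:id]}"
puts "index lookup found post #{lookup_by_slug(slug_index, 'sql-indexes')[:id]}"
```

The scan cost grows with the number of rows, while the indexed lookup does not, which is the difference `EXPLAIN` lets us confirm on a real query.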

## Retrieving multiple records

When retrieving a list of records from the database the **most important** thing to always take into account is how many records we are requesting. If we ask the database for too many records, the query will be very slow, it will consume a huge amount of memory, and processing the records in Ruby will be slow too.

If we are not careful, **a whole application can be taken down by a single query requesting too much data**.
@@ -58,9 +66,13 @@ To retrieve a list of records from the database we use `where`. Example:
Post.where(category: params[:category]).limit(50)
```

When doing a query like this in the context of a web request we need to always limit the returned results by using `limit`. Usually this is done with a pagination library. But if there is no pagination involved we still need to limit the query to a reasonable number.
When doing a query like this in the context of a web request, we need to always limit the returned results by using `limit`. Usually this is done with a pagination library. If there is no pagination involved, we still need to limit the query to a reasonable number.

When we absolutely need to retrieve a large number of records to process them, we need to use `find_each` and ideally do it outside of a request to avoid having a high response time and a potential (highly probable) timeout.

Usually, we would do it in a background job. The `find_each` method requests the records from the database in batches of 1000 by default (it can be configured with the `batch_size` option, like: `find_each(batch_size: 100)`), so that the database load is kept constant and only a limited number of records are retrieved every time.

When we absolutely need to retrieve a large number of records to treat them we need to use `find_each` and ideally do it outside of a request to avoid having a high response time and a potential (highly probable) timeout. Usually we would do it in a background job. The `find_each` method asks for all the records to the database in batches of 1000 by default (it can be configured with the `batch_size` option, like: `find_each(batch_size: 100)`), so that the database load is kept constant and only a limited number of records are retrieved every time. This also controlls memory as only N number of records are kept in memory at a time.
This also keeps memory usage under control, because only one batch of records is held in memory at a time.
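
To make the batching idea concrete, here is a minimal plain-Ruby sketch of cursor-style batching over an in-memory array. It is only similar in spirit to what `find_each` does (fetch the next batch of rows whose id is greater than the last id seen); the `each_in_batches` helper and the sample rows are hypothetical and not part of Rails.

```ruby
# Hypothetical rows standing in for a table with 10 records.
rows = (1..10).map { |id| { id: id, title: "Post #{id}" } }

# Yields rows in batches, using the last seen id as a cursor. Roughly the
# shape of "WHERE id > last_id ORDER BY id LIMIT batch_size" repeated in a loop.
# Assumes ids are positive integers.
def each_in_batches(rows, batch_size:)
  last_id = 0
  loop do
    batch = rows.select { |row| row[:id] > last_id }
                .sort_by { |row| row[:id] }
                .first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

# Only one batch is handed to the block at a time.
batch_sizes = []
each_in_batches(rows, batch_size: 4) { |batch| batch_sizes << batch.size }
puts batch_sizes.inspect
```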

Example:

@@ -78,11 +90,11 @@ It's crucial to understand when a query is executed by Rails and how we can cont
It's very hard to list all the situations in which this happens, but these are the most common:

* The results of the query need to be printed to stdout (when using a rails console, for example).
* The results of the query need to be shown in the page.
* The results of the query need to be shown on the page.
* An Enumerable method is called on the query to perform some treatment or transformation on the data.
* An aggregation method like `count`, `min` or `max` is called.

The best way to see when a particular expressions gets executed and how is to look at the log and see the exact query that gets sent to the database.
The best way to see when a particular expression gets executed and how is to look at the log and see the exact query that gets sent to the database.
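
The deferred execution described above can be sketched with a toy class in plain Ruby. This is not ActiveRecord's implementation: `ToyRelation` and its `executions` counter are invented for illustration, and unlike a real relation it does not cache loaded records.

```ruby
# A toy model of a lazily executed query.
class ToyRelation
  include Enumerable

  attr_reader :executions

  def initialize(rows, conditions = {})
    @rows = rows
    @conditions = conditions
    @executions = 0
  end

  # Building the query only records the condition; nothing runs yet.
  def where(extra)
    ToyRelation.new(@rows, @conditions.merge(extra))
  end

  # Enumerable methods such as map and to_a all go through each,
  # so this is where the simulated query actually runs.
  def each(&block)
    @executions += 1
    @rows.select { |row| @conditions.all? { |key, value| row[key] == value } }
         .each(&block)
  end
end

rows = [
  { title: "A", category: "ruby" },
  { title: "B", category: "js" }
]

query = ToyRelation.new(rows).where(category: "ruby")
puts query.executions # still zero: the query has only been described
titles = query.map { |row| row[:title] }
puts query.executions # map forced the simulated query to run once
puts titles.inspect
```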

#### Examples

@@ -117,13 +129,15 @@ def index
end
```

This will execute the query in order to do the `map` becauase the `map` method needs to have the values in order to transform them.
This will execute the query in order to do the `map` because the `map` method needs to have the values in order to transform them.

### Performance
The argument is similar to finding a record by a non-primary key attribute. By default a query to retrieve a list of results will need to scan through all the table. These queries can be improved by having indexes on the filtered attributes.

The argument is similar to finding a record by a non-primary key attribute. By default, a query to retrieve a list of results will need to scan through all of the table. These queries can be improved by having indexes on the filtered attributes.

## Joining tables
A common technique used to build more complex queries is to join various tables together through their associations. This allows to filter a table by an attribute of an associated table, order by that attribute, etc.

A common technique used to build more complex queries is to join various tables together through their associations. This allows filtering a table by an attribute of an associated table, ordering by that attribute, etc.

To join tables we use the `joins` method. Example:

Expand All @@ -138,7 +152,11 @@ Post.joins(:author).where(authors: { gender: :female })
Post.joins(:author).order("authors.birth_date DESC")
```

Note that inside the where and order clauses we use `authors` (plural). This is because we need to specify the name of the table instead of the association. This is also relevant when using custom association names in which the name of the association is different than the name of the table. An example:
Note that inside the where and order clauses we use `authors` (plural). This is because we need to specify the name of the table instead of the association.

This is also relevant when using custom association names in which the name of the association is different from the name of the table.

An example:

```
class User; end
@@ -149,7 +167,9 @@ class Post
end
```

Joins can traverse as many tables as we want by nesting hashes inside hashes. For example:
Joins can traverse as many tables as we want by nesting hashes inside hashes.

For example:

```
Post.joins(author: { address: :city }).where(cities: { country: 'ES' }) # Post -> Author -> Address -> City
@@ -158,10 +178,11 @@ Post.joins(author: { address: :city }).where(cities: { country: 'ES' }) # Post -
Often, though, these queries can be simplified by using `has_many through:` associations in the model, which end up producing the same queries.

### How joining works at the database level

At a high level, a join between table A and table B works like this:

1. It creates a new *virtual table* combining all the columns of table A and all the columns of table B.
2. For every record in table A it takes all the records in table B that match the `ON` criteria. By default this matches the primary key of table B with the association foreign key in table A. For each matched record in B it adds a new row to the *virtual table* with the values of the record in A and the values of the record in B.
2. For every record in table A, it takes all the records in table B that match the `ON` criteria. By default, this matches the primary key of table B with the association foreign key in table A. For each matched record in B, it adds a new row to the *virtual table* with the values of the record in A and the values of the record in B.

It's much easier to see with an example:

@@ -214,11 +235,12 @@ authors.id|authors.name|authors.gender|posts.id|posts.title|posts.author_id
```
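
The two steps above can also be sketched in plain Ruby. This is a naive nested loop over in-memory arrays, not what a database engine actually does, and the sample rows and the `inner_join` and `prefixed` helpers are hypothetical.

```ruby
# Hypothetical sample data for two tables.
authors = [
  { id: 1, name: "Ann" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Cat" }
]

posts = [
  { id: 10, title: "First",  author_id: 1 },
  { id: 11, title: "Second", author_id: 1 },
  { id: 12, title: "Third",  author_id: 2 }
]

# Prefixes every column with its table name, as in the virtual table.
def prefixed(row, table)
  row.transform_keys { |column| "#{table}.#{column}" }
end

# For every row in `left`, take the rows in `right` that satisfy the ON
# condition (the block) and emit one combined row per match.
def inner_join(left, right, left_name:, right_name:)
  left.flat_map do |l|
    right.select { |r| yield(l, r) }.map do |r|
      prefixed(l, left_name).merge(prefixed(r, right_name))
    end
  end
end

joined = inner_join(authors, posts, left_name: "authors", right_name: "posts") do |author, post|
  post[:author_id] == author[:id]
end

joined.each { |row| puts row.values.join("|") }
```

In this sketch an author with two posts produces two rows, and an author with no posts produces none.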

### Handling repeated values

A common pitfall when working with joins is forgetting to call distinct to remove duplicate values.

Continuing with the example above, suppose we want to get all authors that have posts in the ruby category. We can write this query: `Author.joins(:posts).where(posts: { category: :ruby })`.

However if we iterate on the results of this query we will find that authors are duplicated **when the author has more than one post in the ruby category**. Specifically every author will appear N times, where N is the number of ruby posts of the author.
However, if we iterate on the results of this query we will find that authors are duplicated **when the author has more than one post in the ruby category**. Specifically, every author will appear N times, where N is the number of ruby posts of the author.

To remove duplicates we need to call `distinct`. Example: `Author.joins(:posts).where(posts: { category: :ruby }).distinct`
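
The effect of `distinct` can be mimicked in plain Ruby with `uniq`. The rows below are hypothetical and stand in for what the joined query might return before deduplication.

```ruby
# One row per matching post, so an author with two ruby posts shows up twice.
joined_authors = [
  { id: 1, name: "Ann" }, # matched through a first ruby post
  { id: 1, name: "Ann" }, # matched through a second ruby post
  { id: 2, name: "Bob" }  # matched through a single ruby post
]

# Plain-Ruby analogue of SQL DISTINCT: keep one copy of each identical row.
distinct_authors = joined_authors.uniq

puts joined_authors.map { |author| author[:name] }.inspect
puts distinct_authors.map { |author| author[:name] }.inspect
```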

@@ -253,7 +275,7 @@ There are times where is not as easy as adding an includes, for example when add
end
```

In this example we want to only load published posts for each author, instead of all posts. In order to remove the N+1 query in this scenario we can define a different association in the model with a scope, like this:
In this example, we want to only load published posts for each author, instead of all posts. In order to remove the N+1 query in this scenario we can define a different association in the model with a scope, like this:

```
class Author < ApplicationRecord
@@ -275,10 +297,10 @@ Note that in the loop we need to use `@author.published_posts`. If we use `@auth

## Aggregation functions

We need to be careful when running queries that include aggregation functions because these are very hard (often impossible) to optimize by the database engine and often require a full scan of the table.
We need to be careful when running queries that include aggregation functions because these are very hard (often impossible) for the database engine to optimise and often require a full scan of the table.

Running an aggregation function, specially a count, on a big table can have a considerable negative impact on the performance of an application.
Running an aggregation function, especially a count, on a big table can have a considerable negative impact on the performance of an application.

Aggregation queries should always include conditions that limit the scope of the results, so that the scan only has to cover the matching records.

Protip: Use `size` instead of `count` unless you are doing a direct count on a table. Using `count` always triggers a query while using `size` is able to use the cached values of a previous query. Example: `Post.published.size` instead of `Post.published.count`.
__Protip:__ Use `size` instead of `count` unless you are doing a direct count on a table. Using `count` always triggers a query while using `size` is able to use the cached values of a previous query. Example: `Post.published.size` instead of `Post.published.count`.