Skip to content

Development Policies

Erik Hetzner edited this page Apr 18, 2018 · 2 revisions

Development Policies

How to use this document

This document is trying to outline the major problems we face as developers on Aperta and provide solutions and guidance to combat those problems. This should be a living document which reflects our team's current understanding of best practices, so feel free to edit. If you feel like you're adding some content that may be contentious, please mark that content by using the text color red: like this. That should hopefully allow people to feel like they can start a discussion about it.

Our Problems

Our code is complex

Use the Faraday HTTP library

We have a number of other HTTP libraries available in Tahi (by historical accident). Please use Faraday for new code, and consider migrating old code you are touching to use Faraday.

It's complicated to fetch things over HTTP using multiple different libraries. Standardizing on one should reduce our code's complexity.

Use the same library everyone else is using

This reduces complexity. Only bring in something new if you have a really good reason


Data pushed over websockets adds complexity

pusher/slanger

  1. Prefer HTTP over pusher for basic functionality if possible
  2. Rely on pusher for updating content between users

Our data are inconsistent

Background: We've lately been plagued by a number of difficult to handle bugs. For example: 1 - having to do with reviewer reports 2 - having to do with attachments 3 - having to do with snapshots storing links to now-missing resource tokens.

Every single one of these bugs - which have resulted in the first case in throwing away user data, for reviewers - have been caused by inconsistent data in the database. This inconsistent data has included missing values, inconsistent de-normalized data, foreign keys to now missing rows, and more. In every case this bad data could have been prevented in the first place by more database validations and constraints. These bugs really seem to be reducing our velocity.

By using constraints and validations to reduce the number of valid combinations of data that are possible, we can both reduce the number of code paths and also reduce the possibility of errors that arise from having missing data. It's much easier to write code if we can assume that every foo has one and only one bar_id and that there exists a Bar that has that id.

Database Migrations

  1. All database migrations should include pre- and post-migration assertions (for example, table/row counts) to validate the data state and ensure that all relevant data has been migrated. I will add a check to our PR template.
  2. Enforce database constraints whenever possible. For example:
    1. If a null value doesn't make sense, we should make the column null: false
    2. If the column is a FK, you should use add_reference
    3. If certain combinations of data should be unique, e.g. there should only be row per version, paper_id combination, you should enforce a unique index

Use transactions

Use transactions to ensure that partial DB changes are rolled back if an error occurs.

Use validations

uniqueness, presence, so forth. Think about the validations we are actually expecting, and put them into code.

Respect referential integrity

If you expect a reference to always point to a data object which exists, create guards which ensure that. Database foreign keys have this built-in, we may have to get more creative when we're using jsonb (maybe this is a good reason to avoid jsonb) or when we're referring to s3 files.


Users don't know when an error occurs and continue doing things that cause more errors

Error Handling

  1. For any type of error (client-side or server-side) that impacts the user interaction with the system, an appropriate error message with a proper action that they can take (e.g. Reload your browser) should be displayed. If it’s not part of the AC, please reach out to your team’s product owner.

Complex Stories

  1. The developer and QA engineer should spend at least 30 min discussing different test cases from the user perspective with regards to the paper state, user role and card transitions.

Race conditions between text input save requests and form submit requests

background: A long time ago, we saved all text input fields in aperta on the "blur" event. That means that a user would type along in a field and nothing would save until they lost focus on that field (either by tabbing or clicking somewhere else on the page). This mostly worked well, but at some point we encountered an issue where we were sometimes failing to save the last text entry that a reviewer made in their reviewer report (APERTA-8730). In the Fullstory sessions, we could see that they were adding an answer, but we had no record in the database to go along with it. As it turns out, those users were encountering a race condition problem when submitting their reviewer report. When they'd click the "submit" button, that would also cause focus to be lost on their last answer which would cause that answer to be saved. That answer save request would be triggered at the exact same time (or at least close to it) as the request to submit the reviewer report. In general, we implement rules that prevent tasks and such from being modified after they are marked "submitted" (or "complete"), and we had one such rule for reviewer reports. So if the "submit" request was processed before the "answer save" request, it would cause the "answer save" request to fail because it was attempting to change something which was immutable due to state. Thus, we were failing to save the users last edited text input.

Save on text inputs on debounced "input" event instead of "blur" or "change" events

Our solution in APERTA-8730 was to trigger save on a debounced "input" event instead of "blur" event. This had the effect of saving after each bout of user typing in an input field rather than only saving when a user clicks out. This doesn't provide a bulletproof guard against encountering the race condition above, but pratically stopped it from happening because most humans tend to pause for at least a few hundred milliseconds between typing and clicking. It also means that we save more often as a user is actually composing a message, which will lessen the probability of lost work if they close their browser window or somehow abruptly end their editing session.

Clone this wiki locally