Avoid duplicate records in poll answers #5539

Open
wants to merge 7 commits into
base: poll_duplicate_voters

Conversation

javierm
Member

@javierm javierm commented May 14, 2024

References

Objectives

Notes

Since the unique index is added on a new column, the migration will run even if there are existing duplicate answers in the database at the time of running it. We're also adding a task to remove existing duplicate answers.
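The migration can run over existing duplicates because a unique index treats NULL values as distinct (this is PostgreSQL's default behaviour), so rows whose new column is still empty never collide. Here is a minimal plain-Ruby sketch of that rule. The `UniqueIndex` class and the indexed columns (`option_id`, `author_id`) are assumptions made for illustration; the real index definition isn't shown here.

```ruby
# In-memory model of a unique index in which NULL (nil) keys never collide.
# Hypothetical helper, not part of the application code.
class UniqueIndex
  class RecordNotUnique < StandardError; end

  def initialize(*columns)
    @columns = columns
    @seen = {}
  end

  def insert(row)
    key = row.values_at(*@columns)
    # Rows with nil in any indexed column are never treated as duplicates.
    return row if key.any?(&:nil?)
    raise RecordNotUnique, "duplicate key #{key.inspect}" if @seen.key?(key)

    @seen[key] = true
    row
  end
end

index = UniqueIndex.new(:option_id, :author_id)

# Legacy duplicates: option_id isn't populated yet, so both rows are accepted.
index.insert(author_id: 1, answer: "Yes", option_id: nil)
index.insert(author_id: 1, answer: "Yes", option_id: nil)

# New records carry an option_id, so a second identical vote is rejected.
index.insert(author_id: 1, answer: "Yes", option_id: 10)
rejected = begin
  index.insert(author_id: 1, answer: "Yes", option_id: 10)
  false
rescue UniqueIndex::RecordNotUnique
  true
end
```

This is also why duplicates remain possible while the option is nil, and why they have to be cleaned up separately.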

Even if all duplicate answers were removed here, there's still one scenario we don't handle in this pull request (nor in #5532): until now, due to race conditions, it was possible to select multiple different options in a single-choice question. We aren't providing a task to delete these records, although we might do so in the future. Note this only applies to single-choice questions; multiple-choice questions shouldn't have this issue because we added a pessimistic lock when we added multiple-choice questions.
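The race in question is a check-then-insert sequence: two requests both see "no answer yet" and both write one. A lock makes the check and the write a single step. The sketch below uses threads and a `Mutex` as a stand-in for a database row lock (in Rails, `with_lock` is one way to take a pessimistic lock); the class and method names are hypothetical and this isn't the application's actual code.

```ruby
# Single-choice question: a voter may hold at most one answer.
class SingleChoiceQuestion
  attr_reader :answers

  def initialize
    @answers = []      # each entry: { author_id:, option_id: }
    @lock = Mutex.new  # stands in for a pessimistic database lock
  end

  # Returns true if the answer was stored, false if the voter already answered.
  def answer(author_id, option_id)
    @lock.synchronize do
      # The check and the write happen while holding the lock, so no other
      # thread can slip in between them.
      if @answers.any? { |a| a[:author_id] == author_id }
        false
      else
        @answers << { author_id: author_id, option_id: option_id }
        true
      end
    end
  end
end

question = SingleChoiceQuestion.new

# The same voter tries to pick three different options concurrently.
threads = [10, 11, 12].map do |option_id|
  Thread.new { question.answer(1, option_id) }
end
results = threads.map(&:value)
```

Whichever thread gets the lock first wins; the other two are turned away, so only one answer is stored.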

Release notes

⚠️ We've added a new option_id field to the poll_answers table; a task that populates this field is part of the release tasks. If you've added custom votation types to your code, you might need to modify this task. Also, this task doesn't handle two edge cases: one where the options for a question were renamed after the poll started (which hasn't been possible since pull request #4904), and an extremely rare one that only applies to multi-language polls and that we're not even sure is possible in practice. In either case, you'll get a warning when running the task, so you can choose what to do about it (you should handle these cases, since future versions will use the option_id column to calculate results and stats). The task doesn't change records that already have an option_id, so it can be run as many times as you'd like. Also note that, if you've got custom code to save records in the poll_answers table, you'll have to make sure that, from now on, that code populates both the answer column and the option_id column (unless you've implemented open answers in your custom code; in that case, you can skip the option_id when dealing with open answers).

@javierm javierm added the Polls and security labels May 14, 2024
@javierm javierm self-assigned this May 14, 2024
@javierm javierm added this to Doing in Consul Democracy May 14, 2024
@javierm javierm force-pushed the add_option_id_to_poll_answers branch 3 times, most recently from 295ddcf to 08bd5ee Compare May 14, 2024 04:16
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from 08bd5ee to f2f9e6e Compare May 14, 2024 17:08
@javierm javierm force-pushed the poll_duplicate_voters branch 2 times, most recently from 386b39d to 845f031 Compare May 14, 2024 17:39
@javierm javierm force-pushed the add_option_id_to_poll_answers branch 3 times, most recently from c6561b3 to 0e15140 Compare May 14, 2024 18:08
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from 0e15140 to 3686f17 Compare May 14, 2024 18:34
@javierm javierm force-pushed the poll_duplicate_voters branch 2 times, most recently from 9a2a393 to ffdb562 Compare May 15, 2024 16:33
@javierm javierm force-pushed the add_option_id_to_poll_answers branch 2 times, most recently from 60014d9 to 9a004e6 Compare May 15, 2024 17:22
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from 9a004e6 to 6424024 Compare May 15, 2024 18:36
@javierm javierm marked this pull request as ready for review May 15, 2024 18:39
@javierm javierm moved this from Doing to Reviewing in Consul Democracy May 15, 2024
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from 6424024 to dab5fef Compare May 15, 2024 18:47
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from 34a4647 to bb61b36 Compare May 15, 2024 22:58
@javierm javierm force-pushed the add_option_id_to_poll_answers branch 2 times, most recently from 4243798 to 3c2920c Compare May 15, 2024 23:26
@javierm javierm force-pushed the add_option_id_to_poll_answers branch 3 times, most recently from 03d7b49 to 111cabd Compare May 16, 2024 15:45
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from 111cabd to c9afa38 Compare May 16, 2024 16:17
@javierm javierm force-pushed the add_option_id_to_poll_answers branch 2 times, most recently from 5860e9b to 98aec84 Compare May 16, 2024 20:52
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from 98aec84 to df1b744 Compare May 17, 2024 20:51
The routes for poll questions were accidentally deleted in commit
5bb831e when deleting the `:show` action, and restored in commit
9871503. However, the deleted code was:

```
resources :questions, only: [:show], controller: 'polls/questions' (...)
```

While the restored code was:

```
resources :questions, controller: 'polls/questions' (...)
```

Meaning we forgot to add the `only: []` option when restoring the
routes.

We also forgot to remove the `before_action` code when deleting the
`:show` action, so we're removing it now.

This code hasn't been used since commits d9ad658 and 4955daf.

It was confusing to have the action to create an answer in
`QuestionsController#answer` while the action to destroy it was
`AnswersController#destroy`.

Until now, we've stored the text of the answer somebody replied to. The
idea was to handle the scenarios where the user votes for an option but
then that option is deleted and restored, or the texts of the options
are accidentally edited and so the option "Yes" is now "No" and vice
versa.

However, since commit 3a6e99c, options can no longer be edited once
the poll starts, so there's no risk of the option changing once somebody
has voted.

This means we can now store the ID of the option that has been voted.
That'll also help us deal with a bug introduced in 673ec07, since
answers in different locales are not counted as the same answer. Note we
aren't dealing with this bug right now.
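To illustrate why the option ID helps with the locale bug, here's a small sketch with made-up data: two voters pick the same option in different languages, so their stored texts differ while the option ID is the same. The hashes and IDs are hypothetical.

```ruby
# Hypothetical answers: voters 1 and 2 chose the same option (id 10),
# but each one answered in a different locale.
answers = [
  { author_id: 1, answer: "Yes", option_id: 10 },
  { author_id: 2, answer: "Sí",  option_id: 10 },
  { author_id: 3, answer: "No",  option_id: 11 }
]

# Counting by text splits the votes for option 10 into two buckets.
by_text = answers.map { |a| a[:answer] }.tally

# Counting by option ID puts them together.
by_option = answers.map { |a| a[:option_id] }.tally
```

Counting by text gives three separate buckets with one vote each, while counting by option ID gives two votes for option 10 and one for option 11.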

We're still keeping (and storing) the answer as well. There are two
reasons for that.

First, we might add an "open answer" type of question in the future and
use this column for it.

Second, we've still got logic depending on the answer, and we need to be
careful when changing it because there are existing installations where
the answer is present but the option_id is not.

Note that we're using `dependent: nullify`. The reasoning is that, since
we're storing both the option_id and the answer text, we can still use
the answer text when removing the option. In practice, this won't matter
much, though, since we've got a validation rule that makes it impossible
to destroy options once the poll has started.

Also note we're still allowing duplicate records when the option is nil.
We need to do that until we've removed every duplicate record in the
database.

Note: to avoid confusion, "answer" will mean a row in the poll_answers
table and "choice" will mean whatever is in the "answer" column of that
table (I'm applying the same convention in the code of the task).

In order to make this task perform reasonably on installations with
millions of votes, we're using `update_all` to update all the answers
with the same choice at once. In order to do that, we first need to
check the existing choices and which option_ids are possible for
those choices.
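The approach above can be sketched in plain Ruby with in-memory data. This is only an illustration under assumptions: the hashes stand in for database rows, `populate_option_ids` is a hypothetical name, and the inner loop stands in for the single `update_all` call the real task would issue for each choice.

```ruby
# Hypothetical data; in the real task these would be database rows.
options = [
  { id: 10, question_id: 1, titles: ["Yes", "Sí"] },
  { id: 11, question_id: 1, titles: ["No"] },
  { id: 20, question_id: 2, titles: ["Maybe"] },
  { id: 21, question_id: 2, titles: ["Maybe"] } # same text twice: ambiguous
]
answers = [
  { id: 1, question_id: 1, answer: "Yes",     option_id: nil },
  { id: 2, question_id: 1, answer: "Sí",      option_id: nil },
  { id: 3, question_id: 1, answer: "No",      option_id: 11 }, # already set
  { id: 4, question_id: 2, answer: "Maybe",   option_id: nil },
  { id: 5, question_id: 1, answer: "Renamed", option_id: nil }
]

# Fills in option_id where exactly one option matches the choice and
# returns a list of warnings for the choices it couldn't resolve.
def populate_option_ids(answers, options)
  warnings = []
  # Records that already have an option_id are left alone.
  pending = answers.select { |a| a[:option_id].nil? }

  # Work per distinct (question, choice) pair instead of per answer.
  pending.group_by { |a| [a[:question_id], a[:answer]] }.each do |(question_id, choice), rows|
    candidates = options
      .select { |o| o[:question_id] == question_id && o[:titles].include?(choice) }
      .map { |o| o[:id] }

    if candidates.size == 1
      # Stands in for one bulk UPDATE over all the rows with this choice.
      rows.each { |row| row[:option_id] = candidates.first }
    else
      warnings << "Question #{question_id}: #{candidates.size} options match #{choice.inspect}"
    end
  end

  warnings
end

warnings = populate_option_ids(answers, options)
```

In this sketch, answers 1 and 2 get option 10, answer 3 is untouched, and answers 4 and 5 produce a warning each (two matching options and none, respectively). Running it again changes nothing, since populated records are skipped.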

Note that, in order for this task to work, we need to remove the
duplicate answers first. Otherwise, we will run into a RecordNotUnique
exception when trying to add the same option_id to two duplicate
answers.

So we're making this task depend on the one that removes duplicate
answers. That means we no longer need to specify the task to remove
duplicate answers in the release tasks; it will automatically be
executed when running the task to add an option_id.
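Here's a minimal sketch of that dependency using Rake's prerequisite syntax. The task names, the in-memory `ANSWERS` rows and the definition of "duplicate" (same question, author and choice) are assumptions made for illustration; the real tasks aren't shown here.

```ruby
require "rake"

# Hypothetical in-memory rows standing in for the poll_answers table.
ANSWERS = [
  { id: 1, question_id: 1, author_id: 7, answer: "Yes" },
  { id: 2, question_id: 1, author_id: 7, answer: "Yes" }, # duplicate of 1
  { id: 3, question_id: 1, author_id: 8, answer: "No" }
]
RUN_ORDER = []

namespace :polls do
  task :remove_duplicate_answers do
    RUN_ORDER << :remove_duplicate_answers
    # Keep the first row for each (question, author, choice) combination.
    ANSWERS.uniq! { |a| a.values_at(:question_id, :author_id, :answer) }
  end

  # Declaring a prerequisite makes Rake run it before this task's body.
  task populate_option_id: :remove_duplicate_answers do
    RUN_ORDER << :populate_option_id
  end
end

# Invoking only the second task also runs the first one.
Rake::Task["polls:populate_option_id"].invoke
```

With this setup, only the task that adds the option_id needs to be listed in the release tasks.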
@javierm javierm force-pushed the poll_duplicate_voters branch 2 times, most recently from 63b25c4 to be02718 Compare May 17, 2024 21:18
@javierm javierm force-pushed the add_option_id_to_poll_answers branch from df1b744 to 091b247 Compare May 17, 2024 21:20
Labels
Bug, Polls, Release notes, security
Projects
Consul Democracy
  
Reviewing