Update requeue task to requeue by document type #1009
Conversation
Seems fine to me, just had one comment.
Have you tried the task on integration?
lib/tasks/queue.rake (outdated diff)
desc "Add published editions to the message queue by document type"
task :requeue_document_type, [:document_type] => :environment do |_, args|
  document_type = args[:document_type]
  raise ValueError("expecting document_type") if document_type.nil?
I would do `unless document_type.present?` because it catches empty strings too.
I've only tried it on dev so far, but will test the branch on integration before merging, as I also want to add in a new queue on the consumer side.
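The suggested validation could look like the sketch below. It is plain Ruby rather than the Rails codebase's actual task: `present?` comes from ActiveSupport, so this stand-in uses `to_s.strip.empty?` instead. Note also that `ValueError` (as in the original diff) is not a Ruby class; `ArgumentError` is the idiomatic choice for a bad task argument.

```ruby
# Sketch only: validate_document_type! is a hypothetical helper, not code
# from the PR. It rejects nil and empty/whitespace-only strings, which is
# what the reviewer's `unless document_type.present?` suggestion achieves.
def validate_document_type!(document_type)
  if document_type.to_s.strip.empty?
    raise ArgumentError, "expecting document_type"
  end

  document_type
end
```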
(force-pushed 0c84230 to 5a7d087)
lib/requeue_content.rb (outdated diff)
queue_payload = Presenters::EditionPresenter.new(
  edition, draft: false,
).for_message_queue(version)

attr_reader :scope
This can be one line: `attr_reader :scope, :version`.
@thomasleese I think there might be another problem, which is that when binding queues to the exchange, we can't easily distinguish between a two word routing key for these bulk messages and the existing routing keys. I'd like to route these messages to a separate transient queue, rather than having our existing queue handle them, and this would get in the way of that. My suggestion is to change the routing key to 3 words instead of 2: `schema_name.bulk.reindex`.
(force-pushed 5a7d087 to f10635e)
@MatMoore I would say there's nothing wrong with a three word routing key for the moment. In the future we're planning to do some clean-up on the naming of the routing keys so they're more consistent (https://trello.com/c/OqyPl7pc/534-stop-messagequeueeventtype-from-using-update-type) - do you think there is anything we should be aware of now in preparation for that? Perhaps when we get round to that, the puppet module will support multiple bindings per queue.
@thomasleese Cool, I'll proceed with the 3 word key then for now. Looking at the proposal in the card though - should I be distinguishing between the different event types here?

A couple of thoughts on the proposed renames: Rummager now needs to subscribe to the whole set of event types. So if we have a consistent key structure, a single wildcard binding can match them all. Multiple bindings would also do the job, if supported, but wildcard matching is simpler and we can definitely do it.
@MatMoore I would go for the three word key. I'll copy your notes into the card on renaming the routing keys as that's useful information to have, thanks!
This task was added when the Finding Things team migrated all the links to the publishing api, to sync everything back to Rummager. We now need something similar to synchronise all existing content (not just links), so that Rummager can be a direct downstream consumer of publishing api. This includes a few implementation changes:

1. Use non-persistent messages, because we want requeues to be as fast and low overhead as possible. Requeued messages are less important than normal publishing messages and should be abandoned if RabbitMQ goes down.
2. Use a separate routing key - `*.bulk.reindex` - to indicate the intent (see removed comment). This allows us to define a separate non-durable queue for these messages in Rummager; currently the `rummager_govuk_index` queue receives everything and is set up as `durable` to survive server restarts.
3. Filter on `content_store` instead of `state`, because we care about unpublished (especially withdrawn) editions as well as published ones.
4. Filter by document type for now, as we don't need the whole lot yet. We may add a 'requeue everything' task back in later.
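Points 1 and 2 above could be sketched as follows. This is a hypothetical helper, not the actual code in `lib/requeue_content.rb`: it just shows the publish options a bulk requeue would use, with the three-word routing key and the non-persistent delivery mode.

```ruby
# Hypothetical helper illustrating points 1 and 2 of the description:
# requeued messages get a distinct "<schema_name>.bulk.reindex" routing key
# and are published as non-persistent (transient), so they are dropped
# rather than recovered if RabbitMQ goes down.
def bulk_reindex_message_options(schema_name)
  {
    routing_key: "#{schema_name}.bulk.reindex",
    persistent: false, # fast and low overhead; safe to abandon on restart
  }
end
```

With Bunny, these options map directly onto `exchange.publish(payload, routing_key: ..., persistent: false)`.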
(force-pushed f10635e to acb9f16)
@thomasleese Cool, I've made that change.
# because we don't want to send additional email alerts to users.
service.send_message(
  queue_payload,
  routing_key: "#{edition.schema_name}.bulk.reindex",
Do you know if it's possible to do `event_type: "bulk.reindex"` here? AFAIK it's just string interpolation.
I guess so but I thought this was less confusing as you can see what the key looks like all in one place.
Yeah, I see what you mean. It's annoying having to include the duplication of `edition.schema_name` though. It's up to you - I'll approve the PR since it's not major.
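If the queue service does build the key by simple string interpolation, as suggested above, the two call styles would be equivalent. The sketch below assumes that behaviour; `build_routing_key` is a hypothetical stand-in for whatever the service does internally, not its real API.

```ruby
# Hypothetical stand-in for the queue service's key construction, assuming
# it interpolates "#{schema_name}.#{event_type}" as the comment suggests.
def build_routing_key(schema_name, event_type)
  "#{schema_name}.#{event_type}"
end

# Under that assumption, passing event_type: "bulk.reindex" yields the same
# key as spelling out "#{edition.schema_name}.bulk.reindex" at the call site.
```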
In RabbitMQ binding keys the wildcard `#` matches any number of dot separated words, whereas `*.*` ensures it's two words. This allows us to keep matching `schema.major`, `schema.minor`, `schema.links`, `schema.unpublish`, while not matching reindexing messages (alphagov/publishing-api#1009). I'm going to set up a separate queue to handle these messages.

Note that the puppet module for rabbitmq is not smart enough to update bindings for an existing queue when the binding key changes, so this will need to be manually changed after deployment.
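The `#` versus `*.*` distinction can be demonstrated with a toy matcher. This is not RabbitMQ itself, just a sketch of AMQP topic-binding semantics: `*` matches exactly one dot-separated word and `#` matches zero or more, which is why a `*.*` binding stops matching the three-word bulk keys while `#` matches everything.

```ruby
# Toy AMQP topic matcher (illustrative only). It compiles a binding key
# into a regex: "*" becomes one word, "#" becomes zero or more words.
# Mixed patterns like "#.x" are handled loosely by real RabbitMQ; this
# sketch only aims to get the common cases right.
def topic_match?(binding_key, routing_key)
  pattern = binding_key.split(".").map { |part|
    case part
    when "#" then "(?:[^.]+(?:\\.[^.]+)*)?" # zero or more words
    when "*" then "[^.]+"                   # exactly one word
    else Regexp.escape(part)
    end
  }.join("\\.")
  routing_key.match?(/\A#{pattern}\z/)
end
```

So a queue bound with `*.*` keeps receiving `schema.major` style messages but ignores `schema.bulk.reindex`, while a dedicated queue can bind `*.bulk.reindex` to receive only the reindexing traffic.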
Part of https://trello.com/c/5d4Qkt31/244-bulk-requeue-content-by-format / https://trello.com/c/VQsLP52d/246-create-rake-task-to-bulk-republish-by-format