## Problem

An event is created for every request that comes into the Publishing API.
These events are used as a form of Event Sourcing to track the application state
as a series of events over time, and can be used to reconstruct past states.
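
As a rough illustration, an event row might look something like the sketch
below; the column names here are assumptions based on the description above,
not taken from the actual schema.

```ruby
# A hypothetical shape for an event record, for illustration only;
# the real schema may differ.
class Event < ActiveRecord::Base
  # t.string   :action     # the command that was executed, e.g. "PutContent"
  # t.json     :payload    # the full request payload
  # t.datetime :created_at # when the request was received
end
```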

One problem is the sheer number of events that go through the Publishing API:
the events table is growing at roughly 600MB a month and currently sits at 6GB
(4.2 million events). This has implications for replication, and for developers
copying that data to their local environments.

The other problem is that the event information starts losing its effectiveness
over time: as code, schemas and API versions change, the ability to replay the
events from scratch is lost.

## Proposal

The event information is useful for replaying events and for debugging, so we
would like to archive the events in the events table while retaining the
ability to:

1. Easily search through events for debugging
2. Retrieve events to replay in the case of failure
3. Store future events

------------------------------------------------------------------------------------------------------------------
Option A:
Start storing events in Elasticsearch.

This would mean that we no longer store events in the Publishing API; instead,
every time an event occurs, we asynchronously store it in Elasticsearch. We
would need to decouple events from commands, which currently store the event in
the same transaction. This has further implications for sending versioned data
downstream to the content store, which is currently managed by the event id.
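
As a minimal sketch, assuming a Sidekiq worker and the elasticsearch-ruby
client (the worker and index names are illustrative, not existing code),
asynchronous storage might look like:

```ruby
require "elasticsearch"
require "sidekiq"
require "time"

# Hypothetical worker that indexes an event into Elasticsearch outside
# the command's database transaction.
class EventIndexWorker
  include Sidekiq::Worker

  def perform(event_id, action, payload)
    client = Elasticsearch::Client.new(url: ENV["ELASTICSEARCH_URL"])
    client.index(
      index: "publishing-api-events",
      type: "event",
      id: event_id,
      body: { action: action, payload: payload, created_at: Time.now.utc.iso8601 }
    )
  end
end

# After a command succeeds:
#   EventIndexWorker.perform_async(event_id, "PutContent", request_params)
```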

pros:

- Easy to store and retrieve events; the rich query language makes this useful for debugging
- No development time needed to build an interface (Kibana)
- Already part of our infrastructure
- Easily retrieve events to replay

cons:

- Reliability: the index could be accidentally deleted
- Effectively becomes the primary data source of events
- Might also need archiving as a backup

------------------------------------------------------------------------------------------------------------------
Option B:
Log payload params directly to Logstash

This would mean simply logging the request params in the application log,
and would lose any concept of an event.
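
A minimal sketch, assuming the application's JSON log lines are already
shipped to Logstash; the helper and field names are illustrative:

```ruby
require "json"
require "time"

# Log the request params as a structured JSON line for Logstash to index.
def log_event(action, payload)
  Rails.logger.info({
    "@timestamp" => Time.now.utc.iso8601,
    "action"     => action,
    "payload"    => payload
  }.to_json)
end
```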

pros:

- Very little work initially
- No additional Elasticsearch cluster to maintain

cons:

- Difficult to backdate events
- More difficult to replay events
- Not as easy to query, since the events will be interspersed with other log data

------------------------------------------------------------------------------------------------------------------
Option C:
Another Postgres DB in the Publishing API

This would work by establishing a different connection in the Events model
([ActiveRecord::Base](http://api.rubyonrails.org/classes/ActiveRecord/Base.html)).
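
A sketch of what this might look like, assuming a second database entry
(named `events` here for illustration) in `config/database.yml`:

```ruby
# Point the Event model at a separate Postgres database.
class Event < ActiveRecord::Base
  # Looks up the "events" entry in config/database.yml:
  #
  #   events:
  #     adapter:  postgresql
  #     database: publishing_api_events
  establish_connection :events
end
```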

pros:

- Same infrastructure as the current Publishing API
- Little to change

cons:

- Doesn't solve the problem of data replication
- Increases complexity

------------------------------------------------------------------------------------------------------------------
Option D:
Archive events to S3 and carry on as normal.

My preference would be for option D, since the information is not needed on a
day-to-day basis and it should be the easiest option to implement.
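
A minimal sketch of what archiving might look like, assuming the aws-sdk S3
client; the function, bucket name and cut-off are illustrative assumptions:

```ruby
require "aws-sdk-s3"
require "json"
require "zlib"

# Archive events older than the cut-off to S3 as gzipped JSON lines,
# then remove them from the events table.
def archive_events_before(cutoff)
  events = Event.where("created_at < ?", cutoff)
  body = Zlib.gzip(events.map { |e| e.attributes.to_json }.join("\n"))

  Aws::S3::Client.new.put_object(
    bucket: "publishing-api-archived-events", # hypothetical bucket
    key: "events-before-#{cutoff.to_date}.ndjson.gz",
    body: body
  )

  events.delete_all
end
```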
