This repository has been archived by the owner on Nov 20, 2022. It is now read-only.

Fix data coming from DB integration #9

Closed
Akii opened this issue Nov 5, 2018 · 3 comments
Assignees: Akii
Labels: enhancement, integration (Task related to a disruption source integration)
Projects: Iteration 3

Comments

Akii commented Nov 5, 2018

Basically, the DB integration does something like this:

  • report everything is broken
  • report nothing is broken

This can be fixed by chunking the stream up by seconds and comparing each chunk to the one before it. If the difference is too large, the chunk can be ignored altogether.
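A rough sketch of that chunk-comparison idea in plain Haskell (the names, the size-based heuristic, and the ratio threshold are all assumptions for illustration, not the repo's actual code):

```haskell
-- Hypothetical: compare each chunk's size to the previously accepted
-- chunk's size and drop chunks that differ too much (e.g. the
-- "everything is broken" / "nothing is broken" bursts).
filterChunks :: Double -> [[a]] -> [[a]]
filterChunks maxRatio = go Nothing
  where
    go _ [] = []
    go Nothing (c : cs) = c : go (Just (length c)) cs
    go (Just prev) (c : cs)
      | tooDifferent prev (length c) = go (Just prev) cs  -- ignore outlier chunk
      | otherwise                    = c : go (Just (length c)) cs
    tooDifferent p n =
      let p' = fromIntegral (max 1 p) :: Double
          n' = fromIntegral (max 1 n)
      in n' / p' > maxRatio || p' / n' > maxRatio
```

A real implementation would compare chunk contents rather than only their sizes, but the shape of the filter is the same.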

For this to work, the replaying and persistence of events must move into the source such that this becomes possible:

  P.each disruptionEvents >-> chunkP >
                                      |
clientP >-> monitorP >-> storeEventP >+> monitorP' >-> P.concat

This also enables the use of appendMany, which I assume is faster.

As a follow-up, Source m a can then become a Functor and a Monoid, which simplifies the handling later on.
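One way Source m a could be shaped so that Functor and Monoid instances fall out naturally, as a hypothetical sketch (the actual type in this repo may differ):

```haskell
-- Hypothetical shape: a source is an effectful producer of events.
newtype Source m a = Source { runSource :: m [a] }

instance Functor m => Functor (Source m) where
  fmap f (Source ma) = Source (fmap (map f) ma)

-- Combining two sources concatenates their event streams.
instance Applicative m => Semigroup (Source m a) where
  Source x <> Source y = Source ((++) <$> x <*> y)

instance Applicative m => Monoid (Source m a) where
  mempty = Source (pure [])
```

With instances like these, several integrations can be transformed and merged with plain `fmap` and `<>` instead of ad-hoc plumbing.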

@Akii Akii added enhancement integration Task related to a disruption source integration labels Nov 5, 2018
@Akii Akii added this to In progress in Iteration 3 Nov 5, 2018
@Akii Akii self-assigned this Nov 5, 2018
Akii commented Nov 10, 2018

First step is to detect when the API itself is disrupted. To do this, it's possible to compare the number of disrupted facilities and look for spikes: if there are usually about 180 disruptions and suddenly there are 2,200, it's a downtime; similarly when the count suddenly drops from 180 to 0.

Once a disruption is detected, monitoring must continue until the disruption is resolved (disruptions are "around" 180 again). In this phase all events are filtered out and kept inside the monitor for later reference.

Then the last known good state and the current state must be compared. Events that occurred in between will then be released based on the model below. It's currently not intended to generate new events on the fly, so the actual events that have been emitted must be used. If that turns out to be impossible, I need to rethink some parts.

MonitorP must implement the original idea from #4:

<---> defines the time range in which the API is known to be disrupted.

|-------------------------|
|------------------------>| Case 1: Monitoring dis. only
|------------<----------->| Case 2: Disruption before, resolved after
|-----<------>------------| Case 3: Disruption before and after
|------------>------------| Case 4: Disrupted after

-> Case 1: monitoring disruption only -> ignore
-> Case 2: we don't know when it really ended; have to include
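The rules above can be sketched as a predicate over intervals (plain Int timestamps and all names here are assumptions for illustration, not the actual MonitorP code):

```haskell
-- Hypothetical: keep or drop a facility disruption, given the window
-- in which the API itself is known to be disrupted.
keepDisruption
  :: (Int, Int)  -- window in which the API is disrupted (<--->)
  -> (Int, Int)  -- lifetime of a facility disruption (|---|)
  -> Bool
keepDisruption (wStart, wEnd) (dStart, dEnd)
  -- Case 1: observed only while monitoring itself was disrupted -> ignore
  | dStart >= wStart && dEnd <= wEnd = False
  -- Cases 2-4: the disruption overlaps the window's edges; since we
  -- can't tell when it really started or ended, it has to be included.
  | otherwise                        = True
```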

Akii commented Nov 11, 2018

> To do this it's possible to compare number of disrupted facilities and look for spikes.

Easier said than done. The 1.5 x IQR rule turned out to be good enough.
Next up: Compensating actions.
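The 1.5 x IQR rule can be sketched like this (the simple index-based quantile is an assumption; any quartile method works):

```haskell
import Data.List (sort)

-- Approximate q-th quantile by indexing into the sorted sample.
quantile :: Double -> [Double] -> Double
quantile q xs = sort xs !! idx
  where
    idx = min (length xs - 1) (floor (q * fromIntegral (length xs)))

-- Flag a disruption count as a spike if it falls outside
-- [q1 - 1.5 * iqr, q3 + 1.5 * iqr] of the recent history.
isSpike :: [Double] -> Double -> Bool
isSpike history x = x < q1 - 1.5 * iqr || x > q3 + 1.5 * iqr
  where
    q1  = quantile 0.25 history
    q3  = quantile 0.75 history
    iqr = q3 - q1
```

For example, with a recent history hovering around 180, both a jump to 2,200 and a drop to 0 land far outside the fences, while normal fluctuation does not.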

Akii commented Nov 12, 2018

That turned out to be quite easy using the already existing disruption monitor.

@Akii Akii closed this as completed Nov 12, 2018
Iteration 3 automation moved this from In progress to Done Nov 12, 2018