
Feature/opos #81

Merged: hellais merged 18 commits from feature/opos into master on Feb 7, 2017

Conversation

@hellais (Member) commented on Jan 9, 2017:

This branch includes work in progress on the probe orchestration system.

@bassosimone (Contributor) left a comment:

Interesting read! I've made some comments aimed at improving the document and clarifying some points.

There should be full transparency over what is being scheduled to be run on
orchestrated probes. That is to say that it should be possible for anybody to
inspect the log of all the instructions and determine which ones have affected
their probe.

Contributor:

The possibility for anybody to inspect seems to contradict "either publicly or privately depending on what is safer for the user", doesn't it?

Components:

**OPOS Client**: This is generally an instance of ooniprobe (or
ooniprobe-mobile) that is subscribing to the OPOC event feed.

Contributor:

Shouldn't this be "OPOS event feed" rather than "OPOC"?

on and their progress.

**OPOS Event feed**: This is the backend component responsible for emitting
events to clients and informing them of what they should be doing.

Contributor:

I don't completely get the difference between the scheduler and the feed, and specifically how they will be implemented in practice: is the scheduler a daemon and the feed a database (or a queue)?

Contributor:

Or is the feed the output of the scheduler?

Member Author:

As a matter of fact, the "scheduler" was added after the "event feed". Originally I had excluded the "scheduler" from the architecture, as I was interested in documenting only what is exposed, and the "scheduler" is in a way an architectural construct that is transparent to the user.
However, in specifying this I felt the need for this architectural construct, so I added it.

To exemplify this in a more concrete way, a possible implementation of this would be:

* OPOS Event feed: a websockets interface that clients subscribe to in order to receive notices of what they should do
* OPOS Scheduler: the engine responsible for knowing which clients should be notified of events
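
To make that concrete, here is a minimal sketch of the client side of such a subscription, assuming a hypothetical `wss://opos.example/events` endpoint and JSON event payloads (both are illustrative, not part of the spec):

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def subscribe(feed_url: str = "wss://opos.example/events"):
    # Keep a persistent connection to the (hypothetical) OPOS Event feed and
    # react to each event that the Scheduler decides this probe should receive.
    async with websockets.connect(feed_url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            print("received event:", event.get("action"), event.get("job_id"))
            # ... hand the event off to the local ooniprobe runner here ...


if __name__ == "__main__":
    asyncio.run(subscribe())
```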


When somebody with proper access permissions adds an event via the **OPOS
Management Interface** all the subscribed probes that are affected by that
event will be notified and will begin performing the specified action.

Contributor:

In this description, it would be useful to clarify the role of the scheduler (when does it kick in?).

Member Author:

I added a note about this below.

it on github.com) to avoid it being blocked hence being resilient to censorship.

* It SHOULD be possible to know how many *active* **OPOS Clients** there are at
a given time.

Contributor:

I am not convinced that this requirement is simple to implement with a pub-sub scheme. I mean: when using WebSockets (or perhaps we should consider HTTP/2?), you can keep a persistent connection and thus you can enumerate your clients (even anonymously, if they pass through some collateral freedom thingy). I do not easily see how you can enumerate clients with pub-sub and low latency.

Member Author:

That is why I mentioned this as a SHOULD and not as a MUST. It's something I would like to have the power to know, because it's very useful, but it's not a strict requirement.

I agree that depending on the specific implementation there will be different ways to do this that can be more or less accurate. Worst case, we can do something similar to how Tor counts users: estimate the average number of requests each client makes per day, count the total number of requests, and divide the total by that average.
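
A back-of-the-envelope version of that estimate (all numbers are made up for illustration):

```python
# Worst-case active-client estimate, Tor-style: divide the observed request
# volume by the average number of requests a single client is expected to make.
requests_seen_per_day = 120_000        # observed by the backend (illustrative)
avg_requests_per_client_per_day = 24   # e.g. one poll per hour (assumption)

estimated_active_clients = requests_seen_per_day / avg_requests_per_client_per_day
print(estimated_active_clients)  # -> 5000.0
```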

* The start time of the job in ISO 8601 (YYYY-MM-DDThh:mm:ssTZ, where TZ is the timezone or `Z` for UTC).
An empty start time means start immediately.

* The run interval, defined following the ["Duration" component of the ISO 8601](https://en.wikipedia.org/wiki/ISO_8601#Durations) standard.

Contributor:

I'd in any case explain the semantics of this field rather than only redirecting to the standard.
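
For instance, the semantics could be spelled out with a couple of sample values (these are illustrative, not taken from the spec):

```python
# Illustrative ISO 8601 values for the two fields discussed above.
start_time = "2017-02-01T00:00:00Z"  # start at midnight UTC on Feb 1st 2017 ("" means start immediately)
run_interval = "P1DT12H"             # re-run every 1 day and 12 hours
# Other examples: "P7D" -> every 7 days, "PT30M" -> every 30 minutes.
```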

The Management Interface will assign to a given job a unique id (hereafter
referred to as `job_id`) that can be used to unschedule jobs.

It should be possible for somebody interacting with the Management Interface to

Contributor:

Here I'd say "an authorised operator"

### 4.1.1 Run test

This action is used to instruct a probe to run a certain OONI test once. For
sake of clarity we omit the common elements from the above format.

Contributor:

Suggestion: can we design a format by which we can omit common fields unless necessary to override some defaults, such that the JSON that will travel on the wire is actually the one in here?

Contributor:

Unrelated: it would help to clarify in some way between which two entities this message will travel.

RUNNING --> COMPLETED;
COMPLETED --> [*];
@enduml
)

Contributor:

Is this a picture?

Contributor:

Ah, never mind: it is a picture (I just checked by viewing the document)

Member Author:

Yeah it's a UML chart.


# Links

These

Contributor:

I think we can omit this "these"


* The country of a probe

* The Network of a probe

Contributor:

bikeshedding: I don't think we should capitalize network here

Management Interface** all the subscribed probes that are affected by that
event will be notified and will begin performing the specified action.

The **OPOS Scheudler** is the scheduling engine responsible for figuring out

Contributor:

Typo here


# 3.0 Transport protocol requirements

We use the term "Transport" loosely here to mean the protocol that is being

Contributor:

I don't think Transport should be capitalised here (and let's paint the bike shed ooni-blue)


The base data format is the following:

```json

Contributor:

bikeshedding: if we declare this as json, the comment (which is not json) will end up badly coloured. I think it's better either to have valid json or to not declare this as json.

Member Author:

Yes you are right, I guess I should declare it as hjson (https://hjson.org/)

},
"probe_asn": "",
"probe_id": "",
"probe_family": "",

Contributor:

What is this family?

Member Author:

I guess it's a bit confusing to include something that is not yet implemented in ooniprobe here, but my reasoning behind it was that at some point we may want to group ooniprobes together by family, so that you can say "the ooniprobes run by ORGX" and there are 10 of them, each with its own ID.

`filter`:
The `where` clause in the filter definition is an implementation of the [loopback
style filters](https://github.com/strongloop/loopback-filters), but with "in"
in place of "inq" for clarity.

Contributor:

I know I originally advocated for in rather than inq. Now, I'm wondering whether keeping inq would be better because we adhere to the standard and have one less diff-y thing to remember.

Member Author:

Yeah, that was also my reasoning for using inq in the beginning. I don't have a strong opinion either way (maybe we can say that both are supported?)
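
To make the discussion concrete, an illustrative `filter` in the loopback style could look like the following; the field names reuse ones mentioned elsewhere in this document, and whether `in`, `inq` or both end up being accepted is still open:

```json
{
  "where": {
    "platform": {"in": ["macos", "android"]},
    "probe_asn": "AS3269"
  }
}
```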

* **COMPLETED**: when a job has finished executing it enters the COMPLETED state.


# Links

Contributor:

I'd perhaps link inline rather than at the bottom, so it's clearer what refers to what.

Member Author:

I am just going to drop the links that are not mapped to anything in particular. The reason why I originally put chronos there was that it provided part of the inspiration for the scheduler format.

@bassosimone (Contributor) left a comment:

I don't want to prevent us from moving forward with this specification more than necessary.

I feel like most of my comments can also be addressed at a later time, when we start making a prototype.

Measurements should be scheduled via OPOS and/or any other system run by OONI
to gather network measurement data.

1. To the extent that it is possible we will always do what is best for users of OONI.

Contributor:

I guess it should be mentioned in the policy that users can opt out of OPOS and that it will not be enabled by default.

Member Author:

I think that this is a bit out of scope for the policy document, as it defines the governance and rules for when OPOS is in place. Whether it is in place for all users, opt-in, disableable or whatnot is an implementation detail.

BTW I think we should have it enabled by default for all new users and prompt old users to enable it.

Contributor:

> BTW I think we should have it enabled by default for all new users and prompt old users to enable it.

Yes, I agree with this

# Measurements and URL policy

This document explains the policy we follow when considering what URLs and
Measurements should be scheduled via OPOS and/or any other system run by OONI

Contributor:

The OPOS initials could be expanded here.

measurement must be signed off by at least one other person.

7. Anybody in the governance roster can nominate somebody else to join the
roster. Their inclusion shall be discussed and once there is consensus they

Contributor:

I guess the inclusion of people or entities must (and not shall) be discussed beforehand.

Member Author:

I think shall and must are synonyms in this context, though shall sounds a bit softer to me.


a. Have a legittimate reason to do so (ex. you do research on censorship)

b. Be a respected member of the OONI community

Contributor:

We should be very careful with people using the OPOS, not only for (obvious) reasons of malice but also for reasons of experimentation failure, e.g. research on a new censorship analysis that could de-anonymize users.

I assume that the bigger this governance roster becomes, the harder it will be to control OPOS usage.

The target latency for the OPOS is of around 1 minute. This means that from the
time a new instruction is inserted into the OPOS to the moment that the
affected probes receive it 1 minute should pass.

@anadahz (Contributor) commented on Feb 3, 2017:

I find the latency window quite short; is there any reasoning behind this latency?

Member Author:

By short you mean it's too little? 1 minute of latency is actually fairly high; even with long polling you are going to achieve latencies of less than 1 minute.

Contributor:

By short I mean too little: given that there are a number of probes on very slow links, getting a new instruction per minute can be too much.

Member Author:

Even if the link is slow (in terms of throughput), they will most likely have a latency of < 60 seconds.
If they do have a latency of > 60 seconds, TCP sockets will time out before they are established, and bumping the time up will not help much there.
In which circumstances do you imagine there being a > 60 second latency with OPOS?

@anadahz (Contributor) commented on Feb 7, 2017:

I imagine that in most cases the probes will not have completed the scheduled job in less than 60 seconds. I can't think of many good reasons for having such a short (for ooni-probes) polling interval.

Member Author:

The 1 minute latency is until they receive it; that doesn't mean it will take them less than 1 minute to complete it.

Member Author:

Anyway, I don't want to fixate on this, as in the end it is going to be an implementation detail. The purpose of specifying this is that the order of magnitude we are targeting is not seconds, but rather minutes, and not hours.

Would it make you more comfortable if I were to say "minutes" instead of "1 minute"?

Contributor:

Yeah, minutes rather than one minute is reasonable as well. In any case, the probe's ability to refuse running a job is IMO our best line of defense against over-scheduling.

Contributor:

Sure, that makes sense and it's definitely not a blocker for the OPOS spec.
Thanks for working on this, @hellais.

* The country of a probe

* The Network of a probe

Contributor:

Some useful granular items that could be added:

* Installed tests: list of installed tests per probe.
* Allowed tests: list of permitted tests to be run per probe.
* Type of collector (onion, https, cloudfront) used to submit reports
* Privacy options enabled per probe (includeip, includeasn, includecountry)
* Size of measurement quota (limit, free %)

Member Author:

I don't think these should go inside the high-level design goals. The purpose of this section is to set expectations as to the broad type of problem this system is trying to solve (scheduling measurements based on the network and location of a probe); the specific requirements (due to implementation details) needed to satisfy a job are too low level, for this section at least.

Contributor:

These are some important properties that need to be enumerated in advance, otherwise we may "violate" some users' options.

Some possible scenarios:

* Scheduling tests that are not installed on probes.
* Scheduling tests that are not "allowed" to run, i.e. the http header field manipulation test.
* Scheduling a test with a long list of experiments that requires a decent amount of disk space on probes that are reaching their disk quota capacity.

Member Author:

> Some possible scenarios:
> * Scheduling tests that are not installed on probes.
> * Scheduling tests that are not "allowed" to run, i.e. the http header field manipulation test.
> * Scheduling a test with a long list of experiments that requires a decent amount of disk space on probes that are reaching their disk quota capacity.

I think these are implementation details that don't necessarily have to live in this document, or at least not in this section. As an operator I don't care to specify these options; the orchestrator should be smart enough to figure out which of the available probes are capable of running my measurements.

I can add a note in 4.0 Jobs about the fact that certain attributes of the probe need to be sent to the OPOS in order for it to be able to resolve these constraints.
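
As a sketch of what such a note might end up describing, the probe attributes sent to the OPOS could look something like this (only `probe_asn`, `probe_id`, `probe_family` and `platform` appear elsewhere in this document; the remaining keys are hypothetical illustrations of the constraints discussed above):

```json
{
  "probe_asn": "AS3269",
  "probe_id": "",
  "probe_family": "",
  "platform": "android",
  "installed_tests": ["web_connectivity", "http_header_field_manipulation"],
  "allowed_tests": ["web_connectivity"],
  "free_disk_quota": "80%"
}
```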


A user interested in seeing what has been scheduled in the past can visit the
**OPOS Event log** and see a history of what has happenned.

Contributor:

A note should be added with the default log retention period.

Member Author:

There is no log retention date. We should store all logs forever.

Contributor:

There are a number of probes that cannot handle storing all logs forever.
Perhaps it makes sense to add a default log retention plan rather than end up having probes run out of disk space.

Member Author:

> There are a number of probes that cannot handle storing all logs forever.

The logs are not stored by the probes, but by the probe orchestration system.

Contributor:

For privacy reasons it would be better if these logs were stored only on the probe(s) that performed the scheduled tests and not stored in an online service.

Member Author:

> For privacy reasons it would be better if these logs were stored only on the probe(s) that performed the scheduled tests and not stored in an online service.

But as you pointed out before, probes can have limited disk space available, so they can't store them all.

Moreover, there is nothing stopping somebody from subscribing to the event feed as every possible probe type and logging all the messages.
I think the assumption is that it's safe to log all the scheduled events, as they are being broadcast publicly.

What exactly is the privacy implication of making the scheduled events publicly known?

example by specifying `"delay": 60` the action will be triggerred after a
number of seconds going from `0-60`.

`schedule`: The scheduling for the job, in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format. Consists of 3 parts separated by /:

Contributor:

extra /

@hellais (Member Author) commented on Feb 3, 2017:

The / character is the separator for the 3 parts. I should probably wrap it in backticks though.
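
An example value might help; using the ISO 8601 repeating-interval form `R[n]/<start>/<duration>`, a `schedule` could look like this (values are illustrative):

```python
# "R" with no count means repeat indefinitely; the three /-separated parts are
# the repetition count, the start time, and the duration between runs.
schedule = "R/2017-02-01T00:00:00Z/P1D"                # every day, starting Feb 1st 2017 UTC
schedule_ten_times = "R10/2017-02-01T00:00:00Z/PT6H"   # 10 runs, 6 hours apart
```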

"url": "https://torproject.org/"
}
}
}

@anadahz (Contributor) commented on Feb 3, 2017:

Isn't a job_id needed to schedule a test?

Member Author:

No, as is mentioned below:

> The Management Interface will assign to a given job a unique id (hereafter referred to as job_id) that can be used to unschedule jobs.

The OPOS Scheduler keeps track of the lifecycle of a job. With respect to an
**OPOS Client** a job can be in one of the following states:

* **READY**: this is the state in which a job is when it is first added via the OPOS Management Interface.

Contributor:

For consistency this can be: "this is the state a job is in when [...]"

progress of any given task.

* **COMPLETED**: when a job has finished executing it enters the COMPLETED state.

Contributor:

For consistency, the REJECTED, RUNNING, and COMPLETED states can start with: "this is the state a job is in when [...]"
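
For reference, a rough sketch of the job lifecycle as it emerges from the fragments quoted in this PR; the transition map is an assumption inferred from the quoted UML fragment and state descriptions, not something the spec defines here:

```python
from enum import Enum


class JobState(Enum):
    READY = "READY"          # just added via the OPOS Management Interface
    REJECTED = "REJECTED"    # the probe refused to run the job
    RUNNING = "RUNNING"      # the probe is executing the job
    COMPLETED = "COMPLETED"  # the job has finished executing


# Assumed transitions (not authoritative).
TRANSITIONS = {
    JobState.READY: {JobState.RUNNING, JobState.REJECTED},
    JobState.RUNNING: {JobState.COMPLETED},
}
```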

roster. Their inclusion shall be discussed and once there is consensus they
are added. To be included in the roster you need to:

a. Have a legittimate reason to do so (ex. you do research on censorship)

Contributor:

Typo: s/legittimate/legitimate

which clients to notify of jobs and how.

A user interested in seeing what has been scheduled in the past can visit the
**OPOS Event log** and see a history of what has happenned.

Contributor:

Typo: s/happenned/happened

by specifying the optional `delay` key.

We use JSON serialization for representing the message structures inside of
this document, however if due to other contraints JSON is not an adequate

Contributor:

Typo: s/contraints/constraints

you can say: `"platform": "macos"` or `"platform": ["macos", "android"]`.

`delay`: is an upper bound on the number of seconds to delay the action by. For
example by specifying `"delay": 60` the action will be triggerred after a

Contributor:

Typo: s/triggerred/triggered

* The run interval, defined following the ["Duration" component of the ISO 8601](https://en.wikipedia.org/wiki/ISO_8601#Durations) standard.
Durations are indicated as a string in the format
`P[n]Y[n]M[n]DT[n]H[n]M[n]S`, where `[n]` is a number representing a value
for the follwoing date or time element.

Contributor:

Typo: s/follwoing/following

orchestrated probes. That is to say that all measurements scheduled via OPOS
must be recorded in an append only log so that it's possible, upon request,
for a user to learn what jobs have affected their probe.

Contributor:

How, and from which entity, can a probe user request a backlog of the past scheduled tasks?

Member Author:

> OPOS Event log: Through this it is possible to inspect the history of instrumented events.

@hellais (Member Author) commented on Feb 7, 2017:

I think all feedback should have been addressed. Can I proceed with merging this?

@bassosimone (Contributor) commented:

Hit it!

hellais merged commit f449f26 into master on Feb 7, 2017
hellais deleted the feature/opos branch on February 7, 2017 17:11