
Feature/orchestration #112

Merged
merged 45 commits into from
Aug 21, 2017
Conversation

hellais
Member

@hellais hellais commented Jun 12, 2017

This is based on top of the monitoring pull request and includes all the roles for deploying probe orchestration backends.

hellais and others added 30 commits May 13, 2017 17:03
Fix calling of the proteus-* service command
* Add admin password to vault
* Write the proteus notify certificates
1) As stated in ooni/orchestra#13, we must not set a topic
for Android because that leads to FCM rejecting the message.

2) Define two different topics for iOS, one for production and
one for testing, because I guess we'll need them soon.
fix,refactor(proteus-events): topics correctness
@@ -3,4 +3,4 @@
 - include: config.yml
   become: yes
   become_user: postgres
-  become_method: su
+  become_method: sudo
Contributor

We do have sudo in dom0, right?

Contributor

Right. Also, that's the default become_method.

---
- name: Add YarnPkg apt package signing key
  apt_key:
    url: "https://dl.yarnpkg.com/debian/pubkey.gpg"
Contributor

It's "better" to import from keyservers using full fingerprint.

Member Author

@hellais hellais Aug 16, 2017

Makes sense.
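A minimal sketch of what importing from a keyserver by full fingerprint could look like (the keyserver choice and the fingerprint below are placeholders, not the real YarnPkg key):

```yaml
# Sketch only: import the signing key by its full 40-character fingerprint
# from a keyserver instead of fetching pubkey.gpg over HTTPS.
# Replace the placeholder id with the real YarnPkg key fingerprint.
- name: Add YarnPkg apt package signing key
  apt_key:
    keyserver: keyserver.ubuntu.com
    id: "0000000000000000000000000000000000000000"
```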


- name: Add YarnPkg apt repository
  apt_repository:
    repo: 'deb https://dl.yarnpkg.com/debian/ stable main'
Contributor

Is stable for stable yarn release or for debian:stable ?

Member Author

It's for debian:stable

  shell: "psql -U {{ proteus_db_user }} -h 127.0.0.1 -d {{ proteus_db_name }} -f {{ proteus_db_schema_path }}/{{ item }}"
  environment:
    PGPASSWORD: "{{ proteus_db_password }}"
  with_items: "{{ proteus_db_files }}"
Contributor

What will happen when one needs to ALTER something?

Member Author

Actually all of this stuff can probably be removed as it's now part of proteus itself. We only need to create the proteus_db_user and proteus will take care of populating the database.

Contributor

IIRC, @bassosimone was struggling with SQL updates embedded in the binary when launching concurrent processes, since the same DB is used by various services. So there was a suggestion to move the SQL files out of the binary and apply them with some other tooling. Is that still the case?

Member Author

No, I think you were the one suggesting that as a solution, but the issue in the migration scripts was solved, so there is no plan to go that route in the near future.
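If proteus now populates its own schema, the role could be reduced to creating just the database user. A hedged sketch using the postgresql_user module (variable names assumed from this role):

```yaml
# Sketch: create only the proteus_db_user and let proteus populate
# the database itself, as discussed above.
- name: Create proteus database user
  become: yes
  become_user: postgres
  postgresql_user:
    name: "{{ proteus_db_user }}"
    password: "{{ proteus_db_password }}"
    state: present
```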

    createhome: no
    shell: /sbin/nologin
    comment: "Proteus User"
    state: present
Contributor

/sbin/nologin is not a valid shell.

UID will clash with superq as soon as dom0 with node_exporter is deployed there:

events.proteus.test.ooni.io | SUCCESS | rc=0 | (stdout) uid=10007(proteus) gid=10007(proteus) groups=10007(proteus)
notify.proteus.test.ooni.io | SUCCESS | rc=0 | (stdout) uid=10007(proteus) gid=10007(proteus) groups=10007(proteus)
proteus.test.ooni.io | SUCCESS | rc=0 | (stdout) uid=10007(proteus) gid=10007(proteus) groups=10007(proteus)
registry.proteus.test.ooni.io | SUCCESS | rc=0 | (stdout) uid=10007(proteus) gid=10007(proteus) groups=10007(proteus)
notify.proteus.ooni.io | SUCCESS | rc=0 | (stdout) uid=10006(proteus) gid=10006(proteus) groups=10006(proteus)
registry.proteus.ooni.io | SUCCESS | rc=0 | (stdout) uid=10006(proteus) gid=10006(proteus) groups=10006(proteus)
events.proteus.ooni.io | SUCCESS | rc=0 | (stdout) uid=10006(proteus) gid=10006(proteus) groups=10006(proteus)
proteus.ooni.io | SUCCESS | rc=0 | (stdout) uid=10006(proteus) gid=10006(proteus) groups=10006(proteus)

Contributor

UID clash is a blocker to merge it back to master, so I'll have to fix it and redeploy. It'll require some minor proteus downtime.

Member Author

Ok. How much downtime are we talking about?

Contributor

Minutes if everything is OK, couple of hours if something goes seriously wrong :)
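One way to avoid the clash is to pin the UID explicitly in the user task. A sketch (the UID value is an arbitrary placeholder, and the shell path uses the Debian location since /sbin/nologin is not valid there):

```yaml
# Sketch: pin a fixed UID so the proteus account is identical on every
# host, rather than taking whatever the next free UID happens to be.
- name: Create proteus user
  user:
    name: "{{ proteus_user }}"
    uid: 10008                  # placeholder; pick a value reserved for proteus
    createhome: no
    shell: /usr/sbin/nologin
    comment: "Proteus User"
    state: present
```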

[Service]
User={{ proteus_user }}
Group={{ proteus_group }}
ExecStart={{ proteus_path }}/proteus-events --config {{ proteus_config_path }}/proteus-events.toml start
Contributor

Is it the user for proteus-events or for proteus-frontend ?

Member Author

They all use the same user (proteus)

Member Author

In any case this is the user running the proteus-events binary.

Contributor

I mean, is there any reason for these services to share a unix user? As far as I can see, there is none: these services share no objects.

I'm sorry for the form of the question :-)

Member Author

No reason.
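Splitting the shared account into per-service users could be sketched as role defaults like the following (all names are hypothetical, not from this PR):

```yaml
# Sketch: one unix account per service instead of a shared proteus user,
# so a compromise of one daemon doesn't grant access to the others' files.
proteus_events_user: proteus-events
proteus_notify_user: proteus-notify
proteus_registry_user: proteus-registry
gorush_user: gorush
```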


[Service]
User={{ proteus_user }}
Group={{ proteus_group }}
Contributor

Is it the user for proteus-frontend or for gorush ?

Member Author

In this case it's for gorush.

- hosts: orchestration
  gather_facts: true
  roles:
    - docker_py
Contributor

The only use of the orchestration group I see is to install docker, and docker is useless for the db-1 hosts. What's the point of that group?

Member Author

I think I was previously using this group as a "hack" to run dom0 and gather_facts pre-emptively. I guess it can be removed.
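Pre-emptive fact gathering doesn't need a dedicated group; a sketch of a play that only collects facts:

```yaml
# Sketch: gather facts for all hosts up-front without installing anything,
# replacing the orchestration-group "hack" mentioned above.
- hosts: all
  gather_facts: true
  tasks: []
```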

  gather_facts: false
  vars:
    psql_db_name: proteus
    psql_db_user: proteus
Contributor

It's good to have different db_name and db_user values, for grepability.
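For example (the distinct user name is hypothetical):

```yaml
# Sketch: distinct names make it obvious which occurrences refer to the
# database and which to the role when grepping the repository.
psql_db_name: proteus
psql_db_user: proteus_user
```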

proteus_events_https_port: 443
letsencrypt_nginx: yes
letsencrypt_domains: "{{ inventory_hostname }}"
proteus_database_url: "postgres://proteus:{{ proteus_database_password_testing }}@db-1.proteus.test.ooni.io:5432/proteus?sslmode=require"
Contributor

sslmode=require is nice, but it's using /etc/ssl/certs/ssl-cert-snakeoil., so what's the point?

Member Author

It's better to protect against a purely passive attacker, so that leaking the password requires an active attack.
Also, it's probably not a big deal since we are communicating inside our own network.
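If db-1 were later given a real certificate, sslmode=verify-full would defeat the active attack as well; a sketch of the same URL with stricter verification:

```yaml
# Sketch: verify-full makes libpq check the server certificate chain and
# hostname, so an active MITM can no longer capture the password.
proteus_database_url: "postgres://proteus:{{ proteus_database_password_testing }}@db-1.proteus.test.ooni.io:5432/proteus?sslmode=verify-full"
```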

mode: 0755
owner: "{{ proteus_user }}"
group: "{{ proteus_group }}"
copy: no
Contributor

@bassosimone please add a sha256sums file to the proteus binary releases and to the script that generates them.

These checksum files can be used to speed up re-runs of the playbook, since the get_url module accepts a checksum argument.

Member Author

@hellais hellais Aug 16, 2017

Filed a ticket about it: ooni/orchestra#22
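Once the sha256sums files exist, the download task could pass them to get_url; a sketch with a placeholder checksum (proteus_url is an assumed variable name):

```yaml
# Sketch: get_url skips the download when the existing file already
# matches the given checksum, speeding up playbook re-runs.
- name: Download proteus binary
  get_url:
    url: "{{ proteus_url }}"
    dest: "{{ proteus_path }}"
    checksum: "sha256:0000000000000000000000000000000000000000000000000000000000000000"  # placeholder
    mode: 0755
```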

proteus_apn_key_password: "{{ proteus_apn_key_password_testing }}"
proteus_apn_key_content: "{{ proteus_apn_key_content_testing }}"
proteus_environment: "production"
proteus_notify_basic_auth_password: "{{ proteus_notify_basic_auth_password_testing }}"
Contributor

@hellais and @bassosimone, please read the paragraph on variable precedence in Ansible; it explains that you probably don't need indirection like proteus_database_password_testing, you just need proteus_database_password defined in the right places.

Member Author

If you can clean it up a bit, that would be good. Though while deploying the first version of production I accidentally used the testing password in production, and to avoid that risk of confusion I found it clearer to just label them as such.

Contributor

Though while deploying the first version of production I accidentally used the testing password in production, and to avoid that risk of confusion I found it clearer to just label them as such.

Yeah, I guess this is a good argument in favour of a little more redundancy and robustness.

Contributor

robustness

Yep. I just wanted to say that reusing proteus_user across four different roles is probably a bad idea :-)
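The precedence-based layout would look roughly like this (the group names, file paths, and vault variable names are assumptions for illustration):

```yaml
# Sketch: the same variable name resolves to a different value per group,
# so roles only ever reference proteus_database_password.
#
# group_vars/proteus_testing.yml
proteus_database_password: "{{ vault_proteus_database_password_testing }}"

# group_vars/proteus_production.yml
proteus_database_password: "{{ vault_proteus_database_password_production }}"
```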

gorush_url: "{{ gorush_download_url }}/v{{ gorush_version }}/gorush-v{{ gorush_version }}-linux-{{ go_arch }}"
gorush_path: "{{ proteus_install_path }}/gorush-v{{ gorush_version }}-linux-{{ go_arch }}"

notification_backend: "gorush"
Contributor

Should the code for notification_backend: proteus be removed, and the corresponding binaries and services cleaned up?

- name: restart proteus
  systemd: name=proteus-registry.service state=restarted
- name: reload proteus
  systemd: name=proteus-registry.service state=reloaded
Contributor

IMHO, using the same handler name for different actions is a path to subtle bugs.
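Giving each action its own handler name keeps notify targets unambiguous; a sketch:

```yaml
# Sketch: distinct handler names, so a notify statement cannot
# accidentally trigger the wrong action.
- name: restart proteus-registry
  systemd: name=proteus-registry.service state=restarted
- name: reload proteus-registry
  systemd: name=proteus-registry.service state=reloaded
```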

@darkk darkk merged commit df0b67d into master Aug 21, 2017
@darkk darkk mentioned this pull request Aug 21, 2017
@hellais hellais deleted the feature/orchestration branch August 22, 2017 18:27