Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1475 user auth (steps 1 & 2) - configurable public access #1480

Merged
merged 20 commits into from Jul 31, 2015

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Jun 2, 2015

First part of #1475 - use a custom connection validator to authorize a client for free access to scan data (no passphrase required) or full control access (with the standard suite passphrase).

For the next step (not in this pull request) this new connection validator can easily be made to work with a user+password file for control or read-only authorization of individual users, instead of a single passphrase required by all authorized users - we just need to decide on details (e.g. do we want to allow authorization to multiple suites at once...)

As a side effect, this change greatly simplifies port scanning - no need to try all passphrases on each port (except in a much cleaned-up back compat mode for existing older suite daemons).

Also, cylc scan can now see all suites (not just the user's) and I've allowed it to retrieve minimal task state totals as well as name and owner, which makes it a CLI version of cylc gsummary - hence this can close #578. Do we agree that this is an appropriate extension to cylc scan?

[update: global config scan groups taken out in favour of simple suite name and owner options]
[update: outdated image removed; see new version below]
[update: gsummary comments shifted to a new issue #1488]

Close #577

@hjoliver hjoliver self-assigned this Jun 2, 2015
@hjoliver hjoliver added this to the soon milestone Jun 2, 2015

import cylc.flags

def get_task_cycle_statuses_updatetime(host, suite, owner=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this out of gsummary.py to use it in cylc-scan as well. However, now I'm getting the state totals from the "suite identity" (scan) interface in the daemon, and this can go back to gsummary (just for computing stop summaries)...

@matthewrmshin
Copy link
Contributor

Yes, this is moving cylc scan in the right direction. I agree that we should combine the communication logic for cylc scan and cylc gsummary.

Other thoughts:

  • The default should be to display suites of the current user ID?
  • The hyphen between column 2 and 3 is probably unnecessary, given the column-based display.

@arjclark
Copy link
Contributor

arjclark commented Jun 2, 2015

The default should be to display suites of the current user ID

I think so, perhaps with a -A option to display all users' suites.

@hjoliver
Copy link
Member Author

hjoliver commented Jun 5, 2015

Branched rebased onto master and squashed. To limit the scope of this pull request, I have punted gsummary changes to #1488 after adapting it minimally to the new scan command (it still gets all task state totals from cylc cat-state, not via the new scanning).

@hjoliver hjoliver changed the title #1475 (user authentication) step 1 - free access to scan info. #1475 (user auth) step 1 - free access to scan info. Jun 7, 2015
@hjoliver
Copy link
Member Author

hjoliver commented Jun 7, 2015

note - port files removed, no longer needed

[update: I've gone back on this - port files restored https://github.com//pull/1480#issuecomment-109964703]

Now that port scanning is more efficient (except for back compat with older suite daemons) we can get suite port numbers by scanning instead of using ssh to cat the port file. Advantages of this:

  • no ssh subprocess is spawned
  • no need for passwordless ssh to the suite host account
  • accidental deletion of the port file does not prevent communication with the suite
  • port-file clean-up was fallible - e.g. kill -9 <suite PID>; now at start-up we can always say definitively if the suite is already running or not

@hjoliver
Copy link
Member Author

hjoliver commented Jun 8, 2015

Note - once we get rid of passphrases stored in suite definition directories we'll no longer need to consult the suite registration db at run-time (it's currently used to find the passphrase)

@arjclark
Copy link
Contributor

arjclark commented Jun 8, 2015

Now that port scanning is more efficient (except for back compat with older suite daemons) we can get suite port numbers by scanning instead of using ssh to cat the port file

Do we not still need to have some sort of file around to indicate the suite is running in the event network access to a suite host machine has failed?

@matthewrmshin
Copy link
Contributor

@hjoliver To elaborate @arjclark 's comment, consider a farm of suite servers with a shared file system, we need to have a mechanism to prevent a suite from running on multiple servers. Does this change handle this kind of situation?

A separate point, I suppose the user@host:port information is also documented in the cylc-suite-env file.

@matthewrmshin
Copy link
Contributor

Does it also mean that gcylc has to scan ports on everything hosts?

@hjoliver
Copy link
Member Author

hjoliver commented Jun 8, 2015

@arjclark and @matthewrmshin - the full test battery now passes in my environment on this branch ... but I think you're probably right. Also, after some testing I already had concerns that while scanning all ports is now much faster, it may not be fast enough under all circumstances to trump using the port file... I'll look into this tomorrow, and will restore the port file if necessary.

@hjoliver
Copy link
Member Author

hjoliver commented Jun 9, 2015

(I reverted the port file change today; just ran the test battery on my laptop and two of the restart tests are failing - one of them is also failing on master - I'll look at these tomorrow ... then this should be ready for review)

@arjclark
Copy link
Contributor

arjclark commented Jun 9, 2015

I've had problems with one of the restart tests recently but it eventually ran fine when that test was run on its own.

@hjoliver hjoliver force-pushed the 1475.pyro-authentication branch 3 times, most recently from d3ab4e2 to f4000a4 Compare June 11, 2015 00:52
@hjoliver
Copy link
Member Author

(@kaday - branch updated from master, is merge-able again. One conflict resolved in lib/cylc/network/suite_broadcast.py; I've checked that the broadcast tests all pass.)

@hjoliver
Copy link
Member Author

(Ouch, nasty conflicts here now...)

Conflicts:
	bin/cylc-cat-log
	bin/cylc-cycle-point
	bin/cylc-jobscript
	bin/cylc-trigger
	lib/cylc/CylcOptionParsers.py
	lib/cylc/scheduler.py
@hjoliver
Copy link
Member Author

Branch updated. Test battery still passes.

@kaday
Copy link
Contributor

kaday commented Jul 29, 2015

cylc scan has some caveats, for example cylc scan -c should imply cylc scan -c -s.

cylc scan is also very slow, I thought Matt speed this up by making it scan multiple hosts. Has this been changed again?

@hjoliver
Copy link
Member Author

cylc scan should be much faster on this branch because it does not need to try every passphrase on every port. However:

  • if it detects an older suite daemon, it still has to fall back to the trying every passphrase - so if you test on a system with many suites running at <= cylc-6.5.0, the difference may not be clear.
  • note that since 6.50 the size of the process pool used to scan multiple hosts concurrently (same global config as for the suite daemon process pool for job submission etc.) now defaults to 4 instead of the CPU count.

That said, if you think that scan is slower on this branch than 6.5.0 we should investigate...

@kaday
Copy link
Contributor

kaday commented Jul 29, 2015

Backwards compatibility with cylc scan and cylc gsummary is fine in my environment. I tested with cylc 6.5.0, 6.4.1, 6.4.0.u2 and 6.1.2

@kaday
Copy link
Contributor

kaday commented Jul 29, 2015

I believe there is a speed issue with cylc scan.

I ran it on cylc 6.4.1 and had multiple suites at multiple versions and it took 22 seconds.

I ran it on cylc 6.5.0 with one suite and it took 22 seconds.

I ran it on the branch with only one suite that was the branch version and it took 53 seconds.

@matthewrmshin
Copy link
Contributor

Some quick timing information (seconds) on how long it takes to run the cylc.network.port_scan.scan method for each of our suite hosts. (Note: I am not running any suite on localhost.)

host master this branch
suitehost01 1.06919097900390620 2.093166112899780300
suitehost02 1.54857683181762700 2.608752012252807600
suitehost03 1.25125694274902340 2.259714126586914100
suitehost04 1.31666803359985350 1.938277006149292000
suitehost05 1.37587404251098630 3.692133188247680700
suitehost06 0.37758398056030270 0.326751947402954100
suitehost07 0.81919789314270020 2.640250205993652300
suitehost08 0.35372805595397949 0.231967210769653320
suitehost09 1.00492787361145020 2.602063894271850600
suitehost10 1.47894692420959470 2.203112840652465800
localhost 0.14222812652587891 0.025026082992553711

@matthewrmshin
Copy link
Contributor

(I'll do more time analysis tomorrow.)

@matthewrmshin
Copy link
Contributor

(This time with number of suites on hosts. All hosts running mainly suites at 6.5.0, 6.4.1, or below.)

host n-suites time on master time on branch
suitehost01 38 1.16861200332641600 2.20450592041015620
suitehost02 48 1.34274005889892580 2.69403600692749020
suitehost03 37 1.17865085601806640 2.14849495887756350
suitehost04 35 0.93740415573120117 2.06255602836608890
suitehost05 39 1.31931090354919430 2.64694094657897950
suitehost06 02 0.38626098632812500 0.32387804985046387
suitehost07 24 0.96043610572814941 1.64359712600708010
suitehost08 01 0.36349892616271973 0.26632690429687500
suitehost09 34 1.03919601440429690 1.76545405387878420
suitehost10 38 1.59931898117065430 2.92711496353149410

@hjoliver
Copy link
Member Author

OK, thanks for that - I did find a typo that slowed down scanning of new suites a bit. It wouldn't affect your results (all old suites), but I'm not convinced there is a problem.

On each occupied port, if an initial attempt at new authentication fails with "connection denied", we try again (old-style) with all passphrases - so new-style scan should be slower for old suite daemons: it is roughly equivalent to old scan plus one attempt at new authentication. This shouldn't have a large impact though, and it is a temporary - new scan of new suites should be faster than old scan.

Timings (mean of 10 scans) on my laptop VM seem to confirm this:

% python -m timeit -n 10 -r 1 'import os; os.system("cylc scan")'

1) 10 OLD suites running, 10 registered suites.
   HEAD (new scan):  1.2s
   6.5.0 (old scan): 1.2s

2) 10 NEW suites running, 10 registered suites.
   HEAD:  0.8s
   6.5.0: 1.1s

3) 10 NEW suites running, 66 registered suites ("cylc import-examples")
   HEAD:  0.8s
   6.5.0: 3.0s

Note that old scan slows down with the number of registered suites (because there's more passphrases to try).

@hjoliver
Copy link
Member Author

Note I've added two shell scripts to dev/bin: one generates, registers, and runs N suites (with --hold to minimize CPU loading) and the other stops them and cleans up - to aid in testing on a clean host. Note also (as per my timing notes above) an easy way to register a large number of suites to see the effect on old-scan is to run cylc import-examples.

@matthewrmshin
Copy link
Contributor

I have written a similar script to launch 30 suites (each running a single task with script=false) on my desktop, and my results are like this:

Suites 6.4.1 Scan #1480 Scan
6.4.1 Suites 1.13s 1.75s
#1480 Suites 1.70s 0.63s

@matthewrmshin
Copy link
Contributor

So, I can confirm the benefit of this branch when most suites have moved up to this version or above.

@kaday
Copy link
Contributor

kaday commented Jul 30, 2015

Excellent that there is not a speed problem once all suites are on the new version.

@hjoliver
Let me know when you have done this bit:
cylc scan has some caveats, for example cylc scan -c should imply cylc scan -c -s
then I will check it and test battery again.

@kaday
Copy link
Contributor

kaday commented Jul 30, 2015

My issue with speed seemed to be linked to the extensive number of suites I had in my cylc-run/ directory. Much quicker now!

@hjoliver
Copy link
Member Author

extensive number of suites I had in my cylc-run/ ...

Strictly speaking it is the number of registered suites that matters (cylc db print | wc -l), as suite registrations are used to find passphrases. I suspect most users are unaware of the effect this has on (old) cylc scan.

@hjoliver
Copy link
Member Author

Let me know when you have done this bit...

Done (well, the one caveat you mentioned explicitly - are there others?).

@kaday
Copy link
Contributor

kaday commented Jul 31, 2015

All passes in test battery in my environment. Working in my environment. Looks okay to me.

kaday added a commit that referenced this pull request Jul 31, 2015
#1475 user auth (steps 1 & 2) - configurable public access
@kaday kaday merged commit 4e0ab14 into cylc:master Jul 31, 2015
@hjoliver hjoliver deleted the 1475.pyro-authentication branch August 1, 2015 00:37
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Aug 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Command line equivalent of "cylc gsummary"? cylc scan deficiencies
4 participants