docs: Getting Started guide #451

dw opened this Issue Dec 13, 2018 · 0 comments



dw commented Dec 13, 2018

This is just a bunch of notes.

Missing, at a minimum:

  • Explain the process model, explain that .call() runs on the main thread
  • forking vs threading: using Mitogen puts huge constraints on fork use. Cannot support fork without hugely restricting library functionality (by ripping out threading, and/or forking lots of helper processes); cannot internally use fork instead of threading without restricting support for sane non-fork OSes (Windows). "Safe" fork helpers are supplied, but the net cost of using them is unreasonably high. The problem is intractable: fork entails making an arbitrary, inconsistent snapshot of a process in parallel to that process changing, including the state of all locks. UNIX fork by design guarantees a corrupted snapshot of the parent process in the presence of threads
  • Explain what services are, explain that services run on a thread pool
  • Annotated example showing a source file with a .call() such that a function in that file runs on a different machine. Emphasize importance of paying attention to where functions are running because it gets confusing quickly
  • Better explanation of the serialization limitations -- "remote box is an untrusted web service" idea
  • Explain Context object:
    • being an IP address-like value (just an integer),
    • with some helper methods for sending messages to that address.
    • Explain they're cheap, serializable, copyable, don't manage any resources of their own.
    • Explain they are unrelated to the streams ultimately connecting to that address.
    • Maybe mention subscribing to their events?
  • Re-emphasize importance of logging framework use and all the important events it records
  • Much better explanation of how to cope with a hang
  • Explain how to get stack traces out of hung processes
  • Maybe explain worst-case use of Python gdbinit and a debug build
  • Explain object lifetimes -- Pool and Broker are 99% per-process, Router is 98% per-process. Stream is managed internally, unless Context.shutdown() invoked against the remote end, which triggers cleanup.
  • Explain Receivers must be closed with close(), else it is possible to leak memory and/or cause hangs
  • Explain parallel connect is currently a manual mess
  • Recommend faulthandler.enable()
  • Explain risks of custom serialization -- pickle/Dill is actually useful, but it must be done safely. Master->target ONLY. under no circumstances target->master, ever. instant security problem
  • explain log_to_file() and main() are only helpers -- real app wants to customize their behaviour
  • explain exception types collapse into CallError over the wire. suboptimal, neat helpers are in the works
  • explain FileService and document it
  • explain case-by-case uploading a file, downloading a file
  • explain waiting on multiple results
  • explain waiting on multiple results /and/ function return values (nested select)
  • explain app is fully asynchronous and any 'blocking waits' are a mirage. Broker thread and service threads let parent app do work on behalf of children even while main thread appears to be 'blocked' waiting for those children to finish some task
  • maybe explain punching holes in security via auth_id?
  • explain the moduleresponder blacklist/whitelist -- explain security concerns w.r.t. untrusted context that can import Django settings module
  • explain whitelisting presently disables all use of locally installed modules on the remote. design will improve in future
  • explain importance of strict app/import layering, else megatons sucked over the wire. mention module whitelisting as one strategy
  • explain senders and receivers -- mention they're totally async, even with respect to the same process and thread.
  • explain sender is lightweight handle just like context -- just another integer involved.
  • maybe explain how receiver directly wraps stream protocol 'handle'/'reply_to' field
  • explain strategies for per-CPU scaling -- top-level process starts per-CPU .local() children, who fan out to many .ssh() connections. service pools in per-CPU children host all the heavy lifting
  • explain in-order guarantees in numerous places. Broker.defer() is strictly ordered, therefore Context.call_async() calls are all strictly ordered with respect to a single enqueuing thread. Calls run in order, and serially for each target they apply to
  • ^ this belongs alongside the process model discussion
  • explain enable_profiling() to get a cProfile trace of the children
  • explain enable_debug() to get a debug log of the children
  • explain the value of keeping all IO within Mitogen -- transparent proxying comes for free. disconnect / connection breakage notification uses a single mechanism, etc
  • explain why you might set remote_id or customize
  • process model: explain what the Broker thread actually does (explicitly), and explain what kind of use cases exist for Broker.defer() and Broker.defer_sync() ("any function that must execute atomically with respect to a consistent view of IO state")
  • explain threading model (or lack thereof!!!!). Explain which classes are explicitly thread safe, and which classes you shouldn't mess with once connections / broker / etc. are running
  • explain Receivers can be waited on by multiple threads
  • explain Selects are not currently thread-safe (right?)
  • explain preference for extending e.g. .ssh() parameter list rather than passing raw strings through -- free upgrades later
  • explain the difficulty of supporting lambda better:
    • lambda-as-callback-parameter: gives untrusted context a free, un-typechecked GOTO into your privileged code. Impossible to reason about. used to be supported, got ripped out long ago
    • lambda-as-invoked-function: requires serializing unbounded program state. explain each of the recently added .call() type check error cases
  • explain in-memory-only persistence model -- explain benefits of not touching disk, explain risks when touching disk -- app suddenly has cross-host versioning and coherence issues etc etc
  • explain how to check for roundtrips due to importer (and/or generally), and pad out debug logging to make them super obvious
  • explain cross-major-Python-version issues. tie into new bootstrap version check option
  • mention somewhere that enabling whitelisting enforces not reading those whitelisted packages from disk, ever, to avoid cross-host versioning hell (caused bugs in the ansible extension)
  • explain Select ownership, notifier chain, requirement for close() to be called to clean up after select. Add context manager protocol to select, add destructors that warn if receiver/select still owned at destruction time
  • mention various CTRL+C issues: purpose of ThreadWatcher, shutting down Pool, etc.
  • mention import preloading / ModuleResponder.forward_modules(), file bugs/mention desire to integrate this with .call() so it happens automatically
  • mention the mitogen.core decorators and what they're useful for
  • mention mitogen.core.takes_router / parent.upgrade_router() and starting new connections from within a child without master's support, and how all that just works w.r.t. routing
  • mention aspects of disconnect handling/detection: Receiver(respondent=) is race-free, but listening to the Context 'disconnect' signal might be desirable for code doing fancier stuff
  • high level program models / tradeoffs:
    • the 'minimal helper' approach - Ansible extension
    • the 'whole replication' approach - Opsmop
  • aspects of topology aware code -- each component is both a client and a server. no good patterns for it yet, but they're coming.
    • unidirectional - PushFileService example
    • bidirectional - ModuleResponder example
  • show an example stack trace, explain it contains stack from both parent and child
  • explain lazy loading within mitogen: core is the bootstrap, parent comes if via= is used, service comes for SERVICE_CALL, connection method modules loaded as needed
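
For the faulthandler.enable() and "stack traces out of hung processes" items above, a minimal sketch using only the standard library might look like the following. The helper name install_hang_diagnostics and its parameters are hypothetical, not part of Mitogen, and where a real app or a Mitogen child should call it is left as an assumption.

```python
import faulthandler
import signal
import sys


def install_hang_diagnostics(timeout=None, stream=sys.stderr):
    """Hypothetical helper: make it possible to get stack traces from a stuck process.

    `stream` must be a real file object (one with a fileno()).
    """
    # Dump tracebacks of all threads on fatal errors (SIGSEGV, SIGABRT, ...).
    faulthandler.enable(file=stream, all_threads=True)

    # On UNIX, also dump all thread stacks whenever the process receives SIGUSR1.
    # faulthandler.register() and SIGUSR1 are not available on Windows.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)

    # Optionally dump all thread stacks every `timeout` seconds, which helps
    # when nobody is around to send a signal to the hung process.
    if timeout is not None:
        faulthandler.dump_traceback_later(timeout, repeat=True, file=stream)
```

Calling install_hang_diagnostics() early in program start-up and later running `kill -USR1 <pid>` against a stuck process should print the stack of every thread to the chosen stream without stopping the process.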

Security Chapter

(from bug #389)

  • Library's assumptions
  • Serialization implications
  • Reasons some stuff is missing, and why it shouldn't be worked around
    • Lambdas possible, but each lambda creates a persistent new code execution vector exposed to untrusted code, too easy to shoot yourself in the foot. Possibly some future verbose wrapper type that allows callbacks, with annoyingly explicit opt-in
    • Custom types possible, restrictions may be relaxed in future, but for now assume things must remain strict because too easy to shoot yourself in the foot
    • Exception types discarded. Future syntax sugar to make this easier to work with, but problem is same as with custom types
  • Using another serialization to work around limitations of Mitogen serialization is a huge red flag. Might be sane in some cases (use same-host/UID example from linear2 branch)
  • Parent->Child unidirectional trust model
  • Treat call() and call_service() like the remote end is pure evil
  • Call may never return
  • Return value may be garbage (string where list expected, list where dict expected, ..)
  • All returned data is suspect just like calling an untrusted web service
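
The "return value may be garbage" point above can be illustrated with a small validation step applied to anything that comes back from a child before the parent uses it. This is only a sketch: UntrustedResultError and expect_list_of_str are hypothetical names, not Mitogen APIs, and the size limits are arbitrary assumptions.

```python
class UntrustedResultError(ValueError):
    """Hypothetical error type: a remote result did not have the expected shape."""


def expect_list_of_str(value, max_items=1000, max_len=4096):
    """Return `value` unchanged only if it is a bounded list of bounded strings.

    Intended to be applied to data received from a less-trusted context,
    exactly as one would validate a response from an untrusted web service.
    """
    if not isinstance(value, list):
        raise UntrustedResultError("expected list, got %s" % type(value).__name__)
    if len(value) > max_items:
        raise UntrustedResultError("list too long: %d items" % len(value))
    for item in value:
        if not isinstance(item, str):
            raise UntrustedResultError("expected str item, got %s" % type(item).__name__)
        if len(item) > max_len:
            raise UntrustedResultError("string item too long: %d chars" % len(item))
    return value
```

In a parent, the result of a remote call would be passed through such a check before being indexed, iterated, or handed to other code, so a string arriving where a list was expected fails loudly instead of being processed character by character.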

@dw dw added the docs label Dec 13, 2018

@dw dw added the user-reported label Jan 22, 2019

@dw dw referenced this issue Jan 22, 2019


docs: add a security section #389

0 of 12 tasks complete

@dw dw added the security label Jan 22, 2019
