This repository has been archived by the owner on Aug 10, 2024. It is now read-only.

Use processes and threads in Azure cloud plugins #138

Merged
merged 1 commit into from
May 21, 2019

susam
Member

@susam susam commented May 18, 2019

Both AzCloud and AzVM now rely on multiprocessing and multithreading
to pull resources concurrently from Azure cloud in order to complete
their scans faster.

All of the multiprocessing and multithreading code involved is put in a
new ioworkers module. AzCloud and AzVM invoke ioworkers.run() to
launch multiple worker processes, each of which launches multiple worker
threads. The callers, i.e., AzCloud and AzVM, need to pass two callback
functions to ioworkers.run():

  • An input function which when invoked yields tuples where each tuple
    defines a unit of work.
  • An output function which works on each tuple yielded by the input
    function, does something with it, and then yields outputs.
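
The callback contract above can be sketched as follows. This is a minimal
illustration, not the code in this pull request: input_func, output_func,
the sample data, and the serial run_serially driver are all hypothetical
stand-ins, and it assumes each work tuple is unpacked into the output
function's positional arguments.

```python
def input_func():
    """Yield one tuple per unit of work (hypothetical sample data)."""
    for subscription in ('sub-a', 'sub-b'):
        for resource_type in ('vm', 'disk'):
            yield (subscription, resource_type)


def output_func(subscription, resource_type):
    """Process one unit of work and yield zero or more outputs."""
    # A real plugin would call the cloud API here. This sketch only
    # produces two placeholder records per unit of work.
    for index in range(2):
        yield {
            'subscription': subscription,
            'type': resource_type,
            'index': index,
        }


def run_serially(input_func, output_func):
    """Stand-in driver: feed each work tuple to output_func in turn."""
    for work in input_func():
        # Assumption: the tuple is unpacked into positional arguments.
        yield from output_func(*work)


records = list(run_serially(input_func, output_func))
```

Because both callbacks are plain generator functions, a concurrent driver
could distribute the same work tuples across workers without changing
either callback.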

The output functions are run concurrently by ioworkers in multiple
worker threads within multiple worker processes. This two-level
approach, i.e., worker threads within worker processes, has been chosen
to maximize utilization of CPUs. This approach provides a good balance
between the following two extremes:

  • Using a large number of processes only would consume too many IPC
    resources and increase the overhead of context switching for
    processes.
  • Using a large number of threads only does not scale well in Python
    due to the global interpreter lock (GIL), which allows only one
    thread to execute Python bytecode at a time.

Launching multiple worker processes sidesteps the GIL and lets us utilize
multiple CPUs. Launching multiple threads within each process lets us
scale even further without incurring the overhead of more processes.
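
The two-level idea can be sketched with the standard library as follows.
This is only an illustration under stated assumptions, not the ioworkers
implementation: it uses concurrent.futures executors as a stand-in
technique, and fetch, split, run_threads, and run_two_level are
hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch(item):
    """Hypothetical I/O-bound task; a real plugin would call a cloud API."""
    return item * 2


def split(items, parts):
    """Split items into `parts` round-robin chunks, one per process."""
    items = list(items)
    return [items[i::parts] for i in range(parts)]


def run_threads(chunk, threads):
    """Run inside one worker process: fan a chunk out to worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fetch, chunk))


def run_two_level(items, processes, threads):
    """Launch worker processes, each of which runs its own worker threads."""
    chunks = split(items, processes)
    with ProcessPoolExecutor(max_workers=processes) as pool:
        futures = [
            pool.submit(run_threads, chunk, threads) for chunk in chunks
        ]
        # Flatten the per-process result lists into one list.
        return [value for future in futures for value in future.result()]
```

Calling run_two_level(range(10), processes=2, threads=3) from under an
`if __name__ == '__main__':` guard would use two processes with three
threads each. The guard matters on platforms where child processes
re-import the main module.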

Collaborator

@s-nirali s-nirali left a comment

I did a brief review of the logic. It looks good.

Collaborator

@prateeknischal prateeknischal left a comment

Very Nice commit 👍
One thing that could be added as documentation is a sample plugin, similar to mockcloud, that demonstrates how to use ioworkers.py as a reference. Since ioworkers.py is pretty abstract, it might not make sense to write this as a docstring within the file.

@susam susam force-pushed the ioworkers branch 2 times, most recently from 26683aa to e6ebf04 Compare May 21, 2019 03:07
Member Author

susam commented May 21, 2019

@prateeknischal: Very Nice commit 👍 One thing that could be added as documentation is a sample plugin, similar to mockcloud, that demonstrates how to use ioworkers.py as a reference. Since ioworkers.py is pretty abstract, it might not make sense to write this as a docstring within the file.

Thank you. I will write the documentation for this in a separate pull request when I next make doc changes. By then, we will have used this new approach for some time, so we will be fairly confident that it is a good approach, and I can put in the effort to document it.

@susam susam merged commit e6ebf04 into master May 21, 2019
@susam susam deleted the ioworkers branch June 13, 2019 11:35