# The threading Module

## Getting Started

The threading module lets your program run multiple operations concurrently. Threads work best for I/O-bound work; for a CPU-intensive task you will usually want the multiprocessing module instead. The reason is the Global Interpreter Lock (GIL) in standard CPython, which allows only one thread to execute Python bytecode at a time. Because of this, running several CPU-intensive operations in threads gives no speedup and may actually run slower than running them one after another. The GIL is released while a thread waits on I/O, so we will be focusing on what threads do best: I/O operations!
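As a rough illustration of the point above, the sketch below times the same simulated I/O work run one task at a time and then in threads. `time.sleep` is only a stand-in for a real I/O call, and the function names (`fake_io_task`, `run_sequentially`, `run_in_threads`) are made up for this example.

```python
import threading
import time


def fake_io_task(delay):
    """Stand-in for an I/O call: sleeping releases the GIL, much like waiting on a socket."""
    time.sleep(delay)


def run_sequentially(count, delay):
    """Run the tasks one after another and return the elapsed time in seconds."""
    start = time.perf_counter()
    for _ in range(count):
        fake_io_task(delay)
    return time.perf_counter() - start


def run_in_threads(count, delay):
    """Run each task in its own thread and return the elapsed time in seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io_task, args=(delay,))
               for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every thread before stopping the clock
    return time.perf_counter() - start


if __name__ == '__main__':
    print('sequential: {:.2f}s'.format(run_sequentially(5, 0.2)))
    print('threaded:   {:.2f}s'.format(run_in_threads(5, 0.2)))
```

The sequential run should take roughly the sum of the delays, while the threaded run should take roughly one delay, because the waits overlap.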

In [13]:
# Simple threading example
import threading


def doubler(number):
    """
    A function that can be used by a thread
    """
    # Uncomment to see which thread is running (current_thread() replaces the deprecated currentThread())
    # print(threading.current_thread().name + '\n')
    print(number * 2)


if __name__ == '__main__':
    for i in range(5):
        # Create a threading.Thread, passing the function to run and its arguments
        my_thread = threading.Thread(target=doubler, args=(i,))
        my_thread.start()

0
2
4
6
8
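The example above starts its threads but never waits for them or collects what they compute. One possible way to do both is sketched here: keep a reference to each thread, call `join()` on it, and have each thread write into its own slot of a shared list. The names `doubler_into` and `double_all` are hypothetical helpers for this sketch.

```python
import threading


def doubler_into(number, results):
    """Double the number and store it at the matching index of a shared list."""
    results[number] = number * 2


def double_all(count):
    """Start one thread per number, wait for all of them, and return the results."""
    results = [None] * count
    threads = []
    for i in range(count):
        my_thread = threading.Thread(target=doubler_into, args=(i, results))
        threads.append(my_thread)
        my_thread.start()
    for my_thread in threads:
        my_thread.join()  # block until this thread has finished
    return results


if __name__ == '__main__':
    print(double_all(5))  # prints [0, 2, 4, 6, 8]
```

Because every thread writes to a different index, no lock is needed in this particular sketch.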


In [17]:
# cd to the demos folder so the log files from the next example are saved there
# You may not see this folder in the git repo because git does not allow you to push folders with only logging files in them
!ls
%cd 28_threading_module_demos


In [5]:
# Using the logger to log our outputs
# logging module is thread safe!
import logging


def get_logger():
    logger = logging.getLogger("threading_example")
    logger.setLevel(logging.DEBUG)

    fh = logging.FileHandler("threading.log")
    fmt = '%(asctime)s - %(threadName)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(fmt)
    fh.setFormatter(formatter)

    logger.addHandler(fh)
    return logger


def doubler(number, logger):
    """
    A function that can be used by a thread
    """
    logger.debug('doubler function executing')
    result = number * 2
    logger.debug('doubler function ended with: {}'.format(
        result))


if __name__ == '__main__':
    logger = get_logger()
    thread_names = ['Mike', 'George', 'Wanda', 'Dingbat', 'Nina']
    for i in range(5):
        my_thread = threading.Thread(
            target=doubler, name=thread_names[i], args=(i, logger))
        my_thread.start()

In [6]:
# Changing the code above to create a Thread class instead of calling it directly
class MyThread(threading.Thread):

    def __init__(self, number, logger):
        threading.Thread.__init__(self)
        self.number = number
        self.logger = logger

    def run(self):
        """
        Run the thread
        """
        self.logger.debug('Calling doubler')
        doubler(self.number, self.logger)


def get_logger():
    logger = logging.getLogger("threading_example")
    logger.setLevel(logging.DEBUG)

    fh = logging.FileHandler("threading_class.log")
    fmt = '%(asctime)s - %(threadName)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(fmt)
    fh.setFormatter(formatter)

    logger.addHandler(fh)
    return logger


def doubler(number, logger):
    """
    A function that can be used by a thread
    """
    logger.debug('doubler function executing')
    result = number * 2
    logger.debug('doubler function ended with: {}'.format(
        result))


if __name__ == '__main__':
    logger = get_logger()
    thread_names = ['Mike', 'George', 'Wanda', 'Dingbat', 'Nina']
    for i in range(5):
        thread = MyThread(i, logger)
        # Assign the name attribute; setName() is deprecated since Python 3.10
        thread.name = thread_names[i]
        thread.start()

## Locks and Synchronization

Locks let you reserve a shared resource for one thread at a time so that you can avoid conflicts. If a thread tries to acquire a lock that another thread already holds, it pauses until the lock is released.

In [7]:
# Example of code that needs a lock
# One thread may try to update the total before another has completed the same task!

total = 0

def update_total(amount):
    """
    Updates the total by the given amount
    """
    global total
    total += amount
    print(total)

if __name__ == '__main__':
    for i in range(10):
        my_thread = threading.Thread(
            target=update_total, args=(5,))
        my_thread.start()

5
10
15
20
25
30
35
4045

50


In [8]:
# Using lock to prevent multiple threads from performing the same operation at the same time

total = 0
lock = threading.Lock()

def update_total(amount):
    """
    Updates the total by the given amount
    """
    global total
    # First acquire the lock
    lock.acquire()
    # Try to update the total
    try:
        total += amount
    # Release lock whether or not update succeeds
    finally:
        lock.release()
    print(total)

if __name__ == '__main__':
    for i in range(10):
        my_thread = threading.Thread(target=update_total, args=(5,))
        my_thread.start()

5
10
15
20
25
3035

40
4550



In [9]:
# Using lock with a context manager instead of the syntax above
total = 0
lock = threading.Lock()

def update_total(amount):
    """
    Updates the total by the given amount
    """
    global total
    with lock:
        total += amount
    print(total)

if __name__ == '__main__':
    for i in range(10):
        my_thread = threading.Thread(
            target=update_total, args=(5,))
        my_thread.start()

5
10
1520
25
30

35
4045

50


In [10]:
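# Multiple functions sharing one lock
# main() holds the lock while calling functions that acquire it again, so we use
# a re-entrant lock (RLock); with a plain Lock the nested acquire would block forever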

total = 0
lock = threading.RLock()

def do_something():

    with lock:
        print('Lock acquired in the do_something function')
    print('Lock released in the do_something function')

    return "Done doing something"

def do_something_else():
    with lock:
        print('Lock acquired in the do_something_else function')
    print('Lock released in the do_something_else function')

    return "Finished something else"


def main():
    with lock:
        result_one = do_something()
        result_two = do_something_else()

    print(result_one)
    print(result_two)

if __name__ == '__main__':
    for i in range(1):
        my_thread = threading.Thread(target=main)
        my_thread.start()

Lock acquired in the do_something function
Lock released in the do_something function
Lock acquired in the do_something_else function
Lock released in the do_something_else function
Done doing something
Finished something else
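
To make the difference between the two lock types concrete, here is a small sketch that tries to re-acquire a lock from the thread that already holds it. It uses `acquire(blocking=False)` so that the plain `Lock` case reports failure instead of hanging. The helper name `can_reacquire` is invented for this example.

```python
import threading


def can_reacquire(lock):
    """Return True if the thread that already holds the lock can acquire it again."""
    with lock:
        # blocking=False returns immediately instead of waiting forever
        acquired_again = lock.acquire(blocking=False)
        if acquired_again:
            lock.release()  # undo the nested acquire
        return acquired_again


if __name__ == '__main__':
    print('Lock: ', can_reacquire(threading.Lock()))   # prints Lock:  False
    print('RLock:', can_reacquire(threading.RLock()))  # prints RLock: True
```

An `RLock` keeps track of its owning thread and a nesting count, so it must be released once for every time it was acquired.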


## Timers

The threading module has a Timer class that represents an action that should take place after a specified period of time. Timers are started with the same start() method that the regular Thread class uses, and you can cancel a timer with cancel() as long as its action has not started running yet.

In [11]:
# Using the timer class
import subprocess

from threading import Timer

kill = lambda process: process.kill()
cmd = ['ping', 'www.google.com']
ping = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# We pass the time to wait, the function and its params in that order
my_timer = Timer(5, kill, [ping])

try:
    my_timer.start()
    stdout, stderr = ping.communicate()
finally:
    my_timer.cancel()

print(str(stdout))

b'PING www.google.com (142.250.176.196): 56 data bytes\n64 bytes from 142.250.176.196: icmp_seq=0 ttl=118 time=6.120 ms\n64 bytes from 142.250.176.196: icmp_seq=1 ttl=118 time=5.454 ms\n64 bytes from 142.250.176.196: icmp_seq=2 ttl=118 time=6.423 ms\n64 bytes from 142.250.176.196: icmp_seq=3 ttl=118 time=5.608 ms\n64 bytes from 142.250.176.196: icmp_seq=4 ttl=118 time=6.199 ms\n'
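
The cancelling behaviour described in this section can also be seen without a subprocess. In the sketch below, two timers are started and one is cancelled while it is still waiting, so only the other one runs its function. The helper name `timer_demo` and the short 0.1 second interval are choices made for this example only.

```python
import threading


def timer_demo():
    """Start two timers, cancel one while it is still waiting, and return what fired."""
    fired = []
    will_fire = threading.Timer(0.1, fired.append, args=['first timer'])
    cancelled = threading.Timer(0.1, fired.append, args=['second timer'])

    will_fire.start()
    cancelled.start()
    cancelled.cancel()  # only has an effect because the timer is still waiting

    # Timer is a Thread subclass, so join() waits for each one to finish
    will_fire.join()
    cancelled.join()
    return fired


if __name__ == '__main__':
    print(timer_demo())  # prints ['first timer']
```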


## Other Thread Components

The threading module also supports other synchronization primitives, such as the following:
* A Semaphore (one of the oldest synchronization primitives in computer science) manages an internal counter that is decremented whenever you call the acquire method and incremented whenever you call the release method. If you call acquire when the counter is zero, the call blocks until another thread calls release.
* An Event lets threads communicate using a simple signal: one thread sets the event and other threads wait for it.
* A Barrier is a primitive that makes a fixed number of threads wait for each other. To pass the barrier, each thread calls the wait() method, which blocks until all of the threads have made the call. All threads are then released at the same time.
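
The three primitives listed above can each be exercised in a few lines. The following is a minimal sketch, with function names (`semaphore_demo`, `event_demo`, `barrier_demo`) invented for illustration. The semaphore demo uses `acquire(blocking=False)` so that the third call returns `False` rather than blocking.

```python
import threading


def semaphore_demo():
    """A semaphore with a counter of 2 allows two acquires; the third cannot proceed."""
    sem = threading.Semaphore(2)
    first = sem.acquire(blocking=False)   # counter 2 -> 1
    second = sem.acquire(blocking=False)  # counter 1 -> 0
    third = sem.acquire(blocking=False)   # counter is 0: returns False instead of blocking
    sem.release()
    sem.release()
    return first, second, third


def event_demo():
    """A worker blocks on wait() until another thread calls set()."""
    ready = threading.Event()
    seen = []

    def worker():
        ready.wait()  # blocks until the event is set
        seen.append('go')

    thread = threading.Thread(target=worker)
    thread.start()
    ready.set()  # signal the waiting worker
    thread.join()
    return seen


def barrier_demo(parties=3):
    """No thread passes wait() until all parties have called it."""
    barrier = threading.Barrier(parties)
    arrived = []
    passed = []

    def worker(name):
        arrived.append(name)
        barrier.wait()
        # By now every thread has recorded its arrival
        passed.append(len(arrived))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parties)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return passed


if __name__ == '__main__':
    print(semaphore_demo())  # prints (True, True, False)
    print(event_demo())      # prints ['go']
    print(barrier_demo())    # prints [3, 3, 3]
```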

## Thread Communication

You may have use cases where threads need to communicate with each other. You can create an Event for this, as stated above, although a more common method is to use a Queue. In the example below we will use both.

In [12]:
# Using two threads to complete a task at the same time
# We use an Event so the creator function can wait for each item to be processed
# The consumer function takes the data passed from the creator through the queue and doubles it

from queue import Queue


def creator(data, q):
    """
    Creates data to be consumed and waits for the consumer
    to finish processing
    """
    print('Creating data and putting it on the queue')
    for item in data:
        evt = threading.Event()
        q.put((item, evt))

        print('Waiting for data to be doubled')
        evt.wait()


def my_consumer(q):
    """
    Consumes some data and works on it

    In this case, all it does is double the input
    """
    while True:
        data, evt = q.get()
        print('data found to be processed: {}'.format(data))
        processed = data * 2
        print(processed)
        evt.set()
        q.task_done()


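if __name__ == '__main__':
    q = Queue()
    data = [5, 10, 13, -1]
    thread_one = threading.Thread(target=creator, args=(data, q))
    # daemon=True lets the program exit even though my_consumer loops forever
    thread_two = threading.Thread(
        target=my_consumer, args=(q,), daemon=True)
    thread_one.start()
    thread_two.start()

    # Wait for the creator to finish, then for every queued item to be processed
    thread_one.join()
    q.join()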

Creating data and putting it on the queue
Waiting for data to be doubled
data found to be processed: 5
10
Waiting for data to be doubleddata found to be processed: 10
20

Waiting for data to be doubleddata found to be processed: 13
26

Waiting for data to be doubled
data found to be processed: -1
-2
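
The consumer in the example above loops forever. A common alternative, sketched here as one possible approach rather than the method used above, is to put a sentinel object on the queue to tell the consumer when to stop. The names `STOP`, `producer`, `consumer`, and `run_pipeline` are hypothetical and exist only for this sketch.

```python
import threading
from queue import Queue

STOP = object()  # hypothetical sentinel meaning "no more data"


def producer(data, q):
    """Put every item on the queue, then the sentinel."""
    for item in data:
        q.put(item)
    q.put(STOP)


def consumer(q, results):
    """Double each item until the sentinel arrives, then exit the loop."""
    while True:
        item = q.get()
        if item is STOP:
            q.task_done()
            break
        results.append(item * 2)
        q.task_done()


def run_pipeline(data):
    """Run one producer and one consumer thread and return the doubled values."""
    q = Queue()
    results = []
    threads = [
        threading.Thread(target=producer, args=(data, q)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # both threads end on their own, so join() returns
    return results


if __name__ == '__main__':
    print(run_pipeline([5, 10, 13, -1]))  # prints [10, 20, 26, -2]
```

With a single consumer the queue preserves order, so the results come back in the same order as the input.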
