# Multiprocessing and Multithreading
Each program running on the operating system is a separate process. Each process has one or more threads. If a process has several threads, they appear to run simultaneously. Processes and threads are the two building blocks used to achieve ***concurrency*** and ***parallelism*** in Python.

Two approaches can be used to spread the workload in programs:

* **Multiple processes** : Multiple processes have separate regions of memory and can only communicate through special mechanisms. On a context switch the processor saves and restores a separate set of registers for each thread of execution. Separate memory makes communication and data sharing inconvenient. In Python, launching other programs as child processes is handled by the *subprocess* module.

<img align="center" src="./images/multiprocessing.png" alt="multiprocessing" width="800" height="800" />

* **Multiple threads** : Multiple threads in a single process have access to the same memory. They communicate simply by sharing data, provided that only one thread accesses the shared data at a time; this is handled by the *threading* module. Threads share the process’s resources, including the heap space, but each thread still has its own stack. Threads are lighter than processes.

<img align="center" src="./images/multithreading.png" alt="multithreading" width="800" height="800" />

**Note:** CPython supports multithreading, but it executes Python bytecode in only one thread at a time. Threads share the same memory and process resources, so they also share global variables, which can cause problems if one thread modifies a variable while another is using it. <br>
The ***GIL*** (Global Interpreter Lock) serializes access to interpreter internals from different threads, which keeps those internals consistent. It does not make multi-step updates to your own shared data atomic, so such updates still need a lock. On multi-core systems, the GIL also means that multiple threads cannot effectively make use of multiple cores for CPU-bound Python code.
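The note above can be illustrated with a minimal sketch in which several threads update one global variable. The names `counter`, `counter_lock`, and `add_many` are hypothetical and chosen only for this example; the lock is what guarantees that each read-modify-write step is seen by one thread at a time.

```python
import threading

counter = 0                      # shared by every thread in this process
counter_lock = threading.Lock()  # hypothetical name; guards `counter`

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread may update at a time
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait until every thread has finished

print(counter)  # 4 threads x 10000 increments = 40000
```

Without the `with counter_lock:` line the final value is not guaranteed, because `counter += 1` is several interpreter steps rather than a single atomic one.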

**Summary** : If your code is I/O bound, both multiprocessing and multithreading in Python will work for you.<br> *Multithreading* can speed up network-based operations (like web scraping or downloading multiple files). <br> *Multiprocessing* is easier to just drop in than threading but has a higher memory overhead. If your code is CPU bound (heavy computation such as number crunching), multiprocessing is most likely going to be the better choice, especially if the target machine has multiple cores or CPUs.
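As a rough illustration of the I/O-bound case in the summary, the sketch below uses `time.sleep` as a stand-in for a network call (an assumption made so the example needs no network access). The function name `fake_download` is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(item):
    """Stand-in for a network call: a sleeping thread releases the GIL."""
    time.sleep(0.2)
    return item * 2

items = [1, 2, 3, 4, 5]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # the five calls wait at the same time instead of one after another
    results = list(executor.map(fake_download, items))
elapsed = time.perf_counter() - start

print(results)
print("elapsed: %.2f s (five sequential calls would need about 1 s)" % elapsed)
```

The same pattern would not speed up a CPU-bound function, since only one thread runs Python bytecode at a time.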

### Subprocess module
It is used to create a pair of parent-child programs. The parent program is started by the user and in turn runs instances of the child program, each with different work to do. Using child processes allows us to take maximum advantage of a multicore processor and leaves concurrency issues to be handled by the operating system.

In [1]:
import subprocess
subprocess.call("exit 1",shell=True)

1
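Besides `subprocess.call`, the module offers `subprocess.run` (Python 3.5+; the `capture_output` and `text` arguments need 3.7+). The sketch below assumes a Python 3 interpreter and starts a child Python process without going through the shell; the child command is made up for illustration.

```python
import subprocess
import sys

# Run a child Python interpreter directly (no shell involved).
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from the child')"],
    capture_output=True,   # collect stdout and stderr instead of printing them
    text=True,             # decode the captured bytes to str
)

print(completed.returncode)        # exit status of the child
print(completed.stdout.strip())    # text the child wrote to stdout
```

Passing the command as a list avoids shell quoting problems, which is why it is usually preferred over `shell=True` when no shell features are needed.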

### Threading module
The major problem with threading is deadlock, which can occur when multiple threads share data and end up waiting on each other's locks.
The *concurrent.futures* module has an *Executor* abstract class with two subclasses:
* ThreadPoolExecutor - for multithreading
* ProcessPoolExecutor - for multiprocessing

In [2]:
# threadpool executor
from concurrent.futures import ThreadPoolExecutor
from time import sleep

# waits for 3 seconds before returning the message
def return_message(message):
    sleep(3)
    return message

pool = ThreadPoolExecutor(3)
futures = pool.submit(return_message,('hello world!'))

# done() returns False because the task hasn't completed yet
print(futures.done())
sleep(2)
print(futures.done())
sleep(2)
print(futures.done()) # after completion it returns True
print(futures.result())

False
False
True
hello world!


In [3]:
# processpool executor
from concurrent.futures import ProcessPoolExecutor
from time import sleep

# waits for 3 seconds before returning the message
def return_message(message):
    sleep(3)
    return message

pool = ProcessPoolExecutor(3)
futures = pool.submit(return_message,('hello world!'))

# done() returns False because the task hasn't completed yet
print(futures.done())
sleep(4)
print(futures.done())    # after completion it returns True
print(futures.result())

False
True
hello world!


In [4]:
# Executor.map() - returns results in the order the inputs were passed
# as_completed() - yields futures as soon as they complete
import urllib.request
import concurrent.futures

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# fetch a URL and return the raw bytes of the page
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map each future back to the URL it is fetching
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as error:
            print('%r generated an exception: %s' % (url, error))
        else:
            print('%r page is %d bytes' % (url, len(data)))

The output depends on the network: each reachable URL prints its page size, and each unreachable one prints the exception it raised. Under Python 2, *urllib.urlopen* cannot be used in a *with* statement, so every request fails there with "addinfourl instance has no attribute '\_\_exit\_\_'".
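The difference between `Executor.map()` and `as_completed()` mentioned in the cell above can be seen in a small sketch that needs no network. The function `slow_echo` and the delays are hypothetical, chosen so that the tasks finish in a different order than they were submitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_echo(delay):
    """Sleep for `delay` seconds, then return the delay."""
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in the order of the inputs
    in_input_order = list(executor.map(slow_echo, delays))

    # as_completed() yields futures as each one finishes
    futures = [executor.submit(slow_echo, d) for d in delays]
    in_finish_order = [f.result() for f in as_completed(futures)]

print(in_input_order)    # same order as `delays`
print(in_finish_order)   # usually shortest delay first
```

Use `map()` when the results must line up with the inputs, and `as_completed()` when each result should be handled as early as possible.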


In [2]:
import concurrent.futures
import math
import time
PRIMES = [i for i in range(112272535095293,112272535095393)]
 
def is_prime(n):
    if n % 2 == 0:
        return False
 
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES, chunksize=1)):
            pass
    end = time.time()
    print('total time: ',end-start)
if __name__=='__main__':
    main()


('total time: ', 3.703892946243286)


In [3]:
from concurrent.futures import ThreadPoolExecutor, wait, as_completed
from time import sleep
from random import randint

def return_after_5_secs(num):
    sleep(randint(1, 5))
    return "Return of {}".format(num)

pool = ThreadPoolExecutor(5)
futures = []
for x in range(5):
    futures.append(pool.submit(return_after_5_secs, x))

for x in as_completed(futures):
    print(x.result())

Return of 4
Return of 0
Return of 1
Return of 2
Return of 3


In [4]:
# wait returns a tuple (set of done, set of not done) futures
from concurrent.futures import ThreadPoolExecutor, wait, as_completed
from time import sleep
from random import randint
 
def return_after_5_secs(num):
    sleep(randint(1, 5))
    return "Return of {}".format(num)
 
pool = ThreadPoolExecutor(5)
futures = []
for x in range(5):
    futures.append(pool.submit(return_after_5_secs, x))
 
print(wait(futures))

DoneAndNotDoneFutures(done=set([<Future at 0x7fd29c04dc10 state=finished returned str>, <Future at 0x7fd29c0dcc50 state=finished returned str>, <Future at 0x7fd29c04df10 state=finished returned str>, <Future at 0x7fd2a321b650 state=finished returned str>, <Future at 0x7fd29c0763d0 state=finished returned str>]), not_done=set([]))
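By default `wait` blocks until every future has finished, which is why `not_done` is empty in the output above. A minimal sketch of the `return_when` argument follows; the function `sleep_then_return` and the two delays are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def sleep_then_return(seconds):
    time.sleep(seconds)
    return seconds

with ThreadPoolExecutor(max_workers=2) as executor:
    fast = executor.submit(sleep_then_return, 0.1)
    slow = executor.submit(sleep_then_return, 0.8)

    # return as soon as at least one future has finished
    done, not_done = wait([fast, slow], return_when=FIRST_COMPLETED)
    slow_was_pending = slow in not_done

print(len(done), len(not_done))
print(slow_was_pending)
```

`wait` also accepts a `timeout` in seconds; futures that are still running when it expires are returned in `not_done`.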


### Virtual environments
___

#### 1. Virtualenv 
It creates a virtual environment within the current folder.
The following commands are executed in a bash terminal:
* **virtualenv virtual_environment_1**
* **source virtual_environment_1/bin/activate**
* **deactivate**

#### 2. Virtualenvwrapper 
It places all of your virtual environments in one place.
* **export WORKON_HOME=\$HOME/.virtualenvs**
* **export PROJECT_HOME=\$HOME/developer**
* **source ~/.local/bin/virtualenvwrapper.sh**
* **mkvirtualenv virtual_env_1** : create a virtual environment
* **workon virtual_env_1** : enter the virtual environment
* **rmvirtualenv virtual_env_1** : remove the virtual environment
* **lsvirtualenv** : list all virtual environments
* **cdvirtualenv** : change into the virtual environment directory
* **cdsitepackages** : change into the site-packages directory of the virtual environment
* **lssitepackages** : list the contents of the site-packages directory of the virtual environment
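
From inside Python it can be useful to confirm that a virtual environment is active. The helper below is a hedged sketch: the name `in_virtual_environment` is hypothetical, and the check relies on `sys.base_prefix` (Python 3.3+) with a fallback to the `sys.real_prefix` attribute that older virtualenv releases set.

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter appears to run inside a virtual environment."""
    # older virtualenv releases add sys.real_prefix to the interpreter
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print(in_virtual_environment())
print(sys.prefix)  # root directory of the environment currently in use
```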

### Debugging
___

#### pdb - python debugger
It helps you debug interactively in the terminal. Import it and set a breakpoint where execution should pause:
* **import pdb**
* **pdb.set_trace()**

You can also run a script under the debugger from the command line, without editing its source:
* **python -m pdb sample_program.py**
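
A minimal sketch of where a breakpoint might go. The function `average` and its `debug` flag are hypothetical; the flag is only there so the example runs to completion unless debugging is requested.

```python
import pdb

def average(values, debug=False):
    """Return the arithmetic mean of `values`."""
    total = sum(values)
    if debug:
        # execution pauses here; at the (Pdb) prompt try `p total`,
        # `n` (next line) or `c` (continue)
        pdb.set_trace()
    return total / len(values)

print(average([2, 4, 6]))   # runs straight through, prints 4.0
# average([2, 4, 6], debug=True) would drop into the debugger instead
```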

### Profiling
___
Tips for improving memory performance:
1. Use tuples rather than lists for storing read-only data.
2. Use ***generators*** rather than lists or tuples for iterating over data.
3. Rather than concatenating strings repeatedly with +, append the pieces to a list and join them once at the end.
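
The three tips can be checked roughly with `sys.getsizeof`, which reports the size of the container object itself (not of the elements it refers to). The variable names are hypothetical and the sizes assume CPython.

```python
import sys

# Tip 1: a tuple has a smaller header than a list with the same items
list_bytes = sys.getsizeof(list(range(10)))
tuple_bytes = sys.getsizeof(tuple(range(10)))

# Tip 2: a generator produces items on demand instead of storing them all
n = 100000
squares_list_bytes = sys.getsizeof([i * i for i in range(n)])
squares_gen_bytes = sys.getsizeof(i * i for i in range(n))

# Tip 3: collect the pieces in a list and join once at the end
parts = []
for word in ["spam", "eggs", "ham"]:
    parts.append(word)
joined = ", ".join(parts)

print(tuple_bytes, list_bytes)
print(squares_gen_bytes, squares_list_bytes)
print(joined)
```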

In [7]:
# cProfile package - for finding out which function or module is taking a lot of time
import cProfile

def main():
    a = 3
    b = 5000000
    c = [i**2 for i in range(a,b)]
    d = [i**3 for i in range(a,b)]

cProfile.run("main()")

         5 function calls in 3.511 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.314    3.314    3.433    3.433 <ipython-input-7-bc57f0188498>:4(main)
        1    0.078    0.078    3.511    3.511 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.119    0.060    0.119    0.060 {range}




To profile a script from the command line:
* **python -m cProfile sample_script.py**
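
The default report is ordered by name, which is rarely the most useful view. A hedged sketch of sorting by cumulative time with the standard `pstats` module follows; `build_squares` is a hypothetical workload.

```python
import cProfile
import io
import pstats

def build_squares(limit):
    return [i ** 2 for i in range(limit)]

profiler = cProfile.Profile()
profiler.enable()            # start collecting
build_squares(200000)
profiler.disable()           # stop collecting

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# most expensive call chains first, limited to the top 5 entries
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

From the command line, the `-s cumulative` option of `python -m cProfile` gives the same ordering.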

In [1]:
# timeit package - timing small pieces of code
import timeit
timeit.timeit("x=2+2")

0.017785072326660156

In [2]:
timeit.timeit("sum(i for i in range(10))")

0.8751599788665771

To time code from the command line:
* **python -m timeit -s 'import mymodule as m' 'm.myfunction()'**
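
Note that `timeit.timeit` returns the total time for all runs, and the default is one million runs. A small sketch using the `number` argument and `timeit.repeat`; the statement being timed is arbitrary.

```python
import timeit

runs = 10000

# total seconds for `runs` executions of the statement
total_seconds = timeit.timeit("sum(range(10))", number=runs)
per_call = total_seconds / runs

# repeat the whole measurement three times and keep the best one,
# which is less affected by other activity on the machine
best_of_three = min(timeit.repeat("sum(range(10))", number=runs, repeat=3))

print("total: %.6f s, per call: %.9f s" % (total_seconds, per_call))
print("best of three: %.6f s" % best_of_three)
```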

In [12]:
import time
def sumOfN2(n):
    '''
    a simple example of how to time a function
    '''
    start = time.time()
    theSum = 0
    for i in range(1,n+1):
        theSum = theSum + i
    end = time.time()
    return theSum,end-start

if __name__ =='__main__':
    n = 5
    print("Sum is %d and required %10.7f seconds"%sumOfN2(n))
    n = 200
    print("Sum is %d and required %10.7f seconds"%sumOfN2(n))

Sum is 15 and required  0.0000021 seconds
Sum is 20100 and required  0.0000150 seconds
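
The manual timing above can be factored into a reusable decorator. This is one possible sketch, not the only way to do it: the names `timed` and `sum_of_n` are hypothetical, and `time.perf_counter` is used because it is intended for measuring short durations.

```python
import functools
import time

def timed(func):
    """Wrap `func` so that it returns (result, elapsed_seconds)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper

@timed
def sum_of_n(n):
    return sum(range(1, n + 1))

total, seconds = sum_of_n(200)
print("Sum is %d and required %10.7f seconds" % (total, seconds))
```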


### Unit testing
Unit testing is a good way to test individual pieces of your code. It checks that functions, classes, or modules behave exactly as they were designed to.
<br>
Python provides two modules:
* **doctest**
* **unittest**
<br><br>
Third-party libraries are available as well: **nose** and **py.test**
***

In [18]:
import doctest

def testing_function():
    """
    Print the sum of 2 and 3.

    >>> testing_function()
    5
    """
    a = 2
    b = 3
    print(a+b)

if __name__=='__main__':
    doctest.testmod()

Because doctest.testmod() is called under the '\_\_main\_\_' guard, running the script runs its doctests; add -v for a verbose report:
* **python sample_script.py -v**

You can create a separate test program using the *unittest* module:
<br><br>
**import doctest<br>
import unittest<br>
import module_to_be_tested<br>
suite = unittest.TestSuite()<br>
suite.addTest(doctest.DocTestSuite(module_to_be_tested))<br>
runner = unittest.TextTestRunner()<br>
print(runner.run(suite))**<br>
<br>
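
Apart from wrapping doctests, *unittest* is more commonly used with test classes. A minimal sketch, assuming a hypothetical function `add` as the code under test:

```python
import unittest

def add(a, b):
    """Hypothetical function under test."""
    return a + b

class AddTests(unittest.TestCase):
    # every method whose name starts with "test" is collected as a test
    def test_adds_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_strings(self):
        self.assertEqual(add("py", "thon"), "python")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

In a standalone script, calling `unittest.main()` under the `__main__` guard would discover and run the same tests.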

Using pytest
<br><br>
To check whether pytest was successfully installed:
* **python -m pytest --version**
<br>

To run several tests grouped in one file (here test_class.py) in quiet mode:
* **pytest -q test_class.py**
<br>
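
pytest collects functions whose names start with `test_` and treats a plain `assert` as the check. The sketch below shows what a hypothetical file named test_sample.py might contain; `add` stands in for the code under test.

```python
# Contents of a hypothetical file named test_sample.py

def add(a, b):
    """Hypothetical function under test."""
    return a + b

def test_add_integers():
    # a failing assert is reported by pytest with the compared values
    assert add(2, 3) == 5

def test_add_is_commutative():
    assert add(2, 3) == add(3, 2)
```

Running pytest in the directory that holds this file would pick up both test functions.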

$$ The\ End $$