In [16]:
import numpy as np 
from functools import partial 
from concurrent.futures import ThreadPoolExecutor 
from queue import Queue
from threading import Thread
from threading import Lock
from multiprocessing.pool import Pool

### Homework 07: Concurrency

## Due Date: Apr 5, 2023, 11:59pm

#### Firstname Lastname: Buz Galbraith

#### E-mail: wbg231@nyu.edu

#### Enter your solutions and submit this notebook


---

**Problem 1** **(60 Points)**

Let us consider the Gamma function, or the Euler integral of the second kind: 

$$\Gamma(x) = \int_{0} ^ \infty t ^{x - 1} e^{-t} dt, $$

and in this HW we consider real $x > 0$.

(Here is more on the Gamma function https://en.wikipedia.org/wiki/Gamma_function .
It is not needed for this HW assignment.) 

**1.1 (Points 15)**: 

Write a function (in the cell below) that sequentially calculates the given Gamma integral.


In [2]:
def calculate_gamma(x, bound_1, bound_2, number_of_steps):
    # Sequential version to calculate Gamma(x): approximate the integral
    # by a discrete sum over number_of_steps equidistant points on the
    # interval [bound_1, bound_2] and return the result.
    t = np.linspace(bound_1, bound_2, number_of_steps)
    dt = t[1] - t[0]  # spacing between neighbouring points
    term_1 = np.exp(-t)  # e^(-t)
    # t^(x-1) built as a product of (x - 1) copies of t; this form needs an
    # integer x >= 1 (t ** (x - 1) would also cover non-integer x)
    term_2 = np.multiply.reduce([t] * (x - 1))
    return np.sum(term_1 * term_2) * dt
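
Written out, the sum that `calculate_gamma` evaluates is the following, where $a$, $b$, $N$ and $t_i$ are shorthand introduced here for `bound_1`, `bound_2`, `number_of_steps` and the grid points:

$$\Gamma(x) \approx \Delta t \sum_{i=0}^{N-1} t_i^{x-1} e^{-t_i}, \qquad t_i = a + i\,\Delta t, \qquad \Delta t = \frac{b - a}{N - 1}.$$

The approximation has two sources of error: the discretization itself, and the truncation of the upper limit from $\infty$ to $b$.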



**1.2 (Points 5)** 

Evaluate $\Gamma(6)$ by using `calculate_gamma(x, bound_1, bound_2, number_of_steps)` and report the error of this computation.


As arguments, use `x=6, bound_1=0, bound_2=1000, number_of_steps=10_000_000`. We know that $\Gamma(n) = (n-1)!$ for positive integers $n$, so $\Gamma(6) = 5! = 120$. 


In [3]:
x=6
bound_1=0 
bound_2=1000
number_of_steps=10_000_000

approximation = calculate_gamma(x, bound_1, bound_2, number_of_steps)
error = 120 - approximation
print("My function approximates the given function call as {0}\n\
this results in an error of {1}".format(approximation, error))


My function approximates the given function call as 120.00000000000006
this results in an error of -5.684341886080802e-14
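
As an independent check on the value above, the standard library's `math.gamma` and `math.factorial` can be compared against a small Riemann sum. This is only a sketch: `riemann_gamma` is a hypothetical helper that mirrors the idea of `calculate_gamma` on a shorter interval with fewer steps, and it uses `t ** (x - 1)` for the power term.

```python
import math
import numpy as np

def riemann_gamma(x, lower, upper, steps):
    # Hypothetical helper: Riemann sum of t^(x-1) * e^(-t) on [lower, upper]
    t = np.linspace(lower, upper, steps)
    dt = t[1] - t[0]
    return float(np.sum(t ** (x - 1) * np.exp(-t)) * dt)

exact = math.gamma(6)                       # reference value from the standard library
approx = riemann_gamma(6, 0, 100, 100_001)  # interval and step count chosen for illustration
print(exact, math.factorial(5), abs(approx - exact))
```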


---

Write two functions to calculate $\Gamma(x)$ by using:



**1.3.1 (Points 15)**
**threading** with N=4 threads; 



    

In [4]:
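y = 0  # shared accumulator for the partial sums, protected by the lock

def calc_gamma_chunk(q, x, n_chunk_steps, lock):
    # Worker loop: take (lower, upper) sub-intervals off the queue and add
    # each sub-interval's Riemann sum to the global total y.
    global y
    while True:
        bound_1_chunk, bound_2_chunk = q.get()
        t_chunk = np.linspace(bound_1_chunk, bound_2_chunk, n_chunk_steps)
        with lock:  # force synchronization; the chunk sum itself runs under the lock
            dt = t_chunk[1] - t_chunk[0]
            term_1 = np.exp(-1 * t_chunk)
            term_2 = np.multiply.reduce([t_chunk] * (x - 1))
            y = y + np.sum(term_1 * term_2) * dt
            q.task_done()

def multi_thread_gamma(x, bound_1, bound_2, number_of_steps, num_threads):
    global y
    y = 0  # reset so that repeated calls (e.g. under %timeit) do not accumulate
    # split [bound_1, bound_2] into sub-intervals of length 100 (integer bounds assumed)
    chunks = [(i, i + 100) for i in range(bound_1, bound_2, 100)]
    lock = Lock()
    q = Queue()
    for chunk in chunks:
        q.put(chunk)
    n_chunk_steps = number_of_steps // len(chunks)
    for i in range(num_threads):
        worker = Thread(target=calc_gamma_chunk, args=(q, x, n_chunk_steps, lock))
        worker.daemon = True  # daemon threads are stopped when the program quits
        worker.start()        # start the thread
    q.join()  # block until every chunk has been processed
    return y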
x=6
bound_1=0 
bound_2=1000
number_of_steps=10_000_000

multi_thread_gamma(x,bound_1, bound_2,number_of_steps,4)

y

120.00000000000003
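
One design point worth noting: in `calc_gamma_chunk` the chunk sum itself is computed while the lock is held, so only one thread does numerical work at a time. A minimal sketch of an alternative, assuming the same chunking idea, computes each partial sum outside the lock and holds the lock only for the final addition. The names `partial_gamma_sum` and `threaded_gamma_sketch` are hypothetical, and no timing claim is made for this variant.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

def partial_gamma_sum(x, lower, upper, steps):
    # Riemann sum of t^(x-1) * e^(-t) on one sub-interval
    t = np.linspace(lower, upper, steps)
    dt = t[1] - t[0]
    return float(np.sum(t ** (x - 1) * np.exp(-t)) * dt)

def threaded_gamma_sketch(x, lower, upper, steps, num_threads, chunk_width=100):
    # Integer bounds assumed, as in the chunking used above
    chunks = [(lo, lo + chunk_width) for lo in range(lower, upper, chunk_width)]
    steps_per_chunk = steps // len(chunks)
    total = 0.0
    lock = Lock()

    def work(bounds):
        nonlocal total
        value = partial_gamma_sum(x, bounds[0], bounds[1], steps_per_chunk)  # no lock held here
        with lock:  # the lock guards only the shared accumulator
            total += value

    with ThreadPoolExecutor(max_workers=num_threads) as ex:
        list(ex.map(work, chunks))  # consume the iterator so worker errors surface
    return total

print(threaded_gamma_sketch(6, 0, 1000, 100_000, 4))
```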

**1.3.2 (Points 15)**
**multiprocessing** with N=4 processes. 



In [5]:
def calc_gamma_chunk_multiprocess(x, n_chunk_steps, bound_1_chunk, bound_2_chunk):
    # Riemann sum of the integrand on one sub-interval; runs in a worker process
    t_chunk = np.linspace(bound_1_chunk, bound_2_chunk, n_chunk_steps)
    dt = t_chunk[1] - t_chunk[0]
    term_1 = np.exp(-1 * t_chunk)
    term_2 = np.multiply.reduce([t_chunk] * (x - 1))
    return np.sum(term_1 * term_2) * dt

def multi_process_gamma(x, bound_1, bound_2, number_of_steps, number_processors):
    # split [bound_1, bound_2] into sub-intervals of length 100 (integer bounds assumed)
    chunks = [(i, i + 100) for i in range(bound_1, bound_2, 100)]
    n_chunk_steps = number_of_steps // len(chunks)
    # fix x and n_chunk_steps; starmap supplies the two chunk bounds
    gamma = partial(calc_gamma_chunk_multiprocess, x, n_chunk_steps)
    with Pool(number_processors) as p:
        results = p.starmap(gamma, chunks)
    return np.sum(results)
x=6
bound_1=0 
bound_2=1000
number_of_steps=10_000_000

multi_process_gamma(x,bound_1, bound_2,number_of_steps,4)


120.00000000000003
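
To make the argument plumbing in `multi_process_gamma` explicit: `partial` fixes the leading arguments, and `starmap` unpacks each `(lower, upper)` tuple into the remaining ones. The sketch below uses `itertools.starmap` as a serial stand-in for `Pool.starmap` (an assumption made so the example runs without spawning processes), and `describe_chunk` is a hypothetical function that just echoes its arguments.

```python
from functools import partial
from itertools import starmap

def describe_chunk(x, n_chunk_steps, bound_1_chunk, bound_2_chunk):
    # Hypothetical stand-in for the per-chunk worker: return the arguments it received
    return (x, n_chunk_steps, bound_1_chunk, bound_2_chunk)

chunks = [(i, i + 100) for i in range(0, 300, 100)]
bound = partial(describe_chunk, 6, 1000)  # x=6 and n_chunk_steps=1000 are now fixed
calls = list(starmap(bound, chunks))      # each tuple is unpacked into the last two arguments
print(calls)
```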

**1.3.3 (Points 10)** 
Compare the times of the three versions and write a short explanation of what you are observing.


How does the answer change when N=8 and why?

In [6]:
%timeit -n 5 calculate_gamma(x, bound_1, bound_2, number_of_steps)

477 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)


In [7]:
%timeit -n 5 multi_thread_gamma(x,bound_1, bound_2,number_of_steps,4)

467 ms ± 13.2 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


In [8]:
%timeit -n 5 multi_process_gamma(x,bound_1, bound_2,number_of_steps,4)


263 ms ± 7.4 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


In [9]:
%timeit -n 5 multi_thread_gamma(x,bound_1, bound_2,number_of_steps,8)

473 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


In [10]:
%timeit -n 5 multi_process_gamma(x,bound_1, bound_2,number_of_steps,8)

434 ms ± 19 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


As we can see from the runs above:
- The sequential version is the slowest (about 477 ms), though only marginally slower than the threaded version.
- The multithreaded version with 4 threads performs at about the same level (about 467 ms), and in my runs it often had higher variance.
- The multiprocessing version with 4 processes is substantially faster than the other two (about 263 ms).
- Increasing the number of threads to 8 leaves the run time essentially unchanged (473 ms versus 467 ms, which is within one standard deviation).
    - This is likely because the task is CPU bound: we are only doing computation, with no waiting on communication. Python's GIL limits how much threads can run in parallel, and in `calc_gamma_chunk` the chunk sum is also computed while the lock is held, so only one thread computes at a time no matter how many threads exist. Splitting the work across processes, by contrast, lets separate CPU cores work at the same time, which is where the speedup comes from.
- Increasing the number of processes to 8 makes the run markedly slower (434 ms versus 263 ms).
    - This likely happens because my CPU does not seem to have 8 free cores, so asking for 8 processes results in multiple workers competing for the same CPU resources. That is especially plausible here, since the task is CPU bound and computationally expensive.
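
Since the explanation above hinges on how many cores are available, it can help to check before choosing N. A small sketch using the standard library's `os.cpu_count()`; the helper name `pick_worker_count` is hypothetical, and `os.cpu_count()` reports logical CPUs, which may be more than the number actually free for this process.

```python
import os

def pick_worker_count(requested):
    # Hypothetical helper: cap the requested worker count at the logical CPU count
    available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(requested, available))

print(os.cpu_count(), pick_worker_count(8))
```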

---

**Problem 2 (40 points)**

__Website uptime__ is the time that a website or web service is available to the users over a given period.

The task is to build an application that checks the uptime of websites. 

- The application should go over a list of website URLs and check whether those websites are up.
- Instead of performing a classic HTTP GET request, it performs a HEAD request so that it does not affect traffic significantly.
- If the HTTP status is in the danger ranges (400+, 500+), a message is printed. 
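
The "danger range" rule in the last bullet amounts to a simple threshold on the status code. A minimal sketch, with `is_down` as a hypothetical helper name; it works on plain integers, so it can be tried without any network access.

```python
from http import HTTPStatus

def is_down(status_code):
    # 4xx are client errors and 5xx are server errors; both count as "down" here
    return status_code >= HTTPStatus.BAD_REQUEST  # BAD_REQUEST has the value 400

for code in (200, 301, 404, 503):
    print(code, is_down(code))
```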

Here are some useful functions:

In [22]:
#### _website uptimer_ ####

import time
import logging
import requests
 
class WebsiteDownException(Exception):
    pass
 
def ping_website(address, timeout=20):
    """
    Check if a website is down. A website is considered down
    if either the status_code >= 400 or the request fails
    (for example, the timeout expires or the connection cannot be made).

    Raise a WebsiteDownException if any of the website-down conditions are met.
    """
    try:
        response = requests.head(address, timeout=timeout)
        if response.status_code >= 400:
            logging.warning("Website %s returned status_code=%s", address, response.status_code)
            raise WebsiteDownException()
    except requests.exceptions.RequestException:
        # Covers timeouts as well as DNS and connection failures
        logging.warning("Request failed or timed out for website %s", address)
        raise WebsiteDownException()
         
def check_website(address):
    """
    Utility function: check if a website is down, if so, notify the user
    """
    try:
        ping_website(address)
    except WebsiteDownException:
        print('The websie ' + address + ' is down')

---

You need a website list to try our system out. Create your own list or use the following one. 

---

In [12]:
WEBSITE_LIST = [
    'http://amazon.co.uk',
    'http://amazon.com',
    'http://facebook.com',
    'http://google.com',
    'http://google.fr',
    'http://google.es',
    'http://google.co.uk',
    'http://gmail.com',
    'http://stackoverflow.com',
    'http://github.com',
    'http://heroku.com',
    'http://really-cool-available-domain.com',
    'http://djangoproject.com',
    'http://rubyonrails.org',
    'http://basecamp.com',
    'http://trello.com',
    'http://shopify.com',
    'http://another-really-interesting-domain.co',
    'http://airbnb.com',
    'http://instagram.com',
    'http://snapchat.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
    'http://live.com',
    'http://linkedin.com',
    'http://netflix.com',
    'http://wordpress.com',
    'http://bing.com',
]

---

A serial version of the _website uptimer_ can be written as: 

---


In [32]:
import time

def serial_check_website(website_list):
    # Check each website one after the other
    for address in website_list:
        check_website(address)

start_time = time.time()
serial_check_website(WEBSITE_LIST)
end_time = time.time()

print("Time for Serial: %ssecs" % (end_time - start_time))



The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down
Time for Serial: 4.164453744888306secs


You should build two versions of the **website uptimer**, by using:

**2.1 (Points 15)**
**threading** with N=4 threads; 




- Here I wrote two functions to do this: one using a thread pool executor, the other using a queue. They seem to be roughly equivalent, especially since thread safety is not a concern here: the function does not write its result to any shared variable. 

In [28]:
def threaded_check_websites_down(website_list, n_threads):
    # Leaving the with-block waits for all submitted checks to finish
    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        ex.map(check_website, website_list)

threaded_check_websites_down(WEBSITE_LIST, 4)



The websie http://really-cool-available-domain.com is down
The websie http://another-really-interesting-domain.co is down
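
`check_website` only prints, so `ex.map` is used above purely for its side effects. If the results were needed, `ex.map` yields them in input order. A sketch under the assumption of a stand-in checker, `fake_check` (hypothetical, no network access), that returns a boolean instead of printing; the `DOWN` set is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

DOWN = {'http://really-cool-available-domain.com'}  # made-up set of unreachable sites

def fake_check(address):
    # Hypothetical stand-in for a real request: True means the site is up
    return address not in DOWN

def threaded_status(website_list, n_threads):
    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        # ex.map returns results in the same order as website_list
        return dict(zip(website_list, ex.map(fake_check, website_list)))

sites = ['http://google.com', 'http://really-cool-available-domain.com']
print(threaded_status(sites, 4))
```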


In [29]:
def check_website_thread(q):
    # Worker loop: take addresses off the queue and check them one by one
    while True:
        address = q.get()
        check_website(address)
        q.task_done()

def threaded_check_websites_down_2(website_list, num_threads):
    q = Queue()
    for address in website_list:
        q.put(address)
    for i in range(num_threads):
        worker = Thread(target=check_website_thread, args=(q,))
        worker.daemon = True  # daemon threads are stopped when the program quits
        worker.start()        # start the thread
    q.join()  # block until every address has been checked

threaded_check_websites_down_2(WEBSITE_LIST, 4)



The websie http://really-cool-available-domain.com is down
The websie http://another-really-interesting-domain.co is down


**2.2 (Points 15)**
**multiprocessing** with N=4 processes. 



In [31]:
def check_website_multi_process(website_list, n_processes):
    # Each worker process checks a share of the addresses
    with Pool(n_processes) as p:
        p.map(check_website, website_list)

check_website_multi_process(WEBSITE_LIST, 4)



The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down


**2.3 (Points 10)** 

Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?

- Here I am just going to report the `%timeit` results in markdown, since calling these functions many times produces a lot of messy print statements.

In [None]:
%timeit -n 10 serial_check_website(WEBSITE_LIST)

2.79 s ± 96 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)

In [None]:
%timeit -n 10 threaded_check_websites_down(WEBSITE_LIST, 4)

1.06 s ± 71.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [None]:
%timeit -n 10 threaded_check_websites_down_2(WEBSITE_LIST, 4)

1.12 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [None]:
%timeit -n 10 check_website_multi_process(WEBSITE_LIST, 4)

1.24 s ± 59.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [None]:
%timeit -n 10 threaded_check_websites_down(WEBSITE_LIST, 8)

1 s ± 544 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [None]:
%timeit -n 10 threaded_check_websites_down_2(WEBSITE_LIST, 8)

1.26 s ± 498 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [None]:
%timeit -n 10 check_website_multi_process(WEBSITE_LIST, 8)

1.08 s ± 24.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As can be seen from the results above:
- The serial version is the slowest (about 2.79 s), slower than every run using 4 or 8 threads or processes.
- With N=4, both threaded implementations (about 1.06 s and 1.12 s) are faster than the multiprocessing version (about 1.24 s).
- The multiprocessing version is still more than twice as fast as the serial version, but with N=4 it is slightly slower than the threaded versions.
    - This is likely because the task is I/O bound: most of the run time is spent waiting for websites to respond, so keeping several requests in flight at once helps, as opposed to waiting for each response before moving on. Threads are enough to get that overlap. Separate processes provide it too, but extra CPU cores do not shorten the waiting time itself, since this is not a CPU-intensive task, and processes likely carry more start-up overhead.
- Increasing N to 8 changes the timings only slightly (1.06 s to 1 s and 1.12 s to 1.26 s for the threaded versions, 1.24 s to 1.08 s for multiprocessing). The threaded runs have a standard deviation of about 0.5 s, so it is hard to tell.
    - In principle, more workers should help an I/O-bound task, because more requests can be waiting on websites at the same time. If there are not enough CPU resources available, there could instead be a slowdown, since the work is then subdivided beyond the point of diminishing returns for the same CPU resources. 
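
The I/O-bound argument above can be illustrated without touching the network by replacing each request with a short `time.sleep`, which releases the GIL much like waiting on a socket does. In this sketch `fake_ping`, the 0.05 s latency, and the task count are all assumptions chosen for illustration, not measurements of real websites.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_ping(_):
    # Hypothetical stand-in for a HEAD request: just wait, as if for a response
    time.sleep(0.05)

def timed(fn):
    # Return the wall-clock time taken by one call to fn
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def run_serial(n_tasks):
    for i in range(n_tasks):
        fake_ping(i)

def run_threaded(n_tasks, n_threads):
    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        list(ex.map(fake_ping, range(n_tasks)))

serial_s = timed(lambda: run_serial(8))
threaded_s = timed(lambda: run_threaded(8, 4))
print(f"serial {serial_s:.2f}s, 4 threads {threaded_s:.2f}s")
```

With simulated waits like these, the threaded run overlaps the sleeps, which is the same mechanism that helps the real uptime checker.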
