<center> <h1> Heterogeneous Computing for AI </h1> </center>

<center> <h2> Lecture 02: Hands-on Exercise</h2> </center>

<center> <h4> Raghava Mukkamala (rrm.digi@cbs.dk)</h4> </center>

Instructions

Please use Python 3 for working on the following questions.




## Exercise 01:  Simple Python Program 

### Write a simple Python script that reads the data from the 'numbers.txt' file and sums the values.

Please note that the numbers.txt file is available in the same folder.

### Your Solution

In [1]:
import numpy as np

numbers = np.loadtxt("numbers.txt")

numbers

array([ 178.,  567.,   23.,  178., 9000.])
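The cell above loads the values but stops before adding them up. The missing step can be sketched as follows, assuming the file holds whitespace-separated numbers. The helper name `sum_numbers` and the temporary sample file are illustrative only; the sample values are copied from the array printed above.

```python
import tempfile
from pathlib import Path


def sum_numbers(path):
    """Return the sum of all whitespace-separated numbers in a text file."""
    text = Path(path).read_text()
    return sum(float(token) for token in text.split())


# Demo on a throwaway copy of the data (values taken from the output above),
# so the real numbers.txt is not touched.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "numbers.txt"
    sample.write_text("178\n567\n23\n178\n9000\n")
    total = sum_numbers(sample)

print(total)  # 9946.0

# With NumPy, the equivalent one-liner is np.loadtxt("numbers.txt").sum()
```

For the real exercise, calling `sum_numbers("numbers.txt")` or `numbers.sum()` on the array loaded above should give the same total.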

## Exercise 02:  Analyse the following Scenario

#### Suppose you had to read 100 files like 'numbers.txt' from your local storage. Would you opt to use threads to speed up the reading? Why or why not?

### Discuss your answer

In [2]:
"""
Reading the 100 files is I/O-bound: most of the time is spent waiting for I/O operations rather than computing, so threading can help speed up the reading.
The reading itself is the bottleneck, and performing several reads at the same time overlaps those waits and speeds up the process.
"""

'\nReading the 100 files is I/O-bound: most of the time is spent waiting for I/O operations rather than computing, so threading can help speed up the reading.\nThe reading itself is the bottleneck, and performing several reads at the same time overlaps those waits and speeds up the process.\n'
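The reasoning above can be tried out with a small sketch built on `concurrent.futures.ThreadPoolExecutor`. The file names, file contents, and the helpers `read_numbers` and `read_all` are hypothetical stand-ins, not part of the exercise material. How much threads actually help depends on the storage device and on operating-system caching, so the gain for small local files may be modest and is worth measuring.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_numbers(path):
    """Read one file and return its numbers as a list of floats."""
    return [float(token) for token in Path(path).read_text().split()]


def read_all(paths, max_workers=8):
    """Read every file in a thread pool; results keep the order of `paths`."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(read_numbers, paths))


with tempfile.TemporaryDirectory() as tmp:
    # Create 100 small stand-in files (hypothetical names and contents).
    paths = []
    for i in range(100):
        p = Path(tmp) / f"numbers_{i}.txt"
        p.write_text(f"{i}\n{i + 1}\n")
        paths.append(p)

    contents = read_all(paths)

print(len(contents))  # 100
print(contents[3])    # [3.0, 4.0]
```

`executor.map` is used here because it returns results in input order, which keeps each result aligned with its file.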

## Exercise 03:  Multi-thread Downloader

Let's take a look at an I/O-intensive operation:

Please look at the downloads.py file provided for these exercises.
Using the concurrent.futures library, create a multi-threaded version of the web page downloads.
Report the speedup provided by the multi-threaded version.

### Your Solution

In [3]:
links = ["https://twitter.com", "https://facebook.com", "https://linkedin.com"]

for link in links:
    print(link)

https://twitter.com
https://facebook.com
https://linkedin.com


In [4]:
import requests
import time
from threading import Thread

def download_site(url):
    with requests.get(url) as response:
        return len(response.content)
    
def download_all_sites(urls):
    for url in urls:
        print("For URL: ", url, download_site(url))

start_time = time.perf_counter()

#create and start 10 threads
threads = []
links = ["https://twitter.com", "https://facebook.com", "https://linkedin.com"]

for n in range(1, 11):
    
    t = Thread(target=download_all_sites, args=(links,))

    threads.append(t)

    print(f"Starting the {n}th with Urls= {links[0]}, {links[1]} and {links[2]}")
    t.start()
        
# Waiting for the threads to complete
for t in threads:
    
    t.join()
    print(f"closing {t}")
    
end_time = time.perf_counter()

print(f"It took {end_time - start_time: 0.2f} second(s) to complete!")

Starting the 1th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 2th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 3th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 4th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 5th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 6th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 7th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 8th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 9th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
Starting the 10th with Urls= https://twitter.com, https://facebook.com and https://linkedin.com
For URL:  https://twitter.com 125432
For URL:  ht

<p>Creating 10 separate threads reduced the execution time from roughly 16 s to 2-3 s.</p>
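The exercise asks for `concurrent.futures`, while the solution above uses `threading.Thread` directly and has every thread download all three sites. A minimal sketch of the pool-based alternative follows, in which each URL is fetched once by its own worker. To keep it runnable without network access, `fake_download` is a hypothetical stand-in that sleeps instead of calling `requests.get`, so the timings it prints illustrate the pattern only and are not measurements of real downloads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

links = ["https://twitter.com", "https://facebook.com", "https://linkedin.com"]


def fake_download(url):
    """Stand-in for requests.get(url): waits 0.2 s and returns a fake size."""
    time.sleep(0.2)
    return len(url)


def download_all_sites(urls, fetch=fake_download, max_workers=3):
    """Fetch each URL once in a thread pool; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))


# Sequential baseline
start = time.perf_counter()
sequential_sizes = [fake_download(url) for url in links]
sequential_time = time.perf_counter() - start

# Multi-threaded version
start = time.perf_counter()
threaded_sizes = download_all_sites(links)
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:0.2f} s, threaded: {threaded_time:0.2f} s")
print(f"speedup: {sequential_time / threaded_time:0.1f}x")
```

To time real pages, pass the `download_site` function from the cell above as the `fetch` argument. The speedup is then the sequential time divided by the threaded time, measured on the same list of URLs.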

## Exercise 04:  Multi-thread Range Counter

This exercise is related to the file range_counter.py (found in the same folder) 

Using the concurrent.futures library, create a multi-threaded version of applying the range_counter function.

That is, apply the range_counter function to the data by utilising threads. 

Comment on the performance of the multi-threaded version.

### Your Solution

In [5]:
from threading import Thread
import time
from typing import List
import numpy as np
from concurrent.futures import ProcessPoolExecutor 
import concurrent.futures

In [10]:
def range_counter(row: List[int], min: int = 5, max: int = 10) -> int:
    """
    Returns the number of values in the row that fall within the given range (inclusive).
        Args:
        i.   row: List of numbers
        ii.  min: minimum value of the range
        iii. max: maximum value of the range
        
        Returns: a count (int) of the values that fall in the range
    """
    count = 0
    for val in row:
        if min <= val <= max:
            count += 1
    # Return only after the whole row has been scanned
    return count
    
def apply_range_counter_concurrently(data: List[List[int]],
                                     min: int = 5,
                                     max: int = 10) -> List[int]:
    """
    Takes the data and applies the range_counter function
    to every row, one task per row.
    """
    futures = []
    result = []
        
    with ProcessPoolExecutor() as executor:
        # Submit one task per row
        for row in data:
            futures.append(executor.submit(range_counter, row, min, max))
        
        # Collect the counts in submission order so they line up with the rows
        for fut in futures:
            result.append(fut.result())
    
    return result

In [11]:
if __name__ == "__main__":
    #Provide a seed to get the same "random" values each time
    np.random.seed(0)
    
    #create a matrix with dimensions 200x5 (200 rows and 5 columns)
    arr = np.random.randint(0, 10, size=[200, 5])
    
    #convert into a List of lists
    data = arr.tolist()
    
    print(data[0:10])
    
    #timing the concurrent solution
    start_concurrent = time.perf_counter()
    
    conc_result = apply_range_counter_concurrently(data, 5, 10)
    
    end_concurrent = time.perf_counter()
    
    print(f"Finished concurrent computation in {end_concurrent-start_concurrent} second(s)")
    print(f"First 10 results of the concurrent result: {conc_result[:10]}")
    
print("done!")

[[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1], [6, 7, 7, 8, 1], [5, 9, 8, 9, 4], [3, 0, 3, 5, 0], [2, 3, 8, 1, 3], [3, 3, 7, 0, 1], [9, 9, 0, 4, 7], [3, 2, 7, 2, 0]]


BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
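A `BrokenProcessPool` error like the one above is commonly seen when `ProcessPoolExecutor` is used from an interactive session such as a notebook, where worker processes may be unable to import functions defined in the session. Since the exercise asks for threads, one option is a thread pool, which does not need to start child processes. The following is a sketch under that assumption; the names `apply_range_counter_threaded`, `low`, and `high` are illustrative, and the sample rows are the first three rows of the data printed above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List


def range_counter(row: List[int], low: int = 5, high: int = 10) -> int:
    """Count the values in `row` that lie in the closed range [low, high]."""
    return sum(1 for val in row if low <= val <= high)


def apply_range_counter_threaded(data, low=5, high=10):
    """Apply range_counter to every row using a thread pool."""
    counter = partial(range_counter, low=low, high=high)
    with ThreadPoolExecutor() as executor:
        # map() returns the counts in the same order as the rows
        return list(executor.map(counter, data))


# First rows of the data printed above
data = [[5, 0, 3, 3, 7], [9, 3, 5, 2, 4], [7, 6, 8, 8, 1]]

start = time.perf_counter()
threaded = apply_range_counter_threaded(data)
elapsed = time.perf_counter() - start

sequential = [range_counter(row) for row in data]
print(threaded)  # [2, 2, 4]
print(f"threaded run took {elapsed:.6f} second(s)")
```

On performance: `range_counter` is CPU-bound pure Python, so in standard CPython the global interpreter lock lets only one thread execute Python bytecode at a time. Threads are therefore unlikely to speed up this workload, and for rows this small the scheduling overhead can make the threaded version slower than a plain loop.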