### Agenda

1. Sequential Tasks vs Threading vs Parallel Processing
2. Types of Processes in a program
3. Threading in Python
4. Map Functions
5. Examples of Threading 
6. Disadvantages of Threading

#### Ways of Processing in a Program
1. For Loops - Sequential
2. Threading - Concurrent, not parallel (in CPython only one thread runs Python code at a time)
3. Multiprocessing - Concurrent + Parallel

#### Types of Processes
1. I/O Bound - File Upload and Download, Calling an API/Scraping, Print Operations
2. CPU Bound - Mathematical Processing, String Manipulations, Matrix Operations, DataFrame Operations

In [2]:
# Loading Important libraries

import numpy as np
import pandas as pd

# libraries to execute threading
import concurrent
from concurrent.futures import ThreadPoolExecutor
import threading

# Additional libraries
import requests
import time
from itertools import chain

#### Threading in Python

##### Let's print 'Hi', 'Hello' and 'Hey' 3 times each using a for loop, and then apply concurrency using threading

In [29]:
# Using for loop


def print_func(string):
    for i in range(3):
        time.sleep(1)
        print(string," ")

l = ['Hi','Hello','Hey']
for s in l:
    print_func(s)

    
        

Hi  
Hi  
Hi  
Hello  
Hello  
Hello  
Hey  
Hey  
Hey  


In [5]:
# Using ThreadPoolExecutor

def print_func(string):
    for i in range(3):
        time.sleep(1)
        print(string)
        print(threading.current_thread().name)

# Submit one task per string; each task runs on its own worker thread
l = ['Hi','Hello','Hey']
with ThreadPoolExecutor() as executor:
    for s in l:
        executor.submit(print_func,s)


        
   

        
    
        


Hi
ThreadPoolExecutor-2_0
Hello
ThreadPoolExecutor-2_1
Hey
ThreadPoolExecutor-2_2
Hi
ThreadPoolExecutor-2_0
Hello
ThreadPoolExecutor-2_1
Hey
ThreadPoolExecutor-2_2
HiHello
ThreadPoolExecutor-2_0

HeyThreadPoolExecutor-2_1
ThreadPoolExecutor-2_2
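
The gain from running these waits concurrently can be measured directly. The following is a minimal sketch rather than the notebook's own cell: `slow_echo` is a hypothetical helper, and the delay is shortened to 0.1 s so the comparison runs quickly. Returning values from the workers instead of printing inside them also avoids interleaved lines such as `HiHello` in the output above, which appear because several threads write to stdout at the same time.

```python
import time
import threading
from concurrent.futures import ThreadPoolExecutor

def slow_echo(word, repeats=3, delay=0.1):
    """Sleep `repeats` times, then report which thread did the work."""
    for _ in range(repeats):
        time.sleep(delay)  # stands in for waiting on I/O
    return word, threading.current_thread().name

words = ['Hi', 'Hello', 'Hey']

# Sequential: each call finishes before the next one starts
start = time.perf_counter()
sequential = [slow_echo(w)[0] for w in words]
sequential_time = time.perf_counter() - start

# Threaded: the three calls sleep at the same time
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_echo, w) for w in words]
    threaded = [f.result()[0] for f in futures]
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

With three workers the threaded version should take roughly one third of the sequential time, since the sleeps overlap.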



#### Map Functions in Python

##### Create another list with square of original list 

In [32]:
l = [1,2,3]

#Using For Loops
l_sqr = []
for num in l:
    l_sqr.append(num*num)
print(l_sqr)
    
# Map Function

l_sqr = list(map(lambda x: x**2,l))
print(l_sqr)
    

# map() runs its loop inside the interpreter, so for CPU-bound work it is often somewhat faster than an explicit for loop with append

[1, 4, 9]
[1, 4, 9]


#### I/O Bound Use Cases of Threading

#### Making an API call/Scraping
1. Most of the time is spent waiting for the server to respond; the CPU is largely idle during that wait
2. The CPU's only role is to issue the call, collect the data, and then process it
3. Threading is therefore effective here: multiple calls can be in flight at once
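
The points above can be tried without touching the network. This is a minimal sketch under stated assumptions, not the method used in the cells that follow: `fake_fetch` is a hypothetical stand-in that sleeps instead of calling a server, `fetch_all` is an illustrative helper, and the `example.com` URLs are placeholders. It hands one task per URL to `executor.map`, which returns results in input order.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url, delay=0.05):
    """Hypothetical stand-in for requests.get(url).content: waits, then returns bytes."""
    time.sleep(delay)  # the thread is idle here, as it would be while waiting on a server
    return f"<html>{url}</html>".encode()

def fetch_all(urls, fetch=fake_fetch, max_workers=8):
    """Fetch every URL on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))

urls = [f"https://example.com/page{i}" for i in range(16)]

start = time.perf_counter()
pages = fetch_all(urls)
elapsed = time.perf_counter() - start

print(len(pages), f"{elapsed:.2f}s")
```

To call real servers, one could pass `fetch=lambda u: requests.get(u).content`, assuming `requests` is installed.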


In [33]:
# Create a list of 80 URLs that we are going to scrape
sites = ["https://www.jython.org","http://olympus.realpython.org/dice"] * 40
print(len(sites))

80


In [36]:
# What does the data look like after we make a call?
requests.get("https://www.jython.org").content[0:100]

# HTML of the page we requested; it can be processed further using the Beautiful Soup library

b'<!DOCTYPE html>\n<html lang="en-US">\n\n  <head>\n    <meta charset=\'utf-8\'>\n    <meta http-equiv="X-UA-'

#### Let's compare the time taken by a for loop, map and threading

In [37]:
# Using For Loop
start_time  = time.time()
content_list = []
for site in sites:
    content_list.append(requests.get(site).content)
end_time = time.time() - start_time
print(end_time)

40.435543060302734


In [39]:
# Using Map Function
start_time  = time.time()
content_list = list(map(lambda x: requests.get(x).content,sites))
end_time = time.time() - start_time
print(end_time)



40.380831241607666


In [40]:
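# Using Threading
start_time  = time.time()
def get_request_func(site_list):
    # Fetch every URL in this chunk sequentially, inside one worker thread
    return list(map(lambda x: requests.get(x).content,site_list))

thread = []
result = []
with ThreadPoolExecutor(max_workers=4) as executor:
    # Split the 80 URLs into 4 chunks of 20 and submit one chunk per worker
    for i in range(4):
        thread.append(executor.submit(get_request_func,sites[i*20:(i+1)*20]))
    # result() blocks until the corresponding chunk has finished
    for i in range(4):
        result.append(thread[i].result())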

        
content_list =list(chain.from_iterable(result))   
end_time = time.time() - start_time
print(end_time)
        
    
# Future.result() blocks until the task finishes and returns its output
# Input  - [[list1],[list2],[list3],[list4]] - 20 URLs each
# Output - [[list1],[list2],[list3],[list4]] - 20 responses each

# chain.from_iterable flattens the four lists into one
# Output = [80 elements] - 1D list

10.053593158721924
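
The chunk-then-flatten pattern used above is easier to see on a small list. A minimal sketch with made-up data, where `items` and `chunk_size` are illustrative names:

```python
from itertools import chain

items = list(range(8))

# Split into 4 chunks of 2, the same slicing pattern as sites[i*20:(i+1)*20]
chunk_size = 2
chunks = [items[i*chunk_size:(i+1)*chunk_size] for i in range(4)]
print(chunks)  # [[0, 1], [2, 3], [4, 5], [6, 7]]

# chain.from_iterable walks the inner lists in order and yields one flat sequence
flat = list(chain.from_iterable(chunks))
print(flat)    # [0, 1, 2, 3, 4, 5, 6, 7]
```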


#### Writing file to Disk
1. Writing a file means interacting with the disk, so it is I/O bound
2. Threading can help here, since one thread can keep working while another waits on the disk

In [3]:
df = pd.read_csv('Threading1.csv')

In [5]:
# Use for loop
start_time = time.time()
for i in range(50):
    df.to_csv('Thread'+str(i)+'.csv')
end_time = time.time() - start_time
print(end_time)

0.329803466796875


In [10]:
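# Use Threading

start_time = time.time()
def write_file(df,mul):
    # Worker `mul` writes files 25*mul .. 25*mul+24, so the two workers cover all 50 files.
    # A range of (0*mul, 25*mul) would be empty for mul=0 and write only 25 files in total,
    # which would halve the measured time by itself.
    for i in range(25*mul,25*(mul+1)):
        df.to_csv('Thread'+str(i)+'.csv')

with ThreadPoolExecutor(max_workers=2) as executor:
    for i in range(2):
        executor.submit(write_file,df,i)
        
print(time.time() - start_time)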


   



0.15187382698059082
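
An alternative to splitting the file indices by hand is to submit one task per file and let the pool distribute them. The following is a minimal sketch using only the standard library: the `write_file` helper here takes a path and text (it is not the DataFrame version above), `text` is a hypothetical stand-in for the CSV contents, and files go to a temporary directory so nothing is left behind.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_file(path, text):
    """Write `text` to `path` and return the path."""
    with open(path, 'w') as f:
        f.write(text)
    return path

text = "a,b\n1,2\n"  # hypothetical stand-in for the DataFrame contents

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = [os.path.join(tmp_dir, f"Thread{i}.csv") for i in range(50)]
    with ThreadPoolExecutor(max_workers=2) as executor:
        # One task per file; the pool hands the 50 tasks to the 2 workers
        written = list(executor.map(write_file, paths, [text] * len(paths)))
    n_files = len(os.listdir(tmp_dir))

print(n_files)  # 50
```

Counting the files afterwards is a cheap check that every index was covered exactly once.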


#### Disadvantages of Threading
#### Cases where we should not use threading, and should prefer map functions if possible
#### CPU-bound operations: extensive processing with very little or no I/O

#### Consider a huge list and let's find the square of each element

In [21]:
# List of Numbers
numbers = [5_000_000 + x for x in range(1000000)]


start_time = time.time()
sqr_list = []
for x in numbers:
    sqr_list.append(x*x)
    
end_time = time.time() - start_time
print(end_time)

0.2712705135345459


In [22]:
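start_time = time.time()
# map() is lazy in Python 3: wrap it in list() so the squares are computed inside the timed region.
# Without list(), only the creation of the map object is timed.
sqr_list = list(map(lambda x:x*x,numbers))
    
end_time = time.time() - start_time
print(end_time)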

0.02393651008605957
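
One thing to keep in mind when timing `map`: in Python 3 it returns a lazy iterator, so no element is computed until the result is consumed, for example by `list()`. A small check of that behaviour, using a hypothetical `square` helper that records its calls:

```python
calls = []

def square(x):
    calls.append(x)  # record that the function actually ran
    return x * x

lazy = map(square, [1, 2, 3])
calls_before = len(calls)   # creating the map object runs nothing

squares = list(lazy)        # consuming the iterator runs square() on each element
calls_after = len(calls)

print(calls_before, squares, calls_after)  # 0 [1, 4, 9] 3
```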


In [27]:
# Using Threading


def get_sqr(num):
    return list(map(lambda x:x*x,num))

# Split the numbers into 4 equal chunks, one per worker thread
list_of_list = []
n = len(numbers)//4
for i in range(4):
    list_of_list.append(numbers[i*n:(i+1)*n])
    

thread = []
result = []
start_time  = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
    for i in range(4):
        thread.append(executor.submit(get_sqr,list_of_list[i]))
    for i in range(4):
        result.append(thread[i].result())

        
content_list =list(chain.from_iterable(result))   
end_time = time.time() - start_time
print(end_time)
        
    
        

0.268221378326416
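
The threaded run takes about as long as the plain for loop because, in standard CPython builds, the global interpreter lock lets only one thread execute Python bytecode at a time. For CPU-bound work, the multiprocessing option listed at the top is the usual alternative. Below is a minimal sketch using the standard `ProcessPoolExecutor`; `split_chunks`, `square_chunk` and `parallel_squares` are illustrative names. No timing is claimed, since process start-up and the cost of copying data between processes can outweigh the gain for an operation as cheap as squaring.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import chain

def split_chunks(items, n_chunks):
    """Split `items` into at most `n_chunks` contiguous slices of near-equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]

def square_chunk(chunk):
    """CPU-bound work done inside one worker process."""
    return [x * x for x in chunk]

def parallel_squares(numbers, workers=4):
    """Square `numbers` across `workers` processes, preserving order."""
    chunks = split_chunks(numbers, workers)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # Each chunk is sent to a separate process, which has its own interpreter and GIL
        results = executor.map(square_chunk, chunks)
        return list(chain.from_iterable(results))
```

On platforms that start workers by spawning a fresh interpreter (Windows, macOS), `parallel_squares(numbers)` should be called from a script under an `if __name__ == '__main__':` guard; functions defined directly in a notebook may not be importable by the worker processes there.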


In [None]:
# References and Additional Reading
#https://pymotw.com/3/concurrent.futures/
#https://realpython.com/python-concurrency/
