# **Chapter 4: Using the threading and concurrent.futures Modules**

This chapter covers the following topics:
* Defining threads
* Choosing between threading and _thread
* Using threading to obtain the Fibonacci series term for multiple inputs
* Crawling the Web using the concurrent.futures module

# Defining threads# 

![TEST](http://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Multithreaded_process.svg/450px-Multithreaded_process.svg.png)

"세포랑 비슷하네여"
![cell](http://www.phschool.com/science/biology_place/biocoach/images/transcription/euovrvw.gif)
"Process가 여러개라면 세포가 여러개"

"Thread 가 여러개라면 전사체가 여러개"

"Threads belong to the same process and share the same memory space. Hence, the developer's task is to control and access these areas of memory." 

"Theads는 하나의 세포 안에서 자원을 공유하기 때문에 개발자는 잘 분배 해야함"

# Advantages and disadvantages of using threads
**The advantages of using threads are as follows:**
* The speed of communication of the threads in the same process, data
location, and shared information is fast
* Thread 간의 정보 공유가 빠르다. 왜냐하면 같은 세포안에 있기 때문에
* The creation of threads is less costly than the creation of a process,
as it is not necessary to copy all the information contained in the context
of the main process
* Thread 를 생성하는것이 세포 전체를 생성하는 것보다 자원이 덜든다. 
* Making the best use of data locality by optimizing memory access through
the processor cache memory
* 메모리 최적화에 유리하다

**The disadvantages of using threads are as follows:**
* Data sharing allows swift communication. However, it also allows the
introduction of difficult-to-solve errors by inexperienced developers.
* 경험 없는 사람은 에러잡기가 힘들다. 
* Data sharing limits the flexibility of the solution. Migrating to a distributed
architecture, for instance, may cause a real headache. In general, they limit
the scalability of algorithms.
* 어렵다


#Understanding different kinds of threads
* There are two types of threads, kernel and user.
* kernel / user라는 두가지 종류의 Threads 가 있다. 
* The kernel threads are the threads that are created and managed by the operating system.
* kernel이란 것은 오퍼레이팅 시스템에 의해 만들어지고 관리 되는 애 
* For the user threads, these states are controlled by the package developer.
* User threads라는 것은 개발자에 의해 관리되는 애

** The advantages of the kernel threads are as follows:**
* One kernel thread is referenced to one process. So if a kernel thread blocks, others can still run.
* 하나의 세포안에는 하나의 kernel thread 만이 참조된다. kernel하나가 막혀도 다른건 계속 일한다.다른 세포의 커널스레드가 돈다는 이야기인듯
* The kernel threads can run on different CPUs.
* kernel threads 는 다른 CPU에서도 돌 수 있다.
![](http://upload.wikimedia.org/wikipedia/commons/thumb/8/8f/Kernel_Layout.svg/382px-Kernel_Layout.svg.png)

** The disadvantages of the kernel threads are as follows: **
* The creation and synchronization routines are too expensive
* 생성 및 동기화(?) 루틴의 비용이 높다. (비용이 높다는 것은??)
* The implementation is platform dependent
* 플렛폼에 따라서 적용여부가 달라진다

**The advantages of the user threads are as follows:**
* The user thread has low cost for creation and synchronization
* 생성 및 동기화의 비용이 낮다.
* The user thread is platform independent
* 플렛폼 독립적이다.

**The disadvantages of the user threads are as follows:**
* All the user threads inside a process are related to only one kernel thread. So, if one user thread blocks, all the other user threads can't run.
* 모든 유저 스레드는 하나의 커널 스레드에 연관되어 있다. 그래서 유저스레드 하나가 블럭되면 다른것도 못돈다. 
* The user threads can't run on different CPUs.
* 유저 스레드는 다른 CPU에서 못돈다.

#Defining the states of a thread
* Creation: This is the main process that creates a thread, and after its creation,
it is sent to a line of threads ready for execution
* 생성 : 스레드를 생성하는 과정 이후 실행 라인으로 넘어간다. 
* Execution: At this stage, the thread makes use of the CPU
* 실행 : CPU의 사용을 할당한다.
* Ready: At this stage, the thread is in a line of threads ready for execution
and bound to be executed
* 준비 : 실행 준비
* Blocked: At this stage, the thread is blocked to wait for an I/O operation
to happen, for example, and it does not make use of the CPU at this stage
* 차단 : 이 단계에서는 CPU를 사용하지 않고 I/O를 기다린다.
* Concluded: At this stage, free resources are to be used in an execution and
end the life span of the thread
* 결과 : 이단계에서는 남는 리소스는 다른 실행에 사용됨. 스레드의 끝

# Choosing between threading and _thread
 * the _thread module (http://docs.python.org/3.3/library/_thread.html) -> 프로용
 * threading module (http://docs.python.org/3.3/library/threading.html) -> 초보용 (더편함).

#Using threading to obtain the Fibonacci series term with multiple inputs
* The mission is to parallelize the execution of the terms
of the Fibonacci series when multiple input values are given.
* 주어진 복수의 값으로 피보나치 수열을 병렬화를 통해 구해보자

* First, a list will store the four values to be calculated and the values will be
sent into a structure that allows synchronized access of threads.
* 리스트에 4개의 값을 넣는다. 각 값은 스레드에 동시에 들어가게 한다. 
2. After the values are sent to the synchronized structure, the threads that
calculate the Fibonacci series need to be advised that the values are ready
to be processed. For this, we will use a thread synchronization mechanism
called Condition. (The Condition mechanism is one of the Python objects
that offer data access synchronization mechanisms shared among threads;
you can learn more at http://docs.python.org/3/library/threading.
html#threading.Condition.)
* 각 값이 동시에 스레드에 들어간뒤에는 각 스레드에 해당 값들을 처리해도 되는지 알려줘야한다. 이를 위해 Condition이라는 방법을 사용할거다. 
3. After each thread finishes their Fibonacci series calculation, the results will
be saved in a dictionary.
* 각 계산이 끝나면 딕셔너리에 저장!

In [2]:
import threading
from Queue import Queue
 
fibo_dict = {}
shared_queue = Queue() # 할일 바구니
input_list = [3, 10, 5, 7]
 
queue_condition = threading.Condition()
 
 
def fibonacci_task(condition):
    with condition: # Without the with statement, we would have to explicitly acquire the lock and release it acquiring the lock
 
        while shared_queue.empty(): #바구니에 일없으면
            print("[{}] - waiting for elements in queue..".format(threading.current_thread().name))
            condition.wait() # 기다려
 
        else: # 기다려 풀어지면 
            value = shared_queue.get() # 할일 바구니에서 하나를 꺼내서 
            a, b = 0, 1
            for item in range(value):
                a, b = b, a + b
                fibo_dict[value] = a
 
        shared_queue.task_done() # 일 다했다고 라벨 
        print("[{}] fibonacci of key [{}] with result [{}]".
              format(threading.current_thread().name, value, fibo_dict[value]))
 
 
def queue_task(condition):
    print('Starting queue_task...')
    with condition: #기다려 상태이면서 
        for item in input_list:
            shared_queue.put(item)
 
        print("Notifying fibonacci task threads that the queue is ready to consume...")
        condition.notifyAll() # 할일 바구니가 준비가 되었다고 알려주는 역할, 기다려 풀어
 
 
threads = []
for i in range(4):
    thread = threading.Thread(target=fibonacci_task, args=(queue_condition,))
    thread.daemon = True # 이게 뭐지? 
    threads.append(thread)
 
[thread.start() for thread in threads] # 4개의 threads가 시작됨. 
 
prod = threading.Thread(name='queue_task_thread', target=queue_task, args=(queue_condition,))
prod.daemon = True
prod.start()
 
[thread.join() for thread in threads]
 
print("[{}] - Result {}".format(threading.current_thread().name, fibo_dict))

[Thread-7] - waiting for elements in queue..
[Thread-8] - waiting for elements in queue..
[Thread-9] - waiting for elements in queue..
[Thread-10] - waiting for elements in queue..
Starting queue_task...
Notifying fibonacci task threads that the queue is ready to consume...
[Thread-9] fibonacci of key [3] with result [2]
[Thread-10] fibonacci of key [10] with result [55]
[Thread-7] fibonacci of key [5] with result [5]
[Thread-8] fibonacci of key [7] with result [13]
[MainThread] - Result {10: 55, 3: 2, 5: 5, 7: 13}


# Crawling the Web using the concurrent.futures module

In [42]:

import concurrent.futures
import re

#html_link_regex = \
#re.compile('<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')
urls = Queue() # 할일 바구니
urls.put('http://www.google.com')
urls.put('http://br.bing.com/')
urls.put('https://duckduckgo.com/')
urls.put('https://github.com/')
urls.put('http://br.search.yahoo.com/')
result_dict = {}

In [36]:
def group_urls_task(urls):
    try:
        url = urls.get(True, 0.05)
        result_dict[url] = None
        logger.info("[%s] putting url [%s] in dictionary..." % (threading.current_thread().name, url))
    except queue.Empty: # 일없으면 
        logging.error('Nothing to be done, queue is empty')
        

In [37]:
def crawl_task(url):
    links = []
    try:
        request_data = requests.get(url)
        logger.info("[%s] crawling url [%s] ..." % (threading.current_thread().name, url))
        links = html_link_regex.findall(request_data.text)
    except:
        logger.error(sys.exc_info()[0])
        raise
    finally:
        return (url, links)

In [38]:
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as group_link_threads:
    for i in range(urls.qsize()):
        group_link_threads.submit(group_urls_task, urls)

INFO:root:[Thread-23] putting url [http://www.google.com] in dictionary...
2015-03-19 18:15:52,308 - [Thread-23] putting url [http://www.google.com] in dictionary...
2015-03-19 18:15:52,308 - [Thread-23] putting url [http://www.google.com] in dictionary...
2015-03-19 18:15:52,308 - [Thread-23] putting url [http://www.google.com] in dictionary...
INFO:root:[Thread-23] putting url [http://br.bing.com/] in dictionary...
2015-03-19 18:15:52,309 - [Thread-23] putting url [http://br.bing.com/] in dictionary...
2015-03-19 18:15:52,309 - [Thread-23] putting url [http://br.bing.com/] in dictionary...
2015-03-19 18:15:52,309 - [Thread-23] putting url [http://br.bing.com/] in dictionary...
INFO:root:[Thread-23] putting url [https://duckduckgo.com/] in dictionary...
2015-03-19 18:15:52,310 - [Thread-23] putting url [https://duckduckgo.com/] in dictionary...
2015-03-19 18:15:52,310 - [Thread-23] putting url [https://duckduckgo.com/] in dictionary...
2015-03-19 18:15:52,310 - [Thread-23] putting url

In [39]:
future_tasks = {crawler_link_threads.submit(crawl_task, url): url for url in result_dict.keys()}

NameError: global name 'crawler_link_threads' is not defined