## References
* [Why does Python have a GIL? - 개발새발로그](https://dgkim5360.tistory.com/entry/understanding-the-global-interpreter-lock-of-cpython)
* [\[Python\] A story about the Global Interpreter Lock (GIL) - alice](https://blog.naver.com/alice_k106/221566619995)

### Race Condition

```python
import threading

x = 0  # A shared value

def foo():
    global x
    for _ in range(100000):
        x += 1

def bar():
    global x
    for _ in range(100000):
        x -= 1

t1 = threading.Thread(target=foo)
t2 = threading.Thread(target=bar)

t1.start()
t2.start()
t1.join()
t2.join()  # Wait for completion

print(x)
```

Output:

```
-7149
```

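Both threads modify the global variable `x` at the same time, so some updates are lost and the result is not 0. `x += 1` is not one atomic step. The thread reads `x`, adds 1, and writes the result back. If the other thread writes `x` in between, one of the two updates is overwritten. The exact value depends on thread timing and the Python version, and it can even happen to come out as 0, but nothing guarantees that.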
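
To see why `x += 1` is not a single step, we can look at the bytecode CPython generates for it. Below is a minimal sketch using the standard-library `dis` module. The function name `increment` is only for illustration. The update compiles to separate load, add, and store instructions. Opcode names differ between CPython versions, for example `INPLACE_ADD` in older releases and `BINARY_OP` in 3.11+. Whether the interpreter actually switches threads between these instructions also depends on the version, so the language does not promise that the operation is atomic.

```python
import dis

x = 0  # shared global, as in the example above

def increment():
    global x
    x += 1  # looks like one step, but compiles to several instructions

# Print the full disassembly of increment().
dis.dis(increment)

# Keep only the opcode names for a compact view.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

In the output, `LOAD_GLOBAL x` and `STORE_GLOBAL x` appear as separate instructions, with the addition between them.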

### mutex
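A mutex (mutual exclusion lock) prevents multiple threads from modifying the same data in a process's shared memory at the same time. Only the thread that currently holds the lock may enter the protected section (the critical section). Other threads block until the lock is released.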
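
In Python, the same idea is available as `threading.Lock`. Below is a minimal sketch that fixes the race-condition example above by holding a lock around every update. The lock name `x_lock` is illustrative. The `with` statement acquires the lock on entry and releases it on exit, even if an exception is raised.

```python
import threading

x = 0                      # shared value
x_lock = threading.Lock()  # mutex guarding x

def foo():
    global x
    for _ in range(100000):
        with x_lock:       # only one thread at a time may run this block
            x += 1

def bar():
    global x
    for _ in range(100000):
        with x_lock:
            x -= 1

t1 = threading.Thread(target=foo)
t2 = threading.Thread(target=bar)
t1.start()
t2.start()
t1.join()
t2.join()

print(x)  # 0
```

Taking the lock on every iteration is the simplest correct version, but it is also the slowest. The C dot-product program below is cheaper: each thread accumulates a private partial sum and takes the lock only once.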

``` C
/*****************************************************************************
 * FILE: dotprod_mutex.c
 * DESCRIPTION: This example program illustrates the use of mutex variables 
 * in a threads program. This version was obtained by modifying the 
 * serial version of the program (dotprod_serial.c) which performs a 
 * dot product. The main data is made available to all threads through 
 * a globally accessible structure. Each thread works on a different 
 * part of the data. The main thread waits for all the threads to complete 
 * their computations, and then it prints the resulting sum. 
 * SOURCE: Vijay Sonnad, IBM 
 * LAST REVISED: 01/29/09 Blaise Barney 
 ******************************************************************************/ 
#include <pthread.h> 
#include <stdio.h> 
#include <stdlib.h> 
/* The following structure contains the necessary information 
to allow the function "dotprod" to access its input data and place 
its output into the structure. This structure is 
unchanged from the sequential version. */
typedef struct {
    double *a;
    double *b;
    double sum;
    int veclen;
} DOTDATA; 

/* Define globally accessible variables and a mutex */ 
#define NUMTHRDS 4 
#define VECLEN 100000 
DOTDATA dotstr; 
pthread_t callThd[NUMTHRDS];
pthread_mutex_t mutexsum; 
/* The function dotprod is activated when the thread is created. 
As before, all input to this routine is obtained from a structure 
of type DOTDATA and all output from this function is written into 
this structure. The benefit of this approach is apparent for the 
multi-threaded program: when a thread is created we pass a single 
argument to the activated function - typically this argument is a 
thread number. All the other information required by the function 
is accessed from the globally accessible structure. */
void *dotprod(void *arg) {
    /* Define and use local variables for convenience */
    int i, start, end, len;
    long offset;
    double mysum, *x, *y;
    offset = (long) arg;
    len = dotstr.veclen;
    start = offset*len;
    end = start + len;
    x = dotstr.a;
    y = dotstr.b;
    /* Perform the dot product and assign result to the appropriate variable in the structure. */
    mysum = 0;
    for (i = start; i < end; i++) {
        mysum += (x[i] * y[i]);
    }
    /* Lock a mutex prior to updating the value in the shared structure, and unlock it upon updating. */
    pthread_mutex_lock(&mutexsum);
    dotstr.sum += mysum;
    printf("Thread %ld did %d to %d: mysum=%f global sum=%f\n",offset,start,end,mysum,dotstr.sum);
    pthread_mutex_unlock(&mutexsum);

    pthread_exit((void*) 0);
} 
/* The main program creates threads which do all the work and then
print out the result upon completion. Before creating the threads,
the input data is created. Since all threads update a shared structure,
we need a mutex for mutual exclusion. The main thread needs to wait
for all threads to complete, so it joins each one of the threads.
We specify a thread attribute value that allows the main thread to
join with the threads it creates. Note also that we free up handles
when they are no longer needed. */ 
int main (int argc, char *argv[]) {
    long i;
    double *a, *b;
    void *status;
    pthread_attr_t attr;

    /* Assign storage and initialize values */
    a = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));
    b = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));
    for (i = 0; i < VECLEN*NUMTHRDS; i++) {
        a[i]=1;
        b[i]=a[i];
    }

    dotstr.veclen = VECLEN;
    dotstr.a = a;
    dotstr.b = b;
    dotstr.sum = 0;

    pthread_mutex_init(&mutexsum, NULL);

    /* create threads to perform the dotproduct */ 
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    for (i = 0; i < NUMTHRDS; i++) {
        /* Each thread works on a different set of data. 
        * The offset is specified by 'i'. The size of 
        * the data for each thread is indicated by VECLEN. */
        pthread_create(&callThd[i], &attr, dotprod, (void *)i); 
    }

    pthread_attr_destroy(&attr);

    /* Wait on the other threads */
    for (i = 0; i < NUMTHRDS; i++) {
        pthread_join(callThd[i], &status); 
    }
     
    /* After joining, print out the results and cleanup */
    printf ("Sum = %f \n", dotstr.sum);
    free (a);
    free (b);
    pthread_mutex_destroy(&mutexsum);
    pthread_exit(NULL); 
}
```

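The program uses 4 threads to compute the sum of the products of corresponding elements of `a` and `b`, which is their dot product. Every element is 1, and each thread handles `VECLEN` = 100000 elements, so the final sum should be 400000.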

``` C
void *dotprod(void *arg) {
    // ... 
    pthread_mutex_lock(&mutexsum);
    dotstr.sum += mysum;
    pthread_mutex_unlock(&mutexsum); 
    // ... 
}
```
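1. Lock the mutex with `pthread_mutex_lock`. While this thread updates the sum, no other thread can enter this code, so no update is lost.
2. Add the value this thread computed (`mysum`) to `dotstr.sum`.
3. Unlock the mutex with `pthread_mutex_unlock` so the other threads can proceed.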
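
For comparison, here is the same pattern sketched in Python with `threading.Lock`. The names `NUM_THREADS`, `VECLEN`, `dotprod`, `sum_lock` and the `state` dict are illustrative. They mirror the C program and are not a library API. Each thread sums its own slice into a local variable without locking. It touches the shared total only inside the critical section, so the lock is taken once per thread rather than once per element.

```python
import threading

NUM_THREADS = 4
VECLEN = 100000

# Input vectors filled with 1.0, as in the C program.
a = [1.0] * (NUM_THREADS * VECLEN)
b = a[:]

state = {"sum": 0.0}          # shared result, like dotstr.sum
sum_lock = threading.Lock()   # plays the role of mutexsum

def dotprod(offset: int) -> None:
    start = offset * VECLEN
    end = start + VECLEN
    # Private partial sum: only this thread touches it, so no lock is needed.
    mysum = 0.0
    for i in range(start, end):
        mysum += a[i] * b[i]
    # 1. lock, 2. add to the shared sum, 3. unlock
    sum_lock.acquire()
    try:
        state["sum"] += mysum
    finally:
        sum_lock.release()

threads = [threading.Thread(target=dotprod, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Sum = {state['sum']:f}")  # Sum = 400000.000000
```

The explicit `acquire()` / `try` / `finally` / `release()` sequence matches the three steps listed above. In idiomatic Python, `with sum_lock:` does the same thing.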

### Reference Counting

CPython manages memory by counting the references to each object. When an object's reference count drops to 0, the object is freed immediately.

```python
import sys
a = []                     # create a list and bind it to a: +1
b = a                      # b refers to the same list: +1
print(sys.getrefcount(a))  # passing a as the argument adds a temporary reference: +1
```

Output:

```
3
```

That makes 3 references in total. Once `sys.getrefcount()` returns, it releases the temporary reference held by its argument, and the list's count drops back to 2.
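
Here is a small follow-up sketch. It assumes CPython, because `sys.getrefcount` reports CPython's internal count. Absolute values can differ between versions, so the sketch compares differences rather than exact numbers. The names `base`, `holder`, and `after_*` are illustrative. Adding references raises the count, and deleting them lowers it again.

```python
import sys

a = []
base = sys.getrefcount(a)          # includes the temporary argument reference

b = a                              # another name for the same list: +1
after_alias = sys.getrefcount(a)

holder = [a, a]                    # a container holding two references: +2
after_holder = sys.getrefcount(a)

del holder                         # container freed, dropping both references: -2
del b                              # remove the alias: -1
after_del = sys.getrefcount(a)

print(after_alias - base, after_holder - base, after_del - base)  # 1 3 0
```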

### GIL

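* In C, race conditions must be prevented by the programmer, for example with a mutex as in the dot-product program.
* CPython manages memory with reference counting.

These two facts are the root cause. CPython itself is written in C. If two threads update the same reference count at the same time, the count can end up wrong. If it is too high, the object is never freed, which is a memory leak. If it is too low, the object is freed while still in use. To make multithreading safe this way, CPython would need a mutex lock every time a reference count in shared memory changes, and that would degrade performance. CPython therefore uses a single global lock instead, the GIL (Global Interpreter Lock). The GIL lets only one thread execute Python bytecode at any given moment.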
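
The toy model below makes the trade-off concrete in plain Python. Everything in it is hypothetical: `ToyObject`, `incref`, `decref`, `BIG_LOCK`, and `worker` are illustrative names, not CPython internals. Instead of giving every object its own mutex, one global lock guards all reference-count updates. A thread must hold that lock to do any work at all, which is the basic shape of the GIL. The real GIL is implemented in CPython's C code.

```python
import threading

BIG_LOCK = threading.Lock()   # hypothetical: one lock for the whole "interpreter"

class ToyObject:
    def __init__(self) -> None:
        self.refcount = 1     # the creator holds one reference

def incref(obj: ToyObject) -> None:
    obj.refcount += 1         # caller must hold BIG_LOCK

def decref(obj: ToyObject) -> None:
    obj.refcount -= 1         # caller must hold BIG_LOCK
    if obj.refcount == 0:
        print("object freed")

shared = ToyObject()

def worker(steps: int) -> None:
    for _ in range(steps):
        # Hold the global lock for a short "time slice", then release it
        # so that another thread can take its turn.
        with BIG_LOCK:
            incref(shared)    # e.g. binding the object to a new name
            decref(shared)    # ...and dropping that name again

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared.refcount)  # 1: every incref was matched by a decref
```

Only one thread can hold `BIG_LOCK` at a time, so the four workers never update counts in parallel. One lock keeps every count correct, but CPU-bound work gets no true parallelism.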