# Concurrency: parallel execution; POSIX threads
_COSC 208, Introduction to Computer Systems, Fall 2025_

## Parallel execution

* Arithmetic logic unit (ALU) can only perform one mathematical/logical operation at a time
    * Program counter (PC) advances to the next instruction after the previous instruction finishes executing
    * Pipelining allows the control unit to fetch the next instruction while the current instruction is being executed by the processing unit
        * Complicated by conditional branch instructions, because next instruction depends on whether or not the condition is true
        * Pipeline can be stalled waiting for `ldr`/`str` to complete – recall accessing the cache takes the same time as performing ~10 mathematical/logical operations, and accessing main memory takes the same time as performing ~100 mathematical/logical operations
* _What if we had multiple ALUs?_
    * While one ALU is executing the current instruction, another ALU could execute the next instruction
    * Only works if the next instruction does not depend on the previous instruction – e.g.,
      ```
      add w1, w2, #3
      sub w11, w12, #13
      ```
* _Is there a way to guarantee the next instruction does not depend on the previous instruction?_
    * Execute different parts of the code – e.g., different functions – on each ALU
    * Need a program counter (PC) and instruction register (IR) for each ALU
    * _What about general purpose registers (w/x0, w/x1, ..., w/x30)? Can we allocate some to each ALU?_ – no
        * Would need to know at compile time which ALU was going to execute a specific function so the compiler would know which registers to use
        * Some registers serve specific purposes – e.g., w/x0, w/x1, ... are used for parameters; w/x0 is used for the return value; x30 is used for the return address
        * Each ALU gets its own set of general purpose registers
    * _What about the stack pointer (SP)? Can all ALUs use the same SP?_ – no
        * Each function has its own stack frame, so each ALU needs to keep track of a different stack frame
    * In summary, replicate the entire control unit and processing unit
        * Each pair of control and processing units is called a CPU core 
* _How do we decide which function(s) to execute on which CPU core?_
    * Assign a function to a CPU core at each `bl` instruction – but the caller typically does not execute the instruction after the `bl` until the callee executes `ret`, so the first CPU core sits idle waiting for the other CPU core ==> doesn't actually result in parallel execution
    * Compiler divides code – requires compiler to automatically determine which functions are independent and which functions are dependent
    * Programmer-specified – create thread for each separate (sequence of) independent functions

<p style="height:35em;"></p>

## Pthreads API

* Use the pthreads library – `#include <pthread.h>`
* `int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg)`
    * `thread` – pointer to a `pthread_t` in which `pthread_create` stores an identifier for the new thread
    * `attr` – configuration settings for the thread; `NULL` means use the defaults
    * `start_routine` – the function to start executing when the thread starts
        * Pass a pointer to a function
    * `arg` – an argument passed to the aforementioned function
    * Returns 0 on success and a nonzero error number on failure
* `int pthread_join(pthread_t thread, void **value_ptr)`
    * `thread` – the `pthread_t` filled in at thread creation; used to identify the thread we want to wait for
    * `value_ptr` – the location where the function return value should be stored; pass `NULL` to ignore the return value
        * Notice it’s a pointer to a void pointer and the `start_routine` function specified in `pthread_create` returns a void pointer

<p style="height:20em;"></p>

Example

```C
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define LENGTH 5

// Find and print the largest value in the array passed via arg
void *max(void *arg) {
    int *nums = (int*)arg;
    int largest = nums[0];
    for (int i = 1; i < LENGTH; i++) {
        if (nums[i] > largest) {
            largest = nums[i];
        }
    }
    printf("Max: %d\n", largest);
    return NULL;
}

// Compute and print the sum of the array passed via arg
void *sum(void *arg) {
    int *nums = (int*)arg;
    int total = 0;
    for (int i = 0; i < LENGTH; i++) {
        total += nums[i]; // can overflow an int when the values are large
    }
    printf("Sum: %d\n", total);
    return NULL;
}

int main() {
    // Allocate the array on the heap so both threads can access it
    int *nums = malloc(sizeof(int) * LENGTH);
    for (int i = 0; i < LENGTH; i++) {
        nums[i] = random();
    }
    pthread_t thread1, thread2;
    // Start two threads that read the same array
    pthread_create(&thread1, NULL, &max, nums);
    pthread_create(&thread2, NULL, &sum, nums);
    // Wait for both threads to finish before main returns
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    free(nums);
}
```

```
Sum: -584636838
Max: 1957747793
```


* Every program has a main thread
* Each thread has its own stack
* Threads share the heap (and global variables)
* Different threads can run the same function

<p style="height:20em;"></p>

Q2: _Consider the following program:_

```C
/* 1*/  #include <pthread.h>
/* 2*/  #include <stdio.h>
/* 3*/  #include <string.h>
/* 4*/  void word_count(char *str) {
/* 5*/      int count = 1;
/* 6*/      for (int i = 0; i < strlen(str); i++) {
/* 7*/          if (str[i] == ' ') {
/* 8*/              count++;
/* 9*/          }
/*10*/      }
/*11*/      printf("%d words\n", count);
/*12*/  }
/*13*/  int main(int argc, char *argv[]) {
/*14*/      char *str = "I love CS";
/*15*/      pthread_t thr;
/*16*/      pthread_create(thr, NULL, &word_count, str);
/*17*/      pthread_join(thr);
/*18*/  }
```

_Compiling this program with `gcc` results in the following warnings and error:_

```
buggy_noreturn.c:16:20: warning: passing argument 1 of ‘pthread_create’ 
makes pointer from integer without a cast [-Wint-conversion]
   16 |     pthread_create(thr, NULL, &word_count, str);
      |                    ^~~
      |                    pthread_t {aka long unsigned int}
/usr/include/pthread.h:202:50: note: expected ‘pthread_t * restrict’ 
{aka ‘long unsigned int * restrict’} but argument is of type ‘pthread_t’ 
{aka ‘long unsigned int’}
  202 | extern int pthread_create (pthread_t *__restrict __newthread,
      |                            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
buggy_noreturn.c:16:31: warning: passing argument 3 of ‘pthread_create’ 
from incompatible pointer type [-Wincompatible-pointer-types]
   16 |     pthread_create(thr, NULL, &word_count, str);
      |                               ^~~~~~~~~~~
      |                               void (*)(char *)
/usr/include/pthread.h:204:36: note: expected ‘void * (*)(void *)’ but 
argument is of type ‘void (*)(char *)’
  204 |                            void *(*__start_routine) (void *),
      |                            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
buggy_noreturn.c:17:5: error: too few arguments to function ‘pthread_join’
   17 |     pthread_join(thr);
      |     ^~~~~~~~~~~~
/usr/include/pthread.h:219:12: note: declared here
  219 | extern int pthread_join (pthread_t __th, void **__thread_return);
      |            ^~~~~~~~~~~~
```

_How would you change the code to fix these problems?_

* Need to pass `&thr` to `pthread_create` (instead of `thr`) on line 16
* Function executed by a thread must return `void *` and take a single `void *` parameter; replace line 4 with:
    ```C
    void *word_count(void *arg) {
        char *str = (char *)arg;
    ```
    Also add after line 11:
    ```C
    return NULL;
    ```
* Add an additional parameter to `pthread_join` on line 17:
    ```C
    pthread_join(thr, NULL);
    ``` 

<div style="height:1em;"></div>

Q3: _Consider the following program:_

```C
/* 1*/  #include <ctype.h>
/* 2*/  #include <pthread.h>
/* 3*/  #include <stdio.h>
/* 4*/  #include <stdlib.h>
/* 5*/  #include <string.h>
/* 6*/  int count_upper(char *str) {
/* 7*/      int count = 0;
/* 8*/      for (int i = 0; i < strlen(str); i++) {
/* 9*/          if (isupper(str[i])) {
/*10*/              count++;
/*11*/          }
/*12*/      }
/*13*/      return count;
/*14*/  }
/*15*/  int main(int argc, char *argv[]) {
/*16*/      if (argc < 2) {
/*17*/          printf("Error: provide a string\n");
/*18*/          return 1;
/*19*/      }
/*20*/      char *str = argv[1];
/*21*/      pthread_t thr;
/*22*/      pthread_create(thr, NULL, &count_upper, str);
/*23*/      int count = 0;
/*24*/      pthread_join(thr, &count);
/*25*/      printf("There are %d uppercase letters\n", count);
/*26*/  }
```

_Compiling this program results in the following warnings:_
```
buggy.c:22:20: warning: incompatible integer to pointer conversion passing 
'pthread_t' (aka 'unsigned long') to parameter of type 'pthread_t *' (aka 
'unsigned long *'); take the address with & [-Wint-conversion]
    pthread_create(thr, NULL, &count_upper, str);
                   ^~~
                   &
/usr/include/pthread.h:198:50: note: passing argument to parameter 
'__newthread' here
extern int pthread_create (pthread_t *__restrict __newthread,
                                                 ^

buggy.c:22:31: warning: incompatible function pointer types passing 
'int (*)(char *)' to parameter of type 'void *(*)(void *)' 
[-Wincompatible-function-pointer-types]
    pthread_create(thr, NULL, &count_upper, str);
                              ^~~~~~~~~~~~
/usr/include/pthread.h:200:15: note: passing argument to parameter 
'__start_routine' here
                           void *(*__start_routine) (void *),
                                   ^

buggy.c:24:23: warning: incompatible pointer types passing 
'int *' to parameter of type 'void **' [-Wincompatible-pointer-types]
    pthread_join(thr, &count);
                      ^~~~~~
/usr/include/pthread.h:215:49: note: passing argument to parameter 
'__thread_return' here
extern int pthread_join (pthread_t __th, void **__thread_return);
                                                ^
3 warnings generated.
```
_How would you change the code to fix these problems?_

* Need to pass `&thr` to `pthread_create` (instead of `thr`) on line 22
* Function executed by a thread must return `void *` and take a single `void *` parameter; replace lines 6-7 with:
    ```C
    void *count_upper(void *arg) {
        char *str = (char *)arg;
        int *count = malloc(sizeof(int));
        *count = 0;
    ```
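    Also replace line 10 with the following; the parentheses are required because `*count++` would advance the pointer instead of incrementing the value it points to:
    ```C
    (*count)++;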
    ```
* Need to pass a double pointer to `pthread_join` on line 24; replace lines 23-25 with:
    ```C
    int *count = NULL;
    pthread_join(thr, (void **)&count);
    printf("There are %d uppercase letters\n", *count);
    free(count);
    ``` 