**zombie**: "a process that has terminated, but its parent has not (yet) read its exit status (using the wait() function)"

**orphan**: a process whose parent has terminated while the process itself is still running.

# Chapter 3

**Process**: A program in execution. Which resources it can access depends on where it runs: in user space or in the kernel. A computer can have many processes in execution at once, with the CPU switching among them.

The process has a memory layout divided into the following sections: 

**text section** - the executable code

**data section** - global variables

**heap section** - memory that is dynamically allocated during program runtime

**stack section** - temporary data storage used when functions are invoked. Examples include local variables, function parameters, and return addresses

The text and data sections have fixed sizes that do not change while the program runs. The heap and stack, however, grow and shrink dynamically during execution.

*Note: a process is not an executable file. An executable file becomes a process when it is loaded into memory, has resources allocated to it, and has its instructions executed one by one.*

An executing process has the following changing states:

New - The process is being created.

Running - Instructions are being executed.

Waiting - The process is waiting for some event to occur (such as an I/O completion or reception of a signal).

Ready - The process is waiting to be assigned to a processor.

Terminated - The process has finished execution.

![Figure 3.2](./images/f3-2.png)
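
The state changes listed above can be sketched as a small state machine. This is only an illustration: the transition table follows the usual five-state diagram, and the names `State`, `TRANSITIONS`, and `move` are made up for this example rather than taken from any OS.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"


# Allowed transitions (assumed from the standard five-state diagram).
TRANSITIONS = {
    State.NEW: {State.READY},                # admitted
    State.READY: {State.RUNNING},            # scheduler dispatch
    State.RUNNING: {State.READY,             # interrupt
                    State.WAITING,           # I/O or event wait
                    State.TERMINATED},       # exit
    State.WAITING: {State.READY},            # I/O or event completion
    State.TERMINATED: set(),
}


def move(current: State, target: State) -> State:
    """Return target if the transition is legal, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


# One possible lifetime of a process.
state = State.NEW
for nxt in (State.READY, State.RUNNING, State.WAITING,
            State.READY, State.RUNNING, State.TERMINATED):
    state = move(state, nxt)
```

Note that a waiting process cannot go straight back to running: it first returns to the ready state and waits to be dispatched again.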

**Process control block (PCB)**: A kernel data structure in memory that represents a specific process in the OS. Contents:

 - Process state: New, ready, running, waiting, halted, etc.
 - Program counter: Address of the next instruction this process will execute.
 - CPU registers: The contents of the CPU's registers, including general-purpose registers, index registers, stack pointers, accumulators, etc.
 - CPU scheduling information: The process priority, pointers to scheduling queues, and other scheduling parameters.
 - Memory-management information: The value of the base and limit registers and the page tables or segment tables.
 - Accounting information: Amount of CPU and real time used, time limits, account numbers, process numbers (PID).
 - I/O status information: List of I/O devices allocated to the process, list of open files, etc.
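
As a rough picture of what such a record holds, here is a toy PCB written as a Python dataclass. The field names are assumptions chosen to mirror the list above; a real kernel uses a much larger structure with different names.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    """Toy process control block; every field name here is illustrative."""
    pid: int                                          # accounting: process number
    state: str = "new"                                # process state
    program_counter: int = 0                          # next instruction address
    registers: dict = field(default_factory=dict)     # saved CPU registers
    priority: int = 0                                 # CPU scheduling information
    base: int = 0                                     # memory management: base register
    limit: int = 0                                    # memory management: limit register
    cpu_time_used: float = 0.0                        # accounting information
    open_files: list = field(default_factory=list)    # I/O status information


pcb = PCB(pid=42, priority=5)
pcb.state = "ready"
pcb.open_files.append("/tmp/example.txt")             # hypothetical file name
```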

**Process scheduler**: The part of the OS that selects which available process runs next on a core. It works from the PCBs, which are kept in queues (typically linked lists). Its goal is to maximize CPU utilization by keeping some process running at all times. Since a CPU core can only run one process at a time, the scheduler switches between processes rapidly so that, to the user, they all appear to run at once.

Figure 3.4 represents a **ready queue** (processes that are in memory, ready and waiting to execute on a core) and a **wait queue** (processes waiting for a certain event, such as user input or I/O completion) for the process scheduler:

![Figure 3.4](./images/f3-4.png)

**Degree of multiprogramming**: The number of processes currently in memory.

**I/O-bound process**: A process that spends more of its time doing I/O (input/output) than actual computation. Ex: prompting the user for numbers to sum up (waiting on the user's input takes longer than summing the numbers).

**CPU-bound process**: The opposite of an I/O-bound process, in that more of its time is spent doing computations.

Figure 3.5 is a **queueing diagram** which represents a common architecture of process scheduling. The circles represent resources for the processes and the arrows represent the flow of the processes in the system:

![Figure 3.5](./images/f3-5.png)

**CPU scheduler**: Selects a process from the ready queue and allocates a CPU core to it. It executes at least once every 100 ms and often much more frequently. Two things drive this frequency: a process gives up its core whenever it has to wait for an I/O request, and the scheduler may forcibly remove a process from a core so that another process can run (preemption). This is distinct from **swapping**, in which a process is moved out of memory to disk and later brought back in, which reduces the degree of multiprogramming.
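
To make the idea of forcibly removing a process from a core concrete, here is a small simulation. It assumes a round-robin policy with a fixed time quantum, which is only one of many possible scheduling policies and is not specified in these notes; `round_robin` and the burst times are made up for the example.

```python
from collections import deque


def round_robin(bursts: dict, quantum: int) -> list:
    """Return the (name, time_run) slices in the order they execute."""
    ready = deque(bursts.items())        # ready queue of (name, remaining time)
    timeline = []
    while ready:
        name, remaining = ready.popleft()            # dispatch the head of the queue
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Quantum expired: the process is preempted and rejoins the tail.
            ready.append((name, remaining - run))
    return timeline


timeline = round_robin({"A": 5, "B": 2, "C": 3}, quantum=2)
# timeline == [("A", 2), ("B", 2), ("C", 2), ("A", 2), ("C", 1), ("A", 1)]
```

No process holds the core for more than one quantum at a time, so short jobs such as B finish early even though A arrived first.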

**Context switch**: Switching a core from executing one process to executing another (this can also be a switch from a process to a kernel routine due to an interrupt). This is done in the following steps:

1. A **state save** is performed where we save the current context of a process into its PCB. The info saved includes the value of the CPU registers, the process state, and memory-management information.
2. A **state restore** is then performed where we load the context of the replacing process (from its PCB) into the CPU core.

Figure 3.6 shows a diagram of a context switch:

![Figure 3.6](./images/f3-6.png)
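
The two steps can be mimicked with a toy model in which the "CPU" is just a dictionary of registers and each PCB is a plain dictionary. All names here (`CPU`, `context_switch`, the register names) are assumptions made for the sketch.

```python
class CPU:
    """Toy CPU core holding three made-up registers."""
    def __init__(self):
        self.registers = {"pc": 0, "sp": 0, "acc": 0}


def context_switch(cpu, old_pcb, new_pcb):
    # Step 1, state save: copy the core's registers into the old PCB.
    old_pcb["registers"] = dict(cpu.registers)
    old_pcb["state"] = "ready"
    # Step 2, state restore: load the new process's saved registers.
    cpu.registers = dict(new_pcb["registers"])
    new_pcb["state"] = "running"


cpu = CPU()
p0 = {"pid": 0, "state": "running", "registers": {"pc": 0, "sp": 0, "acc": 0}}
p1 = {"pid": 1, "state": "ready", "registers": {"pc": 200, "sp": 4096, "acc": 7}}

cpu.registers.update(pc=104, acc=3)    # P0 has made some progress
context_switch(cpu, p0, p1)            # P0 is saved, P1 resumes where it left off
```

When P0 is later switched back in, it continues from `pc == 104` with its accumulator intact, which is the whole point of the state save.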

*Note: Context-switch time is pure overhead, because the system does no useful work while switching. The switching speed depends on the computer's hardware (e.g., how many registers must be copied, the existence of special instructions, etc.).*

Some processors provide multiple sets of registers. If there are no more active processes than register sets, a context switch only needs to change the pointer to the current register set. If there are more active processes than register sets, the system resorts to copying register data to and from memory.

When a process creates a child process, there are two possibilities for execution:

1. The parent continues to execute concurrently with its children.
2. The parent waits until some or all of its children have terminated.

There are also two address-space possibilities for the new process:

1. The child process is a duplicate of the parent process (it has the same program and data as the parent).
2. The child process has a new program loaded into it.

On UNIX, the first case corresponds to calling fork() alone, and the second to calling fork() followed by execve() in the child.

When a parent process creates a child process (using fork()) and then calls wait(), the parent moves itself off the ready queue until the child terminates.

Figure 3.9 demonstrates this:

![Figure 3.9](./images/f3-9.png)
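
A minimal sketch of this fork-then-wait pattern, using Python's `os` module (UNIX only). The helper name `run_child_and_wait` and the exit code are made up for the example; the child here simply exits instead of loading a new program.

```python
import os


def run_child_and_wait(exit_code: int) -> int:
    """Fork a child that exits with exit_code; return the status the parent reads."""
    pid = os.fork()
    if pid == 0:
        # Child: a duplicate of the parent. It could load a new program here
        # (for example with os.execvp); in this sketch it just terminates.
        os._exit(exit_code)
    # Parent: blocks until the child terminates, then reads its exit status.
    # Between the child's exit and this call, the child is a zombie.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


child_status = run_child_and_wait(7)
```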


**Cascading termination**: If a parent process terminates, all of its child processes are terminated as well. This behavior depends on the OS; some systems instead allow the children to continue as orphan processes.

The UNIX system addresses orphan processes by assigning the init process (the root process) as the new parent to the orphan process. 

Processes executing at the same time (concurrently) can either be independent processes or cooperating processes.

**Independent process**: This process does not share its data with any other process currently executing in the system.

**Cooperating process**: This process can be affected or can affect other processes currently executing in the system.

**Interprocess communication (IPC)**: A mechanism that allows processes to send data to and receive data from one another. There are two models: **shared memory**, where a region of memory is set up that all the cooperating processes read from and write to, and **message passing**, where a message pathway is established among processes and holds messages in a queue.

Figure 3.11 demonstrates both forms of IPC:

![Figure 3.11](./images/f3-11.png)



Message passing is slower (it is typically implemented with system calls, so each message requires kernel intervention) but simpler (since no conflicts need to be avoided), which makes it useful for small amounts of data. It is also used in distributed systems, where communication happens between multiple computers (each with its own memory) connected by a network.

Shared memory is more complicated to use, because the processes themselves must avoid writing to the same location at the same time, but it is faster: system calls are needed only to set up the shared region, after which every access is an ordinary memory read or write.

A common use of shared memory is the producer-consumer pattern, in which a producer process places items into a shared buffer and a consumer process removes them. Two types of buffers can be used: "The **unbounded buffer** places no practical limit on the size of the buffer. The consumer may have to wait for new items, but the producer can always produce new items. The **bounded buffer** assumes a fixed buffer size. In this case, the consumer must wait if the buffer is empty, and the producer must wait if the buffer is full."

With a bounded buffer, the producer keeps a pointer to the next free slot. If the buffer is not full, it writes the item there and advances the pointer; otherwise it waits until the consumer frees a slot. The consumer keeps a pointer to the next unread slot. If the read pointer is equal to the write pointer, the buffer is empty and the consumer waits; otherwise it reads the item and then advances the read pointer.
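
The pointer arithmetic for a bounded buffer can be sketched as a circular array. This is a single-process illustration with made-up names (`BoundedBuffer`, `in_`, `out`); it returns `False` or `None` where a real producer or consumer would wait, and it does no synchronization between processes.

```python
BUFFER_SIZE = 4


class BoundedBuffer:
    """Circular buffer; one slot stays unused so that full and empty differ."""

    def __init__(self, size: int = BUFFER_SIZE):
        self.slots = [None] * size
        self.in_ = 0     # next free slot (the producer writes here)
        self.out = 0     # next unread slot (the consumer reads here)

    def is_empty(self) -> bool:
        return self.in_ == self.out

    def is_full(self) -> bool:
        return (self.in_ + 1) % len(self.slots) == self.out

    def produce(self, item) -> bool:
        if self.is_full():
            return False                 # a real producer would wait here
        self.slots[self.in_] = item
        self.in_ = (self.in_ + 1) % len(self.slots)
        return True

    def consume(self):
        if self.is_empty():
            return None                  # a real consumer would wait here
        item = self.slots[self.out]
        self.out = (self.out + 1) % len(self.slots)
        return item


buf = BoundedBuffer()
accepted = [buf.produce(n) for n in range(5)]   # only 3 of the 4 slots are usable
```

With this scheme a buffer of size 4 holds at most 3 items, because `in_ == out` is reserved to mean "empty".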

In message passing we must establish a **communication link** through which processes send and receive messages. There are different implementation methods for the link:

- **Direct communication**: Each process explicitly names the process it is sending to or receiving from, and each pair of processes has exactly one link. With symmetric addressing, both the sender and the receiver name each other:

      send(P, message) - Send a message to process P.
      receive(Q, message) - Receive a message from process Q.
  
- **Asymmetry**: A variant of direct communication in which only the sender names the recipient, while the receiver does not have to name the sender (id is set to the name of whichever process sent the message):
  
      send(P, message) - Send a message to process P.
      receive(id, message) - Receive a message from any process.
  
- **Indirect communication**: Messages are sent to and received from mailboxes, or ports. Processes can write to one or more mailboxes and read from one or more mailboxes:

      send(A, message) - Send a message to mailbox A.
      receive(A, message) - Receive a message from mailbox A.
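
Indirect communication can be pictured with a dictionary of named mailboxes. This is an in-process toy: the `mailboxes` table and the nonblocking behaviour of `receive` (returning `None` when nothing is waiting) are assumptions of the sketch, not part of any real IPC API.

```python
from collections import deque

mailboxes = {}   # mailbox name -> queue of pending messages


def send(mailbox: str, message) -> None:
    """Send a message to the named mailbox, creating it if needed."""
    mailboxes.setdefault(mailbox, deque()).append(message)


def receive(mailbox: str):
    """Receive the oldest message from the mailbox, or None if it is empty."""
    queue = mailboxes.get(mailbox)
    return queue.popleft() if queue else None


send("A", "hello")
send("A", "world")
first = receive("A")      # messages come out in the order they were sent
```

Neither side names the other process; both only name mailbox A, which is what distinguishes this from direct communication.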


**Synchronization**: Communication between processes takes place through calls to send() and receive(), each of which can be either blocking (synchronous) or nonblocking (asynchronous):

- **Blocking send**: The sending process is blocked until the message is received by the receiving process or by the mailbox.
- **Nonblocking send**: The sending process sends the message and resumes operation.
- **Blocking receive**: The receiver blocks until a message is available.
- **Nonblocking receive**: The receiver retrieves either a valid message or a null.

**Buffering**: Messages exchanged between two processes reside in a temporary queue. The queue can be implemented in three ways:

- **Zero capacity**: The queue has a maximum length of zero; thus, the link cannot have any messages waiting in it. In this case, the sender must block until the recipient receives the message.
- **Bounded capacity**: The queue has finite length n; thus, at most n messages can reside in it. If the queue is not full when a new message is sent, the message is placed in the queue (either the message is copied or a pointer to the message is kept), and the sender can continue execution without waiting. The link’s capacity is finite, however. If the link is full, the sender must block until space is available in the queue. 
- **Unbounded capacity**: The queue’s length is potentially infinite; thus, any number of messages can wait in it. The sender never blocks.
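
Python's standard `queue.Queue` is a convenient stand-in for a bounded-capacity link and shows the blocking and nonblocking variants side by side. One difference to keep in mind: a nonblocking `put_nowait` on a full queue raises `queue.Full` rather than waiting, so the caller has to decide what to do with the message.

```python
import queue

link = queue.Queue(maxsize=2)    # bounded capacity, n = 2

link.put("m1")                   # blocking send; returns at once because space is free
link.put("m2")                   # the link is now full

try:
    link.put_nowait("m3")        # nonblocking send on a full link
    overflowed = False
except queue.Full:
    overflowed = True            # a blocking put() would have waited here instead

first = link.get()               # blocking receive; a message is already waiting
second = link.get_nowait()       # nonblocking receive of a valid message

try:
    link.get_nowait()            # nonblocking receive on an empty link
    got_message = True
except queue.Empty:
    got_message = False          # plays the role of the "null" result
```

Passing `maxsize=0` to `queue.Queue` gives an unbounded queue, where `put` never blocks.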

**Pipe**: One of the first IPC mechanisms in early UNIX systems. A pipe acts as a channel of communication between two processes. Two common types of pipes are ordinary pipes and named pipes.

**Ordinary pipe**: Standard unidirectional communication system. The producer process writes to one end of the pipe, known as the **write end**, and the consumer process reads from the other end, known as the **read end**. If two-way communication is required (both processes need to read from and write to each other), two pipes must be used, each carrying data in a different direction.

Figure 3.20 below shows an ordinary pipe between a parent and a child. Note that fd[0] is the read end of the pipe and fd[1] is the write end. In this case the parent is only writing, so it closes its read end (its fd[0]), and the child is only reading, so it closes its write end (its fd[1]).

![Figure 3.20](./images/f3-20.png)
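
The same arrangement can be sketched with `os.pipe` and `os.fork` (UNIX only). The helper name `parent_to_child` is made up; the child reports success through its exit status so the parent can check that the message arrived.

```python
import os


def parent_to_child(message: bytes) -> int:
    """Send message from parent to child over a pipe; return the child's exit status."""
    read_fd, write_fd = os.pipe()        # fd[0] is the read end, fd[1] the write end
    pid = os.fork()
    if pid == 0:
        # Child: the consumer. It never writes, so it closes the write end.
        ok = False
        try:
            os.close(write_fd)
            data = os.read(read_fd, 1024)
            os.close(read_fd)
            ok = data == message
        finally:
            os._exit(0 if ok else 1)
    # Parent: the producer. It never reads, so it closes the read end.
    os.close(read_fd)
    os.write(write_fd, message)
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


result = parent_to_child(b"Greetings")   # 0 means the child read the same bytes
```

Closing the unused ends matters: a reader only sees end-of-file once every copy of the write end has been closed.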

**Named pipe**: A more flexible kind of pipe. Communication can be bidirectional, although only half-duplex transmission is permitted (data travels in one direction at a time), and no parent-child relationship is required. The pipe appears as a file in the file system and remains there until it is explicitly deleted, and it can be used by multiple processes (an ordinary pipe ceases to exist once its communicating processes have terminated). On UNIX, named pipes are called FIFOs, and several processes can act as writers and several as readers. "Additionally, the communicating processes must reside on the same machine. If intermachine communication is required, sockets must be used."
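
A small sketch of a FIFO using `os.mkfifo` (UNIX only). To keep it self-contained, the writer runs in a thread of the same process rather than in a separate, unrelated process, and the file name `demo_fifo` is hypothetical.

```python
import os
import tempfile
import threading


def fifo_round_trip(message: bytes) -> bytes:
    """Write message into a FIFO from one thread and read it back in another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_fifo")
        os.mkfifo(path)                      # the FIFO now exists in the file system

        def writer():
            # Opening for writing blocks until a reader opens the other end.
            with open(path, "wb") as fifo:
                fifo.write(message)

        t = threading.Thread(target=writer)
        t.start()
        with open(path, "rb") as fifo:       # blocks until the writer has opened
            data = fifo.read()               # reads until the writer closes
        t.join()
        return data                          # the directory and FIFO are removed on exit


received = fifo_round_trip(b"via FIFO")
```

Because the FIFO is addressed by its path, any process that can open that path could take the writer's or the reader's role.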

**Socket**: An endpoint for communication, identified by an IP address concatenated with a port number. Sockets are used by a pair of processes communicating over a network (one socket for each process). Servers implementing well-known services such as SSH, FTP, and HTTP listen on well-known ports (22, 21, and 80 respectively). Example connection:

1. A client process on host X, with IP address 146.86.5.20, initiates a request for a connection with a web server.
2. The web server is listening for connections on port 80.
3. Host X assigns the client process a port number for this connection (an arbitrary number greater than 1024), which turns out to be 1625.

The client and the web server can now exchange information specifically with each other by using each other's ports as the destination (the client sends data to port 80 on the server, and the server sends data to port 1625 on host X). Every connection must consist of a unique pair of sockets, so if another process on host X wanted to communicate with the same web server, it would be assigned its own port number (greater than 1024 and not equal to 1625).

This is demonstrated visually in figure 3.26 below:

![Figure 3.26](./images/f3-26.png)
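
The port assignment can be observed on a single machine with Python's `socket` module. This sketch runs both ends over the loopback address and lets the OS choose the server's port (binding to port 80 normally requires elevated privileges), so the addresses differ from the ones in the example above.

```python
import socket

# Server side: listen on a port chosen by the OS (port 0 means "pick a free one").
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_addr = server.getsockname()       # (ip, port) the server is listening on

# Client side: connecting makes the client's host assign it a port of its own.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server_addr)
client_addr = client.getsockname()       # (ip, port) assigned to the client

# The server sees the connection as coming from the client's socket.
conn, peer_addr = server.accept()
client.sendall(b"GET /")
request = conn.recv(1024)

for s in (conn, client, server):
    s.close()
```

The connection is identified by the pair `(client_addr, server_addr)`, which is why two clients on the same host can talk to the same server port without their traffic being mixed up.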

# Chapter 5

# Chapter 19

**Distributed System**: A collection of processors, known as nodes, that do not share memory or a clock. Each node has its own local memory (and typically runs its own OS), and the nodes communicate with each other over a network and work together to achieve some goal. From the point of view of a specific node, its own resources are local, whereas all other nodes and their resources are remote. One example of a distributed system is the internet.

Below in figure 19.1 is an example of a client-server distributed system:

![Figure 19.1](./images/f19-1.png)

**Resource sharing**: "Provides mechanisms for sharing files at remote sites, processing information in a distributed database, printing files at remote sites, using remote specialized hardware devices such as a supercomputer or a GPU."

**Computation speedup**: Splitting a computation into parts that run concurrently on many nodes of a distributed system, so the whole task finishes sooner.

**Load balancing**: Taking some of the computation tasks from a node that is overloaded with requests and sending them to a more lightly loaded node.

A distributed system is reliable in the sense that if one node fails, the remaining nodes can continue operating. The exception is a node responsible for a specific task that other nodes depend on (a global database, for example); such nodes are expected to have backups.
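
One simple way to picture load balancing is a dispatcher that always hands the next task to the node with the fewest pending tasks. This "least loaded" policy, the node names, and the helper functions are all assumptions of the sketch; real systems may also migrate tasks that are already running.

```python
def pick_node(loads: dict) -> str:
    """Return the name of the node with the fewest pending tasks."""
    return min(loads, key=loads.get)


def dispatch(tasks, loads: dict) -> dict:
    """Assign each task to the currently least-loaded node; loads is updated in place."""
    placement = {}
    for task in tasks:
        node = pick_node(loads)
        placement[task] = node
        loads[node] += 1
    return placement


loads = {"node-a": 5, "node-b": 1, "node-c": 2}
placement = dispatch(["t1", "t2", "t3"], loads)
# placement == {"t1": "node-b", "t2": "node-b", "t3": "node-c"}
```

The overloaded node-a receives nothing until the others catch up with it.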

Distributed systems can be connected by two types of networks: local-area networks (LAN) and wide-area networks (WAN).

**Local-area network (LAN)**: Nodes are distributed over a small geographical area, like a single building or a number of adjacent buildings. One of the main advantages of a LAN over a WAN is that, because the nodes are so close together, the communication links usually have a higher speed and a lower error rate. Connection links are usually either Wi-Fi or Ethernet cable.

Figure 19.2 demonstrates an example of a LAN inside a home/office:

![Figure 19.2](./images/f19-2.png)

**Wide-area network (WAN)**: Nodes (or systems) distributed over a large area, like the United States. Connection links are usually telephone lines, leased (dedicated data) lines, optical cable, microwave links, radio waves, and satellite channels. The communication links are controlled by **routers**, which direct traffic to other routers and networks and transfer information between the nodes. Figure 19.3 below demonstrates a distributed system with network hosts (nodes) on the outside and routers on the inside:

![Figure 19.3](./images/f19-3.png)


This figure gives a picture of how the internet is structured. The network hosts (or nodes) are computers connected to LANs. The LANs are connected to the internet via regional networks, and the regional networks are interlinked with routers to form the worldwide network (the internet).

Even though WANs are generally slower than LANs, WAN connections that link major cities may have very fast transfer rates through fiber optic cables.

**Transmission Control Protocol/Internet Protocol (TCP/IP)**: The protocol suite used by the internet for its distributed system. With TCP, data is transmitted (back and forth) between a client and a server by first establishing a connection and then using that connection to share data, which is broken down into smaller packets and pieced back together on the other side. The suite can be broken down into 4 layers:

1. Application layer - The layer the user's programs interact with; handles data exchange for the application. An example is HTTP on a website.
2. Transport layer - Handles the data preparation: splits the data into TCP or UDP packets and, for TCP, keeps the packets in order and controls their flow. (Kind of like the packaging center of a mailroom.)
3. Internet layer - Adds IP addresses to outgoing packets and reads them on incoming packets so they can be routed. (Kind of like the mailroom person adding mailing stamps and reading mailing stamps.)
4. Link layer - Handles the transmission of data over a physical medium (handling error detection and data recovery). (Kind of like the mail truck driving the mail.)

TCP is a reliable, connection-oriented transport protocol (in contrast to UDP). It implements a byte stream that lets data be delivered in order and without gaps. It does so with the following mechanisms:

1. An **acknowledgment packet (ACK)**, which the receiver must send to the sender to confirm that it received the sender's data.
2. **Sequence numbers**, which increase with each data packet so the receiver can put the packets in order and notice missing or duplicate ones.
3. Control packets to initiate and tear down a connection. A connection is opened with a **three-way handshake**, in which the following 3 signals are exchanged: SYN, SYN+ACK, ACK.

Figure 19.11 demonstrates an example of a TCP data transfer. The example can be broken down into the following steps:

1. After a connection is established, the client sends a request packet to the server with the sequence number 904.
2. The server sends back an ACK for data 904.
3. The server then sends its own data to the client identifiable with the sequence number 126.
4. The client receives data 126 and sends back an ACK for it.
5. The server then tries to send data with sequence number 127 to the client, but the packet is lost and never reaches the client.
6. The server's timer expires while it waits for an ACK for data 127, so it resends data 127.
7. This time the client receives data 127 and sends an ACK for it to the server.
8. The server sends the next piece of data to the client (with sequence number 128).
9. The client receives the data and sends back an ACK for it, but the ACK is lost and never arrives.
10. The server reaches its timeout for the ACK for data 128 and resends data 128.
11. The client receives data 128 again, marks it as a duplicate, and resends the ACK for it (knowing that its previous ACK must not have been delivered).

![Figure 19.11](./images/f19-11.png)
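
The timeout-and-retransmit behaviour in the steps above can be reproduced with a small simulation. It is deliberately simplified: the sender waits for each ACK before sending the next packet (stop-and-wait), whereas real TCP keeps many packets in flight and numbers bytes rather than packets. The function `transfer` and its parameters are made up for the example.

```python
def transfer(segments, lost_data=(), lost_acks=()):
    """Simulate stop-and-wait delivery; return (delivered payloads, event log).

    lost_data and lost_acks hold sequence numbers whose first data packet or
    first ACK is dropped by the simulated network.
    """
    lost_data, lost_acks = set(lost_data), set(lost_acks)
    delivered, seen, log = [], set(), []
    for seq, payload in segments:
        acked = False
        while not acked:
            log.append(f"send {seq}")
            if seq in lost_data:                  # the data packet is dropped
                lost_data.discard(seq)
                log.append(f"timeout {seq}")
                continue                          # the sender retransmits
            if seq in seen:
                log.append(f"duplicate {seq}")    # the receiver discards the copy
            else:
                seen.add(seq)
                delivered.append(payload)
            if seq in lost_acks:                  # the ACK is dropped
                lost_acks.discard(seq)
                log.append(f"timeout {seq}")
                continue                          # the sender retransmits
            log.append(f"ack {seq}")
            acked = True
    return delivered, log


delivered, log = transfer(
    [(126, "a"), (127, "b"), (128, "c")], lost_data={127}, lost_acks={128}
)
```

Even though packet 128 crosses the network twice, its payload appears only once in `delivered`, because the receiver recognizes the repeated sequence number.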

**Flow control**: Mechanism used by TCP to prevent the sender from overrunning the capacity of the receiver (which may have a slower connection or slower hardware). The receiver returns flow-control state in its ACKs, telling the sender to either slow down or speed up.

**Congestion control**: Mechanism used by TCP to approximate the state of the network between the sender and the receiver. A router that is overwhelmed with packets tends to drop them, so the sender monitors how many of its packets go unacknowledged; if too many are being dropped, it slows down the rate at which it sends.
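
As an illustration of a sender reacting to dropped packets, the sketch below raises the sending rate a little after a clean round and halves it after a lossy one. This additive-increase, multiplicative-decrease rule is an assumption chosen for the example; the notes do not specify the policy, and real TCP congestion control is considerably more involved. The 10% threshold is likewise arbitrary.

```python
def adjust_rate(rate: float, sent: int, acked: int,
                loss_threshold: float = 0.1) -> float:
    """Return the new sending rate after one round of `sent` packets."""
    lost_fraction = (sent - acked) / sent if sent else 0.0
    if lost_fraction > loss_threshold:
        return max(1.0, rate / 2)    # too many drops: back off sharply
    return rate + 1.0                # network looks healthy: probe for more


rate = 10.0
rate = adjust_rate(rate, sent=10, acked=10)   # no loss, so the rate rises to 11.0
rate = adjust_rate(rate, sent=11, acked=7)    # 4 of 11 lost, so the rate is halved
```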

**Medium access control (MAC) address**: The address used to deliver a data packet within a local network (LAN). Every Ethernet/WiFi device has a unique MAC address.

**Address resolution protocol (ARP)**: The protocol a system uses to find the MAC address that corresponds to an IP address on its local network. Also known as IP to MAC address mapping. The sender broadcasts a request asking which device holds a given IP address, the owner replies with its MAC address, and the sender caches the mapping so that later packets can be addressed directly. A packet bound for a system on a different local network is addressed to the MAC address of the local router, which forwards it onward.
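
IP to MAC resolution can be pictured as a cache lookup that falls back to a broadcast on a miss. Everything in this toy is hypothetical: the addresses, the `LAN_HOSTS` table standing in for the devices on the network, and the `broadcasts` list that records each query.

```python
# Hypothetical LAN: the IP address each device holds and its MAC address.
LAN_HOSTS = {
    "192.168.1.10": "aa:bb:cc:00:00:10",
    "192.168.1.20": "aa:bb:cc:00:00:20",
}

arp_cache = {}     # IP -> MAC mappings learned so far
broadcasts = []    # record of every "who has <ip>?" query sent


def resolve(ip: str):
    """Return the MAC address for ip, querying the LAN only on a cache miss."""
    if ip in arp_cache:
        return arp_cache[ip]
    broadcasts.append(ip)            # the query goes to every device on the LAN
    mac = LAN_HOSTS.get(ip)          # only the owner of the address replies
    if mac is not None:
        arp_cache[ip] = mac          # remember the answer for later packets
    return mac


first = resolve("192.168.1.20")
second = resolve("192.168.1.20")     # answered from the cache, no new broadcast
```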

**User Datagram Protocol (UDP)**: A bare-bones, quick, and cheap alternative to TCP. The UDP header contains only four fields: source port number, destination port number, length, and checksum. Packets are sent quickly but with no guardrails, meaning some can be lost in transit (or arrive out of order), and it is up to the application to handle this by either requesting the missing packet again or making do with whatever packets it did receive. Also known as a **connectionless** protocol, as there is no connection setup to establish communication state between the two hosts, nor is there a connection teardown; UDP simply sends the data right away and hopes that the recipient is able to receive it.
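
The four header fields are each 16 bits wide, so the whole UDP header is 8 bytes. The sketch below packs one with Python's `struct` module; the port numbers and payload are arbitrary, and the checksum is left at 0 (which in IPv4 means "not computed") rather than calculated.

```python
import struct


def udp_header(src_port: int, dst_port: int, payload: bytes,
               checksum: int = 0) -> bytes:
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)        # header (8 bytes) plus data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


header = udp_header(1625, 53, b"hello")
fields = struct.unpack("!HHHH", header)   # (source, destination, length, checksum)
```

There are no sequence numbers or acknowledgment fields here, which is why ordering and retransmission are left to the application.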