In [None]:
%run -i ../python/common.py
publish=False

if not publish:
    # cleanup any old state
    bashCmds('''[[ -d mydir ]] && rm -rf mydir
    [[ -a myfile ]] && rm myfile
    [[ -a errors ]] && rm errors 
    [[ -a mydate ]] && rm mydate
    [[ -a mynewdir ]] && rm -rf mynewdir
    [[ -a anotherfile ]] && rm anotherfile
    [[ -a mybin ]] && rm -rf  mybin
    [[ -a myinfo ]] && rm myinfo''')
else:
    bashCmds('''rm -rf ~/*''')
    
closeAllOpenTtySessions()
bash = BashSession()

generated="~/myfile ~/errors ~/mydate ~/mydir ~/mynewdir ~/out"


In [None]:
appdir=os.getenv('HOME')
appdir=appdir + "/lec3"
TermShellCmd("ls ")
output=runTermCmd("[[ -d " + appdir + " ]] &&  rm -rf "+ appdir + 
             ";cp -r ../src/lec3 " + appdir)

```{warning}
This portion of the book is under construction, not ready to be read
```

(cont:gs:abstractions)=
# Operating System Abstractions

UNIX has instilled in generations of engineers a basic aesthetic for how to design and structure complicated collections of software.  In particular, one learns that the designers of UNIX tried to structure the system around a small core set of ideas, "abstractions", that, once understood, allow a programmer to understand the rest of the system and how to get things done.

We first describe the fundamental abstraction of [files](cont:gs:abstractions:file), which is core to understanding how Unix enables programs to be composed together, and how this lets powerful functionality be implemented in a shell.  We then, using a shell as an example, briefly discuss key abstractions and the interfaces a shell uses to control [processes](cont:gs:abstractions:process), [operate on file descriptors](cont:gs:abstractions:fs:dup), operate on [files in a file system](cont:gs:abstractions:fs), enable processes to [communicate](cont:gs:abstractions:pipes) and find out what [happened](cont:gs:abstractions:signals) to processes it started.


(cont:gs:abstractions:file)=
## Everything is a file

A core idea of Unix is that everything is a file, where a file is a stream of bytes.  As shown in {numref}`file-desc`, the kernel maintains for each process a *file descriptor table*, where a file descriptor is an index into that table that can be used to read or write to a particular file.  The system calls that work on all files are:

- `n = write(desc, buffer, len)`: Write `len` bytes from `buffer` into the stream identified by `desc`, returning the number of bytes `n` actually written.

- `n = read(desc, buffer, max)`: Read up to `max` bytes (fewer if less data is available) from the stream identified by `desc` into `buffer`, returning the number of bytes `n` actually read.

To understand how to use these operations, you really need to read the manual.  In Linux you can find out about everything using the **man** program.  For example, ```man 2 write``` tells you everything about the ```write``` system call:
  

```{note}
:class: margin
In this case, the *2* refers to the section of the manual for system calls.  To find out about the different sections, you, of course, read the manual about the man command.
```



In [None]:
bash.run("man 2 write", height='1in')

To make files in the file system look like a stream, on each read or write operation, the kernel advances a (per open file) `offset` by the amount of data read or written.  It turns out that this naturally matches many applications that read or write files in their entirety.
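
To make the offset visible, here is a minimal sketch in Python, whose `os` module provides thin wrappers around these system calls.  The helper name `read_in_chunks` and the temporary demo file are assumptions made for illustration; the sketch also uses `lseek`, which is introduced later in this chapter, purely to ask the kernel where the offset currently is.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size):
    """Read a file with repeated read calls, recording the offset after each."""
    fd = os.open(path, os.O_RDONLY)
    chunks, offsets = [], []
    try:
        while True:
            data = os.read(fd, chunk_size)  # may return fewer bytes than asked
            if not data:                    # an empty result means end of file
                break
            chunks.append(data)
            # Seeking 0 bytes from the current position does not move the
            # offset; it just reports where the kernel thinks we are.
            offsets.append(os.lseek(fd, 0, os.SEEK_CUR))
    finally:
        os.close(fd)
    return chunks, offsets


# Hypothetical demo file holding 12 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello class\n")
    demo_path = tmp.name

chunks, offsets = read_in_chunks(demo_path, 5)
os.unlink(demo_path)
print(chunks)   # three chunks: 5 bytes, 5 bytes, then the remaining 2
print(offsets)  # the offset grows by the size of each chunk
```

Note that the program never tells the kernel where to read from; each `read` simply continues where the previous one stopped.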

Entries are added to the file descriptor table by operations that open or create a file, or create a special file like a network connection or a tty.  Three special file descriptors, shown in {numref}`fd-table`, are the ones that programs (and libraries) should use for input, output and errors.

```{figure} ../images/intro/osstructure-fd.drawio.png
---
width: 50%
name: file-desc
---
The kernel maintains for each process an array of file descriptors, where a process can read or write to any kind of I/O object that is open in its table.
```

```{list-table} Standard well known file descriptors. 
:header-rows: 1
:name: fd-table
:widths: 3 5 20
:width: 4in

* - Value
  - Name
  - Purpose
* - 0
  - stdin
  - standard input; process should read data from here
* - 1
  - stdout
  - standard output; process will write its output here
* - 2
  - stderr
  - standard error; process should write out errors here
```

So, for example, the following program echoes a string to the terminal:

In [None]:
bash.run("echo \"Hello class\"", height='1in')

The same program can instead have its output redirected to a file:

In [None]:
bash.run("echo \"Hello class\" > /tmp/reshello")

And we can see that the contents of this file are the same as what was previously written to the terminal, by using the ```cat``` program which writes the contents of a file to the terminal:

In [None]:
bash.run('''cat /tmp/reshello''')

```{note}
:class: margin
This probably seems obvious to a modern reader, i.e., an object oriented design, where you can do the same operations on any object. However, at the time, it was a radical idea; operating systems had specialized interfaces for files with records, terminals, etc.
```

This fundamental idea Unix introduced, that you can use the same `read` and `write` operations on any kind of I/O object, is very powerful.  It enables a single program, depending on how it is launched by the shell, to work on data stored in a file system, data entered on a keyboard, or even data sent over a network by other processes.  By introducing a special object/file, a *pipe*, Unix also lets programs be combined together to do much more powerful tasks.   The **|** symbol tells the shell to create a pipe that connects the output of one program to the input of the next program.  So, let's say we are trying to find all the programs on our computer that have anything to do with perl.  The following command lists the contents of the /usr/bin/ directory, sends the output of that listing to a grep program that searches for the word perl, and sends the output of that to a program that counts the number of lines of input it received:

In [None]:
bash.run("ls /usr/bin/ | grep perl | wc -l", height='1in')
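
Underneath, a pipe is just a pair of file descriptors, and the same `read` and `write` calls work on it.  The following minimal Python sketch (the helper name `through_pipe` is made up for illustration) writes into one end and reads the same bytes back from the other.  It stays inside a single process, which only works because the message is small enough to fit in the kernel's pipe buffer; a shell would instead hand the two ends to two different processes.

```python
import os


def through_pipe(message):
    """Push bytes into the write end of a pipe and read them from the read end."""
    read_end, write_end = os.pipe()  # two new entries in our descriptor table
    os.write(write_end, message)     # small write: fits in the pipe buffer
    os.close(write_end)              # closing the write end lets the reader see EOF
    received = b""
    while True:
        data = os.read(read_end, 4096)
        if not data:                 # empty result: no writers left, pipe drained
            break
        received += data
    os.close(read_end)
    return received


print(through_pipe(b"Hello class\n"))
```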

Today, the idea that everything is a file has been taken much further in Linux.  Linux now exposes all kinds of information through synthetic file systems, giving users and administrators massive ability to automate.  For example, in ```bash```, the shell we are using, ```$$``` gives us the ``id`` of the process we are in.  So, stealing a nice example from [Jonathan](https://jappavoo.github.io/UndertheCovers/textbook/unix/shellintro.html#standard-output-and-redirection), the following command shows the files that descriptors 0, 1 and 2 of our shell refer to:

In [None]:
bash.run('''ls -l /proc/$$/fd/{0..2}''')

In [None]:
bash_pid = bash.getPid()
# Resolve the symbolic link for the shell's stdout to the device it points to
bash_stdout = os.path.realpath("/proc/" + str(bash_pid) + "/fd/1")
bash.run("file " + bash_stdout)

And we can see that our stdin, stdout, and stderr all point to a character special file, which Unix uses to represent a terminal, and we can write to that same special device and it will appear in our terminal.

In [None]:
bash.run("echo \"hello class\" > " + bash_stdout)
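
A program can inspect its own descriptors the same way.  This is a small Python sketch, assuming a Linux system where `/proc/self` refers to the calling process; the helper name `describe_standard_fds` is hypothetical.  On systems without `/proc`, or if a descriptor is closed, the corresponding entry is simply left out.

```python
import os


def describe_standard_fds():
    """Map descriptors 0, 1 and 2 to the objects /proc says they refer to."""
    targets = {}
    for fd in (0, 1, 2):
        try:
            # Each entry in /proc/self/fd is a symbolic link to the open object.
            targets[fd] = os.readlink(f"/proc/self/fd/{fd}")
        except OSError:
            pass  # no /proc here, or this descriptor is not open
    return targets


for fd, target in describe_standard_fds().items():
    print(fd, "->", target)
```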

I would strongly encourage reading the shell and Unix sections of [Under the Covers: The Secret Life of Software](https://jappavoo.github.io/UndertheCovers/textbook/intro_tb.html#under-the-covers-the-secret-life-of-software) for much more detailed coverage of this material.  However, hopefully this has given you enough information to understand the power Unix gained by introducing polymorphism in the operating system, and by creating a shell that enables you to combine all kinds of programs together in complicated ways.

The remainder of this chapter introduces the core abstractions of Unix, and the system calls you use on those abstractions, all with examples from a shell.

(cont:gs:abstractions:process)=
## Process management

As discussed [previously](cont:gs:structure:linux) a process is a virtual computer, and the kernel provides each process: 1) an abstraction of an isolated CPU (while multiplexing it between different processes), 2) a *virtual memory* abstraction of massive contiguous memory that starts at address $0x0$, and 3) a set of file abstractions that allow the process to persist data and communicate with other processes.    After discussing the state maintained by the kernel, we discuss the interfaces the shell (or any application) can use to manipulate processes. 

### State
As shown in {numref}`img:intro:proc`, the kernel maintains a table of all processes, indexed by the `process id`, or `PID`, to keep track of all the information about each process.   This includes a pointer to the `file descriptor table` (discussed [earlier](file-desc)), as well as data structures to maintain CPU and memory management state.  For the CPU, this includes all the registers that need to be loaded when the process runs.



```{figure} ../images/intro/osstructure-proc.drawio.png
---
width: 80%
name: img:intro:proc
---
A process table, in the kernel, indexed by PID, points to the file descriptor table, memory management regions, and CPU state. 
```


In today's computers the address space, or *virtual memory*, of a process is a huge contiguous abstraction of memory that goes from 0 to $2^{64}$.  As shown in {numref}`img:intro:mmlay`, it is typically divided into *code* or machine-language instructions (for some reason typically
called "text"), *initialized data*, consisting of read-only and
read-write initialized data, *zero-initialized data*, called "BSS" for
obscure historical reasons, *heap* or dynamically allocated memory, and
*stack*.  The *memory regions* referred to in {numref}`img:intro:proc` keep track of each of these regions.


```{figure} ../images/pb-figures/intro/trad-addr-space.png
---
width: 45%
align: right
name: img:intro:mmlay
---
Virtual memory layout
```



### System calls

Key system calls in traditional Unix related to processes are:


- `pid = fork(void)`:  create a child process that is a duplicate of the parent. Return 0 in child, and `PID` of child in parent.

- `exit(status)`: terminate the calling process and record the status passed in for others 

- `pid = waitpid(pid, *status...)`: wait for specified process to complete (or change state), return the status passed on exit, and garbage collect any kernel resources 

- `err = execve(program, arguments, environment)`: start executing a new program in the calling process, with the specified arguments and environment information

The `fork` system call duplicates the parent into a new child process, where the only difference that enables the parent and child to distinguish themselves is the return value.  You can think of this logically as creating a copy of all the memory, copying the CPU state, and copying the file descriptor table (while incrementing reference counts on all the files pointed to by the file descriptor table).

Unix maintains a tree in the kernel, where every process has a parent, and a parent may have many children.   For example, below I run the bash shell several times and then print out the process tree (see `man ps` for arguments); you can see that ps is a child of bash, which is a child of bash...

In [None]:
bash.run(
    '''bash
bash
bash
ps -jH''')

The `exit` system call causes the process to complete, passing in a status for the reason, and `waitpid` waits for a *process* to change state and, if it has exited, returns the status passed in by `exit`.    While most of the state goes away (e.g., the file descriptor table and memory regions), the process descriptor stays around to keep track of this status information. As a result, if no other process does a wait on a process that has exited, it will become a **zombie** (yes, that is a real Unix term), holding on to a process descriptor in the kernel forever.
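
The interplay between `fork`, `exit` and `waitpid` can be sketched in a few lines of Python, whose `os.fork`, `os._exit` and `os.waitpid` wrap the system calls above.  The function name `run_child` and the status value 7 are arbitrary choices for illustration.

```python
import os


def run_child(exit_status):
    """Fork a child that exits immediately; return what the parent observes."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Terminate right away with the given status.
        os._exit(exit_status)
    # Parent: fork returned the child's PID. Wait for it so that the kernel
    # can release the child's process descriptor (no zombie is left behind).
    reaped_pid, raw_status = os.waitpid(pid, 0)
    return pid, reaped_pid, os.WEXITSTATUS(raw_status)


child_pid, reaped_pid, status = run_child(7)
print(child_pid == reaped_pid, status)
```

If the parent skipped the `waitpid` call and kept running, the child would sit in the process table as a zombie.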

The `execve` system call executes a new program, replacing the memory regions (BSS, text, ...) with memory from the file that `program` points to.  The CPU state is set to pass in the arguments, and the file descriptor table is not modified.  Note that `execve` will never return unless there was some kind of failure; it is the same process, just executing a different program.
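
Here is a hedged Python sketch of the fork-then-exec pattern a shell might use to launch a command.  The helper `fork_and_exec` is hypothetical, the program being launched is just the Python interpreter itself (so the example does not depend on any particular binary being installed), and exiting with 127 when `execve` fails is borrowed from shell convention.

```python
import os
import sys


def fork_and_exec(argv, env):
    """Run the program argv[0] in a child process and return its exit status."""
    pid = os.fork()
    if pid == 0:
        try:
            # On success this call never returns: the child keeps its PID and
            # file descriptor table, but now runs a different program.
            os.execve(argv[0], argv, env)
        finally:
            # Only reached if execve failed (e.g. the file does not exist).
            os._exit(127)
    _, raw_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(raw_status)


# Launch a tiny program that exits with status 3.
child_argv = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(fork_and_exec(child_argv, dict(os.environ)))
```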

(cont:gs:abstractions:process:example)=
### Examples

Okay, let's look at some code. Checking out repository for examples from year


In [None]:
display(Markdown('<font size="1.2rem">' + FileCodeBox(
    file=appdir + "/testfork.c", 
    lang="c", 
    title="<b>C: testfork.c",
    h="100%", 
    w="100%"
) + '</font>'))
TermShellCmd("[[ -a testfork ]] && rm testfork;make testfork", cwd=appdir, prompt='', noposttext=True)
TermShellCmd("./testfork", cwd=appdir, noposttext=True)

The parent prints out the child `pid` that it gets from the fork, and its own `pid` (from calling `getpid()`), which matches the pid that the child gets from `getppid()`. Again, please use **man** to find out about any of these system calls.

Here is an example of a simple program that calls fork and exec. 

In [None]:
display(Markdown('<font size="1.2rem">' + FileCodeBox(
    file=appdir + "/doforke.c", 
    lang="c", 
    title="<b>------ doforke.c ------------",
    h="100%", 
    w="100%"
) + '</font>'))

(cont:gs:abstractions:fs:dup)=
## Changing stdin/stdout 


(cont:gs:abstractions:fs)=
## File system


Introduce dup2, show how that works

So, we know how to read from stdin and write to stdout, but how do we redirect them to a file?
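
One common way to do this is with the `dup2` system call, which makes one descriptor refer to the same open file as another.  The sketch below is a Python approximation of what a shell might do for `command > file`; it is not taken from any particular shell, and the helper name `run_with_stdout_redirected` is made up.  The key point is that the redirection happens in the child, between `fork` and `execve`, so the launched program just writes to descriptor 1 as usual.

```python
import os
import sys
import tempfile


def run_with_stdout_redirected(path, argv):
    """Run argv in a child whose stdout (descriptor 1) goes to the file at path."""
    pid = os.fork()
    if pid == 0:
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)  # descriptor 1 now refers to the file
            os.close(fd)    # the extra descriptor is no longer needed
            os.execve(argv[0], argv, dict(os.environ))
        finally:
            os._exit(127)   # only reached if something above failed
    _, raw_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(raw_status)


# Hypothetical output file in a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "out")
status = run_with_stdout_redirected(
    out_path, [sys.executable, "-c", "print('Hello class')"]
)
with open(out_path) as f:
    print(status, repr(f.read()))
```

The parent's own stdout is untouched, since `fork` gave the child its own copy of the file descriptor table.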

`read` and `write` work for all files, but there are also a number of system calls specific to file systems. While we discuss these in more detail [later](cont:fs:interface), we briefly introduce the key information you need to know.  First, it is important to realize that all Unix file systems organize information in a hierarchy, as shown in {numref}`fs:tree-logical-abs`.

```{figure} ../images/pb-figures/fs/filesys-tree.png
---
width: 45%
name: fs:tree-logical-abs
---
Logical view: hierarchical file system name space
```

Some of the system calls specific to file systems are:

- `int desc = open(name, O_RDONLY)`: Verify that file `name` exists and may
be read, and then return a *descriptor* which may be used to refer to
that file when reading it.

- `int desc = open(name, O_WRONLY | flags, mode)`: Verify permissions and
open `name` for writing, creating it (or erasing existing contents) if
necessary as specified in `flags`. Returns a descriptor which may be
used for writing to that file.

- `close(desc)`: stop using this descriptor, and free any resources
allocated for it.

- `lseek(desc, offset, flag)`: Set an open file's current position to that
specified by `offset` and `flag`, which specifies whether `offset` is
relative to the beginning, end, or current position in the file.
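
As a rough illustration of these calls, the following Python sketch creates a file, writes to it, then reopens it and uses `lseek` to jump around.  The function name `write_then_seek` and the file name are assumptions; `os.open`, `os.lseek` and `os.close` are Python's wrappers around the corresponding system calls.

```python
import os
import tempfile


def write_then_seek(path):
    """Create a file, then reopen it and read from the middle using lseek."""
    # Create the file (or erase existing contents) and open it for writing.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, b"Hello class\n")
    os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    os.lseek(fd, 6, os.SEEK_SET)         # 6 bytes from the beginning
    word = os.read(fd, 5)                # reads starting at the new offset
    size = os.lseek(fd, 0, os.SEEK_END)  # offset of the end is the file size
    os.close(fd)
    return word, size


# Hypothetical file in a fresh temporary directory.
demo_file = os.path.join(tempfile.mkdtemp(), "myfile")
word, size = write_then_seek(demo_file)
print(word, size)
```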


Show new picture of shell code, with redirection

(cont:gs:abstractions:pipes)=
## Pipes

Okay, but remember that idea of pipe we talked about, well, .. 


(cont:gs:abstractions:signals)=
## Signals 

What if you don't want to wait for the program to finish.  Turns out... introduce background.

talk about zombies

Well, remember, we have the abstraction of a virtual computer.  For the physical computer, we have the idea of interrupts.  For the virtual computer, signals do the same thing.
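
To make the interrupt analogy a bit more concrete, here is a minimal Python sketch in which a process registers a handler for a signal and then sends that signal to itself.  The choice of `SIGUSR1` and the helper name `deliver_to_self` are arbitrary, and the sketch assumes it runs in the main thread, which is where Python allows handlers to be installed.

```python
import os
import signal
import time


def deliver_to_self(signum=signal.SIGUSR1):
    """Install a handler, send the signal to this process, report what arrived."""
    received = []

    def handler(num, frame):
        # Runs asynchronously with respect to the code below, much like an
        # interrupt handler runs asynchronously with respect to the kernel.
        received.append(num)

    previous = signal.signal(signum, handler)
    try:
        os.kill(os.getpid(), signum)  # "kill" just means "send a signal"
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)          # give the handler a chance to run
    finally:
        # Restore whatever handling was in place before.
        signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return received


print(deliver_to_self() == [signal.SIGUSR1])
```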

## Conclusion

You might think you know it all, but, we have only talked about 



In [None]:
bash.run('''
git clone git@github.com:okrieg/EC440-2022-spring-examples.git >& RES
cat EC440-2022-spring-examples/README.md 
rm -rf EC440-2022-spring-examples >& RES     
         ''')