# Systems Programming

## C Basics

In [2]:
// Hello World in C
#include <stdio.h>
int main() {
    printf("Hello, World!\n");
    return 0;
}
main();

Hello, World!


### Pre-processor commands
Lines that begin with `#` are commands for the C pre-processor. The line `#include <stdio.h>` looks for the header file `stdio.h` and includes its contents before compilation. This header declares the functions of the standard input and output library, such as `printf`.

### The `main()` function
All C programs have an entry function called `main()`. This is called by the runtime system in order to start the program running. Every C program must have exactly one `main()`, which must return an `int`. Only functions called (directly or indirectly) from this function will be executed.

### The `printf()` function
The `printf()` function is used to print formatted text to the console. 

In [3]:
printf("We've got rectal bleeding.\n");
printf("What, all of you?");

We've got rectal bleeding.
What, all of you?

The `printf()` function  does not automatically add a new line - the newline character `\n` must be used to move to the next line. 


The `printf()` function uses format specifiers to control how its output is formatted. They appear in the first argument (the format string), which describes how each of the remaining arguments is to be formatted:
- `%d` - signed decimal (`int`)
- `%u` - unsigned decimal (`unsigned int`)
- `%o`, `%x` - unsigned octal, hexadecimal
- `l` - length modifier for `long` integers, which can store larger numbers than `int` (`int` is typically 4 bytes; `long` is 8 bytes on most 64-bit Linux/macOS systems, but 4 bytes on Windows). It must be combined with one of the specifiers above, e.g. `%ld`, `%lx`
- `%f` - floating point (`double`; `float` arguments are automatically promoted to `double`)
- `%.nf` - floating point with `n` decimal places
- `%e` - floating point in exponent form
- `%c`, `%s` - character, string

In [5]:
#include <stdio.h>
int main() {
    int a = -15;
    long b = 999999999999999999;
    float c = 3.14159;
    char d = 'f';
    char e[] = "lupus";
    
    printf("Signed int: %d\n", a);
    printf("Unsigned int: %u\n", (unsigned int)a);  // cast: %u expects an unsigned int
    printf("Octal: %o\n", (unsigned int)a);         // %o also expects an unsigned int
    printf("Long: %ld\n", b);
    printf("Float: %f\n", c);
    printf("Float (2d.p): %.2f\n", c);
    printf("Exponent: %e\n", c);
    printf("Char: %c\n", d);
    printf("String: %s\n", e);
    
    return 0;
}
main();

Signed int: -15
Unsigned int: 4294967281
Octal: 37777777761
Long: 999999999999999999
Float: 3.141590
Float (2d.p): 3.14
Exponent: 3.141590e+00
Char: f
String: lupus


### The function `return` statement
The `return` statement is used to immediately exit a function, optionally sending a value back to the caller.

The return value from the `main()` function is special: it becomes the program's exit status, and programs usually return zero to indicate that they have exited normally.
Since C99, reaching the end of `main()` without a `return` statement is equivalent to `return 0;`, so omitting it will not cause a problem at compile-time.

If the returned value does not match the function's declared return type, it may be converted implicitly (for example, a `double` returned from an `int` function is truncated), cause a warning or error at compile-time, or lead to unexpected behaviour at run-time.

## Compiling

![Compilation Stages](compiling.png)

### The C Pre-processor

The C pre-processor is a program that runs before compilation, modifying the source code according to the pre-processor directives. Such directives include `#define` e.g. `#define PI 3.14159` and `#include` e.g. `#include <stdio.h>`.

`#define` is used to define a macro, which is essentially a name for a value or a code snippet. The pre-processor replaces every occurrence of the macro with its replacement text before the code is compiled.

When using `#include`:
- if `< >` are used, the system include directories (e.g. `/usr/include`) are searched
- if `" "` are used, the directory containing the current source file is searched first, followed by the system directories
  - use `< >` for system headers and `" "` for user-defined headers

#### Conditional compilation
Conditional compilation allows the pre-processor to include or skip parts of the code depending on whether certain macros are defined. This can be very useful for debugging.

In [6]:
#include <stdio.h>

// Uncomment to enable debug mode
//#define DEBUG

int main() {
    printf("Program started\n");

#ifdef DEBUG
    printf("Debug mode is ON\n");
#else
    printf("Debug mode is OFF\n");
#endif
    return 0;
}
main();

Program started
Debug mode is OFF


In [7]:
#include <stdio.h>

// Comment out to disable debug mode
#define DEBUG

int main() {
    printf("Program started\n");

#ifdef DEBUG
    printf("Debug mode is ON\n");
#else
    printf("Debug mode is OFF\n");
#endif
    return 0;
}
main();

Program started
Debug mode is ON


#### Parameterised macros 

A parameterised (function-like) macro accepts parameters and uses them in its replacement text. It acts like an inline function, but the replacement is done textually by the pre-processor before compilation, so no actual function call takes place.

The parameters may appear as many times as desired in the replacement text. 

In [8]:
#include <stdio.h>

#define ADD(a, b) ((a) + (b))  // Parameterized macro

int main() {
    int x = 5, y = 3;

    printf("Sum: %d\n", ADD(x, y));      // replaced by ((x) + (y))
    printf("Sum: %d\n", ADD(2+3, 4+1));  // replaced by ((2+3) + (4+1)) = 10

    return 0;
}
main();

Sum: 8
Sum: 10


Using parameterised macros may make a program slightly faster, since a function call usually requires some overhead during program execution, but a macro invocation does not. Furthermore, macros are 'generic': they can accept arguments of any type, provided that the resulting program is valid.

However, this can also be a disadvantage, as arguments aren't checked or converted to the correct type by the pre-processor, whereas for a function, the compiler checks each argument to see if it has the appropriate type. Since macros work as direct textual substitutions, it is important to put brackets around every parameter in the replacement text, and around the replacement text as a whole, to prevent unexpected results.

## The Shell

A shell is a powerful command-line interface (CLI) that allows the user to interact with the operating system (OS) by typing commands. This includes the ability to:
- run programs
- control how programs work
- move around between different directories/folders
- perform sequences of commands to achieve more complex work

There are a number of different shells, such as bash and PowerShell. 

### Basic Commands

Some basic commands are given below. 

Note: the `!` before each command is not needed when using an actual shell (it is only necessary since this is a Jupyter Notebook)

`pwd` - *Print working directory*

In [9]:
!pwd 

/mnt/d/Notebooks/COMP2221 Programming Paradigms


`ls` - *List*

In [10]:
!ls 

Systems Programming.ipynb
compiling.png
myscript.sh
myscript2.sh
permission_string.png


`man` - *Manual*

In [11]:
!man ls

LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort  is  speci‐
       fied.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters

       --block-size=SIZE
              with  -l,  scale  sizes  by  SIZE  when  printing  them;   e.g.,
              '--block-size=M'; see SIZE format below

       -B, --ignore-backups
              do not list implied entries ending with ~

       -c     with  -lt: 

`cd` - *Change directory*
- `.` *= current directory*
- `~` *= home folder*
- `..` *= one folder up*

In [12]:
!pwd

/mnt/d/Notebooks/COMP2221 Programming Paradigms


In [13]:
!cd ~ && pwd

/home/francis


In [14]:
!cd .. && pwd

/mnt/d/Notebooks


### `stdin`, `stdout` and `stderr`

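`stdin`, `stdout` and `stderr` are the three standard communication channels (streams) that each program receives from the OS when it starts running. They mean a program does not need to know which I/O device (keyboard, screen, file or another program) it is actually reading from or writing to.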
- `stdin` (Standard Input) is where programs read data from
- `stdout` (Standard Output) is where programs send normal output
- `stderr` (Standard Error) is where programs send error messages

### Pipes 

The shell provides many small tools (commands) - the power comes from composing them together. Pipes provide a means to do this. 

By default, each command reads its input from `stdin` (the keyboard) and writes its output to `stdout` (the screen). The input and output of a command can be redirected:
- `<` - take input from a file
- `>` - write output to a file
  - a single `>` overwrites the file; `>>` appends to the file
- `|` - take the output of one command and use it as the input to the next

### `grep`
`grep` is a search tool that can be used to search through files or the output of other commands (via pipes). It can search through specific file(s) by providing the filename(s), or it can search through every file in the current directory and its subdirectories by using the `-r` (recursive) flag.

In [15]:
!grep "shell" "Systems Programming.ipynb"

    "A shell is a powerful command-line interface (CLI) thats allows the user to interact with the operating system (OS) by typing commands. This includes the ability to:\n",
    "There are a number of different shells, such as bash and PowerShell. \n",
    "Note: the `!` before each command is not needed when using an actual shell (it is only necessary since this is a Jupyter Notebook)\n",
      "              do not list implied entries matching shell  PATTERN  (overridden\n",
      "              do not list implied entries matching shell PATTERN\n",
      "              use  quoting style WORD for entry names: literal, locale, shell,\n",
      "              shell-always,  shell-escape,  shell-escape-always,   c,   escape\n",
    "The shell provides many small tools (commands) - the power comes from composing them together. Pipes provide a means to do this. \n",
      "    \"A shell is a powerful command-line interface (CLI) thats allows the user to interact with the operating syst

In [16]:
!grep -r "pipes" 

.ipynb_checkpoints/Systems Programming-checkpoint.ipynb:    "`grep` is a search tool that can be used to search through files or the output of other commands (via pipes). It can search through specific file(s) by providing the filename(s), or it can search through all files in the current directory by using the `-r` recursive flag."
.ipynb_checkpoints/Systems Programming-checkpoint.ipynb:      ".ipynb_checkpoints/Systems Programming-checkpoint.ipynb:    \"`grep` is a search tool that can be used to search through files or the output of other commands (via pipes). It can search through a specific file by providing the filename, or it can search through all files in the current directory by using the `-r` recursive flag.\"\n",
.ipynb_checkpoints/Systems Programming-checkpoint.ipynb:      "Systems Programming.ipynb:    \"`grep` is a search tool that can be used to search through files or the output of other commands (via pipes). It can search through a specific file by providing the filen

`grep` uses **regular expressions** for matching text. 

## Regular Expressions
Regular expressions provide a concise way to match different strings. They use a specific syntax (`?` and `+` belong to the *extended* syntax, which `grep` uses when given the `-E` flag):
- `.` - matches any single character (except a newline character)
- `*` - matches zero or more of the preceding character
- `?` - matches zero or one of the preceding character
- `+` - matches one or more of the preceding character
- `[ABC]` - matches one character that is `A`, `B` or `C`
- `[A-Z]` - matches any upper case character `A` to `Z`
- `[0-9]` - matches any digit

For example, the regular expression `[A-Za-z]*[0-9]\.txt` matches zero or more letters (uppercase or lowercase), followed by exactly one digit and the literal suffix `.txt` (the `\` is needed because an unescaped `.` would match any character).
Examples of strings that this expression would match include `MyFile5.txt`, `abc0.txt` and `1.txt`.

## File Permissions
Every file and directory in UNIX has an access mode controlling who can read, write, or execute it.

| Permission | Symbol | Description |
|------------|--------|-------------|
| Read       | r      | View or copy the file contents (for directories: list their contents) |
| Write      | w      | Modify the file (for directories: create, delete or rename the files inside) |
| Execute    | x      | Run as a program (for files) or enter (for directories) |


There are three permission groups which can each be granted specific permissions:
- Owner (user) - the user who owns the file (by default, the person who created it)
- Group - a named collection of users who share the same permissions
- Others - everyone else

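The permission string is a 10 character string that specifies the permissions of the different groups. The first character gives the file type (`-` for a regular file, `d` for a directory), followed by three `rwx` triplets for the owner, group and others, with `-` marking a permission that is not granted. For example, `-rwxr-xr--` means the owner can read, write and execute, the group can read and execute, and others can only read.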

<img src="permission_string.png" alt="Permission String" width="300px">

File permissions can be changed using `chmod`. 
The syntax for `chmod` is `chmod [permissions] [file]`.
For example, `chmod u+x file.sh` adds execute permission for the owner of `file.sh`.

## Text Operations

### `sort`

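`sort` reads from a file if one is specified, or from `stdin` otherwise. It sorts the lines of its input (lexicographically by default, or numerically with `-n`) and writes the result to `stdout`, or to a file if specified with `-o filename`. (The examples below rely on `echo` interpreting `\n` as a newline, which the shell used by this notebook does; in bash, use `echo -e` or `printf` instead.)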

In [17]:
!echo "C \nA \nD \nB"

C 
A 
D 
B


In [18]:
!echo "C \nA \nD \nB" | sort

A 
B
C 
D 


### `tr` (translate)

Usage: `tr SET1 SET2`
- translates characters in SET1 to the corresponding characters in SET2 (reading from `stdin` and writing to `stdout`)
- e.g. `tr 'A-Z' 'a-z'` produces a lower case version of `stdin`
- option `-c` takes the complement of SET1
  - `tr -c 'a-zA-Z' '\n'` replaces all non-letter characters with newlines
- option `-s` squeezes repeats to a single character
  - `tr -s ' '` converts multiple spaces into one
- option `-d` deletes all characters in SET1

In [19]:
!echo "abc123" | tr 'a-z' 'A-Z'

ABC123


In [20]:
!echo "abc123" | tr -d 'a-z'

123


In [21]:
!echo "abc123" | tr -dc 'a-z'

abc

In [22]:
!echo "aaabccc12223" | tr -s 'ac2' 

abc123


### `uniq`

`uniq` is used to remove or report repeated lines. It only removes consecutive repeated lines, so it is often used with `sort` to find/remove repeated lines throughout the input (i.e. `sort | uniq`). The option `-c` can be used to count the number of repetitions.

In [23]:
!echo "a\na\nb\na\nc\na" | uniq -c

      2 a
      1 b
      1 a
      1 c
      1 a


In [24]:
!echo "a\na\nb\na\nc\na" | sort | uniq -c

      4 a
      1 b
      1 c


In [25]:
!echo "a\na\nb\na\nc\na" | uniq

a
b
a
c
a


In [26]:
!echo "a\na\nb\na\nc\na" | sort | uniq

a
b
c


## File handling
Files are stored in a hierarchical structure (a tree) - the top level is the root directory `/`. Each directory (folder) can contain files or subdirectories, which allows grouping and organisation.

There are a number of commands for navigating around the file system. `ls` and `cd` have been covered [previously](#Basic-Commands), but additional commands include:
- `mkdir` - make a new folder
- `mv` - move a file/folder (also used to rename)
- `cp` - copy a file/folder
- `rm` - delete a file, or a folder using `-r`
- `du` - show disk usage of a file/folder
- `find` - search for files/folders in a directory tree

## Shell scripts

A shell script is a collection of commands enclosed in a file. 

It allows tasks to be automated by running each command in order automatically, rather than having to type out each command manually. 

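When writing a shell script:
- the script can be written in any chosen text editor
- the script is conventionally saved with a `.sh` extension
- it should begin with the line `#!/bin/bash` (when writing a script for the bash shell)
  - `#!` (the 'shebang') tells UNIX that the file is a script to be run by an interpreter
  - `/bin/bash` tells UNIX which program to run the script with
- to run it directly (e.g. `./myscript.sh`) rather than via `bash myscript.sh`, the script needs execute permission, e.g. `chmod u+x myscript.sh`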



In [27]:
!bash myscript.sh

Hello from myscript.sh


Parameters can be passed in to a script when it is run. They are referred to in the script using the `$` sign: the first parameter is `$1`, the second is `$2`, and so on, while `$#` gives the number of parameters.

In [28]:
!bash myscript2.sh "foo" "bar"

Input 1: foo, Input 2: bar


### `for` loops

For loops are useful for performing the same operation on lots of files. The basic syntax is 
```
#!/bin/bash
for f in *
do
 # commands to run for each file go here
 echo "$f"
done
```
Quoting `"$f"` keeps filenames that contain spaces intact.


### `if` statements

An example of an if statement in a bash shell script is:
```
#!/bin/bash
if [ "$1" -lt "$2" ]
then
 echo "yes" "$1" "is less than" "$2"
else
 echo "no it isn't"
fi
```
The `else` clause is optional. For numeric comparisons, `-eq`, `-ne`, `-gt`, `-lt`, `-le` and `-ge` are used for equal to, not equal to, greater than, less than, less than or equal to and greater than or equal to respectively; `=` (or `==` in bash) and `!=` compare strings.

### Shell variables

A shell variable is a name that stores a temporary value in the shell session. Values can be strings, numbers, filenames, or any text. As in shell scripts, variables are accessed using `$`. When assigning a value, there must be no spaces around the `=`.

In [29]:
!name="Eric" && echo "Name is $name"

Name is Eric


### Environment variables
Environment variables store information about the user session and are shared with programs started from the shell.

| Variable | Meaning                               |
|------------|---------------------------------------|
| `$USER`    | Current username                       |
| `$HOME`    | User’s home directory                  |
| `$PWD`     | Present working directory              |
| `$PATH`    | List of directories searched for commands |
| `$SHELL`   | Path to login shell               |

The `export` command can be used to change the value of an existing environment variable, or to create a new one, e.g. `export MYVAR="HELLO"`. Exported variables are passed on to any programs started from the shell.

## Git

Git is software for tracking changes in files, keeping a history of modifications and enabling you to revert to previous versions, compare changes and see who made which changes. It is used for coordinating work among collaborators and has support for continuous integration (CI) tools.

Common git commands include:
- `git clone` - creates a local copy of a given repository
- `git add` - stage new/modified files for the next commit
- `git rm` - removes files from the working directory and from Git tracking
  - using the `--cached` option stops tracking the files but keeps the local copy
- `git commit` - commits the current staged changes
- `git push` - upload local commits to the remote repository
- `git pull` - fetch changes from the remote repository and merge them into the current branch