# Systems Programming

## C Basics

In [1]:
// Hello World in C
#include <stdio.h>
int main() {
    printf("Hello, World!\n");
    return 0;
}
main();

Hello, World!


### Pre-processor commands
Lines that begin with `#` are commands for the C pre-processor. The line `#include <stdio.h>` finds the header file `stdio.h` and includes its contents before compilation. This header is required to use the standard input and output library, which provides functions such as `printf`. 

### The `main()` function
All C programs have an entry function called `main()`. This is called by the runtime system in order to start the program running. Every C program must have exactly one `main()`, which must return an integer. Only functions called in this function will be executed. 

### The `printf()` function
The `printf()` function is used to print formatted text to the console. 

In [2]:
printf("We've got rectal bleeding.\nWhat, all of you?");

We've got rectal bleeding.
What, all of you?

The `printf()` function  does not automatically add a new line - the newline character `\n` must be used to move to the next line. 

In [3]:
printf("We've got rectal bleeding.");
printf("What, all of you?");

We've got rectal bleeding.What, all of you?

In [4]:
printf("We've got rectal bleeding.\n");
printf("What, all of you?");

We've got rectal bleeding.
What, all of you?

The `printf()` function uses a number of format specifiers to control the format of the output. These are used in the first parameter, which describes how the remaining parameters are to be formatted:
- `%d` - signed decimal (`int`)
- `%u` - unsigned decimal
- `%o`, `%x` - octal, hexadecimal
- `l` - length modifier for `long` integers (8 bytes on 64-bit Linux, compared with 4 bytes for `int`). It is not a specifier on its own and must be combined with one of the specifiers above, e.g. `%ld`, `%lx`
- `%f` - floating point
- `%.nf` - floating point with `n` decimals
- `%e` - floating point in exponent form
- `%c`, `%s` - character, string

In [5]:
#include <stdio.h>
int main() {
    int a = -15;
    long b = 999999999999999999;
    float c = 3.14159;
    char d = 'f';
    char e[] = "lupus";
    
    printf("Signed int: %d\n", a);
    printf("Unsigned int: %u\n", a);
    printf("Octal: %o\n", a);
    printf("Long: %ld\n", b);
    printf("Float: %f\n", c);
    printf("Float (2d.p): %.2f\n", c);
    printf("Exponent: %e\n", c);
    printf("Char: %c\n", d);
    printf("String: %s\n", e);
    
    return 0;
}
main();

Signed int: -15
Unsigned int: 4294967281
Octal: 37777777761
Long: 999999999999999999
Float: 3.141590
Float (2d.p): 3.14
Exponent: 3.141590e+00
Char: f
String: lupus


### The function `return` statement
The `return` statement is used to immediately exit a function, optionally sending a value back to the caller.

The return value from the `main()` function is special, with programs usually returning a zero value to indicate they have exited normally. 
If there is no `return` statement in the `main()` function, this is generally not a problem: since C99, reaching the end of `main()` is treated as `return 0;`.  

If the return value is of the wrong type, this may cause a warning at compile-time, or an error at run-time. 

### Variables
Variables and constants are the basic data objects manipulated by a program.

Declarations declare the variables used, their type and possibly initial value e.g. `int x` or `float y = 0.67`

Expressions combine variables and constants to form new values e.g. `int b = 3*5+a`. Expressions are evaluated according to operator precedence (essentially BIDMAS). 

### Data types
C is a statically typed language, meaning every variable must be declared with a type, usually one of:
- `char`: a single byte, often used to store a character
- `short`: an integer type, represents small whole numbers
- `int`: an integer type, represents whole numbers
- `long int`, `long long int`: an integer type, represents large or very large whole numbers
- `float`: single precision floating point number
- `double`, `long double`: double precision and extended precision floating point numbers

The size of each type can vary depending on the system/compiler.

On 64-bit Linux:
- `char` is 1 byte
- `short int` is 2 bytes
- `int` is 4 bytes
- `long int` is 8 bytes

The `sizeof` operator can be used to check the memory size of a data type or variable in bytes.

In [6]:
printf("Char: %zu\n", sizeof(char));
printf("Short: %zu\n", sizeof(short));
printf("Int: %zu\n", sizeof(int));
printf("Long: %zu\n", sizeof(long));

Char: 1
Short: 2
Int: 4
Long: 8


#### `signed` and `unsigned`
`signed` and `unsigned` apply to `char` or integer types.

`unsigned` integers are 0 or positive, whereas half of the range of `signed` integers is negative:
- a 1 byte `signed char` stores integers in the range [-128,127]
- a 1 byte `unsigned char` stores integers in the range [0,255]

`<limits.h>` and `<float.h>` specify what limits apply on a given system; they are system and architecture dependent. 

Unsigned arithmetic is performed modulo 2<sup>n</sup>, so overflows 'wrap around' automatically.

In [7]:
unsigned char x = 255;
x++; // x = (255 + 1) % 2^8 = 256 % 256 = 0
printf("%u", x);

0

Signed arithmetic overflow is undefined behaviour, so what happens is up to the compiler and cannot be relied upon. In the example below, `x` is promoted to `int` for the increment, and converting the result (128) back to `signed char` is implementation-defined; most compilers wrap around using two's complement. 

In [8]:
signed char x = 127;
x++;
printf("%d", x);

-128

#### Character constants

A character constant is a single character (including escape characters) enclosed in single quotes, e.g. `'A'`, `'3'`, `'\n'`. 

These are stored as integer values - specifically, the integer code for that character in ASCII (or another character set), e.g. `'A'` = 65 in ASCII. 

In [9]:
printf("Printing using %%c: %c\n", 'A');
printf("Printing using %%d: %d", 'A');

Printing using %c: A
Printing using %d: 65

#### String constants

String constants are zero or more characters enclosed in double quotes, e.g. `"Hello"`.

They are stored as an array of `char`s with a null character `\0` marking the end of the string. 

For example,  `char a[]="Hello";` is the same as `char a[]={'H','e','l','l','o','\0'};`

#### Enumeration

An enumeration (`enum`) creates a new type that can only take one of several named constant values. 

For example, `enum Suit {CLUBS, DIAMONDS, HEARTS, SPADES}` defines a new type `enum Suit`. Any variable of this type can have one of four possible values: `CLUBS`, `DIAMONDS`, `HEARTS` and `SPADES`. A variable `s` of this type can be declared using `enum Suit s`.

Each `enum` named constant is replaced with an integer value by the compiler. By default, the first value is 0 and each subsequent value is incremented by 1. 
In the previous example, `CLUBS`, `DIAMONDS`, `HEARTS` and `SPADES` represent `0`, `1`, `2` and `3` respectively. 

In [10]:
enum Suit {CLUBS, DIAMONDS, HEARTS, SPADES};
printf("Value of CLUBS: %d\n", CLUBS);
printf("Value of HEARTS: %d\n", HEARTS);
enum Suit s = DIAMONDS;
printf("Value of s: %d", s);

Value of CLUBS: 0
Value of HEARTS: 2
Value of s: 1

Values for the enumeration constants can be set manually, and can be arbitrary integers in no particular order. 
Multiple constants are allowed to have the same value. 
When no value is specified for an enumeration constant, its value is one greater than the value of the previous constant. 

In [11]:
enum names {ANTIONE=24, ADAM=15, MARCUS, ADRIEN=3};
printf("Value of ANTIONE: %d\n", ANTIONE);
printf("Value of ADAM: %d\n", ADAM);
printf("Value of MARCUS: %d\n", MARCUS);
printf("Value of ADRIEN: %d\n", ADRIEN);

Value of ANTIONE: 24
Value of ADAM: 15
Value of MARCUS: 16
Value of ADRIEN: 3


### True/False and comparison

Traditionally, C did not have a boolean type, instead just using `int`:
- 0 is false
- Any other `int` is true

Comparisons `<`, `<=`, `==`, `>=`, `>`, `!=` will evaluate to:
- 1 if they hold
- 0 if they don't

A boolean type was introduced in C99: including `<stdbool.h>` provides `bool`, `true` and `false`. The integer convention can still be used if preferred. 

### Statements and compound statements

A statement in C is a single instruction terminated with a semicolon. 

In [12]:
printf("Hello World");

Hello World

A compound statement is a set of statements enclosed in curly braces `{}`. A statement can always be replaced by a compound statement.

In [13]:
{
printf("Hello ");
printf("World\n");
}

Hello World


Note that in C, formatting does not matter, but it is useful for making code readable. 

### Iteration

There are three iteration statements in C:
- the `while` statement is used for loops whose controlling expression is tested before the loop body is executed
- the `do` statement is used if the expression is tested after the loop body is executed
- the `for` statement is convenient for loops that increment or decrement a counting variable or iterator

#### `while`
The `while` statement has the following form:
```c
while ( expression ) {
    statement
   }
```
- `expression` is the controlling expression
- `statement` is the loop body
- the expression is evaluated and if it is non-zero (true), the body is executed
- the expression is tested before the loop body begins
- if there is only one `statement` in the loop body, the enclosing `{}` can be omitted

In [14]:
#include <stdio.h>

int main(){
    int i = 1;
    while(i<5){
        printf("Verstappen has %d championships\n", i);
        i = i+1;
    }
    
    printf("Verstappen will soon have %d championships", i);
    return 0;
}

main();

Verstappen has 1 championships
Verstappen has 2 championships
Verstappen has 3 championships
Verstappen has 4 championships
Verstappen will soon have 5 championships

#### `do`
The `do` statement has the following form:
```c
do {
 statement
 } while ( expression );
```
- `expression` is the controlling expression
- `statement` is the loop body
- the expression is evaluated and if it is non-zero (true), the body is executed again
- the expression is tested after the loop body ends

In [15]:
#include <stdio.h>

int main(){
    int i = 1;
    do{
        printf("Verstappen has %d championships\n", i);
        i = i+1;
    } while(i<5);
    
    printf("Verstappen will soon have %d championships", i);
    return 0;
}

main();

Verstappen has 1 championships
Verstappen has 2 championships
Verstappen has 3 championships
Verstappen has 4 championships
Verstappen will soon have 5 championships

#### `for`
The `for` statement has the following form:
```c
for ( expr1 ; expr2 ; expr3 ){ 
    statement
  }
```
- `expr1` - initialisation
- `expr2` - conditional
- `expr3` - increment

In [16]:
#include <stdio.h>

int main() {
    int i;
    for (i=1; i<5; i++) {
        printf("Verstappen has %d championships\n", i);
    }
    printf("Verstappen will soon have %d championships", i);
    return 0;
}

main();

Verstappen has 1 championships
Verstappen has 2 championships
Verstappen has 3 championships
Verstappen has 4 championships
Verstappen will soon have 5 championships

#### `break`
The `break` statement causes the innermost enclosing loop to be exited immediately. 

In [17]:
#include <stdio.h>

int main() {
    int i;
    for (i=1; i<5; i++) {
        break;
        printf("Verstappen has %d championships\n", i);
    }
    printf("Verstappen will soon have %d championships", i);
    return 0;
}

main();

Verstappen will soon have 1 championships

#### `continue`
The `continue` statement causes the next iteration of the loop to begin. 
- in the case of a `while` or `do` loop, the test part is executed immediately
- in the case of a `for` loop, control first passes to the increment step

In [18]:
#include <stdio.h>

int main() {
    int i;
    for (i=1; i<5; i++) {
        continue;
        printf("Verstappen has %d championships\n", i);
    }
    printf("Verstappen will soon have %d championships", i);
    return 0;
}

main();

Verstappen will soon have 5 championships

### Conditionals 

#### `if-else`
`if` allows a choice between two alternatives by testing an expression. An `if` statement can have an `else` clause:
```c
if ( expr1 ) {
    statement1 
  }
  else {
    statement2
  }
```
When executed, `expr1` is evaluated;
- if `expr1` is non-zero, `statement1` is executed 
- otherwise `statement2` (if present) is executed

In [19]:
#include <stdio.h>

int main() {
    for (int i=1; i < 5; i++) {
        if (i % 2 == 0) {
            printf("%d is even\n", i);
        }
        else {
            printf("%d is odd\n", i);
        }
    }
    return 0;
}

main();

1 is odd
2 is even
3 is odd
4 is even


#### `switch`
A `switch` statement allows a choice between different blocks of code based on the value of an expression. 

It has the following form:
```c
switch (expression) {
    case value1:
        // statements
        break;

    case value2:
        // statements
        break;

    default:
        // statements
        break;
}
```
If there is no `break` statement in a block, then execution 'falls through' - the code in subsequent case blocks is executed even if a matching case has already been found. 

The `default` block is executed if the expression does not match any of the cases.



### `x++` vs `++x`

Both `x++` and `++x` can be used to increment a variable `x`, i.e. they both mean `x=x+1`. However, there is a subtle difference between the two:
- `x++` returns the value of `x` first, then increments
- `++x` increments first, then returns the value of `x`

In [20]:
#include <stdio.h>

int main() {
    int x = 5;
    int y = x++;  // y = 5, x incremented to 6
    int z = ++x;  // x incremented to 7, z = 7
    printf("x=%d, y=%d, z=%d\n", x, y, z);
}

main();

x=7, y=5, z=7


### Functions
Functions encapsulate code in a convenient way so that it can be reused, organised and understood more easily. 

A function definition can appear anywhere in the file, as long as a declaration of the function appears before its first use. 

#### Function declarations
Functions can be declared before they are defined, using a function declaration:
```
    [return-type] [function-name] ( [parameters] );     
```
For example,
```c
    int add (int a, int b);
```
Note that the input parameters do not need to be named in the declaration - only their types must be included. Parameter names are only required in the function definition.

Therefore, the example above could also be written as:
```c
    int add (int, int);
```

Function declarations are often placed in a `.h` header file.

#### Call-by-value
Function parameters in C are passed using call-by-value semantics; the values of the arguments are copied into the parameter variables of the function. A function cannot affect the value of its arguments. 

In [21]:
int timesFive(int a) {
    a *= 5;
    return a;
}
int x = 3;
int y = timesFive(x);
printf("Value returned: %d\n", y);
printf("Value of x: %d", x);

Value returned: 15
Value of x: 3

#### Organisation
Functions can be organised by using `.h` header files and `.c` source files. An example setup is given below:

`timesFive.h`
```c
    int timesFive(int);
```

`timesFive.c`
```c
    #include "timesFive.h"

    int timesFive(int a) {
        a *= 5;
        return a;
    }
```

`main.c`
```c
    #include <stdio.h>
    #include "timesFive.h"

    int main() {
        int x = 5;
        int y = timesFive(x);
        printf("%d times 5 is %d\n", x, y);
        return 0;
    }
```

If the same header is included multiple times in the same compilation unit, it can cause redefinition errors (for example, if the header defines a type). To prevent this, header guards can be used in header files to conditionally include the contents of the header only if it hasn't been included already.

For example,

`timesFive.h`
```c
    #ifndef TIMESFIVE_H
    #define TIMESFIVE_H
    int timesFive(int);
    #endif
```

## Compilation

![Compilation Stages](compiling.png)

### The C Pre-processor

The C pre-processor is a program that runs before compilation, modifying the source code according to the pre-processor directives. Such directives include `#define` e.g. `#define PI 3.14159` and `#include` e.g. `#include <stdio.h>`.

`#define` is used to define a macro, which is essentially a name for a value or a code snippet. The pre-processor replaces every occurrence of the macro with its replacement text before the code is compiled.

When using `#include`:
- if `< >` are used, the system include directories (e.g. `/usr/include`) are searched
- if `" "` are used, the directory containing the current source file is searched first, followed by the system include directories
  - the appropriate delimiters should be used depending on the type of header file e.g. system or user-defined

#### Conditional compilation
Conditional compilation allows the compiler to include or skip parts of code depending on whether certain macros are defined. This can be very useful for debugging.

In [22]:
#include <stdio.h>

// Uncomment to enable debug mode
//#define DEBUG

int main() {
    printf("Program started\n");

#ifdef DEBUG
    printf("Debug mode is ON\n");
#else
    printf("Debug mode is OFF\n");
#endif
    return 0;
}
main();

Program started
Debug mode is OFF


In [23]:
#include <stdio.h>

// Comment out to disable debug mode
#define DEBUG

int main() {
    printf("Program started\n");

#ifdef DEBUG
    printf("Debug mode is ON\n");
#else
    printf("Debug mode is OFF\n");
#endif
    return 0;
}
main();

Program started
Debug mode is ON


#### Parameterised macros 

A parameterised (function-like) macro accepts parameters and uses them in its replacement text. They act like inline functions, but the replacement is done textually by the pre-processor before compilation, avoiding the need for actual function calls. 

The parameters may appear as many times as desired in the replacement text. 

In [24]:
#include <stdio.h>

#define ADD(a, b) ((a) + (b))  // Parameterized macro

int main() {
    int x = 5, y = 3;

    printf("Sum: %d\n", ADD(x, y));      // replaced by ((x) + (y))
    printf("Sum: %d\n", ADD(2+3, 4+1));  // replaced by ((2+3) + (4+1)) = 10

    return 0;
}
main();

Sum: 8
Sum: 10


Using parameterised macros may make a program slightly faster, since a function call usually requires some overhead during program execution, but a macro invocation does not. Furthermore, macros are 'generic' - they can accept arguments of any type, provided that the resulting program is valid. 

However, this can also be a disadvantage, as arguments aren't checked or converted to the correct type by the pre-processor, whereas in a function, the compiler checks each argument to see if it has the appropriate type. Since macros work as direct substitutions in code, it is important to always use brackets to the fullest extent possible to prevent any unexpected results.  

### Compiling with GCC

GCC is a common compiler for C, which takes C source code and turns it into a machine-executable binary that can be run. 

A program can be compiled from the shell using `gcc -o outfile file.c`:
- the option `-o` is used to name the output file
- the option `-E` is used to do pre-processing only (this can also be done using `cpp` i.e. `cpp file.c`)
- the option `-S` is used to go as far as compilation only (no assembling/linking)
- the option `-c` is used to go as far as assembly only (no linking)
- the option `-l` is used to link external libraries
- the option `-I` is used to add a directory to the search path for `.h` header files
- multiple files can be compiled and linked together by listing them (e.g. `gcc part1.c part2.c -o outfile`)

### Makefiles

When a program consists of multiple source files, compiling each one can be tedious and error-prone. A Makefile is used to automate and manage the build process, using the `make` command. 

A Makefile is a rule-based (declarative) configuration file that tells the make utility how to build the main program. 

The format of each rule is:
```
target [target...]: [component ...]
    [command 1]
    ...
    [command n]
```
`target` is the file to be created and each `component` is a file that the target depends on, which must exist or be created by another rule. Note that each `command` line must begin with a Tab character, not spaces. 

Let's say for example we have the files `main.c`, `counter.h`, `counter.c`, `sales.h` and `sales.c`.

Then we could have a Makefile that looks like:

```makefile
all: counter.o sales.o main.c
        gcc -o program main.c counter.o sales.o

counter.o: counter.c counter.h
        gcc -c counter.c

sales.o: sales.c sales.h
        gcc -c sales.c

clean:
        rm -rf program counter.o sales.o
```

`all` is the first target in the file, so it is the default target built when `make` is run without arguments. 

#### Macros

Macros in Makefiles can be used to store definitions e.g. `CC=gcc`. Macros can also be defined using the output of shell commands e.g. `DATE = $(shell date)`. Macros can be used in the Makefile by writing `$(NAME)`, e.g. `$(CC)`, similar to variables in the shell. 

#### Pattern rules

Pattern rules can be used to match multiple files, so that each dependency does not need to be manually listed, e.g.
```
DEPS = counter.h sales.h
%.o: %.c $(DEPS)
        gcc -c $< -o $@
```
This example rule means that to build any `.o` object file, you need a file with the same name ending in `.c`, as well as the header files listed in `DEPS`. `$<` means the first prerequisite, i.e. the first dependency listed after the colon, and `$@` means the target (in this case the `.o` file being built).

`%` is the wildcard symbol, used to match any non-empty substring. The substring that % matches in the target is called the stem. It can only be used once per pattern (it cannot be used multiple times in the target), e.g.:
- `%.c` as a pattern matches any file name that ends in `.c`. 
- `s.%.c` as a pattern matches any file name that starts with `s.`, ends in `.c` and is at least five characters long (there must be at least one character to match the `%`).

Automatic variables have values that are computed for each rule that is executed, based on the current target being built. These can be used in commands. 

| Variable | Meaning |
|----------|---------|
| `$@`       | The target filename |
| `$<`       | The first prerequisite |
| `$^`       | All prerequisites, with duplicates removed |
| `$+`       | All prerequisites, with duplicates kept |
| `$?`      | Newer prerequisites than the target |
| `$*`       | The stem (the part that % matched) |

#### Additional information
- comments can be included by starting the line with `#`
- lazy evaluation is used - an expression is not evaluated or computed until its value is actually needed
- if a target exists and has a later timestamp than all of its components, `make` will assume it is up to date and will not rebuild it
- Makefiles are not linked with C; they can be used with any code/work
- any specific rule can be run by invoking its target e.g. `make sales.o`

## The Shell

A shell is a powerful command-line interface (CLI) that allows the user to interact with the operating system (OS) by typing commands. This includes the ability to:
- run programs
- control how programs work
- move around between different directories/folders
- perform sequences of commands to achieve more complex work

There are a number of different shells, such as bash and PowerShell. 

### Basic Commands

Some basic commands are given below. 

Note: the `!` before each command is not needed when using an actual shell (it is only necessary since this is a Jupyter Notebook)

`pwd` - *Print working directory*

In [25]:
!pwd 

/mnt/d/Notebooks/COMP2221 Programming Paradigms


`ls` - *List*

In [26]:
!ls 

Lectures
Practicals
Systems Programming.ipynb
compiling.png
gradescope-submission
gradescope-submission.zip
myscript.sh
myscript2.sh
permission_string.png


`man` - *Manual*

In [27]:
!man ls

LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort  is  speci‐
       fied.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters

       --block-size=SIZE
              with  -l,  scale  sizes  by  SIZE  when  printing  them;   e.g.,
              '--block-size=M'; see SIZE format below

       -B, --ignore-backups
              do not list implied entries ending with ~

       -c     with  -lt: 

`cd` - *Change directory*
- `.` *= current directory*
- `~` *= home folder*
- `..` *= one folder up*

In [28]:
!pwd

/mnt/d/Notebooks/COMP2221 Programming Paradigms


In [29]:
!cd ~ && pwd

/home/francis


In [30]:
!cd .. && pwd

/mnt/d/Notebooks


### `stdin`, `stdout` and `stderr`

`stdin`, `stdout` and `stderr` are the three built-in communication channels that each program receives from the OS when it starts running. They remove the need to worry about I/O devices.
- `stdin` (Standard Input) is where programs read data from
- `stdout` (Standard Output) is where programs send normal output
- `stderr` (Standard Error) is where programs send error messages

### Pipes 

The shell provides many small tools (commands) - the power comes from composing them together. Pipes provide a means to do this. 

By default, each command takes an input (from the keyboard) and produces an output (to the screen). The input and output of a command can be redirected:
- `<` - take input from a file
- `>` - write output to a file
  - a single `>` overwrites the file; `>>` appends to the file
- `|` - take the output of one command and use it as input to the next

### `grep`
`grep` is a search tool that can be used to search through files or the output of other commands (via pipes). It can search through specific file(s) by providing the filename(s), or it can search through all files in the current directory by using the `-r` recursive flag.

In [31]:
!grep "shell" "systems programming.ipynb"

    "A program can be compiled from the shell using `gcc -o outfile file.c`:\n",
    "Macros in Makefiles can be used to store definitions e.g. `CC=gcc`. Macros can also be defined using the output of shell commands e.g. `DATE = $(shell date)`. The Macros can be used in the Makefile, using `$` just as in the shell normally. \n",
    "A shell is a powerful command-line interface (CLI) thats allows the user to interact with the operating system (OS) by typing commands. This includes the ability to:\n",
    "There are a number of different shells, such as bash and PowerShell. \n",
    "Note: the `!` before each command is not needed when using an actual shell (it is only necessary since this is a Jupyter Notebook)\n",
      "              do not list implied entries matching shell  PATTERN  (overridden\n",
      "              do not list implied entries matching shell PATTERN\n",
      "              use  quoting style WORD for entry names: literal, locale, shell,\n",
      "            

In [32]:
!grep -r "pipes" 

.ipynb_checkpoints/Systems Programming-checkpoint.ipynb:    "`grep` is a search tool that can be used to search through files or the output of other commands (via pipes). It can search through specific file(s) by providing the filename(s), or it can search through all files in the current directory by using the `-r` recursive flag."
.ipynb_checkpoints/Systems Programming-checkpoint.ipynb:      ".ipynb_checkpoints/Systems Programming-checkpoint.ipynb:    \"`grep` is a search tool that can be used to search through files or the output of other commands (via pipes). It can search through specific file(s) by providing the filename(s), or it can search through all files in the current directory by using the `-r` recursive flag.\"\n",
.ipynb_checkpoints/Systems Programming-checkpoint.ipynb:      ".ipynb_checkpoints/Systems Programming-checkpoint.ipynb:      \".ipynb_checkpoints/Systems Programming-checkpoint.ipynb:    \\\"`grep` is a search tool that can be used to search through files or th

`grep` uses **regular expressions** for matching text. 

## Regular Expressions
Regular expressions provide a concise way to match different strings. They use a specific syntax:
- `.` - matches any single character (except a newline character)
- `*` - matches zero or more of the preceding character
- `?` - matches zero or one of the preceding character
- `+` - matches one or more of the preceding character
- `[ABC]` - matches one character that is `A`, `B` or `C`
- `[A-Z]` - matches any upper case character `A` to `Z`
- `[0-9]` - matches any digit

For example, the regular expression `[A-Za-z]*[0-9]\.txt` matches zero or more letters (uppercase or lowercase), followed by exactly one digit and the literal suffix `.txt`. The dot is escaped as `\.` because an unescaped `.` would match any character.
Examples of strings that this expression would match include `MyFile5.txt`, `abc0.txt` and `1.txt`.

## File Permissions
Every file and directory in UNIX has an access mode controlling who can read, write, or execute it.

| Permission | Symbol | Description |
|------------|--------|-------------|
| Read       | r      | View or copy the file contents |
| Write      | w      | Modify or delete the file |
| Execute    | x      | Run as a program (for files) or enter (for directories) |


There are three permission groups which can each be granted specific permissions:
- Owner (user) - the person who created the file
- Group - a named collection of users who share the same permissions
- Others - everyone else

The permission string is a 10 character string that specifies the permissions of the different groups. 

<img src="permission_string.png" alt="Permission String" width="300px">

File permissions can be changed using `chmod`. 
The syntax for `chmod` is `chmod [permissions] [file]`.
For example, `chmod u+x file.sh` adds execute permission for the owner (for the file file.sh).

## Text Operations

### `sort`

`sort` takes in a file, if specified, or reads from `stdin` if no file is specified. It sorts the input (alphabetically/numerically) and outputs it to `stdout`, or a file if specified with `-o filename`. 

In [33]:
!echo "C \nA \nD \nB"

C 
A 
D 
B


In [34]:
!echo "C \nA \nD \nB" | sort

A 
B
C 
D 


### `tr` (translate)

Usage: `tr SET1 SET2`
- translates or deletes characters, mapping each character in SET1 to the corresponding character in SET2
- e.g. `tr 'A-Z' 'a-z'` produces a lower case version of `stdin`
- option `-c` takes the complement of SET1
  - `tr -c 'a-zA-Z' '\n'` replaces all non-letter characters with newlines
- option `-s` squeezes repeats to a single character
  - `tr -s ' '` converts multiple spaces into one
- option `-d` deletes all characters in SET1

In [35]:
!echo "abc123" | tr 'a-z' 'A-Z'

ABC123


In [36]:
!echo "abc123" | tr -d 'a-z'

123


In [37]:
!echo "abc123" | tr -dc 'a-z'

abc

In [38]:
!echo "aaabccc12223" | tr -s 'ac2' 

abc123


### `uniq`

`uniq` is used to remove or report repeated lines. It only removes consecutive repeated lines, so it is often used with `sort` to find/remove repeated lines throughout the document (i.e. `sort | uniq`). The option `-c` can be used to count the number of repetitions. 

In [39]:
!echo "a\na\nb\na\nc\na" | uniq -c

      2 a
      1 b
      1 a
      1 c
      1 a


In [40]:
!echo "a\na\nb\na\nc\na" | sort | uniq -c

      4 a
      1 b
      1 c


In [41]:
!echo "a\na\nb\na\nc\na" | uniq

a
b
a
c
a


In [42]:
!echo "a\na\nb\na\nc\na" | sort | uniq

a
b
c


## File handling
Files are stored in a hierarchical structure (a tree) - the top level is the root directory `/`. Each directory (folder) can contain files or subdirectories, which allows grouping and organisation.

There are a number of commands for navigating around the file system. `ls` and `cd` have been covered [previously](#Basic-Commands), but additional commands include:
- `mkdir` - make a new folder
- `mv` - move a file/folder (also used to rename)
- `cp` - copy a file/folder
- `rm` - delete a file, or a folder using `-r`
- `du` - show disk usage of a file/folder
- `find` - search for files/folders in a directory tree

## Shell scripts

A shell script is a collection of commands enclosed in a file. 

It allows tasks to be automated by running each command in order automatically, rather than having to type out each command manually. 

When writing a shell script:
- the script can be written in any chosen text editor
- the script should be saved with a `.sh` extension
- the script should begin with the line `#!/bin/bash` (when writing a script for the bash shell)
  - `#!` tells the system that the file is a script that can be run
  - `/bin/bash` tells the system which program to run the script with



In [43]:
!bash myscript.sh

Hello from myscript.sh


Parameters can be passed in to a script when it is run. The parameters are referred to using the `$` sign in scripts i.e. the first parameter is `$1`, the second is `$2`. 

In [44]:
!bash myscript2.sh "foo" "bar"

Input 1: foo, Input 2: bar


### `For` loops

For loops are useful for performing the same operation on lots of files. The basic syntax is 
```
#!/bin/bash
for f in *;
do
 # something in here
 echo "$f"
done
```


### `If` statements

An example of an if statement in a bash shell script is:
```
#!/bin/bash
if [ $1 -lt $2 ]
then
 echo "yes" $1 "is less than" $2
else
 echo "no it isn't"
fi
```
The `else` clause is optional. For numeric comparison, `-eq`, `-ne`, `-gt`, `-lt`, `-le` and `-ge` are used for equality, inequality, greater than, less than, less than or equal to and greater than or equal to respectively; `==` and `!=` compare strings. 

### Shell variables

A shell variable is a name that stores a temporary value in the shell session. Values can be strings, numbers, filenames, or any text. Like in shell scripts, they are accessed using `$`.

In [45]:
!name="Eric" && echo "Name is $name"

Name is Eric


### Environmental variables
Environment variables store information about the user session and are shared with programs started from the shell.

| Variable | Meaning                               |
|------------|---------------------------------------|
| `$USER`    | Current username                       |
| `$HOME`    | User’s home directory                  |
| `$PWD`     | Present working directory              |
| `$PATH`    | List of directories searched for commands |
| `$SHELL`   | Path to login shell               |

The `export` command can be used to change the value of an existing environment variable, or create a new one, e.g. `export MYVAR="HELLO"`.

## Git

Git is software for tracking changes in files, keeping a history of modifications and enabling you to revert to previous versions, compare changes and see who made which changes. It is used for coordinating work among collaborators and has support for continuous integration (CI) tools. 

Common git commands include:
- `git clone` - creates a local copy of a given repository
- `git add` - stage new/modified files for the next commit
- `git rm` - removes files from git tracking
  - using the `--cached` option keeps a local copy
- `git commit` - commits the current staged changes
- `git push` - upload local commits to the remote repository
- `git pull` - fetch the changes made to the remote repository and merge them into the local copy

