# Bash Basic Exercises

These exercises should help you review shell commands you already know or learn new ones. They can be done locally in a Bash console on Linux, macOS, or Windows 11.
You can also open this Jupyter notebook directly in Binder or Google Colab and do the exercises there. In that case, be aware that the system runs on Linux, which matters when you work with paths.
For example, Linux and macOS use `/` as the path separator, whereas Windows uses `\`.

But first, here is a quick explanation of how to use **Bash** in a Jupyter notebook on a Linux or Mac system.

## Using Bash in Jupyter Notebook

Whichever system you use, it is important to first find out the current path on the file system. To do so, run the next cell.



In [None]:
%%bash
pwd

To run a single command, you can put an exclamation mark in front of it. In the example below, **!** is added to the command *echo*.
After that, just run the cell and it should print the output, in this case "hello world!".

In [1]:
!echo "hello world!"

hello world!


However, that won't work if, for example, you want to run multiple commands that depend on each other, since each command is executed in a separate process:

In [2]:
!export RAND=42
!echo $RAND 




Nothing is printed, since `RAND` was defined in a separate process that has already exited.

But, if your notebook code cell starts with **%%bash**, it's all executed as one script!

In [3]:
%%bash

export RAND=42
echo $RAND

42
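
The same behaviour can be reproduced in a plain terminal, outside of any notebook. The sketch below is only an illustration: it assumes that each `!` line behaves roughly like a separate `bash -c` call, and the variable names `separate` and `together` are made up for this example.

```bash
# Make sure RAND is not already set in this shell.
unset RAND

# Two separate child shells: the second never sees the first one's variable.
bash -c 'export RAND=42'
separate=$(bash -c 'echo "${RAND:-unset}"')
echo "separate processes: $separate"

# One child shell running both commands: the variable is still available.
together=$(bash -c 'export RAND=42; echo "${RAND:-unset}"')
echo "same process: $together"
```

The first `echo` reports `unset` and the second reports `42`, which mirrors the difference between two `!` lines and one `%%bash` cell.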


You can also pass Python variables into your shell commands, and vice versa!

In [5]:
# Pass a Python variable into a shell command with {...}
python_var = "hello from Python"
!echo {python_var}

# ... and capture the output of a shell command in a Python variable
output_of_ls_as_python_var = !ls -a
print(output_of_ls_as_python_var)

hello from Python
['.', '..', '2-2_bash_tools.md', '2-3_bash_advanced.md', '2_bash_basics.md', 'bash.ipynb']


## 1. Basic Exercises

The exercises below involve creating and moving new files, as well as considering hypothetical files.
Please note that if you create or move any files or directories in your Zipf's Law project, you may want to reorganize your files following the outline at the beginning of the next chapter.
If you accidentally delete necessary files, you can start with a fresh copy of the data files by following the instructions in Section [1 getting started](https://software-engineering-group-up.github.io/RSE-UP/chapters/01/1_getting_started.html#downloading-the-data).

### Exploring more `ls` flags

1.1.1 What does the command `ls` do when used with the `-l` option?

1.1.2 What happens if you use two options at the same time, such as `ls -l -h`?

### Listing recursively and by time 
The command `ls -R` lists the contents of directories recursively, which means the subdirectories, sub-subdirectories, and so on at each level are listed.
The command `ls -t` lists things by time of last change, with most recently changed files or directories first.
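
To see each option on its own before combining them, you could try something like the following sketch. It works in a throwaway directory created with `mktemp -d`, and every file and directory name in it is hypothetical.

```bash
# Build a small throwaway tree (all names here are made up for illustration).
workdir=$(mktemp -d)
mkdir -p "$workdir/tree/docs/drafts" "$workdir/times"
touch "$workdir/tree/notes.txt" "$workdir/tree/docs/report.txt" \
      "$workdir/tree/docs/drafts/old.txt"

# -R: list the directory, then each subdirectory in turn.
recursive=$(ls -R "$workdir/tree")
echo "$recursive"

# Give two files explicit, different modification times (format: YYYYMMDDhhmm).
touch -t 202001010000 "$workdir/times/older.txt"
touch -t 202101010000 "$workdir/times/newer.txt"

# -t: most recently modified first.
by_time=$(ls -t "$workdir/times")
echo "$by_time"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Setting the timestamps explicitly with `touch -t` makes the `ls -t` ordering predictable, which is harder to guarantee for files created within the same second.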

1.2.1 In what order does `ls -R -t` display things? Hint: `ls -l` uses a long listing format to view timestamps.

### Absolute and relative paths

1.3.1 Starting from your current directory, which of the following commands could you use to navigate to your home directory,
which is `/Users/YOUR-USERNAME` on macOS or `/home/YOUR-USERNAME` on Linux?

1. `cd .`
2. `cd /`
3. `cd /home/amira`
4. `cd ../..`
5. `cd ~`
6. `cd home`
7. `cd ~/data/..`
8. `cd`
9. `cd ..`
10. `cd ../.`

### Relative path resolution

1.3.2 Using the filesystem shown below,
if `pwd` displays `/Users/sami`, what will `ls -F ../backup` display?

1.  `../backup: No such file or directory`
2.  `final original revised`
3.  `final/ original/ revised/`
4.  `data/ analysis/ doc/`


![Filesystem/exercise](../figures/bash-basics/exercise-filesystem.png)




### `ls` reading comprehension 

1.4.1 Using the filesystem shown above, if `pwd` displays `/Users/backup`, and `-r` tells `ls` to display things in reverse order, what command(s) will result in the following output:

```text
doc/ data/ analysis/
```

1.  `ls pwd`
2.  `ls -r -F`
3.  `ls -r -F /Users/backup`

### Creating files a different way 

1.5.1 What happens when you execute `touch my_file.txt`?  (Hint: use `ls -l` to find information about the file)

1.5.2 When might you want to create a file this way?

### Using `rm` safely 

1.6.1 What would happen if you executed `rm -i my_file.txt` on the file created in the previous exercise?

1.6.2 Why would we want this protection when using `rm`?


### Moving to the current folder 

After running the following commands,
Amira realizes that she put the (hypothetical) files `chapter1.txt` and `chapter2.txt` into the wrong folder:

```bash
$ ls -F
```
```text
  data/  docs/
```

```bash
$ ls -F data
```
```text
README.md			frankenstein.txt		sherlock_holmes.txt
chapter1.txt		jane_eyre.txt			time_machine.txt
chapter2.txt		moby_dick.txt
dracula.txt			sense_and_sensibility.txt
```

```bash
$ cd docs
```

1.7.1  Fill in the blanks to move these files to the current folder (i.e., the one she is currently in):

```bash
$ mv ___/chapter1.txt  ___/chapter2.txt ___
```

### Renaming files 

Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it: `statstics.txt`

1.8.1 After creating and saving this file, you realize you misspelled the filename! You want to correct the mistake. Which of the following commands could you use to do so?

1. `cp statstics.txt statistics.txt`
2. `mv statstics.txt statistics.txt`
3. `mv statstics.txt .`
4. `cp statstics.txt .`


### Moving and copying 

1.9.1 Assuming the following hypothetical files, what is the output of the closing `ls` command in the sequence shown below?

```bash
$ pwd
```

```text
/Users/amira/data
```

```bash
$ ls
```

```text
books.dat
```

```bash
$ mkdir doc
$ mv books.dat doc/
$ cp doc/books.dat ../books-saved.dat
$ ls
```

1.   `books-saved.dat doc`
2.   `doc`
3.   `books.dat doc`
4.   `books-saved.dat`

### Copy with multiple filenames 

This exercise explores how `cp` responds when attempting to copy multiple things.

1.10.1 What does `cp` do when given several filenames followed by a directory name?

```bash
$ mkdir backup
$ cp dracula.txt frankenstein.txt backup/
```

1.10.2 What does `cp` do when given three or more filenames?

```bash
$ cp dracula.txt frankenstein.txt jane_eyre.txt
```


### List filenames matching a pattern

1.11.1 When run in the `data` directory of your project directory, which `ls` command(s) will produce this output?

`jane_eyre.txt   sense_and_sensibility.txt`

1. `ls ??n*.txt`
2. `ls *e_*.txt`
3. `ls *n*.txt`
4. `ls *n?e*.txt`

### Organizing directories and files 

Amira is working on a project and she sees that her files aren't very well organized:

```bash
$ ls -F
```

```text
books.txt    data/    results/   titles.txt
```

The `books.txt` and `titles.txt` files contain output from her data analysis. 

1.12.1 What command(s) does she need to run to produce the output shown?

```bash
$ ls -F
```

```text
data/   results/
```

```bash
$ ls results
```

```text
books.txt    titles.txt
```

### Reproduce a directory structure 

You're starting a new analysis, and would like to duplicate the directory structure from your previous experiment so you can add new data.

Assume that the previous experiment is in a folder called `2016-05-18`, which contains a `data` folder that in turn contains folders named `raw` and `processed` that contain data files.  The goal is to copy the folder structure of `2016-05-18/data` into a folder called `2016-05-20` so that your final directory structure looks like this:

```bash
	2016-05-20/
	└── data
	    ├── processed
	    └── raw
```
1.13.1 Which of the following commands would achieve this objective?

1.13.2 What would the other commands do?

```bash
# Set 1
$ mkdir 2016-05-20
$ mkdir 2016-05-20/data
$ mkdir 2016-05-20/data/processed
$ mkdir 2016-05-20/data/raw
```

```bash
# Set 2
$ mkdir 2016-05-20
$ cd 2016-05-20
$ mkdir data
$ cd data
$ mkdir raw processed
```

```bash
# Set 3
$ mkdir 2016-05-20/data/raw
$ mkdir 2016-05-20/data/processed
```

```bash
# Set 4
$ mkdir 2016-05-20
$ cd 2016-05-20
$ mkdir data
$ mkdir raw processed
```

### Wildcard expressions 

Wildcard expressions can be very complex, but you can sometimes write them in ways that only use simple syntax, at the expense of being a bit more verbose.
In your `data/` directory, the wildcard expression `[st]*.txt` matches all files beginning with `s` or `t` and ending with `.txt`. 
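
If you would like to check what the bracket pattern matches without touching your real `data/` directory, a minimal sketch like the one below may help. It uses a temporary directory and a handful of hypothetical filenames; `summary.csv` is included only to show that the `.txt` suffix matters.

```bash
# Create a throwaway directory with a few made-up files.
workdir=$(mktemp -d)
touch "$workdir/sherlock_holmes.txt" "$workdir/time_machine.txt" \
      "$workdir/dracula.txt" "$workdir/summary.csv"

# Expand the pattern inside the directory; the subshell keeps the `cd` local.
matched=$(cd "$workdir" && echo [st]*.txt)
echo "$matched"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Here only `sherlock_holmes.txt` and `time_machine.txt` are printed: `dracula.txt` starts with the wrong letter and `summary.csv` has the wrong extension.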

1.14 Imagine you forgot about this.

1.  Can you match the same set of files with basic wildcard expressions
    that do not use the `[]` syntax? *Hint*: You may need more than one
    expression.

2.  Under what circumstances would your new expression produce an error message
    where the original one would not?

### Removing unneeded files

Suppose you want to delete your processed data files, and only keep your raw files and processing script to save storage.
The raw files end in `.txt` and the processed files end in `.csv`. 

1.15.1 Which of the following would remove all the processed data files, and *only* the processed data files?

1. `rm ?.csv`
2. `rm *.csv`
3. `rm * .csv`
4. `rm *.*`


### Other wildcards 

The shell provides several wildcards beyond the widely used `*`.

1.16.1 To explore them, explain in plain language what (hypothetical) files the expression `novel-????-[ab]*.{txt,pdf}` matches and why.

## 2. BASH Tool Exercises

### 2.1.1 What does `>>` mean?

We have seen the use of `>`, but there is a similar operator `>>` which works slightly differently.
We'll learn about the differences between these two operators by printing some strings.
We can use the `echo` command to print strings as shown below:


```bash
$ echo The echo command prints text
```

```text
The echo command prints text
```

Now test the commands below to reveal the difference between the two operators:

```bash
$ echo hello > testfile01.txt
```

and:

```bash
$ echo hello >> testfile02.txt
```

Hint: Try executing each command twice in a row and then examining the output files.

### Appending data 

2.2.1 Given the following commands, what will be included in the file `extracted.txt`:

```bash
$ head -n 3 dracula.txt > extracted.txt
$ tail -n 2 dracula.txt >> extracted.txt
```

1. The first three lines of `dracula.txt`
2. The last two lines of `dracula.txt`
3. The first three lines and the last two lines of `dracula.txt`
4. The second and third lines of `dracula.txt`

### Piping commands 

In our current directory, we want to find the three files that have the fewest lines.

2.3.1 Which command listed below would work?

1. `wc -l * > sort -n > head -n 3`
2. `wc -l * | sort -n | head -n 1-3`
3. `wc -l * | head -n 3 | sort -n`
4. `wc -l * | sort -n | head -n 3`

### Why does `uniq` only remove adjacent duplicates? 

The command `uniq` removes adjacent duplicated lines from its input. Consider a hypothetical file `genres.txt` containing the following data:

```text
science fiction
fantasy
science fiction
fantasy
science fiction
science fiction
```

Running the command `uniq genres.txt` produces:

```text
science fiction
fantasy
science fiction
fantasy
science fiction
```

2.4.1 Why do you think `uniq` only removes *adjacent* duplicated lines? (Hint: think about very large datasets.) 

2.4.2 What other command could you combine with it in a pipe to remove all duplicated lines?

### Pipe reading comprehension 

A file called `titles.txt` contains a list of book titles and publication years:

```text
Dracula,1897
Frankenstein,1818
Jane Eyre,1847
Moby Dick,1851
Sense and Sensibility,1811
The Adventures of Sherlock Holmes,1892
The Invisible Man,1897
The Time Machine,1895
Wuthering Heights,1847
```

2.5.1 What text passes through each of the pipes and the final redirect in the pipeline below?

```bash
$ cat titles.txt | head -n 5 | tail -n 3 | sort -r > final.txt
```

Hint: build the pipeline up one command at a time to test your understanding.


### Pipe construction 

For the file `titles.txt` from the previous exercise, consider the following command:

```bash
$ cut -d , -f 2 titles.txt
```

2.6.1 What does the `cut` command (and its options) accomplish?

### Which pipe? 

Consider the same `titles.txt` from the previous exercises.

The `uniq` command has a `-c` option which gives a count of the number of times a line occurs in its input.
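
As a quick illustration of `-c` on its own, the sketch below feeds `uniq -c` three made-up lines in which the repeated value is already adjacent. The input values and the variable name `counts` are assumptions for this example only.

```bash
# Three hypothetical input lines; the two identical ones sit next to each other.
counts=$(printf '1847\n1847\n1897\n' | uniq -c)

# Each output line is a count followed by the line it counted.
echo "$counts"
```

The amount of leading whitespace in front of each count can differ between systems, so it is safer not to rely on the exact column layout.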

2.6.2 If `titles.txt` was in your working directory, what command would you use to produce a table that shows the total count of each publication year in the file?

1.  `sort titles.txt | uniq -c`
2.  `sort -t, -k2,2 titles.txt | uniq -c`
3.  `cut -d, -f 2 titles.txt | uniq -c`
4.  `cut -d, -f 2 titles.txt | sort | uniq -c`
5.  `cut -d, -f 2 titles.txt | sort | uniq -c | wc -l`

### Doing a dry run 

A loop is a way to do many things at once, or to make many mistakes at once if it does the wrong thing. One way to check what a loop *would* do is to `echo` the commands it would run instead of actually running them.

Suppose we want to preview the commands the following loop will execute without actually running those commands (`analyze` is a hypothetical command):

```bash
$ for file in *.txt
> do
>   analyze $file > analyzed-$file
> done
```

2.7.1 What is the difference between the two loops below, and which one would we want to run?

```bash
$ for file in *.txt
> do
>   echo analyze $file > analyzed-$file
> done
```

or:

```bash
$ for file in *.txt
> do
>   echo "analyze $file > analyzed-$file"
> done
```

### Variables in loops

2.8.1 Given the files in `data/`, what is the output of the following code?

```bash
$ for datafile in *.txt
> do
>    ls *.txt
> done
```

2.8.2 Now, what is the output of the following code?

```bash
$ for datafile in *.txt
> do
>    ls $datafile
> done
```

2.8.3 Why do these two loops give different outputs?

### Limiting sets of files

2.9.1 What would be the output of running the following loop in your `data/` directory?

```bash
$ for filename in d*
> do
>    ls $filename
> done
```

2.9.2 How would the output differ from using this command instead?

```bash
$ for filename in *d*
> do
>    ls $filename
> done
```

### Saving to a file in a loop 

Consider running the following loop in the  `data/` directory:

```bash
$ for book in *.txt
> do
>     echo $book
>     head -n 16 $book > headers.txt
> done
```

2.10.1 Why would the following loop be preferable?

```bash
$ for book in *.txt
> do
>     head -n 16 $book >> headers.txt
> done
```



### Why does `history` record commands before running them?

Suppose you run the command:

```bash
$ history | tail -n 5 > recent.sh
```

2.11.1 The last command in the file is the `history` command itself, i.e., the shell has added `history` to the command log before actually running it. 
In fact, the shell *always* adds commands to the log
before running them. Why do you think it does this?

## 3. BASH Advanced

### Cleaning up

3.1.1 As we have gone through this chapter, we have created several files that we won't need again.
We can clean them up with the following commands; briefly explain what each line does.

```bash
$ cd ~/zipf
$ for file in $(find . -name "*.bak")
> do
>   rm $file
> done
$ rm bin/summarize_all_books.sh
$ rm -r results
```

### Variables in shell scripts 

Imagine you have a shell script called `script.sh` that contains:

```bash
head -n $2 $1
tail -n $3 $1
```

With this script in your `data` directory, you type the following command:

```bash
$ bash script.sh '*.txt' 1 1
```

3.2.1 Which of the following outputs would you expect to see?

1. All of the lines between the first and the last lines of each file ending in `.txt`
    in the `data` directory
2. The first and the last line of each file ending in `.txt` in the `data` directory
3. The first and the last line of each file in the `data` directory
4. An error because of the quotes around `*.txt`


### Find the longest file with a given extension 

3.3 Write a shell script called `longest.sh` that takes the name of a directory and a filename extension as its arguments, and prints out the name of the file with the most lines in that directory with that extension. For example:

```bash
$ bash longest.sh data/ txt
```

would print the name of the `.txt` file in `data` that has the most lines.

### Script reading comprehension 

3.4 For this question, consider your `data` directory once again.
Explain what each of the following three scripts would do when run as `bash script1.sh *.txt`, `bash script2.sh *.txt`, and `bash script3.sh *.txt` respectively.

```bash
# script1.sh
echo *.*
```

```bash
# script2.sh
for filename in $1 $2 $3
  do
    cat $filename
  done
```

```bash
# script3.sh
echo $@.txt
```

(You may need to search online to find the meaning of `$@`.)

### Using `grep`

Assume the following text from *The Adventures of Sherlock Holmes* is contained in a file called `excerpt.txt`:

```text
To Sherlock Holmes she is always THE woman. I have seldom heard
him mention her under any other name. In his eyes she eclipses
and predominates the whole of her sex. It was not that he felt
any emotion akin to love for Irene Adler.
```

3.5 Which of the following commands would provide the following output:

```text
and predominates the whole of her sex. It was not that he felt
```

1. `grep "he" excerpt.txt`
2. `grep -E "he" excerpt.txt`
3. `grep -w "he" excerpt.txt`
4. `grep -i "he" excerpt.txt`


### Tracking publication years

In the exercise *Pipe construction* (2.6.1), you examined code that extracted the publication year from a list of book titles.

3.6 Write a shell script called `year.sh` that takes any number of filenames as command-line arguments, and uses a variation of the code you used earlier to print a list of the unique publication years appearing in each of those files separately.


### Counting names 

You and your friend have just finished reading *Sense and Sensibility* and are now having an argument.
Your friend thinks that the elder of the two Dashwood sisters, Elinor, was mentioned more frequently in the book, but you are certain it was the younger sister, Marianne.
Luckily, `sense_and_sensibility.txt` contains the full text of the novel. 

3.7 Using a `for` loop, how would you tabulate the number of times each of the sisters is mentioned?

Hint: one solution might employ the commands `grep` and `wc` and a `|`, while another might utilize `grep` options.
There is often more than one way to solve a problem with the shell; people choose solutions based on readability, speed, and what commands they are most familiar with.

### Matching and subtracting

Assume you are in the root directory of the `zipf` project.

3.8 Which of the following commands will find all files in `data` whose names end in `e.txt`, but do *not* contain the word `machine`?

1.  `find data -name '*e.txt' | grep -v machine`
2.  `find data -name *e.txt | grep -v machine`
3.  `grep -v "machine" $(find data -name '*e.txt')`
4.  None of the above.

### `find` pipeline reading comprehension 

3.9 Write a short explanatory comment for the following shell script:

```bash
wc -l $(find . -name '*.dat') | sort -n
```

### Finding files with different properties 

The `find` command can be given criteria called "tests" to locate files with specific attributes, such as creation time, size, or ownership.
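
To get a feel for how tests are written and combined, here is a small sketch that uses `-type` together with `-size`. It deliberately avoids the time and ownership tests, and it runs in a temporary directory with hypothetical file names.

```bash
# Set up a throwaway directory with one empty and one non-empty file.
workdir=$(mktemp -d)
mkdir "$workdir/raw"
touch "$workdir/raw/empty.dat"
printf 'some data\n' > "$workdir/raw/full.dat"

# -type d keeps only directories (here: the top directory and raw/).
dir_count=$(find "$workdir" -type d | wc -l)
echo "directories: $dir_count"

# Tests listed one after another are combined with an implicit AND:
# regular files (-type f) that are larger than 0 bytes (-size +0c).
nonempty=$(find "$workdir" -type f -size +0c)
echo "non-empty files: $nonempty"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Note the sign in `+0c`: for numeric tests, a leading `+` means "more than" and a leading `-` means "less than", while a bare number means "exactly".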

3.10.1 Use `man find` to explore these, then write a single command using `-type`, `-mtime`, and `-user` to find all files in or below your Desktop directory that are owned by you and were modified in the last 24 hours.

3.10.2 Explain why the value for `-mtime` needs to be negative.