# Advanced Bash Concepts

## 1. While Loops in Bash Scripts

Welcome back. Let's take a moment to quickly recap everything you've learned so far in this module. We started by looking at some of the most common Linux commands. Then you learned about redirecting streams, pipes and pipelines, and signaling processes.

After that, you got your first taste of the Bash programming language, including how to create a script, use variables and globs, and perform conditional execution.

That's a lot of interesting new concepts that will help you get the most out of the systems that you work with, but we aren't done yet. Let's dive into some more advanced Bash concepts before we wrap up this module. Are you ready? Good. Let's do it.

Bash provides looping structures similar to Python's. We can iterate while a condition is true using a while loop, and iterate over a list of elements using a for loop, although of course the syntax for these loops is slightly different. As a quick reminder, loops are what make the computer do repetitive tasks for us, anything from working with a bunch of numbers to processing the contents of a file line by line. Our computer doesn't care how many times we ask it to do what we want; it will keep going until it's done. No coffee breaks.

```bash
#!/bin/bash
n=1
while [ $n -le 5 ]; do
  echo "Iteration number $n"
  ((n+=1))
done
```

In this script, we're using the variable `n` to print messages, counting from one to five. The condition for the while loop uses the same format as a condition for an if block. In this example, we check whether `n` is less than or equal to five using the `-le` operator. The body of the loop starts with the `do` keyword and finishes with the `done` keyword. To increment the value of `n`, we're using a Bash construct of double parentheses that lets us do arithmetic operations with our variables. It seems complicated, but when you take a step back, it all comes together. Let's execute this and see what happens.

```
$ ./while.sh
Iteration number 1
Iteration number 2
Iteration number 3
Iteration number 4
Iteration number 5
```
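As a side note, the double parentheses can also hold the loop condition itself, which lets us write the comparison with `<=` instead of `-le`. Here's a minimal sketch, assuming we want to add up the numbers from one to five. The `total` variable is just a name made up for this illustration.

```bash
#!/bin/bash
n=1
total=0   # hypothetical accumulator, not part of the original script
# (( )) evaluates an arithmetic expression; the loop runs while it is true
while (( n <= 5 )); do
  total=$((total + n))   # $(( )) expands to the result of the arithmetic
  ((n+=1))
done
echo "Sum of 1 to 5: $total"
```

Both styles of condition behave the same way here, so which one to use is mostly a matter of taste.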

So that works, but what about making our loop a bit more interesting? When using while loops in Bash scripts, it's common to have a loop that retries a command a number of times until it succeeds. This is really useful with commands that use network connections or that access resources that might be locked. These commands can fail for external reasons, and they're likely to succeed after a retry or two. To simulate a command that sometimes succeeds and sometimes fails, we have a small Python script that will return an exit value picked at random from a range that we give it.

```python
#!/usr/bin/env python

import sys
import random

value=random.randint(0,3)
print('Returning: ' + str(value))
sys.exit(value)
```

Do you see what the script is doing? It uses `random.randint` to generate a value between zero and three (both included), then it prints the selected value and exits with it. Let's check this script out.

```
$ python random-exit.py
Returning: 2

$ python random-exit.py
Returning: 1
```
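From the shell's point of view, the value passed to `sys.exit` becomes the command's exit status, which Bash stores in `$?` right after the command finishes. The sketch below shows that idea with a hypothetical Bash function, `random_exit`, standing in for the Python script so that the example runs on its own.

```bash
#!/bin/bash
# Hypothetical stand-in for random-exit.py: picks a value from 0 to 3
random_exit() {
  local value=$((RANDOM % 4))
  echo "Returning: $value"
  return "$value"
}

status=0
random_exit || status=$?   # on failure, $? holds the command's exit status
if [ "$status" -eq 0 ]; then
  echo "Success (exit status 0)"
else
  echo "Failure (exit status $status)"
fi
```

Zero means success and anything else means failure, which is exactly what the conditions in `if` and `while` check.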

Cool. So we see that we get a value in the zero to three range. Which value we get changes from call to call, and that's okay. We want to simulate a command that sometimes fails and sometimes succeeds. Now, let's have a look at our Bash script that will retry the command.

```bash
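#!/bin/bash

n=0
# The command to run is the first command-line argument
command=$1
# Run the command; while it fails, retry up to five times
while ! $command && [ "$n" -lt 5 ]; do
  sleep "$n"   # wait a little longer before each retry
  ((n=n+1))
  echo "Retry #$n"
done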
```

This script is a bit more complex than the earlier one, but not by much. One interesting difference is that we're getting the value of a command line argument using `$1`. This is what we use in Bash to access the first command line argument. In Python, we get the same information using `sys.argv[1]`.
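To make the positional parameters a bit more concrete, here's a small sketch. Functions receive their arguments the same way scripts do, so the hypothetical `show_args` function below lets us try this without creating a separate file. The `--verbose` flag is made up for the example.

```bash
#!/bin/bash
# show_args is a hypothetical helper used only for this illustration
show_args() {
  echo "Number of arguments: $#"   # $# is the argument count
  echo "First argument: $1"        # $1 is the first argument
  echo "Second argument: $2"       # $2 is the second, and so on
}

show_args ./random-exit.py --verbose
```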

So we store the parameter in a variable called `command`, and then we execute the while loop until either the command succeeds or the `n` variable reaches a value of five. In other words, if the received command fails, we'll retry up to five times. In the body of the while loop, we first sleep a few seconds, then increment the variable and print the number of retry attempts.

So why do we call the `sleep` command? This is no time for rest. The idea here is that if the command we're calling is failing due to CPU usage, network or resource exhaustion, it might make sense to wait a bit before trying again. So the more we try, the more we wait. We need to let our computer catch a breather and recover from whatever is making our command fail. In our simulation, the command fails randomly, but this retry script works with any other command that could fail for a wide range of reasons.

To try this out, we'll need to call our retry script with the random exit command as a parameter like this.

```
$ ./retry.sh ./random-exit.py 
Returning: 2
Retry #1
Returning: 3
Retry #2
Returning: 3
Retry #3
Returning: 0
```

We can see how our script keeps executing until the command that we passed returns zero, which is exactly what we wanted: zero problems. This last example is a real-world use case for while loops in Bash and includes a few more advanced topics, so it can definitely feel complex, but don't let that stop you. Re-watch this video as many times as it takes and practice the scripts that we covered. Once you're ready, you can meet me over in the next video where we'll look at for loops in Bash.
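If you'd like to experiment further, here's one possible variation on the retry idea, written as a function. It's a sketch under a few assumptions: the names `retry` and `flaky` are invented for this example, and `flaky` simply fails twice before succeeding so that we don't need the Python script. Using `"$@"` instead of a single variable also lets the command carry its own arguments.

```bash
#!/bin/bash
# retry MAX_RETRIES COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or the retries run out.
retry() {
  local max_retries=$1
  shift                      # what remains in "$@" is the command to run
  local n=0
  until "$@"; do
    if (( n >= max_retries )); then
      echo "Giving up after $n retries" >&2
      return 1
    fi
    sleep "$n"               # the more we try, the longer we wait
    ((n+=1))
    echo "Retry #$n"
  done
}

# Hypothetical flaky command: fails on the first two attempts
attempts=0
flaky() {
  attempts=$((attempts + 1))
  echo "Attempt $attempts"
  [ "$attempts" -ge 3 ]
}

retry 5 flaky
```

With this version, the maximum number of retries is chosen by the caller rather than being fixed inside the loop.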

## 2. For Loops in Bash Scripts

In both Python and Bash, for loops are used to iterate over a sequence of elements. You might remember that the key to for loops is that they let us perform an operation on each of the elements in a sequence. In Python, the sequences are data structures like a list, a tuple, or a string. In Bash, we construct these sequences just by listing the elements with spaces in between. Let's check this out using a very simple example.

In this case, we're iterating over three different elements that have the names of fruits. See how the for loop uses the same do done structure that the while loop used before? Now, let's execute this script and check that it does what we expect it to do. 

```bash
#!/bin/bash

for fruit in peach orange apple; do
  echo "I like $fruit!"
done
```

```
$ ./fruits.sh 
I like peach!
I like orange!
I like apple!
```
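One thing to watch for: because spaces separate the elements, an element that contains a space needs quotes. The sketch below shows this, along with a Bash array as another way to hold the list. The extra fruit name and the `fruits` array are just for illustration.

```bash
#!/bin/bash
# Quotes keep "blood orange" together as a single element
for fruit in peach "blood orange" apple; do
  echo "I like $fruit!"
done

# Hypothetical array holding the same list; "${fruits[@]}" expands
# to one word per element, even when an element contains a space
fruits=(peach "blood orange" apple)
for fruit in "${fruits[@]}"; do
  echo "I also like $fruit!"
done
```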

All right. That seems simple enough, but it's low-hanging fruit. Tasty for sure, but not that useful. We called out in an earlier video that in Bash, we can use globs like star and question mark to create lists of files. These lists are separated by spaces, so we can use them in our loops to iterate over a list of files that match a criterion, like all the files that end in .PDF, all files that start with IMG, or whatever it is that we need. Let's use a practical example to see this in action.
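Before we get to that, here's a minimal sketch of a glob driving a for loop. It assumes a scratch directory created with `mktemp -d` and a few made-up file names, so it won't touch any real files.

```bash
#!/bin/bash
# Scratch directory and file names are hypothetical, for illustration only
workdir=$(mktemp -d)
touch "$workdir/report.pdf" "$workdir/invoice.pdf" "$workdir/IMG_001.jpg"

count=0
# The glob expands to every path in workdir that ends in .pdf
for file in "$workdir"/*.pdf; do
  echo "Found PDF: $(basename "$file")"
  count=$((count + 1))
done
echo "Matched $count files"

rm -r "$workdir"   # clean up the scratch directory
```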

Imagine that you're migrating your company's website from one web server software to another. Your web content is stored in a bunch of files that all end in uppercase `.HTM`, and the new software requires that they all end in lowercase `.html`. Disaster. You could manually rename them one by one using the `mv` command, but that would get really old really fast, and you'd likely end up making mistakes after the first few commands. Instead, you could do the same thing with a short Bash script. First, let's check out our files.

```
$ ls -l
total 0
-rw-r--r-- 1 BRIAN 197121 0 Aug  5 12:44 about.HTM
-rw-r--r-- 1 BRIAN 197121 0 Aug  5 12:44 contact.HTM
-rw-r--r-- 1 BRIAN 197121 0 Aug  5 12:44 footer.HTM
-rw-r--r-- 1 BRIAN 197121 0 Aug  5 12:44 header.HTM
-rw-r--r-- 1 BRIAN 197121 0 Aug  5 12:44 index.HTM
```

Looks like we have five files that we need to rename. So how can we extract the part before the extension? There's a command called `basename` that can help us with that.

```
$ basename index.HTM .HTM
index
```

This command takes a filename and an extension and then returns the name without the extension. Just like that, we're ready to write our script and rename the files.
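Bash can also strip a suffix on its own with the `${variable%suffix}` parameter expansion. Here's a quick sketch comparing the two on a hypothetical path. Notice that `basename` also drops the directory part, while the expansion only removes the suffix.

```bash
#!/bin/bash
file="site/index.HTM"   # hypothetical path, used only for this comparison

with_basename=$(basename "$file" .HTM)   # drops the directory and the suffix
with_expansion="${file%.HTM}"            # drops only the trailing .HTM

echo "basename:  $with_basename"
echo "expansion: $with_expansion"
```

For the files in our example, which all sit in the current directory, both approaches would give the same name.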

Our script will iterate with a for loop through all the files that end with `.HTM`. For each file, we call `basename` to keep the part of the file name that we care about and store it in a variable called `name`. Then we call `mv` to rename the file to that name plus `.html`.

```bash
#!/bin/bash

for file in *.HTM; do
  name=$(basename "$file" .HTM)
  mv "$file" "$name.html"
done
```

We still need to run our script to see if it does what it should. Now, let me share a trick with you that might save you a few headaches. Whenever you're going to run a script like this that modifies the files in your file system, it's a really good idea to first run it without actually modifying the file system. This can catch bugs that the script might have before they do any damage. So instead of just running it as it is right now, we'll add an `echo` in front of the `mv` command. This means that instead of actually renaming, our script will print the renaming that it plans to do.

```bash
#!/bin/bash

for file in *.HTM; do
  name=$(basename "$file" .HTM)
  echo mv "$file" "$name.html"
done
```

```
$ ./rename.sh 
mv about.HTM about.html
mv contact.HTM contact.html
mv footer.HTM footer.html
mv header.HTM header.html
mv index.HTM index.html
```
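If you find yourself doing this often, one option is to build the dry run into the script. This sketch assumes a hypothetical `DRY_RUN` variable and a small `run` helper, neither of which is part of the script above. By default it only prints the commands.

```bash
#!/bin/bash
# DRY_RUN is a hypothetical switch: 1 (the default) prints, 0 executes
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

for file in *.HTM; do
  [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
  name=$(basename "$file" .HTM)
  run mv "$file" "$name.html"
done
```

Saved as `rename.sh`, running it with `DRY_RUN=0 ./rename.sh` would perform the renames for real.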

Once the output looks right, we remove the `echo` and run the script again. This time, the files are actually renamed:

```
$ ./rename.sh 
$ ls
about.html  contact.html  footer.html  header.html  index.html
```

## 3. Advanced Command Interaction

Over the past few videos, we've learned a lot about how to do things in the Linux command line and in Bash scripts. We'll now look at a couple of interesting applications that put all this new Bash scripting knowledge into action. Let's go back to our old friend, the system log file located in `/var/log/syslog`. The system log file contains a trove of information about what's going on in the system, so it's really important to learn how to get information out of it.

Let's use the `tail` command to look at the last 10 lines of the file right now:

```
$ tail /var/log/syslog
Aug  5 11:23:07 ubu14 kernel: [   40.544803] 00:00:00.000304 main     Package type: LINUX_64BITS_GENERIC
Aug  5 11:23:07 ubu14 kernel: [   40.546277] 00:00:00.001668 main     5.2.34 r133893 started. Verbose level = 0
Aug  5 11:23:07 ubu14 kernel: [   40.708010] init: plymouth-stop pre-start process (2124) terminated with status 1
Aug  5 12:17:01 ubu14 CRON[2396]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug  5 12:39:48 ubu14 dbus[406]: [system] Activating service name='org.freedesktop.hostname1' (using servicehelper)
Aug  5 12:39:48 ubu14 dbus[406]: [system] Successfully activated service 'org.freedesktop.hostname1'
Aug  5 12:39:48 ubu14 kernel: [ 4640.348070] systemd-hostnamed[2535]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
Aug  5 12:46:49 ubu14 dbus[406]: [system] Activating service name='org.freedesktop.hostname1' (using servicehelper)
Aug  5 12:46:49 ubu14 dbus[406]: [system] Successfully activated service 'org.freedesktop.hostname1'
Aug  5 12:46:49 ubu14 kernel: [ 5060.809106] systemd-hostnamed[2608]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
```

The log lines we see follow a similar pattern. First, they include the date and time when the entry was added to the file, then the name of the computer, then the name and PID of the process that triggered the event, and finally, the actual event that's being logged. Take a second and look at those lines.

Say that we had a computer that was under significant load but we didn't know why, and to find out, we wanted to check which events are being logged the most to syslog.

To do that, we need to extract the part of the line that has the actual event without the date and time. We can use a command called `cut` to help us with that. This command lets us take only bits of each line using a field delimiter. In this example, we can split the line using spaces. That would look something like this.

```
$ tail /var/log/syslog | cut -d' ' -f5-
ubu14 kernel: [   40.544803] 00:00:00.000304 main     Package type: LINUX_64BITS_GENERIC
ubu14 kernel: [   40.546277] 00:00:00.001668 main     5.2.34 r133893 started. Verbose level = 0
ubu14 kernel: [   40.708010] init: plymouth-stop pre-start process (2124) terminated with status 1
ubu14 CRON[2396]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
ubu14 dbus[406]: [system] Activating service name='org.freedesktop.hostname1' (using servicehelper)
ubu14 dbus[406]: [system] Successfully activated service 'org.freedesktop.hostname1'
ubu14 kernel: [ 4640.348070] systemd-hostnamed[2535]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
ubu14 dbus[406]: [system] Activating service name='org.freedesktop.hostname1' (using servicehelper)
ubu14 dbus[406]: [system] Successfully activated service 'org.freedesktop.hostname1'
ubu14 kernel: [ 5060.809106] systemd-hostnamed[2608]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
```
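To see how `cut` numbers the fields, we can try it on a single line. This sketch uses one of the log lines from above, stored in a hypothetical `line` variable. Because the day of the month is padded with two spaces, `cut` sees an empty field between them.

```bash
#!/bin/bash
# One log line copied from the output above; the variable names are made up
line='Aug  5 12:17:01 ubu14 CRON[2396]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)'

month=$(echo "$line" | cut -d' ' -f1)   # first field
gap=$(echo "$line" | cut -d' ' -f2)     # empty: sits between the two spaces
day=$(echo "$line" | cut -d' ' -f3)     # day of the month
rest=$(echo "$line" | cut -d' ' -f5-)   # field 5 to the end of the line

echo "month=[$month] gap=[$gap] day=[$day]"
echo "$rest"
```

That empty field is why field 5 turns out to be the computer name rather than the process name.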

In our example, we're passing `-d' '` to `cut` to tell it that we want to use a space as the delimiter, and `-f5-` to tell it that we want to print field number 5 and everything that comes after it. With that, we remove the date and time, keeping the name of the computer, the process, and the event message. Snip snip. (The day of the month is padded with an extra space in these lines, and `cut` counts the empty string between the two spaces as a field, which is why field 5 is the computer name.) Now that we have the information that we care about, we can pipe this to the same pipeline of commands that we saw in an earlier video to find the lines that are repeated the most, like this.

```
$ cut -d' ' -f5- /var/log/syslog | sort | uniq -c | sort -nr | head      
      2 ubu14 pulseaudio[1655]: [pulseaudio] alsa-util.c: Disabling timer-based scheduling because running inside a VM.
      2 ubu14 NetworkManager[704]: <warn> /sys/devices/virtual/net/lo: couldn't determine device driver; ignoring...
      2 ubu14 NetworkManager[704]: <info> Writing DNS information to /sbin/resolvconf
      2 ubu14 NetworkManager[704]:    Ifupdown: get unmanaged devices count: 0
      2 ubu14 kernel: [    0.000000] ACPI: Local APIC address 0xfee00000
      2 ubu14 kernel: [    0.000000] ACPI: FACS 0x000000003FFF0200 000040
      2 ubu14 dbus[406]: [system] Successfully activated service 'org.freedesktop.hostname1'
      2 ubu14 dbus[406]: [system] Activating service name='org.freedesktop.hostname1' (using servicehelper)
      1 ubu14 whoopsie[967]: whoopsie 0.2.24.6ubuntu4 starting up.
      1 ubu14 whoopsie[967]: Using lock path: /var/lock/whoopsie/lock
```
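If the pipeline feels like a lot at once, it can help to run it on a handful of made-up lines. In this sketch the messages are invented: `sort` groups identical lines together, `uniq -c` counts each group, `sort -nr` puts the highest counts first, and `head -2` keeps the top two.

```bash
#!/bin/bash
# Made-up log messages, one per line
messages=$(printf '%s\n' "disk full" "login ok" "disk full" "timeout" "disk full" "login ok")

# Same chain of commands as above, applied to the sample messages
top=$(echo "$messages" | sort | uniq -c | sort -nr | head -2)
echo "$top"
```

Here "disk full" appears three times and "login ok" twice, so those are the two lines that survive.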

As you can see, we've chained together a bunch of commands so that we get the most repeated lines in our syslog file. There are more files in `/var/log` that we might be interested in, so we can use a for loop to iterate over each of the log files in `/var/log` and get the most repeated lines in each of them.

I know what you're thinking. This sounds like it's getting a little bit too complex for a one-line chain of commands. We're better off putting this into a bash script, something like this. 

```bash
#!/bin/bash

for logfile in /var/log/*log; do
  echo "Processing: $logfile"
  cut -d' ' -f5- "$logfile" | sort | uniq -c | sort -nr | head -5
done
```

In this script, we process all files in `/var/log` whose names end in `log`. We print the name of the file that we're processing and then use the same group of commands as before to print the top five lines in each file. Ready? Let's execute it and see this in action.

```
$ ./toploglines.sh 
Processing: /var/log/alternatives.log
      2 with --quiet --install /usr/bin/pager pager /bin/less 77 --slave /usr/share/man/man1/pager.1.gz pager.1.gz /usr/share/man/man1/less.1.gz
      2 with --quiet --install /usr/bin/awk awk /usr/bin/mawk 5 --slave /usr/share/man/man1/awk.1.gz awk.1.gz /usr/share/man/man1/mawk.1.gz --slave /usr/bin/nawk nawk /usr/bin/mawk --slave /usr/share/man/man1/nawk.1.gz nawk.1.gz /usr/share/man/man1/mawk.1.gz
      2 with --install /usr/share/man/man7/builtins.7.gz builtins.7.gz /usr/share/man/man7/bash-builtins.7.gz 10
      2 with --install /usr/share/ghostscript/current ghostscript-current /usr/share/ghostscript/9.26 926
      2 with --install /usr/sbin/rmt rmt /usr/sbin/rmt-tar 50 --slave /usr/share/man/man8/rmt.8.gz rmt.8.gz /usr/share/man/man8/rmt-tar.8.gz
Processing: /var/log/auth.log
      6 ubu14 compiz: gkr-pam: unlocked login keyring
      4 ubu14 lightdm: pam_unix(lightdm-autologin:session): session opened for user brian by (uid=0)
      3 ubu14 polkitd(authority=local): Registered Authentication Agent for unix-session:c1 (system bus name :1.51 [/usr/lib/policykit-1-gnome/polkit-gnome-authentication-agent-1], object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
      2 ubu14 pkexec: pam_unix(polkit-1:session): session opened for user root by (uid=1000)
      1 ubu14 useradd[8667]: failed adding user 'vboxadd', data deleted
Processing: /var/log/boot.log
      4 network device security                                          [ OK ]
      3 Sound Card State                                                 [ OK ]
      3 daemon                                                           [ OK ]
      2 V initialisation compatibility                                   [ OK ]
      2 /tmp directory                                                   [ OK ]
Processing: /var/log/bootstrap.log
    382 ...
    285 
     83 problem!
     40 not installed.
     36 multiarch-support
Processing: /var/log/dpkg.log
    319 brltty:amd64 5.0-2ubuntu2
    200 apparmor:amd64 2.10.95-0ubuntu2.6~14.04.4
     99 ureadahead:amd64 0.100.0-16
     82 libsane:amd64 1.0.23-3ubuntu3.1
     57 unpack
Processing: /var/log/faillog
      1 
Processing: /var/log/fontconfig.log
      9 contents: 0 fonts, 0 dirs
      7 contents: 1 fonts, 0 dirs
      3 contents: 2 fonts, 0 dirs
      3 contents: 0 fonts, 4 dirs
      2 directory
Processing: /var/log/gpu-manager.log
     41 
      5 no
      4 "vboxvideo"
      2 yes
      2 in /lib/modules/4.4.0-148-generic/updates/dkms
Processing: /var/log/kern.log
      8 ubu14 kernel: [    0.000000] ACPI: Local APIC address 0xfee00000
      8 ubu14 kernel: [    0.000000] ACPI: FACS 0x000000003FFF0200 000040
      4 ubu14 kernel: [    0.000000] Zone ranges:
      4 ubu14 kernel: [    0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
      4 ubu14 kernel: [    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC  
Processing: /var/log/lastlog
      1 
Processing: /var/log/pm-powersave.log
    152 
      8 to on
      4 snd_ac97_codec to 0...Done.
      4 host2 to max_performance...Done.
      4 for eth0 to enable...Done.
Processing: /var/log/syslog
      2 ubu14 pulseaudio[1655]: [pulseaudio] alsa-util.c: Disabling timer-based scheduling because running inside a VM.
      2 ubu14 NetworkManager[704]: <warn> /sys/devices/virtual/net/lo: couldn't determine device driver; ignoring...
      2 ubu14 NetworkManager[704]: <info> Writing DNS information to /sbin/resolvconf
      2 ubu14 NetworkManager[704]:    Ifupdown: get unmanaged devices count: 0
      2 ubu14 kernel: [    0.000000] ACPI: Local APIC address 0xfee00000
Processing: /var/log/ubuntu-advantage.log
cut: /var/log/ubuntu-advantage.log: Permission denied
Processing: /var/log/vboxadd-install.log
      1 
Processing: /var/log/vboxadd-setup.log
      1 restart the Window System (or just restart the guest system)
      1 modules
      1 Additions.
Processing: /var/log/Xorg.0.log
      5 10.962] 	Entry deleted from font path.
      3 11.261] 	Module class: X.Org Video Driver
      3 11.261] 	ABI class: X.Org Video Driver, version 20.0
      3 
      2 12.328] (II) This device may have been added with another device file.
```
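Looking at that output, one file couldn't be read because of its permissions, and a few others, like `lastlog`, don't produce much that's useful. Here's one way the script might be made a little more defensive. This is only a sketch: the `top_lines` function name is made up, and skipping anything that isn't a readable regular file is an assumption about what we'd want.

```bash
#!/bin/bash
# top_lines DIR: hypothetical helper that prints the five most repeated
# lines (from field 5 onward) of each readable *log file in DIR
top_lines() {
  local dir=$1
  local logfile
  for logfile in "$dir"/*log; do
    # Skip anything that is not a regular file we are allowed to read
    if [ ! -f "$logfile" ] || [ ! -r "$logfile" ]; then
      echo "Skipping: $logfile" >&2
      continue
    fi
    echo "Processing: $logfile"
    cut -d' ' -f5- "$logfile" | sort | uniq -c | sort -nr | head -5
  done
}
```

Calling `top_lines /var/log` would then process the same directory as the script above, reporting skipped files on standard error instead of letting `cut` fail.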