## AWK

- scans input files line by line for lines that match a pattern
- performs the action described in the AWK body on each matching line
- performs the action on every line when no pattern is given

### Basic Structure

awk 'program_you_will_write' input-file1 input-file2 ...

awk 'BEGIN{code_in_BEGIN_section}   {code_in_main_body_section}   END{code_in_END_section}' input-file1 input-file2 ...

In [1]:
!awk 'BEGIN{print "Hello"}' # The BEGIN section runs once, before any input line is processed

Hello


In [2]:
# !awk '{print "Hello"}' # Prints Hello after every line typed on stdin, so it waits for input (please run in a terminal)

In [3]:
!echo hello 1 2 3 | awk 'BEGIN{print "Hello"}' # BEGIN runs once regardless of the input, so Hello is printed only one time

Hello


In [4]:
!cat hello.txt

first line 1
second line 2
third line 3

In [5]:
!awk '{print "Hello"}' hello.txt

Hello
Hello
Hello


In [6]:
!cat hello.txt | awk '{print "Hello"}'

Hello
Hello
Hello


In [7]:
!awk 'BEGIN{print "Starting"} {print "Hello"}' hello.txt

Starting
Hello
Hello
Hello


In [8]:
!awk 'BEGIN{print "Starting"} {print "Hello"} END{print "Ending"}' hello.txt

Starting
Hello
Hello
Hello
Ending
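
To make the execution order concrete, here is a minimal sketch that sends three lines through all three sections. The names `order` and `n` are illustrative only, and the input is generated with `printf` rather than read from `hello.txt`, so the block runs on its own.

```shell
# BEGIN runs once before any input, the main body runs once per line,
# and END runs once after the last line has been read.
order=$(printf 'first line 1\nsecond line 2\nthird line 3\n' |
  awk 'BEGIN { print "header" } { n++ } END { print n " lines" }')
echo "$order"
```

The main body prints nothing here; it only counts, and END reports the count.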


## Fields

In [9]:
!cat hello.txt | awk '{print $0}' # $0 is the entire line

first line 1
second line 2
third line 3


In [10]:
!cat hello.txt | awk '{print $1}' # $1 will print the first field

first
second
third


In [11]:
!cat hello.txt | awk '{print $2}' # $2 will print the second field and so on

line
line
line


In [12]:
# Combining literal text with fields
!cat hello.txt | awk '{print $1 " is " $3 " adding at the end"}' # Strings placed side by side are concatenated, so text can go between or after fields

first is 1 adding at the end
second is 2 adding at the end
third is 3 adding at the end
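
When the output needs fixed-width columns, `printf` is one option alongside plain concatenation. The sketch below assumes the same three-line input as `hello.txt`, generated inline; the variable name `table` and the 8-character width are arbitrary choices for illustration.

```shell
# printf gives explicit control over layout; unlike print, it adds no newline,
# so the format string ends with \n.
# %-8s left-aligns the first field in an 8-character column.
table=$(printf 'first line 1\nsecond line 2\nthird line 3\n' |
  awk '{ printf "%-8s|%s\n", $1, $3 }')
echo "$table"
```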


## Searching Patterns

In [13]:
!cat employees.txt

name    age     UNIT
Peter   50      IT
Jane    30      HR
John    25      IT
Andreas 45      HR

In [14]:
!cat employees.txt | awk '/n/' # Print every line that contains the letter n

name    age     UNIT
Jane    30      HR
John    25      IT
Andreas 45      HR


In [15]:
!cat employees.txt | awk '/Peter/ {print $1 " " $3}' # For lines containing Peter, print fields 1 and 3

Peter IT


In [16]:
!cat employees.txt | awk '/\sIT/ {print $1 " " $3}'  # For lines matching whitespace followed by IT, print fields 1 and 3 (\s is a gawk extension; [[:space:]] is portable)

Peter IT
John IT


In [17]:
!cat employees.txt | awk '!/IT/ {print $1 " " $3}'  # For lines without IT, print fields 1 and 3; the header is skipped too because UNIT contains IT

Jane HR
Andreas HR


In [18]:
!cat employees.txt | awk '!/^name/' | awk '/IT/' # remove header and print only IT people

Peter   50      IT
John    25      IT


In [19]:
!cat employees.txt | awk '$1 == "Peter"' # Compare only the first field against Peter, not the whole line

Peter   50      IT


In [20]:
!cat employees.txt | awk '$1 != "name"' # Keep lines whose first field is not name, which drops the header

Peter   50      IT
Jane    30      HR
John    25      IT
Andreas 45      HR
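
Patterns can also target a single field with `~` and be combined with `&&`. The following is a sketch under the assumption that the data looks like `employees.txt` above; the shell variables `employees`, `it_staff` and `over_30` are hypothetical names, and the age threshold is arbitrary.

```shell
# Hypothetical inline copy of employees.txt so the example is self-contained.
employees='name    age     UNIT
Peter   50      IT
Jane    30      HR
John    25      IT
Andreas 45      HR'

# $3 ~ /^IT$/ tests only field 3 and anchors the match,
# so the header field "UNIT" is not matched.
it_staff=$(printf '%s\n' "$employees" | awk '$3 ~ /^IT$/ { print $1 }')

# && combines conditions; $2 >= 30 is a numeric comparison on the age field.
over_30=$(printf '%s\n' "$employees" | awk '$1 != "name" && $2 >= 30 { print $1 }')

echo "$it_staff"
echo "$over_30"
```

Matching on a field instead of the whole line avoids accidental hits such as `/IT/` matching `UNIT`.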


## Printing out number of fields

In [21]:
!cat employees.txt | awk '{print NF}'

3
3
3
3
3


In [22]:
!echo "1 2 3 4" | awk '{print NF}'

4


In [23]:
!echo "one two three four" | awk '{print $NF}' # NF is 4 here, so $NF is the same as $4: the last field, which is four

four


In [24]:
!echo "one two three four" | awk '{print $(NF - 1)}'

three
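
One common use of `NF` is as a pattern, to spot lines with an unexpected number of fields. This is a small sketch with made-up input; the expected count of 3 and the variable name `bad_rows` are assumptions for illustration.

```shell
# Print the field count and the line itself for every line
# that does not have exactly 3 fields.
bad_rows=$(printf 'a b c\nd e\nf g h i\n' |
  awk 'NF != 3 { print NF ": " $0 }')
echo "$bad_rows"
```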


## Number of records

In [25]:
!echo "one two three four" | awk '{print NR}' # only one line

1


In [26]:
!cat employees.txt | awk '{print NR}' # One record number per line, 1 through 5

1
2
3
4
5


In [27]:
!cat employees.txt | awk 'END{print NR}' # In END, NR holds the total number of records read

5


In [28]:
!cat employees.txt | awk 'END{print NF}' # only print NF for the last line at the end of the awk program

3
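
`NR` also works as a pattern. As a sketch, `NR > 1` skips a header line without depending on its text; the shortened input and the variable name `numbered` are illustrative assumptions.

```shell
# NR > 1 skips the first record (the header).
# NR - 1 then numbers the remaining rows starting from 1.
numbered=$(printf 'name age UNIT\nPeter 50 IT\nJane 30 HR\n' |
  awk 'NR > 1 { print NR - 1, $1 }')
echo "$numbered"
```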


## Field Separator

In [29]:
!cat employees_field_seperator.txt

name,age,UNIT
Peter,50,IT
Jane,30,HR
John,25,IT
Andreas,45,HR

In [30]:
!cat employees_field_seperator.txt | awk 'BEGIN{FS=","} {print $1 " " $3}'

name UNIT
Peter IT
Jane HR
John IT
Andreas HR


In [31]:
!cat employees_field_seperator.txt | awk -F "," '{print $1 " " $3}' # Same as above with -F option

name UNIT
Peter IT
Jane HR
John IT
Andreas HR
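
The counterpart of `FS` on the output side is `OFS`, the output field separator. The sketch below converts comma-separated input to semicolon-separated output; the two-line input and the variable name `converted` are assumptions for illustration.

```shell
# -F, sets the input separator; -v OFS=';' sets the output separator.
# Assigning $1 = $1 makes awk rebuild $0 using OFS before printing.
converted=$(printf 'name,age,UNIT\nPeter,50,IT\n' |
  awk -F, -v OFS=';' '{ $1 = $1; print }')
echo "$converted"
```

Without the `$1 = $1` assignment, `print` would output the original line unchanged, because `$0` is only rebuilt when a field is modified.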


## Record Separator

In [32]:
!cat employees_field_seperator.txt | awk 'BEGIN{RS=",IT"} {print $0}' # Every ",IT" is a record separator (a multi-character RS is an extension, supported by gawk among others)

name,age,UNIT
Peter,50

Jane,30,HR
John,25

Andreas,45,HR


In [33]:
!cat employees_field_seperator.txt | awk 'BEGIN{RS=",IT"} {print $0} END{print NR " is the number of total records"}' # Every ",IT" is a record separator

name,age,UNIT
Peter,50

Jane,30,HR
John,25

Andreas,45,HR
3 is the number of total records


In [34]:
!cat employees_field_seperator.txt | awk 'BEGIN{RS=",IT" ; FS=","} {print $0} END{print NR " is the number of total records"}' # Multiple statements in BEGIN are separated with ;

name,age,UNIT
Peter,50

Jane,30,HR
John,25

Andreas,45,HR
3 is the number of total records
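
A portable use of `RS` is paragraph mode: setting `RS` to the empty string makes blank lines the record separator. The sketch below assumes a made-up input of two blank-line-separated blocks; the variable name `people` is illustrative.

```shell
# RS = "" switches to paragraph mode: records are separated by blank lines.
# FS = "\n" then makes every line of a paragraph one field.
people=$(printf 'name: Peter\nunit: IT\n\nname: Jane\nunit: HR\n' |
  awk 'BEGIN { RS = ""; FS = "\n" } { print $1 " / " $2 } END { print NR " records" }')
echo "$people"
```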


## Variable Assignment
### Mathematical operations

- Increment / decrement
   1. a++ (a--)
   2. a=a+1 (a=a-1)
- Classical math operations
   1. a=b+c
   2. a=b*c
   3. a=b/c
   4. a=b-c
- Compound assignment
   1. var += increment
   2. var -= decrement
   3. var *= coefficient
   4. var /= divisor
   5. var %= modulus
   6. var ^= power-number
   7. var **= power-number (same as ^=, but not part of POSIX awk)

In [35]:
!cat employees_field_seperator.txt | awk 'BEGIN{count=0} {count++} END{print NR, count}'

5 5
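
The operators above can be combined to accumulate values across lines. As a sketch, the following sums and averages the age column of the comma-separated data shown earlier (generated inline here); `sum`, `n` and the shell variable `stats` are illustrative names.

```shell
# Variables start out empty (treated as 0 in arithmetic), so no BEGIN is needed.
# NR > 1 skips the header; sum += $2 accumulates the age field.
stats=$(printf 'name,age,UNIT\nPeter,50,IT\nJane,30,HR\nJohn,25,IT\nAndreas,45,HR\n' |
  awk -F, 'NR > 1 { sum += $2; n++ } END { print sum, sum / n }')
echo "$stats"
```

The output is the total followed by the average of the four ages.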


## If Else

In [36]:
!cat employees_field_seperator.txt | awk 'BEGIN{FS=","} {if ($3 == "IT") print $0}'

Peter,50,IT
John,25,IT


In [37]:
!cat employees_field_seperator.txt | awk 'BEGIN{FS=","} {if ($3 == "IT") {print $0} else {print "Field 3 is not IT"}}'

Field 3 is not IT
Peter,50,IT
Field 3 is not IT
John,25,IT
Field 3 is not IT
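
Conditions can be chained with `else if`. The sketch below assigns each employee to an age band; the thresholds and band labels are arbitrary values chosen for illustration, and the input is an inline copy of the comma-separated data.

```shell
# The first branch whose condition is true wins; the final else is the fallback.
bands=$(printf 'name,age,UNIT\nPeter,50,IT\nJane,30,HR\nJohn,25,IT\nAndreas,45,HR\n' |
  awk -F, 'NR > 1 {
    if ($2 >= 45)      band = "45+"
    else if ($2 >= 30) band = "30-44"
    else               band = "under 30"
    print $1 ": " band
  }')
echo "$bands"
```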


## For Loop

In [38]:
!awk 'BEGIN{for (i=1;i<=5;i++) {print "hello", i}}'

hello 1
hello 2
hello 3
hello 4
hello 5


In [39]:
!cat employees_field_seperator.txt | awk -F "," '{for (i=1 ; i<=NF ; i++) {print "field:",i,$i}}'

field: 1 name
field: 2 age
field: 3 UNIT
field: 1 Peter
field: 2 50
field: 3 IT
field: 1 Jane
field: 2 30
field: 3 HR
field: 1 John
field: 2 25
field: 3 IT
field: 1 Andreas
field: 2 45
field: 3 HR
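
A loop over fields does not have to count upward. As a sketch, counting down from `NF` reverses the fields of a line; `out` and `reversed` are illustrative names.

```shell
# Start with the last field, then prepend nothing and append each
# earlier field in turn, walking from NF - 1 down to 1.
reversed=$(echo "one two three four" |
  awk '{ out = $NF; for (i = NF - 1; i >= 1; i--) out = out " " $i; print out }')
echo "$reversed"
```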
