# [The Unix School: awk](https://www.theunixschool.com/p/awk-sed.html)
- Read & split file contents
- Pass arguments or shell variables
- Pattern matching

In [1]:
!awk

Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:		GNU long options: (standard)
	-f progfile		--file=progfile
	-F fs			--field-separator=fs
	-v var=val		--assign=var=val
Short options:		GNU long options: (extensions)
	-b			--characters-as-bytes
	-c			--traditional
	-C			--copyright
	-d[file]		--dump-variables[=file]
	-D[file]		--debug[=file]
	-e 'program-text'	--source='program-text'
	-E file			--exec=file
	-g			--gen-pot
	-h			--help
	-i includefile		--include=includefile
	-l library		--load=library
	-L[fatal|invalid|no-ext]	--lint[=fatal|invalid|no-ext]
	-M			--bignum
	-N			--use-lc-numeric
	-n			--non-decimal-data
	-o[file]		--pretty-print[=file]
	-O			--optimize
	-p[file]		--profile[=file]
	-P			--posix
	-r			--re-interval
	-s			--no-optimize
	-S			--sandbox
	-t			--lint-old
	-V			--version

To report bugs, see node `Bugs' in `gawk.info'
which is section `Reporting Problems and Bugs' in t

In [2]:
!awk -W version

GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2020 Free Software Foundation.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.


### Lesson 1: read a file & split the contents

In [3]:
!cat file1

Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration

In [4]:
# print *only* the names, then domains, in the file
!awk '{print $1}' file1
!awk '{print $2}' file1

Name
Deepak
Neha
Vijay
Guru
Domain
Banking
Telecom
Finance
Migration


In [5]:
# print the names without the header record
# NR = number of the current record (line); NR!=1 skips the 1st line.
!awk 'NR!=1{print $1}' file1

Deepak
Neha
Vijay
Guru


In [6]:
# print entire file contents - $0 = entire line.
!awk '{print $0}' file1

Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration


In [7]:
# another way of printing everything ('1' = true for every line.)
!awk '1' file1

Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration


In [8]:
!cat file2

Name,Domain,Expertise
Deepak,Banking,MQ Series
Neha,Telecom,Power Builder
Vijay,Finance,CRM Expert
Guru,Migration,Unix

In [9]:
# print 1st column of a .CSV file
# awk uses whitespace as a default delimiter.
# .CSV is comma delimited, so we need to specify that.

!awk -F"," '{print $1}' file2

Name
Deepak
Neha
Vijay
Guru


In [10]:
# (alternate syntax using the FS variable - 1st & 3rd columns)
!awk  '{print $1,$3}' FS="," file2

Name Expertise
Deepak MQ Series
Neha Power Builder
Vijay CRM Expert
Guru Unix


In [11]:
# the 3rd column contains multiple words, so space-separated output is hard to read.
# set the OFS (output field separator) variable to a comma to separate the output fields,
# and omit the header with NR.
!awk -F"," 'NR!=1{print $1,$3}' OFS="," file2

Deepak,MQ Series
Neha,Power Builder
Vijay,CRM Expert
Guru,Unix
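
FS and OFS can also be assigned inside the program, in a BEGIN block, instead of on the command line. A minimal sketch, assuming the same rows as file2 are fed on standard input (the `csv_data` and `begin_out` shell variables are made up for this example):

```shell
# Same rows as file2, held in a shell variable (hypothetical name) for a self-contained run.
csv_data='Name,Domain,Expertise
Deepak,Banking,MQ Series
Neha,Telecom,Power Builder
Vijay,Finance,CRM Expert
Guru,Migration,Unix'

# BEGIN runs before any input is read, so FS and OFS apply to every record.
begin_out=$(printf '%s\n' "$csv_data" |
  awk 'BEGIN { FS = ","; OFS = "," } NR != 1 { print $1, $3 }')

printf '%s\n' "$begin_out"
```

This should print the same four lines as the `-F"," ... OFS=","` form above; which spelling to use is mostly a matter of taste.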


### Lesson 2: passing arguments or shell variables to awk

In [12]:
# quoting file content
!cat file2

Name,Domain,Expertise
Deepak,Banking,MQ Series
Neha,Telecom,Power Builder
Vijay,Finance,CRM Expert
Guru,Migration,Unix

In [13]:
!awk -v q="'" '{print q $0 q}' file2

'Name,Domain,Expertise'
'Deepak,Banking,MQ Series'
'Neha,Telecom,Power Builder'
'Vijay,Finance,CRM Expert'
'Guru,Migration,Unix'


In [14]:
# double-quoting file contents
!awk '{print q $0 q}' q='"' file2

"Name,Domain,Expertise"
"Deepak,Banking,MQ Series"
"Neha,Telecom,Power Builder"
"Vijay,Finance,CRM Expert"
"Guru,Migration,Unix"
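
The same `-v` mechanism carries a shell variable into awk. A minimal sketch, assuming the rows of file2 on standard input; `target`, `csv_data`, and `match_out` are hypothetical names chosen for illustration:

```shell
# Same rows as file2, held in a shell variable for a self-contained run.
csv_data='Name,Domain,Expertise
Deepak,Banking,MQ Series
Neha,Telecom,Power Builder
Vijay,Finance,CRM Expert
Guru,Migration,Unix'

# A shell variable (hypothetical) holding the name to look up.
target="Neha"

# -v copies the shell value into the awk variable "name" before the program starts.
match_out=$(printf '%s\n' "$csv_data" |
  awk -F, -v name="$target" '$1 == name { print $3 }')

printf '%s\n' "$match_out"
```

Keeping the awk program in single quotes and passing the value with `-v` avoids splicing shell text into the program itself.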


### Lesson 3: matching file patterns in Linux

In [15]:
!cat file3

Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600

In [16]:
# match only the records containing 'Rent'
!awk '/Rent/' file3

Rent,900


In [17]:
# match a pattern only in the 1st column
!awk -F, '$1 ~ /Rent/' file3

Rent,900


In [18]:
# The regex above would also match "Rents". For an exact match, compare with ==:
!awk -F, '$1=="Rent"' file3

Rent,900
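
file3 has no "Rents" record, so the difference between the two forms is not visible in the output above. A small sketch with a made-up `Rents,100` row added (the row and the variable names are assumptions, not part of file3):

```shell
# Sample rows; "Rents,100" is a hypothetical record added to show the difference.
data='Medicine,200
Rent,900
Rents,100'

# Regex match: true for any first field that contains "Rent".
regex_out=$(printf '%s\n' "$data" | awk -F, '$1 ~ /Rent/')

# String comparison: true only when the first field is exactly "Rent".
exact_out=$(printf '%s\n' "$data" | awk -F, '$1 == "Rent"')

printf 'regex:\n%s\nexact:\n%s\n' "$regex_out" "$exact_out"
```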


In [19]:
# print only the 2nd column for all "Medicine" records:
!awk -F, '$1 == "Medicine"{print $2}' file3

200
600


In [20]:
# match records containing either "Rent" or "Medicine"
!awk '/Rent|Medicine/' file3

Medicine,200
Rent,900
Medicine,600


In [21]:
# match the above pattern only in the first column:
!awk -F, '$1 ~ /Rent|Medicine/' file3

Medicine,200
Rent,900
Medicine,600


In [22]:
# match only when the first column is exactly "Rent" or "Medicine":
!awk -F, '$1 ~ /^Rent$|^Medicine$/' file3

Medicine,200
Rent,900
Medicine,600
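
The two anchored alternatives can also be written with a single pair of anchors around a group. A brief sketch, using a few rows in the style of file3 plus a hypothetical `Rents,100` row to show that the anchors reject partial matches:

```shell
# Sample rows; "Rents,100" is a hypothetical record that must NOT match.
data='Medicine,200
Grocery,500
Rent,900
Rents,100'

# ^(...)$ anchors the whole alternation, so the first field must equal one alternative.
grouped_out=$(printf '%s\n' "$data" | awk -F, '$1 ~ /^(Rent|Medicine)$/')

printf '%s\n' "$grouped_out"
```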


In [23]:
# lines which do not contain the pattern Medicine:
!awk '!/Medicine/' file3

Grocery,500
Rent,900
Grocery,800


In [24]:
# all records whose amount is greater than 500:
!awk -F, '$2>500' file3

Rent,900
Grocery,800
Medicine,600


In [25]:
# print the Medicine record only if it is the 1st record (&& = logical AND):
!awk 'NR==1 && /Medicine/' file3

Medicine,200


In [26]:
# all Medicine records whose amount is greater than 500:
!awk -F, '/Medicine/ && $2>500' file3

Medicine,600


In [27]:
# all records that are Medicine OR whose amount is greater than 600 (|| = logical OR):
!awk -F, '/Medicine/ || $2>600' file3

Medicine,200
Rent,900
Grocery,800
Medicine,600


### Lesson 4: Join or merge lines on finding a pattern

In [28]:
!cat file4

START
Unix
Linux
START
Solaris
Aix
SCO

In [29]:
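# join the lines that follow each START, without a delimiter
# - lines following START are printed without a trailing newline, so they accumulate on one line;
# - on the next START, a newline is printed to end the previous group (skipped for the very 1st line).
# - the first {} block runs only on lines containing START.
# - "next" prevents the rest of the program from running on START lines.
# - printf "%s" is used so that a "%" in the data is not treated as a format specifier.

!awk '/START/{if (NR!=1)print "";next}{printf "%s", $0}END{print "";}' file4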

UnixLinux
SolarisAixSCO
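
A variation on the same idea joins each group with a delimiter. A minimal sketch, assuming the contents of file4 on standard input; the `sep` awk variable and the `line` accumulator are names picked for this example:

```shell
# Same rows as file4, held in a shell variable for a self-contained run.
data='START
Unix
Linux
START
Solaris
Aix
SCO'

# Build each group in "line"; flush it when the next START (or end of input) arrives.
joined_out=$(printf '%s\n' "$data" | awk -v sep="," '
  /START/ { if (line != "") print line; line = ""; next }
          { line = (line == "" ? $0 : line sep $0) }
  END     { if (line != "") print line }')

printf '%s\n' "$joined_out"
```

Accumulating into a variable, rather than printing piece by piece, makes it easy to place the delimiter only between items and not after the last one.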


### Lesson 5: grouping CSV or text file data

In [31]:
# TODO

### Lesson 6: splitting files

In [32]:
# TODO

### Lesson 7: reading files with multiple delimiters

In [33]:
# TODO

### Lesson 8: accessing awk variables in a shell

In [34]:
# TODO

### Lesson 9: inserting, removing, updating CSV file data

In [35]:
# TODO

### Lesson 10: (gawk) time & date math

In [36]:
# TODO

### Lesson 11: time between datestamps

In [37]:
# file format: columns are separated by commas; the date & time parts within a column are separated by spaces.
# column 1 = process name
# column 2 = process start time
# column 3 = process end time
!cat file11

P1,2012 12 4 21 36 48,2012 12 4 22 26 53
P2,2012 12 4 20 36 48,2012 12 4 21 21 23
P3,2012 12 4 18 36 48,2012 12 4 20 12 35

In [38]:
# time difference, in seconds
# mktime (a gawk extension) returns the Unix time for a "YYYY MM DD HH MM SS" string.
# -F, is required so that $2 and $3 are the start and end timestamps.
!awk -F, '{d2=mktime($3);d1=mktime($2);print $1","d2-d1,"secs";}' file11

P1,3005 secs
P2,2675 secs
P3,5747 secs
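
`mktime` is specific to gawk. Where only a POSIX awk is available, and assuming both timestamps of a record fall on the same calendar day (true for the rows of file11), the difference can be sketched with plain arithmetic on the hour, minute, and second parts. The `secs` helper and the shell variable names below are made up for this example; this is a stand-in for `mktime`, not an equivalent, since it ignores the date part entirely:

```shell
# Same rows as file11, held in a shell variable for a self-contained run.
log='P1,2012 12 4 21 36 48,2012 12 4 22 26 53
P2,2012 12 4 20 36 48,2012 12 4 21 21 23
P3,2012 12 4 18 36 48,2012 12 4 20 12 35'

# secs() converts "YYYY MM DD HH MM SS" to seconds since midnight (date part ignored).
diff_out=$(printf '%s\n' "$log" | awk -F, '
  function secs(ts,   t) { split(ts, t, " "); return t[4] * 3600 + t[5] * 60 + t[6] }
  { print $1 "," (secs($3) - secs($2)), "secs" }')

printf '%s\n' "$diff_out"
```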
