## SED

-  is used to perform basic text transformations on an input stream (a file or standard input)
-  is more efficient than a scripted interactive editor such as `ed`, because it makes only one pass over the input

In [2]:
!man sed


SED(1)                    BSD General Commands Manual                   SED(1)

NNAAMMEE
     sseedd -- stream editor

SSYYNNOOPPSSIISS
     sseedd [--EEaallnn] _c_o_m_m_a_n_d [_f_i_l_e _._._.]
     sseedd [--EEaallnn] [--ee _c_o_m_m_a_n_d] [--ff _c_o_m_m_a_n_d___f_i_l_e] [--ii _e_x_t_e_n_s_i_o_n] [_f_i_l_e _._._.]

DDEESSCCRRIIPPTTIIOONN
     The sseedd utility reads the specified files, or the standard input if no
     files are specified, modifying the input as specified by a list of com-
     mands.  The input is then written to the standard output.

     A single command may be specified as the first argument to sseedd.  Multiple
     commands may be specified by using the --ee or --ff options.  All commands
     are applied to the input in the order they are specified regardless of
     their origin.

     The following options are available:

     --

In [1]:
!cat input.txt

hello world
world world

world

In [11]:
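!sed 's/ world/ hello/' input.txt > output.txt  # replace the first " world" on each line with " hello"

In [12]:
!sed 's/ world/hello/' input.txt > output.txt  # same, but the replacement drops the leading space

In [14]:
!sed 's/ world//' input.txt > output.txt  # empty replacement: delete the first " world" on each line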

In [15]:
!cat output.txt

hello
world

world


In [None]:
%%bash
# equivalent ways to feed the input: redirect the file to standard input, or pipe it in
sed 's/hello/world/' < input.txt > output.txt
cat input.txt | sed 's/hello/world/' > output.txt
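
By default `s` replaces only the first match on each line; the `g` flag replaces every match. A small self-contained sketch, assuming a POSIX shell; the sample is typed inline (same text as input.txt) so no file is needed, and `earth` is an arbitrary replacement:

```shell
# Sample lines typed inline (same text as input.txt above)
sample='hello world
world world

world'

# Without a flag, s replaces only the first match on each line
first=$(printf '%s\n' "$sample" | sed 's/world/earth/')

# With the g flag, s replaces every match on each line
all=$(printf '%s\n' "$sample" | sed 's/world/earth/g')

printf '%s\n' "$first"
printf '%s\n' "$all"
```

Only the second line differs between the two results, since it is the only line with two matches.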

In [16]:
# print only line 2 (-n turns off sed's default printing of every line)
!sed -n '2p' input.txt

world world


In [17]:
!sed -n '1p' input.txt

hello world
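
Besides a single line number, an address can be a range or a regular expression, and `d` deletes the addressed lines. A sketch on the same inline sample, assuming a POSIX shell (variable names are illustrative):

```shell
sample='hello world
world world

world'

# A range address: print lines 1 through 2 (inclusive)
range=$(printf '%s\n' "$sample" | sed -n '1,2p')

# A regex address: print only lines containing "hello"
match=$(printf '%s\n' "$sample" | sed -n '/hello/p')

# d deletes the addressed lines; ^$ matches an empty line
nonempty=$(printf '%s\n' "$sample" | sed '/^$/d')

printf '%s\n' "$range" "$match" "$nonempty"
```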


#### sed treats multiple input files as a single stream

In [18]:
!cat one.txt

one is one

In [19]:
!cat two.txt

one is one
two is two

In [20]:
!cat three.txt

one is one
two is two
three is three

In [21]:
!sed -n '1p ; $p' one.txt two.txt three.txt  # 1p prints line 1 of the combined stream (from one.txt); $p prints its last line (from three.txt)

one is one
three is three
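
Because the files form one stream, line numbers keep counting across file boundaries; to address lines within each file, one option is to run sed once per file. A sketch that recreates the three sample files in a temporary directory (the directory and loop are illustrative), assuming a POSIX shell with `mktemp -d`:

```shell
# Recreate the three sample files in a temporary directory
dir=$(mktemp -d)
printf 'one is one\n' > "$dir/one.txt"
printf 'one is one\ntwo is two\n' > "$dir/two.txt"
printf 'one is one\ntwo is two\nthree is three\n' > "$dir/three.txt"

# Line numbers run across files: line 3 of the stream is line 2 of two.txt
third=$(sed -n '3p' "$dir/one.txt" "$dir/two.txt" "$dir/three.txt")

# One sed per file, so $ means "last line of this file"
lasts=$(for f in "$dir/one.txt" "$dir/two.txt" "$dir/three.txt"; do sed -n '$p' "$f"; done)

rm -r "$dir"
echo "$third"
echo "$lasts"
```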


In [22]:
!cat sed.txt

* is a command

** is a command

* * is a command

**** is a command ****



In [23]:
!sed 's/\*/sed/g' sed.txt  # \* matches a literal asterisk; g replaces every match on the line

sed is a command

sedsed is a command

sed sed is a command

sedsedsedsed is a command sedsedsedsed



In [25]:
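!sed 's/*/sed/' sed.txt  # no g flag: only the first asterisk on each line is replaced (a * at the start of a basic regex is literal)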

sed is a command

sed* is a command

sed * is a command

sed*** is a command ****
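
Outside a bracket expression, `*` after a character means "zero or more" of it, which is why it usually needs escaping. A sketch showing two other ways to handle asterisks, assuming a POSIX shell; the input line is copied from sed.txt and `STAR` is an arbitrary replacement:

```shell
line='** is a command'

# [*] is a bracket expression: inside brackets, * is always literal
bracket=$(printf '%s\n' "$line" | sed 's/[*]/sed/g')

# \*\** is one literal * followed by zero or more *, so the whole run matches at once
run=$(printf '%s\n' "$line" | sed 's/\*\**/STAR/')

echo "$bracket"
echo "$run"
```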



## AWK
-  scans files for lines that match a specified pattern
-  when a line matches the pattern, awk performs an action on that line

> **Syntax** <br>
awk options 'selection_criteria {action}' input-file > output-file

In [26]:
!man awk

AWK(1)                                                                  AWK(1)



awk

NNAAMMEE
       awk - pattern-directed scanning and processing language

SSYYNNOOPPSSIISS
       aawwkk [ --FF _f_s ] [ --vv _v_a_r_=_v_a_l_u_e ] [ _'_p_r_o_g_' | --ff _p_r_o_g_f_i_l_e ] [ _f_i_l_e _._._.  ]

DDEESSCCRRIIPPTTIIOONN
       _A_w_k scans each input _f_i_l_e for lines that match any of a set of patterns
       specified literally in _p_r_o_g or in one or more  files  specified  as  --ff
       _p_r_o_g_f_i_l_e.   With  each  pattern  there can be an associated action that
       will be performed when a line of a _f_i_l_e matches the pattern.  Each line
       is  matched  against the pattern portion of every pattern-action state-
       ment; the associated action is performed for each matched pattern.  The
       file  name  -- means the standard input.  Any _f_i_l_e of the 

       where  a  relop  is  any  of  the  six relational operators in C, and a
       matchop is either ~~ (matches) or !!~~ (does not match).  A conditional is
       an  arithmetic expression, a relational expression, or a Boolean combi-
       nation of these.

       The special patterns BBEEGGIINN and EENNDD  may  be  used  to  capture  control
       before  the first input line is read and after the last.  BBEEGGIINN and EENNDD
       do not combine with other patterns.

       Variable names with special meanings:

       CCOONNVVFFMMTT
              conversion format used when converting numbers (default %%..66gg)

       FFSS     regular expression used to separate  fields;  also  settable  by
              option --FF_f_s_.

       NNFF     number of fields in the current record

       NNRR     ordinal number of the current record

       FFNNRR    ordinal number of the current record in the current file

  

In [27]:
!cat employees.txt

Cerebellum manager account 45000
Cerebrum clerk account 25000
Hypothalamus manager sales 50000
Dendrites manager account 47000
Grey clerk sales 15000
White clerk sales 23000


In [28]:
%%bash
awk '{print}' employees.txt

Cerebellum manager account 45000
Cerebrum clerk account 25000
Hypothalamus manager sales 50000
Dendrites manager account 47000
Grey clerk sales 15000
White clerk sales 23000


In [29]:
%%bash
awk '/manager/ {print}' employees.txt

Cerebellum manager account 45000
Hypothalamus manager sales 50000
Dendrites manager account 47000
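
The pattern does not have to be a regex over the whole line: the man page allows relational expressions and the `~` matchop, which can test a single field. A sketch with the employees.txt rows typed inline, assuming a POSIX shell and awk (the 40000 threshold is arbitrary):

```shell
# Same rows as employees.txt, typed inline
data='Cerebellum manager account 45000
Cerebrum clerk account 25000
Hypothalamus manager sales 50000
Dendrites manager account 47000
Grey clerk sales 15000
White clerk sales 23000'

# A relational pattern on field 4: names of people earning more than 40000
high=$(printf '%s\n' "$data" | awk '$4 > 40000 { print $1 }')

# ~ matches a regex against one field only (the department in field 3)
sales=$(printf '%s\n' "$data" | awk '$3 ~ /sales/ { print $1 }')

echo "$high"
echo "$sales"
```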


In [30]:
!awk '{print $1,$4}' employees.txt  # print fields 1 and 4 (name and salary); the comma inserts a space between them

Cerebellum 45000
Cerebrum 25000
Hypothalamus 50000
Dendrites 47000
Grey 15000
White 23000
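
Fields are split on whitespace by default; `-F` sets the input separator (FS), and the `OFS` variable controls what the comma in `print` inserts. A sketch on a hypothetical comma-separated version of two employees.txt rows, assuming a POSIX shell and awk (the ` | ` separator is arbitrary):

```shell
# Hypothetical comma-separated version of two employees.txt rows
csv='Grey,clerk,sales,15000
White,clerk,sales,23000'

# -F',' splits fields on commas; -v OFS=' | ' sets the output field separator
out=$(printf '%s\n' "$csv" | awk -F',' -v OFS=' | ' '{ print $1, $4 }')

echo "$out"
```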


In [31]:
%%bash
### print the number of lines
awk 'END { print NR }' employees.txt

6


In [32]:
%%bash
# in the END block, NF still holds the number of fields in the last line read
awk 'END { print NF }' employees.txt

4
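
The main action runs once per line while END runs once after the last line, which makes END the natural place to report totals or counts accumulated along the way. A sketch on the employees.txt rows typed inline, assuming a POSIX shell and awk (`sum` and `n` are ordinary awk variables, which start at zero):

```shell
data='Cerebellum manager account 45000
Cerebrum clerk account 25000
Hypothalamus manager sales 50000
Dendrites manager account 47000
Grey clerk sales 15000
White clerk sales 23000'

# Accumulate the salary column on every line, print the total once at the end
total=$(printf '%s\n' "$data" | awk '{ sum += $4 } END { print sum }')

# Count only the lines matching a pattern, then report the count
managers=$(printf '%s\n' "$data" | awk '/manager/ { n++ } END { print n }')

echo "$total"
echo "$managers"
```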


#### More info:
https://www.computerhope.com/unix/uawk.htm

## CUT

In [35]:
!man cut


CUT(1)                    BSD General Commands Manual                   CUT(1)

NAME
     cut -- cut out selected portions of each line of a file

SYNOPSIS
     cut -b list [-n] [file ...]
     cut -c list [file ...]
     cut -f list [-d delim] [-s] [file ...]

DESCRIPTION
     The cut utility cuts out selected portions of each line (as specified by
     list) from each file and writes them to the standard output.  If no file
     arguments are specified, or a file argument is a single dash (`-'), cut
     reads from the standard input.  The items specified by list can be in
     terms of column position or in terms of fields delimited by a special
     character.  Column numbering starts from 1.

     The list option argument is a comma or whitespace separated set of num-
     bers and/or number ranges.  Number ranges consist of a number, a dash
     (`-'), and a second number and select the fields or columns from the
     first number to the second, inclusive.  Numbers or number ran

In [36]:
!cat cut_example.txt

cat command for file oriented operations.
cp command for copy files or directories.
ls command to list out files and directories with its attributes.

In [37]:
### select the 2nd character of each line
!cut -c2 cut_example.txt

a
p
s


In [41]:
!cut -f2 cut_demo  ## -f splits on TAB by default, so cut_demo must be tab-separated

world
are
is


In [44]:
!cat cut_example.txt | tr ' ' '\t' | cut -f2  ## tr turns the spaces into tabs so that cut -f can split on them

command
command
command
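
Instead of converting spaces to tabs with tr, the `-d` option from the synopsis sets the field delimiter directly. A sketch with the cut_example.txt lines typed inline, assuming a POSIX shell and cut:

```shell
text='cat command for file oriented operations.
cp command for copy files or directories.
ls command to list out files and directories with its attributes.'

# -d ' ' makes a single space the delimiter, so no tr step is needed
second=$(printf '%s\n' "$text" | cut -d ' ' -f2)

# A list can mix single fields and ranges; "4-" means field 4 to the end of the line
rest=$(printf '%s\n' "$text" | cut -d ' ' -f1,4-)

echo "$second"
echo "$rest"
```

Note that with `-d ' '` every single space separates a field, so two spaces in a row produce an empty field.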


In [45]:
### select a range of characters (positions 1 through 3)
!cut -c1-3 cut_example.txt

cat
cp 
ls 


In [50]:
%%bash
cat cut_example.txt | tr ' ' '\t' | cut -f1-3

cat	command	for
cp	command	for
ls	command	to


In [51]:
%%bash
# take characters 6-7 of each line, then convert them to uppercase
cat cut_example.txt | cut -c6-7 | tr a-z A-Z

OM
MM
MM


## TR
-  translate characters
-  delete characters
-  squeeze repeated characters

In [33]:
!man tr


TR(1)                     BSD General Commands Manual                    TR(1)

NNAAMMEE
     ttrr -- translate characters

SSYYNNOOPPSSIISS
     ttrr [--CCccssuu] _s_t_r_i_n_g_1 _s_t_r_i_n_g_2
     ttrr [--CCccuu] --dd _s_t_r_i_n_g_1
     ttrr [--CCccuu] --ss _s_t_r_i_n_g_1
     ttrr [--CCccuu] --ddss _s_t_r_i_n_g_1 _s_t_r_i_n_g_2

DDEESSCCRRIIPPTTIIOONN
     The ttrr utility copies the standard input to the standard output with sub-
     stitution or deletion of selected characters.

     The following options are available:

     --CC      Complement the set of characters in _s_t_r_i_n_g_1, that is ``--CC ab''
             includes every character except for `a' and `b'.

     --cc      Same as --CC but complement the set of values in _s_t_r_i_n_g_1.

     --dd      Delete characters in _s_t_r_i_n_g_1 from the input.

     --s

In [34]:
%%bash
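### these tr commands read from standard input: run them in a terminal, type some text, and press Ctrl-D to finish
### inside a %%bash cell, the first tr reads the rest of the cell as its input, which is why the output below is the remaining lines in uppercase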
tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

tr [:lower:] [:upper:]

tr a-z A-Z


TR [:LOWER:] [:UPPER:]

TR A-Z A-Z
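
tr maps characters by position: the first character of string1 becomes the first character of string2, and so on, while characters not in string1 pass through unchanged. A sketch of that mapping, including the classic ROT13 example, assuming a POSIX shell and tr (the input strings are arbitrary):

```shell
# Characters map by position: a->1, b->2, c->3; everything else is unchanged
mapped=$(printf 'abcabc xyz\n' | tr 'abc' '123')

# ROT13: each letter maps to the letter 13 places later, wrapping around the alphabet
rot=$(printf 'Hello\n' | tr 'A-Za-z' 'N-ZA-Mn-za-m')

# Applying ROT13 a second time restores the original text
back=$(printf '%s\n' "$rot" | tr 'A-Za-z' 'N-ZA-Mn-za-m')

echo "$mapped" "$rot" "$back"
```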


In [None]:
### translate whitespace to tabs ([:space:] also matches the trailing newline, which becomes a tab)
!echo "Let's write a sentence" | tr '[:space:]' '\t'

In [None]:
### use the -s option to squeeze repeated occurrences of a character into one
!echo "Let's write a sentence" | tr -s '[:space:]' '\t'

In [None]:
### convert runs of spaces into a single space
!echo "This  is  for testing" | tr -s '[:space:]' ' '

In [None]:
### delete specified characters using the -d option
!echo "toy truck" | tr -d 't'

In [None]:
### remove all digits
!echo "the temperature is 97 degrees fahrenheit" | tr -d '[:digit:]'

In [None]:
### remove all characters except digits (-c complements the set, so the newline is removed too)
!echo "the temperature is 97 degrees fahrenheit" | tr -cd '[:digit:]'
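
Combining `-c` and `-s` gives a common idiom for splitting text into one word per line: every run of non-letters is translated to a newline and then squeezed to a single one. A sketch, assuming a POSIX shell and tr (the sentence is the one used above):

```shell
sentence='the temperature is 97 degrees fahrenheit'

# -c complements [:alpha:]; -s squeezes each run of resulting newlines into one
words=$(printf '%s\n' "$sentence" | tr -cs '[:alpha:]' '\n')

echo "$words"
```

The digits disappear because " 97 " is a single run of non-letters, which collapses into one newline.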