# Lesson 2 - Session 2 - String manipulation

### Learning Objectives

1. Learn how to manipulate strings in Bash
2. Learn what regular expressions are and how to use pattern matching in Bash
3. Learn about useful tools like `jq`, `awk`, `sed` and `grep` for string manipulation in Bash
4. See examples of string manipulation in Bash
5. Understand why string manipulation matters for data processing with Bash scripts

## 1. Introduction

While Bash is awkward for arithmetic and general-purpose computation, it is very capable at string manipulation. In this lesson, we will learn how to manipulate strings in Bash, how to use regular expressions and pattern matching, and how tools like `jq`, `awk`, `sed` and `grep` help with string manipulation. We will work through examples and see why string manipulation matters for data processing with Bash scripts.

## 1-1. Why bash script for string manipulation?

While it is true that every programming language can manipulate text, some languages are particularly well-suited for it. Python, for example, is widely used for its built-in regular expression support and data processing features. Perl is also one of the most powerful languages for string manipulation. However, learning Bash scripting for text processing is important as well. Not only is it one of the most commonly used languages for system administration and automation, but a shell is also available on most Unix-like systems by default, without requiring additional installation. Python, Lua, and Perl, on the other hand, often need additional installation and configuration in containers and cloud service VMs, whose images are kept deliberately small.

Of course, Bash scripting is not as powerful as Python or Perl for text processing, but it can still handle most everyday text processing tasks, and it is fast and efficient for simple ones. For example, if you need to extract a few lines from a log file, or a few fields from a JSON or CSV file, Bash scripting is a good choice. It is also very useful for automating repetitive tasks, such as renaming files, searching for files, and so on.
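As a minimal sketch of the two tasks mentioned above, the following filters lines from a log and splits one CSV record into fields. The log lines and the CSV record are invented for illustration, and the CSV handling assumes simple records with no quoted commas.

```bash
#!/usr/bin/env bash
# Hypothetical sample data, invented for this illustration.
log='2024-01-01 10:00:01 INFO service started
2024-01-01 10:00:05 ERROR connection refused
2024-01-01 10:00:09 INFO retrying
2024-01-01 10:00:12 ERROR timeout'

# Keep only the lines that contain the word ERROR.
error_lines=$(grep 'ERROR' <<< "$log")
echo "$error_lines"

# Split one simple CSV record into fields with the shell's own `read`.
csv_line='jane,30,jane@gmail.com'
IFS=',' read -r name age email <<< "$csv_line"
echo "name=$name age=$age email=$email"
```

Setting `IFS` only on the `read` command keeps the change local to that one command, so the rest of the script still splits on whitespace.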

Take the example of a simple configuration parsing and manipulation task. Suppose you have a configuration file `.env` that looks like this:

```bash
export PROXY_PASS="http://localhost:8080;"
export ALLOWED_IPS="192.168.0.0/24 1; 127.0.0.1/32 1; default 0;"
export VPC_IP_RANGE="192.168.0.0/16;"
```

and an nginx template file `nginx.conf.template` that looks like this:

```nginx
user www-data;
worker_processes auto;
pid /run/nginx.pid;
error_log /var/log/nginx/error.log;

events {
        worker_connections 768;
}

http {
    map $remote_addr $allowed_ip {
        ${ALLOWED_IPS}
    }
    server {
        root /var/www/html;
        index index.php index.html index.htm;

        listen 8080;
        server_name _;
        
        set_real_ip_from ${VPC_IP_RANGE}
        real_ip_header X-Forwarded-For;

        if ($allowed_ip = 0) {
            return 404;
        }
        
        location / {
            proxy_pass ${PROXY_PASS}
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}
```

Then, you can render the template into the final `nginx.conf`, filling in the values from `.env`, using the following Bash script:

In [15]:
%%bash

create_template_file() {
    declare -r nginx_template="
user www-data;
worker_processes auto;
pid /run/nginx.pid;
error_log /var/log/nginx/error.log;

events {
        worker_connections 768;
}

http {
    map \$remote_addr \$allowed_ip {
        \${ALLOWED_IPS}
    }
    server {
        root /var/www/html;
        index index.php index.html index.htm;

        listen 8080;
        server_name _;
        
        set_real_ip_from \${VPC_IP_RANGE}
        real_ip_header X-Forwarded-For;

        if (\$allowed_ip = 0) {
            return 404;
        }
        
        location / {
            proxy_pass \${PROXY_PASS}
            proxy_set_header Host \$host;
            proxy_set_header X-Real-IP \$remote_addr;
            proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto \$scheme;
        }
    }
}"
    echo "$nginx_template" > nginx.conf.template
}

create_env_file() {
    cat <<EOF > .env
export PROXY_PASS="http://localhost:8080;"
export ALLOWED_IPS="192.168.0.0/24 1; 127.0.0.1/32 1; default 0;"
export VPC_IP_RANGE="192.168.0.0/16;"
EOF
}

create_template_file && create_env_file

# Export every variable that is assigned while sourcing .env
set -a
. .env
set +a
# Substitute only the listed variables; nginx variables such as $host stay as they are
envsubst '$PROXY_PASS $ALLOWED_IPS $VPC_IP_RANGE' < nginx.conf.template > nginx.conf

In [17]:
# must run this cell after running the above cell

with open("./nginx.conf", "r") as f:
    print(f.read())


user www-data;
worker_processes auto;
pid /run/nginx.pid;
error_log /var/log/nginx/error.log;

events {
        worker_connections 768;
}

http {
    map $remote_addr $allowed_ip {
        192.168.0.0/24 1; 127.0.0.1/32 1; default 0;
    }
    server {
        root /var/www/html;
        index index.php index.html index.htm;

        listen 8080;
        server_name _;
        
        set_real_ip_from 192.168.0.0/16;
        real_ip_header X-Forwarded-For;

        if ($allowed_ip = 0) {
            return 404;
        }
        
        location / {
            proxy_pass http://localhost:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}



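As you can see, the script is short and easy to follow. It sources `.env` to load the variables into the environment, then uses the `envsubst` command (a separate utility shipped with GNU gettext, not a Bash builtin) to replace the `${...}` placeholders in `nginx.conf.template` with their values. Listing the variable names in the argument to `envsubst` restricts substitution to those variables, which is why nginx's own variables such as `$host` and `$remote_addr` are left untouched. The result is saved as `nginx.conf`. This is a very simple example, but it shows how useful Bash scripting can be for text processing tasks.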
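The `set -a` / `set +a` pair in the script deserves a closer look, because `envsubst` only sees exported variables. The `.env` file in this lesson already uses `export`, so there it acts as a safety net. The sketch below shows the difference on a file without `export`; the variable name `GREETING` and the temporary file are hypothetical and exist only for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical env file and variable, used only for this sketch.
env_file=$(mktemp)
echo 'GREETING="hello"' > "$env_file"   # note: no `export` keyword

# Without `set -a`, a sourced variable is not passed to child processes.
. "$env_file"
without_export=$(bash -c 'echo "${GREETING:-unset}"')

# With `set -a`, every variable assigned while it is on is exported.
set -a
. "$env_file"
set +a
with_export=$(bash -c 'echo "${GREETING:-unset}"')

echo "without set -a: $without_export"
echo "with set -a:    $with_export"
rm -f "$env_file"
```

The child `bash -c` process plays the role of `envsubst` here: it can only read what the parent shell has exported.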

## 2. Useful tools for string manipulation in Bash

As you can see above, Bash can be a powerful tool for string manipulation. However, some external tools make string manipulation in Bash even easier. Here are some of the most useful ones:

## 2-1. `jq`

`jq` is a command-line JSON processor. It can extract, filter, and transform JSON data. Here is an example of how to use `jq` to extract a field from a JSON file.

Suppose you have a JSON file `data.json` that looks like this:

In [22]:
%%bash
cat <<EOF > data.json
{
    "name": "John Doe",
    "age": 30,
    "emails": [
        "johndoe@gmail.com",
        "janedoe@yahoo.com",
        "google@naver.com"
    ]
}
EOF

You can use `jq` to extract the `name` field from the JSON file like this:

In [23]:
%%bash

jq '.name' data.json

"John Doe"


You can also extract a single element from an array in the JSON file. For example, you can extract the first email address from the `emails` field like this:

In [24]:
%%bash

jq '.emails[0]' data.json

"johndoe@gmail.com"

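The examples above only extract values. As a hedged sketch of the filtering side, the following uses the `-r` option and the `length`, `select`, and `endswith` filters on an inline copy of the same data, so it does not depend on `data.json` being present.

```bash
#!/usr/bin/env bash
# Same shape as data.json above, kept inline so the sketch is self-contained.
json='{"name": "John Doe", "age": 30, "emails": ["johndoe@gmail.com", "janedoe@yahoo.com", "google@naver.com"]}'

# -r prints raw strings, without the surrounding JSON quotes.
name=$(jq -r '.name' <<< "$json")

# length counts the elements of an array.
email_count=$(jq '.emails | length' <<< "$json")

# .emails[] iterates over the array; select() keeps only the matching items.
gmail_only=$(jq -r '.emails[] | select(endswith("@gmail.com"))' <<< "$json")

echo "$name has $email_count emails; gmail: $gmail_only"
```

The `-r` option matters in scripts: without it, string results keep their double quotes, which then end up inside your shell variables.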

## 2-2. `awk`

If you use Bash scripts for text processing, you will find `awk` very useful. `awk` is a text processing tool, and a small programming language in its own right, that can extract, filter, and manipulate text data. Beyond printing a file, it can extract fields, filter lines, compute values across rows, and so on. Once you are comfortable with `awk`, you can automate a large share of everyday text processing tasks.

`awk` treats text as a sequence of records, each made up of fields. By default, each line is a record and fields are separated by whitespace. You can change the field separator with the `-F` option, and the record separator by setting the `RS` variable, for example with `-v RS=';'`. Patterns can be regular expressions, matched against the whole record or against a single field. `awk` also has many built-in functions for manipulating text, and you can define your own functions.
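A minimal sketch of each of these features follows. The sample inputs and the function name `twice` are invented for illustration.

```bash
#!/usr/bin/env bash
# -F sets the field separator: split on ":" and print the first field.
first_field=$(printf 'root:x:0:0\n' | awk -F: '{print $1}')

# -v RS=";" sets the record separator: each ";"-separated chunk is one record.
records=$(printf 'a 1;b 2;c 3' | awk -v RS=';' '{print $2}')

# A regular expression can be tested against one field with the ~ operator.
yahoo=$(printf 'jane jane@gmail.com\njohn john@yahoo.com\n' |
    awk '$2 ~ /@yahoo\.com$/ {print $1}')

# Built-in functions such as toupper() and length() operate on strings.
shout=$(printf 'alice\n' | awk '{print toupper($1), length($1)}')

# User-defined functions are declared with the `function` keyword.
doubled=$(printf '21\n' | awk 'function twice(n) { return n * 2 } {print twice($1)}')

echo "first_field=$first_field"
echo "records=$(echo $records)"   # unquoted on purpose: joins the lines with spaces
echo "yahoo=$yahoo shout=$shout doubled=$doubled"
```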

Here is an example of how to use `awk` to extract a field from a text file:

In [25]:
%%bash

cat <<EOF > data.rows
index name age email score grade
0 jane 30 jane@gmail.com 98 A
1 john 25 john@yahoo.com 85 B
2 alice 28 alice@outlook.com 88 B+
3 bob 22 bob@live.com 92 A-
4 charlie 35 charlie@icloud.com 78 C+
5 dave 27 dave@protonmail.com 81 B
6 eve 29 eve@zoho.com 95 A
EOF

### Print Rows

You can use `awk` to extract the `name` field from the text file like this:

In [44]:
!echo "$(awk '{print $2}' data.rows)"

name
jane
john
alice
bob
charlie
dave
eve


To print the rows without the header, add the condition `NR > 1`. `NR` is the number of the current record, so this condition skips the first line:

In [36]:
!echo ":: names are :: \n$(awk 'NR > 1 {print $2}' data.rows)"

:: names are :: 
jane
john
alice
bob
charlie
dave
eve


You can also select a single row. For example, `NR==1` prints only the header:

In [43]:
!echo "Columns: $(awk 'NR==1' data.rows)"

Columns: index name age email score grade


To print the header fields without the index column, loop over the fields from 2 to `NF` (the number of fields in the current record):

In [48]:
!echo "Columns without index: \n$(awk 'NR==1 {for(i=2; i<=NF; i++) print $i}' data.rows)"

Columns without index: 
name
age
email
score
grade
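The same building blocks (`NR`, field references, and conditions) also cover the filtering and calculation mentioned earlier. The following is a sketch under the assumption that the score is always the fifth field; it uses a shortened inline copy of `data.rows` so it runs on its own.

```bash
#!/usr/bin/env bash
# A shortened copy of data.rows, inline so the sketch is self-contained.
rows='index name age email score grade
0 jane 30 jane@gmail.com 98 A
1 john 25 john@yahoo.com 85 B
2 alice 28 alice@outlook.com 88 B+
3 bob 22 bob@live.com 92 A-'

# Filter: skip the header (NR > 1) and keep rows whose score ($5) is at least 90.
top_names=$(awk 'NR > 1 && $5 >= 90 {print $2}' <<< "$rows")

# Aggregate: sum the score column, then print the mean in the END block.
mean_score=$(awk 'NR > 1 {sum += $5; n++} END {printf "%.2f", sum / n}' <<< "$rows")

echo "scores >= 90:"
echo "$top_names"
echo "mean score: $mean_score"
```

The `END` block runs once after the last record has been read, which makes it the natural place for totals and averages.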
