
# Introduction to UNIX Streams and Pipes

![ghostbusters](https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExejhtamY1OHdoaXg3M253MGQxOXZ2and1aW05a3ZtaHRuc3hlYjR6NSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/3o72FiKtrMAjIb0Rhu/giphy.gif)




---

## What Are UNIX Streams?

- A **stream** is a sequence of bytes that flows from one place to another.
- There are three standard streams:
  1. **Standard Input (stdin)**: Data coming into a program.
  2. **Standard Output (stdout)**: Data the program outputs.
  3. **Standard Error (stderr)**: Error messages from the program.

---

## Standard Streams Breakdown

- **stdin**: Usually from the keyboard but can be from files or other programs.
- **stdout**: Default is your terminal window, but it can be redirected elsewhere.
- **stderr**: Like stdout but meant for error messages, so it can be handled separately.
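
To see the three streams side by side, here is a minimal sketch. The function `shout` is a hypothetical helper made up for this illustration: it reads stdin, writes its result to stdout, and writes a diagnostic to stderr.

```bash
# Hypothetical helper: upper-cases whatever arrives on stdin
shout() {
  tr '[:lower:]' '[:upper:]'     # reads stdin, writes the result to stdout
  echo "shout: done" >&2         # diagnostic message goes to stderr
}

# Capture stdout only (stderr is discarded)
captured_out=$(printf 'hello\n' | shout 2>/dev/null)

# Capture stderr only (stdout is discarded)
captured_err=$(printf 'hello\n' | shout 2>&1 >/dev/null)

echo "stdout: $captured_out"
echo "stderr: $captured_err"
```

Because the two output streams are separate, each one can be captured or discarded without touching the other.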

---



### 1. **Basic Example of UNIX Streams**

Here’s a simple example of using `fortune` to generate output, which is written to stdout:

```bash
fortune 
```

- **Explanation**: 
  - `fortune` outputs a random quote. It is available on many Linux systems and may also be installed on macOS.



In [None]:
%%bash



In [None]:
%%bash




---
## What Are UNIX Pipes?

- **Pipes (`|`)**: A method of connecting the output of one command directly into the input of another command.
  
  - Example:
    ```bash
    command1 | command2
    ```
  - Output of `command1` becomes input of `command2`.

---



### 1. **Basic Example of UNIX Pipes**

Here’s a simple example of using `fortune` to generate output, which can be displayed using `cowsay`:

```bash
fortune | cowsay
```

- **Explanation**: 
  - `fortune` outputs a random quote.
  - `|` pipes that output into `cowsay`, which formats it as a speech bubble around an ASCII cow.

---


In [None]:
%%bash




In [None]:
%%bash




---
## Benefits of Pipes

- **Modular Processing**: Each command in the pipeline does one job.
- **Efficiency**: Avoids creating temporary files.
- **Flexibility**: You can combine simple commands to perform complex tasks.
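
As a rough sketch of the "no temporary files" point, the two approaches below count repeated words in a small made-up word list. The file names and data are assumptions for illustration only.

```bash
workdir=$(mktemp -d)
printf 'pear\napple\npear\nfig\n' > "$workdir/words.txt"

# With a temporary file: two separate steps
sort "$workdir/words.txt" > "$workdir/sorted.txt"
with_file=$(uniq -c "$workdir/sorted.txt")

# With a pipe: same result, no intermediate file
with_pipe=$(sort "$workdir/words.txt" | uniq -c)

echo "$with_pipe"
rm -r "$workdir"
```

Both variants produce the same counts; the pipe version simply skips the intermediate `sorted.txt`.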

---
## Real-Life Example of Pipes

- Find all text files under the current directory and count the lines in each one:
  ```bash
  find . -name "*.txt" | xargs wc -l
  ```
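
File names that contain spaces can trip up a plain `find | xargs` pipeline. The sketch below is one possible variant, assuming your `find` and `xargs` support `-print0` and `-0` (GNU and BSD versions do). It runs in a throwaway directory with made-up files.

```bash
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/notes.txt"
printf 'c\n'    > "$workdir/my file.txt"     # name contains a space
printf 'skip\n' > "$workdir/data.csv"        # not a .txt file

# NUL-separated names survive spaces in file names
find "$workdir" -name "*.txt" -print0 | xargs -0 wc -l

# Just the grand total across all .txt files
total=$(find "$workdir" -name "*.txt" -print0 | xargs -0 cat | wc -l)
echo "total lines:" $total

rm -r "$workdir"
```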

---



In [None]:
%%bash



In [None]:
%%bash




---
## Redirecting Streams

- Use **`>`** to redirect stdout to a file:
  ```bash
  echo "Hello, World!" > output.txt
  ```
- Use **`>>`** to append to a file:
  ```bash
  echo "More data" >> output.txt
  ```
- Redirect **stderr** with **`2>`**:
  ```bash
  some_command2 2> error_log.txt
  ```
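
The three operators can be tried safely in a scratch directory. This is a small sketch; the missing path passed to `ls` is deliberately made up so that `ls` writes an error message.

```bash
workdir=$(mktemp -d)

echo "Hello, World!" >  "$workdir/output.txt"   # create or overwrite
echo "More data"     >> "$workdir/output.txt"   # append a second line

# ls fails on a missing path; its message goes to stderr, which we send to a file.
# "|| true" keeps the script going after the expected failure.
ls "$workdir/no_such_file" 2> "$workdir/error_log.txt" || true

line_count=$(wc -l < "$workdir/output.txt")
errors=$(cat "$workdir/error_log.txt")

echo "lines in output.txt:" $line_count
echo "captured error: $errors"

rm -r "$workdir"
```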

---



In [37]:
%%bash


some_command2 2>file.log

CalledProcessError: Command 'b'\n\nsome_command2 2>file.log\n'' returned non-zero exit status 127.

In [39]:
%%bash





In [39]:
%%bash






---
## Combining stdout and stderr

- Redirect both stdout and stderr:
  ```bash
  some_command2 > output.txt 2>&1
  ```
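
The order of the two redirections matters: `2>&1` points stderr at wherever stdout is going *at that moment*. A minimal sketch, using a hypothetical function `both` that writes one line to each stream:

```bash
# Hypothetical command that writes to both streams
both() {
  echo "to stdout"
  echo "to stderr" >&2
}

workdir=$(mktemp -d)

# stdout goes to the file first, then stderr follows it: both lines land in the file
both > "$workdir/all.txt" 2>&1
all=$(cat "$workdir/all.txt")

# Reversed order: stderr is pointed at the old stdout (captured here),
# and only afterwards is stdout sent to the file
leaked=$(both 2>&1 > "$workdir/out_only.txt")
out_only=$(cat "$workdir/out_only.txt")

echo "all.txt:      $all"
echo "out_only.txt: $out_only"
echo "not in file:  $leaked"

rm -r "$workdir"
```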

---



In [None]:
%%bash



In [None]:
%%bash




---
## Filters in Pipes

- **Filters**: Commands that process input and produce output.
  - Examples: `grep`, `sort`, `cut`, `awk`, `sed`.
  
- Example:
  ```bash
  ps aux | grep "python" | sort -nrk 3
  ```
  - Finds running Python processes and sorts them by CPU usage, highest first (column 3 of `ps aux` is `%CPU`; use `-k 4` to sort by `%MEM` instead).
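
In `ps aux` output, column 3 is `%CPU` and column 4 is `%MEM`, so the sort key decides which one you rank by. Since real process lists differ on every machine, the sketch below uses a few made-up rows with a simplified layout (user, PID, %CPU, %MEM, command) to show the effect.

```bash
# Made-up rows in a simplified ps-like layout: USER PID %CPU %MEM COMMAND
sample='alice 101 2.5 10.0 python train.py
bob 102 40.1 1.2 python serve.py
carol 103 0.3 5.0 bash
dave 104 12.0 30.5 python notebook.py'

# Keep python rows, sort numerically in reverse on one column, keep the top row
by_cpu=$(printf '%s\n' "$sample" | grep "python" | sort -k3,3nr | head -n 1)
by_mem=$(printf '%s\n' "$sample" | grep "python" | sort -k4,4nr | head -n 1)

echo "top CPU: $by_cpu"
echo "top MEM: $by_mem"
```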

---



In [None]:
%%bash

ps aux | grep python | sort -nrk 3 | head -n 5 | cowsay

In [None]:
%%bash




---
## Summary

- **Streams**: stdin, stdout, stderr – standard communication channels.
- **Pipes**: Connect output of one command to the input of another.
- **Redirection**: Modify where input/output goes, even to files.
- **Filters**: Tools to manipulate data within pipes for flexible processing.


**Note**: Most command-line bioinformatics programs can be used with streams, pipes, redirection, and filters.

---



# UNIX Text Processing Tools: grep, sort, cut, awk, and sed

---

## Introduction to `grep`

- **`grep`** is used to search for patterns within files.
- It stands for **global regular expression print**.
  
### Syntax:
```bash
grep [options] pattern [file...]
```

### Example:
```bash
grep "error" log.txt
```
Prints every line of `log.txt` that contains the string "error".

---

## grep Options

- `-i`: Case-insensitive search.
- `-v`: Invert match (show lines that don't match the pattern).
- `-r`: Search directories recursively.

Example:
```bash
grep -i "warning" log.txt
```

Case-insensitive search for "warning" in log.txt.
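
A quick way to try these options is on a small made-up log file. In the sketch below the file contents are invented, and `-c` (print a count of matching lines instead of the lines) is an extra standard option not listed above.

```bash
workdir=$(mktemp -d)
log="$workdir/log.txt"
printf '%s\n' \
  'INFO start' \
  'Warning: disk almost full' \
  'error: cannot open file' \
  'WARNING: retrying' \
  'INFO done' > "$log"

matches=$(grep -i "warning" "$log")      # lines containing warning, any case
others=$(grep -i -v "warning" "$log")    # every line that does NOT match
count=$(grep -i -c "warning" "$log")     # number of matching lines

echo "$matches"
echo "matching lines: $count"

rm -r "$workdir"
```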

---

## Introduction to `sort`

`sort` is used to sort the lines of text files.

Syntax:

```bash
sort [options] [file...]
```

Example:
```bash
sort data.txt
```

Prints the lines of `data.txt` in alphabetical order. The file itself is not modified.

---

## sort Options

- `-r`: Sort in reverse order.
- `-n`: Numeric sort.
- `-k`: Sort by a specific column.

Example:
```bash
sort -k 2,2 -n data.txt
```

Sorts data.txt numerically based on the second column.
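
The difference `-n` makes is easiest to see on numbers of different lengths. A small sketch on made-up data (`LC_ALL=C` is set only to make the text ordering predictable across machines):

```bash
data='pear 10
apple 9
fig 100'

# Without -n the second column is compared as text: "10" < "100" < "9"
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2)

# With -n it is compared as a number: 9 < 10 < 100
as_number=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2 -n)

echo "text order:";    echo "$as_text"
echo "numeric order:"; echo "$as_number"
```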

---

## Introduction to `cut`

`cut` is used to extract specific fields (columns) from each line of a file.

Syntax:
```bash
cut [options] [file...]
```

Example:
```bash
cut -d "," -f 1,3 data.csv
```

Extracts the 1st and 3rd columns from `data.csv` using `,` as the delimiter.

---

## cut Options

- `-d`: Specify the delimiter.
- `-f`: Specify fields to extract.

Example:
```bash
cut -d " " -f 2-4 data.txt
```

Extracts the 2nd to 4th fields from data.txt using a space delimiter.
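
One thing to watch for: `cut` treats *every* delimiter character as a field boundary, so two spaces in a row produce an empty field. The sketch below, on made-up data, shows a CSV extraction and then the space pitfall, with `awk` shown for comparison because it splits on runs of whitespace.

```bash
csv='id,name,score
1,ada,95
2,grace,88'

# Keep columns 1 and 3 of comma-separated data
cols=$(printf '%s\n' "$csv" | cut -d "," -f 1,3)
echo "$cols"

# "a", two spaces, "b": cut sees an empty 2nd field between the two spaces
with_cut=$(printf 'a  b c\n' | cut -d " " -f 2)
with_awk=$(printf 'a  b c\n' | awk '{print $2}')

echo "cut field 2: '$with_cut'"
echo "awk field 2: '$with_awk'"
```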


---


## Introduction to `awk`

`awk` is a powerful text processing tool, particularly for structured (column-based) data.

Syntax:
```bash
awk 'pattern {action}' [file...]
```

Example:

```bash
awk '{print $1, $3}' data.txt
```

Prints the 1st and 3rd columns of each line from data.txt.

## awk Example with Conditions

You can use conditions within awk to filter data:

Example:
```bash
awk '$3 > 100 {print $1, $3}' data.txt
```

Prints the 1st and 3rd columns where the value in the 3rd column is greater than 100.
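
Here is the same condition applied to a few made-up rows, plus a second one-liner that uses an `END` block to total a column. The gene names and values are invented for illustration.

```bash
data='geneA chr1 250
geneB chr2 80
geneC chr1 101'

# Rows whose 3rd column is greater than 100
filtered=$(printf '%s\n' "$data" | awk '$3 > 100 {print $1, $3}')

# Sum of the 3rd column; the END block runs once after the last line
total=$(printf '%s\n' "$data" | awk '{sum += $3} END {print sum}')

echo "$filtered"
echo "total: $total"
```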


## Introduction to `sed`

`sed` is a stream editor for filtering and transforming text.

Syntax:
```bash
sed 'command' [file...]
```

Example:
```bash
sed 's/error/ERROR/g' log.txt
```

Prints `log.txt` with every occurrence of "error" replaced by "ERROR". The file itself is not modified.

## sed Options and Examples

- `-i`: Edit files in place.
- `s/pattern/replacement/`: Replace pattern with replacement.

Example:
```bash
sed -i 's/warning/NOTICE/g' log.txt
```

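Replaces "warning" with "NOTICE" in `log.txt` and saves the changes to the file. This form works with GNU `sed`; on macOS (BSD `sed`), `-i` requires a backup suffix argument, which may be empty: `sed -i '' 's/warning/NOTICE/g' log.txt`.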
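
Because in-place edits overwrite the file, it can help to preview the result first and keep a backup. A minimal sketch on a made-up log file, assuming a `sed` that accepts a backup suffix attached directly to `-i` (both GNU and BSD `sed` do):

```bash
workdir=$(mktemp -d)
log="$workdir/log.txt"
printf 'warning: low disk\nerror: warning ignored\n' > "$log"

# Preview: the result goes to stdout, the file is untouched
preview=$(sed 's/warning/NOTICE/g' "$log")

# In-place edit; the original is kept as log.txt.bak
sed -i.bak 's/warning/NOTICE/g' "$log"
edited=$(cat "$log")
backup=$(cat "$log.bak")

echo "edited file:"; echo "$edited"
echo "backup file:"; echo "$backup"

rm -r "$workdir"
```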

## Combining Tools: A Practical Example

Use a combination of grep, cut, and sort to process data:

Example:
```bash
grep "error" log.txt | cut -d " " -f 2,4 | sort -n
```

Finds lines with "error" in log.txt, extracts the 2nd and 4th fields, and sorts them numerically.
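
To make the field numbers concrete, here is the same pipeline run on a few made-up log lines in which field 2 is a number and field 4 is a status code. The log layout is an assumption for illustration only.

```bash
log_lines='2024-01-01 30 INFO 200 started
2024-01-01 12 error 500 timeout
2024-01-02 7 error 404 missing
2024-01-02 45 INFO 200 finished'

result=$(printf '%s\n' "$log_lines" \
  | grep "error" \
  | cut -d " " -f 2,4 \
  | sort -n)

echo "$result"
```

Each stage narrows the data: `grep` keeps two of the four lines, `cut` keeps two of the five fields, and `sort -n` orders what is left by the first remaining number.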

## Summary

- `grep`: Searches for patterns.
- `sort`: Sorts lines.
- `cut`: Extracts specific columns.
- `awk`: Processes structured text with conditions.
- `sed`: Edits and transforms text streams.


## Bash Examples for Each Tool

1. grep Examples:
```bash
# Basic usage to search for a pattern
grep "error" log.txt

# Case-insensitive search
grep -i "error" log.txt

# Search recursively through a directory (the path is only an example)
grep -r "error" /var/log/
```

2. sort Examples:
```bash
# Sort alphabetically
sort data.txt

# Sort numerically
sort -n numbers.txt

# Sort by the second column
sort -k 2,2 -n data.txt
```

3. cut Examples:

```bash
# Extract first and third fields from a CSV file
cut -d "," -f 1,3 data.csv

# Extract fields from 2 to 4 using space as delimiter
cut -d " " -f 2-4 data.txt
```

4. awk Examples:

```bash
# Print the 1st and 3rd columns of a file
awk '{print $1, $3}' data.txt

# Print lines where the 3rd column is greater than 100
awk '$3 > 100 {print $1, $3}' data.txt
```

5. sed Examples:
```bash
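# Print the file with "error" replaced by "ERROR" (the file is not changed)
sed 's/error/ERROR/g' log.txt

# Edit a file in place, replacing "warning" with "NOTICE"
# (GNU sed; on macOS/BSD sed use: sed -i '' 's/warning/NOTICE/g' log.txt)
sed -i 's/warning/NOTICE/g' log.txt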
```

6. Combining Tools:

```bash
# Combine grep, cut, and sort to process data
grep "error" log.txt | cut -d " " -f 2,4 | sort -n
```