# Automation and Make

As we work on a project, we often encounter commands and operations that we end up running many times. Many of these operations correspond to programs that we execute from the terminal. For example, so far in this course we have been:
- Managing files: creating files and folders.
- Running Python scripts and Jupyter notebooks that read data, perform some analysis, and generate outputs.
- Creating virtual environments, activating them, installing new packages, and creating an IPython kernel.
- Creating a JupyterBook.

As our workflow grows, these operations become more complex and more dependent on each other. Make allows us not only to automate the execution of programs, but also to keep track of how the commands for the different parts of our project depend on each other.
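The basic rule Make uses to decide whether a step has to run again is based on file timestamps: a target is rebuilt if it does not exist or if any of its prerequisites was modified more recently than the target. The following Python sketch is only a simplified model of that rule (it ignores, for example, how Make recursively checks the prerequisites of prerequisites), and the file names are hypothetical:

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite.

    Simplified model of Make's timestamp rule (no recursion over prerequisites).
    """
    target = Path(target)
    if not target.exists():
        return True
    target_time = target.stat().st_mtime
    return any(Path(p).stat().st_mtime > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical project files: a script, its input, and its output
    script = Path(tmp) / "calculate_prime.py"
    data = Path(tmp) / "input.txt"
    output = Path(tmp) / "output.txt"
    script.write_text("")
    data.write_text("10\n")

    # 1. The output does not exist yet, so it must be built
    missing = needs_rebuild(output, [script, data])

    # 2. The output is newer than both inputs, so nothing needs to run
    output.write_text("")
    os.utime(script, (1000, 1000))  # set (access time, modification time)
    os.utime(data, (1000, 1000))
    os.utime(output, (2000, 2000))
    up_to_date = not needs_rebuild(output, [script, data])

    # 3. "Editing" the input makes it newer than the output again
    os.utime(data, (3000, 3000))
    stale = needs_rebuild(output, [script, data])

print(missing, up_to_date, stale)  # True True True
```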

We can also execute Python scripts, and even Jupyter notebooks, directly from the terminal.

## 0. Setup

Let's work inside our [Eratosthenes project](https://github.com/UCB-stat-159-s23/facusapienza21-eratosthenes) and create a new Python script called `calculate_prime.py` with the following code:
```python
# calculate_prime.py

import sys
import math
import numpy as np

def sieve(nmax):
    """
    Function to compute prime numbers.

    Arguments:
        - nmax: integer. Upper bound for prime search.
    Outputs:
        - all_primes: list. List with all the prime numbers less than or equal to nmax
    """

    # There are no primes below 2
    if nmax < 2:
        return []
    if nmax == 2:
        return [2]

    primes_head = [2]
    first = 3
    # Candidate primes: odd numbers from 3 up to nmax
    primes_tail = np.arange(first, nmax + 1, 2)
    while first <= round(math.sqrt(primes_tail[-1])):
        # The smallest remaining candidate is prime (stored as a plain int)
        first = int(primes_tail[0])
        primes_head.append(first)
        # Remove the multiples of `first` from the remaining candidates
        non_primes = first * primes_tail
        primes_tail = np.array([n for n in primes_tail[1:]
                                if n not in non_primes])

    all_primes = primes_head + primes_tail.tolist()

    return all_primes


if __name__ == '__main__':
    n = int(sys.argv[1])
    print(sieve(n))
```

The last part of `calculate_prime.py` contains the `if __name__ == '__main__':` block. This is what allows us to run the script directly from the terminal and read its arguments through `sys.argv`. Now, from the terminal we can run `sieve()` with
```bash
python calculate_prime.py 10
```
which should print the list `[2, 3, 5, 7]`.
```{warning}
As we always emphasize, check which environment you are running your code in! If you run this from the `base` environment, it won't work, since `numpy` is not installed there. You can activate the `notebook` environment or use the environment you created for the Eratosthenes project in [Lab 04](https://ucb-stat-159-s23.github.io/site/lab/lab04/lab04.html).
```
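To see what the `if __name__ == '__main__':` guard does, the following sketch writes a tiny, hypothetical module (`demo_main.py`, with a `double()` function standing in for `sieve()`) to a temporary folder. Running the file as a script executes the guarded block; importing it as a module only defines the function and prints nothing.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module with the same structure as calculate_prime.py
MODULE_SOURCE = """
import sys

def double(n):
    return 2 * n

if __name__ == '__main__':
    print(double(int(sys.argv[1])))
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_main.py"
    path.write_text(MODULE_SOURCE)

    # Run as a script: __name__ is '__main__', so the guarded block prints
    run = subprocess.run([sys.executable, str(path), "21"],
                         capture_output=True, text=True, check=True)
    script_output = run.stdout.strip()

    # Import as a module: __name__ is 'demo_main', so nothing is printed
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    module = importlib.import_module("demo_main")
    sys.path.remove(tmp)
    imported_result = module.double(5)

print(script_output)    # 42
print(imported_result)  # 10
```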
Now, let's move things around a little. Instead of passing the arguments through the terminal and printing the outputs, let's have the script read a list of arguments from an input file and save the results in an output file. We can achieve this by replacing the `__main__` block of the previous script with
```python
if __name__ == '__main__':
    # Paths to the input and output files, passed from the terminal
    input_file = sys.argv[1]
    output_file = sys.argv[2]
    # Read each line of the input file, skipping empty lines
    with open(input_file) as file:
        lines = [line.strip() for line in file if line.strip()]
    # Compute the list of primes for each upper bound
    results = [sieve(int(n)) for n in lines]
    # Save each input value next to its list of primes
    with open(output_file, 'w') as output:
        for n, res in zip(lines, results):
            output.write("{} {}\n".format(n, res))
```

Now create a `data/input.txt` file with one integer per line, create a folder called `results`, and execute
```bash
python calculate_prime.py data/input.txt results/output.txt
```
This will create the file `output.txt` inside the `results` folder, with one line per input number followed by its list of primes.
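Later steps of an analysis often need to read `results/output.txt` back. Assuming each line has the form written by the script above (the input number, a space, and the printed list, e.g. `10 [2, 3, 5, 7]`), a hypothetical helper `read_results()` could parse it with `ast.literal_eval`:

```python
import ast


def read_results(text):
    """Parse lines like '10 [2, 3, 5, 7]' into a dict {10: [2, 3, 5, 7]}."""
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Split only at the first space: the printed list itself contains spaces
        n, primes = line.split(" ", 1)
        # literal_eval safely evaluates the printed Python list
        results[int(n)] = ast.literal_eval(primes)
    return results


example = "10 [2, 3, 5, 7]\n20 [2, 3, 5, 7, 11, 13, 17, 19]\n"
parsed = read_results(example)
print(parsed[20])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

With the real file, you would call `read_results(Path("results/output.txt").read_text())`, after importing `Path` from `pathlib`.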

## 1. Automation with Bash

If we only want to perform one simple operation, we can run the individual commands from the terminal. However, this approach:

1. is not fully reproducible;
2. does not scale well when our analysis requires executing many commands;
3. does not generalize well to cases with different input/output files.

Notice that the workflow introduced in the previous section required at least three steps: the activation of the correct conda environment, the creation of the output folder, and the execution of the Python script. 
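These three steps can also be written down as code whose input and output paths are parameters, which addresses the third point above. The following Python sketch assumes the `notebook` environment and uses `conda run -n ENV`, which runs a command inside a conda environment without activating it in the current shell; the helper names `prime_command` and `run_pipeline` are hypothetical.

```python
import subprocess
from pathlib import Path


def prime_command(input_file, output_file, env="notebook"):
    """Build the command that runs calculate_prime.py inside a conda environment."""
    return ["conda", "run", "-n", env,
            "python", "calculate_prime.py", str(input_file), str(output_file)]


def run_pipeline(input_file="data/input.txt",
                 output_file="results/output.txt", env="notebook"):
    # Create the output folder (no error if it already exists)
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)
    # Run the script in the chosen environment; raise an error if it fails
    subprocess.run(prime_command(input_file, output_file, env), check=True)


if __name__ == "__main__":
    # Only print the command here; calling run_pipeline() would execute it
    print(" ".join(prime_command("data/input.txt", "results/output.txt")))
```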

A first solution to some of these problems is to create a Bash script that executes all of these operations:
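```bash
#!/bin/bash
# Stop at the first command that fails
set -e
# Make `conda activate` available inside a non-interactive script
eval "$(conda shell.bash hook)"
conda activate notebook
# Create the output folder (no error if it already exists)
mkdir -p results
python calculate_prime.py data/input.txt results/output.txt
```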
The first line of the file is the shebang (`#!`), which tells the system to run the script with `/bin/bash`. You will probably need to change the file's permissions to make it executable; explore the `chmod` command in Bash to do this.
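In the terminal, `chmod u+x` adds execute permission for the file's owner, after which a script can be run as `./run_analysis.sh`. The same change can be made from Python with `os.chmod`. The following sketch assumes a Linux or macOS system and uses a hypothetical script name, `run_analysis.sh`, in a temporary folder:

```python
import os
import stat
import tempfile
from pathlib import Path


def make_executable(path):
    """Add execute permission for the owner, like `chmod u+x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical name for the Bash script above
    script = Path(tmp) / "run_analysis.sh"
    script.write_text("#!/bin/bash\necho 'hello'\n")

    # New files are not created with execute permission
    before = bool(script.stat().st_mode & stat.S_IXUSR)
    make_executable(script)
    after = bool(script.stat().st_mode & stat.S_IXUSR)

print(before, after)  # False True
```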



## 2. Our first Makefile

### Example: Make for creating a new environment