# Week 4: Automating Tasks with Python

*February 8, 2024*

**Jumpstart Comprehension Check**

What will be printed to the screen as a result of running this Python code?

```py
co2_list = [315, 317, 323, 330, 334, 339]
sites = ["Mauna Loa", "Barrow", "Mauna Loa", "Mauna Loa", "Barrow", "Mauna Loa"]

for i, value in enumerate(co2_list):
    if sites[i] == "Mauna Loa":
        if sites[i-1] == "Barrow":
            print(value)
            break
        print(value)
```


## Writing Python Scripts

### Your First Python Script

We may want this as a script because...?

In [1]:
import pandas as pd

data = pd.read_csv("http://files.ntsg.umt.edu/data/GIS_Programming/data/GPPDI_ClassA_NPP_162_R2.csv")

# Group the data by VEG_TYPE
summary = data.groupby("VEG_TYPE")

# Print the means of the TNPP_C column
print(summary["TNPP_C"].mean().round(0))

VEG_TYPE
Alpine meadow steppe          263.0
Boreal Forest / Deciduous     435.0
Boreal Forest / Evergreen     421.0
Cold desert steppe            160.0
Forest-meadow-paramo          227.0
Miombo woodland               599.0
Savanna                       115.0
Sub-tropical semi-desert       93.0
Temperate coniferous fore     516.0
Temperate dry steppe          187.0
Tropical Forest               981.0
Tropical forest               974.0
coniferous forest             409.0
derived savanna               799.0
grassland                     227.0
humid savanna                 204.0
humid temperate prairie       282.0
mixed prairie                 358.0
shortgrass prairie            367.0
temperate dry steppe          165.0
tropical grassland           1592.0
upland oak forest             940.0
western montane forest        565.0
Name: TNPP_C, dtype: float64


### Accepting User Input

### Parsing User Input

```py
# scripts/Week02_dms_to_dd.py

import sys

def dms2dd(degrees, minutes, seconds):
    degs = int(degrees)
    mins = int(minutes)
    secs = int(seconds)
    return degs + (mins / 60.0) + (secs / (60.0 * 60.0))

result = dms2dd(sys.argv[1], sys.argv[2], sys.argv[3])
print(result)
```

---

**Try running this as:**


```sh
python dms2dd.py 45 12 57
```


**NOTE:**

- **Why do we start indexing `sys.argv` at 1 instead of 0?**
- **Why do we call the `print()` function at the end of this script?**
- **Why do we convert our function's inputs to integers?**
- **What happens if you remove the trailing zero from `60.0`?**
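
As a starting point for these questions, here is a minimal sketch of what `sys.argv` would hold for the command above. The list is written out by hand as an assumption, so the snippet runs without any command-line arguments.

```python
# Hand-written stand-in for sys.argv after running: python dms2dd.py 45 12 57
argv = ["dms2dd.py", "45", "12", "57"]

# The first element is the script name; the user's arguments follow it
script_name, degrees, minutes, seconds = argv

# Every command-line argument arrives as a string
print(type(degrees))

# Without conversion, "+" joins the strings instead of adding numbers
concatenated = degrees + minutes
print(concatenated)

# After converting to integers, the arithmetic behaves as expected
decimal_degrees = int(degrees) + (int(minutes) / 60.0) + (int(seconds) / (60.0 * 60.0))
print(round(decimal_degrees, 4))
```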

---

## Documentation

You can document a function with a documentation string (a "docstring"), which is delimited by triple quotes:

'''

Everything between the two sets of triple quotes is documentation.

'''

Docstrings go directly under the `def` line. They appear when you call `help()` on the function.

For example:

In [3]:
import sys

def dms2dd(degrees, minutes, seconds):
    '''
    Converts degrees, minutes, seconds to decimal degrees.

    Parameters
    ---------
    degrees: int
    minutes: int
    seconds: int

    Returns
    ---------
    float
    '''
    degs = int(degrees)
    mins = int(minutes)
    secs = int(seconds)
    return degs + (mins / 60.0) + (secs / (60.0 * 60.0))

In [4]:
help(dms2dd)

Help on function dms2dd in module __main__:

dms2dd(degrees, minutes, seconds)
    Converts degrees, minutes, seconds to decimal degrees.
    
    Parameters
    ---------
    degrees: int
    minutes: int
    seconds: int
    
    Returns
    ---------
    float



---

## From Scripts to Modules

In [5]:
import datetime

import matplotlib.pyplot as plt

#### 'Dunders' or double-underscore attributes

In [7]:
datetime.__file__

'C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\lib\\datetime.py'

In [8]:
datetime.__name__

'datetime'

Importing a module under an alias does not change the module's actual name:

In [9]:
import datetime as dt
dt.__name__

'datetime'

In [10]:
print(__name__)

__main__
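
The value of `__name__` depends on how a file is used. The sketch below writes a throwaway module (the name `whoami_demo` is hypothetical), then loads it two ways: with `runpy.run_path()` to mimic running it as a script, and with a normal import.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# A tiny module (hypothetical) that records its own __name__
source = "WHO = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "whoami_demo.py"
    path.write_text(source)

    # Case 1: execute the file the way "python whoami_demo.py" would
    script_globals = runpy.run_path(str(path), run_name="__main__")
    as_script = script_globals["WHO"]

    # Case 2: import the same file as a module
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("whoami_demo")
        as_module = module.WHO
    finally:
        # Clean up so the demo leaves no trace in this session
        sys.path.remove(tmp)
        sys.modules.pop("whoami_demo", None)

print(as_script)  # __main__
print(as_module)  # whoami_demo
```

The same file reports `__main__` when it is executed and its own module name when it is imported.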


---

## The Python Module Pattern

The `dms2dd.py` script you just wrote illustrates an important Python programming pattern that you'll use again and again in your career. Consider this example:

---

```py
import pandas as pd

GROUP_COLUMN = "VEG_TYPE"
VALUE_COLUMN = "TNPP_C"

def main(csv_file):
    data = pd.read_csv(csv_file)
    summary = group_data(data)
    # Print the means of the TNPP_C column
    print(summary[VALUE_COLUMN].mean().round(0))


def group_data(data_frame):
    # Group the data by VEG_TYPE
    return data_frame.groupby(GROUP_COLUMN)


if __name__ == '__main__':
    import sys
    main(sys.argv[1])
```

---

Even though the `group_data()` function consists of just one line of code, you can imagine that in a real-world example this function could be pretty complicated. 

**It's good practice to identify reusable parts of your code and put them in separate functions. This also helps to organize your code and makes it clearer to read.** The `main()` function is just that: the main part of your script, which orchestrates all the tasks that should be executed when you run the script.
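
As a minimal sketch, the same pattern could be applied to the `dms2dd.py` script from earlier. The usage message and the choice to pass `sys.argv[1:]` to `main()` are assumptions made for illustration; they are not part of the original script.

```python
import sys


def dms2dd(degrees, minutes, seconds):
    '''Converts degrees, minutes, seconds to decimal degrees.'''
    return int(degrees) + (int(minutes) / 60.0) + (int(seconds) / (60.0 * 60.0))


def main(argv):
    # argv holds only the user's arguments (the script name is already removed)
    if len(argv) != 3:
        print("usage: python dms2dd.py DEGREES MINUTES SECONDS")
        return
    print(dms2dd(argv[0], argv[1], argv[2]))


# Runs only when the file is executed as a script, not when it is imported
if __name__ == '__main__':
    main(sys.argv[1:])
```

Because the call to `main()` sits behind the `__name__` check, another script can `import dms2dd` and reuse the function without triggering the command-line behavior.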

---

## Repeating Tasks within Loops

In [None]:
import pandas as pd

data = pd.read_csv("http://files.ntsg.umt.edu/data/GIS_Programming/data/monthly_in_situ_co2_mlo.csv")
data

### Reading Data from Files

We can also *write* data files using this same pattern.

In [None]:
import csv
import datetime

with open("temp.csv", "w", newline="") as file:
    writer = csv.writer(file)
    
    # Write the header row first
    writer.writerow(["date", "value"])
    
    record = data[data['year'] == 2012]['co2_ppm'].values
    
    # Then, write each data record as a new row
    for month in range(1, 13):
        # Create a date for each month, e.g., the 1st of each month
        date = datetime.date(2012, month, 1)
        writer.writerow([date.strftime('%Y-%m-%d'), record[month-1]])
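
To see the write and read sides together, here is a self-contained sketch that writes twelve monthly rows and reads them back with `csv.reader`. The values are placeholders invented for illustration, not real CO2 measurements, and the file is written to a temporary directory rather than to the working directory.

```python
import csv
import datetime
import tempfile
from pathlib import Path

# Placeholder values for illustration only; NOT real CO2 measurements
rows = [(datetime.date(2012, month, 1), 100.0 + month) for month in range(1, 13)]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "temp.csv"

    # Write: header row first, then one row per month
    with open(path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["date", "value"])
        for date, value in rows:
            writer.writerow([date.strftime("%Y-%m-%d"), value])

    # Read: the same "with open(...)" pattern, using csv.reader instead
    with open(path, newline="") as file:
        reader = csv.reader(file)
        header = next(reader)  # the first row is the header
        records = [(row[0], float(row[1])) for row in reader]

print(header)
print(len(records))
print(records[0])
```

Note that `csv.reader` returns every field as a string, so the numeric column is converted back with `float()`.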