# Automating processing

## ❓ Questions
- How do I automate my data workflows?


## ❗ Objectives
- Create a pipeline from download to processing.
- Understand available methods of automation.


# Initial setup for Google Drive
These cells set up some parameters we'll need throughout the lesson. Please run them!

In [None]:
import os
from os.path import join

from google.colab import drive
google_dir = '/content/drive'
drive.mount(google_dir)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
os.listdir(join(google_dir, 'MyDrive'))

['Corporate-Powerpoint-Template.pptx',
 'Show and Tell.pptx',
 'CIC drop in session - Keegan.gdoc',
 'Quick_meeting_slides_2021-05-09.pptx',
 'asdaf',
 'pipe3_20k.mp4',
 'chl_7_bin_1_stride_4fps.mkv',
 'WAT_Waste_City_of_canning_Project_Lessons_Learnt.xlsx',
 'slides',
 'WCCC_yolov4-unseen.mp4',
 'CIC Carpentries Collaborative Google Doc.gdoc',
 'RezBaz 22.gslides',
 'S&T',
 'ASDAF_BMT_UHI_JIRA_New_Project_Questionnaire.xlsx',
 'CHL_Weekly_2018_1440p.mp4',
 'Chlor_a_Weekly_2018_1440p.mp4',
 'UChl_abs_Weekly_2018_1440p.mp4',
 'solo work',
 'Untitled form.gform',
 'CIDS Computational Resources 2024-03-22.gslides',
 'Colab Notebooks',
 'CIC_Carpentries_Python-master',
 'workshop_google',
 '202404_Intro_Rrs.gslides']

In [None]:
project_dir = join(google_dir, 'MyDrive', "workshop_google")
storage_location = join(project_dir, "workshop_data")

os.makedirs(storage_location, exist_ok=True)

In [None]:
!ls {project_dir}

data		 google_requirements.txt  notebook_pictures  notebooks_colab  workshop_data
environment.yml  LICENSE		  notebooks	     README.md


In [None]:
!pip install -r {project_dir}/google_requirements.txt

Collecting rioxarray (from -r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 2))
  Downloading rioxarray-0.15.5-py3-none-any.whl (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.5/60.5 kB 2.0 MB/s eta 0:00:00
Collecting earthpy (from -r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 4))
  Downloading earthpy-0.9.4-py3-none-any.whl (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 28.8 MB/s eta 0:00:00
Collecting pystac-client (from -r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 11))
  Downloading pystac_client-0.8.2-py3-none-any.whl (33 kB)
Collecting rasterio>=1.3 (from rioxarray->-r /content/drive/MyDrive/workshop_google/google_requirements.txt (line 2))
  Downloading rasterio-1.3.10-cp310-cp310-manylinux2014_x86_64.whl (21.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.5/21.5 MB

# Automating the entire process
Previously we've applied single processes to our images.  
Part of automation is encapsulating these processes into functions, and then encapsulating those functions into larger processes.

Let's combine several of the earlier lessons into functions in a script that we can call from this notebook.
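
As a rough illustration of the idea, here is a minimal sketch of small functions wrapped by one larger function. Every name in it (`download_image`, `process_image`, `save_result`, `run_pipeline`) is hypothetical, and the "download" and "processing" steps are stand-ins that work on tiny text files rather than real imagery, so the example runs without any extra packages.

```python
from pathlib import Path
import tempfile


def download_image(name: str, out_dir: Path) -> Path:
    """Stand-in for a download step: writes a small placeholder file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.txt"
    path.write_text("1 2 3 4")  # pretend these are pixel values
    return path


def process_image(path: Path) -> list[int]:
    """Stand-in for a processing step: doubles every value in the file."""
    values = [int(v) for v in path.read_text().split()]
    return [v * 2 for v in values]


def save_result(values: list[int], out_dir: Path, name: str) -> Path:
    """Write the processed values next to the other outputs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{name}_processed.txt"
    out_path.write_text(" ".join(str(v) for v in values))
    return out_path


def run_pipeline(names: list[str], storage: Path) -> list[Path]:
    """The larger process: chain the smaller steps for every input."""
    outputs = []
    for name in names:
        raw = download_image(name, storage / "raw")
        result = process_image(raw)
        outputs.append(save_result(result, storage / "processed", name))
    return outputs


if __name__ == "__main__":
    # A temporary directory keeps the demo from touching Google Drive.
    with tempfile.TemporaryDirectory() as tmp:
        for path in run_pipeline(["scene_a", "scene_b"], Path(tmp)):
            print(path.name, "->", path.read_text())
```

In the lesson itself, `storage` would more likely be `storage_location`, and each stand-in would be replaced by the real download and processing code from the earlier notebooks.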

# Automatically running a script
The final piece of the puzzle is getting the script to run without us starting it by hand.  

There are many ways to do this with just as many providers, including:  
- Pawsey's Nimbus Cloud
- ARDC Nectar
- AWS EC2
- Azure Virtual Machines
- Your own Linux machine

As Nimbus seems quite popular, we'll focus on tools that can be used from Nimbus, such as `cron`.
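
To make the `cron` idea concrete, here is a minimal sketch of a script written to be started by a scheduler instead of by hand. The file location, the log path, the schedule, and the placeholder `run_pipeline` function are all assumptions made for illustration; the crontab line appears in the docstring because `cron` is configured outside Python.

```python
"""Hypothetical scheduled entry point, e.g. saved as /home/ubuntu/run_pipeline.py.

An example crontab line (added with `crontab -e`) that would run this script
every day at 02:00 and append its output to a log file:

    0 2 * * * /usr/bin/python3 /home/ubuntu/run_pipeline.py >> /home/ubuntu/pipeline.log 2>&1

All paths above are assumptions; adjust them to your own machine.
"""
from datetime import datetime


def run_pipeline() -> int:
    """Placeholder for the real work; returns the number of items processed."""
    return 0


def log_line(n_items: int, now: datetime) -> str:
    """Build a timestamped message so the cron log shows when each run happened."""
    return f"{now.isoformat(timespec='seconds')} processed {n_items} item(s)"


def main() -> None:
    n_items = run_pipeline()
    print(log_line(n_items, datetime.now()))


if __name__ == "__main__":
    main()
```

`cron` runs jobs with a minimal environment, so absolute paths for both the interpreter and the script tend to be the safer choice.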