## Command Line Automation with Python

 - Shell commands can be run directly inside the IPython terminal
 - The `!` syntax executes a shell command
 - The output of a shell command can be captured by assigning it to a variable
 - These variables are ordinary Python objects, so you can pass them to functions such as `type()` and `len()`; this makes it easy to blend Python and shell commands
 - The `!` syntax is IPython-only: in a regular Python script it raises a `SyntaxError`
 - In scripts, use the Python `subprocess` module instead
 - The Python interpreter can run a full script, or a short program passed to it via `-c`
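
Outside IPython, the `python -c` one-liner has to go through `subprocess`. A minimal sketch, assuming Python 3.7 or later; the helper name `run_python_snippet` is hypothetical, and `sys.executable` is used so the child process is the same interpreter that runs the script:

```python
import subprocess
import sys


def run_python_snippet(snippet):
    """Run `python -c <snippet>` and return its standard output as text."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet],  # equivalent to: python -c "<snippet>"
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    print(run_python_snippet("import datetime; print(datetime.date.today())"))
```

Passing the command as a list avoids shell quoting problems, which is the main practical difference from the `!` syntax.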

In [1]:
! python -c "import datetime;print(datetime.datetime.utcnow())"

2023-06-18 13:05:27.275938


In [2]:
# Display free disk space in human-readable units using the -h flag
!df -h

Filesystem       Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk3s1s1  228Gi  8.4Gi  139Gi     6%  355384 1456296520    0%   /
devfs           202Ki  202Ki    0Bi   100%     698          0  100%   /dev
/dev/disk3s6    228Gi   20Ki  139Gi     1%       0 1456296520    0%   /System/Volumes/VM
/dev/disk3s2    228Gi  4.4Gi  139Gi     4%     936 1456296520    0%   /System/Volumes/Preboot
/dev/disk3s4    228Gi   42Mi  139Gi     1%      54 1456296520    0%   /System/Volumes/Update
/dev/disk1s2    500Mi  6.0Mi  483Mi     2%       3    4942880    0%   /System/Volumes/xarts
/dev/disk1s1    500Mi  6.2Mi  483Mi     2%      31    4942880    0%   /System/Volumes/iSCPreboot
/dev/disk1s3    500Mi  364Ki  483Mi     1%      34    4942880    0%   /System/Volumes/Hardware
/dev/disk3s5    228Gi   76Gi  139Gi    36% 2022727 1456296520    0%   /System/Volumes/Data
map auto_home     0Bi    0Bi    0Bi   100%       0          0  100%   /System/Volumes/Data/home


In [6]:
# Create a random day generator
for i in range(10):
    ! python -c "from random import choices;days = ['Mo','Tu','We','Th','Fr','Sa','Su']; print(choices(days))"


['Th']
['Su']
['We']
['We']
['Sa']
['Th']
['Mo']
['Sa']
['Tu']
['Mo']


In [7]:
# List the files with the extension .png in the assets/git_assets directory
# and store the output lines of the command in the variable var.
var = !ls -h assets/git_assets/*.png

# Run len() on var to print the number of .png files that live in the directory.
print(len(var))


4


#### Unix

 - Unix philosophy: simple tools can be combined into sophisticated solutions by piping the output of one command into another
 - Common tools: ls, grep (filter), cat, awk

<img src="assets/cmd_auto/unix_pipes.png" style="height: 280px;"/>
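
The same piping idea can be reproduced from Python by connecting the `stdout` of one `subprocess.Popen` to the `stdin` of the next. The sketch below assumes a Unix-like system with `ls`, `grep`, and `wc` on the path; the function name `count_matching_files` is hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def count_matching_files(directory, pattern):
    """Rough equivalent of: ls <directory> | grep -F <pattern> | wc -l"""
    ls = subprocess.Popen(["ls", directory], stdout=subprocess.PIPE)
    grep = subprocess.Popen(
        ["grep", "-F", pattern], stdin=ls.stdout, stdout=subprocess.PIPE
    )
    ls.stdout.close()  # lets `ls` receive SIGPIPE if `grep` exits early
    wc = subprocess.Popen(["wc", "-l"], stdin=grep.stdout, stdout=subprocess.PIPE)
    grep.stdout.close()
    output, _ = wc.communicate()  # read the final count and wait for `wc`
    ls.wait()
    grep.wait()
    return int(output.decode().strip())


if __name__ == "__main__":
    # Build a throwaway directory so the demo does not depend on local files
    with tempfile.TemporaryDirectory() as workdir:
        for name in ("a.csv", "b.csv", "notes.txt"):
            Path(workdir, name).touch()
        print(count_matching_files(workdir, ".csv"))
```

Each stage only sees the output of the previous one, which mirrors how the shell wires up a pipeline.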

#### Capture the shell output with bash magic function

In [13]:
%%bash --out output
echo "Running Directory Audit Script"
CSV=`ls -l test_dir/*.csv | wc -l`
TXT=`ls -l test_dir/*.txt | wc -l`
echo 'The directory contains this total # of *.csv files: ' $CSV
echo 'The directory contains this total # of *.txt files: ' $TXT

#### Capturing shell output with ! Syntax

In [None]:
ls_count = !ls -l | awk '{SUM+=$5} END {print SUM}'
type(ls_count)
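
In IPython, the captured `ls_count` is a list-like object holding the output lines, so the total arrives as a string that still needs converting. As a rough pure-Python counterpart to the `ls -l | awk` pipeline, the sketch below sums file sizes with `os.scandir`. It is an approximation: it counts only regular files (hidden ones included), whereas `ls -l` also lists directories and skips hidden entries. The function name `total_size_bytes` is hypothetical.

```python
import os


def total_size_bytes(directory="."):
    """Sum the sizes, in bytes, of the regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return sum(entry.stat().st_size for entry in entries if entry.is_file())


if __name__ == "__main__":
    print(total_size_bytes("."))
```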

You recently set up a large network file system in your cloud deployment that all of the machine learning instances will communicate with. You now want to verify that, when the network file system is mounted on a new system, each worker node can create files with the correct permissions.

In [None]:
# Import packages
import subprocess
import os

# Setup
file_location = "/tmp/file.txt"
expected_uid = 1000

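# Create a file and wait for `touch` to finish before inspecting it
# (a bare Popen returns immediately, so os.stat could run too early)
subprocess.run(["touch", file_location], check=True)

# Check that the file is owned by the expected user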
stat = os.stat(file_location)
if stat.st_uid == expected_uid:
    print(f"File System exported properly: {expected_uid} == {stat.st_uid}")
else:
    print(f"File System NOT exported properly: {expected_uid} != {stat.st_uid}")
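
The ownership check can also be wrapped in a small reusable function. This is a sketch under assumptions: a Unix-like system with `touch` available, a hypothetical function name `created_file_has_owner`, and a temporary directory standing in for the network file system mount.

```python
import os
import subprocess
import tempfile


def created_file_has_owner(path, expected_uid):
    """Create `path` with `touch` and report whether its owner is `expected_uid`."""
    # run() blocks until `touch` exits; check=True raises if it fails
    subprocess.run(["touch", path], check=True)
    return os.stat(path).st_uid == expected_uid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "file.txt")
        # A file created by this process should belong to the effective user
        print(created_file_has_owner(target, os.geteuid()))
```

Returning a boolean rather than printing makes the check easy to call from a test or a provisioning script.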


In [None]:
import subprocess

# Execute Unix command `head` safely as items in a list
with subprocess.Popen(["head", "/home/repl/workspace/poem.txt"], stdout=subprocess.PIPE) as head:
  
    # Print each line of list returned by `stdout.readlines()`
    for line in head.stdout.readlines():
        print(line)
    
# Execute Unix command `wc -c` safely as items in a list
with subprocess.Popen(["wc", "-c", "/home/repl/workspace/poem.txt"], stdout=subprocess.PIPE) as word_count:
  
    # Print the raw bytes written to standard out by `wc -c`
    print(word_count.stdout.read())

In [None]:
import subprocess

# Use subprocess to run the `ps aux` command that lists running processes
with subprocess.Popen(["ps", "aux"], stdout=subprocess.PIPE) as proc:
    process_output = proc.stdout.readlines()

# Look through each line in the output and skip it if it contains "python"
for line in process_output:
    if b"python" in line:
        continue
    print(line)
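
Reading from `stdout` directly yields `bytes`, which is why the filter above compares against `b"python"`. A possible alternative, assuming Python 3.7 or later, is to let `subprocess.run` decode the output with `text=True` and filter ordinary strings. The helper name `lines_without` and the stand-in command are hypothetical.

```python
import subprocess
import sys


def lines_without(command, needle):
    """Run `command` and return its output lines that do not contain `needle`."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return [line for line in completed.stdout.splitlines() if needle not in line]


if __name__ == "__main__":
    # Stand-in command that prints two fake process lines;
    # on a Unix-like system it could be replaced with ["ps", "aux"]
    fake_ps = [sys.executable, "-c", "print('user 1 python app.py'); print('user 2 bash')"]
    for line in lines_without(fake_ps, "python"):
        print(line)
```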

In [None]:
from subprocess import Popen, PIPE
import json
import pprint

# Use the with context manager to run subprocess.Popen()
with Popen(["pip","list","--format=json"], stdout=PIPE) as proc:
    # Read the lines that the subprocess wrote to stdout
    result = proc.stdout.readlines()

# Parse the JSON payload, assumed to be on the first line of the output
# `pip list --format=json` emits a JSON array, so this yields a list of dictionaries
converted_result = json.loads(result[0])

# Display the result in the IPython terminal
pprint.pprint(converted_result)
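
Parsing only the first line works as long as the tool prints its JSON on a single line. A slightly more defensive sketch parses the whole of standard output instead; the helper name `run_json_command` and the stand-in payload are assumptions made for illustration.

```python
import json
import subprocess
import sys


def run_json_command(command):
    """Run `command` and parse everything it writes to stdout as JSON."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in command that prints a small JSON array shaped like a package list
    payload = '[{"name": "example-package", "version": "1.0"}]'
    stand_in = [sys.executable, "-c", f"print({payload!r})"]
    for package in run_json_command(stand_in):
        print(package["name"], package["version"])
```

To query the real package list, the command could be `[sys.executable, "-m", "pip", "list", "--format=json"]`, which targets the pip that belongs to the running interpreter.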

----------
## Data processing in the shell

### CURL

- short for "Client URL"
- Unix command line tool
- transfers data to and from a server
- used to download data from HTTP(S) sites and FTP servers

- download from https://curl.se/download.html


- basic syntax: curl [option flags] [URL]
    - supported protocols include HTTP, HTTPS, FTP, and SFTP
    

- use the optional flag -O to save a file with its original name
    - curl -O https://someWebSiteName.com/datafilename.txt
- use the optional flag -o to save a file with a new name
    - curl -o renameddatafilename.txt https://someWebSiteName.com/datafilename.txt
    
    
- curl does not expand shell-style `*` wildcards in URLs; to download several files at once, use its URL globbing with brace lists or bracket ranges
    - curl -O "https://websitename.com/datafilename{001,002}.txt"
- download every file hosted on https://websitename.com from datafilename001.txt through datafilename100.txt
    - curl -O "https://websitename.com/datafilename[001-100].txt"
    
    
- -L follows redirects when the server responds with a 3xx status code
- -C resumes a previous file transfer that timed out before completion; it takes an offset, and `-C -` lets curl work out where to resume
- curl -L -O -C - "https://websitename.com/datafilename[001-100].txt"
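
To make the bracket range concrete, the sketch below expands a zero-padded numeric range into the individual URLs that a pattern such as `[001-100]` stands for. It only builds strings and performs no downloads; the function name `expand_numeric_range` and the `{n}` placeholder are hypothetical.

```python
def expand_numeric_range(template, start, stop, width):
    """Replace {n} in `template` with zero-padded numbers from start to stop inclusive."""
    return [template.format(n=f"{i:0{width}d}") for i in range(start, stop + 1)]


if __name__ == "__main__":
    urls = expand_numeric_range(
        "https://websitename.com/datafilename{n}.txt", start=1, stop=100, width=3
    )
    print(len(urls))
    print(urls[0])
    print(urls[-1])
```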


- As an alternative to curl, use wget

### csvkit
- a suite of command-line tools for converting to and working with CSV, the king of tabular file formats
- documentation: https://csvkit.readthedocs.io/en/latest/ (install with pip install csvkit)
- in2csv converts tabular files such as Excel workbooks to CSV
> in2csv SpotifyData.xlsx > SpotifyData.csv

- csvcut filters data by column; with -c 1 it returns the first column
> csvcut -c 1 Spotify_MusicAttributes.csv
- csvgrep filters by row using an exact match or a regex pattern
    - must be used with one of -m (exact row value to filter on), -r (regex pattern), or -f (path to a file of values)
> csvgrep -c "track_id" -m 5RCPsfzmEpTXMCTNk7wEfQ Spotify_Popularity.csv

- other commands include csvstack (for stacking multiple CSVs), csvlook (data preview on the command line), and csvstat (descriptive summary statistics per column)
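
For comparison, the column selection and row filtering done by csvcut and csvgrep can be approximated with Python's standard `csv` module. This is a sketch, not how csvkit is implemented: the function names are hypothetical, and the sample rows (including the `popularity` column and the second track ID) are made up for illustration.

```python
import csv
import io

# Illustrative data only; not taken from the real Spotify files
SAMPLE_CSV = """track_id,popularity
5RCPsfzmEpTXMCTNk7wEfQ,7
EXAMPLE_TRACK_ID,42
"""


def select_column(csv_text, column):
    """Return the values of one named column, similar in spirit to csvcut -c."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def filter_rows(csv_text, column, value):
    """Return the rows whose `column` equals `value`, similar to csvgrep -c ... -m ..."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader if row[column] == value]


if __name__ == "__main__":
    print(select_column(SAMPLE_CSV, "track_id"))
    print(filter_rows(SAMPLE_CSV, "track_id", "5RCPsfzmEpTXMCTNk7wEfQ"))
```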

### sql2csv
- executes an SQL query against a SQL database (for example SQLite, PostgreSQL, or MySQL) and writes the result as CSV, which can be redirected to a file
> sql2csv --db "sqlite:///SpotifyDatabase.db" \
>         --query "SELECT * FROM Spotify_Popularity" > Spotify_Popularity.csv
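
The same query-to-CSV step can be sketched with the standard `sqlite3` and `csv` modules. The example assumes an in-memory SQLite database with a made-up `Spotify_Popularity` table, so the column names and the single row are illustrative, and the function name `query_to_csv` is hypothetical.

```python
import csv
import io
import sqlite3


def query_to_csv(connection, query):
    """Run `query` and return the result as CSV text with a header row."""
    cursor = connection.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])  # header
    writer.writerows(cursor)  # one CSV line per result row
    return buffer.getvalue()


if __name__ == "__main__":
    # In-memory stand-in for SpotifyDatabase.db, filled with illustrative data
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE Spotify_Popularity (track_id TEXT, popularity INTEGER)")
    connection.execute("INSERT INTO Spotify_Popularity VALUES ('5RCPsfzmEpTXMCTNk7wEfQ', 7)")
    print(query_to_csv(connection, "SELECT * FROM Spotify_Popularity"), end="")
    connection.close()
```

Writing to a string buffer keeps the sketch free of side effects; a real script would more likely pass an open file to `csv.writer`.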