### CDS NYU
### DS-GA 1007 | Programming for Data Science
### Lecture 04
### September 20, 2024

---

# Interacting with Programs

## Introduction


Through the operating system (OS) of a computer, you can organize your data files and folders, install or update Python interpreters and libraries/packages, and set up a Python environment exactly as you need it.

To interact with the OS on modern computers, the starting point (once logged in) is often a graphical user interface (GUI). But GUIs, as discussed in the lecture, expose only a small fraction of the instructions an OS can receive. GUI instructions are also manual, hard to reproduce, and can become obsolete when a new version of the OS GUI is released. In contrast, scripted instructions passed to a command line interface (CLI) are portable, automatable, stable across OS versions, and give you much finer control over your OS.

- To access the CLI from your OS's GUI, you need to open a console (terminal window). The terminal is an emulator that runs the so-called "Shell", a special type of program (an interpreter) which reads the commands you type and passes them to the OS to carry out. The commands covered here are those of Unix-like systems (Unix, Linux, macOS)

- The Shell parses each line of text you type, finds the matching program or built-in command, and asks the OS to run it. The most common Shell is called Bash, which stands for Bourne Again SHell, in reference to Stephen Bourne (not Jason:), the creator of an earlier Shell program called sh

- All modern OS (Windows, Mac, Linux) come with a pre-installed Shell terminal emulator program

- There are well over 100 useful built-in Linux commands, each with its own set of optional parameters. We will review a sample of the ones I have found most essential over the years, to show you the kind of interactions you can have with the OS when you (the programmer) send direct (scripted) commands to it


## To access the Shell
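▶ Windows: Open cmd (Command Prompt). Note that cmd has its own command set; to run the Bash commands below on Windows, install a Bash Shell first (see "To install" at the end of these notes)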

▶ Mac: Open Terminal

▶ Linux: Open Linux Console

▶ Python Interpreter (IPython/Jupyter): Precede command by "!" (or "%", or nothing for most common Linux commands)

▶ (Web-) Applications: It varies (on Jupyter: Terminal)

## Resources

* Reference for the Unix Shell (Software Carpentry): https://swcarpentry.github.io/shell-novice/reference/

## Navigate Files and Directories

**pwd**: Print name of current directory 

**ls**: List directory contents

**cd**: Change directory

**file**: Determine file type

**less**: View text file contents 

**head/tail**: Output first/last part of file

In [1]:
# Print name of working directory = print current path 
%pwd


'/Users/qianruzhang/Documents/GitHub/Programming-for-Data-Science/Week04'

In [2]:
# List all files and directories within pwd
%ls

2024_dsga1007_lect04.ipynb


In [3]:
# Change directory (need to use "%" instead of "!" from a Jupyter Notebook: "!cd" runs in a temporary subshell, so the change would not persist)
%cd ..
%cd code
%cd {YourPath}/code 

/Users/qianruzhang/Documents/GitHub/Programming-for-Data-Science
[Errno 2] No such file or directory: 'code'
/Users/qianruzhang/Documents/GitHub/Programming-for-Data-Science
[Errno 2] No such file or directory: '{YourPath}/code'
/Users/qianruzhang/Documents/GitHub/Programming-for-Data-Science


In [4]:
# View type of file
!file lyrics.txt

lyrics.txt: cannot open `lyrics.txt' (No such file or directory)


In [5]:
# View text file contents
!less lyrics.txt
#!more lyrics.txt
#!head lyrics.txt
#!head -10 lyrics.txt
#!tail lyrics.txt
#!tail -f lyrics.txt # Track in real time; To exit in Shell, type Ctrl-C 

lyrics.txt: No such file or directory
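
The cells above fail whenever `lyrics.txt` is not in the working directory. As a minimal, self-contained sketch (meant for a Shell terminal rather than a notebook cell), the commands below create a throwaway directory and a small stand-in file, then exercise the navigation commands on it. The scratch directory and the file name `demo.txt` are placeholders, not course files.

```shell
# Create a scratch directory so nothing in your real folders is touched
workdir=$(mktemp -d)
cd "$workdir"
pwd                            # print the current directory (the scratch directory)

# Build a 20-line stand-in text file (demo.txt is a placeholder name)
for n in $(seq 1 20); do echo "line $n"; done > demo.txt

ls                             # list directory contents: demo.txt
first=$(head -n 3 demo.txt)    # first 3 lines of the file
last=$(tail -n 2 demo.txt)     # last 2 lines of the file
echo "$first"
echo "$last"

cd ..                          # move up to the parent directory
rm -r "$workdir"               # clean up the scratch directory
```

Using `head -n 3` rather than `head -3` is the more portable spelling of the same option.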


## Manipulate Files and Directories

**cp**: Copy files and directories

**mv**: Move/rename files and directories 

**mkdir**: Create directories

**rm**: Remove files and directories

**chmod**: Change a file’s permissions

Tips for file and directory names in Linux:
1. Only use letters, numbers, '.', '-', '_'
2. Do not use whitespace. Instead use '-' or '_'  
3. Do not begin a name with '-', because commands treat names starting with '-' as options  
4. Surround the name in quotes ("...") if it contains whitespace (you can also put a backslash '\\' before each space)
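
The naming tips above can be tried out safely in a scratch directory. This is a small sketch for a Shell terminal; the file names are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Quotes or a backslash keep a name containing a space as ONE argument
touch "my notes.txt"
touch my\ data.txt

# Without the quotes, ls would look for two separate names: my and notes.txt
ls -l "my notes.txt"

# A name starting with '-' is read as an option; './' or '--' works around that
touch ./-draft.txt
rm -- -draft.txt               # '--' marks the end of the options

remaining=$(ls)                # the two names with spaces are still there
echo "$remaining"

cd /
rm -r "$workdir"
```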

In [None]:
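!mkdir newsong                 # Create a new directory
!ls
!cp lyrics.txt newsong/.       # Copy the file into newsong
!ls -R                         # List contents recursively
!mv lyrics.txt newsong/.       # Move the file (it no longer exists here)
!ls
!cp newsong/lyrics.txt .       # Copy it back to the current directory
!ls -l                         # Long format shows the permissions
!chmod a+w lyrics.txt          # Give all users write permission
!ls -l
!rm -r newsong                 # Remove the directory and its contents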

In [None]:
# Note the wildcard characters * and ?, which work with all commands (ls, cp, etc)
!ls *.txt # A * matches any number of characters (including none)
!ls *.?? # Each ? matches exactly one character
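
To see exactly what each wildcard matches, here is a short sketch with a handful of empty files in a scratch directory (the file names are invented for the demonstration). The Shell expands the pattern before the command runs, so `echo` is enough to show the result.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch song.txt poem.txt notes.md data.py a.c

star=$(echo *.txt)     # names ending in .txt
two=$(echo *.??)       # names whose extension has exactly two characters
one=$(echo ?.c)        # one-character names ending in .c
echo "$star"           # poem.txt song.txt
echo "$two"            # data.py notes.md
echo "$one"            # a.c

cd /
rm -r "$workdir"
```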

## Find things 

**man**: Display a command’s user manual page

**find**:  Find objects whose names match a pattern 

**grep**: Find lines matching a pattern in a text file

**sed s/x/y/ f**: Print file f with the first x on each line replaced by y (use s/x/y/g to replace every occurrence); f itself is not modified

**history n**: Print last n commands typed in (in a Shell terminal)

In [None]:
# Display a command’s user manual page
%man ls

In [None]:
# Find all directories in current directory
!find . -type d

In [None]:
# Find all files in current directory
!find . -type f

In [None]:
# Find files called "lyrics.txt" located exactly two directory levels below the grandparent directory (../..); the Shell expands each * to every matching directory name before find runs
!find ../../*/*/lyrics.txt

In [None]:
# Find lines containing a word or pattern in a file
!grep Hallelujah lyrics.txt 

In [None]:
# Find and replace all occurrences of a word or pattern in a file, and save the result to a new file
!sed s/Hallelujah/Eureka/g lyrics.txt > newsong.txt
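
Since `lyrics.txt` may not be at hand, the following sketch rebuilds the same find, grep, and sed steps on a three-line stand-in file (`sample.txt` and its contents are invented for illustration). It also shows the difference between `s/x/y/` and `s/x/y/g`.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Hallelujah, Hallelujah\nA quiet verse\nhallelujah again\n' > sample.txt

found=$(find . -type f -name '*.txt')        # search by name pattern
echo "$found"                                # ./sample.txt

grep -n Hallelujah sample.txt                # matching lines with line numbers (case-sensitive)
matches=$(grep -c -i hallelujah sample.txt)  # -i ignores case, -c counts matching lines
echo "$matches"                              # 2

# Without the trailing g, sed replaces only the first match on each line
first_only=$(sed 's/Hallelujah/Eureka/' sample.txt | head -n 1)
every=$(sed 's/Hallelujah/Eureka/g' sample.txt | head -n 1)
echo "$first_only"                           # Eureka, Hallelujah
echo "$every"                                # Eureka, Eureka

cd /
rm -r "$workdir"
```

Quoting the sed expression protects it from the Shell when the pattern contains spaces or special characters.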

In [None]:
# One of the most useful commands when working with the CLI
%history

## Parse text files in VI (beyond scope of this course)

Several editors let you view and edit files directly in the Shell. We will review only one, VI, and show how a few basic keystrokes let you find or replace what you are looking for in text files

In [None]:
# Open Shell terminal emulator to parse file in VI. Warning: Don't do it from Jupyter Notebook, open a Linux terminal
# vi newsong.txt

## Manage Processes

A **process** represents the execution of a program on a computer (the Shell calls the processes it launches **jobs**). The OS is the system that operates all these processes in parallel. Or rather, on each processor core, it operates them one at a time, switching so fast that, to our perception, they seem to take place in parallel. Many commands exist to launch, monitor, pause, or kill any process running on the computer. Let's review the most common ones below 

In [None]:
# Let's use a custom script as example of process
!python factorialalgo.py 10 > out.log
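
The `>` in the cell above redirects the program's output to a file instead of the screen. As a small sketch of the redirection operators, using `echo` as a stand-in for the course script (the log file names are placeholders):

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first run"  > out.log     # > creates the file, or overwrites it if it exists
echo "second run" > out.log     # the first line is gone
echo "third run" >> out.log     # >> appends instead of overwriting

# 2> captures error messages (stderr); ls fails here on purpose
ls no-such-file 2> err.log || true

log=$(cat out.log)
err=$(cat err.log)
echo "$log"
echo "$err"

cd /
rm -r "$workdir"
```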

### Example: Redirect output to file and manage processes (beyond scope of this course)

In [None]:
'''
Open a Shell terminal emulator, then run the following commands in the Shell, 
to practice running jobs, monitoring jobs, pausing and killing jobs, 
in foreground, in background, and on single or multiple jobs running in parallel
'''

# python factorialalgo.py 1000000 > out1M.log
# Ctrl-z
# jobs
# bg
# jobs
# top
# fg %1
# Ctrl-c
# python factorialalgo.py 1000000 > out1M.log &
# python factorialalgo.py 2000000 > out2M.log &
# jobs
# tail out1M.log
# tail -f out1M.log
# Ctrl-c
# kill %1
# jobs
# kill %2
# jobs
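
The keystrokes Ctrl-z and Ctrl-c and the commands fg and bg only make sense in an interactive terminal. The background half of the exercise can also be scripted, as in this sketch, where `sleep 60` stands in for a long-running script such as the course's `factorialalgo.py`, and the process ID is used instead of the job number.

```shell
# Start a long-running placeholder job in the background
sleep 60 &
pid=$!                          # $! holds the process ID of the last background job

# kill -0 sends no signal; it only checks that the process exists
kill -0 "$pid" && echo "process $pid is running"
jobs                            # list the background jobs of this Shell

kill "$pid"                     # ask the process to terminate
status=0
wait "$pid" 2>/dev/null || status=$?   # collect it; non-zero status because it was killed
echo "exit status after kill: $status"
```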

### Simple exercise (needed for homework) using the pipe "|" command 

In [None]:
%cd animal-counts
!less animals.txt

In [None]:
# Let's look at a new command to quickly parse a file: cut
!man cut

In [None]:
# The -d flag defines a delimiter to split each line (e.g., comma)
# The -f flag selects a specific column field in each line
!cut -d "," -f 2 animals.txt

In [None]:
# Pipe the output to the input of the sort command
!cut -d "," -f 2 animals.txt | sort

In [None]:
# Pipe the new output to the input of yet another command
!cut -d "," -f 2 animals.txt | sort | uniq

**Output**: List of unique animals in the file, sorted in alphabetical order.
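
Here is the same pipeline as a self-contained sketch, with a five-line stand-in for `animals.txt` (the real course file may contain different rows). It also shows why `sort` has to come before `uniq`.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Stand-in data in the form date,animal
printf '2012-11-05,deer\n2012-11-05,rabbit\n2012-11-06,rabbit\n2012-11-06,deer\n2012-11-07,fox\n' > animals.txt

unique=$(cut -d "," -f 2 animals.txt | sort | uniq)
echo "$unique"         # deer, fox, rabbit (one per line)

# uniq only collapses ADJACENT duplicates, so without sort "deer" appears twice
unsorted=$(cut -d "," -f 2 animals.txt | uniq)
echo "$unsorted"       # deer, rabbit, deer, fox (one per line)

cd /
rm -r "$workdir"
```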

## Some more advanced Linux commands

### Archive files and folders (beyond scope of this course)

In [None]:
# Package multiple files and folders into a single "archive" file
%cd ../..
!tar -cvf lect04.tar code *.pdf

# Same but with option to compress files first (reduce file size)
!tar -czvf lect04.tar.gz code *.pdf

# Uncompress and extract an archive (= reverse process) to a new folder
!mkdir dropithere
!tar -xzvf lect04.tar.gz -C ./dropithere
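
A quick way to check an archive before extracting it is to list its contents with the `t` option. The sketch below goes through a full round trip in a scratch directory; all file and folder names in it are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir code
echo "print('hi')" > code/hello.py
echo "notes" > readme.txt

tar -czf demo.tar.gz code readme.txt    # c = create, z = gzip, f = archive file name
listing=$(tar -tzf demo.tar.gz)         # t = list contents without extracting
echo "$listing"

mkdir dropithere
tar -xzf demo.tar.gz -C dropithere      # x = extract, -C = target directory
restored=$(cat dropithere/code/hello.py)
echo "$restored"

cd /
rm -r "$workdir"
```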


### The For Loop (needed for homework)

```
for VARIABLE in SET
do
    COMMAND
done
```

In [None]:
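# In a notebook, IPython expands $name itself when a Python variable called name exists; writing $$i guarantees the Shell receives a literal $i
!for i in 1 2 3 4 5; do echo $i; done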

In [None]:
%cd {YourPath}/code  
!for f in *.log; do grep 'Job completed' "$f"; done

$VARIABLE inserts a variable's value in place; $(command) inserts a command's output in place
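
To make the two kinds of substitution concrete, here is a sketch of a loop over log files written in the multi-line form, for a Shell terminal. The two `.log` files and their contents are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'step 1\nJob completed\n' > run1.log
printf 'step 1\nstep 2\n'        > run2.log

for f in *.log
do
    name=$(basename "$f" .log)   # $(...) runs a command and inserts its output
    last=$(tail -n 1 "$f")       # "$f" inserts the value of the loop variable
    echo "$name: $last"
done > report.txt                # the output of the whole loop can be redirected at once

report=$(cat report.txt)
echo "$report"

cd /
rm -r "$workdir"
```

Quoting `"$f"` keeps the loop working when a file name contains spaces.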

More details: https://www.cyberciti.biz/faq/bash-for-loop/

## Shell Scripts

#### How can I save and re-use commands?

#### => Write a script that contains a command or series of commands, and save this script in a file.

Then you can execute this Shell script at any later time from the command line interface: the Shell interpreter will read and execute each command stored in the file.

**Shell scripts can take arguments when invoked. As shown below, ```$1```, ```$2```, ... refer to the first command-line argument, second command-line argument, etc** (place variables in quotes if the values might have spaces in them). This lets future users (and future you:) decide what input to process.

#### Let us look at an example:
Script to select a range of lines (e.g., the lines just above and below a chorus) in the file lyrics.txt

In [None]:
#%cd .. 
!head -n 11 lyrics.txt # Select first 11 lines in file lyrics.txt

In [None]:
!head -n 11 lyrics.txt | tail -n 6 # Select the last 6 of those lines, i.e., lines 6 to 11 of lyrics.txt

**You can store the above command (= set of piped commands) into a file, which we then call a Shell script.** Let us call this file `selectlines.sh`

To make this Shell script easier to use, you can read the values of parameters given as arguments on the command line when the script is invoked, and assign these values to variables:

```head -n "$3" "$1" | tail -n "$2"```

Now you can invoke the Shell script like so: 

```bash selectlines.sh lyrics.txt 6 11```

Instead of specifying the bash interpreter on the command line, you can declare it inside the script by adding ```#!/bin/bash``` as its first line.

Change the file's permissions to make it "executable" by typing ```chmod a+x selectlines.sh```.

You can now directly use this script as a new, custom Shell 'command':

```./selectlines.sh lyrics.txt 6 11```


In [None]:
# chmod a+x selectlines.sh
!./selectlines.sh lyrics.txt 6 11
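
Putting the steps of this section together, the sketch below writes `selectlines.sh` from the command line, makes it executable, and runs it on a stand-in file (`sample.txt` is a placeholder for `lyrics.txt`). It assumes Bash is installed at `/bin/bash`.

```shell
workdir=$(mktemp -d)
cd "$workdir"
for n in $(seq 1 12); do echo "line $n"; done > sample.txt

# Write the script; the quoted 'EOF' keeps $1, $2, $3 literal inside the file
cat > selectlines.sh <<'EOF'
#!/bin/bash
# Usage: selectlines.sh FILE COUNT END  (prints the COUNT lines ending at line END)
head -n "$3" "$1" | tail -n "$2"
EOF

chmod a+x selectlines.sh                  # make the script executable
selected=$(./selectlines.sh sample.txt 3 11)
echo "$selected"                          # line 9, line 10, line 11

cd /
rm -r "$workdir"
```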

## Manage libraries and environments (beyond scope of this course)

### Python Virtual Environments: 

- It may not be possible for one Python installation to meet the requirements of every program and application

- The types and versions of Python libraries required by two different programs can conflict

- A virtual environment exists for this purpose: it is a self-contained directory tree that contains a specific Python setup, with specific versions of the Python interpreter and of external packages

More details: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf



### Examples of typical Conda commands

```
python 
Ctrl-D

conda env list
conda create --name testenv python=3.9 # check the version first
conda activate testenv 
conda deactivate 

conda activate testenv 
conda install scikit-learn # install a package before updating it
conda update scikit-learn
conda install numpy 
conda list
conda deactivate
conda env remove --name testenv 
conda env list
```

## Additional resources on CLI and Conda installation setup: 
    
https://nyu-cds.github.io/courses/advanced-setup/


To install:  

1. Shell CLI (Bash)  
 - Linux, MacOS provided  
 - Windows: https://www.windowscentral.com/how-install-bash-shell-command-line-windows-10  


2. Conda {Linux, Mac, Windows}:   
https://conda.io/projects/conda/en/latest/user-guide/install/index.html  


3. IPython  
https://ipython.org/install.html  


4. Jupyter notebook  
https://jupyter.org/install

    