# Demo Setup

This demo will be run from the ACCRE cluster within a Jupyter Notebook. This can be done by first setting up an SSH tunnel from your local machine to the cluster, as we discussed at the last seminar (e.g. see: https://github.com/accre/Python/tree/master/notebooks).

# Magic Commands and Cells

Recall that Jupyter notebooks support a number of line magics and cell magics. In this tutorial, standard Unix commands are run with line magics, and commands that line magics do not support are run with the bash cell magic instead. Don't let this trip you up: you can ignore the magic symbols and imagine that we are typing these commands in a normal bash session while connected to the cluster via ssh.

In [1]:
% lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %install_default_config  %install_ext  %install_profiles  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%latex  %%

# Getting Started

Let's start by creating a directory to work from for this demo. 

In [2]:
%cd ~

/gpfs22/home/frenchwr


In [3]:
%%bash
demodir=$(pwd)/pizza-and-programming   # create local shell variable that we can use throughout this demo
[[ -d $demodir ]] || mkdir $demodir    # check if directory exists; if not, create it

- Note that the double bracket notation is supported by Bash but not by all Unix shells. The double bracket is generally more powerful than a single bracket and is known as the extended test command.
- The || operator will only execute the command on the right if the exit code of the command on the left is non-zero. In this case, we make a new directory only if it does not already exist.
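
The create-if-missing idiom can be tried in isolation. The sketch below is illustrative only: it works in a throwaway directory created with `mktemp`, and the names `workdir` and `status` are assumptions introduced for the example, not part of the demo.

```shell
# Work in a throwaway directory so nothing in $HOME is touched.
workdir=$(mktemp -d)
demodir="$workdir/pizza-and-programming"

# || runs the right-hand command only when the left-hand one fails.
[[ -d "$demodir" ]] || mkdir "$demodir"

# Running the same line again is harmless: the test now succeeds, so mkdir is skipped.
[[ -d "$demodir" ]] || mkdir "$demodir"

# && is the mirror image: the right-hand command runs only on success.
status="missing"
[[ -d "$demodir" ]] && status="exists"

# mkdir -p gives the same create-if-missing behaviour in a single command.
mkdir -p "$demodir"
echo "$demodir: $status"
```

Quoting `"$demodir"` is not strictly required inside `[[ ]]`, but it is needed for `mkdir` if the path ever contains spaces, so quoting everywhere is a reasonable habit.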

In [4]:
%cd pizza-and-programming/

/gpfs22/home/frenchwr/pizza-and-programming


# grep, sed, and awk

These are three of the most powerful commands for doing quick line-by-line processing of files. Each of these commands supports a large number of options. In scientific and high-performance computing, they are commonly used for post-processing of simulation/analysis output or for preparing simulation/analysis input.

In [5]:
%%bash
demodir=$(pwd)
setpkgs -a git
[[ -d $demodir/SimpleMD ]] || git clone https://github.com/frenchwr/SimpleMD    # clone SimpleMD repo 

In [6]:
%cd SimpleMD

/gpfs22/home/frenchwr/pizza-and-programming/SimpleMD


In [7]:
%ls

[0m[01;33mbin[0m/     [01;33minc[0m/      nums.csv  [01;33mobj[0m/       sim-output.dat  [01;33msrc[0m/
[01;33mimages[0m/  Makefile  nums.txt  README.md  simulate.sh
[m

- grep takes its name from the ed editor command g/re/p ("globally search for a regular expression and print"). In its most basic form, it's used to match lines within a file (or, perhaps more often, piped output from another command) containing a specified string or pattern. For example:

In [8]:
%%bash
grep "CC" Makefile

CC=gcc
	$(CC) $(LDFLAGS) $(LDLIBS) $(OBJECTS) -o $@
	$(CC) $(CFLAGS) -c $< -o $@


In [9]:
%%bash
grep "cc" Makefile   # grep is case sensitive

CC=gcc


In [10]:
%%bash
grep -i "cc" Makefile   # case insensitive match

CC=gcc
	$(CC) $(LDFLAGS) $(LDLIBS) $(OBJECTS) -o $@
	$(CC) $(CFLAGS) -c $< -o $@


- One extremely useful option for grep is the -r option, which allows you to search recursively through groups of files embedded in source directories and so on. For example, let's say you are debugging a huge piece of code that you inherited from a previous grad student, and you want to see all lines of code containing a certain variable (called _virial_ in this example). Recursive grep will help you do this quite easily:

In [11]:
%%bash
grep -r -n virial src/    # -n also gives us the line number

src/energy_force.c:51:   myatoms->virial = 0.0;
src/energy_force.c:76:            myatoms->virial -= fterm * dis2;
src/energy_force.c:97:   myatoms->virial *= 24.0 * len_jo->eps;
src/props.c:40:                myatoms->virial / ( 3.0 * m_pars->float_N ) -
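
Recursive grep can be tried without the SimpleMD repository. The following is a minimal sketch that builds a tiny, hypothetical source tree in a temporary directory (the file names and contents are made up for illustration) and combines `-r` with the `-n`, `-i`, and `-l` options.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src/util"
printf 'double virial = 0.0;\nint n = 0;\n' > "$workdir/src/energy.c"
printf 'atoms->VIRIAL += f;\n'              > "$workdir/src/util/props.c"
printf 'no match here\n'                    > "$workdir/src/util/timer.c"

# -r descends into subdirectories; -n prefixes each hit with its line number.
hits=$(grep -rn 'virial' "$workdir/src")
echo "$hits"

# -i ignores case; -l prints only the names of files that contain a match.
files=$(grep -rli 'virial' "$workdir/src" | sort)
echo "$files"
```

The first search finds only the lowercase occurrence in `energy.c`, while the case-insensitive search also lists `util/props.c`.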


- grep can also be used to print consecutive lines within a file either before or after the occurrence of a specified string. For instance, in our README.md file we have the following output:

In [12]:
%cat README.md

# Simple MD Program

* This is a very basic NVE moleculer dynamics program that will
be used for the course I teach in the spring.

* This program borrows heavily from David Keffer's "The Working Person's 
Guide to Molecular Dynamics Simulations", although I have written the program in a different
language and also expanded/replaced certain parts of the program.

* The idea for the class is that students will start with this basic
serial version of the code and then add parallel support with PThreads,
OpenMP, MPI, CUDA, or whatever.

## Project Layout

- **bin/**: binary built to this directory.
- **images/**: photo gallery of simulation snapshots.
- **inc/**: contains header files with function prototypes, struct definitions, etc.
- **obj/**: object files stored in this directory.
- **src/**: source files.
- **Makefile**: automated build script.
- **simulate.sh**: shell script with examples for running program.

## Building Program

To build simply type:

``

- Let's say we'd like to compute some statistics from the example simulation output shown in the README.md file. To start, we first need to grab the lines containing the numeric data we care about.

In [13]:
%%bash
# -A option prints n lines after matched line
# -B option prints n lines before matched line
# -v option inverts the match (i.e. match lines not containing specified string)
grep -A 1000 "Timestep" README.md | grep -B 1000 "Simulation Complete" | egrep -v "Timestep|Simulation Complete" | head

      0   3.883e-01  -8.432e-01  -4.54901e-01  150.00  -7.240e-05
    100   2.790e-01  -7.329e-01  -4.53904e-01  107.78  -7.789e-06
    200   3.488e-01  -8.022e-01  -4.53412e-01  134.75  -1.379e-05
    300   3.913e-01  -8.446e-01  -4.53270e-01  151.15  -1.782e-05
    400   3.670e-01  -8.203e-01  -4.53274e-01  141.76  -3.988e-06
    500   3.754e-01  -8.286e-01  -4.53236e-01  145.01  -7.795e-06
    600   3.850e-01  -8.383e-01  -4.53259e-01  148.73  -8.743e-06
    700   3.775e-01  -8.308e-01  -4.53237e-01  145.83  -1.073e-06
    800   4.006e-01  -8.539e-01  -4.53247e-01  154.75  -5.536e-06
    900   3.634e-01  -8.167e-01  -4.53250e-01  140.38  8.770e-06


- Voila! We used a combination of grep and egrep (extended grep, equivalent to ```grep -E```) to pull out the pertinent lines.
- This works but it's a little cumbersome and prone to error if we can't predict the number of lines we'll be grabbing ahead of time.
- Let's try with sed and/or awk instead.
- sed is a stream editor that is most often used for search and replace. We are not going to focus much on sed in this demo, but it can be used in the following way to extract the lines we are interested in.

In [14]:
%%bash
sed -n '/Timestep/,/Simulation Complete/{/Timestep/b;/Simulation Complete/b;p}' README.md | head

      0   3.883e-01  -8.432e-01  -4.54901e-01  150.00  -7.240e-05
    100   2.790e-01  -7.329e-01  -4.53904e-01  107.78  -7.789e-06
    200   3.488e-01  -8.022e-01  -4.53412e-01  134.75  -1.379e-05
    300   3.913e-01  -8.446e-01  -4.53270e-01  151.15  -1.782e-05
    400   3.670e-01  -8.203e-01  -4.53274e-01  141.76  -3.988e-06
    500   3.754e-01  -8.286e-01  -4.53236e-01  145.01  -7.795e-06
    600   3.850e-01  -8.383e-01  -4.53259e-01  148.73  -8.743e-06
    700   3.775e-01  -8.308e-01  -4.53237e-01  145.83  -1.073e-06
    800   4.006e-01  -8.539e-01  -4.53247e-01  154.75  -5.536e-06
    900   3.634e-01  -8.167e-01  -4.53250e-01  140.38  8.770e-06


sed: couldn't flush stdout: Broken pipe
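
The sed one-liner above is dense, so here is a small sketch of the same idea on a made-up log file (the file contents are an assumption, shaped like the README output). It uses `d` instead of `b` to skip the marker lines, which has the same effect under `-n` and tends to be more portable across sed implementations.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
Starting run
 Timestep     KE
      0   3.883e-01
    100   2.790e-01
Simulation Complete!
Wall time: 12 s
EOF

# -n turns off automatic printing.
# /Timestep/,/Simulation Complete/ selects the range from header to footer.
# Inside the braces, d drops the two marker lines and p prints everything else.
body=$(sed -n '/Timestep/,/Simulation Complete/{/Timestep/d;/Simulation Complete/d;p;}' "$log")
echo "$body"
```

The "Broken pipe" message in the notebook output is harmless: `head` exits after printing ten lines, so sed is left writing to a pipe that nobody is reading.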


- In its most common application, ```sed``` is used as a search and replace tool:

In [62]:
%%bash
echo 'Original line:'
grep Pressure README.md
echo 'Updated line:'
sed 's/Timestep/Column_1/' README.md | grep Pressure

Original line:
 Timestep     KE          PE        TotE        Temp   Pressure
Updated line:
 Column_1     KE          PE        TotE        Temp   Pressure


In [71]:
%%bash
echo 'Original line:'
grep 'run the program with default parameters' README.md
echo -e '\nUpdated line:'
sed 's/a list/A LIST/' README.md | grep 'run the program with default parameters'
echo -e '\nFinal line:'
sed 's/a list/A LIST/g' README.md | grep 'run the program with default parameters'

Original line:
This will run the program with default parameters. To see a list of command-line options and a list of defaults, pass **--help** as a command-line argument:

Updated line:
This will run the program with default parameters. To see A LIST of command-line options and a list of defaults, pass **--help** as a command-line argument:

Final line:
This will run the program with default parameters. To see A LIST of command-line options and A LIST of defaults, pass **--help** as a command-line argument:
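
The effect of the flag at the end of a substitution is easiest to see on a short string. A minimal sketch, using a shortened, made-up version of the README sentence: no flag replaces the first match on each line, `g` replaces every match, and a number replaces only that occurrence.

```shell
line='see a list of options and a list of defaults'

first=$(echo "$line" | sed 's/a list/A LIST/')     # first match only
all=$(echo "$line" | sed 's/a list/A LIST/g')      # every match on the line
second=$(echo "$line" | sed 's/a list/A LIST/2')   # second match only

echo "$first"    # see A LIST of options and a list of defaults
echo "$all"      # see A LIST of options and A LIST of defaults
echo "$second"   # see a list of options and A LIST of defaults
```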


- ```awk``` can also be used to pull out the lines of output we want with slightly less code than ```grep``` and ```sed```. Think of ```awk``` as a tool that performs line-by-line analysis on files and prints useful information along the way.
- Here, we are using awk to match the same strings as before and then using a simple flag variable (```f```) to instruct ```awk``` to print the lines of output between the lines containing these strings.

In [15]:
%%bash
# variable f controls which records are shown
awk '/Timestep/{f=1;next} /Simulation Complete/{f=0} f' README.md | head

      0   3.883e-01  -8.432e-01  -4.54901e-01  150.00  -7.240e-05
    100   2.790e-01  -7.329e-01  -4.53904e-01  107.78  -7.789e-06
    200   3.488e-01  -8.022e-01  -4.53412e-01  134.75  -1.379e-05
    300   3.913e-01  -8.446e-01  -4.53270e-01  151.15  -1.782e-05
    400   3.670e-01  -8.203e-01  -4.53274e-01  141.76  -3.988e-06
    500   3.754e-01  -8.286e-01  -4.53236e-01  145.01  -7.795e-06
    600   3.850e-01  -8.383e-01  -4.53259e-01  148.73  -8.743e-06
    700   3.775e-01  -8.308e-01  -4.53237e-01  145.83  -1.073e-06
    800   4.006e-01  -8.539e-01  -4.53247e-01  154.75  -5.536e-06
    900   3.634e-01  -8.167e-01  -4.53250e-01  140.38  8.770e-06
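
The flag idiom is easier to follow when the awk program is spread over several lines. The sketch below runs the same three rules on a made-up log file (its contents are an assumption, shaped like the README output).

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
Starting run
 Timestep     KE
      0   3.883e-01
    100   2.790e-01
Simulation Complete!
Wall time: 12 s
EOF

body=$(awk '
/Timestep/            { f = 1; next }  # header found: raise the flag and skip the header line
/Simulation Complete/ { f = 0 }        # footer found: lower the flag
f                                      # bare pattern: print the record while f is non-zero
' "$log")
echo "$body"
```

Rule order matters here: the footer rule lowers the flag before the final `f` pattern is tested, so the footer line itself is not printed.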




In [16]:
%%bash
awk '/Timestep/{f=1;next} /Simulation Complete/{f=0} f' README.md > nums.txt
echo "********Thermo stats (mean/std)********"
awk '{mean += $2; sumsq+=$2*$2} END {print "KE: " mean/NR,"/",sqrt(sumsq/NR - (mean/NR)**2)}' nums.txt
awk '{mean += $3; sumsq+=$3*$3} END {print "PE: " mean/NR,"/",sqrt(sumsq/NR - (mean/NR)**2)}' nums.txt
awk '{mean += $4; sumsq+=$4*$4} END {print "TotE: " mean/NR,"/",sqrt(sumsq/NR - (mean/NR)**2)}' nums.txt
awk '{mean += $5; sumsq+=$5*$5} END {print "Temp: " mean/NR,"/",sqrt(sumsq/NR - (mean/NR)**2)}' nums.txt
awk '{mean += $6; sumsq+=$6*$6} END {print "Pres: " mean/NR,"/",sqrt(sumsq/NR - (mean/NR)**2)}' nums.txt

********Thermo stats (mean/std)********
KE: 0.386036 / 0.0171719
PE: -0.839323 / 0.017134
TotE: -0.453286 / 0.000177977
Temp: 149.123 / 6.6329
Pres: -4.58708e-06 / 9.11028e-06


- The first block of code is executed for each line in nums.txt, while the code inside the END block is executed once after the entire file has been processed.
- For each line of input, awk allows you to match a pattern (optional) and then perform some action on the corresponding record (i.e. line).
- awk supports a bunch of built-in functions like sqrt(), rand(), etc. 
- awk also supports a bunch of built-in variables like NR (the number of records read so far, which in the END block equals the total number of lines) shown above. You can also pass your own variables to awk from the shell:

In [17]:
%%bash
# convert temperatures from Kelvin to Celsius
K_to_C="273.15"
awk -v temp_conv="$K_to_C" '{print "Temp in C: " $5-temp_conv}' nums.txt | head   # print first ten lines of output

Temp in C: -123.15
Temp in C: -165.37
Temp in C: -138.4
Temp in C: -122
Temp in C: -131.39
Temp in C: -128.14
Temp in C: -124.42
Temp in C: -127.32
Temp in C: -118.4
Temp in C: -132.77
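
The statistics cell above reads nums.txt once per column. As a sketch of an alternative, the per-column sums can be kept in awk arrays so that one pass covers every column. The sample data and the variable names `first` and `last` (passed in with `-v`) are assumptions made for this example; `mean * mean` is used instead of `**`, which not every awk implementation accepts.

```shell
nums=$(mktemp)
cat > "$nums" <<'EOF'
0 1.0 10.0
100 2.0 20.0
200 3.0 30.0
EOF

stats=$(awk -v first=2 -v last=3 '
{
    for (c = first; c <= last; c++) {
        sum[c]   += $c         # running sum of column c
        sumsq[c] += $c * $c    # running sum of squares of column c
    }
}
END {
    for (c = first; c <= last; c++) {
        mean = sum[c] / NR
        var  = sumsq[c] / NR - mean * mean
        if (var < 0) var = 0   # guard against tiny negative values from rounding
        printf "col%d: %.4f / %.4f\n", c, mean, sqrt(var)
    }
}' "$nums")
echo "$stats"
```

As in the original cell, this is the population standard deviation (dividing by NR, not NR - 1).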


In [18]:
%%bash
# simple use case for awk is to pull out fields of data you care about for plotting and so on.
awk '{print $1,$6}' nums.txt | head    # print first ten lines of output for brevity

0 -7.240e-05
100 -7.789e-06
200 -1.379e-05
300 -1.782e-05
400 -3.988e-06
500 -7.795e-06
600 -8.743e-06
700 -1.073e-06
800 -5.536e-06
900 8.770e-06


In [19]:
%%bash
# save output in a csv format
awk '{print $1","$6}' nums.txt > nums.csv   
head nums.csv  

0,-7.240e-05
100,-7.789e-06
200,-1.379e-05
300,-1.782e-05
400,-3.988e-06
500,-7.795e-06
600,-8.743e-06
700,-1.073e-06
800,-5.536e-06
900,8.770e-06


In [20]:
%%bash
# Feed awk a different field separator
awk -F"," '{print $1,$2}' nums.csv | head   # convert back to whitespace field separators

0 -7.240e-05
100 -7.789e-06
200 -1.379e-05
300 -1.782e-05
400 -3.988e-06
500 -7.795e-06
600 -8.743e-06
700 -1.073e-06
800 -5.536e-06
900 8.770e-06
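
The output separator can also be set once instead of being spliced into the print statement. A minimal sketch, using two made-up three-column records: setting `OFS` in a `BEGIN` block makes every comma in `print` emit that separator.

```shell
# Whitespace-separated input to CSV: OFS is the output field separator.
csv=$(printf '0 3.883e-01 -7.240e-05\n100 2.790e-01 -7.789e-06\n' |
      awk 'BEGIN { OFS = "," } { print $1, $3 }')
echo "$csv"

# And back again: -F sets the input field separator, OFS defaults to a space.
back=$(echo "$csv" | awk -F',' '{ print $1, $2 }')
echo "$back"
```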


In [21]:
%%bash
# note that awk prints a record if its pattern evaluates to true (a non-zero number or a non-empty string)
awk '1' nums.txt | head

      0   3.883e-01  -8.432e-01  -4.54901e-01  150.00  -7.240e-05
    100   2.790e-01  -7.329e-01  -4.53904e-01  107.78  -7.789e-06
    200   3.488e-01  -8.022e-01  -4.53412e-01  134.75  -1.379e-05
    300   3.913e-01  -8.446e-01  -4.53270e-01  151.15  -1.782e-05
    400   3.670e-01  -8.203e-01  -4.53274e-01  141.76  -3.988e-06
    500   3.754e-01  -8.286e-01  -4.53236e-01  145.01  -7.795e-06
    600   3.850e-01  -8.383e-01  -4.53259e-01  148.73  -8.743e-06
    700   3.775e-01  -8.308e-01  -4.53237e-01  145.83  -1.073e-06
    800   4.006e-01  -8.539e-01  -4.53247e-01  154.75  -5.536e-06
    900   3.634e-01  -8.167e-01  -4.53250e-01  140.38  8.770e-06


In [22]:
%%bash
# only print even lines of output
awk 'NR % 2 == 0' nums.txt | head    # use modulo operator to compare remainder of NR/2 operation

    100   2.790e-01  -7.329e-01  -4.53904e-01  107.78  -7.789e-06
    300   3.913e-01  -8.446e-01  -4.53270e-01  151.15  -1.782e-05
    500   3.754e-01  -8.286e-01  -4.53236e-01  145.01  -7.795e-06
    700   3.775e-01  -8.308e-01  -4.53237e-01  145.83  -1.073e-06
    900   3.634e-01  -8.167e-01  -4.53250e-01  140.38  8.770e-06
   1100   3.943e-01  -8.476e-01  -4.53267e-01  152.33  -6.303e-06
   1300   3.886e-01  -8.419e-01  -4.53277e-01  150.13  -3.920e-06
   1500   3.950e-01  -8.483e-01  -4.53284e-01  152.57  -7.619e-06
   1700   3.877e-01  -8.410e-01  -4.53306e-01  149.78  -7.130e-06
   1900   4.003e-01  -8.536e-01  -4.53320e-01  154.63  -1.702e-06


In [23]:
%%bash
# only print odd lines of output
awk 'NR % 2 == 1' nums.txt | head    # use modulo operator to compare remainder of NR/2 operation

      0   3.883e-01  -8.432e-01  -4.54901e-01  150.00  -7.240e-05
    200   3.488e-01  -8.022e-01  -4.53412e-01  134.75  -1.379e-05
    400   3.670e-01  -8.203e-01  -4.53274e-01  141.76  -3.988e-06
    600   3.850e-01  -8.383e-01  -4.53259e-01  148.73  -8.743e-06
    800   4.006e-01  -8.539e-01  -4.53247e-01  154.75  -5.536e-06
   1000   3.879e-01  -8.412e-01  -4.53294e-01  149.84  -2.372e-06
   1200   3.781e-01  -8.314e-01  -4.53226e-01  146.07  -2.062e-06
   1400   3.756e-01  -8.289e-01  -4.53264e-01  145.09  4.032e-06
   1600   3.804e-01  -8.336e-01  -4.53252e-01  146.93  -4.261e-06
   1800   3.795e-01  -8.329e-01  -4.53360e-01  146.61  6.970e-06
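
Patterns are not limited to NR. The sketch below, on a made-up two-column file (timestep and temperature), filters on a field value, combines two conditions, and counts matches in an END block.

```shell
nums=$(mktemp)
cat > "$nums" <<'EOF'
0 150.00
100 107.78
200 134.75
300 151.15
EOF

# A comparison on a field works as a pattern, just like NR % 2 == 0.
hot=$(awk '$2 > 150 { print $1 }' "$nums")

# Patterns can be combined with && and ||; here, records 2 through 3.
middle=$(awk 'NR >= 2 && NR <= 3' "$nums")

# Count the matching records; n + 0 prints 0 instead of an empty string if nothing matched.
count=$(awk '$2 < 150 { n++ } END { print n + 0 }' "$nums")

echo "timesteps above 150: $hot"
echo "records below 150: $count"
```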


In [24]:
%%bash
# it is often convenient to print only lines of a certain length
awk 'length($0) > 120' README.md   # only print lines longer than 120 characters

gcc -lm  obj/atoms.o obj/cl_parse.o obj/energy_force.o obj/initialization.o obj/integrator.o obj/mddriver.o obj/params.o obj/print_traj.o obj/props.o obj/timer.o -o bin/run_md
Make automates the build process, so if you only edit a single source file, only that source file will be recompiled. If you want to blow away all the object files and the program binary, you can run:
This will run the program with default parameters. To see a list of command-line options and a list of defaults, pass **--help** as a command-line argument:
The project root directory also contains a Bash script for running a number of pre-determined simulations. To run with the default parameters simply type:
Below is an example of the output from a simulation with the default parameters using GCC 4.4.7 with no optimization on a Intel Xeon E5620 @ 2.40GHz.


In [25]:
%%bash
awk 'NF > 30' README.md   # only print lines with greater than 30 fields

Make automates the build process, so if you only edit a single source file, only that source file will be recompiled. If you want to blow away all the object files and the program binary, you can run:


In [26]:
%%bash
# print first field of lines with greater than 3 fields where comma is field-separator
awk -F"," 'NF > 3{print $1}' README.md   

OpenMP
Make automates the build process


In [27]:
%%bash
# note that grep and awk are often used in sequence when awk can be used alone
grep "Simulation Complete!" README.md | awk '{print $1}'
awk '/Simulation Complete!/{print $1}' README.md

Simulation
Simulation


# Loops

- To perform Unix/Bash commands over a series of files, loops are often needed.
- Pattern matching (in particular a glob pattern using *) is limited in what it can accomplish.

In [28]:
%%bash
ls -lh

total 52K
drwxr-xr-x. 2 frenchwr accre  512 Dec 22 17:41 bin
drwxr-xr-x. 2 frenchwr accre  512 Dec 22 17:41 images
drwxr-xr-x. 2 frenchwr accre 2.0K Dec 22 17:41 inc
-rw-r--r--. 1 frenchwr accre 1.1K Dec 22 17:41 Makefile
-rw-r--r--. 1 frenchwr accre 1.6K Dec 27 15:00 nums.csv
-rw-r--r--. 1 frenchwr accre 6.5K Dec 27 15:00 nums.txt
drwxr-xr-x. 2 frenchwr accre  512 Dec 22 17:41 obj
-rw-r--r--. 1 frenchwr accre  11K Dec 22 17:41 README.md
-rw-r--r--. 1 frenchwr accre 6.5K Dec 23 07:06 sim-output.dat
-rw-r--r--. 1 frenchwr accre  240 Dec 22 17:41 simulate.sh
drwxr-xr-x. 2 frenchwr accre 2.0K Dec 22 17:41 src


In [29]:
%%bash
ls -lh sim*

-rw-r--r--. 1 frenchwr accre 6.5K Dec 23 07:06 sim-output.dat
-rw-r--r--. 1 frenchwr accre  240 Dec 22 17:41 simulate.sh


In [30]:
%%bash
cp sim* newfiles*

cp: target `newfiles*' is not a directory


- The above fails because the shell expands the glob patterns before it passes the arguments to ```cp```. Because no existing file matches ```newfiles*```, that pattern is passed through unchanged.
- The above is actually attempting to run:

```cp sim-output.dat simulate.sh newfiles*```

- The ```cp``` command expects its final argument to be a directory if it receives more than two arguments.
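
To see exactly which arguments a command would receive, the expansion can be captured without running `cp` at all. A minimal sketch, assuming Bash's default settings (no `nullglob` or `failglob`) and two empty stand-in files created in a temporary directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch sim-output.dat simulate.sh

# set -- replaces the positional parameters with the expanded words,
# i.e. the same argument list that cp would have been given.
set -- sim* newfiles*
argc=$#
last=$3

echo "argument count: $argc"
printf 'arg: %s\n' "$@"
```

This reports three arguments: the two matching files, followed by the literal string `newfiles*`, which stays unexpanded because no file matches it.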

In [31]:
%%bash
for file in sim-output.dat simulate.sh
do
   cp $file newfiles-$file
done
ls -lh newfiles*
rm newfiles*    # cleanup

-rw-r--r--. 1 frenchwr accre 6.5K Dec 27 15:00 newfiles-sim-output.dat
-rw-r--r--. 1 frenchwr accre  240 Dec 27 15:00 newfiles-simulate.sh


- It's often helpful to echo out the command that will be run before actually executing the loop.

In [32]:
%%bash
for file in sim*
do
   echo "Would be running: cp $file newfiles-$file"
done

Would be running: cp sim-output.dat newfiles-sim-output.dat
Would be running: cp simulate.sh newfiles-simulate.sh


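- You can also loop over the output of a command. Note that the shell splits the output on whitespace (word by word, not line by line), so this approach breaks on filenames that contain spaces; looping over the glob directly (```for file in sim*```) is safer.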

In [33]:
%%bash
for file in $(ls sim*)
do
   echo "---First line of $file:---"
   head -1 $file
done

---First line of sim-output.dat:---
      0   3.883e-01  -8.432e-01  -4.54901e-01  150.00  -7.240e-05
---First line of simulate.sh:---
#!/bin/bash
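
Two more robust loop forms are worth knowing. The sketch below uses made-up files in a temporary directory, one with a space in its name: a glob loop with quoted variables keeps each filename intact, and `while read` processes input one line at a time.

```shell
workdir=$(mktemp -d)
printf 'first line of a\n' > "$workdir/sim a.dat"    # note the space in the name
printf 'first line of b\n' > "$workdir/sim-b.dat"
printf 'one two\nthree four\n' > "$workdir/list.txt"

# Globbing directly, with quoted variables, keeps each filename in one piece.
count=0
for file in "$workdir"/sim*; do
    echo "---First line of ${file##*/}:---"
    head -n 1 "$file"
    count=$((count + 1))
done

# while read consumes its input one line at a time;
# IFS= and -r keep leading whitespace and backslashes untouched.
lines=0
while IFS= read -r line; do
    echo "line: $line"
    lines=$((lines + 1))
done < "$workdir/list.txt"

echo "files: $count, lines: $lines"
```

Both counters end at 2: two files, and two lines even though the list holds four words.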


In [34]:
%%bash
# let's use a for loop to get the memory info for a group of SLURM jobs
# first, let's look at the output we will be parsing
single_job=$(squeue --states=running | tail -1 | awk '{print $1}')
echo -e "Info about a single job:\n"
scontrol show job $single_job
echo -e "-------------------------------------------------------------------\n"
# in practice you might only care about your jobs, so you can run squeue -u <userid>
for jobid in $(squeue --states=running | tail -10 | awk '{print $1}')
do
   echo "Printing memory allocation for job $jobid:"
   scontrol show job $jobid | awk -F"," '/mem=/{print $2}' | awk -F"=" '{print $2}'
done

Info about a single job:

JobId=11690825 JobName=5her_aMD
   UserId=hem5(454520) GroupId=p_meiler(10023) MCS_label=N/A
   Priority=1 Nice=0 Account=csb_gpu QOS=csb_maxwell
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=4-15:34:34 TimeLimit=5-00:00:00 TimeMin=N/A
   SubmitTime=2016-12-22T23:24:57 EligibleTime=2016-12-22T23:24:57
   StartTime=2016-12-22T23:26:14 EndTime=2016-12-27T23:26:15 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=maxwell AllocNode:Sid=vmps09:23303
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=vmp1245
   BatchHost=vmp1245
   NumNodes=1 NumCPUs=2 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=60G,node=1,gres/gpu=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=60G MinTmpDiskNode=0
   Features=(null) Gres=gpu:2 Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/dors/
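
The two chained awk calls in the loop above rely on `mem=` being the second comma-separated field. As a sketch of a slightly more defensive version, a single awk call can scan every field for the `mem=` prefix. It parses a hard-coded sample line copied from the scontrol output shown here, so it runs without access to SLURM; whether other SLURM versions format the TRES line the same way is an assumption.

```shell
sample='   TRES=cpu=2,mem=60G,node=1,gres/gpu=2'

mem=$(echo "$sample" | awk -F',' '/TRES=/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^mem=/) {        # field starts with mem=
            sub(/^mem=/, "", $i)   # strip the prefix, leaving the value
            print $i
        }
}')
echo "memory allocation: $mem"
```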

# Finding Files and Directories

In [35]:
%%bash
find .    # print all files and directories in your current directory (recursively)

.
./sim-output.dat
./bin
./bin/README.md
./inc
./inc/energy_force.h
./inc/timer.h
./inc/cl_parse.h
./inc/initialization.h
./inc/props.h
./inc/integrator.h
./inc/params.h
./inc/atoms.h
./inc/print_traj.h
./Makefile
./nums.txt
./simulate.sh
./nums.csv
./obj
./obj/README.md
./images
./images/125atoms.png
./images/5000atoms.png
./images/README.md
./README.md
./.gitignore
./.git
./.git/description
./.git/logs
./.git/logs/refs
./.git/logs/refs/heads
./.git/logs/refs/heads/master
./.git/logs/refs/remotes
./.git/logs/refs/remotes/origin
./.git/logs/refs/remotes/origin/HEAD
./.git/logs/HEAD
./.git/hooks
./.git/hooks/pre-commit.sample
./.git/hooks/update.sample
./.git/hooks/applypatch-msg.sample
./.git/hooks/pre-push.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/refs
./.git/refs/heads
./.git/refs/heads/master
./.git/refs/tags
./.git/refs/remotes
./.git/refs/remo

In [36]:
%%bash
find . | grep inc   # preferred method shown below

./inc
./inc/energy_force.h
./inc/timer.h
./inc/cl_parse.h
./inc/initialization.h
./inc/props.h
./inc/integrator.h
./inc/params.h
./inc/atoms.h
./inc/print_traj.h


In [37]:
%%bash
find inc/   # better to write this instead of grepping as shown above

inc/
inc/energy_force.h
inc/timer.h
inc/cl_parse.h
inc/initialization.h
inc/props.h
inc/integrator.h
inc/params.h
inc/atoms.h
inc/print_traj.h


In [38]:
%%bash
find . -type d   # only returns directories

.
./bin
./inc
./obj
./images
./.git
./.git/logs
./.git/logs/refs
./.git/logs/refs/heads
./.git/logs/refs/remotes
./.git/logs/refs/remotes/origin
./.git/hooks
./.git/refs
./.git/refs/heads
./.git/refs/tags
./.git/refs/remotes
./.git/refs/remotes/origin
./.git/objects
./.git/objects/pack
./.git/objects/info
./.git/branches
./.git/info
./src


In [39]:
%%bash
find . -type f   # only returns regular files

./sim-output.dat
./bin/README.md
./inc/energy_force.h
./inc/timer.h
./inc/cl_parse.h
./inc/initialization.h
./inc/props.h
./inc/integrator.h
./inc/params.h
./inc/atoms.h
./inc/print_traj.h
./Makefile
./nums.txt
./simulate.sh
./nums.csv
./obj/README.md
./images/125atoms.png
./images/5000atoms.png
./images/README.md
./README.md
./.gitignore
./.git/description
./.git/logs/refs/heads/master
./.git/logs/refs/remotes/origin/HEAD
./.git/logs/HEAD
./.git/hooks/pre-commit.sample
./.git/hooks/update.sample
./.git/hooks/applypatch-msg.sample
./.git/hooks/pre-push.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/refs/heads/master
./.git/refs/remotes/origin/HEAD
./.git/packed-refs
./.git/objects/pack/pack-d79addadb8a306fc967a4883bedf6999cd7b9606.idx
./.git/objects/pack/pack-d79addadb8a306fc967a4883bedf6999cd7b9606.pack
./.git/index
./.git/config
./.git/HEAD
./.git/inf

In [40]:
%%bash
find . -maxdepth 1 -type f   # limit search to files within current directory (i.e. no searching in subdirectories)

./sim-output.dat
./Makefile
./nums.txt
./simulate.sh
./nums.csv
./README.md
./.gitignore


In [41]:
%%bash
find . -mindepth 2 -maxdepth 2 -type d   # limit search to directories at depth=2 (subdirectories of subdirectories)

./.git/logs
./.git/hooks
./.git/refs
./.git/objects
./.git/branches
./.git/info


In [42]:
%%bash
find . -name props.c

./src/props.c


In [43]:
%%bash
find . -newer ./src/props.c   # files and directories modified more recently than src/props.c

.
./sim-output.dat
./nums.txt
./nums.csv
./.git
./.git/index
./src
./src/timer.c


In [44]:
%%bash
ls -l src/
find . -name '*.c' -exec chown frenchwr:accre {} +    # change ownership on all .c files
# single quote important to prevent expansion of glob before find command executes
ls -l src/

total 84
-rw-r--r--. 1 frenchwr accre 2049 Dec 22 17:41 atoms.c
-rw-r--r--. 1 frenchwr accre 3544 Dec 22 17:41 cl_parse.c
-rw-r--r--. 1 frenchwr accre 4524 Dec 22 17:41 energy_force.c
-rw-r--r--. 1 frenchwr accre 3725 Dec 22 17:41 initialization.c
-rw-r--r--. 1 frenchwr accre 3454 Dec 22 17:41 integrator.c
-rw-r--r--. 1 frenchwr accre 4625 Dec 22 17:41 mddriver.c
-rw-r--r--. 1 frenchwr accre 1857 Dec 22 17:41 params.c
-rw-r--r--. 1 frenchwr accre  785 Dec 22 17:41 print_traj.c
-rw-r--r--. 1 frenchwr accre 2606 Dec 22 17:41 props.c
-rw-r--r--. 1 frenchwr accre  951 Dec 22 17:41 README.md
-rw-r--r--. 1 frenchwr accre 2089 Dec 22 17:41 timer.c
total 84
-rw-r--r--. 1 frenchwr accre 2049 Dec 22 17:41 atoms.c
-rw-r--r--. 1 frenchwr accre 3544 Dec 22 17:41 cl_parse.c
-rw-r--r--. 1 frenchwr accre 4524 Dec 22 17:41 energy_force.c
-rw-r--r--. 1 frenchwr accre 3725 Dec 22 17:41 initialization.c
-rw-r--r--. 1 frenchwr accre 3454 Dec 22 17:41 integrator.c
-rw-r--r--. 1 frenchwr accre 4625 Dec 22 17

In [45]:
%%bash
wc -l $(find . -name '*.h')  # embedding the output from a find to pass to another command is very common/powerful

  14 ./inc/energy_force.h
  14 ./inc/timer.h
  14 ./inc/cl_parse.h
  10 ./inc/initialization.h
  12 ./inc/props.h
  11 ./inc/integrator.h
  29 ./inc/params.h
  24 ./inc/atoms.h
   8 ./inc/print_traj.h
 136 total
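
Command substitution splits the output of find on whitespace, so it misbehaves when a path contains spaces. A sketch of an alternative is to let find run the command itself with `-exec ... {} +`; the directory tree below is made up for illustration.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/inc" "$workdir/src"
printf 'a\nb\n' > "$workdir/inc/my header.h"   # note the space in the name
printf 'c\n'    > "$workdir/inc/atoms.h"
printf 'd\n'    > "$workdir/src/atoms.c"

# {} + appends as many matched paths as fit onto one wc command line,
# passing each path as a single argument regardless of spaces.
find "$workdir" -type f -name '*.h' -exec wc -l {} +

# Count the header files and their combined number of lines.
nfiles=$(find "$workdir" -type f -name '*.h' | wc -l)
total=$(find "$workdir" -type f -name '*.h' -exec cat {} + | wc -l)
echo "files: $nfiles, lines: $total"
```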


# Brace Expansion

In [46]:
%%bash
echo {1..20}    # iterate from 1 to 20 with stepsize of 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20


In [47]:
%%bash
echo {1,20,40,60}     # rather than providing a range you can also just hard code the values

1 20 40 60


In [48]:
%%bash
echo {1..20..2}    # use stepsize of 2

1 3 5 7 9 11 13 15 17 19


In [49]:
%%bash
echo a{11..22}b    # embed expansion between other strings

a11b a12b a13b a14b a15b a16b a17b a18b a19b a20b a21b a22b


In [50]:
%%bash
mkdir -p tmp/{src,inc,bin,lib} # especially useful if specifying long file path
ls -lh tmp
rm -r tmp  # cleanup

total 0
drwxr-xr-x. 2 frenchwr accre 512 Dec 27 15:00 bin
drwxr-xr-x. 2 frenchwr accre 512 Dec 27 15:00 inc
drwxr-xr-x. 2 frenchwr accre 512 Dec 27 15:00 lib
drwxr-xr-x. 2 frenchwr accre 512 Dec 27 15:00 src


In [51]:
%%bash
# brace expansions with numeric ranges often used with for loops
for i in {1..10}
do
   mkdir run$i
   cp simulate.sh run$i
done
rm -r run{1..10}  # cleanup
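
One limitation to keep in mind: Bash performs brace expansion before variable expansion, so a range cannot take a variable as its bound. A minimal sketch of the problem and one workaround (the variable names are assumptions for the example):

```shell
n=3

# The braces are processed before $n is substituted, so no range is formed.
literal=$(echo {1..$n})
echo "$literal"    # {1..3}

# A C-style loop (Bash syntax) accepts variables as bounds.
names=""
for ((i = 1; i <= n; i++)); do
    names="${names:+$names }run$i"   # append with a separating space
done
echo "$names"      # run1 run2 run3
```

Where it is installed, `seq 1 "$n"` inside a command substitution is another common way to generate the range.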

In [52]:
%%bash
echo {a..z}    # you can also provide alphabetic ranges

a b c d e f g h i j k l m n o p q r s t u v w x y z


In [53]:
%%bash
echo {A..Z}   # all caps

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


In [54]:
%%bash
echo {A..z}   # upper and lower case ranges, with a few other ascii characters between

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [  ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z


In [55]:
%%bash
echo {a..z..2}    # can also use stepsizes with alphabetic ranges

a c e g i k m o q s u w y


In [56]:
%%bash 
echo {Z..A..5}    # ranges can also run backwards; the step is still written as a positive number

Z U P K F A


In [57]:
%%bash
cp nums.txt{,.bak}  # equivalent to running "cp nums.txt nums.txt.bak"; especially useful with long file paths
ls nums.txt*
rm nums.txt.bak # cleanup

nums.txt
nums.txt.bak
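
Brace expressions can also be placed side by side, in which case Bash generates every combination. The short sketch below shows this, along with the habit of prefixing a command with `echo` to preview an expansion before running it (the `run-` names are made up for illustration).

```shell
# Adjacent brace lists multiply out, left to right.
combos=$(echo run-{lo,hi}-{1..2})
echo "$combos"     # run-lo-1 run-lo-2 run-hi-1 run-hi-2

# Prefixing a command with echo shows the expansion without executing it.
preview=$(echo cp nums.txt{,.bak})
echo "$preview"    # cp nums.txt nums.txt.bak
```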
