ISRC Python Workshop: Introduction to Unix Bash commands

___Introduction to Unix Bash commands___

While I use Jupyter notebooks for illustration purposes, it is more common to use a [terminal](https://en.wikipedia.org/wiki/Terminal_(macOS)) directly. On macOS, you can find it in `Others -> Terminal` or through Spotlight search.

If you are also interested in using Bash in a notebook, please check out [takluyver/bash_kernel](https://github.com/takluyver/bash_kernel).

<hr>

@author: Zhiya Zuo

@email: zhiya-zuo@uiowa.edu

<hr>

#### Introduction

Bash commands are no different from many other languages such as Java or Python: we can run what we code. For example, we can print out the current working directory.

In [3]:
pwd

/Users/zhiyzuo/OneDrive - University of Iowa/ISRC Python Workshop


We can also print out what is in the current working directory.

In [4]:
ls

0-Installation-Environment-Setup.ipynb
1-Variables-Data_Structures-Control_Logic.ipynb
2-Functions-External_Libraries-File_IO.ipynb
3-Network-Analysis-with-NetworkX.ipynb
4-Visualization-with-Matplotlib.ipynb
5-Getting-Data-Using-APIs.ipynb
Bash-Tutorial.ipynb
another_tmp_pd.csv
archived-files
data
renamed_tmp1.csv
sample-data
tmp_pd.csv
twitter_keys.csv
weather_keys.csv


Just as with graphical user interfaces (GUIs), we can speak the "bash language" to interact with our computers. In fact, shell commands are more powerful. For example, you cannot use the [HPC](https://hpc.uiowa.edu/) systems until you know something about shell programming. Note that Bash is only one of many shell programs, but probably the most popular one.

---

#### Working directories

Let's get started with the concept of the ___working directory___. As its name suggests, the working directory is where you work, that is, the folder/directory you are in right now. As the previous example shows, there is a ___program___ called `pwd` (print working directory) that tells us where we are.

In [5]:
pwd

/Users/zhiyzuo/OneDrive - University of Iowa/ISRC Python Workshop


And, as before, we can list what we have in our current working directory with `ls`.

In [6]:
ls

0-Installation-Environment-Setup.ipynb
1-Variables-Data_Structures-Control_Logic.ipynb
2-Functions-External_Libraries-File_IO.ipynb
3-Network-Analysis-with-NetworkX.ipynb
4-Visualization-with-Matplotlib.ipynb
5-Getting-Data-Using-APIs.ipynb
Bash-Tutorial.ipynb
another_tmp_pd.csv
archived-files
data
renamed_tmp1.csv
sample-data
tmp_pd.csv
twitter_keys.csv
weather_keys.csv


What if I do not want to stay here? Suppose I want to go to the ___sample-data___ folder. I can ___change directory___ with `cd`:

In [7]:
cd sample-data

Now we can verify that we indeed changed our directory by `pwd` and `ls`

In [8]:
pwd

/Users/zhiyzuo/OneDrive - University of Iowa/ISRC Python Workshop/sample-data


In [9]:
ls

karate.gml        sample_tweets.csv terrorists


However, there's no need to `cd` every time if we just want to check or read some files outside our current working directory. We can use either an ___absolute___ or a ___relative path___. What we get from `pwd` is an absolute path: it shows the full path starting from the root. Let's try an example. Say we want to print the contents of a CSV file called ___tmp1.csv___, which is one level up from our current path.

In [11]:
cat /Users/zhiyzuo/OneDrive\ -\ University\ of\ Iowa/ISRC\ Python\ Workshop/tmp1.csv

0
1
2
3
4
5
6
7
8
9


Note that we escape each space in the path with a backslash (`\ `). Having spaces in file or folder names is actually NOT a good habit, precisely because they have to be escaped like this.
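As a rough sketch of an alternative to backslash-escaping, we can wrap the whole path in quotes instead. The folder name `My Folder` and the temporary directory below are made up for illustration; they are not part of the workshop files.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
mkdir "$sandbox/My Folder"                      # hypothetical folder name with a space
printf '0\n1\n2\n' > "$sandbox/My Folder/tmp1.csv"

# 1. Backslash-escape the space, as in the cell above.
escaped=$(cat "$sandbox"/My\ Folder/tmp1.csv)

# 2. Quote the whole path; the space needs no escaping inside quotes.
quoted=$(cat "$sandbox/My Folder/tmp1.csv")

echo "$quoted"
rm -r "$sandbox"                                # clean up the throwaway directory
```

Both forms read the same file, so pick whichever is easier to type. Without either of them, the shell would split the path at the space and treat the pieces as two separate arguments.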

Instead of typing everything, we can use a ___relative path___. The name is pretty self-explanatory: we refer to a place relative to our current working directory. Two important notations here:
- `.` (a dot) means the current directory
- `..` (two dots without spaces) means the parent directory, one level up.

For example, we can pass a path to the `ls` command to list the files in that path.

In [12]:
ls .

karate.gml        sample_tweets.csv terrorists


In [13]:
ls ..

0-Installation-Environment-Setup.ipynb
1-Variables-Data_Structures-Control_Logic.ipynb
2-Functions-External_Libraries-File_IO.ipynb
3-Network-Analysis-with-NetworkX.ipynb
4-Visualization-with-Matplotlib.ipynb
5-Getting-Data-Using-APIs.ipynb
Bash-Tutorial.ipynb
another_tmp_pd.csv
archived-files
data
sample-data
tmp1.csv
tmp_pd.csv
twitter_keys.csv
weather_keys.csv


Therefore, with `..` (two dots), we can reach one level up without switching the working directory:

In [14]:
cat ../tmp1.csv

0
1
2
3
4
5
6
7
8
9
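
To try `.` and `..` without touching real files, here is a minimal sketch in a throwaway directory. The `workshop/sample-data` layout is a made-up stand-in for the folders in this notebook.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/workshop/sample-data"        # hypothetical layout mirroring the notebook
printf '0\n1\n' > "$sandbox/workshop/tmp1.csv"

cd "$sandbox/workshop/sample-data"
here=$(cd . && pwd)             # `.` resolves to the directory we are already in
parent=$(cd .. && pwd)          # `..` resolves to the directory one level up
contents=$(cat ../tmp1.csv)     # read a file one level up without leaving sample-data

echo "$here"
echo "$parent"
echo "$contents"

cd - > /dev/null                # go back to where we started
rm -r "$sandbox"
```

The `cd` calls inside `$( ... )` run in a subshell, so they do not change the working directory of the surrounding session.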


Finally, it is noteworthy that `~` (tilde) means the ___home directory___ in Unix systems.

In [15]:
ls ~

Applications                  alldump
Desktop                       echo
Documents                     mycert.pem
Downloads                     mykey.key
Dropbox (Personal)            nltk_data
Dropbox (Zhiya-UIowa)         pg_upgrade_internal.log
Google Drive                  pg_upgrade_server.log
Library                       pg_upgrade_utility.log
Movies                        pub.bib
Music                         scikit_learn_data
OneDrive - University of Iowa seaborn-data
Pictures                      ycm_build
Public


---

#### Options/Input arguments 

Shell commands can take input arguments or options. A convention is to use `-` (dash) to specify options. For example, we can ask `ls` to show detailed information about each file/folder:

In [16]:
ls -l

total 24
-rwxr-xr-x  1 zhiyzuo  staff  5077 Jan 30 13:57 karate.gml
-rw-r--r--  1 zhiyzuo  staff  1950 Feb  7 14:57 sample_tweets.csv
drwxr-xr-x@ 7 zhiyzuo  staff   224 Jan 30 13:57 terrorists


We can combine different options by appending them one after another behind a single dash. The following example shows sizes in a human-readable format (`-h` option) along with a detailed view (`-l`):

In [17]:
ls -lh

total 24
-rwxr-xr-x  1 zhiyzuo  staff   5.0K Jan 30 13:57 karate.gml
-rw-r--r--  1 zhiyzuo  staff   1.9K Feb  7 14:57 sample_tweets.csv
drwxr-xr-x@ 7 zhiyzuo  staff   224B Jan 30 13:57 terrorists
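
A quick way to convince ourselves that bundled options mean the same thing as separate ones is to compare the listings. This is only a sketch in a temporary directory; the file name `a.txt` is made up.

```shell
sandbox=$(mktemp -d)
printf 'hello\n' > "$sandbox/a.txt"

combined=$(ls -lh "$sandbox")      # options bundled behind a single dash
separate=$(ls -l -h "$sandbox")    # the same options given one at a time
reordered=$(ls -hl "$sandbox")     # the order of these two options does not matter

if [ "$combined" = "$separate" ] && [ "$combined" = "$reordered" ]; then
    echo "same listing"
fi
rm -r "$sandbox"
```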


Commands also take arguments for various purposes. Again using `ls` as an example, it can take a ___path___ as an argument. Without a path, it lists the current working directory by default, as shown above. Given a path, it lists the items in that path:

In [18]:
ls ../

0-Installation-Environment-Setup.ipynb
1-Variables-Data_Structures-Control_Logic.ipynb
2-Functions-External_Libraries-File_IO.ipynb
3-Network-Analysis-with-NetworkX.ipynb
4-Visualization-with-Matplotlib.ipynb
5-Getting-Data-Using-APIs.ipynb
Bash-Tutorial.ipynb
another_tmp_pd.csv
archived-files
data
sample-data
tmp1.csv
tmp_pd.csv
twitter_keys.csv
weather_keys.csv


In [19]:
ls ../archived-files

1_basics.ipynb         3_topic_modeling.ipynb
2_web_scraping.ipynb   notebooks_printouts


Note that all these options can hardly be memorized. Often we will refer to the manual (or documentation). To do this, we can use `man command_name`. For example:

In [22]:
man ls | head -20


LS(1)                     BSD General Commands Manual                    LS(1)

NAME
     ls -- list directory contents

SYNOPSIS
     ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

DESCRIPTION
     For each operand that names a file of a type other than directory, ls
     displays its name as well as any requested, associated information.  For
     each operand that names a file of type directory, ls displays the names
     of files contained within that directory, as well as any requested, asso-
     ciated information.

     If no operands are given, the contents of the current directory are dis-
     played.  If more than one operand is given, non-directory operands are
     displayed first; directory and non-directory operands are sorted sepa-
     rately and in lexicographical order.


Here, `man` is a ___command___ that takes one input argument (the name of a command) and outputs the corresponding manual. Therefore, we can also pull up the manual for `man` itself 🤓

In [23]:
man man | head -20

man(1)                                                                  man(1)



NAME
       man - format and display the on-line manual pages

SYNOPSIS
       man  [-acdfFhkKtwW]  [--path]  [-m system] [-p string] [-C config_file]
       [-M pathlist] [-P pager] [-B browser] [-H htmlpager] [-S  section_list]
       [section] name ...


DESCRIPTION
       man formats and displays the on-line manual pages.  If you specify sec-
       tion, man only looks in that section of the manual.  name  is  normally
       the  name of the manual page, which is typically the name of a command,
       function, or file.  However, if name contains  a  slash  (/)  then  man
       interprets  it  as a file specification, so that you can do man ./foo.5
       or even man /cd/foo/bar.1.gz.


Note that I use `| head -20` to limit the output to 20 lines/rows. `|` is the ___pipe character___, which passes the output of one command to the next command as its input, and `head` is a command that shows the ___head___ of some output, where `-20` limits it to the first 20 lines/rows. Detailed coverage is beyond the scope of this workshop though.
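
To see the pipe at work on something smaller than a manual page, here is a minimal sketch. `seq`, `wc`, and `tr` are not covered elsewhere in this workshop and are only used to produce and count some lines. The `-n 3` form is, as far as I know, the more portable spelling of `-3`.

```shell
# `seq 1 100` prints the numbers 1 to 100, one per line.
first_three=$(seq 1 100 | head -n 3)      # keep only the first 3 lines
echo "$first_three"

# Pipes can be chained: count how many lines survive `head`.
# `tr -d ' '` strips the padding that some versions of `wc` add.
line_count=$(seq 1 100 | head -n 20 | wc -l | tr -d ' ')
echo "$line_count"
```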

---

#### Some practical commands

Now that we understand some basics of Bash, it is a good time to learn more commonly used commands. Before we do anything, I will switch my working directory back one level.

In [24]:
cd ..

In [25]:
pwd

/Users/zhiyzuo/OneDrive - University of Iowa/ISRC Python Workshop


##### Move `mv`

The move command `mv` is an interesting one. You can use it to do two things:
1. Move files/folders
2. Change file/folder names. Essentially, `mv` renames an item by "moving" it to a new name

Let's try moving the file ___tmp1.csv___ one level up and then moving it back. Note that `mv` needs two arguments:
1. what is to be moved?
2. where to?

In [26]:
mv tmp1.csv ../

Check if ___tmp1.csv___ is indeed in ___../___

In [27]:
ls ../

2017summer                   Ranking-Hiring
2018JCDL-poster              Supply-Chain-ABM
2018spring                   Time-Series-Precition
Collaboration-in-Multi       Topics-over-Time-Replication
DATA-ARCHIVES                Weibo
Data                         dmig
ISRC Python Workshop         iSchool
Icon?                        java-tm
Intern-App-2018Summer        java-topic-model
JTM                          paper-review-invitation
Learn-Notebooks              read-java-tm
NRC                          tmp1.csv
Policy-School


Move it back. Recall that `.` (dot) means current working directory

In [28]:
mv ../tmp1.csv .

In [29]:
ls

0-Installation-Environment-Setup.ipynb
1-Variables-Data_Structures-Control_Logic.ipynb
2-Functions-External_Libraries-File_IO.ipynb
3-Network-Analysis-with-NetworkX.ipynb
4-Visualization-with-Matplotlib.ipynb
5-Getting-Data-Using-APIs.ipynb
Bash-Tutorial.ipynb
another_tmp_pd.csv
archived-files
data
sample-data
tmp1.csv
tmp_pd.csv
twitter_keys.csv
weather_keys.csv


We can rename it by doing

In [30]:
mv tmp1.csv renamed_tmp1.csv

In [31]:
ls

0-Installation-Environment-Setup.ipynb
1-Variables-Data_Structures-Control_Logic.ipynb
2-Functions-External_Libraries-File_IO.ipynb
3-Network-Analysis-with-NetworkX.ipynb
4-Visualization-with-Matplotlib.ipynb
5-Getting-Data-Using-APIs.ipynb
Bash-Tutorial.ipynb
another_tmp_pd.csv
archived-files
data
renamed_tmp1.csv
sample-data
tmp_pd.csv
twitter_keys.csv
weather_keys.csv


And we can see there's a ___renamed_tmp1.csv___ but not ___tmp1.csv___ now.
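
If you would like to practice moving and renaming without risking real files, the same three steps can be replayed in a throwaway directory. This is just a sketch; the `project` folder is a made-up stand-in for the workshop folder.

```shell
sandbox=$(mktemp -d)
mkdir "$sandbox/project"                 # hypothetical stand-in for the workshop folder
printf '0\n1\n' > "$sandbox/project/tmp1.csv"
cd "$sandbox/project"

mv tmp1.csv ../                  # move: the file now lives one level up
mv ../tmp1.csv .                 # move it back into the current directory
mv tmp1.csv renamed_tmp1.csv     # rename: same folder, new name

listing=$(ls)
echo "$listing"                  # lists whatever is left in the project folder

cd - > /dev/null                 # go back to where we started
rm -r "$sandbox"
```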

##### Copy `cp`

The copy command `cp` is very similar to `mv`, except that it copies a file/folder instead of moving it (copying a folder additionally requires the `-r` option). For example, we can duplicate ___tmp_pd.csv___ by doing:

In [32]:
cp tmp_pd.csv another_tmp_pd.csv

We can verify by printing both files

In [33]:
cat tmp_pd.csv

0,1,2
1.0,2.0,3.0
4.0,5.0,6.0


In [34]:
cat another_tmp_pd.csv

0,1,2
1.0,2.0,3.0
4.0,5.0,6.0
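
Printing both files works for tiny examples, but a comparison command scales better. The sketch below uses `cmp`, which is not covered in this workshop, and also copies a folder with the recursive option. All file and folder names live in a temporary directory and are made up for illustration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
printf '0,1,2\n1.0,2.0,3.0\n' > tmp_pd.csv
mkdir data
cp tmp_pd.csv data/                      # copy a file into a folder, keeping its name

cp tmp_pd.csv another_tmp_pd.csv         # copy a single file under a new name
cp -r data data_backup                   # folders need -r (recursive)

# cmp -s prints nothing and succeeds only if both files are byte-for-byte identical.
if cmp -s tmp_pd.csv another_tmp_pd.csv; then same="yes"; else same="no"; fi
echo "identical: $same"

backup_listing=$(ls data_backup)
echo "$backup_listing"

cd - > /dev/null
rm -r "$sandbox"
```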


##### Reading files

There are many commands for doing this. We just used `cat`, which prints everything directly to the terminal (standard output). It can also be used for concatenating files.

In [35]:
cat renamed_tmp1.csv

0
1
2
3
4
5
6
7
8
9
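
The concatenating use of `cat` can be sketched as follows. The `>` redirection, which sends output into a file instead of the terminal, is not covered in this workshop, and the file names are made up.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
printf '0\n1\n' > part1.csv
printf '2\n3\n' > part2.csv

# Given several files, cat prints them one after another;
# > redirects that combined output into a new file.
cat part1.csv part2.csv > combined.csv

combined=$(cat combined.csv)
echo "$combined"

cd - > /dev/null
rm -r "$sandbox"
```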


When we are reading large files, we probably don't want this. We can use `less` instead, which shows a file one screen at a time. Let's try it in a terminal because `less` will not work properly in a Jupyter notebook.
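
For peeking at a large file without an interactive viewer, a few non-interactive commands also do the job and work fine inside a notebook. This is a sketch, not a replacement for `less`: `tail` and `wc` are additions not covered elsewhere in this workshop, and `big.csv` is a made-up stand-in for a large file.

```shell
sandbox=$(mktemp -d)
seq 1 1000 > "$sandbox/big.csv"                    # stand-in for a large file

top=$(head -n 3 "$sandbox/big.csv")                # first 3 lines
bottom=$(tail -n 2 "$sandbox/big.csv")             # last 2 lines
lines=$(wc -l < "$sandbox/big.csv" | tr -d ' ')    # number of lines, padding stripped

echo "$top"
echo "$bottom"
echo "$lines"
rm -r "$sandbox"
```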

---

#### Conclusions

Due to time constraints, we can only cover these simple examples. There is really a lot more to read about: `grep`, `sed`, `ssh`, `scp`, `ps`, `rm`, etc. Bash is really powerful and used extensively for various purposes. Below I list two tutorials for Bash that I find really helpful.

Further readings:

- http://www.bash.academy/
- https://ryanstutorials.net/bash-scripting-tutorial/