# Lab 6 : Working on the command line  

## Learning Objectives


* Getting Started Using UNIX Commands
* Writing a Python Program and running it on the command line
* Python modules for manipulating files and directories

## 6.1 Getting Started Using UNIX Commands in the Terminal

The "Terminal" or "Shell" provides the gateway to running UNIX commands. 

On **RStudio Cloud** the Terminal is opened from the Jupyter Project Home Page. Just select New > Terminal. A terminal will appear with the prompt

In [None]:
/cloud/project$

On the **Unity cluster**, an option to open a Terminal appears when all windows are closed, or you can go to File > New Launcher and open one in a new tab. The prompt will look like this:

In [None]:
jlb_umass_edu@node8:~$ 

On **Unity** you will need to change into your working directory by typing the command for changing directories (cd) followed by home and your directory name (mine is jlb_umass_edu). If you don't know yours, type cd home and then ls to list the directories. Then type cd followed by your directory name.

In [None]:
cd home/jlb_umass_edu

We can make a new directory using the 'mkdir' command

In [None]:
mkdir Lab6_files

Now type 'ls' to list the directories and files 

In [None]:
ls

Let's move into the Lab6_files directory using 'cd' (change directory)

In [None]:
cd Lab6_files

To see the full directory path use 'pwd' (print working directory)

In [None]:
pwd

You can move up one directory by typing 'cd ..'

In [None]:
cd ..

Make a file using your text editor called readme.txt and put in any text. Type ls to see the file

In [None]:
ls

You can use the 'cat' or 'more' command to see the contents of the file. The 'more' command will display one page at a time.

In [None]:
cat readme.txt

To make a copy of the file use 'cp'

In [None]:
cp readme.txt readme_first.txt
ls
more readme_first.txt

To move files to a new directory use 'mv'

In [None]:
mv readme.txt Lab6_files
cd Lab6_files # Now move to the directory
ls # list the contents

To delete (remove) a file use 'rm'

In [None]:
rm readme.txt
ls 

The list of Unix programs available on RStudio Cloud (just a small subset) and on Unity can be found by typing

In [None]:
ls /bin

As bioinformaticists we deal with large files, which are usually archived and/or compressed to save disk space and lower transfer times. 'tar' and 'gzip' are two of the most common utilities for archiving and compressing files. tar is used for archiving and gzip is used for compression; the two are most often used together. Last week I decompressed the GenBank files before putting them in Moodle and GitHub. You can do this yourself on the command line. Go to https://www.ncbi.nlm.nih.gov/assembly/GCF_000888735.1, click on Download assembly, then download the Genome FASTA (.fna) file to your computer. The file should be genome_assemblies_genome_fasta.tar


In [None]:
tar -xvf genome_assemblies_genome_fasta.tar
ls

This will create a directory ncbi-genomes-2021-10-13 with the file GCF_000888735.1_ViralProj60053_genomic.fna.gz

**There are many nice online tutorials available for Unix. Check out <a href="http://www.linuxcommand.org/lc3_learning_the_shell.php" target="_blank">Learning the shell</a> and <a href="https://swcarpentry.github.io/shell-novice/" target="_blank">Software Carpentry "The Unix Shell"</a>.**

In [None]:
cd ncbi-genomes-2021-10-13
gzip -d GCF_000888735.1_ViralProj60053_genomic.fna.gz

This will create the decompressed file GCF_000888735.1_ViralProj60053_genomic.fna that you can load into your programs.
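
The same two steps can also be done from Python with the standard-library `tarfile` and `gzip` modules. The following is only a minimal sketch and not part of the original lab: the function name `extract_and_decompress` is a hypothetical helper, and it assumes the archive comes from a source you trust.

```python
import gzip
import os
import shutil
import tarfile


def extract_and_decompress(tar_path, dest_dir="."):
    """Unpack a .tar archive, then decompress every .gz file inside it.

    Hypothetical helper: mirrors 'tar -xvf' followed by 'gzip -d'.
    Returns the list of decompressed file paths.
    """
    # Equivalent of: tar -xvf <tar_path>
    with tarfile.open(tar_path, "r") as archive:
        member_names = archive.getnames()
        archive.extractall(dest_dir)

    decompressed = []
    for name in member_names:
        path = os.path.join(dest_dir, name)
        if os.path.isfile(path) and path.endswith(".gz"):
            out_path = path[:-3]  # strip the ".gz" suffix
            # Equivalent of: gzip -d <path>
            with gzip.open(path, "rb") as compressed, open(out_path, "wb") as plain:
                shutil.copyfileobj(compressed, plain)
            os.remove(path)  # gzip -d also removes the .gz file
            decompressed.append(out_path)
    return decompressed
```

Calling `extract_and_decompress("genome_assemblies_genome_fasta.tar")` from the directory holding the archive should leave the same .fna file as the two shell commands above.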

## 6.2 Writing a Python Program and running it on the command line in the Terminal

Open your text editor. Save the file as "hello.py" in the Lab6_files directory you created above. Type in your text editor

In [None]:
print ('Hello World!  I am writing computer programs.')

Then go to the "Terminal". Check to make sure you are in the same directory (Lab6_files) where "hello.py" was saved by typing

In [None]:
ls

"hello.py" should be listed. Now to run the program type

In [None]:
python3 hello.py

The %system magic command lets us execute Unix shell commands in the Jupyter notebook. Many basic Unix commands can be run as is in a Jupyter notebook. However, some require a %system or %sx command to run. This is true of running Python in the notebook.

In [None]:
%system python3 hello.py
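
Outside a notebook, a Python script can run the same kind of command with the standard-library `subprocess` module. This is a minimal sketch, not part of the original lab: the helper name `run_command` is hypothetical, and `sys.executable` stands in for `python3` so that the example uses whichever interpreter is running it.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return its standard output as a list of lines.

    Hypothetical helper, loosely similar to what %system gives back in Jupyter.
    """
    # check=True raises an error if the command fails
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Run a one-line Python program
lines = run_command([sys.executable, "-c", "print('Hello World!')"])
print(lines)
```

To run your own file instead, pass `["python3", "hello.py"]` from the directory where hello.py was saved.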

## 6.3 The Python os module

In Session 1 we learned to manipulate files and directories on your native operating system using the terminal and Unix commands (Ubuntu and OS X) or the Command Prompt and MS-DOS commands (Windows). The functions that the Python os module provides allow you to interface with the underlying operating system that Python is running on, be that Windows, Mac or Linux.
The Python os module also provides a range of useful methods to manipulate files and directories. To use this module you need to import it first and then call any related functions.

In [None]:
#!/usr/bin/env python

# Example 6.1
# This is not meant to be run as a program.
# Either use the ipython interpreter or select and run commands in Spyder as in Session 9

import os

# To get the path of your current working directory
os.getcwd()
print(os.getcwd())
#  Windows or Anaconda command prompt = cd
#  Apple OSX or Linux = pwd

# Create a directory "test"
os.mkdir("test")
#  Windows or Anaconda command prompt = mkdir
#  Apple OSX or Linux = mkdir

# Changing into that directory
os.chdir("test")
#  Windows or Anaconda command prompt = chdir
#  Apple OSX or Linux = cd

print(os.getcwd())

# create file
outfile = open("example.txt", "w")
outfile.write('example text in my example file')
outfile.close()

# get a list of directory contents (files and directories)
print(os.listdir('.'))
#  Windows or Anaconda command prompt = dir
#  Apple OSX or Linux = ls

# delete (remove) the file
os.remove('example.txt')
#  Windows or Anaconda command prompt = del
#  Apple OSX or Linux = rm

# To move up one directory
os.chdir("..")
#  Windows or Anaconda command prompt = chdir ..
#  Apple OSX or Linux = cd ..

# Delete/Remove "test" directory. Note the directory must be empty
os.rmdir("test")
#  Windows or Anaconda command prompt = rmdir
#  Apple OSX or Linux = rmdir
# To remove a directory and all of its contents use shutil.rmtree() - remember to import shutil

print(os.getcwd())

os.system("mkdir TEST")

I try to keep this class operating system INDEPENDENT, but if you want to directly interact with the terminal in OSX or Linux or the command prompt in Windows, use the os.system() command. For example, os.system("mkdir TEST") makes a new directory.
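
For the specific case of making a directory, the same result is available without going through the shell. A minimal sketch, not from the original lab: `make_directory` is a hypothetical helper name, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile


def make_directory(path):
    """Create a directory with os.makedirs, which works the same way on
    Windows, Mac and Linux.

    exist_ok=True means no error is raised if the directory already exists.
    Returns True if the directory exists afterwards.
    """
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


# "with" removes the temporary directory when the block ends
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "TEST")
    print(make_directory(target))  # first call creates the directory
    print(make_directory(target))  # second call is harmless
```

By contrast, os.mkdir("TEST") raises an error when TEST already exists, which is easy to hit when re-running a notebook cell.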

As we have seen, a path points to a file system location by following the directory tree hierarchy. It is expressed as a string of characters in which the path components, separated by a delimiting character, each represent a directory. The delimiting character is most commonly the slash ("/") in Unix or OS X and the backslash character ("\") in Windows.
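
The `os.path` functions build and take apart paths using the delimiter of whatever system Python is running on, so the same code works on Windows, Mac and Linux. A short sketch, using file names of the kind made by Example 6.2 purely as illustration (the files do not need to exist):

```python
import os

# os.sep is the delimiter for the system Python is running on
print(repr(os.sep))

# os.path.join inserts the right delimiter between the components
path = os.path.join("main_directory", "sub_directory1", "sub_directory1.file3.gbk")
print(path)

# Split a path back into its directory part and its file name
directory, filename = os.path.split(path)
print(directory)
print(filename)

# Split the file name into its base name and its extension
base, extension = os.path.splitext(filename)
print(extension)
```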

In [None]:
#!/usr/bin/env python

# Example 6.2
#
# A program for making example directories and files
# The output of this program will be a set of directories and files

# Usage: python make_example_directories.py


import os

print(os.getcwd())

# make the main directory
os.mkdir("main_directory")

# move into the main directory
os.chdir("main_directory")

# make the sub directories
os.mkdir("sub_directory1")
os.mkdir("sub_directory2")
os.mkdir("sub_directory3")

# get a list of the subdirectories
list_sub_dir = os.listdir('.')

# make a set of files with different extensions in each sub directory
for sub_dir in list_sub_dir :
    os.chdir(sub_dir)
    outfilename1 = sub_dir + ".file1.txt"
    outfile1 = open(outfilename1, 'w')
    outfile1.write ('text from %s\n' % (outfilename1))
    outfilename2 = sub_dir + ".file2.faa"
    outfile2 = open(outfilename2, 'w')
    outfile2.write ('protein from %s\n' % (outfilename2))
    outfilename3 = sub_dir + ".file3.gbk"
    outfile3 = open(outfilename3, 'w')
    outfile3.write ('Genbank Record from %s\n' % (outfilename3))
    # close the files before moving on, otherwise only the last set gets closed
    outfile1.close()
    outfile2.close()
    outfile3.close()
    print(os.listdir())
    os.chdir("..")

# print path of working directory and sub directories
print(os.getcwd())
print(os.listdir())

# At the moment main_directory is the current working directory
# Move back to your working directory
# If you are running this in Jupyter notebooks and do not go back to your original working directory,
# then main_directory would be your starting working directory for later cells.

os.chdir("..")
print(os.getcwd())


## 6.4 The Python shutil module

The shutil module offers operations for working with files and collections of files. In particular, functions are provided which support file copying and removal. For operations on individual files, see also the os module. shutil overlaps in function with some of the os module, but I use it for moving and copying files.
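
As a short sketch of the copying and removal functions (not part of the original lab), the block below uses `shutil.copy`, which behaves like 'cp', and `shutil.rmtree`, which removes a directory together with its contents. It works inside a temporary directory made with `tempfile.mkdtemp()` so that none of your own files are touched.

```python
import os
import shutil
import tempfile

# Work inside a temporary directory so the example cleans up after itself
work_dir = tempfile.mkdtemp()

# Make a small file to copy
original = os.path.join(work_dir, "readme.txt")
outfile = open(original, "w")
outfile.write("example text\n")
outfile.close()

# shutil.copy is like 'cp': the original file stays in place
duplicate = shutil.copy(original, os.path.join(work_dir, "readme_first.txt"))
print(sorted(os.listdir(work_dir)))

# shutil.rmtree removes a directory and everything in it (there is no undo)
shutil.rmtree(work_dir)
print(os.path.exists(work_dir))
```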


In [None]:
#!/usr/bin/env python

# Example 6.3
#
# A program for moving files

# Usage: python move_file.py

import os
import shutil

outfile = open("example.txt", "w")
outfile.write('example text in my example file')
outfile.close()

os.mkdir('test')

shutil.move('example.txt','test/example.txt' )
# remember the slashes go in the other direction on Windows

## 6.5 The Python glob module

Another useful tool in getting files in a directory is glob.  The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. No tilde expansion is done, but \*, ?, and character ranges expressed with [ ] will be correctly matched.
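
To make the three kinds of pattern concrete, here is a small sketch that is not part of the original lab. The helper `match` and the file names are hypothetical, and the files are created empty in a temporary directory just to have something to match.

```python
import glob
import os
import tempfile


def match(directory, pattern):
    """Return the sorted names of files in directory that match a glob pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(path) for path in paths)


# Make a few empty files in a temporary directory to match against
with tempfile.TemporaryDirectory() as tmp:
    for name in ["file1.txt", "file2.txt", "file10.txt", "file1.gbk"]:
        open(os.path.join(tmp, name), "w").close()

    star = match(tmp, "*.txt")              # * matches any run of characters
    question = match(tmp, "file?.txt")      # ? matches exactly one character
    char_range = match(tmp, "file[1-2].*")  # [ ] matches one character from a range

print(star)
print(question)
print(char_range)
```

Note that "file?.txt" does not pick up file10.txt, because ? stands for a single character only.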

In [None]:
#!/usr/bin/env python

# Example 6.4
# glob

# Usage: python myglob.py

import glob

txt_files = glob.glob('*.txt')

print (txt_files)

Note that the output of the above program will depend on which .txt files you have in your working directory.

Below is a program that traverses the set of directories and files made by Example 6.2. Use it for testing your knowledge of the above commands and your code in the examples and exercises.

In [None]:
#!/usr/bin/env python

# Example 6.5

# This example traverses the directory structure getting the contents of all GenBank record files (.gbk)
# This program assumes the directory structure from make_example_directories.py  (Example 6.2)
# It should be run from the same directory that the main directory is in

# Usage: python traverse_directories.py


import os
import glob

outfile1 = open('all_gbk_files.txt', 'w')

# move into the main directory
os.chdir("main_directory")

# get a list of the subdirectories
list_sub_dir = os.listdir('.')

# go into each sub directory and get the contents of the gbk files
for sub_dir in list_sub_dir :
    os.chdir(sub_dir)
    gbk_files = glob.glob('*.gbk')
    for file in gbk_files :
        ind_gbk = open(file, 'r')
        file_contents = ind_gbk.read()
        ind_gbk.close()
        outfile1.write(file_contents)
        # Also you can print to screen
        print(file_contents)
    os.chdir("..")

# Move back to your working directory
os.chdir("..")
# close the files
outfile1.close()
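
Because a glob pattern can include directory names, the same files can also be collected without changing directories at all. This is only a sketch of an alternative, not the approach used in Example 6.5: the function name `collect_gbk` is hypothetical, and it assumes the directory layout made by Example 6.2.

```python
import glob
import os


def collect_gbk(main_dir, out_path):
    """Write the contents of every .gbk file found one level below main_dir
    into out_path. Returns the sorted list of files that were read.

    Hypothetical helper; assumes the layout made by Example 6.2.
    """
    # The first '*' stands for each sub directory, the second for the file name
    pattern = os.path.join(main_dir, "*", "*.gbk")
    matches = sorted(glob.glob(pattern))
    # "with" closes each file automatically when its block ends
    with open(out_path, "w") as outfile:
        for path in matches:
            with open(path, "r") as infile:
                outfile.write(infile.read())
    return matches
```

Run from the directory that holds main_directory, `collect_gbk("main_directory", "all_gbk_files.txt")` should produce the same output file as Example 6.5, and the working directory never changes.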



## Exercises

1. Run the commands in "Getting Started Using UNIX Commands" yourself in RStudio Cloud or on Unity. You do not need to show or turn in any of these commands or the results.

2. All of the commands in "Getting Started Using UNIX Commands" can be entered in a Jupyter notebook. Technically you need to put %system or %sx before the commands, but most run fine without these magic commands. Capture the above commands in your Jupyter notebook. They will not all run as expected.

3. Write a simple Python program and run it on the command line. You do not need to show or turn in this result.

4. Now run the program in Jupyter using %system python3.

5. Copy and paste Lab5 Ex5 into a text editor and save it as a .py file. Make sure the .faa file is in your directory. Then run your Python program in the terminal and in a Jupyter notebook.

6. Run the commands in Example 6.1 in your RStudio Cloud or Unity directory in your Jupyter notebook.

7. Run the program in Example 6.2. Inspect (show) the results using 'ls' and 'cd' to move in and out of the directories.

8. Change Example 6.5 to get the contents of the .faa files made when running Example 6.2.