In [8]:
import os
os.getcwd()
#os.chdir(os.getcwd() + "/data")
os.chdir("/Users/Alexa/Desktop/swc-python/data")

-----


# 7 November 2018

- [read from / write to files](#working-with-files)
- [file manipulations](#file-manipulations): to create, list, remove files etc.
- [break and continue](#break-and-continue): to control flow in loops

## working with files

If you're using `numpy` to read data, you often won't need these low-level commands... but if your data isn't already neatly tabulated for `numpy`, you'll need them.

- disk file vs. file object (file handle): the handle is the object Python gives you to read from or write to the file. It lets you work through the file piece by piece, without loading the whole thing into memory.
- 3 modes to open a file: `r` (read), `w` (write), `a` (append)
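To see how the three modes differ, here is a small sketch that writes to a temporary file (the name `modes_demo.txt` is made up) so nothing real gets overwritten. `w` starts from scratch, `a` adds to the end, and `r` only reads:

```python
import os
import tempfile

# Work in a throwaway directory so no real file gets overwritten.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "modes_demo.txt")  # hypothetical file name

with open(path, "w") as fh:   # "w": create the file, or erase it if it already exists
    fh.write("first\n")
with open(path, "w") as fh:   # opening with "w" again throws away "first"
    fh.write("second\n")
with open(path, "a") as fh:   # "a": keep the existing content, add at the end
    fh.write("third\n")
with open(path, "r") as fh:   # "r": read only (also the default mode)
    content = fh.read()

print(content, end="")
# second
# third
```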

In [7]:
fh = open("newfile", 'w') # creates a file handle. This value isn't a file, it's a file *handle*.
try: # code in a "try" block may raise an error...
    fh.write("hello world\n") # problem if disk quota full, etc.
finally: # ...and the "finally" block runs no matter what. The error is not caught: it still propagates afterwards.
    fh.close() # need to close to clean up, even if there were problems earlier

We don't *like* writing `try`, `finally` and `close()` every darn time, so Python offers a shortcut: the `with`/`as` statement. It has the equivalent of `try`/`finally` and the `close()` built right in.

In [None]:
with open("newfile", 'w') as fh: # i.e. take results of the "with" part and save as "fh".
    fh.write("hello world\n")    # It's all in a `try` block too.

# fh is closed now
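You can check that the file really is closed by looking at the handle's `closed` attribute. This sketch uses a temporary path instead of `newfile`, and it also shows that the file gets closed when an error interrupts the `with` block:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newfile")  # temporary stand-in for "newfile"

with open(path, "w") as fh:
    fh.write("hello world\n")
    print("inside the block, closed =", fh.closed)   # False

print("after the block, closed =", fh.closed)        # True

# The file also gets closed when an error interrupts the block:
try:
    with open(path, "a") as fh2:
        fh2.write("more text\n")
        raise ValueError("something went wrong")
except ValueError as err:
    print("caught:", err)
print("after the error, closed =", fh2.closed)       # True
```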

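methods for file handles:
* `.write()`: write a string (no "\n" is added for you)
* `.writelines()`: write each string from a list (or any iterable of strings); again, no "\n" is added
* `.read()`: read everything that's left, as one string; `.read(n)` reads at most `n` characters
* `.readline()`: read everything up to and including the next \n. Each time you call this, it reads the *next* line.
* `.readlines()`: read all remaining lines at once, as a list of strings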
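A small sketch of these methods on a throwaway file (the file name and the sequence are placeholders). Note that each read continues from where the previous one stopped:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "methods_demo.txt")  # hypothetical file

with open(path, "w") as fh:
    fh.write(">seq1\n")                   # write one string
    fh.writelines(["MAYL\n", "KKRE\n"])   # write a list of strings; "\n" must be included by you

with open(path) as fh:
    first_char = fh.read(1)       # ">"
    rest_of_line = fh.readline()  # "seq1\n": from the current position up to and including "\n"
    remaining = fh.readlines()    # ["MAYL\n", "KKRE\n"]: all remaining lines, as a list

with open(path) as fh:
    everything = fh.read()        # the whole file as one string

print(repr(first_char), repr(rest_of_line), remaining)
print(repr(everything))
```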

example: read fasta protein files from bds data (chapter 3)

- treat sequence names differently (lines starting with ">")
- concatenate lines that are for the same sequence
- output file with protein from all fasta files, with new format: 1 sequence = 1 line, with species name preceding the sequence itself
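The three steps above can be sketched on an in-memory string: `io.StringIO` behaves like a file handle opened for reading, so the same loop would work on a handle from `open(...)`. The function name `read_fasta` and the sequences are made up for illustration, and collecting records as a list of (header, sequence) pairs is just one possible design:

```python
import io

# A made-up two-record FASTA text standing in for a real file.
fasta_text = """>seqA some description
MAYLKK
REQL

>seqB
MKVE
"""

def read_fasta(fh):
    """Return a list of (header, sequence) pairs from an open FASTA file handle."""
    records = []
    header = None
    chunks = []
    for line in fh:
        line = line.strip()
        if not line:
            continue                      # skip empty lines
        if line.startswith(">"):          # header: a new record starts
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        else:
            chunks.append(line)           # sequence line of the current record
    if header is not None:
        records.append((header, "".join(chunks)))  # don't forget the last record
    return records

records = read_fasta(io.StringIO(fasta_text))
for header, seq in records:
    print(header, seq)   # name, then the full sequence, on 1 line
```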

In [None]:
with open("tb1-protein.fasta","r") as fh:
  for line in fh:
    print("line=", line, sep="", end="")

Note that you can iterate over a file handle, like we have over lists: each element is a line of the file, still ending with its "\n" (which is why we print with `end=""`).

The above gives the same output as the following, except that `.readlines()` first loads the whole file into memory as a list:

In [None]:
with open("tb1-protein.fasta","r") as fh:
  linelist = fh.readlines()
  for line in linelist:
    print("line=", line, sep="", end="")

with open("tb1-protein.fasta","r") as fh:
  line = fh.readline() # header line only
  print("line=", line, sep="", end="")
  dna = ""
  line = fh.readline() # first sequence line
  while line: # readline() returns '' at the end of the file, which is False
    print("line=", line, sep="", end="")
    dna += line.strip() # Remember, .strip() pulls off whitespace at either end of a string.
    line = fh.readline() # read the next line before testing the loop condition again

print("dna=", dna, sep="")

Let's take the working code we've figured out above, and put it in a function for later use:

In [None]:
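def reformat_onefile(fin, fout):
  """fin: name of a FASTA file (opened and closed here).
  fout: file handle already open for writing.
  Writes each header line, then its full sequence on a single line."""
  in_sequence = False # True once part of a sequence is written but not yet ended by "\n"
  with open(fin,"r") as fh:
    for line in fh:
      line = line.strip()
      if not line:
        continue # skip the rest if empty line
      if line.startswith(">"): # header line
        if in_sequence:
          fout.write("\n") # end the previous sequence, if the file has several
          in_sequence = False
        fout.write(line)
        fout.write("\n") # after header
      else:              # sequence line
        fout.write(line)
        in_sequence = True
  if in_sequence:
    fout.write("\n") # after end of the last full sequence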

import sys
reformat_onefile("tb1-protein.fasta", sys.stdout) # check function

import glob
filenames = glob.glob("*-protein.fasta")
with open("all1linesequences.fasta", "w") as outfile:
  for fname in filenames:
    print("next: will reformat",fname)
    reformat_onefile(fname, outfile)

note: `sys.stdout` is a file handle open for writing :)
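Because `reformat_onefile` only ever calls `fout.write`, it accepts any object with a `write` method: a real file, `sys.stdout`, or an in-memory `io.StringIO` buffer, which is handy for testing. A sketch with a hypothetical helper `write_record`:

```python
import io
import sys

def write_record(header, seq, fout):
    """Hypothetical helper: write a header line and its sequence to any writable handle."""
    fout.write(header + "\n")
    fout.write(seq + "\n")

write_record(">seqA", "MAYLKK", sys.stdout)   # prints to the screen

buffer = io.StringIO()                         # an in-memory "file" open for writing
write_record(">seqA", "MAYLKK", buffer)
captured = buffer.getvalue()                   # ">seqA\nMAYLKK\n"
print(repr(captured))
```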

## file manipulations

- in module `os`: `listdir`, `mkdir`, `makedirs`, `rename`, `remove`, `rmdir`,
  `chdir`, `path.exists`, `path.isdir`, `path.isfile`
- in module `shutil`: `copy`, `copytree`, `rmtree`

In [None]:
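# in a notebook, only the last line's value is displayed: run these one at a time to see each result
import os
import shutil # needed for shutil.copy below
os.listdir() # list the current folder
os.remove(".DS_Store") # delete a file; error if it doesn't exist (.DS_Store is a hidden macOS file)
os.mkdir("try1") # create one folder
os.rmdir("try1") # remove a folder: only works if it's empty
os.makedirs("try/data/dna") # create nested folders, including any missing parents
os.listdir("try")
os.chdir("try")
os.path.isdir("data/dna")
os.path.realpath("data/dna") # absolute path
os.path.isfile("data/dna/gene1.fa")
shutil.copy("../lizard/cten_16s.fasta?sequence=1", "data/dna/cten_16s.fa") # copy to a new file name
shutil.copy("../lizard/cten_16s.fasta?sequence=1", "data/dna") # copy into a folder, keeping the name
os.system("touch readme.md") # run a shell command (here: create an empty file)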
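To try these functions without risking real data, a sketch like the following works inside a temporary directory (`gene1.fa` and the folder names are placeholders). It also shows the difference between `os.rmdir`, which refuses to remove a folder that isn't empty, and `shutil.rmtree`, which removes a folder with everything in it:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                        # scratch directory; nothing real is touched
dna_dir = os.path.join(base, "try", "data", "dna")
os.makedirs(dna_dir)                             # creates the missing parent folders too

src = os.path.join(base, "gene1.fa")             # hypothetical file
with open(src, "w") as fh:
    fh.write(">gene1\nACGT\n")

shutil.copy(src, dna_dir)                        # copy into a folder: keeps the file name
copied = os.path.join(dna_dir, "gene1.fa")
os.rename(copied, os.path.join(dna_dir, "gene1_copy.fa"))  # rename (or move) a file

print(os.listdir(dna_dir))                       # ['gene1_copy.fa']
print(os.path.isdir(dna_dir), os.path.isfile(src))   # True True

try:
    os.rmdir(os.path.join(base, "try"))          # fails: the folder is not empty
except OSError as err:
    print("rmdir refused:", err)

shutil.copytree(os.path.join(base, "try"), os.path.join(base, "try_backup"))  # copy a whole tree
shutil.rmtree(os.path.join(base, "try"))         # delete a folder and everything in it
print(os.path.exists(os.path.join(base, "try"))) # False
```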

## break and continue

Both are extremely useful!
`break` to exit a loop immediately (only the innermost loop, if loops are nested):

In [5]:
i=0
while True:
  i += 1
  print("code for i =",i,"here")
  if i >= 4:
    break
i # 4

code for i = 1 here
code for i = 2 here
code for i = 3 here
code for i = 4 here


4
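A typical use of `break` is to stop searching as soon as you've found what you need. The sketch below (with made-up sequences) also shows that in nested loops, `break` only leaves the inner loop:

```python
sequences = ["MKV", "MAYLKKREQL", "MA", "MKVEQLLK"]   # made-up example data

first_long = None
for seq in sequences:
    if len(seq) > 5:
        first_long = seq
        break            # stop looking: the remaining items are never examined
print(first_long)        # MAYLKKREQL

pairs_checked = 0
for i in range(3):
    for j in range(3):
        pairs_checked += 1
        if j == 1:
            break        # leaves only the inner loop; the outer loop keeps going
print(pairs_checked)     # 6
```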

`continue` to *directly* continue to the next iteration of the loop,
*bypassing* all remaining code for the current iteration:

In [6]:
for i in range(0,10000):
  if i==3 or i >= 5:
    continue
  print("code here not bypassed, i =", i)
i # 9999

code here not bypassed, i = 0
code here not bypassed, i = 1
code here not bypassed, i = 2
code here not bypassed, i = 4


9999
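`continue` is what `reformat_onefile` uses to skip empty lines. A common pattern, sketched here on made-up input lines, skips blank and comment lines before doing the real work:

```python
lines = ["# comment", "", "MAYL", "  ", "KKRE"]   # made-up input lines

kept = []
for line in lines:
    line = line.strip()
    if not line or line.startswith("#"):
        continue         # skip blank and comment lines: nothing below runs for them
    kept.append(line)
print(kept)              # ['MAYL', 'KKRE']
```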

also: `pass` does nothing. It is useful as a placeholder for new, not-ready code, because a function body (or a loop, `if` block, or class) must contain at least 1 statement.
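A sketch of `pass` as a placeholder. The names `summarize_lengths` and `FastaRecord` are hypothetical stubs, not code from this lesson. The last part shows that a body containing only a comment is rejected by Python:

```python
def summarize_lengths(sequences):   # hypothetical function, not written yet
    pass                            # placeholder so the function body isn't empty

class FastaRecord:                  # hypothetical class, also just a stub for now
    pass

for i in range(3):
    if i == 1:
        pass                        # to do: handle this case later; for now, do nothing
    else:
        print("i =", i)

result = summarize_lengths(["MKV"])
print(result)                       # None: a function that only does `pass` returns None

# A body made of only a comment is rejected when Python compiles the code:
try:
    compile("def f():\n    # just a comment\n", "<demo>", "exec")
    empty_body_rejected = False
except SyntaxError:                 # (an IndentationError, which is a kind of SyntaxError)
    empty_body_rejected = True
print("empty body rejected:", empty_body_rejected)   # True
```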