![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg)

<center>
<h1><font size="+3">GSFC Python Bootcamp</font></h1>
</center>

---
<center>
<H1 style="color:red">
File Input and Output (IO)
</H1>
</center>


In [None]:
from __future__ import print_function

In [None]:
# Get files needed for this presentation

import urllib.request
url = "https://raw.githubusercontent.com/astg606/py_materials/master/input_output/"
list_files = ["fileFormats.jpg", "demo.txt", "cat.jpg", "grades.csv"]
for file in list_files:
    urllib.request.urlretrieve(url+file, file) 

## <font color='red'>Types of Files</font>

In [None]:
from IPython.display import Image
slide = Image(filename = 'fileFormats.jpg')
slide

## <font color="red"> Text Files</font>

A **text file** stores human-readable characters in a character encoding (such as ASCII or UTF-8) and can be opened by a standard text editor without any special handling. Text files follow a few rules:

* Text files are readable as is.
* Data in a text file is organized in lines.
* Each line ends with an invisible end-of-line character that tells the text editor to start a new line (the last line may omit it). You can take advantage of this character when processing the file. In Python, it is written as `"\n"`.
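To see the end-of-line character in action, here is a minimal sketch, assuming Python's default `newline=None` (universal newlines mode): a file written with a Windows-style `\r\n` ending and a Unix-style `\n` ending is read back in text mode, and both endings come back as `'\n'`. The file name `newline_demo.txt` is hypothetical; the file lives in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'newline_demo.txt')  # hypothetical file name

    # Write raw bytes so the line endings are exactly as given (Windows style, then Unix style)
    with open(path, 'wb') as f:
        f.write(b'first line\r\nsecond line\n')

    # Text mode with the default newline=None translates '\r\n' to '\n' while reading
    with open(path, 'r') as f:
        lines = f.readlines()

print(lines)  # ['first line\n', 'second line\n']
```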

## <font color="red">Reading Text Files</font>

* Before you can read (or write) a file, you have to open it with Python's built-in `open()` function.
* `open()` returns a file object, whose methods you then use to read from or write to the file.

In [None]:
help(open)

**The Basic Syntax**

```python
file_object = open(file_name[, access_mode][, buffering])
```

* `file_name` − a string containing the name (or path) of the file that you want to access.
* `access_mode` − determines the mode in which the file is opened, i.e., read, write, append, etc. This is an optional parameter; the default access mode is read (`'r'`).
* `buffering` − if set to 0, no buffering takes place (allowed in binary mode only). If set to 1, line buffering is performed (text mode only). An integer greater than 1 sets the buffer size in bytes. A negative value (the default) uses the system default buffer size.
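A short sketch of the `buffering` rules above, together with the `encoding` keyword argument that `open()` also accepts for text files. The file name is hypothetical. Requesting unbuffered I/O in text mode raises `ValueError`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'buffering_demo.txt')  # hypothetical file name

    # Text mode: explicit encoding plus line buffering (buffering=1 is for text mode)
    with open(path, 'w', encoding='utf-8', buffering=1) as f:
        f.write('line buffered\n')

    # Binary mode: buffering=0 means unbuffered, so reads go straight to the OS
    with open(path, 'rb', buffering=0) as f:
        raw = f.read()

    # Unbuffered I/O is not allowed in text mode
    try:
        open(path, 'r', buffering=0)
        unbuffered_text_ok = True
    except ValueError as err:
        unbuffered_text_ok = False
        print('ValueError:', err)

print(raw)
```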


**Summary of `open()` file access modes**

| Mode | Description |
| --- | --- |
| r | Opens a file for reading only. Default mode. Raises `FileNotFoundError` if the file does not exist. |
| rb | Opens a file for reading only in binary format. |
| r+ | Opens an existing file for both reading and writing. The file pointer starts at the beginning of the file. |
| rb+ | Opens an existing file for both reading and writing in binary format. |
| w | Opens a file for writing only. Overwrites the file if it exists; creates a new file if it does not. |
| wb | Opens a file for writing only in binary format. |
| w+ | Opens a file for both writing and reading. Overwrites the file if it exists; creates a new file if it does not. |
| wb+ | Opens a file for both writing and reading in binary format. |
| a | Opens a file for appending. The file pointer is at the end of the file if it exists; creates a new file if it does not. |
| ab | Opens a file for appending in binary format. |
| a+ | Opens a file for both appending and reading. |
| ab+ | Opens a file for both appending and reading in binary format. |
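A minimal sketch, using a hypothetical file in a temporary directory, of how the `'r'`, `'w'` and `'a'` modes behave when a file is missing, already exists, or is opened again:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    # 'r' requires the file to exist
    try:
        open(path, 'r')
    except FileNotFoundError:
        print("'r' failed: the file does not exist yet")

    with open(path, 'w') as f:   # 'w' creates the file
        f.write('first')
    with open(path, 'w') as f:   # 'w' again truncates the old contents
        f.write('second')
    with open(path, 'a') as f:   # 'a' keeps the contents and writes at the end
        f.write(' third')

    with open(path, 'r') as f:
        contents = f.read()

print(contents)  # second third
```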

In [None]:
file_object = open('demo.txt', 'r') # 'r' is default

A file object can be treated as a sequence of strings: iterating over it yields one line at a time.

#### What type of object is file_object?

In [None]:
print("file_object is of type: ", type(file_object))

In [None]:
# file_object.<TAB>
dir(file_object) # attributes and methods of file objects

In [None]:
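# Examples
print(file_object.name)    # name of the file
print(file_object.mode)    # access mode used to open it
print(file_object.closed)  # False while the file is open
file_object.close()        # always close a file when you are done with it
print(file_object.closed)  # True after close()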

### Print all the lines and count the number of lines

In [None]:
my_file = open('demo.txt', 'r')
count = 0
for line in my_file:  # treating my_file as a sequence of strings
    count += 1
    print(line)
print("My file has " + str(count) + " lines.")
my_file.close()

#### Note: Each line keeps its trailing newline character `"\n"`, and `print()` adds another one, which is why blank lines appear between the printed lines

### Exercise 
Read the text file demo.txt and count the number of lines excluding empty lines.

### Reading the entire file at once

In [None]:
my_file = open('demo.txt', 'r')

# read() reads the _entire_ file and returns a single string
data = my_file.read()
print("Contents of file are of type: ", type(data))

# close the file handle
my_file.close()

# Now data is in memory
heading = "Contents of file"
print("\n" + heading + "\n" + "-" * len(heading))
print(data)

### Read file chunks

In [None]:
my_file = open('demo.txt', 'r')
while True:
    data = my_file.read(64)   # read at most 64 characters (text mode)
    if not data:              # read() returns an empty string at end of file
        break
    print(data)
my_file.close()
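In binary mode, `read(n)` counts bytes instead of characters. A common idiom, sketched here with a hypothetical 200-byte file, pairs `iter(callable, sentinel)` with `functools.partial` so the loop stops automatically when `read()` returns the empty bytes object `b''`:

```python
import functools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'chunks.bin')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(bytes(200))  # 200 zero bytes of sample data

    chunk_sizes = []
    with open(path, 'rb') as f:
        # iter(callable, sentinel) calls f.read(64) until it returns b'' (end of file)
        for chunk in iter(functools.partial(f.read, 64), b''):
            chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [64, 64, 64, 8]
```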

### Read one line at a time

In [None]:
my_file = open('demo.txt')
data = my_file.readline()
print(data)
my_file.close()

### Read all the lines in the text file

In [None]:
my_file = open('demo.txt')
data = my_file.readlines()
print(data)                # Note: data is a list of strings, one per line
my_file.close()

### Exercise 
Read the text file demo.txt and find all instances of the word "Luke"

### Automatically closing files

In [None]:
# To open a file, process its contents, and make sure it gets closed, use a with statement:

with open('demo.txt', 'r') as f:
    data = f.read()
    print('--> Is file closed? ', f.closed)
    # file will be closed after exiting this block of code
    
print('<-- Is file closed? ', f.closed)
print(f.mode)
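The `with` statement also closes the file when an exception is raised inside the block. A minimal sketch with a hypothetical file and a deliberately raised `RuntimeError`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'with_demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('some text\n')

    try:
        with open(path, 'r') as f:
            f.read()
            raise RuntimeError('something went wrong while processing')
    except RuntimeError as err:
        print('Caught:', err)

    closed_after_error = f.closed  # the with block closed f before the exception propagated

print('Is file closed after the error?', closed_after_error)  # True
```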

## <font color="red">Writing Text Files</font>

* The `write()` method writes any string to an open file.
* The `write()` method does not add a newline character (`'\n'`) to the end of the string. 
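A small sketch (hypothetical file name) showing that `write()` returns the number of characters written and that the newline has to be written explicitly:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'write_demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        n1 = f.write('Noble gases:')   # no newline is added
        n2 = f.write('\n')             # the newline must be written explicitly

    with open(path, 'r') as f:
        contents = f.read()

print(n1, n2)          # 12 1
print(repr(contents))  # 'Noble gases:\n'
```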

In [None]:
# Example:
with open('elements.txt', 'w') as f:  # 'w' creates a new file (or overwrites an existing one)
    f.write('Noble gases:')
    f.writelines(['He', 'Ne', 'Ar'])  # writelines writes each string as is, with no separators

In [None]:
# Portable alternative to the IPython shell command `!cat elements.txt`
with open('elements.txt', 'r') as f:
    print(f.read())

Note: neither `write()` nor `writelines()` adds `'\n'` for you, so the file above contains the single line `Noble gases:HeNeAr`.
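To put each item on its own line, add the `'\n'` yourself. One way to do it, sketched with a hypothetical `gases.txt` in a temporary directory, passes a generator expression to `writelines()`:

```python
import os
import tempfile

gases = ['He', 'Ne', 'Ar']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'gases.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('Noble gases:\n')
        # writelines() adds no separators, so append '\n' to each item ourselves
        f.writelines(gas + '\n' for gas in gases)

    with open(path, 'r') as f:
        contents = f.read()

print(contents)
```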

#### 'a+' vs 'r+'

In [None]:
with open('elements.txt', 'a') as f:  # 'a' is append mode: write-only
    contents = f.read()               # raises io.UnsupportedOperation: not readable

In [None]:
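with open('elements.txt', 'a+') as f:  # 'a+' is appending and reading
    contents = f.read()                # returns '' because the file pointer starts at EOF
    print(f.tell())                    # position of the end of the file
    f.write('Kr\n')                    # appended at the end of the file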

In `'a+'` mode the file pointer starts at the end of the file (EOF), as `f.tell()` shows, so `f.read()` returns an empty string and `'Kr\n'` is appended at that position.

In [None]:
# Portable alternative to the IPython shell command `!cat elements.txt`
with open('elements.txt', 'r') as f:
    print(f.read())

In [None]:
with open('elements.txt', 'r+') as f:  # 'r+' is reading and writing
    print(f.tell())                    # the file pointer starts at the beginning of the file: 0
    f.write('Halogens:\n')             # overwrites existing characters; nothing is inserted
    f.writelines(['F\n', 'Cl\n'])

`f.tell()` shows that the file pointer starts at the beginning of the file (BOF), so the new text is written from position 0 and overwrites the characters that were there.

In [None]:
# Portable alternative to the IPython shell command `!cat elements.txt`
with open('elements.txt', 'r') as f:
    print(f.read())
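A compact sketch of the same `'r+'` behaviour on a hypothetical six-character file: writing at position 0 overwrites characters, and seeking to the end first (`f.seek(0, os.SEEK_END)`, which text mode allows) turns the next write into an append.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'rplus_demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('abcdef')

    with open(path, 'r+') as f:
        f.write('XY')              # position 0: overwrites 'ab', nothing is inserted
        f.seek(0, os.SEEK_END)     # move to the end of the file
        f.write('!')               # now the write appends

    with open(path, 'r') as f:
        contents = f.read()

print(contents)  # XYcdef!
```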

### Quiz
Write a program that reads the file 'demo.txt' and writes out a new file with the lines in reverse order (i.e., the first line in the old file becomes the last one in the new file).

## Summary of basic file IO functions and methods

<table style="width:100%">
  <tr>
    <th>Methods and functions</th>
    <th>Description</th> 
  </tr>
  <tr>
    <td>open()</td>
    <td>Returns a file object and is most commonly used with two arguments: open(filename, mode)</td> 
  </tr>
  <tr>
    <td>file.close()</td>
    <td>Close the file.</td> 
  </tr>
  <tr>
    <td>file.read([size])</td>
    <td>Read the entire file. If size is specified, read at most size characters (text mode) or bytes (binary mode).</td> 
  </tr>
  <tr>
    <td>file.readline([size])</td>
    <td>Read one line from the file, including its trailing newline. If size is specified, read at most size characters (text mode) or bytes (binary mode).</td> 
  </tr>
  <tr>
    <td>file.readlines([hint])</td>
    <td>Read all the remaining lines and return them as a list. If hint is specified, stop reading once the total size of the lines read exceeds hint.</td> 
  </tr>
  <tr>
    <td>file.tell()</td>
    <td>Returns the file object's current position in the file.</td> 
  </tr>
  <tr>
    <td>file.seek(offset)</td>
    <td>Changes the file object's current position to offset. In text mode, only 0 or a value previously returned by tell() is reliable.</td> 
  </tr>
  <tr>
    <td>file.write(string)</td>
    <td>Writes the contents of string to the file and returns the number of characters (text mode) or bytes (binary mode) written.</td> 
  </tr>
</table>

### Handling delimited files

In [None]:
# Portable alternative to the IPython shell command `!cat grades.csv`
with open('grades.csv', 'r') as f:
    print(f.read())

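For a simple file like this one, splitting each line on commas is enough. For real-world CSV data (e.g., quoted fields that contain commas), use the standard-library `csv` module (`import csv`).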

In [None]:
with open('grades.csv', 'r') as f:
    for line in f:
        print(line.strip().split(','))

Each row of the input data is parsed and converted to a list of strings.
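Splitting on commas breaks as soon as a field itself contains a comma. A minimal sketch with the standard-library `csv` module, using a hypothetical file with one quoted field (the sample data is made up):

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'quoted.csv')  # hypothetical file and sample data
    with open(path, 'w', newline='') as f:
        f.write('name,grade\n"Smith, Jane",90\n')

    # newline='' is recommended by the csv documentation so the reader handles line endings itself
    with open(path, 'r', newline='') as f:
        rows = list(csv.reader(f))

naive = '"Smith, Jane",90'.split(',')  # what a plain split(',') would give for the second row

print(rows)   # [['name', 'grade'], ['Smith, Jane', '90']]
print(naive)  # ['"Smith', ' Jane"', '90']
```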

### Binary data IO

In [None]:
s = b"Hello world!"

with open('hello.bin','wb') as f:
    f.write(s)
    
with open('hello.bin','rb') as f:
    data = f.read()
print(data)

with open('hello.bin', 'rb') as f:
    byte = f.read(1)          # read one byte; returns b'' at end of file
    while byte:               # b'' is falsy, so the loop stops at EOF
        print(byte)
        byte = f.read(1)

The difference between binary mode (which returns raw `bytes`) and text mode (which decodes the bytes into `str` using an encoding such as ASCII or UTF-8) won't be obvious for simple alphanumeric strings, but becomes important when you process text that includes characters outside the ASCII character set.
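A minimal sketch of the difference, assuming UTF-8 as the encoding: the hypothetical string `'café'` has 4 characters, but its last character takes two bytes in UTF-8 and cannot be encoded in ASCII at all.

```python
import os
import tempfile

text = 'caf\u00e9'  # 'café': 4 characters; '\u00e9' is outside the ASCII character set

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cafe.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    with open(path, 'rb') as f:                    # binary mode: raw bytes
        raw = f.read()
    with open(path, 'r', encoding='utf-8') as f:   # text mode: decoded str
        decoded = f.read()

print(raw, len(raw))                # b'caf\xc3\xa9' 5
print(len(decoded), decoded == text)  # 4 True

try:
    text.encode('ascii')
except UnicodeEncodeError as err:
    print('Not representable in ASCII:', err)
```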

In [None]:
with open('cat.jpg', 'rb') as f:
    data = f.readline()   # in binary mode, reads up to and including the first b'\n' byte
print(data)

In [None]:
':'.join('{:02x}'.format(x) for x in data)  # iterating over bytes yields ints (0-255)

A hex dump is useful for debugging: each byte (8 bits) is shown as a two-digit hexadecimal number.
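A hypothetical helper, `hex_dump`, sketches the classic layout: an offset followed by up to 16 bytes per line. (For a quick one-liner without offsets, `bytes.hex()` also exists.)

```python
def hex_dump(data, width=16):
    """Hypothetical helper: one line per `width` bytes, showing the offset and each byte as two hex digits."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = ' '.join('{:02x}'.format(b) for b in chunk)  # each b is an int 0-255
        lines.append('{:08x}  {}'.format(offset, hex_bytes))
    return '\n'.join(lines)


print(hex_dump(b'\xff\xd8\xff\xe0Hello'))  # 00000000  ff d8 ff e0 48 65 6c 6c 6f
```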

In [None]:
with open('cat.jpg', 'rb') as f:
    data = f.read()

# JPEG files start with the two-byte marker FF D8
if data.startswith(b'\xff\xd8'):
    info = 'This is a JPEG file (%d bytes long)'
else:
    info = 'This is not a JPEG file (%d bytes long)'

print(info % len(data))

In [None]:
from IPython.display import Image
kitty = Image(filename = 'cat.jpg')
kitty

### OS dependent functions

In [None]:
import os

The `os` module provides functions for all kinds of file-processing operations, such as renaming and deleting files, as well as low-level file IO.

In [None]:
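help(os.read)
fd = os.open('demo.txt', os.O_RDONLY)  # low-level open returns an integer file descriptor
ret = os.read(fd, 15)                  # read at most 15 bytes; returns a bytes object
print('Result from os.read:' + '\n' + 20*'-' + '\n' + ret.decode(errors='replace'))
os.close(fd)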
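The writing counterpart is `os.write()`, which takes a file descriptor and a `bytes` object and returns the number of bytes written. A minimal sketch with a hypothetical file; the flags are combined with `|`, and `0o644` sets the permissions of the newly created file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'low_level.txt')  # hypothetical file name

    # Low-level write: create the file if needed and truncate it if it exists
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        n_written = os.write(fd, 'Hello from os.write\n'.encode('utf-8'))  # needs bytes
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, 100).decode('utf-8')  # read up to 100 bytes, then decode
    finally:
        os.close(fd)

print(n_written)  # 20
print(data)
```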

#### Testing whether a file or directory exists

In [None]:
print(os.path.exists('/etc/passwd'))
print(os.path.exists('/etc/spam'))

In [None]:
filename = '/etc/spam'
if os.path.exists(filename):
    with open(filename) as f:
        data = f.readline()
    print(data)
else:
    print(filename + ' does not exist')
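An alternative style, sketched here with a hypothetical missing path, is to just try to open the file and handle `FileNotFoundError`. This also avoids the small window in which the file could be deleted between the `os.path.exists()` check and the `open()` call.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, 'spam')  # hypothetical path that does not exist
    try:
        with open(filename) as f:
            data = f.readline()
        print(data)
    except FileNotFoundError:
        data = None
        print(os.path.basename(filename) + ' does not exist')
```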

#### List files in current directory

In [None]:
listdir = os.listdir(".")
for file in listdir:
    print(file)
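`os.listdir()` returns names in arbitrary order and includes directories as well as files. A minimal sketch (hypothetical file names in a temporary directory) that keeps only `.txt` files and sorts them:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a few hypothetical files to list
    for name in ['demo.txt', 'notes.txt', 'cat.jpg']:
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write('sample\n')

    # Keep only regular files whose names end in .txt; sort for a stable order
    txt_files = sorted(
        name for name in os.listdir(tmp_dir)
        if name.endswith('.txt') and os.path.isfile(os.path.join(tmp_dir, name))
    )

print(txt_files)  # ['demo.txt', 'notes.txt']
```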

#### Dealing with directories

In [None]:
os.mkdir("newdir")
os.chdir("newdir")
print(os.getcwd())

In [None]:
os.chdir("..")
print(os.getcwd())

In [None]:
os.rmdir("newdir")
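`os.mkdir()` creates one directory at a time and `os.rmdir()` only removes empty directories. A sketch of the more forgiving alternatives, `os.makedirs()` (with `exist_ok=True`) and `shutil.rmtree()`, using hypothetical directory names in a temporary directory:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, 'newdir', 'sub1', 'sub2')  # hypothetical names

    os.makedirs(nested)                  # creates all missing parent directories
    os.makedirs(nested, exist_ok=True)   # no error if it already exists

    with open(os.path.join(nested, 'file.txt'), 'w') as f:
        f.write('data\n')

    top = os.path.join(tmp_dir, 'newdir')
    try:
        os.rmdir(top)                    # fails: the directory is not empty
    except OSError as err:
        print('os.rmdir failed:', type(err).__name__)

    shutil.rmtree(top)                   # removes the whole tree, files included
    still_exists = os.path.exists(top)

print(still_exists)  # False
```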

#### Accessing environment

In [None]:
print(os.environ['HOME'])
print("HOME" in os.environ)  # dict.has_key() no longer exists in Python 3

# using get will return `None` if a key is not present rather than raise a `KeyError`
print(os.environ.get('KEY_THAT_MIGHT_EXIST'))

# os.getenv(key, default) is equivalent to os.environ.get(key, default); both accept a default value
print(os.getenv('KEY_THAT_MIGHT_EXIST', 'default value'))

# Setting a variable (visible to this process and to child processes it starts)
os.environ['PythonTraining'] = 'is fun'
print(os.environ.get('PythonTraining'))

#### Other utilities

In [None]:
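print(os.path.isfile('/etc/passwd'))    # True if the path is an existing regular file
print(os.path.isdir('/etc/passwd'))     # True if the path is an existing directory
print(os.path.islink('python'))         # True if the path is a symbolic link
print(os.path.realpath('python'))       # absolute path with symbolic links resolved
print(os.path.getsize('/etc/passwd'))   # size in bytes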

In [None]:
import shutil

The shutil module offers a number of high-level operations on <b>files and collections of files</b>. In particular, functions are provided which support file copying and removal. 
For operations on individual files, see also the os module.

In [None]:
help(shutil.copy)
help(shutil.move)
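A minimal sketch of `shutil.copy()` and `shutil.move()` on hypothetical files in a temporary directory; both return the destination path:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'original.txt')   # hypothetical file names
    with open(src, 'w') as f:
        f.write('copy me\n')

    copy_path = shutil.copy(src, os.path.join(tmp_dir, 'copy.txt'))
    moved_path = shutil.move(copy_path, os.path.join(tmp_dir, 'moved.txt'))

    names = sorted(os.listdir(tmp_dir))
    with open(moved_path) as f:
        moved_contents = f.read()

print(names)  # ['moved.txt', 'original.txt']
print(moved_contents)
```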

# Extra material

### File position

In [None]:
with open('demo.txt') as f:
    f.seek(5)                   # seek(offset) changes the file object's position
    data = f.readline()         # reads from position 5 to the end of that line
    print(data)

In [None]:
with open('demo.txt') as f:
    f.seek(5)
    data = f.readline()
    print(data)
    k = f.tell()              # tell() returns the current position in the file
    print(k)
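In text mode, `seek()` only accepts an offset previously returned by `tell()` (or 0), so nonzero moves relative to the current position or the end of the file need binary mode. A sketch with a hypothetical 9-byte file, using the `whence` argument (`os.SEEK_SET`, `os.SEEK_CUR`, `os.SEEK_END`):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'seek_demo.bin')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(b'abcdefxyz')

    with open(path, 'rb') as f:
        f.seek(-3, os.SEEK_END)      # 3 bytes before the end
        tail = f.read()
        f.seek(2)                    # absolute position (os.SEEK_SET is the default)
        f.seek(1, os.SEEK_CUR)       # 1 byte forward from the current position
        middle = f.read(2)
        position = f.tell()

print(tail, middle, position)  # b'xyz' b'de' 5
```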

### Using print to automatically add new lines

In [None]:
with open('elementsWithNewLine.txt', 'w') as f:
    print('Noble gases', file=f)       # print() adds a newline automatically
    for gas in ['He', 'Ne', 'Ar', 'Kr']:
        print(gas, file=f)

In [None]:
with open('elementsWithNewLine.txt', 'r') as f:
    print(f.read())
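`print()` also accepts `sep` (the separator between items) and `end` (what is written after the last item, `'\n'` by default). A short sketch with a hypothetical file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'gases_print.txt')  # hypothetical file name
    with open(path, 'w') as f:
        print('Noble gases', file=f)
        print('He', 'Ne', 'Ar', 'Kr', sep=', ', file=f)   # sep goes between items
        print('end of list', end='', file=f)               # no trailing newline

    with open(path) as f:
        contents = f.read()

print(repr(contents))  # 'Noble gases\nHe, Ne, Ar, Kr\nend of list'
```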