# Files Input/Output
<a href="https://colab.research.google.com/github/rambasnet/FDSPython-Notebooks/blob/master/Ch10-Files.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- data is usually stored on a secondary storage medium such as a hard drive, flash drive, CD-RW, etc., in named locations called files
- files can be organized into folders
- programs often need to read data from files and save data back to files for long-term storage
- this chapter demonstrates how to read and write plain text files
- use the built-in `open()` function to work with files
```python
fileio = open(fileName, mode='r')
```
- `open()` lets you open a file in different modes: read (default), write, append, etc.
- see help(open) for details
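
As a minimal sketch of how the three most common modes behave, the cell below writes, appends to, and then reads back a scratch file. The file name `modes_demo.txt` and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only)
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as f:   # 'w' creates the file, or truncates an existing one
    f.write('first\n')

with open(path, 'a') as f:   # 'a' keeps the existing content and adds to the end
    f.write('second\n')

with open(path) as f:        # mode defaults to 'r' (read, text mode)
    content = f.read()

print(repr(content))
```

Opening the same file again in `'w'` mode would discard both lines, which is why `'a'` is the safer choice when existing data must be kept.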

#### File I/O in text mode can only read and write string data
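
A short sketch of what this means in practice: numbers have to be converted with `str()` before writing, and converted back after reading. The scratch file name and the sample values are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only)
path = os.path.join(tempfile.mkdtemp(), 'numbers_demo.txt')
numbers = [3, 1.5, 42]

caught = False
with open(path, 'w') as f:
    try:
        f.write(42)              # a text-mode file rejects non-string data
    except TypeError:
        caught = True
    for n in numbers:
        f.write(str(n) + '\n')   # convert each number to str before writing

with open(path) as f:
    raw = f.read().split()       # everything read back is a string

values = [float(s) for s in raw]  # convert back to numbers explicitly
print(raw, values)
```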

In [2]:
help(open)

Help on function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    In

## write text data to a file
- 3-step process

1. open file with a name in write (`'w'`) or append (`'a'`) mode
2. write data
3. close file

In [None]:
# old school - not preferred!!
fw = open('test1.txt', 'w') # w is for write mode
fw.write('words\n=====\n')
fw.write('apple\nball\ncat\ndog\n')
print(fw.write('zebra\n'))
fw.close() # must close the file to flush buffered data
# to secondary storage

In [1]:
help(fw)

NameError: name 'fw' is not defined

In [None]:
# newer and better syntax - preferred way!!
alist = [1, 2, 3]
with open('words.txt', 'w') as fout:
    fout.write('apple\nball\ncat\ndog\n')
    fout.write('elephant\n')
    fout.write('zebra\n')
    fout.write(str(1))
    fout.write('\n')
    fout.write(str(alist))
    

# file will be automatically closed when with block is finished executing
# fout.write('test\n') # file is already closed, so this raises ValueError (I/O operation on closed file)
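
The behaviour after the `with` block can be checked directly. This is a small sketch using a hypothetical scratch file; the `closed` attribute and the `ValueError` are standard for Python file objects.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only)
path = os.path.join(tempfile.mkdtemp(), 'closed_demo.txt')

with open(path, 'w') as fout:
    fout.write('inside the block\n')

print(fout.closed)   # the with statement has already closed the file

write_failed = False
try:
    fout.write('too late\n')   # writing to a closed file is an error
except ValueError as err:
    write_failed = True
    print('write failed:', err)
```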

## read text data from a file
1. open file with its name; can provide relative or absolute path
2. read in various ways; one line at a time, all lines, bytes, whole file, etc.
3. use data
4. close file

### various ways to read data
1. `read(size=-1)` : read at most `size` characters; with the default of -1, read until EOF (End of File)
2. `readline()` : read up to and including the next newline, or until EOF
3. `readlines()` : read and return a list of lines from the input file
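
The three methods can be combined on one open file, since each call continues from where the previous one stopped. The sketch below assumes a small scratch file created just for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file with three short lines (illustration only)
path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')
with open(path, 'w') as f:
    f.write('apple\nball\ncat\n')

with open(path) as f:
    first_three = f.read(3)       # at most 3 characters
    rest_of_line = f.readline()   # the rest of the current line, newline included
    remaining = f.readlines()     # all remaining lines as a list
    at_eof = f.read()             # nothing left: empty string

print(repr(first_three), repr(rest_of_line), remaining, repr(at_eof))
```

An empty string from `read()` or `readline()` is how the end of the file is signalled.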

In [None]:
# read whole file as list of lines
fr = open('words.txt', 'r') # 'r' or read mode by default; file must exist
data = fr.readlines()
fr.close()

In [None]:
data[0].strip()

In [None]:
with open('words.txt', 'r') as fr:
    data = fr.readlines()

In [None]:
help(fr)

In [None]:
data

In [None]:
for el in data:
    print(el.strip())

In [None]:
data.sort()
    

In [None]:
data

In [None]:
with open('words1.txt', 'w') as newFile: 
    for word in data:
        newFile.write(word)

## read data line by line
- let's create a file with about 10 integers one per line
- then, read the integers line by line into a list of integers

In [None]:
# create a file with 10 integers
# one integer per line
import random
# 'a' (append) mode: rerunning this cell adds 10 more integers; use 'w' to start over
with open('integers.txt', 'a') as fout:
    for i in range(10):
        num = random.randint(1, 1000)
        fout.write(str(num) + '\n')

In [None]:
# read the integers line by line into a list
intList = []
with open('integers.txt', 'r') as fin:
    while True:
        num = fin.readline()
        num = num.strip() # strip \n
        if not num:
            break
        print('num = ', num, type(num))
        intList.append(int(num))

In [None]:
print(intList)
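
An alternative sketch, not the approach used in the cell above: a file object can be iterated directly with a `for` loop, which yields one line at a time and stops at EOF. Unlike a loop that breaks on the first empty stripped line, this version skips blank lines instead of stopping at them. The scratch file and its contents are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file containing a blank line in the middle
path = os.path.join(tempfile.mkdtemp(), 'integers_demo.txt')
with open(path, 'w') as fout:
    fout.write('12\n7\n\n305\n')

int_list = []
with open(path) as fin:
    for line in fin:          # iterating a file yields one line per step
        line = line.strip()   # remove the trailing newline
        if line:              # skip blank lines rather than stopping
            int_list.append(int(line))

print(int_list)
```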

## reading the whole file at once
- read /usr/share/dict/words file on linux/mac
- windows path might be "C:/temp/words.txt" or "c:\\temp\\words.txt"
- if the file doesn't exist, use provided words.txt file or create a text file with a bunch of words in it using an editor
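
One way to handle the different locations is to try each candidate path and use the first one that exists. This is a sketch only; the candidate list simply reuses the paths mentioned above, and `word_file` is a name introduced here for illustration.

```python
from pathlib import Path

# Candidate locations taken from the notes above; adjust for your system
candidates = [
    Path('/usr/share/dict/words'),   # linux/mac
    Path('C:/temp/words.txt'),       # possible windows location
    Path('words.txt'),               # file in the current working directory
]

# Pick the first candidate that exists, or None if none of them do
word_file = next((p for p in candidates if p.is_file()), None)

if word_file is None:
    print('No word list found; create words.txt first.')
else:
    print('Using', word_file)
```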

In [None]:
# read first 10 lines using head program
! head /usr/share/dict/words

In [None]:
# read last 10 lines using tail program
! tail /usr/share/dict/words

In [None]:
file = '/usr/share/dict/words' # works on mac/linux
with open(file) as f:
    data = f.read()


In [None]:
data

In [None]:
words = data.split('\n')
print('There are {0} words in the file.'.format(len(words)))
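
One detail worth checking: when the text ends with a newline, `split('\n')` returns an extra empty string at the end, so the count is one higher than the number of words. The sketch below uses a short made-up string in place of the real file contents and compares it with `str.splitlines()`.

```python
# Made-up sample standing in for the file contents (illustration only)
sample = 'apple\nball\ncat\n'

by_split = sample.split('\n')        # keeps an empty string after the final newline
by_splitlines = sample.splitlines()  # does not produce the trailing empty string

print(by_split, len(by_split))
print(by_splitlines, len(by_splitlines))
```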

In [None]:
data.find('needle')

In [None]:
data[831052:831052+6]

In [None]:
# let's print first 10 words
print(words[:10])

In [None]:
help(list)

In [None]:
words.index('needle')

In [None]:
words[123097]

## reading the whole file as list of lines

In [None]:
file = '/usr/share/dict/words'
with open(file) as f:
    lines = f.readlines()

print('There are {0} words in the file.'.format(len(lines)))

In [None]:
lines[:2]

In [None]:
for word in lines[:10]:
    print(word.strip())

In [None]:
for word in lines[-10:]:
    print(word.strip())

## select a random word from list of words
- import random
- random.choice(wordList)

In [None]:
import random
word = random.choice(lines)
word = word.strip().lower() # strip the trailing newline kept by readlines()
print(f'random word = {word}')

## exercises
1. Write a program that reads a file and writes out a new file with the lines in reversed order (i.e. the first line in the old file becomes the last one in the new file.)
2. Write a program that reads a file and prints only those lines that contain the substring `snake`.
3. Write a program that reads a text file and produces an output file which is a copy of the file, except the first five columns of each line contain a four digit line number, followed by a space. Start numbering the first line in the output file at 1. Ensure that every line number is formatted to the same width in the output file. Use one of your Python programs as test data for this exercise: your output should be a printed and numbered listing of the Python program.
4. Write a program that undoes the numbering of the previous exercise: it should read a file with numbered lines and produce another file without line numbers.