# Chapter 8 - Text files

Plain text files (.txt) contain lines of text consisting only of characters and the newline character `\n`, which marks the end of each line.
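As a small illustration, the content of a two-line text file can be written as one Python string in which `\n` marks the end of each line. The text below is hypothetical.

```python
# Hypothetical content of a text file with two lines
text = 'First line\nSecond line\n'

# Splitting on the newline character separates the lines
parts = text.split('\n')
print(parts)  # ['First line', 'Second line', '']
```

The empty string at the end appears because the text ends with a newline character.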

## Opening text files:

Text files are opened using the built-in `open` function. The `open` function takes the file name and the "mode" in which to open the file, e.g. `'r'` for reading (the mode is optional and defaults to `'r'`).

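Below we open files in read mode (`'r'`).

Remember to include the file extension in the file name: the first cell below raises a `FileNotFoundError` unless a file named exactly `myfile` (without extension) exists, while the second cell opens `myfile.txt`.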

In [None]:
input_file = open('myfile', 'r')  # missing the .txt extension

In [None]:
input_file = open('myfile.txt', 'r')
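To see the effect of a missing extension without stopping the notebook, the error can be caught. This is a sketch that first creates a small file with the hypothetical name `demo.txt` (using write mode, covered later in this chapter) and assumes no file named exactly `demo` exists.

```python
# Create a small file with a hypothetical name so the example is self-contained
demo_file = open('demo.txt', 'w')
demo_file.write('Hello\n')
demo_file.close()

# Forgetting the extension: Python looks for a file literally named 'demo'
try:
    input_file = open('demo', 'r')
    input_file.close()
    found = True
except FileNotFoundError:
    found = False
    print("No file named 'demo' (did you forget the extension?)")

# With the extension, the file opens without problems
input_file = open('demo.txt', 'r')
print(input_file.read(), end='')
input_file.close()
```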

More often, the file name is stored in a separate variable:

In [None]:
file_name = 'myfile.txt'
input_file = open(file_name, 'r')

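If strange characters such as `ï»¿` appear at the start of the content, the file probably begins with a UTF-8 byte order mark (BOM). Specifying the encoding `utf-8-sig` makes Python read the file as UTF-8 and remove the BOM: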

In [None]:
file_name = 'myfile.txt'
input_file = open(file_name, 'r', encoding='utf-8-sig')
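The sketch below reproduces the problem with a hypothetical file `bom_demo.txt` that is written with a BOM. Reading it as Latin-1 (similar to what happens with some Windows default encodings) shows the `ï»¿` characters, plain `utf-8` keeps the BOM as the invisible character `'\ufeff'`, and `utf-8-sig` removes it.

```python
# Write a hypothetical file that starts with a UTF-8 byte order mark (BOM)
bom_file = open('bom_demo.txt', 'w', encoding='utf-8-sig')
bom_file.write('Hello\n')
bom_file.close()

# Latin-1 decodes the three BOM bytes as the characters 'ï»¿'
f = open('bom_demo.txt', 'r', encoding='latin-1')
garbled = f.read()
f.close()

# Plain 'utf-8' keeps the BOM as the invisible character '\ufeff'
f = open('bom_demo.txt', 'r', encoding='utf-8')
with_bom = f.read()
f.close()

# 'utf-8-sig' removes the BOM
f = open('bom_demo.txt', 'r', encoding='utf-8-sig')
without_bom = f.read()
f.close()

print(garbled[:3])        # ï»¿
print(repr(with_bom))     # '\ufeffHello\n'
print(repr(without_bom))  # 'Hello\n'
```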

If the text file is not in the working directory (in Jupyter, usually the folder that contains the notebook), the path to the file must be specified.

When you specify a path, use the forward slash `/` and not the backslash `\`, because in Python strings the backslash starts special characters (escape sequences) such as `\n`. Forward slashes also work on Windows.

In [None]:
file_name = 'subfolder1/myfile.txt' # (this subfolder is specific to my computer)
input_file = open(file_name, 'r')

In [None]:
path = 'subfolder1/'
file_name = 'myfile.txt'
input_file = open(path + file_name, 'r')
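The backslash problem can be seen without opening any file. The sketch below uses the hypothetical file name `newfile.txt`; it also shows a raw string (prefix `r`), which keeps backslashes literally, and the standard-library function `os.path.join`, which inserts the separator used by the operating system instead of relying on `path + file_name`.

```python
import os

# A backslash followed by 'n' becomes the newline escape sequence
bad_path = 'subfolder1\newfile.txt'
print(len(bad_path))     # 21: '\n' counts as a single newline character
print('\n' in bad_path)  # True

# Forward slashes avoid the problem...
good_path = 'subfolder1/newfile.txt'
# ...and so does a raw string, where the backslash is kept as-is
raw_path = r'subfolder1\newfile.txt'
print('\n' in raw_path)  # False

# os.path.join combines the parts with the system's separator
joined = os.path.join('subfolder1', 'myfile.txt')
print(joined)  # subfolder1/myfile.txt on macOS/Linux
```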

Instead of a path relative to the working directory, we can also specify the **full path** (absolute path) to the file. A full path works no matter which folder the program runs from.

In [None]:
path = 'C:/Users/erik_/PROG1000 Jupyter/lecture14 - text files open, read, write/subfolder1/' # (the path is specific to my computer)
file_name = 'myfile.txt'
input_file = open(path + file_name, 'r')
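To find out which folder relative paths start from, the standard-library `os` module can print the working directory and turn a relative path into a full path. A minimal sketch; the path `subfolder1/myfile.txt` is only an example and does not need to exist for `os.path.abspath` to work.

```python
import os

# The working directory: relative paths such as 'subfolder1/myfile.txt' start here
print(os.getcwd())

# os.path.abspath turns a relative path into the full (absolute) path
full_path = os.path.abspath('subfolder1/myfile.txt')
print(full_path)
```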

Note: opening a file does not show its content. Printing the file object only displays information about the object, such as the file name, the mode and the encoding.

In [None]:
print(input_file)

It is good programming practice to always close a file with the `close` method when you are done with it.

In [None]:
input_file.close()
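A common alternative to calling `close` yourself is the `with` statement, which closes the file automatically when its indented block ends. The file object's `closed` attribute shows whether the file is closed. This sketch uses a hypothetical file `demo.txt` that it creates first.

```python
# Create a small hypothetical file to work with
output_file = open('demo.txt', 'w')
output_file.write('Hello\n')
output_file.close()

# The with statement closes the file automatically when the block ends
with open('demo.txt', 'r') as input_file:
    print(input_file.closed)  # False: the file is open inside the block
    content = input_file.read()

print(input_file.closed)      # True: closed automatically after the block
```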

We can also open files in write mode (`'w'`). Below we create an empty file in the working directory. If a file with the same name already exists, its content is erased.

In [None]:
# remember file extension
output_file = open('output_file.txt', 'w')

In [None]:
output_file.close()                  # remember to close file when done
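The sketch below shows that opening an existing file in `'w'` mode erases its content, using the hypothetical file name `overwrite_demo.txt`.

```python
# Write something to a hypothetical file
f = open('overwrite_demo.txt', 'w')
f.write('Old content\n')
f.close()

# Opening the same file in 'w' mode again erases what was there
f = open('overwrite_demo.txt', 'w')
f.close()

# Read the file back: it is now empty
f = open('overwrite_demo.txt', 'r')
content = f.read()
f.close()
print(repr(content))  # ''
```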

Now that we have learned how to open files and close them, let us move on to reading the content.

## Reading text files:

To inspect the contents of an opened text file, we must read it.

There are two main methods for reading the lines of the file:
1. Readlines method
2. For-loop

#### 1. Readlines method:

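This method reads all of the lines in the file and stores them in a list of strings. Each string keeps its newline character `\n` at the end (the last line may lack it if the file does not end with a newline).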

In [None]:
# open file
file_name = 'myfile.txt'
input_file = open(file_name, 'r')

In [None]:
# read all lines
lines = input_file.readlines()

In [None]:
lines

In [None]:
len(lines)

In [None]:
lines[2]  # the third line (indexing starts at 0)

In [None]:
for line in lines:
    # each line already ends with '\n' and print adds another, so the lines appear double-spaced
    print(line)

In [None]:
for line in lines:
    print(line, end='')  # end='' stops print from adding an extra newline

In [None]:
# close file
input_file.close()
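The sketch below runs `readlines` on a small hypothetical file `lines_demo.txt` and then removes the trailing newline from each line with the string method `rstrip`.

```python
# Create a hypothetical three-line file
f = open('lines_demo.txt', 'w')
f.write('apple\nbanana\ncherry\n')
f.close()

# readlines returns a list with one string per line
f = open('lines_demo.txt', 'r')
lines = f.readlines()
f.close()
print(lines)  # ['apple\n', 'banana\n', 'cherry\n']

# Remove the newline character at the end of each line
clean_lines = []
for line in lines:
    clean_lines.append(line.rstrip('\n'))
print(clean_lines)  # ['apple', 'banana', 'cherry']
```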

#### 2. For-loop:

Instead of reading all lines at once with the `readlines` method, we can loop over the file object directly in a `for` loop, which reads one line at a time.

In [None]:
# open file
file_name = 'myfile.txt'
input_file = open(file_name, 'r')

In [None]:
# read line-by-line
for line in input_file:
    print(line, end='')

In [None]:
line  # after the loop, line holds the last line of the file

In [None]:
# close file
input_file.close() 
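Because the loop reads one line at a time, the whole file never has to be stored in a list. As a sketch, the loop below counts the lines of a small hypothetical file `loop_demo.txt`.

```python
# Create a hypothetical file
f = open('loop_demo.txt', 'w')
f.write('first\nsecond\nthird\n')
f.close()

# Count the lines while reading the file line by line
num_lines = 0
f = open('loop_demo.txt', 'r')
for line in f:
    num_lines = num_lines + 1
f.close()

print(f'The file has {num_lines} lines.')  # The file has 3 lines.
```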

For example, the text file `babynames2000s.txt` contains the 200 most popular baby names of the 2000s. We can use a `for` loop to print all of the baby names.

In [None]:
# open file
fname = 'babynames2000s.txt'
infile = open(fname, 'r')

for line in infile:
    print(line, end='')  # end='' avoids blank lines between the names

# close file
infile.close()
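Assuming the file lists one name per line in order of popularity (an assumption about its layout), the built-in `enumerate` can number the names while looping. The sketch uses a hypothetical stand-in file `names_demo.txt` with made-up names, not the real file's content.

```python
# A hypothetical stand-in for babynames2000s.txt, one name per line
f = open('names_demo.txt', 'w')
f.write('Alice\nBob\nCarol\n')
f.close()

names = []
f = open('names_demo.txt', 'r')
# enumerate counts the lines, starting from 1
for rank, line in enumerate(f, start=1):
    name = line.strip()  # remove the newline (and surrounding spaces)
    names.append(name)
    print(f'{rank}. {name}')
f.close()
```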

Note: the entire file can also be read at once with the `read` method. It returns the whole content as one long string, in which the lines are separated by newline characters `\n`.

In [None]:
# open file
file_name = 'myfile.txt'
input_file = open(file_name, 'r')

In [None]:
# read file
content = input_file.read()  # the whole file as one string
content

In [None]:
# close file
input_file.close()
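If the lines are needed after all, the string returned by `read` can be split at the newlines with the string method `splitlines`, which, unlike `readlines`, drops the `\n` characters. A sketch with a hypothetical file `read_demo.txt`:

```python
# Create a hypothetical file
f = open('read_demo.txt', 'w')
f.write('one\ntwo\nthree\n')
f.close()

# read returns the whole content as one string
f = open('read_demo.txt', 'r')
content = f.read()
f.close()
print(repr(content))         # 'one\ntwo\nthree\n'

# splitlines splits the string at the newlines without keeping them
print(content.splitlines())  # ['one', 'two', 'three']
```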

## Writing text files:

We can write text files by changing the operation mode from `'r'` to `'w'` in the `open` function.

First, we open the file in write mode, which creates an empty file (or erases the content of an existing file with the same name).

In [None]:
output_file = open('myfile2.txt', 'w')

Second, the lines are written to the file with the `write` method. Unlike `print`, `write` does not add a newline, so remember to add the newline character `\n` at the end of each line.

In [None]:
output_file.write('I wrote this line.\n')
output_file.write('I wrote the second line.\n')
output_file.write('I wrote the third line.\n')
output_file.write('I wrote the fourth line.\n')

In [None]:
output_file.close() 
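When the lines are stored in a list, a loop can write them one by one and add the newline to each. A sketch with hypothetical lines and the hypothetical file name `list_demo.txt`; the file is read back at the end to check the result.

```python
# Hypothetical lines to write
new_lines = ['First line', 'Second line', 'Third line']

output_file = open('list_demo.txt', 'w')
for text in new_lines:
    output_file.write(text + '\n')  # add the newline ourselves
output_file.close()

# Read the file back to check the result
input_file = open('list_demo.txt', 'r')
result = input_file.read()
input_file.close()
print(result, end='')
```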

If we leave out the newline character `\n`, all the text ends up on a single line:

In [None]:
output_file = open('myfile3.txt', 'w')

In [None]:
output_file.write('I wrote this line.')
output_file.write('I wrote the second line.')
output_file.write('I wrote the third line.')
output_file.write('I wrote the fourth line.')

In [None]:
output_file.close()
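Reading such a file back confirms that the pieces are joined into one line. A sketch with the hypothetical file name `no_newline_demo.txt`:

```python
# Write two pieces of text without newline characters
output_file = open('no_newline_demo.txt', 'w')
output_file.write('I wrote this line.')
output_file.write('I wrote the second line.')
output_file.close()

# Read the file back as a list of lines
input_file = open('no_newline_demo.txt', 'r')
lines = input_file.readlines()
input_file.close()

print(len(lines))  # 1
print(lines[0])    # I wrote this line.I wrote the second line.
```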

To add lines to an existing file, open it in append mode (`'a'`). Opening it in `'w'` mode would erase its existing content, whereas `'a'` **appends** the new lines at the end of the file.

In [None]:
output_file = open('myfile2.txt', 'a')

In [None]:
output_file.write('This is another line...\n')
output_file.write('...and another line.\n')

output_file.close() 
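The sketch below contrasts the two modes on a hypothetical file `append_demo.txt`: the line written in `'a'` mode is added after the existing content instead of replacing it. (Append mode also creates the file if it does not exist.)

```python
# Start with a one-line file
f = open('append_demo.txt', 'w')
f.write('line 1\n')
f.close()

# 'a' keeps the existing content and adds to the end
f = open('append_demo.txt', 'a')
f.write('line 2\n')
f.close()

# Read the file back
f = open('append_demo.txt', 'r')
content = f.read()
f.close()
print(content, end='')
```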

We can also copy the content of one file to another.

First, we must open the file that we wish to copy from, and create an empty file to copy to.

In [None]:
input_file = open('myfile.txt', 'r')        # open existing file in read mode
output_file = open('myfile_copy.txt', 'w')  # open new file in write mode

Then we can copy line-by-line from the input file to the output file.

In [None]:
for line in input_file:
    output_file.write(line)

# remember to close both files
input_file.close()
output_file.close() 
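The copying steps can be wrapped in a function so they are easy to reuse. This is a sketch with a hypothetical helper `copy_text_file` and hypothetical file names; it is not part of any library.

```python
def copy_text_file(source_name, target_name):
    """Copy a text file line by line (hypothetical helper)."""
    input_file = open(source_name, 'r')
    output_file = open(target_name, 'w')
    for line in input_file:
        output_file.write(line)
    # close both files
    input_file.close()
    output_file.close()

# Usage with a hypothetical source file
f = open('copy_source.txt', 'w')
f.write('alpha\nbeta\n')
f.close()

copy_text_file('copy_source.txt', 'copy_target.txt')

# Check the copy
f = open('copy_target.txt', 'r')
copied = f.read()
f.close()
print(copied, end='')
```

For plain copying without inspecting the lines, the standard library also offers `shutil.copyfile(source, target)`.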

## String traversal:

Once a file is open, we often want to investigate the lines it contains.

In [None]:
input_file = open('myfile.txt', 'r')

We can loop over the lines and, for instance, count the number of characters in each line. Note that the count includes the newline character `\n` at the end of each line.

In [None]:
for line in input_file:
    length = len(line)
    print(f'The length of the line: {length}')
    
input_file.close()
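The newline character is counted by `len`, which the sketch below shows on a hypothetical line stored in `sample_line`; `rstrip('\n')` removes it before counting.

```python
sample_line = 'Hello world\n'

print(len(sample_line))               # 12: includes the newline character
print(len(sample_line.rstrip('\n')))  # 11: newline removed
```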

However, often we want to investigate the *actual characters* in the lines.

After the loop above, the variable `line` still holds the last line of the file. We can loop over its characters and, for instance, count the number of blank spaces.

In [None]:
line

In [None]:
# initialize the counter
num_space = 0

# loop over each character in the line
for char in line:
    # if the character is a blank space...
    if char == ' ':
        # ...increase the counter by 1
        num_space = num_space + 1

print(f'There are {num_space} spaces in this line.')
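The loop above shows how string traversal works; for this particular task the string method `count` gives the same result in one call. A sketch on a hypothetical line:

```python
sample_line = 'I wrote the fourth line.\n'

# count returns how many times ' ' occurs in the string
num_space = sample_line.count(' ')
print(f'There are {num_space} spaces in this line.')  # There are 4 spaces in this line.
```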

We can do this for each line in the text file by combining two for loops:

1. Loop over each line in the file.
2. For each line, loop over the characters in that line.

In [None]:
input_file = open('myfile.txt', 'r')

In [None]:
# loop over each line
for line in input_file:
    # initialize the counter
    num_space = 0
    # loop over each character in the line
    for char in line:
        # if the character is a blank space...
        if char == ' ':
            # ...increase the counter by 1
            num_space = num_space + 1
    print(f'There are {num_space} spaces in this line.')

In [None]:
input_file.close()
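The two nested loops can also be packaged as a function that returns the counts instead of printing them. This is a sketch with a hypothetical helper `count_spaces_per_line` and a hypothetical test file `spaces_demo.txt`; the inner character loop from above is kept so the structure stays the same.

```python
def count_spaces_per_line(file_name):
    """Return a list with the number of spaces on each line (hypothetical helper)."""
    counts = []
    input_file = open(file_name, 'r')
    # loop over each line
    for line in input_file:
        num_space = 0
        # loop over each character in the line
        for char in line:
            if char == ' ':
                num_space = num_space + 1
        counts.append(num_space)
    input_file.close()
    return counts

# Usage with a hypothetical file
f = open('spaces_demo.txt', 'w')
f.write('no_spaces\none space\ntwo spaces here\n')
f.close()

print(count_spaces_per_line('spaces_demo.txt'))  # [0, 1, 2]
```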