# Class 29-30 - File Processing Examples
**COMP130 - Introduction to Computing**  
**Dickinson College**  

### Creating Plain Text Files

A *plain text* file contains only characters, very commonly only the characters in the *ASCII set*.  It does not contain any *formatting information* such as fonts, italics, boldface, or colors.  These files can be created using many different tools, such as:
- TextEdit on Mac (Use settings to choose *plain text*)
- Notepad on Windows
- gedit on Linux
- __Text File option in the JupyterLab launcher__
- and many many others.

Plain text data is extremely *portable*.  Files created in one plain text editor can be opened in another.  Files created on one type of computer can be opened on another.  Files written (saved) by one program can be read (opened) by others.  This portability makes plain text files a very common medium for sharing data.

We will create a plain text file named `sample.txt` using the JupyterLab Text File editor and then open it in another application.

### Reading Plain Text Files in Python

Python programs can open plain text files.

In [None]:
in_file = open('sample.txt')

Each call to the `readline` method of the file returns the next line from the file.

In [None]:
line = in_file.readline()
print(line)

A file should always be closed when you are finished using it.

In [None]:
in_file.close()
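
Python also offers a `with` statement that closes the file automatically when its indented block ends. The following is a minimal sketch, not part of the original notes; the file name `with_demo.txt` and its contents are made up so that the cell runs on its own.

```python
import os
import tempfile

# Hypothetical demo file, created here so the example is self-contained.
demo_path = os.path.join(tempfile.gettempdir(), 'with_demo.txt')
out_file = open(demo_path, 'w')
out_file.write('first line\nsecond line\n')
out_file.close()

# The file is open only inside the indented block.
with open(demo_path) as in_file:
    first_line = in_file.readline()

print(first_line)
print(in_file.closed)   # True: the with statement closed the file for us.
```

Using `with` means the file is closed even if you forget to call `close` yourself.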

### Reading All of the Lines

We can read all of the lines in a file using a `while` loop.  When all lines have been read, `readline` returns an empty string, which Python treats as `False` in a loop condition.

In [None]:
in_file = open('sample.txt')

line = in_file.readline()
while line:
    print(line)
    line = in_file.readline()
    
in_file.close()

As with iterating over other things, Python provides a `for in` statement as a shortcut for reading all of the lines in a file.

In [None]:
in_file = open('sample.txt')

for line in in_file:
    print(line)
    
in_file.close()

### `readline` and the Newline Character

Notice the extra blank lines in the output above. To understand where these extra lines are coming from, recall that when we print a `\n` character the printing moves to the next line.  Notice the extra blank line at the end of the following output.

In [None]:
print('---')
print('This has \nsome newline\ncharacters in it\n')
print('---')

Each line (except possibly the last) in a plain text file has a *newline* character (`\n`) at the end.  This tells the programs that read such files where each line ends.  Most programs use this as a signal to move to the next line before displaying any more text.  This is why our text file is displayed with 5 lines in it.

Lines read from a file using the `readline` method (or a `for in` loop) retain the *newline* (`\n`) character at the end of the string. Thus, when we use `print` we get two new lines: one from the newline character in the string and one because `print` goes to the next line after printing.

In [None]:
in_file = open('sample.txt')
line = in_file.readline()
in_file.close()

print("--")
print(line)       # One newline from the \n and one from print.
print("--")

print(len(line))  # one more than appears when printed... that's the \n
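
One way to make the hidden newline visible is Python's built-in `repr` function, which shows a string with its escape characters written out. This is a small sketch; the string below is a made-up stand-in for a line returned by `readline`.

```python
line = 'Hello there\n'   # Stand-in for a line returned by readline.

print(repr(line))   # repr shows the \n instead of moving to a new line.
print(len(line))    # 12: eleven visible characters plus the newline.
```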

If the newline is not desired, the `strip` or `rstrip` method of Python's string class (`str`) can be used to remove it.

In [None]:
line = line.strip()
print(len(line))
print("--")
print(line)       # no more extra blank line.
print("--")
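
The two methods differ in what they remove: `strip` removes whitespace from both ends, `rstrip` only from the right end, and `rstrip('\n')` only newline characters from the right end. The sketch below uses a made-up line with spaces on both sides to show the difference.

```python
line = '  indented text  \n'   # Made-up line with spaces at both ends.

print(repr(line.strip()))        # 'indented text'
print(repr(line.rstrip()))       # '  indented text'
print(repr(line.rstrip('\n')))   # '  indented text  '
```

If leading spaces in the file are meaningful, `rstrip('\n')` is probably the safer choice.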

### Splitting Strings

When we have a string that is a collection of information (e.g., words separated by spaces or data separated by commas), the `split` method of the `str` class can be used to divide it into its individual parts.  We can then access those individual parts using the `[ ]` notation, similar to accessing the characters in a string.

In [None]:
in_file = open('sample.txt')
line = in_file.readline()
   
words = line.split()    # Divide line at spaces.

print(len(words))       # Work with individual words.
print(words[0])
print(words[1])
print(words[2])

in_file.close()
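
For data separated by commas, `split` can be given the separator as an argument. The following is a sketch with a made-up record rather than a line from `sample.txt`; note that each part is still a string, so a number must be converted before doing arithmetic with it.

```python
record = 'Ada,Lovelace,1815\n'   # Made-up comma-separated line.

fields = record.strip().split(',')   # Remove the newline, then divide at commas.

print(len(fields))            # 3
print(fields[0])              # Ada
print(fields[2])              # 1815, still a string
print(int(fields[2]) + 1)     # 1816
```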

### Iterating over Split Strings

Once a string is split, we can iterate over the individual elements using a `while` loop.

In [None]:
index = 0
while index < len(words):
    print(words[index])
    index = index + 1

Not surprisingly at this point, Python also allows us to iterate over the elements using a `for in` loop.

In [None]:
for word in words:
    print(word)
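
Putting the pieces together, reading lines, splitting them, and iterating over the parts can be combined to count the words in a file. This is only a sketch under assumptions: the file name `count_demo.txt` and its contents are invented so that the cell runs without `sample.txt`.

```python
import os
import tempfile

# Hypothetical demo file, created here so the example is self-contained.
demo_path = os.path.join(tempfile.gettempdir(), 'count_demo.txt')
out_file = open(demo_path, 'w')
out_file.write('the quick brown fox\njumps over\nthe lazy dog\n')
out_file.close()

word_count = 0
in_file = open(demo_path)
for line in in_file:
    words = line.split()   # With no argument, split also discards the trailing \n.
    for word in words:
        word_count = word_count + 1
in_file.close()

print(word_count)   # 9
```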

![Stop sign](stop.png)
End of Class 29 material.

### Writing Files

Python programs can write data into plain text files as well.  When writing to a text file, it is necessary to include the newline character (`\n`) anywhere you want a line to end, because `write` does not add one for you.

In [None]:
out_file = open('myfile.txt', 'w')

out_file.write('Put me in a file.\n')
out_file.write('Me too!\n')
out_file.write('Hey, me three!')      # No \n, so the next write continues on this line.
out_file.write("Don't forget me.")

out_file.close()
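
To see the effect of a missing `\n`, we can read the file back and display each line with `repr`. The sketch below repeats the same four writes but, as an assumption made for this example, saves them to a hypothetical file `write_demo.txt` in the temporary directory so that `myfile.txt` is left alone.

```python
import os
import tempfile

# Hypothetical demo file path, used so the example is self-contained.
demo_path = os.path.join(tempfile.gettempdir(), 'write_demo.txt')

out_file = open(demo_path, 'w')
out_file.write('Put me in a file.\n')
out_file.write('Me too!\n')
out_file.write('Hey, me three!')
out_file.write("Don't forget me.")
out_file.close()

# Read the file back and count the lines it actually contains.
line_count = 0
in_file = open(demo_path)
for line in in_file:
    print(repr(line))
    line_count = line_count + 1
in_file.close()

print(line_count)   # 3: the last two writes ended up on the same line.
```

Four calls to `write` produced only three lines, because the third call did not end with a newline.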