# Week 6: Reading Files!

Now we start getting into the good stuff.

This week we're learning the basics of handling files: reading data in and writing data out.  We're used to our operating systems handling all of this for us, so now we need to reproduce some of those backend mechanisms ourselves.  This is a topic where even a very high-level language like Python starts to feel low level (closer to the computer than to human language).  As you start out, you may have to rely on reproducing these patterns by rote.  So long as you know what each pattern does, you can reasonably move forward without fully understanding how it works underneath.  That said, this can be a dangerous game when you are writing files, because opening an existing file in write mode erases what was in it.

You will find that there are 3 main patterns for reading in files and processing the data.  These 3 are core to Python and you will find that they are roughly interchangeable.  Not that they do the same things, but that you can almost always accomplish whatever you need to do with the file contents no matter which pattern you choose to use.
Nearly all students find that they have a favorite.  This week's homework forces you to use all three, because sometimes your go-to pattern isn't the best.  But moving forward, you should be able to use your favorite most of the time.  The division usually happens between `.readline()` and `.readlines()`/`.read()`.  Much like knitting and crochet, one side usually finds the other to be unfathomable.

You'll have to fuss at things in different ways, but don't worry about trying to use the 'right' one.  I suggest that you use the one you understand the most and go from there.  As you gain more python experience you'll get more comfortable and start adding in more.

Unless the assignment names a specific method to be used, you should use the method that you are the most comfortable with.

While there are 3 basic methods of reading in files, each is going to have roughly the same framework:

1. Open the file in whichever mode (reading or writing) you need 
    * This will be with `file_in = open(file_path, mode)` (note that this is just the pattern to use, you'll have to supply the file path and mode values yourself)
    * Remember that you will always have to do an assignment statement to save the file object so you can act on it later.  I'm using `file_in` as an example variable name, but you can change it.
    * This creates your file IO object, which you will then work with in step 2
    * You may need `open(file_path, mode, encoding='utf-8')` if your computer's default text encoding is not UTF-8, which can happen when your system is set up for another language.
2. Do stuff to that file IO object
    * This is where each pattern will be really different, but you'll be doing various actions to your `file_in` object, or whatever the variable name of your file IO object is.
3. Close the file.
    * this will always be the same pattern, `file_in.close()`. (replace `file_in` with your file IO variable name)
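
The three steps above can be sketched in one small runnable example.  This is only an illustration, not assignment code: the file name `example.txt` and the temporary folder are made up so that the sketch can create its own file without touching anything of yours.

```python
import os
import tempfile

# Hypothetical location: a throwaway folder so nothing important is overwritten.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'example.txt')

# Writing a file: open in 'wt' mode, write, close.
file_out = open(file_path, 'wt', encoding='utf-8')  # step 1: open
file_out.write('first line\nsecond line\n')          # step 2: do stuff
file_out.close()                                     # step 3: close

# Reading the same file back: open in 'rt' mode, read, close.
file_in = open(file_path, 'rt', encoding='utf-8')   # step 1: open
contents = file_in.read()                            # step 2: do stuff
file_in.close()                                      # step 3: close

print(contents)
```

Notice that the open and close lines look almost identical in both halves.  Only the mode and the middle step change.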

There are ways of handling files in sort of a one shot, but we'll get to that later.

An important thing to keep in mind:  the variable that you create with your `open()` expression will be your access point to the content of that file.  The content within that file is accessed by that object at your direction.  That is the purpose of this object.  This object is how python knows about your file.  But this doesn't mean that the file IO object knows about your file's content.  You'll need to direct that object to access or write the content that you want.

# Some other cautions 

Just a word of warning: some packages (like pandas) have their own methods for reading in files that take care of all these things for you, so you won't see `open()`, `.close()`, or any of the other file reading methods when you use them.

You should also only use plain text files with the file reading methods that we will be discussing here.  They will not work on proprietary formats like Word or Excel documents.  This is why certain packages have their own functions, as I mentioned just before: they are equipped with additional tools to read those kinds of files.

# Once you have that file object

There are some structural considerations to keep in mind.  Some reading methods bypass having to deal with this stuff, but these things will still be happening even if you can get away without having to think about them.

# The line, revisited

We've spent several of our class sessions talking about the importance of lines in data processing.  Processing files is one of the reasons we have focused on them.  Several of the file reading methods navigate through the content based on newlines.  Lines, marked off by newline characters, are really one of the most essential units of data.

## The cursor

Don't curse the cursor!  When you open a file for reading, there's an invisible cursor that the system places within that file.  It starts at the very beginning of the file and moves forward at your direction.  This means you can direct it to go forward character by character or line by line.  With the reading methods we cover this week, the cursor only ever moves forward.  This means that you cannot 'reread' part of a file once a read action has moved the cursor past it, at least not without reopening the file (file objects do have a `.seek()` method that can reposition the cursor, but we won't rely on it here).  Once the cursor reaches the end of the file, it stays there, and any further read simply returns nothing.

For example:

1. You open a file for reading
2. You use `.read()` on that file.  The cursor is now at the end of the file.
3. You cannot then use `.readlines()` on it after because `.readlines()` is going to attempt to start reading from the end of that file.  You'll get nothing.
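
Here is a small sketch of that sequence, so you can watch the second read come back empty.  The file name `cursor_demo.txt` and its two lines of content are invented for the demonstration, and the sketch writes the file itself so that it can run anywhere.

```python
import os
import tempfile

# Hypothetical demo file, created in a throwaway folder.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'cursor_demo.txt')

setup = open(file_path, 'wt')
setup.write('line one\nline two\n')
setup.close()

my_file = open(file_path, 'rt')   # cursor starts at the beginning
everything = my_file.read()       # cursor is now at the end of the file
leftovers = my_file.readlines()   # starts reading from the end, so nothing is left
my_file.close()

print(repr(everything))
print(leftovers)
```

The `leftovers` list is empty, not an error.  Python is happy to let you read from the end of a file, it just has nothing to give you.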

Just as we need to take some time to think about how we might want to loop over something, once you're accustomed to how these different methods interact with data, you can take some time to think about how you want to maneuver the cursor over the file that you are working with.  

# The 3 essential patterns:  `.read()`, `.readlines()`, and `.readline()`

We're going to cover these in order of complexity.

1. `.read()` reads the entire contents of the file into a single string, that you can then operate on like any normal string.  You will have a very short file reading section in your program.  This is often the best one to get started with, because we've been used to having all our text and data as one big string. 

2. `.readlines()` will read an entire file into a list of the lines in the file, so each line in the file will become an element in that list.  This preserves all the order and content of the file, but gives it back to you in the handy structure of a list.  This is valuable, of course, when you want a list of lines; it is roughly like running `.split('\n')` on your file contents, except that each line keeps its trailing newline character.  This helps you skip a step if that is your end goal.

3. `.readline()` will read the file line by line starting at the beginning.  This is the one where the cursor is something that you are directly interacting with, as it will only move forward within the file at your direction.
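
To compare the three side by side, here is a minimal sketch that runs each pattern over the same tiny file.  The file name `three_patterns.txt` and its contents are made up for illustration, and the file is reopened each time so that the cursor starts back at the beginning.

```python
import os
import tempfile

# Hypothetical three-line file in a throwaway folder.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'three_patterns.txt')

setup = open(file_path, 'wt')
setup.write('alpha\nbeta\ngamma\n')
setup.close()

# Pattern 1: .read() gives one big string
file_in = open(file_path, 'rt')
as_string = file_in.read()
file_in.close()

# Pattern 2: .readlines() gives a list with one string per line
file_in = open(file_path, 'rt')
as_list = file_in.readlines()
file_in.close()

# Pattern 3: .readline() gives one line per call, moving the cursor each time
file_in = open(file_path, 'rt')
first_line = file_in.readline()
second_line = file_in.readline()
file_in.close()

print(repr(as_string))
print(as_list)
print(repr(first_line), repr(second_line))
```

Same file, same content, three different shapes for the result.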

## But why deal with files anyhow?


One major benefit is that the script and code can be made independent of the data file.  For example, you may have a report to write each month on website log metrics.  Each month you receive a data file from a colleague, the server, or a download from something like Google Analytics.  The contents of the data file obviously change each month, but the structure does not.  This means that you can write your analysis script once and rerun it over the file that you receive each month.  You may need to change parameters along the way, but the hard work is done.


## `.read()`: read all the things into one string

When unsure about which method to use, use `.read()`.

This is the basic formula for `.read()`, which will read the entire contents of the file into a single string.  We encountered this last week when I showed how we read in that data file all at once.  After you have read the file contents into that string, you will operate on that string instead of the file object.  The nice thing is now that the contents are a regular string, so you can use all your normal string operations on it.  

In fact, this is the point of this kind of thing.  The program reads the data into active memory, then does stuff to it.  This allows you to have a very large data file live outside of your script, perhaps even on another server or in a database.


The formula for `.read()` is one of the simplest, and this is usually done at the beginning of your program.

* Step 1
    * `fileIOvariable = open(filepath, mode)`
* Step 2
    * `file_contents = fileIOvariable.read()`
* Step 3
    * `fileIOvariable.close()`

In [2]:
my_file = open('smalltext.txt', 'rt') # 1

all_the_text = my_file.read() # 2

print("This is all the text from the .read():")
print("---------------------!!!!!!!!!")
print(all_the_text) # this prints the entire contents of the file

my_file.close() # 3

This is all the text from the .read():
---------------------!!!!!!!!!
Hello, I would like to science.
Please show me where the science is.
I am a meat popcicle.


That is it.  At this point, all the data that you want is inside of `all_the_text`.  After the `.read()` has been executed the cursor is at the end of the file.  There is nothing left to read, so when you ask it to read the file again, there is no more text to traverse over so you get back an empty string.

We can see `.read()` in action when this happens.

In [1]:
my_file = open('smalltext.txt', 'rt')

all_the_text = my_file.read()

print("This is all the text from the FIRST .read():")
print("---------------------")
print(all_the_text) # this prints the entire contents of the file

# now the cursor is at the end of the text in the file
# we can still use .read(), but since it is at the end of the file we get an empty string back

maybe_more = my_file.read() # here's the second file read attempt

print("---------------------")
print("This is all the text from the SECOND .read():")
print("---------------------")
print(maybe_more) # an empty string

my_file.close()

This is all the text from the FIRST .read():
---------------------
Hello, I would like to science.
Please show me where the science is.
I am a meat popcicle.
---------------------
This is all the text from the SECOND .read():
---------------------



Printing these out sort of hides what's happening in these strings. So let's use some handy interactive stuff.

The `all_the_text` variable contains the contents of our first `.read()` call, so it has all the contents of our file.

The `maybe_more` variable contains whatever was returned from our second `.read()` call, so we can see better now that it is an empty string.

In [2]:
all_the_text

'Hello, I would like to science.\nPlease show me where the science is.\nI am a meat popcicle.'

In [3]:
maybe_more

''

You can no longer do reading or writing operations on a file IO object that has been closed (that is, once `.close()` has been called on it).

You will generate an error if you try to do that.  I'm adapting our previous example to show what happens when `.read()` is called on our file object after `.close()` has been called on it.  This generates a pretty helpful error message, saying that "`I/O operation on closed file.`"

In [4]:
my_file = open('smalltext.txt', 'rt')

all_the_text = my_file.read()

print("This is all the text from the FIRST .read():")
print("---------------------")
print(all_the_text) # this prints the entire contents of the file

my_file.close()

maybe_more = my_file.read()


This is all the text from the FIRST .read():
---------------------
Hello, I would like to science.
Please show me where the science is.
I am a meat popcicle.


ValueError: I/O operation on closed file.

## `.readlines()` note the plural

Like `.read()`, this read method will read the entire file's contents.  Instead of getting a string containing all the contents, you'll get a list with all the contents split up on lines.

Interestingly, this splits the contents up after each newline, so unlike using `.split('\n')`, the newline characters are retained at the end of each line.

The formula:

* Step 1
    * `fileIOvariable = open(filepath, mode)`
    * This is the same as before
* Step 2
    * `file_contents_list = fileIOvariable.readlines()`
    * This is pretty much the same as before, but calling `.readlines()` instead of `.read()`
* Step 3
    * `fileIOvariable.close()`
    * this is the same as before

In [5]:
my_file = open('smalltext.txt', 'rt') # 1

contents_list = my_file.readlines() # 2

my_file.close()  # 3

In [6]:
print(contents_list)

['Hello, I would like to science.\n', 'Please show me where the science is.\n', 'I am a meat popcicle.']


As mentioned before, this is a method of convenience.  If you want to end up with a list of the lines you can do that directly with this method.  The usage pattern is the same as `.read()` but you are calling `.readlines()` instead.

Honestly, that's it.
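
One thing worth seeing for yourself is the difference in newline handling between `.readlines()` and `.split('\n')`.  The sketch below uses `io.StringIO` from the standard library as a stand-in for an opened file, purely so the example can run without a real file on disk.  The text and variable names are invented for the demonstration.

```python
import io

text = 'one\ntwo\nthree'

# io.StringIO wraps a string so it behaves like a file opened for reading.
fake_file = io.StringIO(text)
from_readlines = fake_file.readlines()
fake_file.close()

# Splitting the same text on newlines throws the newline characters away.
from_split = text.split('\n')

# Stripping the trailing newline from each line makes the two results match.
cleaned = []
for line in from_readlines:
    cleaned.append(line.rstrip('\n'))

print(from_readlines)
print(from_split)
print(cleaned)
```

If you plan to write the lines back out later, keeping the newlines is handy.  If you plan to compare or count text, you will usually want them stripped off.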

## `.readline()` note the singular here

So `maybe_more` worked, but it was empty.  The cursor had no more text to go through, so `.read()` just gave us an empty string.  But let's look at this a different way and see if we can halt the cursor in the middle of a file.

In [1]:
my_file = open('boomboom.txt', 'rt')

for justdothis5times in range(5):
    print(my_file.readline())
    
therest = my_file.read()

my_file.close()

A told B, and B told C, "I'll meet you at the top of the coconut tree."

"Wheee!" said D to E F G, "I'll beat you to the top of the coconut tree."

Chicka chicka boom boom! Will there be enough room? Here comes H up the coconut tree,

and I and J and tag-along K, all on their way up the coconut tree.

Chicka chicka boom boom! Will there be enough room? Look who's coming! L M N O P!



In [2]:
print(therest)

And Q R S! And T U V! Still more - W! And X Y Z!
The whole alphabet up the - Oh, no! Chicka chicka... BOOM! BOOM!
Skit skat skoodle doot. Flip flop flee. Everybody running to the coconut tree.
Mamas and papas and uncles and aunts hug their little dears, then dust their pants.
"Help us up," cried A B C.
Next from the pileup skinned-knee D and stubbed-toe E and patched-up F. Then comes G all out of breath.
H is tangled up with I. J and K are about to cry. L is knotted like a tie.
M is looped. N is stopped. O is twisted alley-oop. Skit skat skoodle doot. Flip flop flee.
Look who's coming! It's black-eyed P, Q R S, and loose-tooth T. Then U V W wiggle-jiggle free.
Last to come X Y Z. And the sun goes down on the coconut tree...
But - chicka chicka boom boom! Look, there's a full moon.
A is out of bed, and this is what he said, "Dare double dare, you can't catch me.
Chicka chicka BOOM! BOOM!Chicka chicka BOOM! BOOM!
I'll beat you to the top of the coconut tree."
Chicka chicka BOOM! BOOM!


What happened here?  We can see that the `.readline()` bit grabbed the first 5 lines and then the `.read()` got the rest of the lines.  There also seem to be extra newlines happening in the first section?

I used a for loop with `range(5)` to repeat `.readline()` 5 times.  Each call returned one line that still ends in its own `\n`, and each of those lines was handed to a separate `print()` call.  Because `print()` adds a newline of its own after whatever it prints, every line ends up followed by two newlines, which shows up as a blank line.

At this point, the cursor is sitting at the beginning of line 6 just waiting.  When I call `.read()` it goes through the remaining portion of the file and returns it as one string.  There are no extra blank lines in that part because the whole string is printed by a single `print()` call, so the only added newline comes at the very end.
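
If the blank lines bother you, there are a couple of ways to avoid them.  This is a small sketch, again using `io.StringIO` as a stand-in for a real file so that it runs on its own.  The text is made up.

```python
import io

fake_file = io.StringIO('first\nsecond\nthird\n')

line = fake_file.readline()

print(repr(line))     # repr shows the newline that came along with the line
print(line)           # print adds its own newline, so a blank line follows
print(line, end='')   # end='' tells print not to add anything extra

stripped = line.rstrip('\n')   # or remove the newline from the string itself
print(stripped)

fake_file.close()
```

Which one you choose depends on whether you want to keep the newline in your data.  Using `end=''` only changes how the line is displayed, while `.rstrip('\n')` gives you a new string without it.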

# A problem to work through

(editor's note: I kind of hate this example and usually skip this, but don't want to trash it just yet).

Here's something silly, but it's a task.

Read through a text file and change it so that the lines alternate between upper and lower case (starting with upper), then write out the new file. 

So we want something like this:
```
LINE 1
line 2
LINE 3
line 4
```

Let's break this down:

1. Read in the file
2. Transform the text
3. Write out the file

We can go ahead and set up items 1 and 3.

In [3]:
file_in = open('boomboom.txt', 'rt')

write_me = file_in.read()

file_out = open('newboom.txt', 'wt')

file_out.write(write_me)

file_in.close()
file_out.close()

In the code in the cell just above, we can see that we aren't transforming the text yet, but we are at least set up for the read in and write out.  While we're playing with things, let's comment out the write stuff and just use print statements.

In [7]:
file_in = open('boomboom.txt', 'rt')

write_me = file_in.read()


print(write_me)
# file_out = open('newboom.txt', 'wt')

# file_out.write(write_me)

file_in.close()
# file_out.close()

A told B, and B told C, "I'll meet you at the top of the coconut tree."
"Wheee!" said D to E F G, "I'll beat you to the top of the coconut tree."
Chicka chicka boom boom! Will there be enough room? Here comes H up the coconut tree,
and I and J and tag-along K, all on their way up the coconut tree.
Chicka chicka boom boom! Will there be enough room? Look who's coming! L M N O P!
And Q R S! And T U V! Still more - W! And X Y Z!
The whole alphabet up the - Oh, no! Chicka chicka... BOOM! BOOM!
Skit skat skoodle doot. Flip flop flee. Everybody running to the coconut tree.
Mamas and papas and uncles and aunts hug their little dears, then dust their pants.
"Help us up," cried A B C.
Next from the pileup skinned-knee D and stubbed-toe E and patched-up F. Then comes G all out of breath.
H is tangled up with I. J and K are about to cry. L is knotted like a tie.
M is looped. N is stopped. O is twisted alley-oop. Skit skat skoodle doot. Flip flop flee.
Look who's coming! It's black-eyed P, Q R S, 

Great! Now let's think of an approach for how to do this.  We know that we can use `.upper()` and `.lower()` to transform the text, but the issue is: how do we alternate lines of text?

Think of it this way: we need to move through the file in terms of pairs.  One upper, one lower then one upper and one lower.

We can use the cursor to our advantage here.  If we're already iterating through reading a file using a for loop, e.g.

```python
for line in file_in:
    print(line)

```

We're just using this for loop as a convenient way to move the cursor one line at a time.  So can we move through two lines within that for loop?  We can't just use `line` twice, because within one pass of the loop it will always be the same line, but we have another method in our pockets:  `.readline()`.  As we saw with our `range(5)` example, we can call `.readline()` an arbitrary number of times.  It won't mess up our for loop, but it will give us the next line, and push the cursor forward by one.

In other words:


```python

# let's say the cursor starts at 0

for line in file_in: # cursor + 1; cursor is now 1
    print(line)      # cursor doesn't change; cursor remains 1
    print(file_in.readline()) # cursor + 1; cursor is now 2
    
# now this will run again

# ...

# cursor is at 2 from the previous loop 

for line in file_in: # cursor + 1; cursor is now 3
    print(line)      # cursor doesn't change; cursor remains 3
    print(file_in.readline()) # cursor + 1; cursor is now 4
    
# and so on
```

Let's play with this using just a regular counter and an accumulator pattern.  

In [8]:
total = 0

lotsofones = [1] * 10
print(lotsofones)

for one in lotsofones:
    total = total + one
    total = total + 1
    print(total)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
2
4
6
8
10
12
14
16
18
20


So plunking all of this in together:

In [10]:
file_in = open('boomboom.txt', 'rt')

new_text = []

for line in file_in:
    new_text.append(line.upper())
    new_text.append(file_in.readline().lower())

write_me = "".join(new_text) # I retained the newlines as they came in from the original file
                             # so I don't need to put them back in

print(write_me)
# file_out = open('newboom.txt', 'wt')

# file_out.write(write_me)

file_in.close()
# file_out.close()

A TOLD B, AND B TOLD C, "I'LL MEET YOU AT THE TOP OF THE COCONUT TREE."
"wheee!" said d to e f g, "i'll beat you to the top of the coconut tree."
CHICKA CHICKA BOOM BOOM! WILL THERE BE ENOUGH ROOM? HERE COMES H UP THE COCONUT TREE,
and i and j and tag-along k, all on their way up the coconut tree.
CHICKA CHICKA BOOM BOOM! WILL THERE BE ENOUGH ROOM? LOOK WHO'S COMING! L M N O P!
and q r s! and t u v! still more - w! and x y z!
THE WHOLE ALPHABET UP THE - OH, NO! CHICKA CHICKA... BOOM! BOOM!
skit skat skoodle doot. flip flop flee. everybody running to the coconut tree.
MAMAS AND PAPAS AND UNCLES AND AUNTS HUG THEIR LITTLE DEARS, THEN DUST THEIR PANTS.
"help us up," cried a b c.
NEXT FROM THE PILEUP SKINNED-KNEE D AND STUBBED-TOE E AND PATCHED-UP F. THEN COMES G ALL OUT OF BREATH.
h is tangled up with i. j and k are about to cry. l is knotted like a tie.
M IS LOOPED. N IS STOPPED. O IS TWISTED ALLEY-OOP. SKIT SKAT SKOODLE DOOT. FLIP FLOP FLEE.
look who's coming! it's black-eyed p, q r s, 

'A TOLD B, AND B TOLD C, "I\'LL MEET YOU AT THE TOP OF THE COCONUT TREE."\n"wheee!" said d to e f g, "i\'ll beat you to the top of the coconut tree."\nCHICKA CHICKA BOOM BOOM! WILL THERE BE ENOUGH ROOM? HERE COMES H UP THE COCONUT TREE,\nand i and j and tag-along k, all on their way up the coconut tree.\nCHICKA CHICKA BOOM BOOM! WILL THERE BE ENOUGH ROOM? LOOK WHO\'S COMING! L M N O P!\nand q r s! and t u v! still more - w! and x y z!\nTHE WHOLE ALPHABET UP THE - OH, NO! CHICKA CHICKA... BOOM! BOOM!\nskit skat skoodle doot. flip flop flee. everybody running to the coconut tree.\nMAMAS AND PAPAS AND UNCLES AND AUNTS HUG THEIR LITTLE DEARS, THEN DUST THEIR PANTS.\n"help us up," cried a b c.\nNEXT FROM THE PILEUP SKINNED-KNEE D AND STUBBED-TOE E AND PATCHED-UP F. THEN COMES G ALL OUT OF BREATH.\nh is tangled up with i. j and k are about to cry. l is knotted like a tie.\nM IS LOOPED. N IS STOPPED. O IS TWISTED ALLEY-OOP. SKIT SKAT SKOODLE DOOT. FLIP FLOP FLEE.\nlook who\'s coming! it\'s bl
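
It is worth checking what this pattern does when the file has an odd number of lines, since the last pass through the loop will call `.readline()` with nothing left to read.  Here is a small sketch, assuming Python 3 and using an invented five-line file (`odd_lines.txt`) written to a throwaway folder.

```python
import os
import tempfile

# Hypothetical five-line file, so the last upper-case line has no partner.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'odd_lines.txt')

setup = open(file_path, 'wt')
setup.write('One\nTwo\nThree\nFour\nFive\n')
setup.close()

file_in = open(file_path, 'rt')

new_text = []

for line in file_in:
    new_text.append(line.upper())                # lines 1, 3, 5
    new_text.append(file_in.readline().lower())  # lines 2, 4, then an empty string

file_in.close()

write_me = "".join(new_text)

print(new_text)
print(write_me)
```

On the final pass, `.readline()` is at the end of the file and returns an empty string.  Lowercasing an empty string and joining it in changes nothing, so the output is still what we want.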

# Another problem statement

Your friend says to you that there's a secret code in The Raven, where the first words of each line can be put together to make a snarky political statement.  So you decide to write a Python script that goes through the poem and captures the first word of each line.

## Where to start?

We need to think of the things that we know we can do.  

1. our ability to loop over lines in a text file
2. our ability to split a line of text up into words
3. our ability to get the first word out of that list of words

We can tackle these one at a time.  We may not have done all these things all together, but we have done them all separately.  Let's remind ourselves of how we do this from the smallest to the largest.

## Get the first thing out of a list

Remember our positions, right?  Given a list called `stuff`, no matter what's in there, so long as it has at least one thing in there we can say `stuff[0]` to get that thing out of it.

In [4]:
stuff1 = [1, 2, 3]
print(stuff1[0])

1


In [5]:
stuff2 = ['a', 'b', 'c']
print(stuff2[0])

a


Let's also take the time to remind ourselves what it looks like when there's nothing in there.

In [6]:
stuff3 = []
print(stuff3[0])

IndexError: list index out of range

## Making a list of words

Remember that we don't care about the punctuation here, let's just use the whitespace.

In [7]:
sentence = "I am a meat popsicle."
words = sentence.split()
print(words)

['I', 'am', 'a', 'meat', 'popsicle.']


In [9]:
# and putting it together

print(words[0])

I


## Loop over a list of lines in a file

We know several ways to get lines out of a document.  We can use `.readlines()`, we can split up a string we read in from `.read()`, and a few other things.  Let's use a pattern that we can see on page 157.

There's a lot of shorthand so that we don't need to use `.readline()`.  This difference can make what's happening quite difficult to follow.

Let's start on our copy of the raven.  We've got a lot going on here, so this is where clear and correct variable names come in handy.

**Even though you might be copying a formula from the book, you really must change the variable names for your own purpose.**

In [15]:
infile = open('raven.txt', 'r') # makes our fileio object

for line in infile: # loop over the lines in the file io object
    # we've got our lines coming in, so we can start messing about with the content
    words = line.split()
    print(words)

infile.close()

['Once', 'upon', 'a', 'midnight', 'dreary,', 'while', 'I', 'pondered,', 'weak', 'and', 'weary,']
['Over', 'many', 'a', 'quaint', 'and', 'curious', 'volume', 'of', 'forgotten', 'lore—']
['While', 'I', 'nodded,', 'nearly', 'napping,', 'suddenly', 'there', 'came', 'a', 'tapping,']
['As', 'of', 'some', 'one', 'gently', 'rapping,', 'rapping', 'at', 'my', 'chamber', 'door.']
['“’Tis', 'some', 'visitor,”', 'I', 'muttered,', '“tapping', 'at', 'my', 'chamber', 'door—']
['Only', 'this', 'and', 'nothing', 'more.”']
[]
['Ah,', 'distinctly', 'I', 'remember', 'it', 'was', 'in', 'the', 'bleak', 'December;']
['And', 'each', 'separate', 'dying', 'ember', 'wrought', 'its', 'ghost', 'upon', 'the', 'floor.']
['Eagerly', 'I', 'wished', 'the', 'morrow;—vainly', 'I', 'had', 'sought', 'to', 'borrow']
['From', 'my', 'books', 'surcease', 'of', 'sorrow—sorrow', 'for', 'the', 'lost', 'Lenore—']
['For', 'the', 'rare', 'and', 'radiant', 'maiden', 'whom', 'the', 'angels', 'name', 'Lenore—']
['Nameless', 'here', 'for

Looks good!  We've got our lines being split into words.  Let's try to get the first word out of each.

In [13]:
infile = open('raven.txt', 'r')

for line in infile:
    words = line.split()
    print(words[0]) # let's add our position lookup

Once
Over
While
As
“’Tis
Only


IndexError: list index out of range

That's surprising, but we can look at our lists coming out of the lines.  When the lines with just the newline are brought in and split, it returns an empty list.  So we can't access the first element of an empty list. 

We don't yet have the trick to do a boolean check of the length, so we'll need to back up and think about how we can prepare our file.

The problem here is the blank lines, which we can deal with more directly if we read everything in as one string.  That means using `.read()`.

We know that we want the lines out, so we will be splitting on `\n`.  Our problem is that the blank lines between stanzas show up as `\n\n` pairs, and splitting on `\n` would leave an empty string between them.  We can use `.replace()` to change each `\n\n` into `\n`.  We still want to keep a newline in there so we can split the text up the way we want.
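
Before trying this on the whole poem, it can help to see the effect on a tiny string.  The text below is invented for the demonstration and stands in for two short stanzas separated by one blank line.

```python
# Hypothetical mini-poem: two "stanzas" with a single blank line between them.
text = 'line one\nline two\n\nline three\nline four'

# Splitting right away leaves an empty string where the blank line was.
before = text.split('\n')

# Collapsing each pair of newlines first removes that empty entry.
corrected_text = text.replace('\n\n', '\n')
after = corrected_text.split('\n')

print(before)
print(after)

# This is why the blank line caused trouble: splitting an empty string
# into words gives an empty list, and an empty list has no position 0.
print(''.split())
```

Keep in mind that this only handles a single blank line between sections.  A run of three newlines (two blank lines in a row) would still leave one `\n\n` pair behind after the replacement.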

In [18]:
infile = open('raven.txt', 'r') # makes our fileio object

text = infile.read()

infile.close()

corrected_text = text.replace('\n\n', '\n')
lines = corrected_text.split('\n')
print(lines)

['    Once upon a midnight dreary, while I pondered, weak and weary,', 'Over many a quaint and curious volume of forgotten lore—', '    While I nodded, nearly napping, suddenly there came a tapping,', 'As of some one gently rapping, rapping at my chamber door.', '“’Tis some visitor,” I muttered, “tapping at my chamber door—', '            Only this and nothing more.”', '    Ah, distinctly I remember it was in the bleak December;', 'And each separate dying ember wrought its ghost upon the floor.', '    Eagerly I wished the morrow;—vainly I had sought to borrow', '    From my books surcease of sorrow—sorrow for the lost Lenore—', 'For the rare and radiant maiden whom the angels name Lenore—', '            Nameless here for evermore.', '    And the silken, sad, uncertain rustling of each purple curtain', 'Thrilled me—filled me with fantastic terrors never felt before;', '    So that now, to still the beating of my heart, I stood repeating', '    “’Tis some visitor entreating entrance at m

Now we can loop through our lines and try splitting them up again.

In [20]:
infile = open('raven.txt', 'r') # makes our fileio object

text = infile.read()

infile.close()

corrected_text = text.replace('\n\n', '\n')
lines = corrected_text.split('\n')

for line in lines:
    print(line.split()) # looks better!!
    

['Once', 'upon', 'a', 'midnight', 'dreary,', 'while', 'I', 'pondered,', 'weak', 'and', 'weary,']
['Over', 'many', 'a', 'quaint', 'and', 'curious', 'volume', 'of', 'forgotten', 'lore—']
['While', 'I', 'nodded,', 'nearly', 'napping,', 'suddenly', 'there', 'came', 'a', 'tapping,']
['As', 'of', 'some', 'one', 'gently', 'rapping,', 'rapping', 'at', 'my', 'chamber', 'door.']
['“’Tis', 'some', 'visitor,”', 'I', 'muttered,', '“tapping', 'at', 'my', 'chamber', 'door—']
['Only', 'this', 'and', 'nothing', 'more.”']
['Ah,', 'distinctly', 'I', 'remember', 'it', 'was', 'in', 'the', 'bleak', 'December;']
['And', 'each', 'separate', 'dying', 'ember', 'wrought', 'its', 'ghost', 'upon', 'the', 'floor.']
['Eagerly', 'I', 'wished', 'the', 'morrow;—vainly', 'I', 'had', 'sought', 'to', 'borrow']
['From', 'my', 'books', 'surcease', 'of', 'sorrow—sorrow', 'for', 'the', 'lost', 'Lenore—']
['For', 'the', 'rare', 'and', 'radiant', 'maiden', 'whom', 'the', 'angels', 'name', 'Lenore—']
['Nameless', 'here', 'for', 

In [21]:
infile = open('raven.txt', 'r') # makes our fileio object

text = infile.read()

infile.close()

corrected_text = text.replace('\n\n', '\n')
lines = corrected_text.split('\n')

for line in lines:
    words = line.split()
    print(words[0])
    

Once
Over
While
As
“’Tis
Only
Ah,
And
Eagerly
From
For
Nameless
And
Thrilled
So
“’Tis
Some
This
Presently
“Sir,”
But
And
That
Darkness
Deep
Doubting,
But
And
This
Merely
Back
Soon
“Surely,”
Let
Let
’Tis
Open
In
Not
But,
Perched
Perched,
Then
By
“Though
Ghastly
Tell
Quoth
Much
Though
For
Ever
Bird
With
But
That
Nothing
Till
On
Then
Startled
“Doubtless,”
Caught
Followed
Till
Of
But
Straight
Then,
Fancy
What
Meant
This
To
This
On
But
She
Then,
Swung
“Wretch,”
Respite—respite
Quaff,
Quoth
“Prophet!”
Whether
Desolate
On
Is
Quoth
“Prophet!”
By
Tell
It
Clasp
Quoth
“Be
“Get
Leave
Leave
Take
Quoth
And
On
And
And
And
Shall


Well, we've done the thing and you can tell your friend that they're full of crap!

# An example of applying a function to each line of a text file

So let's say you have a file with one entry of data per line.  For example, in The Raven, let's say that one line of the text is one entry of data.  This is a relatively standard form of organizing data, because it is so easy to operate on data at the line level.

Now let's say you have a function that you want to apply to that file, running each line of the file through that function.  You want to gather the results from this function, and as a start write out a file with those results.  This refers to the batch processing that is introduced within the Zelle book.

For the sake of example, we're going to run each line through the `len()` function.  This will count the number of characters within each line of text.  You can replace `len` with any other function that accepts a single string value.  (like your acronym function!!)

Let's build this up together, but first, come up with a battle plan.

## Step 1:  read through the file and access the text

Read in the file and be able to access each line of text.  We can do this from a variety of approaches, but let's just get our plumbing in order.

We're going to start with code that we've used before.

In [1]:
infile = open('raven.txt', 'r') # makes our fileio object

for line in infile:
    print(line)

infile.close()

    Once upon a midnight dreary, while I pondered, weak and weary,

Over many a quaint and curious volume of forgotten lore—

    While I nodded, nearly napping, suddenly there came a tapping,

As of some one gently rapping, rapping at my chamber door.

“’Tis some visitor,” I muttered, “tapping at my chamber door—

            Only this and nothing more.”



    Ah, distinctly I remember it was in the bleak December;

And each separate dying ember wrought its ghost upon the floor.

    Eagerly I wished the morrow;—vainly I had sought to borrow

    From my books surcease of sorrow—sorrow for the lost Lenore—

For the rare and radiant maiden whom the angels name Lenore—

            Nameless here for evermore.



    And the silken, sad, uncertain rustling of each purple curtain

Thrilled me—filled me with fantastic terrors never felt before;

    So that now, to still the beating of my heart, I stood repeating

    “’Tis some visitor entreating entrance at my chamber door—

Some late v

Here we're only printing out each line of text, which is a great place to start because we've isolated the lines and can manipulate each one.
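You may have noticed the output above is double spaced.  Each line read from a file already ends with its own newline character, and `print()` adds a second one.  Here's a small sketch of one way to avoid that, using a made-up list of strings (`sample_lines`) as a stand-in for the lines of the file:

```python
# Stand-ins for lines as they come out of a file: each one ends with "\n".
sample_lines = ["first line\n", "second line\n"]

for line in sample_lines:
    # end="" tells print() not to add its own newline after the text
    print(line, end="")
```

This is only about how things look on screen.  The content of `line` is the same either way.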

## Step 2: do the thing you want to the line

Now that we have each line of text isolated, we can act on them.  Take a moment to identify where each line of text is being held in the code.  In other words, which variable should be passed into `len()`?

The answer is `line`.  This is our loop variable, and it holds each line of text, one at a time.  This is the one that we should manipulate for the content.

So let's add that into the above code.

In [2]:
infile = open('raven.txt', 'r') # makes our fileio object

for line in infile:
    line_length = len(line)
    print(line_length)

infile.close()

67
57
67
59
62
41
1
60
64
63
65
61
40
1
67
64
69
63
59
42
1
64
60
67
65
66
45
1
76
64
67
69
65
42
1
66
58
70
67
58
45
1
72
62
72
67
58
48
1
62
59
75
65
65
41
1
70
56
59
66
63
43
1
65
63
69
73
64
44
1
63
65
61
70
56
35
1
61
73
62
63
67
43
1
63
62
65
69
66
44
1
73
64
81
62
65
41
1
73
66
63
61
63
41
1
72
62
68
65
62
41
1
79
65
70
63
70
41
1
70
57
69
74
65
38


So here we are running `line` through `len()` and saving that returned value as `line_length`.  Then we are printing that out.
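One detail worth knowing: `len()` counts every character in `line`, including the invisible newline character (`\n`) at the end.  That's why the blank lines between stanzas show up as 1 rather than 0.  A small sketch of the difference, using a made-up list of strings (`sample_lines`) standing in for lines from the file:

```python
# Stand-ins for lines as they come out of a file (each ends with a newline).
sample_lines = ["Only this and nothing more.\n", "\n"]

raw_lengths = []
stripped_lengths = []

for line in sample_lines:
    raw_lengths.append(len(line))                     # counts the "\n" too
    stripped_lengths.append(len(line.rstrip("\n")))   # newline removed first

print(raw_lengths)
print(stripped_lengths)
```

Whether you want the newline counted depends on what you're measuring, so this is a choice to make on purpose rather than a bug to fix.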

Printing the value is a great place to start because it allows you to visualize what's going on with the results.  We could also print both the line and the length, so we can see the original and the results.  This helps us further check our work and confirm that we are accomplishing what we wanted.

In [3]:
infile = open('raven.txt', 'r') # makes our fileio object

for line in infile:
    line_length = len(line)
    print(line_length, line)

infile.close()

67     Once upon a midnight dreary, while I pondered, weak and weary,

57 Over many a quaint and curious volume of forgotten lore—

67     While I nodded, nearly napping, suddenly there came a tapping,

59 As of some one gently rapping, rapping at my chamber door.

62 “’Tis some visitor,” I muttered, “tapping at my chamber door—

41             Only this and nothing more.”

1 

60     Ah, distinctly I remember it was in the bleak December;

64 And each separate dying ember wrought its ghost upon the floor.

63     Eagerly I wished the morrow;—vainly I had sought to borrow

65     From my books surcease of sorrow—sorrow for the lost Lenore—

61 For the rare and radiant maiden whom the angels name Lenore—

40             Nameless here for evermore.

1 

67     And the silken, sad, uncertain rustling of each purple curtain

64 Thrilled me—filled me with fantastic terrors never felt before;

69     So that now, to still the beating of my heart, I stood repeating

63     “’Tis some visitor 

I'm printing the length first before the line content for readability.  The lines vary in length so much that the length numbers wouldn't line up in a nice way. Having them come first makes it easier for us to glance at things.
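If you want the numbers to line up even when some are one digit and some are two, one optional approach is to give the number a fixed width with an f-string.  This is just a sketch on a made-up list of lines (`sample_lines`), and it strips the whitespace from each line purely for display:

```python
# Made-up stand-ins for lines from the file.
sample_lines = [
    "    Once upon a midnight dreary,\n",
    "\n",
    "Only this and nothing more.\n",
]

formatted = []
for line in sample_lines:
    line_length = len(line)
    # :>3 right-aligns the number in a column that is 3 characters wide
    formatted.append(f"{line_length:>3} {line.rstrip()}")

for row in formatted:
    print(row)
```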

## Step 3: write out the results

Now that we have the output that we want, we can add our outfile into the mix.

In [4]:
infile = open('raven.txt', 'r') # makes our fileio object
outfile = open('raven_line_length.txt', 'w')

for line in infile:
    line_length = len(line)
    print(line_length, file = outfile) # changing this back to just the line length as well

infile.close()
outfile.close()

And we're done!  This illustrates the three stages of adapting a piece of code to batch processing.  First, read in the file.  Second, act on each line of text to get the result you need for that line, and print the results out to check your work.  Third, redirect that print to an outfile.
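Since any function that accepts a single string can stand in for `len`, the three stages can be bundled up into a reusable function.  This is a sketch, not the only way to do it: the name `batch_process`, its parameters, and the sample file names are all made up for illustration, and the demo writes its own tiny file into a temporary folder so it can run without `raven.txt`.

```python
import os
import tempfile

def batch_process(in_path, out_path, func):
    """Run every line of in_path through func, writing one result per line to out_path."""
    infile = open(in_path, 'r')
    outfile = open(out_path, 'w')

    for line in infile:
        result = func(line)            # step 2: act on the line
        print(result, file=outfile)    # step 3: send the result to the outfile

    infile.close()
    outfile.close()

# Demo on a tiny made-up file so the sketch runs on its own.
folder = tempfile.mkdtemp()
in_path = os.path.join(folder, 'sample.txt')
out_path = os.path.join(folder, 'sample_lengths.txt')

sample = open(in_path, 'w')
sample.write("nevermore\n\nLenore\n")
sample.close()

batch_process(in_path, out_path, len)

check = open(out_path, 'r')
results = check.read()
check.close()
print(results)
```

Swapping `len` for a different function in the call to `batch_process` is the only change needed to run a different batch job.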

## another perspective

This code looks quite clean because we have a function that is doing the work for us.  So we don't have a lot of nesting or extra stuff inside our for loop reading the file.

However, if we don't have a function, but we do have a chunk of code that operates on a single line of text, we can still use that.  It'll get moved in under the for loop that's reading in the file.

So if we have something that's stripping out whitespace and reversing the word order of the text, we can do something like this:

In [1]:
# start with the code that works

line = "     And his eyes have all the seeming of a demon’s that is dreaming,"

clean = line.strip()

reverse = []

splitwords = clean.split()

for pos in range(len(splitwords) -1, -1, -1): # run backwards from last to first
    word = splitwords[pos]
    reverse.append(word)
    
reversed_line = " ".join(reverse)
print(reversed_line)

dreaming, is that demon’s a of seeming the all have eyes his And


So here we have the reversed sentence, and this code required our use of a for loop.  We can apply this to all of The Raven by scooting it under the for loop that goes over each line.

This means we'll have something like:

``` python
for line in file:
    do stuff
    for word in line:
        do more stuff
    print(go to the outfile)
```

Let's follow our previous three steps that we used for doing this with a function.  Instead of building this up and adding the function, we're going to build it up and add this code in.

In [32]:
# exactly the same code that we went with before.
infile = open('raven.txt', 'r')

for line in infile:
    print(line)
    
infile.close()

    Once upon a midnight dreary, while I pondered, weak and weary,

Over many a quaint and curious volume of forgotten lore—

    While I nodded, nearly napping, suddenly there came a tapping,

As of some one gently rapping, rapping at my chamber door.

“’Tis some visitor,” I muttered, “tapping at my chamber door—

            Only this and nothing more.”



    Ah, distinctly I remember it was in the bleak December;

And each separate dying ember wrought its ghost upon the floor.

    Eagerly I wished the morrow;—vainly I had sought to borrow

    From my books surcease of sorrow—sorrow for the lost Lenore—

For the rare and radiant maiden whom the angels name Lenore—

            Nameless here for evermore.



    And the silken, sad, uncertain rustling of each purple curtain

Thrilled me—filled me with fantastic terrors never felt before;

    So that now, to still the beating of my heart, I stood repeating

    “’Tis some visitor entreating entrance at my chamber door—

Some late v

Now we're going to move the bulk of our code over, and print out the results for each line.  I'm going to copy over all of the code, except for 'line = ...' because the line is being defined within the for loop.

The code below has exactly our previous code, just without that first line. Because the variable name in that code (`line`) matches the loop variable in my for loop, I don't need to rename anything for the strip and split to work.

In [33]:
infile = open('raven.txt', 'r')

for line in infile:
    clean = line.strip()

    reverse = []

    splitwords = clean.split()

    for pos in range(len(splitwords) -1, -1, -1): # run backwards from last to first
        word = splitwords[pos]
        reverse.append(word)

    reversed_line = " ".join(reverse)
    print(reversed_line)
    
infile.close()

weary, and weak pondered, I while dreary, midnight a upon Once
lore— forgotten of volume curious and quaint a many Over
tapping, a came there suddenly napping, nearly nodded, I While
door. chamber my at rapping rapping, gently one some of As
door— chamber my at “tapping muttered, I visitor,” some “’Tis
more.” nothing and this Only

December; bleak the in was it remember I distinctly Ah,
floor. the upon ghost its wrought ember dying separate each And
borrow to sought had I morrow;—vainly the wished I Eagerly
Lenore— lost the for sorrow—sorrow of surcease books my From
Lenore— name angels the whom maiden radiant and rare the For
evermore. for here Nameless

curtain purple each of rustling uncertain sad, silken, the And
before; felt never terrors fantastic with me me—filled Thrilled
repeating stood I heart, my of beating the still to now, that So
door— chamber my at entrance entreating visitor some “’Tis
door;— chamber my at entrance entreating visitor late Some
more.” nothing and is it T

Great, so that's working over each line and printing it out.  Now, we can add the outfile.

In [34]:
infile = open('raven.txt', 'r')
outfile = open('raven_reversed.txt', 'w')

for line in infile:
    clean = line.strip()

    reverse = []

    splitwords = clean.split()

    for pos in range(len(splitwords) -1, -1, -1): # run backwards from last to first
        word = splitwords[pos]
        reverse.append(word)

    reversed_line = " ".join(reverse)
    print(reversed_line, file = outfile)
    
infile.close()
outfile.close()
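The two perspectives meet in the middle: a chunk of code that works on one line can be wrapped up as a function, which brings back the clean loop from the `len()` version.  Below is a sketch under a couple of assumptions.  The function name `reverse_words` is made up, and it uses a slice (`[::-1]`) in place of the backwards `range()` loop, which is a swapped-in shortcut rather than the approach used above.

```python
def reverse_words(line):
    """Return the words of line in reverse order, joined by single spaces."""
    # split() with no arguments ignores leading and trailing whitespace,
    # so a separate strip() is not needed here.
    splitwords = line.split()
    # [::-1] makes a copy of the list running from last to first.
    return " ".join(splitwords[::-1])

reversed_line = reverse_words("     And his eyes have all the seeming\n")
print(reversed_line)
```

With that in place, the body of the file loop would shrink to a call like `print(reverse_words(line), file = outfile)`.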