# Files 
Up to this point, we have been using data values that are either entered by the user or hard-coded in our programs. This limits the kinds of tasks our applications can handle. In this chapter we will learn how to use Python to work with external files: how to open files, read them line by line, and write data to new or existing files.

## Opening Files
In Python, communication with the operating system is handled by the interpreter. When a file is ‘opened’, Python asks the operating system for access to it and receives a file object, which we will call a file handler, that acts as a middleman between Python and the file. If you think of it from an ownership perspective, Python owns the commands in your program and the operating system owns the files stored on the computer. Neither knows how to interact with the other directly, so a middleman is needed to translate Python’s requests into directions the operating system understands.

```python
with open('support/random_numbers.txt') as fileHandler:
    print(fileHandler)
```

The **with** keyword above provides a safe way of working with files. Files must be opened and later closed to ensure data integrity: if your code crashes after you open a file, the file may be left open and data you wrote to it may never reach the disk. The with statement ensures that the file is closed when the with block completes, even if an error occurs inside the block. Code that needs access to the file handler should be indented one level beyond the with statement (just like conditional and iteration blocks).
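To see what the with statement saves us from writing, here is a minimal sketch comparing it with the roughly equivalent try/finally form. The file name `example.txt` and its single line of numbers are made up for illustration; the file is created in a temporary folder so the snippet runs anywhere.

```python
import os
import tempfile

# Hypothetical sample file in a temporary folder, so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as setupHandler:
    setupHandler.write("12,7,503\n")

# With the with statement, the file is closed automatically when the block ends.
with open(path) as withHandler:
    firstLine = withHandler.readline()
print(withHandler.closed)  # True

# Roughly the same thing written out by hand with try/finally.
manualHandler = open(path)
try:
    secondLine = manualHandler.readline()
finally:
    manualHandler.close()  # runs even if the try block raises an exception
print(manualHandler.closed)  # True
```

The with version is shorter and cannot forget the close() call, which is why it is the preferred style.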

## Reading files
There are several ways to read from a file. The first, and probably the easiest, is to treat the file handler as an iterable: each pass through a for loop produces the next line of text from the source file. The code below reads the text one line at a time and exits the for loop when there are no lines left to read.

```python
with open('support/random_numbers.txt') as fileHandler:
    for line in fileHandler:
        print(line)
```

Notice that this code prints a blank line after every line of text. This is because every line in the source file (except possibly the last) ends with a newline character (\n), and the print function adds another newline to the end of whatever it prints. It is therefore good practice to use the .rstrip() method to remove the trailing newline, along with any other trailing whitespace, from each line we read from the source file.

```python
with open('support/random_numbers.txt') as fileHandler:
    for line in fileHandler:
        cleanLine = line.rstrip()
        print(cleanLine)
```
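Printing a line with repr() makes the hidden newline visible. The sketch below uses a made-up two-line sample file (not the chapter's random_numbers.txt) and also shows a detail worth knowing: .rstrip() with no argument removes all trailing whitespace, including spaces, while .rstrip("\n") removes only newline characters.

```python
import os
import tempfile

# Hypothetical sample file; note the two trailing spaces before the first newline.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as fileHandler:
    fileHandler.write("42, 17  \n99\n")

with open(path) as fileHandler:
    rawLines = list(fileHandler)  # each item is one line, newline included

print(repr(rawLines[0]))               # '42, 17  \n'  (the newline is part of the line)
print(repr(rawLines[0].rstrip()))      # '42, 17'      (all trailing whitespace removed)
print(repr(rawLines[0].rstrip("\n")))  # '42, 17  '    (only the newline removed)
```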

You can also read one line using the .readline() method. 

```python
with open('support/random_numbers.txt') as fileHandler:
    line = fileHandler.readline()
    cleanLine = line.rstrip()
    print(cleanLine)
```
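Calling .readline() repeatedly walks through the file one line at a time. When there is nothing left to read it returns an empty string, which gives a natural stopping condition. A blank line inside the file comes back as "\n", not "", so it does not end the loop early. This is a minimal sketch with a made-up three-line file.

```python
import os
import tempfile

# Hypothetical sample file created just for this example.
path = os.path.join(tempfile.mkdtemp(), "three_lines.txt")
with open(path, "w") as fileHandler:
    fileHandler.write("first\nsecond\nthird\n")

collected = []
with open(path) as fileHandler:
    while True:
        line = fileHandler.readline()
        if line == "":  # an empty string means we reached the end of the file
            break
        collected.append(line.rstrip())

print(collected)  # ['first', 'second', 'third']
```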

Or, you can read the entire file into a single string using the .read() method.

```python
with open('support/random_numbers.txt') as fileHandler:
    content = fileHandler.read()     # the whole file as one string
    cleanContent = content.rstrip()  # trims whitespace only at the very end of the text
    print(cleanContent)
```
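Because .read() returns one string, calling .rstrip() on it only trims the end of the whole text, not each line. If you want the individual lines, two standard options are .readlines(), which keeps the newlines, and the string method .splitlines(), which removes them. The sample file below is made up for illustration.

```python
import os
import tempfile

# Hypothetical two-line sample file.
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(path, "w") as fileHandler:
    fileHandler.write("1,2\n3,4\n")

with open(path) as fileHandler:
    content = fileHandler.read()        # one string: '1,2\n3,4\n'

with open(path) as fileHandler:
    lineList = fileHandler.readlines()  # list of lines, newlines kept

cleanList = content.splitlines()        # list of lines, newlines removed

print(lineList)   # ['1,2\n', '3,4\n']
print(cleanList)  # ['1,2', '3,4']
```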

## Searching a file
When searching a file, it is often best to scan it line by line. Only one line is held in memory at a time, which avoids memory problems if you are processing a very large data file (>1GB). For small files it may be easiest to simply read in the entire file, but I recommend you get used to the line-by-line approach because it works for files of any size.

```python
with open('support/random_numbers.txt') as fileHandler:
    for line in fileHandler:
        cleanLine = line.rstrip()
        if len(cleanLine) > 10:
            print("this line is longer than 10 characters:")
            print(cleanLine)
```

Your search needs to account for the fact that lines read from a file always arrive as strings. If you expect numbers, you need to convert the content into a numeric form yourself. The code above checks the length of the line to verify it is longer than 10 characters. If we wanted to split the line into a list of comma-separated values and then check for lines with more than 10 items, we would do this:

```python
with open('support/random_numbers.txt') as fileHandler:
    for line in fileHandler:
        cleanLine = line.rstrip()
        numList = cleanLine.split(",")
        if len(numList) > 10:
            print("this line has more than 10 list items:")
            print(numList)
```
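Here is why the conversion to a number matters. Strings are compared character by character, so "9" counts as greater than "10" even though 9 is less than 10. The sketch below uses a made-up line of values rather than the chapter's data file.

```python
# A made-up line of comma-separated values, as it would arrive from a file.
fields = "9,10,501".split(",")  # ['9', '10', '501']: still strings

print(fields[0] > fields[1])            # True: text comparison, '9' sorts after '1'
print(int(fields[0]) > int(fields[1]))  # False: numeric comparison, 9 < 10

numbers = [int(field) for field in fields]
print(numbers)       # [9, 10, 501]
print(sum(numbers))  # 520
```

Note that int() raises a ValueError if the text is not a valid whole number, for example an empty string from a blank line.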

Because we are going line by line, we may need to aggregate content from different parts of the file. If we wanted to collect all the numbers greater than 500 into a list, we would do the following:

```python
bigNumList = []
with open('support/random_numbers.txt') as fileHandler:
    for line in fileHandler:
        cleanLine = line.rstrip()
        numList = cleanLine.split(",")
        for num in numList:
            value = int(num)  # convert the text to an integer once
            if value > 500:
                bigNumList.append(value)
print(bigNumList)
```
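The same loop can aggregate more than one thing at a time. This sketch extends the idea under a few assumptions of its own: it uses a made-up sample file (the real contents of random_numbers.txt are not shown in this chapter), records which line each big number came from with the built-in enumerate(), keeps a running total, and skips blank lines so int() is never handed an empty string.

```python
import os
import tempfile

# Hypothetical sample data with a blank second line.
path = os.path.join(tempfile.mkdtemp(), "sample_numbers.txt")
with open(path, "w") as fileHandler:
    fileHandler.write("12,733,5\n\n901,44\n")

bigNumbers = []  # (line number, value) pairs for every value over 500
total = 0        # running sum of every value in the file
with open(path) as fileHandler:
    for lineNumber, line in enumerate(fileHandler, start=1):
        cleanLine = line.rstrip()
        if not cleanLine:  # skip blank lines instead of crashing in int()
            continue
        for num in cleanLine.split(","):
            value = int(num)
            total += value
            if value > 500:
                bigNumbers.append((lineNumber, value))

print(bigNumbers)  # [(1, 733), (3, 901)]
print(total)       # 1695
```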

## Writing a file
Any time our application processes data, we will probably want to save the results in some format. To save them as a text file, we can use the write() method of our file handler to send data to the file, much as we use the print() function to send data to the console.

```python
with open('support/big_numbers.txt', 'w') as fileHandler:
    for bigNum in bigNumList:
        fileHandler.write(str(bigNum) + "\n")
```

In the code above, we create a file handler using the 'w' mode, which opens the file for writing. Be careful with 'w': if the file already exists, its contents are erased; if it does not exist, it is created. Also note the use of the str() function when writing to our file. In text mode, write() accepts only strings, so passing an integer raises a TypeError. Finally, I append the newline character to the end of each number. This puts each number on its own line in the file; without it, all of the numbers would run together on the same line.
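Two standard alternatives are worth knowing. The built-in print() accepts a file= argument, and it converts its argument to a string and adds the newline for you. Opening a file with the 'a' (append) mode adds new data after the existing content instead of erasing it. This is a minimal sketch that writes to a temporary copy of big_numbers.txt; the numbers are made up for illustration.

```python
import os
import tempfile

# Hypothetical output file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "big_numbers.txt")

# 'w' creates the file (or empties it if it already exists).
with open(path, "w") as fileHandler:
    for bigNum in [733, 901]:
        print(bigNum, file=fileHandler)  # print converts to str and adds "\n"

# 'a' keeps the existing content and writes after it.
with open(path, "a") as fileHandler:
    print(512, file=fileHandler)

with open(path) as fileHandler:
    savedNumbers = [int(line) for line in fileHandler]  # int() ignores the trailing newline

print(savedNumbers)  # [733, 901, 512]
```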