# Introduction
In this module, we explore new ways of acquiring data in our applications. Up to this point, we have used data values that are either entered by the user or written directly into our programs. This limits the types of tasks our applications can handle. In this module we will learn how to use Python to work with external files: how to open files, read them line by line, and write data to new or existing files.

# Files 
## Opening Files
Python programs do not access files on disk directly; that work is done by the operating system on Python's behalf. Therefore, when a file is ‘opened’, Python requests a file handler (more commonly called a file object or file handle) that acts as a middleman between Python and the file. If you think of it from an ownership perspective, Python owns the commands in your program and the operating system owns the files stored on the computer. Neither understands how to interact with the other directly, so a middleman is needed to translate Python’s requests into directions the operating system understands.

In [1]:
with open('support/random_numbers.txt') as fileHandler: 
    print(fileHandler)
    print(type(fileHandler))

<_io.TextIOWrapper name='support/random_numbers.txt' mode='r' encoding='UTF-8'>
<class '_io.TextIOWrapper'>


The **with** statement above provides a safe way of working with files. Every file that is opened should also be closed; if your code crashes after opening a file, the file may be left open, and data that was being written may be lost or the file corrupted. The with statement ensures that the file is closed when the with block finishes, even if an error occurs inside the block. Code that needs access to the file handler must be indented one level beyond the with statement (just like conditional and iteration blocks).
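
The closing behaviour described above can be checked directly through the `closed` attribute of the file handler. The sketch below is a minimal illustration, not part of the original notebook: it creates its own throwaway file (the name `demo_numbers.txt` is an assumption made for the example) so that it runs without the `support` folder.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
# (The file name and contents are illustrative assumptions.)
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_numbers.txt")
with open(demo_path, "w") as fileHandler:
    fileHandler.write("1,2,3\n")

# Normal case: the file is open inside the block and closed after it.
with open(demo_path) as fileHandler:
    open_inside_block = not fileHandler.closed
closed_after_block = fileHandler.closed

# Error case: the file is closed even though the block raised an error.
try:
    with open(demo_path) as fileHandler:
        int("not a number")  # raises ValueError on purpose
except ValueError:
    closed_after_error = fileHandler.closed

print(open_inside_block, closed_after_block, closed_after_error)
```

All three values should print as `True`: the handler is usable inside the block and closed afterwards, whether or not the block completed normally.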

## Reading files
### Line-by-Line
There are several mechanisms for reading from files. The first, and probably easiest, is to treat the file handler as an iterable: each pass through a for loop produces the next line of text from the source file. The first example below prints only the first line and then breaks out of the loop; the second reads the text one line at a time and exits the for loop when there are no lines left to read.

Using fileHandler as an iterable...

In [2]:
with open('support/random_numbers.txt') as fileHandler: 
    for line in fileHandler:
        print(line)
        break

926,927,928,929,930



In [3]:
with open('support/random_numbers.txt') as fileHandler: 
    lineNumber = 0
    for line in fileHandler:
        print(type(line))
        print(f"Line {lineNumber}: {line}") 
        lineNumber += 1

<class 'str'>
Line 0: 926,927,928,929,930

<class 'str'>
Line 1: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660

<class 'str'>
Line 2: 488,489,490,491,492,493,494,495

<class 'str'>
Line 3: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45

<class 'str'>
Line 4: 169,170,171,172,173,174,175,176,177,178,179,180

<class 'str'>
Line 5: 419,420

<class 'str'>
Line 6: 455,456,457,458,459,460,461,462,463,464,465

<class 'str'>
Line 7: 883,884,885

<class 'str'>
Line 8: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

<class 'str'>
Line 9: 786,787,788,789,790,791,792,793,794,795

<class 'str'>
Line 10: 137,138,139,140,141,142,143,144,145,146,147,148,149,150

<class 'str'>
Line 11: 912,913,914,915

<class 'str'>
Line 12: 81,82,83,84,85,86,87,88,89,90

<class 'str'>
Line 13: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270

<class 'str'>
Line 14: 450

<class 'str'>
Line 15: 649,650,651,652,653,654,655,656,657,658,659,660

<class 'str'>
Line 16: 194,195

<class 'str

Using .readline()...

In [8]:
with open('support/random_numbers.txt') as fileHandler: 
    line = fileHandler.readline()
    print(line)

926,927,928,929,930



In [9]:
with open('support/random_numbers.txt') as fileHandler: 
    lineNumber = 0
    while True:
        line = fileHandler.readline()
        if not line:
            break
        print(type(line))
        print(f"Line {lineNumber}: {line}") 
        lineNumber += 1

<class 'str'>
Line 0: 926,927,928,929,930

<class 'str'>
Line 1: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660

<class 'str'>
Line 2: 488,489,490,491,492,493,494,495

<class 'str'>
Line 3: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45

<class 'str'>
Line 4: 169,170,171,172,173,174,175,176,177,178,179,180

<class 'str'>
Line 5: 419,420

<class 'str'>
Line 6: 455,456,457,458,459,460,461,462,463,464,465

<class 'str'>
Line 7: 883,884,885

<class 'str'>
Line 8: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

<class 'str'>
Line 9: 786,787,788,789,790,791,792,793,794,795

<class 'str'>
Line 10: 137,138,139,140,141,142,143,144,145,146,147,148,149,150

<class 'str'>
Line 11: 912,913,914,915

<class 'str'>
Line 12: 81,82,83,84,85,86,87,88,89,90

<class 'str'>
Line 13: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270

<class 'str'>
Line 14: 450

<class 'str'>
Line 15: 649,650,651,652,653,654,655,656,657,658,659,660

<class 'str'>
Line 16: 194,195

<class 'str
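
The `while True` / `break` pattern above works because `.readline()` returns an empty string once the end of the file is reached. As a hedged alternative, assuming Python 3.8 or later, the same loop can be sketched with an assignment expression (`:=`). The temporary file below is an assumption made so the example runs on its own.

```python
import os
import tempfile

# Build a small stand-in file (illustrative contents, not the course data).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_numbers.txt")
with open(demo_path, "w") as fileHandler:
    fileHandler.write("926,927\n646\n488,489,490\n")

lines_read = []
with open(demo_path) as fileHandler:
    # readline() returns '' at the end of the file; '' is falsy, so the loop stops.
    # A blank line inside the file is '\n', which is truthy and does not stop it.
    while (line := fileHandler.readline()):
        lines_read.append(line)

print(lines_read)
```

Each element of `lines_read` still carries its trailing newline character, exactly as with the other reading approaches.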

Using .readlines()...

In [20]:
with open('support/random_numbers.txt') as fileHandler: 
    lines = fileHandler.readlines()
    print(lines[0])
    print(len(lines))

926,927,928,929,930

25


In [11]:
with open('support/random_numbers.txt') as fileHandler: 
    lines = fileHandler.readlines()
    lineNumber = 0
    for line in lines:
        print(type(line))
        print(f"Line {lineNumber}: {line}") 
        lineNumber += 1

<class 'str'>
Line 0: 926,927,928,929,930

<class 'str'>
Line 1: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660

<class 'str'>
Line 2: 488,489,490,491,492,493,494,495

<class 'str'>
Line 3: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45

<class 'str'>
Line 4: 169,170,171,172,173,174,175,176,177,178,179,180

<class 'str'>
Line 5: 419,420

<class 'str'>
Line 6: 455,456,457,458,459,460,461,462,463,464,465

<class 'str'>
Line 7: 883,884,885

<class 'str'>
Line 8: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

<class 'str'>
Line 9: 786,787,788,789,790,791,792,793,794,795

<class 'str'>
Line 10: 137,138,139,140,141,142,143,144,145,146,147,148,149,150

<class 'str'>
Line 11: 912,913,914,915

<class 'str'>
Line 12: 81,82,83,84,85,86,87,88,89,90

<class 'str'>
Line 13: 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270

<class 'str'>
Line 14: 450

<class 'str'>
Line 15: 649,650,651,652,653,654,655,656,657,658,659,660

<class 'str'>
Line 16: 194,195

<class 'str

Notice that the output above has an extra blank line after every line. This is because each line in the source file ends with a newline character (\n), and the print function automatically appends another newline to whatever it prints. Therefore, it is good practice to use the .strip() method to remove the trailing newline (along with any other leading or trailing whitespace) from each line read from the source file. The repr() function used in the next example makes the hidden \n visible.

In [12]:
with open('support/random_numbers.txt') as fileHandler: 
    lineNumber = 0
    for line in fileHandler:
        print(type(line))
        print(f"Line {lineNumber}: {repr(line)}") 
        lineNumber += 1

<class 'str'>
Line 0: '926,927,928,929,930\n'
<class 'str'>
Line 1: '646,647,648,649,650,651,652,653,654,655,656,657,658,659,660\n'
<class 'str'>
Line 2: '488,489,490,491,492,493,494,495\n'
<class 'str'>
Line 3: '31,32,33,34,35,36,37,38,39,40,41,42,43,44,45\n'
<class 'str'>
Line 4: '169,170,171,172,173,174,175,176,177,178,179,180\n'
<class 'str'>
Line 5: '419,420\n'
<class 'str'>
Line 6: '455,456,457,458,459,460,461,462,463,464,465\n'
<class 'str'>
Line 7: '883,884,885\n'
<class 'str'>
Line 8: '241,242,243,244,245,246,247,248,249,250,251,252,253,254,255\n'
<class 'str'>
Line 9: '786,787,788,789,790,791,792,793,794,795\n'
<class 'str'>
Line 10: '137,138,139,140,141,142,143,144,145,146,147,148,149,150\n'
<class 'str'>
Line 11: '912,913,914,915\n'
<class 'str'>
Line 12: '81,82,83,84,85,86,87,88,89,90\n'
<class 'str'>
Line 13: '256,257,258,259,260,261,262,263,264,265,266,267,268,269,270\n'
<class 'str'>
Line 14: '450\n'
<class 'str'>
Line 15: '649,650,651,652,653,654,655,656,657,658,659,66

In [17]:
with open('support/random_numbers.txt') as fileHandler: 
    lineNumber = 0
    for line in fileHandler:
        cleanLine = line.strip()
        print(type(cleanLine))
        print(f"Line {lineNumber}: {repr(cleanLine)}") 
        lineNumber += 1

<class 'str'>
Line 0: '926,927,928,929,930'
<class 'str'>
Line 1: '646,647,648,649,650,651,652,653,654,655,656,657,658,659,660'
<class 'str'>
Line 2: '488,489,490,491,492,493,494,495'
<class 'str'>
Line 3: '31,32,33,34,35,36,37,38,39,40,41,42,43,44,45'
<class 'str'>
Line 4: '169,170,171,172,173,174,175,176,177,178,179,180'
<class 'str'>
Line 5: '419,420'
<class 'str'>
Line 6: '455,456,457,458,459,460,461,462,463,464,465'
<class 'str'>
Line 7: '883,884,885'
<class 'str'>
Line 8: '241,242,243,244,245,246,247,248,249,250,251,252,253,254,255'
<class 'str'>
Line 9: '786,787,788,789,790,791,792,793,794,795'
<class 'str'>
Line 10: '137,138,139,140,141,142,143,144,145,146,147,148,149,150'
<class 'str'>
Line 11: '912,913,914,915'
<class 'str'>
Line 12: '81,82,83,84,85,86,87,88,89,90'
<class 'str'>
Line 13: '256,257,258,259,260,261,262,263,264,265,266,267,268,269,270'
<class 'str'>
Line 14: '450'
<class 'str'>
Line 15: '649,650,651,652,653,654,655,656,657,658,659,660'
<class 'str'>
Line 16: '194
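
Two small refinements to the loops above may be worth knowing. The built-in `enumerate()` can supply the line number instead of a manually incremented counter, and `.rstrip("\n")` removes only the trailing newline, whereas `.strip()` removes all leading and trailing whitespace. The sketch below uses a temporary file with made-up contents (an assumption for illustration) to show the difference.

```python
import os
import tempfile

# Stand-in file; the first line has deliberate leading and trailing spaces.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_numbers.txt")
with open(demo_path, "w") as fileHandler:
    fileHandler.write("  419,420  \n883,884,885\n")

numbered = []
with open(demo_path) as fileHandler:
    # enumerate() yields (index, line) pairs, starting at 0.
    for lineNumber, line in enumerate(fileHandler):
        newline_removed = line.rstrip("\n")  # keeps the surrounding spaces
        fully_stripped = line.strip()        # removes the spaces as well
        numbered.append((lineNumber, newline_removed, fully_stripped))

for entry in numbered:
    print(repr(entry))
```

For the comma-separated numbers in this module either method works; the distinction matters only when leading or trailing spaces are meaningful.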

### All at once
Alternatively, you can use the .read() method to read the entire file into a single string variable.

In [19]:
with open('support/random_numbers.txt') as fileHandler: 
    fileContent = fileHandler.read() 
print(fileContent)
print(repr(fileContent))
print("Number of characters: " + str(len(fileContent)))
print(type(fileContent))

926,927,928,929,930
646,647,648,649,650,651,652,653,654,655,656,657,658,659,660
488,489,490,491,492,493,494,495
31,32,33,34,35,36,37,38,39,40,41,42,43,44,45
169,170,171,172,173,174,175,176,177,178,179,180
419,420
455,456,457,458,459,460,461,462,463,464,465
883,884,885
241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
786,787,788,789,790,791,792,793,794,795
137,138,139,140,141,142,143,144,145,146,147,148,149,150
912,913,914,915
81,82,83,84,85,86,87,88,89,90
256,257,258,259,260,261,262,263,264,265,266,267,268,269,270
450
649,650,651,652,653,654,655,656,657,658,659,660
194,195
825
739,740,741,742,743,744,745,746,747,748,749,750
805,806,807,808,809,810
395,396,397,398,399,400,401,402,403,404,405
358,359,360
351,352,353,354,355,356,357,358,359,360
575,576,577,578,579,580,581,582,583,584,585
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

'926,927,928,929,930\n646,647,648,649,650,651,652,653,654,655,656,657,658,659,660\n488,489,490,491,492,493,494,495\n31,32,33,34,35,36,37,38,39,40,41,42,43,4
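
Because `.read()` returns one string, `len()` on the result counts characters (including newline characters), not lines. If a list of lines is wanted after reading everything at once, the string method `.splitlines()` is one option. The following is a minimal sketch using a temporary file with illustrative contents.

```python
import os
import tempfile

# Stand-in file with three short lines (illustrative contents).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_numbers.txt")
with open(demo_path, "w") as fileHandler:
    fileHandler.write("926,927\n646\n488,489,490\n")

with open(demo_path) as fileHandler:
    fileContent = fileHandler.read()

char_count = len(fileContent)        # counts every character, newlines included
line_list = fileContent.splitlines() # splits on line breaks and drops them

print(char_count)
print(line_list)
print(len(line_list))
```

Here `char_count` is 24 while `len(line_list)` is 3, which shows why the two numbers should not be confused.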

## Searching a file
When searching a file, it is often best to scan it line by line. This avoids memory problems when processing a very large data file (>1GB), because only one line is held in memory at a time. For small files, it may be easiest to simply read in the entire file, but I recommend you get used to a line-by-line processing approach, as it is more general and works regardless of file size.

In [21]:
with open('support/random_numbers.txt') as fileHandler: 
    for line in fileHandler: 
        cleanLine = line.strip() 
        if len(cleanLine) > 25:
            print(type(cleanLine))
            print(f"this line is longer than 25 characters: {cleanLine}") 

<class 'str'>
this line is longer than 25 characters: 646,647,648,649,650,651,652,653,654,655,656,657,658,659,660
<class 'str'>
this line is longer than 25 characters: 488,489,490,491,492,493,494,495
<class 'str'>
this line is longer than 25 characters: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45
<class 'str'>
this line is longer than 25 characters: 169,170,171,172,173,174,175,176,177,178,179,180
<class 'str'>
this line is longer than 25 characters: 455,456,457,458,459,460,461,462,463,464,465
<class 'str'>
this line is longer than 25 characters: 241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
<class 'str'>
this line is longer than 25 characters: 786,787,788,789,790,791,792,793,794,795
<class 'str'>
this line is longer than 25 characters: 137,138,139,140,141,142,143,144,145,146,147,148,149,150
<class 'str'>
this line is longer than 25 characters: 81,82,83,84,85,86,87,88,89,90
<class 'str'>
this line is longer than 25 characters: 256,257,258,259,260,261,262,263,264,265,266,2

Keep in mind that lines read from a text file are always strings. Therefore, if you expect numbers, you will need to convert the content into a numeric form yourself. The code above checks whether each line is longer than 25 characters. If we instead wanted to split each line into a list and find lines with more than 10 items, we would do this:

In [22]:
with open('support/random_numbers.txt') as fileHandler: 
    for line in fileHandler: 
        cleanLine = line.strip() 
        numList = cleanLine.split(",") 
        if len(numList) > 10: 
            print(type(numList))
            print(f"this line has more than 10 elements: {numList}") 

<class 'list'>
this line has more than 10 elements: ['646', '647', '648', '649', '650', '651', '652', '653', '654', '655', '656', '657', '658', '659', '660']
<class 'list'>
this line has more than 10 elements: ['31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45']
<class 'list'>
this line has more than 10 elements: ['169', '170', '171', '172', '173', '174', '175', '176', '177', '178', '179', '180']
<class 'list'>
this line has more than 10 elements: ['455', '456', '457', '458', '459', '460', '461', '462', '463', '464', '465']
<class 'list'>
this line has more than 10 elements: ['241', '242', '243', '244', '245', '246', '247', '248', '249', '250', '251', '252', '253', '254', '255']
<class 'list'>
this line has more than 10 elements: ['137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '148', '149', '150']
<class 'list'>
this line has more than 10 elements: ['256', '257', '258', '259', '260', '261', '262', '263', '264', '265', 
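
Note that the list elements above are still strings such as `'646'`. The conversion to a numeric form mentioned earlier can be sketched with `int()` inside a list comprehension. The file contents and the threshold of 2 items below are assumptions chosen to keep the example short.

```python
import os
import tempfile

# Stand-in file (illustrative contents, not the course data).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_numbers.txt")
with open(demo_path, "w") as fileHandler:
    fileHandler.write("646,647,648\n31,32\n5,6,7,8\n")

long_rows = []
with open(demo_path) as fileHandler:
    for line in fileHandler:
        # Split on commas, then convert each piece from str to int.
        numList = [int(token) for token in line.strip().split(",")]
        if len(numList) > 2:
            long_rows.append(numList)

print(long_rows)

# Why conversion matters: strings compare character by character.
string_compare = "9" > "10"  # True, because "9" sorts after "1"
int_compare = 9 > 10         # False, as expected for numbers
print(string_compare, int_compare)
```

Comparing or sorting unconverted values would silently give text ordering rather than numeric ordering, which is an easy mistake to miss.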

Because we are going line by line, we may need to aggregate content from different areas of the file. If we wanted to collect all the numbers greater than 500 into a list, we would do the following:

In [23]:
bigNumList = [] 
with open('support/random_numbers.txt') as fileHandler: 
    for line in fileHandler: 
        cleanLine = line.strip() 
        numList = cleanLine.split(",") 
        for num in numList: 
            if int(num) > 500: 
                bigNumList.append(int(num))

print(bigNumList) 

[926, 927, 928, 929, 930, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 883, 884, 885, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 912, 913, 914, 915, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 825, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 805, 806, 807, 808, 809, 810, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585]
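
The loop above assumes every piece between commas is a valid integer. A blank line or a stray word would make `int()` raise a `ValueError`. The sketch below shows one possible way to guard against that; `collect_above` is a hypothetical helper name introduced only for this illustration, and the test file contents are made up.

```python
import os
import tempfile


def collect_above(path, threshold):
    """Hypothetical helper: gather every integer in the file greater than threshold."""
    found = []
    with open(path) as fileHandler:
        for line in fileHandler:
            for token in line.strip().split(","):
                try:
                    value = int(token)
                except ValueError:
                    continue  # skip blank or non-numeric pieces
                if value > threshold:
                    found.append(value)
    return found


# Stand-in file containing a blank line and a non-numeric entry.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_numbers.txt")
with open(demo_path, "w") as fileHandler:
    fileHandler.write("926,12\n\n501,abc,500\n")

bigNumList = collect_above(demo_path, 500)
print(bigNumList)
```

Silently skipping bad values is a design choice; in other situations it may be better to report them or stop with an error.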


## Writing a file
Any time our application processes data, we will probably want to save the results in some format. To save them as a text file, we would simply use the write() method of our file handler to send data to our file just as we would use the print() function to send data to the console. 

In [24]:
with open('support/big_numbers.txt', 'w') as fileHandler: 
    for bigNum in bigNumList: 
        fileHandler.write(str(bigNum) + "\n") 

In the code above, we create a file handler using the ‘w’ option, which tells the operating system to open the file for writing. Be careful with the ‘w’ option: if the file already exists, its content is erased; otherwise, the file is created. Also note the use of the str() function when writing to our file. The write() method only accepts strings, so any attempt to write an integer raises a TypeError. Finally, I append the newline character to the end of each value. This puts each number on its own line in the file; without it, all of the numbers would run together on a single line.
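
The points above can be seen in a short experiment. The sketch below also uses the ‘a’ (append) option, which is not covered in the text above but is a standard mode of `open()` that adds to the end of an existing file instead of erasing it. The file name is an assumption made for the example.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_output.txt")

# 'w' creates the file the first time...
with open(demo_path, "w") as fileHandler:
    fileHandler.write("first\n")

# ...and erases the existing content the second time.
with open(demo_path, "w") as fileHandler:
    fileHandler.write("second\n")

# 'a' keeps the existing content and adds to the end.
with open(demo_path, "a") as fileHandler:
    fileHandler.write("third\n")

# Writing a non-string raises TypeError; nothing is added to the file.
try:
    with open(demo_path, "a") as fileHandler:
        fileHandler.write(42)
except TypeError as error:
    write_error = type(error).__name__

with open(demo_path) as fileHandler:
    final_content = fileHandler.read()

print(repr(final_content))
print(write_error)
```

The line `first` is gone because the second ‘w’ open replaced it, while `third` was added after `second` by the append.
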
# Exercise
Write code to scan random_numbers.txt for lines with four or more 7's (you can use the .count('7') method on strings to count the occurrences of a character).
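
As a hint, here is how `.count()` behaves on plain strings. The sample strings are made up for illustration, and this is not a solution to the exercise.

```python
# str.count() returns how many times the given text appears in the string.
sample_line = "707,17,8"
sevens = sample_line.count("7")

no_sevens = "123,456".count("7")

print(sevens)
print(no_sevens)
```

The method counts individual characters, so `707` contributes two 7's and `17` contributes one.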

In [1]:
# Step 1...

# Step 2...